
# Causal Inference by Quantile Regression Kink Designs

## Abstract

The quantile regression kink design (QRKD) has been proposed by empirical researchers as a potential method to assess heterogeneous treatment effects under suitable research designs, but its causal interpretation remains unknown. We propose a causal interpretation of the QRKD estimand. Under flexible heterogeneity and endogeneity, the QRKD estimand measures a weighted average of heterogeneous marginal effects at respective conditional quantiles of outcome given a designed kink point. In addition, we develop weak convergence results for the QRKD estimator as a local quantile process for the purpose of conducting statistical inference on heterogeneous treatment effects using the QRKD. Applying our methods to the Continuous Wage and Benefit History Project (CWBH) data, we find significantly heterogeneous positive causal effects of unemployment insurance benefits on unemployment durations in Louisiana between 1981 and 1983. These effects are larger for individuals with longer unemployment durations.

Keywords: causal inference, heterogeneous treatment effects, identification, regression kink design, quantile regression, unemployment duration.

## 1 Introduction

Some recent empirical research papers, including Nielsen, Sørensen and Taber (2010), Landais (2015), Simonsen, Skipper and Skipper (2015), Card, Lee, Pei and Weber (2016), and Dong (2016), conduct causal inference via the regression kink design (RKD). A natural extension of the RKD with a flavor of unobserved heterogeneity is the quantile RKD (QRKD), which is the object that we explore in this paper. Specifically, consider the quantile derivative Wald ratio of the form

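$$\mathrm{QRKD}(\tau)=\frac{\lim_{x\downarrow x_0}\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)-\lim_{x\uparrow x_0}\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)}{\lim_{x\downarrow x_0}\frac{d}{dx}b(x)-\lim_{x\uparrow x_0}\frac{d}{dx}b(x)} \tag{1.1}$$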

at a design point $x_0$ of a running variable $X$, where $Q_{Y\mid X}(\tau\mid x)$ denotes the $\tau$-th conditional quantile function of $Y$ given $X = x$, and $b$ is a policy function. Note that (1.1) is analogous to the RKD estimand of Card, Lee, Pei and Weber (2016):

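$$\mathrm{RKD}=\frac{\lim_{x\downarrow x_0}\frac{\partial}{\partial x}E[Y\mid X=x]-\lim_{x\uparrow x_0}\frac{\partial}{\partial x}E[Y\mid X=x]}{\lim_{x\downarrow x_0}\frac{d}{dx}b(x)-\lim_{x\uparrow x_0}\frac{d}{dx}b(x)}, \tag{1.2}$$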

except that the conditional expectations in the numerator are replaced by the corresponding conditional quantiles. While the QRKD estimand (1.1) is of potential interest in the empirical literature for the assessment of heterogeneous treatment effects, little is known about its econometric theory. Specifically, Landais (2011) considers (1.1), but no formal theory of identification, estimation, or inference is provided. This paper develops a causal interpretation (identification) and an estimation theory for the QRKD estimand (1.1). In addition, we present practical guidelines for robust inference by pivotal simulations, a procedure for bandwidth selection, and statistical tests of heterogeneous treatment effects based on the QRKD.

To understand our objective, consider a structural relation $Y = g(B, X, \varepsilon)$, where the outcome $Y$ is determined by observed factors $(B, X)$ and unobserved factors $\varepsilon$. The marginal causal effect of $B$ on $Y$ for an individual with $(B, X, \varepsilon) = (b, x, \epsilon)$ is quantified by $g_1(b, x, \epsilon)$, where $g_1$ denotes the partial derivative of $g$ with respect to its first argument. An estimand $\theta$ has a causal interpretation at $(b, x)$ if it admits

$$\theta=\int g_1(b,x,\epsilon)\,d\mu(\epsilon) \tag{1.3}$$

for some probability measure $\mu$ whose support is contained in that of $\varepsilon$. The literature has provided causal interpretations of this kind for major statistical estimands. Examples include the OLS slope (Yitzhaki, 1996), the two stage least squares estimand under multivalued discrete treatments (Angrist and Imbens, 1995), an IV estimand under partial equilibrium (Angrist, Graddy and Imbens, 2000), a list of the most common treatment effect parameters (Heckman and Vytlacil, 2005), and the slope of the quantile regression (Kato and Sasaki, 2017). In a similar spirit, we argue in the present paper that the QRKD estimand (1.1) admits a causal interpretation of the form (1.3).

Establishing causal interpretations of the QRKD estimand (1.1) in the form (1.3) is perhaps more challenging than for the mean RKD estimand (1.2) because the differentiation operator and the conditional quantile operator do not ‘swap.’ For the mean RKD estimand (1.2), the interchangeability of the differentiation operator and the expectation (integration) operator allows each term of the numerator in (1.2) to be additively decomposed into two parts, namely the causal effects and the endogeneity effects. Taking the difference of the two terms in the numerator then cancels out the endogeneity effects, leaving only the causal effects. This trick allows the mean RKD estimand (1.2) to have causal interpretations in the presence of endogeneity. Because quantiles lack such interchangeability, this trick is not straightforwardly inherited by the quantile counterpart (1.1). Nonetheless, we show in Section 2 that a similar decomposition is possible for the QRKD estimand (1.1), and therefore argue that it admits causal interpretations even without monotonicity. Specifically, we show that the QRKD estimand corresponds to the quantile marginal effect under monotonicity and to a weighted average of marginal effects under non-monotonicity.

For estimation of the causal effects, we propose a sample-counterpart estimator for the QRKD estimand (1.1) in Section 3. To derive its asymptotic properties, we take advantage of the existing literature on uniform Bahadur representations for quantile-type loss functions, including Kong, Linton and Xia (2010), Guerre and Sabbah (2012), Sabbah (2014), and Qu and Yoon (2015a). Qu and Yoon (2015b) apply the results of Qu and Yoon (2015a) to develop methods of statistical inference with quantile regression discontinuity designs (QRDD), which are closely related to our QRKD framework. We take a similar approach, with suitable modifications, to derive asymptotic properties of our QRKD estimator, including weak convergence results for the estimator as a quantile process. Applying these weak convergence results, we propose procedures for testing treatment significance and treatment heterogeneity following Koenker and Xiao (2002), Chernozhukov and Fernández-Val (2005) and Qu and Yoon (2015b). Simulation studies presented in Section 4 support the theoretical properties.

Literature: The method studied in this paper falls in the broad framework of design-based causal inference, including RDD and RKD. There is by now an extensive body of literature on RDD; see a historical review by Cook (2008) and surveys in the special issue of Journal of Econometrics edited by Imbens and Lemieux (2008), Imbens and Wooldridge (2009; Sec. 6.4), Lee and Lemieux (2010), and Volume 38 of Advances in Econometrics edited by Cattaneo and Escanciano (2016), as well as the references cited therein. The first extension to quantile treatment effects in the RDD framework was made by Frandsen, Frölich and Melly (2012). More recently, Qu and Yoon (2015b) develop uniform inference methods with QRDD that empirical researchers can use to test a variety of important empirical questions on heterogeneous treatment effects. While the RDD has a rich set of empirical and theoretical results, including the quantile extensions, the RKD method, which was developed more recently, does not yet have a quantile counterpart in the literature, despite potential demand for it from empirical researchers (e.g., Landais, 2011). Our paper can be seen as a quantile extension of Card, Lee, Pei and Weber (2016) and an RKD counterpart of Qu and Yoon (2015b).

## 2 Causal Interpretation of the QRKD Estimand

In this section, we develop some causal interpretations of the QRKD estimand (1.1). For the purpose of illustration, we first present a simple case with rank invariance in Section 2.1. It is followed by a formal argument for general cases in Section 2.2.

### 2.1 Illustration: Causal Interpretation under Rank Invariance

The causal relation of interest is represented by the structural equation

$$y=g(b,x,\epsilon).$$

The outcome $y$ is determined through the structural function $g$ by two observed factors, $b$ and $x$, and a scalar unobserved factor, $\epsilon$. We assume that $g$ is monotone increasing in $\epsilon$, effectively imposing rank invariance; causal interpretations in a more general setup with non-monotone $g$ and/or multivariate $\epsilon$ are established in Section 2.2. The factor $b$ is a treatment input, and is in turn determined by the running variable $x$ through the structural equation

$$b=b(x)$$

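for a known policy function $b$. We say that $b$ has a kink at $x_0$ if $b'(x_0^+)\neq b'(x_0^-)$, where $b'(x_0^+)$ and $b'(x_0^-)$ denote $\lim_{x\downarrow x_0}\frac{d}{dx}b(x)$ and $\lim_{x\uparrow x_0}\frac{d}{dx}b(x)$, respectively. Throughout this paper, we assume that the location, $x_0$, of the kink is known from a policy-based research design, as is the case with Card, Lee, Pei and Weber (2016).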

###### Assumption 1.

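$b'(x_0^+)\neq b'(x_0^-)$ holds, and $b$ is continuous on $\mathcal{X}$ and differentiable on $\mathcal{X}\setminus\{x_0\}$, where $\mathcal{X}$ denotes the support of $x$.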

The structural partial effects are $g_1=\partial g/\partial b$, $g_2=\partial g/\partial x$, and $g_3=\partial g/\partial\epsilon$. In particular, a researcher is interested in $g_1$, which measures heterogeneous partial effects of the treatment intensity $b$ on an outcome $y$. While the structural partial effect $g_1$ is of interest, it is not clear whether the QRKD estimand (1.1) provides any information about $g_1$. In this section, we argue that (1.1) does have a causal interpretation in the sense that it measures the structural causal effect $g_1$ at the $\tau$-th conditional quantile of $y$ given $x=x_0$.

Under regularity conditions (to be discussed in Section 2.2 in detail), some calculations yield the decomposition

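$$\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)=g_1(b(x),x,\epsilon)\,b'(x)+g_2(b(x),x,\epsilon)-\frac{\int_{-\infty}^{\epsilon}\frac{\partial}{\partial x}f_{\varepsilon\mid X}(e\mid x)\,de}{f_{\varepsilon\mid X}(\epsilon\mid x)}\cdot g_3(b(x),x,\epsilon), \tag{2.1}$$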

where $\epsilon=Q_{\varepsilon\mid X}(\tau\mid x)$. The first term on the right-hand side is the partial effect of the running variable $x$ on the outcome $y$ through the policy function $b$. The second term is the direct partial effect of the running variable $x$ on the outcome $y$. The third term measures the effect of endogeneity in the running variable $x$; it is zero under exogeneity, i.e., when $f_{\varepsilon\mid X}(\cdot\mid x)$ does not depend on $x$. Therefore, in order to recover the causal effect of interest through the QRKD estimand (1.1), we need to remove the last two terms in (2.1).

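Suppose that the designed kink condition of Assumption 1 holds, but all the other functions on the right-hand side of (2.1), namely $g_1$, $g_2$, $g_3$, and $f_{\varepsilon\mid X}$, are continuous in $x$ at $x_0$. Then, taking the difference of the right and left limits of (2.1) at $x_0$ cancels the second and third terms, and dividing by $b'(x_0^+)-b'(x_0^-)$ yields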

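$$\frac{\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x_0^+)-\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x_0^-)}{b'(x_0^+)-b'(x_0^-)}=g_1(b(x_0),x_0,\epsilon), \tag{2.2}$$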

showing that the QRKD estimand (1.1) measures the structural causal effect of $b$ on $y$ for the subpopulation of individuals at the $\tau$-th conditional quantile of $y$ given $x=x_0$, i.e., with $\epsilon=Q_{\varepsilon\mid X}(\tau\mid x_0)$. This section provides only an informal argument for ease of exposition; Section 2.2 provides a formal mathematical argument under a general setup without the rank invariance assumption.
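The cancellation behind (2.1) and (2.2) can be checked numerically. The following is a minimal sketch under a hypothetical design chosen only for illustration (none of it comes from the paper): $\varepsilon\mid X=x\sim N(\gamma x,1)$ with $\gamma=0.5$, so the running variable is endogenous and the third term of (2.1) is nonzero; a policy function with slopes 0.2 and 0.7 on either side of $x_0=0$; and $g(b,x,\epsilon)=b(2+\tanh\epsilon)+x^2+\epsilon$, which is increasing in $\epsilon$ for $b>0$. Under rank invariance, $Q_{Y\mid X}(\tau\mid x)=g(b(x),x,Q_{\varepsilon\mid X}(\tau\mid x))$, so the one-sided derivatives in (1.1) can be approximated by finite differences and the ratio compared with $g_1(b(x_0),x_0,\epsilon)$ at $\epsilon=Q_{\varepsilon\mid X}(\tau\mid x_0)$.

```python
import math
from statistics import NormalDist

# Hypothetical design (assumed for illustration, not taken from the paper).
GAMMA = 0.5  # eps | X = x ~ N(GAMMA * x, 1): the running variable is endogenous
X0 = 0.0     # known kink location of the policy function


def b(x):
    """Policy function: slope 0.2 to the left of X0 and 0.7 to the right."""
    return 1.0 + 0.2 * x + 0.5 * max(x - X0, 0.0)


def g(b_val, x, eps):
    """Structural function y = g(b, x, eps), increasing in eps when b_val > 0."""
    return b_val * (2.0 + math.tanh(eps)) + x ** 2 + eps


def g1(b_val, x, eps):
    """Partial derivative of g with respect to its first argument b."""
    return 2.0 + math.tanh(eps)


def q_eps(tau, x):
    """tau-th conditional quantile of eps given X = x."""
    return GAMMA * x + NormalDist().inv_cdf(tau)


def q_y(tau, x):
    """Under rank invariance, Q_{Y|X}(tau|x) = g(b(x), x, Q_{eps|X}(tau|x))."""
    return g(b(x), x, q_eps(tau, x))


def qrkd(tau, step=1e-6):
    """Ratio (1.1) with one-sided derivatives replaced by finite differences."""
    right = (q_y(tau, X0 + step) - q_y(tau, X0)) / step
    left = (q_y(tau, X0) - q_y(tau, X0 - step)) / step
    kink = ((b(X0 + step) - b(X0)) - (b(X0) - b(X0 - step))) / step
    return (right - left) / kink


for tau in (0.25, 0.5, 0.75):
    print(tau, qrkd(tau), g1(b(X0), X0, q_eps(tau, X0)))
```

In this design, the printed ratios should agree with $g_1=2+\tanh\epsilon$ up to finite-difference error, even though the endogeneity term in (2.1) does not vanish on either side of the kink.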

### 2.2 General Result: Causal Interpretation without Rank Invariance

In this section, we continue to use the basic setting from Section 2.1, except that the unobserved factors $\epsilon$ are now allowed to be $M$-dimensional, as opposed to scalar, and that $g$ is now allowed to be non-monotone with respect to any coordinate of $\epsilon$. As such, we can consider general structural functions without rank invariance. In this case, there can exist multiple values of $\epsilon$ corresponding to a single conditional quantile of $y$ given $x$, and therefore the simple identifying equality (2.2) for the case of rank invariance cannot be established in general. Furthermore, $\mathrm{QRKD}(\tau)$ may even fail to equal the unweighted average of the structural derivatives $g_1$ over those $\epsilon$ at which $g(b(x_0),x_0,\epsilon)$ equals the $\tau$-th conditional quantile of $y$ given $x=x_0$. Nonetheless, we argue that $\mathrm{QRKD}(\tau)$ represents a weighted average of the structural derivatives $g_1$ over those $\epsilon$.

Define the lower contour set of $\epsilon\mapsto g(b(x),x,\epsilon)$ below a given level $y$ as follows:

$$V(y,x)=\{\epsilon\in\mathbb{R}^M\mid g(b(x),x,\epsilon)\le y\}.$$

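Its boundary is denoted by $\partial V(y,x)$. Furthermore, the velocities of the boundary at $\epsilon\in\partial V(y,x)$ with respect to a change in $y$ and a change in $x$ are denoted by $v_y(\epsilon)$ and $v_x(\epsilon)$, respectively. For shorthand notation, we write $h(x,\epsilon):=g(b(x),x,\epsilon)$ and $h_x(x,\epsilon):=\frac{\partial}{\partial x}h(x,\epsilon)$. Under regularity conditions to be stated below, the implicit function theorem allows the velocities defined above to be written explicitly as $v_y(\epsilon)=\nabla_\epsilon h(x,\epsilon)/\|\nabla_\epsilon h(x,\epsilon)\|^2$ and $v_x(\epsilon)=-h_x(x,\epsilon)\nabla_\epsilon h(x,\epsilon)/\|\nabla_\epsilon h(x,\epsilon)\|^2$ for all $\epsilon\in\partial V(y,x)$. Let $U$ denote an $(M-1)$-dimensional rectangle, and parameterize the manifold $\partial V(y,x)$ by $u\mapsto\epsilon(u)$ for all $u\in U$. We refer to Padula (2011) for further details of these objects and notations. Let $m^M$ and $\mathcal{H}^{M-1}$ denote the Lebesgue measure on $\mathbb{R}^M$ and the Hausdorff measure on $\partial V(y,x)$, respectively. We make the following assumptions.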

###### Assumption 2.

(i) is continuously differentiable on for all and is continuously differentiable for all . (ii) on for all . (iii) The conditional distribution of given is absolutely continuous with respect to , is continuously differentiable, and is continuous.2 (iv) for all .

###### Assumption 3.

(i) For $M=1$: $\partial V(y,x)$ is a finite set, and $h(x,\cdot)$ is locally invertible with a continuously differentiable local inverse function in a neighborhood of each point in $\partial V(y,x)$. (ii) For $M\ge 2$: is continuous for all , and is continuous for all . is continuous for all . is continuous for all .

###### Assumption 4.

Let . There exist and satisfying such that and hold for all .

###### Assumption 5.

(i) There exists such that and for all for all . (ii) There exists such that for all for all .

Assumptions 2, 3 and 4 are used to derive a structural decomposition of the quantile partial derivative; see Sasaki (2015) for detailed discussions of these assumptions. Assumption 3 branches into two cases, depending on whether (i) $M=1$ or (ii) $M\ge 2$. Case (i) accommodates a non-monotone structure in a scalar unobservable $\epsilon$, whereas case (ii) concerns non-monotonicity due to multi-dimensional unobservables $\epsilon$. These two cases are stated separately because the restriction in case (ii), among others, entails that $\partial V(y,x)$ is a connected set, which is too strong for case (i) with non-monotonicity, where $\partial V(y,x)$ generally consists of several isolated points. The statements in Assumptions 2 (iv) and 4 concern integration over $\partial V(y,x)$. This manifold has Lebesgue measure zero, i.e., $m^M(\partial V(y,x))=0$. On the other hand, the Hausdorff measure evaluates this Lebesgue null set positively, i.e., $\mathcal{H}^{M-1}(\partial V(y,x))>0$. Hence these assumptions are nontrivial statements.

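The regularity conditions in Assumption 5 allow us to apply the dominated convergence theorem to make structural sense of the QRKD estimand (1.1). Specifically, by the dominated convergence theorem, Assumption 5 (i) and (ii) together with Assumptions 2 (iv) and 4 are sufficient for the existence of the reduced-form expressions $\lim_{x\downarrow x_0}\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)$ and $\lim_{x\uparrow x_0}\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)$. With $\mathcal{B}(y,x)$ denoting the collection of Borel subsets of $\partial V(y,x)$, we define the function $\mu^{M-1}_{y,x}$ by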

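$$\mu^{M-1}_{y,x}(S):=\frac{\int_S\frac{1}{\|\nabla_\epsilon h(x,\epsilon)\|}f_{\varepsilon\mid X}(\epsilon\mid x)\,d\mathcal{H}^{M-1}(\epsilon)}{\int_{\partial V(y,x)}\frac{1}{\|\nabla_\epsilon h(x,\epsilon)\|}f_{\varepsilon\mid X}(\epsilon\mid x)\,d\mathcal{H}^{M-1}(\epsilon)}\quad\text{for all } S\in\mathcal{B}(y,x).$$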

Because the zero-dimensional Hausdorff measure $\mathcal{H}^0$ is a counting measure, the case of $M=1$ yields

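$$\mu^0_{y,x}(\{\epsilon\}):=\frac{\frac{1}{\|\nabla_\epsilon h(x,\epsilon)\|}f_{\varepsilon\mid X}(\epsilon\mid x)}{\sum_{\epsilon'\in\partial V(y,x)}\frac{1}{\|\nabla_\epsilon h(x,\epsilon')\|}f_{\varepsilon\mid X}(\epsilon'\mid x)}\quad\text{for all }\epsilon\in\partial V(y,x). \tag{2.3}$$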
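For a scalar unobservable, the weights (2.3) only require the finitely many boundary points, the derivative $h_\epsilon$ at those points, and the conditional density. A minimal sketch, assuming these are supplied as Python callables; the function names and the example structure $h(x,\epsilon)=\epsilon^3-3\epsilon$ with a standard normal density are hypothetical choices for illustration:

```python
import math
from statistics import NormalDist


def boundary_weights(roots, h_eps, density):
    """Weights mu^0_{y,x}({eps}) of (2.3) for a scalar unobservable (M = 1).

    roots   -- the finitely many points of dV(y, x), i.e. solutions of h(x, eps) = y
    h_eps   -- callable returning dh/deps at a point (nonzero on dV(y, x))
    density -- callable returning the conditional density f_{eps|X}(eps | x)
    """
    raw = [density(e) / abs(h_eps(e)) for e in roots]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_average(roots, weights, effect):
    """Weighted average of a structural derivative over dV(y, x), as in (2.6)."""
    return sum(w * effect(e) for e, w in zip(roots, weights))


# Hypothetical example: h(x, eps) = eps**3 - 3*eps at a fixed x, level y = 0,
# and eps | X = x ~ N(0, 1). The solutions of h = 0 are -sqrt(3), 0, sqrt(3).
roots = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
weights = boundary_weights(roots, lambda e: 3.0 * e * e - 3.0, NormalDist().pdf)
print([round(w, 4) for w in weights])
```

In this hypothetical example, the middle root receives the largest weight because the density is highest and $|h_\epsilon|$ is smallest there.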

The next theorem shows that $\mu^{M-1}_{y,x}$ is a probability measure and that it provides the weights with respect to which the QRKD estimand (1.1) measures the average structural causal effect of the treatment intensity $b$ on the outcome $y$ for those individuals at the $\tau$-th conditional quantile of $y$ given $x=x_0$.

###### Theorem 1.

Suppose that Assumptions 1, 2, 3, 4 and 5 hold. Let $\tau\in(0,1)$ and $y=Q_{Y\mid X}(\tau\mid x_0)$. Then, $\mu^{M-1}_{y,x_0}$ is a probability measure on $\partial V(y,x_0)$, and

$$\mathrm{QRKD}(\tau)=\int_{\partial V(y,x_0)}g_1(b(x_0),x_0,\epsilon)\,d\mu^{M-1}_{y,x_0}(\epsilon)=E_{\mu^{M-1}_{y,x_0}}[g_1(b(x_0),x_0,\varepsilon)]. \tag{2.4}$$
###### Proof.

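For the first part of the proof, we branch into two cases: (i) $M=1$ and (ii) $M\ge 2$.
(i) For $M=1$: That $\mu^0_{y,x}$ is a probability measure on $\partial V(y,x)$ follows from (2.3) under Assumption 4. By the Leibniz integral rule and the implicit function theorem under Assumptions 2, 3 (i) and 4, the quantile partial derivative (QPD) $\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)$ exists and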

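$$\begin{aligned}\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)&=\frac{\sum_{\epsilon\in\partial V(y,x)}\frac{h_x(x,\epsilon)}{|h_\epsilon(x,\epsilon)|}f_{\varepsilon\mid X}(\epsilon\mid x)-\int_{V(y,x)}\frac{\partial}{\partial x}f_{\varepsilon\mid X}(\epsilon\mid x)\,d\epsilon}{\sum_{\epsilon\in\partial V(y,x)}\frac{1}{|h_\epsilon(x,\epsilon)|}f_{\varepsilon\mid X}(\epsilon\mid x)}\\&=E_{\mu^0_{y,x}}[h_x(x,\varepsilon)]-A(y,x),\end{aligned}$$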

where $A(y,x)$ is defined by

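$$A(y,x):=\frac{\int_{V(y,x)}\frac{\partial}{\partial x}f_{\varepsilon\mid X}(\epsilon\mid x)\,d\epsilon}{\sum_{\epsilon\in\partial V(y,x)}\frac{1}{|h_\epsilon(x,\epsilon)|}f_{\varepsilon\mid X}(\epsilon\mid x)}.$$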

(ii) For $M\ge 2$: That $\mu^{M-1}_{y,x}$ is a probability measure on $\partial V(y,x)$ follows from Lemma 2 of Sasaki (2015) under Assumption 4. Next, by Lemma 1 of Sasaki (2015) under Assumptions 2, 3 (ii) and 4, the QPD exists and

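$$\begin{aligned}\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)&=\frac{\int_{\partial V(y,x)}\frac{h_x(x,\epsilon)}{\|\nabla_\epsilon h(x,\epsilon)\|}f_{\varepsilon\mid X}(\epsilon\mid x)\cdot\frac{M\pi^{(M-1)/2}}{2^{M-1}\Gamma\left(\frac{M+1}{2}\right)}\,d\mathcal{H}^{M-1}(\epsilon)-\int_{V(y,x)}\frac{\partial}{\partial x}f_{\varepsilon\mid X}(\epsilon\mid x)\,dm^M(\epsilon)}{\int_{\partial V(y,x)}\frac{1}{\|\nabla_\epsilon h(x,\epsilon)\|}f_{\varepsilon\mid X}(\epsilon\mid x)\cdot\frac{M\pi^{(M-1)/2}}{2^{M-1}\Gamma\left(\frac{M+1}{2}\right)}\,d\mathcal{H}^{M-1}(\epsilon)}\\&=E_{\mu^{M-1}_{y,x}}[h_x(x,\varepsilon)]-A(y,x),\end{aligned}$$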

where $\Gamma$ is the Gamma function and $A(y,x)$ is defined by

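$$A(y,x):=\frac{\int_{V(y,x)}\frac{\partial}{\partial x}f_{\varepsilon\mid X}(\epsilon\mid x)\,dm^M(\epsilon)}{\int_{\partial V(y,x)}\frac{1}{\|\nabla_\epsilon h(x,\epsilon)\|}f_{\varepsilon\mid X}(\epsilon\mid x)\cdot\frac{M\pi^{(M-1)/2}}{2^{M-1}\Gamma\left(\frac{M+1}{2}\right)}\,d\mathcal{H}^{M-1}(\epsilon)}.$$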

From this point on, we treat both cases (i) and (ii) together. Note that is continuous in by Assumption 2 (i). Also, is continuous in for each fixed according to parts (i), (ii) and (iii) of Assumption 2. Furthermore, Assumption 2 (i), (ii), (iii) and (iv) imply that is well-defined and is continuous in for all . Therefore, applying the dominated convergence theorem under Assumptions 2 (iv), 4 and 5 yields

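$$\begin{aligned}\lim_{x\to x_0^+}\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)&=\lim_{x\to x_0^+}\int_{\partial V(y,x)}h_x(x,\epsilon)\,d\mu^{M-1}_{y,x}(\epsilon)-\lim_{x\to x_0^+}A(y,x)\\&=\int_{\partial V(y,x_0)}\lim_{x\to x_0^+}\frac{\partial}{\partial x}\{g(b(x),x,\epsilon)\}\,d\mu^{M-1}_{y,x_0}(\epsilon)-A(y,x_0)\\&=\int_{\partial V(y,x_0)}\lim_{x\to x_0^+}\{g_1(b(x),x,\epsilon)\,b'(x)+g_2(b(x),x,\epsilon)\}\,d\mu^{M-1}_{y,x_0}(\epsilon)-A(y,x_0)\\&=\int_{\partial V(y,x_0)}\{g_1(b(x_0),x_0,\epsilon)\,b'(x_0^+)+g_2(b(x_0),x_0,\epsilon)\}\,d\mu^{M-1}_{y,x_0}(\epsilon)-A(y,x_0).\end{aligned}$$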

Similarly, taking the limit from the left, we have

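$$\lim_{x\to x_0^-}\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)=\int_{\partial V(y,x_0)}\{g_1(b(x_0),x_0,\epsilon)\,b'(x_0^-)+g_2(b(x_0),x_0,\epsilon)\}\,d\mu^{M-1}_{y,x_0}(\epsilon)-A(y,x_0).$$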

Taking the difference of the right and left limits eliminates both the $g_2$ term and $A(y,x_0)$, and thus produces

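$$\lim_{x\to x_0^+}\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)-\lim_{x\to x_0^-}\frac{\partial}{\partial x}Q_{Y\mid X}(\tau\mid x)=[b'(x_0^+)-b'(x_0^-)]\,E_{\mu^{M-1}_{y,x_0}}[g_1(b(x_0),x_0,\varepsilon)].$$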

Finally, note that Assumption 1 guarantees $b'(x_0^+)-b'(x_0^-)\neq 0$, and hence we can divide both sides of the above equality by $b'(x_0^+)-b'(x_0^-)$. This gives the desired result. ∎
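The case (i) expression for the quantile partial derivative can be checked numerically in a non-monotone scalar design. The sketch below assumes, purely for illustration, $\varepsilon\mid X=x\sim N(\gamma x,1)$ with $\gamma=0.5$ and $h(x,\epsilon)=(1+x)(\epsilon^3/3-9\epsilon)+x$, which is decreasing in $\epsilon$ on $(-3,3)$. It computes $Q_{Y\mid X}(\tau\mid x)$ by bisection on the conditional distribution function, differentiates it by central finite differences, and compares the result with $E_{\mu^0_{y,x}}[h_x(x,\varepsilon)]-A(y,x)$ evaluated at the boundary points. All names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical design (assumed, not from the paper):
#   eps | X = x ~ N(GAMMA * x, 1)  and  h(x, eps) = s(x) * c(eps) + x,
# where c(eps) = eps**3/3 - 9*eps is non-monotone and s(x) = 1 + x > 0.
GAMMA = 0.5
STD = NormalDist()
BOUND = 100.0  # roots are searched in [-BOUND, BOUND]; tail mass outside is negligible


def c(e):
    return e ** 3 / 3.0 - 9.0 * e


def s(x):
    return 1.0 + x


def h_eps(x, e):
    return s(x) * (e * e - 9.0)


def h_x(x, e):
    return c(e) + 1.0  # d/dx [s(x) c(e) + x] with s'(x) = 1


def boundary(y, x, iters=100):
    """dV(y, x): solutions of h(x, e) = y, by bisection on the pieces of c
    separated by its critical points e = -3 and e = 3."""
    t = (y - x) / s(x)
    roots = []
    for lo, hi in ((-BOUND, -3.0), (-3.0, 3.0), (3.0, BOUND)):
        f_lo = c(lo) - t
        if f_lo * (c(hi) - t) > 0:
            continue
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if (c(mid) - t > 0) == (f_lo > 0):
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots


def cdf_y(y, x):
    """F_{Y|X}(y|x) = P(eps in V(y, x)); V(y, x) is a union of intervals whose
    right (left) endpoints are the roots where h_eps > 0 (h_eps < 0)."""
    return sum(math.copysign(1.0, h_eps(x, r)) * STD.cdf(r - GAMMA * x)
               for r in boundary(y, x))


def quantile_y(tau, x, lo=-1e3, hi=1e3, iters=100):
    """Q_{Y|X}(tau|x) by bisection on the increasing function cdf_y(., x)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf_y(mid, x) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def qpd_formula(tau, x):
    """Case (i) expression: E_{mu^0_{y,x}}[h_x(x, eps)] - A(y, x)."""
    y = quantile_y(tau, x)
    roots = boundary(y, x)
    dens = [STD.pdf(r - GAMMA * x) for r in roots]
    denom = sum(f / abs(h_eps(x, r)) for f, r in zip(dens, roots))
    mean_hx = sum(h_x(x, r) * f / abs(h_eps(x, r)) for f, r in zip(dens, roots)) / denom
    # int_{V(y,x)} d/dx f(e|x) de, using int_a^b d/dx phi(e - GAMMA x) de
    # = -GAMMA * [phi(b - GAMMA x) - phi(a - GAMMA x)] on each interval [a, b].
    int_dfdx = -GAMMA * sum(math.copysign(1.0, h_eps(x, r)) * f
                            for f, r in zip(dens, roots))
    return mean_hx - int_dfdx / denom


def qpd_numeric(tau, x, step=1e-5):
    """Central finite difference of the conditional quantile in x."""
    return (quantile_y(tau, x + step) - quantile_y(tau, x - step)) / (2.0 * step)


x, tau = 0.3, 0.5
print(len(boundary(quantile_y(tau, x), x)), qpd_numeric(tau, x), qpd_formula(tau, x))
```

At $x=0.3$ and $\tau=0.5$ the boundary $\partial V(y,x)$ has three points, so the check exercises the non-monotone case; the two derivative values should agree up to finite-difference error.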

As is often the case in the treatment effect literature (e.g., Angrist and Imbens, 1995), this theorem provides a causal interpretation in terms of a weighted average. Specifically, (2.4) shows that the QRKD estimand (1.1) measures a weighted average of the heterogeneous causal effects $g_1(b(x_0),x_0,\epsilon)$ displayed on the right-hand side of (2.4). Since the weights are positive on the support of the conditional distribution of $\varepsilon$ given $x=x_0$, the QRKD estimand is a strict convex combination of the ceteris paribus causal effects of $b$ on $y$ for those individuals at the $\tau$-th conditional quantile of $y$ given $x=x_0$.

The weights given in the definition of $\mu^{M-1}_{y,x_0}$ are proportional to $f_{\varepsilon\mid X}(\epsilon\mid x_0)/\|\nabla_\epsilon h(x_0,\epsilon)\|$. Since $f_{\varepsilon\mid X}(\cdot\mid x_0)$ is the conditional density of the unobservables given $x=x_0$, the discrepancy between the weighted and unweighted averages is attributable to the denominator, $\|\nabla_\epsilon h(x_0,\epsilon)\|$. For example, larger weights are assigned to those locations of $\epsilon$ at which $\|\nabla_\epsilon h(x_0,\epsilon)\|$ is smaller. In other words, the QRKD emphasizes those locations of $\epsilon$ at which the effects of unobservables on the structure are smaller in magnitude, and de-emphasizes those locations at which these effects are larger in magnitude.

One may worry that causal interpretations in terms of ‘weighted’ averages are obscure. Note that the weighted average becomes an unweighted average when $\|\nabla_\epsilon h(x_0,\epsilon)\|$ is constant in $\epsilon$, and there are cases in which this holds. As an example often relevant to empirical practice, the polynomial random coefficient model of the form

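$$g(b,x,\epsilon)=\epsilon_{00}+\sum_{\nu=1}^{p_b}\epsilon_{\nu0}b^{\nu}+\sum_{\nu=1}^{p_x}\epsilon_{0\nu}x^{\nu}+\sum_{\nu_b=1}^{p_b}\sum_{\nu_x=1}^{p_x}\epsilon_{\nu_b\nu_x}b^{\nu_b}x^{\nu_x} \tag{2.5}$$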

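satisfies that $\|\nabla_\epsilon h(x,\epsilon)\|$ is constant in $\epsilon$, because $g$ is linear in the random coefficients $\epsilon$. Therefore, we obtain the following unweighted average causal interpretation for the QRKD estimand under this model.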

###### Corollary 1.

Suppose that the assumptions for Theorem 1 hold with (2.5). Let $\tau\in(0,1)$ and $y=Q_{Y\mid X}(\tau\mid x_0)$. Then,

$$\mathrm{QRKD}(\tau)=\int_{\partial V(y,x_0)}g_1(b(x_0),x_0,\epsilon)\,d\mu^{M-1}_{y,x_0}(\epsilon)=E_{\mu^{M-1}_{y,x_0}}[g_1(b(x_0),x_0,\varepsilon)],$$

where

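$$\mu^{M-1}_{y,x}(S):=\frac{\int_S f_{\varepsilon\mid X}(\epsilon\mid x)\,d\mathcal{H}^{M-1}(\epsilon)}{\int_{\partial V(y,x)}f_{\varepsilon\mid X}(\epsilon\mid x)\,d\mathcal{H}^{M-1}(\epsilon)}\quad\text{for all } S\in\mathcal{B}(y,x).$$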
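Because (2.5) is linear in the random coefficients, $\nabla_\epsilon h(x,\epsilon)$ is the vector of monomials in $(b(x),x)$ and does not involve $\epsilon$. A minimal sketch confirming this by finite differences, with hypothetical polynomial orders $p_b=p_x=2$ and a hypothetical point $(b(x_0),x_0)=(1.2,0.7)$:

```python
import itertools
import math
import random


def monomials(b, x, p_b, p_x):
    """Terms multiplying the random coefficients in (2.5), ordered as
    eps_00, eps_{v0} (v = 1..p_b), eps_{0v} (v = 1..p_x), eps_{vb vx}."""
    terms = [1.0]
    terms += [b ** v for v in range(1, p_b + 1)]
    terms += [x ** v for v in range(1, p_x + 1)]
    terms += [b ** vb * x ** vx
              for vb, vx in itertools.product(range(1, p_b + 1), range(1, p_x + 1))]
    return terms


def g(b, x, eps, p_b, p_x):
    """Polynomial random coefficient model (2.5); eps has M = len(monomials) entries."""
    return sum(e * m for e, m in zip(eps, monomials(b, x, p_b, p_x)))


def grad_norm(b, x, eps, p_b, p_x, step=1e-6):
    """Forward-difference approximation of ||grad_eps g(b, x, eps)||."""
    base = g(b, x, eps, p_b, p_x)
    total = 0.0
    for k in range(len(eps)):
        bumped = list(eps)
        bumped[k] += step
        total += ((g(b, x, bumped, p_b, p_x) - base) / step) ** 2
    return math.sqrt(total)


p_b, p_x, b0, x0 = 2, 2, 1.2, 0.7  # hypothetical polynomial orders and (b(x0), x0)
M = len(monomials(b0, x0, p_b, p_x))
rng = random.Random(1)
draws = [[rng.gauss(0.0, 1.0) for _ in range(M)] for _ in range(3)]
exact = math.sqrt(sum(m * m for m in monomials(b0, x0, p_b, p_x)))
print(M, exact, [grad_norm(b0, x0, e, p_b, p_x) for e in draws])
```

Every draw of $\epsilon$ should give the same gradient norm, which is why the weights in Corollary 1 reduce to the conditional density alone.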

When the unobservable is a scalar random variable (i.e., $M=1$), the Hausdorff measure $\mathcal{H}^0$ becomes a counting measure on the zero-dimensional manifold $\partial V(y,x_0)$. In that case, (2.4) may be rewritten as

$$\mathrm{QRKD}(\tau)=\sum_{\epsilon\in\partial V(y,x_0)}g_1(b(x_0),x_0,\epsilon)\cdot\mu^0_{y,x_0}(\{\epsilon\})=E_{\mu^0_{y,x_0}}[g_1(b(x_0),x_0,\varepsilon)]. \tag{2.6}$$

In particular, the case where $\partial V(y,x_0)$ is a singleton allows for the following straightforward causal interpretation of the QRKD estimand.

###### Corollary 2.

Suppose that the assumptions for Theorem 1 hold with (2.5). Let $\tau\in(0,1)$ and $y=Q_{Y\mid X}(\tau\mid x_0)$. If $\varepsilon$ is a scalar random variable (i.e., $M=1$) and $\partial V(y,x_0)$ is a singleton, then

$$\mathrm{QRKD}(\tau)=g_1(b(x_0),x_0,\epsilon(y,x_0)),$$

where $\epsilon(y,x_0)$ is the sole element of $\partial V(y,x_0)$.

Note that this corollary generalizes (2.2) and admits the straightforward causal interpretation without requiring ‘global’ monotonicity of $g$ in $\epsilon$. To illustrate the point, consider the structural function given by

$$g(b,x,\epsilon)=-9b\epsilon+\frac{1}{3}b\epsilon^3-9x\epsilon+\frac{1}{3}x\epsilon^3.$$

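If $b(x_0)+x_0\neq 0$, then this structure is not globally monotone in $\epsilon$ at $x=x_0$, since $\frac{\partial}{\partial\epsilon}g(b(x_0),x_0,\epsilon)=(b(x_0)+x_0)(\epsilon^2-9)$ changes sign at $\epsilon=\pm 3$. However, $\partial V(y,x_0)$ is a singleton (i.e., $g$ is locally monotone around its sole element) for each value of $y$ with $|y|>18|b(x_0)+x_0|$, and hence the causal interpretation of Corollary 2 applies. On the other hand, for each value of $y$ with $|y|<18|b(x_0)+x_0|$, $\partial V(y,x_0)$ consists of three points, and we can interpret the QRKD only in terms of the weighted sum of the form (2.6).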
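The example structure factors as $g(b,x,\epsilon)=(b+x)(\epsilon^3/3-9\epsilon)$, so $g_1=\epsilon^3/3-9\epsilon$, which equals $y/(b(x_0)+x_0)$ at every point of $\partial V(y,x_0)$. In this particular example the weighted sum (2.6) therefore collapses to $y/(b(x_0)+x_0)$ whatever the weights are. A minimal sketch that counts the boundary points and evaluates (2.6) with the weights (2.3), assuming a hypothetical value $b(x_0)+x_0=1.5$ and a hypothetical conditional density $\varepsilon\mid X=x_0\sim N(0,4)$:

```python
from statistics import NormalDist


def c(e):
    """g(b, x, eps) = (b + x) * c(eps) for the example; also g_1 = dg/db = c(eps)."""
    return e ** 3 / 3.0 - 9.0 * e


def boundary(y, s, bound=100.0, iters=100):
    """dV(y, x0) = {eps : s * c(eps) = y} with s = b(x0) + x0 > 0, found by
    bisection on the monotone pieces of c split at eps = -3 and eps = 3."""
    t = y / s
    roots = []
    for lo, hi in ((-bound, -3.0), (-3.0, 3.0), (3.0, bound)):
        f_lo = c(lo) - t
        if f_lo * (c(hi) - t) > 0:
            continue
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if (c(mid) - t > 0) == (f_lo > 0):
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots


def qrkd_weighted(y, s, density):
    """Weighted sum (2.6) of g_1 over dV(y, x0), with the weights (2.3)."""
    roots = boundary(y, s)
    raw = [density(e) / abs(s * (e * e - 9.0)) for e in roots]  # f / |h_eps|
    total = sum(raw)
    value = sum(w / total * c(e) for w, e in zip(raw, roots))
    return len(roots), value


s = 1.5                              # hypothetical value of b(x0) + x0
density = NormalDist(0.0, 2.0).pdf   # hypothetical f_{eps|X}(. | x0)
for y in (5.0, 40.0):                # |y| < 18 s = 27 versus |y| > 27
    print(y, qrkd_weighted(y, s, density), y / s)
```

The level $y=5$ produces three boundary points and $y=40$ a single one, matching the threshold $18|b(x_0)+x_0|$ derived above.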

In either of these cases, heterogeneity in values of the QRKD estimand across quantiles can be used as evidence for heterogeneity in treatment effects. Therefore, we can still conduct statistical inference for heterogeneous treatment effects based on the weak convergence results presented below in Section 3.

## 3 Estimation and Inference

### 3.1 The Estimator and Its Asymptotic Distribution

We propose to estimate the QRKD estimand (1.1) by its sample counterpart

$$\widehat{\mathrm{QRKD}}(\tau)=\frac{\hat\beta_1^+(\tau)-\hat\beta_1^-(\tau)}{b'(x_0^+)-b'(x_0^-)}, \tag{3.1}$$

where the two terms in the numerator are given by the $p$-th order local polynomial quantile smoothers

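$$\begin{aligned}\hat\beta_1^+(\tau)&=\iota_2'\operatorname*{argmin}_{(\alpha,\beta_1^+,\beta_1^-,\dots,\beta_p^+,\beta_p^-)\in\mathbb{R}^{2p+1}}\sum_{i=1}^nK\left(\frac{x_i-x_0}{h_{n,\tau}}\right)\rho_\tau\left(y_i-\alpha-\sum_{v=1}^p(\beta_v^+d_i^++\beta_v^-d_i^-)\frac{(x_i-x_0)^v}{v!}\right)\\ \hat\beta_1^-(\tau)&=\iota_3'\operatorname*{argmin}_{(\alpha,\beta_1^+,\beta_1^-,\dots,\beta_p^+,\beta_p^-)\in\mathbb{R}^{2p+1}}\sum_{i=1}^nK\left(\frac{x_i-x_0}{h_{n,\tau}}\right)\rho_\tau\left(y_i-\alpha-\sum_{v=1}^p(\beta_v^+d_i^++\beta_v^-d_i^-)\frac{(x_i-x_0)^v}{v!}\right)\end{aligned}$$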

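for $\tau\in\mathcal{T}$, where $\mathcal{T}\subset(0,1)$ is a closed interval, $K$ is a kernel function, $h_{n,\tau}$ is a bandwidth, $\rho_\tau(u)=u(\tau-1\{u<0\})$ is the check function, $d_i^+=1\{x_i\ge x_0\}$, $d_i^-=1\{x_i<x_0\}$, and $\iota_j$ is the $j$-th standard unit vector in $\mathbb{R}^{2p+1}$, for a fixed integer $p$ of polynomial order. Notice that, like the estimator of Landais (2011), we impose the constraint that the conditional quantile function is continuous at $x_0$: both sides share the common intercept $\alpha$. A researcher observing a sample $\{(y_i,x_i)\}_{i=1}^n$ of $n$ observations can compute (3.1) to estimate (1.1).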
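The sample counterpart (3.1) can be sketched in a few lines. This is not the authors' implementation: it assumes an Epanechnikov kernel, a fixed bandwidth $h$ in place of $h_{n,\tau}$, and $d_i^+=1\{x_i\ge x_0\}$, and it minimizes the kernel-weighted check loss by iteratively reweighted least squares (IRLS), an approximation to the exact linear-programming solution usually used for quantile regression. The common intercept $\alpha$ imposes continuity of the conditional quantile at $x_0$, and setting `p=2` gives the higher-order variant. All function names and the simulated data are hypothetical.

```python
import math
import random


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def epanechnikov(u):
    """Compactly supported kernel; one admissible choice of K (an assumption)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def regressors(x, x0, p):
    """(1, d+ (x - x0), d- (x - x0), ..., d+ (x - x0)^p / p!, d- (x - x0)^p / p!)."""
    d_plus = 1.0 if x >= x0 else 0.0
    row = [1.0]
    for v in range(1, p + 1):
        term = (x - x0) ** v / math.factorial(v)
        row += [d_plus * term, (1.0 - d_plus) * term]
    return row


def solve(a, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][k] * sol[k] for k in range(r + 1, n))) / m[r][r]
    return sol


def local_poly_quantile(xs, ys, x0, tau, h, p=1, iters=50, floor=1e-8):
    """Approximate argmin of sum_i K((x_i - x0)/h) rho_tau(y_i - z_i' theta) by
    IRLS; theta = (alpha, b1+, b1-, ..., bp+, bp-)."""
    data = [(regressors(x, x0, p), y, epanechnikov((x - x0) / h))
            for x, y in zip(xs, ys)]
    data = [d for d in data if d[2] > 0.0]
    dim = 2 * p + 1
    theta = None
    for _ in range(iters):
        xtwx = [[0.0] * dim for _ in range(dim)]
        xtwy = [0.0] * dim
        for z, y, k in data:
            if theta is None:
                w = k  # first pass: kernel-weighted least squares
            else:
                r = y - sum(zj * tj for zj, tj in zip(z, theta))
                w = k * (tau if r >= 0 else 1.0 - tau) / max(abs(r), floor)
            for i in range(dim):
                xtwy[i] += w * z[i] * y
                for j in range(dim):
                    xtwx[i][j] += w * z[i] * z[j]
        theta = solve(xtwx, xtwy)
    return theta


def qrkd_hat(xs, ys, x0, tau, h, kink, p=1):
    """Sample counterpart (3.1); kink = b'(x0+) - b'(x0-) is known by design."""
    theta = local_poly_quantile(xs, ys, x0, tau, h, p)
    return (theta[1] - theta[2]) / kink


# Hypothetical data: kinked conditional median with true ratio (2 - 0.5) / 0.5 = 3.
xs = [-1.0 + i / 800 for i in range(1601)]
rng = random.Random(0)
ys = [1.0 + (2.0 if x >= 0 else 0.5) * x + rng.gauss(0.0, 0.05) for x in xs]
print(qrkd_hat(xs, ys, x0=0.0, tau=0.5, h=0.5, kink=0.5))
```

The known kink $b'(x_0^+)-b'(x_0^-)$ enters only as the denominator, so all sampling variability in this sketch comes from the two slope estimates $\hat\beta_1^\pm(\tau)$.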

Our motivation for including the higher-order terms in the local polynomial estimation is to implement a one-step bias correction for a local linear estimator that can accommodate optimal bandwidths; see Remark 7 in Calonico, Cattaneo and Titiunik (2014) and Remark S.A.7 in its supplementary appendix. That is, this estimator can be viewed as the one-step bias-corrected version of the local linear quantile smoother ($p=1$):

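$$\operatorname*{argmin}_{(\alpha,\beta_1^+,\beta_1^-)\in\mathbb{R}^3}\sum_{i=1}^nK\left(\frac{x_i-x_0}{h_{n,\tau}}\right)\rho_\tau\left(y_i-\alpha-(\beta_1^+d_i^++\beta_1^-d_i^-)(x_i-x_0)\right).$$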

In the remainder of this section, we obtain weak convergence results for the quantile processes of $\hat\beta_1^+(\cdot)$ and $\hat\beta_1^-(\cdot)$, which in turn yield a weak convergence result for the quantile process of the QRKD estimator of treatment effects. Using these results, we propose methods to test hypotheses concerning heterogeneous treatment effects in Section 3.2. Define the kernel-dependent constant matrix , where , and . We assume that there exist constants and such that the following conditions are satisfied.

###### Assumption 6.

(i) (a) The density function exists and is continuously differentiable in a neighborhood of and . (b) is an i.i.d. sample of observations of the bivariate random vector . (ii) (a) is Lipschitz on . (b) There exist finite constants , , and , such that lies between and for all , and (iii) (a) , , and exist and are Lipschitz continuous on . (b) is continuous at . For , exists and is Lipschitz continuous on and . (iv) The kernel is compactly supported, Lipschitz, differentiable, and satisfying ,