
# Quantile Models with Endogeneity

V. Chernozhukov and C. Hansen
First version: September 2011, this version July 11, 2019. We would like to thank the editor, Isaiah Andrews, Denis Chetverikov, and Ye Luo for excellent comments and much help.
###### Abstract.

In this article, we review quantile models with endogeneity. We focus on models that achieve identification through the use of instrumental variables and discuss conditions under which partial and point identification are obtained. We discuss key conditions, which include monotonicity and full-rank-type conditions, in detail. In providing this review, we update the identification results of ?. We illustrate the modeling assumptions through economically motivated examples. We also briefly review the literature on estimation and inference.

Key Words: identification, treatment effects, structural models, instrumental variables

## 1. Introduction

Quantile regression is a tool for estimating conditional quantile models that has been used in many empirical studies and has been studied extensively in theoretical econometrics; see ? and ?. One of quantile regression’s most appealing features is its ability to estimate quantile-specific effects that describe the impact of covariates not only on the center but also on the tails of the conditional outcome distribution. While the central effects, such as the mean effect obtained through conditional mean regression, provide interesting summary statistics of the impact of a covariate, they fail to describe the full distributional impact unless the conditioning variables affect the central and the tail quantiles in the same way. In addition, researchers are interested in the impact of covariates on points other than the center of the conditional distribution in many cases. For example, in a study of the effectiveness of a job training program, the effect of training on the lower tail of the earnings distribution conditional on worker characteristics may be of more interest than the effect of training on the mean of the distribution.

In observational studies, the variables of interest (e.g. education or prices) are often endogenous. Just as with the conventional linear model, endogeneity of covariates renders the conventional quantile regression inconsistent for estimating the causal (structural) effects of covariates on the quantiles of economic outcomes. One approach to addressing this problem is to generalize the instrumental variables framework to allow for estimation of quantile models. In this paper, we review developments in instrumental variables approaches to modeling and estimating quantile treatment (structural) effects (QTE) in the presence of endogeneity.

We focus our review on the modeling framework of ? which provides conditions for identification of the QTE without functional form assumptions. The principal identifying assumption of the model is the imposition of conditions which restrict how rank variables (structural errors) may vary across treatment states. These conditions allow the use of instrumental variables to overcome the endogeneity problem and recover the true QTE. This framework also ties naturally to simultaneous equations models, corresponding to a structural simultaneous equation model with non-additive errors. Within this framework, estimation and inference procedures for linear quantile models have been developed by ?, ?, ?, and ?; nonparametric estimation has been considered by ?, ?, and ?; and inference with discrete outcomes has been explored by ?. Moreover, the modeling framework provides a foundation for other estimation methods based on IV median-independence and more general quantile-independence conditions as in ?, ?, ?, ?, ?, and ?. It is also important to note that the modeling framework we review can be used to study nonparametric identification of structural economic models in cases where quantile effects are not necessarily the chief objects of interest. ? provide an excellent example of this in the context of discrete choice models with endogeneity.

We also briefly review other modeling approaches for quantile effects with endogenous covariates. ? consider a QTE model for the sub-population of “compliers” which applies to binary endogenous variables with binary instruments. ?, ?, ?, and ? use models with triangular structures and show how control functions can be constructed and used to estimate structural objects of interest. While these models share some features with the model of ?, the three approaches are non-nested in general.

Quantile models with endogeneity have been used in many empirical studies in economics. See ?; ?; ?; ?; ?; ?; ?; ?; ?; ?; and ? among others. We do not provide a review of empirical applications but note these papers provide further discussion of how the instrumental variables quantile model relates to their specific framework and illustrate some of the rich effects that one can estimate using quantile methods.

## 2. An IV Quantile Model

In this section, we present an instrumental variable model for quantile treatment effects (QTE), its main econometric implication, and the principal identification result.

### 2.1. Framework

Our model is developed within the conventional potential (latent) outcome framework, e.g. ?. Potential real-valued outcomes, which vary among individuals or observational units, are indexed against potential treatment states $d \in \mathcal{D}$ and denoted $Y_d$. The potential outcomes $\{Y_d\}$ are latent because, given the selected treatment $D$, the observed outcome for each individual or observational unit is only one component

$$Y := Y_D$$

of the potential outcomes vector $\{Y_d\}$. Throughout the paper, capital letters denote random variables, and lower case letters denote the potential values they may take. We do not explicitly state various technical measurability assumptions as these can be deduced from the context. [Footnote 1: For simplicity, we could assume that the treatment takes on a countable set of values or make separability assumptions which imply that the stochastic process $\{Y_d\}$ is determined by its definition over a countable subset of treatment states. See ?.]

The objective of causal or structural analysis is to learn about features of the distributions of potential outcomes $Y_d$. Of primary interest to us are the $\tau$-th quantiles of potential outcomes under various treatments $d$, conditional on observed characteristics $X = x$, denoted as

$$q(d, x, \tau).$$

We will refer to the function $q(d,x,\tau)$ as the quantile treatment response (QTR) function. We are also interested in the quantile treatment effects (QTE), defined as

$$q(d_1, x, \tau) - q(d_0, x, \tau),$$

that summarize the differences in the impact of treatments on the quantiles of potential outcomes (?, ?).

Typically, the realized treatment $D$ is selected in relation to potential outcomes, inducing endogeneity. This endogeneity makes the conventional quantile regression of observed $Y$ on observed $D$ and $X$, which relies upon the restriction

$$P[Y \leqslant \theta(D,X,\tau) \mid X, D] = \tau \quad \text{a.s.},$$

inappropriate for measuring $q(d,x,\tau)$ and the QTE. Indeed, the function $\theta(d,x,\tau)$ solving these equations will not be equal to $q(d,x,\tau)$ under endogeneity. The model presented next states conditions under which we can identify and estimate the quantiles of latent outcomes through the use of instruments $Z$ that affect $D$ but are independent of potential outcomes and the nonlinear quantile-type conditional moment restrictions

$$P[Y \leqslant q(D,X,\tau) \mid X, Z] = \tau \quad \text{a.s.}$$
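The contrast between the two restrictions can be illustrated with a small simulation. This is only a sketch under assumed ingredients that do not come from the paper: the structural function `q`, the selection rule that generates `D`, the binary instrument, and every numerical constant are hypothetical choices made for illustration. The simulation checks that the event $\{Y \leqslant q(D,\tau)\}$ has probability close to $\tau$ given the instrument, but not given the endogenous treatment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 400_000, 0.5

def q(d, u):
    # Hypothetical structural quantile function, strictly increasing in u.
    return d * (0.5 + u) + u

Z = rng.integers(0, 2, size=n)          # binary instrument, independent of U
U = rng.uniform(size=n)                 # rank variable, U ~ U(0,1)
# Hypothetical selection rule: take-up rises with the rank U (endogeneity)
# and with the instrument Z (relevance).
D = (rng.uniform(size=n) < 0.2 + 0.3 * U + 0.4 * Z).astype(int)
Y = q(D, U)                             # observed outcome Y = q(D, U)

hit = Y <= q(D, tau)                    # same event as {U <= tau}
iv_probs = [hit[Z == z].mean() for z in (0, 1)]   # conditioning on Z
qr_probs = [hit[D == d].mean() for d in (0, 1)]   # conditioning on D

print("P[Y <= q(D,tau) | Z=z]:", np.round(iv_probs, 3))
print("P[Y <= q(D,tau) | D=d]:", np.round(qr_probs, 3))
```

In this design the probabilities conditional on `Z` should both be near `tau`, while those conditional on `D` differ from `tau` because units with high ranks are more likely to be treated. A conventional quantile regression imposes the second restriction and would therefore not recover `q` here.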

### 2.2. The Instrumental Quantile Treatment Effects (IVQT) Model.

Having conditioned on the observed characteristics $X = x$, each latent outcome $Y_d$ can be related to its quantile function $q(d,x,\tau)$ as

$$Y_d = q(d, x, U_d), \quad \text{where } U_d \sim U(0,1) \tag{2.1}$$

is the structural error term. We note that representation (2.1) is essential to what follows. [Footnote 2: This follows by the Fisher-Skorohod representation of random variables, which states that given a collection of variables $\{\zeta_j\}$, each variable $\zeta_j$ can be represented as $\zeta_j = q_j(U_j)$ for some $U_j \sim U(0,1)$, cf. ?, where $q_j(\tau)$ denotes the $\tau$-quantile of variable $\zeta_j$.]

The structural error $U_d$ is responsible for heterogeneity of potential outcomes among individuals with the same observed characteristics $x$. This error term determines the relative ranking of observationally equivalent individuals in the distribution of potential outcomes given the individuals’ observed characteristics, and thus we refer to $U_d$ as the rank variable. Since $U_d$ drives differences in observationally equivalent individuals, one may think of $U_d$ as representing some unobserved characteristic, e.g. ability or proneness. This interpretation makes quantile analysis an interesting tool for describing and learning the structure of heterogeneous treatment effects and accounting for unobserved heterogeneity; see ?, ?, and ?. [Footnote 3: ? uses the term proneness as in “prone to learn fast” or “prone to grow taller”.]

For example, consider a returns-to-training model, where $Y_d$’s are potential earnings under different training levels $d$, and $q(d,x,\tau)$ is the conditional earnings function which describes how an individual having training $d$, characteristics $x$, and the latent “ability” $\tau$ is rewarded by the labor market. The earnings function may be different for different levels of $\tau$, implying heterogeneous effects of training on earnings of people that have different levels of “ability”. For example, it may be that the largest returns to training accrue to those in the upper tail of the conditional distribution, that is, to the “high-ability” workers. [Footnote 4: It is important to note that the quantile index, $\tau$, in $q(d,x,\tau)$ refers to the quantile of potential outcome $Y_d$ given that exogenous variables are set at $X = x$ and not to the unconditional quantile of $Y_d$. For example, suppose that one of the control variables in the earnings example is years of schooling. An individual at the 30th percentile of the distribution of $Y_d$ given, say, 20 years of schooling is not necessarily low income, as even a relatively low earner with that level of education may still earn above the median earnings in the overall population.]

Formally, the IVQT model consists of five conditions (some are representations) that hold jointly.

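Main Conditions of the Model: Consider a common probability space and the set of potential outcome variables $(Y_d, d \in \mathcal{D})$, the covariate variables $X$, and the instrumental variables $Z$. The following conditions hold jointly with probability one:

• A1 Potential Outcomes. Conditional on $X = x$ and for each $d$, $Y_d = q(d,x,U_d)$, where $\tau \mapsto q(d,x,\tau)$ is non-decreasing on $[0,1]$ and left-continuous and $U_d \sim U(0,1)$.

• A2 Independence. Conditional on $X = x$ and for each $d$, $U_d$ is independent of instrumental variables $Z$.

• A3 Selection. $D := \delta(Z, X, V)$ for some unknown function $\delta$ and random vector $V$.

• A4 Rank Similarity. Conditional on $(X, Z, V)$, the rank variables $\{U_d\}$ are identically distributed.

• A5 Observed Variables. The observed random vector consists of $Y := Y_D$, $D$, $X$, and $Z$.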

The following is the main econometric implication of the model.

###### Theorem 1 (Main Statistical Implication).

Suppose conditions A1-A5 hold. (i) Then we have for $U := U_D$, with probability one,

$$Y = q(D,X,U), \quad U \sim U(0,1) \mid X, Z. \tag{2.2}$$

(ii) If (2.2) holds and $\tau \mapsto q(d,x,\tau)$ is strictly increasing for each $(d,x)$, then for each $\tau \in (0,1)$, a.s.

$$P[Y \leqslant q(D,X,\tau) \mid X, Z] = \tau. \tag{2.3}$$

(iii) If (2.2) holds, then for any closed subset $I$ of $[0,1]$, a.s.

$$P(U \in I) \leqslant P[Y \in q(D,X,I) \mid X, Z], \tag{2.4}$$

where $q(d,x,I)$ is the image of $I$ under the mapping $\tau \mapsto q(d,x,\tau)$.

The first result states that the main consequence of A1-A5 is a simultaneous equation model (2.2) with non-separable error $U$ that is independent of $(X, Z)$, and normalized so that $U \sim U(0,1)$. The second result considers econometric implications when $\tau \mapsto q(d,x,\tau)$ is strictly increasing, which requires that $Y$ is non-atomic conditional on $X$ and $Z$. In this case, we obtain the conditional moment restriction (2.3). This implication follows from the first result and the fact that

$$\{Y \leqslant q(D,X,\tau)\} \text{ is equivalent to } \{U \leqslant \tau\},$$

when $q(d,x,\tau)$ is strictly increasing in $\tau$. The final result deals with the case where $Y$ may have atoms conditional on $X$ and $Z$, e.g. when $Y$ is a count or discrete response variable. The first two results were obtained in ?, and the third result is in the spirit of results given in ?; ?; and ?. The latter results are related to random set/optimal transport methods for identification analysis; see ?; ?; ?; and ?.
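The inequality (2.4) for outcomes with atoms can be checked numerically. The sketch below relies on assumptions that are not in the paper: a hypothetical step-shaped structural function `q` that is non-decreasing and left-continuous in the rank, a hypothetical selection rule, and the particular interval `[a, b]`. It computes the image $q(d, I)$ on a fine grid and compares $P(U \in I)$ with the conditional probability that $Y$ falls in that image.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

def q(d, u):
    # Hypothetical discrete-outcome structural function: non-decreasing and
    # left-continuous in u, taking the values 1, 2, 3 shifted up by d.
    return np.ceil(3 * u) + d

Z = rng.integers(0, 2, size=n)
U = rng.uniform(size=n)
D = (rng.uniform(size=n) < 0.2 + 0.3 * U + 0.4 * Z).astype(int)
Y = q(D, U)

a, b = 0.2, 0.5                          # closed interval I = [a, b]
u_grid = np.linspace(a, b, 1001)
image = {d: np.unique(q(d, u_grid)) for d in (0, 1)}   # grid version of q(d, I)

# Indicator of the event {Y in q(D, I)}.
in_image = np.where(D == 1, np.isin(Y, image[1]), np.isin(Y, image[0]))
lower = b - a                            # P(U in I) for U ~ U(0,1)
upper = [in_image[Z == z].mean() for z in (0, 1)]

print("P(U in I) =", lower)
print("P[Y in q(D,I) | Z=z] =", np.round(upper, 3))
```

Because the outcome is discrete, the event $\{Y \in q(D,I)\}$ is strictly larger than $\{U \in I\}$ in this design, so the restriction holds as an inequality rather than an equality. This is why the discrete case leads to moment inequalities instead of the moment equality (2.3).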

The model and the results of Theorem 1 are useful for two reasons. First, Theorem 1 serves as a means of identifying the QTE in a reasonably general heterogeneous effects model. Second, by demonstrating that the IVQT model leads to the conditional moment restrictions (2.3) and (2.4), Theorem 1 provides an economic and causal foundation for estimation based on these restrictions.

### 2.3. The Identification Regions.

The conditions presented above yield the following identification region for the structural quantile function $q(d,x,\tau)$. The identification region for the case of strictly increasing $\tau \mapsto q(d,x,\tau)$ can be stated as the set of functions $m(d,x,\tau)$ that satisfy the following relations, for all $\tau \in (0,1)$:

$$P[Y < m(D,X,\tau) \mid X, Z] = \tau \quad \text{a.s.} \tag{2.5}$$

This representation of the identification region is implicit. Nevertheless, statistical inference about $q$ can be based on (2.5) and can be carried out in practice using weak-identification robust inference as described in ?, ?, ?, ?, or ?. Under conditions that yield point identification, these regions collapse to a singleton, and the aforementioned weak-identification-robust inference procedures retain their validity.

The identification region for the case of weakly increasing $\tau \mapsto q(d,x,\tau)$ can be stated as the set of functions $m(d,x,\tau)$ that satisfy the following relations: For any closed subset $I$ of $[0,1]$,

$$P(U \in I) \leqslant P[Y \in m(D,X,I) \mid X, Z] \quad \text{a.s.}, \tag{2.6}$$

where $m(d,x,I)$ is the image of $I$ under the mapping $\tau \mapsto m(d,x,\tau)$. The inference problem here falls in the class of conditional moment inequalities and approaches such as those described in ? or ?, for example, can be used. The sets to be checked could be reduced by determining approximate core-determining subsets; see ?, ?, ? for further discussion.

### 2.4. Discussion of the Model

Condition A1 imposes monotonicity on the structural function of interest which makes its relation to the QTR apparent. Condition A2 states that potential outcomes are independent of $Z$, given $X$, which is a conventional independence restriction. Condition A3 is a convenient representation of a treatment selection mechanism, stated for the purposes of discussion. In A3, the unobserved random vector $V$ is responsible for the difference in treatment choices $D$ across observationally identical individuals. Dependence between $V$ and $\{U_d\}$ is the source of endogeneity that makes the conventional exogeneity assumption break down. This failure leads to inconsistency of exogenous quantile methods for estimating the structural quantile function. Within the model outlined above, this breakdown is resolved through the use of instrumental variables.

The independence imposed in A2 and A3 is weaker than the commonly made assumption that both the disturbances in the outcome equation and the disturbances in the selection equation are jointly independent of the instrument $Z$; e.g. ? and ?. The latter assumption may be violated when the instrument is measured with error as discussed in ? or the instrument is not assigned exogenously relative to the selection equation as in Example 2 in ?.

Condition A4 restricts the variation in ranks across potential outcomes and is key for identifying the QTR and associated QTE. Its simplest, though strongest, form is rank invariance, when ranks $U_d$ do not vary with potential treatment states $d$:

$$U_d = U \text{ for each } d \in \mathcal{D}. \tag{2.7}$$

[Footnote 5: Notice that under rank invariance, condition A3 is a pure representation, not a restriction, since nothing restricts the unobserved information component $V$.]

For example, under rank invariance, people who are strong (highly ranked) earners without a training program ($d = 0$) remain strong earners having done the training ($d = 1$). Indeed, the earnings of a person with characteristics $x$ and rank $U = \tau$ in the training state “0” is $q(0,x,\tau)$ and in the state “1” is $q(1,x,\tau)$. Thus, rank invariance implies that a common unobserved factor $U$, such as innate ability, determines the ranking of a given person across treatment states. [Footnote 6: Rank invariance is used in many interesting models without endogeneity. See e.g. ?, ?, and ?.]

Rank invariance implies that the potential outcomes $\{Y_d\}$ are jointly degenerate, which may be implausible on logical grounds, as pointed out by ?. Also, the rank variables $U_d$ may be determined by many unobserved factors. Thus, it is desirable to allow the rank $U_d$ to change across $d$, reflecting some unobserved, asystematic variation. Rank similarity A4 achieves this property while managing to preserve the useful moment restriction (2.3).

Rank similarity A4 relaxes exact rank invariance by allowing asystematic deviations, “slippages” in the terminology of ?, in one’s rank away from some common level $U$. Conditional on $U$, which may enter disturbance $V$ in the selection equation, we have the following condition on the slippages:

$$U_d - U \text{ are identically distributed across } d \in \mathcal{D}. \tag{2.8}$$

[Footnote 7: Conditioning is required to be on all components of $V$ in the selection equation A3.]

In this formulation, we implicitly assume that one selects the treatment without knowing the exact potential outcomes; i.e. one may know $U$ and even the distribution of slippages, but does not know the exact slippages $U_d - U$. This assumption is consistent with many empirical situations where the exact latent outcomes are not known before receipt of treatment. We also note that conditioning on appropriate covariates may be important to achieve rank similarity.

In summary, rank similarity is an important restriction of the IVQT model that allows us to address endogeneity. This restriction is absent in conventional endogenous heterogeneous treatment effect models. However, similarity enables a more general selection mechanism, A3, and weaker independence conditions on instruments than often are assumed in nonseparable IV models. The main force of rank similarity and the other stated assumptions is the implied moment restriction (2.3) of Theorem 1, which is useful for identification and estimation of the quantile treatment effects.

### 2.5. Examples

We present some examples that highlight the nature of the model, its strengths, and its limitations.

###### Example 1 (Demand with Non-Separable Error).

The following is a generalization of the classic supply-demand example. Consider the model

$$Y_p = q(p, U), \qquad \tilde{Y}_p = \rho(p, z, \mathcal{U}), \qquad P \in \{p : \rho(p, Z, \mathcal{U}) = q(p, U)\}, \tag{2.9}$$

where functions $q$ and $\rho$ are increasing in the last argument. The function $p \mapsto Y_p$ is the random demand function, and $p \mapsto \tilde{Y}_p$ is the random supply function. Additionally, functions $q$ and $\rho$ may depend on covariates $X$, but this dependence is suppressed.

Random variable $U$ is the level of demand and describes the demand curve at different states of the world. Demand is maximal when $U = 1$ and minimal when $U = 0$, holding $p$ fixed. Note that we imposed rank invariance (2.7), as is typical in classic supply-demand models, by making $U$ invariant to $p$.

Model (2.9) incorporates traditional additive error models for demand which have where . The model is much more general in that the price can affect the entire distribution of the demand curve, while in traditional models it only affects the location of the distribution of the demand curve.

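The $\tau$-quantile of the demand curve $p \mapsto Y_p$ is given by $p \mapsto q(p,\tau)$. Thus, the curve $p \mapsto Y_p$ lies below the curve $p \mapsto q(p,\tau)$ with probability $\tau$. Therefore, the various quantiles of the potential outcomes play an important role in describing the distribution and heterogeneity of the stochastic demand curve. The quantile treatment effect may be characterized by $\partial q(p,\tau)/\partial p$ or by an elasticity $\partial \ln q(p,\tau)/\partial \ln p$. For example, consider the Cobb-Douglas model $q(p,\tau) = \exp(\beta(\tau) + \alpha(\tau)\ln p)$, which corresponds to a Cobb-Douglas model for demand with non-separable error, $Y_p = \exp(\beta(U) + \alpha(U)\ln p)$. The log transformation gives $\ln Y_p = \beta(U) + \alpha(U)\ln p$, and the quantile treatment effect for the log-demand equation is given by the elasticity of the original $\tau$-demand curve, $\partial \ln q(p,\tau)/\partial \ln p = \alpha(\tau)$.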
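A rank-dependent elasticity of this kind can be sketched numerically. The code assumes, purely for illustration, that log-demand is linear in log-price with coefficients that depend on the rank: the functions `beta` and `alpha`, their numerical forms, and the two price points are hypothetical and are not estimates from any data. It recovers the elasticity at several quantile indices from the difference in simulated quantile curves at two prices.

```python
import numpy as np

rng = np.random.default_rng(3)

def beta(u):
    return 2.0 + u           # hypothetical log-demand intercept at rank u

def alpha(u):
    return -1.5 + u          # hypothetical price elasticity at rank u

def log_demand(p, u):
    # ln Y_p = beta(U) + alpha(U) * ln p, increasing in u whenever ln p > -1.
    return beta(u) + alpha(u) * np.log(p)

U = rng.uniform(size=200_000)            # demand level (rank), U ~ U(0,1)
p_low, p_high = 1.0, 2.0                 # two hypothetical price levels

elasticity = {}
for tau in (0.1, 0.5, 0.9):
    q_low = np.quantile(log_demand(p_low, U), tau)
    q_high = np.quantile(log_demand(p_high, U), tau)
    # Slope of the tau-quantile of log-demand with respect to log-price.
    elasticity[tau] = (q_high - q_low) / (np.log(p_high) - np.log(p_low))

for tau, value in elasticity.items():
    print(f"tau={tau}: elasticity {value:.3f} vs alpha(tau) {alpha(tau):.3f}")
```

In this design demand is very elastic in low-demand states and much less elastic in high-demand states, which a single mean elasticity would not reveal.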

The elasticity is random and depends on the state of the demand and may vary considerably with . For example, this variation could arise when the number of buyers varies and aggregation induces a non-constant elasticity across the demand levels. ? estimate a simple demand model based on data from a New York fish market that was first collected and used by ?. They find point estimates of the demand elasticity, , that vary quite substantially from for low quantiles to for high quantiles of the demand curve.

The third condition in (2.9), which requires the price $P$ to equate supply and demand, is the equilibrium condition that generates endogeneity; the selection of the clearing price $P$ by the market depends on the potential demand and supply outcomes. As a result we have a representation that is consistent with A3, $P = \delta(Z, V)$, where $V$ consists of the demand and supply disturbances and may include “sunspot” variables if the equilibrium price is not unique. Thus what we observe can be written as

$$Y := q(P, U), \qquad P := \delta(Z, V), \qquad U \text{ is independent of } Z. \tag{2.10}$$

Identification of the $\tau$-quantile of the demand function, $p \mapsto q(p,\tau)$, is obtained through the use of instrumental variables $Z$, like weather conditions or factor prices, that shift the supply curve and do not affect the level of the demand curve, $U$, so that independence assumption A2 is met. Furthermore, the IVQT model allows arbitrary correlation between $Z$ and $V$. This property is important as it allows, for example, $Z$ to be measured with error or to be exogenous relative to the demand equation but endogenous relative to the supply equation.

###### Example 2 (Savings).

? use the framework of the IVQT model to examine the effects of participating in a 401(k) plan on an individual’s accumulated wealth. Since wealth is continuous, wealth, $Y_d$, in the participation state $d \in \{0,1\}$ can be represented as

$$Y_d = q(d, X, U_d), \quad U_d \sim U(0,1),$$

where $\tau \mapsto q(d,X,\tau)$ is the conditional quantile function of $Y_d$ and $U_d$ is an unobserved random variable. $U_d$ is an unobservable that drives differences in accumulated wealth conditional on $X$ under participation state $d$. Thus, one might think of $U_d$ as the preference for saving and interpret the quantile index $\tau$ as indexing rank in the preference for saving distribution. One could also model the individual as selecting the 401(k) participation state to maximize expected utility:

 (2.11)

where the maximand is the random indirect utility derived under participation state $d$. [Footnote 8: It may depend on both observables in $X$ as well as realized and unrealized unobservables; only part of this dependence is highlighted in (2.11).] As a result, the participation decision is represented by

$$D = \delta(Z, X, V),$$

where $Z$ and $X$ are observed, $V$ is an unobserved information component that may be related to ranks $U_d$ and includes other unobserved variables that affect the participation state, and function $\delta$ is unknown. This model fits into the IVQT model with the independence condition A2 requiring that $U_d$ is independent of $Z$, conditional on $X$.

The simplest form of rank similarity is rank invariance (2.7), under which the preference for saving vector $\{U_d\}$ may be collapsed to a single random variable $U = U_0 = U_1$. In this case, a single preference for saving is responsible for an individual’s ranking across all treatment states. The rank similarity condition A4 is a more general form of rank invariance. It relaxes the exact invariance of ranks $U_d$ across $d$ by allowing noisy, unsystematic variations of $U_d$ across $d$, conditional on $(V, X, Z)$. This relaxation allows for variation in rank across the treatment states, requiring only an “expectational rank invariance.” Similarity implies that given the information in $(V, X, Z)$ employed to make the selection of treatment $D$, the expectation of any function of rank $U_d$ does not vary across the treatment states. That is, ex-ante, conditional on $(V, X, Z)$, the ranks may be considered to be the same across potential treatments, but the realized, ex-post, rank may be different across treatment states.

From an econometric perspective, the similarity assumption is nothing but a restriction on the unobserved heterogeneity component $U_d$ which precludes systematic variation of $U_d$ across the treatment states. To be more concrete, consider the following simple example where

$$U_d = F_{V+\eta_d}(V + \eta_d),$$

where $F_{V+\eta_d}$ is the distribution function of $V + \eta_d$ and $\{\eta_d\}$ are mutually iid conditional on $V$, $X$, and $Z$. The variable $V$ represents an individual’s “mean” saving preference, while $\eta_d$ is a noisy adjustment. [Footnote 9: Clearly similarity holds in this case: the $U_d$ are identically distributed across $d$ given $V$, $X$, and $Z$.] This more general assumption leaves the individual optimization problem (2.11) unaffected, while allowing variation in an individual’s rank across different potential outcomes.
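The construction of ranks from a common component plus state-specific noise can be simulated directly. The sketch below assumes standard normal distributions for the common preference and for the noisy adjustments; these distributional choices, the sample size, and the bin `V > 1` used for conditioning are assumptions made for illustration only. It checks that realized ranks differ across treatment states for a given individual while their conditional distributions agree.

```python
import math
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

V = rng.standard_normal(n)            # common "mean" saving preference
eta0 = rng.standard_normal(n)         # noisy adjustment in state d = 0
eta1 = rng.standard_normal(n)         # noisy adjustment in state d = 1

def normal_cdf(x, sd):
    # Distribution function of N(0, sd^2), vectorised through math.erf.
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / (sd * math.sqrt(2.0))))

# V + eta_d ~ N(0, 2), so F_{V+eta_d} is the N(0, 2) distribution function.
U0 = normal_cdf(V + eta0, math.sqrt(2.0))
U1 = normal_cdf(V + eta1, math.sqrt(2.0))

high_v = V > 1.0                      # condition on a coarse bin of V
mean_gap = abs(U0[high_v].mean() - U1[high_v].mean())
median_gap = abs(np.median(U0[high_v]) - np.median(U1[high_v]))
avg_slippage = np.mean(np.abs(U0 - U1))

print("mean |U0 - U1| (ranks are not invariant):", round(avg_slippage, 3))
print("gap in conditional means given V > 1:", round(mean_gap, 4))
print("gap in conditional medians given V > 1:", round(median_gap, 4))
```

The average absolute slippage is far from zero, so rank invariance fails in this design, yet the conditional means and medians of the two ranks are nearly identical, as rank similarity requires.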

While we feel that similarity may be a reasonable assumption in many contexts, imposing similarity is not innocuous. In the context of 401(k) participation, matching practices of employers could jeopardize the validity of the similarity assumption. To be more concrete, let as before but let for random variable that depends on the match rate and is independent of , , and . Then conditional on , , and , is degenerate but is not. Therefore, is not equal to in distribution. Similarity may still hold in the presence of the employer match if the rank, , in the asset distribution is insensitive to the match rate. The rank may be insensitive if, for example, individuals follow simple rules of thumb such as target saving when they make their savings decisions. Also, if the variation of match rates is small relative to the variation of individual heterogeneity or if the covariates capture most of the variation in match rates, then similarity may be satisfied approximately.

###### Example 3 (Discrete Choice Model with Market-Level Data).

? show that a general model for market-level data realized from a discrete-choice problem can fit within the IVQT model. To keep notation and exposition simple, we consider a much-simplified version of the model from ? in which consumer $i$’s indirect utility from choosing product $j$ is

$$U_{ijt} = u(X_{jt}, P_{jt}, \xi_{jt}, V_{ijt}) = u(\delta_j(X_{jt}, \xi_{jt}), P_{jt}, V_{ijt}),$$

where $t$ indexes markets, $X_{jt}$ are observed exogenous product-market characteristics, $P_{jt}$ is the observed price of product $j$ in market $t$ which is treated as endogenous, $\xi_{jt}$ are product-market specific unobservables, and $V_{ijt}$ are individual-product-market specific unobservables that have density $f$. Thus, the model imposes that unobserved product-market specific effects and observed variables may only affect utility through the index $\delta_j(X_{jt}, \xi_{jt})$, where $\delta_j$ may differ arbitrarily across products but is the same across all markets. That unobserved product characteristics affect utility only through a scalar index is a substantive restriction but is common in the literature on discrete choice models where, for example, one can interpret the index as an aggregate representing product quality.

An individual will then choose the product that maximizes individual utility. Letting $Y_{it}$ denote the observed choice of individual $i$, we have that

$$Y_{it} = \arg\max_{j \leqslant J} U_{ijt},$$

where we assume the same $J$ products are available in each market for simplicity. [Footnote 10: Obviously, identification of the model requires normalizations. For example, the utility from one of the options is generally normalized to zero. As this model is not the focus of this review, we do not discuss these normalizations, which are discussed in detail in a more general context in ?.] The market share of each product will then be given as

$$S_{jt} = \int 1\Big\{u(\delta_{jt}, P_{jt}, v) = \max_{k \leqslant J} u(\delta_{kt}, P_{kt}, v)\Big\} f(v)\, dv := s_j\big(\{\delta_{jt}, P_{jt}\}_{j=1}^{J}\big) = s_j(\delta_t, P_t),$$

where $\delta_t := (\delta_{1t}, \ldots, \delta_{Jt})'$ and $P_t := (P_{1t}, \ldots, P_{Jt})'$.

To fit this model into the instrumental variables quantile regression model, ? make several assumptions to produce a structural relationship which is monotonic in a scalar unobservable. First, they assume that the utility function $u$ is strictly increasing in $\delta_{jt}$. This assumption is standard in the discrete choice literature and coincides with the interpretation of $\delta_{jt}$ as product quality where higher quality products are associated with higher utility, all else equal. Monotonicity of the utility function is not sufficient because all that is observed is the market share, which depends on the utility of each potential choice. Thus, ? make an additional assumption that they term “connected substitutes.” Intuitively, this condition implies that an increase in the quality of every good within some strict subset of the available choices will be associated with the total market share of all goods not in the subset decreasing as long as the quality of no good outside of the subset increases. ? show that the connected substitutes condition is satisfied in usual random utility discrete choice models and that it can hold fairly generally. Using these assumptions, ? use a result from ? which shows that the system of equations

$$S_{jt} = s_j(\delta_t, P_t)$$

has a unique solution for the vector $\delta_t$ as long as all goods present in equilibrium have positive market shares. Thus, we may write

$$\delta_{jt} = g_j(S_t, P_t) \tag{2.12}$$

for some function $g_j$ where $S_t := (S_{1t}, \ldots, S_{Jt})'$.
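The share inversion in (2.12) can be illustrated in a familiar special case. The sketch below assumes a multinomial logit specification, which is not the general model discussed here: utility is taken to be additive in the index, a price term with hypothetical coefficient `alpha`, and Gumbel-distributed tastes, with an outside option normalized to zero. All numerical values are hypothetical. Under these assumptions the inverse function has a closed form, so the indices can be recovered from simulated shares and prices.

```python
import numpy as np

rng = np.random.default_rng(5)
n_consumers = 200_000

delta = np.array([1.0, 0.5, -0.2])    # hypothetical quality indices delta_jt
price = np.array([1.0, 0.8, 0.5])     # hypothetical prices P_jt
alpha = 1.0                           # hypothetical price sensitivity

# Mean utilities; position 0 is an outside option normalised to zero.
mean_utility = np.concatenate(([0.0], delta - alpha * price))
shocks = rng.gumbel(size=(n_consumers, mean_utility.size))   # tastes V_ijt
choice = np.argmax(mean_utility + shocks, axis=1)            # utility-maximising option

# Simulated market shares: a Monte Carlo version of the share integral.
shares = np.bincount(choice, minlength=mean_utility.size) / n_consumers

# Closed-form logit shares implied by the same assumptions.
logit_shares = np.exp(mean_utility) / np.exp(mean_utility).sum()

# Logit form of the inversion delta_jt = g_j(S_t, P_t).
delta_hat = np.log(shares[1:]) - np.log(shares[0]) + alpha * price

print("simulated shares:", np.round(shares, 3))
print("logit shares:    ", np.round(logit_shares, 3))
print("recovered delta: ", np.round(delta_hat, 3))
```

The logit case is used only because its inverse is explicit. In the general model the inverse exists under the connected substitutes condition but typically has no closed form.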

From (2.12), we have that $\delta_j(X_{jt}, \xi_{jt}) = g_j(S_t, P_t)$. To complete the argument, ? assume that the function $\delta_j$ is strictly increasing in its second argument, $\xi_{jt}$, which represents unobserved product attributes. This condition rules out the case where $\xi_{jt}$ can represent attributes that would increase utility for some individuals but decrease utility for others and again corresponds to the notion that $\xi_{jt}$ represents unobserved product quality in which an increase unambiguously makes the product more desirable. With the assumed monotonicity in the function $\delta_j$, one obtains

$$\xi_{jt} = \delta_j^{-1}\big(g_j(S_t, P_t); X_{jt}\big) = h_j(S_t, P_t, X_{jt}).$$

It is also clear that $h_j$ is strictly increasing in $S_{jt}$, which is proven in Lemma 5 of ?, from which it follows that

$$S_{jt} = q_j(S_{-jt}, P_t, X_{jt}, \xi_{jt}),$$

where $S_{-jt}$ denotes the set of market shares for each product in market $t$ excluding product $j$ and $q_j$ is an unknown function that is strictly increasing in $\xi_{jt}$. Then, $q_j$ can be taken as the structural function in the instrumental variables quantile model after the normalization that $\xi_{jt}$ follows a $U(0,1)$ distribution, assuming that $\xi_{jt}$ has an atomless distribution. The model is then completed by assuming the existence of instruments, $Z_t$, that are independent of $\xi_{jt}$ conditional on $X_{jt}$ and are related to the endogenous variables $(S_{-jt}, P_t)$ through $(S_{-jt}, P_t) = \delta(Z_t, X_t, V_t)$ for some function $\delta$ and unobservables $V_t$. Finally, note that the model assumes rank invariance in its construction.

## 3. The Identifying Power of IV Quantile Restrictions

The purpose of this section is to examine the identifying power of conditional moment restrictions (2.3). Specifically, we give various conditions for point identification in this section, summarizing and updating some of the results known in the literature. We remark here that point identification is not required in applications in principle, as there exist inference methods that apply without point identification. However, it is useful to know and understand conditions under which moment conditions are informative enough that the identification region shrinks to a single point; in such cases the inference methods will also produce very informative confidence sets. We present point-identifying conditions first for the binary case, $D \in \{0,1\}$ and $Z \in \{0,1\}$, then consider the case of $D$ taking a finite number of values, and finally consider the continuous case.

### 3.1. Conditions for point identification in the binary case

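Here we consider the cases where $D \in \{0,1\}$ and $Z \in \{0,1\}$. The following analysis is all conditional on $X = x$ and for a given quantile $\tau \in (0,1)$, but we suppress this dependence for ease of notation. Under the conditions of Theorem 1, we know that there is at least one function $q(d)$ that solves $P[Y \leqslant q(D) \mid Z] = \tau$ a.s. The function $q(\cdot)$ can be equivalently represented by a vector of its values $q = (q(0), q(1))'$. Therefore, for vectors of the form $y = (y_0, y_1)'$, we have a vector of moment equations

$$\Pi(y) := \big( P[Y \leqslant y_D \mid Z = 0] - \tau,\; P[Y \leqslant y_D \mid Z = 1] - \tau \big)' \tag{3.13}$$

where $y_D := (1 - D)\, y_0 + D\, y_1$. We say that $q$ is identified in some parameter space, $\mathcal{L}$, if $y = q$ is the only solution to $\Pi(y) = 0$ among all $y \in \mathcal{L}$.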
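The sample analogue of the moment equations (3.13) can be solved numerically in a simulated binary design. The sketch below is built on assumptions that are not part of the paper: a hypothetical structural function `q`, a hypothetical selection rule, and a plain grid search over candidate pairs $(y_0, y_1)$ in place of any particular estimator from the literature. In this design the true quantiles at `tau = 0.5` are $q(0) = 0.5$ and $q(1) = 1.5$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau = 400_000, 0.5

def q(d, u):
    # Hypothetical structural quantile function, strictly increasing in u.
    return d * (0.5 + u) + u

Z = rng.integers(0, 2, size=n)
U = rng.uniform(size=n)
D = (rng.uniform(size=n) < 0.2 + 0.3 * U + 0.4 * Z).astype(int)
Y = q(D, U)

y0_grid = np.linspace(0.0, 1.0, 201)     # candidate values for q(0)
y1_grid = np.linspace(0.5, 2.5, 201)     # candidate values for q(1)

def joint_cdf(y_grid, d, z):
    # Empirical P[Y <= y, D = d | Z = z] at each grid point y.
    in_z = Z == z
    y_sorted = np.sort(Y[in_z & (D == d)])
    return np.searchsorted(y_sorted, y_grid, side="right") / in_z.sum()

# Squared norm of the empirical moment vector over the whole grid.
objective = np.zeros((y0_grid.size, y1_grid.size))
for z in (0, 1):
    pi_z = joint_cdf(y0_grid, 0, z)[:, None] + joint_cdf(y1_grid, 1, z)[None, :] - tau
    objective += pi_z ** 2

i, j = np.unravel_index(np.argmin(objective), objective.shape)
y0_hat, y1_hat = y0_grid[i], y1_grid[j]
print("grid solution:", (round(y0_hat, 3), round(y1_hat, 3)),
      "truth:", (q(0, tau), q(1, tau)))
```

The decomposition used in the loop reflects that $P[Y \leqslant y_D \mid Z = z]$ is the sum of the joint probabilities for the untreated and treated groups, which is why each moment is additive in functions of $y_0$ and $y_1$.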

We require that the Jacobian $\partial \Pi(y)$ of $\Pi(y)$ with respect to $y = (y_0, y_1)'$ exists and that it takes the form

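$$\partial \Pi(y) := \begin{bmatrix} f_Y(y_0 \mid D=0, Z=0)\, P[D=0 \mid Z=0] & f_Y(y_1 \mid D=1, Z=0)\, P[D=1 \mid Z=0] \\ f_Y(y_0 \mid D=0, Z=1)\, P[D=0 \mid Z=1] & f_Y(y_1 \mid D=1, Z=1)\, P[D=1 \mid Z=1] \end{bmatrix} =: \begin{bmatrix} f_{Y,D}(y_0, 0 \mid Z=0) & f_{Y,D}(y_1, 1 \mid Z=0) \\ f_{Y,D}(y_0, 0 \mid Z=1) & f_{Y,D}(y_1, 1 \mid Z=1) \end{bmatrix}. \tag{3.14}$$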

For local identification, we take the parameter space $\mathcal{L}$ as an open neighborhood of $q$. For global identification, we shall use some definitions from Mas-Colell to define $\mathcal{L}$. In what follows, for every proper (non-null) subspace $L \subset \mathbb{R}^2$, let $\mathrm{proj}_L : \mathbb{R}^2 \mapsto L$ denote the perpendicular projection map. A convex, compact polytope $\mathcal{P}$ is a bounded convex set formed by an intersection of a finite number of closed half-spaces. Such a polytope is of full dimension in $\mathbb{R}^2$ if it has a non-empty interior in $\mathbb{R}^2$. A face of a polytope $\mathcal{P}$ is the intersection of any supporting hyperplane of $\mathcal{P}$ with $\mathcal{P}$, so that faces of a polytope necessarily include the polytope itself. For instance, a rectangle in $\mathbb{R}^2$ has one 2-dimensional face given by itself, four 1-dimensional faces given by its edges, and four 0-dimensional faces given by its vertices. A subspace spanned by a non-empty face of $\mathcal{P}$ is the translation to the origin of the minimal affine space containing that face.

###### Theorem 2 (Identification by Full Rank Conditions).

Suppose that , the support of is and the support of is . Assume that the conditional density exists for each and . (i) (Local) Suppose the Jacobian given by (3.14) is continuous and has full rank at , then the -quantiles of potential outcomes, , are identified in the region given by a sufficiently small open neighborhood of in . (ii) (Global) Assume that region contains and can be covered by a finite number of compact convex 2-dimensional polytopes , each containing . Assume that for each , is a Jacobian of , and that, possibly after rearranging the rows of , for each and each subspace spanned by a face of that includes , the linear map

$$\mathrm{proj}_L \circ \partial\Pi(y) : L \mapsto L$$

has a positive determinant. Then is identified in .

The first result is a simple local identification condition of the type considered in ? which we provide to fix ideas. The second result is a global identification condition which extends the result in ? by allowing non-rectangular sets . This result is based on the global univalence theorems of ?. As explained below, the positive determinant condition requires the impact of instrument on the joint distribution of to be sufficiently rich. In particular, the instrument should not be independent of the endogenous variable . We note that existence of the conditional density is only required for in the support of . Outside the support we can define the conditional density as 0, so the existence condition is not very restrictive. Moreover, the condition is formulated so that can take on relatively rich shapes that can carry useful economic restrictions. For instance, in the training context, a useful restriction on the parameters is that training weakly increases the potential earning quantiles. This restriction can be implemented by taking some natural parameter space and intersecting it with the half-space . Specifically, a cube intersected with the halfspace is an example of a region permitted by the global identification result (ii).

###### Comment 3.1 (Simple Sufficient Conditions).

To illustrate the conditions of the theorem, let us consider the parameter space as either , i.e. a cube centered at , or , i.e. intersection of a cube centered at with the halfspace . Consider the trivial covering of by itself, i.e. . Then the positive determinant condition of the theorem is implied by the following simple conditions:

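$$\frac{f_{Y,D}(y_1, 1 \mid Z=1)}{f_{Y,D}(y_0, 0 \mid Z=1)} > \frac{f_{Y,D}(y_1, 1 \mid Z=0)}{f_{Y,D}(y_0, 0 \mid Z=0)} \quad \text{for all } y = (y_0, y_1) \in \mathcal{L}, \tag{3.15}$$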

and

$$f_{Y,D}(y_1, 1 \mid Z=1) > 0, \quad f_{Y,D}(y_0, 0 \mid Z=0) > 0, \quad \text{for all } y = (y_0, y_1) \in \mathcal{L}. \tag{3.16}$$

Alternatively, since we can rearrange the rows of , which corresponds to reordering elements of vector , the positive determinant condition of the theorem is implied by the following simple conditions:

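$$\frac{f_{Y,D}(y_1, 1 \mid Z=1)}{f_{Y,D}(y_0, 0 \mid Z=1)} < \frac{f_{Y,D}(y_1, 1 \mid Z=0)}{f_{Y,D}(y_0, 0 \mid Z=0)} \quad \text{for all } y = (y_0, y_1) \in \mathcal{L}, \tag{3.17}$$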

and

$$f_{Y,D}(y_1, 1 \mid Z=0) > 0, \quad f_{Y,D}(y_0, 0 \mid Z=1) > 0, \quad \text{for all } y = (y_0, y_1) \in \mathcal{L}. \tag{3.18}$$

The proof that these are sufficient conditions is given in the appendix, and below we discuss the economic plausibility of these conditions.
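The $2 \times 2$ algebra behind these conditions can be sketched as follows. The shorthand $a, b, c, d$ for the entries of the Jacobian (3.14) is introduced here only for illustration, and the sketch covers just the determinant and the diagonal entries; the complete argument, including the projections onto subspaces spanned by faces, is the one given in the appendix.

```latex
\[
\partial\Pi(y) = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\qquad
\begin{aligned}
a &= f_{Y,D}(y_0, 0 \mid Z=0), & b &= f_{Y,D}(y_1, 1 \mid Z=0),\\
c &= f_{Y,D}(y_0, 0 \mid Z=1), & d &= f_{Y,D}(y_1, 1 \mid Z=1).
\end{aligned}
\]
% Condition (3.15) reads d/c > b/a, which presumes a > 0 and c > 0.
% Multiplying both sides by ac > 0 gives
\[
\frac{d}{c} > \frac{b}{a}
\;\Longrightarrow\;
ad - bc > 0
\;\Longleftrightarrow\;
\det \partial\Pi(y) > 0 ,
\]
% while (3.16) states that the diagonal entries d and a are positive.
% After swapping the two rows the matrix becomes
\[
\begin{pmatrix} c & d \\ a & b \end{pmatrix},
\qquad
\det = bc - ad > 0
\;\Longleftarrow\;
\frac{d}{c} < \frac{b}{a}, \quad a > 0,\ c > 0 ,
\]
% and the diagonal entries are now c and b, which (3.18) requires to be positive.
```

In both orderings, then, the simple conditions deliver a positive determinant together with positive diagonal entries.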

###### Comment 3.2 (Plausibility of (3.15) and (3.16)).

The condition (3.16) seems quite mild, so we focus on (3.15). We can illustrate (3.15) by considering the problem of evaluating a training program where ’s are earnings, ’s are training states, and ’s are offers of training service. Condition (3.15) may be interpreted as a monotone likelihood ratio condition. That is, the instrument should have a monotonic impact on the likelihood ratio specified in (3.15). This monotonicity may be a weak condition in some contexts and a strong condition in others. For instance, if is a cube , then this condition may be considered relatively strong. On the other hand, if we impose monotonicity of the training impact on earning quantiles, so that , i.e. , then condition (3.15) would be trivially satisfied in many empirical settings. Indeed, it would suffice that the instrument , the offer of training services, increases the relative joint likelihood of receiving higher earnings and receiving the training service. In many instances, we also have ; e.g. those not offered training services do not receive that training. When , the right-hand side of (3.15) equals which makes the identification condition (3.15) satisfied trivially even for the less convenient parameter sets such as .

### 3.2. Identification with Multiple Points of Support

We generalize the result of Theorem 2 to more general discrete treatments with discrete instruments. Consider the case when has the support and has the support (). Note that function can be represented by a vector . Under the conditions of Theorem 1, there is at least one function that solves Therefore, for vectors of the form and the vector of moment equations

$$\Pi(y) = \big( P[Y \leqslant y_D \mid Z=z] - \tau, \ \ z = 1, \ldots, r \big)', \tag{3.19}$$

where , the model is identified if uniquely solves .

We define matrix as the matrix with element given by where and . We require this to be the Jacobian matrix of the map and impose full-rank-type conditions on submatrices of this Jacobian. To this end, let denote any permutation of distinct integers from , called -permutations, and be a collection of all such permutations. Let , which maps to , be a subvector of formed by selecting -th elements of according to their order in . (Note that this formulation allows reordering elements of which may be needed to achieve the required positive determinant condition, as discussed in the binary case.) Let denote the corresponding Jacobian matrix of . The following theorem generalizes Theorem 2.

###### Theorem 3 (Identification for Discrete D).

Suppose , the support of is and of is . Assume that the conditional density exists for each , and . (i) (Local) Suppose the Jacobian defined above is continuous and has rank at . Then the -quantiles of potential outcomes, , are identified in the region given by a sufficiently small open neighborhood of in . (ii) (Global) Assume that region contains and can be covered by a finite number of compact convex -dimensional polytopes , each containing and having the following properties: For each there is an -permutation , such that is the Jacobian of , and for each and each subspace spanned by a face of that includes , the linear map

$$\mathrm{proj}_L \circ \partial\Pi_{m(j)}(y) : L \mapsto L$$

has a positive determinant. Then is identified in .

We note that in the theorem existence of the conditional density is only required for in the support of . This density can be defined to take on an arbitrary value for outside the support. The first result is a simple local identification condition provided to fix ideas. The second result is a global identification condition based on Global Univalence Theorem 1 of ?. This result complements a similar result given in ? based on Global Univalence Theorem 2 of ?. The positive determinant condition requires the impact of instrument on the joint distribution of to be sufficiently rich.

###### Comment 3.3 (An Alternative Sufficient Condition).

Here we recall an alternative sufficient condition from ?, which is based on the Global Univalence Theorem 2 of ?. Assume that region contains and can be covered by a finite number of compact convex -dimensional sets , each containing and having the following properties: (i) For each , there is a permutation such that is Jacobian of ; (ii) for each ,

$$\det\big[\partial\Pi_{m(j)}(y)\big] > 0;$$

(iii) possesses a -smooth boundary ; and (iv) for each , for each where is the subspace tangent to at point . Then is identified in . This condition seems to require slightly stronger conditions on the boundary than the condition used in Theorem 3. The advantage of the conditions from ? is that they more transparently convey the full-rank nature of the conditions imposed.

### 3.3. Identification with general D

Finally we consider conditions for point identification in the case of more general and that may take on a continuum of values. We let denote elements in the support of and denote elements in the support of . Without loss of much generality, we restrict attention to the case where both and have bounded support. We require the parameter space to be a collection of bounded (measurable) functions containing . We say that such that a.s. is identified in if for any other such that a.s., a.s. Below, we use to denote the norm.

###### Theorem 4 (Identification with General D).

Suppose that a.s. and both and have bounded support. Consider a parameter space which is a collection of bounded (measurable) functions containing . Assume that for the conditional density exists for each , a.s. (i) (Global) Suppose that for each with , a.s. and

$$E[\Delta(D) \cdot \omega_\Delta(D,Z) \mid Z] = 0 \ \text{ a.s. } \Rightarrow \ \Delta(D) = 0 \ \text{ a.s.} \tag{3.20}$$

Then is identified in . (ii) (Local) Suppose that a.s. and for each with ,

$$E[\Delta(D) \cdot \omega_0(D,Z) \mid Z] = 0 \ \text{ a.s. } \Rightarrow \ \Delta(D) = 0 \ \text{ a.s.}, \tag{3.21}$$

and, for some and ,

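$$\big\| E[\Delta(D) \cdot \{\omega_\Delta(D,Z) - \omega_0(D,Z)\} \mid Z] \big\|_{p,P} \leqslant \eta \, \big\| E[\Delta(D) \cdot \omega_0(D,Z) \mid Z] \big\|_{p,P}. \tag{3.22}$$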

Then is identified in .

Condition (i), mentioned in ?, states a non-linear bounded completeness condition for global identification. The condition (3.20) required is not primitive, but it highlights a useful link with the linear bounded completeness condition: used by ?. The latter condition is needed for identification in the mean IV model under the assumption of a bounded structural function . The latter condition is known to be quite weak, as shown in ?, and there are many primitive sufficient conditions that imply this condition. ? shows that linear completeness is generic under some conditions. Although condition (3.20) is not primitive, it is not vacuous either since the previous theorems provide primitive conditions for its validity. The local identification condition (ii), obtained by ?, provides yet another sufficient condition for condition (i). The result (ii) replaces the nonlinear completeness condition (3.20) by the linear completeness condition (3.21) which is easier to check. The result (ii) also implicitly requires that the set is a sufficiently small neighborhood of and that functional deviations and the conditional density are sufficiently smooth. This is explained in detail in ? where further primitive smoothness and completeness conditions are also provided.

## 4. Other Approaches to Quantile Models with Endogeneity

There are, of course, other sets of modeling assumptions that one could employ to build a quantile model with endogeneity. In this section, we briefly outline two other approaches that have been taken in the literature. The first, due to ?, extends the local average treatment effect (LATE) framework of ? to quantile treatment effects. The second, considered in ? and ?, uses a triangular structure to obtain identification.

### 4.1. Local Quantile Treatment Effects with Binary Treatment and Instrument

In fundamental work, ? develop an approach to estimating quantile treatment effects within the LATE framework of ? in the case where both the instrument and treatment variable are binary. The use of the LATE framework makes this approach appealing as many applied researchers are familiar with LATE and the conditions that allow identification and consistent estimation of this quantity. Importantly, the extension proceeds under exactly the same monotonicity requirement as needed for LATE.

Specifically, ? show that the QTE for a subpopulation is identified if

• (Independence) the instrument is independent of the potential outcome errors, , and the errors in the selection equation, ;

• (Monotonicity) where is the treatment state of an individual when and is defined similarly, holds;

• and other standard conditions are met.

The subpopulation for whom the QTE is identified is the set of “compliers,” those individuals with . In other words, the compliers are the set of individuals whose treatment is altered by switching the instrument from zero to one. Monotonicity is key in this framework. The monotonicity condition rules out “defiers,” individuals who would receive treatment in the absence of the intervention represented by the instrument but would not receive treatment if placed into the treatment group. The effects for individuals who would always receive treatment or never receive treatment regardless of the value of the instrument are unidentified.

Looking at these conditions, we see that the model of ? replaces the monotonicity assumption (A1), the independence assumption (A2), and the similarity assumption (A4) with a different type of monotonicity and a stronger independence assumption and identifies a different quantity: the QTE for compliers. The LATE-style approach has not yet been extended beyond cases with a binary treatment and a single binary instrument while the instrumental variable quantile model of ? applies to any endogenous variables and instruments. Note that neither set of conditions nests the other, and neither framework is more general than the other. Thus, the frameworks are best viewed as complements, providing two sets of conditions that can be considered when thinking about a strategy for estimating heterogeneous treatment effects.

Of course, the two sets of conditions may be mutually compatible. One such case is discussed in ?. In this example, the pattern of results obtained from the two estimators is quite similar, and the difference between the estimates appears small relative to sampling variation. Further exploration of these two approaches and their similarities and differences may be interesting to consider.

### 4.2. Instrumental Variables Quantile Regression in Triangular Systems

Another compelling framework is based on assuming a triangular structure as in ?. See also ?, ?, and ? for related models and results. The triangular model takes the form of a triangular system of equations

$$
\begin{aligned}
Y &= g(D, \epsilon), \\
D &= h(Z, \eta),
\end{aligned}
$$

where is the outcome, is a continuous scalar endogenous variable, is a vector of disturbances, is a vector of instruments with a continuous component, is a scalar reduced form error, and we ignore other covariates for simplicity. It is important to note that the triangular system generally rules out simultaneous equations which typically have that the reduced form relating to depends on a vector of disturbances. For example, in a supply and demand system, the reduced form for both price and quantity will generally depend on the unobservables from both the supply equation and the demand equation. Outside of being a scalar, the key conditions that allow identification of quantile effects in the triangular system are

• (Monotonicity) The function is strictly increasing in , and

• (Independence) and are independent conditional on for some observable or estimable .

The variable is thus the “control function” conditional on which changes in may be taken as causal. ? use , where represents the CDF of , as the control function and show that this variable satisfies the independence condition under the additional condition that is independent of . They show that one may use to identify under the assumed monotonicity of in . Using obtained in this first step, one may then construct the distribution of . Then integrating over the distribution of and using iterated expectations, one has

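$$
\begin{aligned}
\int F_{Y \mid D, V}(y \mid d, v)\, F_V(dv) &= \int 1\big(g(d,\epsilon) \leqslant y\big)\, F_\epsilon(d\epsilon) \\
&= \Pr\big(g(d,\epsilon) \leqslant y\big) =: G(y, d).
\end{aligned}
$$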

It then follows that the quantile of is .
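The chain of equalities above can be unpacked step by step. The following is a sketch under the stated conditional independence condition, with one additional assumption made explicit here: the support of $V$ given $D = d$ coincides with the support of $V$, so that the conditional distribution function is defined at every $v$ that is integrated over.

```latex
\begin{align*}
F_{Y \mid D, V}(y \mid d, v)
  &= \Pr\big(g(D,\epsilon) \leqslant y \mid D = d,\, V = v\big) \\
  &= \Pr\big(g(d,\epsilon) \leqslant y \mid V = v\big)
  && \text{($D$ and $\epsilon$ independent given $V$)}, \\[4pt]
\int F_{Y \mid D, V}(y \mid d, v)\, F_V(dv)
  &= \int \Pr\big(g(d,\epsilon) \leqslant y \mid V = v\big)\, F_V(dv) \\
  &= \Pr\big(g(d,\epsilon) \leqslant y\big) = G(y, d)
  && \text{(iterated expectations)}.
\end{align*}
```

Since $G(\cdot, d)$ is the distribution function of $g(d, \epsilon)$, inverting it in its first argument at a given probability level recovers the corresponding quantile.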

As with the framework of ?, the triangular model under the conditions given above is neither more nor less general than the model of ?. The key difference between the approaches is that ? uses an essentially unrestricted reduced form but requires monotonicity and a scalar disturbance in the structural equation. The triangular system on the other hand relies on monotonicity of the reduced form in a scalar disturbance. In addition, the triangular system, as developed in ?, requires a more stringent independence condition in that the instruments need to be independent of both the structural disturbances and the reduced form disturbance. That the approaches impose structure on different parts of the model makes them complementary with a researcher’s choice between the two being dictated by whether it is more natural to impose restrictions on the structural function or the reduced form in a given application.

The triangular model and the model of ? can be made compatible by imposing the conditions from the triangular model on the reduced form and the conditions from ? on the structural model. ? considers identification and estimation when both sets of conditions are imposed and shows that the requirements on the instruments may be substantially relaxed relative to ? or ? in this case.

## 5. Estimation and Inference

In the previous sections, we have outlined results that are useful for identifying quantile treatment effects and structural functions that are monotonic in a scalar unobservable. In the following, we briefly review the literature on estimation and inference. We focus on estimation of the model of ? presented in Section 2 using the moment conditions derived in Theorem 1. For estimation of the triangular model, see ? for nonparametric estimation and ? for a semiparametric approach. ? provides results for estimating the QTE for compliers within the LATE-style framework. Also, we only review approaches for estimating parametric quantile functions: for . ? and ? present nonparametric estimation and inference results for the IVQT model using condition (2.3).

There are two practical issues that make estimation and inference based on condition (2.3) challenging. The first is that the sample analog to condition (2.3) is non-smooth, and the GMM objective function that would be formed by using (2.3) as the moment conditions is also generically non-convex, even for linear quantile models. The second problem is that the model may suffer from weak identification as in the standard linear IV model; ? provides a useful introductory survey to weak identification and related inference methods in the linear IV model. In the quantile case, the problem of weak identification is more subtle than in the linear model in that some quantiles may be weakly identified while others may be strongly identified. The relevant object for defining the strength of identification of a given quantile is the covariance between and weighted by the conditional density function of the unobservable at the given quantile. See ? for a formal definition of this object and related discussion.

While the non-smoothness and non-convexity of the GMM criterion complicate optimization, they do not render the approach infeasible, especially when the dimension of and is not too large. ? considers this approach for estimating an income model and provides further discussion. One could also estimate the model parameters using the Markov Chain Monte Carlo (MCMC) approach of ?. This approach bypasses the need for optimization, instead relying on sampling and averaging to estimate model parameters. Note that this approach is not a cure-all since MCMC requires careful tuning in applications. It is also worth noting that standard samplers may perform poorly in even simple linear instrumental variables models when identification is not strong; see ?. In an approach related to optimizing the GMM criterion function directly, ? proposes estimating the parameters of an instrumental variables quantile model by optimizing a different non-smooth, non-convex criterion function.

To partially circumvent the numerical problems in optimizing the full GMM criterion, ? suggest a different procedure termed the inverse quantile regression for the linear quantile model . The basic intuition for the inverse quantile regression comes from the observation that if one knew the true value of the coefficient on , , the quantile regression of onto and would yield zero coefficients on the instruments . This observation allows one to effectively concentrate out of the problem and leaves a non-smooth, non-convex optimization problem over only the parameters . Since is low-dimensional in many applications, one can usually solve this optimization problem using highly robust optimization procedures such as a grid-search.

Algorithmically, the inverse quantile regression estimates for a given probability index of interest can be obtained as follows using a grid search over :

1. Define a suitable set of values , and estimate the coefficients and from the model by running the ordinary -quantile regression of on and . Call the estimated coefficients and .

2. Save the inverse of the variance-covariance matrix of , which is readily available in any common implementation of the ordinary QR. Denote this variance-covariance matrix . Form . Note is the Wald statistic for testing .

3. Choose as a value among that minimizes . The estimate of is then given by .
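The grid search in the steps above can be sketched in a deliberately simplified setting: a scalar endogenous variable, a binary instrument, and no covariates other than an intercept. In that case the quantile regression in step 1 is saturated, so its coefficients reduce to group quantiles. Two things here are assumptions of the sketch and not part of the procedure described above: the squared instrument coefficient is used as an unweighted stand-in for the Wald statistic, and the simulated design and the names `sample_quantile` and `inverse_qr` are hypothetical.

```python
import math
import random

def sample_quantile(values, tau):
    """Empirical tau-quantile (a minimizer of the check-function loss)."""
    s = sorted(values)
    k = max(0, math.ceil(tau * len(s)) - 1)
    return s[k]

def inverse_qr(y, d, z, tau, grid):
    """Grid search over alpha for scalar D, binary Z, and no covariates.

    With a binary instrument, the tau-quantile regression of Y - D*alpha
    on (1, Z) is saturated, so its coefficients are group quantiles:
    intercept = quantile in the Z = 0 group, gamma = difference between
    the Z = 1 and Z = 0 group quantiles.
    Simplification: gamma**2 stands in for the Wald statistic.
    """
    best = None
    for alpha in grid:
        resid = [yi - di * alpha for yi, di in zip(y, d)]
        q0 = sample_quantile([r for r, zi in zip(resid, z) if zi == 0], tau)
        q1 = sample_quantile([r for r, zi in zip(resid, z) if zi == 1], tau)
        gamma = q1 - q0
        crit = gamma ** 2
        if best is None or crit < best[0]:
            best = (crit, alpha, q0)
    _, alpha_hat, beta_hat = best
    return alpha_hat, beta_hat

# Hypothetical simulated data: Y = 1.0 * D + U, with D depending on U.
rng = random.Random(1)
y, d, z = [], [], []
for _ in range(4000):
    u = rng.random()
    zi = 1 if rng.random() < 0.5 else 0
    di = 1 if u > (0.2 if zi == 1 else 0.8) else 0
    y.append(u + di)
    d.append(di)
    z.append(zi)

grid = [j * 0.05 for j in range(41)]  # candidate alpha values in [0, 2]
alpha_hat, beta_hat = inverse_qr(y, d, z, 0.5, grid)
print(alpha_hat, beta_hat)  # true values in this design: alpha = 1, beta = 0.5
```

With covariates or a non-binary instrument, the group-quantile shortcut no longer applies and step 1 would call a general quantile regression routine, with the criterion weighted by the inverse of the estimated variance-covariance matrix as in step 2.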

? and ? provide conditions under which the resulting estimator for and is consistent and asymptotically normal and provide a consistent variance estimator. ? provide a similar multi-step algorithm that circumvents the same numeric problems using the objective function of ?.

The good behavior of the asymptotic approximations obtained in ? and ? relies on strong identification of the model parameters just as in the linear IV case. Intuitively, strong identification for a quantile of interest requires that a particular density-weighted covariation matrix between and is not local to being rank deficient and that the impact of is rich enough to guarantee that the moment equations have a unique solution. The first condition is analogous to the usual full rank condition in linear IV analysis, and the second condition is required because of the nonlinearity of the problem. Checking these conditions in practice may be difficult, and it is therefore useful to have inference procedures that are robust to violations of these conditions.

Fortunately, there are several inference procedures that remain valid under weak identification. A nice feature of the algorithm defined for estimating above is that it produces a weak-identification-robust inference procedure naturally as a byproduct. ? show that the Wald statistic, converges in distribution to under the null that where we let denote the true value of without needing either of the conditions discussed in the preceding paragraph. Thus a valid confidence region for may be constructed as the set:

$$\{\alpha : W_n(\alpha, \tau) \leqslant c_{1-p}\} \tag{5.23}$$

where is such that , and the set is approximated numerically by considering ’s in the grid . ? show that the confidence region in equation (5.23) is valid when the model parameters are strongly identified and remains valid when the model is weakly identified or even unidentified. ? provide a similar procedure and result for their approach as well. ? provides yet another approach to performing weak-identification-robust inference in models defined by conditions (2.3). Finally, ? show that one can form statistics for inference about the entire parameter vector that are conditionally pivotal in finite samples for models defined by quantile restrictions such as (2.3). Since the statistics do not depend on unknown nuisance parameters in finite samples, the exact distributions of these statistics can be calculated and inference can proceed without relying on asymptotic approximations or statements about the strength of identification. The distributions produced in ? are not standard and so must be calculated by simulation.

## 6. Conclusion and Directions for Future Research

In this paper, we have reviewed approaches for building quantile models in the presence of endogeneity, focusing on conditions that can be used for identification. We have also briefly reviewed some of the practical issues that arise in estimation of instrumental variables quantile models and approaches to dealing with these issues. The models and estimation strategies outlined and cited in this review have already seen use in empirical economics where they have mostly been used for their ability to uncover interesting distributional effects. In this review, we have also noted that the identification strategy employed in this paper can be used to uncover structural objects even if quantile effects are not the chief objects of interest as in ?.

While the results reviewed in this paper are useful in a variety of contexts, there remain interesting areas for research in quantile models with endogeneity. In some applications, features of the conditional distribution are not the chief objects of interest and researchers are interested in effects of treatments on unconditional quantiles. Given the set of conditional quantiles, such unconditional effects may be uncovered. In recent work, ? propose a different approach, related to ?, to estimating structural effects of endogenous variables on unconditional quantiles directly. It would also be interesting to think about quantile-like quantities for multivariate outcomes with endogenous covariates. The results reviewed in this paper offer one possible approach for quantile modeling with endogeneity, but there remain many interesting directions and other approaches to be explored in further research.

## Appendix A Proofs

### A.1. Proof of Theorem 1

Conditioning on is suppressed. For almost every value of ,

 P[UD⩽τ|Z=z](