Heterogeneous Treatment Effects with Mismeasured Endogenous Treatment*

* First version: November, 2015. I would like to thank Federico A. Bugni, V. Joseph Hotz, Shakeeb Khan, and Matthew A. Masten for their guidance and encouragement. I am also grateful to Luis E. Candelaria, Xian Jiang, Marc Henry, Ju Hyun Kim, Arthur Lewbel, Jia Li, Arnaud Maurel, Marjorie B. McElroy, Ismael Mourifié, Naoki Wakamori, Yichong Zhang, and seminar participants at University of Tokyo, Singapore Management University, University of Oslo, UC-Irvine, UC-Davis, University of Western Ontario, University of Warwick, Duke, European Meeting of the Econometric Society, Asian Meeting of the Econometric Society, North American Summer Meeting of the Econometric Society, and Triangle Econometrics Conference.


Takuya Ura
Department of Economics, University of California, Davis, One Shields Avenue, Davis, CA 95616-5270; Email: takura@ucdavis.edu
Abstract

This paper studies the identifying power of an instrumental variable in the nonparametric heterogeneous treatment effect framework when a binary treatment is mismeasured and endogenous. Using a binary instrumental variable, I characterize the sharp identified set for the local average treatment effect under the exclusion restriction of an instrument and the deterministic monotonicity of the true treatment in the instrument. Even allowing for general measurement error (e.g., endogenous measurement error), it is still possible to obtain finite bounds on the local average treatment effect. Notably, the Wald estimand is an upper bound on the local average treatment effect, but it is not the sharp bound in general. I also provide a confidence interval for the local average treatment effect with uniformly asymptotically valid size control. Furthermore, I demonstrate that the identification strategy of this paper offers a new use of repeated measurements for tightening the identified set.

Keywords: Local average treatment effect; Instrumental variable; Nonclassical measurement error; Endogenous measurement error; Partial identification

1 Introduction

Treatment effect analyses often entail a measurement error problem as well as an endogeneity problem. For example, Black et al. (2003) document substantial measurement error in educational attainment in the 1990 U.S. Census. At the same time, educational attainment is an endogenous treatment variable in a return to schooling analysis, because unobserved individual ability affects both schooling decisions and wages (Card, 2001). The econometric literature, however, has offered only a few solutions for addressing the two problems at the same time. An instrumental variable is a standard technique for correcting endogeneity and measurement error (e.g., Angrist and Krueger, 2001), but, to the best of my knowledge, no existing research has explicitly investigated the identifying power of an instrumental variable for the heterogeneous treatment effect when the treatment is both mismeasured and endogenous.[1]

[1] Many existing methods, including Mahajan (2006) and Lewbel (2007), allow the treatment effect to be heterogeneous due to observed variables. In this paper I focus on heterogeneity due to unobserved variables by considering the local average treatment effect framework.

I consider a mismeasured treatment in the framework of Imbens and Angrist (1994) and Angrist et al. (1996), and focus on the local average treatment effect as the parameter of interest. My analysis studies the identifying power of a binary instrumental variable under the following two assumptions: (i) the instrument affects the outcome and the measured treatment only through the true treatment (the exclusion restriction of an instrument), and (ii) the instrument weakly increases the true treatment (the deterministic monotonicity of the true treatment in the instrument). These assumptions extend those of Imbens and Angrist (1994) and Angrist et al. (1996) to a framework with a mismeasured treatment. The local average treatment effect is the average treatment effect for the compliers, that is, the subpopulation whose true treatment status is strictly affected by the instrument. Focusing on the local average treatment effect is meaningful for a few reasons.[2] First, the local average treatment effect has been a widely used parameter for investigating heterogeneous treatment effects with endogeneity, so my analysis offers a robustness check to those who have already investigated the local average treatment effect. Second, the local average treatment effect can be used to extrapolate to the average treatment effect or other parameters of interest. Imbens (2010) emphasizes the utility of reporting the local average treatment effect in addition to other parameters of interest, because the extrapolation often requires additional assumptions and can be less credible than the local average treatment effect.

[2] Deaton (2009) and Heckman and Urzúa (2010) are cautious about interpreting the local average treatment effect as a parameter of interest. See also Imbens (2010, 2014) for a discussion.

The mismeasured treatment prevents the local average treatment effect from being point-identified. As in Imbens and Angrist (1994) and Angrist et al. (1996), the local average treatment effect is the ratio of the intent-to-treat effect to the size of compliers.[3] Since the measured treatment is not the true treatment, however, the size of compliers is not identified, and therefore the local average treatment effect is not identified. The under-identification of the local average treatment effect is a consequence of the under-identification of the size of compliers. If I assumed no measurement error, I could compute the size of compliers from the measured treatment, and the local average treatment effect would then equal the Wald estimand.[4]

[3] The intent-to-treat effect is defined as the mean difference of the outcome between the two groups defined by the instrument. The size of compliers is the probability of being a complier, and it equals the mean difference of the true treatment between the two groups (Imbens and Angrist, 1994 and Angrist et al., 1996).

[4] The Wald estimand in this paper is defined as the ratio of the intent-to-treat effect to the mean difference of the measured treatment between the two groups defined by the instrument. The Wald estimand is identified because it uses the measured treatment, but it is not the local average treatment effect because it does not use the true treatment.

I take a worst case scenario approach with respect to the measurement error and allow for a general form of measurement error. The only assumption concerning the measurement error is its independence of the instrumental variable. (Section 3.4 dispenses with this assumption and shows that it is still possible to bound the local average treatment effect.) I consider the following types of measurement error. First, the measurement error is nonclassical; that is, it can depend on the true treatment. The measurement error for a binary variable is always nonclassical, because the measurement error cannot be negative (positive) when the true variable takes the low (high) value. Second, I allow the measurement error to be endogenous (or differential); that is, the measured treatment can depend on the outcome conditional on the true treatment. For example, as Black et al. (2003) argue, the measurement error in educational attainment depends on familiarity with the educational system in the U.S., and immigrants may have a higher rate of measurement error. At the same time, familiarity with the U.S. educational system can be related to English language skills, which can affect labor market outcomes. Bound et al. (2001) also argue that measurement error is likely to be endogenous in some empirical applications. (In Appendix D, I explore the identifying power of the exogeneity assumption on the measurement error. The additional assumption yields a tighter sharp identified set, but I still cannot point identify the local average treatment effect in general.) Third, there is no assumption concerning the marginal distribution of the measurement error. It is not necessary to assume anything about the accuracy of the measurement.
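To see concretely why a misclassified binary variable always has nonclassical error, the following Python sketch computes the exact mean of the error $e = T - T^*$ and its covariance with $T^*$. It assumes, purely for illustration, constant false positive and false negative rates, a restriction the analysis in this paper does not need. The function name `error_moments` and its arguments are hypothetical.

```python
def error_moments(p, fp, fn):
    """Exact mean of e = T - T* and Cov(e, T*) for a misclassified binary T*.

    Illustrative assumption (not required by the paper): constant rates.
    p  : P(T* = 1)
    fp : false positive rate, P(T = 1 | T* = 0)
    fn : false negative rate, P(T = 0 | T* = 1)
    """
    mean_e_given_0 = fp    # when T* = 0, e can only take values in {0, 1}
    mean_e_given_1 = -fn   # when T* = 1, e can only take values in {-1, 0}
    mean_e = (1 - p) * mean_e_given_0 + p * mean_e_given_1
    # E[e * T*] = P(T* = 1) * E[e | T* = 1], so Cov = E[e T*] - E[e] E[T*]
    cov = p * mean_e_given_1 - mean_e * p
    return mean_e, cov


mean_e, cov = error_moments(p=0.5, fp=0.1, fn=0.2)
print(mean_e, cov)  # the covariance is negative here
```

Algebraically, the covariance equals $-p(1-p)(fp + fn)$. It is strictly negative whenever $0 < p < 1$ and any misclassification occurs, so the error cannot be independent of the true treatment.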

In the presence of measurement error, I derive the identified set for the local average treatment effect (Theorem 4).

Figure 1: Identified set for the local average treatment effect, with panels (a) when the intent-to-treat effect is positive, (b) when the intent-to-treat effect is zero, and (c) when the intent-to-treat effect is negative. ITT is the intent-to-treat effect and Wald is the Wald estimand. The thick line is the identified set for the local average treatment effect. Note that the identified set is $\{0\}$ when the intent-to-treat effect is zero.

Figure 1 describes the relationship among the identified set for the local average treatment effect, the intent-to-treat effect, and the Wald estimand. First, the intent-to-treat effect has the same sign as the local average treatment effect. This is why Figure 1 has three subfigures according to the sign of the intent-to-treat effect: (a) positive, (b) zero, and (c) negative. Second, the intent-to-treat effect is the sharp lower bound on the local average treatment effect in absolute value. Third, the Wald estimand is an upper bound on the local average treatment effect in absolute value. In my framework, the Wald estimand is the probability limit of the instrumental variable estimator, which ignores the measurement error and controls only for the endogeneity. Hence an upper bound on the local average treatment effect can be obtained by ignoring the measurement error. Frazis and Loewenstein (2003) obtain a similar result in the homogeneous treatment effect model. Last, but most importantly, the sharp upper bound in absolute value can be smaller than the Wald estimand. This is a potential cost of ignoring the measurement error and using the Wald estimand. Even when only an upper bound on the local average treatment effect is of interest, it is advisable to take the measurement error into account, because doing so can yield a smaller upper bound than the Wald estimand. Section 3.1 investigates when the Wald estimand coincides with the sharp upper bound.

I extend the identification analysis to incorporate covariates other than the treatment variable. In this setting, the instrumental variable satisfies the exclusion restriction after conditioning on covariates. Based on insights from Abadie (2003) and Frölich (2007), I show that the identification strategy of this paper works in the presence of covariates.

I construct a confidence interval for the local average treatment effect. To do so, I first approximate the identified set by discretizing the support of the outcome, where the discretization becomes finer as the sample size increases. The approximated identified set resembles the many moment inequalities in Menzel (2014) and Chernozhukov et al. (2014), who consider a finite but divergent number of moment inequalities. I apply a bootstrap method in Chernozhukov et al. (2014) to construct a confidence interval with uniformly asymptotically valid size control. The confidence interval also rejects parameter values that do not belong to the sharp identified set. An empirical exercise and a Monte Carlo simulation demonstrate the finite sample properties of the proposed inference method. The empirical exercise is based on Abadie (2003), who studies the effects of 401(k) participation on financial savings, and considers misclassification of 401(k) participation.[5]

[5] The pension type is subject to measurement error. See, for example, Gustman et al. (2008) for pension type misclassification in the Health and Retirement Study.
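The discretization step can be illustrated on its own. The following Python sketch computes a plug-in total variation distance between the laws of (binned outcome, measured treatment) given $Z = 1$ and given $Z = 0$ for a user-chosen partition of the outcome. It covers only the discretization idea and does not implement the bootstrap procedure of Chernozhukov et al. (2014). The function name `binned_tv`, the cut points, and the simulated data are hypothetical.

```python
import numpy as np


def binned_tv(y, t, z, edges):
    """Plug-in total variation distance between the laws of (binned Y, T)
    given Z = 1 and given Z = 0, for a user-chosen partition of Y.

    edges: increasing interior cut points; np.digitize maps each y to a cell.
    """
    y, t, z = np.asarray(y), np.asarray(t), np.asarray(z)
    cells = np.digitize(y, edges)
    tv = 0.0
    for c in np.unique(cells):
        for tt in (0, 1):
            in_cell = (cells == c) & (t == tt)
            # empirical P(Y in cell c, T = tt | Z = z) for z = 1 minus z = 0
            tv += abs(in_cell[z == 1].mean() - in_cell[z == 0].mean())
    return 0.5 * tv


# Hypothetical simulated data, only to exercise the function.
rng = np.random.default_rng(0)
n = 5_000
z = rng.integers(0, 2, size=n)
t = (rng.uniform(size=n) < 0.3 + 0.3 * z).astype(int)
y = rng.normal(size=n) + t

coarse = binned_tv(y, t, z, edges=[0.0])
fine = binned_tv(y, t, z, edges=[-1.0, 0.0, 1.0])  # refines the coarse partition
print(coarse, fine)
```

Merging cells can only shrink a sum of absolute differences, so a refinement never lowers the plug-in distance. A coarse partition therefore understates the distance, which makes any bound of the form $|\text{ITT}| / TV$ more conservative.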

As an extension, I consider the dependence between the instrument and the measurement error. In this case, there is no assumption on the measurement error, and therefore the measured treatment has no information on the local average treatment effect. Even without using the measured treatment, however, I can still apply the same identification strategy and obtain finite (but less tight) bounds on the local average treatment effect.

Moreover, I offer a new use of repeated measurements as additional sources of identification. The existing practice uses one of the repeated measurements as an instrumental variable, as in Hausman et al. (1991), Hausman et al. (1995), Mahajan (2006), and Hu (2008).[6] However, when the true treatment is endogenous, the repeated measurements are likely to be endogenous and are not good candidates for an instrumental variable. My identification strategy demonstrates that those variables are useful for bounding the local average treatment effect in the presence of measurement error, even if none of the repeated measurements is a valid instrumental variable.

[6] It is worthwhile to mention that Lewbel (2007) allows for a certain form of endogeneity in a repeated measurement, under which the repeated measurement still satisfies some exclusion restriction.

The remainder of this paper is organized as follows. Section 1.1 explains several empirical examples motivating mismeasured endogenous treatments, and Section 1.2 reviews the related econometric literature. Section 2 introduces mismeasured treatments in the framework of Imbens and Angrist (1994) and Angrist et al. (1996). Section 3 constructs the identified set for the local average treatment effect. I also discuss two extensions. One extension describes how repeated measurements tighten the identified set even if none of them can be used as an instrumental variable, and the other dispenses with independence between the instrument and the measurement error. Section 4 proposes an inference procedure for the local average treatment effect. Section 5 presents an empirical illustration, and Section 6 conducts Monte Carlo simulations. Section 7 concludes. The Appendix collects proofs and remarks.

1.1 Examples for mismeasured endogenous treatments

I introduce several empirical examples in which binary treatments can be both endogenous and mismeasured at the same time. The first example is the return to schooling, in which the outcome is wages and the treatment is educational attainment, for example, whether a person has completed college or not. Unobserved individual ability affects both the schooling decision and wage determination, which leads to the endogeneity of educational attainment in the wage equation (see, for example, Card (2001)). Moreover, survey datasets record educational attainment based on the interviewee's answers, and these self-reported educational attainments are subject to measurement error. Griliches (1977), Angrist and Krueger (1999), Kane et al. (1999), Card (2001), and Black et al. (2003) have pointed out the mismeasurement of educational attainment. For example, Black et al. (2003) estimate that the 1990 Decennial Census has a 17.7% false positive rate for reporting a doctoral degree.

The second example is the labor supply response to welfare program participation, in which the outcome is employment status and the treatment is welfare program participation. Self-reported welfare program participation in survey datasets can be mismeasured (Hernandez and Pudney, 2007). The psychological cost of welfare program participation, welfare stigma, affects job search behavior and welfare program participation simultaneously. That is, welfare stigma may discourage individuals from participating in a welfare program and, at the same time, affect an individual's effort in the labor market (see Moffitt (1983) and Besley and Coate (1992) for a discussion of welfare stigma). Moreover, welfare stigma gives welfare recipients an incentive not to reveal their participation status in the survey. This causes endogenous measurement error, in that the unobserved individual heterogeneity affects both the measurement error and the outcome.

The third example is the effect of a job training program on wages. As in the return to schooling example, unobserved individual ability plays a key role. Self-reported completion of a job training program is also subject to measurement error (Bollinger, 1996). Frazis and Loewenstein (2003) develop a methodology for evaluating a homogeneous treatment effect with a mismeasured endogenous treatment, and apply their methodology to evaluate the effect of a job training program on wages.

The last example is the effect of maternal drug use on infant birth weight. Kaestner et al. (1996) estimate that a mother tends to underreport her drug use, but she tends to report it correctly if she is a heavy user. When the degree of drug addiction is not observed, it becomes unobserved individual heterogeneity that affects infant birth weight and the measurement of drug use, in addition to drug use itself.

1.2 Literature review

Here I summarize the related econometric literature. Mahajan (2006), Lewbel (2007), and Hu (2008) use an instrumental variable to correct for measurement error in a binary (or discrete) treatment in the homogeneous treatment effect framework and they achieve nonparametric point identification of the average treatment effect. They assume that the true treatment is exogenous, whereas I allow it to be endogenous.

Finite mixture models are related to my analysis. I consider an unobserved binary treatment, whereas finite mixture models deal with an unobserved type. Henry et al. (2014) and Henry et al. (2015) are the most closely related. They investigate the identification problem in finite mixture models by using an exclusion restriction in which an instrumental variable only affects the mixing distribution of a type without affecting the component distributions (that is, the conditional distributions given the type). If I applied their approach directly to my framework, their exclusion restriction would imply conditional independence between the instrumental variable and the outcome given the true treatment. This conditional independence implies that the local average treatment effect does not exhibit essential heterogeneity (Heckman et al., 2010). It also implies that the local average treatment effect is the mean difference between the control and treatment groups.[7] Instead of applying the approaches in Henry et al. (2014) and Henry et al. (2015), I use a different exclusion restriction, in which the instrumental variable does not affect the outcome or the measured treatment directly.

[7] This footnote uses the notation introduced in Section 2. The conditional independence implies . Under this assumption, and therefore . I obtain the equality The above equation implies that the local average treatment effect does not depend on the compliers under consideration, which contrasts with the essential heterogeneity of the treatment effect. Furthermore, since is the local average treatment effect, I do not need to care about the endogeneity.

A few papers have applied an instrumental variable to a mismeasured binary regressor in the homogeneous treatment effect framework. They include Aigner (1973), Kane et al. (1999), Bollinger (1996), Black et al. (2000), Frazis and Loewenstein (2003), and DiTraglia and García-Jimeno (2015). Frazis and Loewenstein (2003) and DiTraglia and García-Jimeno (2015) are the most closely related among them, since they allow for endogeneity. Here I allow for heterogeneous treatment effects, and I contribute to the heterogeneous treatment effect literature by investigating the consequences of measurement error in the treatment.

Kreider and Pepper (2007), Molinari (2008), Imai and Yamamoto (2010), and Kreider et al. (2012) apply a partial identification strategy for the average treatment effect to the mismeasured binary regressor problem by using knowledge of the marginal distribution of the true treatment. Those papers use auxiliary datasets to obtain the marginal distribution of the true treatment. Kreider et al. (2012) is the most closely related, in that they allow for both treatment endogeneity and endogenous measurement error. My instrumental variable approach offers an alternative strategy for dealing with a mismeasured endogenous treatment. This is worthwhile because, as mentioned in Schennach (2013), the availability of auxiliary datasets is limited in empirical research. Furthermore, results from auxiliary datasets cannot always be transported to the primary dataset (Carroll et al., 2012, p.10).

Some papers investigate mismeasured endogenous continuous variables instead of binary variables. Amemiya (1985), Hsiao (1989), Lewbel (1998), and Song et al. (2015) consider nonlinear models with mismeasured continuous explanatory variables. The continuity of the treatment is crucial for their analysis, because they assume classical measurement error. The treatment in my analysis is binary, and therefore the measurement error is nonclassical. Hu et al. (2015) consider mismeasured endogenous continuous variables in single index models. However, their approach depends on taking derivatives of conditional expectations with respect to the continuous variable, and it is not clear whether it can be extended to binary variables. Song (2015) considers a semiparametric model in which endogenous continuous variables are subject to nonclassical measurement error. He assumes conditional independence between the instrumental variable and the outcome given the true treatment, which would impose some structure on the outcome equation when the treatment is binary (see Footnote 7). Instead, I propose an identification strategy without assuming any structure on the outcome equation.

Chalak (2017) investigates the consequences of measurement error in the instrumental variable instead of the treatment. He assumes that the treatment is perfectly observed, whereas I allow it to be measured with error. Since I assume that the instrumental variable is perfectly observed, my analysis does not overlap with Chalak (2017).

Manski (2003), Blundell et al. (2007), and Kitagawa (2010) use similar identification strategies in the context of sample selection models. These papers also use the exclusion restriction of the instrumental variable for their partial identification results. In particular, Kitagawa (2010) derives the integrated envelope from the exclusion restriction. The integrated envelope is similar to the total variation distance in my analysis, because both are characterized as a supremum over the set of partitions. My analysis differs from theirs in three respects. First and most importantly, I consider mismeasurement of the treatment, whereas the sample selection model considers truncation of the outcome. It is not straightforward to apply their methodologies for sample selection models to the mismeasured treatment problem. Second, I offer an inference method with uniform size control, whereas Kitagawa (2010) derives only pointwise size control. Last, Blundell et al. (2007) and Kitagawa (2010) use their results for specification testing. I cannot use my result to carry out a specification test, because the sharp identified set in my analysis is always non-empty.

Finally, Calvi et al. (2017) and Yanagi (2017) have recently discussed identification of the local average treatment effect in the presence of measurement error in the treatment variable. They build on results in a previous draft of this paper (Ura, 2015) to derive novel and important results when additional variables are available in the dataset: multiple measurements of the true treatment variable (Calvi et al., 2017) or multiple instrumental variables (Yanagi, 2017). In contrast, the results of this paper are valid without these additional variables and only require the assumptions in Imbens and Angrist (1994) and Angrist et al. (1996).

2 Local average treatment effect framework with misclassification

My analysis considers a mismeasured treatment in the framework of Imbens and Angrist (1994) and Angrist et al. (1996). The objective is to evaluate the causal effect of a binary treatment $T^* \in \{0,1\}$ on an outcome $Y$, where $T^* = 0$ represents the control group and $T^* = 1$ represents the treatment group. To deal with the endogeneity of $T^*$, I use a binary instrumental variable $Z \in \{0,1\}$, which shifts $T^*$ exogenously without any direct effect on $Y$. The treatment of interest $T^*$ is not directly observed. Instead, there is a binary measurement $T \in \{0,1\}$ of $T^*$. I put the symbol $*$ on $T^*$ to emphasize that the true treatment is unobserved. I allow $Y$ to be discrete, continuous, or mixed; the distribution of $Y$ is only required to be dominated by some known finite measure $\mu_Y$ on the real line. For example, $\mu_Y$ can be the Lebesgue measure or the counting measure. Let be the support for the random variable and be the support for .

To describe the data generating process, I consider counterfactual variables. $T^*_z$ is the counterfactual true treatment when $Z = z$. $Y_t$ is the counterfactual outcome when $T^* = t$. $T_t$ is the counterfactual measured treatment when $T^* = t$. The individual treatment effect is $Y_1 - Y_0$. It is not directly observed, because $Y_0$ and $Y_1$ are never observed at the same time. Only $(Y, T, Z)$ is observable. Using this notation, the observed variables are generated by the three equations:

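$$T = T^* T_1 + (1 - T^*) T_0, \tag{1}$$
$$Y = T^* Y_1 + (1 - T^*) Y_0, \tag{2}$$
$$T^* = Z T^*_1 + (1 - Z) T^*_0. \tag{3}$$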

Figure 2 graphically describes the relationship among the instrument $Z$, the (unobserved) true treatment $T^*$, the measured treatment $T$, and the outcome $Y$.

Figure 2: Graphical representation of dependencies among variables (nodes: instrument, true treatment, measured treatment, and outcome)

Equation (1) is the measurement equation, which corresponds to the arrow from $T^*$ to $T$ in Figure 2. $T - T^*$ is the measurement error: $T_0 = 1$ (or $T - T^* = 1$) represents a false positive, and $T_1 = 0$ (or $T - T^* = -1$) represents a false negative. Equations (2) and (3) are the same as in Imbens and Angrist (1994) and Angrist et al. (1996). Equation (2) is the outcome equation, which corresponds to the arrow from $T^*$ to $Y$ in Figure 2. Equation (3) is the treatment assignment equation, which corresponds to the arrow from $Z$ to $T^*$ in Figure 2. A potentially non-zero correlation between $(Y_0, Y_1)$ and $(T^*_0, T^*_1)$ causes an endogeneity problem.
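The following Python sketch simulates equations (1), (2), and (3) under a hypothetical data generating process. In it, an unobserved ability variable `u` drives selection into treatment, the outcome, and misreporting, so both the true treatment and the measurement error are endogenous. All functional forms and parameter values are illustrative assumptions, not taken from the paper. The sketch checks numerically that the intent-to-treat effect divided by the mean difference of the true treatment recovers the (infeasible) complier average, while the Wald ratio based on the measured treatment does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical DGP: unobserved ability u drives selection, outcomes, and misreporting.
u = rng.normal(size=n)
z = rng.integers(0, 2, size=n)                             # instrument, independent of u
v = rng.uniform(size=n)
t_star_0 = (v < 0.2 + 0.1 * (u > 0)).astype(int)           # T*_0
t_star_1 = np.maximum(t_star_0, (v < 0.6).astype(int))     # T*_1 >= T*_0 (monotonicity)

y_0 = u + rng.normal(size=n)                               # Y_0
y_1 = y_0 + 1.0 + 0.5 * u                                  # Y_1: heterogeneous effect
fp = 0.05 + 0.10 * (u > 1)                                 # false positive prob., depends on u
fn = 0.15 + 0.10 * (u < 0)                                 # false negative prob., depends on u
t_0 = (rng.uniform(size=n) < fp).astype(int)               # T_0: report when T* = 0
t_1 = (rng.uniform(size=n) >= fn).astype(int)              # T_1: report when T* = 1

# Observed variables generated by equations (3), (2), and (1)
t_star = z * t_star_1 + (1 - z) * t_star_0
y = t_star * y_1 + (1 - t_star) * y_0
t = t_star * t_1 + (1 - t_star) * t_0


def delta(x):
    """Mean difference E[X | Z = 1] - E[X | Z = 0]."""
    return x[z == 1].mean() - x[z == 0].mean()


late = (y_1 - y_0)[t_star_1 > t_star_0].mean()  # infeasible: uses potential outcomes
itt = delta(y)
wald = itt / delta(t)
print(f"LATE={late:.3f}  ITT={itt:.3f}  ITT/size={itt / delta(t_star):.3f}  Wald={wald:.3f}")
```

Under these assumed parameters, misreporting shrinks the mean difference of the measured treatment relative to that of the true treatment. The Wald ratio therefore overstates the effect in absolute value, and the intent-to-treat effect understates it.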

In a return to schooling analysis, $Y$ is wages, $T^*$ is the true indicator for college completion, $Z$ is proximity to college, and $T$ is self-reported college completion. The treatment effect in the return to schooling is the effect of college completion $T^*$ on wages $Y$. College completion $T^*$ is not correctly measured in a survey dataset, so only the self-report $T$ is observed.

This section and Section 3 impose only the following assumption.

Assumption 1.

(i) For each $z \in \{0,1\}$, $(T_0, T_1, Y_0, Y_1, T^*_z)$ is independent of $Z$. (ii) $T^*_1 \ge T^*_0$ almost surely. (iii) $0 < P(Z = 1) < 1$.

Assumption 1 (i) is the exclusion restriction; I impose stochastic independence instead of mean independence. This is stronger than the minimal conditions for identifying the local average treatment effect without measurement error. Even so, a large part of the existing applied literature assumes stochastic independence (Huber and Mellace, 2015, p.405). Under Assumption 1 (i), the measurement error $(T_0, T_1)$ is also independent of $Z$ conditional on $T^*_z$. This is the only assumption on the measurement error used for the identified set in Section 3. (Section 3.4 even dispenses with this assumption.) Assumption 1 (ii) is the monotonicity condition for the instrument, under which the instrument weakly increases the value of $T^*$ for all individuals. de Chaisemartin (2016) relaxes the monotonicity condition. Appendix E shows that the identification results of my analysis still hold, with a slight modification, under the complier-defiers-for-marginals condition in de Chaisemartin (2016). Note that Assumption 1 does not include a relevance condition for the instrumental variable. The standard relevance condition does not affect the identification results in my analysis; I discuss the relevance condition in my framework after Theorem 4. Assumption 1 (iii) rules out a constant $Z$.

As I emphasized in the introduction, the framework here does not assume anything about the measurement error except for its independence from $Z$. Assumption 1 does not impose any restriction on the marginal distribution of the measurement error or on the relationship between the measurement error and $(Y_0, Y_1, T^*_0, T^*_1)$. In particular, the measurement error can be endogenous, that is, $(T_0, T_1)$ and $(Y_0, Y_1)$ can be correlated.[8]

[8] Although it has not been supported in validation data studies (e.g., Black et al., 2003), a majority of the literature on measurement error has assumed that the measurement error is exogenous (Bound et al., 2001). I also explore the identifying power of the exogenous measurement error assumption in Appendix D.

I focus on the local average treatment effect, which is defined by

$$\theta = E[Y_1 - Y_0 \mid T^*_1 > T^*_0].$$

The local average treatment effect is the average of the treatment effect over the subpopulation (the compliers) whose treatment status is strictly affected by the instrument. Imbens and Angrist (1994, Theorem 1) show that the local average treatment effect equals

$$\theta = \frac{\Delta E[Y]}{\Delta E[T^*]},$$

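where I define $\Delta E[X] = E[X \mid Z = 1] - E[X \mid Z = 0]$ for a random variable $X$. Note that $\Delta E[Y]$ is the intent-to-treat effect, that is, the coefficient on $Z$ in the linear regression of $Y$ on $Z$ and a constant. Because the treatment is measured with error, the above fraction is not the Wald estimand

$$\frac{\Delta E[Y]}{\Delta E[T]}.$$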

Since $\Delta E[T^*]$ is not identified, I cannot identify the local average treatment effect. The failure of point identification comes purely from the measurement error, because the local average treatment effect would be point-identified under $T = T^*$. In fact, the methodology proposed in this paper is essentially a strategy for bounding $\Delta E[T^*]$, and I use the bound to construct the sharp identified set for the local average treatment effect.
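A worked example shows the size of the gap between the two ratios under one extra assumption that this paper does not make. Suppose misreporting occurs at constant rates $fp = P(T = 1 \mid T^* = 0)$ and $fn = P(T = 0 \mid T^* = 1)$ that do not depend on $Z$ or the potential outcomes. Then $E[T \mid Z] = fp + (1 - fp - fn)\,P(T^* = 1 \mid Z)$, so $\Delta E[T] = (1 - fp - fn)\,\Delta E[T^*]$. The Python sketch below uses a hypothetical function name and hypothetical numbers.

```python
def wald_under_constant_rates(late, complier_share, fp, fn):
    """Population ITT and Wald estimand under an illustrative assumption:
    misreporting at constant rates fp = P(T=1 | T*=0), fn = P(T=0 | T*=1),
    unrelated to Z and to the potential outcomes.

    complier_share plays the role of Delta E[T*].
    """
    itt = late * complier_share                   # Delta E[Y] = theta * Delta E[T*]
    delta_t = (1.0 - fp - fn) * complier_share    # Delta E[T] under constant rates
    return itt, itt / delta_t


itt, wald = wald_under_constant_rates(late=2.0, complier_share=0.3, fp=0.1, fn=0.1)
print(itt, wald)  # approximately 0.6 and 2.5
```

Consider a local average treatment effect of 2, a complier share of 0.3, and 10% false positive and false negative rates. In this case the Wald estimand is 2.5 rather than 2. Setting both rates to zero (so $T = T^*$) makes the Wald estimand equal the local average treatment effect, as noted above.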

3 Identified set for the local average treatment effect

This section shows how the instrumental variable partially identifies the local average treatment effect in the framework of Section 2. Before defining the identified set, I express the local average treatment effect as a function of the underlying distribution $F^*$ of $(T_0, T_1, Y_0, Y_1, T^*_0, T^*_1, Z)$. I use the symbol $*$ on $F^*$ to clarify that $F^*$ is the distribution of the unobserved variables. I denote the expectation operator by $E_{F^*}$ when I need to clarify the underlying distribution. The local average treatment effect is a function of the unobserved distribution $F^*$:

$$\theta(F^*) = E_{F^*}[Y_1 - Y_0 \mid T^*_1 > T^*_0].$$

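I denote by $\Theta$ the parameter space for the local average treatment effect $\theta$, that is, the set of $\int y f_1(y)\,d\mu_Y(y) - \int y f_0(y)\,d\mu_Y(y)$ where $f_1$ and $f_0$ are density functions dominated by the known probability measure $\mu_Y$. For example, $\Theta = [-1, 1]$ when $Y$ is binary with support $\{0, 1\}$.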

The identified set is the set of parameter values for the local average treatment effect that are consistent with the distribution of the observed variables. I use $F$ for the distribution of the observed variables $(Y, T, Z)$. Equations (1), (2), and (3) induce the distribution of the observables from the unobserved distribution $F^*$, and I denote the induced distribution by $F_{F^*}$. When the distribution of $(Y, T, Z)$ is $F$, the set of $F^*$ that induce $F$ is $\{F^* \in \mathcal{F}^* : F_{F^*} = F\}$, where $\mathcal{F}^*$ is the set of $F^*$'s satisfying Assumption 1. For every distribution $F$ of $(Y, T, Z)$, the (sharp) identified set for the local average treatment effect is defined as $\Theta_I(F) = \{\theta(F^*) : F^* \in \mathcal{F}^*,\ F_{F^*} = F\}$.

Imbens and Angrist (1994, Theorem 1) provide a relationship between $\Delta E[Y]$ and the local average treatment effect:

$$\Delta E[Y] = \theta\, P(T^*_1 > T^*_0) = \theta\, \Delta E[T^*]. \tag{4}$$

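This equation gives two pieces of information about $\theta$. First, the sign of $\theta$ is the same as that of $\Delta E[Y]$. Second, the absolute value of $\theta$ is at least the absolute value of $\Delta E[Y]$, because the size of the compliers $P(T^*_1 > T^*_0)$ is at most one. The following lemma summarizes these two pieces.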

Lemma 2.

Under Assumption 1, $\Delta E[Y]\,\theta \ge 0$ and $|\Delta E[Y]| \le |\theta|$.
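One way to see Lemma 2 is the short derivation below, sketched here from equation (4). The first line uses Assumption 1 (i) to write $E[T^* \mid Z = z] = E[T^*_z]$ and Assumption 1 (ii) to write $T^*_1 - T^*_0 \in \{0, 1\}$. The other two lines substitute $\Delta E[Y] = \theta\,\Delta E[T^*]$.

```latex
\begin{aligned}
\Delta E[T^*] &= E[T^*_1] - E[T^*_0] = P(T^*_1 > T^*_0) \in [0, 1], \\
\Delta E[Y]\,\theta &= \theta^2\,\Delta E[T^*] \ \ge\ 0, \\
|\Delta E[Y]| &= |\theta|\,\Delta E[T^*] \ \le\ |\theta|.
\end{aligned}
```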

I derive a new implication of the exclusion restriction for the instrumental variable in order to obtain an upper bound on $\theta$ in absolute value. To explain the new implication, I introduce the total variation distance between the distributions $f_{X \mid Z=1}$ and $f_{X \mid Z=0}$. For any random variable $X$, define

$$TV_X = \frac{1}{2} \int \left| f_{X \mid Z=1}(x) - f_{X \mid Z=0}(x) \right| d\mu_X(x),$$

where is a dominating measure for the distribution of .
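When $V$ has finite support, the dominating measure can be taken to be the counting measure, and the integral becomes a sum. The sketch below computes the total variation distance from two probability tables. It is a minimal illustration that assumes both tables share the same support and ordering. The numbers are hypothetical.

```python
import numpy as np


def total_variation(p1, p0):
    """Total variation distance between two pmfs on a common finite support.

    p1, p0: probabilities of V given Z = 1 and Z = 0, indexed identically
    (any shape, e.g. a (|Y|, 2) table when V = (Y, T)). Counting measure
    is the dominating measure, so the integral is a sum.
    """
    p1 = np.asarray(p1, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    if p1.shape != p0.shape:
        raise ValueError("p1 and p0 must be defined on the same support")
    return 0.5 * np.abs(p1 - p0).sum()


# Hypothetical example with V = (Y, T): rows index y in {0, 1}, columns index t.
p_given_z1 = np.array([[0.30, 0.10],
                       [0.20, 0.40]])
p_given_z0 = np.array([[0.35, 0.05],
                       [0.35, 0.25]])
print(total_variation(p_given_z1, p_given_z0))
```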

Lemma 3.

Under Assumption 1, $TV_{(Y,T)} \le TV_{T^*} = P^*(T^*_1 > T^*_0)$.

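The first term in Lemma 3, $TV_{(Y,T)}$, reflects the dependence of $(Y, T)$ on $Z$. It can be interpreted as the magnitude of the distributional effect of $Z$ on $(Y, T)$. The second and third terms, $TV_{T^*}$ and $P^*(T^*_1 > T^*_0)$, measure the effect of the instrument on the true treatment $T^*$. Under the exclusion restriction, $Z$ affects $(Y, T)$ only through $T^*$. Accordingly, Lemma 3 states that the magnitude of the effect of $Z$ on $T^*$ is no smaller than the magnitude of the effect of $Z$ on $(Y, T)$.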

Lemma 3 gives a lower bound on $P^*(T^*_1 > T^*_0)$. Combined with equation (4), this yields an upper bound on the local average treatment effect in absolute value:

$$|\theta| = \frac{|E[Y \mid Z=1] - E[Y \mid Z=0]|}{P^*(T^*_1 > T^*_0)} \le \frac{|E[Y \mid Z=1] - E[Y \mid Z=0]|}{TV_{(Y,T)}}$$

as long as $TV_{(Y,T)} > 0$.
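The chain of Lemma 2, Lemma 3, and the upper bound can be checked numerically. Take a fully specified population in which the true LATE is known, derive the observed distribution of $(Y, T)$ given $Z$, and compare. The DGP below is hypothetical and not taken from the paper. It has three compliance types (no defiers), a binary outcome, and misclassification $P(T \ne T^*) = 0.15$ that does not depend on $Z$ or on the type. These are illustrative assumptions only.

```python
import numpy as np

# Hypothetical DGP (illustration only): type shares independent of Z,
# monotonicity T*_1 >= T*_0, and misclassification rate E_MIS.
TYPES = {
    # name: (share, (T*_0, T*_1), (P(Y_0 = 1 | type), P(Y_1 = 1 | type)))
    "never":    (0.3, (0, 0), (0.2, 0.5)),
    "always":   (0.2, (1, 1), (0.4, 0.7)),
    "complier": (0.5, (0, 1), (0.3, 0.8)),
}
E_MIS = 0.15  # P(T != T*)


def observed_pmf(z):
    """Table f[y, t] = P(Y = y, T = t | Z = z) implied by the DGP above."""
    f = np.zeros((2, 2))
    for share, t_star, p_y_one in TYPES.values():
        s = t_star[z]  # true treatment of this type when Z = z
        for y in (0, 1):
            p_y = p_y_one[s] if y == 1 else 1.0 - p_y_one[s]
            for t in (0, 1):
                p_t = 1.0 - E_MIS if t == s else E_MIS
                f[y, t] += share * p_y * p_t
    return f


f1, f0 = observed_pmf(1), observed_pmf(0)
p_complier = TYPES["complier"][0]
theta = TYPES["complier"][2][1] - TYPES["complier"][2][0]  # true LATE

reduced_form = (f1 - f0)[1, :].sum()  # E[Y|Z=1] - E[Y|Z=0], since Y is binary
tv_yt = 0.5 * np.abs(f1 - f0).sum()   # TV distance of (Y, T)

print(f"theta = {theta:.3f}, reduced form = {reduced_form:.3f}")
print(f"TV_(Y,T) = {tv_yt:.3f} <= P(complier) = {p_complier:.3f}")
print(f"bounds on |theta|: [{abs(reduced_form):.3f}, {abs(reduced_form) / tv_yt:.3f}]")
```

With these numbers, $\theta = 0.5$, the reduced form is $0.25 = 0.5 \times 0.5$, $TV_{(Y,T)} = 0.35 < 0.5$, and the bounds are $[0.25, 0.714]$. In this particular DGP, the first stage $P(T=1 \mid Z=1) - P(T=1 \mid Z=0)$ also equals $0.35$, so the upper bound coincides with the Wald ratio.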

Theorem 4 shows that the above observations characterize the sharp identified set for the local average treatment effect.

Theorem 4.

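Suppose that Assumption 1 holds, and consider an arbitrary data distribution $P$ of $(Y, T, Z)$. The identified set for the local average treatment effect is characterized as follows: $\Theta_I(P) = \Theta$ if $TV_{(Y,T)} = 0$; otherwise,

$$\Theta_I(P) = \left\{ \theta \in \Theta : \begin{array}{l} (E[Y \mid Z=1] - E[Y \mid Z=0])\,\theta \ge 0, \\ |E[Y \mid Z=1] - E[Y \mid Z=0]| \le |\theta|, \\ TV_{(Y,T)}\,|\theta| \le |E[Y \mid Z=1] - E[Y \mid Z=0]| \end{array} \right\}.$$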

The total variation distance $TV_{(Y,T)}$ plays two roles in determining the sharp identified set in this theorem.

First, $TV_{(Y,T)}$ measures the strength of the instrumental variable; that is, $TV_{(Y,T)} > 0$ is the relevance condition in my identification analysis. When $TV_{(Y,T)} > 0$, the interval in the above theorem is always nonempty and bounded, so $Z$ has some identifying power for the local average treatment effect. By contrast, $TV_{(Y,T)} = 0$ means that the instrumental variable does not affect the joint distribution of $(Y, T)$. In that case, $Z$ has no identifying power for the local average treatment effect: $f_{(Y,T) \mid Z=1} = f_{(Y,T) \mid Z=0}$ almost everywhere, and in particular $E[Y \mid Z=1] - E[Y \mid Z=0] = 0$. Note that none of the three inequalities in Theorem 4 restricts $\theta$ in this case.

Second, $TV_{(Y,T)}$ determines the length of the sharp identified set. The length is $|E[Y \mid Z=1] - E[Y \mid Z=0]|\,(1/TV_{(Y,T)} - 1)$, which is a decreasing function of $TV_{(Y,T)}$.
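Theorem 4 can be turned into a small function when $(Y, T)$ has finite support. The sketch below is a hypothetical helper, not the paper's code. It takes the conditional probability tables of $(Y, T)$ given $Z = 1$ and $Z = 0$ and returns the endpoints of the interval from the three inequalities, intersected with a user-supplied parameter space $\Theta$. The example tables are made up.

```python
import numpy as np


def late_identified_set(f1, f0, y_values, theta_space=(-np.inf, np.inf), tol=1e-12):
    """Endpoints of the identified interval for the LATE following Theorem 4.

    f1, f0: arrays f[y_index, t] = P(Y = y, T = t | Z = z) for z = 1 and z = 0.
    y_values: support points of Y, aligned with the rows of f1 and f0.
    theta_space: (lower, upper) endpoints of the parameter space Theta.
    """
    f1, f0 = np.asarray(f1, float), np.asarray(f0, float)
    y = np.asarray(y_values, float)
    reduced_form = float(y @ (f1 - f0).sum(axis=1))  # E[Y|Z=1] - E[Y|Z=0]
    tv = 0.5 * float(np.abs(f1 - f0).sum())          # TV distance of (Y, T)
    lo_space, hi_space = theta_space
    if tv <= tol:  # Z does not move (Y, T): no identifying power
        return lo_space, hi_space
    a, b = abs(reduced_form), abs(reduced_form) / tv
    lower, upper = (a, b) if reduced_form >= 0 else (-b, -a)  # sign restriction
    return max(lower, lo_space), min(upper, hi_space)


# Hypothetical tables with binary Y, so Theta = [-1, 1].
f1 = np.array([[0.20, 0.10], [0.20, 0.50]])
f0 = np.array([[0.35, 0.05], [0.35, 0.25]])
lower, upper = late_identified_set(f1, f0, y_values=[0, 1], theta_space=(-1.0, 1.0))
print(lower, upper, upper - lower)
```

Here the reduced form is 0.1 and $TV_{(Y,T)} = 0.3$, so the interval is $[0.1, 1/3]$. Its length, $0.1\,(1/0.3 - 1)$, matches the formula in the text.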

In general, neither the lower bound nor the upper bound of the sharp identified set equals the local average treatment effect. The lower bound is weakly smaller than the local average treatment effect in absolute value, because the share of compliers $P^*(T^*_1 > T^*_0)$ is weakly smaller than one. The upper bound is weakly larger than the local average treatment effect in absolute value, because mismeasurement of the treatment variable makes $TV_{(Y,T)}$ weakly smaller than the share of compliers (Lemma 3).

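The standard relevance condition $P(T=1 \mid Z=1) \ne P(T=1 \mid Z=0)$ is not required in Theorem 4. This condition is necessary to define the Wald estimand, but the sharp identified set does not depend directly on the Wald estimand. In fact, the condition $TV_{(Y,T)} > 0$ in Theorem 4 is weaker than $P(T=1 \mid Z=1) \ne P(T=1 \mid Z=0)$. The reason is that the total variation distance of a joint distribution is at least that of any of its marginals: $TV_{(Y,T)} \ge TV_T = |P(T=1 \mid Z=1) - P(T=1 \mid Z=0)|$.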
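To see concretely that $TV_{(Y,T)} > 0$ can hold while the first stage is exactly zero, consider the hypothetical tables below. In this case the Wald estimand is undefined, yet the bounds from Theorem 4 are still finite and informative. The numbers are chosen for illustration only.

```python
import numpy as np

# Hypothetical tables f[y, t] = P(Y = y, T = t | Z = z) with Y in {0, 1}.
f1 = np.array([[0.10, 0.30],
               [0.40, 0.20]])
f0 = np.array([[0.25, 0.25],
               [0.25, 0.25]])
y_values = np.array([0.0, 1.0])

first_stage = f1[:, 1].sum() - f0[:, 1].sum()                # P(T=1|Z=1) - P(T=1|Z=0)
reduced_form = y_values @ (f1.sum(axis=1) - f0.sum(axis=1))  # E[Y|Z=1] - E[Y|Z=0]
tv_t = 0.5 * np.abs(f1.sum(axis=0) - f0.sum(axis=0)).sum()   # TV distance of T alone
tv_yt = 0.5 * np.abs(f1 - f0).sum()                          # TV distance of (Y, T)

print(f"first stage = {first_stage:.3f} (the Wald ratio is undefined when this is 0)")
print(f"TV_T = {tv_t:.3f} <= TV_(Y,T) = {tv_yt:.3f}")
if tv_yt > 0:
    print(f"bounds on |theta|: [{abs(reduced_form):.3f}, {abs(reduced_form) / tv_yt:.3f}]")
```

The first stage and $TV_T$ are both zero. Meanwhile, $TV_{(Y,T)} = 0.2$ and the reduced form is 0.1, which gives $|\theta| \in [0.1, 0.5]$.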

Note that the sharp identified set is always nonempty. Hence Assumption 1 has no testable implications for the distribution of the observed variables, and it is impossible to conduct a specification test of Assumption 1.

3.1 Wald estimand and the identified set

The Wald estimand can be outside the identified set. One necessary and sufficient condition for the Wald estimand to be included in the identified set is given as follows.

Lemma 5.

The Wald estimand is in the identified set if and only if

(5)

The conditions in (5) are the testable implications of the local average treatment effect framework without measurement error (Balke and Pearl, 1997; Heckman and Vytlacil, 2005). Recent papers by Huber and Mellace (2015), Kitagawa (2015), and Mourifié and Wan (2016) propose testing procedures for (5). Based on the results in Theorem 4, their testing procedures can be reinterpreted as tests of the null hypothesis that the Wald estimand lies within the sharp upper bound on the local average treatment effect. (Footnote 9: Unfortunately, (5) cannot be used to test for the existence of measurement error. Even if there is nonzero measurement error, (5) can still hold.)
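The sketch below checks Lemma 5 numerically for binary $Z$ and $T$. It compares the Wald ratio with the interval from Theorem 4. Since (5) is not reproduced in this text, the sketch also evaluates one assumed form of the Balke–Pearl-type inequalities: $f_{(Y,T)\mid Z=1}(y,1) - f_{(Y,T)\mid Z=0}(y,1) \ge 0$ and $f_{(Y,T)\mid Z=1}(y,0) - f_{(Y,T)\mid Z=0}(y,0) \le 0$ for every $y$. Treat that form as an assumption. The function name and tables are hypothetical.

```python
import numpy as np


def wald_vs_bounds(f1, f0, y_values):
    """Compare the Wald ratio with the Theorem 4 interval.

    Requires a nonzero first stage and TV_(Y,T) > 0. `monotone_cells` is
    our assumed form of the inequalities in (5): f1 - f0 >= 0 in every
    (y, T=1) cell and <= 0 in every (y, T=0) cell.
    """
    f1, f0 = np.asarray(f1, float), np.asarray(f0, float)
    y = np.asarray(y_values, float)
    d = f1 - f0
    reduced_form = float(y @ d.sum(axis=1))  # E[Y|Z=1] - E[Y|Z=0]
    first_stage = float(d[:, 1].sum())       # P(T=1|Z=1) - P(T=1|Z=0)
    tv = 0.5 * float(np.abs(d).sum())        # TV distance of (Y, T)
    wald = reduced_form / first_stage
    a, b = abs(reduced_form), abs(reduced_form) / tv
    lower, upper = (a, b) if reduced_form >= 0 else (-b, -a)
    inside = lower - 1e-12 <= wald <= upper + 1e-12
    monotone_cells = bool(np.all(d[:, 1] >= 0) and np.all(d[:, 0] <= 0))
    return wald, (lower, upper), inside, monotone_cells


# Hypothetical tables (binary Y): the sign of f1 - f0 changes across y when T = 1.
f1_out = np.array([[0.30, 0.05], [0.15, 0.50]])
f0_out = np.array([[0.25, 0.15], [0.30, 0.30]])
# Hypothetical tables where the assumed inequalities hold.
f1_in = np.array([[0.20, 0.10], [0.20, 0.50]])
f0_in = np.array([[0.35, 0.05], [0.35, 0.25]])

res_out = wald_vs_bounds(f1_out, f0_out, [0, 1])
res_in = wald_vs_bounds(f1_in, f0_in, [0, 1])
print(res_out)
print(res_in)
```

In the first case, the Wald ratio is 0.5 but the sharp upper bound is 0.2, so the Wald estimand lies outside the identified set. In the second case, the Wald ratio equals the upper bound $1/3$.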

3.2 Conditional exogeneity of the instrumental variable

As in Abadie (2003) and Frölich (2007), this section considers conditional exogeneity of the instrumental variable. Under conditional exogeneity, $Z$ is exogenous given a set of covariates $X$, which is weaker than the unconditional exogeneity in Assumption 1.

Assumption 6.

There is some variable $X$ taking values in a set $\mathcal{X}$ that satisfies the following properties. (i) For each $t^*, z \in \{0, 1\}$, $Z$ is conditionally independent of $(Y_{t^*}, T_{t^*}, T^*_z)$ given $X$. (ii) $T^*_1 \ge T^*_0$ almost surely. (iii) .

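I define the $X$-conditional total variation distance by

$$TV_{(Y,T)}(X) = \frac{1}{2}\int \left| f_{(Y,T) \mid Z=1, X}(y,t) - f_{(Y,T) \mid Z=0, X}(y,t) \right| d\mu_{(Y,T)}(y,t).$$

Note that $TV_{(Y,T)}(X)$ is a random variable because it is a function of $X$. Under the conditional exogeneity of $Z$, Theorem 4 becomes the following.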

Theorem 7.

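Suppose that Assumption 6 holds, and consider an arbitrary data distribution $P$ of $(Y, T, X, Z)$. The identified set for the local average treatment effect is characterized as follows: $\Theta_I(P) = \Theta$ if $E[TV_{(Y,T)}(X)] = 0$; otherwise,

$$\Theta_I(P) = \left\{ \theta \in \Theta : \begin{array}{l} E\big[E[Y \mid X, Z=1] - E[Y \mid X, Z=0]\big]\,\theta \ge 0, \\ \big|E\big[E[Y \mid X, Z=1] - E[Y \mid X, Z=0]\big]\big| \le |\theta|, \\ E[TV_{(Y,T)}(X)]\,|\theta| \le \big|E\big[E[Y \mid X, Z=1] - E[Y \mid X, Z=0]\big]\big| \end{array} \right\}.$$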

3.3 Identifying power of repeated measurements

The identification strategy in the above analysis offers a new use of repeated measurements as additional sources of identification. Repeated measurements (for example, Hausman et al., 1991) are a popular approach in the literature on measurement error, but they cannot serve as instrumental variables in this framework. The reason is that the true treatment $T^*$ is endogenous, so it is natural to suspect that a measurement of $T^*$ is also endogenous. Indeed, the more accurate the measurement, the more likely it is to be endogenous. Nevertheless, when coupled with the instrumental variable $Z$, repeated measurements provide additional information that tightens the identified set for the local average treatment effect. Unlike other papers on repeated measurements, I do not need to assume independence of the measurement errors across measurements. The strategy also benefits from having more than two measurements, unlike Hausman et al. (1991), who achieve point identification with two measurements.

Consider a repeated measurement $R$ for $T^*$. I do not require $R$ to be binary, so $R$ can be discrete or continuous. As with $T$, I model $R$ using counterfactual notation. $R_1$ is the counterfactual second measurement when the true treatment is 1, and $R_0$ is the counterfactual second measurement when the true treatment is 0. The data generation of $R$ is then

$$R = T^* R_1 + (1 - T^*) R_0.$$

I strengthen Assumption 1 by assuming that the instrumental variable is independent of conditional on .

Assumption 8.

(i) $Z$ is independent of $(Y_{t^*}, T_{t^*}, R_{t^*}, T^*_z)$ for each $t^*, z \in \{0, 1\}$. (ii) $T^*_1 \ge T^*_0$ almost surely. (iii) .

Note that I do not assume independence between $R_{t^*}$ and $T_{t^*}$. By contrast, independence between the measurement errors is a key assumption when the repeated measurement is used as an instrumental variable.

The requirement on $R$ does not restrict $R$ to have the same support as $T$. In fact, $R$ can be any variable that depends on $T^*$; for example, $R$ can be an outcome variable other than $Y$. Assumption 8 tightens the identified set for the local average treatment effect as follows.

Theorem 9.

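Suppose that Assumption 8 holds, and consider an arbitrary data distribution $P$ of $(Y, T, R, Z)$. The identified set for the local average treatment effect is characterized as follows: $\Theta_I(P) = \Theta$ if $TV_{(Y,T,R)} = 0$; otherwise,

$$\Theta_I(P) = \left\{ \theta \in \Theta : \begin{array}{l} (E[Y \mid Z=1] - E[Y \mid Z=0])\,\theta \ge 0, \\ |E[Y \mid Z=1] - E[Y \mid Z=0]| \le |\theta|, \\ TV_{(Y,T,R)}\,|\theta| \le |E[Y \mid Z=1] - E[Y \mid Z=0]| \end{array} \right\}.$$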

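The identified set in Theorem 9 is weakly smaller than the identified set in Theorem 4, because the total variation distance in Theorem 9 is weakly larger than that in Theorem 4. By the triangle inequality,

$$TV_{(Y,T,R)} = \frac{1}{2}\sum_{t \in \{0,1\}} \int\!\!\int \left| f_{(Y,T,R) \mid Z=1}(y,t,r) - f_{(Y,T,R) \mid Z=0}(y,t,r) \right| d\mu_R(r)\, d\mu_Y(y) \ge \frac{1}{2}\sum_{t \in \{0,1\}} \int \left| \int \left( f_{(Y,T,R) \mid Z=1}(y,t,r) - f_{(Y,T,R) \mid Z=0}(y,t,r) \right) d\mu_R(r) \right| d\mu_Y(y) = TV_{(Y,T)},$$

and the inequality is strict unless the sign of $f_{(Y,T,R) \mid Z=1}(y,t,r) - f_{(Y,T,R) \mid Z=0}(y,t,r)$ is constant in $r$ for every $(y, t)$. Therefore, one can test whether the repeated measurement carries additional information by testing whether this sign is constant in $r$.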
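The triangle-inequality argument can be illustrated with hypothetical tables for $(Y, T, R)$ in which the sign of $f_{(Y,T,R)\mid Z=1} - f_{(Y,T,R)\mid Z=0}$ changes with $r$ in one $(y, t)$ cell. The sketch assumes binary $Y$, $T$, and $R$ purely for illustration. Marginalizing over $r$ recovers a $(Y, T)$ table, so both versions of the upper bound can be compared directly.

```python
import numpy as np

# Hypothetical (Y, T) tables given Z = 1 and Z = 0 (binary Y and T).
f1_yt = np.array([[0.20, 0.10], [0.20, 0.50]])
f0_yt = np.array([[0.35, 0.05], [0.35, 0.25]])

# Extend to f[y, t, r] = P(Y = y, T = t, R = r | Z = z) with binary R:
# split each (y, t) cell evenly over r ...
f1 = np.stack([f1_yt / 2, f1_yt / 2], axis=-1)
f0 = np.stack([f0_yt / 2, f0_yt / 2], axis=-1)
# ... except in the (y, t) = (0, 1) cell, where the sign of f1 - f0 changes with r.
f1[0, 1, :] = [0.02, 0.08]
f0[0, 1, :] = [0.04, 0.01]

y_values = np.array([0.0, 1.0])
reduced_form = y_values @ (f1.sum(axis=(1, 2)) - f0.sum(axis=(1, 2)))
tv_yt = 0.5 * np.abs(f1.sum(axis=2) - f0.sum(axis=2)).sum()  # ignores R
tv_ytr = 0.5 * np.abs(f1 - f0).sum()                         # uses R

print(f"TV_(Y,T) = {tv_yt:.3f}, TV_(Y,T,R) = {tv_ytr:.3f}")
print(f"upper bound without R: {abs(reduced_form) / tv_yt:.4f}")
print(f"upper bound with R:    {abs(reduced_form) / tv_ytr:.4f}")
```

In this example, adding $R$ raises the total variation distance from 0.30 to 0.32 and lowers the upper bound from about 0.333 to 0.3125. If the sign in the $(0, 1)$ cell did not change with $r$, the two distances would coincide.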

3.4 Dependence between measurement error and instrumental variable

Even without independence between the instrumental variable and the measurement error, it is still possible to apply the same identification strategy and obtain finite (but less tight) bounds on the local average treatment effect. (Assumption 1 (i) implies that $Z$ is independent of $T_{t^*}$ for each $t^*$.) I weaken Assumption 1 to allow the measurement error to be correlated with the instrumental variable $Z$.

Assumption 10.

(i) $Z$ is independent of $(Y_{t^*}, T^*_z)$ for each $t^*, z \in \{0, 1\}$. (ii) $T^*_1 \ge T^*_0$ almost surely. (iii) .

Theorem 11 characterizes the identified set for the local average treatment effect under Assumption 10.

Theorem 11.

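Suppose that Assumption 10 holds, and consider an arbitrary data distribution $P$ of $(Y, T, Z)$. The identified set for the local average treatment effect is characterized as follows: $\Theta_I(P) = \Theta$ if $TV_Y = 0$; otherwise,

$$\Theta_I(P) = \left\{ \theta \in \Theta : \begin{array}{l} (E[Y \mid Z=1] - E[Y \mid Z=0])\,\theta \ge 0, \\ |E[Y \mid Z=1] - E[Y \mid Z=0]| \le |\theta|, \\ TV_Y\,|\theta| \le |E[Y \mid Z=1] - E[Y \mid Z=0]| \end{array} \right\}.$$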

The difference from Theorem 4 is that Theorem 11 does not depend on the measured treatment $T$. Although $T$ is observed in the dataset, it carries no information about the local average treatment effect, because Assumption 10 does not restrict $T_{t^*}$. When $TV_Y > 0$, there are nontrivial upper and lower bounds on the local average treatment effect even without using the measured treatment $T$.
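The cost of dropping $T$ can be seen by computing the upper bound twice, once with $TV_{(Y,T)}$ (as in Theorem 4) and once with $TV_Y$ (as in Theorem 11). The sketch uses a hypothetical three-valued outcome. With a binary outcome, $TV_Y$ equals $|E[Y \mid Z=1] - E[Y \mid Z=0]|$, so the $Y$-only upper bound would sit at the boundary of $\Theta = [-1, 1]$. The `bounds` helper is illustrative and ignores truncation to $\Theta$.

```python
import numpy as np


def bounds(reduced_form, tv):
    """Interval implied by the three inequalities, for tv > 0 (Theta ignored)."""
    a, b = abs(reduced_form), abs(reduced_form) / tv
    return (a, b) if reduced_form >= 0 else (-b, -a)


# Hypothetical tables f[y, t] = P(Y = y, T = t | Z = z) with Y in {0, 1, 2}.
f1 = np.array([[0.10, 0.05], [0.20, 0.15], [0.15, 0.35]])
f0 = np.array([[0.15, 0.05], [0.25, 0.15], [0.20, 0.20]])
y_values = np.array([0.0, 1.0, 2.0])

reduced_form = y_values @ (f1.sum(axis=1) - f0.sum(axis=1))    # E[Y|Z=1] - E[Y|Z=0]
tv_y = 0.5 * np.abs(f1.sum(axis=1) - f0.sum(axis=1)).sum()     # TV distance of Y only
tv_yt = 0.5 * np.abs(f1 - f0).sum()                            # TV distance of (Y, T)

print("using (Y, T), as in Theorem 4:", bounds(reduced_form, tv_yt))
print("using Y only, as in Theorem 11:", bounds(reduced_form, tv_y))
```

Both intervals share the lower bound 0.15. The upper bound rises from 1.0 to 1.5 when $T$ is not used.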

4 Inference

Based on the sharp identified set in the presence of covariates (Theorem 7), this section constructs a confidence interval for the local average treatment effect from an i.i.d. sample of $(Y, T, X, Z)$. The confidence interval described below controls the asymptotic size uniformly over a class of data generating processes and asymptotically excludes all fixed alternatives.

The identified set in Theorem 7 is characterized by moment inequalities as follows.

Lemma 12.

Let $P$ be an arbitrary data distribution of $(Y, T, X, Z)$. Under Assumption 6, $\Theta_I(P)$ is the set of $\theta \in \Theta$ for which

(6)
(7)
(8)

where , is the set of measurable functions on taking a value in and .
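One way to see why a class of functions appears in Lemma 12 is that total variation is a supremum of moment differences. On a finite support, $TV_V$ equals the maximum of $E[h(V) \mid Z=1] - E[h(V) \mid Z=0]$ over $\{0,1\}$-valued functions $h$. The inequality $TV_{(Y,T)}|\theta| \le |E[Y \mid Z=1] - E[Y \mid Z=0]|$ is therefore equivalent to one inequality per such $h$. The sketch assumes, for illustration only, $\{0,1\}$-valued functions of $(y, t)$ and no covariates, and checks the identity by brute force on hypothetical tables. Keeping only $h(y, t) = t$ gives the first stage, which yields the looser Wald-type bound.

```python
import itertools

import numpy as np


def tv_by_enumeration(p1, p0):
    """Max of E[h | Z=1] - E[h | Z=0] over all {0,1}-valued h on a finite support."""
    diff = np.ravel(np.asarray(p1, float) - np.asarray(p0, float))
    best = 0.0
    for h in itertools.product((0.0, 1.0), repeat=diff.size):
        best = max(best, float(np.dot(h, diff)))
    return best


# Hypothetical tables f[y, t] = P(Y = y, T = t | Z = z) with binary Y.
f1 = np.array([[0.30, 0.05], [0.15, 0.50]])
f0 = np.array([[0.25, 0.15], [0.30, 0.30]])

tv_closed_form = 0.5 * np.abs(f1 - f0).sum()
tv_sup = tv_by_enumeration(f1, f0)

# Restricting the class to the single function h(y, t) = t gives the first stage.
h_t = np.array([[0.0, 1.0], [0.0, 1.0]])
first_stage = float(np.sum(h_t * (f1 - f0)))

reduced_form = float((f1 - f0)[1, :].sum())  # E[Y|Z=1] - E[Y|Z=0] for binary Y
print("TV:", tv_closed_form, tv_sup)
print("upper bound with all h:", abs(reduced_form) / tv_sup)
print("upper bound with h(y, t) = t only:", abs(reduced_form) / first_stage)
```

Enumeration is exponential in the support size, so it only serves as a check. In this example, the full class gives an upper bound of 0.2, while $h(y, t) = t$ alone gives 0.5, which is the Wald ratio.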

I construct a -confidence interval for the local average treatment effect with treating as a nuisance parameter for given . I assume that a -confidence interval for is available for researchers for given . Given , I construct the -confidence interval for the local average treatment effect as

where and are defined below using the bootstrap-based testing (Chernozhukov et al., 2014).

The number of moment inequalities in Lemma 12 can be finite or infinite, and this determines whether existing methods can be applied directly to inference on the local average treatment effect. When $(Y, X)$ has finite support and therefore $\mathcal{H}$ is finite, the sharp identified set is characterized by a finite number of inequalities, so I can apply inference methods based on unconditional moment inequalities. (Footnote 10: The literature on conditional and unconditional moment inequality models is broad and growing; see Canay and Shaikh (2016) for a recent survey.) To the best of my knowledge, however, inference for the local average treatment effect in my framework does not fall directly into existing moment inequality models when either $Y$ or $X$ is continuous. In that case, the sharp identified set is characterized by an uncountably infinite number of inequalities. In the current literature on partially identified parameters, infinitely many moment inequalities are mainly considered in the context of conditional moment inequalities, but the identified set in this paper is not characterized by conditional moment inequalities. (Footnote 11: Chernozhukov et al. (2013) also consider an infinite number of unconditional moment inequalities in which the moment functions are continuously indexed by a compact subset of a finite-dimensional space. It is not straightforward to verify their continuity condition (Condition C.1 in their paper) for the moment inequalities in Lemma 12.) (Footnote 12: Andrews and Shi (2017) consider an infinite number of unconditional moment inequalities in which the moment functions satisfy a manageability condition. I cannot apply their approach here because the functions in $\mathcal{H}$ take discrete values, so the packing numbers depend on the sample size.)

I consider a sequence of finite sets $\mathcal{H}_n$ that converges to $\mathcal{H}$ as the sample size increases. (The convergence is formally defined in Assumption 14, and an example of $\mathcal{H}_n$ appears after Assumption 14.) When $\mathcal{H}$ is finite, $\mathcal{H}_n$ can be equal to $\mathcal{H}$. If $\mathcal{H}$ is replaced with $\mathcal{H}_n$ in Lemma 12, the number of moment inequalities becomes finite. At the same time, as $\mathcal{H}_n$ approaches $\mathcal{H}$, the approximation error from using $\mathcal{H}_n$ converges to zero. The number of inequalities may increase along the way, in particular diverging to infinity when $\mathcal{H}$ has infinitely many elements. The approximated identified set is characterized by a finite number of the following moment inequalities:

(9)
(10)
(11)

Denote by the resulting number of moment inequalities, that is, the number of elements in $\mathcal{H}_n$ plus . Note that, when , the moment inequalities in (11) are equivalent to using the Wald estimand as the upper bound for .

For a given nominal size, I construct a test statistic and a critical value via the multiplier bootstrap of Chernozhukov et al. (2014) for models with many moment inequalities (described in Appendix A: Multiplier bootstrap). (Footnote 13: In this paper I focus on the one-step multiplier bootstrap of Chernozhukov et al. (2014). It is also possible to use the two- or three-step empirical or multiplier bootstrap, but I do not compare these methods because such a comparison is beyond the scope of this paper.) Chernozhukov et al. (2014) study testing problems for moment inequality models in which the number of moment inequalities is finite but growing. Since the number of moment inequalities in (9)-(11) is finite but may grow with the sample size, their results can be used to construct a confidence interval based on (9)-(11).
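For readers unfamiliar with the procedure, below is a minimal sketch of a one-step multiplier bootstrap max-$t$ test, as we understand the generic version in Chernozhukov et al. It assumes the sign convention $H_0: E[m_j] \le 0$ for all $j$ and omits any moment selection step. It is not the exact procedure of Appendix A. Each column of `m` holds one moment function evaluated at one candidate $\theta$. A confidence set would then collect the $\theta$ values at which the test does not reject.

```python
import numpy as np


def one_step_multiplier_bootstrap(m, alpha=0.05, n_boot=2000, rng=None):
    """Max-t test of H0: E[m_j] <= 0 for all j, with a one-step multiplier bootstrap.

    m: (n, p) array; column j holds moment function m_j evaluated at each
       observation for one candidate parameter value. Every column needs
       positive sample variance.
    Returns (test statistic, critical value, reject).
    """
    rng = np.random.default_rng(rng)
    m = np.asarray(m, dtype=float)
    n, _ = m.shape
    m_bar = m.mean(axis=0)
    sigma = m.std(axis=0)
    if np.any(sigma <= 0):
        raise ValueError("each moment function needs positive sample variance")
    stat = np.sqrt(n) * np.max(m_bar / sigma)
    # Studentized, centered moments perturbed by i.i.d. N(0, 1) multipliers.
    centered = (m - m_bar) / sigma
    xi = rng.standard_normal((n_boot, n))
    boot_max = (xi @ centered).max(axis=1) / np.sqrt(n)
    crit = np.quantile(boot_max, 1.0 - alpha)
    return stat, crit, bool(stat > crit)


# Simulated moments: all inequalities comfortably satisfied vs. all violated.
rng = np.random.default_rng(0)
m_ok = rng.normal(-1.0, 1.0, size=(500, 3))
m_bad = rng.normal(1.0, 1.0, size=(500, 3))
print(one_step_multiplier_bootstrap(m_ok, rng=1))
print(one_step_multiplier_bootstrap(m_bad, rng=1))
```

The bootstrap draws are stored as an `n_boot` by `n` matrix for clarity. For large samples, drawing in chunks would keep memory bounded.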

Assumption 13.

Given positive constants and , the class of data generating processes, denoted by , and the parameter spaces satisfy

  • ,

  • is bounded,

  • The random variable inside in (9)-(11) has a non-zero variance for every and every ,

  • ,

  • for every .

The first assumption (i) is a regularity condition. The second assumption (ii) requires researchers to know ex ante upper and lower bounds on the parameter. The third assumption (iii) guarantees that the test statistic is well-defined. The fourth assumption (iv) is that the confidence interval for controls the size uniformly over . The last assumption (v) is that the propensity score is bounded away from zero and one.

In this paper I assume that $\mathcal{H}_n$ satisfies the following conditions.

Assumption 14.

(i) . (ii) The convergence

(12)

holds uniformly over and . (iii) The number of elements in satisfies

(13)

for some and .

An example of is obtained by discretizing . Consider a partition over , in which the intervals and the grid size depend on the sample size . Let be a generic function of into that is constant over for every . Let be the set of all such functions. Lemma 15 shows that this construction of satisfies Eq. (12) under conditions on and . The conditions in Lemma 15 guarantee that the approximation error from the discretization vanishes as the sample size increases.
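The discretization idea can be illustrated in one dimension. Take $\{0,1\}$-valued functions of $y$ that are constant on the cells of a partition. The best such function attains the total variation distance between the two binned distributions. Refining the partition can only increase this value, and the value approaches the exact total variation distance. The sketch assumes a hypothetical DGP with $Y \mid Z=1 \sim N(0.5, 1)$ and $Y \mid Z=0 \sim N(0, 1)$, and it ignores $T$ and $X$. It is meant only to show the approximation error shrinking under refinement, not to reproduce the conditions of Lemma 15.

```python
import math

import numpy as np


def normal_cdf(x, mean):
    """CDF of N(mean, 1); math.erf handles x = +/-inf."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))


def binned_tv(n_bins, lo=-4.0, hi=4.0, mean1=0.5, mean0=0.0):
    """Largest E[h(Y)|Z=1] - E[h(Y)|Z=0] over {0,1}-valued h constant on each cell.

    Cells: (-inf, lo], n_bins equal-width bins on (lo, hi], and (hi, inf).
    The maximum equals the TV distance between the two binned distributions.
    """
    edges = np.concatenate(([-np.inf], np.linspace(lo, hi, n_bins + 1), [np.inf]))
    p1 = np.diff([normal_cdf(e, mean1) for e in edges])
    p0 = np.diff([normal_cdf(e, mean0) for e in edges])
    return 0.5 * np.abs(p1 - p0).sum()


exact = 2.0 * normal_cdf(0.25, 0.0) - 1.0  # TV between N(0.5, 1) and N(0, 1)
for k in (1, 2, 4, 8, 16, 32, 64):          # each partition refines the previous one
    print(k, round(binned_tv(k), 4), round(exact, 4))
```

The two densities cross at $y = 0.25$. Once that point is a bin edge (32 or more bins here), the binned value matches the exact distance.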

It is worthwhile to mention that, when , the implied upper bound in (11) is equal to the Wald estimand. It can be smaller than the Wald estimand as long as ,

Lemma 15.

Assumption 14 holds if

  • the partition is a refinement of the partition ;

  • satisfies (13);

  • there is a positive constant such that is a subset of some open ball with radius in ; and

  • the density function is Hölder continuous in with the Hölder constant and exponent .

Theorem 16 shows asymptotic properties of the confidence interval . The first result (i) is the uniform asymptotic size control and the second result (ii) is the consistency against all the fixed alternatives.

Theorem 16.

Suppose that Assumptions 13 and 14 hold. (i) The confidence interval controls the asymptotic size uniformly:

(ii) If Eq. (12) holds, the confidence interval excludes all the fixed alternatives:

5 Empirical illustrations

This section studies the effects of 401(k) participation on financial savings using the inference method in Section 4. I introduce a measurement error problem into the analysis of Abadie (2003), who investigates the local average treatment effect using eligibility for the 401(k) program. Robustness to misclassification is empirically relevant because the type of retirement pension plan is subject to measurement error in survey datasets. For example, using the Health and Retirement Study, Gustman et al. (2008) estimate that around one fourth of survey respondents misclassified their pension plan type.

The dataset in my analysis is from the 1991 Survey of Income and Program Participation (SIPP). It has been used in various analyses, e.g., Poterba et al. (1995) and Abadie (2003). I follow the data construction in Abadie (2003). The sample consists of households in which at least one person is employed, that have no income from self-employment, and whose annual family income is between $10,000 and $200,000. The resulting sample size is 9,275.

The outcome variable $Y$ is net financial assets, the measured treatment variable $T$ is self-reported participation in 401(k), $Z$ is eligibility for 401(k), and $R$ is participation in an individual retirement account (IRA). The control variables $X$ include a constant, family income, age and its square, marital status, and family size. Summary statistics for these variables are reported in Table 4. Participation in 401(k) can be endogenous, because participants might be better informed about, or plan more for, retirement savings than non-participants. To address this endogeneity problem, this paper uses 401(k) eligibility as an instrumental variable.

I use a linear probability model for the regression of the instrumental variable $Z$ on the control variables $X$, that is, . For comparison, I compute the Wald estimator, , with a 95% bootstrapped confidence interval . (Footnote 14: The Wald estimator here is the sample analogue of with the propensity score estimated by the linear probability model.) The intent-to-treat effect is estimated as with a 95% bootstrapped confidence interval
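The expression behind the Wald estimator is missing from this text. A natural candidate, which we assume here, is the ratio $E[WY]/E[WT]$ with inverse-propensity weights $W = Z/\pi(X) - (1-Z)/(1-\pi(X))$, where $\pi(X) = P(Z=1 \mid X)$ is fitted by a linear probability model. The sketch uses simulated data only, not the SIPP sample, and the DGP is hypothetical. It also shows the upward bias of the Wald ratio when $T$ is misclassified, which is consistent with the Wald estimand acting as an upper bound.

```python
import numpy as np


def weighted_wald(y, t, z, x):
    """Sample analogue of E[W Y] / E[W T], W = Z/pi(X) - (1-Z)/(1-pi(X)).

    pi(X) is fitted by OLS of Z on [1, X] (a linear probability model);
    fitted values must lie strictly inside (0, 1).
    Returns (Wald estimate, intent-to-treat estimate E[W Y]).
    """
    design = np.column_stack([np.ones(len(z)), x])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    pi = design @ coef
    if np.any((pi <= 0) | (pi >= 1)):
        raise ValueError("fitted propensity scores must lie in (0, 1)")
    w = z / pi - (1 - z) / (1 - pi)
    itt = np.mean(w * y)
    return itt / np.mean(w * t), itt


# Hypothetical DGP: P(Z=1|X) linear in X, 60% compliers, constant effect 2.
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(size=n)
z = (rng.uniform(size=n) < 0.3 + 0.4 * x).astype(float)
u = rng.uniform(size=n)
t0_star = (u < 0.2).astype(float)  # always-takers
t1_star = (u < 0.8).astype(float)  # always-takers and compliers
t_star = z * t1_star + (1 - z) * t0_star
y = 1.0 + 2.0 * t_star + x + rng.normal(size=n)  # so the LATE is 2
flip = rng.uniform(size=n) < 0.1                 # misreport 10% of treatments
t_reported = np.where(flip, 1 - t_star, t_star)

wald_true, itt = weighted_wald(y, t_star, z, x)
wald_mis, _ = weighted_wald(y, t_reported, z, x)
print(wald_true, wald_mis, itt)
```

With this DGP, the estimate based on the true treatment should be near 2, and the intent-to-treat estimate near 1.2. Misclassification pushes the Wald ratio toward roughly 2.5.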

Table 4-4 shows the confidence intervals for the local average treatment effect under different assumptions (Theorems 4, 9, and 11, respectively). (Footnote 15: I use 2000 draws for the bootstrap and set for the moment selection and for the estimation of . See Appendix A for the multiplier bootstrap critical value.) The confidence intervals in these tables are robust to misclassification of the treatment variable. They are wider than the 95% confidence interval for the Wald estimator but are, in general, comparable to it. The confidence intervals in this exercise do not shrink as increases from to . (Note that, when , the moment inequalities in (11) are equivalent to using the Wald estimand as the upper bound for .) A possible reason is that the data generating process does not violate the conditions in (5) to a large extent, so the Wald estimand is close (even if not equal) to the sharp upper bound for the local average treatment effect.

Table 4 summarizes the confidence intervals when IRA participation is used as an additional measurement, as discussed in Theorem 9. The values are similar to those in Table 4. This can be interpreted as evidence that IRA participation has little identifying power for the local average treatment effect in this empirical exercise.

Table 4 summarizes the confidence intervals without using the measured treatment $T$, as in Theorem 11. The lower bounds of the confidence intervals are the same as those in Table 4, because the lower bound of the identified set uses no information from the measured treatment $T$. The upper bounds are 3-4 times larger than those in Table 4, which is the cost of not using $T$. (Footnote 16: When ,)