Nonparametric Instrumental Variable Estimation Under Monotonicity

Footnote 1: First version: January 2014. This version: July 9, 2019. We thank Alex Belloni, Richard Blundell, Stéphane Bonhomme, Moshe Buchinsky, Matias Cattaneo, Xiaohong Chen, Victor Chernozhukov, Andrew Chesher, Joachim Freyberger, Jinyong Hahn, Dennis Kristensen, Simon Lee, Zhipeng Liao, Rosa Matzkin, Eric Mbakop, Ulrich Müller, Markus Reiß, Susanne Schennach, Azeem Shaikh, and Vladimir Spokoiny for useful comments and discussions.

Denis Chetverikov, Department of Economics, University of California at Los Angeles, 315 Portola Plaza, Bunche Hall, Los Angeles, CA 90024, USA; e-mail: chetverikov@econ.ucla.edu.

Daniel Wilhelm, Department of Economics, University College London, Gower Street, London WC1E 6BT, United Kingdom; e-mail: d.wilhelm@ucl.ac.uk. The author gratefully acknowledges financial support from the ESRC Centre for Microdata Methods and Practice at IFS (RES-589-28-0001).

Abstract

The ill-posedness of the inverse problem of recovering a regression function in a nonparametric instrumental variable model leads to estimators that may suffer from a very slow, logarithmic rate of convergence. In this paper, we show that restricting the problem to models with monotone regression functions and monotone instruments significantly weakens the ill-posedness of the problem. In stark contrast to the existing literature, the presence of a monotone instrument implies boundedness of our measure of ill-posedness when restricted to the space of monotone functions. Based on this result we derive a novel non-asymptotic error bound for the constrained estimator that imposes monotonicity of the regression function. For a given sample size, the bound is independent of the degree of ill-posedness as long as the regression function is not too steep. As an implication, the bound allows us to show that the constrained estimator converges at a fast, polynomial rate, independently of the degree of ill-posedness, in a large, but slowly shrinking neighborhood of constant functions. Our simulation study demonstrates significant finite-sample performance gains from imposing monotonicity even when the regression function is rather far from being a constant. We apply the constrained estimator to the problem of estimating gasoline demand functions from U.S. data.

1 Introduction

Despite the pervasive use of linear instrumental variable methods in empirical research, their nonparametric counterparts are far from enjoying similar popularity. Perhaps two of the main reasons for this originate from the observation that point-identification of the regression function in the nonparametric instrumental variable (NPIV) model requires completeness assumptions, which have been argued to be strong (Santos (2012)) and non-testable (Canay, Santos, and Shaikh (2013)), and from the fact that the NPIV model is ill-posed, which may cause regression function estimators in this model to suffer from a very slow, logarithmic rate of convergence (e.g. Blundell, Chen, and Kristensen (2007)).

In this paper, we explore the possibility of imposing shape restrictions to improve statistical properties of the NPIV estimators and to achieve (partial) identification of the NPIV model in the absence of completeness assumptions. We study the NPIV model

$$Y = g(X) + \varepsilon, \qquad E[\varepsilon \mid W] = 0, \tag{1}$$

where $Y$ is a dependent variable, $X$ an endogenous regressor, and $W$ an instrumental variable (IV). We are interested in identification and estimation of the nonparametric regression function $g$ based on a random sample of size $n$ from the distribution of $(Y, X, W)$. We impose two monotonicity conditions: (i) monotonicity of the regression function $g$ (we assume that $g$ is increasing; see footnote 2) and (ii) monotonicity of the reduced form relationship between the endogenous regressor $X$ and the instrument $W$ in the sense that the conditional distribution of $X$ given $W$ corresponding to higher values of $W$ first-order stochastically dominates the same conditional distribution corresponding to lower values of $W$ (the monotone IV assumption).

Footnote 2: All results in the paper hold also when $g$ is decreasing. In fact, as we show in Section 4, the sign of the slope of $g$ is identified under our monotonicity conditions.

We show that these two monotonicity conditions together significantly change the structure of the NPIV model, and weaken its ill-posedness. In particular, we demonstrate that under the second condition, a slightly modified version of the sieve measure of ill-posedness defined in Blundell, Chen, and Kristensen (2007) is bounded uniformly over the dimension of the sieve space, when restricted to the set of monotone functions; see Section 2 for details. As a result, under our two monotonicity conditions, the constrained NPIV estimator that imposes monotonicity of the regression function possesses a fast rate of convergence in a large but slowly shrinking neighborhood of constant functions.

More specifically, we derive a new non-asymptotic error bound for the constrained estimator. The bound exhibits two regimes. The first regime applies when the function $g$ is not too steep, and the bound in this regime is independent of the sieve measure of ill-posedness, which slows down the convergence rate of the unconstrained estimator. In fact, under some further conditions, the bound in the first regime takes the following form: with high probability,

$$\|\hat{g}^c - g\|_{2,t} \le C \left( \left( \frac{K \log n}{n} \right)^{1/2} + K^{-s} \right),$$

where $\hat{g}^c$ is the constrained estimator, $\|\cdot\|_{2,t}$ an appropriate $L^2$-norm, $K$ the number of series terms in the estimator $\hat{g}^c$, $s$ the number of derivatives of the function $g$, and $C$ some constant; see Section 3 for details. Thus, the constrained estimator has a fast rate of convergence in the first regime, and the bound in this regime is of the same order, up to a log-factor, as that for series estimators of conditional mean functions. The second regime applies when the function $g$ is sufficiently steep. In this regime, the bound is similar to that for the unconstrained NPIV estimators. The steepness level separating the two regimes depends on the sample size $n$ and decreases as the sample size $n$ grows large. Therefore, for a given increasing function $g$, if the sample size $n$ is not too large, the bound is in its first regime, where the constrained estimator $\hat{g}^c$ does not suffer from ill-posedness of the model. As the sample size $n$ grows large, however, the bound eventually switches to the second regime, where ill-posedness of the model undermines the statistical properties of the constrained estimator $\hat{g}^c$ similarly to the case of the unconstrained estimator.

Intuitively, the existence of the second regime of the bound is to be expected. Indeed, if the function $g$ is strictly increasing, it lies in the interior of the constraint that $g$ is increasing. Hence, the constraint does not bind asymptotically so that, in sufficiently large samples, the constrained estimator coincides with the unconstrained one and the two estimators share the same convergence rate. In finite samples, however, the constraint binds with non-negligible probability even if $g$ is strictly increasing. The first regime of our non-asymptotic bound captures this finite-sample phenomenon, and improvements from imposing the monotonicity constraint on $g$ in this regime can be understood as a boundary effect. Importantly, and perhaps unexpectedly, we show that under the monotone IV assumption, this boundary effect is so strong that ill-posedness of the problem completely disappears in the first regime (see footnote 3). In addition, we demonstrate via our analytical results as well as simulations that this boundary effect can be strong even far away from the boundary and/or in large samples.

Footnote 3: Even though we have established the result that ill-posedness disappears in the first regime under the monotone IV assumption, currently we do not know whether this assumption is necessary for the result.

Our simulation experiments confirm these theoretical findings and demonstrate dramatic finite-sample performance improvements of the constrained relative to the unconstrained NPIV estimator when the monotone IV assumption is satisfied. Imposing the monotonicity constraint on $g$ removes the estimator’s non-monotone oscillations due to sampling noise, which in ill-posed inverse problems can be particularly pronounced. Therefore, imposing the monotonicity constraint significantly reduces variance while only slightly increasing bias.

In addition, we show that in the absence of completeness assumptions, that is, when the NPIV model is not point-identified, our monotonicity conditions have non-trivial identification power, and can provide partial identification of the model.

We regard both monotonicity conditions as natural in many economic applications. In fact, both of these conditions often directly follow from economic theory. Consider the following generic example. Suppose an agent chooses input $X$ (e.g. schooling) to produce an outcome $Y$ (e.g. life-time earnings) such that $Y = g(X) + \varepsilon$, where $\varepsilon$ summarizes determinants of outcome other than $X$. The cost of choosing a level $X = x$ is $c(x, W, \eta)$, where $W$ is a cost-shifter (e.g. distance to college) and $\eta$ represents (possibly vector-valued) unobserved heterogeneity in costs (e.g. family background, a family’s taste for education, variation in local infrastructure). The agent’s optimization problem can then be written as

$$X = \arg\max_{x} \bigl\{ g(x) + \varepsilon - c(x, W, \eta) \bigr\}$$

so that, from the first-order condition of this optimization problem,

$$\frac{\partial X}{\partial W} = \frac{\partial^2 c / \partial X \partial W}{\partial^2 g / \partial X^2 - \partial^2 c / \partial X^2} \ge 0 \tag{2}$$

if marginal costs are decreasing in $W$ (i.e. $\partial^2 c / \partial X \partial W \le 0$), marginal costs are increasing in $X$ (i.e. $\partial^2 c / \partial X^2 > 0$), and the production function is concave (i.e. $\partial^2 g / \partial X^2 \le 0$). As long as $W$ is independent of the pair $(\varepsilon, \eta)$, condition (2) implies our monotone IV assumption and $g$ increasing corresponds to the assumption of a monotone regression function. Dependence between $\eta$ and $\varepsilon$ generates endogeneity of $X$, and independence of $W$ from $(\varepsilon, \eta)$ implies that $W$ can be used as an instrument for $X$.
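The comparative statics in this example can be checked numerically. The sketch below is only an illustration under assumed functional forms that do not appear in the paper: a quadratic concave production function and a quadratic cost whose marginal cost falls with the cost-shifter. The names `optimal_input`, `a`, `b`, and `eta` are hypothetical. It solves the agent's problem by grid search and confirms that the chosen input rises with the instrument.

```python
# Hypothetical functional forms, chosen only for illustration:
#   production  g(x)         = a*x - 0.5*b*x**2      (concave, g'' = -b <= 0)
#   cost        c(x, w, eta) = (eta - w)*x + 0.5*x**2
# so marginal cost c_x = eta - w + x is decreasing in w and increasing in x.

def optimal_input(w, eta, a=2.0, b=1.0, grid_size=2001):
    """Grid-search maximizer of g(x) - c(x, w, eta) over x in [0, 2]."""
    best_x, best_val = 0.0, float("-inf")
    for i in range(grid_size):
        x = 2.0 * i / (grid_size - 1)
        val = a * x - 0.5 * b * x**2 - ((eta - w) * x + 0.5 * x**2)
        if val > best_val:
            best_x, best_val = x, val
    return best_x

# Optimal input at increasing values of the cost-shifter, holding eta fixed.
ws = [0.0, 0.25, 0.5, 0.75, 1.0]
choices = [optimal_input(w, eta=0.5) for w in ws]
```

Under these assumed forms the maximizer has the closed form $(a - \eta + w)/(b + 1)$, so `choices` should increase linearly in `w`, in line with the sign restriction in condition (2).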

Another example is the estimation of Engel curves. In this case, the outcome variable $Y$ is the budget share of a good, the endogenous variable $X$ is total expenditure, and the instrument $W$ is gross income. Our monotonicity conditions are plausible in this example because for normal goods such as food-in, the budget share is decreasing in total expenditure, and total expenditure increases with gross income. Finally, consider the estimation of (Marshallian) demand curves. The outcome variable $Y$ is quantity of a consumed good, the endogenous variable $X$ is the price of the good, and $W$ could be some variable that shifts production cost of the good. For a normal good, the Slutsky inequality predicts $g$ to be decreasing in price $X$ as long as income effects are not too large. Furthermore, price is increasing in production cost and, thus, increasing in the instrument $W$, and so our monotonicity conditions are plausible in this example as well.

Both of our monotonicity assumptions are testable. For example, a test of the monotone IV condition can be found in Lee, Linton, and Whang (2009). In this paper, we extend their results by deriving an adaptive test of the monotone IV condition, with the value of the involved smoothness parameter chosen in a data-driven fashion. This adaptation procedure allows us to construct a test with desirable power properties when the degree of smoothness of the conditional distribution of $X$ given $W$ is unknown. Regarding our first monotonicity condition, to the best of our knowledge, there are no procedures in the literature that consistently test monotonicity of the function $g$ in the NPIV model (1). We consider such procedures in a separate project and, in this paper, propose a simple test of monotonicity of $g$ given that the monotone IV condition holds.

Matzkin (1994) advocates the use of shape restrictions in econometrics and argues that economic theory often provides restrictions on functions of interest, such as monotonicity, concavity, and/or Slutsky symmetry. In the context of the NPIV model (1), Freyberger and Horowitz (2013) show that, in the absence of point-identification, shape restrictions may yield informative bounds on functionals of $g$ and develop inference procedures when the regressor $X$ and the instrument $W$ are discrete. Blundell, Horowitz, and Parey (2013) demonstrate via simulations that imposing Slutsky inequalities in a quantile NPIV model for gasoline demand improves finite-sample properties of the NPIV estimator. Grasmair, Scherzer, and Vanhems (2013) study the problem of demand estimation imposing various constraints implied by economic theory, such as Slutsky inequalities, and derive the convergence rate of a constrained NPIV estimator under an abstract projected source condition. Our results differ from theirs in three respects: we focus on non-asymptotic error bounds, with special emphasis on properties of our estimator in the neighborhood of the boundary; we derive our results under easily interpretable, low-level conditions; and we find that our estimator does not suffer from ill-posedness of the problem in a large but slowly shrinking neighborhood of constant functions.

Other related literature.

The NPIV model has received substantial attention in the recent econometrics literature. Newey and Powell (2003), Hall and Horowitz (2005), Blundell, Chen, and Kristensen (2007), and Darolles, Fan, Florens, and Renault (2011) study identification of the NPIV model (1) and propose estimators of the regression function $g$. See Horowitz (2011, 2014) for recent surveys and further references. In the mildly ill-posed case, Hall and Horowitz (2005) derive the minimax risk lower bound in $L^2$-norm and show that their estimator achieves this lower bound. Under different conditions, Chen and Reiß (2011) derive a similar bound for the mildly and the severely ill-posed case and show that the estimator by Blundell, Chen, and Kristensen (2007) achieves this bound. Chen and Christensen (2013) establish minimax risk bounds in the sup-norm, again both for the mildly and the severely ill-posed case. The optimal convergence rates in the severely ill-posed case were shown to be logarithmic, which means that the slow convergence rate of existing estimators is not a deficiency of those estimators but rather an intrinsic feature of the statistical inverse problem.

There is also a large statistics literature on nonparametric estimation of monotone functions when the regressor is exogenous, i.e. $W = X$, so that $g$ is a conditional mean function. This literature can be traced back at least to Brunk (1955). Surveys of this literature and further references can be found in Yatchew (1998), Delecroix and Thomas-Agnan (2000), and Gijbels (2004). For the case in which the regression function is both smooth and monotone, many different ways of imposing monotonicity on the estimator have been studied; see, for example, Mukerjee (1988), Cheng and Lin (1981), Wright (1981), Friedman and Tibshirani (1984), Ramsay (1988), Mammen (1991), Ramsay (1998), Mammen and Thomas-Agnan (1999), Hall and Huang (2001), Mammen, Marron, Turlach, and Wand (2001), and Dette, Neumeyer, and Pilz (2006). Importantly, under the mild assumption that the estimators consistently estimate the derivative of the regression function, the standard unconstrained nonparametric regression estimators are known to be monotone with probability approaching one when the regression function is strictly increasing. Therefore, such estimators have the same rate of convergence as the corresponding constrained estimators that impose monotonicity (Mammen (1991)). As a consequence, gains from imposing a monotonicity constraint can only be expected when the regression function is close to the boundary of the constraint and/or in finite samples. Zhang (2002) and Chatterjee, Guntuboyina, and Sen (2013) formalize this intuition by deriving risk bounds of the isotonic (monotone) regression estimators and showing that these bounds imply fast convergence rates when the regression function has flat parts. Our results are different from theirs because we focus on the endogenous case with $W \neq X$ and study the impact of monotonicity constraints on the ill-posedness property of the NPIV model, which is absent in the standard regression problem.

Notation.

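For a differentiable function $f: \mathbb{R} \to \mathbb{R}$, we use $Df(x)$ to denote its derivative. When a function $f$ has several arguments, we use $D$ with an index to denote the derivative of $f$ with respect to the corresponding argument; for example, $D_w f(w, u)$ denotes the partial derivative of $f$ with respect to $w$. For random variables $A$ and $B$, we denote by $f_{A,B}(a, b)$, $f_{A|B}(a, b)$, and $f_A(a)$ the joint, conditional and marginal densities of $(A, B)$, $A$ given $B$, and $A$, respectively. Similarly, we let $F_{A,B}(a, b)$, $F_{A|B}(a, b)$, and $F_A(a)$ refer to the corresponding cumulative distribution functions. For an operator $T: L^2[0,1] \to L^2[0,1]$, we let $\|T\|_2$ denote the operator norm defined as

$$\|T\|_2 = \sup_{h \in L^2[0,1]:\ \|h\|_2 = 1} \|Th\|_2.$$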

Finally, by increasing and decreasing we mean that a function is non-decreasing and non-increasing, respectively.

Outline.

The remainder of the paper is organized as follows. In the next section, we analyze ill-posedness of the model (1) under our monotonicity conditions and derive a useful bound on a restricted measure of ill-posedness for the model (1). Section 3 discusses the implications of our monotonicity assumptions for estimation of the regression function $g$. In particular, we show that the rate of convergence of our estimator is never worse than that of unconstrained estimators but may be much faster in a large, but slowly shrinking, neighborhood of constant functions. Section 4 shows that our monotonicity conditions have non-trivial identification power. Section 5 provides new tests of our two monotonicity assumptions. In Section 6, we present results of a Monte Carlo simulation study that demonstrates large gains in performance of the constrained estimator relative to the unconstrained one. Finally, Section 7 applies the constrained estimator to the problem of estimating gasoline demand functions. All proofs are collected in the appendix.

2 Boundedness of the Measure of Ill-posedness under Monotonicity

In this section, we discuss the sense in which the ill-posedness of the NPIV model (1) is weakened by imposing our monotonicity conditions. In particular, we introduce a restricted measure of ill-posedness for this model (see equation (9)) and show that, in stark contrast to the existing literature, our measure is bounded (Corollary 1) when the monotone IV condition holds.

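The NPIV model requires solving the equation $E[Y \mid W] = E[g(X) \mid W]$ for the function $g$. Letting $T: L^2[0,1] \to L^2[0,1]$ be the linear operator defined by $(Th)(w) := E[h(X) \mid W = w] f_W(w)$ and denoting $m(w) := E[Y \mid W = w] f_W(w)$, we can express this equation as

$$Tg = m. \tag{3}$$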

In finite-dimensional regressions, the operator $T$ corresponds to a finite-dimensional matrix whose singular values are typically assumed to be nonzero (rank condition). Therefore, the solution $g$ is continuous in $m$, and consistent estimation of $m$ at a fast convergence rate leads to consistent estimation of $g$ at the same fast convergence rate. In infinite-dimensional models, however, $T$ is an operator that, under weak conditions, possesses infinitely many singular values that tend to zero. Therefore, small perturbations in $m$ may lead to large perturbations in $g$. This discontinuity renders equation (3) ill-posed and introduces challenges in estimation of the NPIV model (1) that are present neither in parametric regressions nor in nonparametric regressions with exogenous regressors; see Horowitz (2011, 2014) for a more detailed discussion.
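To make this discontinuity concrete, the following sketch discretizes a conditional expectation operator on a grid and applies it to increasingly oscillatory functions. It is an illustration only, under assumptions not taken from this section: the pair of variables is given a Gaussian copula with correlation 0.5 (so both margins are uniform on the unit interval), the grid has 200 midpoints, and the test functions are cosines. All names (`copula_density`, `apply_T`, `ratio`) are hypothetical.

```python
import math
from statistics import NormalDist

N = 200                                   # number of grid cells (illustrative)
RHO = 0.5                                 # copula correlation (illustrative)
GRID = [(i + 0.5) / N for i in range(N)]  # midpoints of [0, 1]
SCORES = [NormalDist().inv_cdf(u) for u in GRID]

def copula_density(a, b, rho=RHO):
    """Gaussian copula density evaluated at normal scores (a, b)."""
    s = 1.0 - rho**2
    return math.exp(-(rho**2 * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * s)) / math.sqrt(s)

# KERNEL[i][j] approximates the joint density at (x_j, w_i).
KERNEL = [[copula_density(a, b) for a in SCORES] for b in SCORES]

def apply_T(h_vals):
    """Midpoint-rule approximation of w -> E[h(X) | W = w]."""
    return [sum(k * h for k, h in zip(row, h_vals)) / N for row in KERNEL]

def l2_norm(vals):
    return math.sqrt(sum(v * v for v in vals) / N)

def ratio(k):
    """Norm of h_k(x) = cos(k*pi*x) divided by the norm of its image."""
    h = [math.cos(k * math.pi * x) for x in GRID]
    return l2_norm(h) / l2_norm(apply_T(h))

ratios = {k: ratio(k) for k in (1, 4, 8)}
```

Because the operator smooths its input, the image of a rapidly oscillating function is much smaller in norm than the function itself, so the ratio grows with the frequency `k`. A small error in the image can therefore correspond to a large error in the function being recovered.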

In this section, we show that, under our monotonicity conditions, there exists a finite constant $\bar{C}$ such that for any monotone function $g'$ and any constant function $g''$, with $m' = Tg'$ and $m'' = Tg''$, we have

$$\|g' - g''\|_{2,t} \le \bar{C} \, \|m' - m''\|_2,$$

where $\|\cdot\|_{2,t}$ is a truncated $L^2$-norm defined below. This result plays a central role in our derivation of the upper bound on the restricted measure of ill-posedness, of identification bounds, and of fast convergence rates of a constrained NPIV estimator that imposes monotonicity of $g$ in a large but slowly shrinking neighborhood of constant functions.

We now introduce our assumptions. Let $0 \le x_1 < \tilde{x}_1 < \tilde{x}_2 < x_2 \le 1$ and $0 \le w_1 < w_2 \le 1$ be some constants. We implicitly assume that $x_1$, $\tilde{x}_1$, and $w_1$ are close to $0$ whereas $x_2$, $\tilde{x}_2$, and $w_2$ are close to $1$. Our first assumption is the monotone IV condition that requires a monotone relationship between the endogenous regressor $X$ and the instrument $W$.

Assumption 1 (Monotone IV).

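For all $x, w', w'' \in (0,1)$,

$$w' \le w'' \quad \Rightarrow \quad F_{X|W}(x \mid w') \ge F_{X|W}(x \mid w''). \tag{4}$$

Furthermore, there exists a constant $C_F > 1$ such that

$$F_{X|W}(x \mid w_1) \ge C_F \, F_{X|W}(x \mid w_2), \quad \forall x \in (0, x_2), \tag{5}$$

and

$$C_F \bigl( 1 - F_{X|W}(x \mid w_1) \bigr) \le 1 - F_{X|W}(x \mid w_2), \quad \forall x \in (x_1, 1). \tag{6}$$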

Assumption 1 is crucial for our analysis. The first part, condition (4), requires first-order stochastic dominance of the conditional distribution of the endogenous regressor $X$ given the instrument $W$ as we increase the value of the instrument $W$. Condition (4) is testable; see, for example, Lee, Linton, and Whang (2009). In Section 5 below, we extend the results of Lee, Linton, and Whang (2009) by providing an adaptive test of the first-order stochastic dominance condition (4).

The second and third parts of Assumption 1, conditions (5) and (6), strengthen the stochastic dominance condition (4) in the sense that the conditional distribution is required to “shift to the right” by a strictly positive amount at least between two values of the instrument, $w_1$ and $w_2$, so that the instrument is not redundant. Conditions (5) and (6) are rather weak as they require such a shift only in some intervals $(0, x_2)$ and $(x_1, 1)$, respectively.

Condition (4) can be equivalently stated in terms of monotonicity with respect to the instrument $W$ of the reduced form first stage function. Indeed, by the Skorohod representation, it is always possible to construct a random variable $U$ distributed uniformly on $[0,1]$ such that $U$ is independent of $W$, and the equation $X = r(W, U)$ holds for the reduced form first stage function $r(w, u) := F^{-1}_{X|W}(u \mid w) := \inf\{x : F_{X|W}(x \mid w) \ge u\}$. Therefore, condition (4) is equivalent to the assumption that the function $w \mapsto r(w, u)$ is increasing for all $u \in [0,1]$. Notice, however, that our condition (4) allows for general unobserved heterogeneity of dimension larger than one, for instance as in Example 2 below.

Condition (4) is related to a corresponding condition in Kasy (2014), who assumes that the (structural) first stage has the form $X = \tilde{r}(W, \tilde{U})$ where $\tilde{U}$, representing (potentially multidimensional) unobserved heterogeneity, is independent of $W$, and the function $w \mapsto \tilde{r}(w, \tilde{u})$ is increasing for all values $\tilde{u}$. Kasy employs his condition for identification of (nonseparable) triangular systems with multidimensional unobserved heterogeneity whereas we use our condition (4) to derive a useful bound on the restricted measure of ill-posedness and to obtain a fast rate of convergence of a monotone NPIV estimator of $g$ in the (separable) model (1). Condition (4) is not related to the monotone IV assumption in the influential work by Manski and Pepper (2000), which requires the function $w \mapsto E[\varepsilon \mid W = w]$ to be increasing. Instead, we maintain the mean independence condition $E[\varepsilon \mid W] = 0$.

Assumption 2 (Density).

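(i) The joint distribution of the pair $(X, W)$ is absolutely continuous with respect to the Lebesgue measure on $[0,1]^2$ with the density $f_{X,W}(x, w)$ satisfying $\int_0^1 \int_0^1 f_{X,W}(x, w)^2 \, dx \, dw \le C_T$ for some finite constant $C_T$. (ii) There exists a constant $c_f > 0$ such that $f_{X|W}(x \mid w) \ge c_f$ for all $x \in [x_1, x_2]$ and $w \in \{w_1, w_2\}$. (iii) There exist constants $0 < c_W \le C_W < \infty$ such that $c_W \le f_W(w) \le C_W$ for all $w \in [0,1]$.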

This is a mild regularity assumption. The first part of the assumption implies that the operator $T$ is compact. The second and the third parts of the assumption require the conditional distribution of $X$ given $W = w_1$ or $W = w_2$ and the marginal distribution of $W$ to be bounded away from zero over some intervals. Recall that we have $0 \le x_1 < x_2 \le 1$ and $0 \le w_1 < w_2 \le 1$. We could simply set $[x_1, x_2] = [w_1, w_2] = [0,1]$ in the second part of the assumption but having $0 < x_1 < x_2 < 1$ and $0 < w_1 < w_2 < 1$ is required to allow for densities such as the normal, which, even after a transformation to the interval $[0,1]$, may not yield a conditional density bounded away from zero; see Example 1 below. Therefore, we allow for the general case $0 \le x_1 < x_2 \le 1$ and $0 \le w_1 < w_2 \le 1$. The restriction $f_W(w) \le C_W$ for all $w \in [0,1]$ imposed in Assumption 2 is not actually required for the results in this section, but rather for those of Section 3.

We now provide two examples of distributions of $(X, W)$ that satisfy Assumptions 1 and 2, and show two possible ways in which the instrument $W$ can shift the conditional distribution of $X$ given $W$. Figure 1 displays the corresponding conditional distributions.

Example 1 (Normal density).

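Let $(\tilde{X}, \tilde{W})$ be jointly normal with mean zero, variance one, and correlation $0 < \rho < 1$. Let $\Phi(u)$ denote the distribution function of a $N(0,1)$ random variable. Define $X = \Phi(\tilde{X})$ and $W = \Phi(\tilde{W})$. Since $\tilde{X} = \rho \tilde{W} + (1 - \rho^2)^{1/2} U$ for some standard normal random variable $U$ that is independent of $\tilde{W}$, we have

$$X = \Phi\bigl( \rho \, \Phi^{-1}(W) + (1 - \rho^2)^{1/2} U \bigr),$$

where $U$ is independent of $W$. Therefore, the pair $(X, W)$ satisfies condition (4) of our monotone IV Assumption 1. Lemma 7 in the appendix verifies that the remaining conditions of Assumption 1 as well as Assumption 2 are also satisfied.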
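The stochastic dominance property in this example can be verified numerically. The sketch below is a minimal illustration, assuming the transformed-normal design with a correlation parameter written here as `rho`. Under that design the conditional distribution function has the closed form $\Phi\bigl((\Phi^{-1}(x) - \rho\,\Phi^{-1}(w)) / (1 - \rho^2)^{1/2}\bigr)$, which follows by solving the representation of the regressor for the independent normal shock. The function names and the evaluation grid are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def cond_cdf(x, w, rho=0.5):
    """Conditional cdf of X given W = w in the transformed-normal design."""
    z = (STD_NORMAL.inv_cdf(x) - rho * STD_NORMAL.inv_cdf(w)) / (1.0 - rho**2) ** 0.5
    return STD_NORMAL.cdf(z)

# Interior evaluation points 0.05, 0.10, ..., 0.95.
GRID = [i / 20 for i in range(1, 20)]

def dominance_holds(rho=0.5):
    """True if the conditional cdf is non-increasing in w at every grid value of x."""
    for x in GRID:
        values = [cond_cdf(x, w, rho) for w in GRID]
        if any(later > earlier + 1e-12 for earlier, later in zip(values, values[1:])):
            return False
    return True
```

With a positive correlation the check passes, since raising the instrument lowers the conditional distribution function at every point. Calling `dominance_holds(-0.5)` shows the ordering reversing when the correlation is negative.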

Example 2 (Two-dimensional unobserved heterogeneity).

Let $X = U_1 + U_2 W$, where $U_1, U_2, W$ are mutually independent, $U_1, U_2 \sim U[0, 1/2]$ and $W \sim U[0,1]$. Since $U_2$ is positive, it is straightforward to see that the stochastic dominance condition (4) is satisfied. Lemma 8 in the appendix shows that the remaining conditions of Assumption 1 as well as Assumption 2 are also satisfied.

Figure 1 shows that, in Example 1, the conditional distribution of $X$ at two different values of the instrument is shifted to the right at every value of $X$, whereas, in Example 2, the conditional support of $X$ given $W = w$ changes with $w$, but the positive shift in the cdf of $X \mid W = w$ occurs only for values of $X$ in a subinterval of $[0,1]$.

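Before stating our results in this section, we introduce some additional notation. Define the truncated $L^2$-norm $\|\cdot\|_{2,t}$ by

$$\|h\|_{2,t} := \left( \int_{\tilde{x}_1}^{\tilde{x}_2} h(x)^2 \, dx \right)^{1/2}, \qquad h \in L^2[0,1].$$

Also, let $\mathcal{M}$ denote the set of all monotone functions in $L^2[0,1]$. Finally, define $\zeta := (c_f, c_W, C_F, C_T, w_1, w_2, x_1, x_2, \tilde{x}_1, \tilde{x}_2)$. Below is our first main result in this section.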

Theorem 1 (Lower Bound on $T$).

Let Assumptions 1 and 2 be satisfied. Then there exists a finite constant $\bar{C}$ depending only on $\zeta$ such that

$$\|h\|_{2,t} \le \bar{C} \, \|Th\|_2 \tag{7}$$

for any function $h \in \mathcal{M}$.

To prove this theorem, we take a function $h \in \mathcal{M}$ with $\|h\|_{2,t} = 1$ and show that $\|Th\|_2$ is bounded away from zero. A key observation that allows us to establish this bound is that, under monotone IV Assumption 1, the function $w \mapsto E[h(X) \mid W = w]$ is monotone whenever $h$ is. Together with non-redundancy of the instrument $W$ implied by conditions (5) and (6) of Assumption 1, this allows us to show that $E[h(X) \mid W = w_1]$ and $E[h(X) \mid W = w_2]$ cannot both be close to zero, so that $\|E[h(X) \mid W = \cdot]\|_2$ is bounded from below by a strictly positive constant from the values of $E[h(X) \mid W = w]$ in the neighborhood of either $w_1$ or $w_2$. By Assumption 2, $\|Th\|_2$ must then also be bounded away from zero.
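The contrast that Theorem 1 describes can be seen in a small numerical experiment. The sketch below is an illustration under assumptions of our own rather than a check of the theorem: the regressor and instrument follow a Gaussian copula with correlation 0.5, the operator is approximated by the midpoint rule on 200 cells, and the truncation points 0.05 and 0.95 are hypothetical. It compares the ratio of the truncated norm of a function to the norm of its conditional mean for a few monotone functions and for one oscillating function.

```python
import math
from statistics import NormalDist

N = 200                      # number of grid cells (illustrative)
RHO = 0.5                    # copula correlation (illustrative)
LO, HI = 0.05, 0.95          # hypothetical truncation points for the numerator norm
GRID = [(i + 0.5) / N for i in range(N)]
SCORES = [NormalDist().inv_cdf(u) for u in GRID]

def copula_density(a, b):
    """Gaussian copula density evaluated at normal scores (a, b)."""
    s = 1.0 - RHO**2
    return math.exp(-(RHO**2 * (a * a + b * b) - 2.0 * RHO * a * b) / (2.0 * s)) / math.sqrt(s)

KERNEL = [[copula_density(a, b) for a in SCORES] for b in SCORES]

def cond_mean(h_vals):
    """Midpoint-rule approximation of w -> E[h(X) | W = w]."""
    return [sum(k * h for k, h in zip(row, h_vals)) / N for row in KERNEL]

def l2_norm(vals):
    return math.sqrt(sum(v * v for v in vals) / N)

def truncated_l2_norm(vals):
    """L2 norm restricted to grid points inside [LO, HI]."""
    return math.sqrt(sum(v * v for v, x in zip(vals, GRID) if LO <= x <= HI) / N)

def ratio(h):
    vals = [h(x) for x in GRID]
    return truncated_l2_norm(vals) / l2_norm(cond_mean(vals))

monotone = {
    "linear": lambda x: x - 0.5,
    "step": lambda x: 1.0 if x > 0.5 else -1.0,
    "sigmoid": lambda x: math.tanh(10.0 * (x - 0.5)),
}
monotone_ratios = {name: ratio(h) for name, h in monotone.items()}
oscillating_ratio = ratio(lambda x: math.cos(8.0 * math.pi * x))
```

In this design the ratios for the monotone functions stay moderate, while the oscillating function produces a markedly larger ratio. This matches the intuition that the ill-posedness is driven by non-monotone oscillations.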

Theorem 1 has an important consequence. Indeed, consider the linear equation (3). By Assumption 2(i), the operator $T$ is compact, and so

$$\frac{\|h_k\|_2}{\|T h_k\|_2} \to \infty \ \text{ as } k \to \infty \ \text{ for some sequence } \{h_k, k \ge 1\} \subset L^2[0,1]. \tag{8}$$

Property (8) means that $\|Th\|_2$ being small does not necessarily imply that $\|h\|_2$ is small and, therefore, the inverse of the operator $T$, when it exists, cannot be continuous. Therefore, (3) is ill-posed in Hadamard’s sense (see footnote 4), if no other conditions are imposed. This is the main reason why standard NPIV estimators have a (potentially very) slow rate of convergence. Theorem 1, on the other hand, implies that, under Assumptions 1 and 2, (8) is not possible if $h_k$ belongs to the set $\mathcal{M}$ of monotone functions in $L^2[0,1]$ for all $k \ge 1$ and we replace the $L^2$-norm $\|\cdot\|_2$ in the numerator of the left-hand side of (8) by the truncated $L^2$-norm $\|\cdot\|_{2,t}$, indicating that shape restrictions may be helpful to improve statistical properties of the NPIV estimators. Also, in Remark 1, we show that replacing the norm in the numerator is not a significant modification in the sense that for most ill-posed problems, and in particular for all severely ill-posed problems, (8) holds even if we replace the $L^2$-norm $\|\cdot\|_2$ in the numerator of the left-hand side of (8) by the truncated $L^2$-norm $\|\cdot\|_{2,t}$.

Footnote 4: Well- and ill-posedness in Hadamard’s sense are defined as follows. Let $A: D \to R$ be a continuous mapping between metric spaces $(D, \rho_D)$ and $(R, \rho_R)$. Then, for $d \in D$ and $r \in R$, the equation $Ad = r$ is called “well-posed” on $D$ in Hadamard’s sense (see Hadamard (1923)) if (i) $A$ is bijective and (ii) $A^{-1}: R \to D$ is continuous, so that for each $r \in R$ there exists a unique $d = A^{-1} r \in D$ satisfying $Ad = r$, and, moreover, the solution $d = A^{-1} r$ is continuous in “the data” $r$. Otherwise, the equation is called “ill-posed” in Hadamard’s sense.

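Next, we derive an implication of Theorem 1 for the (quantitative) measure of ill-posedness of the model (1). We first define the restricted measure of ill-posedness. For $a \in \mathbb{R}$, let

$$\mathcal{H}(a) := \left\{ h \in L^2[0,1] : \inf_{0 \le x' < x'' \le 1} \frac{h(x'') - h(x')}{x'' - x'} \ge -a \right\}$$

be the space containing all functions in $L^2[0,1]$ with lower derivative bounded from below by $-a$ uniformly over the interval $[0,1]$. Note that $\mathcal{H}(a') \subset \mathcal{H}(a'')$ whenever $a' \le a''$ and that $\mathcal{H}(0)$ is the set of increasing functions in $L^2[0,1]$. For continuously differentiable functions, $h \in L^2[0,1]$ belongs to $\mathcal{H}(a)$ if and only if $\inf_{x \in [0,1]} Dh(x) \ge -a$. Further, define the restricted measure of ill-posedness:

$$\tau(a) := \sup_{h \in \mathcal{H}(a):\ \|h\|_{2,t} = 1} \frac{\|h\|_{2,t}}{\|Th\|_2}. \tag{9}$$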
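Membership in the function class with lower derivative bounded from below, introduced above, is easy to check on sampled function values. The sketch below is a minimal illustration in which the bound is written as `-a`, which is the sign convention under which the class with `a = 0` contains exactly the increasing functions. The names `min_slope` and `in_class` and the example function are hypothetical.

```python
def min_slope(xs, hs):
    """Smallest slope between adjacent sample points.

    Any secant slope between two sample points is a weighted average of the
    adjacent slopes in between, so this equals the minimum over all pairs
    of sample points.
    """
    return min((h1 - h0) / (x1 - x0)
               for x0, x1, h0, h1 in zip(xs, xs[1:], hs, hs[1:]))

def in_class(xs, hs, a):
    """True if the sampled function has all difference quotients >= -a."""
    return min_slope(xs, hs) >= -a

xs = [i / 100 for i in range(101)]
hs = [x * x - 0.3 * x for x in xs]   # derivative 2x - 0.3, smallest value -0.3 at x = 0
```

For this example, `in_class(xs, hs, 0.3)` holds, while `in_class(xs, hs, 0.0)` fails because the function decreases near the origin. An increasing function such as the identity passes with `a = 0`.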

As we discussed above, under our Assumptions 1 and 2, $\tau(\infty) = \infty$ if we use the $L^2$-norm instead of the truncated $L^2$-norm in the numerator in (9). We show in Remark 1 below that $\tau(\infty) = \infty$ for many ill-posed and, in particular, for all severely ill-posed problems even with the truncated $L^2$-norm as defined in (9). However, Theorem 1 implies that $\tau(0)$ is bounded from above by $\bar{C}$ and, by definition, $\tau(a)$ is increasing in $a$, i.e. $\tau(a') \le \tau(a'')$ for $a' \le a''$. It turns out that $\tau(a)$ is bounded from above even for some positive values of $a$:

Corollary 1 (Bound for the Restricted Measure of Ill-Posedness).

Let Assumptions 1 and 2 be satisfied. Then there exist constants $c_\tau > 0$ and $0 < C_\tau < \infty$ depending only on $\zeta$ such that

$$\tau(a) \le C_\tau \tag{10}$$

for all $a \le c_\tau$.

This is our second main result in this section. It is exactly this corollary of Theorem 1 that allows us to obtain a fast convergence rate of our constrained NPIV estimator not only when the regression function $g$ is constant but, more generally, when $g$ belongs to a large but slowly shrinking neighborhood of constant functions.

Remark 1 (Ill-posedness is preserved by norm truncation).

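Under Assumptions 1 and 2, the integral operator $T$ satisfies (8). Here we demonstrate that, in many cases, and in particular in all severely ill-posed cases, (8) continues to hold if we replace the $L^2$-norm $\|\cdot\|_2$ by the truncated $L^2$-norm $\|\cdot\|_{2,t}$ in the numerator of the left-hand side of (8), that is, there exists a sequence $\{l_k, k \ge 1\}$ in $L^2[0,1]$ such that

$$\frac{\|l_k\|_{2,t}}{\|T l_k\|_2} \to \infty \ \text{ as } k \to \infty. \tag{11}$$

Indeed, under Assumptions 1 and 2, $T$ is compact, and so the spectral theorem implies that there exists a spectral decomposition of operator $T$, $\{(h_j, \sigma_j), j \ge 1\}$, where $\{h_j, j \ge 1\}$ is an orthonormal basis of $L^2[0,1]$ and $\{\sigma_j, j \ge 1\}$ is a decreasing sequence of positive numbers such that $\sigma_j \to 0$ as $j \to \infty$, and $\|T h_j\|_2 = \sigma_j \|h_j\|_2 = \sigma_j$. Also, Lemma 6 in the appendix shows that if $\{h_j, j \ge 1\}$ is an orthonormal basis in $L^2[0,1]$, then for any $\alpha > 0$, $\|h_j\|_{2,t} > j^{-1/2-\alpha}$ for infinitely many $j$, and so there exists a subsequence $\{h_{j_k}, k \ge 1\}$ such that $\|h_{j_k}\|_{2,t} > j_k^{-1/2-\alpha}$. Therefore, under a weak condition that $j^{1/2+\alpha} \sigma_j \to 0$ as $j \to \infty$, using $\|h_{j_k}\|_2 = 1$ for all $k \ge 1$, we conclude that for the subsequence $l_k = h_{j_k}$,

$$\frac{\|l_k\|_{2,t}}{\|T l_k\|_2} \ge \frac{1}{j_k^{1/2+\alpha} \sigma_{j_k}} \to \infty \ \text{ as } k \to \infty,$$

leading to (11). Note also that the condition that $j^{1/2+\alpha} \sigma_j \to 0$ as $j \to \infty$ necessarily holds if there exists a constant $c > 0$ such that $\sigma_j \le e^{-cj}$ for all large $j$, that is, if the problem is severely ill-posed. Thus, under our Assumptions 1 and 2, the restriction in Theorem 1 that $h$ belongs to the space $\mathcal{M}$ of monotone functions in $L^2[0,1]$ plays a crucial role for the result (7) to hold. On the other hand, whether the result (7) can be obtained for all $h \in \mathcal{M}$ without imposing our monotone IV Assumption 1 appears to be an open (and interesting) question.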

Remark 2 (Severe ill-posedness is preserved by norm truncation).

One might wonder whether our monotone IV Assumption 1 excludes all severely ill-posed problems, and whether the norm truncation significantly changes these problems. Here we show that there do exist severely ill-posed problems that satisfy our monotone IV Assumption 1, and also that severely ill-posed problems remain severely ill-posed even if we replace the $L^2$-norm $\|\cdot\|_2$ by the truncated $L^2$-norm $\|\cdot\|_{2,t}$. Indeed, consider Example 1 above. Because, in this example, the pair $(X, W)$ is a transformation of the normal distribution, it is well known that the integral operator $T$ in this example has singular values decreasing exponentially fast. More specifically, the spectral decomposition $\{(h_j, \sigma_j), j \ge 1\}$ of the operator $T$ satisfies $\sigma_j = \rho^j$ for all $j$ and some $\rho < 1$. Hence,

$$\frac{\|h_j\|_2}{\|T h_j\|_2} = \left( \frac{1}{\rho} \right)^j.$$

Since $(1/\rho)^j \to \infty$ as $j \to \infty$ exponentially fast, this example leads to a severely ill-posed problem. Moreover, by Lemma 6, for any $\alpha > 0$ and $\rho' \in (\rho, 1)$,

$$\frac{\|h_j\|_{2,t}}{\|T h_j\|_2} > \frac{1}{j^{1/2+\alpha}} \left( \frac{1}{\rho} \right)^j \ge \left( \frac{1}{\rho'} \right)^j$$

for infinitely many $j$. Thus, replacing the $L^2$-norm $\|\cdot\|_2$ by the truncated $L^2$-norm $\|\cdot\|_{2,t}$ preserves the severe ill-posedness of the problem. However, it follows from Theorem 1 that uniformly over all $h \in \mathcal{M}$, $\|h\|_{2,t} / \|Th\|_2 \le \bar{C}$. Therefore, in this example, as well as in all other severely ill-posed problems satisfying Assumptions 1 and 2, imposing monotonicity on the function $h$ significantly changes the properties of the ratio $\|h\|_{2,t} / \|Th\|_2$.

Remark 3 (Monotone IV Assumption does not imply control function approach).

Our monotone IV Assumption 1 does not imply the applicability of a control function approach to estimation of the function $g$. Consider Example 2 above. In this example, the relationship between $X$ and $W$ has a two-dimensional vector of unobserved heterogeneity. Therefore, by Proposition 4 of Kasy (2011), there does not exist any control function $C$ such that (i) $C$ is invertible in its second argument, and (ii) $X$ is independent of $\varepsilon$ conditional on $C(X, W)$. As a consequence, our monotone IV Assumption 1 does not imply any of the existing control function conditions such as those in Newey, Powell, and Vella (1999) and Imbens and Newey (2009), for example. [Footnote 5: It is easy to show that the existence of a control function does not imply our monotone IV condition either, so our approach and the control function approach rely on conditions that are non-nested.] Since multidimensional unobserved heterogeneity is common in economic applications (see Imbens (2007) and Kasy (2014)), we view our approach to avoiding ill-posedness as complementary to the control function approach.

Remark 4 (On the role of norm truncation).

Let us also briefly comment on the role of the truncated norm $\|\cdot\|_{2,t}$ in (7). There are two reasons why we need the truncated $L^2$-norm $\|\cdot\|_{2,t}$ rather than the usual $L^2$-norm $\|\cdot\|_2$. First, Lemma 2 in the appendix shows that, under Assumptions 1 and 2, there exists a constant $0 < C_2 < \infty$ such that

$$\|h\|_1 \leq C_2 \|Th\|_1$$

for any increasing and continuously differentiable function $h \in L^1[0,1]$. This result does not require any truncation of the norms and implies boundedness of a measure of ill-posedness defined in terms of $L^1[0,1]$-norms: $\sup_{h \in L^1[0,1],\, h \text{ increasing}} \|h\|_1/\|Th\|_1$. To extend this result to $L^2[0,1]$-norms we need to introduce a positive, but arbitrarily small, amount of truncation at the boundaries, so that we have a control $\|h\|_{2,t} \leq C\|h\|_1$ for some constant $C$ and all monotone functions $h \in \mathcal{M}$. Second, we want to allow for the normal density as in Example 1, which violates condition (ii) of Assumption 2 if we set $[x_1, x_2] = [0,1]$.

Remark 5 (Bounds on the measure of ill-posedness via compactness).

Another approach to obtain a result like (7) would be to employ compactness arguments. For example, let be some (potentially large) constant and consider the class of functions consisting of all functions in such that . It is well known that the set is compact under the -norm , and so, as long as is invertible, there exists some such that for all since (i) is continuous and (ii) any continuous function achieves its minimum on a compact set. This bound does not require the monotone IV assumption and also does not require replacing the -norm by the truncated -norm. Further, defining for all and using the same arguments as those in the proof of Corollary 1, one can show that there exist some finite constants such that for all . This (seemingly interesting) result, however, is not useful for bounding the estimation error of an estimator of because, as the proof of Theorem 2 in the next section reveals, obtaining meaningful bounds would require a result of the form for all for some sequence such that , even if we know that and we impose this constraint on the estimator of . In contrast, our arguments in Theorem 1, being fundamentally different, do lead to meaningful bounds on the estimation error of the constrained estimator of .

3 Non-asymptotic Risk Bounds Under Monotonicity

The rate at which unconstrained NPIV estimators converge to $g$ depends crucially on the so-called sieve measure of ill-posedness, which, unlike $\tau(a)$, does not measure ill-posedness over the space $\mathcal{H}(a)$, but rather over the space $\mathcal{H}_n(\infty)$, a finite-dimensional (sieve) approximation to $\mathcal{H}(\infty)$. In particular, the faster the sieve measure of ill-posedness grows with the dimensionality of the sieve space $\mathcal{H}_n(\infty)$, the slower the convergence rate. The convergence rates can be as slow as logarithmic in the severely ill-posed case. Since by Corollary 1, our monotonicity assumptions imply boundedness of $\tau(a)$ for some range of finite values $a$, we expect these assumptions to translate into favorable performance of a constrained estimator that imposes monotonicity of $g$. This intuition is confirmed by the novel non-asymptotic error bounds we derive in this section (Theorem 2).

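Let $(Y_i, X_i, W_i)$, $i = 1, \dots, n$, be an i.i.d. sample from the distribution of $(Y, X, W)$. To define our estimator, we first introduce some notation. Let $\{p_k(x), k \geq 1\}$ and $\{q_k(w), k \geq 1\}$ be two orthonormal bases in $L^2[0,1]$. For $K = K_n \geq 1$ and $J = J_n \geq K_n$, denote

$$p(x) := (p_1(x), \dots, p_K(x))' \quad \text{and} \quad q(w) := (q_1(w), \dots, q_J(w))'.$$

Let $\mathbf{P} := (p(X_1), \dots, p(X_n))'$ and $\mathbf{Q} := (q(W_1), \dots, q(W_n))'$. Similarly, stack all observations on $Y$ in $\mathbf{Y} := (Y_1, \dots, Y_n)'$. Let $\mathcal{H}_n(a)$ be a sequence of finite-dimensional spaces defined by

$$\mathcal{H}_n(a) := \Big\{ h \in \mathcal{H}(a) : \exists\, b_1, \dots, b_{K_n} \in \mathbb{R} \text{ with } h = \sum_{j=1}^{K_n} b_j p_j \Big\},$$

which become dense in $\mathcal{H}(a)$ as $n \to \infty$. Throughout the paper, we assume that $\|g\|_2 < C_b$ where $C_b$ is a large but finite constant known by the researcher. We define two estimators of $g$: the unconstrained estimator $\hat{g}^u(x) := p(x)'\hat{\beta}^u$ with

$$\hat{\beta}^u := \operatorname*{argmin}_{b \in \mathbb{R}^K :\, \|b\| \leq C_b} (\mathbf{Y} - \mathbf{P}b)'\mathbf{Q}(\mathbf{Q}'\mathbf{Q})^{-1}\mathbf{Q}'(\mathbf{Y} - \mathbf{P}b), \tag{12}$$

which is similar to the estimator defined in Horowitz (2012) and a special case of the estimator considered in Blundell, Chen, and Kristensen (2007), and the constrained estimator $\hat{g}^c(x) := p(x)'\hat{\beta}^c$ with

$$\hat{\beta}^c := \operatorname*{argmin}_{b \in \mathbb{R}^K :\, p(\cdot)'b \in \mathcal{H}_n(0),\, \|b\| \leq C_b} (\mathbf{Y} - \mathbf{P}b)'\mathbf{Q}(\mathbf{Q}'\mathbf{Q})^{-1}\mathbf{Q}'(\mathbf{Y} - \mathbf{P}b), \tag{13}$$

which imposes the monotonicity of $\hat{g}^c$ through the constraint $p(\cdot)'b \in \mathcal{H}_n(0)$.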
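
The two estimators just defined can be sketched numerically. The block below is a minimal illustration under stated assumptions, not the authors' implementation: it uses shifted Legendre polynomials as one possible orthonormal basis of $L^2[0,1]$, replaces the monotonicity constraint by monotonicity on a finite grid, solves both problems with SciPy's SLSQP routine, and runs on a simulated design. The names `basis` and `npiv`, the values of `K`, `J`, `C_b`, the grid size, and the data-generating process are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import ndtr  # standard normal CDF


def basis(u, k):
    """Shifted Legendre polynomials, orthonormal in L^2[0,1]; returns an (n, k) matrix."""
    v = np.polynomial.legendre.legvander(2.0 * np.asarray(u, dtype=float) - 1.0, k - 1)
    return v * np.sqrt(2.0 * np.arange(k) + 1.0)


def npiv(y, x, w, K=4, J=5, C_b=10.0, monotone=False, grid_size=50):
    """Minimize (Y - Pb)' Q (Q'Q)^{-1} Q' (Y - Pb) / n subject to ||b|| <= C_b.

    With monotone=True, additionally require p(.)'b to be nondecreasing on an
    equally spaced grid (an assumed, discretized stand-in for the constraint).
    Returns the coefficient vector and the attained objective value.
    """
    n = len(y)
    P, Q = basis(x, K), basis(w, J)
    QtP, Qty, QtQ = Q.T @ P, Q.T @ y, Q.T @ Q

    def objective(b):
        r = Qty - QtP @ b
        return r @ np.linalg.solve(QtQ, r) / n

    def gradient(b):
        r = Qty - QtP @ b
        return -2.0 * QtP.T @ np.linalg.solve(QtQ, r) / n

    # Norm bound ||b|| <= C_b, written as C_b^2 - b'b >= 0.
    constraints = [{"type": "ineq", "fun": lambda b: C_b**2 - b @ b,
                    "jac": lambda b: -2.0 * b}]
    if monotone:
        Pg = basis(np.linspace(0.0, 1.0, grid_size), K)
        D = Pg[1:] - Pg[:-1]  # D @ b >= 0 means fitted values increase along the grid
        constraints.append({"type": "ineq", "fun": lambda b: D @ b,
                            "jac": lambda b: D})
    res = minimize(objective, np.zeros(K), jac=gradient, method="SLSQP",
                   constraints=constraints, options={"ftol": 1e-12, "maxiter": 500})
    return res.x, objective(res.x)


# Simulated design (hypothetical): X and W are transformed normals, X is endogenous.
rng = np.random.default_rng(0)
n = 1000
nu, eta1, eta2 = rng.standard_normal((3, n))
zeta = 0.5 * nu + np.sqrt(1 - 0.5**2) * eta1            # X increases stochastically in W
x, w = ndtr(zeta), ndtr(nu)
eps = 0.3 * (0.5 * eta1 + np.sqrt(1 - 0.5**2) * eta2)   # correlated with X, independent of W
y = x**2 + 0.2 * x + eps                                # hypothetical increasing g

b_u, val_u = npiv(y, x, w)                  # unconstrained, in the spirit of (12)
b_c, val_c = npiv(y, x, w, monotone=True)   # constrained, in the spirit of (13)
grid = np.linspace(0.0, 1.0, 50)
fit_c = basis(grid, 4) @ b_c                # constrained fit evaluated on the grid
```

Enforcing monotonicity only on a grid keeps the problem a small quadratic program with linear inequality constraints. A finer grid or a basis-specific characterization of monotonicity would be closer to the exact constraint.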

To study properties of the two estimators we introduce a finite-dimensional, or sieve, counterpart of the restricted measure of ill-posedness $\tau(a)$ defined in (9) and also recall the definition of the (unrestricted) sieve measure of ill-posedness. Specifically, define the restricted and unrestricted sieve measures of ill-posedness $\tau_{n,t}(a)$ and $\tau_n$ as

$$\tau_{n,t}(a) := \sup_{h \in \mathcal{H}_n(a),\, \|h\|_{2,t} = 1} \frac{\|h\|_{2,t}}{\|Th\|_2} \quad \text{and} \quad \tau_n := \sup_{h \in \mathcal{H}_n(\infty)} \frac{\|h\|_2}{\|Th\|_2}.$$

The sieve measure of ill-posedness defined in Blundell, Chen, and Kristensen (2007) and also used, for example, in Horowitz (2012) is $\tau_n$. Blundell, Chen, and Kristensen (2007) show that $\tau_n$ is related to the singular values of $T$. [Footnote 6: In fact, Blundell, Chen, and Kristensen (2007) talk about the eigenvalues of $T^*T$, where $T^*$ is the adjoint of $T$, but there is a one-to-one relationship between eigenvalues of $T^*T$ and singular values of $T$.] If the singular values converge to zero at the rate $K^{-r}$ as $K \to \infty$, then, under certain conditions, $\tau_n$ diverges at a polynomial rate, that is, $\tau_n = O(K_n^r)$. This case is typically referred to as “mildly ill-posed”. On the other hand, when the singular values decrease at a fast exponential rate, then $\tau_n = O(e^{cK_n})$ for some constant $c > 0$. This case is typically referred to as “severely ill-posed”.

Our restricted sieve measure of ill-posedness $\tau_{n,t}(a)$ is smaller than the unrestricted sieve measure of ill-posedness $\tau_n$ because we replace the $L^2$-norm in the numerator by the truncated $L^2$-norm and the space $\mathcal{H}_n(\infty)$ by $\mathcal{H}_n(a)$. As explained in Remark 1, replacing the $L^2$-norm by the truncated $L^2$-norm does not make a crucial difference but, as follows from Corollary 1, replacing $\mathcal{H}_n(\infty)$ by $\mathcal{H}_n(a)$ does. In particular, since $\tau(a) \leq C_\tau$ for all $a \leq c_\tau$ by Corollary 1, we also have $\tau_{n,t}(a) \leq C_\tau$ for all $a \leq c_\tau$ because $\tau_{n,t}(a) \leq \tau(a)$. Thus, for all values of $a$ that are not too large, $\tau_{n,t}(a)$ remains bounded uniformly over all $n$, no matter how fast the singular values of $T$ converge to zero.
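
The comparison between the restricted sieve measure and the restricted measure over the full space is a one-step monotonicity-of-suprema argument. The following worked step spells it out, assuming the notation $\tau(a)$ and $\tau_{n,t}(a)$ for the two measures, $\mathcal{H}_n(a) \subset \mathcal{H}(a)$ for the sieve space, and $c_\tau$, $C_\tau$ for the constants of Corollary 1 (these symbols are assumed here for illustration):

```latex
\[
  \tau_{n,t}(a)
  = \sup_{h \in \mathcal{H}_n(a),\ \|h\|_{2,t} = 1} \frac{\|h\|_{2,t}}{\|Th\|_2}
  \;\le\;
  \sup_{h \in \mathcal{H}(a),\ \|h\|_{2,t} = 1} \frac{\|h\|_{2,t}}{\|Th\|_2}
  = \tau(a)
  \;\le\; C_\tau
  \qquad \text{for all } a \le c_\tau \text{ and all } n,
\]
% first inequality: the supremum over the subset H_n(a) of H(a) cannot exceed
% the supremum over H(a); second inequality: Corollary 1.
```

The bound does not involve the sieve dimension, which is why it holds uniformly in $n$.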

We now specify conditions that we need to derive non-asymptotic error bounds for the constrained estimator .

Assumption 3 (Monotone regression function).

The function $g$ is monotone increasing.

Assumption 4 (Moments).

For some constant $C_B < \infty$, (i) $E[\varepsilon^2 \mid W] \leq C_B$ and (ii) $E[g(X)^2 \mid W] \leq C_B$.

Assumption 5 (Relation between $J$ and $K$).

For some constant $C_J < \infty$, $J \leq C_J K$.

Assumption 3, along with Assumption 1, is our main monotonicity condition. Assumption 4 is a mild moment condition. Assumption 5 requires that the dimension of the vector $q(W)$ is not much larger than the dimension of the vector $p(X)$. Let $s > 0$ be some constant.

Assumption 6 (Approximation of $g$).

There exist $\beta_n \in \mathbb{R}^K$ and a constant $C_g < \infty$ such that the function $g_n(x) := p(x)'\beta_n$, defined for all $x \in [0,1]$, satisfies (i) $g_n \in \mathcal{H}_n(0)$, (ii) $\|g - g_n\|_2 \leq C_g K^{-s}$, and (iii) $\|T(g - g_n)\|_2 \leq C_g \tau_n^{-1} K^{-s}$.

The first part of this condition requires the approximating function $g_n$ to be increasing. The second part requires a particular bound on the approximation error in the $L^2$-norm. DeVore (1977a, b) shows that the assumption $\|g - g_n\|_2 \leq C_g K^{-s}$ holds when the approximating basis $p_1, \dots, p_K$ consists of polynomial or spline functions and $g$ belongs to a Hölder class with smoothness level $s$. Therefore, approximation by monotone functions is similar to approximation by all functions. The third part of this condition is similar to Assumption 6 in Blundell, Chen, and Kristensen (2007).

Assumption 7 (Approximation of ).

There exist and a constant such that the function , defined for all , satisfies .

This condition is similar to Assumption 3(iii) in Horowitz (2012). Also, define the operator by

where .

Assumption 8 (Operator ).

(i) The operator is injective and (ii) for some constant , for all .

This condition is similar to Assumption 5 in Horowitz (2012). Finally, let

$$\xi_{K,n} := \sup_{x \in [0,1]} \|p(x)\|, \quad \xi_{J,n} := \sup_{w \in [0,1]} \|q(w)\|, \quad \xi_n := \max(\xi_{K,n}, \xi_{J,n}).$$

We start our analysis in this section with a simple observation that, if the function $g$ is strictly increasing and the sample size $n$ is sufficiently large, then the constrained estimator $\hat{g}^c$ coincides with the unconstrained estimator $\hat{g}^u$, and the two estimators share the same rate of convergence.

Lemma 1 (Asymptotic equivalence of constrained and unconstrained estimators).

Let Assumptions 1-8 be satisfied. In addition, assume that is continuously differentiable and for all and some constant . If , , and as , then

(14)

The result in Lemma 1 is similar to that in Theorem 1 of Mammen (1991), which shows equivalence (in the sense of (14)) of the constrained and unconstrained estimators of conditional mean functions. Lemma 1 implies that imposing monotonicity of $g$ cannot lead to improvements in the rate of convergence of the estimator if $g$ is strictly increasing. However, the result in Lemma 1 is asymptotic and only applies to the interior of the monotonicity constraint. It does not rule out faster convergence rates on or near the boundary of the monotonicity constraint, nor does it rule out significant performance gains in finite samples. In fact, our Monte Carlo simulation study in Section 6 shows significant finite-sample performance improvements from imposing monotonicity even if $g$ is strictly increasing and relatively far from the boundary of the constraint. Therefore, we next derive a non-asymptotic estimation error bound for the constrained estimator $\hat{g}^c$ and study the impact of the monotonicity constraint on this bound.

Theorem 2 (Non-asymptotic error bound for the constrained estimator).

Let Assumptions 1-8 be satisfied, and let be some constant. Assume that for sufficiently small . Then with probability at least , we have

(15)

and