Confidence Intervals for Projections of Partially Identified Parameters^{†}thanks: We are grateful to Elie Tamer and three anonymous reviewers for very useful suggestions that substantially improved the paper. We thank for their comments Ivan Canay and seminar and conference participants at Bonn, the BC/BU joint workshop, Brown, Cambridge, Chicago, Columbia, Cornell, CREST, Kobe, Maryland, Michigan, Michigan State, NYU, Penn State, Royal Holloway, Syracuse, Toronto, UCLA, UCSD, UPenn, Vanderbilt, Vienna, Yale, Wisconsin, CEME, the ES NAWM 2015, the Frontiers of Theoretical Econometrics Conference, the ES World Congress 2015, the ES Asia Meeting 2016, the KEA-KAEA International Conference, the Verein für Socialpolitik Ausschuss für Ökonometrie, and the ES ESM 2017. We are grateful to Zhonghao Fu, Debi Mohapatra, Sida Peng, Talal Rahim, Matthew Thirkettle, and Yi Zhang for excellent research assistance. A MATLAB package implementing the method proposed in this paper, Kaido, Molinari, Stoye, and Thirkettle (2017), is available at https://molinari.economics.cornell.edu/programs/KMSportable_V3.zip. We are especially grateful to Matthew Thirkettle for his contributions to this package. Finally, we gratefully acknowledge financial support through NSF grants SES-1230071 (Kaido), SES-0922330 (Molinari), and SES-1260980 (Stoye).
Abstract
We propose a bootstrap-based calibrated projection procedure to build confidence intervals for single components and for smooth functions of a partially identified parameter vector in moment (in)equality models. The method controls asymptotic coverage uniformly over a large class of data generating processes.
The extreme points of the calibrated projection confidence interval are obtained by extremizing the value of the component (or function) of interest subject to a proper relaxation of studentized sample analogs of the moment (in)equality conditions. The degree of relaxation, or critical level, is calibrated so that the component (or function) of $\theta$, not $\theta$ itself, is uniformly asymptotically covered with prespecified probability. This calibration is based on repeatedly checking feasibility of linear programming problems, rendering it computationally attractive.
Nonetheless, the program defining an extreme point of the confidence interval is generally nonlinear and potentially intricate. We provide an algorithm, based on the response surface method for global optimization, that approximates the solution rapidly and accurately. The algorithm is of independent interest for inference on optimal values of stochastic nonlinear programs. We establish its convergence under conditions satisfied by canonical examples in the moment (in)equalities literature.
Our assumptions and those used in the leading alternative approach (a profiling based method) are not nested. An extensive Monte Carlo analysis confirms the accuracy of the solution algorithm and the good statistical as well as computational performance of calibrated projection, including in comparison to other methods.
Keywords: Partial identification; Inference on projections; Moment inequalities; Uniform inference.
1 Introduction
This paper provides theoretically and computationally attractive confidence intervals for projections and smooth functions of a parameter vector $\theta \in \Theta \subset \mathbb{R}^d$ that is partially or point identified through a finite number of moment (in)equalities. The values of $\theta$ that satisfy these (in)equalities constitute the identification region $\Theta_I(P)$.
Until recently, the rich literature on inference in this class of models focused on confidence sets for the entire vector $\theta$, usually obtained by test inversion as
(1.1) $CS_n \equiv \{\theta \in \Theta : T_n(\theta) \le c_n(\theta)\},$
where $T_n(\theta)$ is a test statistic that aggregates violations of the sample analog of the moment (in)equalities, and $c_n(\theta)$ is a critical value that controls asymptotic coverage, often uniformly over a large class of data generating processes (DGPs). In point identified moment equality models, this would be akin to building confidence ellipsoids for $\theta$ by inversion of the test statistic proposed by Anderson and Rubin (1949).
However, applied researchers are frequently primarily interested in a specific component (or function) of $\theta$, e.g., the returns to education. Even if not, they may simply want to report separate confidence intervals for components of a vector, as is standard practice in other contexts. Thus, consider the projection $p'\theta$, where $p$ is a known unit vector. To date, it has been common to report as confidence interval for $p'\theta$ the projection of $CS_n$:
(1.2) $CI_n^{proj} \equiv \Big[\inf_{\theta \in CS_n} p'\theta,\ \sup_{\theta \in CS_n} p'\theta\Big],$
where $n$ denotes sample size; see for example Ciliberto and Tamer (2009), Grieco (2014), and Dickstein and Morales (2016). Such projection is asymptotically valid, but typically yields conservative and therefore needlessly large confidence intervals. The potential severity of this effect is easily appreciated in a point identified example. Given a $\sqrt{n}$-consistent estimator $\hat\theta_n$ with limiting covariance matrix equal to the identity matrix, a 95% confidence interval for the first component $\theta_1$ is obtained as $\hat\theta_{n,1} \pm 1.96/\sqrt{n}$. In contrast, if $d = 10$, then projection of a 95% Wald confidence ellipsoid yields $\hat\theta_{n,1} \pm 4.28/\sqrt{n}$, with true coverage of essentially 1. We refer to this problem as projection conservatism.
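The extent of projection conservatism in this example can be checked by direct simulation. The sketch below (Python, purely illustrative; all names are ours) simulates the limiting experiment in which the estimation error is standard normal, using the conventional critical value 1.96 and $\sqrt{18.31} \approx 4.28$, the square root of the 95% quantile of a $\chi^2_{10}$:

```python
import math
import random

random.seed(0)
n_sims = 20000
z_crit = 1.96                   # two-sided 95% critical value of N(0, 1)
proj_crit = math.sqrt(18.31)    # sqrt of the 95% quantile of chi-square(10), ~4.28

# Limiting experiment: each component of sqrt(n)*(theta_hat - theta) is N(0, 1),
# so coverage of theta_1 depends only on a single standard normal draw.
per_component = sum(abs(random.gauss(0.0, 1.0)) <= z_crit
                    for _ in range(n_sims)) / n_sims
projection = sum(abs(random.gauss(0.0, 1.0)) <= proj_crit
                 for _ in range(n_sims)) / n_sims

print(per_component)   # close to the nominal 0.95
print(projection)      # essentially 1: the projected ellipsoid grossly over-covers
```

The direct interval covers at its nominal level, while the projected ellipsoid over-covers drastically; this is exactly the slack that calibration is designed to remove.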
Our first contribution is to provide a bootstrap-based calibrated projection method that largely anticipates and corrects for projection conservatism. For each candidate $\theta$, a critical level $\hat c_n(\theta)$ is calibrated so that across bootstrap repetitions the projection of $\theta$ is covered with at least some prespecified probability. Computationally, this bootstrap is relatively attractive because we linearize all constraints around $\theta$, so that coverage of $p'\theta$ corresponds to the projection of a stochastic linear constraint set covering zero. We then propose the confidence interval
(1.3) $CI_n \equiv \Big[\inf_{\theta \in \mathcal{C}_n} p'\theta,\ \sup_{\theta \in \mathcal{C}_n} p'\theta\Big], \qquad \mathcal{C}_n \equiv \Big\{\theta \in \Theta : \frac{\sqrt{n}\,\bar m_{n,j}(\theta)}{\hat\sigma_{n,j}(\theta)} \le \hat c_n(\theta),\ j = 1, \dots, J\Big\}.$
We prove that $CI_n$ asymptotically covers $p'\theta$ with probability at least $1 - \alpha$ uniformly over a large class of DGPs and that it is weakly shorter than (1.2) for each $n$.^{1}This comparison is based on projection of the confidence set of Andrews and Soares (2010) and holds the choice of tuning parameters and criterion function in (1.2) and (1.3) constant across methods. We also provide simple conditions under which it is asymptotically strictly shorter.
Our second contribution is a general method to accurately and rapidly compute projection-based confidence intervals. These can be our calibrated projection confidence intervals, but they can also correspond to projection of many other methods for inference on either $\theta$ or its identified set $\Theta_I(P)$. Examples include Chernozhukov, Hong, and Tamer (2007), Andrews and Soares (2010), or (for conditional moment inequalities) Andrews and Shi (2013). Projection-based inference extends well beyond its application in partial identification, hence our computational method proves useful more broadly. For example, Freyberger and Reeves (2017a, b, Section S.3) use it to construct uniform confidence bands for an unknown function of interest under (nonparametric) shape restrictions.
We propose an algorithm that is based on the response surface method and thus resembles an expected improvement algorithm (see e.g. Jones, 2001; Jones, Schonlau, and Welch, 1998, and references therein). Bull (2011) established convergence of the expected improvement algorithm for unconstrained optimization problems where the objective is a “black box” function. Building on his results, we show convergence of our algorithm for constrained optimization problems in which the constraint functions are “black box” functions, assuming that they are sufficiently smooth. We then verify this smoothness condition for canonical examples in the moment (in)equalities literature. Our extensive Monte Carlo experiments confirm that the algorithm is fast and accurate.^{2}Freyberger and Reeves (2017b, Section S.3) similarly find our method to be accurate and to considerably reduce computational time.
Previous implementations of projection-based inference approximated the confidence set in (1.1) by searching for vectors $\theta$ that satisfy its defining inequality (using, e.g., grid-search or simulated annealing with no cooling), and reporting the smallest and largest value of $p'\theta$ among parameter values that were “guessed and verified” to belong to the set. This becomes computationally cumbersome as $d$ increases because it typically requires a number of evaluation points that grows exponentially with $d$. In contrast, our method typically requires a number of evaluation points that grows linearly with $d$.
The main alternative inference procedure for projections was introduced in Romano and Shaikh (2008) and significantly advanced in Bugni, Canay, and Shi (2017, BCS henceforth). It is based on profiling out a test statistic. The classes of DGPs for which our procedure and the profiling-based method of BCS (BCS-profiling henceforth) can be shown to be uniformly valid are non-nested. We show that in well-behaved cases, calibrated projection and BCS-profiling are asymptotically equivalent. We also provide conditions under which calibrated projection has lower probability of false coverage, thereby establishing that the two methods’ power properties are not ranked. Computationally, calibrated projection has the advantage that the bootstrap iterates over linear as opposed to nonlinear programming problems. While the “outer” optimization problems in (1.3) are potentially intricate, our algorithm is geared toward them. Our Monte Carlo simulations suggest that these two factors give calibrated projection a considerable computational edge over BCS-profiling, with an average speed gain of about 78-fold.
In an influential paper, Pakes, Porter, Ho, and Ishii (2011) also use linearization but, subject to this approximation, directly bootstrap the sample projection.^{3}The published version, i.e. Pakes, Porter, Ho, and Ishii (2015), does not contain the inference part. This is valid only under stringent conditions, and we show that calibrated projection can be much simplified under those conditions. Other related papers that explicitly consider inference on projections include Andrews, Berry, and Jia (2004), Beresteanu and Molinari (2008), Bontemps, Magnac, and Maurin (2012), Chen, Tamer, and Torgovitsky (2011), Kaido (2016), Kitagawa (2012), Kline and Tamer (2015), and Wan (2013). However, some are Bayesian, as opposed to our frequentist approach, and none of them establish uniform validity of confidence sets. Chen, Christensen, and Tamer (2017) establish uniform validity of MCMC-based confidence intervals for projections, but these are aimed at covering the projection of the entire set $\Theta_I(P)$, whereas we aim at covering the projection of $\theta$. Finally, Gafarov, Meier, and Montiel Olea (2016) have used our insight in the context of set-identified spatial VARs.
Structure of the paper. Section 2 sets up notation and describes our approach in detail. Section 3 describes computational implementation of the method and establishes convergence of our proposed algorithm. Section 4 lays out our assumptions and, under these assumptions, establishes uniform validity of calibrated projection for inference on projections and smooth functions of $\theta$. It also shows that more stringent conditions allow for several simplifications to the method, including that it can suffice to evaluate the critical level at only two values of $\theta$ and that one can dispense with a tuning parameter. The section closes with a formal comparison of calibrated projection and BCS-profiling. Section 5 reports the results of Monte Carlo simulations. Section 6 draws conclusions. The proof of convergence of our algorithm is in Appendix A. All other proofs, background material for our algorithm, and additional results are in the Online Appendix.^{4}Section B provides convergence-related results and background material for our algorithm and describes how to compute the critical level. Section C verifies, for a number of canonical moment (in)equality models, the assumptions that we invoke to show validity of our inference procedure and for our algorithm. Section D contains proofs of the Theorems in this paper’s Section 4. Section E collects Lemmas supporting the preceding proofs. Section F provides further comparisons with the profiling method of Bugni, Canay, and Shi (2017), including an example where calibrated projection has higher power in finite sample. Section G provides comparisons with “uncalibrated” projection of the confidence region in Andrews and Soares (2010), including simple conditions under which calibrated projection yields asymptotically strictly shorter confidence intervals.
2 Detailed Explanation of the Method
Let $X_i \in \mathcal{X} \subseteq \mathbb{R}^{d_X}$ be a random vector with distribution $P$, let $\Theta \subseteq \mathbb{R}^d$ denote the parameter space, and let $m_j : \mathcal{X} \times \Theta \to \mathbb{R}$ for $j = 1, \dots, J_1 + J_2$ denote measurable functions characterizing the model, known up to parameter vector $\theta \in \Theta$. The true parameter value $\theta$ is assumed to satisfy the moment inequality and equality restrictions
(2.1) $E_P[m_j(X_i, \theta)] \le 0, \quad j = 1, \dots, J_1,$
(2.2) $E_P[m_j(X_i, \theta)] = 0, \quad j = J_1 + 1, \dots, J_1 + J_2.$
The identification region $\Theta_I(P)$ is the set of parameter values in $\Theta$ satisfying (2.1)-(2.2). For a random sample $\{X_i,\ i = 1, \dots, n\}$ of observations drawn from $P$, we write
(2.3) $\bar m_{n,j}(\theta) \equiv \frac{1}{n}\sum_{i=1}^n m_j(X_i, \theta), \quad j = 1, \dots, J_1 + J_2,$
(2.4) $\hat\sigma_{n,j}(\theta) \equiv \Big(\frac{1}{n}\sum_{i=1}^n \big[m_j(X_i, \theta) - \bar m_{n,j}(\theta)\big]^2\Big)^{1/2}, \quad j = 1, \dots, J_1 + J_2,$
for the sample moments and the analog estimators of the population moment functions’ standard deviations $\sigma_{P,j}(\theta)$.^{5}Under Assumption 4.3-(b), in equation (2.5) we use, instead of $\hat\sigma_{n,j}$, the estimator specified in (E.188) in Lemma E.10 of the Online Appendix for the moments indicated in the assumption. In equation (3.2) we use $\hat\sigma_{n,j}$ for all $j$. To ease notation, we distinguish the two only where needed.
The confidence interval in (1.3) then becomes $CI_n = [-\hat s_n(-p),\ \hat s_n(p)]$, where
(2.5) $\hat s_n(p) \equiv \sup_{\theta \in \Theta}\ p'\theta \quad \text{s.t.} \quad \frac{\sqrt{n}\,\bar m_{n,j}(\theta)}{\hat\sigma_{n,j}(\theta)} \le \hat c_n(\theta),\ j = 1, \dots, J,$
and similarly for $\hat s_n(-p)$. Here, we define $J \equiv J_1 + 2J_2$ moments, where $m_{J_1+J_2+k}(X_i, \theta) \equiv -m_{J_1+k}(X_i, \theta)$ for $k = 1, \dots, J_2$. That is, we split each moment equality constraint into two opposing inequality constraints and relax them separately.^{6}For a simple analogy, consider the point identified model defined by the single moment equality $E_P[m_1(X_i, \theta)] = 0$, where $\theta$ is a scalar. In this case, $J = 2$ with $m_2 = -m_1$. The upper endpoint of the confidence interval is obtained by relaxing each of the two opposing inequality constraints by the critical level, and similarly for the lower endpoint.
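The bookkeeping behind the split of equalities into opposing inequalities can be sketched as follows (Python, illustrative only; the function name is ours, not the paper's):

```python
def split_moments(ineq_vals, eq_vals):
    """Map J1 inequality and J2 equality moment values into J = J1 + 2*J2
    inequality values: each equality m_j = 0 becomes the pair m_j <= 0, -m_j <= 0."""
    return list(ineq_vals) + list(eq_vals) + [-m for m in eq_vals]

# One inequality moment and one equality moment (J1 = J2 = 1, so J = 3):
print(split_moments([0.3], [-0.1]))  # [0.3, -0.1, 0.1]
```

Each of the resulting $J$ inequalities is then relaxed separately by the critical level.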
For a class of DGPs $\mathcal{P}$ that we specify below, define the asymptotic size of $CI_n$ by
(2.6) $\liminf_{n \to \infty}\ \inf_{P \in \mathcal{P}}\ \inf_{\theta \in \Theta_I(P)} P\big(p'\theta \in CI_n\big).$
Our goal is to calibrate $\hat c_n$ so that (2.6) is at least equal to a prespecified level $1 - \alpha$ while anticipating projection conservatism. To build intuition, fix $(\theta, P)$ s.t. $\theta \in \Theta_I(P)$. The projection of $\theta$ is covered when
(2.7) $\min\Big\{p'\lambda : \lambda \in \sqrt{n}(\Theta - \theta),\ \frac{\sqrt{n}\,\bar m_{n,j}(\theta + \lambda/\sqrt{n})}{\hat\sigma_{n,j}(\theta + \lambda/\sqrt{n})} \le \hat c_n\big(\theta + \tfrac{\lambda}{\sqrt{n}}\big)\ \forall j\Big\} \le 0 \le \max\Big\{p'\lambda : \lambda \in \sqrt{n}(\Theta - \theta),\ \frac{\sqrt{n}\,\bar m_{n,j}(\theta + \lambda/\sqrt{n})}{\hat\sigma_{n,j}(\theta + \lambda/\sqrt{n})} \le \hat c_n\big(\theta + \tfrac{\lambda}{\sqrt{n}}\big)\ \forall j\Big\},$
where this representation follows from substituting $\vartheta = \theta + \lambda/\sqrt{n}$ into the optimization problems defining the endpoints of $CI_n$ and taking $\lambda$ to be the choice parameter. (Intuitively, we localize around $\theta$ at rate $1/\sqrt{n}$.)
We control asymptotic size by finding $\hat c_n$ such that 0 asymptotically lies within the optimal values of the NLPs in (2.7) with probability $1 - \alpha$. To reduce the computational burden, we approximate the event in equation (2.7) through a linear expansion in $\lambda$ of the constraint set. To each constraint, we add and subtract the corresponding studentized population moment and apply the mean value theorem to obtain
(2.8) $\frac{\sqrt{n}\,\bar m_{n,j}(\theta + \lambda/\sqrt{n})}{\hat\sigma_{n,j}(\theta + \lambda/\sqrt{n})} = \Big[\mathbb{G}_{n,j}\big(\theta + \tfrac{\lambda}{\sqrt{n}}\big) + D_{P,j}(\bar\theta)\lambda + \sqrt{n}\,\gamma_{1,P,j}(\theta)\Big]\frac{\sigma_{P,j}(\theta + \lambda/\sqrt{n})}{\hat\sigma_{n,j}(\theta + \lambda/\sqrt{n})}.$
Here $\mathbb{G}_{n,j}(\cdot) \equiv \sqrt{n}\big(\bar m_{n,j}(\cdot) - E_P[m_j(X_i, \cdot)]\big)/\sigma_{P,j}(\cdot)$ is a normalized empirical process indexed by $\theta$, $D_{P,j}(\cdot) \equiv \nabla_\theta\big\{E_P[m_j(X_i, \cdot)]/\sigma_{P,j}(\cdot)\big\}$ is the gradient of the normalized moment, $\gamma_{1,P,j}(\theta) \equiv E_P[m_j(X_i, \theta)]/\sigma_{P,j}(\theta)$ is the studentized population moment, and the mean value $\bar\theta$ lies componentwise between $\theta$ and $\theta + \lambda/\sqrt{n}$.^{7}The mean value changes with $j$ but we omit the dependence to ease notation.
Calibration of $\hat c_n$ requires careful analysis of the local behavior of the moment restrictions at each point in the identification region. This is because the extent of projection conservatism depends on (i) the asymptotic behavior of the sample moments entering the inequality restrictions, which can change discontinuously depending on whether they bind at $\theta$ ($\gamma_{1,P,j}(\theta) = 0$) or not ($\gamma_{1,P,j}(\theta) < 0$), and (ii) the local geometry of the identification region at $\theta$, i.e. the shape of the constraint set formed by the moment restrictions, and its relation to the level set of the objective function $p'\theta$. Features (i) and (ii) can be quite different at different points in $\Theta_I(P)$, making uniform inference for the projection challenging. In particular, (ii) does not arise if one only considers inference for the entire parameter vector, and hence is a new challenge requiring new methods.^{8}This is perhaps best expressed in the testing framework: Inference for projections entails a null hypothesis specifying the value of a single component (or a function) of $\theta$. The components not under test become additional nuisance parameters, and dealing with them presents challenges that one does not face when testing hypotheses that specify the value of the entire vector $\theta$. This is where this paper’s core theoretical innovation lies.
An important component of this innovation is to add to (2.7) the constraint that $\lambda \in \rho B^d$, where $B^d \equiv \{x \in \mathbb{R}^d : \|x\|_\infty \le 1\}$ and $\rho > 0$ is a tuning parameter. This is slightly conservative but regularizes the effect of the local geometry of $\Theta_I(P)$ at $\theta$ on the inference problem. See Section 4.3 for further discussion. We show that the probability of the event in (2.7), with $\lambda$ restricted to be in $\rho B^d$, is asymptotically approximated by the probability that 0 lies between the optimal values of two programs that are linear in $\lambda$. The constraint sets of these programs are characterized by (i) a Gaussian process evaluated at $\theta$ (that we can approximate through a simple nonparametric bootstrap), (ii) a gradient (that we can uniformly consistently estimate^{9}See Online Appendix C for proposal of such estimators in some canonical moment (in)equality examples. on compact sets), and (iii) the parameter $\gamma_{1,P,j}(\theta)$ that measures the extent to which each moment inequality is binding (that we can conservatively estimate using insights from Andrews and Soares (2010)). This suggests a computationally attractive bootstrap procedure based on linear programs.
3 Computing Calibrated Projection Confidence Intervals
3.1 Computing the Critical Level
For a given $\theta \in \Theta$, we calibrate $\hat c_n(\theta)$ through a bootstrap procedure that iterates over linear programs.^{10}If $\Theta$ is defined through smooth convex (in)equalities, these can be linearized too. Define
(3.1) $\Lambda_n^b(\theta, \rho, c) \equiv \Big\{\lambda \in \sqrt{n}(\Theta - \theta) \cap \rho B^d :\ \mathbb{G}^b_{n,j}(\theta) + \hat D_{n,j}(\theta)\lambda + \varphi_j\big(\hat\xi_{n,j}(\theta)\big) \le c,\ j = 1, \dots, J\Big\},$
where $\mathbb{G}_n^b$ is a bootstrap analog of $\mathbb{G}_n$,^{11}Bugni, Canay, and Shi (2017) approximate the stochastic process using a multiplier bootstrap with i.i.d. standard normal weights. This approximation is equally valid in our approach, and can be computationally faster as it avoids repeated evaluation of the moment functions across bootstrap replications. $\hat D_n(\theta)$ is a consistent estimator of $D_P(\theta)$, $\rho > 0$ is a constant chosen by the researcher (see Section 4.3), $B^d \equiv \{x \in \mathbb{R}^d : \|x\|_\infty \le 1\}$, and $\hat\xi_n(\theta)$ is defined by
(3.2) $\hat\xi_{n,j}(\theta) \equiv \begin{cases} \kappa_n^{-1}\sqrt{n}\,\bar m_{n,j}(\theta)/\hat\sigma_{n,j}(\theta), & j = 1, \dots, J_1, \\ 0, & j = J_1 + 1, \dots, J, \end{cases}$
where $\kappa_n$ is a user-specified thresholding sequence such that $\kappa_n \to \infty$ and $\kappa_n/\sqrt{n} \to 0$, and $\varphi$ is one of the generalized moment selection (GMS) functions proposed by Andrews and Soares (2010). A common choice of $\varphi$ is given componentwise by
(3.3) $\varphi_j(\xi_j) = \begin{cases} 0, & \xi_j \ge -1, \\ -\infty, & \xi_j < -1. \end{cases}$
Restrictions on $\varphi$ and the rate at which $\kappa_n$ diverges are imposed in Assumption 4.2.
Remark 3.1:
For concreteness, in (3.3) we write out the “hard thresholding” GMS function. As we establish below, our results apply to all but one of the GMS functions in Andrews and Soares (2010).^{12}These are the GMS functions in Andrews and Soares (2010) that depend only on $\hat\xi_{n,j}(\theta)$. We do not consider the GMS function in Andrews and Soares (2010) that depends also on the covariance matrix of the moment functions.
Heuristically, the random convex polyhedral set $\Lambda_n^b(\theta, \rho, c)$ in (3.1) is a local (to $\theta$) linearized bootstrap approximation to the random constraint set in (2.7). To see this, note that the bootstrapped empirical process $\mathbb{G}_n^b$ and the estimator $\hat D_n$ of the gradient approximate the first two terms in the constraint in (2.7) as linearized in (2.8). Next, for $j = 1, \dots, J_1$, the GMS function conservatively approximates the local slackness parameter $\sqrt{n}\,\gamma_{1,P,j}(\theta)$. This is needed because the $\sqrt{n}$-scaling precludes consistent estimation. The problem is resolved by shrinking estimated intercepts toward zero, thereby tightening constraints and hence increasing $\hat c_n$. As with other uses of GMS, the resulting conservative distortion vanishes pointwise but not uniformly. Finally, restricting $\lambda$ to the “box” $\rho B^d$ has a strong regularizing effect: It ensures uniform validity in challenging situations, including several that are assumed away in most of the literature. We discuss this point in more detail in Section 4.3.
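To make the moment selection step concrete, here is a minimal sketch (Python; the function names are ours) of the hard-thresholding GMS rule described by (3.2)-(3.3): slack inequalities are dropped from the bootstrap constraint set, while binding ones are kept:

```python
import math

def xi_hat(m_bar, sigma_hat, n, kappa_n):
    # Scaled slackness measure of an inequality moment, as in (3.2).
    return math.sqrt(n) * m_bar / (kappa_n * sigma_hat)

def phi_hard(x):
    # Hard-thresholding GMS function of (3.3): keep the constraint (0) if the
    # inequality is close to binding, drop it (-inf) if it is clearly slack.
    return 0.0 if x >= -1.0 else -math.inf

n, kappa_n = 400, math.sqrt(math.log(400))   # kappa_n = sqrt(ln n), a popular choice
print(phi_hard(xi_hat(0.0, 1.0, n, kappa_n)))    # binding moment: kept
print(phi_hard(xi_hat(-0.5, 1.0, n, kappa_n)))   # clearly slack moment: dropped
```

Dropping clearly slack constraints tightens nothing asymptotically but avoids the impossible task of consistently estimating the $\sqrt{n}$-scaled slackness.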
The critical level $\hat c_n(\theta)$ to be used in (1.3) is the smallest value of $c$ that makes the bootstrap probability of the event
(3.4) $\Lambda_n^b(\theta, \rho, c) \cap \{p'\lambda = 0\} \ne \emptyset$
at least $1 - \alpha$. Because $\Lambda_n^b(\theta, \rho, c)$ is convex, this event holds if and only if
$\min\{p'\lambda : \lambda \in \Lambda_n^b(\theta, \rho, c)\} \le 0 \le \max\{p'\lambda : \lambda \in \Lambda_n^b(\theta, \rho, c)\},$
so that we can equivalently define
(3.5) $\hat c_n(\theta) \equiv \inf\big\{c \in \mathbb{R}_+ : P^*\big(\Lambda_n^b(\theta, \rho, c) \cap \{p'\lambda = 0\} \ne \emptyset\big) \ge 1 - \alpha\big\},$
where $P^*$ denotes the law of the random set $\Lambda_n^b(\theta, \rho, c)$ induced by the bootstrap sampling process, i.e. by the distribution of the bootstrap draws, conditional on the data. Importantly, the bootstrap probability in (3.5) can be assessed by repeatedly checking feasibility of a linear program.^{13}We implement a program in $\mathbb{R}^d$ for simplicity but, because $p'\lambda = 0$ defines a linear subspace, one could reduce this to $\mathbb{R}^{d-1}$. We describe in detail in Online Appendix B.4 how we compute $\hat c_n(\theta)$ through a root-finding algorithm.
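The following self-contained sketch (Python; every name and number is a hypothetical stand-in, not the paper's implementation) illustrates the calibration for $d = 2$ and $p = (1, 0)'$: the constraint $p'\lambda = 0$ pins down $\lambda_1 = 0$, so feasibility of the linear program reduces to intersecting intervals for $\lambda_2$, and the smallest $c$ attaining bootstrap coverage $1 - \alpha$ is found by bisection, exploiting that the feasible set, hence coverage, is monotone in $c$:

```python
import math
import random

def feasible(c, G, D2, phi, rho):
    """Check whether Lambda(c) ∩ {p'lambda = 0} is nonempty when d = 2 and
    p = (1, 0)': then lambda_1 = 0, and we need lambda_2 in [-rho, rho] with
    G[j] + D2[j]*lambda_2 + phi[j] <= c for every moment j (interval intersection)."""
    lo, hi = -rho, rho
    for g, d2, ph in zip(G, D2, phi):
        if ph == -math.inf:          # constraint dropped by GMS
            continue
        b = c - g - ph               # need d2 * lambda_2 <= b
        if d2 > 0:
            hi = min(hi, b / d2)
        elif d2 < 0:
            lo = max(lo, b / d2)
        elif b < 0:                  # d2 == 0 and the constraint is violated
            return False
    return lo <= hi

def calibrate_c(boot_draws, D2, phi, rho, alpha):
    """Smallest c whose bootstrap feasibility frequency is at least 1 - alpha;
    the feasible set grows with c, so coverage is monotone and we can bisect."""
    lo, hi = 0.0, 10.0
    for _ in range(50):
        c = 0.5 * (lo + hi)
        cov = sum(feasible(c, G, D2, phi, rho) for G in boot_draws) / len(boot_draws)
        lo, hi = (lo, c) if cov >= 1.0 - alpha else (c, hi)
    return hi

random.seed(1)
D2, phi, rho = [1.0, -1.0], [0.0, 0.0], 2.0            # two kept inequality moments
draws = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(500)]
c_hat = calibrate_c(draws, D2, phi, rho, alpha=0.05)   # calibrated critical level
```

A production implementation would instead check feasibility of a linear program in $d$ (or $d - 1$) variables with an LP solver and root-find over $c$ as described in Online Appendix B.4; the monotone bisection above is the same idea in miniature.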
3.2 Computation of the Outer Maximization Problem
Projection-based methods as in (1.2) and (1.3) have nonlinear constraints involving a critical value which in general is an unknown function of $\theta$. Moreover, in all methods, including ours and Andrews and Soares (2010), the gradients of the critical values with respect to $\theta$ are not available in closed form. When the dimension $d$ of the parameter vector is large, directly solving optimization problems with such constraints can be expensive even if evaluating the critical value at each $\theta$ is cheap.
To mitigate this issue, we provide an algorithm that is a contribution to the moment (in)equalities literature in its own right and that can be helpful for implementing other approaches.^{14}^{14}14This algorithm is based on the response surface method used in the optimization literature; see Jones (2001), Jones, Schonlau, and Welch (1998), and references therein. We apply it to constrained optimization problems of the following form:
(3.6) $p'\theta^* \equiv \sup_{\theta \in \Theta}\ p'\theta \quad \text{s.t.}\ g_j(\theta) \le c(\theta),\ j = 1, \dots, J,$
where $\theta^*$ is an optimal solution of the problem, $g_j$, $j = 1, \dots, J$, are known functions, and $c$ is a function that requires a higher computational cost. In our context, $g_j(\theta) = \sqrt{n}\,\bar m_{n,j}(\theta)/\hat\sigma_{n,j}(\theta)$ and, for calibrated projection, $c(\theta) = \hat c_n(\theta)$. Conditional on the data, these functions are considered deterministic. A key feature of the problem is that the function $c$ is relatively costly to evaluate.^{15}Here we assume that computing the sample moments is less expensive than computing the critical value. When computation of moments is also very expensive, our proposed algorithm can be used to approximate these too. Our algorithm evaluates $c$ at finitely many values of $\theta$. For other values, it imposes a probabilistic model that gets updated as specific values are computed and that is used to determine the next evaluation point. Under reasonable conditions, the resulting sequence of approximate optimal values converges to $p'\theta^*$.
Specifically, after drawing an initial set of evaluation points whose number grows linearly with the dimension of the parameter space, the algorithm has three steps called E, A, and M below.
 Initialization step:

Draw an initial set of evaluation points $\theta^{(1)}, \dots, \theta^{(k)}$ randomly (uniformly) over $\Theta$. We suggest setting the number $k$ of initial points proportional to $d$.
 E-step: (Evaluation)

Evaluate $c(\theta^{(l)})$ for $l = 1, \dots, L$, where $L \ge k$ is the current number of evaluation points. The current estimate of the optimal value can be computed using
(3.7) $p'\theta^{*,L} \equiv \max\big\{p'\theta^{(l)} : \theta^{(l)} \in C_L\big\},$ where $C_L \equiv \{\theta^{(l)} : g_j(\theta^{(l)}) \le c(\theta^{(l)}),\ j = 1, \dots, J\}$ is the set of feasible evaluation points.
 A-step: (Approximation)

Approximate the function $\theta \mapsto c(\theta)$ by a flexible auxiliary model. We use a Gaussian-process regression model (or kriging), which for a mean-zero Gaussian process $\zeta(\cdot)$ indexed by $\theta$ and with constant variance $\varsigma^2$ specifies
(3.8) $c(\theta) = \mu + \zeta(\theta),$
(3.9) $\mathrm{Corr}\big(\zeta(\theta), \zeta(\theta')\big) = K_\beta(\theta - \theta'),$
where $K_\beta$ is a kernel with parameter vector $\beta$, e.g. $K_\beta(\theta - \theta') = \exp\big(-\sum_{k=1}^d |\theta_k - \theta'_k|/\beta_k\big)$. The unknown parameters $(\mu, \varsigma^2)$ can be estimated by running a GLS regression of the evaluated critical values on a constant with the given correlation matrix. The unknown parameters $\beta$ can be estimated by a (concentrated) MLE.
The (best linear) predictor of the critical value and its gradient at an arbitrary point $\theta$ are then given by
(3.10) $c_L(\theta) = \hat\mu + \mathbf{r}_L(\theta)'\mathbf{R}_L^{-1}\big(\mathbf{c}_L - \hat\mu\mathbf{1}_L\big),$
(3.11) $\nabla_\theta c_L(\theta) = \mathbf{Q}_L(\theta)'\mathbf{R}_L^{-1}\big(\mathbf{c}_L - \hat\mu\mathbf{1}_L\big),$
where $\mathbf{r}_L(\theta)$ is a vector whose $l$th component is $\mathrm{Corr}(\zeta(\theta), \zeta(\theta^{(l)}))$ as given above with estimated parameters, $\mathbf{Q}_L(\theta)$ is its Jacobian with respect to $\theta$, $\mathbf{c}_L = (c(\theta^{(1)}), \dots, c(\theta^{(L)}))'$, and $\mathbf{R}_L$ is an $L$ by $L$ matrix whose $(l, l')$ entry is $\mathrm{Corr}(\zeta(\theta^{(l)}), \zeta(\theta^{(l')}))$ with estimated parameters. This approximating (or surrogate) model has the property that its predictor satisfies $c_L(\theta^{(l)}) = c(\theta^{(l)})$, $l = 1, \dots, L$. Hence, it provides an analytical interpolation to the evaluated critical values together with an analytical gradient.^{16}See details in Jones, Schonlau, and Welch (1998). We use the DACE Matlab kriging toolbox (http://www2.imm.dtu.dk/projects/dace/) for this step in our Monte Carlo experiments. Further, the amount of uncertainty left in $c(\theta)$ at an arbitrary point is captured by the following variance:
(3.12) $\varsigma_L^2(\theta) = \hat\varsigma^2\big(1 - \mathbf{r}_L(\theta)'\mathbf{R}_L^{-1}\mathbf{r}_L(\theta)\big).$
 M-step: (Maximization)

With probability $1 - \epsilon$, maximize the expected improvement function to obtain the next evaluation point:
(3.13) $\theta^{(L+1)} \in \arg\max_{\theta \in \Theta}\ \big(p'\theta - p'\theta^{*,L}\big)_+\,\Pr\big(g_j(\theta) \le c_L(\theta) + \varsigma_L(\theta)\mathsf{Z},\ j = 1, \dots, J\big),$
where $(x)_+ \equiv \max\{x, 0\}$ and the probability is computed under the surrogate model, treating the critical value at $\theta$ as the Gaussian random variable $c_L(\theta) + \varsigma_L(\theta)\mathsf{Z}$ with $\mathsf{Z} \sim N(0, 1)$. This step can be implemented by standard nonlinear optimization solvers, e.g. Matlab’s fmincon or KNITRO (see Appendix B.3 for details). With probability $\epsilon$, draw $\theta^{(L+1)}$ randomly from a uniform distribution over $\Theta$.
Once the next evaluation point is determined, one adds it to the set of evaluation points and iterates the E, A, and M steps. This yields an increasing sequence of approximate optimal values $p'\theta^{*,L}$. Once a convergence criterion is met, the value $p'\theta^{*,L}$ is reported as the end point of $CI_n$. We discuss convergence criteria in Section 5.
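To fix ideas, here is a deliberately stripped-down rendition of the E-A-M loop (Python; everything in it, including the quadratic stand-in for the critical level and the nearest-neighbor "surrogate", is a toy replacement for the kriging model and expected-improvement criterion above). It maximizes $p'\theta = \theta$ over $\Theta = [0, 1]$ subject to $g(\theta) = \theta \le c(\theta)$, treating $c$ as expensive:

```python
def c_expensive(theta):
    # Stand-in for the black-box critical level c(theta); in the real procedure
    # each call would run a full bootstrap calibration.
    return 0.75 - 0.25 * theta ** 2

def predict(theta, evals):
    # Crude stand-in for the kriging A-step: predict c(theta) by the value at
    # the nearest evaluated point (no uncertainty model).
    return min(evals, key=lambda tc: abs(tc[0] - theta))[1]

# Initialization step: a few evaluations of the expensive function.
evals = [(t, c_expensive(t)) for t in (0.0, 0.5, 1.0)]
grid = [i / 1000 for i in range(1001)]   # Theta = [0, 1], objective p'theta = theta

for _ in range(15):
    # M-step (simplified): propose the largest point the surrogate predicts
    # feasible; the actual algorithm maximizes expected improvement instead.
    nxt = max(t for t in grid if t <= predict(t, evals))
    # E-step: one expensive evaluation at the proposed point.
    evals.append((nxt, c_expensive(nxt)))

# Best point verified feasible (theta <= c(theta)); the true optimum here is
# 2*(sqrt(1.75) - 1), roughly 0.646.
best = max(t for t, c in evals if t <= c)
print(round(best, 3))
```

Even this crude surrogate concentrates the expensive evaluations near the binding constraint, which is the mechanism that lets the real algorithm's evaluation count grow only linearly with $d$.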
Remark 3.2:
The advantages of EAM are as follows. First, we control the number of points at which we evaluate the critical value. Since the evaluation of the critical value is the relatively expensive step, controlling the number of evaluations is important. One should also note that the E-step for the initial evaluation points can easily be parallelized. For any additional E-step (i.e. $L > k$), one needs to evaluate the critical value only at a single point $\theta^{(L)}$. The M-step is crucial for reducing the number of additional evaluation points. To determine the next evaluation point, one needs to take into account the trade-off between “exploitation” (i.e. the benefit of drawing a point at which the optimal value is high) and “exploration” (i.e. the benefit of drawing a point in a region in which the approximation error of the surrogate model is currently large). The expected improvement function in (3.13) quantifies this trade-off, and draws a point only in an area where one can expect the largest improvement in the optimal value, yielding substantial computational savings.^{17}It is also possible to draw multiple points in each iteration. See Schonlau, Welch, and Jones (1998).
Second, the proposed algorithm simplifies the Mstep by providing constraints and their gradients for program (3.13) in closed form. Availability of analytical gradients greatly aids fast and stable numerical optimization. The price is the additional approximation step. In the numerical exercises of Section 5, this price turns out to be low.
3.3 Convergence of the EAM Algorithm
We now provide formal conditions under which $p'\theta^{*,L}$ converges to the true end point of $CI_n$ as $L \to \infty$.^{18}We build on Bull (2011), who proves a convergence result for the algorithm proposed by Jones, Schonlau, and Welch (1998) applied to an unconstrained optimization problem in which the objective function is unknown outside the evaluation points. Our convergence result recognizes that the parameters of the Gaussian process prior in (3.8)-(3.9) are estimated for each iteration of the A-step using the “observations” $(\theta^{(l)}, c(\theta^{(l)}))$, $l = 1, \dots, L$, and hence change with $L$. Because of this, a requirement for convergence is that $c$ is a sufficiently smooth function of $\theta$. We show that a high-level condition guaranteeing this level of smoothness ensures a general convergence result for the EAM algorithm. This is a novel contribution to the literature on response surface methods for constrained optimization.
In the statement of Theorem 3.1 below, $\mathcal{H}_\beta$ is the reproducing kernel Hilbert space (RKHS) on $\Theta$ determined by the kernel used to define the correlation functional in (3.9). The norm on this space is denoted $\|\cdot\|_{\mathcal{H}_\beta}$; see Online Appendix B.2 for details. Also, the expectation $\mathbb{E}$ is taken with respect to the law of the evaluation points determined by the Initialization step and the M-step, holding the sample fixed. See Appendix A for a precise definition of this law and a proof of the theorem.
Theorem 3.1:
Suppose $\Theta$ is a compact hyperrectangle with nonempty interior and that problem (3.6) has an optimal solution $\theta^*$. Let the evaluation points be drawn according to the Initialization and the M steps. Let the kernel in (3.9) be a Matérn kernel with smoothness index $\nu$, $\nu \notin \mathbb{N}$. Let $c$ satisfy $\|c\|_{\mathcal{H}_\beta} \le R$ for some $R > 0$, where $\mathcal{H}_\beta$ is the RKHS determined by this kernel. Then
(3.14) $\mathbb{E}\big[\,p'\theta^* - p'\theta^{*,L}\big] \to 0 \ \text{ as } L \to \infty.$
Remark 3.3:
The requirement that $\Theta$ is a compact hyperrectangle with nonempty interior can be replaced by a requirement that $\Theta$ belongs to the interior of a closed hyperrectangle in $\mathbb{R}^d$ such that $c$ satisfies the smoothness requirement in Theorem 3.1 on that rectangle.
In order to apply Theorem 3.1 to calibrated projection, we provide low-level conditions (Assumption B.1 in Online Appendix B.1.1) under which the map $\theta \mapsto \hat c_n(\theta)$ uniformly stochastically satisfies a Lipschitz-type condition. To gain smoothness, we work with a mollified version of $\hat c_n$, provided in equation (B.1).^{19}For a discussion of mollification, see e.g. Rockafellar and Wets (2005, Example 7.19). Theorem B.1 in the Online Appendix shows that $\hat c_n$ and its mollified version can be made uniformly arbitrarily close to each other and that the mollified version yields valid inference in the sense of equation (2.6). In practice, one may therefore directly apply the EAM steps to $\hat c_n$.
Remark 3.4:
The key condition imposed in Theorem B.1 is Assumption B.1. It requires that the GMS function used is Lipschitz in its argument, and that the standardized moment functions are Lipschitz in $\theta$. In Online Appendix C.1 we establish that the latter condition is satisfied by some canonical examples in the moment (in)equality literature, namely the mean with missing data, linear regression and best linear prediction with interval data (and discrete covariates), and entry games with multiple equilibria (and discrete covariates).^{20}It can also be shown to hold in semiparametric binary regression models with discrete or interval-valued covariates under the assumptions of Magnac and Maurin (2008).
4 Asymptotic Validity of Inference
4.1 Assumptions
We posit that $P$, the distribution of the observed data, belongs to a class of distributions denoted $\mathcal{P}$. We write stochastic order relations that hold uniformly over $\mathcal{P}$ using the notations $O_{\mathcal{P}}$ and $o_{\mathcal{P}}$; see Online Appendix D.1 for the formal definitions. Below, we use generic constants which may be different in different appearances but cannot depend on $P$. Given a square matrix $A$, we write $\mathrm{eig}(A)$ for its smallest eigenvalue.
Assumption 4.1:
(a) $\Theta \subset \mathbb{R}^d$ is a compact hyperrectangle with nonempty interior.
(b) All distributions satisfy the following:

and for some ;

$\{X_i,\ i = 1, 2, \dots\}$ are i.i.d.;

for for all ;

For some and and for all , .
Assumption 4.1(a) requires that $\Theta$ is a hyperrectangle, but it can be replaced with the assumption that $\Theta$ is defined through a finite number of nonstochastic inequality constraints smooth in $\theta$ and such that $\Theta$ is convex. Compactness is a standard assumption on $\Theta$ for extremum estimation. We additionally require convexity as we use mean value expansions of the moment functions in $\theta$; see (2.8). Assumption 4.1(b) defines our moment (in)equalities model. Assumption 4.2 constrains the GMS function $\varphi$ and the rate at which its tuning parameter $\kappa_n$ diverges. Both 4.1(b) and 4.2 are based on Andrews and Soares (2010) and are standard in the literature,^{21}Continuity of the GMS function is restrictive only for one of the GMS functions in Andrews and Soares (2010). although typically with a different restriction on the rate of $\kappa_n$. The slower rate is satisfied for the popular choice, recommended by Andrews and Soares (2010), of $\kappa_n = (\ln n)^{1/2}$.
Next, and unlike some other papers in the literature, we impose restrictions on the correlation matrix of the moment functions. These conditions are easy to verify in practice because they are implied whenever the correlation matrices of the moment equality functions and of the moment inequality functions specified below have determinants larger than a predefined constant for any $\theta \in \Theta$.
Assumption 4.3:
All distributions satisfy one of the following two conditions for some constants :

Let . Denote
Then .

The functions are defined on . There exists , , and measurable functions , such that for each ,
(4.1) For each and any choice , denoting , where
one has
(4.2) Finally,
(4.3)
Assumption 4.3