Empirical likelihood approach to goodness of fit testing
Abstract
Motivated by applications to goodness of fit testing, the empirical likelihood approach is generalized to allow for the number of constraints to grow with the sample size and for the constraints to use estimated criteria functions. The latter is needed to deal with nuisance parameters. The proposed empirical likelihood based goodness of fit tests are asymptotically distribution free. For univariate observations, tests for a specified distribution, for a distribution of parametric form, and for a symmetric distribution are presented. For bivariate observations, tests for independence are developed.
Bernoulli, Volume 19, Issue 3 (2013), pages 954–981. DOI: 10.3150/12BEJ440.
Hanxiang Peng (hpeng@math.iupui.edu) and Anton Schick (anton@math.binghamton.edu)
Keywords: estimated constraint functions; infinitely many constraints; nuisance parameter; regression model; testing for a parametric model; testing for a specific distribution; testing for independence; testing for symmetry
1 Introduction
The empirical likelihood approach was introduced by Owen [O88, O90] to construct confidence intervals in a nonparametric setting; see also Owen [O01]. As a likelihood approach possessing nonparametric properties, it does not require us to specify a distribution for the data and often yields more efficient estimates of the parameters. It allows the data to determine the shape of confidence regions and is Bartlett correctable (DiCiccio, Hall and Romano [DH91]). The approach has been extended to various settings, for example, to generalized linear models (Kolaczyk [K94]), local linear smoothers (Chen and Qin [CQ00]), partially linear models (Shi and Lau [SL00], Wang and Jing [WJ03]), parametric and semiparametric models in multiresponse regression (Chen and Van Keilegom [CK09]), linear regression with censored data (Zhou and Li [ZL08]), and plug-in estimates of nuisance parameters in estimating equations in the context of survival analysis (Qin and Jing [QJ01a], Wang and Jing [WJ01], Li and Wang [LC01]). Algorithms, calibration and higher-order precision of the approach can be found in Hall and La Scala [HS90], Emerson and Owen [EO09] and Liu and Chen [LC10], among others. The approach is especially convenient for incorporating side information expressed through equality constraints. Qin and Lawless [QL94] linked empirical likelihood with finitely many estimating equations, which serve as finitely many equality constraints.
In semiparametric settings, information on the model can often be expressed by means of infinitely many constraints, which may also depend on parameters of the model. In goodness of fit testing, the null hypothesis can typically be expressed by infinitely many such constraints. This is the case when testing for a fixed distribution, when testing for a given parametric model, when testing for symmetry about a fixed point, and when testing for independence; see the examples in Section 2 below. Modeling conditional expectations can also be done by means of infinitely many constraints. This has applications to heteroscedastic regression models (Section 3) and to the conditional moment restriction models treated by Tripathi and Kitamura [TK03] using a smoothed empirical likelihood approach.
Recently, Hjort, McKeague and Van Keilegom [HMV09] extended the scope of the empirical likelihood method. In particular, they developed a general theory for constraints with nuisance parameters and considered the case of infinitely many constraints. Their results for infinitely many constraints, however, do not allow for nuisance parameters. In this paper we fill this gap and, in the process, improve on their results. Let us now discuss some of our results in the following special case.
Let be independent copies of a random vector with distribution . Let be orthonormal elements of
Then the random variables have mean zero, variance one and are uncorrelated. Now consider the empirical likelihood based on the first of these functions,
where denotes the closed probability simplex in dimension . For fixed , it follows from Owen’s work that has asymptotically a chisquare distribution with degrees of freedom. In other words,
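The mechanics behind this chi-square limit can be illustrated numerically in the simplest setting. The following is a minimal sketch, assuming a single known mean-zero constraint (the case r = 1): the profile empirical log-likelihood ratio is obtained by solving the usual Lagrange-multiplier equation by bisection. The function name `neg2_log_elr` and the solver details are illustrative, not from the paper.

```python
import math

def neg2_log_elr(w, tol=1e-12):
    """-2 log empirical likelihood ratio for testing that the mean of the
    (scalar) constraint values w_1, ..., w_n is zero.  Solves the dual
    equation sum_i w_i / (1 + lam * w_i) = 0 by bisection."""
    lo_w, hi_w = min(w), max(w)
    if not (lo_w < 0.0 < hi_w):
        # Owen's condition: the origin must be interior to the convex hull
        raise ValueError("origin must be interior to the convex hull")
    # lam must keep every weight positive: 1 + lam * w_i > 0
    lo = -1.0 / hi_w + tol
    hi = -1.0 / lo_w - tol

    def g(lam):
        # strictly decreasing in lam, +inf at lo, -inf at hi
        return sum(wi / (1.0 + lam * wi) for wi in w)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * wi) for wi in w)
```

For data whose sample mean is already zero the statistic vanishes (lam = 0), while a nonzero sample mean produces a positive statistic; under the null with r fixed, the statistic is asymptotically chi-square with r degrees of freedom.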
(1) 
where denotes the quantile of the chi-square distribution with degrees of freedom. Hjort et al. [HMV09] have shown that (1) holds under some additional assumptions even if tends to infinity with by proving the asymptotic normality result
(2) 
This result requires higher moment assumptions on the functions and restrictions on the rate at which can tend to infinity. For example, if the functions are uniformly bounded, then the rate suffices for (2). They also state, in their Theorem 4.1, that if is finite for some , then suffices for (2). A gap in their argument was fixed by Peng and Schick [PS12]. We shall show that larger are allowed in some cases. In particular, for , it suffices that holds (instead of their ), and if , then is enough (instead of their ); see our Theorems 7.2 and 7.3 below.
Our rate for matches the rate given in Theorem 2 of Chen, Peng and Qin [CPQ09]. These authors obtain asymptotic normality for larger than in Hjort et al. [HMV09] by imposing additional structural assumptions. These assumptions, however, are typically not met in the applications we have in mind.
One of the key points in our proof is a simple condition for the convex hull of some vectors to have the origin as an interior point. Our condition is that the smallest eigenvalue of exceeds . Here, denotes the Euclidean norm of a vector . This sufficient condition ties in nicely with the other requirements used to establish the asymptotic behavior of the empirical likelihood and is typically implied by them. For example, conditions (A1)–(A3) in Theorem 2.1 of Hjort et al. [HMV09] already imply their (A0). Thus, the conclusion of their theorem is valid under (A1)–(A3) alone; see our Theorem 6.1.
Let us now look at the case when the functions are unknown. Then we can work with the empirical likelihood
where is an estimator of such that
(3) 
Now, we have the conclusion under the condition
(4) 
and mild additional conditions such as

[(ii)]

for some constant and all and , or

and .
Our results, however, go beyond this simple result. If (4) is replaced by
(5) 
with a measurable function into which is standardized under in the sense that and , the identity matrix, then the conclusion holds under (i) or (ii).
Our paper is organized as follows. In Section 2, we give four examples that motivate our research. The emphasis in these examples is on goodness of fit testing. The proposed empirical likelihood based goodness of fit tests are asymptotically distribution free. For univariate observations, tests for a specified distribution, for a distribution of parametric form, and for a symmetric distribution are presented. For bivariate observations, tests for independence are discussed. Another example is given in Section 3, together with a small simulation study. This example considers tests for the regression parameters in simple linear heteroscedastic regression. The simulations compare our new procedure based on infinitely many constraints with the classical empirical likelihood procedure and illustrate the improvements offered by the new procedure. In Section 4, we introduce notation and recall some results on the spectral norm of matrices. In Section 5, we derive a lemma that extracts the essence of the proofs in Owen ([O01], Chapter 11) and also establishes the aforementioned sufficient condition for the convex hull of vectors to contain the origin as an interior point. The results are derived for nonstochastic vectors and formulated as inequalities. These inequalities are used in Section 6 to obtain the behavior of the empirical likelihood with random vectors whose dimension may increase. The results there are formulated abstractly and do not require independence. In Section 7, we specialize our results to the case of independent observations with infinitely many constraints, both known and unknown. We also briefly discuss the behavior under contiguous alternatives. The details for our examples are given in Section 8.
2 Motivating examples
In this section, we give examples that motivated the research in this paper.
[(Testing for a fixed distribution)] Let be independent copies of a random variable . Suppose we want to test whether their common distribution function equals a known continuous distribution function . Under the null hypothesis, we have for every , and has a uniform distribution. An orthonormal basis of is thus given by for any orthonormal basis of , where is the uniform distribution on . We shall work with the trigonometric basis defined by
(6) 
as these basis functions are uniformly bounded by . As test statistic, we take
which uses the first of the trigonometric functions. Under the null hypothesis, we have for every as both and tend to infinity and tends to zero. Thus, the test has asymptotic size . Here, we are still in the framework of Hjort et al. [HMV09] with infinitely many known constraints.
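The orthonormality underlying this construction is easy to verify numerically. The sketch below assumes the trigonometric basis φ_k(u) = √2 cos(kπu) on [0, 1], a standard choice that is uniformly bounded by √2; since display (6) is not reproduced above, this particular basis is an assumption, and all function names are illustrative.

```python
import math

def phi(k, u):
    # assumed trigonometric basis function on [0, 1], bounded by sqrt(2)
    return math.sqrt(2.0) * math.cos(k * math.pi * u)

def inner(j, k, m=20000):
    # midpoint-rule approximation of the L2([0, 1]) inner product
    return sum(phi(j, (i + 0.5) / m) * phi(k, (i + 0.5) / m)
               for i in range(m)) / m

def mean_under_uniform(k, m=20000):
    # approximates E[phi_k(U)] for U uniform on [0, 1]
    return sum(phi(k, (i + 0.5) / m) for i in range(m)) / m
```

Under the null hypothesis, U_i = F_0(X_i) is uniform on [0, 1], so each φ_k(U_i) has mean zero and variance one, and distinct basis functions are uncorrelated, which is exactly the structure the empirical likelihood constraints exploit.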
[(Testing for a parametric model)] Let be again independent and identically distributed random variables. But now suppose we want to test whether their common distribution function belongs to a model indexed by an open subset of . Suppose that the distribution functions have densities such that the map is continuously differentiable in with derivative and the matrix is invertible for each . In this case we set . Let now be an estimator of the parameter in the model. We require it to satisfy the stochastic expansion
(7) 
for each , where is the measure for which . Such estimators are efficient in the parametric model; maximum likelihood estimators are natural candidates. As test statistic we take , the test statistic from the previous example with replaced by . Here, we are no longer in the framework of Hjort et al. [HMV09], as we now have infinitely many unknown constraints. We shall show that, under the null hypothesis, for every as both and tend to infinity and tends to zero. In view of this result, the test has asymptotic size . It is crucial for our result that we have chosen an estimator satisfying (7).
[(Testing for symmetry)] Let be independent copies of a random variable with a continuous distribution function . We want to test whether is symmetric about zero in the sense that for all real . Under the null hypothesis of symmetry, the random variables and are independent, and takes values and with probability one half. This is equivalent to for every , where is the distribution function of . Since is continuous, an orthonormal system of is given by where and are given in (6). This suggests the test statistic
where and is the empirical distribution function based on . We shall show that under symmetry one has for every as and tend to infinity and tends to zero. From this, we derive that the test has asymptotic size .
[(Testing for independence)] Let be independent copies of a bivariate random vector . We assume that the marginal distribution functions and are continuous. We want to test whether and are independent. Independence is equivalent to for all and and thus equivalent to for all positive integers and .
Assume first that and are known. This is for example the case in an actuarial setting where and denote residual lifetimes and their distribution functions are available from life tables. Motivated by the above, we take as test statistics
Under the null hypothesis, one has for every as and tend to infinity and tends to zero. Here, we are in the framework of Hjort, McKeague and Van Keilegom [HMV09]. The above shows that the test has asymptotic size .
Now assume that and are unknown. In this case, we replace both marginal distribution functions by their empirical distribution functions. The resulting test statistic is , where denotes the empirical distribution based on and the one based on . We shall show that under the null hypothesis for every as and tend to infinity and tends to zero. Thus the test has asymptotic size .
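A sketch of how estimated constraint functions can be built from the data in this case: the unknown marginals are replaced by their empirical distribution functions, which amounts to a rank transform, and the constraints are products of basis functions of the two transformed coordinates. The basis φ_k(u) = √2 cos(kπu) and all names below are illustrative assumptions, not the paper's notation; ties are excluded, as the marginals are assumed continuous.

```python
import math

def ecdf_values(xs):
    # empirical distribution function evaluated at the data points:
    # F_hat(x_i) = rank(x_i) / n  (no ties, since marginals are continuous)
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / n for r in ranks]

def product_constraints(xs, ys, J, K):
    """One row per observation; each row holds the J*K products
    phi_j(F_hat(x)) * phi_k(G_hat(y)) used as estimated constraints."""
    phi = lambda k, u: math.sqrt(2.0) * math.cos(k * math.pi * u)
    fu, gv = ecdf_values(xs), ecdf_values(ys)
    return [[phi(j, u) * phi(k, v)
             for j in range(1, J + 1) for k in range(1, K + 1)]
            for u, v in zip(fu, gv)]
```

Under independence the products have mean zero (up to the estimation error in the empirical marginals), which is what makes the resulting test asymptotically distribution free.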
Suppose that form a simple linear homoscedastic regression model, , with and independent. We can use the test statistic from case (b) to test whether the slope parameter is zero. Indeed, is equivalent to the independence of and .
The asymptotic distributions of the above tests under contiguous alternatives are linked to noncentral chi-square distributions; see Remark 7 for details. As the noncentrality parameters are bounded, the local asymptotic power along such a contiguous alternative coincides with the level. Our tests are asymptotically equivalent to Neyman's smooth tests [N39] with increasing dimensions. In view of the optimality results of Inglot and Ledwina [IL96] for those tests under moderate deviations, we expect similar results for our tests. Of course, this needs to be explored more carefully.
3 Another example and simulations
Let be independent copies of , where , with , bounded and bounded away from zero, and . Assume that has a finite variance and a continuous distribution function . We are interested in testing whether the regression parameter equals some specific value . We could proceed as in Owen [O91] and use the test based on the empirical likelihood
But this empirical likelihood does not use all the information of the model. Here we have for every . Since is continuous (but unknown), we work with the empirical likelihood
where and is the empirical distribution function based on the covariate observations . It follows from Corollary 7.6 and Lemma 8.1 below that if . The resulting test is . Both tests have asymptotic size .
We performed a small simulation study to compare the procedures. For our simulation, we chose and and took . We modeled the error as , with and independent of . As distributions for , we chose the exponential distribution with mean 5 (Exp(5)) and the t distribution with three degrees of freedom (t(3)), while for we chose the standard normal distribution (N(0, 1)) and the double exponential distribution with location 0 and scale 0.5 (DE(0, 0.5)).
  |  r = 0  2  3  4  5  |  r = 0  2  3  4  5
0.6  2.3  0.71  0.88  0.86  0.85  0.84  0.38  0.37  0.39  0.40  0.41  
0.8  1.5  0.68  0.82  0.84  0.83  0.83  0.95  0.99  0.99  0.99  0.99  
1.0  2.0  0.13  0.09  0.10  0.12  0.13  0.12  0.07  0.09  0.12  0.14  
1.2  2.2  0.37  0.42  0.43  0.43  0.44  0.51  0.54  0.52  0.50  0.52  
1.4  1.7  0.71  0.88  0.87  0.86  0.86  0.37  0.34  0.37  0.40  0.44  
0.6  2.3  0.89  0.98  0.99  0.98  0.98  0.61  0.64  0.68  0.71  0.74  
0.8  1.5  0.84  0.96  0.98  0.98  0.98  0.93  1.00  1.00  1.00  1.00  
1.0  2.0  0.14  0.10  0.14  0.17  0.21  0.13  0.10  0.11  0.14  0.17  
1.2  2.2  0.57  0.70  0.70  0.70  0.74  0.68  0.84  0.84  0.82  0.83  
1.4  1.7  0.89  0.99  0.99  0.99  0.99  0.62  0.67  0.72  0.73  0.76 
Table 1 reports simulated powers of the tests and (with several choices of ) and for some values of . The reported values are based on 1000 repetitions. The column labeled 0 corresponds to Owen's test , while the columns labeled 2, 3, 4, 5 correspond to our tests with , respectively. Clearly, our new test is more powerful than the traditional test. The values in the rows corresponding to the parameter values are the observed significance levels for the nominal significance level . Overall, the observed significance levels of our new test are closer to the nominal level than those of the traditional test, except for .
4 Notation
In this section, we introduce some of the notation used throughout. We write for the Euclidean norm and for the operator (or spectral) norm of a matrix , which are defined by
In other words, the squared Euclidean norm equals the sum of the eigenvalues of , while the squared operator norm equals the largest eigenvalue of . Consequently, the inequality holds. Thus, we have
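The relation between the two norms is easy to check numerically. A minimal sketch (2×2 case only; the names `frob` and `op_norm` are illustrative): the Frobenius norm is computed entrywise, and the spectral norm is approximated by power iteration on AᵀA, whose largest eigenvalue is the squared operator norm.

```python
import math

def frob(A):
    # Euclidean (Frobenius) norm: square root of the sum of squared entries
    return math.sqrt(sum(a * a for row in A for a in row))

def op_norm(A, iters=500):
    # spectral norm of a 2x2 matrix via power iteration on A^T A
    (a, b), (c, d) = A
    m = [[a * a + c * c, a * b + c * d],
         [a * b + c * d, b * b + d * d]]  # A^T A
    x = [1.0, 1.0]
    for _ in range(iters):
        y = [m[0][0] * x[0] + m[0][1] * x[1],
             m[1][0] * x[0] + m[1][1] * x[1]]
        norm = math.hypot(*y)
        x = [y[0] / norm, y[1] / norm]
    # Rayleigh quotient at the (unit) limit vector = largest eigenvalue
    lam = (x[0] * (m[0][0] * x[0] + m[0][1] * x[1])
           + x[1] * (m[1][0] * x[0] + m[1][1] * x[1]))
    return math.sqrt(lam)
```

For the diagonal matrix with entries 3 and 4, the Frobenius norm is 5 while the operator norm is 4, consistent with the inequality stated above.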
for compatible vectors . We should also point out the identity
If is a nonnegative definite symmetric matrix, this simplifies to
Using this and the Cauchy–Schwarz inequality, we obtain
(8)  
(9) 
whenever is a measure and and are measurable functions into and such that and are finite. As a special case, we derive the inequality
and therefore
(10) 
with
for vectors of the same dimension.
5 A maximization problem
Let be dimensional vectors. Set ,
and let and denote the smallest and largest eigenvalue of the matrix ,
Using Lagrange multipliers, Owen [O88], [O01] obtained the identity
if there exists a in such that , , and
(11) 
He also showed that such a vector exists and is unique if (i) the origin is an interior point of the convex hull of and (ii) the matrix is invertible. Let us now show that the inequality implies these two conditions. Indeed, the matrix is then positive definite, and hence invertible, as its smallest eigenvalue is positive. To show (i), we rely on the following lemma.
Lemma 5.1
A random variable with and for some positive obeys the inequality
Fix in . By the properties of , we obtain and .
The origin is an interior point of the convex hull of if for every unit vector there is at least one such that . This latter condition is equivalent to
For a unit vector , we have and thus
It follows from the triangle inequality that for . Note that is positive if is positive definite. Thus, Lemma 5.1 yields the lower bound with
Thus, we have . This shows that the inequality implies and hence the desired condition (i).
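The geometric condition established here, that the origin is an interior point of the convex hull, can be checked numerically in two dimensions by scanning unit directions: the origin fails to be interior exactly when all the vectors lie in some closed half-plane. A grid-based (hence approximate) sketch; the function name is illustrative.

```python
import math

def origin_interior_2d(zs, m=720):
    """Numerical check, over m grid directions, of the condition used in
    the text: for every unit vector u there must be some z_i with a
    positive inner product <u, z_i>.  Returns False if some scanned
    direction u has all inner products <= 0 (all z_i in a half-plane)."""
    for t in range(m):
        a = 2.0 * math.pi * t / m
        u = (math.cos(a), math.sin(a))
        if max(u[0] * z[0] + u[1] * z[1] for z in zs) <= 0.0:
            return False
    return True
```

For the four unit vectors along the coordinate axes the origin is interior, while three vectors confined to a half-plane fail the check. Being grid-based, the function can miss a separating direction that falls between grid points, so it is a diagnostic, not a proof.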
Assume now that the inequality holds. We proceed as on page 220 of Owen [O01]. Let be a unit vector such that . Then we have the identity
and the inequality
Consequently, we find and obtain the bound
(12) 
From this, one immediately derives
(13)  
(14)  
(15) 
The identity and (14) yield
for vectors of the same dimension. Taking , we derive with the help of (11)
Using , the Cauchy–Schwarz inequality, (13) and (15) we bound the square of the first summand of the righthand side by
and the square of the second summand by
Combining the above, we obtain
(16) 
Using the inequality valid for , and then (14) we derive
With , we can write and , and obtain the identity . Using this and (16), we arrive at the bound
In view of (12) and , this becomes
(17) 
If we bound by and by and use (13), we obtain the bound
(18) 
Thus, we have proved the following result.
6 Applications with random vectors
We shall now discuss implications of Lemma 5.2 for the case when the vectors are replaced by random vectors. We are interested in the case when the dimension of the random vectors increases with .
Let be dimensional random vectors. With these random vectors we associate the empirical likelihood
To study the asymptotic behavior of , we introduce
and the matrix
and let and denote the smallest and largest eigenvalues of ,