Enhancements of nonparametric generalized likelihood ratio test: Bias-correction and dimension reduction

Cuizhen Niu, Xu Guo and Lixing Zhu
(Corresponding author: Lixing Zhu, email: lzhu@hkbu.edu.hk. The research described here was supported by a grant from the University Grants Council of Hong Kong, China, and by the Outstanding Innovative Talents Cultivation Funded Programs 2014 of Renmin University of China.)
School of Statistics, Renmin University of China, Beijing
Nanjing University of Aeronautics and Astronautics, Nanjing
Department of Mathematics, Hong Kong Baptist University, Hong Kong

Abstract: The nonparametric generalized likelihood ratio test is popularly used for model checking in regressions. However, two issues may hamper its power. First, the bias term in its limiting null distribution prevents the test from controlling the type I error well, so Monte Carlo approximation is required to determine critical values. Second, it suffers severely from the curse of dimensionality owing to the use of multivariate nonparametric function estimation. The purpose of this paper is thus two-fold: a bias-correction is suggested for this test, and a dimension reduction-based model-adaptive enhancement is recommended to promote the power performance. The proposed test still possesses the Wilks phenomenon, and the test statistic converges to its limit at a much faster rate and is much more sensitive to alternative models than the original nonparametric generalized likelihood ratio test, as if the dimension of covariates were one. Simulation studies are conducted to evaluate the finite sample performance and to compare with other popularly used tests. A real data analysis is conducted for illustration.


Keywords: Dimension reduction; Model adaption; Nonparametric generalized likelihood ratio; Bias-correction; Wilks phenomenon.

1 Introduction

Parametric regression models, which assume specific parametric forms for the regression relationship between the response and predictors, are commonly used and studied in practice. This is due to their well-established theory, easy implementation and interpretation. However, when the dimension of the predictor vector is high, or even moderate, it is difficult to correctly specify the regression form, and one may be concerned about whether a specific parametric regression fits the data adequately. Therefore, it is necessary to conduct model checking to determine a suitable regression model before any further statistical analysis. To avoid model mis-specification, nonparametric regression models have been proposed and investigated. In this case, however, the nonparametric estimation is usually inaccurate, a problem documented as the ‘curse of dimensionality’.

Consider the following parametric single-index model

(1.1)

where is the response, is the covariate, and are parameter vectors of dimensions and , respectively, and . Here is a known link function and the superscript denotes transposition. The nonparametric regression model takes the following form:

(1.2)

where is the unknown mean function and . Several proposals exist in the literature to test model (1.1) against model (1.2). To the best of our knowledge, existing tests can usually be classified into two categories: local smoothing methods and global smoothing methods, each with its own advantages and disadvantages. The former generally relies on nonparametric regression estimators, whereas the latter involves empirical processes. As commented in Guo et al. (2014), the former is more sensitive to high-frequency alternative models while the latter may be in favor of smooth alternatives. On the other hand, it is well known that the former suffers severely from the curse of dimensionality since the typical convergence rate is only , which is very slow in high-dimensional situations. Due to data sparseness in high-dimensional space, the power of the latter often drops very significantly. Further, for these classical tests, computationally intensive re-sampling techniques or Monte Carlo approximations are usually needed to determine critical values or compute -values.

A variety of methods fall into the first class. Among others, a quadratic conditional moment-based test was proposed by Zheng (1996), a minimum distance test was developed by Koul and Ni (2004), and a distribution distance test was introduced by Van Keilegom et al. (2008). See also Härdle and Mammen (1993), Dette (1999) and Zhang and Dette (2004). Turning to the latter category, a test based on a residual-marked empirical process was presented by Stute (1997). An innovation approach for the above empirical process was suggested by Stute et al. (1998b). A relevant reference for generalized linear models is Stute and Zhu (2002). Koul and Stute (1999) studied a class of tests for an autoregressive model, based on empirical processes marked by certain residuals. Stute et al. (1998a) recommended the wild bootstrap to approximate the limiting null distribution of the empirical process-based test statistic. For a comprehensive review, see González-Manteiga and Crujeiras (2013).

Inspired by the classical likelihood ratio test, Fan et al. (2001) introduced a nonparametric generalized likelihood ratio (NGLR) test for the hypothetical model in (1.1) versus the alternative model in (1.2). A significant property of NGLR is that its limiting null distribution does not depend on nuisance functions, exhibiting what is known as the Wilks phenomenon. This test has been applied in many different situations such as varying-coefficient partially linear models (Fan and Huang, 2005) and partially linear additive models (Jiang et al., 2007). For more details, see Fan and Jiang (2007). However, similar to other local smoothing tests, NGLR also suffers from the curse of dimensionality because of the inevitable use of multivariate nonparametric function estimation. To be precise, under the null hypothesis, the NGLR test statistic converges to its limit at the rate of order and can only detect alternatives distinct from the null hypothesis at the rate of order . Further, its limit has a bias term and thus, as Fan and Li (1997) and later research in this field pointed out, this bias term makes it difficult to control the type I error. Also, as Dette and von Lieres und Wilkau (2001) pointed out, for finite sample sizes, the bias, which converges to infinity, has to be taken into account; see also Zhang and Dette (2004). To determine critical values, resampling/Monte Carlo approximations such as the bootstrap are required even when the Wilks phenomenon holds. But the bootstrap approximation is computationally intensive and, even so, the type I error cannot be well controlled when the dimension of the predictors is high. We will see this in the simulation section below. Second, when the dimension is even moderate, the power can be very low. In the simulation section, we will show that the power drops significantly as the dimension increases. Obviously, it is desirable to have a test that can control the type I error well without the assistance of resampling approximation and, more importantly, can handle high-dimensional data with acceptable performance.

Note that for the parametric single-index model (1.1), the regression relationship of the response depends on the predictors only through the one-dimensional index . In other words, can capture all the regression information on the response. Thus, model (1.1) can be viewed as a model with a dimension-reduction structure. Motivated by this feature, Stute and Zhu (2002) proposed a test for model (1.1) that is fully based on , as follows:

where and are, under the null hypothesis, two respective root- consistent estimates of and , and is the indicator function. This test statistic largely avoids the dimensionality problem. But it is obviously a directional test that cannot detect general alternatives. Guo et al. (2015) discussed this problem in detail and proposed a test that has the nature of dimension reduction and is still omnibus.
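
To fix ideas, the following is a minimal sketch of such a dimension-reduction statistic built on a residual-marked empirical process indexed by the fitted single index. The Cramér–von Mises functional and the helper `fit_null` are illustrative assumptions, not the exact formulation of Stute and Zhu (2002).

```python
import numpy as np

def stute_zhu_type_statistic(x, y, fit_null):
    """Sketch of a dimension-reduction test based on a residual-marked
    empirical process indexed by the fitted single index.

    x: (n, p) covariates; y: (n,) responses.
    fit_null: assumed helper returning fitted values and the fitted index
              beta_hat^T x_i under the hypothetical parametric model.
    """
    y_hat, index = fit_null(x, y)      # parametric fit under the null model
    resid = y - y_hat                  # residuals under the null fit
    order = np.argsort(index)          # the process is indexed by beta_hat^T x
    n = len(y)
    process = np.cumsum(resid[order]) / np.sqrt(n)   # R_n at ordered index points
    return np.mean(process ** 2)       # Cramer-von Mises type functional
```

Because the process is indexed by the scalar fitted index rather than by the full covariate vector, the statistic behaves essentially as in a one-dimensional problem.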

In this paper, we try to solve simultaneously these two problems from which the original NGLR suffers, by reducing both the bias and the dimensionality. To this end, we propose a bias-corrected version of NGLR such that the bias is asymptotically removed, and we combine the idea in Guo et al. (2015) with NGLR to circumvent the curse of dimensionality. The details are as follows.

Consider a more general alternative model:

(1.3)

where is an orthonormal matrix with being a -dimensional identity matrix, is unknown with , is an unknown smooth function and . Under the null hypothesis, model (1.3) reduces to model (1.1) with and . Here denotes the norm. Under the alternative hypothesis, model (1.3) can be regarded as a multi-index model with indices. If the alternative model is a purely nonparametric regression model as in (1.2), equals .

As the bias comes from the slow convergence rate of the nonparametric estimation in the residual sum of squares, we consider instead a sum of residual products from the parametric and nonparametric fits. It will be confirmed theoretically that the bias can be asymptotically removed. As a by-product, the bias-corrected NGLR statistic also has a smaller asymptotic variance. The simulation studies later demonstrate its advantages in significance level maintenance and power enhancement.

As for dimension reduction, the key is how to adapt to the model structure under the null hypothesis and, under the alternative, how the test can automatically adapt to the model structure so that the test is omnibus. To achieve this goal, we need to automatically identify and , and then construct a test based on this model-adaptive identification to fully use the information provided by the underlying model. The estimation procedure will be described in the next section. The model-adaptive version of NGLR has two desired features: the Wilks phenomenon still holds, and the test statistic converges to its limit at the rate of order under the null hypothesis and can detect local alternatives distinct from the null at the rate of order rather than .

The rest of this article is organized as follows. In Section 2, the model-adaptive enhancement versions of the NGLR test, both without and with bias-correction, are constructed. The methods to estimate and to identify the structural dimension are also presented in this section. The asymptotic properties under the null hypothesis and under local and global alternative hypotheses are investigated in Section 3. Simulation studies and a real data analysis are conducted to evaluate the finite sample performance in Sections 4 and 5, respectively. All the technical proofs are relegated to the Appendix.

2 Test statistic construction and relevant properties

The hypotheses are:

2.1 Model-adaptive enhancement of the NGLR test

Suppose is a random sample and the error in the nonparametric model (1.2) is i.i.d. . The normality is used only to motivate the test statistic construction. As Fan et al. (2001) did, the conditional log-likelihood function of given can be written as:

Under the null hypothesis ,

(2.1)

Here is the residual sum of squares under the hypothetical model (1.1), that is,

(2.2)

where and denote the ordinary least squares (OLS) estimates of the parameters and , respectively. Further, maximizing the likelihood in (2.1) with respect to the nuisance parameter yields , and substituting this estimate into (2.1) yields the following log-likelihood function:

(2.3)

Under the alternative hypothesis , with a similar argument, we can obtain

(2.4)

Here is an estimate of , and is the residual sum of squares under the alternative model (1.3), that is,

(2.5)

We will later specify the estimate of .

Under the null hypothesis , should be close to , while under the alternative hypothesis , they should deviate from each other. This motivates the following test statistic:

(2.6)

From the above construction, it may seem that there is no difference from the original NGLR test except for the extra estimate of . However, we will see that this extra estimate of plays a very important role in the model-adaptiveness of the test.

Once an estimate of is available, can be estimated by:

(2.7)

Here

with being a -dimensional kernel function and being the bandwidth. Moreover, is a symmetric kernel with compact support and is of order , that is,

Now we can compare with NGLR. The main difference is the definition of . In Fan et al. (2001),

Here , , with being a -dimensional kernel function. The NGLR statistic is then defined as

(2.8)

We will see that this difference makes the test behave very differently from the original NGLR.
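
For concreteness, the following sketch contrasts the two nonparametric fits that enter the residual sums of squares: a product-kernel Nadaraya–Watson estimator applied to the estimated low-dimensional index versus the same estimator applied to the full -dimensional covariate, as in the original NGLR. The Gaussian kernel and the function names are illustrative assumptions, not the exact estimator in (2.7).

```python
import numpy as np

def nw_fit(z, y, h):
    """Nadaraya-Watson fit at the sample points with a product Gaussian kernel.

    z: (n, q) predictors (the estimated index B_hat^T x, or the full x);
    y: (n,) responses; h: bandwidth.  Returns the fitted values m_hat(z_i).
    """
    z = np.atleast_2d(z)
    diff = (z[:, None, :] - z[None, :, :]) / h        # pairwise scaled differences
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=2))      # product kernel weights
    return w @ y / w.sum(axis=1)

# Model-adaptive fit: smooth only over the estimated low-dimensional index.
# m_hat_adaptive = nw_fit(x @ B_hat, y, h)
# Original NGLR fit: smooth over all p covariates, so the nonparametric
# convergence rate deteriorates as p grows.
# m_hat_full = nw_fit(x, y, h)
```

Only the dimension of the smoothing variable changes; this is what drives the faster convergence rate of the model-adaptive statistic.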

2.2 Bias-corrected version of the test statistic

From Theorem 1 in Section 3 below, it can be seen clearly that there is an asymptotic bias, diverging to infinity, in the limiting null distribution, which has to be taken into account in practice. In this subsection, we propose a bias-correction method to remove this asymptotic bias. The motivation comes from the theoretical investigation of the test statistic. The asymptotic bias of in (2.6) is caused by the slow convergence rate of to in . To eliminate the bias asymptotically, we replace in (2.5) by

(2.9)

where denotes the absolute value and a leave-one-out kernel estimate of is applied. To be precise,

(2.10)

which is different from the original kernel estimate in (2.7). Clearly, the term represents the sum of residual products under the null and alternative hypotheses. The bias-corrected version of the test statistic in (2.6) is defined as:

(2.11)

where

To understand the difference between NGLR and the bias-corrected NGLR under the null hypothesis, we can mainly check the difference between the numerators of the two test statistics because the denominator converges to a constant in probability. Recall that the main reason for the bias term is the slow convergence rate of the nonparametric estimate in to the regression function under the null hypothesis. The parametric fit in has, under the null hypothesis, a faster convergence rate to the error than the nonparametric fit, which removes the bias asymptotically; at the same time, the other nonparametric fit still helps the test detect the alternative.

Bias correction was also investigated by Gozalo and Linton (2001) for an additivity test motivated by Lagrange multiplier tests; however, they did not investigate model checking. Further, unlike Gozalo and Linton (2001), a leave-one-out kernel estimate is applied here, which proves to be useful: without it, the asymptotic bias, although reduced, would still exist. We also use absolute summands in to avoid possible negative values.
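
The construction above can be summarized in a few lines of code. The following sketch forms the bias-corrected residual quantity from the parametric residuals and the leave-one-out kernel residuals computed on the estimated index; the Gaussian kernel and the helper names are assumptions made only for illustration.

```python
import numpy as np

def leave_one_out_nw(z, y, h):
    """Leave-one-out Nadaraya-Watson fit m_hat_{-i}(z_i) with a Gaussian kernel."""
    z = np.atleast_2d(z)
    diff = (z[:, None, :] - z[None, :, :]) / h
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=2))
    np.fill_diagonal(w, 0.0)                      # drop the i-th observation
    return w @ y / w.sum(axis=1)

def bias_corrected_rss(y, y_hat_param, index, h):
    """Sum of absolute residual products: parametric residual under H0 times
    the leave-one-out nonparametric residual on the estimated index."""
    e_param = y - y_hat_param                     # residuals from the parametric fit
    e_nonpar = y - leave_one_out_nw(index, y, h)  # leave-one-out residuals
    return np.sum(np.abs(e_param * e_nonpar))
```

Under the null hypothesis the parametric residuals converge to the errors at the root- rate, so the residual product no longer carries the slowly vanishing bias that the squared nonparametric residuals would.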

2.3 Identification and estimation of

Note that model (1.3) is a multi-index model. Here, we first estimate the matrix for a given and then study how to select consistently. To this end, the outer product of gradients (OPG) and minimum average variance estimation (MAVE) methods introduced by Xia et al. (2002) are adopted. OPG is easy to implement and MAVE possesses excellent performance in general. We review these two methods below.

2.3.1 Outer product gradients

Denote and . Note that

has nonzero eigenvalues. Therefore, lies in the space spanned by the eigenvectors of corresponding to its largest eigenvalues. Thus, an estimate of can be obtained by estimating .

To implement the estimation, we first estimate the gradients by the local linear smoother:

where , and . Then the estimate can be obtained by solving the following minimization problem:

where with being a -dimensional kernel function and being a bandwidth. The corresponding estimating equation can be rewritten as

The estimate of can be constructed as:

(2.12)

Thus, the eigenvectors that are associated with the largest eigenvalues of can be regarded as the estimate of the matrix .
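
As a concrete illustration, the following sketch implements the OPG idea: estimate the gradient at each sample point by a local linear fit, average the outer products of the estimated gradients, and take the leading eigenvectors. The Gaussian kernel and the weighted least squares formulation are simplifying assumptions used only to make the sketch self-contained.

```python
import numpy as np

def opg_directions(x, y, h, q):
    """Outer product of gradients (OPG) sketch.

    x: (n, p) covariates; y: (n,) responses; h: bandwidth; q: working dimension.
    Returns a (p, q) orthonormal matrix spanning the estimated index space.
    """
    n, p = x.shape
    sigma = np.zeros((p, p))
    for j in range(n):
        d = x - x[j]                                      # local deviations
        w = np.exp(-0.5 * np.sum((d / h) ** 2, axis=1))   # Gaussian kernel weights
        z = np.hstack([np.ones((n, 1)), d])               # local linear design
        zw = z * w[:, None]
        coef = np.linalg.lstsq(zw.T @ z, zw.T @ y, rcond=None)[0]
        grad = coef[1:]                                   # estimated gradient at x_j
        sigma += np.outer(grad, grad) / n                 # average outer product
    eigval, eigvec = np.linalg.eigh(sigma)
    return eigvec[:, ::-1][:, :q]                         # leading q eigenvectors
```

The eigenvalues of the averaged outer-product matrix are also the natural input for the ridge-type ratio estimate of the structural dimension discussed in Section 2.4.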

2.3.2 Minimum average variance estimation

The minimum average variance estimation (MAVE) method was first proposed by Xia et al. (2002). This method requires no strong assumptions on the probabilistic structure of . At the population level, MAVE minimizes the objective function

This motivates an estimation procedure for . Using the local linear smoother for , can be estimated by . Finally, the average over all is taken to obtain the estimated objective function.

In summary, when a sample is available, we can estimate by minimizing

over all satisfying , and . The details of the algorithm can be found in Xia et al. (2002).
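
The following is a stripped-down alternating least squares sketch of MAVE. The refined weighting scheme of Xia et al. (2002), their initialization, and other implementation details are omitted; the Gaussian kernel, the random initial matrix, and the unconstrained update followed by re-orthonormalization are simplifying assumptions made only so that the sketch is self-contained.

```python
import numpy as np

def mave_directions(x, y, h, q, n_iter=20):
    """Simplified MAVE sketch: alternate between local linear coefficients
    (a_j, b_j) for fixed B and a least squares update of B for fixed (a_j, b_j)."""
    n, p = x.shape
    rng = np.random.default_rng(0)
    B = np.linalg.qr(rng.normal(size=(p, q)))[0]           # crude initial guess
    for _ in range(n_iter):
        a, b, W = np.zeros(n), np.zeros((n, q)), np.zeros((n, n))
        for j in range(n):                                  # local linear fit at x_j
            d = x - x[j]
            w = np.exp(-0.5 * np.sum((d @ B / h) ** 2, axis=1))
            z = np.hstack([np.ones((n, 1)), d @ B])
            zw = z * w[:, None]
            coef = np.linalg.lstsq(zw.T @ z, zw.T @ y, rcond=None)[0]
            a[j], b[j], W[j] = coef[0], coef[1:], w
        G, g = np.zeros((p * q, p * q)), np.zeros(p * q)
        for j in range(n):                                  # quadratic form in vec(B)
            d = x - x[j]
            for i in range(n):
                u = np.kron(b[j], d[i])                     # u @ vec(B) = d_i^T B b_j
                G += W[j, i] * np.outer(u, u)
                g += W[j, i] * (y[i] - a[j]) * u
        vecB = np.linalg.lstsq(G, g, rcond=None)[0]
        B = np.linalg.qr(vecB.reshape(p, q, order="F"))[0]  # re-orthonormalize
    return B
```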

2.4 Estimation of structural dimension

In the above subsection, the structural dimension is assumed to be known. Now we introduce two techniques to estimate it: the ridge-type ratio estimate (RRE) and a Bayesian information criterion (BIC) type estimate, the latter motivated by Zhu et al. (2006) and Wang and Yin (2008).

Ridge-type ratio estimate (RRE). This estimate is similar in spirit to the one proposed by Xia et al. (2014), but with a different ridge value. It is very simple and easy to implement: the ratio of eigenvalues is modified by adding a positive ridge value . Determine by

where are the eigenvalues of in (2.12). The consistency of will be proved below, and the constant is recommended.
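
One plausible form of such a ridge-ratio criterion is sketched below, under the assumption that the estimated dimension minimizes the ratio of consecutive ridge-augmented eigenvalues; the exact criterion and the recommended ridge constant in this paper may differ.

```python
import numpy as np

def ridge_ratio_dimension(eigvals, c):
    """Ridge-type ratio estimate of the structural dimension (illustrative form).

    eigvals: eigenvalues of the estimated OPG matrix;
    c: positive ridge value that stabilizes the ratios of near-zero eigenvalues.
    The assumed criterion picks the index minimizing
    (lambda_{i+1} + c) / (lambda_i + c); other variants exist in the literature.
    """
    lam = np.sort(np.asarray(eigvals))[::-1]       # decreasing order
    ratios = (lam[1:] + c) / (lam[:-1] + c)
    return int(np.argmin(ratios)) + 1              # estimated dimension q_hat
```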

BIC estimate. For MAVE, the criterion is related to the residual sum of squares. The above simple ridge-type ratio estimate cannot be used to determine for the MAVE-based estimate. Instead, we use the modified BIC developed by Wang and Yin (2008):

where is the residual sum of squares and is the estimate of the dimension:

Under some mild conditions, the consistency of has been shown in Theorem 1 of Wang and Yin (2008).

Remark 1.

The following consistency of is established under both the null and global alternative hypotheses. Under Conditions (C5) and (C7) in the Appendix, with probability going to one, the ridge-type ratio estimate satisfies as . Under the regularity conditions given in Wang and Yin (2008), the MAVE-based estimate satisfies as . Therefore, for an orthogonal matrix , is a consistent estimate of . The behaviour under the local alternative hypotheses will be shown in Section 3.

3 Theoretical results

In this section, the limiting null distributions of the proposed model-adaptive enhancement of the NGLR test in (2.6) and of the bias-corrected version in (2.11) are derived, and their asymptotic properties under local and fixed alternative hypotheses are investigated.

3.1 Limiting null distribution

Let and . Define

(3.1)
(3.2)
(3.3)

where the symbol denotes the convolution operator and . Besides, , and is the density function of . In order to obtain the estimates and for and , we first list the consistent estimates for the quantities , , and as follows:

Therefore, we have

The following theorem states the limiting null distribution of the proposed test statistic in (2.6).

Theorem 1.

Under the null hypothesis in (1.1) and conditions (C1)-(C7) in the Appendix, we have:

Plugging in their consistent estimates and applying Slutsky’s Theorem, the following corollary is easily obtained.

Corollary 1.

Under the null hypothesis in (1.1) and conditions (C1)-(C7) in the Appendix, we have:

(3.4)

Theorem 1 and Corollary 1 characterize the asymptotic normality of the proposed test statistic. The null hypothesis is rejected when exceeds , the upper quantile of the distribution . It is notable that with homoscedastic errors, the asymptotic bias and variance in Theorem 1 are free of any nuisance parameters and nuisance functions. To be precise, and , where denotes the Lebesgue measure of the support . Thus, similar to NGLR, the limiting null distribution of the proposed test statistic is free of nuisance parameters, enjoying the Wilks phenomenon. Note that under . Thus, Theorem 1 and Corollary 1 show that under and the asymptotic bias of is of order . However, for NGLR, and the asymptotic bias is of order . Thus, the convergence rate is greatly improved, the dimensionality effect is significantly reduced, and the bias is much smaller. These results enable the proposed test to control the size better than the NGLR test. The finite sample simulations later also confirm this claim.

We then state the asymptotic property of the bias-corrected version of the test statistic in (2.11) under the null hypothesis.

Theorem 2.

Under the same conditions as those in Theorem 1, we have

where

(3.5)

Compared with Theorem 1, the significant difference is that there is no asymptotic bias in the limiting null distribution of . Another notable feature of is that the asymptotic variance is also reduced since . For a formal proof of this inequality, see, e.g., Dette and von Lieres und Wilkau (2001). We will explore this point further in the power performance studies. In the case of homoscedastic errors, we have , where is again the Lebesgue measure of the support . The above theorem shows that under (and only under) conditional homoscedasticity, does not depend on nuisance parameters and nuisance functions. In this case, the bias-corrected version, like the NGLR statistic in (2.8), also enjoys the Wilks phenomenon. This offers great convenience in implementing the bias-corrected version. As with the original NGLR test, we need to estimate to obtain a standardized version of . To this end, notice that

here and has been defined in (2.10).

We now standardize in (2.11) to get a scale-invariant statistic. According to Theorem 2, the standardized version of is

(3.6)

By the consistency of , an application of Slutsky’s Theorem yields the following corollary.

Corollary 2.

Under the conditions in Theorem 2 and , we have

where is the standard normal distribution.

From this corollary, we can calculate -values easily by using its limiting null distribution.
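
In code, this last step is essentially a one-liner: compare the standardized statistic with the standard normal tail. The snippet below is a sketch in which T_std stands for the standardized bias-corrected statistic, however it has been computed.

```python
from scipy.stats import norm

def p_value(T_std):
    """One-sided p-value from the standard normal limiting null distribution."""
    return 1.0 - norm.cdf(T_std)

# Example: reject H0 at the 5% level when p_value(T_std) < 0.05,
# equivalently when T_std exceeds norm.ppf(0.95) (about 1.645).
```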

3.2 Power study

We are now in a position to study the power performance of the test under a sequence of local alternative models of the following form:

(3.7)

Here, , and is a constant sequence. Denote as the minimizer of

with . Under the null hypothesis, is the true parameter. Further, for the least squares estimate , we always have . We now state the consistency of under the local alternative hypothesis (3.7).

Lemma 1.

Suppose that conditions (C1)-(C7) in the Appendix hold. Under the local alternative (3.7) with , and under the global alternative (1.3), either the OPG-based or the MAVE-based estimate equals 1 or , respectively, with probability going to one as .

To state the following theorem, we first define the notation as

(3.8)

The asymptotic properties under the global and local alternative hypotheses are stated in the following theorem:

Theorem 3.

Given conditions (C1)-(C7) in the Appendix, we have the following results.

(i) Under the global alternative of (1.3),

where is a positive constant.

(ii) Under the local alternative hypotheses in (3.7) with , we have

and .

This theorem indicates that under the global alternative hypothesis, the proposed test is consistent with asymptotic power , and that the test can detect local alternatives distinct from the null at the rate of order rather than the rate of order that NGLR can achieve. We further present the asymptotic properties of the bias-corrected version in (2.11).

Theorem 4.

Assume that conditions (C1)-(C7) in the Appendix hold. We have the following conclusions.

(i) Under the global alternative of (1.3),

where is a positive constant.

(ii) Under the local alternative hypotheses in (3.7) with , we have

and , where have been defined in (3.8) and (3.5), respectively.

Denote

From the above theorems, it can also be shown that the asymptotic powers of and are and , respectively, for alternatives that are distinct from the null at the rate . Since , it is evident that is theoretically more powerful than . Thus, is asymptotically more efficient than under .

Remark 2.

Our preliminary simulation results based on in (3.4) and on the bias-corrected statistic in (3.6) both show that the sizes of the tests are inflated. Therefore, a size-adjustment is adopted:

(3.9)

Note that the size-adjustment value is asymptotically negligible when and thus . The size-adjustment was selected through intensive simulation over many different values, and this one is worth recommending. After such an adjustment, the new tests control the type I errors much better and have higher powers than those without the size-adjustment.

4 Simulation study

Denote the adjusted test statistics in (3.9) based on the OPG and MAVE methods as and , respectively; the corresponding bias-corrected versions are denoted as and . In this section, three numerical studies are carried out to examine the finite sample performance of the proposed tests. The first study examines and compares the performance among our four proposed tests; the effect of nonlinearity under the null hypothesis on the performance of the tests is also considered here. The objective of the second study is to examine how much improvement our method makes compared with the NGLR test proposed by Fan et al. (2001); the impact of correlation among the covariates is also discussed in this study. Since the test proposed by Guo et al. (2015) also addresses the dimensionality problem in model checking, the last study compares our tests with .

Study 1: The data are generated from the following models:

where . The covariates are i.i.d. and generated from a multivariate normal distribution , where is the identity matrix. The error distributions and are considered. Here, we set and . In these two models, corresponds to the null hypothesis and to the alternative hypotheses. Under the alternatives, the last two models are high-frequency and the first two are not. We examine whether our test is powerful for both types of models. Throughout these simulations, unless otherwise specified, the kernel function is taken to be if and otherwise. The bandwidth is selected as . The sample sizes and are considered, and the significance level is set to . Every simulation result is the average of 2000 replications. Empirical sizes (type I errors) and simulated powers of our tests against the alternatives are tabulated in Tables 1 and 2 to compare the performance of the four test statistics and .

Based on Table 1, we can make the following observations. First and most important, for every combination of the random error and the sample sizes we consider, both of the size-adjusted bias-corrected versions