High-dimensional varying index coefficient quantile regression model

# High-dimensional varying index coefficient quantile regression model

\fnmsJIALIANG \snmLIlabel=e1]stalj@nus.edu.sg\thanksreft1 [    \fnmsJING \snmLV\correflabel=e2]lvjing@swu.edu.cn\thanksreft2 [ National University of Singapore\thanksmarkt1 and Southwest University\thanksmarkt2 JIALIANG LI
Department of Statistics and Applied Probability
National University of Singapore
Singapore, 119077
JING LV
School of Mathematics and Statistics
Southwest University
Chongqing, 400715, China
###### Abstract

Statistical learning evolves quickly with more and more sophisticated models proposed to incorporate the complicated data structure from modern scientific and business problems. Varying index coefficient models extend varying coefficient models and single index models, becoming the latest state-of-the-art for semiparametric regression. This new class of models offers greater flexibility to characterize complicated nonlinear interaction effects in regression analysis. To safeguard against outliers and extreme observations, we consider a robust quantile regression approach to estimate the model parameters in this paper. High-dimensional loading parameters are allowed in our development under reasonable theoretical conditions. In addition, we propose a regularized estimation procedure to choose between linear and non-linear forms for interaction terms. We can simultaneously select significant non-zero loading parameters and identify linear functions in varying index coefficient models, in addition to estimate all the parametric and nonparametric components consistently. Under technical assumptions, we show that the proposed procedure is consistent in variable selection as well as in linear function identification, and the proposed parameter estimation enjoys the oracle property. Extensive simulation studies are carried out to assess the finite sample performance of the proposed method. We illustrate our methods with an environmental health data example.

[
\kwd
\startlocaldefs\endlocaldefs\runtitle

QUANTILE VARYING INDEX COEFFICIENT MODEL

{aug}

and \thankstextt1 First author \thankstextt2Corresponding author.

class=MSC] \kwd[Primary ]62F12 \kwd \kwd[62G35; secondary ]62H12, 62J99.

High-dimensional data; Model mis-specification; Non-concave penalty; Quantile regression; Varying index coefficient model.

## 1 Introduction

Semiparametric regression models are powerful statistical learning approaches and become more and more popular in scientific and business research studies since they can enjoy the merits of both parametric and nonparametric models. We consider the varying index coefficient model (VICM) recently proposed in the literature (Ma and Song [30]). This new class of models extends varying coefficient models (Fan and Zhang [13]), single-index models (Xia, Tong, Li, and Zhu [44]), single index coefficient models (Xue and Pang [45]) and almost all other familiar semiparametric models, thus becoming the latest state of the art. To safeguard against outliers and extreme observations, we consider a robust quantile regression approach to fit the VICM in this paper. Specifically, for a given quantile level , varying index coefficient quantile regression models are given by

 Qτ(Y|X,Z)=d∑l=1mτ,l(ZTβτ,l)Xl, (1.1)

where with and are covariates for the response variable , are unknown loading parameters for the th covariate and are unknown nonparametric functions, . Let be the model error with an unspecified conditional density function and conditional cumulative distribution function given . Please note that ’s conditional th quantile equals zero, that is, . In the rest of the article, we drop the subscript from , , , and to simplify the notations, but it is helpful to bear in mind that all those quantities are -specific. For the sake of identifiability, we assume that belongs to the following parameter space: where denotes the norm such that for any vector . Model (1.1) is quite general and includes many other existing models as special cases. For example, (i) when are assumed to be constant or linear function, it reduces to the linear regression model with interactions; (ii) when and , it is the single index model; (iii) when are set as constant for and , it is the partial linear single-index model; (iv) when common coefficient vector are used, it is the single index coefficient model; (v) when and by the definition of we have , it reduces to the varying coefficient model. The VICM is very flexible to model and assess nonlinear interaction effects between the covariate and . Our main interest is to make statistical inference on both the loading coefficients and the nonparametric functions .

Ma and Song [30] proposed a profile least squares estimation procedure for the VICM and established its theoretical properties. Their work focused on mean regression, which is most suitable for nicely distributed data such as Gaussian and may perform badly in the presence of outliers and heavy-tailed errors. Our model (1.1) imposes different assumptions on the error structure and thus produces a novel and robust framework applicable for wider applications. The estimation methods and the associated asymptotic theories are thus totally different from Ma and Song [30].

Since the seminal work of Koenker and Bassett [24], quantile regression has emerged as an important alternative to mean regression. It is well understood that inference based on quantile regression is more robust against distribution contamination (Koenker [23]). A full range of quantile analysis can provide more complete description of the conditional distribution. It is now widely acknowledged that quantile regression based analysis may lead to more appropriate findings. For example, climatologists often pay close attention to how the high quantiles of tropical cyclone intensity change over time (Elsner, Kossin and Jagger [9]), as it not only generates strong winds and waves, but also often results in heavy rain and storm surges, causing serious disasters. In another health sciences example, medical scientists often study the influences of maternal behaviors on the low quantiles of the birth weight distributions (Abrevaya [1]). Consider one more case study from business and economics: petroleum is a primary source of non-renewable energy and has important influence on industrial production, electric power generation and transportation (Marimoutou, Raggad and Trabelsi [31]). Thus, most analysts particularly focus on the high quantiles of oil prices, as oil price fluctuations have considerable impacts on economic activity. The quantile regression framework considered in this paper may impact all these important fields where direct application of mean regression is inappropriate.

In recent decades the classical parametric quantile regression has been integrated with semiparametric models to produce more flexible inference tools. Here we only list a few relevant works among the abundant developments. For single index models, Wu, Yu and Yu [43] developed a robust minimum average variance estimation procedure based on the familiar quantile regression. Kong and Xia [25] combined quantile regression and a penalty function to develop an adaptive quantile estimation algorithm. Ma and He [29] considered a pseudo-profile likelihood approach, which enables a straight forward statistical inference on the index coefficients. Christou and Akritas [6] proposed a non-iterative quantile estimation algorithm for heteroscedastic data, and provided the asymptotic properties of the proposed approach. For varying coefficient model, Tang, Wang and Zhu [34] developed a new variable selection procedure by utilizing basis function approximation and a class of group versions of the adaptive LASSO penalty. Peng, Xu and Kutner [32] considered a shrinkage estimator under quantile regression. For single index coefficient models, Jiang and Qian [20] considered a new estimation procedure to reduce the computing cost of existing back-fitting algorithm. Zhao, Li and Lian [47] developed a bias-corrected quantile estimating equations and presented the fixed-point algorithm for fast and accurate computation of the parameter estimates. Other related works about semiparametric quantile regression include Wang and Zhu [39], Wang, Stefanski and Zhu [37], Zhu, Huang and Li [49], Ji, Peng, Cheng and Lai [19], Jiang, Zhou, Qian and Chen [21], Wang, Zhou, and Li [38], Li and Peng [26], Sun, Peng, Manatunga and Marcus [33], Zhao and Lian [48], among many others.

Another important contribution of this paper is that we consider the high-dimensional learning issues for the new VICM. In fact, recent advances in technologies for cheaper and faster data acquisition and storage have led to an explosive growth of data complexity in a variety of scientific areas such as medicine, economics and environmental science. We have to consider a realistic solution facing the “large , diverging ” data setting. Specifically we will allow the dimension of the covariates to increase to infinity as the sample size increases. Many penalty-based estimation methods are proposed in modern statistical community to address the high dimensional issue (Fan and Li [10]; Bühlmann and van de Geer [3]; Hastie, Tibshirani and Wainwright [16]; Giraud [15]). This framework can effectively reduce the model bias and improve the prediction performance of the fitted model. Fan and Peng [12] first studied nonconcave penalized likelihood estimation when the number of covariates increases with the sample size. Wang, Zhou and Qu [41] extended the method to generalized linear models for longitudinal measurements. High dimensional issue has also been investigated for semiparametric models. Lin and Yuan [28]combined basis function approximation with the SCAD penalty to propose a variable selection procedure for generalized varying coefficient partially linear models with diverging number of parameters, and also established the consistency and oracle property of their method. Wang and Wang [35] applied the SCAD penalty to perform variable selection for single index prediction models with a diverging number of index parameters. Fan, Liu and Lu [11] presented a penalized empirical likelihood approach for high dimensional semiparametric models.

Variable selection for model (1.1) is challenging since the high-dimensional loading parameter is structured within the unknown nonparametric function coefficients. We adopt a spline basis approximation to the estimation of and consequently estimate the unknown loading parameters vector under the sparsity assumption. In addition, we tackle the problem of correctly identifying the linear interaction effects between covariates. That is, we want to decide whether it is necessary to model nonparametrically for all the varying index functions. Ma and Song [30] constructed a generalized likelihood ratio statistic to test whether there exists a linear interaction effect between covariates. Although this test approach works very well for low-dimensional problems, it is computationally infeasible when the number of covariate is large. To this end, we develop a group penalization method that can quickly and effectively differentiate linear functions from nonparametric functions. The theoretical justification is also non-trivial for this complicated setting.

The rest of the paper is organized as follows. In Sect. 2, using the B-spline basis approximation, we construct robust quantile estimating equations for loading parameters and obtain the estimators of unknown nonparametric functions by minimizing the quantile loss function. Asymptotic properties of the proposed estimators are also established in this section. In Sect. 3, we consider high-dimensional issues and describe the variable selection procedure for loading parameters. Theoretical results are also presented including the estimation convergence rate, selection consistency and oracle property of estimators. In Sect. 4, a group penalized method is proposed to identify linear functional effects along with the theoretical properties. In Sect. 5, simulation studies and real data analysis are provided to illustrate our methods. In Sect. 6, we conclude with some remarks. All technical proofs are given in the Appendix B.

### 2.1 Estimation procedures

Suppose that is an independent and identically distributed sample from model (1). Without loss of generality, we assume that is confined in a compact set . B-spline basis functions are commonly used to approximate the unknown smooth functions owing to its desirable numerical stability in practice (de Boor [8]). We thus adopt such a nonparametric approach to estimate the index functions. More specifically, let be a set of B-spline basis functions of order () with internal knots and . We then approximate by a linear combination of B-spline basis functions where is the spline coefficient vector with for .

Let be the quantile loss function where is an indicator function. We obtain the estimators of the spline coefficients and the loading parameters by minimizing

 Lτn(λ,β)=n∑i=1ρτ{Yi−d∑l=1B(ZTiβl)TλlXil} (2.1)

subject to the constraint and . Minimizing (2.1) with respect to all unknown quantities requires a very non-standard nonlinear programming and the solution is usually hard to obtain directly. To address this computing difficulty, we consider an iterative procedure to estimate and . The detailed steps are given below.

Step 0. Initialization step: Obtain an initial value with . For example, one may use the profile least squares estimation proposed by Ma and Song [30].

Step 1. For a given , can be attained by . At this step, the minimization is a standard quantile regression problem and thus can be solved very easily using the R function “rq” from the package “quantreg”. This leads to . The first-order derivative can be approximated by the spline functions of one order lower than that of . That is, where is the first order derivative of the basis function .

In order to estimate , we consider leaving one component of out to acknowledge the constraint . Let be a dimensional parameter vector after removing in . The original loading parameter can be rewritten as

 βl=βl(βl,−1)=(√1−∥∥βl,−1∥∥22,βTl,−1)T,  ∥∥βl,−1∥∥22<1. (2.2)

It is obvious that is infinitely differentiable with respect to and the Jacobian matrix is given by

 Jl(βl,−1)=∂βl∂βl,−1=⎛⎜⎝−βTl,−1/βTl,−1√1−∥∥βl,−1∥∥22√1−∥∥βl,−1∥∥22         Ip−1⎞⎟⎠,

where is a identity matrix. Denote . Then belongs to

 Θ−1={β−1=(βTl,−1:1≤l≤d)T:∥∥βl,−1∥∥22<1,βl,−1∈Rp−1}.

Step 2. Let with the aforementioned definition for . Combining the estimators , and from Step 1, we may construct the quantile regression estimating equations for by setting . However, the equations involve the discontinuous function . This adds difficulty in computation despite there is linear programming solver (eg. Jin, Lin, Wei and Ying [22]). To achieve faster and more stable estimation, we consider an induced smoothing method via approximating by a smooth function (Whang [42]; Brown and Wang [2]; Chiou, Kang and Yan [5]). We introduce , where , is a kernel function and is a bandwidth. Thus, we construct the approximation function . Consequently the smoothed estimating equations are given as

 (2.3)

where with . Then we may employ Fisher scoring algorithm to solve the equations to obtain the estimates. That is

 (2.4)

Step 3. Repeat Steps 1 and 2 until convergence, and denote the final estimators as and . Then, we may apply the formula (2.2) to obtain , and construct the estimators of the nonparametric functions as .

### 2.2 Theoretical properties

To establish asymptotic normality and the convergence rate of the proposed estimators, we need some assumptions and notations. First, let be the true parameters in model (1.1), where and for . Here the subscript in is used to make it explicit that the dimension of loading parameters may depend on . Write to be the norm of a function , and we focus on the space as a collection of functions with finite norm on given by

 M={g(u,x)=d∑l=1gl(ul)xl,Eg2l(ZTβl)<∞}

with and For , we assume that is a minimizer in for the following optimization problem,

 P(Zk)=g0k(U(β0),X)        =d∑l=1g0l,k(ZTβ0l)Xl        =argming∈ME[f(0|X,Z)(Zk−g(U(β0),X))2],

where . This defines a quadratic projection of to be . Next, let , for any matrix , and

 Hn(β0−1)=n∑i=1[f(0|Xi,Zi)(˙ml(ZTiβ0l)XilJ0Tl~Zi)dl=1]⊗2

with for . Suppose and are positive definite. Let () be the order of smoothness of the nonparametric functions given in condition (C2) of the Appendix. We denote if . We first present the consistency and asymptotic normality of .

###### Theorem 2.1.

Suppose that conditions (C1)–(C7) in the Appendix B hold, and . If , then such that , we have

(i) ;

(ii) ,

where means the convergence in distribution.

Following de Boor [8], for any nonparametric function satisfying Condition (C2) in the Appendix B, there exists a best spline approximation function such that for some integer . Let and . Suppose and are positive definite. Let be the vector with the -th element being 1 and other elements being 0, with and

 σ2nl(ul)=n−1τ(1−τ)eTlB(u)Ω−1n(β0)Ψn(β0)Ω−1n(β0)BT(u)el. (2.5)

The following theorem provides the asymptotic results for the nonparametric estimates.

###### Theorem 2.2.

Under conditions (C1)–(C7) in the Appendix B, and , we have for each ,

(i)  uniformly for any ;

(ii) under , .

Let with and , , , and with . Under the conditions of Theorem 2.1, we can show

and as ,

where denotes the convergence in probability.

###### Remark 2.1.

Based on the iterative formula (2.4) and above results, we apply the following sandwich formula to consistently estimate the asymptotic covariance of

 Cov(^β−1)=H−1τn(^β−1)Mτn(^β−1)H−1τn(^β−1). (2.6)

Furthermore, we define as the direct sum of Jacobian matrices with dimension . Then, we can obtain the estimated asymptotic covariance of by .

###### Remark 2.2.

Write and . Under the conditions of Theorem 2.2, we can show and as . Thus, variance of can be consistently estimated by

 Var(^ml(ul,^β))=eTlB(u)C−1τnDτnC−1τnBT(u)el. (2.7)

So far all covariates in model (1.1) are assumed to be important for predicting the response variable. However, the true model is often unknown. On one hand, the fitted models may be seriously biased and non-informative if important predictors are omitted; on the other hand, including spurious covariates may unnecessarily increase the complexity and further reduce the estimation efficiency. Thus, it is a fundamental issue to select variables for the VICM when there is no prior knowledge of the true model form. In particular, we consider estimation when facing a diverging number of loading parameters. As usual we assume the model sparsity in the sense that most of the components of are essentially zero. For selecting important variables and estimating them simultaneously, penalized robust estimating equations are developed as

 Rτnh(β−1)−nbα1(β−1)=0, (3.1)

where

is a vector with and is the first order derivative of the SCAD penalty function, defined by

 ˙pα1(x)=α1{I(x≤α1)+(aα1−x)+(a−1)α1I(x>α1)},

where , and is a nonnegative penalty parameter which regulates the complexity of the model. It is easy to see that is close to zero if is large. Thus little extra bias is introduced by the penalty term. Meanwhile, should be large when is close to zero, which results in these small components being shrunk to zero. An iterative majorize-minorize (MM) algorithm proposed by Hunter and Li [18] can be incorporated to estimate in estimating equations (3.1). Specifically, for a fixed , we can obtain the estimate of using the following iterative procedure

 β(k+1)α1,−1=β(k)α1,−1−[∂Rτnh(β−1)/∂β−1−nΔα1]−1×                         (Rτnh(β−1)−nbα1(β−1))∣β−1=β(k)α1,−1, (3.2)

where and is a small number such as . The above iterative formula is similar to the MM algorithm of Hunter and Li [18], and its convergence can be similarly justified using their proposition 3.3 under the stationary and continuity assumptions.

In general, we define the true coefficients as with and , and , and . Correspondingly, we also divide into two parts, with and . Define and . Here we assume the number of nonzero components in is fixed for , namely, does not vary with .

We establish the following main results for the penalized estimation.

###### Theorem 3.1.

Under conditions (C1)–(C12) in the Appendix B, and . If as , we have .

Let and , where and are sub-matrices of and corresponding to .

###### Theorem 3.2.

Under conditions (C1)–(C12) in the Appendix B, and . If , and as , with probability tending to one, the consistent estimator satisfies

(i) for ;

(ii)

Now we define as the direct sum of Jacobian matrices with dimension . For , can be estimated by