# Quantile Regression for Partially Linear Varying Coefficient Spatial Autoregressive Models

Xiaowen Dai, Shaoyang Li, Maozai Tian
1. Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing 100872, China
2. School of Statistics, Lanzhou University of Finance and Economics, Lanzhou 730101, Gansu, China
###### Abstract

This paper considers the quantile regression approach for partially linear spatial autoregressive models with possibly varying coefficients. B-spline is employed for the approximation of varying coefficients. The instrumental variable quantile regression approach is employed for parameter estimation. The rank score tests are developed for hypotheses on the coefficients, including the hypotheses on the non-varying coefficients and the constancy of the varying coefficients. The asymptotic properties of the proposed estimators and test statistics are both established. Monte Carlo simulations are conducted to study the finite sample performance of the proposed method. Analysis of a real data example is presented for illustration.
Keywords: Spatial autoregressive model; Varying coefficient; Partially linear; Quantile regression; Instrumental variables

## 1 Introduction

Spatial econometric models have been widely used in many areas (e.g., economics, political science and public health) to deal with spatial interaction effects among geographical units (e.g., jurisdictions, regions, and states). Many of the early studies have been summarized in Anselin (1988), Anselin and Bera (1998), LeSage (1999) and LeSage and Pace (2009). Recently, a large literature on spatial econometric models has emerged. For instance, Lee (2007) studied the generalized method of moments (GMM) applied to the spatial autoregressive model. Lee (2004) studied asymptotic properties of the quasi-maximum likelihood estimator of the spatial autoregressive model. Lee and Yu (2010) proposed the maximum likelihood (ML) estimator for the spatial autoregressive (SAR) panel model with both spatial lag and spatial disturbances. Dai et al. (2015, 2016) studied, respectively, local influence and outlier detection in the general spatial model, which includes the spatial autoregressive model and the spatial error model as two special cases. Xu and Lee (2015) considered the instrumental variable (IV) and ML estimators for the spatial autoregressive model with a nonlinear transformation of the dependent variable. Qu and Lee (2015) provided three estimation methods for the spatial autoregressive model with an endogenous spatial weight matrix: the two-stage instrumental variable (2SIV) method, the quasi-maximum likelihood estimation (QMLE) approach, and the generalized method of moments (GMM). Zhang and Shen (2015) investigated the GMM estimation approach for partially linear varying coefficient spatial autoregressive panel data models with random effects. Jin et al. (2016) studied outlier detection in the spatial autoregressive model.

However, in some practical applications, a linear model might not be flexible enough to capture the underlying complex dependence structure. A purely nonparametric model, on the other hand, may suffer from the so-called “curse of dimensionality”: its practical implementation might not be easy, and the visual display may not be useful for exploratory purposes. To deal with these problems, several dimension reduction modeling methods have been proposed in the literature. For example, He et al. (1998), He and Ng (1999), He and Portnoy (2000), De Gooijer and Zerom (2003), and Yu and Lu (2004) considered additive quantile regression models for iid data. Honda (2004) and Cai and Xu (2008) proposed varying coefficient quantile regression models for time series data. He and Shi (1996), He and Liang (2000), and Lee (2003) considered partially linear quantile regression models for iid data. Ahmad, Leelahanon and Li (2005) and Fan and Huang (2005) considered partially linear varying coefficient models for cross-sectional data. Sun and Wu (2005) and Fan, Huang and Li (2007) considered partially linear varying coefficient models for longitudinal data.

In this paper, we investigate the quantile regression approach for partially linear varying coefficient spatial autoregressive models, since the partially linear varying coefficient model strikes a good balance between flexibility and parsimony. We employ B-splines to approximate the varying coefficients. Due to the presence of an endogenous variable, we employ the instrumental variable quantile regression (IVQR) method to attenuate the bias. The focus of this paper is to estimate the conditional quantile curves without any specification of the error distribution.

The rest of the paper is organized as follows. Section 2 introduces the partially linear varying coefficient spatial autoregressive models. Section 3 proposes the IVQR estimation procedure. Section 4 proposes the inference procedures for testing the non-varying coefficients and the constancy of the varying coefficients. The asymptotic properties of the estimators and test statistics are also discussed. Proofs of the theorems in Sections 3 and 4 are given in the Appendix. Section 5 reports a simulation study for assessing the finite sample performance of the proposed estimators. An empirical illustration is considered in Section 6. Section 7 concludes the paper.

## 2 The Models

Consider the following partially linear varying coefficient spatial autoregressive model

$$y_i=\rho\sum_{j=1}^{n}w_{ij}y_j+X_i^{\top}\beta+Z_i^{\top}\gamma(U_i)+\varepsilon_i, \tag{2.1}$$

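where $y_i$ is the dependent variable, $X_i$ is a $p\times 1$ vector, $Z_i$ is a $q\times 1$ vector. $w_{ij}$ is the $(i,j)$th element of the spatial weight matrix $W$. The parameter $\rho$ is a coefficient on the spatial lagged dependent variable $\sum_{j=1}^{n}w_{ij}y_j$, $\beta$ is a $p\times 1$ parameter vector, $\gamma(\cdot)=(\gamma_1(\cdot),\dots,\gamma_q(\cdot))^{\top}$ comprises $q$ unknown smooth functions, $U_i$ is the smoothing variable. Here, we only consider a one-dimensional smoothing variable $U_i$.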

Matrix form of model (2.1) is

$$y=\rho Wy+X\beta+Z\gamma(U)+\varepsilon, \tag{2.2}$$

where , , , , , , is an vector with the th element equal to 1 and the rest equal to 0, is an vector, . Here, we can denote .
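As an illustration of how data can be drawn from model (2.2), the following minimal sketch solves the reduced form $y=(I-\rho W)^{-1}(X\beta+Z\gamma(U)+\varepsilon)$, which exists whenever $I-\rho W$ is invertible (for a row-normalized $W$ this holds when $|\rho|<1$). The function name `simulate_plvcsar`, the circular-neighbour weight matrix, the coefficient functions and all numerical values are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def simulate_plvcsar(W, X, Z, U, rho, beta, gamma_funcs, eps):
    """Draw y from y = rho*W*y + X*beta + Z*gamma(U) + eps via the reduced form."""
    n = W.shape[0]
    # Varying-coefficient part: sum over l of Z[:, l] * gamma_l(U)
    vc = sum(Z[:, l] * g(U) for l, g in enumerate(gamma_funcs))
    signal = X @ beta + vc + eps
    # Solve (I - rho*W) y = signal rather than forming the inverse explicitly
    return np.linalg.solve(np.eye(n) - rho * W, signal)

rng = np.random.default_rng(0)
n = 50
# Hypothetical weight matrix: each unit has two neighbours on a circle,
# so every row already sums to one
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 2))
U = rng.uniform(size=n)
eps = rng.normal(size=n)
beta = np.array([1.0])
# Hypothetical smooth coefficient functions
gamma_funcs = [lambda u: np.sin(2 * np.pi * u), lambda u: 2 * u * (1 - u)]

y = simulate_plvcsar(W, X, Z, U, 0.5, beta, gamma_funcs, eps)
```

Because $Wy$ depends on the whole error vector through this reduced form, the spatial lag is correlated with $\varepsilon$, which is why it has to be treated as endogenous.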

Due to the presence of the endogenous variable $d_i=\sum_{j=1}^{n}w_{ij}y_j$, we employ the instrumental variable quantile regression (IVQR) method to attenuate the bias. The endogenous variable $d_i$ is related to a vector of instruments $\omega_i$ which are independent of $\varepsilon_i$. Then we can define the following conditional instrumental quantile relationship:

$$Q_\tau(y_i\mid\mathcal{F}_{-i},X_i,Z_i,U_i)=\rho(\tau)d_i+X_i^{\top}\beta(\tau)+Z_i^{\top}\gamma(\tau,U_i)+\omega_i\zeta(\tau), \tag{2.3}$$

where is the conditional -quantile of given and , is the -field of , is the coefficient corresponding to the instrumental variable , .

## 3 The proposed method

### 3.1 Instrumental Variable Quantile Regression Estimator (IVQR)

In this section, we employ B-spline for estimation. Without loss of generality, we assume that for all throughout.

We employ normalized B-splines of order to approximate the , . We consider a sequence of positive integers , , and an extended partition of by quasi-uniform internal knots. Let denote a set of B-spline basis functions. We approximate each by a linear combination of normalized B-spline basis functions

$$\gamma_l(\tau,u)\approx\sum_{s=1}^{k_n+h+1}B_s(u)\,\theta_{l,s}(\tau)=\pi_{k_n}(u)^{\top}\theta_l(\tau),$$

where is the spline coefficient vector. For details on the construction of B-spline basis functions, the readers are referred to Schumaker (1981). With the B-spline basis, model (2.3) can be approximated by
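To make the basis construction concrete, the sketch below evaluates a B-spline basis with the Cox-de Boor recursion. It rests on several assumptions not stated in the text above: the smoothing variable lives on $[0,1]$, the interior knots are equally spaced, and $h$ is read as the spline degree, so that $k_n$ interior knots give $k_n+h+1$ basis functions. The function name `bspline_basis` is hypothetical.

```python
import numpy as np

def bspline_basis(u, n_interior, degree=3):
    """Evaluate all B-spline basis functions on [0, 1] at the points u.

    Returns an array of shape (len(u), n_interior + degree + 1).
    """
    u = np.asarray(u, dtype=float)
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    # Extended knot vector: each boundary knot repeated degree + 1 times
    t = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    n_basis = len(t) - degree - 1
    # Degree-0 basis: indicators of the half-open knot spans
    B = np.zeros((len(u), len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= u) & (u < t[j + 1])
    # Assign u == 1 to the last non-empty span so the basis is defined there
    B[u == 1.0, n_basis - 1] = 1.0
    # Cox-de Boor recursion, raising the degree one step at a time
    for d in range(1, degree + 1):
        B_new = np.zeros((len(u), len(t) - d - 1))
        for j in range(len(t) - d - 1):
            left = t[j + d] - t[j]
            right = t[j + d + 1] - t[j + 1]
            # Terms with a zero-width support are skipped (0/0 := 0)
            if left > 0:
                B_new[:, j] += (u - t[j]) / left * B[:, j]
            if right > 0:
                B_new[:, j] += (t[j + d + 1] - u) / right * B[:, j + 1]
        B = B_new
    return B
```

For example, `bspline_basis(np.linspace(0, 1, 101), n_interior=4)` returns a matrix with $4+3+1=8$ columns whose rows are nonnegative and sum to one, which is the normalization property of the basis.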

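$$\begin{aligned} Q_\tau(y_i\mid\mathcal{F}_{-i},X_i,Z_i,U_i) &\approx\rho(\tau)d_i+\sum_{l=1}^{p}X_{i,l}\beta_l(\tau)+\sum_{l=1}^{q}\sum_{s=1}^{k_n+h+1}Z_{i,l}B_s(U_i)\theta_{l,s}(\tau)+\omega_i\zeta(\tau) \\ &=\rho(\tau)d_i+X_i^{\top}\beta(\tau)+\Pi_i^{\top}\Theta(\tau)+\omega_i\zeta(\tau), \end{aligned} \tag{3.1}$$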

where , , .
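The double sum in (3.1) suggests one way to assemble the spline design row: if $\Theta$ stacks the coefficient vectors $\theta_1,\dots,\theta_q$, then $\Pi_i$ can be taken as the Kronecker product of $Z_i$ with the basis vector evaluated at $U_i$. The sketch below builds all rows at once; the stacking order and the name `build_pi` are assumptions made for illustration.

```python
import numpy as np

def build_pi(Z, B):
    """Row-wise Kronecker product: row i is kron(Z[i], B[i]).

    Z has shape (n, q); B has shape (n, K) and holds the basis functions
    evaluated at U_1, ..., U_n. The result has shape (n, q * K).
    """
    n, q = Z.shape
    K = B.shape[1]
    return (Z[:, :, None] * B[:, None, :]).reshape(n, q * K)
```

With this layout, `build_pi(Z, B) @ theta.ravel()` for a `(q, K)` coefficient array `theta` reproduces $\sum_{l}\sum_{s}Z_{i,l}B_s(U_i)\theta_{l,s}$ for every $i$.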

Then we can define the following objective function:

$$R_{IV}(\tau,\rho,\beta,\Theta,\zeta)=\sum_{i=1}^{n}\rho_\tau\bigl(y_i-\rho d_i-X_i^{\top}\beta-\Pi_i^{\top}\Theta-\omega_i\zeta\bigr). \tag{3.2}$$
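In (3.2), $\rho_\tau$ is taken here to be the usual quantile check function $\rho_\tau(u)=u\,(\tau-I(u<0))$, which should not be confused with the spatial parameter $\rho$. A minimal sketch of this loss follows; the name `check_loss` is hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))
```

Positive residuals are weighted by $\tau$ and negative residuals by $1-\tau$, so minimizing the summed loss over a constant recovers the sample $\tau$th quantile.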

Following Chernozhukov and Hansen (2006, 2008) and Galvao (2011), and assuming the availability of instrumental variables , we can derive the IVQR estimator via the following three steps:

• Step 1: For a given quantile , define a suitable set of values . One then minimizes the objective function for to obtain the ordinary QR estimators of :

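$$\bigl(\hat{\beta}(\rho,\tau),\hat{\Theta}(\rho,\tau),\hat{\zeta}(\rho,\tau)\bigr)=\arg\min_{\beta,\Theta,\zeta}R_{IV}(\tau,\rho,\beta,\Theta,\zeta). \tag{3.3}$$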
• Step 2: Choose among which makes a weighted distance function defined on closest to zero:

$$\hat{\rho}(\tau)=\arg\min_{\rho\in R}\bigl\{\hat{\zeta}(\rho,\tau)^{\top}\hat{A}(\tau)\hat{\zeta}(\rho,\tau)\bigr\}, \tag{3.4}$$

where is a positive definite matrix, .

• Step 3: The estimation of can be obtained, which is respectively and . Accordingly, the polynomial spline estimator is given by for each , .
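The three steps above can be sketched as a grid search. This is an illustrative outline rather than the authors' implementation: the quantile regression in Step 1 is solved through its standard linear programming formulation with `scipy.optimize.linprog`, the weighting matrix defaults to the identity, and the names `quantile_regression` and `ivqr_grid` as well as the argument layout are hypothetical. In the setting of this paper, `exog` would hold the columns of $X$ and $\Pi$, and `inst` the instruments.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, tau):
    """Linear quantile regression via its standard LP formulation."""
    n, p = X.shape
    # Variables: coefficients b (free), residual parts u_plus, u_minus >= 0
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])  # X b + u_plus - u_minus = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]

def ivqr_grid(y, d, exog, inst, tau, rho_grid, A=None):
    """Grid-search IVQR sketch; returns (rho_hat, coefficients on exog)."""
    n_inst = inst.shape[1]
    A = np.eye(n_inst) if A is None else A
    design = np.hstack([exog, inst])
    best = None
    for rho in rho_grid:
        # Step 1: ordinary QR of y - rho*d on exogenous terms and instruments
        coef = quantile_regression(design, y - rho * d, tau)
        zeta = coef[-n_inst:]
        # Step 2: weighted distance of the instrument coefficients from zero
        dist = float(zeta @ A @ zeta)
        if best is None or dist < best[0]:
            best = (dist, rho, coef[:-n_inst])
    # Step 3: keep the remaining coefficients evaluated at rho_hat
    _, rho_hat, coef_hat = best
    return rho_hat, coef_hat
```

The grid resolution bounds the precision of $\hat{\rho}(\tau)$, so a finer grid around the first-pass minimizer is a natural refinement, at the cost of one quantile regression per grid point.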

Remark 1. Throughout the paper, we use the cubic spline in the B-spline approximation. For the objective function (3.2), the knots are chosen as the minimizer to the following Schwarz-type Information Criterion:

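$$SIC(k_n)=\log\Bigl\{\sum_{i=1}^{n}\rho_\tau\Bigl(y_i-\hat{\rho}(k_n)\sum_{j=1}^{n}w_{ij}y_j-X_i^{\top}\hat{\beta}(k_n)-\Pi_i^{\top}\hat{\Theta}(k_n)-\omega_i\hat{\zeta}(k_n)\Bigr)\Bigr\}+\frac{\log n}{2n}(2+p+qk_n),$$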

where are the th quantile estimators with knots. More details can be found in Kim (2003).
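The criterion in Remark 1 is straightforward to evaluate once the fitted check loss is available for each candidate number of knots. The sketch below assumes those losses have already been computed elsewhere; the names `sic` and `select_knots` and the dictionary input are hypothetical.

```python
import math

def sic(check_loss_sum, n, p, q, k_n):
    """Schwarz-type criterion: log(fitted check loss) + log(n)/(2n) * (2 + p + q*k_n)."""
    return math.log(check_loss_sum) + math.log(n) / (2 * n) * (2 + p + q * k_n)

def select_knots(loss_by_kn, n, p, q):
    """Return the k_n minimising SIC; loss_by_kn maps k_n to the fitted check loss."""
    return min(loss_by_kn, key=lambda k: sic(loss_by_kn[k], n, p, q, k))
```

The penalty grows linearly in $k_n$, so additional knots are retained only when they reduce the fitted loss enough to offset it.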

Remark 2. For an IVQR estimation, we need instruments for the endogenous variable . In practice, we can choose , , , etc. as instrumental variable matrix. In this paper, is chosen as instrumental variable matrix.

### 3.2 Asymptotic theory

The following are sufficient conditions for the proposed IVQR estimator based on polynomial spline approximation.

Assumption 1

(i) are independent and identically distributed (i.i.d.) for each fixed with conditional distribution function for .

(ii) The conditional distribution of given has a bounded density , which satisfies uniformly in and for some constants .

(iii) Uniformly over , has a bounded density function that is continuously differentiable in the neighbourhood of 0 with first derivative bounded.

Assumption 2

(i) , where denotes the class of varying coefficient functions. For some , , .

Here, we say function belongs to the class of varying coefficient functions if and . And denote the collection of all functions on whose th order derivative satisfies the Hölder condition of order with . That is, for any , , for any and .

(ii) For any varying coefficient function defined on , .

Assumption 3

(i) For all , is in the interior of the set , and is compact and convex.

(ii) Let

 Φ(ρ,β,Θ,ζ,τ) =E[(τ−I(y

where , , . The Jacobian matrices and are continuous and have full rank uniformly over . The parameter space is a connected set and the image of under the map is simply connected.

(iii) Denote , where . Let . Then, the following matrices are positive definite:

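$$\begin{aligned} J_\eta &=\lim_{n\to\infty}\frac{1}{n}\tilde{X}^{\top}\Omega\tilde{X}, && (3.7)\\ J_\rho &=\lim_{n\to\infty}\frac{1}{n}\tilde{X}^{\top}\Omega D, && (3.8)\\ S &=\lim_{n\to\infty}\frac{\tau(1-\tau)}{n}\tilde{X}^{\top}\tilde{X}. && (3.9) \end{aligned}$$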

Let be a conformable partition of and . Hence, is invertible and is also invertible.

(iv) , , , , and .

###### Theorem 3.1 (Uniform Convergence)

Under Assumptions 1-3, are consistently estimable. And if , then

$$\sup_{l\in\{1,\cdots,q\}}\sup_{u\in\mathcal{U}}\bigl\|\hat{\gamma}_l(u,\tau)-\gamma_l(u,\tau)\bigr\|=O_p\bigl((k_n+h+1)^{-r}\bigr).$$

###### Theorem 3.2 (Asymptotic Distribution)

(i) Under Assumptions 1-3, for a given , converges to a Gaussian distribution:

$$\sqrt{n}\,\bigl(\hat{\vartheta}(\tau)-\vartheta(\tau)\bigr)\xrightarrow{d}N(0,J^{\top}SJ), \tag{3.10}$$

where , , , , , , , , , , and is a conformable partition of .

(ii) Consequently, under Assumptions 1-3, for a given , , converges to a Gaussian distribution:

$$\sqrt{n}\,\bigl(\hat{\gamma}_l(u,\tau)-\gamma_l(u,\tau)\bigr)\xrightarrow{d}N\bigl(0,L_3^{(l)}SL_3^{(l)\top}\bigr), \tag{3.11}$$

where , , , is divided as .

Confidence intervals for the coefficients are given in the following theorem.

###### Theorem 3.3 (Confidence Interval)

(i) Under Assumptions 1-3, for a given , a confidence interval for the constant coefficient is

 [^β(τ)−Zα/2nσβ,^β(τ)+Zα/2nσβ].

where , is the th diagonal element of , .

(ii) Under Assumptions 1-3, for a given and , a confidence interval for the varying coefficient , is

 [^γl(u,τ)−Zα/2nσ(l)γ,^γl(u,τ)+Zα/2nσ(l)γ],

where , is the th diagonal element of , .

## 4 Rank score test

### 4.1 Inference on nonvarying coefficients

In this section, we propose a large-sample inference procedure for testing the nonvarying coefficients . We partition the original model as

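$$\begin{aligned} Q_\tau(y\mid X,Z,U) &=\rho(\tau)D+X_1\beta_1(\tau)+X_2\beta_2(\tau)+Z\gamma(u,\tau), && (4.1)\\ &\approx\rho(\tau)D+X_1\beta_1(\tau)+X_2\beta_2(\tau)+\Pi\Theta(\tau), && (4.2)\\ &=X_1\beta_1(\tau)+X^{*}\phi(\tau), && (4.3) \end{aligned}$$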

where are partitioned into two parts and with , and are respectively and design matrices corresponding to and , , .

Suppose we want to test , the quantile rank score test can be employed (see, Gutenbrunner, et al., 1990). Denote be the IVQR estimates of obtained under . The rank score test statistic takes the form:

$$RS_n=S_n^{\top}Q_n^{-1}S_n, \tag{4.4}$$

where , , , , , , .

We modify Assumption 2(i) as Assumption 2(i) and add an Assumption 4 for deriving the asymptotic distribution of the rank score statistic :

Assumption 2(i) There exists some such that , .

Assumption 4 The minimum eigenvalue of is bounded away from zero for sufficiently large .

###### Theorem 4.1

Under Assumptions 1-4 and Assumption 2(i), suppose , then has an asymptotic distribution under the null hypothesis .

### 4.2 Constancy of varying coefficients

In this section, we also employ the rank score test for testing whether one or more of the varying coefficients are constant. Without loss of generality, we consider testing whether the first $q_1$ coefficient functions are constant:

$$H_0:\gamma_l(\tau,u)=\gamma_l(\tau),\qquad l=1,\cdots,q_1.$$

For this purpose, we may consider the quantile regression under null hypothesis

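$$\begin{aligned} Q_\tau(y\mid X,Z,U) &=\rho(\tau)D+X\beta(\tau)+Z_1^{*}\gamma_1^{*}(\tau)+Z_2^{*}\gamma_2^{*}(u,\tau) \\ &\approx\rho(\tau)D+X\beta(\tau)+Z_1^{*}\gamma_1^{*}(\tau)+\Pi_2\Theta_2(\tau) \\ &=\breve{X}\varphi(\tau)+Z_1^{*}\gamma_1^{*}(\tau), \end{aligned} \tag{4.5}$$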

where are partitioned into two parts and with , and are respectively and design matrices corresponding to and , , , .

Then we propose the test procedure as follows:

• Step 1: Obtain the IVQR estimation of under model (4.2) (i.e., null hypothesis ).

• Step 2: We can estimate the varying coefficients by considering quantile regression of on .

• Step 3: The quantile rank score test can be employed (see, Gutenbrunner, et al., 1990). Denote be the IVQR estimates of obtained under . Then the rank score test statistic takes the form:

$$RS_n^{*}=S_n^{*\top}Q_n^{*-1}S_n^{*}, \tag{4.6}$$

where , , , , , , .

We modify Assumption 4 as Assumption 4 for deriving the asymptotic distribution of the rank score statistic :

Assumption 4 The minimum eigenvalue of is bounded away from zero for sufficiently large .

###### Theorem 4.2

(i) If is bounded corresponding to model (4.2), then under Assumptions 1-3, Assumption 2(i) and Assumption 4, suppose , then has an asymptotic distribution under the null hypothesis .

(ii) For growing as the sample size becomes larger, then under Assumptions 1-3, Assumption 2(i) and Assumption 4, , suppose the number of knots satisfies , then under , we have

$$\frac{RS_n^{*}-q_1}{\sqrt{2q_1}}\xrightarrow{d}N(0,1)\quad\text{as } k_n\to\infty. \tag{4.7}$$
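For illustration, the standardization in (4.7) can be turned into an approximate p-value as sketched below. The statistic itself is assumed to have been computed already, and rejecting for large values (an upper-tail p-value) is an assumption made here rather than something stated above. The function names are hypothetical.

```python
import math

def standardized_rank_score(rs_stat, q1):
    """Return (RS - q1) / sqrt(2 * q1), the normalisation used in (4.7)."""
    return (rs_stat - q1) / math.sqrt(2.0 * q1)

def upper_tail_pvalue(z):
    """Upper-tail standard normal probability P(N(0, 1) > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For instance, a statistic of 8 with $q_1=2$ standardizes to $(8-2)/\sqrt{4}=3$.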

## 5 Monte Carlo simulations

In this section, we conduct Monte Carlo simulations to investigate the finite sample performance of the proposed estimation and inference methods. The Monte Carlo simulations are repeated 1000 times for each sample size . The quantile regression based estimators are calculated for quantiles .

Example 1. The samples are generated as follows:

$$y_i=\rho\sum_{j=1}^{n}w_{ij}y_j+X_i\beta+Z_{1i}\gamma_1(U_i)+Z_{2i}\gamma_2(U_i)+\varepsilon_i,\qquad i=1,\cdots,n, \tag{5.1}$$

where , , , , , is the common CDF of . Therefore, the random errors are centered to have zero th quantile. Here, respectively follow the , , , and distributions.

Example 2. The samples are generated as follows:

$$y_i=\rho\sum_{j=1}^{n}w_{ij}y_j+X_i\beta+Z_{1i}\gamma_1(U_i)+Z_{2i}\gamma_2(U_i)+(1+0.5Z_{1i})\varepsilon_i,\qquad i=1,\cdots,n, \tag{5.2}$$

where , , , , , is the common CDF of . Therefore, the random errors are centered to have zero th quantile. In this example, respectively follow the , , , and distributions.

Following Dai et al. (2016), the spatial weight matrix in the two examples is generated based on the mechanism that , where , . The matrix is then row-standardized so that each of its rows sums to one.
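The row-standardization step can be sketched as follows. Only the normalization is shown; the mechanism that generates the raw weights is not reproduced here, and the function name `row_standardize` and the handling of rows without neighbours are assumptions.

```python
import numpy as np

def row_standardize(W):
    """Scale each row of a nonnegative weight matrix so that it sums to one."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    # Rows with no neighbours are left as zeros instead of dividing by zero
    safe = np.where(row_sums > 0, row_sums, 1.0)
    return W / safe
```

For example, the row `[0, 1, 1]` becomes `[0, 0.5, 0.5]`, so the spatial lag of a unit is the average of its neighbours' responses.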

### 5.1 Estimation

First, we compare the performance of the partially linear varying coefficient spatial autoregressive (PLVCSAR) model with that of the spatial autoregressive (SAR) model. In Example 1, the spatial autoregressive model is of the form

$$y_i=\rho\sum_{j=1}^{n}w_{ij}y_j+X_i\beta+Z_{1i}\gamma_1+Z_{2i}\gamma_2+\varepsilon_i,\qquad i=1,\cdots,n, \tag{5.3}$$

where , , and the remaining variables are the same as those defined in model (5.1). In Example 2, the spatial autoregressive model is given by

$$y_i=\rho\sum_{j=1}^{n}w_{ij}y_j+X_i\beta+Z_{1i}\gamma_1+Z_{2i}\gamma_2+(1+0.5Z_{1i})\varepsilon_i,\qquad i=1,\cdots,n, \tag{5.4}$$

where , , and the remaining variables are the same as those defined in model (5.2). Table 1 compares the bias and RMSE of the PLVCSAR model and the SAR model at and . and denote the IVQR estimates in the PLVCSAR models, and and denote the IVQR estimates in the SAR models. From Table 1, we can see that when the data are generated from the PLVCSAR model, fitting the SAR model leads to less efficient estimates in both examples: the bias and RMSE of and are smaller than those of and . When the data are generated from the SAR model, fitting the PLVCSAR model and fitting the SAR model give similar performance in the homoscedastic case; in the heteroscedastic case, fitting the PLVCSAR model still does not lose much efficiency. Thus the PLVCSAR model is efficient and more flexible than the SAR model.
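The bias and RMSE reported in such comparisons are typically computed across Monte Carlo replications as the mean error and the root mean squared error of the estimates. A minimal sketch under that assumption follows; the function name `bias_and_rmse` is hypothetical.

```python
import numpy as np

def bias_and_rmse(estimates, truth):
    """Bias and RMSE over replications (replications along axis 0)."""
    estimates = np.asarray(estimates, dtype=float)
    err = estimates - truth
    return err.mean(axis=0), np.sqrt((err ** 2).mean(axis=0))
```

For estimates `[0.4, 0.6, 0.5]` of a true value 0.5, the bias is 0 and the RMSE is $\sqrt{0.02/3}\approx 0.082$.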

Table 2 summarizes the comparison of the QR and IVQR estimators with a homoscedastic error term. Table 3 reports the comparison of the QR and IVQR estimators with a heteroscedastic error term. Tables 2 and 3 show that, on the whole, the IVQR estimator of has much smaller bias and RMSE than the QR estimator, and the IVQR estimators of and have bias and RMSE similar to those of the QR estimators.

The confidence intervals of the varying coefficients are also considered. The results are reported in Figure 1. The $x$-axis presents the smoothing variable, and the $y$-axis presents the estimates of the varying coefficients at quantile 0.5 and sample size 200 (red lines) and their corresponding confidence intervals (blue lines) at significance level 0.05. Figure 1(a)-(b) and (c)-(d) give the confidence intervals of in Example 1 (with homoscedastic error term) and Example 2 (with heteroscedastic error term), respectively.