1 Introduction

In this paper, we propose a new semiparametric regression estimator by using a hybrid technique of a parametric approach and a nonparametric penalized spline method. The overall shape of the true regression function is captured by the parametric part, while its residual is consistently estimated by the nonparametric part. Asymptotic theory for the proposed semiparametric estimator is developed, showing that its behavior is dependent on the asymptotics for the nonparametric penalized spline estimator as well as on the discrepancy between the true regression function and the parametric part. As a naturally associated application of asymptotics, some criteria for the selection of parametric models are addressed. Numerical experiments show that the proposed estimator performs better than the existing kernel-based semiparametric estimator and the fully nonparametric estimator, and that the proposed criteria work well for choosing a reasonable parametric model.

Semiparametric Penalized Spline Regression

TAKUMA YOSHIDA AND KANTA NAITO

Graduate School of Science and Engineering, Shimane University, Matsue, Japan
Department of Mathematics, Shimane University, Matsue, Japan

Keywords Asymptotic theory; Bias reduction; -spline; Parametric model; Penalized spline;
     Semiparametric regression

Mathematics Subject Classification Primary 62G08; Secondary 41A15, 62G20

1 Introduction

There have been several nonparametric smoothing techniques used in regression problems, such as lowess, kernel smoothing, spline smoothing, wavelet, the series method, and so on. The nonparametric estimators generally have consistency, which is an advantage of this approach. Hence, if the nonparametric estimator is used, we can expect that the true regression can be captured as the sample size increases. However, because the form of a nonparametric estimator is sometimes complicated, the interpretation of the estimated structure might not be clear.

On the other hand, in a parametric regression problem with the true regression function controlled by a finite-dimensional parameter vector, the estimated structure is easy to understand, however, the estimator does not have consistency. Therefore, there are advantages and disadvantages associated with each of these approaches. This motivates us to consider a hybrid of parametric and nonparametric methods for the regression problem and we, in fact, introduce a semiparametric regression method so that the estimator has the advantages of both approaches.

The semiparametric method in this paper consists of two steps. In the first step, we utilize an appropriate parametric estimator. In the second step, we apply a certain nonparametric smoother to the residual data associated with the parametric estimator in the first step. The parametric estimator in the first step and the nonparametric smoother in the second step are combined into the proposed semiparametric estimator.

Similar semiparametric approaches for smoothing have been developed by many authors. Hjort and Glad (1995) and Naito (2004) discussed similar methods in density estimation literature. Glad (1998) and Naito (2002) addressed the semiparametric regression method. Martins et al. (2008) introduced general decomposition, including additive and multiplicative corrections in regression. Recently, Fan et al. (2009) discussed the semiparametric approach in the framework of a generalized linear model. Note that the aforementioned works all used kernel smoothing in the second step estimation.

Our proposal is to utilize the penalized spline method for residual smoothing in the second step. This is a typical technique used in nonparametric regression problems with sufficient fitness and appropriate smoothness, which was developed by O’Sullivan (1986) and Eilers and Marx (1996). Many of its applications are summarized in Ruppert, et al (2003). Throughout this paper, the fully nonparametric penalized spline estimator is designated by NPSE, while the semiparametric penalized spline estimator, including the two-step manipulations mentioned above, is denoted by SPSE. In this paper, the advantages of using the penalized spline method instead of the kernel method are described both theoretically and numerically. In particular, we found that the SPSE has better behavior than the semiparametric local linear estimator (SLLE) in simulation.

This paper is organized as follows. We elaborate on the proposed SPSE in Section 2. Section 3 discusses the asymptotic properties of the SPSE, which can be obtained using a combination of the asymptotic results for the parametric estimator and for the NPSE developed by Claeskens et al. (2009). The asymptotic bias of the SPSE depends on the initial parametric model utilized in the first step. The form of the asymptotic bias suggests a method of choosing the parametric model for the first step. A theoretical comparison of SPSE with SLLE is also given in the context of asymptotic bias, which reveals that the use of the penalized spline rather than a kernel smoother in the second step is valid. In Section 4, some criteria for parametric model selection will be clarified. If a parametric model chosen by the criteria discussed in Section 4 is used as the parametric part of the SPSE, its asymptotic bias will become smaller than that of the NPSE. The results of a simulation are reported in Section 5. The simulation studies include checking the accuracy of the SPSE and comparing it with the NPSE and the SLLE as regression estimators. The performance of the parametric model selection discussed in Section 4 is also investigated. Related discussion and issues for future research are provided in Section 6. Proofs for the theoretical results are given in the Appendix.

2 Semiparametric penalized spline estimator

Consider the relationship of the dataset as the regression model

where the explanatory is generated from density with its support on , is an unknown regression function, and the errors are assumed to be uncorrelated with and . Let be a parametric model. We now construct the semiparametric estimator of . First we obtain an appropriate estimator of via a suitable method of estimation. Then can be written as

(1)

where for some . When , this decomposition becomes , which is called an additive correction. When , on the other hand, we have a multiplicative correction . By using the parameter , we can treat additive and multiplicative corrections systematically (see, Fan et al. (2009)). In the second step, is estimated by applying a nonparametric technique to . The SPSE is obtained as

(2)

where is a nonparametric estimator of .

We adopt the penalized spline to estimate . Let be a marginal -spline basis of degree with equally spaced knots . Then we consider the -spline model

as an approximation to , where ’s are unknown parameters. The definition and fundamental properties of the -spline basis are detailed in de Boor (2001). Let be the -vector with th element and let and . The penalized spline estimator of is defined as the minimizer of

where is the smoothing parameter and is the th difference matrix. The estimator of is defined as

(3)

where .

In Figure 1, an example of the SPSE is drawn. In the left panel, the true function and the least square estimator of are shown. In the middle panel, the residuals of and the penalized spline estimator of are drawn. In the right panel, the true function and the SPSE as given in (2) are drawn. As the interpretation of for this example, the parametric part captures the overall shape of and the nonparametric part explains details which could not be captured by the . Similarly, we can construct an SPSE with multiplicative correction.

Figure 1: Plots for one random sample of true (dashed) and the parametric estimator (solid) in the left panel, the residuals and the penalized spline estimator of (solid) in the middle panel, and the true (dashed) and the SPSE (solid) in the right panel.

3 Asymptotic Result

Asymptotics for the NPSE were developed by Claeskens et al. (2009). By using their results, we show the asymptotic bias and variance, and asymptotic distribution of the SPSE. We now give some assumptions regarding the asymptotics of the SPSE.

Assumptions

  1. There exists such that for all , .

  2. .

  3. , ,

  4. , ,

  5. , .

  6. and .

Define the matrix , where

and the matrix , where

Let be a best approximation to . This means that satisfies

where

is the indicator function of the interval and is the th Bernoulli polynomial.

We now discuss a condition of the parametric estimator. Let be the true distribution of and let be the corresponding empirical distribution. The estimator of is defined as the functional form , where is a real valued function defined on the set of all distributions. We can then see that , where is defined as the optimizer of some distance measure . We assume that is the best approximation of . By the definition of , can be expressed as

(4)

where is the influence function defined as

with and finite covariance matrix, the delta function has probability 1 at a point , and is the bias of . The remaining term has mean for each component.

We investigate the asymptotic property of by a two-step procedure for clarity. First we derive the asymptotic expectation and variance of . Here, is the penalized spline smoother of . Second, we show that the difference between and vanishes asymptotically. Since is no longer stochastic, the asymptotic property of is dependent only on the nonparametric penalized spline estimator of . Hence we obtain

Here for a random variable , and are the conditional expectation and variance of given . The asymptotic property of can be directly obtained by using Theorem 2 (a) of Claeskens et al. (2009).

Proposition 1.

Let . Then, under the Assumptions, for a fixed ,

where

We now give the asymptotic result for . By using (4), and are expanded about and , respectively. From the details of the proof in the Appendix, we find that the asymptotic expectation and variance of are dominated by those of and we obtain the following theorem.

Theorem 1.

Let . Then under the Assumptions, for a fixed ,

where and are those given in Proposition 1.

Theorem 1 and Lyapunov’s theorem yield the asymptotic distribution of the SPSE.

Theorem 2.

Suppose that for some and the Assumptions are satisfied. Then, using and ,

where and are those given in Proposition 1.

If , we obtain the semiparametric regression spline estimator from (2). Thus, it is clear that the asymptotic result of the semiparametric regression spline is contained in Theorems 1 and 2. These are obtained from one parametric model. If we choose a polynomial model as , we obtain the following Corollary.

Corollary 1.

Let be the th polynomial model. Then, under and , or and , using , and equidistant knots, the SPSE is the same as the NPSE.

Remark 1  From Theorem 2, as the advanced analysis, we can construct the asymptotic pointwise confidence interval of by estimating the variance of the error.

Remark 2  Theorems 1 and 2 can be applied for . When , the results become those for additive correction. When , and the variance agrees with that of the estimator for multiplicative correction. In , it is understood that is a best approximation of . Therefore, can be written as , where is a best approximation of and is a vector with all components equal to 1. In conclusion, can be written as

because all components of have vanished.

Remark 3  When is assumed, we obtain and by choosing as a best approximation of 0. For , in particular, and both hold even in cases where with any constant .

Remark 4  If we use the local th polynomial technique in the second step estimation, we obtain the asymptotic bias as

where is bandwidth and is the th order kernel function. If and are equal and is odd, the difference between and is only

(5)

If we can calculate (5), we would be able to compare the bias of the SPSE with that of the semiparametric local polynomial kernel estimator. As an example, when , it is easy to show that for , while we have for the Gaussian kernel and for the Epanechnikov kernel . Therefore is smaller than in this situation, which reveals that the SPSE is superior than the SLLE.

4 Parametric model selection

In this section, we describe how to choose a parametric model. From Remark 3, if the true regression function satisfies , the bias of the SPSE is reduced. Hence we determine the initial parametric model in a bias reduction context. Specifically, our purpose is to choose a parametric model such that the asymptotic bias of the SPSE becomes smaller than that of the NPSE:

(6)

where and are the asymptotic biases of the NPSE. If is constant, and are equivalent to and , respectively. When the same and are used in both the SPSE and the NPSE, (6) can be rewritten as and for all , where

and

where is a best approximation to . As a pilot estimator of and its th derivative, we can use the local polynomial estimator with degree . Then the estimator of and can be obtained as

and by using empirical form,

where and is an -vector with th component . Here, we use the fact that

which is detailed in the proof of Theorem 2 (a) of Claeskens et al. (2009). We choose one parametric model by relative evaluation. Let

for a given parametric model and some finite grid points on . Here, for a set , is the cardinality of . After preparing a class of candidate parametric models , we choose a parametric model satisfying

(7)

In summary, for each parametric model , we calculate , and . By using the parametric model which satisfies (7), we construct the SPSE. If we can choose a good parametric model and a good , the SPSE will have better behavior than the NPSE.

Remark 5  When we construct the semiparametric regression spline estimator (SPSE with ), we obtain . Therefore, depends only on .

Remark 6  We see that the bias term appears due to the use of the -spline model. On the other hand, arises from the penalty component. If we use the regression spline, vanishes and the bias of the estimator becomes less than that of the penalized spline estimator. However, the regression spline often provides overfitting. Thus, we use the penalized method for obtaining a smooth curve. If , a certain amount of smoothness in the estimator is assured. However, may grow too large because of the influence of the parametric model. Therefore under , we suggest choosing such that becomes less than . Hence, together with , the parametric model chosen by appears to bring fitness and smoothness to the SPSE.

5 Simulation

In this section, we examine the results of a numerical study to confirm the effects of the SPSE on a finite sample. We choose a parametric model by the criteria discussed in Section 4. We also compare the performance of the SPSE to those of the NPSE, the SLLE and the fully nonparametric local linear estimator (NLLE). In all situations, we utilize the linear and cubic splines and the second difference penalty for the second step nonparametric estimation. The SPSEs with linear and cubic splines are designated as SPSE1 and SPSE3, respectively. NPSE1 and NPSE3 are labeled similarly. The number of knots and the smoothing parameter are determined by GCV. The design points are drawn from a uniform density on and the errors are generated from the normal with mean 0 and variance . Let

where . We prepare a class of candidate parametric models . For each , we calculate , and . We use a number of repetitions . For each iteration, we pick up from candidate models which maximize . The same manipulation is implemented for and . Finally we count the number of times that is picked up during the iterations. For comparison, we also show the model selection by using the AIC and the Takeuchi information criterion (TIC) detailed in Konishi and Kitagawa (2008).

Let

where is the estimator for the th repetition. Let , and be the estimates of integrated squared bias, integrated variance and mean integrated squared error of , respectively. For comparison, the ISB, V and MISE of the SLLE and the NLLE were also calculated. In the SLLE and the NLLE, we used the Gaussian kernel and its bandwidth was obtained by the direct plug-in approach (Ruppert et al. (1995)).

Example 1  The true function is . We use three different specified parametric models:

The true curve can be approximated by sin. The curve poly1 is a rough model and poly3 is close to the true . The variance of the error is and the sample size is . The coefficients of the covariate are estimated by the maximum likelihood method for each model. This set-up is similar to that used by Glad (1998).

SPSE1 SPSE3 IC
method model AIC TIC
sin 1000 901 1000 1000 1000 1000 850 938
poly1 0 0 0 0 0 0 0 0
poly3 0 99 0 0 0 0 150 62
sin 997 917 974 997 837 953 850 938
poly1 0 34 3 0 77 4 0 0
poly3 1 33 20 3 86 43 150 62
Table 1: The results of parametric model selection in Example 1.

Table 1 includes the number of times that each parametric model was chosen based on each criterion. In , and , sin was selected in almost all iterations. This result is desirable because sin coincides with the true function . We also observe that the AIC and the TIC often choose sin. When the number of times sin is chosen is taken into consideration, it seems that is a better selector than the AIC and the TIC.

Results for ISB, V and MISE of the SPSE and the NPSE are given in Table 2. The SPSE with sin succeeds in regards to bias reduction even with a small sample size, and variance and MISE of the SPSE are also smaller than those of the NPSE. In additive correction, the result of SPSE1 with poly1 is exactly the same as that of the NPSE (see Corollary 1). If we use poly3, MISE of the SPSE is smaller than that of the NPSE, although the squared bias is somewhat larger in multiplicative correction. In both , V and MISE, the values of the SPSE are smaller than those of the SLLE. We implemented the same method of analysis for the case . The , V and MISE of the SPSE and those of the NPSE were almost the same, although these are not shown in this paper.

SPSE1 SPSE3 SLLE
method model V MISE V MISE V MISE
sin 0.009 8.308 8.318 0.009 7.907 7.917 0.029 9.032 9.061
poly1 1.450 12.111 13.562 1.110 10.056 11.166 2.370 14.105 16.476
poly3 1.250 10.949 12.199 0.873 9.636 10.510 2.071 15.825 17.898
sin 0.011 8.394 8.405 0.010 8.292 8.302 0.026 10.708 10.734
poly1 1.571 12.322 13.893 1.565 12.212 13.777 2.357 13.860 16.217
poly3 2.016 11.198 13.215 1.016 10.198 11.215 2.942 12.472 15.415
NPSE1 NPSE3 NLLE
Fully nonparametric ISB V MISE ISB V MISE ISB V MISE
method 1.450 12.111 13.562 1.108 11.030 12.138 2.370 14.105 16.476
Table 2: Results of integrated squared bias, variance and mean integrated squared bias of Example 1. All entries for ISB,V and MISE are times their actual values.

Example 2  The same true function used in Example 1 is adopted and the sample size is . A class of initial parametric models is chosen, consisting of th degree polynomials ranging from to and designated as poly1, …, poly6, respectively, and . This parametric model clearly does not contain the true and the estimator becomes unstable because the variance of error is relatively large.

SPSE1 SPSE3 IC
method model AIC TIC
poly1 0 0 0 0 0 0 0 0
poly2 0 49 30 0 8 0 0 0
poly3 956 472 511 0 939 0 415 693
poly4 6 43 6 5 2 15 116 8
poly5 6 356 312 967 37 982 306 298
poly6 0 3