Effects of associated kernels in nonparametric multiple regressions
Associated kernels have been introduced to improve the classical continuous kernels for smoothing any functional on several kinds of supports such as bounded continuous and discrete sets. This work deals with the effects of combined associated kernels on nonparametric multiple regression functions. Using the Nadaraya-Watson estimator with optimal bandwidth matrices selected by a cross-validation procedure, different behaviours of multiple regression estimations are pointed out according to the type of multivariate associated kernels, with or without correlation. Simulation studies show no effect of correlation structures for the continuous regression functions or for the associated continuous kernels; however, the choice of multivariate associated kernels has a real effect depending on whether the support of the multiple regression functions is bounded continuous or discrete. Applications are made on two real datasets.
Keywords: Bandwidth matrix, continuous associated kernel, correlation structure, cross-validation, discrete associated kernel, Nadaraya-Watson estimator.
Mathematics Subject Classification 2010: 62G05(08); 62H12
Short Running Title: Associated kernels in multiple regressions
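1 Introduction
Consider the relation between a response variable $Y$ and a $d$-vector of explanatory variables $\mathbf{x}$ given by
$$Y = m(\mathbf{x}) + \epsilon, \tag{1.1}$$
where $m$ is the unknown regression function from $\mathbb{T}_d \subseteq \mathbb{R}^d$ to $\mathbb{R}$ and $\epsilon$ the disturbance term with null mean and finite variance. Let $(\mathbf{X}_1, Y_1), \dots, (\mathbf{X}_n, Y_n)$ be a sequence of independent and identically distributed (iid) random vectors on $\mathbb{T}_d \times \mathbb{R}$ with $m(\mathbf{x}) = \mathbb{E}(Y \mid \mathbf{X} = \mathbf{x})$ of (1.1). The Nadaraya (1964) and Watson (1964) estimator $\widehat{m}_n$ of $m$, using continuous classical (symmetric) kernels, is
$$\widehat{m}_n(\mathbf{x}; K, \mathbf{H}) = \frac{\sum_{i=1}^{n} Y_i\, K\!\left(\mathbf{H}^{-1/2}(\mathbf{x} - \mathbf{X}_i)\right)}{\sum_{i=1}^{n} K\!\left(\mathbf{H}^{-1/2}(\mathbf{x} - \mathbf{X}_i)\right)}, \tag{1.2}$$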
where $\mathbf{H}$ is the symmetric and positive definite bandwidth matrix of dimension $d \times d$ and the function $K(\cdot)$ is the multivariate kernel, assumed to be a spherically symmetric probability density function. Since the choice of the kernel is not important in the classical case, we use the common notation for classical kernel regression. The multivariate classical kernel (e.g. Gaussian) is suitable only for regression functions on unbounded supports (i.e. $\mathbb{T}_d = \mathbb{R}^d$); see also Scott (1992). Racine and Li (2004) proposed products of kernels composed of univariate Gaussian kernels for continuous variables and Aitchison and Aitken (1976) kernels for categorical variables; see also Hayfield and Racine (2007) for some implementations and uses of these multiple kernels. Notice that symmetric kernels assign weight outside the support of any variable whose support is bounded. In the univariate continuous case, Chen (1999, 2000a, 2000b) was one of the pioneers in proposing asymmetric kernels (i.e. beta and gamma) whose supports coincide with those of the functions to be estimated. Zhang (2010) and Zhang and Karunamuni (2010) studied the performance of these beta and gamma kernel estimators at the boundaries in comparison with those of the classical kernels. Recently, Libengué (2013) investigated several families of these univariate continuous kernels that he called univariate associated kernels; see also Kokonendji et al. (2007), Kokonendji and Senga Kiéssé (2011), Zougab et al. (2012) and Wansouwé et al. (2014) for univariate discrete situations. A continuous multivariate version of these associated kernels has been studied by Kokonendji and Somé (2015) for density estimation.
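As an illustration of the classical estimator (1.2), the following Python sketch evaluates a Nadaraya-Watson estimate with a standard Gaussian kernel. It is only a sketch under stated assumptions: the bandwidth matrix is taken to be diagonal, so that the matrix square root reduces to componentwise scaling, and the names `gaussian_kernel` and `nw_classical` are hypothetical (the computations in this paper were carried out in R).

```python
import math


def gaussian_kernel(u):
    """Standard d-variate Gaussian density evaluated at the vector u."""
    d = len(u)
    return math.exp(-0.5 * sum(v * v for v in u)) / (2.0 * math.pi) ** (d / 2.0)


def nw_classical(x, X, Y, h):
    """Nadaraya-Watson estimate at x for a diagonal bandwidth matrix.

    Assumption: h holds the diagonal entries of H, so H^{-1/2}(x - X_i)
    is a componentwise division by sqrt(h_j).
    """
    num = den = 0.0
    for xi, yi in zip(X, Y):
        u = [(a - b) / math.sqrt(hj) for a, b, hj in zip(x, xi, h)]
        w = gaussian_kernel(u)
        num += yi * w
        den += w
    # The estimate is undefined when every weight vanishes.
    return num / den if den > 0.0 else float("nan")
```

The Gaussian weights are positive on the whole of $\mathbb{R}^d$, which is why this estimator puts mass outside a bounded or discrete support.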
The main goal of this work is to consider multivariate associated kernels and then to investigate the importance of their choice in multiple regression. These associated kernels are appropriate for both continuous and count explanatory variables. In fact, in order to estimate the regression function $m$ in (1.1), we propose multiple (or product of) associated kernels composed of univariate discrete associated kernels (e.g. binomial, discrete triangular) and continuous ones (e.g. beta, Epanechnikov). We will also use a bivariate beta kernel with correlation structure. Another motivation of this work is to investigate the effect of correlation structure for explanatory variables in continuous regression estimation. These associated kernels are suited to this situation of mixed axes as they fully respect the support of each explanatory variable. In other words, we will measure the effect of the type of associated kernel, denoted $\kappa$, in multiple regression by simulations and applications.
The rest of the paper is organized as follows. Section 2 gives a general definition of multivariate associated kernels, which includes the continuous classical symmetric kernels and the multiple kernels composed of univariate discrete and continuous ones. For each definition, the corresponding kernel regression estimator, appropriate for both continuous and discrete explanatory variables, is given. In Section 3, we explore the importance of the choice of appropriate associated kernels according to the support of the variables through simulation studies and real data analysis. Finally, a summary and final remarks are given in Section 4.
2 Multiple regression by associated kernels
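In order to include both discrete and continuous regressors, we assume $\mathbb{T}_d$ is any subset of $\mathbb{R}^d$. More precisely, for $j = 1, \dots, d$, let us consider on $\mathbb{T}_d = \times_{j=1}^{d} \mathbb{T}_1^{[j]}$ the measure $\nu = \nu_1 \otimes \cdots \otimes \nu_d$ where $\nu_j$ is a Lebesgue or count measure on the corresponding univariate support $\mathbb{T}_1^{[j]}$. Under these assumptions, the associated kernel $K_{\mathbf{x},\mathbf{H}}(\cdot)$ which replaces the classical kernel $K(\cdot)$ of (1.2) is a probability density function (pdf) in relation to the measure $\nu$. This kernel can be defined as follows.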
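Definition 2.1. Let $\mathbb{T}_d \subseteq \mathbb{R}^d$ be the support of the regressors, $\mathbf{x} \in \mathbb{T}_d$ a target vector and $\mathbf{H}$ a bandwidth matrix. A parametrized pdf $K_{\mathbf{x},\mathbf{H}}(\cdot)$ of support $\mathbb{S}_{\mathbf{x},\mathbf{H}} \subseteq \mathbb{R}^d$ is called “multivariate (or general) associated kernel” if the following conditions are satisfied:
$$\mathbb{S}_{\mathbf{x},\mathbf{H}} \cap \mathbb{T}_d \neq \emptyset, \tag{2.1}$$
$$\mathbb{E}\left(\mathcal{Z}_{\mathbf{x},\mathbf{H}}\right) = \mathbf{x} + \mathbf{a}(\mathbf{x},\mathbf{H}), \tag{2.2}$$
$$\mathrm{Cov}\left(\mathcal{Z}_{\mathbf{x},\mathbf{H}}\right) = \mathbf{B}(\mathbf{x},\mathbf{H}), \tag{2.3}$$
where $\mathcal{Z}_{\mathbf{x},\mathbf{H}}$ denotes the random vector with pdf $K_{\mathbf{x},\mathbf{H}}$ and both $\mathbf{a}(\mathbf{x},\mathbf{H})$ and $\mathbf{B}(\mathbf{x},\mathbf{H})$ tend, respectively, to the null vector and the null matrix as $\mathbf{H}$ goes to the null matrix.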
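As a worked check of these conditions in dimension one, consider the beta kernel of Chen (1999), assumed here in its usual form as the density of a $\mathrm{Beta}(x/h + 1, (1-x)/h + 1)$ distribution on $[0,1]$ with target $x \in [0,1]$ and bandwidth $h > 0$. The symbols $a(x,h)$ and $b(x,h)$ below denote the bias and variance terms of the definition; this derivation is a sketch added for illustration.

```latex
\begin{align*}
\mathbb{E}(Z_{x,h})
  &= \frac{x/h + 1}{1/h + 2}
   = \frac{x + h}{1 + 2h}
   = x + \underbrace{\frac{h(1 - 2x)}{1 + 2h}}_{a(x,h)} , \\
\mathrm{Var}(Z_{x,h})
  &= \frac{(x/h + 1)\{(1 - x)/h + 1\}}{(1/h + 2)^{2}(1/h + 3)}
   = \underbrace{\frac{h\,(x + h)(1 - x + h)}{(1 + 2h)^{2}(1 + 3h)}}_{b(x,h)} .
\end{align*}
```

Both $a(x,h)$ and $b(x,h)$ vanish as $h \to 0$, and the support $[0,1]$ of the kernel coincides with that of a rate variable, so the three conditions hold for this kernel.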
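From this definition and in comparison with (1.2), the Nadaraya-Watson estimator using associated kernels is
$$\widehat{m}_n(\mathbf{x}; \kappa, \mathbf{H}) = \frac{\sum_{i=1}^{n} Y_i\, K_{\mathbf{x},\mathbf{H}}(\mathbf{X}_i)}{\sum_{i=1}^{n} K_{\mathbf{x},\mathbf{H}}(\mathbf{X}_i)}, \tag{2.4}$$
where $\mathbf{H} \equiv \mathbf{H}_n$ is the bandwidth matrix such that $\mathbf{H}_n$ tends to the null matrix as $n \to \infty$, and $\kappa$ represents the type of associated kernel $K_{\mathbf{x},\mathbf{H}}$, parametrized by the target $\mathbf{x}$ and the bandwidth matrix $\mathbf{H}$. Without loss of generality and to point out the effect of $\kappa$, we will hereafter use $\widehat{m}_n(\cdot; \kappa)$ since the bandwidth matrix is here investigated only by cross-validation.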
The following two examples provide well-known and interesting particular cases of multivariate associated kernel estimators. The first can be seen as an interpretation of classical associated kernels through continuous symmetric kernels. The second deals with non-classical associated kernels without correlation structure.
Given a target vector and a bandwidth matrix , it follows that the classical kernel in (1.2) with null mean vector and covariance matrix induces the so-called (multivariate) classical associated kernel:
on with (i.e. ) and ;
on with (i.e. ) and .
A second particular case of Definition 2.1, appropriate for both continuous and count explanatory variables without correlation structure is presented as follows.
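Let $\mathbf{x} = (x_1, \dots, x_d)^{\top} \in \times_{j=1}^{d} \mathbb{T}_1^{[j]}$ and $\mathbf{H} = \mathrm{Diag}(h_{11}, \dots, h_{dd})$ with $h_{jj} > 0$. Let $K^{[j]}_{x_j, h_{jj}}$ be a (discrete or continuous) univariate associated kernel (see Definition 2.1 for $d = 1$) with its corresponding random variable $\mathcal{Z}^{[j]}_{x_j, h_{jj}}$ on $\mathbb{S}_{x_j, h_{jj}}$ for all $j = 1, \dots, d$. Then, the multiple associated kernel is also a multivariate associated kernel:
$$K_{\mathbf{x},\mathbf{H}}(\cdot) = \prod_{j=1}^{d} K^{[j]}_{x_j, h_{jj}}(\cdot) \tag{2.6}$$
on $\mathbb{S}_{\mathbf{x},\mathbf{H}} = \times_{j=1}^{d} \mathbb{S}_{x_j, h_{jj}}$, with the mean vector and the diagonal covariance matrix of $\mathcal{Z}_{\mathbf{x},\mathbf{H}}$ given componentwise by those of the univariate kernels. In other words, the random variables $\mathcal{Z}^{[j]}_{x_j, h_{jj}}$ are independent components of the random vector $\mathcal{Z}_{\mathbf{x},\mathbf{H}}$.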
Here, in addition to the Nadaraya-Watson estimator using general associated kernels given in (2.4), we propose a slightly different one. In fact, for multivariate supports composed of continuous and discrete univariate supports, we lack appropriate general associated kernels. Therefore, with the multiple associated kernels (2.6) and writing $\mathbf{X}_i = (X_{i1}, \dots, X_{id})^{\top}$, the estimator (2.4) becomes
$$\widetilde{m}_n(\mathbf{x}; \kappa) = \frac{\sum_{i=1}^{n} Y_i \prod_{j=1}^{d} K^{[j]}_{x_j, h_{jj}}(X_{ij})}{\sum_{i=1}^{n} \prod_{j=1}^{d} K^{[j]}_{x_j, h_{jj}}(X_{ij})}. \tag{2.7}$$
In theory and in practice, one often uses (2.7) from multiple associated kernels (2.6), which are more manageable than (2.4); see, e.g., Scott (1992) and also Bouezmarni and Rombouts (2010) for density estimation.
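The product form (2.7) can be sketched in Python for one rate regressor and one count regressor. The sketch assumes the usual forms of the two univariate kernels: the beta kernel of Chen (1999), i.e. the Beta$(x/h + 1, (1-x)/h + 1)$ density, and the binomial kernel, i.e. the Binomial$(x + 1, (x+h)/(x+1))$ pmf. The function names and the argument order `(target, bandwidth, point)` are hypothetical choices made for this illustration.

```python
import math


def beta_kernel(x, h, u):
    """Beta kernel (assumed form): Beta(x/h + 1, (1 - x)/h + 1) density at u."""
    if not 0.0 <= u <= 1.0:
        return 0.0
    a, b = x / h + 1.0, (1.0 - x) / h + 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)


def binomial_kernel(x, h, u):
    """Binomial kernel (assumed form): Binomial(x + 1, (x + h)/(x + 1)) pmf at u."""
    n = x + 1
    if not 0 <= u <= n:
        return 0.0
    p = (x + h) / n
    return math.comb(n, u) * p ** u * (1.0 - p) ** (n - u)


def nw_product(x, X, Y, h, kernels):
    """Nadaraya-Watson estimate at x using a product of univariate kernels.

    kernels[j] is called as kernels[j](target x_j, bandwidth h_j, X_ij).
    """
    num = den = 0.0
    for xi, yi in zip(X, Y):
        w = 1.0
        for kernel, xj, hj, xij in zip(kernels, x, h, xi):
            w *= kernel(xj, hj, xij)
        num += yi * w
        den += w
    # No observation may fall in the kernel support; the estimate is then undefined.
    return num / den if den > 0.0 else float("nan")
```

For example, `nw_product((0.5, 2), X, Y, (0.1, 0.3), [beta_kernel, binomial_kernel])` smooths over a rate in $[0,1]$ and a count, each kernel putting weight only on the support of its own regressor.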
2.2 Associated kernels for illustration
In order to point out the importance of the type of kernel in a regression study, we present below some kernels that will be used in simulations. These are seven basic associated kernels, of which three are univariate discrete, three are univariate continuous and the last one is a bivariate beta with correlation structure.
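The binomial kernel (Bin) is defined on the support $\mathbb{S}_x = \{0, 1, \dots, x+1\}$ with $x \in \mathbb{T}_1 = \mathbb{N}$ and then $h \in (0, 1]$:
$$B_{x,h}(u) = \frac{(x+1)!}{u!\,(x+1-u)!} \left(\frac{x+h}{x+1}\right)^{u} \left(\frac{1-h}{x+1}\right)^{x+1-u} \mathbb{1}_{\mathbb{S}_x}(u),$$
where $\mathbb{1}_{A}$ denotes the indicator function of any given event $A$. Note that $B_{x,h}$ is the probability mass function (pmf) of the binomial distribution with number of trials $x+1$ and success probability $(x+h)/(x+1)$ in each trial. It is appropriate for count data with small or moderate sample sizes; note, however, that it does not satisfy (2.3). See Kokonendji and Senga Kiéssé (2011) and also Zougab et al. (2012) for a bandwidth selection by a Bayesian method.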
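The failure of (2.3) can be checked directly from the moments of the binomial distribution. The short derivation below is added for illustration and assumes the binomial kernel is the pmf of a binomial distribution with $x+1$ trials and success probability $(x+h)/(x+1)$.

```latex
\begin{align*}
\mathbb{E}(Z_{x,h}) &= (x+1)\,\frac{x+h}{x+1} = x + h , \\
\mathrm{Var}(Z_{x,h}) &= (x+1)\,\frac{x+h}{x+1}\cdot\frac{1-h}{x+1}
                      = \frac{(x+h)(1-h)}{x+1}
                      \;\xrightarrow[h \to 0]{}\; \frac{x}{x+1} .
\end{align*}
```

The bias term $h$ vanishes as $h \to 0$, but the variance tends to $x/(x+1)$, which is nonzero for every target $x \geq 1$; hence the limit condition attached to (2.3) is not met.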
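From Aitchison and Aitken (1976), Kokonendji and Senga Kiéssé (2011) deduced the following discrete kernel that we here label DiracDU (DirDU) as “Dirac Discrete Uniform”. For fixed $c \in \{2, 3, \dots\}$ the number of categories, we define $\mathbb{S}_c = \{0, 1, \dots, c-1\}$ and, for $x \in \mathbb{S}_c$ and $h \in (0, 1]$,
$$DU_{x,h;c}(u) = (1-h)\, \mathbb{1}_{\{x\}}(u) + \frac{h}{c-1}\, \mathbb{1}_{\mathbb{S}_c \setminus \{x\}}(u).$$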
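A minimal Python sketch of the DiracDU kernel follows, assuming the Aitchison and Aitken (1976) form in which the target category receives mass $1-h$ and the remaining mass $h$ is spread uniformly over the other $c-1$ categories. The function name and signature are hypothetical.

```python
def dirac_du_kernel(x, h, u, c):
    """DiracDU kernel (assumed form) at u, for target x among c categories.

    Mass 1 - h sits on the target; h is shared equally by the c - 1 others.
    """
    if x not in range(c) or u not in range(c):
        return 0.0
    return 1.0 - h if u == x else h / (c - 1)
```

The weight given to a category other than the target does not depend on how far it is from the target, which is natural for unordered categories but ignores the ordering of count data.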
We finally consider the bivariate beta kernel (Bivariate beta) defined by
with , and . For , the characteristics in (2.8) are given by , , , and the constraints
It satisfies Definition 2.1 and is adapted for bivariate rates. The full bandwidth matrix allows any orientation of the kernel. Therefore, it can reach any point of the space which might be inaccessible with a diagonal matrix. This type of kernel is called beta-Sarmanov kernel by Kokonendji and Somé (2015); see Sarmanov (1966) and also Lee (1996) for this construction of multivariate densities with correlation structure from independent components. As in Bertin and Klutchnikoff (2014), minimax properties could also be studied for this bivariate beta kernel and, more generally, for associated kernels.
Figure 2.1 shows some forms of the above-mentioned univariate associated kernels. The plots highlight the importance given to the target point and its neighbourhood in both discrete and continuous cases. Furthermore, for a fixed bandwidth $h$, the classical associated kernel of Epanechnikov, and also the categorical DiracDU kernel, keep the same shape along the support; in contrast, the shapes of the other non-classical associated kernels change according to the target. This explains the inappropriateness of the Epanechnikov kernel for density or regression estimation in any bounded interval (Figure 2.1(a)) and of the DiracDU kernel for count regression estimation (see simulations below).
2.3 Bandwidth matrix selection by cross validation
In the context of multivariate kernel regression, the bandwidth matrix selection is here obtained by the well-known least squares cross-validation (LSCV). In fact, for a given associated kernel, the optimal bandwidth matrix is $\widehat{\mathbf{H}} = \arg\min_{\mathbf{H} \in \mathcal{H}} LSCV(\mathbf{H})$ with
$$LSCV(\mathbf{H}) = \frac{1}{n} \sum_{i=1}^{n} \left\{ Y_i - \widehat{m}_{-i}(\mathbf{X}_i; \kappa) \right\}^2, \tag{2.10}$$
where $\widehat{m}_{-i}(\mathbf{X}_i; \kappa)$ is computed as $\widehat{m}_n$ of (2.4) excluding $\mathbf{X}_i$ and $\mathcal{H}$ is the set of bandwidth matrices $\mathbf{H}$; see, e.g., Kokonendji et al. (2009) in the univariate case and also Zhang et al. (2014) and Zougab et al. (2014a) for univariate bandwidth estimation by sampling algorithm methods. For diagonal bandwidth matrices (i.e. multiple associated kernels), the LSCV method uses the set $\mathcal{D}$ of diagonal matrices. Concerning the beta-Sarmanov kernel (2.8) with full bandwidth matrix, this LSCV method is used under $\mathcal{H}_1$, a subset of $\mathcal{H}$ verifying the constraint (2.9) of the associated kernel. The corresponding algorithms are described below and used for numerical studies in the following section.
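Algorithms of the LSCV method (2.10) for some types of associated kernels and their corresponding bandwidth matrices

A1. Bivariate beta (2.8) with full bandwidth matrices and dimension $d = 2$.
  1. Choose two intervals $H_{11}$ and $H_{22}$ related to $h_{11}$ and $h_{22}$, respectively.
  2. For $\delta = 1, \dots, \ell(H_{11})$ and $\gamma = 1, \dots, \ell(H_{22})$,
     (a) Compute the interval $H_{12}[\delta, \gamma]$ related to $h_{12}$ from the constraints in (2.9);
     (b) Compose the full bandwidth matrix with $h_{11} = H_{11}(\delta)$, $h_{22} = H_{22}(\gamma)$ and $h_{12}$ taken in $H_{12}[\delta, \gamma]$.
  3. Apply the LSCV method on the set of all full bandwidth matrices so obtained.

A2. Multiple associated kernels (i.e. diagonal bandwidth matrices) in dimension $d$.
  1. Choose $d$ intervals $H_{11}, \dots, H_{dd}$ related to $h_{11}, \dots, h_{dd}$, respectively.
  2. For $\delta_1 = 1, \dots, \ell(H_{11})$, $\dots$, $\delta_d = 1, \dots, \ell(H_{dd})$,
     (a) Compose the diagonal bandwidth matrix $\mathrm{Diag}\left(H_{11}(\delta_1), \dots, H_{dd}(\delta_d)\right)$.
  3. Apply the LSCV method on the set of all diagonal bandwidth matrices so obtained.

For a given interval $I$, the notation $\ell(I)$ is the total number of subdivisions of $I$ and $I(\eta)$ denotes the real value at the subdivision $\eta$ of $I$. Also, for practical uses of (A1) and (A2), the intervals are generally chosen according to the associated kernel.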
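The grid search over diagonal bandwidth matrices can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the leave-one-out criterion is assumed to be the mean squared prediction error, each interval is cut into `n_sub` equally spaced values (an assumption about how subdivisions are taken), a bandwidth for which some leave-one-out estimate is undefined is given an infinite score, and all function names are hypothetical. The binomial kernel is used only as an example of a univariate kernel.

```python
import itertools
import math


def binomial_kernel(x, h, u):
    """Binomial kernel (assumed form): Binomial(x + 1, (x + h)/(x + 1)) pmf at u."""
    n = x + 1
    if not 0 <= u <= n:
        return 0.0
    p = (x + h) / n
    return math.comb(n, u) * p ** u * (1.0 - p) ** (n - u)


def lscv(X, Y, h, kernels):
    """Leave-one-out criterion: mean of (Y_i - m_{-i}(X_i))^2 for bandwidths h."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(X, Y)):
        num = den = 0.0
        for k, (xk, yk) in enumerate(zip(X, Y)):
            if k == i:
                continue  # exclude the i-th observation
            w = math.prod(ker(t, hj, v) for ker, t, hj, v in zip(kernels, xi, h, xk))
            num += yk * w
            den += w
        if den == 0.0:
            return math.inf  # estimate undefined at X_i for this bandwidth
        total += (yi - num / den) ** 2
    return total / len(Y)


def subdivisions(interval, n_sub):
    """Equally spaced values in (lo, hi], skipping lo so that h stays positive."""
    lo, hi = interval
    return [lo + (hi - lo) * k / n_sub for k in range(1, n_sub + 1)]


def lscv_diagonal_search(X, Y, intervals, n_sub, kernels):
    """Return the diagonal bandwidths minimizing the criterion over the grid."""
    grids = [subdivisions(iv, n_sub) for iv in intervals]
    return min(itertools.product(*grids), key=lambda h: lscv(X, Y, h, kernels))
```

The cost grows as `n_sub ** d` evaluations of a criterion that is itself quadratic in the sample size, which is consistent with the long computation times reported for large samples.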
3 Simulation studies and real data analysis
We apply the multivariate associated kernel estimators of (2.4) and (2.7) to some simulated target regression functions and then to two real datasets. The multivariate and multiple associated kernels used are built from those of Section 2.2. The optimal bandwidth matrix is here chosen by the LSCV method (2.10) using Algorithms A1 and A2 of Section 2.3 and their indications. Besides the criterion of kernel support, we retain three measures to examine the effect of different associated kernels $\kappa$ on multiple regression. In simulations, it is the average squared error (ASE) defined as
$$ASE(\kappa) = \frac{1}{n} \sum_{i=1}^{n} \left\{ m(\mathbf{X}_i) - \widehat{m}_n(\mathbf{X}_i; \kappa) \right\}^2.$$
For real datasets, we use the root mean squared error (RMSE), which is linked to the ASE through the square root and by changing the simulated value $m(\mathbf{X}_i)$ into the observed value $y_i$:
$$RMSE(\kappa) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left\{ y_i - \widehat{m}_n(\mathbf{X}_i; \kappa) \right\}^2}.$$
Also, we consider the practical coefficient of determination $R^2$, which quantifies the proportion of variation of the response variable explained by the non-intercept regressors:
$$R^2(\kappa) = \frac{\sum_{i=1}^{n} \left\{ \widehat{m}_n(\mathbf{X}_i; \kappa) - \bar{y} \right\}^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$
with $\bar{y} = n^{-1}(y_1 + \cdots + y_n)$. All these criteria have their simulated or real data counterparts by replacing $m(\mathbf{X}_i)$ with $y_i$ and vice versa. Computations have been performed on the supercomputer facilities of the Mésocentre de calcul de Franche-Comté using the R software; see R Development Core Team (2014).
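The three criteria can be sketched in a few lines of Python. The formulas used are assumptions spelled out in the comments: ASE and RMSE are taken as mean squared deviations over the sample, and the coefficient of determination is assumed to be the ratio of the variation of the fitted values around the mean response to the variation of the responses. The function names are hypothetical.

```python
import math


def ase(m_true, m_hat):
    """Average squared error between true and estimated regression values."""
    return sum((t - e) ** 2 for t, e in zip(m_true, m_hat)) / len(m_true)


def rmse(y, m_hat):
    """Root mean squared error between observed responses and fitted values."""
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(y, m_hat)) / len(y))


def r_squared(y, m_hat):
    """Assumed form: sum (m_hat_i - y_bar)^2 / sum (y_i - y_bar)^2."""
    y_bar = sum(y) / len(y)
    explained = sum((e - y_bar) ** 2 for e in m_hat)
    total = sum((a - y_bar) ** 2 for a in y)
    return explained / total
```

Under this assumed form the ratio is not bounded by one: fitted values that overshoot the responses inflate it, so a large value does not by itself indicate a good fit and should be read together with the RMSE.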
3.1 Simulation studies
Unless stated otherwise, each result is obtained over a fixed number of replications.
3.1.1 Bivariate cases
We consider five target regression functions labelled A, B, C, D and E with dimension $d = 2$.
Function A is a bivariate beta without correlation :
with and as parameter values in univariate beta density.
Function B is the bivariate Dirichlet density
where is the classical gamma function, with parameter values , and, therefore, the moderate value of .
Function C is a bivariate Poisson with null correlation :
Function D is a bivariate Poisson with correlation structure
with parameter values , and and, therefore, the moderate value of ; see, e.g., Yahav and Shmueli (2012).
Function E is a bivariate function without correlation, with one continuous rate variable and one count variable:
Table 3.1 presents the execution times needed for computing the LSCV method for both bivariate beta kernels, for only one replication at each of the sample sizes considered and for the target function A. The computational times of the LSCV method for the bivariate beta with correlation structure (2.8) are obviously longer than those without correlation structure. Let us note that for full bandwidth matrices, the execution times become very large when the number of observations is large; however, these CPU times can be considerably reduced by parallel processing, in particular for the bivariate beta kernel with the full LSCV method (2.10). The constraints (2.9) reflect the difficulty of finding the appropriate bandwidth matrix with correlation structure by the LSCV method.
Table 3.2 reports the average ASE for three continuous associated kernels with respect to functions A and B and according to the sample size. We can see that both beta kernels in dimension $d = 2$ work better than the multiple Epanechnikov kernel for all sample sizes and all correlation structures in the regressors. This reflects the appropriateness of the beta kernels, which are suitable for the support of rate regressors. Then, the explanatory variables with correlation structure give a larger average ASE than those without correlation structure. Also, both beta kernels give quite similar results. Furthermore, all average ASE values improve when the sample size increases.
Finally, Tables 3.1 and 3.2 highlight that the use of bivariate beta kernels with correlation structure is not recommended in regression with rate explanatory variables. Thus, we focus on multiple associated kernels for the rest of the simulation studies.
Table 3.3 shows the average ASE values with respect to five associated kernels for several sample sizes and for count datasets generated from C and D. Globally, the discrete associated kernels in the multiple case perform better than the multiple Epanechnikov kernel for all sample sizes and correlation structures in the regressors. The use of categorical DiracDU kernels gives the best result in terms of average ASE, but DiracDU is not suitable for these count datasets. Also, the discrete triangular kernels give the most interesting results, with an advantage to the discrete triangular with small arm. This discrete triangular is the best since it always concentrates on the target and a few observations around it; see Figure 2.1(a). The results become much better when the sample size increases. The values for regressors with or without correlation structure are comparable; thus, we can focus on target regression functions without correlation structure for the remaining simulations.
Table 3.4 presents the average ASE values for several sample sizes and for five associated kernels. The datasets are generated from E and the beta kernel is applied to the continuous rate variable of E. We observe the superiority of the multiple associated kernels using discrete kernels over those defined with the Epanechnikov kernel for all sample sizes. Then, the multiple associated kernel with the categorical DiracDU gives the best average ASE but it is not appropriate for the count variable of E. Also, the values improve when the sample size increases.
3.1.2 Multivariate cases
Since the appropriate associated kernels perform better than the inappropriate ones, we focus in higher dimension on regression with only suitable associated kernels. Then, we consider two target regression functions labelled F and G for $d = 3$ and 4 respectively. The formulas of the functions are given below.
Function F is a 3-variate with null correlation:
Function G is a 4-variate without correlation:
with and .
Table 3.5 presents the regression study for dimension $d = 3$ and 4 with respect to functions F and G and for several sample sizes. The values show the superiority of the multiple associated kernels using the discrete triangular kernel over the one with the binomial kernel. Some results with respect to function G for an associated kernel composed of two beta and two discrete triangular kernels are also provided. The errors become smaller when the sample size increases.
3.2 Real data analysis
The dataset consists of a sample of 38 family economies from a large US city and is available as the FoodExpenditure object in the betareg package of Cribari-Neto and Zeilis (2010). The dataset in its current form gives not available (NA) responses for associated kernel regressions, especially when we use the discrete triangular or the DiracDU kernel. Then, we extend the original FoodExpenditure dataset with its first 20 observations, which guarantees some results for the regression, and thus $n = 58$. The dependent variable is food/income, the proportion of household income spent on food. Two explanatory variables are available: the previously mentioned household income and the number of residents living in the household. We use the gamma or the Epanechnikov kernel for the continuous variable and the discrete kernels (of Figure 2.1(a)) or the Epanechnikov kernel for the count variable number of residents.
The results of the multiple associated kernels for regression are divided into two parts in Table 3.6. The appropriate associated kernels, which strictly follow the support of each variable, give comparable results in terms of both RMSE and $R^2$. In fact, the associated kernels that use the discrete triangular with arm and 3 give some $R^2$ approximately equal to . The inappropriate kernels give various results. The multiple Epanechnikov kernel and the type of kernel with DiracDU give $R^2$ higher than while the GammaEpanechnikov gives $R^2$ less than . Then, a small difference in terms of RMSE can have a large effect on the $R^2$.
The second dataset, given in Table 3.7, is used to explain the turnover of a large company by two proportion explanatory variables obtained by survey. The first variable is the rate of people who like the company and the second one is the percentage of people who like the flagship product of this company. The dataset was collected in 80 branches of this company. Obviously, there is a significant correlation between these explanatory variables: .
Table 3.8 presents the results for the nonparametric regressions with three associated kernels. Both beta kernels offer the most interesting results with $R^2$ approximately equal to . Note that the multiple Epanechnikov kernel gives lower performance, mainly because this continuous unbounded kernel is not suitable for these bounded explanatory variables.
4 Summary and final remarks
We have presented associated kernels for nonparametric multiple regression in the presence of a mixture of discrete and continuous explanatory variables; see, e.g., Zougab et al. (2014b) for a choice of the bandwidth matrix by Bayesian methods. Two particular cases, the continuous classical and the multiple (or product of) associated kernels, are highlighted with the bandwidth matrix selection by cross-validation. Also, six univariate associated kernels and a bivariate beta with correlation structure are presented and used for computational studies.
Simulation experiments and analysis of two real datasets provide insight into the behaviour of the type of associated kernel for small and moderate sample sizes. Tables 3.1, 3.2 and 3.8 on bivariate rate regressions can be conceptually summarized as follows. The use of associated kernels with correlation structure is not recommended. In fact, it is time consuming and has the same performance as the multiple beta kernel. Also, these appropriate beta kernels are better than the inappropriate multiple Epanechnikov. For count regressions, the multiple associated kernels built from the binomial and the discrete triangular with small arms are superior to those with the optimal continuous Epanechnikov. Furthermore, the categorical DiracDU kernel gives misleading results since it is not suitable for count variables; see Tables 3.3 and 3.4. We advise beta kernels for rate variables and gamma kernels for non-negative datasets, for small and moderate sample sizes and in any dimension; see, e.g., Tables 3.5 and 3.6. Finally, more than the performance of the regression, it is the correct choice of the associated kernel according to the explanatory variables which is the most important. In other words, the criterion for choosing an associated kernel is the support; however, when several kernels match the support, we use common measures such as the mean integrated squared error. It should be noted that a large coefficient of determination does not mean a good fit to the data; see Tables 3.6 and 3.8. Further research on associated kernels for functional regression is conceivable; see, e.g., Amiri et al. (2014) for classical kernels.
- Aitchison and Aitken (1976) Aitchison, J., Aitken, C.G.G., 1976. Multivariate binary discrimination by the kernel method. Biometrika 63, 413–420.
- Amiri et al. (2014) Amiri, A., Crambes, C., Thiam, B., 2014. Recursive estimation of nonparametric regression with functional covariate, Computational Statistics and Data Analysis 69, 154–172.
- Bertin and Klutchnikoff (2014) Bertin, K., Klutchnikoff, N., 2014. Adaptative estimation of a density function using beta kernels, ESAIM: Probability and Statistics 18, 400–417.
- Bouezmarni and Rombouts (2010) Bouezmarni, T., Rombouts, J.V.K., 2010. Nonparametric density estimation for multivariate bounded data, Journal of Statistical Planning and Inference 140, 139–152.
- Chen (1999) Chen, S.X., 1999. A beta kernel estimation for density functions, Computational Statistics and Data Analysis 31, 131–145.
- Chen (2000a) Chen, S.X., 2000a. Probability density function estimation using gamma kernels, Annals of the Institute of Statistical Mathematics 52, 471–480.
- Chen (2000b) Chen, S.X., 2000b. Beta kernel smoothers for regression curves, Statistica Sinica 10, 73–91.
- Cribari-Neto and Zeilis (2010) Cribari-Neto, F., Zeilis, A., 2010. Beta regression in R, Journal of Statistical Software 34, 1–24.
- Epanechnikov (1969) Epanechnikov, V.A., 1969. Nonparametric estimation of a multivariate probability density, Theory of Probability and Its Applications 14, 153–158.
- Hayfield and Racine (2007) Hayfield, T., Racine, J.S., 2007. Nonparametric econometrics: the np package, Journal of Statistical Software 27, 1–32.
- Kokonendji and Senga Kiéssé (2011) Kokonendji, C.C., Senga Kiéssé, T., 2011. Discrete associated kernels method and extensions, Statistical Methodology 8, 497–516.
- Kokonendji et al. (2009) Kokonendji, C.C., Senga Kiéssé, T., Demétrio, C.G.B., 2009. Appropriate kernel regression on a count explanatory variable and applications, Advances and Applications in Statistics 12, 99–125.
- Kokonendji et al. (2007) Kokonendji, C.C., Senga Kiéssé, T., Zocchi, S.S., 2007. Discrete triangular distributions and nonparametric estimation for probability mass function, Journal of Nonparametric Statistics 19, 241–254.
- Kokonendji and Somé (2015) Kokonendji, C.C., Somé, S.M., 2015. On multivariate associated kernels for smoothing some density function, arXiv:1502.01173.
- Kokonendji and Zocchi (2010) Kokonendji, C.C., Zocchi, S.S., 2010. Extensions of discrete triangular distribution and boundary bias in kernel estimation for discrete functions, Statistics and Probability Letters 80, 1655–1662.
- Lee (1996) Lee, M-L.T., 1996. Properties and applications of the Sarmanov family of bivariate distributions, Communications in Statistics - Theory and Methods 25, 1207–1222.
- Libengué (2013) Libengué, F.G., 2013. Méthode Non-Paramétrique par Noyaux Associés Mixtes et Applications. Ph.D. Thesis Manuscript (in French) to Université de Franche-Comté, Besançon, France & Université de Ouagadougou, Burkina Faso, June 2013, LMB no. 14334, Besançon.
- Nadaraya (1964) Nadaraya, E.A., 1964. On estimating regression, Theory of Probability and its Applications 9, 141–142.
- R Development Core Team (2014) R Development Core Team, 2014. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://cran.r-project.org/.
- Racine and Li (2004) Racine, J., Li, Q., 2004. Nonparametric estimation of regression functions with both categorical and continuous data, Journal of Econometrics 119, 99–130.
- Sarmanov (1966) Sarmanov, O.V., 1966. Generalized normal correlation and two-dimensional Fréchet classes, Doklady (Soviet Mathematics) 168, 596–599.
- Scott (1992) Scott, D.W., 1992. Multivariate Density Estimation, John Wiley and Sons, New York.
- Wansouwé et al. (2014) Wansouwé, W.E., Kokonendji, C.C., Kolyang, D.T., 2015. Disake: Discrete associated kernel estimators.
- Watson (1964) Watson, G.S., 1964. Smooth regression analysis, Sankhya Series A 26, 359–372.
- Yahav and Shmueli (2012) Yahav, I., Shmueli, G., 2012. On generating multivariate Poisson data in management science applications, Applied Stochastic Models in Business and Industry 28, 91–102.
- Zhang (2010) Zhang, S., 2010. A note on the performance of the gamma kernel estimators at the boundary, Statistics and Probability Letters 80, 548–557.
- Zhang and Karunamuni (2010) Zhang, S., Karunamuni, R.J., 2010. Boundary performance of the beta kernel estimators, Journal of Nonparametric Statistics 22, 81–104.
- Zhang et al. (2014) Zhang, X., King, M.L., Shang, H.L., 2014. A sampling algorithm for bandwidth estimation in a nonparametric regression model with a flexible error density, Computational Statistics and Data Analysis 78, 218–234.
- Zougab et al. (2012) Zougab, N., Adjabi, S., Kokonendji, C.C., 2012. Binomial kernel and Bayes local bandwidth in discrete functions estimation, Journal of Nonparametric Statistics 24, 783–795.
- Zougab et al. (2014a) Zougab, N., Adjabi, S., Kokonendji, C.C., 2014a. Bayesian approach in nonparametric count regression with binomial kernel, Communications in Statistics - Simulation and Computation 43, 1052–1063.
- Zougab et al. (2014b) Zougab, N., Adjabi, S., Kokonendji, C.C., 2014b. Bayesian estimation of adaptive bandwidth matrices in multivariate kernel density estimation, Computational Statistics and Data Analysis 75, 28–38.