On Combining Estimation Problems Under Quadratic Loss: A Generalization
The main theorem in Judge and Mittelhammer [Judge, G. G., and Mittelhammer, R. (2004), A Semiparametric Basis for Combining Estimation Problems under Quadratic Loss; JASA, 99, 466, 479–487] stipulates that, in the context of nonzero correlation, a sufficient condition for the Stein rule (SR)-type estimator to dominate the base estimator is that the dimension $k$ should be at least 5. Thanks to some refined inequalities, this dominance result is proved in its full generality, for a class of estimators which includes the SR estimator as a special case. Namely, we prove that, for any member of the derived class, $k > 2$ is a sufficient condition regardless of the correlation factor. We also relax the Gaussian assumption on the distribution of the base estimator by considering the family of elliptically contoured variates. Finally, we waive the condition on the invertibility of the variance-covariance matrix of the base and competing estimators. Our theoretical findings are corroborated by simulation studies, and the proposed method is applied to the Cigarette dataset.
Keywords: Elliptically contoured variate; Least-squares estimators; Quadratic loss; Restricted estimator; Semiparametric inference; Shrinkage estimators; Stein-type estimator.
1 Introduction and statistical model
The multiple regression model is a common statistical tool for investigating the relationship between a response variable and several explanatory variables. One of the main issues in regression analysis consists in estimating the regression coefficients. In particular, in the context of a linear regression model, it is common to use the ordinary least squares estimator (OLSE). Indeed, under normality of the error term, the OLSE is known to be the maximum likelihood estimator as well as the minimum variance unbiased estimator. However, when some prior information (from outside the sample) is available, the OLSE may not be optimal. For instance, this prior information may come from past statistical investigations which concluded that some regression coefficients are not statistically significant. Another source of prior information may be expertise in a certain field which establishes an association between the regressor variables. Such a situation arises in economic theory where, for example, it is common to assume that the sum of the exponents in a Cobb-Douglas production function (see Douglas and Cobb, 1928) is equal to one.
From the statistical inference point of view, it is important to incorporate the available prior information into the estimation method in order to improve upon the OLSE. For instance, if such prior information can be expressed in the form of exact linear restrictions binding the regression coefficients, then, instead of using the OLSE, one can resort to a competing estimator known as the restricted least squares estimator (RLSE); when the restrictions are correct, the RLSE dominates the OLSE. In the sequel, the OLSE will be referred to as the base estimator, while the RLSE will be referred to as the restricted estimator or the competing estimator. Thus, when exact prior information is available, practitioners should use the restricted estimator to estimate the target parameter, whereas if only the sample information is available, the base estimator is to be preferred.
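When the exact restrictions take the linear form $R\beta = r$, the RLSE has the familiar closed form $\tilde\beta = \hat\beta - (X^\top X)^{-1}R^\top[R(X^\top X)^{-1}R^\top]^{-1}(R\hat\beta - r)$. The NumPy sketch below computes both estimators for a Cobb-Douglas model in logs with the constant-returns-to-scale restriction mentioned above; the notation $R$, $r$, the function names and the simulated data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares estimator (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def rls(X, y, R, r):
    """Restricted least squares estimator under the exact restriction R beta = r."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = ols(X, y)
    M = R @ XtX_inv @ R.T
    # The correction term moves the OLSE onto the affine set {beta : R beta = r}.
    return b - XtX_inv @ R.T @ np.linalg.solve(M, R @ b - r)


# Hypothetical log-linear Cobb-Douglas data: log Q = b0 + b1 log K + b2 log L + error,
# with the prior restriction b1 + b2 = 1 (constant returns to scale).
rng = np.random.default_rng(0)
n = 50
log_k, log_l = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), log_k, log_l])
y = X @ np.array([0.2, 0.3, 0.7]) + rng.normal(scale=0.1, size=n)
R = np.array([[0.0, 1.0, 1.0]])
r = np.array([1.0])

b_ols, b_rls = ols(X, y), rls(X, y, R, r)
print("OLSE:", b_ols.round(3))
print("RLSE:", b_rls.round(3), "R @ RLSE =", (R @ b_rls).round(6))
```

By construction, the RLSE satisfies the restriction exactly and never has a smaller residual sum of squares than the OLSE.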
Nevertheless, in some circumstances, the prior information is only nearly correct: we want to incorporate additional information, but we are not completely sure about it. Such uncertainty may be induced by a change in the phenomenon underlying the regression model. Another context is the one where the prior information comes from experts in a field; the uncertainty then reflects the imprecision in the experts’ information or judgements. When the prior information is that, from past statistical investigations, some regression coefficients are not statistically significant, the uncertainty may reflect the fact that a field specialist believes that the nonsignificant explanatory variables are important.
In these cases, we have to choose how to incorporate uncertain prior information into the inference procedure. Technically, in order to use both the sample information and the uncertain prior information, we can combine the base estimator and the restricted estimator, and it is thus important to find an optimal combination. In the context of the linear regression model, Judge and Mittelhammer (2004) (hereafter JM) proposed a Stein-type estimator and derived a sufficient condition for its risk dominance relative to a certain base estimator. However, the main result in JM (2004) has some limitations. First, the error term is supposed to be normally distributed. Second, the variance-covariance matrix of the joint distribution of the base estimator and the competing estimator is supposed to be invertible. This last assumption excludes, for example, the case where the prior information is about the non-significance of some regression coefficients. Third, the derived sufficient condition is too restrictive in the sense that it excludes multiple regression models with fewer than five regressors. Thus, the condition in JM (2004) is not applicable to quadratic or cubic regression models. However, in many applications (see Ashton et al., 2008, Fernandez-Juricic et al., 2003, among others), if a linear fit is not appropriate, a quadratic or cubic regression proves to be a simple and adequate model. A further example is the Cigarette dataset produced by the USA Federal Trade Commission, which can be found in Mendenhall and Sincich (1992). For this dataset, the method in JM (2004) is not applicable since there are only three explanatory variables. In Section 4, we analyse this dataset and show that the proposed estimator is more efficient than the base estimator, as measured by a bootstrap estimate of the relative efficiency.
In this paper, we generalize in four ways the main result in JM (2004), which gives a sufficient condition for the risk dominance of a Stein-type estimator relative to a certain base estimator. First, we present a class of estimators which includes as a special case the Stein rule-type estimator given in JM (2004). Second, we relax the condition on the dimension of the parameter space. Third, we waive the condition on the invertibility of the variance-covariance matrix of the base estimator and the competing estimator. Thus, the proposed methodology also works when practitioners suspect some linear restrictions binding the regression coefficients. Fourth, we extend the main result to a family of elliptically contoured distributions. Recall that the normal distribution is a member of the family of elliptically contoured distributions and, as explained in Provost and Cheong (2000), many test statistics and optimality properties underlying Gaussian random samples remain unchanged for elliptically contoured random samples. For further discussions and advantages of elliptically contoured distributions, we refer for example to Abdous et al. (2004), Liu et al. (2009) and references therein. Finally, the key to establishing our results consists in deriving some inequalities and bounds which are more refined than those used in JM (2004).
The remainder of this paper is organized as follows. Section 1.2 presents the statistical model given in JM (2004) as well as the highlights of our contributions. In Section 2, we present a class of Stein rule-type estimators and their risk function. Section 3 gives the main results of this paper in the Gaussian case and, more generally, in the elliptically contoured case. We also show, in Section 3, that the proposed method works when the variance-covariance matrix of the base estimator and the competing estimator is singular. In Section 4, we present some simulation results for small sample sizes as well as an analysis of a real dataset. Section 5 gives some concluding remarks. For the convenience of the reader, technical proofs are given in the Appendix.
1.2 Statistical model and main contributions
In this section, we recall the statistical model and the assumptions as well as some preliminary results which are given in JM (2004). Thus, this section presents only the model for which the error term is normally distributed. As mentioned in the Introduction, this is a preliminary step as we show later that the result established under the normality assumption holds also in the cases of elliptically contoured variables.
Following JM (2004), we consider the estimation problem of a $k$-dimensional location parameter vector $\boldsymbol{\beta}$ when one observes an $n$-dimensional sample vector $\mathbf{Y}$ such that $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is an $n \times k$ design matrix of rank $k$ and $\boldsymbol{\varepsilon}$ is an $n$-dimensional random vector such that $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $\mathrm{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}_n$. Further, as in the quoted paper, we consider the scenario where there exists some uncertainty concerning the above statistical model, which leads to uncertainty concerning the appropriate inference method. For more details about these issues, we refer, for example, to JM (2004), Saleh (2006), Hossain et al. (2009), Morris and Lysy (2012) among others.
In the case where the above statistical model is appropriate, it is natural to estimate the target parameter by using the least-squares (LS) estimator . Further, in the context of an alternative statistical model, one can consider the competing estimator , which is such that , , . Thus, as in JM (2004), the two estimators and are assumed to be correlated, and may be biased with bias . When it is uncertain which of the two statistical models is more appropriate, it is common to consider an estimator which combines the two estimators in an optimal way. Originally, this type of method was introduced by James and Stein (1961). Over the last 50 years, numerous papers have been written on this topic, so that it would be impossible to summarize all of them. To give some closely related references, we mention Bock (1975), Judge and Bock (1978), JM (2004), Saleh (2006), Nkurunziza and Ahmed (2010), Nkurunziza (2011), and Tan (2015) and references therein.
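The correlation and bias described above arise naturally when the competing estimator is a restricted LS estimator computed from the same sample and the restriction is not exactly true. The Monte Carlo sketch below (a hypothetical design and restriction, not the setting of JM 2004) compares the simulated bias and cross-covariance of the two estimators with their closed forms $\boldsymbol\gamma = -K(R\boldsymbol\beta - r)$ and $\mathrm{Cov}(\hat{\boldsymbol\beta}, \tilde{\boldsymbol\beta}) = \sigma^2 (X^\top X)^{-1} A^\top$, where $K = (X^\top X)^{-1}R^\top[R(X^\top X)^{-1}R^\top]^{-1}$ and $A = I - KR$ are our notation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma = 40, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, 0.8])
# Hypothetical restriction b1 + b2 = 1, which is false here (0.5 + 0.8 = 1.3).
R, r = np.array([[0.0, 1.0, 1.0]]), np.array([1.0])

XtX_inv = np.linalg.inv(X.T @ X)
K = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)
A = np.eye(k) - K @ R                    # restricted estimator: beta_tilde = A beta_hat + K r
gamma = -K @ (R @ beta - r)              # theoretical bias of beta_tilde
cross_cov = sigma**2 * XtX_inv @ A.T     # theoretical Cov(beta_hat, beta_tilde)

reps = 20000
Y = X @ beta + rng.normal(scale=sigma, size=(reps, n))   # one simulated sample per row
B_hat = Y @ X @ XtX_inv                                   # LS estimate for each row
B_tilde = B_hat @ A.T + K @ r                             # restricted estimate for each row

bias_mc = B_tilde.mean(axis=0) - beta
centred_hat = B_hat - B_hat.mean(axis=0)
centred_tilde = B_tilde - B_tilde.mean(axis=0)
cross_cov_mc = centred_hat.T @ centred_tilde / (reps - 1)
print(np.round(gamma, 3), np.round(bias_mc, 3))
```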
More precisely, we extend the following result: JM (2004) stipulate (see their main theorem) that, in the case of nonzero correlation between the base estimator and the alternative estimator, a sufficient condition for the Stein rule (SR)-type estimator to dominate the base estimator is $k \geq 5$.
In this paper, we extend this result in four ways. First, we construct a class of estimators which includes as a special case the SR estimator given in JM (2004). Second, we prove that, regardless of the presence of correlation, the condition $k > 2$ remains sufficient for any estimator of the proposed class of SR estimators to dominate the base estimator in mean squared error. The impact of this finding is that, unlike the result in JM (2004), the established method can be applied when the number of regressors is less than five, as, for example, in a quadratic or a cubic regression model. Third, we also generalize the method in JM (2004) to the case where the joint distribution of the base estimator and the restricted estimator may be singular. This last result can be very useful when the statistician suspects some linear restrictions binding the regression coefficients. This includes, for example, the case where the prior information from past statistical investigations is that some regression coefficients are not statistically significant, while the expert in the field of application believes that the corresponding explanatory variables should be in the model. Fourth, we prove that the established results hold if the normality assumption is replaced with that of elliptically contoured variates. Technically, in order to derive our findings, we establish some inequalities which are more refined than those in JM (2004). Finally, the simulation results are in agreement with the above theoretical findings; more specifically, they show that the risk dominance of some SR estimators increases as the correlation increases.
2 A class of Stein rule estimators and the risk function
2.1 A class of Stein rule-type estimators
In this subsection, we present a class of Stein rule (SR)-type estimators which includes as a special case the SR estimator in JM (2004). Recall that the results given in this paper hold under a more general statistical model than that in JM (2004). More precisely, the established results hold whenever the base estimator and the competing estimator jointly follow an elliptically contoured distribution. We first suppose that the two estimators are jointly Gaussian. Thus, let
where, as in JM (2004), the matrices , , are assumed to be positive definite. This assumption will be waived in Subsection 3.2 to study the case where the matrices and may be singular. Further, let be a real number and let be a real-valued measurable function which is square-integrable with respect to the Gaussian measure. We consider the following class of SR estimators
For a given -column vector , let . If , for a known real number , from (2.2), we get the estimator
which is the Stein rule (SR)-type estimator given in JM (2004).
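As a minimal sketch, assuming the SR-type estimator of JM (2004) takes the usual James-Stein form $\tilde\beta + (1 - a/\|\hat\beta - \tilde\beta\|^2)(\hat\beta - \tilde\beta)$ with a known constant $a$, the combination can be coded as follows; the function name `stein_rule` is hypothetical.

```python
import numpy as np


def stein_rule(beta_hat, beta_tilde, a):
    """SR-type combination beta_tilde + (1 - a / ||beta_hat - beta_tilde||^2) (beta_hat - beta_tilde)."""
    z = beta_hat - beta_tilde
    sq = float(z @ z)
    if sq == 0.0:
        # The two estimators coincide (a probability-zero event under continuous laws).
        return beta_hat.copy()
    return beta_tilde + (1.0 - a / sq) * z


beta_hat = np.array([1.0, 2.0, 2.0])
beta_tilde = np.zeros(3)
print(stein_rule(beta_hat, beta_tilde, a=3.0))   # ||z||^2 = 9, so the factor is 1 - 3/9 = 2/3
```

With $a = 0$ the rule returns the base estimator, and when $a$ equals $\|\hat\beta - \tilde\beta\|^2$ it returns the competing estimator.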
Let , , where is an unbiased and/or consistent estimator of . If , , the estimator in (2.2) becomes the Semiparametric Stein-Like (SPSL) estimator given in JM (2004). Namely, we have
If , the estimator in (2.2) yields the base estimator .
If , , we have .
As an important point, in this paper, the random quantity is a statistic in the sense that it can be computed from the observations. The real value , which is assumed to be known in (2.2), plays the same role as in the SR estimator of JM (2004), and the impact of replacing it by a consistent estimator should be similar to that described in JM (2004). In practice, the value of can be obtained by using a resampling technique such as the bootstrap.
2.2 Risk function
The performance of the proposed class of estimators is studied under the quadratic loss function. Thus, the quadratic risk function, also called the mean squared error (MSE), of the class of estimators in (2.2) is
assuming that these expectations are defined. Thus, from (2.5), it is obvious that, for all , we have
provided that and exist. Further, from (2.5), one concludes that the optimal choice of is and thus,
Assuming that and are defined, (2.7) implies that, for a fixed value of , the MSE of the SR estimator decreases as increases. Further, for a fixed value of , the MSE of the SR estimator decreases as decreases. The simulation results given in Section 4.1 are in agreement with this analysis. To make this idea more precise, we first note that, for a given , the values of and depend both on , the bias of the estimator , and on the covariance between the estimator and . In particular, the simulation results show that the risk dominance of the SPSL increases as the correlation increases. Also, the simulation results show that the risk dominance of the SPSL decreases as the norm of the bias increases.
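For the SR member written as $\hat\beta - aZ/\|Z\|^2$ with $Z = \hat\beta - \tilde\beta$ (a parametrization assumed here for illustration), the quadratic risk is a quadratic polynomial in $a$, namely $\mathrm{MSE}(a) = E\|\hat\beta - \beta\|^2 - 2a\,E[Z^\top(\hat\beta - \beta)/\|Z\|^2] + a^2 E[\|Z\|^{-2}]$, so it is minimized at $a^* = E[Z^\top(\hat\beta - \beta)/\|Z\|^2]/E[\|Z\|^{-2}]$, where it equals $E\|\hat\beta-\beta\|^2 - (E[Z^\top(\hat\beta-\beta)/\|Z\|^2])^2/E[\|Z\|^{-2}]$. The Monte Carlo sketch below, with hypothetical values of $\beta$, of the bias and of the correlation, replaces each expectation by a sample mean and checks that the empirical risk is smallest at the empirical $a^*$.

```python
import numpy as np

rng = np.random.default_rng(2)
k, reps, rho = 4, 50000, 0.5
beta = np.array([1.0, 0.5, -0.5, 0.2])
gamma = np.array([0.3, 0.0, 0.1, -0.2])    # hypothetical bias of the competing estimator

# Jointly Gaussian pair: coordinatewise unit variances and correlation rho between the estimators.
e1 = rng.normal(size=(reps, k))
e2 = rho * e1 + np.sqrt(1.0 - rho**2) * rng.normal(size=(reps, k))
beta_hat = beta + e1
beta_tilde = beta + gamma + e2
Z = beta_hat - beta_tilde
sq = np.sum(Z**2, axis=1)

# Sample versions of the three expectations in MSE(a) = m0 - 2 a m1 + a^2 m2.
m0 = np.mean(np.sum((beta_hat - beta) ** 2, axis=1))
m1 = np.mean(np.sum(Z * (beta_hat - beta), axis=1) / sq)
m2 = np.mean(1.0 / sq)
a_star = m1 / m2


def risk(a):
    """Monte Carlo quadratic risk of beta_hat - a Z / ||Z||^2."""
    est = beta_hat - a * Z / sq[:, None]
    return np.mean(np.sum((est - beta) ** 2, axis=1))


print(round(a_star, 3), round(risk(0.0), 3), round(risk(a_star), 3))
```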
As mentioned above, the derivation of the MSE in (2.5) and (2.7) assumes the existence of and . Thus, it is important to derive the conditions under which these expectations are defined. To this end, we require that the function satisfies the following assumption.
Assumption : The function is such that is bounded, i.e.,
It should be noticed that the function which gives the SR estimator satisfies the above assumption. Indeed, in this case, we have . Also, the function which gives the base estimator satisfies the above assumption. As another example of a function which satisfies Assumption , one can take for some .
Below, we prove that, under the above assumption and regardless of the presence of correlation, the condition $k > 2$ remains sufficient for any estimator of the class in (2.2) to dominate the base estimator in mean squared error. In particular, since the SR estimator is a member of this class, the established result proves that, regardless of the presence of correlation, $k > 2$ is sufficient for the SR estimator to dominate the base estimator in mean squared error. We also prove that this conclusion holds if the normality assumption is replaced by that of elliptically contoured variates.
3 Main results
In this section, we present the main results of this paper. As an intermediate step, we derive three propositions and a theorem which play a central role in deriving the main result; together, they yield a more refined inequality than that used in JM (2004). To simplify the presentation of the main results, we introduce some notation which will be used in the remainder of the paper. Let where and . From (2.1), we have
From the Cholesky decomposition, let be a nonsingular matrix such that , and let
The proof of this proposition is given in the Appendix. From Proposition 3.1, it is clear that, in order to prove that and are defined, it is sufficient to prove that and . Below, we establish a theorem which proves that, provided that , , and this implies that . To introduce some notation, let denote the indicator function of the event , let be a -symmetric matrix, let denote the eigenvalue of , and let , , , be respectively the first, the second, …, the last eigenvalue of .
The proof of this proposition is given in the Appendix.
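The Cholesky standardization used at the start of this section can be illustrated numerically. In the sketch below, the matrix being factorized is taken, for illustration, to be a hypothetical covariance matrix of $Z$; its Cholesky factor $L$ turns $Z$ into $W = L^{-1}Z$ with identity covariance, and $\|Z\|^2 = W^\top L^\top L W$ is then bounded below and above by the smallest and largest eigenvalues of $\mathrm{Var}(Z)$ times $\|W\|^2$. That this is the precise inequality exploited in Theorem 3.4 is an assumption; the sketch only shows the mechanics.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical positive definite covariance matrix of Z = beta_hat - beta_tilde.
sigma_z = np.array([[2.0, 0.6, 0.3],
                    [0.6, 1.5, 0.4],
                    [0.3, 0.4, 1.0]])
L = np.linalg.cholesky(sigma_z)                  # lower triangular, sigma_z = L L'
Z = rng.multivariate_normal(np.zeros(3), sigma_z, size=100000)
W = np.linalg.solve(L, Z.T).T                    # W = L^{-1} Z has identity covariance

# ||Z||^2 = W' (L' L) W, and L' L has the same eigenvalues as sigma_z.
lam = np.linalg.eigvalsh(sigma_z)                # ascending order
sq_z = np.sum(Z**2, axis=1)
sq_w = np.sum(W**2, axis=1)
print(np.cov(W, rowvar=False).round(2))
```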
Let , and let . We have
By using Theorem 3.4, we establish the following corollary which shows that (and so ) is bounded by a positive real number which is finite provided that .
For , we have . Then, by using Theorem 3.4, we get the statement of the corollary. ∎
From Corollary 3.5, it should be noticed that, if , then . This is an interesting finding which shows that the nonzero correlation does not affect the condition for the risk dominance of the SR estimator relative to the base estimator. Thus, under normality, in order to guarantee the existence of the MSE in (2.5) and (2.7), it is sufficient to let .
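In the Gaussian case, the role of the condition on the dimension can be traced to a classical fact: if $W \sim N_k(\mu, I_k)$, then $\|W\|^2$ is a (noncentral) chi-square variable, and $E[\|W\|^{-2}]$ is finite if and only if $k > 2$, with $E[\|W\|^{-2}] = 1/(k-2)$ when $\mu = 0$ and a smaller value otherwise. The sketch below checks this by simulation; the helper name `mc_inverse_moment` is hypothetical, and the code illustrates the threshold only, not the bound of Theorem 3.4.

```python
import numpy as np


def mc_inverse_moment(k, mean_norm=0.0, n_draws=200000, seed=4):
    """Monte Carlo estimate of E[1 / ||W||^2] for W ~ N(mu, I_k) with ||mu|| = mean_norm."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(k)
    mu[0] = mean_norm
    W = mu + rng.normal(size=(n_draws, k))
    return float(np.mean(1.0 / np.sum(W**2, axis=1)))


for k in (3, 4, 5, 6):
    # For k > 2 the central value is exactly 1 / (k - 2); for k <= 2 the expectation is infinite.
    print(k, round(mc_inverse_moment(k), 4), round(1.0 / (k - 2), 4))
```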
Under normality, implies and .
If , from the proof in JM (2004), . Then, by using Theorem 3.4, we get . ∎
Suppose that Assumption holds. Under normality, implies and .
Note that Corollary 3.6 generalizes the main theorem in JM (2004). Further, Corollary 3.7 extends the result of Corollary 3.6 to a class of SR-type estimators which includes the SR-type estimator in JM (2004) as a special case.
3.1 Extension to elliptically contoured random samples
In this subsection, we show that the result given in Corollary 3.6 remains valid for some elliptically contoured random samples. Our motivation stems primarily from the importance of this family of distributions: as discussed in the literature, elliptically contoured distributions have been particularly useful in several areas of application such as actuarial science (see Furman and Landsman, 2006, Landsman and Valdez, 2003), or economics and finance (see Bingham and Kiesel, 2001).
Recall that the class of elliptically contoured distributions includes, for example, the multivariate Gaussian, t, Pearson type II and VII, as well as Kotz distributions. To simplify the notation, let stand for a -column random vector distributed as an elliptically contoured vector with mean and scale parameter matrix , where is a positive definite matrix, and is the probability density function (p.d.f.) generator. For the sake of simplicity, we consider the case where the p.d.f. of can be written as
where denotes the p.d.f. of a random vector which follows a normal distribution with mean and variance-covariance matrix , and is a weighting function that satisfies . Note that the weighting function does not need to be nonnegative. When the function is nonnegative, is a p.d.f., and the corresponding subclass of elliptically contoured distributions is known as a mixture of multivariate normal distributions. For more details, we refer to Chmielewski (1981), Gupta and Varga (1995), Nkurunziza and Chen (2013) among others. In particular, Gupta and Varga (1995) give the conditions on the p.d.f. generator for the p.d.f. of to be rewritten as in (3.10). From now on, we suppose that has a p.d.f. which can be rewritten as in (3.10), and, in a similar way to Section 2, let
where , , , are as defined in Section 2.
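When the weighting function is nonnegative, the density is a scale mixture of Gaussian densities. A standard member of this subclass is the multivariate t distribution with $\nu$ degrees of freedom, which can be generated as $\mu + G/\sqrt{V/\nu}$ with $G \sim N(0, \Sigma)$ and $V \sim \chi^2_\nu$ independent. The sketch below (hypothetical parameters; the function name `rmvt` is hypothetical) draws such vectors and checks the known mean $\mu$ and covariance $\nu\Sigma/(\nu-2)$.

```python
import numpy as np


def rmvt(mean, scale, df, size, rng):
    """Multivariate t draws via the normal scale mixture mean + G / sqrt(V / df), V ~ chi2(df)."""
    G = rng.multivariate_normal(np.zeros(len(mean)), scale, size=size)
    V = rng.chisquare(df, size=size)
    return mean + G / np.sqrt(V / df)[:, None]


rng = np.random.default_rng(5)
mean = np.array([1.0, -1.0, 0.5])
scale = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])
df = 8.0
X = rmvt(mean, scale, df, 400000, rng)
# For df > 2 the covariance matrix is df / (df - 2) times the scale matrix.
print(np.cov(X, rowvar=False).round(3))
```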
Suppose that is distributed as in (3.11) and suppose that the weighting function satisfies . Then, implies and .
First, recall that the family of elliptically contoured distributions is closed under linear transformations. Then, if is distributed as in (3.11), (2.5) and (2.7) hold, and then (3.4) holds with . Therefore, by using Remark 3.1, we conclude that Corollary 3.5 holds. Further, as in JM (2004), we get
Then, it suffices to prove that for all .
From (3.10) and Fubini’s Theorem, we have
where . Note that,
for all . This completes the proof. ∎
Suppose that Assumption holds. Also, suppose that is distributed as in (3.11) and that the weighting function satisfies . Then, implies and .
3.2 Further extensions and statistical practice
Singular distributions case
In the previous sections, we derived the results under the assumption that the joint distribution of the base and competing estimators is not singular (see the relation (2.1)). This is a limitation which excludes, for example, the case where the imprecise prior information takes the form of a linear restriction between the parameters. Yet this is precisely the situation when a restriction binds some regression coefficients. Such a situation is common in economic theory where, for example, as introduced by Douglas and Cobb (1928), the sum of the exponents in a Cobb-Douglas production function is known to be one. Thus, in this subsection, we consider that has the same distribution as in (2.1), where the matrices and are (possibly) singular. For this kind of problem, the joint distribution of the base and competing estimators is (possibly) singular, and it is thus important to show how the proposed methodology works in this case. To this end, let be the rank of with . Briefly, we show that, under some conditions, the established results hold by replacing by . Namely, a sufficient condition for the risk dominance of any member of the class of SR-type estimators relative to the base estimator is to let . Of course, this condition implies that since . More precisely, we suppose that the following conditions hold.
Assumption : The function is a measurable function of only.
Note that the function which gives the SR estimator satisfies Assumption . Namely, for the SR estimator, we have .
Assumption : There exists a symmetric and positive definite matrix such that is idempotent and .
It should be noted that in the case where is invertible, it suffices to take . Below we give another, more specific, example of a matrix in the case where the prior information is a linear restriction on the regression coefficients.
Suppose that Assumptions - hold. Under normality, implies and .
The proof of this theorem is given in the Appendix.
Special singular case: Linear restriction
In this subsection, we show that the proposed methodology works in the particular case where the uncertain prior information is a linear restriction. More precisely, we consider a restriction of the form
where is a known -full-rank matrix with , and is a known -column vector. With a suitable choice of the matrix and the vector , the constraint (3.12) yields the case where some regression coefficients are not statistically significant, i.e., the corresponding explanatory variables should be excluded from the model.
where and . Thus, here, the variance-covariance matrix of is singular and so is the variance-covariance matrix of . The following proposition shows that the Assumption holds by taking . Thus, the proposed methodology works in this practical case.
Suppose that the base and restricted estimators follow the distribution in (3.13) and let , then, Assumption holds.
We have and . Then, the proof follows after applying standard algebraic computations. ∎
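For an exclusion restriction, the degeneracy described in this subsection is easy to see numerically. Since the restricted estimator is then a linear function $A\hat\beta$ of the base estimator, the $2k \times 2k$ covariance matrix of the stacked pair has rank $k$, and $\mathrm{Var}(\hat\beta - \tilde\beta)$ has rank equal to the number $q$ of restrictions. The sketch below uses a hypothetical design with $k = 5$ and $q = 3$; the matrices $K$ and $A$ are our notation.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, q, sigma2 = 30, 5, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
# Hypothetical exclusion restriction: the last q coefficients are zero, i.e. R = [0 | I_q], r = 0.
R = np.hstack([np.zeros((q, k - q)), np.eye(q)])
XtX_inv = np.linalg.inv(X.T @ X)
K = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)
A = np.eye(k) - K @ R                                      # restricted estimator: beta_tilde = A beta_hat

var_hat = sigma2 * XtX_inv                                 # Var(beta_hat)
joint = np.block([[var_hat, var_hat @ A.T],
                  [A @ var_hat, A @ var_hat @ A.T]])       # Var of the stacked (beta_hat, beta_tilde)
var_diff = (np.eye(k) - A) @ var_hat @ (np.eye(k) - A).T   # Var(beta_hat - beta_tilde)

# An explicit tolerance avoids counting round-off singular values as nonzero.
print(np.linalg.matrix_rank(joint, tol=1e-10), np.linalg.matrix_rank(var_diff, tol=1e-10))
```

Both matrices are rank deficient, which is the kind of singular covariance structure that the results of this subsection are meant to accommodate.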
Actually, an even more general result could be proved. Indeed, by using a transformation similar to that in Nkurunziza (2013), one can extend Theorem 3.10 to the case of singular elliptically contoured distributions.
4 A simulation study and data analysis
4.1 A Simulation Study
In this section, we carry out Monte Carlo simulation studies to examine the mean square error (MSE) performance of the SPSL estimator relative to the base estimator. To this end, we follow sampling experiments similar to those in JM (2004). Namely, for and , we consider the general linear model
for small and large sample sizes. To save space, we report only the results for and . Although not reported here, similar results hold for and (they are available from the author upon request). For the dimension of the parameter vector , we focus only on the cases where and , as the case has been studied in JM (2004). The -matrix and the noise were generated by following the sampling design described in JM (2004). For the convenience of the reader, we outline this sampling design below.
Briefly, as in the quoted paper, for and , the first column of the matrix is a column of unit values and the remaining columns of the ’s are generated independently from a -dimensional normal distribution with a mean vector of 1s, standard deviations all equal to 1, and various levels of pairwise correlations. Further, the observations of the ’s were generated independently based on various normal probability distributions, all defined to have zero means over a range of standard deviations. For every sample size, 5,000 replications were carried out in order to compute the empirical quadratic risk estimates.
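The design just described can be sketched as follows. The sample size, the number of columns, the common correlation level $\rho$, the coefficient vector and the error standard deviation below are illustrative choices, and the function names are hypothetical; the exact grid of values is the one given in JM (2004).

```python
import numpy as np


def design_matrix(n, k, rho, rng):
    """Intercept column plus k - 1 columns drawn from N(1, 1) with common pairwise correlation rho."""
    m = k - 1
    corr = np.full((m, m), rho) + (1.0 - rho) * np.eye(m)
    cols = rng.multivariate_normal(np.ones(m), corr, size=n)
    return np.column_stack([np.ones(n), cols])


def simulate_response(X, beta, sigma, rng):
    """Y = X beta + eps with eps ~ N(0, sigma^2 I_n)."""
    return X @ beta + rng.normal(scale=sigma, size=X.shape[0])


rng = np.random.default_rng(7)
X = design_matrix(20000, 4, 0.5, rng)
y = simulate_response(X, np.array([1.0, 0.5, 0.5, 0.5]), 1.0, rng)
print(np.corrcoef(X[:, 1:], rowvar=False).round(2))
```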
As in JM (2004), we take , where denotes a -diagonal matrix. Further, as in JM (2004), the comparison between the SPSL and LS estimators is based on the relative mean square efficiency (RMSE) of the estimators with respect to LS, namely
Therefore, we have
Thus, a relative efficiency less than one indicates the degree of superiority of the new estimator over the LS estimator.
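To make the relative efficiency concrete without reproducing the SPSL constant of JM (2004), the sketch below computes the Monte Carlo RMSE of the classical James-Stein estimator relative to its base estimator. Here the base estimator is assumed to have identity covariance and is shrunk towards a fixed point, so the two estimators are uncorrelated; all numerical settings and the helper name `relative_mse` are hypothetical.

```python
import numpy as np


def relative_mse(est_new, est_base, beta):
    """Ratio of Monte Carlo quadratic risks; values below 1 favour the new estimator."""
    mse_new = np.mean(np.sum((est_new - beta) ** 2, axis=1))
    mse_base = np.mean(np.sum((est_base - beta) ** 2, axis=1))
    return float(mse_new / mse_base)


rng = np.random.default_rng(8)
k, reps = 4, 20000
beta = np.full(k, 0.5)
beta_hat = beta + rng.normal(size=(reps, k))     # base estimator with identity covariance
target = np.zeros(k)                             # fixed competing value, hence zero correlation
Z = beta_hat - target
shrink = 1.0 - (k - 2) / np.sum(Z**2, axis=1)    # classical James-Stein factor with sigma^2 = 1
js = target + shrink[:, None] * Z
ratio = relative_mse(js, beta_hat, beta)
print(round(ratio, 3))
```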
More precisely, following Subsection 5.2 of JM (2004), we examine the relative performance of the SPSL estimator as a function of the parameter norms and , where the parameter vector is chosen such that , which are the values used in JM (2004, Figure 3).
For the small sample sizes and , the results are presented in Figures 1 and 2. These figures show that, as the norms or increase, the risk of the SPSL estimator increases and approaches the risk of the LS estimator. Further, Figures 3-4 show a similar pattern for the case where and/or for the cases of moderate and large sample sizes. This result is in agreement with that given in JM (2004) for the cases where . Also, these figures confirm the findings in JM (2004) in that the correlation among the variables increases the relative performance of the SPSL estimator over the LS estimator.
4.2 Data analysis
In this subsection, we illustrate the application of the proposed method to a real dataset. The dataset consists of a sample of 25 brands of cigarettes (see Mendenhall and Sincich, 1992). For each brand of cigarette, the weight as well as the tar, nicotine, and carbon monoxide content have been recorded.
The choice of this dataset is motivated by several health and environmental issues concerning cigarettes raised in medical studies. Indeed, as explained in Mendenhall and Sincich (1992), “the United States Surgeon General considers each of these substances hazardous to a smoker’s health”. The authors also mention that “past studies have shown that increases in the tar and nicotine content of a cigarette are accompanied by an increase in the carbon monoxide emitted from the cigarette smoke”.
Accordingly, in order to illustrate the application of the proposed method, the response variable is taken to be the carbon monoxide content, while the three covariates are the weight, the tar content and the nicotine content. So, including the intercept, we apply the proposed method to the regression model for which $k = 4$ and $n = 25$. It should be noticed that, for such a dataset, with $k = 4 < 5$, the result in JM (2004) cannot be used to justify the efficiency of the SPSL over the base estimator. In contrast, the result established in this paper does justify the relative efficiency of the SPSL estimator, provided that the underlying distribution of the error terms is elliptically contoured. To give some numerical descriptive measures, the sample mean of the response variable is 12.5280, and the sample means of the covariates are 12.2160, 0.8764 and 0.9703 for the weight, tar and nicotine content, respectively. The correlation coefficients between the response and the covariates are shown in Table 1. This table indicates that the weight and the tar content are highly correlated with the response, while the correlation between the nicotine content and the response is modest. Nevertheless, all these correlation coefficients are statistically significant at the 5% level. Further, the covariates appear to be pairwise correlated at the 5% significance level.
| Carbon monoxide | Weight | Tar content | Nicotine content |
|---|---|---|---|
| (0.0195) | (0.0127) | (0.0109) | (-) |
By applying the method, we obtain the point estimates based on the LS and SPSL estimators reported in Table 2. To assess the performance of the estimators, we compute the mean squared error by using a bootstrap method with 5000 replications. The relative efficiency of the estimators is given in Table 3.
From Table 3, one can clearly see that the relative efficiency of the SPSL estimator is less than 1, the value that the base estimator attains by definition. This illustrates that the SPSL estimator dominates the base estimator.
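The bootstrap computation of the relative efficiency can be sketched as follows. The paper does not spell out every detail, so several choices below are assumptions: a pairs (case-resampling) bootstrap, the full-sample LS estimate as the reference vector around which squared errors are measured, a competing estimator that sets the last coefficient to zero, and an arbitrary SR constant `a`. The data are simulated with the same dimensions as the Cigarette data ($n = 25$, $k = 4$); they are not the Cigarette data, and all function names are hypothetical.

```python
import numpy as np


def ls(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def restricted(X, y):
    """Hypothetical competing estimator: LS fit with the last coefficient set to zero."""
    return np.append(ls(X[:, :-1], y), 0.0)


def sr(X, y, a=0.02):
    """SR-type combination of ls and restricted, with a hypothetical constant a."""
    bt = restricted(X, y)
    z = ls(X, y) - bt
    return bt + (1.0 - a / float(z @ z)) * z


def bootstrap_mse(X, y, estimator, reference, B, rng):
    """Pairs bootstrap: mean squared distance between re-estimates and a reference vector."""
    n = len(y)
    losses = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        losses[b] = np.sum((estimator(X[idx], y[idx]) - reference) ** 2)
    return float(losses.mean())


rng = np.random.default_rng(9)
n = 25                                                    # same size as the Cigarette data, but simulated
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([3.0, 0.1, 0.8, 0.05]) + rng.normal(size=n)
reference = ls(X, y)                                      # assumed target: full-sample LS estimate
mse_ls = bootstrap_mse(X, y, ls, reference, 2000, rng)
mse_sr = bootstrap_mse(X, y, sr, reference, 2000, rng)
print("relative efficiency:", round(mse_sr / mse_ls, 3))
```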
5 Concluding remarks
To conclude, let us first recall that the main result in JM (2004) gives a sufficient condition for the Stein rule (SR)-type estimator to dominate the base estimator. In this paper, we provided more refined inequalities and bounds which allow this result to be established in its full generality. Namely, by means of an alternative and versatile approach, we generalized the result in JM (2004) in four ways. In particular, we proved theoretically that the nonzero correlation does not change the condition for the risk dominance of the SR estimator. The impact of this result is that, unlike the method in JM (2004), our method is applicable to a wide range of regression models, including, for instance, quadratic or cubic regressions, where the number of regressors is fewer than 5. In addition, we relaxed the normality assumption on the sampling distribution of the base estimator. We also generalized the method in JM (2004) to the case where the variance-covariance matrix of the base estimator and the restricted estimator may be singular. The significance of this finding is that our method also applies when past statistical investigations may have established that some regression coefficients are not statistically significant while a field specialist believes that the corresponding explanatory variables are important. From the practical point of view, we evaluated numerically the relative efficiency of a data-based semiparametric Stein-like (SPSL) estimator. The simulation studies corroborate the theoretical finding that the sufficient condition for the SPSL to dominate the LS estimator ($k > 2$) holds regardless of the correlation factor. Moreover, Figures 1-8 show that the correlation may amplify the risk dominance of the SPSL.
Finally, the proposed method is applied to the Cigarette dataset, produced by the USA Federal Trade Commission, for which $k = 4$. An interesting result is that, by using a bootstrap method, we see that the SPSL dominates the base estimator, in agreement with the theoretical result proved in the present paper.
Appendix A
Let be a -symmetric matrix. Then, , for every -column vector .
Since the matrix is symmetric, there exists an orthogonal matrix such that
this completes the proof. ∎
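Assuming Proposition A.1 is the standard Rayleigh-quotient bound $\lambda_{\min}(A)\,x^\top x \le x^\top A x \le \lambda_{\max}(A)\,x^\top x$, which is what the spectral-decomposition argument yields, the following numerical check illustrates it on random symmetric matrices; it is a sanity check, not a proof, and the function name is hypothetical.

```python
import numpy as np


def rayleigh_bounds_hold(trials=200, seed=10):
    """Check lambda_min(A) x'x <= x'Ax <= lambda_max(A) x'x for random symmetric A and vectors x."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        k = int(rng.integers(2, 7))
        B = rng.normal(size=(k, k))
        A = (B + B.T) / 2.0                 # random symmetric matrix
        lam = np.linalg.eigvalsh(A)         # eigenvalues in ascending order
        x = rng.normal(size=k)
        quad, norm2 = x @ A @ x, x @ x
        if not (lam[0] * norm2 - 1e-9 <= quad <= lam[-1] * norm2 + 1e-9):
            return False
    return True


print(rayleigh_bounds_hold())
```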
Let be a -matrix, and let and be column vectors. Then, we have
Since is a real number, we have . Therefore, since is a symmetric matrix, the first statement follows directly from Proposition A.1. To prove the second statement, note that can be rewritten as
The rest of the proof follows from the first statement. ∎