
# Improved robust Bayes estimators of the error variance in linear models

Yuzo Maruyama (University of Tokyo, maruyama@csis.u-tokyo.ac.jp) and William E. Strawderman (Rutgers University, straw@stat.rutgers.edu)
###### Abstract

We consider the problem of estimating the error variance in a general linear model when the error distribution is assumed to be spherically symmetric, but not necessarily Gaussian. In particular we study the case of a scale mixture of Gaussians, including the particularly important case of the multivariate-$t$ distribution. Under Stein's loss, we construct a class of estimators that improve on the usual best unbiased (and best equivariant) estimator. Our class has the interesting double robustness property of being simultaneously generalized Bayes (for the same generalized prior) and minimax over the entire class of scale mixtures of Gaussian distributions.

arXiv:1004.0234. doi:10.1016/j.jspi.2013.01.007

AMS subject classifications: Primary 62C20, 62F15; secondary 62A15.

Keywords: estimation of variance, harmonic prior, robustness.

The first author's work was partially supported by KAKENHI #21740065 & #23740067. The second author's work was partially supported by a grant from the Simons Foundation (#209035 to William Strawderman).

## 1 Introduction

Suppose the linear regression model is used to relate the response $y$ to the predictors,

$$y=\alpha 1_n+X\beta+\sigma\epsilon \qquad (1.1)$$

where $\alpha$ is an unknown intercept parameter, $1_n$ is an $n\times 1$ vector of ones, $X$ is an $n\times p$ design matrix, and $\beta$ is a $p\times 1$ vector of unknown regression coefficients. In the error term, $\sigma$ is an unknown scalar and $\epsilon$ has a spherically symmetric distribution,

$$\epsilon\sim f(\epsilon'\epsilon) \qquad (1.2)$$

where $f$ is the probability density, $E[\epsilon]=0$, and $E[\epsilon\epsilon']=I_n$. We assume that the columns of $X$ have been centered so that $\sum_{i=1}^n x_{ij}=0$ for $1\le j\le p$. We also assume that the columns of $X$ are linearly independent, which implies that

$$\operatorname{rank}X=p.$$

The class of error distributions we study includes the class of (spherical) multivariate-$t$ distributions, probably the most important of the possible alternative error distributions. It is often felt in practice that the error distribution has heavier tails than the normal, and the class of multivariate-$t$ distributions is a flexible class that allows for this possibility. They are also contained in the class of scale mixtures of normal distributions and thus, by De Finetti's Theorem, represent exchangeable distributions regardless of the sample size $n$.

In this paper we consider estimation of $\sigma^2$, the variance of each component of the error term, under Stein's loss (see James and Stein (1961)),

$$L_S(\delta,\sigma^2)=\delta/\sigma^2-\log(\delta/\sigma^2)-1. \qquad (1.3)$$

Hence the risk function is given by $R(\{\alpha,\beta,\sigma^2\},\delta)=E[L_S(\delta,\sigma^2)]$. The best equivariant estimator is the unbiased estimator given by

$$\delta_U=\mathrm{RSS}/(n-p-1), \qquad (1.4)$$

where RSS is the residual sum of squares given by

$$\mathrm{RSS}=\min_{\alpha,\beta}\|y-\alpha 1_n-X\beta\|^2.$$

In the Gaussian case, the Stein effect in the variance estimation problem has been studied in many papers, including Stein (1964); Strawderman (1974); Brewster and Zidek (1974); Maruyama and Strawderman (2006). Stein (1964) showed that

$$\delta_{ST}=\min\left(\delta_U,\ \frac{\|y-\bar y 1_n\|^2}{n-1}\right) \qquad (1.5)$$

dominates $\delta_U$. For smooth (generalized Bayes) estimators, Brewster and Zidek (1974) gave the improved estimator

$$\delta_{BZ}=\phi_{BZ}(R^2)\,\delta_U$$

where $\phi_{BZ}$ is a smooth increasing function given by

$$\phi_{BZ}(R^2)=1-\frac{2(1-R^2)^{(n-p-1)/2}}{n-1}\left\{\int_0^1 t^{p/2-1}(1-R^2t)^{(n-p-1)/2}\,dt\right\}^{-1} \qquad (1.6)$$

and $R^2$ is the coefficient of determination given by

$$R^2=\frac{\|X(X'X)^{-1}X'\{y-\bar y 1_n\}\|^2}{\|y-\bar y 1_n\|^2}. \qquad (1.7)$$
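As a concrete illustration of (1.4)–(1.7), the sketch below computes $\delta_U$, Stein's truncated estimator $\delta_{ST}$, and the Brewster–Zidek estimator $\delta_{BZ}$ on synthetic Gaussian data; the data, dimensions, and parameter values are our own illustrative choices (not from the paper), and the integral in (1.6) is evaluated by numerical quadrature.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(0)

# Synthetic Gaussian data (illustrative choices, not from the paper).
n, p, sigma = 30, 3, 2.0
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                         # center the columns, as the model assumes
beta = np.array([1.0, -0.5, 0.25])
y = 1.5 + X @ beta + sigma * rng.standard_normal(n)

v = y - y.mean()                            # y - ybar * 1_n
Hv = X @ np.linalg.solve(X.T @ X, X.T @ v)  # projection of v onto the column space of X
rss = v @ v - Hv @ Hv                       # residual sum of squares
R2 = (Hv @ Hv) / (v @ v)                    # coefficient of determination (1.7)

# Unbiased / best equivariant estimator (1.4).
delta_U = rss / (n - p - 1)

# Stein (1964): truncate delta_U, cf. (1.5).
delta_ST = min(delta_U, (v @ v) / (n - 1))

# Brewster-Zidek factor (1.6), via numerical quadrature.
def phi_BZ(r2, n, p):
    integral, _ = quad(lambda t: t**(p/2 - 1) * (1 - r2*t)**((n - p - 1)/2), 0, 1)
    return 1 - 2 * (1 - r2)**((n - p - 1)/2) / ((n - 1) * integral)

delta_BZ = phi_BZ(R2, n, p) * delta_U
```

At $R^2=0$ the factor (1.6) reduces to $(n-p-1)/(n-1)$, and $\phi_{BZ}\to1$ as $R^2\to1$, so $\delta_{BZ}$ interpolates smoothly between the two divisors appearing in Stein's truncated rule.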

Maruyama and Strawderman (2006) proposed another class of improved generalized Bayes estimators. The proofs in all of these papers seem to depend strongly on the normality assumption. It seems, then, that it may be difficult or impossible to extend the dominance results to the non-normal case. Also, many statisticians have thought that estimation of the variance is more sensitive to the assumed error distribution than estimation of the mean vector, for which some robustness results have been derived by Maruyama and Strawderman (2005).

Note that we use the term "robustness" in the sense of distributional robustness over the class of spherically symmetric error distributions. We specifically are not using the term to indicate a high breakdown point. This usage of "robustness" is, however, common (if somewhat misleading) in the shrinkage literature, where it indicates insensitivity to the error distribution.

In this paper, we derive a class of generalized Bayes estimators relative to a class of separable priors of the form $\pi(\alpha,\beta)\times\{\sigma^2\}^{-1}$ and show that the resulting generalized Bayes estimator is independent of the form of the (spherically symmetric) sampling distribution. Additionally, we show, for a particular subclass of these separable priors, $\pi_a(\alpha,\beta,\sigma^2)=(\beta'X'X\beta)^{-(p-a)/2}\{\sigma^2\}^{-1}$, that the resulting robust generalized Bayes estimator has the additional robustness property of being minimax and dominating the unbiased estimator simultaneously, for the entire class of scale mixtures of Gaussians.

A similar (but somewhat stronger) robustness property has been studied in the context of estimation of the vector of regression parameters by Maruyama and Strawderman (2005). They gave separable priors of a form similar to priors in this paper for which the generalized Bayes estimators are minimax for the entire class of spherically symmetric distributions (and not just scale mixture of normals). We suspect that the distributional robustness property of the present paper also extends well beyond the class of scale mixture of normal distributions but have not been able to demonstrate just how much further it does extend.

The organization of this paper is as follows. In Section 2 we derive generalized Bayes estimators under separable priors and demonstrate that the resulting estimator is independent of the (spherically symmetric) sampling density. In Section 3 we show that a certain subclass of estimators which are minimax under normality remains minimax for the entire class of scale mixture of normals. Further, we show that certain generalized Bayes estimators studied in Section 2 have this (double) robustness property. Some comments are given in Section 4 and an appendix gives proofs of certain of the results.

## 2 A generalized Bayes estimator with respect to the harmonic prior

In this section, we show that the generalized Bayes estimator of the variance with respect to a certain class of priors is independent of the particular sampling model under Stein’s loss. Also we will give an exact form of this estimator for a particular subclass of “(super)harmonic” priors that, we will later show, is minimax for a large subclass of spherically symmetric error distributions.

###### Theorem 2.1.

The generalized Bayes estimator with respect to $\pi(\alpha,\beta)\times\{\sigma^2\}^{-1}$ under Stein's loss (1.3) is independent of the particular spherically symmetric sampling model and hence is given by the generalized Bayes estimator under the Gaussian distribution.

###### Proof.

See Appendix. ∎

Now let $\pi_a(\alpha,\beta)=(\beta'X'X\beta)^{-(p-a)/2}$ with $0<a<p$. This is related to a family of (super)harmonic functions as follows. If, in the above joint prior for $(\alpha,\beta)$, we make the change of variables $\theta=(X'X)^{1/2}\beta$, the joint prior of $(\alpha,\theta)$ becomes

$$\pi(\alpha,\theta)=\|\theta\|^{-(p-a)}. \qquad (2.1)$$

The Laplacian of $\|\theta\|^{-(p-a)}$ is given by

$$\sum_{i=1}^p\frac{\partial^2}{\partial\theta_i^2}\|\theta\|^{-(p-a)}=(p-a)(2-a)\|\theta\|^{-(p-a)-2},$$

which is negative (i.e. super-harmonic) for $2<a<p$ and is zero (i.e. harmonic) for $a=2$.

###### Theorem 2.2.

Under the model (1.1) with spherically symmetric error distribution (1.2) and Stein's loss (1.3), the generalized Bayes estimator with respect to $\pi_a(\alpha,\beta,\sigma^2)=(\beta'X'X\beta)^{-(p-a)/2}\{\sigma^2\}^{-1}$ for $0<a<p$ is given by

$$\delta_a^{GB}=\phi_a^{GB}(R^2)\,\delta_U \qquad (2.2)$$

where

$$\phi_a^{GB}(R^2)=\frac{n-p-1}{n-a-1}\,\frac{\int_0^1 t^{p/2-a/2-1}(1-t)^{a/2-1}(1-R^2t)^{(n-p-a-1)/2}\,dt}{\int_0^1 t^{p/2-a/2-1}(1-t)^{a/2-1}(1-R^2t)^{(n-p-a+1)/2}\,dt}. \qquad (2.3)$$

###### Proof.

See Appendix. ∎
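The factor (2.3) is a ratio of two one-dimensional Beta-type integrals and is easy to evaluate numerically. The sketch below (our own illustration; the dimensions are arbitrary choices) checks two sanity properties visible from the formula: at $R^2=0$ the two integrals coincide, so the factor reduces to $(n-p-1)/(n-a-1)$, and the factor increases with $R^2$.

```python
import numpy as np
from scipy.integrate import quad

def phi_GB(r2, n, p, a):
    """Evaluate (2.3) by numerical quadrature; requires 0 < a < p."""
    def integral(expo):
        f = lambda t: t**(p/2 - a/2 - 1) * (1 - t)**(a/2 - 1) * (1 - r2*t)**expo
        val, _ = quad(f, 0, 1)
        return val
    num = integral((n - p - a - 1) / 2)
    den = integral((n - p - a + 1) / 2)
    return (n - p - 1) / (n - a - 1) * num / den

n, p = 20, 5  # illustrative dimensions
grid = np.linspace(0.0, 0.95, 20)
vals = [phi_GB(r2, n, p, a=2) for r2 in grid]
```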

## 3 Minimaxity

In this section, we demonstrate robustness of minimaxity under scale mixture of normals for a class of estimators which are minimax under normality.

###### Theorem 3.1.

Assume that $\delta_\phi=\phi(R^2)\delta_U$, where $\phi$ is monotone nondecreasing, improves on the unbiased estimator $\delta_U$ under normality and Stein's loss. Then $\delta_\phi$ also improves on the unbiased estimator $\delta_U$ under scale mixtures of normals and Stein's loss.

###### Proof.

Let $f$ be a scale mixture of normals whose mixing density $g$ of the scalar $\tau^2$ satisfies $\int_0^\infty\tau^2 g(\tau^2)\,d\tau^2=1$, that is,

$$f(t)=\int_0^\infty(2\pi\tau^2)^{-n/2}\exp(-t/\{2\tau^2\})\,g(\tau^2)\,d\tau^2.$$

Then $E[\epsilon\epsilon']=I_n$ and the risk difference between these estimators is given by

 (3.1)

The first term on the right-hand side of the above equality,

$$R(\{\alpha,\beta,\tau^2\sigma^2\},\delta_U)-R(\{\alpha,\beta,\tau^2\sigma^2\},\delta_\phi),$$

is the risk difference under the Gaussian assumption with variance $\tau^2\sigma^2$. From the assumption of the theorem, it is non-negative for any $\tau^2$. Hence it suffices to show that the second term is non-negative.

For given $\tau^2$, $\|X(X'X)^{-1}X'\{y-\bar y1_n\}\|^2/(\tau^2\sigma^2)$ and $V=\mathrm{RSS}/(\tau^2\sigma^2)$ are independently distributed as $\chi^2_p(\lambda/\tau^2)$ and $\chi^2_{n-p-1}$, respectively, where $\lambda=\beta'X'X\beta/\sigma^2$. Since $R^2$ is given by $\{1+V/\chi^2_p(\lambda/\tau^2)\}^{-1}$, the second term of the right-hand side of (3.1) is written as a positive multiple of $E[\psi(\tau^2,V)(\tau^2-1)]$, where

$$\psi(\tau^2,v)=1-E[\phi(\{1+v/\chi^2_p(\lambda/\tau^2)\}^{-1})]. \qquad (3.2)$$

By the monotone likelihood ratio property of the non-central $\chi^2$ distribution, $\psi(\tau^2,v)$ is non-decreasing in $\tau^2$ for any fixed $v$. Further, by the covariance inequality,

$$E_{\tau^2|V}[\psi(\tau^2,V)(\tau^2-1)]\ge E_{\tau^2|V}[\tau^2-1]\,E_{\tau^2|V}[\psi(\tau^2,V)]=0 \qquad (3.3)$$

since $\tau^2$ and $V$ are mutually independent and $E[\tau^2]=1$. The inequality (3.3) implies that the second term of the right-hand side of (3.1) is non-negative. ∎
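The key monotonicity step in the proof — that $\psi(\tau^2,v)$ in (3.2) is non-decreasing in $\tau^2$ — can be checked by simulation. The sketch below uses $\phi(x)=x$ as a convenient nondecreasing stand-in, and illustrative values of $p$, $\lambda$, and $v$ (all our own choices): as $\tau^2$ grows, the noncentrality $\lambda/\tau^2$ shrinks, $\chi^2_p(\lambda/\tau^2)$ becomes stochastically smaller, and $\psi$ increases.

```python
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(1)

def psi(tau2, v, p=4, lam=5.0, m=200_000, phi=lambda r2: r2):
    """Monte Carlo version of (3.2) for a nondecreasing phi (here phi(x) = x)."""
    chi2 = ncx2.rvs(p, lam / tau2, size=m, random_state=rng)
    return 1.0 - np.mean(phi(1.0 / (1.0 + v / chi2)))

# psi should be non-decreasing in tau^2 for fixed v.
taus = [0.5, 1.0, 2.0, 4.0]
vals = [psi(t, v=10.0) for t in taus]
```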

Under the normality assumption, Brewster and Zidek (1974) showed that the estimator $\phi(R^2)\delta_U$ with nondecreasing $\phi$ dominates the unbiased estimator if $\phi_{BZ}(R^2)\le\phi(R^2)\le1$, where $\phi_{BZ}$ is given by (1.6). Maruyama and Strawderman (2006) demonstrated that the generalized Bayes estimator of Theorem 2.2 with $a=2$ satisfies this condition. Hence our main result shows that the generalized Bayes estimator of Theorem 2.2 with $a=2$ is minimax for the entire class of variance mixtures of normal distributions.

###### Theorem 3.2.

Let $p\ge3$. Under Stein's loss, the estimator $\delta^H$ given by

$$\delta^H=\phi^H(R^2)\,\delta_U, \qquad (3.4)$$

where

$$\phi^H(R^2)=\frac{n-p-1}{n-3}\,\frac{\int_0^1 t^{p/2-2}(1-R^2t)^{(n-p-3)/2}\,dt}{\int_0^1 t^{p/2-2}(1-R^2t)^{(n-p-1)/2}\,dt}, \qquad (3.5)$$

is minimax and generalized Bayes with respect to the harmonic prior

$$\pi(\alpha,\beta,\sigma^2)=(\beta'X'X\beta)^{-(p-2)/2}\{\sigma^2\}^{-1} \qquad (3.6)$$

for the entire class of scale mixture of normals.
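Minimaxity of $\delta^H$ rests on $\phi^H$ satisfying the Brewster–Zidek condition $\phi_{BZ}(R^2)\le\phi^H(R^2)\le1$. A direct numerical check of this condition, for one illustrative pair $(n,p)$ of our own choosing, is a useful sanity test of (3.5):

```python
import numpy as np
from scipy.integrate import quad

n, p = 20, 5  # illustrative dimensions (our own choice)

def phi_BZ(r2):
    # Brewster-Zidek factor (1.6)
    integral, _ = quad(lambda t: t**(p/2 - 1) * (1 - r2*t)**((n - p - 1)/2), 0, 1)
    return 1 - 2 * (1 - r2)**((n - p - 1)/2) / ((n - 1) * integral)

def phi_H(r2):
    # harmonic-prior factor (3.5), i.e. (2.3) with a = 2
    num, _ = quad(lambda t: t**(p/2 - 2) * (1 - r2*t)**((n - p - 3)/2), 0, 1)
    den, _ = quad(lambda t: t**(p/2 - 2) * (1 - r2*t)**((n - p - 1)/2), 0, 1)
    return (n - p - 1) / (n - 3) * num / den

grid = np.linspace(0.0, 0.9, 19)
ok = all(phi_BZ(r2) <= phi_H(r2) <= 1.0 + 1e-9 for r2 in grid)
```

Both factors start at $(n-p-1)/(n-1)$ and $(n-p-1)/(n-3)$ respectively at $R^2=0$ and increase to $1$ as $R^2\to1$, with $\phi^H$ sandwiched between $\phi_{BZ}$ and $1$ throughout the grid.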

###### Remark 3.1.

Note that the coefficient of determination $R^2$ is given in (1.7) and that the expectations of the numerator and the denominator are given by

$$E[\|X(X'X)^{-1}X'\{y-\bar y1_n\}\|^2]=\sigma^2\{\xi+p\},\qquad E[\|y-\bar y1_n\|^2]=\sigma^2\{\xi+n-1\},$$

where $\xi=\beta'X'X\beta/\sigma^2$. Hence smaller $R^2$ corresponds to smaller $\xi$, since $\{\xi+p\}/\{\xi+n-1\}$ is increasing in $\xi$.

Our class of improved estimators utilizes the coefficient of determination in making a (smooth) choice between $\delta_U$ (when $R^2$ and $\xi$ are large) and $\|y-\bar y1_n\|^2/(n-1)$ (when $R^2$ and $\xi$ are small), and reflects the relatively common knowledge among statisticians that $\|y-\bar y1_n\|^2/(n-1)$ is stochastically closer to $\sigma^2$ when $\xi$ is small.
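The two expectations above are easy to confirm by simulation under Gaussian errors. The sketch below uses illustrative dimensions and parameter values of our own choosing and compares Monte Carlo averages with $\sigma^2\{\xi+p\}$ and $\sigma^2\{\xi+n-1\}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative dimensions and parameters (our own choices, not from the paper).
n, p, sigma = 20, 4, 1.5
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                              # centered design, as in the model
beta = np.array([0.8, -0.6, 0.4, 0.2])
xi = float(beta @ (X.T @ X) @ beta) / sigma**2   # xi = beta'X'X beta / sigma^2

P = X @ np.linalg.solve(X.T @ X, X.T)            # projection onto the column space of X

reps = 20_000
E = rng.standard_normal((reps, n))
Y = 0.7 + X @ beta + sigma * E                   # intercept alpha = 0.7, broadcast over reps
V = Y - Y.mean(axis=1, keepdims=True)            # each row: y - ybar 1_n
ss_tot = np.einsum('ri,ri->r', V, V)             # ||y - ybar 1_n||^2
PV = V @ P
ss_reg = np.einsum('ri,ri->r', PV, PV)           # ||X(X'X)^{-1}X'(y - ybar 1_n)||^2
```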

###### Remark 3.2.

The estimator $\delta^H$ is not the only minimax generalized Bayes estimator under scale mixtures of normals. In Theorem 2.2, we also provided the generalized Bayes estimator $\delta_a^{GB}$ with respect to the superharmonic prior given by $\pi_a$ with $2<a<p$. In Maruyama and Strawderman (2006), we show that $\delta_a^{GB}$, for $a$ between $2$ and an upper bound depending on $n$ and $p$, is minimax in the normal case with a monotone $\phi_a^{GB}$. Hence $\delta_a^{GB}$ for $a$ in this range is also minimax and generalized Bayes for the entire class of scale mixtures of normals. The bound has a somewhat complicated form and we omit the details (however, see Maruyama and Strawderman (2006) for details).

Note that $\delta^H$ corresponds to $\delta_a^{GB}$ with $a=2$, since the corresponding prior is given by (3.6). Note also that the unbiased estimator $\delta_U$, which is derived as the generalized Bayes estimator under the Jeffreys prior, corresponds to $a=p$. Therefore we conjecture that $\delta_a^{GB}$ with any $2\le a\le p$ is minimax.

###### Remark 3.3.

Under the normality assumption, Maruyama and Strawderman (2006) gave a subclass of minimax generalized Bayes estimators with a particularly simple form (3.7), valid for $c\le c(n,p)$, where $c(n,p)$ has a slightly complicated form, which we omit (see Maruyama and Strawderman (2006) for details). Under spherical symmetry, this estimator is not necessarily derived as generalized Bayes (see the following Remark), but is still minimax under scale mixtures of normals.

###### Remark 3.4.

Interestingly, for a particular choice of $a$, the generalized Bayes estimator with respect to $\pi_a$ takes the simple form (3.7) with $c=(2p-n+1)/(n-p-1)$, for the entire class of spherically symmetric distributions (see Maruyama and Strawderman (2006) for the technical details). Hence when

$$(2p-n+1)/(n-p-1)\le c(n,p), \qquad (3.9)$$

this estimator is minimax and generalized Bayes for the entire class of scale mixtures of normals. Unfortunately, numerical calculations indicate that the inequality (3.9) is satisfied only for a limited range of values of $p$ relative to $n$, the exact range differing according to whether $n$ is odd or even.

Actually, under the Gaussian assumption, the estimator given in (3.7) with $c$ larger than $c(n,p)$ can be demonstrated numerically to be minimax, even though our analytic upper bound on $c$ for minimaxity is $c(n,p)$. In practice, since $\phi^H$ given in (3.5) can be calculated quickly and precisely, we recommend the use of $\delta^H$ given in (3.4).

###### Remark 3.5.

For Theorems 2.1, 3.1 and 3.2, the choice of the loss function is key. Many of the results introduced in Section 1 were initially proved under the quadratic loss function $(\delta/\sigma^2-1)^2$. Under the Gaussian assumption, the corresponding results can be obtained by replacing the divisor $n-p-1$ by $n-p+1$. On the other hand, under quadratic loss the generalized Bayes estimator with respect to $\pi(\alpha,\beta)\times\{\sigma^2\}^{-1}$ depends on the particular sampling model, and hence the robustness results do not hold under non-Gaussian assumptions.

## 4 Concluding Remarks

In this paper, we have studied estimation of the error variance in a general linear model with a spherically symmetric error distribution. We have shown, under Stein's loss, that separable priors of the form $\pi(\alpha,\beta)\times\{\sigma^2\}^{-1}$ have associated generalized Bayes estimators which are independent of the form of the (spherically symmetric) sampling distribution. We have further exhibited a subclass of "superharmonic" priors for which these generalized Bayes estimators dominate the usual unbiased and best equivariant estimator $\delta_U$ for the entire class of scale mixture of normal error distributions.

We have previously studied a very similar class of prior distributions in the problem of estimating the regression coefficients under quadratic loss (See Maruyama and Strawderman (2005)). In that study we demonstrated a similar double robustness property: to wit, that the generalized Bayes estimators are independent of the form of the sampling distribution and that they are minimax over the entire class of spherically symmetric distributions.

The main differences between the classes of priors in the two settings are: a) in the present study, the prior on $\sigma^2$ is proportional to $\{\sigma^2\}^{-1}$, while it is proportional to a different power of $\sigma^2$ in the earlier study; and b) in this paper, the prior on $(\alpha,\beta)$ is also separable, with the prior on $\alpha$ being uniform on the real line and the prior on $\beta$ having the "superharmonic" form, while in the earlier paper the prior on $(\alpha,\beta)$ jointly had the superharmonic form.

The difference in a) is essential, since a prior on $\sigma^2$ proportional to $\{\sigma^2\}^{-1}$ gives the best equivariant and minimax estimator $\delta_U$, while such a restriction is not necessary when estimating the regression parameters $\beta$.

The difference in b) is inessential, and either form of prior on the regression parameters will give estimators with the double robustness properties in each of the problems studied. The form of the estimators, of course, will be somewhat different. In the case of the present paper, the main difference would be to replace $\beta'X'X\beta$ by $n\alpha^2+\beta'X'X\beta$ and to replace $R^2$ by

$$\{n\bar y^2+\|X(X'X)^{-1}X'y\|^2\}/\|y\|^2.$$

As a consequence, the results in these papers suggest that separable priors, and in particular the "harmonic" prior given in (3.6), are very worthy candidates as objective priors in regression problems. They produce generalized Bayes minimax procedures dominating the classical unbiased, best equivariant estimators of both regression parameters and scale parameters simultaneously and uniformly over a broad class of spherically symmetric error distributions.

## Appendix A Proof of Theorem 2.1

The (generalized) Bayes estimator under Stein's loss is given by $\{E[\sigma^{-2}\mid y]\}^{-1}$. Under the improper density $\pi(\alpha,\beta)\times\{\sigma^2\}^{-1}$, the generalized Bayes estimator is given by

$$\frac{\iint m_{f_0}(y|\alpha,\beta)\,\pi(\alpha,\beta)\,d\alpha\,d\beta}{\iint m_{f_1}(y|\alpha,\beta)\,\pi(\alpha,\beta)\,d\alpha\,d\beta}$$

where, for $i=0,1$, $m_{f_i}(y|\alpha,\beta)$ is the conditional marginal density of $y$ given $\alpha$ and $\beta$, weighted by $(\sigma^2)^{-i}$,

$$m_{f_i}(y|\alpha,\beta)=\int_0^\infty\sigma^{-n}f\!\left(\frac{\|y-\alpha1_n-X\beta\|^2}{\sigma^2}\right)(\sigma^2)^{-i-1}\,d\sigma^2.$$

Further we have

$$m_{f_i}(y|\alpha,\beta)=\|y-\alpha1_n-X\beta\|^{-n-2i}\int_0^\infty t^{(n+2i)/2-1}f(t)\,dt=\frac{\int_0^\infty t^{(n+2i)/2-1}f(t)\,dt}{\int_0^\infty t^{(n+2i)/2-1}f_G(t)\,dt}\int_0^\infty f_G\!\left(\frac{\|y-\alpha1_n-X\beta\|^2}{\sigma^2}\right)\frac{(\sigma^2)^{-i-1}}{\sigma^{n}}\,d\sigma^2$$

where

$$f_G(t)=\frac{1}{(2\pi)^{n/2}}\exp(-t/2).$$

Hence the generalized Bayes estimator is

$$\frac{\iint m_{f_0}(y|\alpha,\beta)\,\pi(\alpha,\beta)\,d\alpha\,d\beta}{\iint m_{f_1}(y|\alpha,\beta)\,\pi(\alpha,\beta)\,d\alpha\,d\beta}=\frac{\int_0^\infty t^{n/2-1}f(t)\,dt}{\int_0^\infty t^{n/2}f(t)\,dt}\,\frac{\int_0^\infty t^{n/2}f_G(t)\,dt}{\int_0^\infty t^{n/2-1}f_G(t)\,dt}\,\frac{m_0^G(y)}{m_1^G(y)}$$

where

$$m_i^G(y)=\iiint f_G\!\left(\frac{\|y-\alpha1_n-X\beta\|^2}{\sigma^2}\right)\frac{\pi(\alpha,\beta)}{(\sigma^2)^{n/2+i+1}}\,d\alpha\,d\beta\,d\sigma^2.$$

Since $\epsilon$ has a spherically symmetric density with $E[\epsilon]=0$ and $E[\epsilon\epsilon']=I_n$, $f$, as well as $f_G$, satisfies

$$\int_{\mathbb{R}^n}f(\epsilon'\epsilon)\,d\epsilon=\frac{\pi^{n/2}}{\Gamma(n/2)}\int_0^\infty s^{n/2-1}f(s)\,ds=1 \qquad (A.1)$$

and

$$\int_{\mathbb{R}^n}\epsilon'\epsilon\,f(\epsilon'\epsilon)\,d\epsilon=\frac{\pi^{n/2}}{\Gamma(n/2)}\int_0^\infty s^{n/2}f(s)\,ds=n. \qquad (A.2)$$

Hence we have

$$\frac{\int_0^\infty t^{n/2-1}f(t)\,dt}{\int_0^\infty t^{n/2}f(t)\,dt}\,\frac{\int_0^\infty t^{n/2}f_G(t)\,dt}{\int_0^\infty t^{n/2-1}f_G(t)\,dt}=\frac{1}{n}\cdot\frac{n}{1}=1,$$

and hence the generalized Bayes estimator is given by $m_0^G(y)/m_1^G(y)$, which is independent of $f$.
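The key identity above — that the ratio of moments equals $1$ for any density satisfying (A.1) and (A.2) — can be checked numerically for a concrete non-Gaussian $f$. Below we use a multivariate-$t$ radial density with $\nu$ degrees of freedom, rescaled so that each component has unit variance; the values of $n$ and $\nu$ are illustrative choices of our own.

```python
import numpy as np
from math import gamma, pi
from scipy.integrate import quad

n, nu = 10, 5  # dimension and t degrees of freedom (illustrative)

# Scaled multivariate-t radial density: f such that eps ~ f(eps'eps) has
# E[eps] = 0 and E[eps eps'] = I_n (unit component variance needs the (nu-2) scaling).
c = gamma((n + nu) / 2) / (gamma(nu / 2) * ((nu - 2) * pi)**(n / 2))
f_t = lambda t: c * (1 + t / (nu - 2))**(-(n + nu) / 2)
f_G = lambda t: (2 * pi)**(-n / 2) * np.exp(-t / 2)

def moment(f, k):
    # integral of t^{n/2 + k - 1} f(t) over (0, infinity)
    val, _ = quad(lambda t: t**(n/2 + k - 1) * f(t), 0, np.inf)
    return val

ratio = (moment(f_t, 0) / moment(f_t, 1)) * (moment(f_G, 1) / moment(f_G, 0))
```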

## Appendix B Proof of Theorem 2.2

Note, for $0<a<p$,

$$(\beta'X'X\beta)^{-(p-a)/2}=\frac{2^{a/2}\pi^{p/2}}{\Gamma(\{p-a\}/2)\,|X'X|^{1/2}}\{\sigma^2\}^{a/2}\int_0^\infty g^{a/2-1}\frac{|X'X|^{1/2}}{(2\pi\sigma^2 g)^{p/2}}\exp\!\left(-\frac{\beta'X'X\beta}{2\sigma^2 g}\right)dg. \qquad (B.1)$$

Then

$$m_i^G(y)=A\int_{-\infty}^\infty\int_{\mathbb{R}^p}\int_0^\infty\int_0^\infty\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left(-\frac{\|y-\alpha1_n-X\beta\|^2}{2\sigma^2}\right)\frac{g^{a/2-1}\,|X'X|^{1/2}}{\{\sigma^2\}^{-a/2+1+i}(2\pi\sigma^2 g)^{p/2}}\exp\!\left(-\frac{\beta'X'X\beta}{2\sigma^2 g}\right)d\alpha\,d\beta\,d\sigma^2\,dg, \qquad (B.2)$$

where $A=2^{a/2}\pi^{p/2}/\{\Gamma(\{p-a\}/2)\,|X'X|^{1/2}\}$. In the following, we calculate the integral in (B.2) with respect to $\alpha$, $\beta$, $\sigma^2$, and $g$, in this order.

By the simple relation

$$y-\alpha1_n-X\beta=(-\alpha+\bar y)1_n+v-X\beta,$$

where $\bar y$ is the mean of $y$ and $v=y-\bar y1_n$, we have the Pythagorean relation

$$\|y-\alpha1_n-X\beta\|^2=n(-\alpha+\bar y)^2+\|v-X\beta\|^2,$$

since $X$ has already been centered. Then we have

$$\int_{-\infty}^\infty\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left(-\frac{\|y-\alpha1_n-X\beta\|^2}{2\sigma^2}\right)d\alpha=\frac{1}{n^{1/2}(2\pi\sigma^2)^{(n-1)/2}}\exp\!\left(-\frac{\|v-X\beta\|^2}{2\sigma^2}\right).$$

Next we consider the integration with respect to $\beta$. Note the relation of completing squares with respect to $\beta$:

$$\|v-X\beta\|^2+g^{-1}\beta'X'X\beta=\frac{1+g}{g}\left(\beta-\frac{g}{1+g}\hat\beta\right)'X'X\left(\beta-\frac{g}{1+g}\hat\beta\right)+\frac{\|v\|^2}{1+g}\{g(1-R^2)+1\}$$

where $\hat\beta=(X'X)^{-1}X'v$ and $R^2$ is the coefficient of determination. Hence we have

 (B.3)

Next we consider integration with respect to $\sigma^2$. By (B.3), we have

$$\begin{aligned}&\int_{-\infty}^\infty\int_{\mathbb{R}^p}\int_0^\infty\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left(-\frac{\|y-\alpha1_n-X\beta\|^2}{2\sigma^2}\right)\frac{|X'X|^{1/2}}{(2\pi\sigma^2 g)^{p/2}}\exp\!\left(-\frac{\beta'X'X\beta}{2\sigma^2 g}\right)\{\sigma^2\}^{a/2-1-i}\,d\alpha\,d\beta\,d\sigma^2\\&\qquad=\frac{2^{-a/2+i}\,\Gamma(\{n-a-1+2i\}/2)}{n^{1/2}\pi^{(n-1)/2}\|v\|^{n-a-1+2i}}\,\frac{(1+g)^{(n-p-a-1+2i)/2}}{\{g(1-R^2)+1\}^{(n-a-1+2i)/2}}. \qquad (B.4)\end{aligned}$$

Finally we consider integration with respect to $g$. By (B.4) we have

$$\begin{aligned}m_i^G(y)&=A\,\frac{2^{-a/2+i}\,\Gamma(\{n-a-1+2i\}/2)}{n^{1/2}\pi^{(n-1)/2}\|v\|^{n-a-1+2i}}\int_0^\infty\frac{g^{a/2-1}(1+g)^{(n-p-a-1+2i)/2}}{\{g(1-R^2)+1\}^{(n-a-1+2i)/2}}\,dg\\&=A\,\frac{2^{-a/2+i}\,\Gamma(\{n-a-1+2i\}/2)}{n^{1/2}\pi^{(n-1)/2}\|v\|^{n-a-1+2i}}\,(1-R^2)^{-(n-p-1+2i)/2}\int_0^1 t^{p/2-a/2-1}(1-t)^{a/2-1}(1-R^2t)^{(n-p-a-1+2i)/2}\,dt. \qquad (B.5)\end{aligned}$$

The second equality follows from the change of variables $t=g/(1+g)$ together with an Euler transformation of the resulting integral. By using the relation $\mathrm{RSS}=\|v\|^2(1-R^2)$, the ratio $m_0^G(y)/m_1^G(y)$ is written as (2.2).
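The change of variables used for the second equality in (B.5) can be spot-checked numerically. With $A=(n-p-a-1+2i)/2$ and $B=(n-a-1+2i)/2$, the identity (as we read it) is

$$\int_0^\infty\frac{g^{a/2-1}(1+g)^{A}}{\{g(1-R^2)+1\}^{B}}\,dg=(1-R^2)^{-(n-p-1+2i)/2}\int_0^1 t^{p/2-a/2-1}(1-t)^{a/2-1}(1-R^2t)^{A}\,dt,$$

checked below for illustrative values of $n$, $p$, $a$, $i$, and $R^2$ of our own choosing:

```python
import numpy as np
from scipy.integrate import quad

def check(nn, p, a, i, r2):
    # Compare both sides of the change-of-variables identity behind (B.5).
    A = (nn - p - a - 1 + 2*i) / 2
    B = (nn - a - 1 + 2*i) / 2
    lhs, _ = quad(lambda g: g**(a/2 - 1) * (1 + g)**A / (g*(1 - r2) + 1)**B,
                  0, np.inf)
    integ, _ = quad(lambda t: t**(p/2 - a/2 - 1) * (1 - t)**(a/2 - 1)
                    * (1 - r2*t)**A, 0, 1)
    rhs = (1 - r2)**(-(nn - p - 1 + 2*i)/2) * integ
    return lhs, rhs

lhs, rhs = check(20, 5, 2.0, 0, 0.3)
```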

## Acknowledgements

We are very grateful to the associate editor and the referees for their insightful comments, which substantially helped us to strengthen this paper.

## References

• Brewster, J. F. and Zidek, J. V. (1974). Improving on equivariant estimators. Ann. Statist. 2, 21–38. MR0381098
• James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I, 361–379. Univ. California Press, Berkeley, Calif. MR0133191
• Maruyama, Y. and Strawderman, W. E. (2005). A new class of generalized Bayes minimax ridge regression estimators. Ann. Statist. 33, 1753–1770. MR2166561
• Maruyama, Y. and Strawderman, W. E. (2006). A new class of minimax generalized Bayes estimators of a normal variance. J. Statist. Plann. Inference 136, 3822–3836. MR2299167
• Stein, C. (1964). Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Ann. Inst. Statist. Math. 16, 155–160. MR0171344
• Strawderman, W. E. (1974). Minimax estimation of powers of the variance of a normal population under squared error loss. Ann. Statist. 2, 190–198. MR0343442