
and Elliptic Differential Equations

J. A. Hartigan, Yale University

Abstract We evaluate priors by the second order asymptotic behaviour of the corresponding estimators. Under certain regularity conditions, the risk differences between efficient estimators of parameters taking values in a domain , an open connected subset of , are asymptotically expressed as elliptic differential forms depending on the asymptotic covariance matrix . Each efficient estimator has the same asymptotic risk as a “local Bayes” estimate corresponding to a prior density . The asymptotic decision theory of the estimators identifies the smooth prior densities as admissible or inadmissible, according to the existence of solutions to certain elliptic differential equations. The prior is admissible if the quantity is sufficiently small near the boundary of . We exhibit the unique admissible invariant prior for . A detailed example is given for a normal mixture model.

1 Introduction

A parameter takes values in a domain , an open connected subset of .
I use the partial differential equation symbol rather than the usual statistical symbol because the evaluation of asymptotic risk reduces to existence problems in the theory of partial differential equations. The parameter indexes a probability density with respect to some measure , say, for data .

We use the Kullback-Leibler loss function

 (1) $\hat{x}, x \in D:\quad L_n(\hat{x}, x) = \int \log\left[ p(y_n \mid x) / p(y_n \mid \hat{x}) \right] p(y_n \mid x)\, d\mu_n(y_n)$

to define the risk of the estimator , a function of taking values in :

 (2) $R_n(\hat{x}_n, x) = \int L_n(\hat{x}_n(y_n), x)\, p(y_n \mid x)\, d\mu_n.$
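As a concrete reading of the loss (1), the following minimal sketch evaluates it numerically for a single observation from a normal location model. The model N(x, 1) and the names `normal_pdf` and `kl_loss` are illustrative assumptions, not part of the paper; in that model the loss should be close to (x̂ − x)²/2.

```python
import math

def normal_pdf(y, mean):
    """Density of N(mean, 1) at y (assumed model, for illustration only)."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)

def kl_loss(x_hat, x, half_width=12.0, steps=24000):
    """Midpoint-rule approximation of the loss (1) for one N(x, 1) observation."""
    lo = min(x, x_hat) - half_width
    hi = max(x, x_hat) + half_width
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        # log[p(y|x) / p(y|x_hat)] written in closed form to avoid log(0) in the tails
        log_ratio = 0.5 * (y - x_hat) ** 2 - 0.5 * (y - x) ** 2
        total += log_ratio * normal_pdf(y, x) * h
    return total

loss = kl_loss(1.0, 0.0)  # compare with (1.0 - 0.0) ** 2 / 2
```

For n independent observations from the same assumed model the loss is n times this value, which is why the risk differences in the text are studied after rescaling by powers of n.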

For a prior density p, the posterior Bayes estimator minimizes the posterior Kullback-Leibler risk

 (3) $R(\hat{x} \mid y_n) = \int L(\hat{x}, x)\, p(y_n \mid x)\, p(x)\, dx \Big/ \int p(y_n \mid x)\, p(x)\, dx.$

Define

 (4) $V_n(x) = -1 \Big/ \int \frac{\partial}{\partial x} \frac{\partial}{\partial x'} \log[p(y_n \mid x)]\, p(y_n \mid x)\, d\mu_n(y_n).$

We will assume the asymptotic covariance matrix.

Following Brown [Br79], and letting $U$ denote the prior uniform over $D$, we consider estimators of form for fixed decision functions . Under smoothness conditions [Ha98] requiring smooth variation of the data density and the prior with , the asymptotic risks for different decision functions differ only by terms of order $n^{-2}$; therefore we define the asymptotic risk for decision function $b$, relative to the decision function corresponding to the uniform prior $U$, by the assumed limit

 (5) $R(b, x) = \lim_{n \to \infty} n^2 [R_n(b, x) - R_n(U, x)] = \sum_{i,j} \{\partial_i (V_{ij} b_j) + \tfrac{1}{2} b_i b_j V_{ij}\}$

where $\partial_i$ denotes the partial derivative $\partial / \partial x_i$.

For a prior with density $p$, the posterior Bayes estimate corresponds asymptotically to the decision function $b_p$, and then the risk may be expressed in elliptic operator form

 (6) $R(b_p) = 2 \sum_{ij} \partial_i (V_{ij} \partial_j \sqrt{p}) / \sqrt{p}.$
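The passage from (5) to (6) can be checked numerically in one dimension. The sketch below assumes that the decision function of a prior p is the gradient of log p (an assumption consistent with (5) and (6), since the definition is not legible here), and uses a hypothetical prior p(x) = 1 + x² and variance V(x) = 2 + sin x chosen only for illustration. Derivatives are taken by central differences.

```python
import math

def deriv(f, x, h=1e-4):
    """Central finite-difference derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def p(x):
    return 1.0 + x * x          # hypothetical smooth positive prior density

def V(x):
    return 2.0 + math.sin(x)    # hypothetical positive asymptotic variance

def b(x):
    """Assumed decision function of the prior: derivative of log p."""
    return deriv(lambda t: math.log(p(t)), x)

def risk_form_5(x):
    """Right-hand side of (5) in one dimension: (V b)' + b^2 V / 2."""
    return deriv(lambda t: V(t) * b(t), x) + 0.5 * b(x) ** 2 * V(x)

def risk_form_6(x):
    """Elliptic form (6) in one dimension: 2 (V (sqrt p)')' / sqrt p."""
    flux = lambda t: V(t) * deriv(lambda s: math.sqrt(p(s)), t)
    return 2.0 * deriv(flux, x) / math.sqrt(p(x))

gaps = [abs(risk_form_5(x) - risk_form_6(x)) for x in (-1.0, 0.0, 0.7, 2.0)]
```

The entries of `gaps` should be at the level of finite-difference error, which is the numerical counterpart of the identity obtained by expanding the derivative of √p in (6).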

It turns out that, under certain conditions of smoothness and boundedness for and , there is a risk matching prior density $p$ for which $R(b) = R(b_p)$. Thus the behaviour of asymptotic risk for all smooth decisions is captured in the theory of elliptic differential equations, equations whose relevance to decision theory was first indicated in Stein [St56], but which were extensively elucidated for the normal location problem in Brown [Br71]. See also Strawderman and Cohen [SC71]. The asymptotic behavior of Bayes estimators near maximum likelihood estimators has been studied for loss functions of form by Levit in [Le82], [Le83], [Le85]; in particular, he shows that the Bayes estimators form a complete class under certain regularity conditions.

2 Risk matching priors

For each decision function $b$ we will find a risk matching prior $p$ for which $R(b) = R(b_p)$. Then we need only consider decision functions of form $b_p$ and risks of form $R(b_p)$ in the asymptotic decision theory. This result will be proved under boundedness and smoothness assumptions using some standard tools from Pinsky [Pi95].

For the domain $D$ with closure $\bar{D}$, a function $f$ is uniformly Hölder continuous with exponent $\alpha$ in $\bar{D}$ if

 (7) $\|f\|_{0,\alpha,\bar{D}} = \sup_{x, y \in \bar{D},\, x \neq y} \frac{|f(x) - f(y)|}{|x - y|^{\alpha}} < \infty.$

The Hölder spaces consist of functions whose -th order partial derivatives are uniformly Hölder continuous with exponent in . Say if is bounded and properly included in . The Hölder spaces consist of functions that lie in for each .

The domain has a boundary if for each point , there is a ball B centered at and a 1-1 mapping , such that

 Condition $\bar{A}$: $D$ bounded, $\partial D \in C^{2,\alpha}$, $V_{ij} \in C^{1,\alpha}(\bar{D})$, $V$ positive definite in $\bar{D}$.
 Condition $A$: $V_{ij} \in C^{1,\alpha}(D)$, $V$ positive definite in $D$.

We follow a standard approach which first proves results under the strong condition $\bar{A}$, and then extends the results to the weak condition $A$ by approximating $D$ by an increasing sequence of bounded subdomains .

Theorem 1: Suppose and assumption holds.
Then there exists a prior in such that

 (8) $R(b) = R(b_p) = 2 \sum_{ij} \partial_i (V_{ij} \partial_j \sqrt{p}) / \sqrt{p}.$

Proof:

From [Pi95, theorem 5.5], for some eigenvalue , there exists
in in satisfying

 (9) $2 \sum_{ij} \partial_i (V_{ij} \partial_j u) - R(b)\, u = \lambda u.$

Since

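 (10) $\int_D \lambda u^2 = \int_D \{2 \sum_{ij} \partial_i (V_{ij} \partial_j u) - R(b)\, u\}\, u$
 (11) $\qquad = \int_D -\tfrac{1}{2} \sum_{ij} (u b_i - 2 \partial_i u)(u b_j - 2 \partial_j u) V_{ij} \le 0,$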

it follows that If the corresponding eigenvector provides the -matching prior with If from [Pi95, theorem 6.5], for each there exists a unique -matching solution
in , to the equation

Theorem 2: Suppose and assumption A holds.
Then there exists a prior in such that

 (12) $R(b) = R(b_p) = 2 \sum_{ij} \partial_i (V_{ij} \partial_j \sqrt{p}) / \sqrt{p}.$

Proof:

Select an increasing sequence of bounded domains Within each domain, assumption holds, so there exists a sequence of solutions
in such that

For some , without loss of generality set all n. Note that any solution is also a solution to A Harnack inequality[Pi95, p 124] implies that for for some constants .

The Schauder interior estimate[Pi95,p86] implies, for some constant , for all ,

 (13) $\|u_N\|_{2,\alpha,D_n} = \sup_{x, y \in D_n,\, x \neq y} \sum_{i,j} \frac{|\partial_i \partial_j (u_N(x) - u_N(y))|}{|x - y|^{\alpha}} \le C_n.$

Thus, for each n, the sequence is precompact in the norm. By diagonalization, there exists a subsequence, say , that converges to in for which

 (14) $R(b) = R(b_p) = 2 \sum_{ij} \partial_i (V_{ij} \partial_j u) / u, \quad x \in D.$

Finally, we need to show that First note that , since

 (15) $\|u_N\|_{2,\alpha,D_n} \le C_n \text{ for } N > n \;\Rightarrow\; \|u\|_{2,\alpha,D_n} \le C_n.$

For any , the compact set is covered by , and therefore by a finite subcovering, and so by a particular . Thus for every which implies

3 Explicit matching priors using the Feynman-Kac integral

When in equation (9), the decision function is the posterior Bayes decision corresponding to the prior . We could determine the ratio by the integral over any path connecting and .

When , so that is no longer a gradient, it is plausible to attempt to find an approximating by averaging these integrals over all paths between and .

Consider the stochastic differential equation with initial condition

 (16) $dX_i(t) = \sum_{j} V^{1/2}_{ij}\, dW_j(t) + \tfrac{1}{2} \big(b_i - \sum_j \partial_j V_{ij}\big)\, dt$

We propose the Feynman Kac integral formula to specify a risk matching prior:

 (17) $u(x) = E\left[\exp\left(-\tfrac{1}{2} \int_0^T \sum_i b_i \circ dX_i\right) \sqrt{p(X(T))}\right]$

where $\circ$ denotes the Stratonovich stochastic integral, and $T$ is the time to reach the boundary of $D$. I suspect that the condition is sufficient for the existence of the stochastic process and the integral when assumption holds. When the formula is valid, we see that is determined as a weighted combination of its boundary values, with the weight at each boundary point determined by the path integral over the various paths that reach that particular boundary point. Many different risk matching priors are available, corresponding to different smooth assignments to the boundary values.
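A minimal Monte Carlo sketch of formula (17) follows, under strong simplifying assumptions that are not in the paper: one dimension, D = (0, 1), V = 1, and a constant decision function b. In that case (16) reduces to dX = dW + (b/2) dt, the Stratonovich integral of b against dX equals b (X(T) − x), and equation (9) with λ = 0 becomes 2u'' = (b²/2) u, which has a closed-form solution to compare against. The function names and the Euler discretization are illustrative choices.

```python
import math
import random

def feynman_kac_estimate(x0, b, g_left, g_right, n_paths=2000, dt=1e-3, seed=0):
    """Monte Carlo estimate of u(x0) in (17) on D = (0, 1), V = 1, constant b.

    g_left and g_right are the assumed boundary values of sqrt(p) at 0 and 1.
    """
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        while 0.0 < x < 1.0:
            # Euler step of (16): dX = dW + (b / 2) dt when V = 1
            x += sd * rng.gauss(0.0, 1.0) + 0.5 * b * dt
        x_exit = 0.0 if x <= 0.0 else 1.0   # project the overshoot onto the boundary
        boundary_value = g_left if x_exit == 0.0 else g_right
        # for constant b the Stratonovich integral of b dX is b * (X(T) - x0)
        total += math.exp(-0.5 * b * (x_exit - x0)) * boundary_value
    return total / n_paths

def exact_solution(x, b, g_left, g_right):
    """Solution of 2 u'' = (b^2 / 2) u on (0, 1) with the same boundary values."""
    k = abs(b) / 2.0
    if k == 0.0:
        return (1.0 - x) * g_left + x * g_right
    return (g_left * math.sinh(k * (1.0 - x)) + g_right * math.sinh(k * x)) / math.sinh(k)

estimate = feynman_kac_estimate(0.5, 1.0, 1.0, 2.0)
reference = exact_solution(0.5, 1.0, 1.0, 2.0)
```

The estimate is a weighted combination of the two boundary values, as described above; changing `g_left` and `g_right` gives different solutions with the same risk in this simplified setting.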

4 Decision theory for asymptotic risks

We now apply decision theoretic classifications to the asymptotic risk formula. Our conclusions about the prior will depend only on the domain and the asymptotic variance . From now on we will drop the term asymptotic. We consider a particular variance and the set of risks, real valued functions on the domain , corresponding to priors satisfying

 (18) Assumption B: $p \in C^{2,\alpha}(D)$, $V_{ij} \in C^{1,\alpha}(D)$; $p, V > 0$ in $D$.

The risks will be written .

The posterior Bayes decision is locally Bayes: for any alternative decision

where and in , some integration by parts shows

 (19) $\int_D [R(b_p + v) - R(b_p)]\, p \ge 0.$

Theorem 3: Under assumption B, with D bounded, the following conditions are equivalent:

is Bayes: there exists no with

is Admissible: there exists no with

is Unique Risk: there exists no with

is Brown: no non-trivial positive solves

Proof:

Bayes implies Admissible because violating Admissible also violates Bayes. Admissible implies Unique Risk because violating Unique Risk also violates Admissible. Brown[Br71,1.3.9] and Unique Risk are equivalent, because if and only if

It only remains to show that failure of Bayes implies failure of Unique Risk.

Without loss of generality, assume is uniform, so that If is not Bayes, there exists with

 (20) $0 \ge \int R(p^*) = \tfrac{1}{2} \int_D \sum_{ij} V_{ij} b^*_i b^*_j + \int_D \sum_{ij} \partial_i (V_{ij} b^*_j)$

Since and the middle integral is positive, then

 (21) $\int_D \sum_{ij} \partial_i (V_{ij} b^*_j) = C < 0.$

Let be the Lebesgue measure of the boundary when is smooth and bounded, let denote the outward pointing normals on the boundary, and note that

 (22) $\int_{\partial D} \sum_{ij} \{\tau_i b^*_j V_{ij}\} = \int_D \sum_{ij} \partial_i (V_{ij} b^*_j) = C < 0.$

Applying Theorem 6.31 from [GT97], for , since ,
there exists a solution to the oblique derivative problem

 (23) $R(p_n) = 0 \text{ in } D_n, \quad u_n = 2 C \sum_{ij} \{\tau_i \partial_i u\, V_{ij}\} / |\partial D_n| \text{ in } \partial D_n$

so that

Repeating the compactness argument of theorem 2 on , there exists with

 (24) $R(p_0) = 0, \quad \int_D \sum_{ij} \partial_i (V_{ij} \partial_j p_0 / p_0) = C < 0.$

The first condition states that and have the same risk, and the second condition guarantees that , so that the unique risk condition fails, as required.

5 When is Bayes on R ?

Brown’s condition shows that the locally Bayes estimate is Bayes or not depending only on the product . For example, is Bayes with , if and only if is Bayes with . We will therefore rephrase the admissibility question in terms of the product : the prior-scaled covariance matrix is Bayes on if and only if there is no non-trivial solution to Brown’s equation.

Theorem 4. Let , where ranges over the surface of the unit sphere. Define Suppose that, uniformly over ,

 (25) $\lim_{R \to \infty} W(R, s) = W(s), \quad \lim_{R \to \infty} W(s) W^{-1}(R, s) W(s) = W(s).$

Then is Bayes on only if

 (26) $\int_{s \in S} \sum_{ij} s_i s_j W_{ij}(s)\, ds = 0.$

Proof:
Let be a test function with relative risk

 (27) $R(b_p + Q) - R(b) = \sum_{ij} \{\partial_i (p V_{ij} Q_j) + \tfrac{1}{2} Q_i Q_j\, p V_{ij}\}.$

From theorem 2 for every there exists a prior with . Thus is Bayes if and only if

 (28) $\int_D \{R(b_p + Q) - R(b_p)\}\, p = \int_D \sum_{ij} \{\partial_i (p V_{ij} Q_j) + \tfrac{1}{2} Q_i Q_j\, p V_{ij}\} \ge 0$

for every test where the integral is defined. Equivalently, with the test ,

 (29) $t(Q) = \int_D \{R(b_p + (pV)^{-1} Q) - R(b_p)\}\, p = \int_D \{\sum_i \partial_i Q_i + \tfrac{1}{2} \sum_{ij} Q_i Q_j V^{-1}_{ij} / p\} \ge 0$

for every test where the integral is defined.

The possible negative term is determined by values in the neighbourhood of infinity, so being Bayes is determined by the behaviour of near the infinite boundary. In particular if two functions are identical outside a compact subset of , they have the same admissibility classification.
We therefore consider a test function that is zero inside the unit sphere:

 (30) $Q(rs) = g(r)\, r^{1-d} q(s), \quad q(s) = -W s$

where is twice differentiable, for for for .

 (31) Let t(Q,R)=∫|x|

Consider the contribution to the test integral for a particular :

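 (32) $I(s, R) = \sum_i s_i Q_i(Rs) R^{d-1} + \int_0^R \{\tfrac{1}{2} \sum_{ij} Q_i Q_j V^{-1}_{ij} / p\}\, r^{d-1}\, dr$
 (33) $\qquad = \sum_i s_i q_i(s) + \tfrac{1}{2} \sum_{ij} q_i q_j \int_0^R g(r)^2 r^{1-d} V^{-1}_{ij} / p\, dr$
 (34) $\qquad \le -\sum_{ij} s_i s_j W_{ij} + \tfrac{1}{2} \sum_{ijkl} s_i s_j W_{ik} W_{jl} W^{-1}_{kl}(R, s)$
 (35) $\qquad \to -\tfrac{1}{2} \sum_{ij} s_i s_j W_{ij}$ uniformly in $s$ as $R \to \infty$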

Thus

 (36) $t(Q) = \lim_{R \to \infty} \int_{s \in S} I(s, R)\, ds < 0$

unless

 (37) $\int_{s \in S} \sum_{ij} s_i s_j W_{ij}(s)\, ds = 0,$

which shows that the condition in the theorem is necessary for to be Bayes.

Failure of the condition in the theorem allows construction of an explicit test function for showing to be not Bayes. I suspect that the weaker condition
is also necessary. It may be that the condition is also sufficient. A similar condition for the recurrence of diffusion processes is given in [Ic78].

Brown [Br71] studies the admissibility of estimates for the normal location problem in dimensions in which it is assumed that the data are Gaussian with unknown mean and identity covariance matrix. He shows that an estimate corresponding to the marginal density of the data is admissible if is bounded and if

 (38) $\int_1^{\infty} \left[\int_{s \in S} p\, ds\right]^{-1} r^{1-d}\, dr = \infty.$

Brown, Theorem 6.4.4, also shows that an estimate corresponding to the marginal density is admissible only if

 (39) $\int_{s \in S} \left[\int_1^{\infty} p^{-1} r^{1-d}\, dr\right]^{-1} ds = 0.$

The asymptotic version requires data , with . Theorem 4 implies (40): a prior is Bayes only if

 (40) $\int_{s \in S} \left[\int_1^{\infty} p^{-1} r^{1-d}\, dr\right]^{-1} ds = 0$

or equivalently, almost everywhere on

 (41) $\int_1^{\infty} p^{-1} r^{1-d}\, dr = \infty.$

If the prior density is expressed as a density on the polar co-ordinates the condition simplifies to almost everywhere on . See Strawderman and Cohen[SC71], theorem 4.4.1. For example, the prior corresponding to being uniformly distributed is Bayes in every dimension for but not Bayes for

7 When is V Bayes on bounded D?

Let D be a bounded domain with boundary in . Let denote the outward pointing normal at a point on the boundary of , assumed defined almost everywhere in , Lebesgue measure on the boundary. It will be assumed that, for almost all , the inward pointing normal lies in for small enough.

Theorem 5. The covariance matrix is Bayes only if

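 (42) $\lim_{\varepsilon \to 0} \int_{s \in \partial D} \nu_i \nu_j \left[\int_0^{\varepsilon} \frac{1}{p} V^{-1}_{ij}(s - u\nu)\, du\right]^{-1} ds = 0.$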

This is proved similarly to theorem 4, using test functions that are constant on the inward pointing normal segments.

It may happen that, for each , the normal vector at is the limit of some eigenvector of as s, (the normal eigenvector case) in which case the condition in the theorem simplifies to

 (43) $\int_0^{\varepsilon} \frac{\nu_i \nu_j}{p} V^{-1}_{ij}(s - u\nu)\, du = \infty$ for almost all $s$.

We will say that the integral condition fails at s if The theorem now states that is admissible only if the integral condition fails on a set of measure .

The admissible are those where fast enough near the boundary. If is inadmissible, we can render it admissible by attenuating near the boundary. In the normal eigenvector case, let consist of those points within of the boundary, and suppose that each such point is closest to a unique boundary point . Each such point may be written for some .

Let be an attenuation factor defined at each point in by:

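 $a(x) = 1$ for $x \in D - D_{\varepsilon}$,
 $a(x) = 1$ for $x \in D_{\varepsilon}$, integral condition holds at $x(s)$,
 $g(x) = \frac{\partial}{\partial u} \left(\int_0^{u(x)} \left[\nu_i \nu_j \frac{1}{p} V^{-1}_{ij}(s(x) - w\nu(x))\right]^{1/2} dw\right)^2$,
 $a(x) = \min\left[1,\ 1 - \left(1 - \frac{g(x)}{g(\varepsilon)}\right)^3\right]$ for $x \in D_{\varepsilon}$, integral condition fails at $x(s)$.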

The proposed attenuation factor will be 1 except near boundary points where the integral condition fails, where it will approach zero. With the prior , the integral condition becomes

 (44) ∫10νiνj1apV−1ij(s−uν)du≥16g(ε)∫ε0∂∂u[log(∫u0[νiνj1pV−1ij(s−wν)dw]du=∞

9 One dimension

For the one dimensional parameter on with variance , Brown’s condition implies that is admissible if and only if

 (45) $\int_a (Vp)^{-1} = \int_b (Vp)^{-1} = \infty.$

Since a smooth monotone transformation renders equal to on , an equivalent result is that there exists a non-zero differentiable test function on such that if and only if either or are finite.

Jeffreys’ density is admissible on if and only if

 (46) $\int_a J = \int_b J = \infty,$

which means that Jeffreys must be “improper” in both tails to be admissible. I take a certain delight in this impropriety, because although “improper” priors abound in decision theory and in Bayesian analysis, they remain objects of suspicion. See for example the excellent review in [KW96]. However, in decision theory, the prior appears only when multiplied with a loss function, which may be arbitrarily scaled, so improper priors form a natural part of the range of procedures we need to study. Asymptotically, the prior appears only as a product with the covariance matrix in admissibility questions, and again it makes no sense to constrain priors to be proper. In the Jeffreys case, the admissibility of the product requires that Jeffreys be improper in the tails.

The Pearson correlation coefficient computed for n bivariate normal observations with true correlation $\rho$ has asymptotic variance $(1 - \rho^2)^2$.

Thus a prior $p$ on $(-1, 1)$ is admissible if and only if

 (47) $\int_{-1} ((1 - \rho^2)^2 p)^{-1} = \int_{1} ((1 - \rho^2)^2 p)^{-1} = \infty.$

For priors of form , the prior p is admissible if and only if . Thus if we wish to skirt the edge of inadmissibility, we might use .
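Condition (47) can be explored numerically. The sketch below assumes a hypothetical power family p(ρ) = (1 − ρ²)^α, chosen here only for illustration, and integrates ((1 − ρ²)² p)⁻¹ over [0, 1 − ε] for shrinking ε. With this assumed family the integrand is (1 − ρ²)^(−2−α); the function name `tail_integral` is also illustrative.

```python
import math

def tail_integral(alpha, eps, steps=20000):
    """Integral of ((1 - rho^2)^2 p)^(-1) over [0, 1 - eps], p = (1 - rho^2)^alpha."""
    # substitute 1 - rho = exp(s) so the singularity at rho = 1 is spread out
    s_lo = math.log(eps)
    h = (0.0 - s_lo) / steps
    total = 0.0
    for k in range(steps):
        s = s_lo + (k + 0.5) * h
        t = math.exp(s)                  # t = 1 - rho
        one_minus_rho2 = t * (2.0 - t)   # 1 - rho^2 = (1 - rho)(1 + rho)
        total += one_minus_rho2 ** (-2.0 - alpha) * t * h
    return total

# alpha = -1: the integral keeps growing as eps shrinks (logarithmic divergence)
growing = [tail_integral(-1.0, eps) for eps in (1e-2, 1e-4, 1e-6)]
# alpha = -2: the integral stays bounded by 1
bounded = [tail_integral(-2.0, eps) for eps in (1e-2, 1e-4, 1e-6)]
```

In this assumed family the values in `growing` increase by a roughly constant amount each time ε is divided by 100, while those in `bounded` settle near 1, which is the numerical signature of the divergence test in (47).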

Since the Kullback-Leibler loss function does not change under smooth transformation of the parameter space, differences between the asymptotic risk functions for two priors are also invariant under such transformations. We are free to transform to a convenient in deciding admissibility problems. If a transformation takes into say , then the admissibility of in equals the admissibility of in . A prior is relatively invariant if , where is the Jacobian of the transformation and when is one to one such that .

For example if , arbitrary rotations and scalings leave invariant, and change the covariance by a constant, so the only invariant priors are of form . From Brown’s condition, the prior p is admissible if

 (48) $\int_0 r^{1-d-\alpha}\, dr = \int^{\infty} r^{1-d-\alpha}\, dr = \infty,$

which occurs only when . In this case there is a single admissible invariant prior . This prior, discussed in [Br71] and [SC71], corresponds to being uniform over .
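The two divergence requirements in (48) pull in opposite directions: a power r^c has an infinite integral near 0 only when c ≤ −1 and an infinite integral near ∞ only when c ≥ −1. The small sketch below, with illustrative helper names, searches a list of candidate exponents α for those satisfying both requirements with c = 1 − d − α.

```python
def diverges_at_zero(c):
    """True when the integral of r**c over (0, 1] is infinite."""
    return c <= -1

def diverges_at_infinity(c):
    """True when the integral of r**c over [1, infinity) is infinite."""
    return c >= -1

def exponents_satisfying_48(d, candidates):
    """Candidate alpha values for which both integrals in (48) are infinite."""
    return [a for a in candidates
            if diverges_at_zero(1 - d - a) and diverges_at_infinity(1 - d - a)]

# dimension d = 3 with a few integer candidates
found = exponents_satisfying_48(3, [-3, -2, -1, 0, 1])
```

Only the candidate with 1 − d − α = −1 survives, so for each dimension a single exponent passes the test, in line with the uniqueness statement in the text.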

If , the invariant transformations are rotations, which require that an invariant depends only on . Admissibility requires

 (49) $\int_{R_1} \frac{1}{p}\, dr = \int_{R_2} \frac{1}{p}\, dr = \infty.$

Although invariance considerations no longer always apply, the above solution can be extended to general bounded with , namely . For general , define as the path length between points in the metric . Then is bounded in this metric if all paths have finite length, and we again define . We offer this merely as a suggestion for an admissible prior that flirts with inadmissibility near the boundaries, and is consistent under transformations of the data.

11 A mixture model

Suppose that is a sample of size from the normal mixture

 (50) Y=Z+(1−B(q))x1+B(q)x2

where

The parameter lies in the domain

The density of a single observation is

 (51) $f(y) = \{x_2 \phi(y + x_1) + x_1 \phi(y - x_2)\} / (x_1 + x_2).$

The asymptotic variance V is the inverse of the information matrix of expected values of products of the score functions:

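 (52) $l_1 = -\frac{1}{x_1 + x_2} + \frac{\phi(y - x_2)}{(x_1 + x_2) f} - \frac{(y + x_1)\, x_2\, \phi(y + x_1)}{(x_1 + x_2) f}$
 (53) $l_2 = -\frac{1}{x_1 + x_2} + \frac{\phi(y + x_1)}{(x_1 + x_2) f} + \frac{(y - x_2)\, x_1\, \phi(y - x_2)}{(x_1 + x_2) f}$
 (54) $L_{ij} = \int l_i l_j f\, dy$
 (55) $V = L^{-1}$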
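Equations (51) to (55) can be evaluated by direct numerical integration. The following minimal sketch uses a midpoint rule over a window wide enough to cover both mixture components; the window width, step size, and function names are illustrative assumptions rather than the author's computation.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def information_matrix(x1, x2, pad=12.0, h=0.005):
    """Midpoint-rule approximation of the information matrix L in (54)."""
    lo, hi = -x1 - pad, x2 + pad
    steps = int(round((hi - lo) / h))
    L11 = L12 = L22 = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        f = (x2 * phi(y + x1) + x1 * phi(y - x2)) / (x1 + x2)   # density (51)
        n = (x1 + x2) * f
        # score functions (52) and (53)
        l1 = -1.0 / (x1 + x2) + phi(y - x2) / n - (y + x1) * x2 * phi(y + x1) / n
        l2 = -1.0 / (x1 + x2) + phi(y + x1) / n + (y - x2) * x1 * phi(y - x2) / n
        L11 += l1 * l1 * f * h
        L12 += l1 * l2 * f * h
        L22 += l2 * l2 * f * h
    return [[L11, L12], [L12, L22]]

def asymptotic_variance(L):
    """V = L^{-1} as in (55), for a 2 x 2 matrix."""
    det = L[0][0] * L[1][1] - L[0][1] * L[1][0]
    return [[L[1][1] / det, -L[0][1] / det], [-L[1][0] / det, L[0][0] / det]]

L_far = information_matrix(10.0, 10.0)      # well separated components
V_far = asymptotic_variance(L_far)
L_edge = information_matrix(1e-5, 1.0)      # close to the boundary x1 = 0
```

With well separated components the diagonal of `L_far` should be close to 1/2 each, and near x1 = 0 the entry `L_edge[0][0]` should be close to (exp(x2²) − 1 − x2²)/x2², matching the boundary behaviour of L used to decide admissibility.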

Asymptotic admissibility for the prior is determined by behaviour of Vp near the boundaries. Let

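 (56) $x_1 \to 0:\quad L_{11} \to (\exp(x_2^2) - 1 - x_2^2) / x_2^2,\quad L_{12} \to 0,\quad L_{22} \to 0,$
 (57) $x_2 \to 0:\quad L_{22} \to (\exp(x_1^2) - 1 - x_1^2) / x_1^2,\quad L_{12} \to 0,\quad L_{11} \to 0,$
 (58) $r \to \infty:\quad L_{11} \to s_2 / (s_1 + s_2),\quad L_{12} \to 0,\quad L_{22} \to s_1 / (s_1 + s_2).$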

At the boundary the normal is an eigenvector at all points , and the integral condition for admissibility for that boundary is almost all which reduces to almost all . Similarly, the condition for admissibility on the boundary is almost all .

For the infinite “boundary” , the integral condition for admissibility is

 (59) $\lim_{R \to \infty} \int_{s \in S} s_i s_j \left[\int_1^R \frac{1}{p} V^{-1}_{ij}(rs)\, r^{-1}\, dr\right]^{-1} ds = 0,$

where S is the intersection of the boundary of the unit circle and the upper right quadrant. Using the behavior of as , this condition becomes

 (60) $\int_1^R \frac{1}{r\, p(sr)}\, dr \to \infty$ for almost all $s \in S$.

Choosing a prior p to make pV admissible requires that

 (61) $\int_0^1 \frac{1}{p}\, dx_1 = \int_0^1 \frac{1}{p}\, dx_2 = \infty, \quad \int_1^{\infty} \frac{1}{p r}\, dr = \infty.$

Roughly, we need that be of order near , of order near ,and of order near . For example, will do the job, as will many other priors with the correct behavior near the boundary. The uniform is inadmissible because it fails at and .

The plot of confidence ellipses when 1000 points are sampled from the mixture model shows how the boundaries affect asymptotic admissibility. For the boundaries and , the asymptotic variances orthogonal to the boundary in fact approach a positive limit; thus the integral of the inverse variances up to the boundary is positive rather than infinite, and the uniform density is therefore inadmissible. For the boundary at infinity, the variances are bounded away from infinity, so the integral of the inverse variance is infinite, and this boundary is admissible for a uniform prior.

12 A prior beating the uniform

It is of interest to exhibit a prior with asymptotic risk everywhere smaller than an inadmissible prior such as the uniform in this problem. Brown’s condition exhibits a prior satisfying . The asymptotic risk of , relative to the uniform, is . There are many solutions to the elliptic differential equation, depending on boundary values of . The solutions are not necessarily admissible.

We have computed solutions for the discrete approximation where each lie in the grid 0.1, 0.2,…10. The boundary values for are We set these values so that p will satisfy the conditions for admissibility at the different boundaries. The following prior is obtained by using a relaxation method to solve the finite difference form of the differential equation; at the solution, the finite difference expressions are everywhere less than .01. A similar prior was developed in [Em02].
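The relaxation idea can be illustrated on a much simpler problem than the one solved here. The sketch below assumes V is the identity, so the equation for u = √p reduces to Laplace's equation, and uses Gauss-Seidel sweeps of the five-point stencil on the unit square rather than the grid 0.1, 0.2, …, 10. The function name, grid size, and boundary data are illustrative assumptions; the variable-coefficient equation of the paper would need the stencil weighted by V.

```python
def relax_laplace(boundary, n=20, tol=1e-10, max_sweeps=20000):
    """Gauss-Seidel relaxation of the 5-point Laplace equation on [0, 1]^2.

    boundary(x, y) supplies the fixed values of u on the edge of the grid.
    """
    h = 1.0 / n
    u = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            if i in (0, n) or j in (0, n):
                u[i][j] = boundary(i * h, j * h)
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(1, n):
            for j in range(1, n):
                # replace each interior value by the average of its four neighbours
                new = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
                largest_change = max(largest_change, abs(new - u[i][j]))
                u[i][j] = new
        if largest_change < tol:
            break
    return u

# boundary data taken from the harmonic function 1 + x * y
u = relax_laplace(lambda x, y: 1.0 + x * y)
```

Because 1 + xy satisfies the discrete equation exactly, the relaxed interior values should reproduce it, which gives a simple check that the sweeps have converged. A looser stopping rule, such as the .01 tolerance on the finite difference expressions quoted above, trades accuracy for fewer sweeps.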

It will be noted that the prior density approaches zero at the lower and left boundary, but not at the other two boundaries, as required by the admissibility conditions.

The risk gains against the uniform are everywhere positive (as required by the theory), but are far greater near the low and boundaries. This is to be expected, because the prior is made admissible by changes near the boundaries, so that larger improvements in the risk should occur there.