Asymptotic Admissibility of Priors and Elliptic Differential Equations

J. A. Hartigan, Yale University

Abstract
We evaluate priors by the second order asymptotic behaviour of the corresponding estimators.
Under certain regularity conditions, the risk differences between efficient
estimators of parameters taking values in a domain $D$, an open connected subset of
$\mathbb{R}^k$, are asymptotically expressed as elliptic differential forms
depending on the asymptotic covariance matrix $V$. Each efficient
estimator has the same asymptotic risk as a “local Bayes” estimate
corresponding to a prior density $p$. The asymptotic decision theory of the
estimators identifies the smooth prior densities as admissible or
inadmissible, according to the existence of solutions to certain elliptic
differential equations. The prior $p$ is admissible if the quantity $pV$ is sufficiently small
near the boundary of $D$. We exhibit the unique admissible invariant prior for $V = I$, $D = \mathbb{R}^k - \{0\}$.
A detailed example is given for a normal mixture model.

1 Introduction

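A parameter $\theta$ takes values in a domain $D$, an open connected subset of $\mathbb{R}^k$.

I use the partial differential equation symbol $D$ rather than the usual statistical
symbol $\Theta$ because the evaluation of asymptotic risk
reduces to existence problems in the theory of partial differential
equations. The parameter $\theta$ indexes a probability density $f(x \mid \theta)$ with respect to some measure $\mu$, say, for data $x$.

We use the Kullback-Leibler loss function

$$L(\theta, \theta') = \int f(x \mid \theta)\,\log\frac{f(x \mid \theta)}{f(x \mid \theta')}\,d\mu(x) \tag{1}$$

to define the risk of the estimator $\delta$, a function of $x$ taking values in $D$:

$$R(\delta, \theta) = \int L\bigl(\theta, \delta(x)\bigr)\, f(x \mid \theta)\, d\mu(x). \tag{2}$$

For a prior density $p$, the posterior Bayes estimator $\delta_p$ minimizes, over values $\delta(x) \in D$, the posterior Kullback-Leibler risk, which up to a normalizing factor is

$$\int_D L\bigl(\theta, \delta(x)\bigr)\, f(x \mid \theta)\, p(\theta)\, d\theta. \tag{3}$$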
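As a quick check on (1), the sketch below integrates the Kullback-Leibler loss numerically for the normal location family $N(\theta, 1)$ and compares the result with the closed form $(\theta - \theta')^2/2$. It uses only the Python standard library. The model and the integration settings are illustrative assumptions. The point is that the loss is locally a quadratic form in the Fisher information, and this is how the asymptotic covariance matrix enters the risk.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

def kl_loss(theta, theta_hat, sd=1.0, lo=-20.0, hi=20.0, steps=20000):
    """Midpoint-rule approximation of the integral of f(x|theta) log[f(x|theta)/f(x|theta_hat)]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        log_f = log_normal_pdf(x, theta, sd)
        log_g = log_normal_pdf(x, theta_hat, sd)
        total += math.exp(log_f) * (log_f - log_g) * h   # work with log densities to avoid underflow
    return total

numeric = kl_loss(0.0, 1.5)
closed_form = (0.0 - 1.5) ** 2 / 2.0   # (theta - theta_hat)^2 / (2 sd^2) with sd = 1
print(numeric, closed_form)
```

For a general smooth model the same loss behaves like $\tfrac12(\theta'-\theta)^{\top} I(\theta)(\theta'-\theta)$ near $\theta' = \theta$. Here $I$ is the Fisher information per observation. The normal location family is the case where this approximation is exact.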

Define

(4) |

We will assume the asymptotic covariance matrix.

Following Brown [Br79], and letting denote the prior uniform over D, we consider estimators of form for fixed decision functions. Under smoothness conditions [Ha98] requiring smooth variation of the data density and the prior with , the asymptotic risks for different decision functions differ only by terms of order ; therefore we define the asymptotic risk for decision function , relative to the decision function corresponding to the uniform prior , by the assumed limit

(5) |

where $\partial_i$ denotes the partial derivative $\partial/\partial\theta_i$.

For a prior with density $p$, the posterior Bayes estimate corresponds
asymptotically to the decision function $V\nabla\log p$, and then
the risk may be expressed in elliptic operator form

(6) |
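The claim that a prior $p$ acts asymptotically through the decision function $V\nabla\log p$ can be checked in one dimension. The sketch below assumes, for illustration only, a sample mean $\bar x \sim N(\theta, V/n)$ and the hypothetical prior $p(\theta) = 1 + \theta^2$. Under Kullback-Leibler loss this model has quadratic loss, so the posterior Bayes estimate is the posterior mean. Its shift away from $\bar x$, multiplied by $n$, should approach $V\,p'(\bar x)/p(\bar x)$.

```python
import math

def posterior_mean(xbar, n, V, prior, half_width=12.0, steps=20000):
    """Posterior mean of theta when xbar ~ N(theta, V/n), by the midpoint rule."""
    s = math.sqrt(V / n)                 # standard deviation of xbar given theta
    lo = xbar - half_width * s           # integrate over xbar +/- half_width * s
    h = 2.0 * half_width * s / steps
    num = den = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        w = prior(t) * math.exp(-0.5 * ((xbar - t) / s) ** 2)   # unnormalised posterior
        num += t * w
        den += w
    return num / den

def prior(t):
    """Hypothetical smooth (improper) prior density p(theta) = 1 + theta^2."""
    return 1.0 + t * t

def dlog_prior(t):
    """Derivative of log p."""
    return 2.0 * t / (1.0 + t * t)

V, xbar = 2.0, 1.0
predicted = V * dlog_prior(xbar)         # decision function V * (log p)' evaluated at xbar
shifts = {n: n * (posterior_mean(xbar, n, V, prior) - xbar) for n in (100, 1000, 10000)}
print(predicted, shifts)
```

As $n$ grows, the scaled shifts move toward the predicted value. The remaining discrepancy is of order $1/n$.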

It turns out that, under certain conditions of smoothness and boundedness
for the decision function and for $V$, there is a risk-matching prior density $p$ whose posterior Bayes decision has the same asymptotic risk as the given decision function. Thus the
behaviour of asymptotic risk for all smooth decisions is captured in the
theory of elliptic differential equations, equations whose relevance to
decision theory was first indicated in Stein [St56], but which were
extensively elucidated for the normal location problem in Brown [Br71].
See also Strawderman and Cohen [SC71]. The
asymptotic behaviour of Bayes estimators near maximum likelihood estimators
has been studied for loss functions of form by Levit in [Le82], [Le83], [Le85]; in particular, he shows that the Bayes
estimators form a complete class under certain regularity conditions.

2 Risk matching priors

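For each decision function we will find a risk-matching prior $p$, one whose posterior Bayes decision has the same asymptotic risk. Then we need only consider decision functions of the posterior Bayes form $V\nabla\log p$, and risks of the elliptic operator form (6), in the asymptotic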
decision theory. This result will be proved under boundedness and smoothness
assumptions using some standard tools from Pinsky[Pi95].

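For the domain $D$ with closure $\bar D$, a function $f$ is uniformly
Hölder continuous with exponent $\alpha$ in $D$ if there is a constant $C$ with

$$|f(x) - f(y)| \le C\,|x - y|^{\alpha} \quad \text{for all } x, y \in D. \tag{7}$$

The Hölder spaces $C^{k,\alpha}(\bar D)$ consist of functions whose $k$-th order partial derivatives are uniformly Hölder continuous with exponent $\alpha$ in $D$. Say $D' \subset\subset D$ if $D'$ is bounded and its closure $\bar D'$ is properly included in $D$. The Hölder spaces $C^{k,\alpha}(D)$ consist of functions that lie in $C^{k,\alpha}(\bar D')$ for each $D' \subset\subset D$.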
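Definition (7) can be probed numerically by taking the largest difference quotient over pairs of grid points. In this sketch the function $\sqrt{x}$ on $[0,1]$ is an illustrative choice. Its quotient stays bounded by 1 for $\alpha = 1/2$. For $\alpha = 1$ the quotient grows without bound as the grid is refined near 0. So $\sqrt{x}$ is Hölder continuous with exponent $1/2$ but is not Lipschitz. A finite grid can only suggest either property, not prove it.

```python
import math

def holder_quotient(f, alpha, points):
    """Largest |f(x) - f(y)| / |x - y|**alpha over all pairs of distinct grid points."""
    best = 0.0
    for i, x in enumerate(points):
        for y in points[i + 1:]:
            best = max(best, abs(f(x) - f(y)) / abs(x - y) ** alpha)
    return best

grid = [i / 199 for i in range(200)]            # 200 equally spaced points in [0, 1]
q_half = holder_quotient(math.sqrt, 0.5, grid)  # bounded by 1 for alpha = 1/2
q_one = holder_quotient(math.sqrt, 1.0, grid)   # grows like sqrt(grid size) for alpha = 1
print(q_half, q_one)
```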

The domain has a boundary if for each point , there is a ball B centered at and a 1-1 mapping , such that

We follow a standard approach which first proves results under the strong condition, assumption B,
and then extends the results to the weak condition, assumption A, by approximating $D$ by an increasing sequence of bounded subdomains $D_n$.

Theorem 1: Suppose and assumption holds.

Then there exists a prior in such that

(8) |

Proof:

From [Pi95, theorem 5.5], for some eigenvalue , there exists

in in satisfying

(9) |

Since

(10) | |||||

(11) |

it follows that
If the corresponding eigenvector provides the -matching prior
with
If from [Pi95, theorem 6.5], for each there exists a unique -matching solution

in , to the
equation

Theorem 2: Suppose and assumption A holds.

Then there exists a prior in such that

(12) |

Proof:

Select an increasing sequence of
bounded domains Within each
domain, assumption holds, so there exists a sequence of solutions

in such
that

For some , without loss of generality set all n. Note that any solution
is also a solution to
A Harnack inequality [Pi95, p. 124] implies that for for some constants .

The Schauder interior estimate [Pi95, p. 86]
implies, for some constant , for all ,

(13) |

Thus, for each n, the sequence is precompact in the norm. By diagonalization, there exists a subsequence, say , that converges to in for which

(14) |

Finally, we need to show that First note that , since

(15) |

For any , the compact set
is covered by , and therefore by a finite subcovering, and so by a
particular . Thus for every
which implies

3 Explicit matching priors using the Feynman-Kac integral

When in equation (9), the decision function is the posterior Bayes
decision corresponding to the prior . We could
determine the ratio
by the integral
over any path connecting and .
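Suppose the decision function is a gradient, say $b = V\nabla\log p_0$ for some prior $p_0$. Then the integral of $V^{-1}b$ along any path from $\theta_0$ to $\theta_1$ equals $\log p_0(\theta_1) - \log p_0(\theta_0)$. The sketch below checks this path independence in two dimensions. The covariance field $V$, the prior $p_0$ and the two paths are hypothetical choices made only for the illustration. When $V^{-1}b$ is not a gradient, the two path integrals would generally disagree, and this is what motivates averaging over paths.

```python
import math

def V(t):
    """Hypothetical positive-definite covariance field V(theta) in two dimensions."""
    return [[1.0 + t[0] ** 2, 0.3], [0.3, 2.0]]

def log_p0(t):
    """Hypothetical log prior density log p0(theta)."""
    return -0.5 * t[0] ** 2 - t[0] * t[1] + math.sin(t[1])

def grad_log_p0(t):
    return [-t[0] - t[1], -t[0] + math.cos(t[1])]

def b(t):
    """Gradient-type decision function b = V grad log p0."""
    v, g = V(t), grad_log_p0(t)
    return [v[0][0] * g[0] + v[0][1] * g[1], v[1][0] * g[0] + v[1][1] * g[1]]

def solve2(m, y):
    """Solve the 2x2 linear system m z = y by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * y[0] - m[0][1] * y[1]) / det,
            (m[0][0] * y[1] - m[1][0] * y[0]) / det]

def line_integral(path, steps=4000):
    """Midpoint approximation of the integral of V^{-1} b . d theta along path(s), 0 <= s <= 1."""
    total = 0.0
    for i in range(steps):
        start, end = path(i / steps), path((i + 1) / steps)
        mid = path((i + 0.5) / steps)
        w = solve2(V(mid), b(mid))          # V^{-1} b at the midpoint
        total += w[0] * (end[0] - start[0]) + w[1] * (end[1] - start[1])
    return total

straight = lambda s: (s, 2.0 * s)                                 # (0,0) -> (1,2)
curved = lambda s: (s * s, 2.0 * math.sin(math.pi * s / 2.0))     # (0,0) -> (1,2)
exact = log_p0((1.0, 2.0)) - log_p0((0.0, 0.0))
print(line_integral(straight), line_integral(curved), exact)
```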

When , so that is no longer a gradient, it is plausible
to attempt to find an approximating by averaging these integrals over all
paths between and .

Consider the stochastic differential equation with initial condition

(16) |

We propose the Feynman-Kac integral formula to specify a risk-matching prior:

(17) |

where is the Stratonovich stochastic integral, and T is the time to reach the boundary of D. I suspect that the condition is sufficient for the existence of the stochastic process and the integral when assumption holds. When the formula is valid, we see that $p$ is determined as a weighted combination of its boundary values, with the weight at each boundary point determined by the path integral over the various paths that reach that particular boundary point. Many different risk-matching priors are available, corresponding to different smooth assignments of the boundary values.
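The boundary-averaging interpretation can be seen in the simplest special case. We assume it here purely for illustration: $V = I$, no drift and no path-integral weight. The diffusion is then Brownian motion, and the solution at a point is the expected boundary value at the exit point. The sketch below estimates this by Monte Carlo in the unit disk. It uses the harmonic boundary data $x^2 - y^2$, so the exact answer is known. It does not implement the drift or the Stratonovich integral of (16) and (17).

```python
import math
import random

def boundary_value(x, y):
    """Hypothetical boundary data; x^2 - y^2 is harmonic, so its interior extension is itself."""
    return x * x - y * y

def exit_average(x0, y0, n_paths=2000, dt=2e-3, seed=0):
    """Monte Carlo estimate of E[g(exit point)] for Brownian motion started at (x0, y0)
    and stopped on leaving the unit disk, checking for exit on a time grid of spacing dt."""
    rng = random.Random(seed)
    step = math.sqrt(dt)                     # sd of each Brownian increment
    total = 0.0
    for _ in range(n_paths):
        x, y = x0, y0
        while x * x + y * y < 1.0:
            x += step * rng.gauss(0.0, 1.0)
            y += step * rng.gauss(0.0, 1.0)
        r = math.hypot(x, y)
        total += boundary_value(x / r, y / r)   # project the small overshoot onto the circle
    return total / n_paths

estimate = exit_average(0.3, 0.4)
print(estimate, boundary_value(0.3, 0.4))   # Monte Carlo estimate vs the exact harmonic value
```

Checking for exit only at discrete times introduces a small bias of order $\sqrt{dt}$. Sampling adds Monte Carlo error of order $1/\sqrt{n_\text{paths}}$.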

4 Decision theory for asymptotic risks

We now apply decision theoretic classifications to the asymptotic risk
formula. Our conclusions about the prior will depend only on the domain
and the asymptotic variance . From now on we will drop the term asymptotic.
We consider a particular variance and the set of risks, real valued
functions on the domain , corresponding to priors satisfying

(18) |

The risks will be written .

The posterior Bayes decision is locally Bayes: for any alternative decision

where and in , some integration by parts shows

(19) |

Theorem 3: Under assumption B, with D bounded, the following conditions are equivalent:

is Bayes: there exists no with

is Admissible: there exists no with

is Unique Risk: there exists no with

is Brown: no non-trivial positive solves

Proof:

Bayes implies Admissible because violating Admissible also
violates Bayes.
Admissible implies Unique Risk because violating Unique Risk also
violates Admissible.
Brown [Br71, 1.3.9] and Unique Risk are equivalent, because if and only if

It only remains to show that failure of Bayes
implies failure of Unique Risk.

Without loss of generality, assume is uniform, so that If is not Bayes, there exists with

(20) |

Since and the middle integral is positive, it follows that

(21) |

Let be the Lebesgue measure of the boundary when is smooth and bounded, let denote the outward pointing normals on the boundary, and note that

(22) |

Applying Theorem 6.31 from [GT97], for ,
since ,

there exists a solution
to the oblique derivative problem

(23) |

so that

Repeating the compactness argument of theorem 2 on , there exists with

(24) |

The first condition states that and have the same risk, and the second condition guarantees that ,
so that the unique risk condition fails, as required.

5 When is $V$ Bayes on $\mathbb{R}^k$?

Brown’s condition shows that whether the locally Bayes estimate is Bayes depends only on the product $pV$. For example, $p$ is Bayes with $V$ if and only if the uniform prior is Bayes with $pV$. We will therefore rephrase the admissibility question in terms of the product $pV$, and write $V$ for this prior-scaled covariance matrix: $V$ is Bayes on $\mathbb{R}^k$ if and only if there is no non-trivial solution to Brown’s equation.

Theorem 4. Let $\theta = rs$, where $r = |\theta|$ and $s$ ranges over the surface of the unit
sphere. Define
Suppose that, uniformly over $s$,

(25) |

Then $V$ is Bayes on $\mathbb{R}^k$ only if

(26) |

Proof:

Let be a test function with
relative risk

(27) |

From theorem 2 for every there exists a prior with . Thus is Bayes if and only if

(28) |

for every test where the integral is defined. Equivalently, with the test ,

(29) |

for every test where the integral is defined.

The possible negative term is determined by values
in the neighbourhood of infinity, so being Bayes is determined by the
behaviour of $V$ near the infinite boundary. In particular, if two covariance functions are identical outside a compact subset of $\mathbb{R}^k$,
they have the same admissibility classification.

We therefore consider a test function that
is zero inside the unit sphere:

(30) |

where is twice differentiable, for for for .

(31) |

Consider the contribution to the test integral for a particular $s$:

(32) | |||||

(33) | |||||

(34) | |||||

(35) |

Thus

(36) |

unless

(37) |

which shows that the condition in the theorem is necessary for $V$ to be Bayes.

Failure of the condition in the theorem allows construction of an explicit test function showing that $V$ is not Bayes. I suspect that the weaker condition

is also necessary.
It may be that the condition is also sufficient.
A similar condition for the recurrence of diffusion processes is given in [Ic78].

Brown [Br71] studies the admissibility of estimates for the normal location
problem in $k$ dimensions, in which it is assumed that the data are Gaussian with unknown mean and identity covariance matrix.
He shows that an estimate corresponding to the marginal
density of the data is admissible if is bounded and if

(38) |

Brown, Theorem 6.4.4, also shows that an estimate corresponding to the marginal density is admissible only if

(39) |

The asymptotic version requires data , with . Theorem 4 implies (40): a prior is Bayes only if

(40) |

or equivalently, almost everywhere on

(41) |

If the prior density is expressed as a density on the polar co-ordinates
the condition simplifies to almost everywhere on . See Strawderman and Cohen [SC71], theorem 4.4.1.
For example, the prior corresponding to being uniformly
distributed is Bayes in every dimension for but not Bayes for

7 When is V Bayes on bounded D?

Let D be a bounded domain with boundary in . Let denote the outward pointing normal at a point on the boundary
of , assumed defined almost everywhere in , Lebesgue measure on the boundary. It will be assumed
that, for almost all , the inward pointing normal lies in
for small enough.

Theorem 5. The covariance matrix $V$ is Bayes only if

(42) |

This is proved similarly to theorem 4, using test functions that are constant on the inward pointing normal segments.

It may happen that, for each $s$, the normal vector at $s$ is the limit of some eigenvector of $V$ as $\theta \to s$ (the normal eigenvector case), in which case the condition in the theorem simplifies to

(43) |

We will say that the integral condition fails at $s$ if the integral in (43) is finite at $s$. The theorem now states that $V$ is admissible only if the integral condition fails on a set of measure zero.

The admissible are those where fast enough near the boundary. If is inadmissible, we can render it admissible by attenuating near the boundary. In the normal eigenvector case, let consist of those points within of the boundary, and suppose that each such point is closest to a unique boundary point . Each such point may be written for some .

Let be an attenuation factor defined at each point in by:

The proposed attenuation factor will be 1 except near boundary points where the integral condition fails, where it will approach zero. With the prior , the integral condition becomes

(44) |

9 One dimension

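For the one-dimensional parameter $\theta$ on an interval $(a, b)$ with variance $V$, Brown’s
condition implies that $p$ is admissible if and only if, for any $c \in (a, b)$,

$$\int_c^b \frac{d\theta}{p(\theta)V(\theta)} = \int_a^c \frac{d\theta}{p(\theta)V(\theta)} = \infty. \tag{45}$$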

Since a smooth monotone transformation renders equal to on ,
an equivalent result is that there exists a non-zero differentiable test function on
such that if and only if either or are finite.

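Jeffreys’ density $V^{-1/2}$ is admissible on $(a, b)$ if and only if

$$\int_c^b V(\theta)^{-1/2}\,d\theta = \int_a^c V(\theta)^{-1/2}\,d\theta = \infty, \tag{46}$$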

which means that Jeffreys must be “improper” in both tails to be admissible. I take a certain delight in this impropriety, because although “improper” priors abound in decision theory and in Bayesian analysis, they remain objects of suspicion. See for example the excellent review in [KW96]. However, in decision theory, the prior appears only when multiplied with a loss function, which may be arbitrarily scaled, so improper priors form a natural part of the range of procedures we need to study. Asymptotically, the prior appears in admissibility questions only as a product with the covariance matrix, and again it makes no sense to constrain priors to be proper. In the Jeffreys case, the admissibility of the product requires that Jeffreys be improper in the tails.

The Pearson correlation coefficient computed for $n$ bivariate normal observations with true correlation $\rho$ has asymptotic variance $(1-\rho^2)^2/n$.

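Thus a prior $p$ on $(-1, 1)$ is admissible if and only if

$$\int_0^1 \frac{d\rho}{p(\rho)(1-\rho^2)^2} = \int_{-1}^0 \frac{d\rho}{p(\rho)(1-\rho^2)^2} = \infty. \tag{47}$$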

For priors of form , the prior p is admissible if
and only if . Thus if we wish to skirt the edge of
inadmissibility, we might use .
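Condition (47) is easy to probe numerically. Purely as an assumption for this sketch, take priors of the power form $p(\rho) = (1-\rho^2)^{\alpha}$. Near $\rho = 1$ the integrand then behaves like $(1-\rho)^{-\alpha-2}$, so the tail diverges exactly when $\alpha \ge -1$. By symmetry the same holds at $\rho = -1$. The Jeffreys density $V^{-1/2} = (1-\rho^2)^{-1}$ is the boundary case $\alpha = -1$. The code truncates the integral at $1 - \varepsilon$ and checks whether it keeps growing as $\varepsilon \to 0$. Slow, logarithmic growth, as at $\alpha = -1$, signals borderline divergence.

```python
import math

def tail_integral(alpha, eps, steps=20000):
    """Integral of d rho / (p(rho) (1 - rho^2)^2) over [0, 1 - eps] for p = (1 - rho^2)**alpha.

    Substituting rho = 1 - exp(-t) turns the endpoint singularity into a long t-interval.
    """
    T = -math.log(eps)
    h = T / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        u = math.exp(-t)                                    # u = 1 - rho
        one_minus_rho2 = u * (2.0 - u)                      # 1 - rho^2 = (1 - rho)(1 + rho)
        total += u * one_minus_rho2 ** (-alpha - 2.0) * h   # d rho = u dt
    return total

for alpha in (-1.5, -1.0, 0.0):
    print(alpha, [round(tail_integral(alpha, eps), 3) for eps in (1e-2, 1e-4, 1e-8)])
```

For $\alpha = -1.5$ the values settle down, so the tail is finite and the prior is inadmissible. For $\alpha = -1$ they grow like $\tfrac12\log(1/\varepsilon)$, and for $\alpha = 0$ they grow much faster.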

10 Invariant admissible priors

Since the Kullback-Leibler loss function does not change under smooth transformation of the parameter space,
differences between the asymptotic risk functions for two priors are also invariant under such transformations.
We are free to transform to a convenient in deciding admissibility problems.
If a transformation takes into say , then the admissibility of in equals the admissibility
of in . A prior is relatively invariant if , where is the Jacobian of the
transformation and when is one to one such that .

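For example, if $V = I$ and $D = \mathbb{R}^k - \{0\}$, arbitrary rotations and scalings leave $D$ invariant and change the covariance by a constant, so
the only invariant priors are of the form $p = |\theta|^{\alpha}$. From Brown’s condition, the prior $p$ is admissible if

$$\int_0^1 \frac{dr}{r^{k-1+\alpha}} = \int_1^\infty \frac{dr}{r^{k-1+\alpha}} = \infty, \tag{48}$$

which occurs only when $\alpha = 2 - k$. In this case there is a single admissible invariant prior $p = |\theta|^{2-k}$. This prior,
discussed in [Br71] and [SC71], corresponds to $|\theta|^2$ being uniform over $(0, \infty)$.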
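Assume $V = I$ on $D = \mathbb{R}^k - \{0\}$ and a radial prior $p = |\theta|^{\alpha}$. Brown’s condition must then hold both near the origin and near infinity. The arithmetic that singles out one exponent is short:

```latex
\int_0^1 \frac{dr}{r^{k-1}\,r^{\alpha}} = \infty \iff k - 1 + \alpha \ge 1,
\qquad
\int_1^\infty \frac{dr}{r^{k-1}\,r^{\alpha}} = \infty \iff k - 1 + \alpha \le 1,
\qquad\text{so both diverge} \iff \alpha = 2 - k .
```

In polar coordinates the radial density of $|\theta|^{2-k}$ is proportional to $r^{2-k} r^{k-1} = r$. Hence $|\theta|^2 = r^2$ has constant density, which is the sense in which this prior makes $|\theta|^2$ uniform.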

If , the invariant transformations are rotations, which require that an invariant depends only
on . Admissibility requires

(49) |

Admissibility is achieved by .

Although invariance considerations no longer always apply, the above solution can be extended to general
bounded with , namely . For general , define as the path
length between points in the metric . Then is bounded in this metric if all
paths have finite length, and we again define .
We offer this merely as a suggestion for an admissible prior that flirts with inadmissibility near the boundaries, and is consistent
under transformations of the data.

11 A mixture model

Suppose that is a sample of size from the normal
mixture

(50) |

where

The parameter lies in the domain

The density of a single observation is

(51) |

The asymptotic variance V is the inverse of the information matrix of expected values of products of the score functions:

(52) |

(53) |

(54) |

(55) |
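The mixture model (50) is not reproduced here. Instead, the following sketch uses a hypothetical two-component mixture $(1-w)N(0,1) + wN(\mu,1)$ to show how $V$ is obtained from expected products of score functions. The scores are differentiated analytically, the expectations are computed by the midpoint rule, and the $2 \times 2$ information matrix is inverted. As $\mu \to 0$ the two scores become nearly collinear and the entries of $V$ blow up. This is the kind of boundary behaviour that drives the admissibility analysis.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def information(w, mu, steps=6000):
    """Fisher information for (w, mu) in the hypothetical mixture (1 - w) N(0,1) + w N(mu,1),
    computed as the expected outer product of the score vector (midpoint rule on a wide x-range)."""
    lo, hi = -15.0, 15.0 + mu
    h = (hi - lo) / steps
    info = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(steps):
        x = lo + (i + 0.5) * h
        f = (1.0 - w) * phi(x) + w * phi(x - mu)
        score = [(phi(x - mu) - phi(x)) / f,        # d log f / d w
                 w * (x - mu) * phi(x - mu) / f]    # d log f / d mu
        for j in range(2):
            for k in range(2):
                info[j][k] += score[j] * score[k] * f * h
    return info

def inverse2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

V_far = inverse2(information(0.3, 2.0))    # well-separated components
V_near = inverse2(information(0.3, 0.2))   # nearly unidentifiable components
print(V_far)
print(V_near)
```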

Asymptotic admissibility for the prior is determined by behaviour of Vp near the boundaries. Let

(56) | |||||

(57) | |||||

(58) |

At the boundary the normal is an eigenvector at all points , and the integral condition for admissibility for that boundary is almost all which reduces to almost all . Similarly, the condition for admissibility on the boundary is almost all .

For the infinite “boundary” , the integral condition for admissibility is

(59) |

where S is the intersection of the boundary of the unit circle and the upper right quadrant. Using the behavior of as , this condition becomes

(60) |

Choosing a prior p to make pV admissible requires that

(61) |

Roughly, we need that be of order near , of order near , and of order near . For example, will do the job, as will many other priors with the correct behavior near the boundary. The uniform is inadmissible because it fails at and .

The plot of confidence ellipses when 1000 points are sampled from the mixture model shows how the boundaries affect asymptotic admissibility. For the boundaries and , the asymptotic variances orthogonal to the boundary in fact approach a positive limit; thus the integral of the inverse variances up to the boundary is finite rather than infinite, and the uniform density is therefore inadmissible. For the boundary at infinity, the variances are bounded away from infinity, so the integral of the inverse variance is infinite, and this boundary is admissible for a uniform prior.

12 A prior beating the uniform

It is of interest to exhibit a prior with asymptotic risk everywhere smaller
than an inadmissible prior such as the uniform in this problem. Brown’s condition exhibits
a prior satisfying The asymptotic risk of
, relative to the uniform, is .
There are many solutions to the elliptic differential equation, depending on
boundary values of . The solutions are not necessarily admissible.

We have computed solutions for the discrete approximation where each lie in the grid 0.1, 0.2, …, 10. The boundary values for are We set these values so that p will satisfy the conditions for admissibility at the different boundaries. The following prior is obtained by using a relaxation method to solve the finite difference form of the differential equation; at the solution, the finite difference expressions are everywhere less than 0.01. A similar prior was developed in [Em02].
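The relaxation step can be sketched as Gauss-Seidel iteration on a five-point finite difference scheme. The paper's exact equation and boundary values are not reproduced here. As an assumption, the sketch solves a generic divergence-form equation $\partial_x(a\,\partial_x q) + \partial_y(a\,\partial_y q) = 0$ with a hypothetical coefficient $a$ and Dirichlet data, on a coarser grid than the paper's. It stops when every finite difference residual falls below a tolerance, mirroring the stopping rule described above.

```python
def relax(a, boundary, n=21, length=10.0, tol=1e-6, max_sweeps=20000):
    """Gauss-Seidel relaxation for d/dx(a dq/dx) + d/dy(a dq/dy) = 0 on [0, length]^2.

    a(x, y) is a hypothetical positive coefficient and boundary(x, y) gives Dirichlet
    values on the edge of the square. Iteration stops once every finite difference
    residual (divided by h^2) visited during a sweep is below tol in absolute value.
    """
    h = length / (n - 1)
    xs = [i * h for i in range(n)]
    # q[i][j] approximates q(xs[i], xs[j]); interior values start at zero.
    q = [[boundary(x, y) if i in (0, n - 1) or j in (0, n - 1) else 0.0
          for j, y in enumerate(xs)]
         for i, x in enumerate(xs)]
    for sweep in range(1, max_sweeps + 1):
        worst = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                x, y = xs[i], xs[j]
                ae, aw = a(x + h / 2, y), a(x - h / 2, y)    # coefficients on cell faces
                an, as_ = a(x, y + h / 2), a(x, y - h / 2)
                flux = (ae * (q[i + 1][j] - q[i][j]) - aw * (q[i][j] - q[i - 1][j])
                        + an * (q[i][j + 1] - q[i][j]) - as_ * (q[i][j] - q[i][j - 1]))
                worst = max(worst, abs(flux) / h ** 2)
                # Gauss-Seidel update: set the local finite difference residual to zero.
                q[i][j] = (ae * q[i + 1][j] + aw * q[i - 1][j]
                           + an * q[i][j + 1] + as_ * q[i][j - 1]) / (ae + aw + an + as_)
        if worst < tol:
            return q, xs, sweep
    return q, xs, max_sweeps

# Usage: with a independent of y and boundary data q = y, the exact solution is q = y.
q, xs, sweeps = relax(lambda x, y: 1.0 + 0.1 * x, lambda x, y: y)
err = max(abs(q[i][j] - xs[j]) for i in range(len(xs)) for j in range(len(xs)))
print(sweeps, err)
```

Plain Gauss-Seidel needs a number of sweeps roughly proportional to the square of the grid size. On a grid as fine as the paper's, over-relaxation or multigrid would be the usual way to speed it up.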

It will be noted that the prior density approaches zero at the lower and left boundary, but not at the other two boundaries, as required by the admissibility conditions.

The risk gains against the uniform are everywhere positive (as required by
the theory), but are far greater near the low and boundaries. This is
to be expected, because the prior is made admissible by changes near the boundaries, so
that larger improvements in the risk should occur there.