Asymptotic Admissibility of Priors
and Elliptic Differential Equations
J. A. Hartigan, Yale University
We evaluate priors by the second order asymptotic behaviour of the corresponding estimators. Under certain regularity conditions, the risk differences between efficient estimators of parameters taking values in a domain $D$, an open connected subset of $R^k$, are asymptotically expressed as elliptic differential forms depending on the asymptotic covariance matrix $V$. Each efficient estimator has the same asymptotic risk as a “local Bayes” estimate corresponding to a prior density $p$. The asymptotic decision theory of the estimators identifies the smooth prior densities as admissible or inadmissible, according to the existence of solutions to certain elliptic differential equations. The prior $p$ is admissible if the quantity $pV$ is sufficiently small near the boundary of $D$. We exhibit the unique admissible invariant prior for $V = I$. A detailed example is given for a normal mixture model.
A parameter $x$ takes values in a domain $D$, an open connected subset of $R^k$. I use the partial differential equation symbol $x$ rather than the usual statistical symbol $\theta$, because the evaluation of asymptotic risk reduces to existence problems in the theory of partial differential equations. The parameter $x$ indexes a probability density $f(y \mid x)$ with respect to some measure $\mu$, say, for data $y$.
We use the Kullback-Leibler loss function
$$L(x, d) = \int f(y \mid x)\, \log \frac{f(y \mid x)}{f(y \mid d)}\, d\mu(y)$$
to define the risk of the estimator $d(y)$, a function of $y$ taking values in $\bar D$:
$$R(x, d) = \int L(x, d(y))\, f(y \mid x)\, d\mu(y).$$
For a prior density $p$, the posterior Bayes estimator minimizes the posterior Kullback-Leibler risk
$$\int L(x, d)\, p(x \mid y)\, dx.$$
We will assume that efficient estimates of $x$ from samples of size $n$ have asymptotic covariance matrix $V(x)/n$, the inverse of the Fisher information.
Following Brown [Br79], and letting $p_0$ denote the prior uniform over $D$, we consider estimators of form $\hat x + \delta(\hat x)/n$ for fixed decision functions $\delta$, where $\hat x$ is the maximum likelihood estimate. Under smoothness conditions [Ha98] requiring smooth variation of the data density and the prior with $x$, the asymptotic risks for different decision functions differ only by terms of order $n^{-2}$; therefore we define the asymptotic risk for decision function $\delta$, relative to the decision function $\delta_0$ corresponding to the uniform prior $p_0$, by the assumed limit
$$r(x, \delta) = \lim_n\, n^2\, [\,R(x, d_\delta) - R(x, d_{\delta_0})\,] = \sum_{ij} (V^{-1})_{ij}\, \delta_i \delta_j + 2 \sum_i \partial_i \delta_i,$$
where $\partial_i$ denotes the partial derivative $\partial/\partial x_i$.

For a prior with density $p$, the posterior Bayes estimate corresponds asymptotically to the decision function $\delta_p = V \nabla \log p$, and then the risk may be expressed in elliptic operator form
$$r(x, \delta_p) = \frac{4}{\sqrt{p}}\, \nabla \cdot (V \nabla \sqrt{p}\,).$$
It turns out that, under certain conditions of smoothness and boundedness for $V$ and $\delta$, there is a risk matching prior density $p$ for which $r(x, \delta_p) = r(x, \delta)$. Thus the behaviour of asymptotic risk for all smooth decisions is captured in the theory of elliptic differential equations, equations whose relevance to decision theory was first indicated in Stein [St56], but which were extensively elucidated for the normal location problem in Brown [Br71]. See also Strawderman and Cohen [SC71]. The asymptotic behavior of Bayes estimators near maximum likelihood estimators has been studied for loss functions of quadratic type by Levit in [Le82], [Le83], [Le85]; in particular, he shows that the Bayes estimators form a complete class under certain regularity conditions.
2 Risk matching priors
For each decision function $\delta$ we will find a risk matching prior $p$ for which $r(x, \delta_p) = r(x, \delta)$. Then we need only consider decision functions of form $\delta_p$ and risks of form $r(x, \delta_p)$ in the asymptotic decision theory. This result will be proved under boundedness and smoothness assumptions, using some standard tools from Pinsky [Pi95].
For the domain $D$ with closure $\bar D$, a function $f$ is uniformly Hölder continuous with exponent $\alpha$ in $D$ if
$$\sup_{x \ne y \in D} \frac{|f(x) - f(y)|}{|x - y|^{\alpha}} < \infty.$$
The Hölder spaces $C^{k,\alpha}(\bar D)$ consist of functions whose $k$-th order partial derivatives are uniformly Hölder continuous with exponent $\alpha$ in $D$. Say $D' \subset\subset D$ if $D'$ is bounded and properly included in $D$. The Hölder spaces $C^{k,\alpha}(D)$ consist of functions that lie in $C^{k,\alpha}(\bar D')$ for each $D' \subset\subset D$.

The domain $D$ has a $C^{k,\alpha}$ boundary if for each point $s \in \partial D$, there is a ball $B$ centered at $s$ and a 1-1 mapping $\psi$ of $B$ onto an open set $E \subset R^k$ such that $\psi(B \cap D) \subset \{x : x_k > 0\}$, $\psi(B \cap \partial D) \subset \{x : x_k = 0\}$, and $\psi \in C^{k,\alpha}(B)$, $\psi^{-1} \in C^{k,\alpha}(E)$.
We follow a standard approach which first proves results under the strong assumption B, that $D$ is bounded with $C^{2,\alpha}$ boundary, with $V$ positive definite in $C^{1,\alpha}(\bar D)$ and $\delta$ in $C^{1,\alpha}(\bar D)$, and then extends the results to the weak assumption A, that $V$ is positive definite in $C^{1,\alpha}(D)$ and $\delta$ lies in $C^{1,\alpha}(D)$, by approximating $D$ by an increasing sequence of bounded subdomains $D_n$.
Theorem 1: Suppose $D$ is bounded and assumption B holds. Then there exists a prior $p$ in $C^{2,\alpha}(\bar D)$ such that $r(x, \delta_p) = r(x, \delta)$ in $D$.

From [Pi95, theorem 5.5], for some eigenvalue $\lambda_0$, there exists a positive eigenfunction $u_0$ in $C^{2,\alpha}(\bar D)$ satisfying
$$4 \nabla \cdot (V \nabla u_0) - r u_0 = \lambda_0 u_0 \text{ in } D, \qquad u_0 = 0 \text{ on } \partial D;$$
it follows that $r(x, \delta_{u_0^2}) = r(x, \delta) + \lambda_0$. If $\lambda_0 = 0$, the corresponding eigenvector provides the matching prior $p = u_0^2$. If $\lambda_0 \ne 0$, from [Pi95, theorem 6.5], for each positive boundary function $\phi$ there exists a unique positive matching solution $u$ in $C^{2,\alpha}(\bar D)$ to the equation
$$4 \nabla \cdot (V \nabla u) = r u \text{ in } D, \qquad u = \phi \text{ on } \partial D,$$
and $p = u^2$ is the required prior.
Theorem 2: Suppose assumption A holds. Then there exists a prior $p$ in $C^{2,\alpha}(D)$ such that $r(x, \delta_p) = r(x, \delta)$ in $D$.

Select an increasing sequence of bounded domains $D_n \subset\subset D$ with $\cup_n D_n = D$. Within each domain, assumption B holds, so there exists a sequence of positive solutions $u_n$ in $C^{2,\alpha}(\bar D_n)$ such that
$$4 \nabla \cdot (V \nabla u_n) = r u_n \text{ in } D_n.$$
For some $x_0 \in D_1$, without loss of generality set $u_n(x_0) = 1$, all $n$. Note that any solution on $D_m$ is also a solution on $D_n$ for $n < m$. A Harnack inequality [Pi95, p 124] implies that $c_n \le u_m \le C_n$ on $D_n$ for $m > n$, for some constants $c_n, C_n > 0$. The Schauder interior estimate [Pi95, p 86] implies, for some constant $K_n$, for all $m > n$,
$$\|u_m\|_{C^{2,\alpha}(\bar D_n)} \le K_n.$$
Thus, for each $n$, the sequence $\{u_m : m > n\}$ is precompact in the $C^2(\bar D_n)$ norm. By diagonalization, there exists a subsequence, say $u_{m_j}$, that converges in each $C^2(\bar D_n)$ to a limit $u$ for which
$$4 \nabla \cdot (V \nabla u) = r u \text{ in } D.$$
Finally, we need to show that $u > 0$. First note that $u \ge 0$, since each $u_m > 0$. For any $D' \subset\subset D$, the compact set $\bar D'$ is covered by $\cup_n D_n$, and therefore by a finite subcovering, and so by a particular $D_n$. Thus $u \ge c_n > 0$ on $D'$ for every $D' \subset\subset D$, and $p = u^2$ is the required prior.
3 Explicit matching priors using the Feynman-Kac integral
When $\delta = V \nabla \log p$ in equation (9), the decision function is the posterior Bayes decision corresponding to the prior $p$. We could determine the ratio $\log p(x_1) - \log p(x_0)$ by the integral of $V^{-1}\delta$ over any path connecting $x_0$ and $x_1$. When $V^{-1}\delta$ is no longer a gradient, it is plausible to attempt to find an approximating $p$ by averaging these integrals over all paths between $x_0$ and $x_1$.
Consider the stochastic differential equation $dX_t = \sigma(X_t)\, dW_t$, $\sigma \sigma' = V$, with initial condition $X_0 = x$. We propose the Feynman-Kac integral formula to specify a risk matching prior:
$$p(x) = E\,\Big[\, p(X_T) \exp \Big( -\int_0^T (V^{-1} \delta)(X_t)' \circ dX_t \Big) \Big],$$
where $\circ\, dX_t$ denotes the Stratonovich stochastic integral, and $T$ is the time to reach the boundary of $D$. I suspect that the condition $E\, T < \infty$ is sufficient for the existence of the stochastic process and the integral when assumption A holds. When the formula is valid, we see that $p(x)$ is determined as a weighted combination of its boundary values, with the weight at each boundary point determined by the path integral over the various paths that reach that particular boundary point. Many different risk matching priors are available, corresponding to different smooth assignments of the boundary values.
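As a check on the proposal, the Feynman-Kac average can be approximated by simulation. The sketch below is a minimal Monte Carlo version under assumed ingredients: $V = I$ on the unit disk, a rotational (non-gradient) decision function, and illustrative smooth boundary values; the Stratonovich integral is approximated by the midpoint rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustration: V = I (so sigma = I) on the unit disk D.
def delta(x):
    # a rotational, non-gradient decision function (an assumption)
    return np.array([-x[1], x[0]])

def boundary_p(x):
    # assumed smooth boundary values for the prior
    return 1.0 + 0.5 * x[0]

def matching_prior(x0, n_paths=500, dt=2e-3):
    # Monte Carlo version of the proposed formula
    # p(x) = E[ p(X_T) exp(-int_0^T (V^{-1} delta)(X_t)' o dX_t) ]
    total = 0.0
    for _ in range(n_paths):
        x = np.array(x0, dtype=float)
        w = 0.0                            # running Stratonovich integral
        while x @ x < 1.0:                 # run until the path exits the disk
            dx = np.sqrt(dt) * rng.standard_normal(2)
            w += delta(x + 0.5 * dx) @ dx  # midpoint rule for Stratonovich
            x = x + dx
        total += boundary_p(x / np.linalg.norm(x)) * np.exp(-w)
    return total / n_paths

est = matching_prior([0.0, 0.0])
```

In the gradient case the weight $e^{-w}$ collapses exactly to $p(x)/p(X_T)$, so the average recovers $p(x)$; for the rotational field above, the average defines the proposed approximating prior.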
4 Decision theory for asymptotic risks
We now apply decision theoretic classifications to the asymptotic risk formula. Our conclusions about the prior $p$ will depend only on the domain $D$ and the asymptotic variance $V$. From now on we will drop the term asymptotic. We consider a particular variance $V$ and the set of risks, real valued functions on the domain $D$, corresponding to priors $p$ satisfying the smoothness assumption A. The risks will be written $r(x, p)$.

The posterior Bayes decision $\delta_p$ is locally Bayes: for any alternative decision $\delta$,
$$\int [\, r(x, \delta) - r(x, \delta_p)\,]\, p\, dx \ge 0,$$
where $\delta = \delta_p + \epsilon$ and $\epsilon$ vanishes near the boundary of $D$; some integration by parts shows
$$\int [\, r(x, \delta) - r(x, \delta_p)\,]\, p\, dx = \int \epsilon'\, V^{-1} \epsilon\, p\, dx \ge 0.$$
Theorem 3: Under assumption B, with $D$ bounded, the following conditions are equivalent:

$p$ is Bayes: there exists no $\delta$ with $\int [\, r(x, \delta) - r(x, \delta_p)\,]\, p\, dx < 0$.

$p$ is Admissible: there exists no $\delta$ with $r(x, \delta) \le r(x, \delta_p)$ for all $x$, with strict inequality for some $x$.

$p$ is Unique Risk: there exists no $\delta \ne \delta_p$ with $r(x, \delta) = r(x, \delta_p)$ for all $x$.

$p$ is Brown: no non-trivial (non-constant) positive $h$ solves $\nabla \cdot (p V \nabla h) = 0$ in $D$.

Bayes implies Admissible, because a $\delta$ violating Admissible also violates Bayes. Admissible implies Unique Risk, because a $\delta$ violating Unique Risk also violates Admissible: since $r(x, \delta)$ is strictly convex in $\delta$, the decision $(\delta + \delta_p)/2$ has risk nowhere larger, and strictly smaller wherever $\delta \ne \delta_p$. Brown [Br71, 1.3.9] and Unique Risk are equivalent, because $r(x, \delta_{p h^2}) = r(x, \delta_p)$ for all $x$ if and only if $\nabla \cdot (p V \nabla h) = 0$. It only remains to show that failure of Bayes implies failure of Unique Risk.
Without loss of generality, assume $p$ is uniform, so that $\delta_p = 0$ and $r(x, \delta_p) = 0$. If $p$ is not Bayes, there exists $\delta$ with
$$\int r(x, \delta)\, dx = \int \delta'\, V^{-1} \delta\, dx + 2 \int \nabla \cdot \delta\, dx < 0.$$
Since the first integral is positive, the second satisfies $\int \nabla \cdot \delta\, dx < 0$.
Let $ds$ be the Lebesgue measure of the boundary $\partial D$, which is smooth and bounded, let $\nu$ denote the outward pointing normals on the boundary, and note that
$$\int_D \nabla \cdot \delta\, dx = \int_{\partial D} \delta' \nu\, ds < 0.$$
Applying Theorem 6.31 from [GT97], for each $\epsilon > 0$, there exists a solution $h_\epsilon$ to the oblique derivative problem
$$\nabla \cdot (V \nabla h_\epsilon) = \epsilon\, h_\epsilon \text{ in } D, \qquad \nu' V \nabla h_\epsilon = \nu' \delta \text{ on } \partial D.$$
Repeating the compactness argument of theorem 2 on the $h_\epsilon$, there exists $h$ with
$$\nabla \cdot (V \nabla h) = 0 \text{ in } D, \qquad \nu' V \nabla h = \nu' \delta \text{ on } \partial D.$$
The first condition states that the priors $p$ and $p(h + C)^2$ have the same risk, for $C$ large enough that $h + C > 0$, and the second condition guarantees that $h$ is not constant, so that the unique risk condition fails, as required.
5 When is $pV$ Bayes on $R^k$?
Brown’s condition shows that the locally Bayes estimate is Bayes or not depending only on the product $pV$. For example, $p$ is Bayes with variance $V$ if and only if the uniform prior is Bayes with variance $pV$. We will therefore rephrase the admissibility question in terms of the product $W = pV$: the prior-scaled covariance matrix $W$ is Bayes on $D$ if and only if there is no non-trivial solution to Brown’s equation $\nabla \cdot (W \nabla h) = 0$.
Theorem 4. Let $r = |x|$ and $\omega = x/|x|$, where $\omega$ ranges over the surface of the unit sphere. Define $w(r, \omega) = \omega' W(r\omega)\, \omega$. Suppose that, uniformly over $\omega$, the eigenvalues of $W(r\omega)$ are bounded by constant multiples of $w(r, \omega)$. Then $W$ is Bayes on $R^k$ only if
$$\int_1^\infty \frac{dr}{r^{k-1}\, w(r, \omega)} = \infty \text{ for almost all } \omega.$$

Let $h$ be a test function with relative risk
$$\Delta(h) = \int \nabla h'\, W\, \nabla h\, dx.$$
From theorem 2, for every test $h$ there exists a prior with matching risk. Thus $W$ is Bayes if and only if
$$\inf \{\, \Delta(h) : h = 1 \text{ inside the unit sphere},\ h = 0 \text{ outside a compact set} \,\} = 0,$$
the infimum taken over tests for which the integral is defined. Equivalently, with the test $1 - h$, $W$ is Bayes if and only if the corresponding infimum over tests with $h = 0$ inside the unit sphere and $h = 1$ outside a compact set is zero.
The possibly negative term is determined by values in the neighbourhood of infinity, so being Bayes is determined by the behaviour of $W$ near the infinite boundary. In particular, if two prior-scaled covariances are identical outside a compact subset of $R^k$, they have the same admissibility classification.
We therefore consider a test function that is zero inside the unit sphere:
$$h(x) = g(r),$$
where $g$ is twice differentiable, $g(r) = 0$ for $r \le 1$, $0 \le g(r) \le 1$ for $1 < r < R$, $g(r) = 1$ for $r \ge R$. Consider the contribution to the test integral for a particular $\omega$:
$$\int_1^R g'(r)^2\, w(r, \omega)\, r^{k-1}\, dr \ \ge\ \Big( \int_1^R \frac{dr}{r^{k-1}\, w(r, \omega)} \Big)^{-1},$$
by the Cauchy-Schwarz inequality. If $\int_1^\infty dr / (r^{k-1} w(r, \omega))$ is finite on a set of $\omega$ of positive measure, the contribution for those $\omega$ is bounded away from zero for every test, which shows that the condition in the theorem is necessary for $W$ to be Bayes.
Failure of the condition in the theorem allows construction of an explicit test function showing $W$ to be not Bayes. I suspect that the weaker condition
$$\int_1^\infty \frac{dr}{r^{k-1}\, \bar w(r)} = \infty, \qquad \bar w(r) = \text{the average of } w(r, \omega) \text{ over } \omega,$$
is also necessary. It may be that the condition is also sufficient. A similar condition for the recurrence of diffusion processes is given in [Ic78].
Brown [Br71] studies the admissibility of estimates for the normal location problem in $k$ dimensions, in which it is assumed that the data are gaussian with unknown mean and identity covariance matrix. He shows that an estimate corresponding to the marginal density $m$ of the data is admissible if $m$ is bounded and if
$$\int_1^\infty \frac{dr}{r^{k-1}\, \bar m(r)} = \infty,$$
where $\bar m(r)$ is the average of $m$ over the sphere of radius $r$. Brown, Theorem 6.4.4, also shows that an estimate corresponding to the marginal density $m$ is admissible only if the same integral diverges.

The asymptotic version requires data $y_1, \ldots, y_n$ with mean $x$ and identity covariance. Theorem 4 implies (40): a prior $p$ is Bayes only if
$$\int_1^\infty \frac{dr}{r^{k-1}\, p(r\omega)} = \infty$$
almost everywhere on $\omega$. If the prior density is expressed as a density $q(r, \omega) = r^{k-1}\, p(r\omega)$ on the polar co-ordinates $(r, \omega)$, the condition simplifies to $\int_1^\infty q(r, \omega)^{-1}\, dr = \infty$ almost everywhere on $\omega$. See Strawderman and Cohen [SC71], theorem 4.4.1.
For example, the prior $p(x) = |x|^{1-k}$, corresponding to $(|x|, x/|x|)$ being uniformly distributed, is Bayes in every dimension; the uniform prior on $R^k$ is Bayes for $k \le 2$ but not Bayes for $k \ge 3$.
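The polar condition can be checked symbolically for the uniform prior, for which $q(r, \omega) \propto r^{k-1}$: the tail integral $\int_1^\infty dr/(r^{k-1} p)$ diverges for $k = 2$ and converges for $k = 3$, reproducing the Stein phenomenon. A small sympy sketch (the choice of the uniform prior as the example is ours):

```python
import sympy as sp

r = sp.symbols('r', positive=True)

def tail_integral(k):
    # tail integral of 1 / (r^{k-1} p) for the uniform prior p = 1;
    # divergence is the necessary condition for the prior to be Bayes
    return sp.integrate(r ** (1 - k), (r, 1, sp.oo))

div2 = tail_integral(2)   # k = 2: integral of 1/r, diverges
div3 = tail_integral(3)   # k = 3: integral of 1/r^2, converges to 1
```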
7 When is $pV$ Bayes on bounded $D$?
Let $D$ be a bounded domain with boundary $\partial D$ in $C^{2,\alpha}$. Let $\nu(s)$ denote the outward pointing normal at a point $s$ on the boundary of $D$, assumed defined almost everywhere in $ds$, Lebesgue measure on the boundary. It will be assumed that, for almost all $s$, the inward pointing normal segment $s - t\nu(s)$, $0 < t < \epsilon$, lies in $D$ for $\epsilon$ small enough.

Theorem 5. The prior-scaled covariance matrix $W = pV$ is Bayes only if, for almost all $s$,
$$\int_0 \frac{dt}{\nu(s)'\, W(s - t\nu(s))\, \nu(s)} = \infty.$$
This is proved similarly to theorem 4, using test functions that are constant on the inward pointing normal segments.

It may happen that, for each $s$, the normal vector at $s$ is the limit of some eigenvector of $W(x)$ as $x \to s$ (the normal eigenvector case), in which case the condition in the theorem simplifies to
$$\int_0 \frac{dt}{\lambda(s - t\nu(s))} = \infty,$$
where $\lambda$ denotes the corresponding eigenvalue of $W$. We will say that the integral condition fails at $s$ if the integral is finite. The theorem now states that $W$ is admissible only if the integral condition fails on a set of measure $0$.
The admissible $W = pV$ are those for which $W$ becomes small fast enough near the boundary. If $pV$ is inadmissible, we can render it admissible by attenuating $p$ near the boundary. In the normal eigenvector case, let $D_\epsilon$ consist of those points within $\epsilon$ of the boundary, and suppose that each such point is closest to a unique boundary point $s$. Each such point may be written $s - t\nu(s)$ for some $0 < t < \epsilon$.

Let $a$ be an attenuation factor defined at each point $x = s - t\nu(s)$ in $D_\epsilon$ by:
$$a(x) = \min(1,\, G(t)), \qquad G(t) = \int_0^t \frac{du}{\lambda(s - u\nu(s))}.$$
The proposed attenuation factor will be 1 except near boundary points where the integral condition fails, where it will approach zero. With the prior $ap$, the integral condition becomes
$$\int_0 \frac{dt}{a\, \lambda}\ \ge\ \int_0 \frac{dG(t)}{\min(1, G(t))} = \infty,$$
which holds at every boundary point.
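A one dimensional symbolic sketch of the attenuation, assuming for illustration the attenuation factor $a(t) = \min(1, \int_0^t du/\lambda)$ along the normal, and a constant eigenvalue $\lambda \equiv 1$ near the boundary, so that the unattenuated integral is finite and the condition fails; attenuation then restores the divergence:

```python
import sympy as sp

t, u = sp.symbols('t u', positive=True)

lam = sp.Integer(1)                        # eigenvalue near the boundary (assumed constant)
plain = sp.integrate(1 / lam, (t, 0, 1))   # finite, so the integral condition fails
G = sp.integrate(1 / lam, (u, 0, t))       # attenuation G(t) = t near the boundary
attenuated = sp.integrate(1 / (G * lam), (t, 0, 1))   # diverges: condition restored
```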
9 One dimension
For the one dimensional parameter $\theta$ on $R^1$ with variance $V(\theta)$, Brown's condition implies that the prior $p$ is admissible if and only if
$$\int_{-\infty}^0 \frac{d\theta}{p V} = \infty \quad \text{and} \quad \int_0^\infty \frac{d\theta}{p V} = \infty.$$
Since a smooth monotone transformation renders $pV$ equal to $1$ on an interval, an equivalent result is that there exists a non-zero differentiable test function $h$ on $R^1$, vanishing near $0$, with $\int h'^2\, pV\, d\theta$ finite, if and only if either $\int_0^\infty d\theta/(pV)$ or $\int_{-\infty}^0 d\theta/(pV)$ is finite.
Jeffreys' density $p = V^{-1/2}$ is admissible if and only if
$$\int_{-\infty}^0 V^{-1/2}\, d\theta = \infty \quad \text{and} \quad \int_0^\infty V^{-1/2}\, d\theta = \infty,$$
which means that Jeffreys must be "improper" in both tails to be admissible. I take a certain delight in this impropriety, because although "improper" priors abound in decision theory and in Bayesian analysis, they remain objects of suspicion. See for example the excellent review in [KW96]. However, in decision theory, the prior appears only when multiplied with a loss function, which may be arbitrarily scaled, so improper priors form a natural part of the range of procedures we need to study. Asymptotically, the prior appears only as a product with the covariance matrix in admissibility questions, and again it makes no sense to constrain priors to be proper. In the Jeffreys case, the admissibility of the product requires that Jeffreys be improper in the tails.
The Pearson correlation coefficient computed for $n$ bivariate normal observations with true correlation $\rho$ has asymptotic variance $(1 - \rho^2)^2/n$. Thus a prior $p$ on $(-1, 1)$ is admissible if and only if
$$\int^{1} \frac{d\rho}{p(\rho)(1 - \rho^2)^2} = \infty \quad \text{and} \quad \int_{-1} \frac{d\rho}{p(\rho)(1 - \rho^2)^2} = \infty.$$
For priors of form $p = (1 - \rho^2)^a$, the prior $p$ is admissible if and only if $a \ge -1$. Thus if we wish to skirt the edge of inadmissibility, we might use $p = (1 - \rho^2)^{-1}$.
10 Invariant admissible priors
Since the Kullback-Leibler loss function does not change under smooth transformation of the parameter space, differences between the asymptotic risk functions for two priors are also invariant under such transformations. We are free to transform to a convenient $V$ in deciding admissibility problems. If a transformation $\tau$ takes $(p, V)$ on $D$ into $(p^\tau, V^\tau)$ on $\tau(D)$, say, then the admissibility of $pV$ in $D$ equals the admissibility of $p^\tau V^\tau$ in $\tau(D)$. A prior is relatively invariant if $p(\tau(x))\, J_\tau(x) = c_\tau\, p(x)$, where $J_\tau$ is the Jacobian of the transformation $\tau$, for each one to one $\tau$ such that $\tau(D) = D$.
For example if $D = R^k - \{0\}$ and $V = I$, arbitrary rotations and scalings leave $D$ invariant, and change the covariance by a constant, so the only invariant priors are of form $p = c\,|x|^a$. From Brown's condition, the prior $p$ is admissible if
$$\int_0^1 \frac{dr}{r^{k-1+a}} = \infty \quad \text{and} \quad \int_1^\infty \frac{dr}{r^{k-1+a}} = \infty,$$
which occurs only when $a = 2 - k$. In this case there is a single admissible invariant prior $p = |x|^{2-k}$. This prior, discussed in [Br71] and [SC71], corresponds to $(|x|^2, x/|x|)$ being uniformly distributed.
If $D = \{|x| < 1\}$ and $V = I$, the invariant transformations are rotations, which require that an invariant $p$ depends only on $|x|$. Admissibility requires
$$\int^1 \frac{dr}{p(r)} = \infty.$$
Admissibility is achieved by $p = 1 - |x|$.

Although invariance considerations no longer always apply, the above solution can be extended to general bounded $D$ with $V = I$, namely $p(x) = d(x, \partial D)$, the distance to the boundary. For general $V$, define $d_V$ as the path length between points in the metric $V^{-1}$. Then $D$ is bounded in this metric if all paths have finite length, and we again define $p(x) = d_V(x, \partial D)$. We offer this merely as a suggestion for an admissible prior that flirts with inadmissibility near the boundaries, and is consistent under transformations of the data.
11 A mixture model
Suppose that $y_1, \ldots, y_n$ is a sample of size $n$ from a two-component normal mixture. The parameter lies in the domain $D$, the open positive quadrant.
The density of a single observation is a mixture of two normal components. The asymptotic variance $V$ is the inverse of the information matrix, the matrix of expected values of products of the score functions.
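The information matrix of a mixture has no convenient closed form, but it can be estimated by simulation. The sketch below assumes, purely for illustration, the parametrization $f(y \mid a, b) = \tfrac12 N(y; a, 1) + \tfrac12 N(y; -b, 1)$ with $a, b > 0$ (not necessarily the paper's exact mixture), estimates the information $I = E[\,s\, s'\,]$ from the score $s$ by Monte Carlo, and inverts it to obtain $V$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed illustrative mixture: f(y | a, b) = 0.5 N(y; a, 1) + 0.5 N(y; -b, 1).
def density(y, a, b):
    return (0.5 * np.exp(-0.5 * (y - a) ** 2) +
            0.5 * np.exp(-0.5 * (y + b) ** 2)) / np.sqrt(2 * np.pi)

def score(y, a, b, h=1e-5):
    # numerical score: gradient of log f(y | a, b) in (a, b)
    da = (np.log(density(y, a + h, b)) - np.log(density(y, a - h, b))) / (2 * h)
    db = (np.log(density(y, a, b + h)) - np.log(density(y, a, b - h))) / (2 * h)
    return np.stack([da, db])

def asymptotic_variance(a, b, n_mc=200_000):
    # sample y from the mixture, estimate I = E[s s'], return V = I^{-1}
    comp = rng.random(n_mc) < 0.5
    y = np.where(comp, a + rng.standard_normal(n_mc), -b + rng.standard_normal(n_mc))
    s = score(y, a, b)
    info = s @ s.T / n_mc
    return np.linalg.inv(info)

V = asymptotic_variance(2.0, 2.0)
```

For well separated components each parameter is effectively estimated from about half the sample, so the diagonal entries of $V$ should be near 2.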
Asymptotic admissibility for the prior $p$ is determined by the behaviour of $pV$ near the boundaries. At each of the two finite boundaries of the quadrant, the normal is an eigenvector of $V$ at all boundary points, and the integral condition for admissibility of section 7 must hold at almost all points of that boundary.

For the infinite “boundary”, the integral condition for admissibility is the polar condition of theorem 4 for almost all directions $\omega$ in $S$, where $S$ is the intersection of the unit circle and the upper right quadrant. Using the behavior of $V$ as the parameter grows, this condition becomes a growth condition on $p$ at infinity.
Choosing a prior $p$ to make $pV$ admissible requires the divergence conditions at all three boundaries. Roughly, we need that $p$ be of the correct order near each of the boundaries; many priors with the correct behavior near the boundaries will do the job. The uniform is inadmissible because it fails at the two finite boundaries.
The plot of confidence ellipses when 1000 points are sampled from the mixture model shows how the boundaries affect asymptotic admissibility. At the two finite boundaries, the asymptotic variances orthogonal to the boundary approach a positive limit; thus the integral of the inverse variances up to the boundary is finite rather than infinite, and the uniform density is therefore inadmissible. For the boundary at infinity, the variances remain bounded, so the integral of the inverse variance is infinite, and this boundary is admissible for a uniform prior.
12 A prior beating the uniform
It is of interest to exhibit a prior with asymptotic risk everywhere smaller than that of an inadmissible prior such as the uniform in this problem. Brown's condition exhibits a prior $p = u^2$ with $u$ satisfying the corresponding elliptic differential inequality. The asymptotic risk of $p$, relative to the uniform, is then everywhere negative. There are many solutions to the elliptic differential equation, depending on boundary values of $u$. The solutions are not necessarily admissible.
We have computed solutions for the discrete approximation in which each parameter lies in the grid $0.1, 0.2, \ldots, 10$. We set the boundary values so that $p$ will satisfy the conditions for admissibility at the different boundaries. The following prior is obtained by using a relaxation method to solve the finite difference form of the differential equation; at the solution, the finite difference expressions are everywhere less than $.01$. A similar prior was developed in [Em02].
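The relaxation sweep can be sketched as follows. For simplicity the sketch takes $V = I$, so the finite difference equation is the five point Laplace equation, on an illustrative unit-square grid with assumed boundary values rather than the grid above:

```python
import numpy as np

n = 41                                   # grid over the unit square (illustrative)
u = np.zeros((n, n))
u[0, :] = 0.0                            # boundary values (assumed for illustration)
u[-1, :] = 1.0
u[:, 0] = np.linspace(0.0, 1.0, n)
u[:, -1] = np.linspace(0.0, 1.0, n)

# Jacobi relaxation: replace each interior value by the average of its
# four neighbours until the sweep changes little.
for sweep in range(20000):
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1] +
                              u[1:-1, 2:] + u[1:-1, :-2])
    done = np.max(np.abs(new - u)) < 1e-7
    u = new
    if done:
        break

# residual of the five point finite difference equation at interior points
resid = np.abs(u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
               - 4.0 * u[1:-1, 1:-1])
```

With a non-constant $V$ the five point weights change, and the differential inequality calls for a modified update, but the iteration structure is the same.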
It will be noted that the prior density approaches zero at the lower and left boundaries, but not at the other two boundaries, as required by the admissibility conditions.
The risk gains against the uniform are everywhere positive (as required by the theory), but are far greater near the lower and left boundaries. This is to be expected, because the prior is made admissible by changes near the boundaries, so that larger improvements in the risk should occur there.