# A Lower Bound on the Bayesian MSE Based on the Optimal Bias Function

Zvika Ben-Haim and Yonina C. Eldar

The authors are with the Department of Electrical Engineering, Technion–Israel Institute of Technology, Haifa 32000, Israel (e-mail: zvikabh@technion.ac.il; yonina@ee.technion.ac.il). This work was supported in part by the Israel Science Foundation under Grant no. 1081/07 and by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications NEWCOM++ (contract no. 216715).

###### Abstract

A lower bound on the minimum mean-squared error (MSE) in a Bayesian estimation problem is proposed in this paper. This bound utilizes a well-known connection to the deterministic estimation setting. Using the prior distribution, the bias function which minimizes the Cramér–Rao bound can be determined, resulting in a lower bound on the Bayesian MSE. The bound is developed for the general case of a vector parameter with an arbitrary probability distribution, and is shown to be asymptotically tight in both the high and low signal-to-noise ratio regimes. A numerical study demonstrates several cases in which the proposed technique is both simpler to compute and tighter than alternative methods.

###### Keywords

Bayesian bounds, Bayesian estimation, minimum mean-squared error estimation, optimal bias, performance bounds.

## I Introduction

The goal of estimation theory is to infer the value of an unknown parameter based on observations. A common approach to this problem is the Bayesian framework, in which the estimate is constructed by combining the measurements with prior information about the parameter. In this setting, the parameter $\boldsymbol{\theta}$ is random, and its distribution describes the a priori knowledge of the unknown value. In addition, measurements $\mathbf{x}$ are obtained, whose conditional distribution, given $\boldsymbol{\theta}$, provides further information about the parameter. The objective is to construct an estimator $\hat{\boldsymbol{\theta}}$, which is a function of the measurements, so that $\hat{\boldsymbol{\theta}}$ is close to $\boldsymbol{\theta}$ in some sense. A common measure of the quality of an estimator is its mean-squared error (MSE), given by $E\{\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2\}$.

It is well known that the posterior mean $E\{\boldsymbol{\theta}|\mathbf{x}\}$ minimizes the MSE. Thus, from a theoretical perspective, there is no difficulty in finding the minimum MSE (MMSE) estimator in any given problem. In practice, however, the complexity of computing the posterior mean is often prohibitive. As a result, various alternatives, such as the maximum a posteriori (MAP) technique, have been developed. The purpose of such methods is to approach the performance of the MMSE estimator with a computationally efficient algorithm.

An important goal is to quantify the performance degradation resulting from the use of these suboptimal techniques. One way to do this is to compare the MSE of the method used in practice with the MMSE. Unfortunately, computation of the MMSE is itself infeasible in many cases. This has led to a large body of work seeking to find simple lower bounds on the MMSE in various estimation problems [3, 4, 5, 6, 7, 8, 9, 10, 11, 12].

Generally speaking, previous bounds can be divided into two categories. The Weiss–Weinstein family is based on a covariance inequality and includes the Bayesian Cramér–Rao bound, the Bobrovski–Zakai bound, and the Weiss–Weinstein bound [9, 10]. The Ziv–Zakai family of bounds is based on comparing the estimation problem to a related detection scenario. This family includes the Ziv–Zakai bound and its improvements, notably the Bellini–Tartara bound, the Chazan–Zakai–Ziv bound, and the generalization of Bell et al. Recently, Renaux et al. have combined both approaches.

The accuracy of the bounds described above is usually tested numerically in particular estimation settings. Few of the previous results provide any sort of analytical proof of accuracy, even under asymptotic conditions. Bellini and Tartara briefly discuss the performance of their bound at high signal-to-noise ratio (SNR), and Bell et al. prove that their bound converges to the true value at low SNR for a particular family of Gaussian-like probability distributions. To the best of our knowledge, there are no other results concerning the asymptotic performance of Bayesian bounds.

A different estimation setting arises when one considers $\boldsymbol{\theta}$ as a deterministic unknown parameter. In this case, too, a common goal is to construct an estimator having low MSE. However, the term MSE has a very different meaning in the deterministic setting, since in this case, the expectation is taken only over the random variable $\mathbf{x}$. One elementary difference with far-reaching implications is that in the Bayesian case, the MSE is a single real number, whereas the deterministic MSE is a function of the unknown parameter $\boldsymbol{\theta}$ [13, 14, 15].

Many lower bounds have been developed for the deterministic setting, as well. These include classical results such as the Cramér–Rao [16, 17], Hammersley–Chapman–Robbins [18, 19], Bhattacharyya, and Barankin bounds, as well as more recent results [22, 23, 24, 25, 26, 27]. By far the simplest and most commonly used of these approaches is the Cramér–Rao bound (CRB). Like most other deterministic bounds, the CRB deals explicitly with unbiased estimators, or, equivalently, with estimators having a specific, pre-specified bias function. Two exceptions are the uniform CRB [23, 25] and the minimax linear-bias bound [26, 27]. The CRB is known to be asymptotically tight in many cases, even though many later bounds are sharper than it [28, 25, 14].

Although the deterministic and Bayesian settings stem from different points of view, there exist insightful relations between the two approaches. The basis for this connection is the fact that by adding a prior distribution for $\boldsymbol{\theta}$, any deterministic problem can be transformed to a corresponding Bayesian setting. Several theorems relate the performance of corresponding Bayesian and deterministic scenarios. As a consequence, numerous bounds have both a deterministic and a Bayesian version [3, 10, 12, 29].

The simplicity and asymptotic tightness of the deterministic CRB motivate its use in problems in which $\boldsymbol{\theta}$ is random. Such an application was described by Young and Westerberg, who considered the case of a scalar $\theta$ constrained to a bounded interval. They used the prior distribution of $\theta$ to determine the optimal bias function for use in the biased CRB, and thus obtained a Bayesian bound. It should be noted that this result differs from the Bayesian CRB of Van Trees; the two bounds are compared in Section II-C. We refer to the result of Young and Westerberg as the optimal-bias bound (OBB), since it is based on choosing the bias function which optimizes the CRB using the given prior distribution.

This paper provides an extension and a deeper analysis of the OBB. Specifically, we generalize the bound to an arbitrary $n$-dimensional estimation setting. The bound is determined by finding the solution to a certain partial differential equation. Using tools from functional analysis, we demonstrate that a unique solution exists for this differential equation. Under suitable symmetry conditions, it is shown that the method can be reduced to the solution of an ordinary differential equation and, in some cases, presented in closed form.

The mathematical tools employed in this paper are also used for characterizing the performance of the OBB. Specifically, it is demonstrated analytically that the proposed bound is asymptotically tight for both high and low SNR values. Furthermore, the OBB is compared with several other bounds; in the examples considered, the OBB is both simpler computationally and more accurate than all relevant alternatives.

The remainder of this paper is organized as follows. In Section II, we derive the OBB for a vector parameter. Section III discusses some mathematical concepts required to ensure the existence of the OBB. In Section IV, a practical technique for calculating the bound is developed using variational calculus. In Section V, we demonstrate some properties of the OBB, including its asymptotic tightness. Finally, in Section VI, we compare the performance of the bound with that of other relevant techniques.

## II The Optimal-Bias Bound

In this section, we derive the OBB for the general vector case. To this end, we first examine the relation between the Bayesian and deterministic estimation settings (Section II-A). Next, we focus on the deterministic case and review the basic properties of the CRB (Section II-B). Finally, the OBB is derived from the CRB (Section II-C).

The focus of this paper is the Bayesian estimation problem, but the bound we propose stems from the theory of deterministic estimation. To avoid confusion, we will indicate that a particular quantity refers to the deterministic setting by appending the symbol "$;\boldsymbol{\theta}$" to it. For example, the notation $E\{\cdot\}$ denotes expectation over both $\mathbf{x}$ and $\boldsymbol{\theta}$, i.e., expectation in the Bayesian sense, while expectation solely over $\mathbf{x}$ (in the deterministic setting) is denoted by $E\{\cdot\,;\boldsymbol{\theta}\}$. The notation $E\{\cdot\,|\,\boldsymbol{\theta}\}$ indicates Bayesian expectation conditioned on $\boldsymbol{\theta}$.

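Some further notation used throughout the paper is as follows. Lowercase boldface letters signify vectors and uppercase boldface letters indicate matrices. The $i$th component of a vector $\mathbf{v}$ is denoted $v_i$, while $\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots$ signifies a sequence of vectors. The derivative $\partial f/\partial\boldsymbol{\theta}$ of a function $f(\boldsymbol{\theta})$ is a vector function whose $i$th element is $\partial f/\partial\theta_i$. Similarly, given a vector function $\mathbf{b}(\boldsymbol{\theta})$, the derivative $\partial\mathbf{b}/\partial\boldsymbol{\theta}$ is defined as the matrix function whose $(i,j)$th entry is $\partial b_i/\partial\theta_j$. The squared Euclidean norm $\mathbf{v}^T\mathbf{v}$ of a vector $\mathbf{v}$ is denoted $\|\mathbf{v}\|^2$, while the squared Frobenius norm $\mathrm{Tr}(\mathbf{M}\mathbf{M}^T)$ of a matrix $\mathbf{M}$ is denoted $\|\mathbf{M}\|_F^2$. In Section III, we will also define some functional norms, which will be of use later in the paper.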

### II-A The Bayesian–Deterministic Connection

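We now review a fundamental relation between the Bayesian and deterministic estimation settings. Let $\boldsymbol{\theta}$ be an unknown random vector in $\mathbb{R}^n$ and let $\mathbf{x}$ be a measurement vector. The joint probability density function (pdf) of $\mathbf{x}$ and $\boldsymbol{\theta}$ is $p_{\mathbf{x},\boldsymbol{\theta}}(\mathbf{x},\boldsymbol{\theta}) = p_{\mathbf{x}|\boldsymbol{\theta}}(\mathbf{x}|\boldsymbol{\theta})\,p_{\boldsymbol{\theta}}(\boldsymbol{\theta})$, where $p_{\boldsymbol{\theta}}$ is the prior distribution of $\boldsymbol{\theta}$ and $p_{\mathbf{x}|\boldsymbol{\theta}}$ is the conditional distribution of $\mathbf{x}$ given $\boldsymbol{\theta}$. For later use, define the set of feasible parameter values by

$$\Theta = \{\boldsymbol{\theta}\in\mathbb{R}^n : p_{\boldsymbol{\theta}}(\boldsymbol{\theta}) > 0\}. \tag{1}$$

Suppose $\hat{\boldsymbol{\theta}}$ is an estimator of $\boldsymbol{\theta}$. Its (Bayesian) MSE is given by

$$\mathrm{MSE} = E\{\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2\} = \int \|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2\, p_{\mathbf{x},\boldsymbol{\theta}}(\mathbf{x},\boldsymbol{\theta})\, d\mathbf{x}\, d\boldsymbol{\theta}. \tag{2}$$

By the law of total expectation, we have

$$\begin{aligned} \mathrm{MSE} &= \int\left(\int \|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2\, p_{\mathbf{x}|\boldsymbol{\theta}}(\mathbf{x}|\boldsymbol{\theta})\, d\mathbf{x}\right) p_{\boldsymbol{\theta}}(\boldsymbol{\theta})\, d\boldsymbol{\theta} \\ &= E\left\{E\left\{\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2 \,\middle|\, \boldsymbol{\theta}\right\}\right\}. \end{aligned} \tag{3}$$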

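Now consider a deterministic estimation setting, i.e., suppose $\boldsymbol{\theta}$ is a deterministic unknown which is to be estimated from random measurements $\mathbf{x}$. Let the distribution of $\mathbf{x}$ (as a function of $\boldsymbol{\theta}$) be given by $p_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x};\boldsymbol{\theta}) = p_{\mathbf{x}|\boldsymbol{\theta}}(\mathbf{x}|\boldsymbol{\theta})$, i.e., the distribution of $\mathbf{x}$ in the deterministic case equals the conditional distribution in the corresponding Bayesian problem.

The estimator $\hat{\boldsymbol{\theta}}$ defined above is simply a function of the measurements, and can therefore be applied in the deterministic case as well. Its deterministic MSE is given by

$$E\{\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2;\boldsymbol{\theta}\} = \int \|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2\, p_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x};\boldsymbol{\theta})\, d\mathbf{x}. \tag{4}$$

Since $p_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x};\boldsymbol{\theta}) = p_{\mathbf{x}|\boldsymbol{\theta}}(\mathbf{x}|\boldsymbol{\theta})$, we have

$$E\{\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2;\boldsymbol{\theta}\} = E\left\{\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2 \,\middle|\, \boldsymbol{\theta}\right\}. \tag{5}$$

Combining this fact with (3), we find that the Bayesian MSE equals the expectation of the MSE of the corresponding deterministic problem, i.e.,

$$E\{\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2\} = E\left\{E\{\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2;\boldsymbol{\theta}\}\right\}. \tag{6}$$

This relation will be used to construct the OBB in Section II-C.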
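As an informal illustration of relation (6), the following sketch checks it numerically in a toy scalar problem that is an assumption of this example, not part of the development above: $x = \theta + w$ with Gaussian noise of variance $\sigma^2$, a uniform prior on $(-r, r)$, and a hypothetical linear estimator $\hat{\theta} = a x$. The function names are illustrative only.

```python
import math


def deterministic_mse(theta, a, sigma):
    """Deterministic MSE of the estimator a*x for a fixed theta,
    where x = theta + w and w ~ N(0, sigma^2): squared bias plus variance."""
    bias = (a - 1.0) * theta
    variance = (a * sigma) ** 2
    return bias ** 2 + variance


def bayesian_mse_via_averaging(a, sigma, r, intervals=2000):
    """Right-hand side of (6): average the deterministic MSE over a uniform
    prior on (-r, r), using the composite Simpson rule (intervals must be even)."""
    h = 2.0 * r / intervals
    total = 0.0
    for k in range(intervals + 1):
        theta = -r + k * h
        weight = 1 if k in (0, intervals) else (4 if k % 2 else 2)
        total += weight * deterministic_mse(theta, a, sigma)
    return (h / 3.0) * total / (2.0 * r)


def bayesian_mse_closed_form(a, sigma, r):
    """Left-hand side of (6): E{(a*x - theta)^2}, using E{theta^2} = r^2 / 3."""
    return (a - 1.0) ** 2 * r ** 2 / 3.0 + (a * sigma) ** 2
```

Both functions return the same value up to rounding, since the Simpson rule is exact for the quadratic integrand used here.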

### II-B The Deterministic Cramér–Rao Bound

Before developing the OBB, we review some basic results in the deterministic estimation setting. Suppose $\boldsymbol{\theta}$ is a deterministic parameter vector and let $\mathbf{x}$ be a measurement vector having pdf $p_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x};\boldsymbol{\theta})$. Denote by $\Theta\subseteq\mathbb{R}^n$ the set of all possible values of $\boldsymbol{\theta}$. We assume for technical reasons that $\Theta$ is an open set. (This is required in order to ensure that one can discuss differentiability of $p_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x};\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$ at any point $\boldsymbol{\theta}\in\Theta$. In the Bayesian setting to which we will return in Section II-C, $\Theta$ is defined by (1); in this case, adding a boundary to $\Theta$ essentially leaves the setting unchanged, as long as the prior probability for $\boldsymbol{\theta}$ to be on the boundary of $\Theta$ is zero. Therefore, this requirement is of little practical relevance.)

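Let $\hat{\boldsymbol{\theta}}$ be an estimator of $\boldsymbol{\theta}$ from the measurements $\mathbf{x}$. We require the following regularity conditions to ensure that the CRB holds [31, §3.1.3].

1. $p_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x};\boldsymbol{\theta})$ is continuously differentiable with respect to $\boldsymbol{\theta}$. This condition is required to ensure the existence of the Fisher information.

2. The Fisher information matrix $\mathbf{J}(\boldsymbol{\theta})$, defined by

$$[\mathbf{J}(\boldsymbol{\theta})]_{ij} = E\left\{\frac{\partial\log p_{\mathbf{x};\boldsymbol{\theta}}}{\partial\theta_i}\,\frac{\partial\log p_{\mathbf{x};\boldsymbol{\theta}}}{\partial\theta_j};\boldsymbol{\theta}\right\} \tag{7}$$

is bounded and positive definite for all $\boldsymbol{\theta}\in\Theta$. This ensures that the measurements contain data about the unknown parameter.

3. Exchanging the integral and derivative in the equation

$$\int t(\mathbf{x})\,\frac{\partial}{\partial\theta_i}\, p_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x};\boldsymbol{\theta})\, d\mathbf{x} = \frac{\partial}{\partial\theta_i}\int t(\mathbf{x})\, p_{\mathbf{x};\boldsymbol{\theta}}(\mathbf{x};\boldsymbol{\theta})\, d\mathbf{x} \tag{8}$$

is justified for any measurable function $t(\mathbf{x})$, in the sense that, if one side exists, then the other exists and the two sides are equal. A sufficient condition for this to hold is that the support of $p_{\mathbf{x};\boldsymbol{\theta}}$ does not depend on $\boldsymbol{\theta}$.

4. All estimators $\hat{\boldsymbol{\theta}}$ are Borel measurable functions which satisfy

$$\left\|\frac{\partial p_{\mathbf{x};\boldsymbol{\theta}}}{\partial\boldsymbol{\theta}}\,\hat{\boldsymbol{\theta}}^T\right\|_F \le g(\mathbf{x}) \quad \text{for all } \boldsymbol{\theta} \tag{9}$$

for some integrable function $g(\mathbf{x})$. This technical requirement is needed in order to exclude certain pathological estimators whose statistical behavior is insufficiently smooth to allow the application of the CRB.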

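The bias of an estimator $\hat{\boldsymbol{\theta}}$ is defined as

$$\mathbf{b}(\boldsymbol{\theta}) = E\{\hat{\boldsymbol{\theta}};\boldsymbol{\theta}\} - \boldsymbol{\theta}. \tag{10}$$

Under the above assumptions, it can be shown that the bias of any estimator is continuously differentiable [5, Lemma 2]. Furthermore, under these assumptions, the CRB holds, and thus, for any estimator having bias $\mathbf{b}(\boldsymbol{\theta})$, we have

$$E\{\|\boldsymbol{\theta}-\hat{\boldsymbol{\theta}}\|^2;\boldsymbol{\theta}\} \ge \mathrm{CRB}[\mathbf{b},\boldsymbol{\theta}] \triangleq \mathrm{Tr}\left[\left(\mathbf{I}+\frac{\partial\mathbf{b}}{\partial\boldsymbol{\theta}}\right)\mathbf{J}^{-1}(\boldsymbol{\theta})\left(\mathbf{I}+\frac{\partial\mathbf{b}}{\partial\boldsymbol{\theta}}\right)^T\right] + \|\mathbf{b}(\boldsymbol{\theta})\|^2. \tag{11}$$

A more common form of the CRB is obtained by restricting attention to unbiased estimators (i.e., techniques for which $\mathbf{b}(\boldsymbol{\theta}) = \mathbf{0}$). Under the unbiasedness assumption, the bound simplifies to $\mathrm{CRB}[\mathbf{0},\boldsymbol{\theta}] = \mathrm{Tr}(\mathbf{J}^{-1}(\boldsymbol{\theta}))$. However, in the sequel we will make use of the general form (11).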
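To make the general form (11) concrete, consider a small worked example that is an assumption of this note rather than part of the original text: a scalar parameter observed as $x = \theta + w$ with $w \sim \mathcal{N}(0,\sigma^2)$, so that $J(\theta) = 1/\sigma^2$, together with a hypothetical shrinkage estimator $\hat{\theta} = (1-\alpha)x$ for a fixed constant $\alpha$.

$$\begin{aligned}
b(\theta) &= E\{(1-\alpha)x;\theta\} - \theta = -\alpha\theta, \qquad b'(\theta) = -\alpha, \\
\mathrm{CRB}[b,\theta] &= \frac{(1+b'(\theta))^2}{J(\theta)} + b^2(\theta) = (1-\alpha)^2\sigma^2 + \alpha^2\theta^2, \\
E\{(\hat{\theta}-\theta)^2;\theta\} &= \underbrace{(1-\alpha)^2\sigma^2}_{\text{variance}} + \underbrace{\alpha^2\theta^2}_{\text{squared bias}}.
\end{aligned}$$

In this particular case the biased CRB is met with equality, and setting $\alpha = 0$ recovers the familiar unbiased bound $\sigma^2 = J^{-1}$.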

### II-C A Bayesian Bound from the CRB

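The OBB of Young and Westerberg is based on applying the Bayesian–deterministic connection described in Section II-A to the deterministic CRB (11). Specifically, returning now to the Bayesian setting, one can combine (6) and (11) to obtain that, for any estimator $\hat{\boldsymbol{\theta}}$ with bias function $\mathbf{b}(\boldsymbol{\theta})$,

$$E\{\|\boldsymbol{\theta}-\hat{\boldsymbol{\theta}}\|^2\} \ge Z[\mathbf{b}] \triangleq \int_\Theta \mathrm{CRB}[\mathbf{b},\boldsymbol{\theta}]\, p_{\boldsymbol{\theta}}(d\boldsymbol{\theta}) \tag{12}$$

where the expectation is now performed over both $\boldsymbol{\theta}$ and $\mathbf{x}$. Note that (12) describes the Bayesian MSE as a function of a deterministic property (the bias) of $\hat{\boldsymbol{\theta}}$. Since any estimator has some bias function, and since all bias functions are continuously differentiable in our setting, minimizing $Z[\mathbf{b}]$ over all continuously differentiable functions $\mathbf{b}$ yields a lower bound on the MSE of any Bayesian estimator. Thus, under the regularity conditions of Section II-B, a lower bound on the Bayesian MSE is given by

$$s = \inf_{\mathbf{b}\in C^1}\int_\Theta\left[\|\mathbf{b}(\boldsymbol{\theta})\|^2 + \mathrm{Tr}\left(\left(\mathbf{I}+\frac{\partial\mathbf{b}}{\partial\boldsymbol{\theta}}\right)\mathbf{J}^{-1}(\boldsymbol{\theta})\left(\mathbf{I}+\frac{\partial\mathbf{b}}{\partial\boldsymbol{\theta}}\right)^T\right)\right] p_{\boldsymbol{\theta}}(d\boldsymbol{\theta}) \tag{13}$$

where $C^1$ is the space of continuously differentiable functions $\mathbf{f}:\Theta\to\mathbb{R}^n$.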

Note that the OBB differs from the Bayesian CRB of Van Trees. Van Trees' result is based on applying the Cauchy–Schwarz inequality to the joint pdf $p_{\mathbf{x},\boldsymbol{\theta}}$, whereas the deterministic CRB is based on applying a similar procedure to $p_{\mathbf{x};\boldsymbol{\theta}}$. As a consequence, the regularity conditions required for the Bayesian CRB are stricter, requiring that $p_{\mathbf{x},\boldsymbol{\theta}}$ be twice differentiable with respect to $\boldsymbol{\theta}$. By contrast, the OBB requires differentiability only of the conditional pdf $p_{\mathbf{x}|\boldsymbol{\theta}}$. An example in which this difference is important is the case in which the prior distribution is discontinuous, e.g., when $\boldsymbol{\theta}$ is uniform. The performance of the OBB in this setting will be examined in Section VI.

In the next section, we will see that it is advantageous to perform the minimization (13) over a somewhat modified class of functions. This will allow us to prove the unique existence of a solution to the optimization problem, a result which will be of use when examining the properties of the bound later in the paper.

## III Mathematical Safeguards

In the previous section, we saw that a lower bound on the MMSE can be obtained by solving the minimization problem (13). However, at this point, we have no guarantee that the solution $s$ of (13) is anywhere near the true value of the MMSE. Indeed, at first sight, it may appear that $s = 0$ for any estimation setting. To see this, note that $Z[\mathbf{b}]$ is a sum of two components, a bias gradient part and a squared bias part. Both parts are nonnegative, but the former is zero when the bias gradient is $-\mathbf{I}$, while the latter is zero when the bias is zero. No differentiable function $\mathbf{b}$ satisfies these two constraints simultaneously for all $\boldsymbol{\theta}$, since if the squared bias is everywhere zero, then the bias gradient is also zero. However, it is possible to construct a sequence of functions for which both the bias gradient part and the squared bias norm tend to zero for almost every value of $\boldsymbol{\theta}$. An example of such a sequence in a one-dimensional setting is plotted in Fig. 1. Here, a sequence of smooth, periodic functions is presented. The function period tends to zero, and the percentage of the cycle in which the derivative equals $-1$ increases as the sequence index increases. Thus, the pointwise limit of the function sequence is zero almost everywhere, and the pointwise limit of the derivative is $-1$ almost everywhere.

Fig. 1: A sequence of continuous functions for which both $|b(\theta)|^2$ and $|1+b'(\theta)|^2$ tend to zero for almost every value of $\theta$.

In the specific case shown in Fig. 1, it can be shown that the value of $Z[b]$ along the sequence does not tend to zero; in fact, it tends to infinity in this situation. However, our example illustrates that care must be taken when applying concepts from finite-dimensional optimization problems to variational calculus.

The purpose of this section is to show that $s > 0$, so that the bound is meaningful, for any problem setting satisfying the regularity conditions of Section II-B. (This question was not addressed by Young and Westerberg.) While doing so, we develop some abstract concepts which will also be used when analyzing the asymptotic properties of the OBB in Section V.

As often happens with variational problems, it turns out that the minimum of (13) is not necessarily achieved by any continuously differentiable function. In order to guarantee an achievable minimum, one must instead minimize (13) over a slightly modified space, which is defined below. As explained in Section II-B, all bias functions are continuously differentiable, so that the minimizing function ultimately obtained, if it is not differentiable, will not be the bias of any estimator. However, as we will see, the minimum value of our new optimization problem is identical to the infimum of (13). Furthermore, this approach allows us to demonstrate several important theoretical properties of the OBB.

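Let $L^2$ be the space of $p_{\boldsymbol{\theta}}$-measurable functions $\mathbf{b}:\Theta\to\mathbb{R}^n$ such that

$$\int_\Theta \|\mathbf{b}(\boldsymbol{\theta})\|^2\, p_{\boldsymbol{\theta}}(d\boldsymbol{\theta}) < \infty. \tag{14}$$

Define the associated inner product

$$\langle\mathbf{b}^{(1)},\mathbf{b}^{(2)}\rangle_{L^2} \triangleq \sum_{i=1}^n \int_\Theta b_i^{(1)}(\boldsymbol{\theta})\, b_i^{(2)}(\boldsymbol{\theta})\, p_{\boldsymbol{\theta}}(d\boldsymbol{\theta}) \tag{15}$$

and the corresponding norm $\|\mathbf{b}\|_{L^2}^2 = \langle\mathbf{b},\mathbf{b}\rangle_{L^2}$. Any function $\mathbf{b}\in L^2$ has a derivative in the distributional sense, but this derivative might not be a function. For example, discontinuous functions have distributional derivatives which contain a Dirac delta. If, for every $i$, the distributional derivative of $b_i$ is a function in $L^2$, then $\mathbf{b}$ is said to be weakly differentiable, and its weak derivative is the matrix function $\partial\mathbf{b}/\partial\boldsymbol{\theta}$. Roughly speaking, a function is weakly differentiable if it is continuous and its derivative exists almost everywhere.

The space of all weakly differentiable functions in $L^2$ is called the first-order Sobolev space, and is denoted $H^1$. Define an inner product on $H^1$ as

$$\langle\mathbf{b}^{(1)},\mathbf{b}^{(2)}\rangle_{H^1} \triangleq \langle\mathbf{b}^{(1)},\mathbf{b}^{(2)}\rangle_{L^2} + \sum_{i=1}^n \left\langle\frac{\partial\mathbf{b}^{(1)}}{\partial\theta_i},\frac{\partial\mathbf{b}^{(2)}}{\partial\theta_i}\right\rangle_{L^2}. \tag{16}$$

The associated norm is $\|\mathbf{b}\|_{H^1}^2 = \langle\mathbf{b},\mathbf{b}\rangle_{H^1}$. An important property which will be used extensively in our analysis is that $H^1$ is a Hilbert space.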

Note that since $\Theta$ is an open set, not all functions in $C^1$ are in $H^1$. For example, in the case , the function , for some nonzero constant , is continuously differentiable but not integrable. Thus it is in $C^1$ but not in $H^1$, nor even in $L^2$. However, any measurable function $\mathbf{b}$ which is not in $H^1$ has $\|\mathbf{b}\|_{H^1} = \infty$, meaning that either $\mathbf{b}$ or $\partial\mathbf{b}/\partial\boldsymbol{\theta}$ has infinite $L^2$ norm. Consequently, either the bias norm part or the bias gradient part of $Z[\mathbf{b}]$ is infinite. It follows that performing the minimization (13) over $C^1\cap H^1$, rather than over $C^1$, does not change the minimum value. On the other hand, $C^1\cap H^1$ is dense in $H^1$, and $Z[\mathbf{b}]$ is continuous, so that minimizing (13) over $H^1$ rather than $C^1\cap H^1$ also does not alter the minimum. Consequently, we will henceforth consider the problem

$$s = \inf_{\mathbf{b}\in H^1} Z[\mathbf{b}]. \tag{17}$$

The advantage of including weakly differentiable functions in the minimization is that a unique minimizer can now be guaranteed, as demonstrated by the following result.

###### Proposition 1

Consider the problem

$$\bar{\mathbf{b}} = \arg\min_{\mathbf{b}\in H^1} Z[\mathbf{b}] \tag{18}$$

where $Z[\mathbf{b}]$ is given by (12) and $\mathbf{J}(\boldsymbol{\theta})$ is positive definite and bounded with probability 1. This problem is well-defined, i.e., there exists a unique $\bar{\mathbf{b}}\in H^1$ which minimizes $Z[\mathbf{b}]$. Furthermore, the minimum value $s = Z[\bar{\mathbf{b}}]$ is finite and nonzero.

Proving the unique existence of a minimizer for (17) is a technical exercise in functional analysis which can be found in Appendix B. However, once the existence of such a minimizer is demonstrated, it is not difficult to see that $0 < s < \infty$. To see that $s < \infty$, we must find a function $\mathbf{b}$ for which $Z[\mathbf{b}] < \infty$. One such function is , for which is finite since is bounded. Now suppose by contradiction that $s = 0$, which implies that there exists a function $\bar{\mathbf{b}}\in H^1$ such that $Z[\bar{\mathbf{b}}] = 0$. Therefore, both the bias gradient and the squared bias parts of $Z[\bar{\mathbf{b}}]$ are zero. In particular, since the squared bias part equals zero, we have $\|\bar{\mathbf{b}}\|_{L^2} = 0$. Hence, $\bar{\mathbf{b}} = \mathbf{0}$, because $L^2$ is a normed space. But then, by the definition (12) of $Z[\mathbf{b}]$,

$$Z[\bar{\mathbf{b}}] = \int_\Theta \mathrm{Tr}(\mathbf{J}^{-1}(\boldsymbol{\theta}))\, p_{\boldsymbol{\theta}}(d\boldsymbol{\theta}) \tag{19}$$

which is positive; this is a contradiction.

Note that functions in $H^1$ are defined up to changes on a set having zero measure. In particular, the fact that $\bar{\mathbf{b}}$ is unique does not preclude functions which are identical to $\bar{\mathbf{b}}$ almost everywhere (which obviously have the same value $Z[\mathbf{b}]$).

Summarizing the discussion of the last two sections, we have the following theorem.

###### Theorem 1

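Let $\boldsymbol{\theta}$ be an unknown random vector with pdf $p_{\boldsymbol{\theta}}(\boldsymbol{\theta}) > 0$ over the open set $\Theta\subseteq\mathbb{R}^n$, and let $\mathbf{x}$ be a measurement vector whose pdf, conditioned on $\boldsymbol{\theta}$, is given by $p_{\mathbf{x}|\boldsymbol{\theta}}(\mathbf{x}|\boldsymbol{\theta})$. Assume the regularity conditions of Section II-B hold. Then, for any estimator $\hat{\boldsymbol{\theta}}$,

$$E\{\|\boldsymbol{\theta}-\hat{\boldsymbol{\theta}}\|^2\} \ge \min_{\mathbf{b}\in H^1}\int_\Theta \mathrm{CRB}[\mathbf{b},\boldsymbol{\theta}]\, p_{\boldsymbol{\theta}}(\boldsymbol{\theta})\, d\boldsymbol{\theta}. \tag{20}$$

The minimum in (20) is nonzero and finite. Furthermore, this minimum is achieved by a function $\bar{\mathbf{b}}\in H^1$, which is unique up to changes having zero probability.

Two remarks are in order concerning Theorem 1. First, the function $\bar{\mathbf{b}}$ solving (20) might not be the bias of any estimator; indeed, under our assumptions, all bias functions are continuously differentiable, whereas $\bar{\mathbf{b}}$ need only be weakly differentiable. Nevertheless, (20) is still a lower bound on the MMSE. Another important observation is that Theorem 1 arises from the deterministic CRB; hence, there are no requirements on the prior distribution $p_{\boldsymbol{\theta}}$. In particular, $p_{\boldsymbol{\theta}}$ can be discontinuous or have bounded support. By contrast, many previous Bayesian bounds do not apply in such circumstances.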

## IV Calculating the Bound

In finite-dimensional convex optimization problems, the requirement of a vanishing first derivative results in a set of equations, whose solution is the global minimum. Analogously, in the case of convex functional optimization problems such as (20), the optimum is given by the solution of a set of differential equations. The following theorem, whose proof can be found in Appendix C, specifies the differential equation relevant to our optimization problem.

In this section and in the remainder of the paper, we will consider the case in which the set $\Theta$ is bounded. From a practical point of view, even when $\Theta$ consists of the entire set $\mathbb{R}^n$, it can be approximated by a bounded set containing only those values of $\boldsymbol{\theta}$ for which $p_{\boldsymbol{\theta}}(\boldsymbol{\theta})$ exceeds some small threshold.

###### Theorem 2

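Under the conditions of Theorem 1, suppose $\Theta$ is a bounded subset of $\mathbb{R}^n$ with a smooth boundary $\partial\Theta$. Then, the optimal $\bar{\mathbf{b}}$ of (20) is given by the solution to the system of partial differential equations

$$p_{\boldsymbol{\theta}}(\boldsymbol{\theta})\, b_i(\boldsymbol{\theta}) = p_{\boldsymbol{\theta}}(\boldsymbol{\theta})\sum_{j,k}\frac{\partial^2 b_i}{\partial\theta_j\,\partial\theta_k}(\mathbf{J}^{-1})_{jk} + \sum_{j,k}\left(\delta_{ik}+\frac{\partial b_i}{\partial\theta_k}\right)\left((\mathbf{J}^{-1})_{jk}\frac{\partial p_{\boldsymbol{\theta}}}{\partial\theta_j} + p_{\boldsymbol{\theta}}(\boldsymbol{\theta})\frac{\partial(\mathbf{J}^{-1})_{jk}}{\partial\theta_j}\right) \tag{21}$$

for $i = 1,\ldots,n$, within the range $\boldsymbol{\theta}\in\Theta$, which satisfies the Neumann boundary condition

$$\left(\mathbf{I}+\frac{\partial\mathbf{b}}{\partial\boldsymbol{\theta}}\right)\mathbf{J}^{-1}\boldsymbol{\nu}(\boldsymbol{\theta}) = \mathbf{0} \tag{22}$$

for all points $\boldsymbol{\theta}\in\partial\Theta$. Here, $\boldsymbol{\nu}(\boldsymbol{\theta})$ is a normal to the boundary at $\boldsymbol{\theta}$. All derivatives in this system of equations are to be interpreted in the weak sense.

Note that Theorem 1 guarantees the existence of a unique solution in $H^1$ to the differential equation (21) with the boundary conditions (22).

The bound of Young and Westerberg is a special case of Theorem 2, and is given here for completeness.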

###### Corollary 1

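Under the settings of Theorem 1, suppose $\Theta = (\theta_0,\theta_1)$ is a bounded interval in $\mathbb{R}$. Then, the bias function minimizing (20) is a solution to the second-order ordinary differential equation

$$J(\theta)\, b(\theta) = b''(\theta) + (1+b'(\theta))\left(\frac{d\log p_\theta}{d\theta} - \frac{d\log J}{d\theta}\right) \tag{23}$$

within the range $\theta\in\Theta$, subject to the boundary conditions $b'(\theta_0) = b'(\theta_1) = -1$.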
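As a sanity check of Corollary 1, the sketch below treats one simple case that is an assumption of this example: a uniform prior on $(-r, r)$ and constant Fisher information $J = 1/\sigma^2$, for which (23) reduces to $b/\sigma^2 = b''$ with $b' = -1$ at both endpoints. The candidate solution $b(\theta) = -\sigma\sinh(\theta/\sigma)/\cosh(r/\sigma)$ and the resulting value $\sigma^2\,(1 - (\sigma/r)\tanh(r/\sigma))$ were worked out by hand for this illustration and are not quoted from the text; the code only verifies them numerically.

```python
import math


def bias(theta, sigma, r):
    """Candidate solution b(theta) = -sigma * sinh(theta/sigma) / cosh(r/sigma)."""
    return -sigma * math.sinh(theta / sigma) / math.cosh(r / sigma)


def bias_prime(theta, sigma, r):
    """Derivative b'(theta) of the candidate solution."""
    return -math.cosh(theta / sigma) / math.cosh(r / sigma)


def ode_residual(theta, sigma, r, h=1e-3):
    """J*b - b'' with J = 1/sigma^2; b'' is approximated by a central difference."""
    b_second = (bias(theta + h, sigma, r) - 2.0 * bias(theta, sigma, r)
                + bias(theta - h, sigma, r)) / h ** 2
    return bias(theta, sigma, r) / sigma ** 2 - b_second


def obb_numeric(sigma, r, intervals=4000):
    """Prior average of b^2 + sigma^2 * (1 + b')^2 on (-r, r),
    computed with the composite Simpson rule (intervals must be even)."""
    h = 2.0 * r / intervals
    total = 0.0
    for k in range(intervals + 1):
        theta = -r + k * h
        weight = 1 if k in (0, intervals) else (4 if k % 2 else 2)
        total += weight * (bias(theta, sigma, r) ** 2
                           + sigma ** 2 * (1.0 + bias_prime(theta, sigma, r)) ** 2)
    return (h / 3.0) * total / (2.0 * r)


def obb_closed_form(sigma, r):
    """Hand-derived value of the same integral for this special case."""
    return sigma ** 2 * (1.0 - (sigma / r) * math.tanh(r / sigma))
```

The boundary conditions can be checked with `bias_prime(r, sigma, r)` and `bias_prime(-r, sigma, r)`, and `obb_numeric` should agree with `obb_closed_form` up to quadrature error.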

The system of Theorem 2 can be solved numerically, thus obtaining a bound for any problem satisfying the regularity conditions. However, directly solving (21) becomes increasingly complex as the dimension of the problem increases. Instead, in many cases, symmetry relations in the problem can be used to simplify the solution. As an example, the following spherically symmetric case can be reduced to a one-dimensional setting equivalent to that of Corollary 1. The proof of this theorem can be found in Appendix D.

###### Theorem 3

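Under the setting of Theorem 1, suppose that $\Theta = \{\boldsymbol{\theta} : \|\boldsymbol{\theta}\| < r\}$ is a sphere centered on the origin, $p_{\boldsymbol{\theta}}(\boldsymbol{\theta}) = q(\|\boldsymbol{\theta}\|)$ is spherically symmetric, and $\mathbf{J}(\boldsymbol{\theta}) = J(\|\boldsymbol{\theta}\|)\,\mathbf{I}$, where $J(\cdot)$ is a scalar function. Then, the optimal-bias bound (20) is given by

$$E\{\|\boldsymbol{\theta}-\hat{\boldsymbol{\theta}}\|^2\} \ge \frac{2\pi^{n/2}}{\Gamma(n/2)}\int_0^r\left[b^2(\rho) + \frac{(1+b'(\rho))^2}{J(\rho)} + \frac{n-1}{J(\rho)}\left(1+\frac{b(\rho)}{\rho}\right)^2\right] q(\rho)\,\rho^{n-1}\, d\rho. \tag{24}$$

Here, $\Gamma(\cdot)$ is the Gamma function, and $b(\cdot)$ is a solution to the ODE

$$J(\theta)\, b(\theta) = b''(\theta) + (n-1)\left(\frac{b'(\theta)}{\theta} - \frac{b(\theta)}{\theta^2}\right) + (1+b'(\theta))\left(\frac{d\log q}{d\theta} - \frac{d\log J}{d\theta}\right) \tag{25}$$

subject to the boundary conditions $b(0) = 0$, $b'(r) = -1$. The bias function for which the bound is achieved is given by

$$\mathbf{b}(\boldsymbol{\theta}) = b(\|\boldsymbol{\theta}\|)\,\frac{\boldsymbol{\theta}}{\|\boldsymbol{\theta}\|}. \tag{26}$$

In this theorem, the requirement $\mathbf{J}(\boldsymbol{\theta}) = J(\|\boldsymbol{\theta}\|)\,\mathbf{I}$ indicates that the Fisher information matrix is diagonal and that its components are spherically symmetric. Parameters having a diagonal matrix $\mathbf{J}$ are sometimes referred to as orthogonal. The simplest case of orthogonality occurs when, to each parameter $\theta_i$, there corresponds a measurement $x_i$, in such a way that the random variables $x_i|\theta_i$ are independent. Other orthogonal scenarios can often be constructed by an appropriate parametrization.

The requirement that $\mathbf{J}$ have spherically symmetric components occurs, for example, in location problems, i.e., situations in which the measurements have the form $\mathbf{x} = \boldsymbol{\theta} + \mathbf{w}$, where $\mathbf{w}$ is additive noise which is independent of $\boldsymbol{\theta}$. Indeed, under such conditions, $\mathbf{J}$ is constant in $\boldsymbol{\theta}$ [31, §3.1.3]. If, in addition, the noise components are independent, then this setting also satisfies the orthogonality requirement, and thus application of Theorem 3 is appropriate. Note that this estimation problem is not separable, since the components of $\boldsymbol{\theta}$ are correlated; thus, the MMSE in this situation is lower than the sum of the components' MMSE. An example of such a setting is presented in Section VI.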
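The form of the integrand in (24) can be made plausible by a short calculation, given here as an informal sketch rather than as the proof of Appendix D. Write $\rho = \|\boldsymbol{\theta}\|$ and $\mathbf{u} = \boldsymbol{\theta}/\rho$ (notation introduced only for this note), and substitute the radial bias (26) into $\mathrm{CRB}[\mathbf{b},\boldsymbol{\theta}]$:

$$\begin{aligned}
\frac{\partial\mathbf{b}}{\partial\boldsymbol{\theta}} &= b'(\rho)\,\mathbf{u}\mathbf{u}^T + \frac{b(\rho)}{\rho}\left(\mathbf{I}-\mathbf{u}\mathbf{u}^T\right), \\
\mathbf{I}+\frac{\partial\mathbf{b}}{\partial\boldsymbol{\theta}} &= \left(1+b'(\rho)\right)\mathbf{u}\mathbf{u}^T + \left(1+\frac{b(\rho)}{\rho}\right)\left(\mathbf{I}-\mathbf{u}\mathbf{u}^T\right), \\
\mathrm{CRB}[\mathbf{b},\boldsymbol{\theta}] &= b^2(\rho) + \frac{(1+b'(\rho))^2}{J(\rho)} + \frac{n-1}{J(\rho)}\left(1+\frac{b(\rho)}{\rho}\right)^2 .
\end{aligned}$$

The last line uses $\mathbf{J}^{-1} = \mathbf{I}/J(\rho)$ and the fact that $\mathbf{u}\mathbf{u}^T$ and $\mathbf{I}-\mathbf{u}\mathbf{u}^T$ are orthogonal projections of rank $1$ and $n-1$, respectively. Since the result depends on $\boldsymbol{\theta}$ only through $\rho$, integrating against $q(\rho)$ over spherical shells of surface area $2\pi^{n/2}\rho^{n-1}/\Gamma(n/2)$ yields the one-dimensional integral in (24).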

## V Properties

In this section, we examine several properties of the OBB. We first demonstrate that the optimal bias function has zero mean, a property which also characterizes the bias function of the MMSE estimator. Next, we prove that, under very general conditions, the resulting bound is tight at both low and high SNR values. This is an important result, since a desirable property of a Bayesian bound is that it provides an accurate estimate of the ambiguity region between high and low SNR. Reliable estimation at the two extremes increases the likelihood that the transition between these two regimes will be correctly identified.

### V-A Optimal Bias Has Zero Mean

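In any Bayesian estimation problem, the bias of the MMSE estimator $\hat{\boldsymbol{\theta}}_{\mathrm{opt}} = E\{\boldsymbol{\theta}|\mathbf{x}\}$ has zero mean. Indeed, by the definition (10),

$$\mathbf{b}(\hat{\boldsymbol{\theta}}_{\mathrm{opt}}) = E\{\hat{\boldsymbol{\theta}}_{\mathrm{opt}}\,|\,\boldsymbol{\theta}\} - \boldsymbol{\theta} = E\{E\{\boldsymbol{\theta}|\mathbf{x}\}\,|\,\boldsymbol{\theta}\} - \boldsymbol{\theta} \tag{27}$$

so that

$$E\{\mathbf{b}(\hat{\boldsymbol{\theta}}_{\mathrm{opt}})\} = E\{E\{\boldsymbol{\theta}|\mathbf{x}\} - \boldsymbol{\theta}\} = \mathbf{0}. \tag{28}$$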

Thus, it is interesting to ask whether the optimal bias which minimizes (20) also has zero mean. This is indeed the case, as shown by the following theorem.

###### Theorem 4

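Let $\mathbf{b}(\boldsymbol{\theta})$ be the solution to (20). Then,

$$E\{\mathbf{b}(\boldsymbol{\theta})\} = \mathbf{0}. \tag{29}$$

###### Proof

Assume by contradiction that $\mathbf{b}(\boldsymbol{\theta})$ has nonzero mean $\boldsymbol{\mu} = E\{\mathbf{b}(\boldsymbol{\theta})\}$. Define $\mathbf{b}_0(\boldsymbol{\theta}) = \mathbf{b}(\boldsymbol{\theta}) - \boldsymbol{\mu}$. Since $\mathbf{b}_0$ and $\mathbf{b}$ have the same gradient, from (11) we then have

$$\begin{aligned} \mathrm{CRB}[\mathbf{b}_0,\boldsymbol{\theta}] - \mathrm{CRB}[\mathbf{b},\boldsymbol{\theta}] &= \|\mathbf{b}_0(\boldsymbol{\theta})\|^2 - \|\mathbf{b}(\boldsymbol{\theta})\|^2 \\ &= \|\boldsymbol{\mu}\|^2 - 2\boldsymbol{\mu}^T\mathbf{b}(\boldsymbol{\theta}). \end{aligned} \tag{30}$$

Using the functional $Z[\cdot]$ defined in (12), we obtain

$$\begin{aligned} Z[\mathbf{b}_0] - Z[\mathbf{b}] &= E\{\|\boldsymbol{\mu}\|^2 - 2\boldsymbol{\mu}^T\mathbf{b}(\boldsymbol{\theta})\} \\ &= \|\boldsymbol{\mu}\|^2 - 2\boldsymbol{\mu}^T E\{\mathbf{b}(\boldsymbol{\theta})\} \\ &= -\|\boldsymbol{\mu}\|^2 < 0. \end{aligned} \tag{31}$$

Thus $Z[\mathbf{b}_0] < Z[\mathbf{b}]$, contradicting the fact that $\mathbf{b}$ minimizes (20).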
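The identity behind this proof, namely that removing the mean $\boldsymbol{\mu}$ from a bias function lowers $Z$ by exactly $\|\boldsymbol{\mu}\|^2$, can be checked on a discretized scalar problem. The sketch below is only an illustration: the grid, the uniform prior, the constant Fisher information, and the particular bias $b(\theta) = 0.3 - 0.5\,\theta^2$ are all hypothetical choices made for this example.

```python
def z_functional(bias_vals, bias_grad_vals, inv_fisher):
    """Discrete analogue of Z[b] for a scalar parameter: the average of
    b^2 + (1 + b')^2 / J over equally weighted grid points (uniform prior)."""
    terms = [b * b + inv_fisher * (1.0 + db) ** 2
             for b, db in zip(bias_vals, bias_grad_vals)]
    return sum(terms) / len(terms)


def remove_mean(bias_vals):
    """Return (b - mu, mu), where mu is the prior mean of the bias on the grid."""
    mu = sum(bias_vals) / len(bias_vals)
    return [b - mu for b in bias_vals], mu


def demo():
    """Compare Z before and after subtracting the mean of a hypothetical bias."""
    # Midpoint grid on (-1, 1); b(theta) = 0.3 - 0.5 * theta^2 is an arbitrary choice.
    thetas = [-1.0 + 2.0 * (k + 0.5) / 400 for k in range(400)]
    bias_vals = [0.3 - 0.5 * t * t for t in thetas]
    bias_grad_vals = [-t for t in thetas]  # b'(theta); unchanged by a constant shift
    shifted, mu = remove_mean(bias_vals)
    z_before = z_functional(bias_vals, bias_grad_vals, inv_fisher=1.0)
    z_after = z_functional(shifted, bias_grad_vals, inv_fisher=1.0)
    return z_before, z_after, mu
```

Calling `demo()` returns the two values of $Z$ and the mean $\mu$; their difference `z_after - z_before` equals `-mu**2` up to rounding, mirroring (31).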

### V-B Tightness at Low SNR

Bell et al. examined the performance of the extended Ziv–Zakai bound at low SNR and demonstrated that, for a particular family of distributions, the extended Ziv–Zakai bound achieves the MSE of the optimal estimator as the SNR tends to $0$. We now examine the low-SNR performance of the OBB, and demonstrate tightness for a much wider range of problem settings.

Bell et al. did not define the general meaning of a low SNR value, and only stated that “[a]s observation time and/or SNR become very small, the observations become useless …[and] the minimum MSE estimator converges to the a priori mean.” This statement clearly does not apply to all estimation problems, since it is not always clear what parameter corresponds to the observation time or the SNR. We propose to define the zero SNR case more generally as any situation in which $\mathbf{J}(\boldsymbol{\theta}) = \mathbf{0}$ with probability 1. This definition implies that the measurements do not contain information about the unknown parameter, which is the usual informal meaning of zero SNR. In the case $\mathbf{J}(\boldsymbol{\theta}) = \mathbf{0}$, it can be shown that the MMSE estimator is the prior mean, so that our definition implies the statement of Bell et al.

The OBB is inapplicable when $\mathbf{J}(\boldsymbol{\theta}) = \mathbf{0}$, since the CRB is based on the assumption that $\mathbf{J}(\boldsymbol{\theta})$ is positive definite. To avoid this singularity, we consider a sequence of estimation settings which converge to zero SNR. More specifically, we require all eigenvalues of $\mathbf{J}(\boldsymbol{\theta})$ to decrease monotonically to zero for $p_{\boldsymbol{\theta}}$-almost all $\boldsymbol{\theta}$. The following theorem, the proof of which can be found in Appendix E, demonstrates the tightness of the OBB in this low-SNR setting.

###### Theorem 5

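Let $\boldsymbol{\theta}$ be a random vector whose pdf $p_{\boldsymbol{\theta}}(\boldsymbol{\theta})$ is nonzero over an open set $\Theta\subseteq\mathbb{R}^n$. Let $\mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots$ be a sequence of observation vectors having finite Fisher information matrices $\mathbf{J}^{(1)}(\boldsymbol{\theta}),\mathbf{J}^{(2)}(\boldsymbol{\theta}),\ldots$, respectively. Suppose that, for all $N$, the matrix $\mathbf{J}^{(N)}(\boldsymbol{\theta})$ is positive definite for $p_{\boldsymbol{\theta}}$-almost all $\boldsymbol{\theta}$, and that all eigenvalues of $\mathbf{J}^{(N)}(\boldsymbol{\theta})$ decrease monotonically to zero as $N\to\infty$ for $p_{\boldsymbol{\theta}}$-almost all $\boldsymbol{\theta}$. Let $\beta_N$ denote the optimal-bias bound for estimating $\boldsymbol{\theta}$ from $\mathbf{x}^{(N)}$. Then,

$$\lim_{N\to\infty}\beta_N = E\{\|\boldsymbol{\theta} - E\{\boldsymbol{\theta}\}\|^2\}. \tag{32}$$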

### V-C Tightness at High SNR

We now examine the performance of the OBB for high SNR values. To formally define the high SNR regime, we consider a sequence of measurements $\mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots$ of a single parameter vector $\boldsymbol{\theta}$. It is assumed that, when conditioned on $\boldsymbol{\theta}$, all measurements are independent and identically distributed (IID). Furthermore, we assume that the Fisher information matrix $\mathbf{J}(\boldsymbol{\theta})$ of a single observation is well-defined, positive definite and finite for $p_{\boldsymbol{\theta}}$-almost all $\boldsymbol{\theta}$. We consider the problem of estimating $\boldsymbol{\theta}$ from the set of measurements $\{\mathbf{x}^{(1)},\ldots,\mathbf{x}^{(N)}\}$, for a given value of $N$. The high SNR regime is obtained when $N$ is large.

When $N$ tends to infinity, the MSE of the optimal estimator tends to zero. An important question, however, concerns the rate of convergence of the minimum MSE. More precisely, given the optimal estimator $\hat{\boldsymbol{\theta}}^{(N)}$ of $\boldsymbol{\theta}$ from $\{\mathbf{x}^{(1)},\ldots,\mathbf{x}^{(N)}\}$, one would like to determine the asymptotic distribution of $\hat{\boldsymbol{\theta}}^{(N)}-\boldsymbol{\theta}$, conditioned on $\boldsymbol{\theta}$. A fundamental result of asymptotic estimation theory can be loosely stated as follows [28, §III.3], [13, §6.8]. Under some fairly mild regularity conditions, the asymptotic distribution of $\sqrt{N}(\hat{\boldsymbol{\theta}}^{(N)}-\boldsymbol{\theta})$, conditioned on $\boldsymbol{\theta}$, does not depend on the prior distribution $p_{\boldsymbol{\theta}}$; rather, $\sqrt{N}(\hat{\boldsymbol{\theta}}^{(N)}-\boldsymbol{\theta})$ converges in distribution to a Gaussian random vector with mean zero and covariance $\mathbf{J}^{-1}(\boldsymbol{\theta})$. It follows that

$$\lim_{N\to\infty} N\, E\{\|\hat{\boldsymbol{\theta}}^{(N)}-\boldsymbol{\theta}\|^2\} = E\{\mathrm{Tr}[\mathbf{J}^{-1}(\boldsymbol{\theta})]\}. \tag{33}$$

Since the minimum MSE tends to zero at high SNR, any lower bound on the minimum MSE must also tend to zero as $N\to\infty$. However, one would further expect a good lower bound to follow the behavior of (33). In other words, if $\beta_N$ represents the lower bound for estimating $\boldsymbol{\theta}$ from $\{\mathbf{x}^{(1)},\ldots,\mathbf{x}^{(N)}\}$, a desirable property is $N\beta_N\to E\{\mathrm{Tr}[\mathbf{J}^{-1}(\boldsymbol{\theta})]\}$. The following theorem, whose proof is found in Appendix E, demonstrates that this is indeed the case for the OBB.

Except for a very brief treatment by Bellini and Tartara, no previous Bayesian bound has shown such a result. Although it appears that the Ziv–Zakai and Weiss–Weinstein bounds may also satisfy this property, this has not been proven formally. It is also known that the Bayesian CRB is not asymptotically tight in this sense [34, Eqs. (37)–(39)].

###### Theorem 6

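Let $\theta$ be a random vector whose pdf $p_\theta(\theta)$ is nonzero over an open set $\Theta \subseteq \mathbb{R}^n$. Consider a sequence of measurement vectors that are IID when conditioned on $\theta$. Let $J(\theta)$ be the Fisher information matrix for estimating $\theta$ from a single measurement, and suppose $J(\theta)$ is finite and positive definite for $p_\theta$-almost all $\theta$. Let $\beta_N$ be the optimal-bias bound (20) for estimating $\theta$ from the first $N$ measurements of the sequence. Then,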

$$\lim_{N\to\infty} N\beta_N = E\left\{ \mathrm{Tr}(J^{-1}(\theta)) \right\}. \tag{34}$$

Note that for Theorem 6 to hold, we require only that $J(\theta)$ be finite and positive definite. By contrast, the various theorems guaranteeing asymptotic efficiency of Bayesian estimators all require substantially stronger regularity conditions [28, §III.3], [13, §6.8]. One reason for this is that asymptotic efficiency describes the behavior of $\hat{\theta}^{(N)}$ conditioned on each possible value of $\theta$, and is thus a stronger result than the asymptotic Bayesian MSE of (33).

## VI Example: Uniform Prior

The original bound of Young and Westerberg predates most Bayesian bounds, and, surprisingly, it has never been cited by or compared with later results. In this section, we measure the performance of the original bound and of its extension to the vector case against that of various other techniques. We consider the case in which $\theta$ is uniformly distributed over an $n$-dimensional open ball $\Theta = \{\theta : \|\theta\| < r\}$, so that

$$p_\theta(\theta) = \frac{1}{V_n(r)} \mathbbm{1}_{\Theta} \tag{35}$$

where $\mathbbm{1}_S$ equals $1$ when $\theta \in S$ and $0$ otherwise, and

$$V_n(r) = \frac{\pi^{n/2} r^n}{\Gamma(1 + n/2)} \tag{36}$$

is the volume of an $n$-ball of radius $r$. We further assume that
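As a quick numerical illustration of the prior (35) and the volume formula (36), the following sketch evaluates both using only the Python standard library. The function names `ball_volume` and `uniform_prior_pdf` are hypothetical and chosen here for illustration; the volume is taken to be the standard $n$-ball volume $\pi^{n/2} r^n / \Gamma(1+n/2)$.

```python
import math

def ball_volume(n: int, r: float) -> float:
    """Volume of an n-dimensional ball of radius r (standard formula, cf. (36))."""
    return math.pi ** (n / 2) * r ** n / math.gamma(1 + n / 2)

def uniform_prior_pdf(theta, r: float) -> float:
    """Uniform pdf on the open ball of radius r, cf. (35).

    theta is a sequence of floats; its length sets the dimension n.
    """
    n = len(theta)
    norm = math.sqrt(sum(t * t for t in theta))
    return 1.0 / ball_volume(n, r) if norm < r else 0.0
```

For $n = 1, 2, 3$ the function `ball_volume` reduces to the familiar values $2r$, $\pi r^2$ and $4\pi r^3/3$.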

$$x = \theta + w \tag{37}$$

where $w$ is zero-mean Gaussian noise, independent of $\theta$, having covariance $\sigma^2 I$. We are interested in lower bounds on the MSE achievable by an estimator of $\theta$ from $x$.

We begin by developing the OBB for this setting, as well as some alternative bounds. We then compare the different approaches in a one-dimensional and a three-dimensional setting.

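The Fisher information matrix for the given estimation problem is given by $J(\theta) = (1/\sigma^2) I$, so that the conditions of Theorem 3 hold. It follows that the optimal bias function is radial, i.e., it is given by $b(\theta) = b(\|\theta\|)\,\theta/\|\theta\|$, where the scalar function $b(\cdot)$ is a solution to the differential equation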

$$\frac{b}{\sigma^2} = b'' + (n-1)\left( \frac{b'}{\theta} - \frac{b}{\theta^2} \right) \tag{38}$$

with boundary conditions $b(0) = 0$, $b'(r) = -1$. The general solution to this differential equation is given by

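$$b(\theta) = C_1 \theta^{1-n/2} I_{n/2}\left( \frac{\theta}{\sigma} \right) + C_2 \theta^{1-n/2} K_{n/2}\left( \frac{\theta}{\sigma} \right) \tag{39}$$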

where $I_\alpha(z)$ and $K_\alpha(z)$ are the modified Bessel functions of the first and second types, respectively. Since $K_{n/2}(z)$ is singular at the origin, the requirement $b(0) = 0$ leads to $C_2 = 0$. Differentiating (39) with respect to $\theta$, we obtain

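$$b'(\theta) = C_1 \theta^{-n/2} \left( I_{n/2}\left( \frac{\theta}{\sigma} \right) + \frac{\theta}{\sigma} I_{1+n/2}\left( \frac{\theta}{\sigma} \right) \right) \tag{40}$$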

so that the requirement $b'(r) = -1$ leads to

$$C_1 = -\frac{r^{n/2}}{I_{n/2}(r/\sigma) + (r/\sigma)\, I_{1+n/2}(r/\sigma)}. \tag{41}$$

Substituting this value of $C_1$ into (3) yields the OBB, which can be computed by evaluating a single one-dimensional integral. Alternatively, in the one-dimensional case, the integral can be computed analytically, as will be shown below.
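The closed-form bias profile can be checked numerically. The sketch below evaluates (39)–(41) with $C_2 = 0$, using a truncated power series for the modified Bessel function so that only the standard library is needed. The helper names (`bessel_i`, `c1`, `bias`, `bias_prime`) are hypothetical, the series truncation at 60 terms is an assumption adequate for moderate arguments, and the boundary conditions are taken to be $b(0) = 0$ and $b'(r) = -1$, which is consistent with (40), (41) and (60).

```python
import math

def bessel_i(nu: float, z: float, terms: int = 60) -> float:
    """Modified Bessel function of the first kind I_nu(z), via its power series.

    Assumption: 60 terms suffice for the moderate arguments used here.
    """
    return sum(
        (z / 2) ** (2 * k + nu) / (math.factorial(k) * math.gamma(k + nu + 1))
        for k in range(terms)
    )

def c1(n: int, r: float, sigma: float) -> float:
    """Constant C_1 from (41)."""
    z = r / sigma
    return -r ** (n / 2) / (bessel_i(n / 2, z) + z * bessel_i(1 + n / 2, z))

def bias(theta: float, n: int, r: float, sigma: float) -> float:
    """Radial bias profile b(theta) from (39) with C_2 = 0; theta > 0 is the radius."""
    return c1(n, r, sigma) * theta ** (1 - n / 2) * bessel_i(n / 2, theta / sigma)

def bias_prime(theta: float, n: int, r: float, sigma: float) -> float:
    """Derivative b'(theta) from (40)."""
    z = theta / sigma
    return c1(n, r, sigma) * theta ** (-n / 2) * (
        bessel_i(n / 2, z) + z * bessel_i(1 + n / 2, z)
    )

def ode_residual(theta: float, n: int, r: float, sigma: float, step: float = 1e-3) -> float:
    """Left side minus right side of (38), with b'' from a central difference."""
    b = bias(theta, n, r, sigma)
    b2 = (bias(theta + step, n, r, sigma) - 2 * b + bias(theta - step, n, r, sigma)) / step ** 2
    b1 = bias_prime(theta, n, r, sigma)
    return b / sigma ** 2 - (b2 + (n - 1) * (b1 / theta - b / theta ** 2))
```

For example, `ode_residual(1.0, 3, 2.0, 1.0)` should be close to zero, and `bias_prime(r, n, r, sigma)` should equal $-1$ up to rounding for any admissible `n`, `r` and `sigma`.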

Despite the widespread use of finite-support prior distributions [10, 4], the regularity conditions of many bounds are violated by such prior pdfs. Indeed, the Bayesian CRB of Van Trees, the Bobrovski–Zakai bound, and the Bayesian Abel bound all assume that $p_\theta(\theta)$ has infinite support, and thus cannot be applied in this scenario.

Techniques from the Ziv–Zakai family are applicable to constrained problems. An extension of the Ziv–Zakai bound for vector parameter estimation was developed by Bell et al. From [11, Property 4], the MSE of the $i$th component of $\theta$ is bounded by

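$$E\left\{ (\theta_i - \hat{\theta}_i)^2 \right\} \ge \int_0^\infty V\left\{ \max_{\delta :\, e_i^T \delta = h} A(\delta) P_{\min}(\delta) \right\} h \, dh \tag{42}$$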

where $e_i$ is a unit vector in the direction of the $i$th component, $V\{\cdot\}$ is the valley-filling function defined by

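$$V\{f(h)\} = \max_{\eta \ge 0} f(h + \eta), \tag{43}$$

$$A(\delta) \triangleq \int_{\mathbb{R}^n} \min\left( p_\theta(\theta), p_\theta(\theta + \delta) \right) d\theta, \tag{44}$$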

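and $P_{\min}(\delta)$ is the minimum probability of error for the problem of testing hypothesis $H_0: \theta = \theta_0$ vs. $H_1: \theta = \theta_0 + \delta$. In the current setting, $P_{\min}(\delta)$ is given by $Q(\|\delta\|/2\sigma)$, where $Q(\cdot)$ is the tail function of the normal distribution. Also, we have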

$$A(\delta) = \frac{V_n^C(r, \|\delta\|)}{V_n(r)} \tag{45}$$

where

$$V_n^C(r, h) = \int_{\mathbb{R}^n} \mathbbm{1}_{\Theta} \mathbbm{1}_{\Theta + h e_1} \, d\theta \tag{46}$$

and $\Theta + h e_1$ denotes the set $\Theta$ translated by $h e_1$. Thus, $V_n^C(r, h)$ is the volume of the intersection of two $n$-balls of radius $r$ whose centers are at a distance of $h$ units from one another. Substituting these results into (42), we have

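$$E\left\{ (\theta_i - \hat{\theta}_i)^2 \right\} \ge \int_0^\infty V\left\{ \max_{\delta :\, e_i^T \delta = h} \frac{V_n^C(r, \|\delta\|)}{V_n(r)} Q\left( \frac{\|\delta\|}{2\sigma} \right) \right\} h \, dh. \tag{47}$$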

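Note that both $V_n^C(r, \|\delta\|)$ and $Q(\|\delta\|/2\sigma)$ decrease with $\|\delta\|$. Therefore, the maximum in (47) is obtained for $\delta = h e_i$. Also, since the argument of $V\{\cdot\}$ is monotonically decreasing, the valley-filling function has no effect and can be removed. Finally, since $V_n^C(r, h) = 0$ for $h > 2r$, the integration can be limited to the range $[0, 2r]$. Summing the resulting bound over the $n$ components, the extended Ziv–Zakai bound is given by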

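$$E\left\{ \|\theta - \hat{\theta}\|^2 \right\} \ge \int_0^{2r} n \frac{V_n^C(r, h)}{V_n(r)} Q\left( \frac{h}{2\sigma} \right) h \, dh. \tag{48}$$

Fig. 2: Comparison of the MSE bounds and the minimum achievable MSE in a one-dimensional setting for which $\theta \sim U[-r, r]$ and $x|\theta \sim N(\theta, \sigma^2)$.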
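The extended Ziv–Zakai bound (48) requires only a one-dimensional integral once $V_n^C(r, h)$ is available. The following sketch is one possible way to evaluate it with the standard library. It assumes a composite Simpson rule for all integrals, computes the intersection volume by slicing each of the two congruent spherical caps into $(n-1)$-balls, and uses hypothetical function names throughout; it is not taken from the paper.

```python
import math

def ball_volume(n: int, r: float) -> float:
    """Volume of an n-ball of radius r (standard formula)."""
    return math.pi ** (n / 2) * r ** n / math.gamma(1 + n / 2)

def simpson(f, a: float, b: float, m: int = 400) -> float:
    """Composite Simpson rule on [a, b] with m (even) subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3

def intersection_volume(n: int, r: float, h: float) -> float:
    """V_n^C(r, h): volume shared by two n-balls of radius r with centers h apart."""
    if h >= 2 * r:
        return 0.0
    if n == 1:
        return 2 * r - h
    # Each cap is a stack of (n-1)-balls of radius sqrt(r^2 - t^2), t in [h/2, r].
    cap = simpson(
        lambda t: ball_volume(n - 1, math.sqrt(max(r * r - t * t, 0.0))), h / 2, r
    )
    return 2 * cap

def q_func(z: float) -> float:
    """Tail function of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def ezz_bound(n: int, r: float, sigma: float) -> float:
    """Numerical evaluation of the extended Ziv-Zakai bound (48)."""
    v = ball_volume(n, r)
    return simpson(
        lambda h: n * intersection_volume(n, r, h) / v * q_func(h / (2 * sigma)) * h,
        0.0,
        2 * r,
    )
```

As a sanity check on the sketch, when $\sigma$ is very large the value returned by `ezz_bound` approaches the prior variance, which is $r^2/3$ for $n = 1$ and $3r^2/5$ for $n = 3$.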

We now compute the Weiss–Weinstein bound for the setting at hand. This bound is given by

$$E\left\{ \|\theta - \hat{\theta}\|^2 \right\} \ge \mathrm{Tr}\left( H G^{-1} H^T \right) \tag{49}$$

where $H$ is a matrix whose columns are an arbitrary number of test vectors $h_i$, and $G$ is a matrix whose elements are given by

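$$G_{ij} = \frac{E\left\{ r(x, \theta; h_i, s_i)\, r(x, \theta; h_j, s_j) \right\}}{E\left\{ L^{s_i}(x; \theta + h_i, \theta) \right\} E\left\{ L^{s_j}(x; \theta + h_j, \theta) \right\}} \tag{50}$$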

in which

$$r(x, \theta; h_i, s_i) \triangleq L^{s_i}(x; \theta + h_i, \theta) - L^{1 - s_i}(x; \theta - h_i, \theta) \tag{51}$$

and

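$$L(x; \theta_1, \theta_2) \triangleq \frac{p_\theta(\theta_1)\, p_{x|\theta}(x|\theta_1)}{p_\theta(\theta_2)\, p_{x|\theta}(x|\theta_2)}. \tag{52}$$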

The vectors $h_i$ and the scalars $s_i$ are arbitrary, and can be optimized to maximize the bound (49). To avoid a multidimensional nonconvex optimization problem, we restrict attention to $n$ test vectors $h_i = h e_i$ with $s_i = 1/2$, as has been suggested previously. This results in a dependency on a single scalar parameter $h$.

Under these conditions, $G_{ij}$ can be written as

$$G_{ij} = \frac{1}{M(h_i) M(h_j)} \left[ \tilde{M}(h_i - h_j, -h_j) + \tilde{M}(h_i - h_j, h_i) - \tilde{M}(h_i + h_j, h_j) - \tilde{M}(h_i + h_j, h_i) \right] \tag{53}$$

where

$$M(h) \triangleq E\left\{ L^{1/2}(x; \theta + h, \theta) \right\} \tag{54}$$

and

$$\tilde{M}(h_1, h_2) \triangleq E\left\{ L^{1/2}(x; \theta + h_1, \theta)\, \mathbbm{1}_{\Theta + h_2} \right\}. \tag{55}$$

Note that we have used the corrected version of the Weiss–Weinstein bound. Substituting the probability distribution of $\theta$ and $x$ into the definitions of $M(h)$ and $\tilde{M}(h_1, h_2)$, we have

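$$\begin{aligned} M(h) &= E\left\{ e^{-\|\theta + h - x\|^2 / 4\sigma^2} e^{\|\theta - x\|^2 / 4\sigma^2} \mathbbm{1}_{\Theta + h} \right\} \\ &= \frac{V_n^C(r, \|h\|)}{V_n(r)} e^{-\|h\|^2 / 8\sigma^2} \end{aligned} \tag{56}$$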

and, similarly,

$$\tilde{M}(h_1, h_2) = \frac{e^{-\|h_1\|^2 / 8\sigma^2}}{V_n(r)} \int \mathbbm{1}_{\Theta} \mathbbm{1}_{\Theta + h_1} \mathbbm{1}_{\Theta + h_2} \, d\theta. \tag{57}$$

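Thus, $M(h)$ is a function only of $\|h\|$, and $\tilde{M}(h_1, h_2)$ is a function only of $\|h_1\|$, $\|h_2\|$, and $h_1^T h_2$. Since the test vectors $h_i = h e_i$ are mutually orthogonal and have equal norms, the four terms in the numerator of (53) coincide for $i \neq j$, and it follows that the numerator of (53) vanishes. Thus, $G$ is a diagonal matrix, whose diagonal elements equal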

$$G_{ii} = \frac{2\left[ \tilde{M}(0, h e_1) - \tilde{M}(2 h e_1, h e_1) \right]}{M^2(h e_1)}. \tag{58}$$

The Weiss–Weinstein bound is given by substituting this result into (49) and maximizing over $h$, i.e.,

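$$E\left\{ \|\theta - \hat{\theta}\|^2 \right\} \ge \max_{h \in [0, 2r]} \frac{n h^2 M^2(h e_1)}{2\left[ \tilde{M}(0, h e_1) - \tilde{M}(2 h e_1, h e_1) \right]}. \tag{59}$$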

The value of $h$ yielding the tightest bound can be determined by performing a grid search.
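To make the grid search concrete, the sketch below evaluates (59) in the one-dimensional case, where $\Theta = (-r, r)$ and the set intersections in (56) and (57) are intervals whose lengths can be written down directly. The function name `wwb_1d`, the uniform grid and its size are assumptions made for illustration; the endpoints $h = 0$ and $h = 2r$ are excluded because the objective is of the form $0/0$ there.

```python
import math

def wwb_1d(r: float, sigma: float, grid: int = 2000) -> float:
    """Grid search for the Weiss-Weinstein bound (59) with n = 1.

    Assumes theta ~ U(-r, r) and x = theta + w with noise variance sigma^2.
    """
    best = 0.0
    for k in range(1, grid):
        h = 2 * r * k / grid  # grid points strictly inside (0, 2r)
        # M(h) from (56): overlap of Theta and Theta + h has length 2r - h.
        m = (1 - h / (2 * r)) * math.exp(-h * h / (8 * sigma ** 2))
        # M~(0, h) from (57): zero shift in the exponent, overlap length 2r - h.
        mt_0 = (2 * r - h) / (2 * r)
        # M~(2h, h) from (57): triple overlap has length max(2r - 2h, 0).
        mt_2 = math.exp(-h * h / (2 * sigma ** 2)) * max(2 * r - 2 * h, 0.0) / (2 * r)
        best = max(best, h * h * m * m / (2 * (mt_0 - mt_2)))
    return best
```

In the limit of very large $\sigma$, the objective in this sketch reduces to a polynomial in $h$ whose maximum is $8r^2/27$, slightly below the prior variance $r^2/3$.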

To compare the OBB with the alternative approaches developed above, we first consider the one-dimensional case in which $\theta$ is uniformly distributed in the range $[-r, r]$. Let $x = \theta + w$ be a single noisy observation, where $w$ is zero-mean Gaussian noise, independent of $\theta$, with variance $\sigma^2$. We wish to bound the MSE of an estimator of $\theta$ from $x$.

The optimal bias function is given by (39). Using the fact that $I_{1/2}(z) = \sqrt{2/(\pi z)}\, \sinh(z)$, we obtain

$$b(\theta) = -\sigma \frac{\sinh(\theta/\sigma)}{\cosh(r/\sigma)} \tag{60}$$

which also follows from Corollary 1. Substituting this expression into (20), we have that, for any estimator $\hat{\theta}$,

$$E\left\{ (\theta - \hat{\theta})^2 \right\} \ge \sigma^2 \left( 1 - \frac{\tanh(r/\sigma)}{r/\sigma} \right). \tag{61}$$

Apart from the reduction in computational complexity, the simplicity of (61) also emphasizes several features of the estimation problem. First, the dependence of the problem on the dimensionless quantity $r/\sigma$, rather than on $r$ and $\sigma$ separately, is clear. This is to be expected, as a change in units of measurement would multiply both $r$ and $\sigma$ by a constant. Second, the asymptotic properties demonstrated in Theorems 5 and 6 can be easily verified. For $r \gg \sigma$, the bound converges to the noise variance $\sigma^2$, corresponding to an uninformative prior whose optimal estimator is $\hat{\theta} = x$; whereas, for $\sigma \gg r$, a Taylor expansion of $\tanh(z)/z$ immediately shows that the bound converges to $r^2/3$, corresponding to the case of uninformative measurements, where the optimal estimator is $\hat{\theta} = 0$. Thus, the bound (61) is tight both for very low and for very high SNR, as expected.
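These limits are easy to confirm numerically. The sketch below evaluates (61) and also illustrates the scaling of Theorem 6, under the assumption that $N$ IID Gaussian measurements carry Fisher information $N/\sigma^2$ and can therefore be treated as a single measurement with noise variance $\sigma^2/N$. The function names are hypothetical.

```python
import math

def obb_1d(r: float, sigma: float) -> float:
    """Closed-form optimal-bias bound (61) for theta ~ U[-r, r], Gaussian noise."""
    z = r / sigma
    return sigma ** 2 * (1 - math.tanh(z) / z)

def obb_1d_n_measurements(r: float, sigma: float, n_meas: int) -> float:
    """Bound for n_meas IID measurements.

    Assumption: the Fisher information is n_meas / sigma^2, i.e. the same as a
    single measurement with noise standard deviation sigma / sqrt(n_meas).
    """
    return obb_1d(r, sigma / math.sqrt(n_meas))

# High SNR (r >> sigma): the bound approaches the noise variance sigma^2.
high_snr = obb_1d(1000.0, 1.0)
# Low SNR (sigma >> r): the bound approaches the prior variance r^2 / 3.
low_snr = obb_1d(1.0, 1000.0)
# Theorem 6 scaling: N * beta_N approaches E{Tr(J^{-1})} = sigma^2.
scaled = 10 ** 6 * obb_1d_n_measurements(1.0, 1.0, 10 ** 6)
```

With these inputs, `high_snr` and `scaled` are both close to $1$ and `low_snr` is close to $1/3$.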

In the one-dimensional case, we have $V_1(r) = 2r$ and $V_1^C(r, h) = \max(2r - h, 0)$, so that the extended Ziv–Zakai bound (48) and the Weiss–Weinstein bound (59) can also be simplified somewhat. In particular, the extended Ziv–Zakai bound (48) can be written as

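$$E\left\{ \|\theta - \hat{\theta}\|^2 \right\} \ge \int_0^{2r} \left( 1 - \frac{h}{2r} \right) h\, Q\left( \frac{h}{2\sigma} \right) dh. \tag{62}$$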

Using integration by parts, (62) becomes

 E{∥θ−^θ