MAP Estimators and Their Consistency in Bayesian Nonparametric Inverse Problems

M. Dashti (Department of Mathematics, University of Sussex, Brighton BN1 9QH, UK), K.J.H. Law, A.M. Stuart (Mathematics Institute, University of Warwick, Coventry, CV4 7AL, UK) and J. Voss (School of Mathematics, University of Leeds, Leeds, LS2 9JT, UK)

Abstract

We consider the inverse problem of estimating an unknown function $u$ from noisy measurements $y$ of a known, possibly nonlinear, map $\mathcal{G}$ applied to $u$. We adopt a Bayesian approach to the problem and work in a setting where the prior measure is specified as a Gaussian random field $\mu_0$. We work under a natural set of conditions on the likelihood which imply the existence of a well-posed posterior measure, $\mu^y$. Under these conditions we show that the maximum a posteriori (MAP) estimator is well-defined as the minimiser of an Onsager-Machlup functional defined on the Cameron-Martin space of the prior; thus we link a problem in probability with a problem in the calculus of variations. We then consider the case where the observational noise vanishes and establish a form of Bayesian posterior consistency for the MAP estimator. We also prove a similar result for the case where the observation of $\mathcal{G}(u)$ can be repeated as many times as desired with independent identically distributed noise. The theory is illustrated with examples from an inverse problem for the Navier-Stokes equation, motivated by problems arising in weather forecasting, and from the theory of conditioned diffusions, motivated by problems arising in molecular dynamics.

1 Introduction

This article considers questions from Bayesian statistics in an infinite dimensional setting, for example in function spaces. We assume our state space to be a general separable Banach space $(X, \|\cdot\|_X)$. While in the finite-dimensional setting the prior and posterior distributions of such statistical problems can typically be described by densities w.r.t. the Lebesgue measure, such a characterisation is no longer possible in the infinite dimensional spaces we consider here: it can be shown that no analogue of the Lebesgue measure exists in infinite dimensional spaces. One way to work around this technical problem is to replace Lebesgue measure with a Gaussian measure on $X$, i.e. with a Borel probability measure $\mu_0$ on $X$ such that all finite-dimensional marginals of $\mu_0$ are (possibly degenerate) normal distributions. Using a fixed, centred (mean-zero) Gaussian measure $\mu_0 = \mathcal{N}(0, \mathcal{C}_0)$ as a reference measure, we then assume that the distribution of interest, $\mu$, has a density with respect to $\mu_0$:

$$\frac{d\mu}{d\mu_0}(u) \propto \exp\bigl(-\Phi(u)\bigr). \tag{1.1}$$

Measures of this form arise naturally in a number of applications, including the theory of conditioned diffusions [18] and the Bayesian approach to inverse problems [33]. In these settings there are many applications where $\Phi$ is a locally Lipschitz continuous function and it is in this setting that we work.

Our interest is in defining the concept of “most likely” functions with respect to the measure $\mu$, and in particular the maximum a posteriori estimator in the Bayesian context. We will refer to such functions as MAP estimators throughout. We will define the concept precisely and link it to a problem in the calculus of variations, study posterior consistency of the MAP estimator in the Bayesian setting, and compute it for a number of illustrative applications.

To motivate the form of MAP estimators considered here we consider the case where $X = \mathbb{R}^d$ is finite dimensional and the prior $\mu_0$ is Gaussian $\mathcal{N}(0, \mathcal{C}_0)$. This prior has density proportional to $\exp\bigl(-\frac12 |\mathcal{C}_0^{-1/2} u|^2\bigr)$ with respect to the Lebesgue measure, where $|\cdot|$ denotes the Euclidean norm. The probability density for $\mu$ with respect to the Lebesgue measure, given by (1.1), is maximised at minimisers of

$$I(u) := \Phi(u) + \frac12 \|u\|_E^2, \tag{1.2}$$

where $\|u\|_E = |\mathcal{C}_0^{-1/2} u|$. We would like to derive such a result in the infinite dimensional setting.
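The finite dimensional picture can be made concrete with a small numerical sketch. The example below is an illustration only and not part of the original analysis: it assumes a hypothetical linear forward map `A`, a scalar noise variance `noise_var` and a diagonal prior covariance `C0`, so that $\Phi(u) = \frac{1}{2}|y - Au|^2/\texttt{noise\_var}$ and the functional $I(u) = \Phi(u) + \frac12 |\mathcal{C}_0^{-1/2}u|^2$ is quadratic with a minimiser available in closed form.

```python
import numpy as np

def make_functional(A, y, noise_var, C0):
    """Build I(u) = Phi(u) + 0.5 * |C0^{-1/2} u|^2 and its gradient.

    Phi(u) = 0.5 * |y - A u|^2 / noise_var is a hypothetical quadratic
    potential chosen only so that the minimiser is known in closed form.
    """
    C0_inv = np.linalg.inv(C0)

    def I(u):
        r = y - A @ u
        return 0.5 * (r @ r) / noise_var + 0.5 * (u @ C0_inv @ u)

    def grad_I(u):
        return -A.T @ (y - A @ u) / noise_var + C0_inv @ u

    return I, grad_I, C0_inv

rng = np.random.default_rng(0)
d, J = 4, 3                                  # illustrative dimensions
A = rng.standard_normal((J, d))              # hypothetical forward map
C0 = np.diag([1.0, 0.5, 0.25, 0.125])        # hypothetical prior covariance
noise_var = 0.1
y = rng.standard_normal(J)                   # synthetic data

I, grad_I, C0_inv = make_functional(A, y, noise_var, C0)

# For quadratic I the minimiser solves the normal equations H u = A^T y / noise_var.
H = A.T @ A / noise_var + C0_inv
u_map = np.linalg.solve(H, A.T @ y / noise_var)

print("MAP point:", u_map)
print("gradient norm at MAP point:", np.linalg.norm(grad_I(u_map)))
```

For a nonlinear $\Phi$ no closed form is available in general, and the same functional would instead be handed to a numerical optimiser.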

The natural way to talk about MAP estimators in the infinite dimensional setting is to seek the centre of a small ball with maximal probability, and then study the limit of this centre as the radius of the ball shrinks to zero. To this end, let $B^\delta(z) \subset X$ be the open ball of radius $\delta$ centred at $z \in X$. If there is a functional $I$, defined on $E$, which satisfies

$$\lim_{\delta \to 0} \frac{\mu\bigl(B^\delta(z_2)\bigr)}{\mu\bigl(B^\delta(z_1)\bigr)} = \exp\bigl(I(z_1) - I(z_2)\bigr), \tag{1.3}$$

then $I$ is termed the Onsager-Machlup functional [11, 21]. For any fixed $z_1$, the function $z_2$ for which the above limit is maximal is a natural candidate for the MAP estimator of $\mu$ and is clearly given by minimisers of the Onsager-Machlup functional. In the finite dimensional case it is clear that $I$ given by (1.2) is the Onsager-Machlup functional.
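The small-ball characterisation can be checked by hand in the simplest possible case. The sketch below is an illustration under stated assumptions, not part of the original text: it takes $X = \mathbb{R}$, $\mu_0 = \mathcal{N}(0,1)$ and $\Phi \equiv 0$, so that $\mu = \mu_0$ and the candidate functional is $I(z) = z^2/2$. It then compares the ratio $\mu(B^\delta(z_2))/\mu(B^\delta(z_1))$ with $\exp(I(z_1) - I(z_2))$ for decreasing $\delta$. The points `z1`, `z2` and the radii are arbitrary choices.

```python
import math

def std_normal_cdf(x):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ball_mass(z, delta):
    """Mass of the interval (z - delta, z + delta) under N(0, 1)."""
    return std_normal_cdf(z + delta) - std_normal_cdf(z - delta)

def onsager_machlup(z):
    """I(z) = z^2 / 2, the candidate functional when Phi = 0."""
    return 0.5 * z * z

z1, z2 = 0.5, 1.5                     # arbitrary illustrative centres
limit = math.exp(onsager_machlup(z1) - onsager_machlup(z2))

ratios = {delta: ball_mass(z2, delta) / ball_mass(z1, delta)
          for delta in (1.0, 0.1, 0.01, 0.001)}

for delta, ratio in ratios.items():
    print(f"delta={delta:g}  ratio={ratio:.8f}  limit={limit:.8f}")
```

The printed ratios approach the limiting value as $\delta$ decreases, which is the one dimensional instance of the defining relation for the Onsager-Machlup functional.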

From the theory of infinite dimensional Gaussian measures [25, 5] it is known that copies of the Gaussian measure $\mu_0$ shifted by $z$ are absolutely continuous w.r.t. $\mu_0$ itself if and only if $z$ lies in the Cameron-Martin space $(E, \langle\cdot,\cdot\rangle_E, \|\cdot\|_E)$; furthermore, if the shift direction $z$ is in $E$, then the shifted measure $\mu_z$ has density

$$\frac{d\mu_z}{d\mu_0}(u) = \exp\Bigl(\langle z, u\rangle_E - \frac12 \|z\|_E^2\Bigr). \tag{1.4}$$

In the finite dimensional example above, the Cameron-Martin norm of the Gaussian measure $\mu_0$ is the norm $\|\cdot\|_E$ and it is easy to verify that (1.4) holds for all $z \in \mathbb{R}^d$. In the infinite dimensional case, it is important to keep in mind that (1.4) only holds for $z \in E \subsetneq X$. Similarly, the relation (1.3) only holds for $z_1, z_2 \in E$. In our application, the Cameron-Martin formula (1.4) is used to bound the probability of the shifted ball $B^\delta(z_2)$ from equation (1.3). (For an exposition of the standard results about small ball probabilities for Gaussian measures we refer to [5, 25]; see also [24] for related material.) The main technical difficulty that is encountered stems from the fact that the Cameron-Martin space $E$, while being dense in $X$, has measure zero with respect to $\mu_0$. An example where this problem can be explicitly seen is the case where $\mu_0$ is the Wiener measure on $L^2$; in this example $E$ corresponds to a subset of the Sobolev space $H^1$, which indeed has measure zero w.r.t. Wiener measure.
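The statement that the Cameron-Martin space has measure zero can be illustrated numerically. The sketch below rests on assumptions that are not in the original text: $X$ is taken to be a separable Hilbert space, the covariance of $\mu_0$ has hypothetical eigenvalues $\lambda_k = k^{-2}$, and a draw is represented through its Karhunen-Loève coordinates $\sqrt{\lambda_k}\,\xi_k$ with i.i.d. standard normal $\xi_k$. In these coordinates the truncated $X$-norm is $\sum_k \lambda_k \xi_k^2$, which converges, while the truncated Cameron-Martin norm is $\sum_k \xi_k^2$, which grows linearly in the number of modes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_modes = 10_000

k = np.arange(1, n_modes + 1)
lam = 1.0 / k**2                      # hypothetical covariance eigenvalues (summable)
xi = rng.standard_normal(n_modes)     # i.i.d. N(0, 1) coefficients
coeffs = np.sqrt(lam) * xi            # coordinates of one draw from N(0, C_0)

# Partial sums of the squared norms as more modes are included.
x_norm_sq = np.cumsum(coeffs**2)            # truncated squared X-norm
cm_norm_sq = np.cumsum(coeffs**2 / lam)     # truncated squared Cameron-Martin norm

for n in (10, 100, 1_000, 10_000):
    print(f"modes={n:6d}  |u|_X^2={x_norm_sq[n - 1]:8.4f}  |u|_E^2={cm_norm_sq[n - 1]:10.1f}")
```

The second column stabilises while the third keeps growing, consistent with draws from $\mu_0$ having infinite Cameron-Martin norm almost surely.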

Our theoretical results assert that, despite these technical complications, the situation from the finite-dimensional example above carries over to the infinite dimensional case essentially without change. In Theorem 3.2 we show that the Onsager-Machlup functional in the infinite dimensional setting still has the form (1.2), where $\|\cdot\|_E$ is now the Cameron-Martin norm associated to $\mu_0$ (using $\|z\|_E = \infty$ for $z \in X \setminus E$), and in Corollary 3.10 we show that the MAP estimators for $\mu$ lie in the Cameron-Martin space $E$ and coincide with the minimisers of the Onsager-Machlup functional $I$.

In the second part of the paper, we consider the inverse problem of estimating an unknown function $u$ in a Banach space $X$, from a given observation $y \in \mathbb{R}^J$, where

$$y = \mathcal{G}(u) + \zeta; \tag{1.5}$$

here $\mathcal{G} : X \to \mathbb{R}^J$ is a possibly nonlinear operator, and $\zeta$ is a realization of an $\mathbb{R}^J$-valued centred Gaussian random variable with known covariance $\Sigma$. A prior probability measure $\mu_0$ is put on $u$, and the distribution of $y|u$ is given by (1.5), with $\zeta$ assumed independent of $u$. Under appropriate conditions on $\mu_0$ and $\mathcal{G}$, Bayes theorem is interpreted as giving the following formula for the Radon-Nikodym derivative of the posterior distribution $\mu^y$ on $u|y$ with respect to $\mu_0$:

$$\frac{d\mu^y}{d\mu_0}(u) \propto \exp\bigl(-\Phi(u; y)\bigr), \tag{1.6}$$

where

$$\Phi(u; y) = \frac12 \bigl|\Sigma^{-1/2}\bigl(y - \mathcal{G}(u)\bigr)\bigr|^2. \tag{1.7}$$

Derivation of Bayes formula (1.6) for problems with finite dimensional data, and in this form, is discussed in [7]. Clearly, then, Bayesian inverse problems with Gaussian priors fall into the class of problems studied in this paper, for potentials $\Phi$ given by (1.7) which depend on the observed data $y$. When the probability measure $\mu$ arises from the Bayesian formulation of inverse problems, it is natural to ask whether the MAP estimator is close to the truth underlying the data, in either the small noise or large sample size limits. This is a form of Bayesian posterior consistency, here defined in terms of the MAP estimator only. We will study this question for finite observations of a nonlinear forward model, subject to Gaussian additive noise.
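For finite dimensional data with additive Gaussian noise, the potential is the least-squares misfit $\Phi(u; y) = \frac12 |\Sigma^{-1/2}(y - \mathcal{G}(u))|^2$ weighted by the noise covariance. The sketch below shows one way to evaluate it; the forward map `forward_map`, the covariance `noise_cov` and the data are hypothetical placeholders chosen for illustration. A Cholesky factor is used in place of $\Sigma^{-1/2}$, which gives the same value because only the norm of the whitened residual enters.

```python
import numpy as np

def potential(u, y, forward_map, noise_cov):
    """Evaluate Phi(u; y) = 0.5 * |Sigma^{-1/2} (y - G(u))|^2.

    noise_cov must be symmetric positive definite. With Sigma = L L^T,
    |L^{-1} r|^2 = r^T Sigma^{-1} r = |Sigma^{-1/2} r|^2.
    """
    residual = y - forward_map(u)
    chol = np.linalg.cholesky(noise_cov)
    whitened = np.linalg.solve(chol, residual)
    return 0.5 * float(whitened @ whitened)

def forward_map(u):
    """Hypothetical nonlinear observation operator from R^2 to R^2."""
    return np.array([u[0] ** 2 + u[1], np.sin(u[1])])

noise_cov = np.array([[0.5, 0.1],
                      [0.1, 0.2]])      # hypothetical noise covariance Sigma
u = np.array([0.3, -0.7])               # a candidate parameter
y = np.array([1.0, 0.2])                # synthetic observation

value = potential(u, y, forward_map, noise_cov)
print("Phi(u; y) =", value)
```

Since this $\Phi$ is non-negative, a lower bound on the potential holds trivially; local boundedness and local Lipschitz continuity are inherited from the corresponding properties of the forward map.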

The paper is organized as follows:

  • in section 2 we detail our assumptions on $\Phi$ and $\mu_0$;

  • in section 3 we give conditions for the existence of an Onsager-Machlup functional $I$ and show that the MAP estimator is well-defined as the minimiser of this functional;

  • in section 4 we study the problem of Bayesian posterior consistency by studying limits of Onsager-Machlup minimisers in the small noise and large sample size limits;

  • in section 5 we study applications arising from data assimilation for the Navier-Stokes equation, as a model for what is done in weather prediction;

  • in section 6 we study applications arising in the theory of conditioned diffusions.

We conclude the introduction with a brief literature review. We first note that MAP estimators are widely used in practice in the infinite dimensional context [30, 22]. We also note that the functional $I$ in (1.2) resembles a Tikhonov-Phillips regularization of the minimisation problem for $\Phi$ [12], with the Cameron-Martin norm of the prior determining the regularization. In the theory of classical non-statistical inversion, formulation via Tikhonov-Phillips regularization leads to an infinite dimensional optimization problem and has led to deeper understanding and improved algorithms. Our aim is to achieve the same in a probabilistic context.

One way of defining a MAP estimator for $\mu$ given by (1.1) is to consider the limit of parametric MAP estimators: first discretize the function space using $n$ parameters, and then apply the finite dimensional argument above to identify an Onsager-Machlup functional on $\mathbb{R}^n$. Passing to the limit $n \to \infty$ in the functional provides a candidate for the limiting Onsager-Machlup functional. This approach is taken in [27, 28, 32] for problems arising in conditioned diffusions. Unfortunately, however, it does not necessarily lead to the correct identification of the Onsager-Machlup functional as defined by (1.3). The reason for this is that the space on which the Onsager-Machlup functional is defined is smoother than the space on which small ball probabilities are defined. Small ball probabilities are needed to properly define the Onsager-Machlup functional in the infinite dimensional limit. This means that discretization and use of standard numerical analysis limit theorems can, if incorrectly applied, use more regularity than is admissible in identifying the limiting Onsager-Machlup functional. We study the problem directly in the infinite dimensional setting, without using discretization, leading, we believe, to greater clarity.

The infinite dimensional perspective on MAP estimation has been widely studied for diffusion processes [9] and related stochastic PDEs [34]; see [35] for an overview. Our general setting is similar to that used to study the specific applications arising in the papers [9, 34, 35]. By working with small ball properties of Gaussian measures, and assuming that $\Phi$ has natural continuity properties, we are able to derive results in considerable generality. There is a recent related definition of MAP estimators in [19], with application to density estimation in [16]. However, whilst the goal of minimising $I$ is also identified in [19], the proof in that paper is only valid in finite dimensions since it implicitly assumes that the Cameron-Martin norm is a.s. finite. In our specific application to fluid mechanics our analysis demonstrates that widely used variational methods [2] may be interpreted as MAP estimators for an appropriate Bayesian inverse problem and, in particular, that this interpretation, which is understood in the atmospheric sciences community in the finite dimensional context, is well-defined in the limit of infinite spatial resolution.

Posterior consistency in Bayesian nonparametric statistics has a long history [15]. The study of posterior consistency for the Bayesian approach to inverse problems is starting to receive considerable attention. The papers [23, 1] are devoted to obtaining rates of convergence for linear inverse problems with conjugate Gaussian priors, whilst the papers [4, 29] study non-conjugate priors for linear inverse problems. Our analysis of posterior consistency concerns nonlinear problems, and finite data sets, so that multiple solutions are possible. We prove an appropriate weak form of posterior consistency, without rates, building on ideas appearing in [3].

Our form of posterior consistency is weaker than the general form of Bayesian posterior consistency since it does not concern fluctuations in the posterior, simply a point (MAP) estimator. However we note that for linear Gaussian problems there are examples where the conditions which ensure convergence of the posterior mean (which coincides with the MAP estimator in the linear Gaussian case) also ensure posterior contraction of the entire measure [1, 23].

2 Set-up

Throughout this paper we assume that $(X, \|\cdot\|_X)$ is a separable Banach space and that $\mu_0$ is a centred Gaussian (probability) measure on $X$ with Cameron-Martin space $(E, \langle\cdot,\cdot\rangle_E, \|\cdot\|_E)$. The measure $\mu$ of interest is given by (1.1) and we make the following assumptions concerning the potential $\Phi$.

Assumption 2.1

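The function $\Phi : X \to \mathbb{R}$ satisfies the following conditions:

  (i) For every $\varepsilon > 0$ there is an $M \in \mathbb{R}$ such that for all $u \in X$,
      $$\Phi(u) \ge M - \varepsilon \|u\|_X^2.$$

  (ii) $\Phi$ is locally bounded from above, i.e. for every $r > 0$ there exists $K = K(r) > 0$ such that, for all $u \in X$ with $\|u\|_X < r$ we have
      $$\Phi(u) \le K.$$

  (iii) $\Phi$ is locally Lipschitz continuous, i.e. for every $r > 0$ there exists $L = L(r) > 0$ such that for all $u_1, u_2 \in X$ with $\|u_1\|_X, \|u_2\|_X < r$ we have
      $$|\Phi(u_1) - \Phi(u_2)| \le L \|u_1 - u_2\|_X.$$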

Assumption 2.1(i) ensures that the expression (1.1) for the measure $\mu$ is indeed normalizable to give a probability measure; the specific form of the lower bound is designed to ensure that application of the Fernique Theorem (see [5] or [25]) proves that the required normalization constant is finite. Assumption 2.1(ii) enables us to get explicit bounds from below on small ball probabilities and Assumption 2.1(iii) allows us to use continuity to control the Onsager-Machlup functional. Numerous examples satisfying these conditions are given in the references [33, 18]. Finally, we define a function $I : X \to \mathbb{R} \cup \{+\infty\}$ by

$$I(u) = \begin{cases} \Phi(u) + \frac12 \|u\|_E^2 & \text{if } u \in E, \\ +\infty & \text{otherwise.} \end{cases} \tag{2.1}$$

We will see in section 3 that $I$ is the Onsager-Machlup functional.

Remark 2.2

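We close with a brief remark concerning the definition of the Onsager-Machlup functional in the case of a non-centred reference measure $\mu_0 = \mathcal{N}(m, \mathcal{C}_0)$. Shifting coordinates by $m$ it is possible to apply the theory based on the centred Gaussian measure $\mathcal{N}(0, \mathcal{C}_0)$, and then undo the coordinate change. The relevant Onsager-Machlup functional can then be shown to be

$$I(u) = \begin{cases} \Phi(u) + \frac12 \|u - m\|_E^2 & \text{if } u - m \in E, \\ +\infty & \text{otherwise.} \end{cases}$$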

3 MAP estimators and the Onsager-Machlup functional

In this section we prove two main results. The first, Theorem 3.2, establishes that $I$ given by (1.2) is indeed the Onsager-Machlup functional for the measure $\mu$ given by (1.1). Then Theorem 3.5 and Corollary 3.10 show that the MAP estimators, defined precisely in Definition 3.1, are characterised by the minimisers of the Onsager-Machlup functional.

For $z \in X$, let $B^\delta(z) \subset X$ be the open ball centred at $z$ with radius $\delta$ in $X$. Let

$$J^\delta(z) = \mu\bigl(B^\delta(z)\bigr)$$

be the mass of the ball $B^\delta(z)$. We first define the MAP estimator for $\mu$ as follows:

Definition 3.1

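Let

$$z^\delta = \arg\max_{z \in X} J^\delta(z).$$

Any point $\tilde z \in X$ satisfying $\lim_{\delta \to 0} J^\delta(\tilde z) / J^\delta(z^\delta) = 1$ is a MAP estimator for the measure $\mu$ given by (1.1).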
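The definition can be explored numerically in one dimension. The sketch below is an illustration under assumptions not made in the original text: $X = \mathbb{R}$, $\mu_0 = \mathcal{N}(0,1)$, and a hypothetical potential $\Phi(u) = (y - \mathcal{G}(u))^2 / (2\sigma^2)$ with $\mathcal{G}(u) = u + u^3$, $y = 1$ and $\sigma^2 = 0.25$. The measure $\mu$ then has Lebesgue density proportional to $\exp(-I(u))$ with $I(u) = \Phi(u) + u^2/2$. Ball masses are approximated by Riemann sums on a grid, the centre of the ball of maximal mass is located for several radii, and it is compared with the grid minimiser of $I$.

```python
import numpy as np

def onsager_machlup(u, y=1.0, noise_var=0.25):
    """I(u) = Phi(u) + u^2 / 2 for the hypothetical choice G(u) = u + u^3."""
    return 0.5 * (y - (u + u**3)) ** 2 / noise_var + 0.5 * u**2

grid = np.linspace(-4.0, 4.0, 8001)
h = grid[1] - grid[0]
density = np.exp(-onsager_machlup(grid))             # unnormalised density of mu
cum = np.concatenate(([0.0], np.cumsum(density)))    # cum[j] = sum(density[:j])

def ball_argmax(delta):
    """Grid point z maximising the (unnormalised) mass of (z - delta, z + delta)."""
    k = int(round(delta / h))
    centres = np.arange(k, grid.size - k)
    # Symmetric Riemann sum over grid[c - k], ..., grid[c + k] for each centre c.
    mass = h * (cum[centres + k + 1] - cum[centres - k])
    return grid[centres[np.argmax(mass)]]

u_min = grid[np.argmin(onsager_machlup(grid))]
z_delta = {delta: ball_argmax(delta) for delta in (1.0, 0.3, 0.1, 0.01)}

print("grid minimiser of I:", u_min)
for delta, z in z_delta.items():
    print(f"delta={delta:g}  centre of maximal ball: {z:.3f}")
```

In this example the centres $z^\delta$ move towards the minimiser of $I$ as $\delta$ decreases, which is the behaviour the definition is designed to capture.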

We show later on (Theorem 3.5) that a strongly convergent subsequence of $\{z^\delta\}_{\delta > 0}$ exists and its limit, which we prove to be in $E$, is a MAP estimator and also minimises the Onsager-Machlup functional $I$. Corollary 3.10 then shows that any MAP estimator $\tilde z$ as given in Definition 3.1 lives in $E$ as well, and minimisers of $I$ characterise all MAP estimators of $\mu$.

One special case where it is easy to see that the MAP estimator is unique is the case where $\Phi$ is linear, but we note that, in general, the MAP estimator cannot be expected to be unique. To achieve uniqueness, stronger conditions on $\Phi$ would be required.

We first need to show that $I$ is the Onsager-Machlup functional for our problem:

Theorem 3.2

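Let Assumption 2.1 hold. Then the function $I$ defined by (2.1) is the Onsager-Machlup functional for $\mu$, i.e. for any $z_1, z_2 \in E$ we have

$$\lim_{\delta \to 0} \frac{J^\delta(z_1)}{J^\delta(z_2)} = \exp\bigl(I(z_2) - I(z_1)\bigr).$$

Proof. Note that $J^\delta(z)$ is finite and positive for any $z \in E$ by Assumptions 2.1(i),(ii) together with the Fernique Theorem and the positive mass of all balls in $X$, centred at points in $E$, under Gaussian measure [5]. The key estimate in the proof is the following consequence of Proposition 3 in Section 18 of [25]:

$$\lim_{\delta \to 0} \frac{\mu_0\bigl(B^\delta(z_1)\bigr)}{\mu_0\bigl(B^\delta(z_2)\bigr)} = \exp\Bigl(\frac12 \|z_2\|_E^2 - \frac12 \|z_1\|_E^2\Bigr). \tag{3.1}$$

This is the key estimate in the proof since it transfers questions about probability, naturally asked on the space $X$ of full measure under $\mu_0$, into statements concerning the Cameron-Martin norm of $\mu_0$, which is almost surely infinite under $\mu_0$.

We have

$$\frac{J^\delta(z_1)}{J^\delta(z_2)} = \frac{\int_{B^\delta(z_1)} \exp\bigl(-\Phi(u)\bigr)\,\mu_0(du)}{\int_{B^\delta(z_2)} \exp\bigl(-\Phi(v)\bigr)\,\mu_0(dv)} = \frac{\int_{B^\delta(z_1)} \exp\bigl(-\Phi(u) + \Phi(z_1)\bigr) \exp\bigl(-\Phi(z_1)\bigr)\,\mu_0(du)}{\int_{B^\delta(z_2)} \exp\bigl(-\Phi(v) + \Phi(z_2)\bigr) \exp\bigl(-\Phi(z_2)\bigr)\,\mu_0(dv)}.$$

By Assumption 2.1(iii), for any $u, v \in X$

$$-\Phi(u) + \Phi(v) \le L \|u - v\|_X,$$

where $L = L(r)$ with $r > \max\{\|u\|_X, \|v\|_X\}$. Therefore, setting $L_1 = L(\|z_1\|_X + \delta)$ and $L_2 = L(\|z_2\|_X + \delta)$, we can write

$$\frac{J^\delta(z_1)}{J^\delta(z_2)} \le e^{\delta(L_1 + L_2)}\, e^{-\Phi(z_1) + \Phi(z_2)}\, \frac{\mu_0\bigl(B^\delta(z_1)\bigr)}{\mu_0\bigl(B^\delta(z_2)\bigr)}.$$

Now, by (3.1), we have

$$\frac{J^\delta(z_1)}{J^\delta(z_2)} \le r_1(\delta)\, e^{\delta(L_1 + L_2)}\, e^{-I(z_1) + I(z_2)}$$

with $r_1(\delta) \to 1$ as $\delta \to 0$. Thus

$$\limsup_{\delta \to 0} \frac{J^\delta(z_1)}{J^\delta(z_2)} \le e^{-I(z_1) + I(z_2)}. \tag{3.2}$$

Similarly we obtain

$$\frac{J^\delta(z_1)}{J^\delta(z_2)} \ge r_2(\delta)\, e^{-\delta(L_1 + L_2)}\, e^{-I(z_1) + I(z_2)}$$

with $r_2(\delta) \to 1$ as $\delta \to 0$ and deduce that

$$\liminf_{\delta \to 0} \frac{J^\delta(z_1)}{J^\delta(z_2)} \ge e^{-I(z_1) + I(z_2)}. \tag{3.3}$$

Inequalities (3.2) and (3.3) give the desired result. □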

We note that similar methods of analysis show the following:

Corollary 3.3

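Let the Assumptions of Theorem 3.2 hold. Then for any $z \in E$

$$\lim_{\delta \to 0} \frac{J^\delta(z)}{\mu_0\bigl(B^\delta(0)\bigr)} = \frac{1}{Z}\, e^{-I(z)},$$

where $Z = \int_X \exp\bigl(-\Phi(u)\bigr)\,\mu_0(du)$.

Proof. Noting that we consider $\mu$ to be a probability measure and hence

$$\frac{J^\delta(z)}{\mu_0\bigl(B^\delta(0)\bigr)} = \frac{\frac{1}{Z} \int_{B^\delta(z)} \exp\bigl(-\Phi(u)\bigr)\,\mu_0(du)}{\mu_0\bigl(B^\delta(0)\bigr)}$$

with $Z = \int_X \exp\bigl(-\Phi(u)\bigr)\,\mu_0(du)$, arguing along the lines of the proof of the above theorem gives

$$\frac{1}{Z}\, r_1(\delta)\, e^{-\hat L \delta}\, e^{-I(z)} \le \frac{J^\delta(z)}{\mu_0\bigl(B^\delta(0)\bigr)} \le \frac{1}{Z}\, r_2(\delta)\, e^{\hat L \delta}\, e^{-I(z)}$$

with $\hat L = L(\|z\|_X + \delta)$ (where $L(\cdot)$ is as in Assumption 2.1) and $r_1(\delta), r_2(\delta) \to 1$ as $\delta \to 0$. The result then follows by taking $\limsup$ and $\liminf$ as $\delta \to 0$. □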

Proposition 3.4

Suppose Assumption 2.1 holds. Then the minimum of $I : E \to \mathbb{R}$ is attained for some element $z^* \in E$.

Proof. The existence of a minimiser of $I$ in $E$, under the given assumptions, is proved as Theorem 5.4 in [33] (and as Theorem 2.7 in [7] in the case that $\Phi$ is non-negative). □

The rest of this section is devoted to a proof of the result that MAP estimators can be characterised as minimisers of the Onsager-Machlup functional $I$ (Theorem 3.5 and Corollary 3.10).

Theorem 3.5

Suppose that Assumptions 2.1(ii) and (iii) hold. Assume also that there exists an $M \in \mathbb{R}$ such that $\Phi(u) \ge M$ for any $u \in X$.

  (i) Let $z^\delta = \arg\max_{z \in X} J^\delta(z)$. There is a $\bar z \in E$ and a subsequence of $\{z^\delta\}_{\delta > 0}$ which converges to $\bar z$ strongly in $X$.

  (ii) The limit $\bar z$ is a MAP estimator and a minimiser of $I$.

The proof of this theorem is based on several lemmas. We state and prove these lemmas first and defer the proof of Theorem 3.5 to the end of the section, where we also state and prove a corollary characterising the MAP estimators as minimisers of the Onsager-Machlup functional.

Lemma 3.6

Let . For any centred Gaussian measure on a separable Banach space we have

where and is a constant independent of and .

Proof. We first show that this is true for a centred Gaussian measure on with the covariance matrix in basis , where . Let , and . Define

(3.4)

and with the ball of radius and centre in . We have

for any and where is a centred Gaussian measure on with the covariance matrix (noting that ). By Anderson’s inequality for infinite dimensional spaces (see Theorem 2.8.10 of [5]) we have and therefore

and since is arbitrarily small the result follows for the finite-dimensional case.

To show the result for an infinite dimensional separable Banach space , we first note that , the orthogonal basis in the Cameron-Martin space of for , separates the points in , therefore is an injective map from into . Let and

Then, since is a Radon measure, for the balls and , for any , there exists large enough such that the cylindrical sets and satisfy and for [5], where denotes the symmetric difference. Let and and for , large enough so that . With we have

Since and converge to zero as , the result follows.     

Lemma 3.7

Suppose that , and converges weakly to in as . Then for any there exists small enough such that

Proof. Let be the covariance operator of , and the eigenfunctions of scaled with respect to the inner product of , the Cameron-Martin space of , so that forms an orthonormal basis in . Let be the corresponding eigenvalues and . Since converges weakly to in as ,

(3.5)

and as , for any , there exists sufficiently large and sufficiently small such that

where . By (3.5), for small enough we have and therefore

(3.6)

Let map to , and consider to be defined as in (3.4). Having (3.6), and choosing such that , for any we can write

As was arbitrary, the constant in the last line of the above equation can be made arbitrarily small, by making sufficiently small and sufficiently large. Having this and arguing in a similar way to the final paragraph of the proof of Lemma 3.6, the result follows. □

Corollary 3.8

Suppose that . Then

Lemma 3.9

Consider and suppose that converges weakly and not strongly to in as . Then for any there exists small enough such that

Proof. Since converges weakly and not strongly to , we have

and therefore for small enough there exists such that for any . Let , and , , be defined as in the proof of Lemma 3.7. Since as ,

(3.7)

Also, as for -almost every , and is an orthonormal basis in (closure of in ) [5], we have

(3.8)

Now, for any , let large enough such that . Then, having (3.7) and (3.8), one can choose small enough and large enough so that for and

Therefore, letting and be defined as in the proof of Lemma 3.7, we can write

if is small enough so that . Having this and arguing in a similar way to the final paragraph of the proof of Lemma 3.6, the result follows. □

Having these preparations in place, we can give the proof of Theorem 3.5.

Proof. (of Theorem 3.5) i) We first show is bounded in . By Assumption 2.1(ii) for any there exists such that

for any satisfying ; thus may be assumed to be a non-decreasing function of . This implies that

We assume that and then the inequality above shows that

(3.9)

noting that is independent of .

We can also write

which implies that for any and

(3.10)

Now suppose is not bounded in , so that for any there exists such that (with as ). By (3.9), (3.10) and definition of we have

implying that for any and corresponding

This contradicts the result of Lemma 3.6 (below) for