Posterior Consistency for Bayesian Inverse Problems

# Posterior Consistency for Bayesian Inverse Problems through Stability and Regression Results

## Abstract

In the Bayesian approach, the a priori knowledge about the input of a mathematical model is described via a probability measure. The joint distribution of the unknown input and the data is then conditioned, using Bayes’ formula, giving rise to the posterior distribution on the unknown input. In this setting we prove posterior consistency for nonlinear inverse problems: a sequence of data is considered, with diminishing fluctuations around a single truth and it is then of interest to show that the resulting sequence of posterior measures arising from this sequence of data concentrates around the truth used to generate the data. Posterior consistency justifies the use of the Bayesian approach very much in the same way as error bounds and convergence results for regularisation techniques do. As a guiding example, we consider the inverse problem of reconstructing the diffusion coefficient from noisy observations of the solution to an elliptic PDE in divergence form. This problem is approached by splitting the forward operator into the underlying continuum model and a simpler observation operator based on the output of the model.

In general, these splittings allow us to conclude posterior consistency provided a deterministic stability result for the underlying inverse problem and a posterior consistency result for the Bayesian regression problem with the push-forward prior.

Moreover, we prove posterior consistency for the Bayesian regression problem based on the regularity, the tail behaviour and the small ball probabilities of the prior.

###### ams:
35R30, 62C10, 62G20

## 1 Introduction

Many mathematical models used in science and technology contain parameters for which a direct observation is very difficult. A good example is subsurface geophysics. The aim in subsurface geophysics is the reconstruction of subsurface properties such as density and permeability given measurements on the surface. Using the laws of physics, these properties can be used as parameters of a forward model mapping them to the measurements which we subsequently call data.

Inverting such a relationship is non-trivial and lies in the focus of the area of inverse problems. Classically, these parameters are estimated by minimisation of a regularised least squares functional which is based on the data output mismatch (Tikhonov). The idea of this approach is to use optimisation techniques aiming at parameters that produce nearly the same noiseless output as the given noisy data while being not too irregular. However, it is difficult to quantify how the noise in the data translates into the uncertainty of the reconstructed parameters for this method. Uncertainty quantification is much more straightforward in the Bayesian approach. The basic idea of the Bayesian method is that not all parameter choices are a priori equally likely. Instead, the parameters are artificially treated as random variables by modelling their distribution using a priori knowledge. This distribution is accordingly called the prior. For a specific forward model and given the distribution of the observational noise, the parameters and the data can be treated as jointly varying random variables. Under mild conditions, the prior can then be updated by conditioning the parameters on the data.

The posterior is one of the main tools for making inference about the parameters. Possible estimates include approximation of the posterior mean or the maximum a posteriori (MAP) estimator. Moreover, it is possible to quantify the uncertainty of the reconstructed parameter by posterior variance or posterior probability of a set around for example an estimate of the parameters under consideration.

The main focus of this article lies on posterior consistency, which quantifies the quality of the resulting posterior in a thought experiment. As for any evaluation of an approach to inverse problems, an identical twin experiment is performed, that is, artificial data is generated for a fixed set of parameters. It is reasonable to expect that, under appropriate conditions, the posterior concentrates around this set of ’true’ parameters. Results of this type are called posterior consistency. They justify the Bayesian method by establishing that it recovers the ’true’ parameters, sometimes with a specific rate.

So far, there are only posterior consistency results available for linear forward models and mainly Gaussian priors [knapik2011bayesian, 2012arXiv1203.5753A, ray2012bayesian, florens2012regularized]. In this article, we prove posterior consistency of nonlinear inverse problems with explicit bounds on the rate. The main idea behind our posterior consistency results is to use stability properties of the deterministic inverse problem to reduce posterior consistency of a nonlinear inverse problem to posterior consistency of a Bayesian non-parametric regression problem. Our guiding example is the inverse problem of reconstructing the diffusion coefficient from measurements of the pressure. More precisely, we assume that the relation between the diffusion coefficient and the pressure satisfies the following partial differential equation (PDE) with Dirichlet boundary conditions

 $$\begin{cases} -\nabla\cdot(a\nabla p) = f(x) & \text{in } D, \\ p = 0 & \text{on } \partial D, \end{cases} \tag{1}$$

where $D$ is a bounded smooth domain in $\mathbb{R}^d$. For this guiding example the required stability results are due to [MR628945]. However, our methods are generally applicable to inverse problems with deterministic stability results. These are often available in the literature because they are also needed for convergence results of the Tikhonov regularisation (consider for example Theorem 10.4 in [inverseProblem]). Finally, we complete our reasoning by proving appropriate posterior consistency results for the corresponding Bayesian non-parametric regression problem.

In Section 2, we both review preliminary material and give a detailed exposition of our main ideas, steps and results. In Section 3, we provide novel posterior consistency results for Bayesian non-parametric regression. In order to evaluate the rate for the regression problem, we compare our rates to those for Gaussian priors for which optimal rates are known. These results are needed in order to obtain posterior consistency for the elliptic inverse problem in Section 4. We obtain explicit rates for priors based on a series expansion with uniformly distributed coefficients. In Section 5, we draw a conclusion and mention other inverse problems to which this approach is applicable. The appendix contains a detailed summary of relevant technical tools such as Gaussian measures and Hilbert scales which are used in the proofs of our main results.

The author would like to thank Professor Martin Hairer, Professor Andrew Stuart, Dr. Hendrik Weber and Sergios Agapiou for helpful discussions. SJV is grateful for the support of an ERC scholarship.

## 2 Preliminaries and Exposition to Posterior Consistency for Nonlinear Inverse Problems

Our crucial idea for proving posterior consistency for a nonlinear Bayesian inverse problem is the use of stability results which allow us to break it down to posterior consistency of a Bayesian regression problem. Because the proofs are quite technical, it is worth becoming familiar with the outline of our main ideas first. Therefore this section is intended to motivate, review and summarise our investigation of posterior consistency for a nonlinear inverse problem leaving technical details to the Sections 3 and 4. For the convenience of the reader we also repeat the relevant material on Bayesian inverse problems in Section 2.1 without proofs, thus making our exposition self-contained. In Section 2.2, we precisely define posterior consistency in this setting and place it within the literature. Subsequently, we introduce an elliptic inverse problem as guiding example for which we apply our method using stability results from [MR628945].

Finally, we conclude our exposition by giving a general abstract theorem of posterior consistency for nonlinear inverse problems with stability results in Section 2.4.

### 2.1 Summary of the Bayesian Approach to Inverse Problems on Hilbert Spaces

The key idea of Bayesian inverse problems is to model the input of a mathematical model, for example the initial condition of a PDE, as a random variable with distribution based on a priori knowledge. This distribution is called the prior, which is updated based on the observed data $y$. The resulting distribution is called the posterior and lies in the focus of the Bayesian approach.

We assume that the data is modelled as

 $$y = G(a) + \xi \tag{2}$$

with $G$ being the forward operator, a mapping between the Hilbert spaces $X$ and $Y$, and with the observational noise $\xi$. The aim of the inverse problem is the reconstruction of $a$ given the data $y$. Because $G$ might be non-injective and $\xi$ is unknown, the problem is not exactly solvable as stated. If the distribution of the noise is known, then $a$ and $y$ can be treated as jointly varying random variables. Under mild assumptions on the prior, the distribution of the noise and the forward operator, there exists a conditional probability measure on $a$, called the posterior $\mu^y$. It is an update of the prior using the data and models the a posteriori uncertainty. Therefore it can be viewed as the solution to the inverse problem itself. In this way it is possible to obtain different explanations of the data corresponding to different modes of the posterior.

In this article, we assume that the law of the observational noise is a mean-zero Gaussian with covariance $\Gamma$. In this case Bayes’ rule can be generalised for any mapping $G$ into a finite dimensional space $Y$. It follows that

 $$\frac{d\mu^y}{d\mu_0}(a) \propto \exp\Big(-\frac{1}{2}\|G(a)\|_\Gamma^2 + \langle y, G(a)\rangle_\Gamma\Big). \tag{3}$$

By $\|\cdot\|_\Gamma$ we denote the norm of the Cameron-Martin space of that is the closure of with respect to (see Appendix A for more details). A proper derivation of Equation (3), including the fact that it is also valid for functional data, and an appropriate introduction to Bayesian inverse problems can be found in [MR2652785] and [stuartchinanotes]. All in all the Bayesian approach can be summarised as

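 $$\begin{array}{ll} \text{Prior} & a \sim \mu_0 \\ \text{Noise} & \xi \sim N(0,\Gamma) \\ \text{Data} & y = G(a) + \xi \\ \text{Posterior} & \dfrac{d\mu^y}{d\mu_0}(a) \propto \exp\Big(-\frac{1}{2}\|G(a)\|_\Gamma^2 + \langle y, G(a)\rangle_\Gamma\Big) \end{array} \tag{4}$$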

As one can see in this example, the posterior can usually only be expressed implicitly as an unnormalised density with respect to the prior. Thus, in order to estimate the input parameters or perform inference using the posterior, it has to be probed using either

• sampling methods, such as MCMC which aim at generating draws from the posterior or

• variational methods for determining the location of an infinitesimal ball with maximal posterior probability.

The second approach is also called the maximum a posteriori probability (MAP) estimator. It can be viewed as an extension to many classical methods for inverse problems. For example, it can be linked to Tikhonov regularisation by considering a Gaussian prior and noise [2013dashtiMap]. This relates the choice of norms in the Tikhonov regularisation to the choice of the covariance of the prior and the noise.

These regularisation techniques can be justified by convergence results. Similarly, inference methods based on the posterior can be justified by posterior consistency, a concept which we introduce in the next section.

### 2.2 Posterior Consistency for Bayesian Inverse Problems

As for any approach to inverse problems, the Bayesian method can be evaluated by considering an identical twin experiment. To this end, a fixed input $a^\dagger$, called the ’truth’, is considered and data is generated using a sequence of forward models

 $$y_n = G_n(a^\dagger) + \xi_n$$

which might correspond to an increasing amount of data or diminishing noise. For each $n$ we denote the posterior corresponding to the prior $\mu_0$, the noise distribution $\mathcal{L}(\xi_n)$ and the forward operator $G_n$ by . Under appropriate assumptions, the posterior is well-defined for given by Bayes’ rule in Equation (4) for -a.e. and -a.e. (c.f. [stuartchinanotes]). This Bayes’ rule does not give rise to a well-defined measure for arbitrary . However, we will pose assumptions such that the normalising constant in the Bayes’ rule will be bounded above and below for every belonging to a particular set and -a.e. . We will denote these posteriors by . This sequence of inverse problems is called posterior consistent if the posteriors concentrate around the ’truth’ $a^\dagger$. We quantify the concentration by the posterior probability assigned to the ball $B^d_\epsilon(a^\dagger)$. Here $B^d_\epsilon(a)$ denotes a ball of radius $\epsilon$ centred at $a$ with respect to a metric $d$.

In the following we define this concept precisely and place it within the literature before closing this section by relating posterior consistency to small ball probabilities for the prior.

###### Definition.

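(Analogue to [MR1790007]) A sequence of Bayesian inverse problems is posterior consistent for $a^\dagger$ with rate $\epsilon_n$ and with respect to a metric $d$ if for

 $$y_n = G_n(a^\dagger) + \xi_n,$$

there exist a constant $M$ and a sequence $l_n \to 1$ such that

 $$\mathbb{P}_{\xi_n}\Big(\mu^{y_n}\big(B^d_{M\epsilon_n}(a^\dagger)\big) \ge l_n\Big) \to 1. \tag{5}$$

We simply say that the sequence is posterior consistent if the above holds for any fixed constant $\epsilon_n = \epsilon > 0$.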

Two important special cases of this definition are

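• posterior consistency in the small noise limit:

 $$\mathcal{L}(\xi_n) = \mathcal{L}\Big(\tfrac{1}{\sqrt{n}}\xi\Big) \quad\text{and}\quad G_n = G$$
• posterior consistency in the large data limit:

 $$\mathcal{L}(\xi_n) = \bigotimes_{i=1}^{n}\mathcal{L}(\xi) \quad\text{and}\quad G_n = \prod_{i=1}^{n} G_i = (G_1,\dots,G_n).$$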

In the above formulation $G_i$ corresponds to different measurements while $\mathcal{L}$ denotes the law of a random variable.
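The definition above can be illustrated numerically in the small noise limit. The following is a minimal sketch under assumptions that are not part of the article: a scalar unknown, a uniform prior on $[-1,1]$ discretised on a grid, a hypothetical nonlinear forward map $G(a) = a + a^3$, and unit-variance noise scaled by $1/\sqrt{n}$. It evaluates the unnormalised density from Bayes' rule on the grid and reports the posterior mass of a fixed ball around the truth.

```python
import numpy as np

def posterior_ball_mass(n, a_true=0.3, eps=0.05, seed=0):
    """Posterior mass of {a : |a - a_true| < eps} for y = G(a_true) + xi / sqrt(n)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-1.0, 1.0, 4001)        # support of a uniform prior (assumption)
    forward = lambda a: a + a**3               # hypothetical nonlinear forward map G
    y = forward(a_true) + rng.standard_normal() / np.sqrt(n)
    # Log-density w.r.t. the prior with noise variance 1/n:
    # -(n/2) * G(a)^2 + n * y * G(a)
    log_w = -0.5 * n * forward(grid) ** 2 + n * y * forward(grid)
    w = np.exp(log_w - log_w.max())            # subtract the max for numerical stability
    w /= w.sum()                               # normalise on the grid
    return w[np.abs(grid - a_true) < eps].sum()

for n in (1, 100, 10_000, 1_000_000):
    print(n, posterior_ball_mass(n))
```

In this toy setting the reported mass should approach one as $n$ grows, which is the behaviour that Equation (5) formalises for a fixed radius.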

There exists a variety of results for posterior consistency and inconsistency for statistical problems. Two important examples are the identification of a distribution from (often i.i.d.) samples or density estimation [postConIncon, MR1790007, MR2418663, BayesianNonBook]. The former is concerned with considering a prior on a set of probability distributions and the resulting posterior based on samples of one of these probability distributions. In [doob1949application], Doob proved that if a countable collection of samples almost surely allows the identification of the generating distribution, then the posterior is consistent for almost every probability distribution with respect to the prior. This very general result is not completely satisfactory because it does not provide a rate and the interest may lie in showing posterior consistency for every possible truth in a certain class. Moreover, some surprisingly simple examples of posterior inconsistency have been provided for example by considering distributions on [MR0158483]. The necessary bounds for posterior consistency (c.f. Equation (5)) can be obtained using the existence of appropriate statistical tests which are due to bounds on entropy numbers. These methods are used in a series of articles, for example in [MR1790007, shen2001rates, MR2418663, vaartBook]. This idea has also recently been applied to the Bayesian approach to linear inverse problems in [ray2012bayesian].

In general, posterior consistency for infinite dimensional inverse problems has mostly been studied for linear inverse problems in the small noise limit where the prior is either a sieve prior, a Gaussian or a wavelet expansion with uniformly distributed coefficients [knapik2011bayesian, 2012arXiv1203.5753A, ray2012bayesian, florens2012regularized]. Except for [ray2012bayesian], all these articles exploit the explicit structure of the posterior in the conjugate Gaussian setting, that is, both the prior and the posterior are Gaussian.

In contrast, we consider general priors, general forward operators and Gaussian noise in this article. Usually, the posterior has a density with respect to the prior as in Equation (4). However, it is possible to provide examples where both the prior and posterior are Gaussian but not absolutely continuous. This can be achieved using for example Proposition 3.3 in [severeillposedbay].

Subsequently, we assume that the posterior has a density with respect to the prior, implying that the posterior probability of a set is zero whenever the prior probability of this set is zero. Therefore it is necessary that $a^\dagger$ is in the support of the prior, giving rise to the following definition.

###### Definition.

The support of a measure $\mu$ in a metric space with metric $d$ is given by

 $$\operatorname{supp}_d(\mu) = \{x \mid \mu(B^d_\epsilon(x)) > 0 \ \ \forall \epsilon > 0\}.$$

It is natural to expect that the posterior consistency rate depends on the behaviour of $\mu_0(B^d_\epsilon(a^\dagger))$ as $\epsilon \to 0$. Asymptotics of this type are called small ball probabilities. We recommend [smallballsurv] as a good survey and refer the reader to [smallBallBib] for an up-to-date list of references. In this article, we consider algebraic rates of posterior consistency, that means we take $\epsilon_n = n^{-\kappa}$ in Definition 2.2. In order to establish these rates of posterior consistency, we consider small ball asymptotics of the following form

 $$\log \mu_0\big(B^d_\epsilon(a^\dagger)\big) \succsim -\epsilon^{-\rho},$$

where $\rho > 0$ and with the notation $\succsim$ as in Appendix A.

Both posterior consistency and the contraction rate depend on properties of the prior. This suggests that we should choose a prior with favourable posterior consistency properties. From a dogmatic point of view the prior is only supposed to be chosen to match the subjective a priori knowledge. In practice priors are often picked based on their computational performance whereas some of their parameters are adapted to represent the subjective knowledge. An example for this is the choice of the base measure and the intensity for a Dirichlet process [BayesianNonBook].

Finally, we would like to conclude this Section by mentioning that it has been shown in [MR829555] that posterior consistency is equivalent to the property that the posteriors corresponding to two different priors merge. The yet unpublished book [vaartBook] contains a more detailed discussion about the justification of posterior consistency studies for dogmatic Bayesians.

### 2.3 An Elliptic Inverse Problem as an Application of our Theory

The aim of this section is to set up the elliptic inverse problem for which we will prove posterior consistency (c.f. Section 2.2) both in the small noise and the large data limit. In a second step we describe the available stability results and how they can be used to reduce the problem of posterior consistency of a nonlinear inverse problem to that of a linear regression problem. We end this section by stating a special case of our posterior consistency results in Section 4.

Our results do not only apply to this particular elliptic inverse problem but to any nonlinear inverse problem with appropriate stability results (c.f. Section 2.4). However, the results for the elliptic inverse problem are of particular interest because it is used in oil reservoir simulations and the reconstruction of the groundwater flow [yeh1986review, MR628945, hansen2012inverse].

The forward model corresponding to our elliptic inverse problem is based on the relation between $a$ and $p$ given by the elliptic PDE in Equation (1).

We would like to highlight that the relation between $a$ and $p$ is nonlinear. Under the following assumptions, the solution operator to the above PDE is well-defined [MR1814364].

###### Assumption 1.

(Forward conditions) Suppose that

1. is compact, satisfies the exterior sphere condition (see [MR1814364]) and has a smooth boundary;

2. and f is smooth in ;

3. and in Equation (1).

Under these assumptions, the regularity results from [MR1814364] yield the following forward stability result.

###### Proposition 1.

If and satisfy Assumption 1 and are elements of for , then

 $$\|p(\cdot;a_1) - p(\cdot;a_2)\|_{C^{\alpha+1}} \le M\|a_1 - a_2\|_{C^{\alpha}}. \tag{6}$$

The inverse problem is concerned with the reconstruction of $a$ given the data

 $$y_n = G_n(a) + \xi,$$

which is related to $p$ in the following way.

###### Assumption 2.

The forward operator $G_n$ can be split into a composition of the solution operator $p$ and an observation operator $O_n$, that is

 $$G_n(a) = O_n(p(\cdot;a)). \tag{7}$$

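The Bayesian approach to the Elliptic Inverse Problem (EIP) can be summarised as

 $$\begin{array}{ll} \text{Model} & -\nabla\cdot(a\nabla p(\cdot;a)) = f(x) \ \text{in } D, \quad p = 0 \ \text{on } \partial D \\ \text{Prior} & \mu_0 \ \text{on } a \\ \text{Data} & y = G_n(a) + \xi_n = O_n(p(\cdot;a)) + \xi_n, \quad \xi_n \sim N(0,\Gamma_n) \\ \text{Posterior} & \dfrac{d\mu^{n}}{d\mu_0}(a) \propto \exp\Big(-\frac{1}{2}\|G_n(a)\|_{\Gamma_n}^2 + \langle y, G_n(a)\rangle_{\Gamma_n}\Big) \end{array} \tag{EIP}$$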

A rigorous Bayesian formulation of this inverse problem, with log-Gaussian priors and Besov priors has been given in [UncertaintyElliptic] and [Con3] respectively. In [Con4] the problem is considered with a prior based on a series expansion with uniformly distributed coefficients (see Section 4.1.1). In the same article, a generalised Polynomial Chaos (gPC) method is derived in order to approximate posterior expectations.
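To make the solution operator $a \mapsto p(\cdot;a)$ concrete, the following is a minimal sketch of a one-dimensional analogue of the model, $-(a p')' = f$ on $(0,1)$ with $p(0) = p(1) = 0$, discretised by a standard three-point finite difference scheme. The function name `solve_elliptic_1d`, the grid and the constant coefficients are illustrative assumptions and not taken from the article.

```python
import numpy as np

def solve_elliptic_1d(a_mid, f, h):
    """Solve -(a p')' = f on (0, 1) with p(0) = p(1) = 0 by finite differences.

    a_mid holds a at the m + 1 cell midpoints, f holds f at the m interior nodes.
    """
    main = (a_mid[:-1] + a_mid[1:]) / h**2     # diagonal: (a_{i-1/2} + a_{i+1/2}) / h^2
    off = -a_mid[1:-1] / h**2                  # off-diagonals: -a_{i+1/2} / h^2
    stiffness = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.solve(stiffness, f)

m = 99
h = 1.0 / (m + 1)
x = h * np.arange(1, m + 1)                    # interior nodes
f = np.ones(m)

p_const = solve_elliptic_1d(np.ones(m + 1), f, h)          # a = 1
p_double = solve_elliptic_1d(2.0 * np.ones(m + 1), f, h)   # a = 2
```

For $a = 1$ and $f = 1$ the exact solution is $p(x) = x(1-x)/2$. Doubling $a$ halves $p$ instead of doubling it, a simple way to see that the map from $a$ to $p$ is nonlinear.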

We consider posterior consistency as set up in Definition 2.2 in the following cases:

• the small noise limit with corresponding to a functional observation and an additive Gaussian random field as noise such that

 $$y_n = p(\cdot;a) + \frac{1}{\sqrt{n}}\xi;$$
• the large data limit with where are evaluations at . In this case the data takes the form

 $$y_n = \{p(x_i;a)\}_{i=1}^{n} + \xi_n.$$

Posterior consistency in both cases is based on a stability result which can be derived by taking $a$ as the unknown in Equation (1). This leads to the following hyperbolic PDE

 $$-\nabla a\cdot\nabla p - a\Delta p = f. \tag{8}$$

Imposing Assumption 1, it has been established that there exists a unique solution to this PDE without any additional boundary conditions:

###### Proposition 1 (Corollary 2 on page 220 in [MR628945]).

Suppose arises as a solution to Equation (1) with as diffusion coefficient satisfying Assumption 1. Then Equation (8) is uniquely solvable for any and such that

Moreover, if and satisfy these assumptions, then

 $$\|a_1 - a_2\|_\infty \le M\|a_1\|_{C^1}\cdot\|p(\cdot;a_1) - p(\cdot;a_2)\|_{C^2}.$$

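The stability result above and a change of variables (Theorem 1) imply

 $$\mu^{y_n}\big(B^{L^\infty}_{\epsilon}(a^\dagger)\big) = \tilde\mu^{y_n}\big(p(B^{L^\infty}_{\epsilon}(a^\dagger))\big) \ge \tilde\mu^{y_n}\big(B^{C^2}_{\frac{\epsilon}{M}}(p^\dagger)\big).$$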

This statement reduces posterior consistency of the EIP in $L^\infty$ to posterior consistency of the following Bayesian Regression Problem (BRP) in $C^2$

where is now treated as an variable, that is the prior and the posterior are now formulated on the pressure space. Moreover, denotes the push forward of the prior under . Note that for the BRP can also be viewed as the simplest linear inverse problem.

The required posterior consistency results for the BRP can be derived from those in Section 3 using interpolation inequalities. In this way we obtain posterior consistency results in Section 4 a special case of which is the following theorem:

###### Theorem 2 (1).

Suppose that the prior satisfies

Let the noise be given by . If and , then (EIP) is posterior consistent for any in the small noise limit with respect to the -norm for any .

This approach is not limited to the EIP as the following section shows.

### 2.4 Posterior Consistency through Stability Results

In Section 2.3, we presented our main idea, that is the reduction of the problem of posterior consistency of the EIP to that of the BRP. The main ingredients of this reduction are the stability result that was summarised in Proposition 1 and the posterior consistency results for the BRP. This approach is not limited to the EIP but is applicable to any inverse problem for which appropriate stability results are available. This is the case for many inverse problems such as the inverse scattering problem in [kuchment2012radon] or the Calderón problem in [alessandrini1988stable]. We would like to point out that these stability results are also crucial for proving the convergence of regularisation methods (see Theorem 10.4 in [inverseProblem]).

###### Theorem 3.

Suppose with and . Moreover, we assume that

• there exists a stability result of the form

 $$\|a_1 - a_2\|_X \le b\big(\|G(a_1) - G(a_2)\|_Y\big) \quad \text{where } b:\mathbb{R}^+\to\mathbb{R}^+ \text{ is increasing and } b(0) = 0;$$
• the sequence of Bayesian inverse problems is posterior consistent with respect to for all with rate .

Then is posterior consistent with respect to for all with rate

###### Proof.

Using the notation of Section 2.3, we denote the posteriors for the Bayesian inverse problems and by and , respectively. Then a change of variables (c.f. Theorem 1) implies

 $$\mu^{y}\big(B^{X}_{b(\epsilon_n)}(a^\dagger)\big) \ge \tilde\mu^{y}\big(B^{Y}_{\epsilon_n}(G(a^\dagger))\big).$$
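The set inclusion behind this inequality can be checked on a toy example. The sketch below assumes a hypothetical scalar forward map $G(a) = a^3$, for which the elementary bound $|a_1^3 - a_2^3| \ge |a_1 - a_2|^3/4$ gives the stability modulus $b(t) = (4t)^{1/3}$. The samples are drawn from a uniform distribution purely for illustration; the inclusion itself holds pointwise and therefore under any measure, including a posterior.

```python
import numpy as np

forward = lambda a: a**3                          # hypothetical forward map G
modulus = lambda t: (4.0 * t) ** (1.0 / 3.0)      # b with |a1 - a2| <= b(|G(a1) - G(a2)|)

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=100_000)    # illustrative draws (assumption)
a_true, eps = 0.2, 1e-2

# Membership in the data-space ball of radius eps around G(a_true) ...
in_data_ball = np.abs(forward(samples) - forward(a_true)) < eps
# ... and in the parameter-space ball of radius b(eps) around a_true.
in_param_ball = np.abs(samples - a_true) < modulus(eps)

mass_data_ball = in_data_ball.mean()
mass_param_ball = in_param_ball.mean()
```

Every sample in the data-space ball also lies in the parameter-space ball, so the mass of the latter dominates. This is how a rate $\epsilon_n$ in the data space is converted into a rate $b(\epsilon_n)$ for the unknown.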

## 3 Posterior Consistency for Bayesian Regression

As described in the previous section, for many inverse problems posterior consistency can be reduced to posterior consistency of a BRP (c.f. Section 2.4) using stability results. Thus, with the results obtained in this section we may conclude posterior consistency for apparently harder nonlinear inverse problems. For the EIP this is achieved by an application of the results in Theorems 3 and 7. Because the derivation of these two results is quite technical, we first give a summary and we recommend that the reader become familiar with both theorems but skip the technical details on the first read.

It is classical to model the response as

 $$y_n = O_n(p) + \xi_n.$$

In the following we consider two Bayesian regression models with

• and the noise is a Gaussian random field that is scaled to zero like or

• and corresponding to evaluations of a function with additive i.i.d. Gaussian noise.

These models represent the small noise and the large data limit, respectively.

We prove posterior consistency for both problems under weak assumptions on the prior. This is necessary because the BRPs resulting from nonlinear inverse problems are usually only given in an implicit form. For both cases we are able to obtain a rate assuming appropriate asymptotic lower bounds on the small ball probabilities of the prior around (see Section 2.2). Moreover, posterior consistency with respect to stronger norms can be obtained using prior or posterior regularity in combination with interpolation inequalities which is the subject of Section 3.3.

For the large data limit, that is , we obtain posterior consistency with respect to the -norm in Section 3.2. We assume an almost sure upper bound on a Hölder norm for the prior and an additional condition on the locations of the observations. The latter is justified by construction of a counterexample.

For the small noise limit, that is , we prove posterior consistency with respect to the Cameron-Martin norm of the noise in Section 3.1. This norm corresponds to the -norm in the Hilbert scale with respect to the covariance operator $\Gamma$. Both the Cameron-Martin norm and Hilbert scales are introduced in Appendix A. If an appropriate -norm is $\mu_0$-a.s. bounded, we obtain an explicit rate of posterior consistency. Otherwise, the rate is implicitly given as a low-dimensional optimisation problem. However, the condition for mere posterior consistency takes a simple form.

###### Corollary 1.

(See Corollary 5 for the case of general noise)
Suppose that the noise is given by and for and . Then the posterior is consistent in for any if and satisfy the following conditions

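 $$e > -1 + \sqrt{8 - 8\lambda} \ \ \text{if } \lambda \in [0,\tfrac{1}{2}], \qquad e > 2 - 2\lambda \ \ \text{if } \lambda \in (\tfrac{1}{2},1).$$
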
###### Remark 1.

If the prior is Gaussian, then the above inequality is satisfied because and the RHS is less than for any . Thus, the only remaining condition is .

###### Remark 2.

It is worth pointing out that for the large class of log-concave measures it is known that , for details consult [1974BorellLogConcave].

In the statistics literature regression models are mainly concerned with pointwise observations. Despite its name this is also true for functional data analysis (see [ferraty2006nonparametric]). However, the regression problem associated with can be viewed as a particular linear inverse problem. As described in the introduction, this has been studied for Gaussian priors in [knapik2011bayesian] and [2012arXiv1203.5753A]. Although our focus lies on establishing posterior consistency for general priors and non-linear models, we also obtain rates which in the special case of Gaussian priors are close to the optimal rates given in the references above.

### 3.1 The Small Noise Limit for Functional Response

In the following we study posterior consistency for a Bayesian regression problem assuming that the data takes values in the Hilbert space . In particular we deal with the regression model

 $$y = a + \frac{1}{\sqrt{n}}\xi \tag{9}$$

with and all being elements of . Moreover, we suppose that the observational noise is a Gaussian random field on and we assume that it satisfies the following assumption.

###### Assumption 3.

Suppose there is $\sigma_0$ such that $\Gamma^{\sigma}$ is trace-class for all $\sigma > \sigma_0$, that is

 $$\sum_{k=1}^{\infty} \lambda_k^{2\sigma} < \infty.$$

Imposing this assumption, it becomes possible to quantify the regularity of the observational noise in terms of the Hilbert scale defined with respect to the covariance operator (c.f. Appendix A). More precisely, this is possible due to Lemma 2 from [2012arXiv1203.5753A].

The regression model in Equation (9) is a special case of a general inverse problem as considered in Equation (2). Hence the corresponding posterior takes the following form (c.f. Equation (4)).

 $$\frac{d\mu^{y}}{d\mu_0} = Z(n,\xi)\exp\Big(-\frac{1}{2}n\|a\|_1^2 + n\langle a, y\rangle_1\Big). \tag{10}$$

Assuming that the data takes values in the Hilbert space , Equation (10) can simply be derived by an application of the Cameron-Martin lemma in combination with the conditioning lemma (Lemma 5.3 in [nonlinearsampling]). We generate data for a fixed ’truth’ $a^\dagger$

 $$y = a^\dagger + \frac{1}{\sqrt{n}}\xi. \tag{11}$$

By changing the normalising constant, we may rewrite the posterior in the following way

 $$\frac{d\mu^{y}}{d\mu_0} = Z(n,\xi)\exp\Big(-\frac{n}{2}\|a - a^\dagger\|_1^2 + \sqrt{n}\langle a - a^\dagger, \xi\rangle_1\Big). \tag{12}$$
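The step from Equation (10) to Equation (12) only absorbs terms that do not depend on $a$ into the normalising constant. The following sketch checks this numerically in a finite dimensional setting. The diagonal covariance, the dimension and the reading of $\langle\cdot,\cdot\rangle_1$ as $\langle\Gamma^{-1/2}\cdot,\Gamma^{-1/2}\cdot\rangle$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 50
gamma = 1.0 / np.arange(1, d + 1) ** 2             # eigenvalues of a diagonal covariance (assumption)

def inner1(u, v):
    """Cameron-Martin inner product <u, v>_1 = <Gamma^{-1/2} u, Gamma^{-1/2} v>."""
    return np.sum(u * v / gamma)

a_true = rng.standard_normal(d)
xi = np.sqrt(gamma) * rng.standard_normal(d)       # xi ~ N(0, Gamma)
y = a_true + xi / np.sqrt(n)                       # data as in Equation (11)

def exponent_10(a):
    """Exponent of Equation (10)."""
    return -0.5 * n * inner1(a, a) + n * inner1(a, y)

def exponent_12(a):
    """Exponent of Equation (12)."""
    diff = a - a_true
    return -0.5 * n * inner1(diff, diff) + np.sqrt(n) * inner1(diff, xi)

test_points = rng.standard_normal((4, d))
gaps = np.array([exponent_10(a) - exponent_12(a) for a in test_points])
# Completing the square predicts a gap that is independent of a.
expected_gap = 0.5 * n * inner1(a_true, a_true) + np.sqrt(n) * inner1(a_true, xi)
```

The difference of the two exponents is the same at every test point and equals $\frac{n}{2}\|a^\dagger\|_1^2 + \sqrt{n}\langle a^\dagger,\xi\rangle_1$, which is why only the normalising constant changes.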

The normalising constant is bounded above and below for for -a.e. . In fact, this holds under weaker assumptions than needed for our results.

###### Lemma 2.

Suppose for and . Then the normalising constant in Equation (12) is bounded for -a.s. and every above and away from zero.

###### Proof.

See Appendix D. ∎

The expression above suggests that the posterior concentrates in balls around the truth in the Cameron-Martin norm. First, we make this fact rigorous for priors which are a.s. uniformly bounded with respect to the -norm. In a second step, we assume that the prior has higher exponential moments. Considering Gaussian priors, we show that our rate is close to the optimal rate obtained in [knapik2011bayesian].

#### Posterior Consistency for Uniformly Bounded Priors

The following theorem can be viewed as a preliminary step towards Theorem 4 which contains our most general posterior consistency result for the Bayesian regression problem in the small noise limit. While containing our main ideas, the following result also establishes an explicit rate for posterior consistency which will be used for the EIP in Section 4.

###### Theorem 3.

Suppose that the noise satisfies Assumption 3 and

 $$\|a\|_s \le U \quad \mu_0\text{-a.s.} \tag{13}$$

for If and , then is consistent in . Additionally, if the following small ball asymptotic is satisfied

 $$\log \mu_0\big(B^1_\epsilon(a^\dagger)\big) \succsim -\epsilon^{-\rho}, \tag{14}$$

then this holds with rate for any with .

###### Proof.

Our proof is based on the observation that posterior consistency is implied by the existence of a sequence of subsets such that and

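 $$\sup_{\xi\in S_n}\frac{\mu^{y_n}\big(B^1_{\epsilon n^{-\kappa}}(a^\dagger)^c\big)}{\mu^{y_n}\big(B^1_{\epsilon n^{-\kappa}}(a^\dagger)\big)} \to 0 \quad \text{for } n\to\infty \tag{15}$$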

where . This implication holds because

 $$\mu^{y_n}\big(B^1_{\epsilon n^{-\kappa}}(a^\dagger)\big) + \mu^{y_n}\big(B^1_{\epsilon n^{-\kappa}}(a^\dagger)^c\big) = 1$$

and thus

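 $$\sup_{\xi\in S_n}\frac{\mu^{y_n}\big(B^1_{\epsilon n^{-\kappa}}(a^\dagger)^c\big)}{\mu^{y_n}\big(B^1_{\epsilon n^{-\kappa}}(a^\dagger)\big)} \le \delta \ \Rightarrow\ \frac{1}{1+\delta} \le \inf_{\xi\in S_n}\mu^{y_n}\big(B^1_{\epsilon n^{-\kappa}}(a^\dagger)\big) \tag{16}$$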

which together with implies posterior consistency, for details see Equation (5).

Fix . Then with as sufficiently slow. We notice that Lemma 2 implies that as . The remainder of the proof will be devoted to showing that Equation (15) holds. We bound by smoothing at the expense of

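 $$\begin{aligned} \big|\langle a - a^\dagger, \xi\rangle_1\big| &\le \Big|\Big\langle \Gamma^{-1+\frac{1-\sigma_0-\gamma}{2}}(a - a^\dagger),\ \Gamma^{\frac{\sigma_0-1+\gamma}{2}}\xi\Big\rangle_1\Big| \\ &\le \|a - a^\dagger\|_{1+\sigma_0+\gamma}\,\|\xi\|_{1-\sigma_0-\gamma} \\ &\le \|a - a^\dagger\|_{1+\sigma_0+\gamma}\,K'_n \qquad \forall \xi \in B^{1-\sigma_0}_{K'_n}(0). \end{aligned}$$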

Interpolating between and for (c.f. Lemma 1) yields

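 $$\big|\langle a - a^\dagger, \xi\rangle_1\big| \le K'_n\|a - a^\dagger\|_1^{\lambda}\,\|a - a^\dagger\|_s^{1-\lambda} \le K_n\|a - a^\dagger\|_1^{\lambda} \tag{17}$$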

with . An application of Equation (12) yields the following upper bound

 μy(B1ϵnκ(a†)) (18)

Similarly, we obtain the following upper bound

 μy(B1ϵn−κ(a†))≤

The expression in the exponential in Equation (12) can be rewritten as a function of which is decreasing on . If

 $$-\frac{1}{2(2-\lambda)} < -\kappa, \tag{19}$$

then for and large enough leading to

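 $$\mu^{y}\big(B^1_{\epsilon n^{-\kappa}}(a^\dagger)\big) \le Z(n,\xi)\exp\Big(-\epsilon^2 n^{1-2\kappa} + n^{\frac{1}{2}-\kappa\lambda}\epsilon^{\lambda}K_n\Big). \tag{20}$$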

We now derive sufficient conditions for to be the dominant term in the exponential in the Equations (18) and (20) implying Equation (15). This is the case if, in addition to Inequality (19),

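 $$\begin{aligned} 1 - 2\kappa &> \frac{1}{2} - \kappa\lambda \quad \text{and} \\ \log\mu_0\Big(B^1_{\frac{\epsilon n^{-\kappa}}{2}}(a^\dagger)\Big) &\gtrsim -n^{1-2\kappa} \end{aligned}$$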

hold. The first line is equivalent to Inequality (19) and using Inequality (14) the second line is implied by

 $$1 - 2\kappa > \kappa\rho. \tag{21}$$

Thus, the Inequalities (19) and (21) imply that is the dominant term in the Inequalities (18) and (20) establishing Equation (15). Letting concludes the proof. ∎
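The two inequalities used in this proof can be solved for $\kappa$ explicitly: Inequality (19) reads $\kappa < 1/(2(2-\lambda))$ and Inequality (21) reads $\kappa < 1/(2+\rho)$. The following sketch evaluates the resulting bound. The function name is hypothetical, and the sketch covers only these two constraints, not any further conditions of the theorem.

```python
def rate_exponent_bound(lam, rho):
    """Supremum of kappa permitted by Inequalities (19) and (21)."""
    bound_19 = 1.0 / (2.0 * (2.0 - lam))   # (19): kappa < 1 / (2 (2 - lambda))
    bound_21 = 1.0 / (2.0 + rho)           # (21): 1 - 2 kappa > kappa * rho
    return min(bound_19, bound_21)

# Small ball exponent rho = 2 with interpolation parameter lambda = 1/2:
# here (21) is the binding constraint.
print(rate_exponent_bound(0.5, 2.0))
```

Which of the two constraints is binding depends on whether $\rho$ exceeds $2 - 2\lambda$. For larger $\rho$ the small ball asymptotic of the prior limits the rate, otherwise the interpolation step does.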

#### Extension to the Case of Unbounded Priors

In the following we weaken the assumptions of Theorem 3 by assuming that the prior has exponential moments of . The price we pay is that the algebraic rate of convergence is given only implicitly, as the solution of a low-dimensional optimisation problem.

###### Theorem 4.

Suppose that the noise satisfies Assumption 3, the prior satisfies the small ball asymptotic

$$\log \mu_0\big(B^1_{\epsilon}(a^\dagger)\big) \succsim -\epsilon^{-\rho}$$

and for and for . If the following optimisation problem has a solution , then for any the posterior is consistent in for in with rate .

 12+ηpq−κλp <1−2κ (41) 12−η+(1−λ)qθ <1−2κ (42) ρκ

where .

###### Proof.

See Appendix C. ∎

###### Remark 3.

In general, might depend on for to hold. Therefore the rate might be improved by optimising over different .

Whereas the algebraic rate in Theorem 4 is implicit, the following corollary yields a simple condition implying posterior consistency.

###### Corollary 5.

Suppose that the noise satisfies Assumption 3, and for , and . If one of the following two conditions holds

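$$0<\lambda\le\tfrac12 \ \text{ and } \ e>-1+2\sqrt{2}\,\sqrt{1-\lambda} \qquad\text{or}\qquad \tfrac12<\lambda<1 \ \text{ and } \ e>2-2\lambda,$$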

then is posterior consistent for in .
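The two-branch condition of Corollary 5 is easy to evaluate numerically. The sketch below is an illustration only: the function name is hypothetical, and the arguments are simply the symbols $\lambda$ and $e$ as they appear in the displayed condition.

```python
import math


def corollary5_condition(lam: float, e: float) -> bool:
    """Check the sufficient condition displayed in Corollary 5.

    Branch 1: 0 < lam <= 1/2 and e > -1 + 2*sqrt(2)*sqrt(1 - lam)
    Branch 2: 1/2 < lam < 1  and e > 2 - 2*lam
    """
    if 0.0 < lam <= 0.5:
        return e > -1.0 + 2.0 * math.sqrt(2.0) * math.sqrt(1.0 - lam)
    if 0.5 < lam < 1.0:
        return e > 2.0 - 2.0 * lam
    raise ValueError("lam must lie in (0, 1)")
```

Both thresholds equal 1 at $\lambda=\tfrac12$, so the condition is continuous across the two branches. The first threshold is bounded by $2\sqrt{2}-1\approx 1.83$, so the value $e=2$ passes the check for every $\lambda\in(0,1)$.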

###### Proof.

It follows from the proof of Theorem 4 that we only have to find , , and such that the Inequalities (41), (42), (46), (51) and (54) are satisfied. Choosing as large as Inequality (41) permits, that is , extends the range of solutions of the other inequalities ((42) and (54)) containing . Similarly, choosing as large as (42) permits, that is , extends the range of solutions of Inequality (54). Letting in (54) yields

 p ≥1 λp <2(???) (1−λ)q
 (p−2)(p−1e(p−1)+(λ−1)p+1)2(p−1)

It remains to perform a case-by-case analysis. Starting from Inequality (22), the first two cases are and . For these cases we have to treat and separately in order to rearrange Equation (22) into a quadratic inequality in . The details are tedious but straightforward algebra. ∎

###### Remark 4.

Remarks 1 and 2 also apply to the more general Corollary 5.

#### Comparison for the Special Case of Gaussian Priors

In the special case of jointly diagonalisable prior and noise covariance, we evaluate the consistency rate in Theorem 4 by comparing it with the optimal rates obtained in [knapik2011bayesian]. By numerically solving the optimisation problem in Theorem 4, we indicate that our rates are close to the optimal rate.

In the following we first derive a Gaussian prior and noise for a regression problem before reformulating our result in this context. In a second step we reformulate the problem in the notation of [knapik2011bayesian] and state the corresponding result. We conclude this section with a comparison between the posterior consistency rate obtained in [knapik2011bayesian] and the results of this paper.

We suppose that the prior is Gaussian and that the covariance operators of the prior and of the noise are jointly diagonalisable over denoting an orthonormal basis of eigenvectors. Furthermore, we assume that the eigenvalues and of and satisfy

$$\mu_j = j^{-t}, \tag{23}$$

$$\lambda_j = j^{-r}, \tag{24}$$

respectively. The inner product of the Hilbert scale with respect to can now explicitly be written as

$$\langle x,y\rangle_r=\sum_{j=1}^{\infty}\mu_j^{-2r}x_jy_j,\qquad \|x\|_r^{2}=\sum_{j=1}^{\infty}\mu_j^{-2r}x_j^{2}.$$

Moreover, we remark that Assumption 3 is satisfied with . The covariance operator of on has eigenvalues which can be seen by denoting and calculating

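$$\begin{aligned}
\mathbb{E}^{\mu_0}\langle x,u\rangle_{H^s}\langle x,v\rangle_{H^s} &= \mathbb{E}^{\mu_0}\langle x,S_r^{2s}u\rangle_{H}\langle x,S_r^{2s}v\rangle_{H} \\
&= \langle C_0S_r^{2s}u,S_r^{2s}v\rangle_{H} = \langle S_r^{2s}C_0u,v\rangle_{H^s}.
\end{aligned}$$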

In order to conclude that is trace-class on , we need to impose that . In this case, we know from Example 2 and Proposition 3 in Section 18 of [MR1472736] that the small balls asymptotic

$$\log \mu_0\big(B^1_{\epsilon}(a^\dagger)\big) \succsim -\epsilon^{-\rho}$$

is satisfied for with .

For this problem we adapt Theorem 4 by optimising over in the appropriate range as described in Remark 3. Moreover, Fernique’s theorem [gaussianMeasureas] for Gaussian measures motivates setting and as discussed above.

###### Corollary 6.

Let the prior and the observational noise be specified as in Equation (23) and (24). If the following optimisation problem has a solution , then for any the posterior is consistent in for in with rate .

 12+ηpq−κλp < 1−2κ 12−η+(1−λ)qθ < 1−2κ 1t−r−1κ < 2θ 1t−r−1κ < 1−2κ (26) (ηpq−12)λp < −κ λp < 2 (1−λ)q < 2 (12−η)(1+12−(1−λ)q) < max(1−2κ,θ2)

where .

We now recast our problem reformulating it in the setting and notation of [knapik2011bayesian]. Letting be -valued white noise, our problem corresponds to recovering from

$$y=a+\frac{1}{\sqrt{n}}\,\Gamma^{\frac12}\zeta.$$

This problem is equivalent to

$$\tilde{Y}=Ka+\frac{1}{\sqrt{n}}\,\zeta \tag{27}$$

where Let be an orthonormal basis of eigenvectors of on . In order to adapt the notation of [knapik2011bayesian], we write and note that will be equivalent to the Cameron-Martin space which takes the form

$$H_1=S_rH_2:=\Big\{v\in H_2 \;\Big|\; v=\sum_i v_if_i \ \text{ s.t. } \ \sum_i v_i^{2}\,i^{2r}<\infty\Big\}$$

with orthonormal basis . Moreover, let be defined as

$$Ke_k:=\Gamma^{-\frac12}e_k=\lambda_kk^{r}f_k.$$

In order to match Assumption 3.1 in [knapik2011bayesian], we have to bound the eigenvalues of as follows

$$M^{-1}i^{-p}\le\kappa_i\le M\,i^{-p}.$$

We determine these eigenvalues by noting that

$$\langle K^{T}f_k,e_j\rangle_{H_2}=\langle f_k,Ke_j\rangle_{H_2}=\delta_{jk}\,\lambda_kk^{r}.$$

The calculation above yields

$$K^{T}Kf_k=(\lambda_kk^{r})^{2}f_k$$

and thus

$$\kappa_k=(\lambda_kk^{r})^{2}\asymp 1=n^{0}\;\Rightarrow\; p=0.$$

As in Equation (3.1.3), we identify the covariance operator of on through its eigenvalues

$$\tilde{\lambda}_k\asymp k^{-2t+2r}.$$

By Theorem 4.1 in [knapik2011bayesian] the posterior contraction rate is given by

$$n^{-\frac{\alpha\wedge\beta}{1+2\alpha+2p}}$$

where (compare Equation (3.5) in [knapik2011bayesian]) and is the regularity of the truth. As above, we suppose that resulting in

 κopt=t−r−122(t−r)−1.
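The contraction exponent quoted above from Theorem 4.1 in [knapik2011bayesian], namely $(\alpha\wedge\beta)/(1+2\alpha+2p)$, can be evaluated directly. The helper below is a small illustrative sketch with a hypothetical function name; it only encodes that formula and does not reproduce the identification of $\alpha$ and $\beta$ with $t$ and $r$.

```python
def contraction_exponent(alpha: float, beta: float, p: float) -> float:
    """Exponent kappa in the rate n^(-kappa), kappa = min(alpha, beta) / (1 + 2 alpha + 2 p).

    alpha: prior regularity, beta: regularity of the truth,
    p: decay exponent of the eigenvalues of K^T K (p = 0 in the setting above).
    """
    if alpha <= 0.0 or beta <= 0.0 or p < 0.0:
        raise ValueError("expected alpha > 0, beta > 0 and p >= 0")
    return min(alpha, beta) / (1.0 + 2.0 * alpha + 2.0 * p)
```

For fixed $\alpha$ and $p$, the exponent is non-decreasing in $\beta$ and stops improving once $\beta\ge\alpha$, so extra smoothness of the truth beyond the prior regularity does not improve the rate.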

In Figure 1, we use numerical optimisation to compare our rate to the optimal one for with varying .

Considering only Inequality (26), which is essential to our approach since it implies that the Cameron-Martin term dominates the prior measure (cf. Equation (40)), yields

$$\kappa_{\mathrm{Possible}}=\frac{t-r-1}{2(t-r)-1}$$

which coincides with the rate obtained by solving the optimisation problem in Corollary 6. Thus, even if we are able to improve our bounds, there is a genuine gap between our rate and the optimal rate in the case of Gaussian priors. The reason for this gap is that Theorem 4 is applicable to any prior satisfying the stated regularity and small ball assumptions. Nevertheless, Figure 1 indicates that the obtained rates are quite close. In contrast, [knapik2011bayesian] is only applicable to Gaussian priors, for which the Gaussian structure of the prior and the posterior is explicitly used.
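As a quick sanity check on the expression for the rate implied by Inequality (26), the sketch below reads that inequality as $\kappa/(t-r-1)<1-2\kappa$ and confirms that its boundary is $(t-r-1)/(2(t-r)-1)$. This reading of (26), the requirement $t-r>1$ and the function names are assumptions made for illustration.

```python
def kappa_possible(t: float, r: float) -> float:
    """Boundary exponent (t - r - 1) / (2 (t - r) - 1), assuming t - r > 1."""
    d = t - r
    if d <= 1.0:
        raise ValueError("this sketch assumes t - r > 1")
    return (d - 1.0) / (2.0 * d - 1.0)


def satisfies_26(kappa: float, t: float, r: float) -> bool:
    """Check kappa / (t - r - 1) < 1 - 2 kappa (our reading of Inequality (26))."""
    d = t - r
    if d <= 1.0:
        raise ValueError("this sketch assumes t - r > 1")
    return kappa / (d - 1.0) < 1.0 - 2.0 * kappa
```

For example, with $t=4$ and $r=1$ the boundary value is $0.4$: exponents slightly below it satisfy the inequality and exponents slightly above it do not.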

### 3.2 Pointwise Observations in the Large Data Limit

We consider the following non-parametric Bayesian regression problem

$$y_i=a(x_i)+\xi_i,\qquad i=1,\dots,n \tag{28}$$

with , a bounded domain and . We assume that a prior is supported on resulting in a posterior of the form

$$\frac{d\mu^{y_n}}{d\mu_0}\propto\exp\!\left(-\sum_{i=1}^{n}\frac{(a(x_i)-y_i)^{2}}{2\sigma^{2}}\right).$$
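The unnormalised log-density of the posterior with respect to the prior is just a scaled sum of squared residuals. The following minimal sketch evaluates it for a candidate function; the function name and the representation of the candidate as a Python callable are assumptions made for illustration, and $\sigma$ is taken to be the noise standard deviation appearing in the formula above.

```python
from typing import Callable, Sequence


def log_density_ratio(a: Callable[[float], float],
                      xs: Sequence[float],
                      ys: Sequence[float],
                      sigma: float) -> float:
    """Return -sum_i (a(x_i) - y_i)^2 / (2 sigma^2), i.e. the logarithm of
    d(mu^{y_n}) / d(mu_0) at `a`, up to an additive normalising constant."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    squared_residuals = sum((a(x) - y) ** 2 for x, y in zip(xs, ys))
    return -squared_residuals / (2.0 * sigma ** 2)
```

A candidate that interpolates the data exactly attains the maximal value 0. In a sampling scheme only differences of this quantity between two candidates matter, so the unknown normalising constant plays no role.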

Subsequently, we will prove posterior consistency for this problem for the case