On well-posedness of Bayesian data assimilation and inverse problems in Hilbert space

Abstract.

A Bayesian inverse problem on an infinite dimensional separable Hilbert space with the whole state observed is well posed when the prior state distribution is a Gaussian probability measure and the data error distribution is a cylindrical Gaussian measure whose covariance has positive lower bound. If the state distribution and the data error distribution are equivalent Gaussian probability measures, then the Bayesian posterior measure is not well defined. If the state covariance and the data error covariance commute, then the Bayesian posterior measure is well defined for all data vectors if and only if the data error covariance has positive lower bound, and, if the data error covariance does not have positive lower bound, the set of data vectors for which the Bayesian posterior measure is not well defined is dense.
Keywords: White noise, Bayesian inverse problems, cylindrical measures
AMS Subject Classification: 65J22, 62C10, 60B11

Supported partially by the Czech Science Foundation grant 13-34856S and NSF grant 1216481

1. Introduction

Data assimilation and the solution of inverse problems on infinite dimensional spaces are of interest as a limit case of discrete problems of an increasing resolution and thus increasing dimension. Computer implementation is by necessity finite dimensional, yet studying a discretized problem (such as a finite difference or finite element discretization) as an approximation of an infinite-dimensional one (such as a partial differential equation) is a basic principle of numerical mathematics. This principle has recently found use in Bayesian inverse problems as well, for much the same reason; important insights in high-dimensional probability are obtained by considering it in the light of infinite dimension. See [19, 20] for an introduction and an extensive discussion.

Bayesian data assimilation and inverse problems are closely linked; the prior distribution acts as a regularization, and the maximum a posteriori probability (MAP) estimate delivers a single solution to the inverse problem. In the Gaussian case, the prior becomes a type of Tikhonov regularization and the MAP estimate is essentially a regularized least squares solution. Since there is no Lebesgue measure in infinite dimension, the standard probability density does not exist, and the MAP estimate needs to be understood in a generalized sense [5, 7].
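In finite dimension this link is explicit: with a Gaussian prior and a Gaussian data error, the negative log posterior is, up to a constant, a Tikhonov-regularized least squares functional whose minimizer (the MAP estimate) solves a linear system. The following sketch illustrates this; the dimension, the covariances B and R, and the data vector are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                   # state/data dimension (illustrative)

# Illustrative prior mean, prior covariance B, and data error covariance R;
# the observation operator is the identity, as in the setting of the paper.
u_b = rng.standard_normal(n)
B = np.diag(1.0 / np.arange(1, n + 1) ** 2)
R = 0.5 * np.eye(n)                     # covariance with positive lower bound 0.5
d = rng.standard_normal(n)              # observed data

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

def J(u):
    """Regularized least squares functional = negative log posterior up to a constant."""
    return (u - u_b) @ Binv @ (u - u_b) + (d - u) @ Rinv @ (d - u)

# MAP estimate from the normal equations (B^{-1} + R^{-1}) u = B^{-1} u_b + R^{-1} d.
u_map = np.linalg.solve(Binv + Rinv, Binv @ u_b + Rinv @ d)

# The closed-form MAP estimate is not beaten by random perturbations of itself.
print(all(J(u_map) <= J(u_map + 0.1 * rng.standard_normal(n)) for _ in range(100)))  # True
```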

However, unlike in a finite dimension, even the simplest problems are often ill-posed when the data is infinite dimensional. It is often assumed that the data likelihood is such that the problem is well posed, or, more specifically, that the data space is finite dimensional, e.g., [5, 7, 9, 10, 12, 19, 20]. Well-posedness of the infinite dimensional problem affects the performance of stochastic filtering algorithms for finite dimensional approximations; it was observed computationally [3, Sec. 4.1] that the performance of the ensemble Kalman filter and the particle filter does not deteriorate with increasing dimension when the state distribution approaches a Gaussian probability measure, but the curse of dimensionality sets in when the state distribution approaches white noise. A related theoretical analysis was recently developed in [1].

It was noted in [16] that Bayesian filtering is well defined only for some values of observations when the data space is infinite dimensional. In [1], necessary and sufficient conditions were given in the Gaussian case for the Bayesian inverse problem to be well posed for all data vectors a.s. with respect to the data distribution, which was understood as a Gaussian measure on a larger space than the given data space. However, in the typical case studied here, such a random data vector is a.s. not in the given data space, so the conditions in [1] are not informative in our setting. See Remark 9 for more details.

In this paper, we study perhaps the simplest case of a Bayesian inverse problem on an infinite dimensional separable Hilbert space: the whole state is observed, the observation operator (the forward operator in inverse problems nomenclature) is the identity, and both the state distribution and the data error distribution (which enters the data likelihood) are Gaussian. The state distribution is a standard, σ-additive Gaussian measure, but the data error distribution is allowed to be only a weak Gaussian measure [2], that is, only a finitely additive cylindrical measure [18]. This way, we may give up the σ-additivity of the data error distribution, but the data vectors are in the given data space. A weak Gaussian measure is σ-additive if and only if its covariance has finite trace. White noise, with covariance bounded away from zero on an infinite dimensional space, is an example of a weak Gaussian measure that is not σ-additive.

It is straightforward that when the data error covariance has positive lower bound, the least squares problem, the Kalman filter, and the Bayesian posterior are all well defined (Theorems 2, 3, and 6). The main results of this paper consist of the study of the converse when the state distribution is a Gaussian measure:

  1. Example 1: If the state covariance and the data error covariance are the same operator with finite trace, then the least squares problem is not well posed for some data vectors.

  2. Example 4: If the state distribution and the data error distribution are equivalent Gaussian measures on infinite dimensional space, then the posterior measure is not well defined.

  3. Theorem 7: If the state covariance and the data error covariance commute, then the posterior measure is well defined for all data vectors if and only if the data error covariance has positive lower bound.

  4. Corollary 8: If the state covariance and the data error covariance commute and the data error covariance does not have positive lower bound, then the set of data vectors for which the posterior measure is not well defined is dense.

The paper is organized as follows. In Section 2, we recall some background and establish notation. The well-posedness of data assimilation as a least squares problem is considered in Section 3.1, the well-posedness of Kalman filter formulas in Section 3.2, and the well-posedness of the Bayesian setting in terms of measures in Section 4.

2. Notation

We denote by $H$ a separable Hilbert space with a real-valued inner product $\langle \cdot ,\cdot \rangle$ and the norm $\| \cdot \|$. We assume that $H$ has infinite dimension, though all statements hold in finite dimension as well. We denote by $[H]$ the space of all bounded linear operators from $H$ to $H$. We say that $A\in [H]$ has positive lower bound if

$$\langle Au,u\rangle \geq \alpha \| u\| ^{2}$$

for some $\alpha >0$ and all $u\in H$. We write $A>0$ when $A$ is symmetric, i.e., $A=A^{\ast }$, where $A^{\ast }$ denotes the adjoint operator to $A$, and $A$ has positive lower bound. The operator $A\in [H]$ is positive semidefinite if

$$\langle Au,u\rangle \geq 0$$

for all $u\in H$, and we use the notation $A\geq 0$ when $A$ is symmetric and positive semidefinite. We say that $A\geq 0$ is a trace class operator if

$$\operatorname{Tr}A=\sum_{k=1}^{\infty }\langle Ae_{k},e_{k}\rangle <\infty ,$$

where $\{ e_{k}\} _{k=1}^{\infty }$ is a total orthonormal set in $H$; the value of $\operatorname{Tr}A$ does not depend on the choice of $\{ e_{k}\}$.

We denote by $R(H)$ the space of all random variables on $H$, i.e., if $X\in R(H)$, then $X$ is a measurable mapping from a probability space $(\Omega ,\mathcal{F},P)$ to $(H,\mathcal{B}(H))$, where $\mathcal{B}(H)$ denotes the Borel $\sigma$-algebra on $H$. A weak random variable is a mapping

$$W:(\Omega ,\mathcal{F},P)\rightarrow (H,\mathcal{C}(H)),$$

where $(\Omega ,\mathcal{F},P)$ is a general probability space, and $\mathcal{C}(H)$ denotes the algebra of cylindrical sets on $H$, such that:

  1. for all $C\in \mathcal{C}(H)$, it holds that $W^{-1}(C)\in \mathcal{F}$, and

  2. for any $n\in \mathbb{N}$ and any $u_{1},\dots ,u_{n}\in H$, the mapping

    $$\omega \in \Omega \mapsto (\langle W(\omega ),u_{1}\rangle ,\dots ,\langle W(\omega ),u_{n}\rangle )\in \mathbb{R}^{n}$$

    is measurable, i.e., $(\langle W,u_{1}\rangle ,\dots ,\langle W,u_{n}\rangle )$ is an $n$-dimensional real random vector.

We denote by $R_{w}(H)$ the space of all weak random variables on $H$. Obviously, when $\dim H<\infty$, then $R_{w}(H)=R(H)$, i.e., weak random variables are interesting only if the dimension of the state space is infinite. A weak random variable $W$ has weak Gaussian distribution (also called a cylindrical measure), denoted $W\sim N(m,Q)$, where $m\in H$ is the mean of $W$, and $Q\geq 0$ is the covariance of $W$, if, for any finite set $u_{1},\dots ,u_{n}\in H$, the random vector $(\langle W,u_{1}\rangle ,\dots ,\langle W,u_{n}\rangle )$ has multivariate Gaussian distribution with mean

$$(\langle m,u_{1}\rangle ,\dots ,\langle m,u_{n}\rangle )$$

and covariance matrix $\Sigma$ with entries

$$\Sigma _{ij}=\langle Qu_{i},u_{j}\rangle ,$$

where $\Sigma _{ij}$ denotes the element of the matrix in the $i$-th row and the $j$-th column. It can be shown that $W$ is measurable, i.e., $W\in R(H)$, if and only if the covariance $Q$ is trace class, e.g., [2, Theorem 6.2.2].
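The role of the trace condition can be seen on finite dimensional truncations: for a centered weak Gaussian distribution with a diagonal covariance, the expected squared norm of a sample equals the sum of the eigenvalues, so it stays bounded as the dimension grows exactly when the covariance is trace class, while for white noise (identity covariance) it grows like the dimension and no limiting $H$-valued random variable exists. The sketch below only illustrates this dichotomy; the eigenvalue sequences are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_sq_norm(eigenvalues, n_samples=2000):
    """Monte Carlo estimate of E ||W||^2 for W ~ N(0, diag(eigenvalues))."""
    samples = rng.standard_normal((n_samples, len(eigenvalues))) * np.sqrt(eigenvalues)
    return (samples ** 2).sum(axis=1).mean()

for n in (10, 100, 1000):
    trace_class = 1.0 / np.arange(1, n + 1) ** 2   # trace -> pi^2/6, finite
    white_noise = np.ones(n)                        # trace = n, unbounded
    print(n, expected_sq_norm(trace_class), expected_sq_norm(white_noise))
# The first estimate stabilizes near pi^2/6 ~ 1.645 as n grows;
# the second grows like n, so white noise is not sigma-additive on H.
```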

For further background on probability on infinite dimensional spaces and cylindrical measures see, e.g., [4, 6, 21].

3. Data assimilation

Suppose that is a dynamical system defined on a separable Hilbert space. Data assimilation uses observations of the form

where , and , to estimate sequentially the states of the dynamical system. In each data assimilation cycle, i.e., for each , a forecast state is combined with the observation to produce a better estimate of the true state. Hence, one data assimilation cycle is an inverse problem [19, 20, 5]. Since we are interested in one data assimilation cycle only, we drop the time index for the rest of the paper.

3.1. 3DVAR

The 3DVAR method is based on a minimization of the cost function

(1)

where is a known background covariance operator and is a data noise covariance. If the state space is finite dimensional, and the matrix is regular, then the norm on the right-hand side of (1) is defined by

However, when the state space is infinite dimensional, the inverse of a compact linear operator is unbounded and only densely defined. It is then natural to extend the quadratic forms on the right-hand side of (1) as

(2)

where i.e., denotes the image of the operator , and

(3)

Obviously, the 3DVAR cost function can attain the value +∞, and, even worse, it is not hard to construct an example in which the cost is infinite for every state.

Example 1.

Suppose that is a trace class operator,

(4)

and

(5)

If then

because is a linear subspace of and (4), so

When , then

using (5), so, again,

Therefore, for all
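A mechanism of this kind can be seen on finite dimensional truncations: when the background and data noise covariances are the same diagonal trace class operator and the data vector lies in the state space but outside the Cameron-Martin space of the covariance, even the minimum of the truncated 3DVAR cost grows without bound as the truncation dimension increases. The eigenvalues 1/k² and data coefficients 1/k below are illustrative choices, not the exact construction of the example.

```python
import numpy as np

def truncated_3dvar_min(n):
    """Minimum over u of the truncated cost
       sum_{k<=n} [ u_k^2 + (d_k - u_k)^2 ] / lam_k,
    with background and data noise covariances diag(lam_k), lam_k = 1/k^2,
    forecast 0, and data coefficients d_k = 1/k (so d is in H, but outside the
    Cameron-Martin space, since sum d_k^2 / lam_k diverges)."""
    k = np.arange(1, n + 1)
    lam = 1.0 / k ** 2
    d = 1.0 / k
    u_star = d / 2.0                    # per-coordinate minimizer
    return np.sum((u_star ** 2 + (d - u_star) ** 2) / lam)

for n in (10, 100, 1000, 10000):
    print(n, truncated_3dvar_min(n))    # grows like n/2: the limit cost is +infinity for every u
```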

Naturally, minimization of the cost function does not make sense unless there is at least one state at which the cost is finite. Fortunately, we can formulate a sufficient condition under which this is fulfilled.

Theorem 2.

If at least one of the background covariance and the data noise covariance has positive lower bound, then, for any possible values of the forecast and the data, there exists at least one state at which the cost function is finite.

Proof.

Without loss of generality, assume that has positive lower bound. Hence,

for any combinations of and Therefore, given

for any

3.2. KF and EnKF

The ensemble Kalman filter [8, 11], which is based on the Kalman filter [14, 13], is one of the most popular assimilation methods. The key part of both methods is the Kalman gain operator

(6)

where and . If the data space is finite dimensional, then the matrix is positive definite, and the inverse is well defined. However, when the data space is infinite dimensional, the operator may not be defined on the whole space, since the inverse of a trace class operator is only densely defined. Therefore, the KF update equation

where , may not be applicable, since there is no guarantee that the term is defined. Yet, similarly to 3DVAR, there is a sufficient condition under which the Kalman filter algorithm is well defined for any possible values.
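In finite dimensional truncations the inverse in the Kalman gain always exists, but when the data noise covariance has eigenvalues tending to zero, the operator being inverted becomes arbitrarily ill conditioned as the dimension grows, which is the finite dimensional counterpart of the gain not being defined on the whole space in the limit. The sketch below contrasts a data noise covariance with positive lower bound against one with decaying eigenvalues; the diagonal model and the eigenvalue sequences are illustrative choices.

```python
import numpy as np

def gain_condition_number(n, r_eigs):
    """Condition number of Q_f + R appearing in the Kalman gain K = Q_f (Q_f + R)^{-1},
    with identity observation operator and diagonal forecast covariance Q_f = diag(1/k^2)."""
    q = 1.0 / np.arange(1, n + 1) ** 2
    s = q + r_eigs
    return s.max() / s.min()

for n in (10, 100, 1000):
    bounded_below = np.ones(n)                    # R = I: positive lower bound
    decaying = 1.0 / np.arange(1, n + 1) ** 2     # R with eigenvalues -> 0
    print(n, gain_condition_number(n, bounded_below), gain_condition_number(n, decaying))
# The first condition number stays below 2 for all n; the second grows like n^2,
# so (Q_f + R)^{-1} does not extend to a bounded operator in the limit.
```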

Theorem 3.

If the data noise covariance has positive lower bound, then the Kalman gain operator is defined on the whole space .

Proof.

If has positive lower bound, then the linear operator has positive lower bound as well, because the forecast covariance is a covariance operator and is therefore positive semidefinite. The statement now follows from the fact that an operator with positive lower bound has an inverse defined on the whole space. ∎

4. Bayesian approach

Denote by the distribution of the forecast. Bayes’ theorem prescribes the analysis measure by

(7)

for all if

(8)

where the given function is called a data likelihood. If the distributions of the forecast and of the data noise are both Gaussian, then

(9)

where

With the natural convention that , we have

When both state and data spaces are finite dimensional, condition (8) is fulfilled for any possible value of the observation. Unfortunately, when both spaces are infinite dimensional, condition (8) may not be fulfilled, as shown in the next example.

Example 4.

Assume that and, belongs to the Cameron-Martin space of i.e., . If the measures and are equivalent, then both have the same Cameron-Martin space, and

so
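To see concretely how the normalization in (8) can fail, consider the simplest special case of this situation: a centered forecast measure and a data error distribution with the same diagonal trace class covariance, with eigenvalues $q_{k}$ in a common eigenbasis, and the unnormalized Gaussian likelihood $\exp(-\tfrac{1}{2}\langle R^{-1}(d-u),d-u\rangle)$, writing $R$ for the common covariance and $d$ for the data. This is an assumed form of the likelihood and an illustrative special case, not the full generality of the example. A standard Gaussian integral gives the per-coordinate contribution to the normalization constant, and the resulting infinite product vanishes for every data vector:

$$\int_{\mathbb{R}} e^{-\frac{(d_{k}-u_{k})^{2}}{2q_{k}}}\,\frac{e^{-\frac{u_{k}^{2}}{2q_{k}}}}{\sqrt{2\pi q_{k}}}\,du_{k}=\frac{1}{\sqrt{2}}\,e^{-\frac{d_{k}^{2}}{4q_{k}}},\qquad\prod_{k=1}^{\infty}\frac{1}{\sqrt{2}}\,e^{-\frac{d_{k}^{2}}{4q_{k}}}=0\quad\text{for every }d\in H.$$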

Remark 5.

Another data likelihood is proposed in [19],

where is defined by (8), and is defined by (9). This definition leads to the analysis distribution

(10)

for all . That is, any data vector for which condition (8) fails is ignored.

Obviously, the Bayesian update (7) is useful only if the set

is empty. A sufficient condition for the set to be empty is similar to the conditions under which the previously mentioned assimilation techniques are well defined.

Theorem 6.

The set

where and the data likelihood is defined by (9), is empty if the operator has positive lower bound.

Proof.

The operator has positive lower bound, so the data likelihood function

is positive for any , and it follows that

for all . ∎

In the special case when both forecast and data covariances commute, we can show that this condition is also necessary for the set to be empty. Recall that operators and commute when

Theorem 7.

Assume that , and operators and commute. Then

for all if and only if the operator has positive lower bound.

Proof.

Without loss of generality assume that The operators and are symmetric, commute, and is compact, so there exists a total orthonormal set of common eigenvectors,

e.g., [15, Lemma 8], [17], [22, Section II.10].

For any , denote by its Fourier coefficient with respect to the orthonormal set ,

Using this notation,

and

Denote

Since for any , is a monotone sequence of functions on . The functions are continuous and therefore measurable, and by the monotone convergence theorem,

(11)

For each , the random variable has distribution, which we denote by . Additionally,

and, in particular, the random variables and are independent unless Then,

for all and, using Fubini’s theorem,

Now (11) yields that

and, since the measure is absolutely continuous with respect to the Lebesgue measure on ,

(12)

where

i.e., is the density of a -distributed random variable.

The identity

with

allows us to write (12) in the form

By standard properties of the normal distribution,

for each , so

(13)

where we used the computation

The infinite product (13) is nonzero if and only if the following sum converges,

(14)

To conclude the proof, we need to show that (14) converges if and only if

(15)

First, the equivalence

(16)

follows from the limit comparison test because

when

(17)

If condition (17) is not satisfied, then both sums in (16) diverge. If , then the sum

and this sum converges because is trace class.

Further, if , then

since are the Fourier coefficients of . On the other hand, when , we will construct such that and

Since there exists a subsequence such that

and we define

with

The element lies in the unit ball because

while

where the last equality follows immediately from (17).

Therefore, the sum (14) is finite for all if and only if
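The dichotomy can also be checked numerically on diagonal truncations. Assume, as an illustration, a diagonal forecast covariance with eigenvalues q_k, a diagonal data noise covariance with eigenvalues r_k in the same eigenbasis, and the unnormalized Gaussian likelihood; the per-coordinate normalization factor is then sqrt(r_k/(q_k+r_k)) exp(−d_k²/(2(q_k+r_k))). With r_k bounded below, the log of the truncated normalization constant converges for every square-summable data vector, while with r_k → 0 one can pick a square-summable data vector for which it diverges to −∞. The eigenvalue sequences and data vectors below are illustrative choices, not the construction used in the proof.

```python
import numpy as np

def log_normalization(q, r, d):
    """log of prod_k sqrt(r_k / (q_k + r_k)) * exp(-d_k^2 / (2 (q_k + r_k))),
    the truncated normalization constant for a diagonal forecast covariance with
    eigenvalues q, a diagonal data noise covariance with eigenvalues r, and
    data coefficients d (unnormalized Gaussian likelihood assumed)."""
    return 0.5 * np.sum(np.log(r / (q + r))) - 0.5 * np.sum(d ** 2 / (q + r))

n = 10 ** 6
k = np.arange(1, n + 1, dtype=float)
q = 1.0 / k ** 4                          # trace class forecast covariance

# Data noise covariance with positive lower bound: finite limit for every d in H.
print(log_normalization(q, np.ones(n), 1.0 / k))

# Data noise covariance with eigenvalues -> 0 (no positive lower bound):
r = 1.0 / k ** 2
d_good = 1.0 / k ** 2                     # sum d_k^2 / (q_k + r_k) converges: posterior defined
d_bad = np.sqrt((q + r) / k)              # in H, but sum d_k^2 / (q_k + r_k) is the harmonic sum
print(log_normalization(q, r, d_good))                       # converges to a finite value
print(np.sum(d_bad ** 2), log_normalization(q, r, d_bad))    # d_bad is in H; log normalization -> -inf as n grows
```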

The construction of the element at the end of the previous proof may be generalized, and it implies the following interesting corollary.

Corollary 8.

Assume that operators and commute. The set

where and the data likelihood is defined by (9), is dense in if the operator does not have positive lower bound.

Proof.

To show that is dense it is sufficient to show that for each and any

Let and . As in the previous proof, denote by the total orthonormal set such that

Because

there exists a subsequence such that

for all Now, define such that

so

Using the same method as in the proof of Theorem 7,

if and only if

(18)

However, when

(19)

then

Therefore, using the same arguments as in the previous proof, if (19) is not satisfied, then

Therefore, the sum on the left-hand side of (18) diverges, and
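A perturbation of this kind is easy to write down explicitly in the diagonal setting of the previous sketch: a vector supported on a sparse subsequence of eigendirections can have arbitrarily small norm while making the divergent sum grow without bound, so adding it to any admissible data vector destroys well-posedness. The indices, eigenvalues, and scaling below are illustrative choices, not the exact construction of the proof.

```python
import numpy as np

# Diagonal setting as in the previous sketch: q_k = 1/k^4, r_k = 1/k^2 (r_k -> 0).
n = 2 ** 20
k = np.arange(1, n + 1, dtype=float)
q, r = 1.0 / k ** 4, 1.0 / k ** 2

eps = 1e-3
z = np.zeros(n)
idx = 2 ** np.arange(1, 21) - 1              # sparse subsequence k_j = 2^j (0-based indices)
z[idx] = eps * np.sqrt(q[idx] + r[idx])      # z_{k_j}^2 = eps^2 (q_{k_j} + r_{k_j})

print(np.linalg.norm(z))                     # < eps: an arbitrarily small perturbation
print(np.sum(z ** 2 / (q + r)) / eps ** 2)   # = number of subsequence terms (20 here),
                                             # which grows without bound as n increases
```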

Remark 9.

It was shown in [1, Theorem 3.8] that if the spectrum of consists of countably many eigenvalues (plus zero), then the analysis measure is well defined and absolutely continuous with respect to the forecast measure for -almost all if and only if is trace class. The data error distribution is understood as a Gaussian measure on a Hilbert space . The space has a weaker topology than , and draws from may not be in .

For example, suppose that is trace class and . Then is trace class, is white noise with the Cameron-Martin space , and, since the measure of the Cameron-Martin space of is zero, a data vector drawn from on is in fact -a.s. not in . Consequently, in this example, [1, Theorem 3.8] is not informative about the well-posedness of the analysis measure when , where the problem is formulated. In the present approach, the data error distribution is only a cylindrical measure on , and the analysis measure is well defined and absolutely continuous with respect to the forecast measure , for all .

References

  1. S. Agapiou, O. Papaspiliopoulos, D. Sanz-Alonso, and A. M. Stuart, Importance sampling: computational complexity and intrinsic dimension, arxiv:1511.06196, 2015, Version 2, January 2017, submitted to Statistical Science.
  2. A. V. Balakrishnan, Applied functional analysis, Springer-Verlag, New York, 1976. MR MR0470699 (57 #10445)
  3. Jonathan D. Beezley, High-dimensional data assimilation and morphing ensemble Kalman filters with applications in wildfire modeling, Ph.D. thesis, University of Colorado Denver, 2009. MR MR2713197
  4. Vladimir I. Bogachev, Gaussian measures, Mathematical Surveys and Monographs, Vol. 62, American Mathematical Society, Providence, RI, 1998. MR MR1642391 (2000a:60004)
  5. S. L. Cotter, M. Dashti, J. C. Robinson, and A. M. Stuart, Bayesian inverse problems for functions and applications to fluid mechanics, Inverse Problems 25 (2009), no. 11, 115008, 43. MR 2558668
  6. Giuseppe Da Prato and Jerzy Zabczyk, Stochastic equations in infinite dimensions, Encyclopedia of Mathematics and its Applications, vol. 44, Cambridge University Press, Cambridge, 1992. MR MR1207136 (95g:60073)
  7. Masoumeh Dashti, Stephen Harris, and Andrew Stuart, Besov priors for Bayesian inverse problems, Inverse Problems and Imaging 6 (2012), no. 2, 183–200. MR 2942737
  8. Geir Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, Journal of Geophysical Research 99 (1994), no. C5, 10143–10162.
  9. Bamdad Hosseini, Well-posed Bayesian inverse problems with infinitely-divisible and heavy-tailed prior measures, arXiv:1609.07532, 2016.
  10. Bamdad Hosseini and Nilima Nigam, Well-posed Bayesian inverse problems: Priors with exponential tails, arXiv:1604.02575, 2016.
  11. P.L. Houtekamer and Herschel L. Mitchell, Data assimilation using an ensemble Kalman filter technique, Monthly Weather Review 126 (1998), no. 3, 796–811.
  12. M.A. Iglesias, K. Lin, S. Lu, and A.M. Stuart, Filter based methods for statistical linear inverse problems, arXiv:1512.01955.
  13. R. E. Kalman and R. S. Bucy, New results in filtering and prediction theory, Transactions of the ASME – Journal of Basic Engineering 83 (1961), 95–108.
  14. Rudolph Emil Kalman, A new approach to linear filtering and prediction problems, Transactions of the ASME – Journal of Basic Engineering, Series D 82 (1960), 35–45.
  15. Ivan Kasanický, Ensemble Kalman filter on high and infinite dimensional spaces, Ph.D. thesis, Charles University, Faculty of Mathematics and Physics, 2016.
  16. S. Lasanen, Measurements and infinite-dimensional statistical inverse theory, Proc. Appl. Math. Mech. 7 (2007), no. 1, 1080101–1080102.
  17. F. S. Levin, An introduction to quantum theory, Cambridge University Press, Cambridge, 2002. MR 1922993
  18. Laurent Schwartz, Radon measures on arbitrary topological spaces and cylindrical measures, Published for the Tata Institute of Fundamental Research, Bombay by Oxford University Press, London, 1973, Tata Institute of Fundamental Research Studies in Mathematics, No. 6. MR 0426084
  19. A. M. Stuart, Inverse problems: A Bayesian perspective, Acta Numer. 19 (2010), 451–559. MR 2652785
  20. Andrew M. Stuart, The Bayesian approach to inverse problems, arXiv:1302.6989, 2013.
  21. N. N. Vakhania, V. I. Tarieladze, and S. A. Chobanyan, Probability distributions on Banach spaces, Mathematics and its Applications (Soviet Series), vol. 14, D. Reidel Publishing Co., Dordrecht, 1987, Translated from the Russian and with a preface by Wojbor A. Woyczynski. MR 1435288
  22. John von Neumann, Mathematical foundations of quantum mechanics, Princeton University Press, Princeton, 1955, Translated by Robert T. Beyer. MR 0066944