On well-posedness of Bayesian data assimilation and inverse problems in Hilbert space
Abstract.
The Bayesian inverse problem on an infinite dimensional separable Hilbert space
with the whole state observed is well posed when the prior state distribution
is a Gaussian probability measure and the data error distribution is a cylindrical
Gaussian measure whose covariance has positive lower bound. If the state
distribution and the data error distribution are equivalent Gaussian probability
measures, then the Bayesian posterior measure is not well defined. If the
state covariance and the data error covariance commute, then the Bayesian
posterior measure is well defined for all data vectors if and only if the data
error covariance has positive lower bound, and the set of data vectors for
which the Bayesian posterior measure is not well defined is dense if the data
error covariance does not have positive lower bound.
Keywords:
White noise, Bayesian inverse problems, cylindrical measures
AMS Subject Classification: 65J22, 62C10, 60B11
1. Introduction
Data assimilation and the solution of inverse problems on infinite dimensional spaces are of interest as a limit case of discrete problems of increasing resolution and thus increasing dimension. Computer implementation is by necessity finite dimensional, yet studying a discretized problem (such as a finite difference or finite element discretization) as an approximation of an infinite-dimensional one (such as a partial differential equation) is a basic principle of numerical mathematics. This principle has recently found use in Bayesian inverse problems as well, for much the same reason; important insights in high-dimensional probability are obtained by considering it in the light of infinite dimension. See [19, 20] for an introduction and an extensive discussion.
Bayesian data assimilation and inverse problems are closely linked; the prior distribution acts as a regularization and the maximum a posteriori probability (MAP) estimate delivers a single solution to the inverse problem. In the Gaussian case, the prior becomes a type of Tikhonov regularization and the MAP estimate is essentially a regularized least squares solution. Since there is no Lebesgue measure in infinite dimension, the standard probability density does not exist, and the MAP estimate needs to be understood in a generalized sense [5, 7].
However, unlike in finite dimension, even the simplest problems are often ill-posed when the data is infinite dimensional. It is often assumed that the data likelihood is such that the problem is well posed, or, more specifically, that the data space is finite dimensional, e.g., [5, 7, 9, 10, 12, 19, 20]. Well-posedness of the infinite dimensional problem affects the performance of stochastic filtering algorithms for finite dimensional approximations; it was observed computationally [3, Sec. 4.1] that the performance of the ensemble Kalman filter and the particle filter does not deteriorate with increasing dimension when the state distribution approaches a Gaussian probability measure, but the curse of dimensionality sets in when the state distribution approaches white noise. A related theoretical analysis was recently developed in [1].
It was noted in [16] that Bayesian filtering is well defined only for some values of observations when the data space is infinite dimensional. In [1], necessary and sufficient conditions were given in the Gaussian case for the Bayesian inverse problem to be well posed for all data vectors a.s. with respect to the data distribution, which was understood as a Gaussian measure on a larger space than the given data space. However, in the typical case studied here, such a random data vector is a.s. not in the given data space, so the conditions in [1] are not informative in our setting. See Remark 9 for more details.
In this paper, we study perhaps the simplest case of a Bayesian inverse problem on an infinite dimensional separable Hilbert space: the whole state is observed, the observation operator (the forward operator in inverse problems nomenclature) is the identity, and both the state distribution and the data error distribution (which enters in the data likelihood) are Gaussian. The state distribution is a (standard, $\sigma$-additive) Gaussian measure, but the data error distribution is allowed to be only a weak Gaussian measure [2], that is, only a finitely additive cylindrical measure [18]. This way, we may give up the $\sigma$-additivity of the data error distribution, but the data vectors are in the given data space. A weak Gaussian measure is $\sigma$-additive if and only if its covariance has finite trace. White noise, with covariance bounded away from zero on an infinite dimensional space, is an example of a weak Gaussian measure that is not $\sigma$-additive.
It is straightforward that when the data error covariance has positive lower bound, then the least squares, Kalman filter, and the Bayesian posterior are all well defined (Theorems 2, 3, and 6). The main results of this paper consist of the study of the converse when the state is a Gaussian measure:

Example 1: If the state covariance and the data error covariance are the same operator with finite trace, then the least squares are not well posed for some data vectors.

Example 4: If the state distribution and the data error distribution are equivalent Gaussian measures on infinite dimensional space, then the posterior measure is not well defined.

Theorem 7: If the state covariance and the data error covariance commute, then the posterior measure is well defined for all data vectors if and only if the data error covariance has positive lower bound.

Corollary 8: If the state covariance and the data error covariance commute and the data error covariance does not have positive lower bound, then the set of data vectors for which the posterior measure is not well defined is dense.
The paper is organized as follows. In Section 2, we recall some background and establish notation. The well-posedness of data assimilation as a least squares problem is considered in Section 3.1, the well-posedness of the Kalman filter formulas in Section 3.2, and the well-posedness of the Bayesian setting in terms of measures in Section 4.
2. Notation
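We denote by $\mathcal{H}$ a separable Hilbert space with a real-valued inner product denoted by $\langle\cdot,\cdot\rangle$ and the norm $|\cdot|$. We assume that $\mathcal{H}$ has infinite dimension, though all statements hold in finite dimension as well. We denote by $[\mathcal{H}]$ the space of all bounded linear operators from $\mathcal{H}$ to $\mathcal{H}$. We say that $R\in[\mathcal{H}]$ has positive lower bound if
$$\langle Ru,u\rangle\geq\alpha|u|^{2}$$
for some $\alpha>0$ and all $u\in\mathcal{H}$. We write $R>0$ when $R$ is symmetric, i.e., $R=R^{*}$, where $R^{*}$ denotes the adjoint operator to $R$, and $R$ has positive lower bound. The operator $R$ is positive semidefinite if
$$\langle Ru,u\rangle\geq 0$$
for all $u\in\mathcal{H}$, and we use the notation $R\geq 0$ when $R$ is symmetric and positive semidefinite. We say that $R\geq 0$ is a trace class operator if
$$\operatorname{Tr}R=\sum_{i=1}^{\infty}\langle Re_{i},e_{i}\rangle<\infty,$$
where $\{e_{i}\}$ is a total orthonormal set in $\mathcal{H}$; the value $\operatorname{Tr}R$ does not depend on the choice of $\{e_{i}\}$.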
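We denote by $\mathcal{L}(\mathcal{H})$ the space of all random variables on $\mathcal{H}$, i.e., if $X\in\mathcal{L}(\mathcal{H})$, then $X$ is a measurable mapping from a probability space $(\Omega,\mathcal{S},\mathbb{P})$ to $(\mathcal{H},\mathcal{B}(\mathcal{H}))$, where $\mathcal{B}(\mathcal{H})$ denotes the Borel $\sigma$-algebra on $\mathcal{H}$. A weak random variable $W$ is a mapping
$$W:(\Omega,\mathcal{S},\mathbb{P})\to(\mathcal{H},\mathcal{C}(\mathcal{H})),$$
where $(\Omega,\mathcal{S},\mathbb{P})$ is a general probability space, and $\mathcal{C}(\mathcal{H})$ denotes the algebra of cylindrical sets on $\mathcal{H}$, such that:

for all $A\in\mathcal{C}(\mathcal{H})$, it holds that $W^{-1}(A)\in\mathcal{S}$, and

for any $n\in\mathbb{N}$ and any $u_{1},\ldots,u_{n}\in\mathcal{H}$, the mapping
$$\omega\mapsto\left(\langle u_{1},W(\omega)\rangle,\ldots,\langle u_{n},W(\omega)\rangle\right)$$
is measurable, i.e., is an $n$-dimensional real random vector.
We denote by $\mathcal{L}_{w}(\mathcal{H})$ the space of all weak random variables on $\mathcal{H}$. Obviously, when $\dim\mathcal{H}<\infty$, then $\mathcal{L}_{w}(\mathcal{H})=\mathcal{L}(\mathcal{H})$, i.e., weak random variables are interesting only if the dimension of the state space is infinite. A weak random variable $W$ has weak Gaussian distribution (also called a cylindrical Gaussian measure), denoted $W\sim N(m,R)$, where $m\in\mathcal{H}$ is the mean of $W$ and $R\in[\mathcal{H}]$, $R\geq 0$, is the covariance of $W$, if, for any finite set $\{u_{1},\ldots,u_{n}\}\subset\mathcal{H}$, the random vector $(\langle u_{1},W\rangle,\ldots,\langle u_{n},W\rangle)$ has multivariate Gaussian distribution with mean
$$\left(\langle u_{1},m\rangle,\ldots,\langle u_{n},m\rangle\right)$$
and covariance matrix $\Sigma$,
$$[\Sigma]_{ij}=\langle Ru_{i},u_{j}\rangle,$$
where $[\Sigma]_{ij}$ denotes the element of the matrix $\Sigma$ in the $i$-th row and the $j$-th column. It can be shown that $W$ is measurable, i.e., $W\in\mathcal{L}(\mathcal{H})$, if and only if the covariance $R$ is trace class, e.g., [2, Theorem 6.2.2].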
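The trace class criterion can be illustrated numerically. The following is only a minimal sketch, assuming covariances that are diagonal in a fixed orthonormal basis; the eigenvalue sequences $i^{-2}$ and $1$ and the helper name `partial_trace` are illustrative choices made here, not taken from the text.

```python
def partial_trace(eigenvalue, n):
    """Sum of the first n eigenvalues of a covariance diagonal in a fixed basis."""
    return sum(eigenvalue(i) for i in range(1, n + 1))

# Assumed example spectra (hypothetical, for illustration only).
trace_class = lambda i: 1.0 / i**2   # summable eigenvalues: finite trace
white_noise = lambda i: 1.0          # identity covariance: bounded away from zero

for n in (10, 100, 1000):
    # First column stabilizes near pi^2 / 6, second grows like n.
    print(n, partial_trace(trace_class, n), partial_trace(white_noise, n))
```

The partial traces of the first spectrum stay bounded, while those of the identity covariance grow with the truncation dimension, which is the finite dimensional shadow of white noise being only a weak Gaussian measure.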
3. Data assimilation
Suppose that $\{X^{(t)}\}$ is a dynamical system defined on a separable Hilbert space $\mathcal{H}$. Data assimilation uses observations of the form
$$Y^{(t)}=H^{(t)}X^{(t)}+W^{(t)},$$
where $H^{(t)}\in[\mathcal{H}]$, and $W^{(t)}\sim N(0,R^{(t)})$, to estimate sequentially the states of the dynamical system. In each data assimilation cycle, i.e., for each $t$, a forecast state $X^{(t),f}$ is combined with the observation $Y^{(t)}$ to produce a better estimate $X^{(t),a}$ of the true state $X^{(t)}$. Hence, one data assimilation cycle is an inverse problem [19, 20, 5]. Since we are interested in one data assimilation cycle only, we drop the time index for the rest of the paper.
3.1. 3DVAR
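The 3DVAR method is based on a minimization of the cost function
(1) $$J(x)=|x-x^{f}|_{B^{-1}}^{2}+|y-x|_{R^{-1}}^{2},$$
where $B$ is a known background covariance operator and $R$ is a data noise covariance. If the state space is finite dimensional, and the matrix $B$ is regular, then the norm on the right-hand side of (1) is defined by
$$|x|_{B^{-1}}^{2}=\langle B^{-1}x,x\rangle.$$
However, when the state space is infinite dimensional, the inverse of a compact linear operator is unbounded and only densely defined. It is then natural to extend the quadratic forms on the right-hand side of (1) as
(2) $$|x|_{B^{-1}}^{2}=\begin{cases}|B^{-1/2}x|^{2}&\text{if }x\in B^{1/2}(\mathcal{H}),\\ \infty&\text{otherwise},\end{cases}$$
where $B^{1/2}(\mathcal{H})=\operatorname{Im}(B^{1/2})$, i.e., $B^{1/2}(\mathcal{H})$ denotes the image of the operator $B^{1/2}$, and
(3) $$|x|_{R^{-1}}^{2}=\begin{cases}|R^{-1/2}x|^{2}&\text{if }x\in R^{1/2}(\mathcal{H}),\\ \infty&\text{otherwise}.\end{cases}$$
Obviously, the 3DVAR cost function can attain the value infinity, and, even worse, it is not hard to construct an example when $J(x)=\infty$ for all $x\in\mathcal{H}$.
Example 1.
Naturally, a minimization of $J$ does not make sense unless there is at least one $x\in\mathcal{H}$ such that $J(x)<\infty$. Fortunately, we can formulate a sufficient condition under which this requirement is fulfilled.
Theorem 2.
If at least one of the operators $B$ and $R$ has positive lower bound, then for any possible values of $x^{f}$ and $y$ there exists at least one $x\in\mathcal{H}$ such that $J(x)<\infty$.
Proof.
Without loss of generality, assume that $R$ has positive lower bound. Then $R^{1/2}(\mathcal{H})=\mathcal{H}$. Hence,
$$|y-x|_{R^{-1}}^{2}<\infty$$
for any combination of $x$ and $y$. Therefore, given $x^{f}$,
$$J(x^{f})=|x^{f}-x^{f}|_{B^{-1}}^{2}+|y-x^{f}|_{R^{-1}}^{2}=|y-x^{f}|_{R^{-1}}^{2}<\infty$$
for any $y\in\mathcal{H}$. ∎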
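The role of the lower bound can be seen in a truncated computation. The sketch below assumes that the data noise covariance is diagonal in an orthonormal basis with eigenvalues $r_i$, so that the weighted squared norm of a vector with Fourier coefficients $y_i$ is $\sum_i y_i^2/r_i$; the particular sequences and the name `weighted_norm_sq` are hypothetical choices for illustration.

```python
def weighted_norm_sq(y, r, n):
    """Partial sum of y_i^2 / r_i over the first n modes."""
    return sum(y(i) ** 2 / r(i) for i in range(1, n + 1))

# Assumed example data (hypothetical).
y = lambda i: 1.0 / i                # square-summable coefficients, so y has finite norm
r_decaying = lambda i: 1.0 / i**2    # trace class covariance, no positive lower bound
r_bounded = lambda i: 1.0            # white noise covariance, lower bound 1

for n in (10, 100, 1000):
    # With r_decaying each term equals 1, so the partial sums grow like n;
    # with r_bounded they stay below pi^2 / 6.
    print(n, weighted_norm_sq(y, r_decaying, n), weighted_norm_sq(y, r_bounded, n))
```

With the decaying spectrum the data term of the cost function blows up as the resolution increases, although the vector itself has finite norm; with the bounded spectrum it stays finite, in line with Theorem 2.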
3.2. KF and EnKF
The ensemble Kalman filter [8, 11], which is based on the Kalman filter [14, 13], is one of the most popular assimilation methods. The key part of both methods is the Kalman gain operator
(6) $$K=P(P+R)^{-1},$$
where $P$ is the forecast covariance and $R$ is the data noise covariance. If the data space is finite dimensional, then the matrix $P+R$ is positive definite, and the inverse $(P+R)^{-1}$ is well defined. However, when the data space is infinite dimensional, the operator $K$ may not be defined on the whole space $\mathcal{H}$, since the inverse of a trace class operator is only densely defined. Therefore, the KF update equation
$$x^{a}=x^{f}+K(y-x^{f}),$$
where $x^{f},y\in\mathcal{H}$, may not be applicable since there is no guarantee that the term $K(y-x^{f})$ is defined. Yet, similarly to 3DVAR, there is a sufficient condition under which the Kalman filter algorithm is well defined for any possible values.
Theorem 3.
If the data noise covariance $R$ has positive lower bound, then the Kalman gain operator $K$ is defined on the whole space $\mathcal{H}$.
Proof.
If $R$ has positive lower bound, then the linear operator $P+R$ has positive lower bound as well, because $P$ is a covariance operator, so $\langle Pu,u\rangle\geq 0$ and $\langle(P+R)u,u\rangle\geq\langle Ru,u\rangle$ for all $u\in\mathcal{H}$. The statement now follows from the fact that an operator with positive lower bound has an inverse defined on the whole space. ∎
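A truncated, diagonal version of the argument can be checked numerically. This is a sketch under the assumption that the forecast and data noise covariances share an eigenbasis, with eigenvalues $p_i$ and $r_i$; then the inverse of their sum acts on mode $i$ as multiplication by $1/(p_i+r_i)$. The spectra and the name `inverse_bound` are illustrative assumptions.

```python
def inverse_bound(p, r, n):
    """Largest 1 / (p_i + r_i) over the first n modes: norm of the truncated inverse."""
    return max(1.0 / (p(i) + r(i)) for i in range(1, n + 1))

# Assumed example spectra (hypothetical).
p = lambda i: 1.0 / i**2            # trace class forecast covariance
r_bounded = lambda i: 0.5           # data noise with positive lower bound 0.5
r_decaying = lambda i: 1.0 / i**2   # data noise without positive lower bound

for n in (10, 100, 1000):
    # Bounded by 1 / 0.5 = 2 in the first case, growing like n^2 / 2 in the second.
    print(n, inverse_bound(p, r_bounded, n), inverse_bound(p, r_decaying, n))
```

When the data noise eigenvalues are bounded below, the inverse is uniformly bounded in the truncation dimension; otherwise its norm grows without bound, which reflects an inverse that is only densely defined in the limit.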
4. Bayesian approach
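Denote by $\mu^{f}$ the distribution of $X^{f}$. Bayes' theorem prescribes the analysis measure $\mu^{a}$ by
(7) $$\mu^{a}(A)=\frac{\int_{A}d(y|x)\,d\mu^{f}(x)}{\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)}$$
for all $A\in\mathcal{B}(\mathcal{H})$ if
(8) $$\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)>0,$$
where the given function $d(y|x)$ is called a data likelihood. If the distributions of the forecast and the data noise are both Gaussian, then
(9) $$d(y|x)=\exp\left(-\frac{1}{2}|y-x|_{R^{-1}}^{2}\right),$$
where the norm $|\cdot|_{R^{-1}}$ is defined by (3).
With the natural convention that $\exp(-\infty)=0$, we have $d(y|x)=0$ whenever $y-x\notin R^{1/2}(\mathcal{H})$.
When both state and data spaces are finite dimensional, condition (8) is fulfilled for any possible value of the observation $y$. Unfortunately, when both spaces are infinite dimensional, condition (8) may not be fulfilled, as shown in the next example.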
Example 4.
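Assume that $\mu^{f}=N(0,P)$ and $y$ belongs to the Cameron-Martin space of $\mu^{f}$, i.e., $y\in P^{1/2}(\mathcal{H})$. If the measures $\mu^{f}$ and $N(0,R)$ are equivalent, then both have the same Cameron-Martin space, and
$$\mu^{f}\left(\{x\in\mathcal{H}:y-x\in R^{1/2}(\mathcal{H})\}\right)=\mu^{f}\left(P^{1/2}(\mathcal{H})\right)=0,$$
so
$$\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)=0.$$
Remark 5.
Obviously, the Bayesian update (7) is useful only if the set
$$\mathcal{A}=\left\{y\in\mathcal{H}:\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)=0\right\}$$
is empty. The sufficient condition for the set $\mathcal{A}$ to be empty is similar to the conditions under which the previously mentioned assimilation techniques are well defined.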
Theorem 6.
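The set
$$\mathcal{A}=\left\{y\in\mathcal{H}:\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)=0\right\},$$
where $\mu^{f}=N(m^{f},P)$ and the data likelihood is defined by (9), is empty if the operator $R$ has positive lower bound.
Proof.
The operator $R$ has positive lower bound, so the data likelihood function
$$d(y|x)=\exp\left(-\frac{1}{2}|y-x|_{R^{-1}}^{2}\right)$$
is positive for any $x,y\in\mathcal{H}$, and it follows that
$$\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)>0$$
for all $y\in\mathcal{H}$. ∎
In the special case when the forecast and data error covariances commute, we can show that this condition is also necessary for the set $\mathcal{A}$ to be empty. Recall that operators $P$ and $R$ commute when
$$PR=RP.$$
Theorem 7.
Assume that $\mu^{f}=N(m^{f},P)$, and operators $P$ and $R$ commute. Then
$$\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)>0$$
for all $y\in\mathcal{H}$ if and only if the operator $R$ has positive lower bound.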
Proof.
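Without loss of generality assume that $m^{f}=0$. The operators $P$ and $R$ are symmetric, commute, and $P$ is compact, so there exists a total orthonormal set $\{e_{i}\}$ of common eigenvectors,
$$Pe_{i}=p_{i}e_{i},\quad Re_{i}=r_{i}e_{i},\quad i\in\mathbb{N}.$$
For any $x\in\mathcal{H}$, denote by $x_{i}$ its Fourier coefficient with respect to the orthonormal set $\{e_{i}\}$,
$$x_{i}=\langle x,e_{i}\rangle.$$
Using this notation,
$$|y-x|_{R^{-1}}^{2}=\sum_{i=1}^{\infty}\frac{(y_{i}-x_{i})^{2}}{r_{i}}$$
and
$$d(y|x)=\exp\left(-\frac{1}{2}\sum_{i=1}^{\infty}\frac{(y_{i}-x_{i})^{2}}{r_{i}}\right).$$
Denote
$$d_{n}(y|x)=\exp\left(-\frac{1}{2}\sum_{i=1}^{n}\frac{(y_{i}-x_{i})^{2}}{r_{i}}\right).$$
Since $0\leq d_{n+1}(y|x)\leq d_{n}(y|x)\leq 1$ for any $x\in\mathcal{H}$, $\{d_{n}(y|\cdot)\}$ is a monotone sequence of functions on $\mathcal{H}$. The functions $d_{n}(y|\cdot)$ are continuous and therefore measurable, and by the monotone convergence theorem,
(11) $$\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)=\lim_{n\to\infty}\int_{\mathcal{H}}d_{n}(y|x)\,d\mu^{f}(x).$$
For each $i$, the random variable $X_{i}=\langle X,e_{i}\rangle$, where $X\sim\mu^{f}$, has $N(0,p_{i})$ distribution, which we denote by $\mu_{i}$. Additionally,
$$E(X_{i}X_{j})=\langle Pe_{i},e_{j}\rangle=p_{i}\delta_{ij},$$
and, in particular, the random variables $X_{i}$ and $X_{j}$ are independent unless $i=j$. Then,
$$\int_{\mathcal{H}}d_{n}(y|x)\,d\mu^{f}(x)=\int_{\mathcal{H}}\prod_{i=1}^{n}\exp\left(-\frac{(y_{i}-x_{i})^{2}}{2r_{i}}\right)d\mu^{f}(x)$$
for all $n$ and, using Fubini's theorem,
$$\int_{\mathcal{H}}d_{n}(y|x)\,d\mu^{f}(x)=\prod_{i=1}^{n}\int_{\mathbb{R}}\exp\left(-\frac{(y_{i}-x_{i})^{2}}{2r_{i}}\right)d\mu_{i}(x_{i}).$$
Now (11) yields that
$$\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)=\prod_{i=1}^{\infty}\int_{\mathbb{R}}\exp\left(-\frac{(y_{i}-x_{i})^{2}}{2r_{i}}\right)d\mu_{i}(x_{i})$$
and, since the measure $\mu_{i}$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$,
(12) $$\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)=\prod_{i=1}^{\infty}\int_{\mathbb{R}}\exp\left(-\frac{(y_{i}-x_{i})^{2}}{2r_{i}}\right)f_{i}(x_{i})\,dx_{i},$$
where
$$f_{i}(x_{i})=\frac{1}{\sqrt{2\pi p_{i}}}\exp\left(-\frac{x_{i}^{2}}{2p_{i}}\right),$$
i.e., $f_{i}$ is the density of a $N(0,p_{i})$ distributed random variable.
The identity
$$\frac{(y_{i}-x_{i})^{2}}{r_{i}}+\frac{x_{i}^{2}}{p_{i}}=\frac{(x_{i}-a_{i})^{2}}{s_{i}}+\frac{y_{i}^{2}}{p_{i}+r_{i}}$$
with
$$a_{i}=\frac{p_{i}y_{i}}{p_{i}+r_{i}},\quad s_{i}=\frac{p_{i}r_{i}}{p_{i}+r_{i}},$$
allows us to write (12) in the form
$$\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)=\prod_{i=1}^{\infty}\frac{1}{\sqrt{2\pi p_{i}}}\exp\left(-\frac{y_{i}^{2}}{2(p_{i}+r_{i})}\right)\int_{\mathbb{R}}\exp\left(-\frac{(x_{i}-a_{i})^{2}}{2s_{i}}\right)dx_{i}.$$
By standard properties of the normal distribution,
$$\int_{\mathbb{R}}\exp\left(-\frac{(x_{i}-a_{i})^{2}}{2s_{i}}\right)dx_{i}=\sqrt{2\pi s_{i}}$$
for each $i$, so
(13) $$\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)=\prod_{i=1}^{\infty}\left(\frac{r_{i}}{p_{i}+r_{i}}\right)^{1/2}\exp\left(-\frac{y_{i}^{2}}{2(p_{i}+r_{i})}\right),$$
where we used the computation
$$\frac{\sqrt{2\pi s_{i}}}{\sqrt{2\pi p_{i}}}=\left(\frac{r_{i}}{p_{i}+r_{i}}\right)^{1/2}.$$
Every factor in (13) lies in $(0,1]$, so the infinite product (13) is nonzero if and only if the following sum converges,
(14) $$\sum_{i=1}^{\infty}\left[\log\left(1+\frac{p_{i}}{r_{i}}\right)+\frac{y_{i}^{2}}{p_{i}+r_{i}}\right]<\infty.$$
To conclude the proof, we need to show that (14) converges for all $y\in\mathcal{H}$ if and only if
(15) $$\inf_{i\in\mathbb{N}}r_{i}>0.$$
First, the equivalence
(16) $$\sum_{i=1}^{\infty}\log\left(1+\frac{p_{i}}{r_{i}}\right)<\infty\iff\sum_{i=1}^{\infty}\frac{p_{i}}{r_{i}}<\infty$$
follows from the limit comparison test because
$$\lim_{i\to\infty}\frac{\log\left(1+p_{i}/r_{i}\right)}{p_{i}/r_{i}}=1$$
when
(17) $$\lim_{i\to\infty}\frac{p_{i}}{r_{i}}=0.$$
If condition (17) is not satisfied, then both sums in (16) diverge. If $\inf_{i}r_{i}=\alpha>0$, then the sum
$$\sum_{i=1}^{\infty}\frac{p_{i}}{r_{i}}\leq\frac{1}{\alpha}\sum_{i=1}^{\infty}p_{i}=\frac{1}{\alpha}\operatorname{Tr}P,$$
and this sum converges because $P$ is trace class.
Further, if $\inf_{i}r_{i}=\alpha>0$, then
$$\sum_{i=1}^{\infty}\frac{y_{i}^{2}}{p_{i}+r_{i}}\leq\frac{1}{\alpha}\sum_{i=1}^{\infty}y_{i}^{2}=\frac{|y|^{2}}{\alpha}<\infty$$
since $y_{i}$ are the Fourier coefficients of $y$. On the other hand, when $\inf_{i}r_{i}=0$, we will construct $y\in\mathcal{H}$ such that $|y|=1$ and
$$\sum_{i=1}^{\infty}\frac{y_{i}^{2}}{p_{i}+r_{i}}=\infty.$$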
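Since $\inf_{i}r_{i}=0$, there exists a subsequence $\{r_{i_{k}}\}_{k=1}^{\infty}$ such that
$$r_{i_{k}}\leq 2^{-k},$$
and we define
$$y=\sum_{k=1}^{\infty}y_{i_{k}}e_{i_{k}}$$
with
$$y_{i_{k}}=2^{-k/2}.$$
The element $y$ lies on the unit sphere because
$$|y|^{2}=\sum_{k=1}^{\infty}2^{-k}=1,$$
while
$$\sum_{i=1}^{\infty}\frac{y_{i}^{2}}{p_{i}+r_{i}}=\sum_{k=1}^{\infty}\frac{2^{-k}}{r_{i_{k}}\left(1+p_{i_{k}}/r_{i_{k}}\right)}\geq\sum_{k=1}^{\infty}\frac{1}{1+p_{i_{k}}/r_{i_{k}}}=\infty,$$
where the last equality follows immediately from (17), since the terms of the last sum tend to one.
Therefore, the sum (14) is finite for all $y\in\mathcal{H}$ if and only if $\inf_{i}r_{i}>0$. ∎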
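The dichotomy in Theorem 7 can be observed on truncations. The sketch below assumes commuting covariances with eigenvalues $p_i$ and $r_i$ in a common eigenbasis and zero forecast mean; the one-dimensional Gaussian integral $\int\exp(-(y_i-x)^2/(2r_i))\,dN(0,p_i)(x)=\sqrt{r_i/(p_i+r_i)}\,\exp(-y_i^2/(2(p_i+r_i)))$ then gives the contribution of mode $i$. The spectra, the unit data vector, and the name `log_evidence` are hypothetical choices for illustration.

```python
import math

def log_evidence(y, p, r, n):
    """Log of the product of the first n per-mode factors
    sqrt(r_i / (p_i + r_i)) * exp(-y_i^2 / (2 (p_i + r_i)))."""
    total = 0.0
    for i in range(1, n + 1):
        p_i, r_i, y_i = p(i), r(i), y(i)
        total += -0.5 * math.log(1.0 + p_i / r_i) - 0.5 * y_i**2 / (p_i + r_i)
    return total

# Assumed example data (hypothetical).
p = lambda i: 1.0 / i**4                       # trace class forecast covariance
y = lambda i: math.sqrt(6.0) / (math.pi * i)   # unit vector: sum of y_i^2 equals 1
r_white = lambda i: 1.0                        # positive lower bound
r_decaying = lambda i: 1.0 / i**2              # no positive lower bound, p_i / r_i -> 0

for n in (10, 100, 1000):
    print(n, log_evidence(y, p, r_white, n), log_evidence(y, p, r_decaying, n))
```

For the white noise spectrum the logarithm of the normalizing constant stabilizes as the truncation grows, while for the decaying spectrum it decreases roughly linearly in the dimension, so the truncated normalizing constants tend to zero for this unit data vector.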
The construction of the element at the end of the previous proof may be generalized, and it implies the following interesting corollary.
Corollary 8.
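Assume that operators $P$ and $R$ commute. The set
$$\mathcal{A}=\left\{y\in\mathcal{H}:\int_{\mathcal{H}}d(y|x)\,d\mu^{f}(x)=0\right\},$$
where $\mu^{f}=N(m^{f},P)$ and the data likelihood is defined by (9), is dense in $\mathcal{H}$ if the operator $R$ does not have positive lower bound.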
Proof.
To show that is dense it is sufficient to show that for each and any
Let and . Similarly as in the previous proof, denote by the total orthonormal set such that
Because
there exists a subsequence such that
for all Now, define such that
so
Using the same method as in the proof of Theorem 7,
if and only if
(18) 
However, when
(19) 
then
Therefore, using the same arguments as in the previous proof, if (19) is not satisfied, then
Therefore, the sum on the left-hand side of (18) diverges, and ∎
Remark 9.
[1, Theorem 3.8] have shown that if the spectrum of consists of countably many eigenvalues (plus zero), then the analysis measure is well defined and absolutely continuous with respect to the forecast measure for almost all if and only if is trace class. The data error distribution is understood as a Gaussian measure on a Hilbert space . The space has a weaker topology than , and draws from may not be in .
For example, suppose that is trace class and . Then is trace class, is white noise with the Cameron-Martin space , and since the measure of the Cameron-Martin space of is zero, a data vector drawn from on is in fact a.s. not in . Consequently, in this example, [1, Theorem 3.8] is not informative about the well-posedness of the analysis measure when , where the problem is formulated. In the present approach, the data error distribution is only a cylindrical measure on , and the analysis measure is well defined and absolutely continuous with respect to the forecast measure , for all .
References
[1] S. Agapiou, O. Papaspiliopoulos, D. Sanz-Alonso, and A. M. Stuart, Importance sampling: computational complexity and intrinsic dimension, arXiv:1511.06196, 2015, Version 2, January 2017, submitted to Statistical Science.
[2] A. V. Balakrishnan, Applied functional analysis, Springer-Verlag, New York, 1976. MR 0470699 (57 #10445)
[3] Jonathan D. Beezley, High-dimensional data assimilation and morphing ensemble Kalman filters with applications in wildfire modeling, Ph.D. thesis, University of Colorado Denver, 2009. MR 2713197
[4] Vladimir I. Bogachev, Gaussian measures, Mathematical Surveys and Monographs, vol. 62, American Mathematical Society, Providence, RI, 1998. MR 1642391 (2000a:60004)
[5] S. L. Cotter, M. Dashti, J. C. Robinson, and A. M. Stuart, Bayesian inverse problems for functions and applications to fluid mechanics, Inverse Problems 25 (2009), no. 11, 115008, 43. MR 2558668
[6] Giuseppe Da Prato and Jerzy Zabczyk, Stochastic equations in infinite dimensions, Encyclopedia of Mathematics and its Applications, vol. 44, Cambridge University Press, Cambridge, 1992. MR 1207136 (95g:60073)
[7] Masoumeh Dashti, Stephen Harris, and Andrew Stuart, Besov priors for Bayesian inverse problems, Inverse Problems and Imaging 6 (2012), no. 2, 183–200. MR 2942737
[8] Geir Evensen, Sequential data assimilation with nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, Journal of Geophysical Research 99 (1994), no. C5, 10143–10162.
[9] Bamdad Hosseini, Well-posed Bayesian inverse problems with infinitely-divisible and heavy-tailed prior measures, arXiv:1609.07532, 2016.
[10] Bamdad Hosseini and Nilima Nigam, Well-posed Bayesian inverse problems: Priors with exponential tails, arXiv:1604.02575, 2016.
[11] P. L. Houtekamer and Herschel L. Mitchell, Data assimilation using an ensemble Kalman filter technique, Monthly Weather Review 126 (1998), no. 3, 796–811.
[12] M. A. Iglesias, K. Lin, S. Lu, and A. M. Stuart, Filter based methods for statistical linear inverse problems, arXiv:1512.01955.
[13] R. E. Kalman and R. S. Bucy, New results in filtering and prediction theory, Transactions of the ASME – Journal of Basic Engineering 83 (1961), 95–108.
[14] Rudolph Emil Kalman, A new approach to linear filtering and prediction problems, Transactions of the ASME – Journal of Basic Engineering, Series D 82 (1960), 35–45.
[15] Ivan Kasanický, Ensemble Kalman filter on high and infinite dimensional spaces, Ph.D. thesis, Charles University, Faculty of Mathematics and Physics, 2016.
[16] S. Lasanen, Measurements and infinite-dimensional statistical inverse theory, Proc. Appl. Math. Mech. 7 (2007), no. 1, 1080101–1080102.
[17] F. S. Levin, An introduction to quantum theory, Cambridge University Press, Cambridge, 2002. MR 1922993
[18] Laurent Schwartz, Radon measures on arbitrary topological spaces and cylindrical measures, Tata Institute of Fundamental Research Studies in Mathematics, No. 6, published for the Tata Institute of Fundamental Research, Bombay, by Oxford University Press, London, 1973. MR 0426084
[19] A. M. Stuart, Inverse problems: A Bayesian perspective, Acta Numer. 19 (2010), 451–559. MR 2652785
[20] Andrew M. Stuart, The Bayesian approach to inverse problems, arXiv:1302.6989, 2013.
[21] N. N. Vakhania, V. I. Tarieladze, and S. A. Chobanyan, Probability distributions on Banach spaces, Mathematics and its Applications (Soviet Series), vol. 14, D. Reidel Publishing Co., Dordrecht, 1987, translated from the Russian and with a preface by Wojbor A. Woyczynski. MR 1435288
[22] John von Neumann, Mathematical foundations of quantum mechanics, Princeton University Press, Princeton, 1955, translated by Robert T. Beyer. MR 0066944