Filter Based Methods For
Statistical Linear Inverse Problems
Ill-posed inverse problems are ubiquitous in applications. Understanding of algorithms for their solution has been greatly enhanced by a deep understanding of the linear inverse problem. In the applied communities ensemble-based filtering methods have recently been used to solve inverse problems by introducing an artificial dynamical system. This opens up the possibility of using a range of other filtering methods, such as 3DVAR and Kalman based methods, to solve inverse problems, again by introducing an artificial dynamical system. The aim of this paper is to analyze such methods in the context of the ill-posed linear inverse problem.
Statistical linear inverse problems are studied in the sense that the observational noise is assumed to be derived via realization of a Gaussian random variable. We investigate the asymptotic behavior of filter based methods for these inverse problems. Rigorous convergence rates are established for 3DVAR and for the Kalman filters, including minimax rates in some instances. Blowup of 3DVAR and a variant of its basic form is also presented, and optimality of the Kalman filter is discussed. These analyses reveal a close connection between (iterative) regularization schemes in deterministic inverse problems and filter based methods in data assimilation. Numerical experiments are presented to illustrate the theory.
In many geophysical applications, in particular in the petroleum industry and in hydrology, distributed parameter estimation problems are often solved by means of iterative ensemble Kalman filters . The basic methodology is to introduce an artificial dynamical system, to supplement this with observations, and to apply the ensemble Kalman filter. The methodology is described in a basic, abstract form, applicable to a general, possibly nonlinear, inverse problem in . In this basic form of the algorithm regularization is present due to dynamical preservation of a subspace spanned by the ensemble during the iteration. The paper  gives further insight into the development of regularization for these ensemble Kalman inversion methods, drawing on links with the Levenberg-Marquardt scheme . In this paper our aim is to further the study of filters for the solution of inverse problems, going beyond the ensemble Kalman filter to encompass the study of other filters such as 3DVAR and the Kalman filter itself – see  for an overview of these filtering methods. A key issue will be the implementation of regularization with the aim of deriving optimal error estimates.
We focus on the linear inverse problem
where is a compact operator acting between Hilbert spaces and . The exact solution is denoted by and is a noise polluting the observations. We will consider two situations: Data Model where multiple observations are made in the form 1; and Data Model where a single observation is made. For modelling purposes we will assume that the noise is generated by the Gaussian , independently in the case of multiple observations. In each case we create a sequence ; for Data Model the elements of this sequence are i.i.d. whilst for Data Model they are , with a single draw from . The case where multiple independent observations are made is not uncommon in applications (for example in electrical impedance tomography (EIT, ) and, although we do not pursue it here, our methodology also opens up the possibility of considering multiple instances with correlated observational noise, by means of similar filtering-based techniques.
The artificial, partially observed linear dynamical system that underlies our methodology is as follows:
In deriving the filters we apply to this dynamical system, it is assumed that the are i.i.d. from . Note, however, that whilst the data sequence we use in Data Model is of this form, the assumption is not compatible with Data Model ; thus for Data Model we have a form of model error or model mis-specification .
By studying the application of filtering methods to the solution of the linear inverse problem our aim is to open up the possibility of employing the filtering methodology to (static) inverse problems of the form 1, and nonlinear generalizations. We confine our analysis to the linear setting as experience has shown that a deep understanding of this case is helpful both because there are many linear inverse problems which arise in applications, and because knowledge of the linear case guide methodologies for the more general nonlinear problem . The last few decades have seen a comprehensive development of the theory of linear inverse problems, both classically and statistically – see [2, 6] and the references therein. Consider the Tikhonov-Phillips regularization method
This can be reformulated from a probabilistic perspective as the MAP estimator for Bayesian inversion given a Gaussian smoothness prior, with mean and Cameron-Martin space compactly embedded into , and a Gaussian noise model as defined above; this connection is eludicated in [11, 5]. We note that from the point of view of Tikhonov-Phillips regularization only the parameter is relevant, but that each of and have separate interpretations in the overarching Bayesian picture, the first as a scaling of the prior precision and the second as observational noise variance. In this paper we deepen the connection between the Bayesian methodology and classical methods.
The recent paper  opens up the prospect for a statistical explanation of iterative regularization methods in the form of
with a general Kalman gain operator . In this paper, we establish the connection between iterative regularization methods (c.f. [6, 8]) and filter based methods  with respect to an artificial dynamic system. More precisely, for a linear inverse problem, we verify that the iterative Tikhonov regularization method
is closely related to filtering methods such as 3DVAR and the Kalman filter when applied to the partially observed linear dynamical system 2. The similarity between both schemes provides a probabilistic interpretation of iterative regularization methods, and allows the possibility of quantifying uncertainty via the variance. On the other hand, we will employ techniques from the convergence analysis arising in regularization theories  to shed light on the convergence of filter based methods, especially when the linear observation operator is ill-posed.
The paper is organized as follows. We first introduce filter based methods for the artificial dynamics 2 in Section 2. Section 3 describes some general useful formulae which are relevant to all the filters we study, and lists our main assumptions on the inverse problem of interest. In Sections 4 and 5 respectively, detailed asymptotic analyses are given for the Kalman filter method and 3DVAR, for both data models. The final Section 6 presents numerical illustrations confirming the theoretical predictions.
2 Filters For The Artificial Dynamics
2.1 Filter Definitions
Recall the artificial dynamics (2), where the observation operator also defines the inverse problem (1), and is an i.i.d. sequence with . The aim of filters is to estimate given the data In particular, probabilistic filtering aims to estimate the probability distribution of the conditional random variable
If we assume that then the desired conditional random variable is Gaussian, because of the linearity inherent in (2), together with the assumed Gaussian structure of the noise sequence . Furthermore the independence of the elements of the noise sequence means that the Gaussian can be updated sequentially in a Markovian fashion. If we denote by the mean, and by the covariance, then we obtain the Kalman filter updates for these two quantities:
The operator is known as the Kalman gain and the inverse of the covariance, the precision operator , may be shown to satisfy
All of these facts concerning the Kalman filter may be found in Chapter 4 of . Expression (5) requires careful justification in infinite dimensions, and this is provided in  in certain settings. However we will only use (5) as a quick method for deriving useful formulae, not expressed in terms of precision operators, which can be justified directly under the assumptions we make.
A simplification of the Kalman filter method is the 3DVAR algorithm  which is not, strictly speaking, a probabilistic filter because it does not attempt to accurately track covariance information. Instead the covariance is fixed in time at
for some fixed positive and self-adjoint operator The parameter is a scaling constant the inverse of which measures the relative size of the fixed covariance of the filter relative to that of the data. Imposing this simplification on equations (4a), (4b) gives
It is also helpful to define, from (4c),
Throughout the paper stands for Kalman gain, updated mean and updated covariance for the Kalman filter method and is the related sequence of quantities for 3DVAR.
2.2 Asymptotic Behaviour of Filters
We will view the filters as methods for reconstructing the truth ; in particular we will study the proximity of (for the Kalman filter) and (for 3DVAR) to for various large asymptotics. Although the assumption in the derivation of the filters is that is an i.i.d. sequence of the form , we will not always assume that the data available is of this form; to be precise Data Model is compatible with this assumption whilst Data Model is not.
Recall that Data Model refers to the situation where the data used in the Kalman and 3DVAR filters has the form , where the are i.i.d. Given such a data sequence we can generate an auxiliary element
with and . The law of large numbers and central limit theorem thus allows us to consider an inverse problem of the form (1) with noise level reduced by a factor of We will study, in the sequel, whether the Kalman or 3DVAR filters are able to automatically exploit the decreased uncertainty inherent in an i.i.d. data set of this form.
For Data Model we simply assume that the data used in the filters is of the form where is given by (1) with From the discussion in the preceding paragraph, we clearly expect less accurate reconstruction in this case. For this data model we may view 3DVAR as a stationary iterative Tikhonov regularization, whilst the Kalman filter is an alternative iterative non-stationary regularization scheme, since is updated in each step. In addition, the statistical perspective not only allows us to obtain an estimator (the mean (4b) or (7b)), but also in the case of the Kalman filter method, to quantify the uncertainty (via the covariance (4c)). This uncertainty quantification perspective provides additional motivation for the filtering approaches considered herein.
In this paper our primary focus is the asymptotic large behavior of the Kalman filter method and 3DVAR. More precisely, we are interested in the accurate recovery of the true state when the noise variance vanishes, i.e. for Data Models and , or as for Data Model (by the law of large numbers/central limit theorem discussion above).
To highlight the difficulties inherent in ill-posed inverse problems in this regard, we note the following which is a straightforward consequence of Theorem 4.10 in  when specialized to linear problems. Here, and in what follows, denotes both the norm on and the operator norm from into itself.
Consider the 3DVAR algorithm with . Assume that there exists a constant such that and that Then, for Data Model , it yields
Note however, that if the observation operator is compact or the inversion is ill-posed, the assumption in the preceding proposition cannot hold. More precisely, the operator is no longer contractive since the spectrum of the operator clusters at the origin. Our focus in the remainder of the paper will be on such ill-posed inverse problems.
3 Main Assumptions and General Properties of Filters
Recall that denotes both the norm on and the operator norm from into itself. Throughout will denote a generic constant, independent of the key limiting quantities and . Our main assumption is:
For both the Kalman filter and the 3DVAR filter, we assume
the variance and , where is a positive constant and is positive self-adjoint, and is a densely defined unbounded self-adjoint strictly positive operator;
the forward operator satisfies
on for some constants and ;
the initial mean satisfies (or ) with ;
the operator in item (i) is trace class on .
We briefly comment on these items. Item allows a well defined operator
which is essential in carrying out our analysis. Item is often called the link condition and it connects both operators and (or ). The third item is the source condition (regularity) of the true solution . The final item makes a well-defined covariance operator on .
Item in the preceding assumption is automatically satisfied if and have the same eigenfunctions and certain decaying singular values. Item can then be expressed in this eigenbasis. When studying the Kalman filter we will, in some instances, employ the following specific form of items , . Comparison of Assumptions 1 and 2 we see that they are identical if and
Let the variance . The operators and have the same eigenfunctions with their eigenvalues and satisfying
for some , and . Furthermore, by choosing the initial mean , the true solution with its coordinates in the basis obeys .
3.2 Filter Properties
We start by deriving properties of the Kalman filter method under Data Model . Other cases can be simply derived from minor variants of this setting. Recall from (4b)
and note that
Under Data Model we have and hence the total error satisfies
To establish a rigorous convergence analysis, the mean squared error (MSE) is of particular interest. Since is deterministic and each is i.i.d Gaussian we obtain a bias-variance decomposition of the MSE:
To proceed further, both terms and need to be calibrated more carefully.
We consider the operator which appears in both terms. By (4c), we obtain
which is equivalent to
Notice that (5) yields
We will use this expression (which is well-defined in view of Assumption 1 ) and the labelled equations preceding it in this subsection, frequently in what follows.
4 Asymptotic Analysis of the Kalman Filter
In this section we investigate the asymptotic behaviour of the Kalman filter (4a)-(4c), under Assumption 1. In particular, we are interested in whether we can reproduce the minimax convergence rate. This minimax rate is achieved by adopting Assumption 1 in the diagonal form of Assumption 2.
4.1 Kalman Filter and Data Model
We present the main results in current subsection.
(i) Under the Assumption 2 we prove unconditional convergence of the Kalman filter method for any fixed and , noticing that both the bias and variance vanish when goes to infinity. The key ingredient which leads to this unconditional convergence, in comparison with Assumption 1, is that the rate of decay of the eigenvalues of the variance operator is made explicit under Assumption 2; this is to be contrasted with the weaker assumption made in Assumption 1 .
(ii) By choosing depending on the update step , again with fixed , both Theorems 1 and 2 yield convergence rates in the MSE sense. Indeed in the second case, where we use Assumption 2, the minimax rate of is achieved. This minimax rate may also be achieved from the Bayesian posterior distribution with appropriate tuning of the prior in terms of the (effective) noise size ; the tuning of the prior is identical to the tuning of the initial condition for the covariance , via choice of
The operator-valued function (18) has been verified to be powerful in the convergence analysis of deterministic regularization schemes – see [14, Ch.2]. In that context the following inequality is useful:
Following these ideas we obtain the next two lemmas describing the bias and variance error bounds. We leave the proofs of both lemmas to the Appendix. Theorems 1 and 2 are consequences, by choosing the parameter appropriately.
Lemma 1 (Bias for Kalman filter).
4.2 Kalman Filter and Data Model
The key difference between Data Model and Data Model is that, in the case , the noises appearing in the expression for the term are identical, rather than i.i.d. mean zero as in case . This results in a reduced rate of convergence in case over case , as seen in the following two theorems:
Both convergence rates in Theorems 3 and 4 are of the same order since the variance has been tuned to scale in the same way as the bias by choosing to stop the iteration at , depending on , appropriately. In comparison with the convergence rates in Theorems 1 and 2, the ones in this section under Data Model require small noise ; those in the preceding subsection do not because multiple observations, with additive independent noise, are made of Proof of the two preceding theorems is straightforward: the terms is analyzed as in the preceding subsection and the term must be carefully analyzed under the assumptions of Data Model . The key new result is stated in the following lemma, whose proof may be found in the Appendix.
5 Asymptotic Analysis of 3DVAR
5.1 Classical 3DVAR
The mean of the 3DVAR algorithm is given by (7b) and has the form
If we define then we obtain, since ,
with . We further derive
Similarly to the analysis of the Kalman filter, we derive
Thus, the MSE takes the bias-variance decomposition form
This leads to the following two theorems:
The preceding two theorems show that, under Data Model and for any fixed , the (bound on the) MSE of the 3DVAR filter blows up logarithmically as under Assumption 1, and is asymptotically bounded for Assumption 2. In contrast, for the Kalman filter method the MSE is asymptotically bounded or unconditionally converges in under the same assumptions – see Theorems 1 and 2.
With optimal choice of in terms of the stopping time of the iteration at , comparison of the convergence rates in Theorems 1 and 5 (or Theorems 2 and 6) shows that the Kalman filter outperforms 3DVAR, but only by a logarithmic factor (or a Hölder factor). For simplicity we only analyze and discuss Data Model for 3DVAR filter under the additional Assumption 2; as for Data Model one can derive consequences similar to those in the preceding section in an analogous manner.
We now study Data Model and 3DVAR. We consider only Assumption 1; however the reader may readily extend the analysis to include Assumption 2. In the case of Data Model , both the Kalman and 3DVAR filters have the same error bounds:
The preceding three theorems can be proved by the bias-variance decomposition and application of the following three lemmas, whose proofs are left to the Appendix. However the proof of Theorem 6 is not as straightforward as the others and we present details in the Appendix.
Lemma 4 (Bias for 3DVAR).
Lemma 5 (Variance for 3DVAR - Data Model ).
5.2 Variant of 3DVAR for Data Model
Recall that the 3DVAR iteration (7b) looks like a stationary iterative Tikhonov method (3) [6, 14], with replaced by and with a fixed parameter . The non-stationary iterative Tikhonov regularization, with varying , has been proven to be powerful in deterministic inverse problems . We generalize this method to the 3DVAR setting. Furthermore we demonstrate that, for Data Model , it is possible to see blow-up phenomena with this algorithm.
By calibrating both terms , carefully, we obtain the following blow-up result, proved in the Appendix. Although the result only provides an upper bound, numerical evidence does indeed show blow-up in this regime.
6 Numerical Illustrations
In this section we provide numerical results which display the capabilities of 3DVAR and Kalman filter for solving linear inverse problems of the type described in Section 1. In addition, we verify numerically some of the theoretical results from Section 4 and Section 5.
We consider the two-dimensional domain and define the operators
With this domain is positive, self-adjoint and invertible.Note that (9) from Assumption 1 is satisfied with and . In the following subsections we produce synthetic data from a true function that we generate as a two-dimensional random field drawn from the Gaussian measure with covariance
with domain of definition and where is selected as described below. The shift of by a constant introduces a correlation length into the true function. Furthermore, for simplicity we consider and note  that the given selection of yields for all . Therefore, for discussion of the present experiments we simply assume that . Consequently, in order to satisfy Assumption 1 (iii) we need such that . Note that operator in (25) satisfies Assumption 1 (iv).
The numerical generation of is carried out by means of the Karhunen-Loeve decomposition of in terms of the eigenfunctions of which, from the definition of , are cosine functions. We recall that for the application of the Kalman Filter and 3DVAR with Data Model 1 we need to generate instances of synthetic data where is the maximum number of iterations of the scheme. Below we discuss the selection of such . The aforementioned synthetic data are generated by means of with and specified below. For Data Model 2 we produce synthetic data simply by setting with . In order to avoid inverse crimes , all synthetic data used in our experiments are generated on a finer grid ( cells) than the one (of cells) used for the inversion. We use splines to interpolate synthetic data on the coarser grid that we use for the application of the filters.
For both 3DVAR and Kalman filter with Data Model 1, we fix the number of iterations for the scheme and consider the selection stated in Theorem 1 and Theorem 5. Provided that the filters are stopped according to , these theorems ensure the convergence rates in (16) and (22) that we verify numerically in the following subsection. For Data Model 2, Theorem 3 and Theorem 7 suggest that convergence rates (20) and (23) are satisfied for and provided that . The latter equality provides an expression for the number of iterations that we may use as stopping criteria for these filtering algorithms applied to Data Model 2. In Algorithm 1 we summarize the Kalman filter and 3DVAR schemes applied to both data models.
Kalman Filter/3DVAR (Data Model Data Model )
For , update and as follows
and where we recall that
Note that for Data Model we need at least independent instances of data.
6.2 Using Kalman Filter and 3DVAR for solving linear inverse problems
In this subsection we demonstrate how the filters under consideration in the iterative framework described in Algorithm 1 can be used, with Data Model 1 and Data Model 2, to solve the linear inverse problem presented in Section 1. Let us consider the truth displayed in Figure 1 (top) generated, as described in subsection 6.1, from a Gaussian measure with covariance (26) and . Synthetic data are generated as described above with three different choices of that yield noise levels of approximately , , and of the norm of the noise free data (i.e. ).
We apply Algorithm 1 to this synthetic data generated both for the application of Data Model and Data Model . For Data Model we consider a selection of . Algorithm 1 states that the these schemes should be stopped at iteration level . However, in order to observe the performance of these schemes, in these experiments we allowed for a few more iterations. In the left column of Figure 2 we display the plots of the error w.r.t the truth of the estimator as a function of the iterations, i.e.
where denotes the interpolation of on the coarse grid used for the inversion. Note that the error w.r.t. the truth of the estimates produced by both schemes decreases monotonically. Interestingly, the value at the final iteration displayed in these figures is approximately the same for all these experiments independently of the noise level. Moreover, the stability of the scheme does not seem to depend on the early termination of the scheme. In addition, we note that both Kalman filter and 3DVAR exhibit very similar performance in terms of reducing the error w.r.t the truth. However, for larger noise levels we observe small fluctuations in the error obtained with 3DVAR.
In Figure 1 we display the estimates obtained with 3DVAR with Data Model at iterations for noise levels (determined by the choice of ) of (top-middle), (middle-bottom) and (bottom). We can visually appreciate from Figure 1 that the estimates obtained at the final iteration are indeed similar one to another even though the one in the bottom row was computed by inverting data five times noisier than the one in the first row. Similar estimates (not shown) were obtained with the Kalman filter for Data Model .
For Data Model , the selection of corresponding to noise levels of , , and yields a maximum number of iterations respectively. Clearly, for Data Model , smaller observational noise results in schemes that can be iterated longer, presumably achieving more accurate estimates. Similarly to Data Model , we are required to stop the algorithm at the iteration . However, for the purpose of this study we iterate until . In the right column of Figure 2 we display the plots of the error w.r.t the truth of . In contrast to Data Model , we observe that the error w.r.t the truth increases for . In other words, the Kalman filter and 3DVAR, when applied to Data Model , need to be stopped at in order to stabilize the scheme and obtain accurate estimates of the truth. Moreover, stopping the scheme at results in estimates whose accuracy increases with smaller noise levels. Clearly, in the small noise limit, both data models tend to exhibit similar behaviour. In Figure 3 we display obtained from 3DVAR applied to Data Model at iterations for data with noise levels of (top-middle), (middle-bottom) and (bottom). Similar estimates (not shown) were obtained with the Kalman Filter for Data Model .
It is clear indeed, that for Data Model , the application of Kalman filter and 3DVAR results in more accurate and stable estimates when the noise level in the data is not sufficiently small. The results from this subsection show that the reduction in the variance of the noise due to the law of large numbers and central limit theorem effect in Data Model produces more accurate algorithms. Although Data Model requires multiple instances of the data, in some applications such as in EIT , the data collection can be repeated multiple times in order to obtain these data.
6.3 Numerical verification of convergence rates
In this subsection we test the convergence rates from Theorems 1, 5, 3, and 7. For the verification of each of these rates we let denote the covariance from (26), and we consider experiments corresponding to different truths generated from with the four selections of . Note that Assumption 1 (iii) is satisfied only for . Again, for Data Model 1 we generate synthetic data for each of the truths and for as many iterations used for the application of both schemes. Inverse crimes are avoided as described in subsection 6.1.
The verification of Theorem 1 and Theorem 5 by means of Algorithm 1 in the case of Data Model 1 is straightforward. For each of the set of synthetic data associated to each of the 20 truths previously mentioned, we fix . For each (with ) we run Algorithm 1, stop the schemes at and record the value of . In the right (resp. left) Figure 4 we display a plot of vs for the Kalman filter (resp. 3DVAR) for each of the set of 20 experiments associated to different truths (red solid lines) generated as described above with (from top to bottom) . From Theorem 1 we note that the corresponding slopes of the convergence rates should be approximately given by . For Theorem 5 there is an additional term of , but this is of course negligible compared to the algebraic decay and we ignore it for the purposes of this discussion. For comparison, a line (black dotted) with slope