Accuracy and Stability of Filters for Dissipative PDEs

# Accuracy and Stability of Filters for Dissipative PDEs

C. E. A. Brett, K. F. Lam, K. J. H. Law, D. S. McCormick, M. R. Scott and A. M. Stuart Warwick Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK
July 5, 2019
###### Abstract

Data assimilation methodologies are designed to incorporate noisy observations of a physical system into an underlying model in order to infer the properties of the state of the system. Filters refer to a class of data assimilation algorithms designed to update the estimation of the state in an on-line fashion, as data is acquired sequentially. For linear problems subject to Gaussian noise, filtering can be performed exactly using the Kalman filter. For nonlinear systems filtering can be approximated in a systematic way by particle filters. However in high dimensions these particle filtering methods can break down. Hence, for the large nonlinear systems arising in applications such as oceanography and weather forecasting, various ad hoc filters are used, mostly based on making Gaussian approximations. The purpose of this work is to study the accuracy and stability properties of these ad hoc filters. We work in the context of the 2D incompressible Navier-Stokes equation, although the ideas readily generalize to a range of dissipative partial differential equations (PDEs). By working in this infinite dimensional setting we provide an analysis which is useful for the understanding of high dimensional filtering, and is robust to mesh-refinement. We describe theoretical results showing that, in the small observational noise limit, the filters can be tuned to perform accurately in tracking the signal itself (filter accuracy), provided the system is observed in a sufficiently large low dimensional space; roughly speaking this space should be large enough to contain the unstable modes of the linearized dynamics. The tuning corresponds to what is known as variance inflation in the applied literature. Numerical results are given which illustrate the theory. The positive results herein concerning filter stability complement recent numerical studies which demonstrate that the ad hoc filters can perform poorly in reproducing statistical variation about the true signal.

## 1 Introduction

Assimilating large data sets into mathematical models of time-evolving systems presents a major challenge in a wide range of applications. Since the data and the model are often uncertain, a natural overarching framework for the formulation of such problems is that of Bayesian statistics. However, for high dimensional models, investigation of the Bayesian posterior distribution of model state given data is not computationally feasible in on-line situations. For this reason various ad hoc filters are used. The purpose of this paper is to provide an analysis of such filters.

The paradigmatic example of data assimilation is weather forecasting: computational models to predict the state of the atmosphere currently involving on the order of unknowns but these models must be run with an initial condition which is only known incompletely. This is compensated for by a large number, currently on the order of , partial observations of the atmosphere at subsequent times. Filters are widely used to make forecasts which combine the mathematical model of the atmosphere and the data to make predictions. Indeed the particular method of data assimilation which we study here includes, as a special case, the algorithm commonly known as 3DVAR. This method originated in weather forecasting. It was first proposed at the UK Met Office in 1986 article:Lorenc1986 (), and was developed by the US National Oceanic and Atmospheric Administration (NOAA) soon thereafter; see article:Parrish1992 (). More details of the implementation of 3DVAR by the UK Met Office can be found in article:Lorenc2000 (), and by the European Centre for Medium-Range Weather Forecasts (ECMWF) in article:Courtier1998 (). The 3DVAR algorithm is prototypical of the many more sophisticated filters which are now widely used in practice and it is thus natural to study it. The reader should be aware, however, that the development of new filters is a very active area and that the analysis here constitutes an initial step towards the analyses required for these more recently developed algorithms. For insight into some of these more sophisticated filters see toth1997ensemble (); evensen2009data (); VL09 (); harlim2008filtering (); majda2010mathematical () and the references therein.

Filtering can be performed exactly for linear systems subject to Gaussian noise: the Kalman filter harvey1991forecasting (). For nonlinear or non-Gaussian scenarios the particle filter doucet2001sequential () may be used and provably approximates the desired probability distribution as the number of particles is increased bain2008fundamentals (). However in practice this method performs poorly in high dimensional systems Bickel (). Whilst there is considerable research activity aimed at overcoming this degeneration van2010nonlinear (); chorin2010implicit (), the methodology cannot yet be viewed as a provably accurate tool within the context of the high dimensional problems arising in geophysical data assimilation. In order to circumvent problems associated with the representation of high dimensional probability distributions some form of ad hoc Gaussian approximation is typically used to create practical filters, and the 3DVAR method which we analyze here is perhaps the simplest example of this. This ad hoc filters may also be viewed in the framework of nonlinear control theory. Proving filter stability and accuracy has a long history in this field and the paper tarn76 () is closely related to the work we develop here. However we work in an infinite dimensional setting, in order to directly confront the high dimensionality of many current real-world filtering applications, and this brings new issues to the problem of establishing filter stability and accuracy; overcoming these problems provides the focus of this paper.

In the paper lawstuart () a wide range of Gaussian approximate filters, including 3DVAR, are evaluated by comparing the distributions they produce with a highly accurate (and impractical in realistic online scenarios) MCMC simulation of the desired distributions. The conclusion of that work is that the Gaussian approximate filters perform well in tracking the mean of the desired distribution, but poorly in terms of statistical variability about the mean. In this paper we provide a theoretical analysis of the ability of the filters to estimate the mean state accurately. Although filtering is widely used in practice, much of the analysis of it, in the context of fluid mechanics, works with finite-dimensional dynamical models. Our aim is to work directly with a PDE model relevant in fluid mechanics, the Navier-Stokes equation, and thereby confront the high-dimensional nature of the problem head-on. Study of the stability of filters for data assimilation has been a developing research area over the last few years and the paper carrassi2008data () contains a finite dimensional theoretical result, numerical experiments in a variety of finite and (discretized) infinite dimensional systems not covered by the theory, and references to relevant applied literature. The paper trevisan2011chaos () gives a review of many important aspects relating to assimilating the evolving unstable directions of the underlying dynamical system. Our analysis will build in a concrete fashion on the approach in olson2003determining () and hayden2011discrete () which were amongst the first to study data assimilation directly through PDE analysis, using ideas from the theory of determining modes in infinite dimensional dynamical systems. However, in contrast to those papers, we will allow for noisy observations in our analysis. Nonetheless the estimates in hayden2011discrete () form an important component of our analysis. Furthermore the large time asymptotic results in hayden2011discrete () constitute a limiting case of our theory, where there is no observational noise.

The presentation will be organized as follows: in Section 2 we introduce the Navier-Stokes equation as the forward model of interest and formulate the inverse problem of estimating the velocity field given partial noisy observations. This leads to a family of filters for the velocity field which have the form of a non-autonomous dynamical system which blends the forward model with the data in a judicious fashion; Theorem 2.3 describes this dynamical system via the solution of a sequence of inverse problems. In Section 3 we introduce notions of stability and prove the Main Theorems 3.3 and 3.7 concerning filter stability and accuracy for sufficiently small observational noise. In Section 4 we present numerical results which corroborate the analysis; and finally, in Section 5 we present conclusions.

## 2 Combining Model and Data

In subsection 2.1 we describe the forward model that we employ throughout the remainder of the paper: the Navier Stokes equation on a two dimensional torus. Then, in subsection 2.2, we describe the observational data model that we employ; using this we apply Tikhonov-Phillips regularization to derive the filter which we use to combine model and data. Subsection 2.3 contains a specific example of this filter, used later in the paper for our numerical illustrations. This example constitutes a particular instance of the 3DVAR method.

### 2.1 Forward Model: Navier-Stokes equation

In this section we establish a notation for, and describe the properties of, the Navier-Stokes equation. This is the forward model which underlies the inverse problem which we study in this paper. We consider the 2D Navier-Stokes equation on the torus with periodic boundary conditions:

 ∂tu−νΔu+u⋅∇u+∇p=ffor all (x,t)∈T2×(0,∞),∇⋅u=0for all (x,t)∈T2×(0,∞),u(x,0)=u0(x)for all x∈T2.

Here is a time-dependent vector field representing the velocity, is a time-dependent scalar field representing the pressure, is a vector field representing the forcing (which we assume to be time-independent for simplicity), and is the viscosity. In numerical simulations (see section 4), we typically represent the solution via the vorticity and stream function ; these are related through and , where . We define

 H:={trigonometricpolynomialsu:T2→R2∣∣∇⋅u=0,∫T2u(x)dx=0}

and as the closure of with respect to the norm. We define to be the Leray-Helmholtz orthogonal projector.

Given , define . Then an orthonormal basis for is given by , where

 ψk(x):=k⊥|k|exp(2πik⋅xL)fork∈Z2∖{0}. (1)

Thus for we may write

 u=∑k∈Z2∖{0}uk(t)ψk(x)

where, since is a real-valued function, we have the reality constraint We then define the projection operators and by

 Pλu=∑|2πk|2<λL2uk(t)ψk(x),Qλ=I−Pλ.

We let and

Using the Fourier decomposition of , we define the fractional Sobolev spaces

 Hs:=⎧⎪⎨⎪⎩u∈H:∑k∈Z2∖{0}(4π2|k|2)s|uk|2<∞⎫⎪⎬⎪⎭ (2)

with the norm , where . We use the abbreviated notation for the norm on , and for the norm on .

Applying the projection to the Navier-Stokes equation we may write it as an ODE (ordinary differential equation) in ; see constantin1988navier (); temam1995navier (); book:Robinson2001 () for details. This ODE takes the form

 dudt+νAu+B(u,u)=f,u(0)=u0. (3)

Here, is the Stokes operator, the term is the bilinear form found by projecting the nonlinear term onto and finally, with abuse of notation, is the original forcing, projected into . We note that is diagonalized in the basis comprised of the , on , and the smallest eigenvalue of is The following proposition is a classical result which implies the existence of a dissipative semigroup for the ODE Eq. ( 3 ). See Theorems 9.5 and 12.5 in book:Robinson2001 () for a concise overview and temam1995navier (); book:Temam1997 () for further details.

###### Proposition 2.1.

Assume that and . Then Eq. ( 3 ) has a unique strong solution on for any

 u∈L∞((0,T);H1)∩L2((0,T);D(A)),dudt∈L2((0,T);H).

Furthermore the equation has a global attractor and there is such that, if , then

We let denote the semigroup of solution operators for the equation Eq. ( 3 ) through time units. We note that by working with weak solutions, can be extended to act on larger spaces , with , under the same assumption on ; see Theorem 9.4 in book:Robinson2001 (). We will, on occasion, use this extension of to act on larger spaces.

The key properties of the Navier-Stokes equation that drive our analysis of the filters are summarized in the following proposition, taken from the paper hayden2011discrete (). To this end, define for some fixed Note that the statement here is closely related to the squeezing property book:Robinson2001 () of the Navier-Stokes equation, a property employed in a wide range of applied contexts. Furthermore, many other dissipative PDEs are known to satisfy similar properties.

###### Proposition 2.2.

Let and . There is such that

 ∥Ψ(u)−Ψ(v)∥2≤exp(βh)∥u−v∥2. (4)

Now let and assume that

 λ>λ⋆:=9c8/3λ131(2K12+R12ν)8/3

where is a dimensionless positive constant and is the constant appearing in Proposition 2.1 above. Then there exists with the property that, for all there is such that

 ∥Qλ(Ψ(u)−Ψ(v))∥2≤γ2∥u−v∥2. (5)
###### Proof.

The first statement is simply Theorem 3.8 from hayden2011discrete (). The second statement follows from the proof of Theorem 3.9 in the same paper, modified at the end to reflect the fact that, in our setting, Note also that the constant appearing on the right hand side of the lower bound for in the statement of Theorem 3.9 in hayden2011discrete () should be (as the proof that follows in hayden2011discrete () shows) and that use of definition of (see Theorem 3.6 of that paper) allows rewrite in terms of – indeed the proof in that paper is expressed in terms of . ∎

### 2.2 Inverse Problem: Filtering

In this section we describe the basic problem of filtering the Navier Stokes equation Eq. ( 3 ): estimating properties of the state of the system sequentially from partial, noisy, sequential observations of the state. Throughout the following we write for the natural numbers , and for the non-negative integers . Let be the Hilbert spaces with inner product and norm defined via Eq. ( 2 ) with . If is a self-adjoint, positive-definite compact operator on , which therefore has a symmetric square root , we define

 ⟨⋅,⋅⟩L:=⟨⋅,L−1⋅⟩,∥⋅∥L:=|L−1/2⋅|.

Recall that we have defined for some fixed We let denote and define , by 111With abuse of notation, subscripts will indicate times, while subscripts will denote Fourier coefficients as before in order to avoid confusion. The meaning should also be clear in context.

 uj+1:=Ψ(uj). (6)

This discrete dynamical system is well-defined by virtue of Proposition 2.2. Thus where is the solution of Eq. ( 3 ).

Our interest is in determining from noisy observations of . We now let be a noise sequence in which perturbs the sequence to generate the observation sequence in given by

 yj+1:=Pλuj+1+ξj+1,j∈Z+. (7)

This observation operator allows us to study partially observed infinite dimensional systems in a clean fashion and the resulting analysis will be a useful building block for the study of other partial observations such as pointwise, or smoothed, observations on a regular grid in physical space.

We let , the accumulated data up to time We assume that is not known exactly. The goal of filtering is to determine the state from the data The approach to the problem of blending data and model that we study here is based on Tikhonov-Phillips regularization. We find a point which represents the best compromise between information given by the model and by the data. More precisely, we use the model to provide a regularization of a least squares problem designed to match the data. This will result in a sequence which approximates the true signal giving rise to the data. To define the sequence we introduce two sequences of operators , and which will be used to weight the contributions of model and data at discrete time Then define

 Ij(u)=12∥yj+1−Pλu∥2Γj+1+12∥u−Ψ(ˆuj)∥2Cj+1. (8)

Choosing a minimizer of the functional encodes the idea of a compromise between the model output and the data , to estimate the state of the system at time . The operators and give appropriate weights on the two sources of information. With this in mind we set

 ˆuj+1=arginfu∈H1Ij(u). (9)

The following theorem shows that this is justified:

###### Theorem 2.3.

Assume that and are positive-definite self-adjoint operators on and respectively, that is a bounded operator from into itself and that the norm is equivalent to the norm for some Then, if , the iteration (8, 9) determines a unique sequence in given by

 ˆuj+1=(I−Kj+1)Ψ(ˆuj)+Kj+1yj+1 (10)

where

 Kj+1=Cj+1P∗λ(Γj+1+PλCj+1P∗λ)−1Pλ. (11)
###### Proof.

The proof follows by a simple induction. Assume that . Then let and note that the functional given by

 (12)

is convex and weakly lower semi-continuous and hence has a unique minimizer . Thus is in since and with . Equations (10, 11) for the minimizer follow from standard theory of calculus of variations. ∎

In the case where is a linear map, the matrix is the Kalman gain matrix harvey1991forecasting (), composed with the observation operator We will also study the situation where complete observations are made, obtained by taking in the preceding analyses. The observations are given by

 yj:=uj+ξj,j∈Z+ (13)

where now The operator is now a positive self-adjoint operator on . The filter that we study takes the form (10) with replaced by the identity so that (11) is replaced by

 Kj+1=Cj+1(Γj+1+Cj+1)−1. (14)

If we define

 Bj=I−Kj+1 (15)

then Theorem 2.3 yields the key equation

 ˆuj+1=BjΨ(ˆuj)+(I−Bj)yj+1. (16)

This equation and Eq. ( 10 ) demonstrate that the estimate of the solution at time is found as an operator-convex combination of the true dynamics applied to the estimate of the solution at time , and the data at time . We will favor use of in what follows, rather than , as is the operator which controls the stability, and hence accuracy, of the estimate.

This problem can be given a Bayesian formulation in which the probability distribution of is the primary object of interest. In practice, for high dimensional systems arising in applications in the atmospheric sciences, various ad hoc Gaussian approximations are typically used to approximate these probability distributions; the reader interested in understanding the derivation of filters from this probabilistic perspective is directed to lsetal () for details; this unpublished technical report is an expanded version of the material contained here, to incorporate the probabilistic perspective. The mean of a Gaussian posterior distribution has an equivalent formulation as solution of a Tikhonov-Phillips regularized least squares problem, as we have here, and this perspective on data assimilation is adopted in pot12 (). In pot12 () linear autonomous dynamical systems are studied and filter accuracy and stability results similar to ours are derived.

It is demonstrated numerically in lawstuart () that the Gaussian approximations are, in general, not good approximations. More precisely, they fail to accurately capture covariance information. However, the same numerical experiments reveal that the methodology can perform well in replicating the mean, if parameters are chosen correctly, even if it is initially in error. Indeed this accurate tracking of the mean is often achieved by means of variance inflation – increasing the model uncertainty, here captured in the exogenously imposed , in comparison with the data uncertainty, here captured in the . The purpose of the remainder of the paper is to explain, and illustrate, this phenomenon by means of analysis and numerical experiments.

### 2.3 Example of a Filter: 3DVAR

The algorithm described in the previous section yields the well-known 3DVAR method, discussed in the introduction, when and for some fixed operators . To impose commutativity with , we assume that the operators and are both fractional powers of the Stokes operator , in and respectively. We choose (the parameter forms a useful normalizing constant in the numerical experiments of section 4) and set in and in Substituting into the update formula Eq. ( 16 ) for and defining , in then in (16) we have the constant operator

 B=(B0(η)00I). (17)

Using this we obtain the mean update formula

 ˆuj+1=BΨ(ˆuj)+(I−B)yj+1. (18)

Notice that for given as above, the algorithm depends only on the three parameters and , once the constant of proportionality in is set. The parameter measures the size of the space in which observations are made; for fixed wavevector , the parameter is a measure of the scale of the uncertainty in observations to uncertainty in the model; and the sign of the parameter determines whether, for fixed and asymptotically for large wavevectors, the model is trusted more () or less () than the data. This can be seen by noting that if then , while if then .”

In the case , the case of complete observations where the whole velocity field is noisily observed, we again obtain Eq. ( 18 ), with in . The roles of and are the same as in the finite (partial observations) case.

The discussion concerning parametric dependence with respect to varying shows that, for the example of 3DVAR introduced here, and for both finite and infinite, variance inflation, which refers to reducing faith in the model in comparison with the data, can be achieved by decreasing the parameter We will show that variance inflation does indeed improve the ability of the filter to track the signal.

## 3 Accuracy and Stability

In this section we develop conditions under which it is possible to prove stability of the nonautonomous dynamical system defined by the mean update equation Eq. ( 16 ) and show that, after a sufficiently long time, the true signal is accurately recovered. By stability we here mean that two filters driven by the same noise observation will converge towards the same estimate of the solution. By accuracy we mean that when the noise perturbing the observations is , the filter will converge to an neighbourhood of the true signal, even if initially it is an distance from the true signal. In subsection 3.1 we study the case of partial observations; subsection 3.2 contains the (easier) result for the case of complete observations. The third subsection 3.3 shows how our results can be applied to the specific instance of the 3DVAR algorithm introduced in subsection 2.3, for any , provided , which is a measure of the ratio of uncertainty in the data to uncertainty in the model, is sufficiently small: this, then, is a result concerning variance inflation.

For simplicity, we will assume a “truth” which is on the global attractor, as in hayden2011discrete (). This is not necessary, but streamlines the presentation as it gives an automatic uniform in time bound in . Recall that denotes the norm on , and the norm on ; similarly we lift to denote the induced operator norm on .

It is useful to recall the filter in the form (16):

 ˆuj+1=BjΨ(ˆuj)+(I−Bj)yj+1. (19)

It is also useful to consider a second filter driven by the same data , but possibly started at a different point:

 ˆwj+1=BjΨ(ˆwj)+(I−Bj)yj+1. (20)

### 3.1 Main Result: Partial Observations

In this case we will see that it is crucial that the observation space is sufficiently large, i.e. that a sufficiently large number of modes are observed. This, combined with the contractivity in the high modes encapsulated in Proposition 2.2 from hayden2011discrete (), can be used to ensure stability if combined with variance inflation. We study filters of the form given in Eq. ( 16 ) and make the following assumption on the observations

###### Assumption 3.1.

Consider a sequence , where is a solution of Eq. ( 3 ) lying on the global attractor . Then, for some ,

 yj=Pλuj+ξj

for some sequence satisfying

Note that this assumption, concerning uniform boundedness of the noise, is not verified for the i.i.d. Gaussian case. However we do expect that a more involved analysis would enable us to handle Gaussian noise, at the expense of proving results in mean square, or in probability. Indeed in blsz12 () we study a continuous time limit of the set-up contained in this paper, in which white noise forcing arises from the i.i.d. Gaussian noise; accuracy and stability results can then indeed be proved in mean square and in probability. However, we believe that, for clarity, the assumption made in this paper enables us to convey the important ideas in the most straightforward fashion.

We make the following assumption about the family , and assumed dependence on a parameter Recall that the inverse of quantifies the amount of variance inflation.

###### Assumption 3.2.

The family of positive operators commute with , satisfy , and for some uniformly with respect to . Furthermore, and there is, for all , constant such that

We now study the asymptotic behaviour of the filter under these assumptions.

###### Theorem 3.3.

Let Assumptions 3.1 and 3.2 hold, choose any and let be as given in Proposition 2.2. Assume that . Then for any there is sufficiently small so that the sequences , given by Eq. ( 16 ), Eq. ( 20 ) satisfy, for some ,

 ∥ˆuj−ˆwj∥≤aj∥ˆu0−ˆw0∥

and

 ∥ˆuj−uj∥≤ajr+2bϵj−1∑i=0ai.

Hence

 limsupj→∞∥ˆuj−uj∥≤2b1−aϵ.
###### Proof.

We prove the second, accuracy, result concerning The stability result concerning is proved similarly. Assumption 3.2 shows that Recall that in Eq. ( 16 ) has been extended to an element of , by defining it to be zero in , and we do the same with Substituting the resulting expression for in Eq. ( 16 ) we obtain

 ˆuj+1=BjΨ(ˆuj)+(I−Bj)PλΨ(uj)+(I−Bj)ξj+1

but since by assumption we have

 ˆuj+1=BjΨ(ˆuj)+(I−Bj)Ψ(uj)+(I−Bj)ξj+1. (21)

Note also that

 uj+1=BjΨ(uj)+(I−Bj)Ψ(uj).

Subtracting gives the basic equation for error propagation, namely

 (22)

Since the second item in Proposition 2.2 holds. Fix where is defined in Proposition 2.2. Assume, for the purposes of induction, that

 ∥ˆuj−uj∥≤ajr+2bϵj−1∑i=0ai.

Define noting that the inductive hypothesis implies that, for sufficiently small, Applying to Eq. ( 22 ) and using Eq. ( 4 ) gives

 ∥Pλ(ˆuj+1−uj+1)∥≤∥PλBj∥∥(Ψ(ˆuj)−Ψ(uj))∥+∥Pλ(I−Bj)∥ϵ≤c(λ)η2exp(βh/2)∥ˆuj−uj∥+bϵ.

Applying to Eq. ( 22 ) and using Eq. ( 5 ) gives222The term on the right-hand side of the final identity can here be set to zero because ; however in the analogous proof of Theorem 3.7 it is present and so we retain it for that reason.

 ∥Qλ(ˆuj+1−uj+1)∥≤∥Bj∥∥Qλ(Ψ(ˆuj)−Ψ(uj))∥+∥Qλ(I−Bj)∥ϵ≤γ∥ˆuj−uj∥+bϵ.

Now note that, for any , Thus, by adding the two previous inequalities, we find that

 ∥ˆuj+1−uj+1∥≤(c(λ)η2exp(βh/2)+γ)∥ˆuj−uj∥+2bϵ.

Since and , we may choose sufficiently small so that

 ∥ˆuj+1−uj+1∥≤a∥ˆuj−uj∥+2bϵ.

and the inductive hypothesis holds with . Taking gives the desired result concerning the limsup. ∎

###### Remark 3.4.

Note that the proof exploits the fact that induces a contraction within a finite ball in . This contraction is established by means of the contractivity of in , via variance inflation, and the squeezing property of in , for large enough observation space, from Proposition 2.2.

There are two important conclusions from this theorem. The first is that, even though the solution is only observed in the low modes, there is sufficient contraction in the high modes to obtain an error in the entire estimated state which is of the same order of magnitude as the error in the (low mode only) observations. The second is that this phenomenon occurs even when the initial estimate suffers from an error. Of course this result also shows that, if the true solution starts in an neighbourhood of the truth, then it remains there for all positive time.

### 3.2 Main Result: Complete Observations

Here we study filters of the form given in Eq. ( 16 ) with observations given by Eq. ( 13 ). In this situation the whole velocity field is observed and so, intuitively, it should be no harder to obtain stability than in the partially observed case. The proof is in fact almost identical to the case of partial observations, and so we omit the details. We observe that, although there is no parameter in the problem statement itself, it is introduced in the proof: as in the previous subsection, see Remark 3.4, the key to stability is to obtain contraction in using the squeezing property of the Navier-Stokes equation, and contraction in using the properties of the filter to control unstable modes.

We make the following assumptions:

###### Assumption 3.5.

Consider a sequence where is a solution of Eq. ( 3 ) lying on the global attractor . Then

 yj=uj+ξj

for some sequence satisfying

###### Assumption 3.6.

The family of positive operators commute with , and for some uniformly with respect to Furthermore, for all , there is a constant such that

We now study the asymptotic behaviour of the filter under these assumptions.

###### Theorem 3.7.

Let Assumptions 3.5 and 3.6 hold and choose any Then for any , with given in Proposition 2.2, there is sufficiently small so that the sequences , given by Eq. ( 16 ), Eq. ( 20 ) satisfy, for some ,

 ∥ˆuj−ˆwj∥≤aj∥ˆu0−ˆw0∥

and

 ∥ˆuj−uj∥≤ajr+2bϵj−1∑i=0ai.

Hence

 limsupj→∞∥ˆuj−uj∥≤2b1−aϵ.
###### Proof.

The proof is nearly identical to that of Theorem 3.3. Differences arise only because we have not assumed that This fact arises in two places in Theorem 3.3. The first is where we obtain Eq. ( 21 ). However in this case we directly obtain Eq. ( 21 ) since the whole velocity field is observed. The second place it arises is already dealt with in the footnote appearing in the proof of Theorem 3.3 when estimating the contraction properties in ; there we indicate that the proof is already adjusted to allow for the situation required here. ∎

###### Remark 3.8.

If then the proof may be simplified considerably as it is not necessary to split the space into two parts, and . Instead the contraction of can be used to control any expansion in , provided is sufficiently small.

### 3.3 Example of Main Result: 3DVAR

We demonstrate that the 3DVAR algorithm from subsection 2.3 satisfies Assumptions 3.2 and 3.6 in the partially and completely observed cases respectively, and hence Theorems 3.3 and 3.7 respectively may be applied to the resulting filters. In particular, the filters will locate the true signal, provided is sufficiently small. Satisfaction of Assumptions 3.2 and 3.6 follows from the properties of

 B0(η)=(I+η2A2α0)−1η2A2α0,I−B0(η)=(I+η2A2α0)−1.

Note that the eigenvalues of are

if Clearly the spectral radius of is less than or equal to one on or , independently of the sign of The difference is just that in the former, and is unbounded in the latter.

First we consider the partially observed situation. We note that and is given by Eq. ( 17 ):

 B=((I+η2A2α0)−1η2A2α000I) (23)

so that the Kalman gain-like matrix is given by

 I−B=((I+η2A2α0)−1000). (24)

From this it is clear that Furthermore, since the spectral radius of does not exceed one, the same is true of . Hence for the operator norms from into itself we have Similarly, if then , whilst if then Thus Theorem 3.3 applies. Note that and that .

In the fully observed case we simply have where defined above on . Again and if then , whilst if then Thus Theorem 3.7 applies. Note (see Remark 3.8), that if then the proof of that theorem could be simplified considerably because and in fact

###### Remark 3.9.

We observe that the key conclusion of Theorems 3.3 and 3.7 is the asymptotic accuracy of the algorithm, when started at distances of . The asymptotic bound, although of , has constant which may exceed and so the bound may exceed the error obtained by simply using the observations to estimate the signal. Our numerics will show, however, that in practice the algorithm gives estimates of the state which improve upon the observations.

## 4 Numerical Results

In this section we describe a number of numerical results designed to illustrate the range of filter stability phenomena studied in the previous sections. We start, in subsection 4.1, by describing two useful bounds on the error committed by filters; we will use these guides in the subsequent numerics. Subsection 4.2 describes the common setup for all the subsequent numerical results shown. Subsection 4.3 describes these results in the case of complete observations in discrete time, whilst Subsection 4.4 extends to the case of partial observations, also in discrete time.

Our theoretical results have been derived under Assumptions 3.1 and 3.5 on the errors. These are incompatible with the assumption that the observational noise sequence is Gaussian. This is because i.i.d Gaussian sequences will not have finite supremum. However, in order to test the robustness of our theory we will conduct numerical experiments with Gaussian noise sequences.

### 4.1 Useful Error Bounds

We describe two useful bounds on the error which help to guide and evaluate the numerical simulations. To derive these bounds we assume that the observational noise sequence is i.i.d with and . Then

 E|ξj|2=tr(Γ)=∑kgk

where are the eigenvalues of the operator

• The lower bound is derived from (22). Using the assumed independence of the sequence we see that

 E|ˆuj+1−uj+1|2≥E|(I−Bj)ξj+1|2=tr((I−Bj)Γ(I−Bj)∗) (25)
• The upper bound on the filter error is found by noting that a trivial filter is obtained by simply using the observation sequence as the filter mean; this corresponds to setting in (16). For this filter we obtain

 E|ˆuj+1−uj+1|2≤E|ξj+1|2=tr(Γ) (26)

in the case of complete observations, and

 E|ˆuj+1−uj+1|2≤E|ξj+1|2+|Qλuj+1|2=tr(Γ)+|Qλuj+1|2 (27)

in the case of incomplete observations.

Although the lower bound (25) does not hold pathwise, only on average, it provides a useful guide for our pathwise experiments. The upper bounds (26) and (27) do not apply to any numerical experiment conducted with non-zero , but also serve as a useful guide to those experiments: it is clearly undesirable to greatly exceed the error committed by simply trusting the data. We will hence plot the lower and upper bounds as useful comparators for the actual error incurred in our numerical experiments below. We note that, for the 3DVAR example from subsection 2.3 with complete observations, the upper and lower bounds coincide in the limit as then . For partial observations they differ by the second term in the upper bound.

### 4.2 Experimental Setup

For all the results shown we choose a box side of length . The forcing in Eq. (3) is taken to be , where and with the canonical skew-symmetric matrix, and . The method used to approximate the forward model (3) is a modification of a fourth-order Runge-Kutta method, ETD4RK cox2002exponential (), in which the Stokes semi-group is computed exactly by working in the incompressible Fourier basis in Eq. (1), and Duhamel’s principle (variation of constants formula) is used to incorporate the nonlinear term. Spatially, a Galerkin spectral method hesthaven2007spectral () is used, in the same basis, and the convolutions arising from products in the nonlinear term are computed via FFTs. We use a double-sized domain in each dimension, buffered with zeros, resulting in grid-point FFTs, and only half the modes in each direction are retained when transforming back into spectral space in order to prevent aliasing, which is avoided as long as fewer than 2/3 of the modes are retained.

The dimension of the attractor is determined by the viscosity parameter . For the particular forcing used there is an explicit steady state for all and for this solution is stable (see majda2006non (), Chapter 2 for details). As decrease the flow becomes increasingly complex and the regime corresponds to strongly chaotic dynamics with an upscale cascade of energy in the spectrum. We focus subsequent studies of the filter on a strongly chaotic () parametric regime. For this small viscosity parameter, we use a time-step of .

The data is generated by computing a true signal solving Eq. ( 3 ) at the desired value of , and then adding Gaussian random noise to it at each observation time. Such noise does not satisfy Assumption 3.5, since the supremum of the norm of the noise sequence is not finite, and so this setting provides a severe test beyond what is predicted by the theory; nonetheless, it should be noted that Gaussian random variables only obtain arbitrarily large values arbitrarily rarely.

All experiments are conducted using the 3DVAR setup and it is useful to reread the end of subsection 2.3 in order to interpret the parameters and We consider both the choices for 3DVAR, noting that in the case the operator has norm strictly less than one and so we expect the algorithm to be more robust in this case (see Remark 3.8 for discussion of this fact). For all experiments we set which ensures that the action of , and hence , on the first eigenfunction is independent of the value of ; this is a useful normalization when comparing computations with and

We set the observational noise to constant white noise (i.e. in section 2.3). Here , which gives a standard deviation of approximately 10% of the maximum standard deviation of the strongly chaotic dynamics. Since we are computing in a truncated finite-dimensional basis the eigenvalues are summable; the situation can be considered as an approximation of an operator whose eigenvalues decay rapidly outside the basis in which we compute.

To be more precise regarding the algorithm, let denote the finite-dimensional spectral representation, which is a complex-valued vector of dimension , including redundancy arising from reality and zero mass constraints. The computation of in Eq. (3) requires padding this with zeros to a vector, computing inverse FFTs on the discretization of the 6 fields for , performing products, computing FFTs on the 2 resulting (discrete) spatial fields, and finally discarding the now-populated padding modes. Denoting the discrete map from time to time by , the experiments proceed precisely as follows:

• Evolve until statistical equilibrium, as judged by observing the energy fluctuations . Set so that the initial condition is on the attractor.

• Compute the observations .

• Draw , where [this is essentially arbitrary, as long as the initial condition is something sensible and such that ].

• Compute , and as given in Equations (23, 24)

• For : Compute

### 4.3 Complete Observations

We start by considering discrete and complete observations and illustrate Theorem 3.7, and in particular the role of the parameter The experiments presented employ a large observation increment of . For we find that when (Fig. 1) the estimator stabilizes from an initial error and then remains stable. The upper and lower bounds are satisfied (the upper bound after an initial rapid transient), and even the high modes, which are slaved to the low modes, synchronize to the true signal. For (Fig. 2) the estimator fails to satisfy the upper bound, but remains stable over a long time horizon; there is now significant error in the mode, in contrast to the situation with smaller shown in Fig. 1. Finally, when (Fig. 3), the estimator really diverges from the signal, although still remains bounded.

When the lower and upper bounds are almost indistinguishable and, for all values of examined, the error either exceeds or fluctuates around the upper bound; see Figures 4, 5 and 6 where and respectively. It is not until (Fig. 6) that the estimator really loses the signal. Notice also that the high modes of the estimator always follow the noisy observations and this could be undesirable. For both and , the estimator performs better than the one for in terms of overall error, illustrating the robustness alluded to in Remark 3.8 since for we have However, an appropriately tuned filter has the potential to perform remarkably well, both in terms of overall error and individual error of all modes (see Fig. 1, in contrast to Fig. 4). In particular, this filter has an expected error substantially smaller than the upper bound, which does not happen for the case of when complete observations are assimilated.

### 4.4 Partial Observations

We now proceed to examine the case of partial observations, illustrating Theorem 3.3. Note that the forced mode has magnitude , so ensuring that it is observed requires that . When enough modes are retained, for example when in our setting, the results for the case remain roughly the same and are not shown. However, in the case , in which the observations are trusted more than the model at high wavevectors, the results are greatly improved by ignoring the observations of the high-frequencies. See Fig. 7. This improvement, and indeed the improvement beyond setting for both cases disappears as is decreased. In particular, when the error is never very much smaller than the upper bound. This is due to the fact that the dynamics of the low wavevectors tend to be unpredictable and they contain very little useful information for the assimilation. Then, for much smaller , once enough unstable modes are left unobserved, there is no convergence. The order of magnitude of the error in the asymptotic regime as a function of remains roughly consistent as is decreased until the estimator no longer converges. For small (high-frequency in time observations) and complete observations, the estimator can be slow to converge to the asymptotic regime. In this case, the number of iterations until convergence, for a sufficiently small , becomes significantly larger as is decreased (again until the estimator fails to converge at all).

Given , we expect that for sufficiently small the contribution of the model to the filter will be negligible for all with for . Hence the estimators for both will behave similarly. An example of this is shown in Fig. 8 where and in Fig. 8. In both cases, the estimator is essentially utilizing all the available observations. There are enough observations to draw the higher wavevectors of the estimator closer to the truth than if we just set the population of those modes to zero. In contrast, as mentioned above, when , there are not enough observations even when they are all used, and the error is roughly the same as the upper bound as (not shown).

## 5 Conclusion

This paper contains three main components:

• we show that the filtering problem for the Navier-Stokes equation may be formulated as a a sequence of well-posed inverse problems: Theorem 2.3;

• we prove Theorems 3.3 and 3.7, which establish filter accuracy and stability provided variance inflation is employed;

• we describe numerical results which illustrate, and extend the validity of, the theory.

We note that the analysis will also apply to other semilinear dissipative PDEs which possess a squeezing property (contraction when projected into the high modes) and a global attractor; such equations are studied in depth in book:Temam1997 (), and include the Cahn-Allen and Cahn-Hilliard equations, the Kuramoto-Sivashinsky equation and Ginzburg-Landau equation. These two structural properties of the underlying model, when combined with sufficient variance inflation ( small enough in 3DVAR) enable proof that the 3DVAR filter has a contractive property, even when the underlying dynamical system itself has positive Lyapunov exponents.

There are a number of natural future directions which stem from this work:

• to develop analogous filter stability theorems for more sophisticated filters, such as the extended and ensemble-based methods, when applied to the Navier-Stokes equation;

• to study model-data mismatch by looking at filter stability for data generated by forward models which differ from those used to construct the filter;

• to study the effect of filtering in the presence of model error, by similar methods, to understand how this may be used to overcome problems arising from model-data mismatch.

• to combine the analysis in this paper, which concerns nonlinear problems, but assumes that the observation operator and the Stokes operator commute, and the recent work pot12 () which concerns only linear autonomous problems but does not assume commutativity of the solution operator for the forward model and the observation operator.

Acknowledgements AMS would like to thank the following institutions for financial support: EPSRC, ERC, ESA, and ONR; KJHL was supported by EPSRC, ESA, and ONR; and CEAB, KFL, DSM and MRS were supported EPSRC, through the MASDOC Graduate Training Centre at Warwick University. The authors also thank The Mathematics Institute and Centre for Scientific Computing at Warwick University for supplying valuable computation time. Finally, the authors thank Masoumeh Dashti for valuable input.

## References

• (1) A. C. Lorenc, Analysis methods for numerical weather prediction, Quart. J. R. Met. Soc. 112 (474) (2000) 1177–1194.
• (2) D. F. Parrish, J. C. Derber, The national meteorological centerâs spectral statistical-interpolation analysis system, Monthly Weather Review 120 (8) (1992) 1747–1763.
• (3) A. C. Lorenc, S. P. Ballard, R. S. Bell, N. B. Ingleby, P. L. F. Andrews, D. M. Barker, J. R. Bray, A. M. Clayton, T. Dalby, D. Li, T. J. Payne, F. W. Saunders, The Met. Office global three-dimensional variational data assimilation scheme, Quart. J. R. Met. Soc. 126 (570) (2000) 2991–3012.
• (4) P. Courtier, E. Andersson, W. Heckley, D. Vasiljevic, M. Hamrud, A. Hollingsworth, F. Rabier, M. Fisher, J. Pailleux, The ECMWF implementation of three-dimensional variational assimilation (3d-Var). I: Formulation, Quart. J. R. Met. Soc. 124 (550) (1998) 1783–1807.
• (5) Z. Toth, E. Kalnay, Ensemble forecasting at NCEP and the breeding method, Monthly Weather Review 125 (1997) 3297.
• (6) G. Evensen, Data Assimilation: the Ensemble Kalman Filter, Springer Verlag, 2009.
• (7) P. Van Leeuwen, Particle filtering in geophysical systems, Monthly Weather Review 137 (2009) 4089–4114.
• (8) J. Harlim, A. Majda, Filtering nonlinear dynamical systems with linear stochastic models, Nonlinearity 21 (2008) 1281.
• (9) A. Majda, J. Harlim, B. Gershgorin, Mathematical strategies for filtering turbulent dynamical systems, Dynamical Systems 27 (2) (2010) 441–486.
• (10) A. Harvey, Forecasting, Structural Time Series Models and the Kalman filter, Cambridge Univ Pr, 1991.
• (11) A. Doucet, N. De Freitas, N. Gordon, Sequential Monte Carlo methods in practice, Springer Verlag, 2001.
• (12) A. Bain, D. Crişan, Fundamentals of Stochastic Filtering, Springer Verlag, 2008.
• (13) T. Snyder, T. Bengtsson, P. Bickel, J. Anderson, Obstacles to high-dimensional particle filtering, Monthly Weather Review. 136 (2008) 4629–4640.
• (14) P. van Leeuwen, Nonlinear data assimilation in geosciences: an extremely efficient particle filter, Quarterly Journal of the Royal Meteorological Society 136 (653) (2010) 1991–1999.
• (15) A. Chorin, M. Morzfeld, X. Tu, Implicit particle filters for data assimilation, Communications in Applied Mathematics and Computational Science (2010) 221.
• (16) T. Tarn, Y. Rasis, Observers for nonlinear stochastic systems, Automatic Control, IEEE Transactions on 21 (4) (1976) 441–448.
• (17) K. Law, A. Stuart, Evaluating data assimilation algorithms, Arxiv preprint arXiv:1107.4118.
• (18) A. Carrassi, M. Ghil, A. Trevisan, F. Uboldi, Data assimilation as a nonlinear dynamical systems problem: Stability and convergence of the prediction-assimilation system, Chaos: An Interdisciplinary Journal of Nonlinear Science 18 (2008) 023112.