An Optimal Transport Formulation of the Ensemble Kalman Filter
Abstract
Controlled interacting particle systems such as the ensemble Kalman filter (EnKF) and the feedback particle filter (FPF) are numerical algorithms to approximate the solution of the nonlinear filtering problem in continuous time. The distinguishing feature of these algorithms is that the Bayesian update step is implemented using a feedback control law. It has been noted in the literature that the control law is not unique. This is the main problem addressed in this paper. To obtain a unique control law, the filtering problem is formulated here as an optimal transportation problem. An explicit formula for the (mean-field type) optimal control law is derived in the linear Gaussian setting. Comparisons are made with the control laws for different types of EnKF algorithms described in the literature. Via empirical approximation of the mean-field control law, a finite-$N$ controlled interacting particle algorithm is obtained. For this algorithm, the equations for the empirical mean and covariance are derived and shown to be identical to the Kalman filter. This allows strong conclusions on convergence and error properties based on the classical filter stability theory for the Kalman filter. It is shown that, under certain technical conditions, the mean-squared error (m.s.e.) converges to zero even with a finite number of particles. A detailed propagation of chaos analysis is carried out for the finite-$N$ algorithm. The analysis is used to prove weak convergence of the empirical distribution as $N \to \infty$. For a certain simplified filtering problem, an analytical comparison of the m.s.e. with the importance sampling-based algorithms is described. The analysis helps explain the favorable scaling properties of the control-based algorithms reported in several numerical studies in the recent literature.
I Introduction
The subject of this paper concerns Monte Carlo methods for simulating a nonlinear filter (conditional distribution) in continuous-time settings. The mathematical abstraction of any filtering problem involves two processes: a hidden Markov process $X_t$ and the observation process $Z_t$. The numerical problem is to compute the posterior distribution $\pi_t := \mathsf{P}(X_t \in \cdot \mid \mathcal{Z}_t)$, where $\mathcal{Z}_t$ is the filtration generated by the observations. A standard solution approach is the particle filter, which relies on importance sampling to implement the effect of conditioning [3, 4]. In numerical implementations, this often leads to the particle degeneracy issue (or weight collapse) whereby only a few particles have large weights. To combat this issue, various types of resampling schemes have been proposed in the literature [5, 6].
In the past decade, an alternate class of algorithms has attracted growing attention. These algorithms can be regarded as a controlled interacting particle system where the central idea is to implement the effect of conditioning using feedback control. Mathematically, this involves the construction of a controlled stochastic process, denoted by $\bar X_t$. In continuous-time settings, the model for $\bar X_t$ is a stochastic differential equation (sde):

$$\mathrm{d}\bar X_t = u_t(\bar X_t)\,\mathrm{d}t + \mathsf{K}_t(\bar X_t)\,\mathrm{d}Z_t + [\text{additional terms}] \qquad (1)$$

where the [additional terms] are pre-specified (these terms may be zero). The control problem is to design mean-field type control laws $u_t$ and $\mathsf{K}_t$ such that the conditional distribution of $\bar X_t$ (given $\mathcal{Z}_t$) is equal to the posterior distribution of $X_t$. If this property holds, the filter is said to be exact. In a numerical implementation, the mean-field terms in the control law are approximated empirically by simulating $N$ copies of (1). The resulting system is a controlled interacting particle system with a finite number of interacting particles. The particles have uniform importance weights by construction. Therefore, the particle degeneracy issue does not arise. Resampling is no longer necessary, and steps such as rules for reproduction, death or birth of particles are altogether avoided.
The focus of this paper is on (i) formal methods for the design of control laws ($u_t$ and $\mathsf{K}_t$) for (1); (ii) algorithms for empirical approximation of the control laws using particles; and (iii) error analysis of the finite-$N$ interacting particle models as $N \to \infty$. The main problem highlighted and addressed in this paper is the issue of uniqueness: one can interpret the controlled system (1) as transporting the initial distribution at time $0$ (prior) to the conditional distribution at time $t$ (posterior). Clearly, there are infinitely many maps that transport one distribution into another. This suggests that there are infinitely many choices of control laws that all lead to exact filters. This is not surprising: the exactness condition specifies only the marginal distribution of the stochastic process at each time $t$, which is not enough to uniquely identify a stochastic process, e.g., the joint distributions at two time instants are not specified.
Although these issues are relevant more generally, the scope of this paper is limited to the linear Gaussian problem. A motivation comes from the widespread use of the ensemble Kalman filter (EnKF) algorithm in applications. It is noted that the mean-field limit of the EnKF algorithm is exact only in linear Gaussian settings. The issue of non-uniqueness is manifested in the different types of EnKF algorithms reported in the literature. Some of these EnKF types are discussed as part of the literature survey (in Sec. I-B) and in the main body of the paper (in Sec. II-B).
I-A Contributions of this paper
The following contributions are made in this paper:

Non-uniqueness issue: For the linear Gaussian problem, an error process is introduced to help explain the non-uniqueness issue in the selection of the control law in (1). The error process helps clarify the relationship between the different types of control laws leading to the different types of EnKF algorithms that have appeared over the years in the literature.

Optimal transport FPF: To select a unique control law, an optimization problem is proposed in the form of a time-stepping procedure. The optimality concept is motivated by optimal transportation theory [7, 8]. The solution of the time-stepping procedure yields a unique optimal control law. The resulting filter is referred to as the optimal transport FPF. The procedure is suitably adapted to handle the case, important in Monte Carlo implementations with finitely many particles, where the covariance is singular. In this case, the optimal (deterministic) transport maps are replaced by optimal (stochastic) couplings. The general form of the optimal FPF includes stochastic terms which are zero when the covariance is nonsingular.

Error analysis: A detailed error analysis is carried out for the deterministic form of the optimal FPF in the finite but large $N$ limit. For the purposes of the error analysis, it is assumed that the linear system is controllable and observable, and the initial empirical covariance matrix is nonsingular. The main results are as follows:

The empirical mean and covariance of the particles are shown to converge almost surely to the exact mean and covariance as $t \to \infty$, even with a finite number of particles $N$ (Prop. 2(i));

The mean-squared error is shown to be bounded by $O(1/N)$, where the constant has polynomial dependence on the problem dimension $d$ (Prop. 2(ii));

A propagation of chaos analysis is carried out to show that the empirical distribution of the particles converges in a weak sense to the exact filter posterior distribution (Cor. 1).


Comparison to importance sampling: For a certain simplified filtering problem, a comparison of the m.s.e. between the importance sampling and control-based filters is described. The main result is to show that, using an importance sampling approach, the number of particles $N$ must grow exponentially with the dimension $d$. In contrast, with a control-based approach, $N$ needs to grow at most polynomially in $d$ in order to maintain the same error (Prop. 4). The conclusions are also verified numerically (Fig. 1).
This paper extends and completes the preliminary results reported in our prior conference papers [1, 2]. The optimal transport formulation of the FPF and the time-stepping procedure were originally introduced in [1]. However, the extension to the singular covariance matrix case, in Sec. III-D, is original and has not appeared before. A preliminary error analysis of the deterministic form of the optimal FPF appeared in our prior work [2]. The current paper extends the results in [2] in two key aspects: (i) the error bounds for the convergence of the empirical mean and covariance, in Prop. 2(ii), reveal the scaling with the problem dimension (see Remark 3(ii)), whereas previous results did not; (ii) the propagation of chaos analysis is carried out for the vector case (Cor. 1), whereas the previous result was only valid for the scalar case. These improvements became possible with a proof approach that is entirely different from the one used in [2]. Finally, the analytical comparison with the importance sampling particle filter, in Sec. V, is new.
I-B Literature survey
Two examples of controlled interacting particle systems are the classical ensemble Kalman filter (EnKF) [9, 10, 11, 12] and the more recently developed feedback particle filter (FPF) [13, 14]. The EnKF algorithm is the workhorse in applications (such as weather prediction) where the state dimension $d$ is very high; cf., [12, 15]. The high dimension of the state-space poses a significant computational challenge even in linear Gaussian settings. For such problems, an EnKF implementation may require fewer computational resources (memory and FLOPS) than a Kalman filter [15, 16]. This is because the particle-based algorithm avoids the need to store and propagate the error covariance matrix (whose size scales as $d^2$).
An expository review of continuous-time filters, tracing the progression from the Kalman filter (1960s) to the ensemble Kalman filter (1990s) to the feedback particle filter (2010s), appears in [17]. In continuous-time settings, the first interacting particle representation of the nonlinear filter appears in the work of Crisan and Xiong [18]. Also in continuous-time settings, Reich and collaborators have derived deterministic forms of the EnKF [19, 12]. In discrete-time settings, Daum and collaborators have pioneered the development of closely related particle flow algorithms [20, 21].
The technical approach of this paper has its roots in optimal transportation theory. These methods have been widely applied for uncertainty propagation and Bayesian inference: the ensemble transform particle filter is based upon computing an optimal coupling by solving a linear optimization problem [22]; polynomial approximations of the Rosenblatt transport maps for Bayesian inference appear in [23, 24]; the solution of such problems using the Gibbs flow is the subject of [25]. The time-stepping procedure of this paper is inspired by the JKO construction in [26]. Its extension to the filtering problem appears in [27, 28, 29, 30].
Closely related to the error analysis of this paper is the recent literature on stability and convergence of the EnKF algorithm. For the discrete-time EnKF algorithm, these results appear in [31, 32, 33, 34, 35]. The analysis for the continuous-time EnKF is more recent. For the continuous-time EnKF with perturbed observation, under additional assumptions (stable and fully observable), it has been shown that the empirical distribution of the ensemble converges to the mean-field distribution uniformly for all time with the rate $O(1/\sqrt{N})$ [36]. This result has been extended to the nonlinear setting for the case of Langevin-type dynamics with a strongly convex potential and full linear observation [37]. The stability assumption has recently been relaxed in [38]. Under certain conditions, convergence and long-term stability results appear in [39].
In independent numerical evaluations and comparisons, it has been observed that the EnKF and FPF exhibit smaller simulation variance and better scaling properties – with respect to the problem dimension – when compared to traditional methods [40, 41, 42, 43, 44]. The error analysis (in Sec. IV), together with the analytical bounds from the comparison with importance sampling (in Sec. V), provides the first such rigorous justification for the performance improvements reported in the literature. The analysis of this paper is likely to spur wider adoption of control-based algorithms for the purposes of sampling and simulation.
I-C Paper outline
Sec. II includes the preliminaries along with a discussion of the non-uniqueness issue. Its resolution is provided in Sec. III, where the optimal FPF is derived. The error analysis appears in Sec. IV, and the comparison with the importance sampling particle filter is given in Sec. V. The proofs appear in the Appendix.
Notation: For a vector $x$, $|x|$ denotes the Euclidean norm. For a square matrix $A$, $\|A\|_F$ denotes the Frobenius norm, $\|A\|_2$ is the spectral norm, $\operatorname{tr}(A)$ is the matrix-trace, $\operatorname{Null}(A)$ denotes the nullspace, $\operatorname{Range}(A)$ denotes the range space, and $\sigma(A)$ denotes the spectrum. For a symmetric matrix $A$, $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ denote the maximum and minimum eigenvalues of $A$, respectively. The partial order on positive definite matrices is denoted by $\succ$, such that $A \succ B$ means $A - B$ is positive definite. $\mathcal{N}(m, \Sigma)$ is a Gaussian probability distribution with mean $m$ and covariance $\Sigma$.
II The Non-Uniqueness Issue
II-A Preliminaries
The linear Gaussian filtering problem is described by the linear stochastic differential equations (sdes):

$$\mathrm{d}X_t = A X_t\,\mathrm{d}t + \sigma_B\,\mathrm{d}B_t \qquad (2a)$$
$$\mathrm{d}Z_t = H X_t\,\mathrm{d}t + \mathrm{d}W_t \qquad (2b)$$

where $X_t \in \mathbb{R}^d$ and $Z_t \in \mathbb{R}^m$ are the state and observation at time $t$, $\{B_t\}$ and $\{W_t\}$ are mutually independent standard Wiener processes taking values in $\mathbb{R}^d$ and $\mathbb{R}^m$, respectively, and $A$, $H$, $\sigma_B$ are matrices of appropriate dimension. The initial condition $X_0$ is assumed to have a Gaussian distribution $\mathcal{N}(m_0, \Sigma_0)$. The filtering problem is to compute the posterior distribution,

$$\pi_t(S) := \mathsf{P}(X_t \in S \mid \mathcal{Z}_t) \qquad (3)$$

where $\mathcal{Z}_t := \sigma(\{Z_s : 0 \le s \le t\})$.
Kalman-Bucy filter: In the linear Gaussian case, the posterior distribution is Gaussian, $\pi_t = \mathcal{N}(m_t, \Sigma_t)$, whose mean and covariance evolve according to the Kalman-Bucy filter [45]:

$$\mathrm{d}m_t = A m_t\,\mathrm{d}t + \mathsf{K}_t(\mathrm{d}Z_t - H m_t\,\mathrm{d}t) \qquad (4a)$$
$$\frac{\mathrm{d}\Sigma_t}{\mathrm{d}t} = A\Sigma_t + \Sigma_t A^\top + \Sigma_B - \Sigma_t H^\top H \Sigma_t \qquad (4b)$$

where $\mathsf{K}_t := \Sigma_t H^\top$ is the Kalman gain and $\Sigma_B := \sigma_B \sigma_B^\top$.
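The Kalman-Bucy equations (4a)-(4b) can be simulated with a simple Euler discretization. The sketch below is illustrative and not from the paper; the function name and the shapes of the inputs are our assumptions (square $\sigma_B$, observation increments supplied as an array).

```python
import numpy as np

def kalman_bucy(A, H, sigma_B, m0, Sigma0, dZ, dt):
    """Euler discretization of the Kalman-Bucy filter (4a)-(4b).

    dZ is an array of observation increments of shape (T, m).
    Returns the trajectories of the mean m_t and covariance Sigma_t."""
    m, Sigma = m0.copy(), Sigma0.copy()
    means, covs = [m.copy()], [Sigma.copy()]
    Q = sigma_B @ sigma_B.T                 # Sigma_B = sigma_B sigma_B^T
    for dz in dZ:
        K = Sigma @ H.T                     # Kalman gain K_t = Sigma_t H^T
        m = m + A @ m * dt + K @ (dz - H @ m * dt)            # (4a)
        Sigma = Sigma + (A @ Sigma + Sigma @ A.T + Q
                         - Sigma @ H.T @ H @ Sigma) * dt      # (4b)
        means.append(m.copy())
        covs.append(Sigma.copy())
    return np.array(means), np.array(covs)
```

For the scalar model with $A = -1$, $H = 1$, $\sigma_B = 1$, the Riccati equation (4b) has the stationary solution $\Sigma_\infty = \sqrt{2} - 1$, which the discretized covariance approaches regardless of the observation record.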
Feedback particle filter: The stochastic linear FPF [14, Eq. (26)] (and also the square-root form of the EnKBF [12, Eq. (3.3)]) is described by the McKean-Vlasov sde:

$$\mathrm{d}\bar X_t = A \bar X_t\,\mathrm{d}t + \sigma_B\,\mathrm{d}\bar B_t + \bar{\mathsf{K}}_t\Big(\mathrm{d}Z_t - \frac{H\bar X_t + H\bar m_t}{2}\,\mathrm{d}t\Big) \qquad (5)$$

where $\bar{\mathsf{K}}_t := \bar\Sigma_t H^\top$ is the Kalman gain, $\{\bar B_t\}$ is a standard Wiener process, $\bar m_t := \mathsf{E}[\bar X_t \mid \mathcal{Z}_t]$ and $\bar\Sigma_t := \mathsf{E}[(\bar X_t - \bar m_t)(\bar X_t - \bar m_t)^\top \mid \mathcal{Z}_t]$ are the mean-field terms, and $\bar X_0 \sim \mathcal{N}(m_0, \Sigma_0)$. We use

$$\bar\pi_t(S) := \mathsf{P}(\bar X_t \in S \mid \mathcal{Z}_t) \qquad (6)$$

to denote the conditional distribution of the mean-field process $\bar X_t$.
The FPF control law is exact. The exactness result appears in the following theorem, which is a special case of [14, Thm. 1] describing the exactness result for the general nonlinear non-Gaussian case. A proof is included in Appendix A. The proof is useful for studying the non-uniqueness issue described in Sec. II-B.
Theorem 1
Consider the linear Gaussian filtering problem (2a)-(2b) and the linear FPF (5) with $\bar X_0 \sim \mathcal{N}(m_0, \Sigma_0)$. Then the conditional distributions coincide: $\bar\pi_t = \pi_t$ for all $t \ge 0$.
The nomenclature is tabulated in Table I.

Variable | Notation | Equation
State of the hidden process | $X_t$ | Eq. (2a)
State of the $i$-th particle in the finite-$N$ sys. | $X^i_t$ | Eq. (8), (21)
State of the mean-field model | $\bar X_t$ | Eq. (5), (19)
Kalman filter mean and covariance | $m_t$, $\Sigma_t$ | Eq. (4a)-(4b)
Empirical mean and covariance | $m^{(N)}_t$, $\Sigma^{(N)}_t$ | Eq. (9)
Mean-field mean and covariance | $\bar m_t$, $\bar\Sigma_t$ | Eq. (5), (19)
Conditional distribution of $X_t$ | $\pi_t$ | Eq. (3)
Conditional distribution of $\bar X_t$ | $\bar\pi_t$ | Eq. (6)
Empirical distribution of the particles | $\pi^{(N)}_t$ | Eq. (10)
II-B The non-uniqueness issue
In the proof of Thm. 1 (given in Appendix A), it is shown that (i) the conditional mean process $\bar m_t$ evolves according to (4a); and (ii) the conditional covariance process $\bar\Sigma_t$ evolves according to (4b).

Define an error process $e_t := \bar X_t - \bar m_t$ for $t \ge 0$. The equation for $e_t$ is obtained by subtracting the equation for the mean, (38) in Appendix A, from (5):

$$\mathrm{d}e_t = \Big(A - \tfrac{1}{2}\bar\Sigma_t H^\top H\Big) e_t\,\mathrm{d}t + \sigma_B\,\mathrm{d}\bar B_t$$

This is a linear system; therefore, the covariance of $e_t$, easily seen to be given by $\bar\Sigma_t$, evolves according to the Lyapunov equation

$$\frac{\mathrm{d}\bar\Sigma_t}{\mathrm{d}t} = \Big(A - \tfrac{1}{2}\bar\Sigma_t H^\top H\Big)\bar\Sigma_t + \bar\Sigma_t\Big(A - \tfrac{1}{2}\bar\Sigma_t H^\top H\Big)^\top + \Sigma_B$$

which is identical to (4b).
These arguments suggest the following general procedure to construct an exact process: express $\bar X_t$ as a sum of two terms:

$$\bar X_t = \bar m_t + e_t$$

where $\bar m_t$ evolves according to the Kalman-Bucy equation (4a) and the evolution of $e_t$ is defined by the sde:

$$\mathrm{d}e_t = G_t e_t\,\mathrm{d}t + Q_t\,\mathrm{d}\bar B_t + R_t\,\mathrm{d}\bar W_t$$

where $\{\bar B_t\}$ and $\{\bar W_t\}$ are independent Brownian motions, and $G_t$, $Q_t$, and $R_t$ are solutions to the matrix equation

$$G_t \bar\Sigma_t + \bar\Sigma_t G_t^\top + Q_t Q_t^\top + R_t R_t^\top = A\bar\Sigma_t + \bar\Sigma_t A^\top + \Sigma_B - \bar\Sigma_t H^\top H \bar\Sigma_t \qquad (7)$$

By construction, the equation for the covariance $\bar\Sigma_t$ is then given by the Riccati equation (4b).
In general, there are infinitely many solutions of (7). Below, we describe three solutions that lead to three established forms of the EnKF and the linear FPF:
Given a particular solution $(G_t, Q_t, R_t)$, one can construct a family of solutions $(G_t + C_t \bar\Sigma_t^{-1}, Q_t, R_t)$, where $C_t$ is any skew-symmetric matrix. For the linear Gaussian problem, the non-uniqueness issue has been discussed in the literature: at least two forms of the EnKF, the perturbed observation form [19] and the square-root form [12], are well known. A homotopy of exact deterministic and stochastic filters is given in [46]. An explanation for the non-uniqueness in terms of a gauge transformation appears in [47].
II-C Finite-$N$ implementation
In a numerical implementation, one simulates $N$ stochastic processes (particles) $\{X^i_t : 1 \le i \le N\}$, where $X^i_t \in \mathbb{R}^d$ is the state of the $i$-th particle at time $t$. The evolution of $X^i_t$ is obtained upon empirically approximating the mean-field terms. The finite-$N$ filter for the linear FPF (5) is an interacting particle system:

$$\mathrm{d}X^i_t = A X^i_t\,\mathrm{d}t + \sigma_B\,\mathrm{d}B^i_t + \mathsf{K}^{(N)}_t\Big(\mathrm{d}Z_t - \frac{H X^i_t + H m^{(N)}_t}{2}\,\mathrm{d}t\Big) \qquad (8)$$

where $\mathsf{K}^{(N)}_t := \Sigma^{(N)}_t H^\top$; $\{B^i_t\}$ are independent copies of $\bar B_t$; $X^i_0 \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(m_0, \Sigma_0)$ for $i = 1, \dots, N$; and the empirical approximations of the two mean-field terms are as follows:

$$m^{(N)}_t := \frac{1}{N}\sum_{i=1}^N X^i_t, \qquad \Sigma^{(N)}_t := \frac{1}{N-1}\sum_{i=1}^N \big(X^i_t - m^{(N)}_t\big)\big(X^i_t - m^{(N)}_t\big)^\top \qquad (9)$$

We use the notation

$$\pi^{(N)}_t := \frac{1}{N}\sum_{i=1}^N \delta_{X^i_t} \qquad (10)$$

to denote the empirical distribution of the particles. Here, $\delta_x$ denotes the Dirac delta distribution at $x$.
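The finite-$N$ stochastic FPF (8) is straightforward to simulate with an Euler-Maruyama scheme. The sketch below is illustrative (the function name, the vectorized particle layout, and the assumption of a square $\sigma_B$ are ours), with the empirical mean and covariance computed as in (9) at every step.

```python
import numpy as np

def stochastic_fpf(A, H, sigma_B, ensemble, dZ, dt, rng):
    """Finite-N linear stochastic FPF, Eq. (8): an Euler-Maruyama sketch.

    ensemble is an (N, d) array of initial particle states X^i_0;
    dZ is a (T, m) array of observation increments."""
    N, d = ensemble.shape
    X = ensemble.copy()
    for dz in dZ:
        mN = X.mean(axis=0)                              # empirical mean (9)
        SigmaN = np.cov(X.T, ddof=1).reshape(d, d)       # empirical covariance (9)
        K = SigmaN @ H.T                                 # empirical Kalman gain
        dB = rng.standard_normal((N, d)) * np.sqrt(dt)   # independent copies B^i
        innov = dz - 0.5 * (X + mN) @ H.T * dt           # dZ - (H X^i + H m)/2 dt
        X = X + X @ A.T * dt + dB @ sigma_B.T + innov @ K.T
    return X
```

For the scalar test model with $A = -1$, $H = 1$, $\sigma_B = 1$, the empirical variance of the ensemble tracks the Riccati solution and settles near $\Sigma_\infty = \sqrt{2} - 1$ for large $N$.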
III Optimal Transport FPF
The problem is to uniquely identify an exact stochastic process $\bar X_t$. The solution is based on an optimality concept motivated by optimal transportation theory [7, 8]. The background on this theory is briefly reviewed next.
III-A Background on optimal transportation
Let $\mu_0$ and $\mu_1$ be two probability measures on $\mathbb{R}^d$ with finite second moments. The (Monge) optimal transportation problem with quadratic cost is to minimize

$$\int_{\mathbb{R}^d} |x - T(x)|^2\,\mathrm{d}\mu_0(x) \qquad (11)$$

over all measurable maps $T : \mathbb{R}^d \to \mathbb{R}^d$ such that

$$T \# \mu_0 = \mu_1 \qquad (12)$$

i.e., $\mu_1$ is the push-forward of $\mu_0$ under $T$. If it exists, the minimizer is called the optimal transport map between $\mu_0$ and $\mu_1$. The optimal cost is referred to as the (squared) Wasserstein distance between $\mu_0$ and $\mu_1$ [8].
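For Gaussian marginals, the optimal transport map is available in closed form: between $\mathcal{N}(m_0, \Sigma_0)$ and $\mathcal{N}(m_1, \Sigma_1)$ with $\Sigma_0 \succ 0$, the map is affine, $T(x) = m_1 + F(x - m_0)$ with $F = \Sigma_0^{-1/2}\big(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2}\big)^{1/2}\Sigma_0^{-1/2}$. A minimal numpy sketch (the helper names are ours):

```python
import numpy as np

def sqrtm_psd(S):
    """Symmetric square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_ot_matrix(S0, S1):
    """Linear part F of the optimal map T(x) = m1 + F (x - m0) between
    N(m0, S0) and N(m1, S1); requires S0 to be nonsingular."""
    R = sqrtm_psd(S0)                # S0^{1/2}
    Rinv = np.linalg.inv(R)          # S0^{-1/2}
    return Rinv @ sqrtm_psd(R @ S1 @ R) @ Rinv
```

The defining property of $F$ is $F \Sigma_0 F = \Sigma_1$ (note $F$ is symmetric), which is exactly the push-forward constraint (12) for linear maps of Gaussians.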
III-B The time-stepping optimization procedure
To uniquely identify an exact stochastic process $\bar X_t$, the following time-stepping optimization procedure is proposed:

Divide the time interval into equal time steps of size $\Delta t$ with the time instants $t_n := n\,\Delta t$ for $n = 0, 1, 2, \dots$.

Initialize a discrete-time random process $\{\bar X_{t_n}\}$ according to the initial distribution (prior) of $X_0$, $\bar X_{t_0} \sim \mathcal{N}(m_0, \Sigma_0)$.

For each time step, evolve the process according to

$$\bar X_{t_{n+1}} = T_n(\bar X_{t_n}) \qquad (14)$$

where the map $T_n$ is the optimal transport map between the probability measures $\bar\pi_{t_n}$ and $\bar\pi_{t_{n+1}}$.

Take the limit as $\Delta t \to 0$ to obtain the continuous-time process $\bar X_t$ and the sde:

$$\mathrm{d}\bar X_t = u_t(\bar X_t)\,\mathrm{d}t + \mathsf{K}_t\,\mathrm{d}Z_t \qquad (15)$$
The procedure leads to control laws $u_t$ and $\mathsf{K}_t$ that depend upon $\pi_t$. Since $\pi_t$ is unknown, one simply replaces it with $\bar\pi_t$ – the two are meant to be identical by construction. The resulting sde (15) is referred to as the optimal transport FPF or simply the optimal FPF. A definition is needed to state the main result.
Definition 1
For a given positive-definite matrix $\Sigma$, define the (unique such) symmetric solution to the matrix equation:
(16) 
Remark 1
The symmetric solution to the matrix equation (16) is explicitly given by:
The solution can also be expressed as:
(17) 
where $\Omega$ is the (unique such) skew-symmetric matrix that solves the matrix equation
(18)  
The main result is as follows; its proof appears in Appendix B.
Proposition 1
Consider the linear Gaussian filtering problem (2a)-(2b). Assume the initial covariance $\Sigma_0 \succ 0$. Then the optimal transport FPF is given by
(19)
where the coefficients are constructed according to Definition 1. The filter is exact: that is, the conditional distribution of $\bar X_t$ is Gaussian with mean $\bar m_t = m_t$ and covariance $\bar\Sigma_t = \Sigma_t$.
Using the form of the solution (17), the optimal transport sde (19) is expressed as
(20)
Compared to the original (linear Gaussian) FPF (5), the optimal transport FPF (20) has two differences:

The stochastic term $\sigma_B\,\mathrm{d}\bar B_t$ is replaced with a deterministic term. Given a Gaussian prior, the two terms yield the same posterior. However, in a finite-$N$ implementation, the difference becomes significant: the stochastic term serves to introduce an additional error of order $O(1/\sqrt{N})$.

The sde (20) has an extra term involving the skew-symmetric matrix $\Omega$. The extra term does not affect the posterior distribution. This term is viewed as a correction term that serves to make the dynamics symmetric, and hence optimal in the optimal transportation sense. It is noted that for the scalar ($d = 1$) case, the skew-symmetric term is zero. Therefore, in the scalar case, the update formula in the original FPF (5) is optimal. In the vector case, it is optimal if and only if the skew-symmetric term vanishes.
III-C Finite-$N$ implementation in the nonsingular covariance case
The finite-$N$ implementation of the optimal transport sde (19) is given by the following system of equations:
(21)
for $i = 1, \dots, N$, where $X^i_0 \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(m_0, \Sigma_0)$, and the empirical approximations of the mean and covariance are defined in (9).
The finite-$N$ filter (21) is not well-defined if the empirical covariance $\Sigma^{(N)}_t$ is a singular matrix. This is a problem because, in a finite-$N$ implementation, it is possible that $\Sigma^{(N)}_t$ is singular even though $\Sigma_t \succ 0$. In particular, when $N \le d$, the empirical covariance matrix is of rank at most $N - 1$ and hence singular. Note that this problem arises only for the optimal and deterministic forms of the FPF. In particular, the stochastic FPF (8) does not suffer from this issue; it can be simulated for any choice of $N$. In Sec. III-D, we extend the optimal transportation formulation to handle also the singular forms of the covariance matrix.
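For intuition, a deterministic finite-$N$ variant in the spirit of (21) can be sketched as follows: the stochastic term $\sigma_B\,\mathrm{d}B^i_t$ of (8) is replaced by the deterministic term $\frac{1}{2}\sigma_B\sigma_B^\top \big(\Sigma^{(N)}_t\big)^{-1}\big(X^i_t - m^{(N)}_t\big)\,\mathrm{d}t$, which requires the empirical covariance to be nonsingular. This is a hedged sketch of one known deterministic form (cf. the deterministic EnKF forms of Reich and collaborators [19, 12]), not necessarily the paper's exact equation (21); the skew-symmetric correction term is omitted (it vanishes in the scalar case).

```python
import numpy as np

def deterministic_fpf(A, H, sigma_B, ensemble, dZ, dt):
    """Deterministic finite-N FPF sketch: the noise term of Eq. (8) is
    replaced by 0.5 * Q * inv(Sigma^N) (X^i - m^N) dt, assuming the
    empirical covariance Sigma^N is nonsingular (so N > d is needed)."""
    N, d = ensemble.shape
    X = ensemble.copy()
    Q = sigma_B @ sigma_B.T
    for dz in dZ:
        mN = X.mean(axis=0)                            # empirical mean (9)
        SigmaN = np.cov(X.T, ddof=1).reshape(d, d)     # empirical covariance (9)
        K = SigmaN @ H.T                               # empirical Kalman gain
        drift = 0.5 * (X - mN) @ (Q @ np.linalg.inv(SigmaN)).T
        innov = dz - 0.5 * (X + mN) @ H.T * dt         # dZ - (H X^i + H m)/2 dt
        X = X + (X @ A.T + drift) * dt + innov @ K.T
    return X
```

With this replacement, the empirical covariance evolves (up to discretization error) by the same Riccati equation (4b), so no sampling noise from the process noise term enters the ensemble.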
III-D The singular covariance case
The derivation of the optimal FPF (19) crucially relies on the assumption that $\Sigma_0 \succ 0$, which in turn implies that, in the time-stepping procedure, $\bar\Sigma_{t_n} \succ 0$ for all $n$. In the proof of Prop. 1, this assumption is used to derive the optimal transport map. In general, when the covariance of the Gaussian random variables $\bar X_{t_n}$ or $\bar X_{t_{n+1}}$ is singular, the optimal transport map may not exist.
In the singular case, a relaxed form of the optimal transportation problem, first introduced by Kantorovich, is used to search for optimal (stochastic) couplings instead of (deterministic) transport maps [8]. The following example helps illuminate the issue:
Example 1
Consider Gaussian random variables $X$ and $Y$ with distributions $\mathcal{N}(0, \Sigma_X)$ and $\mathcal{N}(0, \Sigma_Y)$, respectively, where the covariances depend on a small parameter $\epsilon \ge 0$. If $\epsilon > 0$, the optimal transportation map from $X$ to $Y$ exists. If $\epsilon = 0$, the covariance of $X$ is singular, and there is no transport map that satisfies the constraints of the optimal transportation problem.
The Kantorovich relaxation of the optimal transportation problem (11) is the following optimization problem:

$$\min_{\gamma \in \Pi(\mu_0, \mu_1)} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^2\,\mathrm{d}\gamma(x, y) \qquad (22)$$

where $\gamma$ is a joint distribution on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals fixed to $\mu_0$ and $\mu_1$.
Although a deterministic map does not exist for the $\epsilon = 0$ problem, a (stochastic) coupling that solves the Kantorovich problem (22) exists: $Y$ is obtained from $X$ by a linear map plus an additive Gaussian random variable that is independent of $X$.
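The construction in Example 1 can be checked with a small numerical computation. The specific covariances below are assumed for illustration: the source covariance is singular in one direction, and an independent Gaussian term fills in that direction so that the coupling has the correct marginal.

```python
import numpy as np

# Illustration of Example 1 (with assumed numbers): the source X ~ N(0, S_X)
# is supported on a line (singular covariance), so no deterministic map can
# push it to the nondegenerate target Y ~ N(0, S_Y). A stochastic coupling
# Y = A X + W, with W Gaussian and independent of X, does the job.
eps = 0.1
S_X = np.diag([1.0, 0.0])          # singular source covariance
S_Y = np.diag([1.0, eps ** 2])     # nonsingular target covariance
A = np.diag([1.0, 0.0])            # deterministic (linear) part of the coupling
S_W = np.diag([0.0, eps ** 2])     # independent noise fills the null direction

# The marginal covariance of Y = A X + W is A S_X A^T + S_W:
cov_Y = A @ S_X @ A.T + S_W
assert np.allclose(cov_Y, S_Y)
```

The deterministic part acts on the range of the source covariance, while the independent noise term accounts for the directions in its kernel, mirroring the structure of the singular-case filter described next.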
In Appendix C, the Kantorovich relaxation is used to motivate an optimization problem whose solution yields the following extension of the optimal FPF:
(23)
where the stochastic term involves the projection matrix onto the kernel of the covariance matrix, and the gain is a symmetric solution of the matrix equation
(24)
Remark 2
When the covariance matrix is singular, the solution to the matrix equation (24) is not unique. It is shown in Appendix C that the solution is of the following general form:
(25)
in terms of the pseudo-inverse¹ of the covariance matrix, the projection matrix onto its range, a skew-symmetric matrix that solves a certain matrix equation (46), and an arbitrary symmetric matrix.

¹The pseudo-inverse of a matrix $A$ is the unique matrix $A^+$ that satisfies $A A^+ A = A$, $A^+ A A^+ = A^+$, $A A^+$ is symmetric, and $A^+ A$ is symmetric [49].
Using the formula (25), the optimal FPF (23) is expressed as follows:
(26)
The formula (26) allows one to clearly see the relationship between the deterministic and stochastic forms of the optimal FPF. In particular, when the covariance matrix is nonsingular, the projection onto the kernel is zero, and (26) reduces to the deterministic form of the optimal transport FPF (20). When the covariance matrix is singular, the effect of the linear term in (20) is simulated with the two terms in (26). This is conceptually similar to Example 1, where the deterministic optimal transport map is replaced with a stochastic coupling. The [additional terms] in (26) do not affect the distribution.
III-E Finite-$N$ implementation in the singular covariance case
The finite-$N$ implementation of the optimal transport sde (23) is given by the following system of equations:
(27)
where the noise processes are independent copies of a standard Wiener process, the gain is a symmetric matrix solution to the associated matrix equation, and the empirical mean and covariance $m^{(N)}_t$ and $\Sigma^{(N)}_t$ are defined in (9). Note that the stochastic term is zero when the empirical covariance is nonsingular, which is true, e.g., when $N > d$.
IV Error Analysis
This section is concerned with error bounds in the large but finite $N$ regime. Given that $N$ is large, we restrict ourselves to the nonsingular covariance case. For the purposes of the error analysis, the following assumptions are made:
Assumption (I): The pair $(A, H)$ is detectable and $(A, \sigma_B)$ is stabilizable.
Assumption (II): Assume $\Sigma_0 \succ 0$ and the initial empirical covariance matrix $\Sigma^{(N)}_0 \succ 0$ almost surely.
The main result for the finite-$N$ deterministic FPF (21) is as follows, with the proof given in Appendix D.
Proposition 2
Consider the Kalman filter (4a)-(4b) initialized with the prior $\mathcal{N}(m_0, \Sigma_0)$, and the finite-$N$ deterministic form of the optimal FPF (21) initialized with $X^i_0 \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(m_0, \Sigma_0)$ for $i = 1, \dots, N$. Under Assumptions (I) and (II), the following characterizes the convergence and error properties of the empirical mean and covariance obtained from the finite-$N$ filter relative to the mean and covariance obtained from the Kalman filter:

Mean-squared error: The error bounds
(28a) (28b)
hold for all $t \ge 0$, where the constant depends on the problem data through spectral norms and on $\Sigma_\infty$, the solution to the algebraic Riccati equation (see Lemma 2).
Remark 3
Two remarks are in order:

Asymptotically (as $t \to \infty$), the empirical mean and covariance of the finite-$N$ filter become exact. This is because of the stability of the Kalman filter, whereby the filter forgets its initial condition. In fact, the result holds for any (not necessarily i.i.d. Gaussian) initial particles that satisfy the assumption $\Sigma^{(N)}_0 \succ 0$.

(Scaling with dimension) If the parameters of the linear Gaussian filtering problem (2a)-(2b) scale with the dimension $d$ in a way that the relevant spectral norms do not change, then the constant in the error bounds (28a)-(28b) does not change. The only term that scales with the dimension is a trace term; for example, with $\sigma_B = I$, the trace grows linearly in $d$. Therefore, the bound on the error typically scales linearly in the problem dimension.
IV-A Propagation of chaos
In this section, we study the convergence of the empirical distribution $\pi^{(N)}_t$ of the particles for the finite-$N$ system (21) to the exact posterior distribution $\pi_t$. The derivation of error estimates involves the construction of $N$ independent copies of the exact mean-field process (19). Consistent with the convention to denote mean-field variables with a bar, the stochastic processes are denoted as $\{\bar X^i_t : 1 \le i \le N\}$, where $\bar X^i_t$ denotes the state of the $i$-th particle at time $t$. The $i$-th particle evolves according to the mean-field equation (19) as

(29)

where the gain is the Kalman gain and the initial condition $\bar X^i_0 = X^i_0$, the right-hand side being the initial condition of the $i$-th particle in the finite-$N$ FPF (21). The mean-field process $\bar X^i_t$ is thus coupled to $X^i_t$ through the initial condition.
In order to carry out the error analysis, the following event is defined for an arbitrary choice of a fixed matrix:
(30)
The following proposition characterizes the error between $X^i_t$ and $\bar X^i_t$ when the event (30) is true (the estimate is key to the propagation of chaos analysis). The proof appears in Appendix E.
Proposition 3
V Comparison to Importance Sampling
For the purposes of the comparison of the optimal FPF with the importance sampling-based particle filter, we restrict ourselves to a static filtering example with a fully observed observation model:

$$\mathrm{d}X_t = 0, \qquad \mathrm{d}Z_t = X_t\,\mathrm{d}t + \mathrm{d}W_t \qquad (32)$$

for $t \ge 0$, where $X_0 \sim \mathcal{N}(m_0, \Sigma_0)$. The posterior distribution at time $t$, denoted as $\pi_t$, is Gaussian with mean $m_t$ and covariance $\Sigma_t$ given by the Kalman filter.
Let $\{X^i_0\}_{i=1}^N$ be i.i.d. samples of $X_0$. The importance sampling-based particle filter yields a weighted empirical approximation of the posterior distribution:
(33)
In contrast, given the same initial samples $\{X^i_0\}_{i=1}^N$, the FPF approximates the posterior by implementing a feedback control law:
(34)
where the empirical mean and covariance are approximated as in (9).
The m.s.e. in estimating the conditional expectation of a given function $f$ is defined as follows:

where the subscript indicates either the PF or the FPF.
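The qualitative comparison can be reproduced on a toy discrete-time analogue of the static model: a single Gaussian Bayes update with prior $\mathcal{N}(0, I_d)$ and observation $y = x + v$, $v \sim \mathcal{N}(0, I_d)$, for which the posterior mean is $y/2$. The model and function below are our illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def mse_comparison(d, N, trials, rng):
    """Toy static Bayes update comparing importance sampling with a
    transport-style (EnKF-like) particle update. Prior x ~ N(0, I_d),
    observation y = x + v with v ~ N(0, I_d); posterior mean is y/2.
    Returns (mse_is, mse_enkf) for estimating the posterior mean.
    (An illustrative analogue of Sec. V, not the paper's exact model.)"""
    err_is, err_enkf = 0.0, 0.0
    for _ in range(trials):
        x_true = rng.standard_normal(d)
        y = x_true + rng.standard_normal(d)
        X = rng.standard_normal((N, d))              # prior samples
        # importance sampling: weights proportional to Gaussian likelihood
        logw = -0.5 * np.sum((y - X) ** 2, axis=1)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        m_is = w @ X                                 # weighted-mean estimate
        # transport-style update: a deterministic shift with gain 1/2,
        # which moves the ensemble mean to the posterior mean y/2
        Xp = X + 0.5 * (y - X)
        m_enkf = Xp.mean(axis=0)
        m_post = y / 2.0
        err_is += np.sum((m_is - m_post) ** 2)
        err_enkf += np.sum((m_enkf - m_post) ** 2)
    return err_is / trials, err_enkf / trials
```

As $d$ grows with $N$ fixed, the importance weights concentrate on a few samples (weight collapse) and the m.s.e. of the weighted estimate deteriorates rapidly, while the transport-style update degrades only gracefully, consistent with the scaling discussion above.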
A numerically computed plot of the level-sets of the m.s.e., as a function of $N$ and $d$, is depicted in Figure 1(a)-(b). The expectation is approximated by averaging over independent simulations. It is observed that, in order to achieve the same error, the importance sampling-based approach requires the number of samples $N$ to grow exponentially with the dimension $d$.