Ensemble Kalman filter with the unscented transform

# Ensemble Kalman filter with the unscented transform

X. Luo , I.M. Moroz Mathematical Institute, 24-29 St Giles’, Oxford, UK, OX1 3LB Oxford-Man Institute of Quantitative Finance, Blue Boar Court, 9 Alfred Street, Oxford, UK, OX1 4EH
###### Abstract

A modification scheme to the ensemble Kalman filter (EnKF) is introduced based on the concept of the unscented transform (Julier et al., 2000; Julier and Uhlmann, 2004), which therefore will be called the ensemble unscented Kalman filter (EnUKF) in this work. When the error distribution of the analysis is symmetric (not necessarily Gaussian), it can be shown that, compared to the ordinary EnKF, the EnUKF has more accurate estimations of the ensemble mean and covariance of the background by examining the multidimensional Taylor series expansion term by term. This implies that, the EnUKF may have better performance in state estimation than the ordinary EnKF in the sense that the deviations from the true states are smaller. For verification, some numerical experiments are conducted on a -dimensional system due to Lorenz and Emanuel (Lorenz and Emanuel, 1998). Simulation results support our argument.

###### keywords:
Ensemble Kalman filter, Unscented transform, Ensemble unscented Kalman filter
###### Pacs:
92.60Wc; 02.50-r
Corresponding author.

## 1 Introduction

The Kalman filter (KF) is a recursive data processing algorithm [23]. It optimally estimates the states of linear stochastic systems that are driven by Gaussian noise, and are observed through linear observation operators, which possibly also suffer from additive Gaussian errors. However, if there exists nonlinearity from either the dynamical systems or the observation operators, or, if neither the dynamical noise nor the observational noise follows any Gaussian distribution, then the Kalman filter becomes suboptimal. To tackle the problems of nonlinearity and non-Gaussianity, there are some strategies one may employ. For example, to handle the problem of nonlinearity, one may expand the nonlinear function in a Taylor series locally and keep the expansion terms only up to second order. This leads to the extended Kalman filter (EKF) (e.g., [7]). To deal with the problem of non-Gaussianity, one may specify a Gaussian mixture model (GMM) to approximate the underlying probability density function (pdf), such that the KF algorithm is applicable to the individual distributions of the GMM [27]. More generally, one may adopt the sequential Monte Carlo method (also known as the particle filter, e.g., [32]), which utilizes the empirical pdf obtained from a number of particles to represent the true pdf, wherein the problems of both nonlinearity and non-Gaussianity are taken into account during the pdf approximation.

For practical large-scale problems like weather forecasting, the computational cost is another issue of great concern. In such circumstances, direct application of the KF or EKF scheme is prohibitive because of the computational cost of evolving the full covariance matrices forward. While for the particle filter, because of its slow convergence rate, the required number of the particles for proper approximations may be well above many thousands for even low dimensional nonlinear systems.

For the sake of computational efficiency, the ensemble Kalman filter (EnKF) was proposed [8]. It is essentially a Monte Carlo implementation of the Kalman filter. At the beginning of each assimilation cycle, it is assumed that one has an ensemble of the system states, called the background ensemble, which can usually be obtained from the previous assimilation cycle. Then, given additional information from the observations, one applies the KF scheme to update each individual member of the background ensemble. To do this, the mean and error covariance of the background are approximated by the sample mean and covariance of the ensemble. After the updates, one will obtain a new set of the system states, called the analysis ensemble in this work, which will then be used to estimate the true mean and covariance of the underlying system state. By propagating the ensemble members of the analysis forward through the system model, one obtains a new ensemble of the background for the next assimilation cycle. Therefore, by using the ensembles to approximate the true statistics (e.g., the mean and covariance) of the system states, the computational cost can be significantly reduced. Moreover, in the EnKF there is no need to linearize the system model as in the EKF. Instead, the nonlinear problem becomes the one of how to approximate the statistics of the pdf of a Gaussian distribution which is transformed by a nonlinear function. This point of view leads to the idea of the unscented transform (UT) [16, 17], as will be introduced later.

Depending on whether to perturb the observations or not, the EnKF can be classified into two types: stochastic and deterministic [19, 28]. The stochastic EnKF uses the observations and the corresponding covariance matrix to produce an ensemble of the disturbed observations, which is then used to update the background ensemble. For examples, see [5, 8, 9, 14]. In contrast, the deterministic EnKF, often known as the ensemble square root filter (EnSRF hereafter), does not perturb the observations. Given a background ensemble, the EnSRF uses the observations to update the sample mean of the background, while the analysis ensemble is produced based on the sample mean plus the perturbations. For examples, see [1, 4, 34]; also see the review of [28]. Apart from the aforementioned EnKFs, there are also some other variants, e.g., [3, 26, 33, 35].

In this paper we will introduce a framework of the EnKF incorporating the concept of the unscented transform [16, 17], which will therefore be called the ensemble unscented KF (EnUKF for short). The basic idea of the EnUKF is that, at the end of the filtering step of an EnKF, one adopts the unscented transform to generate a set of carefully chosen system states, called the sigma points. The set of the sigma points is then treated as the analysis ensemble and propagated forward through the system model. At the next assimilation cycle, the mean and covariance of the background are estimated based on the propagations. In the Appendix, we show that, under the assumption of Gaussian errors, the accuracies of thus estimated mean and covariance are up to third order term in the Taylor series expansion (if available). Moreover, there are additional adjustable parameters in the framework which may be tuned to reduce the approximation error further. In contrast, for the ordinary EnKFs, we will show in the Appendix that the accuracies are normally up to second order. Therefore, by incorporating the unscented transform, one may improve the performance of the EnKF.

This paper is organized as follows. In section 2 we will review the general framework of the EnKF. Moreover we will also introduce the idea of the unscented transform. The accuracy analysis of the unscented transform is given in the Appendix. For comparison, we will also provide the accuracy analysis of the ordinary EnSRF. In section 3, based on the concept of the unscented transform, we will propose a modification scheme to the EnKF. In section 4, we will adopt the 40-dimensional model in [22] as the testbed to examine the performance of the EnUKF and compare it to the ordinary EnSRF. Finally we will conclude the work in section 5.

## 2 Background

### 2.1 The framework of the EnKF

The EnKF is a Monte Carlo implementation of the Kalman filter, which inherits the framework of the KF by nature. For illustration, we consider the following scenario. Suppose that we have an -dimensional discrete dynamical system

 xk+1=Mk,k+1(xk)+uk, (1)

where denotes the -dimensional system state at the instant , is the transition operator mapping to , and is the dynamical noise, which is assumed to be independent of the system state and observation noise (see below), with zero mean and covariance .

The above dynamical system is then measured by an observer such that

 yk=Hk(xk)+vk, (2)

where is the -dimensional observation at instant , and is the observation noise, which is independent of the system state and dynamical noise, with zero mean and covariance . For convenience of discussion, we also assume that there is an -member ensemble of the analysis available at the end of the -th assimilation cycle. For description, we split the framework of the EnKF into the propagation (or prediction) and filtering steps.

#### 2.1.1 The propagation step

We define the following set

 Xbk={xbk,i:xbk,i=Mk,k+1(xak−1,i),i=1,2,⋯,n}

as the predicted background ensemble at instant , which is the set of propagations of the analysis ensemble at the previous assimilation cycle. The sample mean and covariance can be evaluated according to the following unbiased estimators 111In some works, e.g., [33], the authors may choose other estimators..

 ^xbk=1nn∑i=1xbk,i, (3a) ^Pbk=1n−1n∑i=1(xbk,i−^xbk)(xbk,i−^xbk)T+Qk. (3b)

In practice, the approximation covariance need not be calculated. Instead, it is customary to compute

 ^Pkxh=1n−1n∑i=1(xbk,i−^xbk)(Hk(xbk,i)−Hk(^xbk))T,^Pkhh=1n−1n∑i=1(Hk(xbk,i)−Hk(^xbk))(Hk(xbk,i)−Hk(^xbk))T, (4)

which avoids the problem of linearizing when it is nonlinear, although under such circumstance the ordinary KF algorithm becomes sub-optimal. The Kalman gain is obtained through

 Kk=^Pkxh(^Pkhh+Rk)−1, (5)

where is the covariance of the observational error at step .

#### 2.1.2 The filtering step

With additional information from the observations, one can update the background ensemble according to a certain analysis scheme. For example, for a stochastic EnKF, the analysis ensemble can be generated according to

 xak,i=xbk,i+Kk(yk,i−Hk(xbk,i)),for i=1,2,⋯,n, (6)

where is an observation sample generated by the normal distribution with mean and covariance . Correspondingly, the sample mean and covariance of the analysis ensemble can be calculated according to

 ^xak=1nn∑i=1xak,i, (7a) ^Pak=1n−1n∑i=1(xak,i−^xak)(xak,i−^xak)T. (7b)

For an EnSRF, for instance, the ensemble transform Kalman filter (ETKF) [4, 33], the analysis ensemble is generated by adding perturbations to the ensemble mean. On one hand, the ensemble mean is obtained as follows

 ^xak=^xbk+Kk(yk−H(^xbk)). (8)

On the other hand, let be an square root matrix of the background error covariance such that 222In general can be obtained through some numerical algorithm, e.g., Cholesky decomposition., then an square root matrix of the analysis error covariance can be updated from according to

 Sak=SbkTk, (9)

where is an transformation matrix derived in [4]. Thus the analysis error covariance is computed by

 ^Pak=Sak(Sak)T. (10)

Moreover, given and , the analysis ensemble is generated according to

 xak,i=^xak+√n−1(Sak)i,i=1,2,⋯,n, (11)

where denotes the -th column of the square root matrix . After the analysis ensemble is generated, one propagates it forward to obtain the background ensemble at the next instant and starts a new assimilation cycle.

### 2.2 The unscented transform

Given an -dimensional Gaussian random variable with mean and covariance , one problem of interest is how to estimate the mean and covariance of the transformed variable , where is a nonlinear vector function.

In the ordinary EnKF, given a set of samples of the random variable , the mean and covariance of the transformed variable are estimated according to Eq. (3), i.e.,

 ^y=1nn∑i=1f(xi), (12a) ^Pyy=1n−1n∑i=1(f(xi)−^y)(f(xi)−^y)T. (12b)

The unscented transform [16, 17] is a different method for the estimation problem. Suppose that and of the random vector are unknown, but the sample mean and covariance can be estimated from the set . Then a set of state vectors , called the sigma points [16, 17], can be generated according to

 X0=^x,Xi=^x+(√(L+λ)^Pxx)i,i=1,2,⋯,L,Xi=^x−(√(L+λ)^Pxx)i−L,i=L+1,L+2,⋯,2L, (13)

where denotes the -th column of the square root matrix , and is an adjustable scaling parameter [16, 17].

Moreover, in the unscented transform, it also allocates a set of weights

 W0=λL+λ,Wi=12(L+λ),i=1,2,⋯,2L, (14)

to the sigma points. In this way, the weighted mean and covariance of the discrete distribution ,

 (15)

are exactly the same as and . If follows a Gaussian distribution, then can be chosen as in order to let the first four weighted moments of the discrete distribution match those of the distribution of [16, 17].

Because of the symmetry in the sigma points, the rank of the matrix is . To avoid rank deficiency in the sample covariance matrix, it is suggested that the number of the sigma points be larger than twice the dimension of the vector [16, 17], or equivalently, .

After the transformation, the weighted sample mean and covariance of the transformed sigma points are calculated according to

 ^y=2L∑i=0Wif(Xi), (16a) ^Pyy=2L∑i=0Wi(f(Xi)−^y)(f(Xi)−^y)T+β(f(X0)−^y)(f(X0)−^y)T, (16b)

where the second term on the rhs of Eq. (16b) is introduced to reduce the approximation error. In the case that follows a Gaussian distribution, the choice of is shown to be optimal [17].

To take into account the effect of the model error in a more general situation, wherein the system model is described by (such that the model error is possibly not additive, but is assumed to follow a Gaussian process with mean zero and covariance ), it is customary to adopt the joint state and the system model is changed to . In this case, the sigma points can be generated according to

 Z0=[^xT,0T]TZi=Z0+(√(L+λ)^Pzz)i,i=1,2,⋯,L,Zi=Z0−(√(L+λ)^Pzz)i−L,i=L+1,L+2,⋯,2L, (17)

where means the zero vector, and is the sample covariance matrix of such that

 ^Pzz=(^Pxx00Q) (18)

We will leave the awkward analysis of the estimation accuracies of both the ordinary EnKF and the unscented transform to the Appendix, where the general form described in terms of joint state is considered. We will show that, in the estimation scheme of the ordinary EnKF, the random samples will generally introduce spurious modes in the transformed distribution even if the set of sample points has the correct mean and covariance [16, 17], while for the unscented transform, by carefully choosing the samples (i.e. the sigma points), the effect of sample error can be reduced. This fact leads to our argument that incorporating the unscented transform may benefit the performance of the EnKF. In the next two sections we will firstly introduce the modification scheme which incorporates the unscented transform for large-scale problems, and then proceed to conduct some numerical experiments to examine its performance.

## 3 The ensemble unscented Kalman filter

For large-scale problems, it is not practical to fulfil the requirement that the number of the sigma points should be larger than twice the degrees-of-freedom of the system model [12]. Therefore we will introduce some modifications to make the unscented transform applicable for those problems. We will call the EnKF equipped with the modifications the ensemble unscented Kalman filter (EnUKF). For consistency, we again take Eqs. (1) and (2) as the -dimensional system model and the -dimensional observer respectively. Moreover, we also assume that there is a set of sigma points available at the -th step, which is associated with the weights ’s given by Eq. (14). If there is only an “ordinary” ensemble of the analysis available, one may first compute the sample covariance, and then conduct the truncated singular value decomposition (TSVD) [13] to construct a set of sigma points, as to be explained later.

With the zero mean of the dynamical noise, at the -th cycle, there exists a set of the predictions of the propagated sigma points , which is associated with a set of weights obtained from the previous cycle (to be discussed later). The weighted sample mean and covariance are given by

 ^xbk= 2lk−1∑i=0Wk−1,iXbk,i, (19a) ^Pbk= 2lk−1∑i=0Wk−1,i(Xbk,i−^xbk)(Xbk,i−^xbk)T+ (19b) β(Xbk,0−^xbk)(Xbk,0−^xbk)T+Qk.

Moreover, the above evaluation scheme is also applied to the projection of the background ensemble such that

 (20)

For numerical reason, it is customary to re-write the above error covariances in terms of some square root matrices. To this end, we introduce two square roots, and , which are defined by

 Sxk= [√Wβk−1,0(Xbk,0−^xbk),√Wk−1,1(Xbk,1−^xbk), (21a) ⋯,√Wk−1,2lk−1(Xbk,2lk−1−^xbk)], Shk= (21b) ⋯,√Wk−1,2lk−1(Hk(Xbk,2lk−1)−^yk)],

where . Then the covariances can be re-written as

 ^Pbk=Sxk(Sxk)T+Qk, (22a) ^Pkxh=Sxk(Shk)T, (22b) ^Pkhh=Shk(Shk)T, (22c)

Again, the Kalman gain follows Eq. (5), i.e.,

 Kk=^Pkxh(^Pkhh+Rk)−1. (5)

With the above information, the mean and covariance of the analysis can be computed according to

 ^xak=^xbk+Kk(yk−Hk(^xbk)), (23a) ^Pak=^Pbk−Kk(^Pkxh)T. (23b)

Apart from obtaining the updated sample mean and covariance, we also aim to generate a set of sigma points as the analysis ensemble, which will then be propagated to the next assimilation cycle. For this purpose, one may consider using an existing EnKF scheme, for example, the ETKF. However, in order to avoid doubling the ensemble size at each assimilation cycle, some sigma points have to be discarded. To do this, the sample mean can be preserved by maintaining the symmetry about among the remaining sigma points, while the corresponding sample covariance, denoted by , can only be an approximation to . This may appear to be a complicated problem for the existing EnKFs to design a selection criterion, because the perturbations produced by them have no indications of the relative importance for covariance approximation.

To tackle the above problem, the truncated singular value decomposition (TSVD) [13] is adopted in this work, which is similar to the idea of using singular vectors to produce initial perturbations for ensemble forecasting ([6], [18, ch 6], [30], [31], and the references therein). Specifically, to produce the sigma points, a singular value decomposition (SVD) is firstly conducted on . Suppose that can be expressed as

 ^Pak=EkDK(Ek)T, (24)

where is a diagonal matrix consisting of the eigenvalues ’s of , which are sorted in descending order, i.e., for , and is the matrix consisting of the corresponding eigenvectors ’s. Next, a set of perturbations, generated in terms of the first values of , are added to the sample mean to form sigma points. Another symmetric sigma points can be produced by subtracting the perturbations from the sample mean. Overall, in analogy to Eq. (13), the above procedure can be summarized as follows

 Xak,0=^xak,Xak,i=^xak+(lk+λ)1/2σk,iek,i,i=1,⋯,lk,Xak,i=^xak−(lk+λ)1/2σk,i−lkek,i−lk,i=lk+1,⋯,2lk, (25)

where is the adjustable scaling parameter. Using the generated sigma points as the analysis ensemble, the EnUKF is clearly an unbiased ensemble filter [20].

It is worthy to note that, Eq. (25) does not require the full spectra of the eigenvalues and eigenvectors. Therefore, to reduce the computational cost in large scale problems, some fast SVD algorithms, e.g., the Lanczos or block Lanczos algorithm [10, ch 9], can be adopted to compute the first pairs of eigenvalues and eigenvectors (for example, see [29]).

For convenience, we will hereafter call the truncation number. The choice of is important to the performance of the EnUKF, because it not only controls the number of sigma points to be produced, but also determines the quality of matrix approximation. Indeed, via SVD the matrices and can be expressed as

 ^Pak=m∑i=1σ2k,iek,i(ek,i)T,~Pak=lk∑i=1σ2k,iek,i(ek,i)T, (26)

respectively. It is clear that, if is too small, some important structures of , in terms of for , will be lost. However, as the computational cost is also a concern, it is not desirable for to get too large. Moreover, in many situations, if is large enough, may be already very small compared to the leading eigenvalues. Thus the improvement obtained by increasing becomes negligible. In this sense, one may choose a modest value of to achieve a tradeoff between accuracy and efficiency.

In our implementation, we let be an integer such that

 σ2k,i>trace(^Pak)/hk,i=1,⋯,lk,σ2k,i≤trace(^Pak)/hk,i>lk+1, (27)

where is the threshold at the -th cycle (we will discuss how to choose in section 4.1). This is equivalent to saying that we construct the sigma points based on the eigenvectors such that their corresponding eigenvalues are larger than a specified tolerance. Moreover, to prevent getting too large or too small, we also specify a lower bound and an upper bound and adjust the threshold so that .

Under the assumption of Gaussian error distribution, it can be verified that the perturbations of the sigma points, in terms of for , are equally likely in the sense that their values of the probability density function (PDF)

 p(δx)=(2π)m/2(det^Pak)−1/2exp{−12(δx)T(^Pak)−1(δx)} (28)

are the same (also see the discussions in [33]), where means the determinant of a matrix. Therefore it is natural to assign an identical weight to all the perturbations. Consequently, in the spirit of Eq. (14), a set of weights can be constructed as follows

 Wk,0=λlk+λ,Wk,i=12(lk+λ),i=1,⋯,2lk. (29)

Finally, all the sigma points in Eq. (25), which are associated with a set of weights given in Eq. (29), are propagated forward to the next assimilation cycle.

We adopt the time averaged relative rms error (relative rmse for short) to measure the performance of the EnUKF, which is defined as

 er=1kmaxkmax∑k=1∥^xak−xtrk∥2/∥xtrk∥2, (30)

where is the maximum assimilation cycle, denotes the truth (the state of a control run) at the -th cycle, and means the norm.

Moreover, we also use the time averaged rms ratio to examine the similarity of the truth to the sigma points, which also qualitatively reflects the performance in estimating the error covariance, e.g., overestimation or underestimation (cf. [1, 34] and the references therein). As an estimation, the time averaged rms ratio, denoted by , is computed by [1, 34]

 R=1kmaxkmax∑k=1(2lk+1)∥^xak−xtrk∥2/2lk∑i=0∥Xak,i−xtrk∥2, (31)

while the expectation of the rms ratio is [1, 34]

 Re=√(leff+1)/(2leff+1),

where is the “effective” truncation number over the whole assimilation window. Hence, if the truth is statistically indistinguishable from the sigma points, the values of and shall be very close. Note that for any large , so for simplicity we let equal the average of the truncation number , i.e., . means that the covariance of the sigma points underestimates the error of the state estimation, while implies the opposite, i.e., overestimation of the error of of the state estimation [24, 34].

## 4 Numerical experiments with a 40-dimensional system

This section is dedicated to examining the performance of the EnUKF through the numerical simulations, and studying the effects of the parameters on the performance of the EnUKF. To this end, we choose the -dimensional system model due to Lorenz and Emanuel [21, 22] (LE98 model hereafter) as the testbed. The LE98 model is a simplified system proposed to model atmospheric dynamics, which “shares certain properties with many atmospheric models” [22]. We consider the perfect model scenario, wherein the governing equations are described as follows

 dxidt=(xi+1−xi−2)xi−1−xi+F,i=1,⋯,m. (32)

The quadratic terms simulate the advection, the linear term represents the internal dissipation, while the constant acts as the external forcing ([21]). The variables ’s are defined cyclically such that , , and .

We choose the observer to be a time-invariant identity operator. Specifically, given a system state at the -th assimilation cycle, the observations are obtained according to

 yk=Hk(xk)+vk=xk+vk, (33)

where follows the -dimensional Gaussian distribution with the covariance matrix being the identity matrix .

Note that in Eq. (30) the measure can also be interpreted as the time-averaged noise level of the trajectory (with respect to the truth). From this point of view, we can define the divergence of a filter in a restrictive sense: Suppose that the relative rmse of the observations is , which, with the identity observation operator , is defined as . If , then we say the filter is divergent because in such circumstances, the trajectory obtained by the filter, on average, is more noisy than the observations, which implies that it might not make any sense to use the filter for assimilation.

In our experiments, we set and and integrate the system through the fourth-order Runge-Kutta method. We choose the length of the integration window to be dimensionless units, and the integration time step to be units (corresponding to about a 6-h interval in reality, see [22]), thus there are assimilation cycles overall.

For the LE98 model, in the numerical experiments (not reported there) we found that, the EnSRF, implemented following either work of [1, 4, 34], outperforms the stochastic EnKF, which is consistent with the result reported in [34], while the performances of the EnSRFs are very close. Therefore, here we choose to compare the EnUKF with the EnSRF only. Moreover, for an ordinary EnSRF, we will introduce the spherical simplex centering scheme to its analysis ensembles. The reason to do this is two-fold. Firstly, with the spherical simplex centering scheme, the produced analysis perturbations are equally likely in probability under the assumption of Gaussian error distribution [33]. Secondly, with the centering scheme, an EnSRF can be shown to be an unbiased ensemble filter, which avoids an error that systematically underestimates the analysis covariance (see [20] for the details). In this work, we single out the ETKF [4] for our experiments, and equip it with the spherical simplex centering scheme following the work [33], which will be further discussed below.

### 4.1 Issues in implementing the EnUKF and the ETKF

When implementing the EnUKF, an issue worth of special attention is the positive semi-definiteness of the covariance matrices. Normally, we require so that in Eq. (29), the weights ’s are positive for . Nevertheless, the weight can be negative if . If it is so, when computing the background covariances according to Eqs. (19b) and (20), the positive semi-definiteness may not be guaranteed. However, one may note that, in Eqs. (19b) and (20), the effective weight of is actually (). So in order to guarantee the positive semi-definiteness, we shall choose the parameters and properly to satisfy that and . Given , it means that . Since is bounded within , by letting , one can guarantee the positive semi-definiteness.

The threshold in Eq. (27) is chosen in the following way. At the beginning we specify a threshold . If is a proper value such that satisfies , then we keep and at the next cycle we start with . If is too small such that , then we replace by . We continue the replacement until falls into the specified range, or the number of the replacement operations is up to (in this case we simply put , regardless of what is). Similarly, if is too large such that , then we replace by , we continue the replacement until falls in the specified range, or the number of the operations is up to (in this case we simply put ). After the adjustment, at the next cycle we start with and adjust it (if necessary) to let fall into the specified range, and so on. In this way, one can obtain the threshold at each cycle.

To apply the spherical simplex centering scheme to the ETKF, we follow the algorithm proposed in [17] to construct a centering matrix , where follows Eq. (C15) in [33]. Compared to the alternative centering matrix provided in [33], we favor the one described in Eq. (C15) because it is time-invariant. Moreover, its construction is independent of the concrete system in assimilation, and does not involve the operation of matrix inversion and the observation operator.

In our experiments, we process the observations simultaneously in both the EnUKF and the ETKF. In order to improve the performances of the filters, we also consider two additional techniques. One is the method of covariance inflation, which is based on the observation that the covariance of the analysis error will be systematically underestimated in the EnKF [34]. Therefore, it can be beneficial to increase either the background error covariance, or the analysis error covariance [2, 25, 34]. In this work, we follow the method used in [2, 34] and choose to multiply the perturbations to the sample mean of the analysis by a constant , which is equivalent to increasing the analysis error covariance by a factor . Note that, if one chooses to process the observations in a serial way (e.g. [34]) with a covariance factor , then after processing an -dimensional observation, the corresponding covariance will be increased by a factor of . The relationship between the inflation factors and is thus given by , therefore one will find that the inflation factor adopted in this work is much larger than those used in, for example, [34].

The other technique is covariance filter [11, 15], which introduces the Schur-product to a covariance matrix in order to reduce the effect of sample errors. We say a length scale of covariance filter is optimal within a certain range if it minimizes the relative rmse among the values in test. As an example, in Fig. 1 we plot the relative rms errors of the EnUKF (upper panel) and the ETKF (lower panel) vs the length scale. Both filters start with the same initial condition, and takes no covariance inflation (i.e., ). The length scale is varied from to , with an even increment of at each step. The other settings of the filters are as follows: For the EnUKF (upper panel), the initial ensemble size is . , , the lower bound , the upper bound , and the threshold ; For the ETKF, the ensemble size is . From Fig. 1, it is clear that the optimal length scale of the EnUKF is , while the optimal length scale of the ETKF is . For simplicity, we choose for both filters in the subsequent simulations.

### 4.2 Comparison between the EnUKF and the ETKF

For comparison, we randomly select an initial condition , and use it to start a control run. The observations are obtained by adding Gaussian white noise to the states of the control run at each cycle, in accordance with Eq. (33). In subsequent simulations, both the EnUKF and the ETKF will start with the same initial condition , and use the same observations for assimilation. To initialize the filters, at the first assimilation cycle we randomly generate a background ensemble . Given and , the ETKF is already able to start running recursively. For the EnUKF, however, at the first cycle there is no propagated sigma points from the previous cycle. So, similarly to the ETKF, one may use the background ensemble to compute the sample mean and covariance of the analysis, and then generate the sigma points accordingly. After propagating the sigma points forward, the EnUKF can start running recursively from the second cycle.

For the EnUKF, we let the parameters , , the threshold , the lower bound , the upper bound , the length scale of covariance filter , and the covariance inflation factor vary from to , with an even increment of . We consider the scenarios with different ensemble sizes at the first assimilation cycle in order to explore the effect of initial ensemble size on the performance of the EnUKF. The corresponding relative rms errors and ratios, as functions of the covariance inflation factor, are plotted in Figs. 2(a) and 2(b) respectively

From the above two figures, it can be seen that different initial ensemble sizes leads to similar behaviors of both the relative rmse and rms ratio. Interestingly, a larger initial ensemble size does not necessarily guarantee a smaller rmse error. This can be observed either from Fig. 2(a) by fixing the covariance inflation factor at some point, say , or from Table 1 by comparing the minimum relative rms errors.

Fig. 2(a) shows that, as the covariance inflation factor increases from , the relative rmse of the EnUKF tends to decline. However, if gets too large, say , then further increments in will instead boost the relative rms errors. An examination on the rms ratio also reveals the same trend, although, as indicated in Fig. 2(b), the turning points, now at for and for , are larger than those of the relative rms errors. To make the sigma point indistinguishable from the truth (i.e., rms ratio ), one needs the inflation factor . However, modestly larger inflation factors, say, , can benefit the performance of the EnUKF in terms of the relative rmse, although they also cause the over-estimations of the error covariances.

For the ETKF, we let the length scale of covariance filter and the covariance inflation factor be the same as those in the EnUKF. Suppose that in a run of the EnUKF we have the average truncation number . Then for comparison, we consider the ETKF with an ensemble size , where means the nearest integer that is larger than, or equal to, the real number . In our experiments, the EnUKF with different initial ensemble sizes leads to the same value , so it is not surprising to find in Figs. 3(a) and 3(b) that the relative rms errors and ratios of the ETKF, which correspond to the EnUKF starting with different initial ensemble sizes, actually coincide.

From Fig. 3(a), one can see that, starting from , as the covariance inflation factor increases, the relative rmse of the ETKF tends to decrease. However, unlike the situation in the EnUKF, in the test range, as gets larger, say , the corresponding relative rmse enters a plateau region. The rms ratio indicates a similar behaviour. As increases, the change of the rms ratio becomes smaller, or in other words, the curve appears more and more flat. In order to make the ensemble in the ETKF indistinguishable from the truth, one needs the inflation factor . Like the EnUKF, modest overestimation of the error covariance (i.e., ) can also benefit the performance of the ETKF in terms of the relative error.

We use the minimum relative rms errors of the EnUKF and the ETKF to compare their performances. To this end, in Table 1 we list the minimum relative rms errors of the EnUKF with different initial ensemble sizes, and the minimum relative rms errors of the ETKF with the ensemble size about twice the average truncation number plus 1333Note that different initial ensemble sizes in the EnUKF lead to the same ensemble size in the ETKF. Therefore, in Table 1, the ETKF has the same minimum relative rmse in different rows.. Moreover, as a reference, we also show the average noise level (relative rmse) in the observations.

From Table 1, one can see that the average noise level in the trajectory consisting of the observations is roughly 0.226 (22.6%), while the minimum relative rmse (i.e. noise level) of the assimilated trajectory through the ETKF scheme is about 0.207 (20.7%), a reduction of 1.9% noise level compared to the observations. Similarly, the minimum relative rmse of the assimilated trajectory through the EnUKF scheme is roughly less than 0.175 (17.5%), a reduction of more than 5% noise level compared to the observations, and more than 3% compared to the ETKF scheme. In this sense, both the ETKF and the EnUKF do not diverge as their minimum relative rms errors are less than the average noise level in the observations, yet the EnUKF exhibits a better performance in state estimation than the ETKF.

### 4.3 Effects of parameters on the performance of the EnUKF

There is a set of adjustable parameters in the EnUKF, e.g., threshold , lower bound and upper bound , in Eqs. (25) and (29), and in Eqs. (19b) and (20). In this section we study the effects of these parameters on the performance of the EnUKF. Note that in the previous section, we have already examined the effect of the inflation factor on the performance of the EnUKF. Thus in the subsequent experiments, we will just fix the inflation factor at a particular value (say, zero, but other choice will also do) for the sake of simplicity.

#### 4.3.1 Effects of the threshold h1 and the bounds ll, lu

The parameters , and determine the truncation numbers ’s. To examine their effects on the performance of the EnUKF, we fix the covariance inflation factor , the length scale , parameters and . We specify the upper bound in the experiments, but vary the lower bound from to . This choice is used to represent the typical situation in data assimilation, wherein the ensemble member is often much lower than the dimension of the dynamical system. We also vary the threshold , in the logarithmic scale , from to , with an even increment of each time. This range represents the moderate values of so as to make the truncation numbers neither too large nor too small. In all the experiments, we initialize the EnUKF with the same initial conditions and background ensemble, with the ensemble size at the first assimilation cycle.

In Fig. 4 we show the simulation results. Clearly, the larger the threshold and the bound are, the larger the truncation number tends to be, which, however, does not mean the better performance in terms of the relative rmse. Indeed, in Fig. 4, there exists the same optimal threshold for lower bounds 444 means at every cycle, while the threshold does not affect the choice of ., while the thresholds larger than this value result in larger relative rms errors. For the lower bound , its relative rms errors are smaller than, or at least equal to those of the bounds in most cases. However, at , the relative rmse given is worse than the other cases. To explain these phenomena, we conjecture that, a too small truncation number is not likely to achieve a performance as good as a modest value because it means poor quality of covariance approximation. In contrast, a too large truncation number also does not necessarily achieve a better performance than a modest value because, at some local points of the attractor, a too large truncation number may introduce some spurious structures–from the null space of the SVD– into the sigma points, which are then treated as equally likely as the other sigma points, and propagated forward to the next cycle. The effect of the spurious structures can be accumulated and eventually deteriorates the overall performance.

#### 4.3.2 Effects of the parameters λ and β

We proceed to examine the effects of the parameters and . In the experiments, we set the covariance inflation factor , the length scale , lower bound , upper bound , threshold , and initial ensemble size . We consider four scenarios with respectively. For each value of , we compute values of . To guarantee the positive semi-definiteness of the sample covariances, we start from , with an even increment each time. In particular, when and , the effective weight of the ensemble mean equals zero for any . Therefore, in this case, the unscented transform is actually equivalent to the analysis scheme of positive-negative pairs (PNP) in the literature (cf [33] and the references therein).

We plot the simulation results in Fig. 5. As can be seen, when increases from to , the minimum relative rmse for a given value of intends to decrease. This may be interpreted as follows: In Eqs. (19b) and (20), the second terms on the rhs also act like a covariance inflation technique. Therefore, similar to the covariance inflation factor , a larger value of tends to result in a smaller minimum relative rmse.

However, for each fixed , there is no clear trend of the optimal value of the parameter . A larger value of does not imply a smaller relative rmse, or vice verse. Particularly, from Fig. 5(a) it can be seen that, by choosing suitable values for the parameters and , the EnUKF can outperform the EnKF equipped with the analysis scheme of positive-negative pairs.

As an explanation of the above phenomenon, one may note that, with the other parameters fixed, is the parameter that determines the relative weights between the sample mean and the other sigma points (cf. Eqs. (25) and (29)). If the underlying system state is linear, then in principle one shall be able to compute the optimal relative weights between the sample mean and the other sigma points under the Gaussianity assumption, and thus determine the optimal value of . Nevertheless, the existence of nonlinearity may make the problem intractable. For nonlinear systems, the optimal relative weights (hence the parameter ) may vary from cycle to cycle. However, to search for the optimal parameter at each assimilation cycle will be computationally expensive, thus in our experiments, we chose to fix the parameter in the same assimilation window555Here by “assimilation window” we mean the time window from the first assimilation cycle to the maximum.. This choice cannot reflect the variation of the optimal values of at different assimilation cycles, therefore it would not be surprising to see that there is a lack of clear trend of the optimal value of in Fig. (5(a)).

## 5 Conclusions

A new ensemble Kalman filter scheme, called the EnUKF, was introduced in this work. The ensemble generation scheme of the EnUKF is similar to the idea of positive-negative pairs (PNP) ([33] and the references therein), but it differs from the PNP scheme in that, apart from generating symmetric positive-negative pairs, the EnUKF also propagates the ensemble mean forward. Like the EnKF equipped with the PNP scheme, in the EnUKF all symmetric perturbations are associated with an identical weight. Nevertheless, a different weight can be assigned to the ensemble mean. In this sense, the EnUKF can be deemed as a hybrid of central forecast (i.e., the forecast made by propagating the ensemble mean solely) and ensemble forecast (i.e., the forecast made by propagating the ensemble forward), while the PNP scheme is a special case of the unscented transform with the weight of the ensemble mean being zero.

From the analytic results in the Appendix, it can be seen that, in estimation of the sample mean and the covariance of a transformed random variable, the accuracies of the EnUKF will be at least up to second order, no matter whether the original random variable follows a Gaussian distribution or not. If the original random variable follows a symmetric distribution (not necessarily Gaussian), then the accuracies of the estimations increase to third order. Moreover, additional parameters, such as and , are available in the unscented transform to improve the estimation accuracies by choosing proper values.

By comparing the Taylor series of the transformation function term by term in the Appendix, we also show that the EnUKF has better accuracies than the ordinary EnKF. For numerical verification, using the LE98 model, we compared the performances of the EnUKF and the ETKF in terms of the relative rms errors. Experiment results confirmed that incorporating the unscented transform into an EnKF can benefit its performance.

## Acknowledgments

The authors would like to thank Dr Sarah L. Dance and two anonymous reviewers for their very constructive comments and suggestions.

## Appendix: Accuracies of the sample mean and covariance of the EnKF and the unscented transform

In this Appendix we analyze the accuracies of the ordinary EnKF and the EnUKF in estimating the mean and covariance of a random variable transformed by a nonlinear function. In analysis, we assume that the nonlinear function can be expanded in a Taylor series, which converges to the true value of the transformation [16, 17].

### The actual mean and covariance of the transformed variable in terms of Taylor series

Given a vector and a Gaussian perturbation with zero mean and covariance , let us first expand the transform in a Taylor series around the point , then the mean is given by

 ¯y=E(f(¯z+δz))=f(¯z)+(∇TPzz∇2!)f+E(D4δzf4!+⋯), (34)

where the operator

 Dδz≡δzT∇. (35)

Similarly, the covariance matrix is given by

 Pyy=E((y−¯y)(y−¯y)T)=JPzzJT+E⎡⎢ ⎢ ⎢⎣Dδzf(D3δzf)T3!+D2δzf(D2δzf)T2!×2!+D3δzf(Dδzf)T3!]−[(∇TPzz∇2!)f][(∇TPzz∇2!)f]T+⋯, (36)

where is the Jacobian matrix of at .

### Accuracies of the EnKF

In the EnKF, given an ensemble , the sample mean, in terms of Taylor series, is given by

 ^y=f(¯z)+∑ni=1D2δzifn×2!+∑ni=1D3δzifn×3!+∑ni=1D4δzifn×4!+⋯, (37)

where with being the sample mean of .

In the rhs of Eq. (37),

 ∑ni=1D2δzifn×2!=12!∇T(1nn∑i=1δziδzTi)∇f,

which is a biased estimation of the second term on the rhs of Eq. (34). Moreover, spurious modes may rise in higher order terms of Eq. (37), for example, the third order term

 1nn∑i=1D3δzif=∇T(1nn∑i=1δziδzTi∇δzTi)∇f (38)

in general will not vanish.

Similarly, we have the sample covariance

 ^Pyy=JPzzJT+1(n−1)×2!n∑i=1[Dδzif(D2δzif)T]