A Hybrid Monte Carlo Sampling Smoother for Four-Dimensional Data Assimilation
Abstract
This paper constructs an ensemble-based sampling smoother for four-dimensional data assimilation using a Hybrid/Hamiltonian Monte Carlo approach. The smoother samples efficiently from the posterior probability density of the solution at the initial time. Unlike the well-known ensemble Kalman smoother, which is optimal only in the linear Gaussian case, the proposed methodology naturally accommodates non-Gaussian errors and nonlinear model dynamics and observation operators. Unlike the four-dimensional variational method, which only finds a mode of the posterior distribution, the smoother provides an estimate of the posterior uncertainty. One can use the ensemble mean as the minimum variance estimate of the state, or can use the ensemble in conjunction with the variational approach to estimate the background errors for subsequent assimilation windows. Numerical results demonstrate the advantages of the proposed method compared to traditional variational and ensemble-based smoothing methods.
Computational Science Laboratory Technical Report CSL-TR-19-2015
July 21, 2019
Ahmed Attia, Vishwas Rao, and Adrian Sandu
“A Hybrid Monte Carlo Sampling Smoother for Four-Dimensional Data Assimilation”
Computational Science Laboratory
Computer Science Department
Virginia Polytechnic Institute and State University
Blacksburg, VA 24060
Phone: (540) 231-2193
Fax: (540) 231-6075
Email: sandu@cs.vt.edu
Web: http://csl.cs.vt.edu
Innovative Computational Solutions 
1 Introduction
Data assimilation (DA) is the process of combining information from predictions made by imperfect models, from noisy observations, and from priors to produce a consistent description of the state of a dynamical system. The application of DA to large-scale systems such as the atmosphere is of great practical interest. Two approaches for solving large DA problems have gained widespread acceptance. The first approach, originating from control theory, comprises variational methods such as the three-dimensional variational (3DVar) and four-dimensional variational (4DVar) strategies [35]. The variational methods find a maximum a posteriori (MAP) estimate of the true state of the system. The second approach comprises the ensemble-based statistical estimation methods. The most successful family of ensemble data assimilation algorithms includes the ensemble Kalman filter (EnKF) [19] and its variants: the square-root Kalman filters [42], the ensemble adjustment Kalman filter [6], the ensemble transform Kalman filter [11], and efficient implementations of the Kalman filter using the Sherman–Morrison formula [4]. All variants of EnKF provide a minimum variance estimate of the state by approximating the expected value of the posterior distribution. Variational and statistical estimation methods yield identical estimates only in the case of linear dynamics, linear observations, and Gaussian errors.
Both 3DVar and EnKF are filtering methods that estimate the true state of the system at the specific time instances where observations are available. For many oceanographic, meteorological, and hydrological applications, it is advantageous to employ smoothing methods such as 4DVar and the ensemble Kalman smoother (EnKS) that simultaneously use information from all observations available at different time points within an assimilation time window. Strong-constraint 4DVar updates the state of the system at the initial time of an assimilation window given a background estimate of the initial condition and a set of observations distributed through this interval. The EnKS, on the other hand, estimates the posterior distributions of the state at time points in the window given all past, present, and future observations (in the assimilation window).
4DVar requires the derivation and implementation of the tangent linear and the adjoint numerical models, a challenging and effort-intensive task for large-scale models. The 4DVar algorithm provides a single analysis state, the best posterior estimate of the state of the system. The uncertainty in the estimated state is not inherently provided by the 4DVar algorithm [14]. Previous work proposed to quantify the uncertainty in the 4DVar analysis by approximating the analysis error covariance matrix using an ensemble of simulations [24, 25]. These schemes provide an approximation of the analysis error covariance matrix that is inconsistent with the 4DVar analysis itself, because the covariance estimates are usually obtained from independent schemes such as EnKF. Approaches to quantify 4DVar analysis uncertainty based on subspace error decompositions [14, 35] are statistically consistent but require additional computational effort.
The EnKS is optimal when the observation operators are linear and the errors are Gaussian. However, these assumptions are unlikely to hold for real applications. The analysis ensemble generated by the EnKS allows one to find a minimum variance estimate (e.g., the ensemble mean), as well as a measure of the analysis uncertainty (e.g., the analysis error covariance matrix). When the posterior distribution is nearly Gaussian, the EnKS offers an efficient practical algorithm. However, when the observation operators are nonlinear and the errors are non-Gaussian, the EnKS is not expected to yield good results.
The Markov chain Monte Carlo (MCMC) family of algorithms provides a powerful foundation to sample from complicated probability distributions. These algorithms work, in general, by generating a Markov chain whose stationary distribution is the target probability distribution. MCMC sampling is considered to be the gold standard [26] in data assimilation. The main practical limitation of MCMC is the considerable computational cost required to achieve convergence and, in the case of high-dimensional state spaces, to explore the entire state space. Scalable and accelerated MCMC algorithms are being continuously developed to improve convergence and space exploration. Hybrid Monte Carlo (HMC), also known as Hamiltonian Monte Carlo, is an accelerated MCMC sampling algorithm that reduces the correlation between successive states by using Hamiltonian dynamics to generate proposal states [16]. Moreover, HMC targets states with high acceptance probability, leading to fast convergence and fast space exploration. To the best of our knowledge, HMC was first considered in the context of DA in [8] to solve a nonlinear inverse problem by minimizing the residual between the solution and ill-posed boundary conditions. Posterior error statistics are approximated by sampling states near the optimal state after convergence. In [8] a simulated annealing strategy is used during the sampling process, where the minimum is obtained at a low temperature and posterior samples are collected at high temperatures. Solving the weak-constraint 4DVar problem using a gradient method, then using HMC to estimate the analysis error statistics, is also discussed in [23, Chapter 6].
This work develops a nonlinear, non-Gaussian smoother to solve the four-dimensional data assimilation problem. The new method uses a Hybrid Monte Carlo approach to sample the posterior distribution and is named the HMC smoother. This work extends the sampling filter, proposed in [7], to the four-dimensional case where time-distributed observations are assimilated at once. The HMC smoother provides an ensemble of states sampled from the posterior distribution of the state of the system at the initial time of the assimilation window. In a practical setting the smoothing step is carried out sequentially over consecutive assimilation windows. The generated ensemble encapsulates the uncertainty in the posterior distribution at the beginning of the current window; when propagated to the beginning of the next assimilation window it provides flow-dependent information about the background error distribution in the next assimilation cycle.
The use of HMC for solving smoothing problems was also presented in [5], where a generalized version of HMC is used in an attempt to reduce the number of chain steps required to ensure independence of the generated states. The underlying dynamical system in the generalized version of HMC is not Hamiltonian. There are several important differences between this work and [5]. In [5] the only source of uncertainty is model error, in the form of additive random noise included in the dynamics. Here we work in the strong-constraint 4DVar framework and consider the model to be perfect; the system state is uncertain due to uncertainties in the initial conditions. While in [5] numerical experiments are carried out with a one-dimensional system, here we experiment with the shallow water equations over the sphere, a moderately large multidimensional nonlinear model relevant for geophysical applications. Finally, we propose to use higher-order symplectic integrators, as tested in [7], to efficiently sample from complex posterior distributions that arise when the observation operators and models are highly nonlinear.
The remaining part of the paper is organized as follows. Section 2 reviews the variational and the ensemble approaches for solving the data assimilation problem. The HMC smoother is presented in Section 3. Experimental settings and numerical results are discussed in Section 4. Conclusions and future directions are summarized in Section 5.
2 Data Assimilation
This section reviews the 4DVar and the EnKS data assimilation schemes.
2.1 Four-dimensional variational data assimilation
4DVar calculates the optimal initial condition for the state of the dynamical system over a specific assimilation time window, based on a background state and using all observations available within this time window [34]. The background initial state is usually the forecast produced by propagating the previous window analysis through the model dynamics. To be specific, let the current assimilation window be the time interval $[t_0, t_{\mathrm{F}}]$. Given a background state $\mathbf{x}_0^{\mathrm{b}}$, and a set of observations $\mathbf{y}_k$ available at the discrete time points $t_k$, $k = 1, \ldots, m$, the 4DVar analysis is obtained by solving the following optimization problem:
(1) \quad \mathbf{x}_0^{\mathrm{a}} = \arg\min_{\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0), \qquad \mathcal{J}(\mathbf{x}_0) = \frac{1}{2} \left(\mathbf{x}_0 - \mathbf{x}_0^{\mathrm{b}}\right)^T \mathbf{B}_0^{-1} \left(\mathbf{x}_0 - \mathbf{x}_0^{\mathrm{b}}\right) + \frac{1}{2} \sum_{k=1}^{m} \left(\mathcal{H}_k(\mathbf{x}_k) - \mathbf{y}_k\right)^T \mathbf{R}_k^{-1} \left(\mathcal{H}_k(\mathbf{x}_k) - \mathbf{y}_k\right).
Here $\mathcal{H}_k$ is the observation operator (generally nonlinear) that maps the model space into the observation space at time point $t_k$. The dimension of the observation space is usually much lower than the dimension of the state space, that is, $m_k \ll n$. $\mathbf{B}_0$ is the background error covariance matrix, and the $\mathbf{R}_k$'s are the observation error covariance matrices at the times $t_k$. The background error covariance matrix determines how information from observed areas is extrapolated to unobserved regions or to regions where observations are sparsely available [41]. The state $\mathbf{x}_k = \mathbf{x}(t_k)$ is produced by propagating the initial state through the model dynamics from time $t_0$ to time $t_k$:
(2) \quad \mathbf{x}_k = \mathcal{M}_{t_0 \to t_k}(\mathbf{x}_0), \qquad k = 1, \ldots, m.
The model solution operator $\mathcal{M}$ represents the discretized partial differential equations that govern the evolution of the dynamical system. Realistic atmospheric and ocean models typically have extremely large numbers of state variables.
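To make the structure of the cost functional (1) under the model constraint (2) concrete, the sketch below evaluates a strong-constraint 4DVar cost for a toy two-variable linear rotation model with an identity observation operator. The model, operators, dimensions, and observation times are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def fourdvar_cost(x0, xb, B_inv, obs, R_inv, H, M_step):
    """Evaluate the strong-constraint 4DVar cost J(x0).

    obs is a dict {k: y_k} of observations indexed by time step;
    M_step advances the state one step (the model constraint (2));
    H maps model space into observation space.
    """
    dx = x0 - xb
    J = 0.5 * dx @ B_inv @ dx            # background term
    x = x0.copy()
    for k in range(1, max(obs) + 1):
        x = M_step(x)                    # propagate through the model
        if k in obs:
            d = H(x) - obs[k]            # innovation at time t_k
            J += 0.5 * d @ R_inv @ d     # observation term
    return J

# Toy example: 2-variable linear rotation, identity observations.
theta = 0.1
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
M_step = lambda x: M @ x
H = lambda x: x
B_inv = np.eye(2)
R_inv = np.eye(2)
xb = np.array([1.0, 0.0])
obs = {2: M @ (M @ xb)}                  # synthetic noise-free observation

# The cost vanishes at the truth and is positive elsewhere.
print(fourdvar_cost(xb, xb, B_inv, obs, R_inv, H, M_step))  # → 0.0
```

The same skeleton applies to any choice of model step and observation operator; only the two callables change.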
In this work we consider the strongconstraint 4DVar case which assumes that the numerical model (2) is perfect. The methodology proposed here can be immediately extended to the weakconstraint 4DVar framework [38], where model errors are accounted for by adding the corresponding model error terms to equation (1) [41].
Perturbations (small errors) $\delta\mathbf{x}$ of the state of the system evolve according to the tangent linear model:
(3) \quad \delta\mathbf{x}_k = \mathbf{M}_{t_0 \to t_k}\, \delta\mathbf{x}_0, \qquad k = 1, \ldots, m,
where $\mathbf{M}_{t_0 \to t_k} = \mathcal{M}'_{t_0 \to t_k}(\mathbf{x}_0)$ is the Jacobian of the model solution operator.
In strong-constraint 4DVar the model dynamics (2) act as constraints on the optimization problem (1). The optimal initial condition obtained by solving the optimization problem (1), constrained by the model dynamics (2), is referred to as the analysis state $\mathbf{x}_0^{\mathrm{a}}$. The gradient of the cost functional (1) is:
(4) \quad \nabla_{\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0) = \mathbf{B}_0^{-1} \left(\mathbf{x}_0 - \mathbf{x}_0^{\mathrm{b}}\right) + \sum_{k=1}^{m} \mathbf{M}_{t_0 \to t_k}^T\, \mathbf{H}_k^T\, \mathbf{R}_k^{-1} \left(\mathcal{H}_k(\mathbf{x}_k) - \mathbf{y}_k\right),
where $\mathbf{M}_{t_0 \to t_k}^T$ is the adjoint of the tangent linear model operator (3), $\mathbf{H}_k = \mathcal{H}'_k(\mathbf{x}_k)$ is the Jacobian of the observation operator $\mathcal{H}_k$, and $\mathbf{H}_k^T$ is its adjoint. Gradient-based minimization using (4) requires the development of the tangent linear and the adjoint models, which is a challenging task for high-dimensional complex models of practical interest. The performance of the optimization can be improved by using second-order derivative information and adaptive observations, as described in [3].
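The adjoint-based gradient (4) can be sketched for a toy linear model, where the adjoint step is simply multiplication by the transposed model matrix. The model matrix, observation times, and identity observation operator below are illustrative assumptions; a central finite difference validates the adjoint computation.

```python
import numpy as np

# Gradient of the 4DVar cost for a linear toy model x_{k+1} = M x_k
# with identity observation operator (H_k = I). Illustrative only.
def cost_and_grad(x0, xb, B_inv, obs, R_inv, M):
    m = max(obs)
    traj = [x0]                        # forward sweep: store trajectory
    for _ in range(m):
        traj.append(M @ traj[-1])
    J = 0.5 * (x0 - xb) @ B_inv @ (x0 - xb)
    lam = np.zeros_like(x0)            # backward (adjoint) sweep
    for k in range(m, 0, -1):
        if k in obs:
            d = traj[k] - obs[k]
            J += 0.5 * d @ R_inv @ d
            lam += R_inv @ d           # H_k^T = I here
        lam = M.T @ lam                # one adjoint model step
    return J, B_inv @ (x0 - xb) + lam

rng = np.random.default_rng(0)
M = np.array([[0.9, 0.2], [-0.1, 0.8]])
xb = np.array([1.0, -0.5])
obs = {1: np.array([0.7, 0.1]), 3: np.array([0.4, 0.2])}
x0 = rng.standard_normal(2)
J, g = cost_and_grad(x0, xb, np.eye(2), obs, np.eye(2), M)

# Validate the first gradient component against a central finite difference.
eps, e0 = 1e-6, np.array([1.0, 0.0])
Jp, _ = cost_and_grad(x0 + eps * e0, xb, np.eye(2), obs, np.eye(2), M)
Jm, _ = cost_and_grad(x0 - eps * e0, xb, np.eye(2), obs, np.eye(2), M)
print(abs((Jp - Jm) / (2 * eps) - g[0]) < 1e-5)  # → True
```

In the backward loop each innovation picks up one factor of $\mathbf{M}^T$ per time step, reproducing the $\mathbf{M}_{t_0 \to t_k}^T$ factors in (4) without ever forming them explicitly.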
2.2 Bayesian interpretation of 4DVar
The knowledge of the system state at the initial time, prior to obtaining new observations, is described by the background (prior) probability density $\mathcal{P}^{\mathrm{b}}(\mathbf{x}_0)$. The “sampling model” gives the probability distribution of the observations conditioned by the initial state,
\[ \mathcal{P}(\mathbf{y}_1, \ldots, \mathbf{y}_m \,|\, \mathbf{x}_0), \]
under the belief that the dynamical model perfectly represents reality. From Bayes’ theorem:
(5a) \quad \mathcal{P}^{\mathrm{a}}(\mathbf{x}_0) = \mathcal{P}(\mathbf{x}_0 \,|\, \mathbf{y}_1, \ldots, \mathbf{y}_m) \propto \mathcal{P}(\mathbf{y}_1, \ldots, \mathbf{y}_m \,|\, \mathbf{x}_0)\, \mathcal{P}^{\mathrm{b}}(\mathbf{x}_0),
(5b) \quad \mathcal{P}^{\mathrm{a}}(\mathbf{x}_0) = \frac{\mathcal{P}(\mathbf{y}_1, \ldots, \mathbf{y}_m \,|\, \mathbf{x}_0)\, \mathcal{P}^{\mathrm{b}}(\mathbf{x}_0)}{\mathcal{P}(\mathbf{y}_1, \ldots, \mathbf{y}_m)}.
The posterior (analysis) PDF is the probability distribution of the initial state after incorporating the new knowledge contained in the observations. The denominator in (5b) is the marginal density of the observations and acts as a normalization factor.
The background and observation errors are usually assumed to have Gaussian distributions:
(6a) \quad \mathcal{P}^{\mathrm{b}}(\mathbf{x}_0) \propto \exp\left( -\frac{1}{2} \left(\mathbf{x}_0 - \mathbf{x}_0^{\mathrm{b}}\right)^T \mathbf{B}_0^{-1} \left(\mathbf{x}_0 - \mathbf{x}_0^{\mathrm{b}}\right) \right),
(6b) \quad \mathcal{P}(\mathbf{y}_k \,|\, \mathbf{x}_0) \propto \exp\left( -\frac{1}{2} \left(\mathcal{H}_k(\mathbf{x}_k) - \mathbf{y}_k\right)^T \mathbf{R}_k^{-1} \left(\mathcal{H}_k(\mathbf{x}_k) - \mathbf{y}_k\right) \right),
where $\mathbf{B}_0$ is the background error covariance matrix and the $\mathbf{R}_k$'s are the observation error covariance matrices at the times $t_k$. If the observation errors at different time points are independent, and the model is perfect, the joint sampling model can be written as
(6c) \quad \mathcal{P}(\mathbf{y}_1, \ldots, \mathbf{y}_m \,|\, \mathbf{x}_0) = \prod_{k=1}^{m} \mathcal{P}(\mathbf{y}_k \,|\, \mathbf{x}_0).
Bayes’ rule (5) yields the following posterior PDF
(7a) \quad \mathcal{P}^{\mathrm{a}}(\mathbf{x}_0) \propto \exp\left( -\mathcal{J}(\mathbf{x}_0) \right),
(7b) \quad \mathcal{J}(\mathbf{x}_0) = \frac{1}{2} \left(\mathbf{x}_0 - \mathbf{x}_0^{\mathrm{b}}\right)^T \mathbf{B}_0^{-1} \left(\mathbf{x}_0 - \mathbf{x}_0^{\mathrm{b}}\right) + \frac{1}{2} \sum_{k=1}^{m} \left(\mathcal{H}_k(\mathbf{x}_k) - \mathbf{y}_k\right)^T \mathbf{R}_k^{-1} \left(\mathcal{H}_k(\mathbf{x}_k) - \mathbf{y}_k\right).
For nonlinear models and nonlinear observation operators the posterior (7) is not Gaussian. The kernel of the posterior is $\exp(-\mathcal{J}(\mathbf{x}_0))$, where $\mathcal{J}(\mathbf{x}_0)$ is the cost functional of the 4DVar problem (1). 4DVar computes the analysis as the minimizer of $\mathcal{J}$. The 4DVar solution is therefore the MAP estimate of the initial state under the assumption that the background and observation errors are Gaussian. For highly nonlinear observation operators and highly nonlinear models the posterior can have multiple modes; in this case the 4DVar numerical solution can be trapped in a local minimum of the cost functional.
The posterior distribution (5) contains the complete characterization of the uncertain initial state of the dynamical system. However, calculating the full posterior for high-dimensional models is infeasible in practice. A practical approach is to describe the posterior probability density by an ensemble of states, and to use it to estimate moments of the distribution. Sampling directly from the posterior PDF of the initial condition (7) acts as a smoother; the ensemble mean provides an estimate of the true state of the system, and the ensemble covariance provides an estimate of the posterior uncertainty that is fully consistent with the analysis state. In contrast, the current practice of using the analysis obtained from 4DVar and the covariance obtained from EnKF or EnKS leads to inconsistent representations of uncertainty.
2.3 Ensemble Kalman filter and smoother
Filtering is the process of calculating the posterior distribution of the uncertain state of a dynamical system at a specific time, given observations available only at that time instance. The ensemble Kalman filter [19] represents probability distributions by samples. Let $\{\mathbf{x}_k^{\mathrm{f}}(e)\}_{e=1,\ldots,N_{\mathrm{ens}}}$ be an ensemble of forecast states at time $t_k$, and $\mathbf{y}_k$ the observation vector (set of measurements) at $t_k$. If the forecast (background) ensemble is represented by the matrix $\mathbf{X}_k^{\mathrm{f}}$, whose columns are the forecast ensemble members, then the updated (analysis) ensemble matrix $\mathbf{X}_k^{\mathrm{a}}$ at the same time is obtained as [20]
(8) \quad \mathbf{X}_k^{\mathrm{a}} = \mathbf{X}_k^{\mathrm{f}}\, \mathbf{T}_k,
where the matrix $\mathbf{T}_k$ is a nonlinear transformation constructed from the forecast ensemble and the observations at time $t_k$ [20]. Square-root filters [42], the ensemble transform Kalman filter [11], and the ensemble adjustment Kalman filter [6] can all be written in the form (8) for specific choices of the transformation $\mathbf{T}_k$.
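One concrete instance of an ensemble analysis update in the spirit of (8) is the perturbed-observation (stochastic) EnKF step sketched below. The choice of this particular variant, and all dimensions and noise levels, are illustrative assumptions rather than the specific transformation used in the cited references.

```python
import numpy as np

def enkf_update(Xf, y, H, R, rng):
    """Perturbed-observation EnKF analysis; columns of Xf are members."""
    n, N = Xf.shape
    A = Xf - Xf.mean(axis=1, keepdims=True)        # ensemble anomalies
    Pf = A @ A.T / (N - 1)                          # sample covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # Kalman gain
    # Each member assimilates a perturbed copy of the observation.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return Xf + K @ (Y - H @ Xf)

rng = np.random.default_rng(1)
n, N = 3, 200
Xf = rng.multivariate_normal(np.zeros(n), np.eye(n), size=N).T
H = np.eye(n)
R = 0.25 * np.eye(n)
y = np.array([1.0, -1.0, 0.5])
Xa = enkf_update(Xf, y, H, R, rng)
# The analysis mean is pulled from the prior mean toward the observation.
print(np.linalg.norm(Xa.mean(axis=1) - y) < np.linalg.norm(Xf.mean(axis=1) - y))
```

Deterministic square-root variants replace the perturbed observations with an explicit right-multiplying transform, which matches the form (8) exactly.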
Smoothing is the process of calculating the posterior distribution of the uncertain states of a dynamical system given past, present, and future observations [13]. There are three approaches to smoothing: fixed-point, fixed-interval, and fixed-lag smoothing [13], with the second and third being the most popular [33]. Any scheme that can be used to solve one of the three smoothing problems can also be employed as a single fixed-interval smoothing scheme [13]. The ensemble smoother (ES) was introduced in [21] as a linear variance minimization algorithm. The ensemble Kalman smoother (EnKS) [22] employs an ensemble of states to describe distributions and obtains the posterior using the Kalman filter update equations. EnKS is optimal in the case of linear dynamics, Gaussian errors, and a large number of ensemble members [23].
To construct the EnKS [22], the EnKF update equations (8) are used repeatedly to develop fixed-lag and fixed-interval smoothing algorithms. A fixed-point smoother can be written as [20]
(9) \quad \mathbf{X}_0^{\mathrm{a}} = \mathbf{X}_0^{\mathrm{f}}\, \prod_{k=1}^{m} \mathbf{T}_k.
The update equation (9) is used recursively for fixed-interval smoothing, where smoothed ensembles are obtained at a specified set of times, and they are conditioned only on observations available at later times in the interval. Ravela and McLaughlin [33] presented efficient, fast versions of the fixed-interval and the fixed-lag EnKS. The fast fixed-interval smoother has a computational cost that scales linearly with the length of the interval. In this work we use the fast fixed-interval EnKS [33], with a single smoothing point (fixed-point smoother) chosen at the beginning of the time interval.
The EnKS computes the minimum variance estimate of the state. This estimate is not expected to be very accurate if the observation operators are highly nonlinear or if the Gaussianity assumptions are severely violated. As shown in Section 4, the HMC sampling smoother proposed herein is capable of handling nonlinear observation and model operators, and consequently produces posterior estimates that are more useful than the EnKS ensemble, and contain more information than the 4DVar MAP analysis. Hybrid methods such as [31] make use of optimization over ensembles using the trust-region framework.
3 The hybrid Monte Carlo sampling smoother
The most popular and successful class of sampling algorithms is Markov chain Monte Carlo (MCMC) [29], first introduced by Metropolis et al. [27]. MCMC algorithms sample from a general probability distribution $\pi$ by building a Markov chain whose invariant distribution is $\pi$. MCMC algorithms have the advantage of not requiring the normalization of target distributions. However, traditional MCMC samplers are often considered impractical for high-dimensional problems due to the following drawbacks. (i) The Markov chain may take a very long time to reach stationarity: a large number of (burn-in) states are generated and discarded before starting the sampling process, in order to guarantee that the collected samples are obtained from the true target PDF. (ii) The samples should be independent; however, the Markov chain is not completely memoryless, and in order to achieve independence of the sampled states the sampler usually drops some intermediate states between each selected state. (iii) Most Monte Carlo sampling methods suffer from the curse of dimensionality [29]: as the dimension of the state space grows, the number of sample members needed to represent the probability distribution grows rapidly. The number of samples required to efficiently represent the probability distribution can be controlled if the sampler surveys the entire state space sufficiently fast. (iv) The sampler can become trapped in a high-probability mode of a multimodal distribution, and fail to represent the other probability modes.
3.1 Hybrid Monte Carlo
Hybrid/Hamiltonian Monte Carlo (HMC) [16] follows an auxiliary-variable approach in order to alleviate the limitations of traditional MCMC algorithms.
The phase space of a Hamiltonian dynamical system consists of points $(\mathbf{x}, \mathbf{p})$, where $\mathbf{x}$ is the position variable and $\mathbf{p}$ is the momentum variable. The Hamiltonian dynamics is governed by the set of ordinary differential equations (ODEs):
(10a) \quad \frac{d\mathbf{x}}{dt} = \nabla_{\mathbf{p}} H = \mathbf{M}^{-1} \mathbf{p}, \qquad \frac{d\mathbf{p}}{dt} = -\nabla_{\mathbf{x}} H = -\nabla_{\mathbf{x}} \mathcal{J}(\mathbf{x}),
where the Hamiltonian function $H(\mathbf{x}, \mathbf{p})$ describes the total energy of the system
(10b) \quad H(\mathbf{x}, \mathbf{p}) = \mathcal{J}(\mathbf{x}) + \frac{1}{2} \mathbf{p}^T \mathbf{M}^{-1} \mathbf{p}.
The first term of the Hamiltonian (10b), $\mathcal{J}(\mathbf{x})$, represents the potential energy of the system, while the second term corresponds to the kinetic energy. The exact (analytic) flow of the Hamiltonian system (10a) advances the solution in time from $t$ to $t + T$:
(11) \quad \left( \mathbf{x}(t+T),\, \mathbf{p}(t+T) \right) = \Phi_T \left( \mathbf{x}(t),\, \mathbf{p}(t) \right).
This flow cannot be calculated exactly in practice, and has to be approximated by an equivalent numerical solution using a time-reversible symplectic integrator [36, 37]. The most common symplectic integrator is the leapfrog (Störmer–Verlet) scheme [36, 37]. One step of the position Verlet algorithm advances the solution of the Hamiltonian equations (10a) from time $t$ to time $t + h$ using:
(12a) \quad \mathbf{x}_{1/2} = \mathbf{x}(t) + \frac{h}{2}\, \mathbf{M}^{-1} \mathbf{p}(t),
(12b) \quad \mathbf{p}(t+h) = \mathbf{p}(t) - h\, \nabla_{\mathbf{x}} \mathcal{J}\left( \mathbf{x}_{1/2} \right),
(12c) \quad \mathbf{x}(t+h) = \mathbf{x}_{1/2} + \frac{h}{2}\, \mathbf{M}^{-1} \mathbf{p}(t+h).
The optimal time step size $h$ scales as $\mathcal{O}(n^{-1/4})$ with the dimension $n$ of the state space [9], and careful empirical tuning of the step size is usually required for good performance [7]. Several other symplectic integrators with more stages and higher accuracy than Verlet have also been developed [12]. A time integrator for the infinite-dimensional setting was also introduced in [10].
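A minimal sketch of one position Verlet step (12) follows; the quadratic potential used to check approximate energy conservation is an illustrative assumption.

```python
import numpy as np

def position_verlet(x, p, grad_J, h, Minv):
    """One position-Verlet step (12) for H = J(x) + p^T M^{-1} p / 2."""
    x_half = x + 0.5 * h * (Minv @ p)    # half drift in position
    p_new = p - h * grad_J(x_half)       # full momentum kick at midpoint
    x_new = x_half + 0.5 * h * (Minv @ p_new)
    return x_new, p_new

# Illustrative potential J(x) = ||x||^2 / 2 (standard Gaussian target);
# a symplectic integrator should nearly conserve H over many steps.
grad_J = lambda x: x
Minv = np.eye(2)
x, p = np.array([1.0, 0.0]), np.array([0.0, 1.0])
H0 = 0.5 * x @ x + 0.5 * p @ Minv @ p
for _ in range(100):
    x, p = position_verlet(x, p, grad_J, 0.05, Minv)
H1 = 0.5 * x @ x + 0.5 * p @ Minv @ p
print(abs(H1 - H0) < 1e-2)  # energy nearly conserved → True
```

The bounded energy error over long trajectories is what keeps the acceptance probability in HMC close to one.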
For practical considerations it is advisable to split the time interval $[0, T]$ over which the Hamiltonian system evolves into $m$ smaller sub-steps of length $h = T/m$. The flow of the numerical solution obtained by the symplectic integrator will be denoted by $\widetilde{\Phi}_T$; it is an approximation of the exact flow $\Phi_T$.
The key idea in HMC sampling is to add an auxiliary variable $\mathbf{p}$ to the target variable $\mathbf{x}$ and to sample from the joint probability distribution of $(\mathbf{x}, \mathbf{p})$. The auxiliary variable is chosen such that sampling from the joint distribution is much faster than sampling from the marginal distribution of the target variable. In HMC sampling the target and the auxiliary variables are thought of as the position and the momentum components of a Hamiltonian system, respectively. The Hamiltonian dynamics of the system serves as a transition kernel for the Markov chain.
The kernel of the stationary probability distribution of the Hamiltonian system (10) is [29]
(13a) \quad \pi(\mathbf{x}, \mathbf{p}) \propto \exp\left( -H(\mathbf{x}, \mathbf{p}) \right) = \exp\left( -\mathcal{J}(\mathbf{x}) \right)\, \exp\left( -\frac{1}{2} \mathbf{p}^T \mathbf{M}^{-1} \mathbf{p} \right),
(13b) \quad \pi(\mathbf{x}) \propto \exp\left( -\mathcal{J}(\mathbf{x}) \right),
where $\pi(\mathbf{x})$ is the probability distribution of the position variable. The joint probability distribution of the state in the phase space is the product of the marginal distributions of the position and the momentum; this simply means that the two variables are independent [36]. The independence of position and momentum makes it possible to sample from the marginal distribution of each variable by sampling from their joint distribution. The marginal PDF of the momentum variable is a Gaussian distribution with zero mean and covariance matrix $\mathbf{M}$ (also known as the mass matrix), i.e., $\mathbf{p} \sim \mathcal{N}(0, \mathbf{M})$.
Let $\mathbf{x}$ be a random variable that is the target of the MCMC sampling algorithm. View $\mathbf{x}$ as the position variable in the Hamiltonian system, and add the momentum $\mathbf{p}$ as an auxiliary variable. The symplectic integrator is used to propose a state that is either accepted or rejected using an acceptance/rejection rule based on the loss of energy. Algorithm 1 [36] summarizes the HMC steps to sample from the probability distribution $\pi(\mathbf{x}) \propto \exp(-\mathcal{J}(\mathbf{x}))$.
Given the current state $\mathbf{x}_r$, draw a random momentum $\mathbf{p}_r \sim \mathcal{N}(0, \mathbf{M})$ and generate a proposal by the numerical flow
(14) \quad \left( \mathbf{x}^*,\, \mathbf{p}^* \right) = \widetilde{\Phi}_T \left( \mathbf{x}_r,\, \mathbf{p}_r \right).
Evaluate the acceptance probability
(15) \quad a^{(r)} = \min\left( 1,\, e^{-\Delta H} \right),
and draw $u \sim \mathcal{U}(0, 1)$.
If $u \le a^{(r)}$, accept the proposal as the next sample: $\mathbf{x}_{r+1} = \mathbf{x}^*$;
If $u > a^{(r)}$, reject the proposal and continue with the current state: $\mathbf{x}_{r+1} = \mathbf{x}_r$.
The loss of energy between the current and the proposed state is usually calculated as the difference between the Hamiltonians at the current and the proposed states:
(16) \quad \Delta H = H(\mathbf{x}^*, \mathbf{p}^*) - H(\mathbf{x}_r, \mathbf{p}_r).
This equation (16) is valid for the Verlet (12), two-stage, three-stage, and four-stage symplectic integrators [12, 36]. See [7] for details on different symplectic integrators and the corresponding expressions for the energy. The length $T$ of the Hamiltonian trajectory and the number of steps $m$ are parameters to be tuned by the user [30]. Another user-tunable parameter is the mass matrix $\mathbf{M}$, a symmetric positive definite matrix that represents the covariance of the momentum variable. The choice of the mass matrix does not alter the fact that the HMC sampling Algorithm 1 converges to the stationary distribution $\pi$. However, a good choice of $\mathbf{M}$ can considerably improve sampling efficiency. One popular and simple choice is a constant multiple of the identity matrix. Ideally, if the variances of the target distribution are known (or can be approximated), the diagonal of $\mathbf{M}$ should be chosen as the corresponding precisions (reciprocals of these variances) [30]. We found that this choice results in very fast convergence of the chain to stationarity.
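The steps above can be sketched end to end, combining the momentum draw, the position Verlet flow, and the accept/reject test of (14)–(16). The standard Gaussian target and all tuning values below are illustrative assumptions, not the smoother's operational settings.

```python
import numpy as np

def hmc_sample(grad_J, J, x0, n_samples, h, m_steps, rng, Minv=None):
    """Plain HMC sketch targeting pi(x) ∝ exp(-J(x))."""
    n = len(x0)
    Minv = np.eye(n) if Minv is None else Minv
    M = np.linalg.inv(Minv)                           # mass matrix
    x, samples = x0.copy(), []
    for _ in range(n_samples):
        p = rng.multivariate_normal(np.zeros(n), M)   # draw momentum (14)
        H_old = J(x) + 0.5 * p @ Minv @ p
        xs, ps = x.copy(), p.copy()
        for _ in range(m_steps):                      # position Verlet flow
            xs = xs + 0.5 * h * (Minv @ ps)
            ps = ps - h * grad_J(xs)
            xs = xs + 0.5 * h * (Minv @ ps)
        H_new = J(xs) + 0.5 * ps @ Minv @ ps
        if rng.uniform() < np.exp(H_old - H_new):     # accept w.p. min(1, e^{-ΔH})
            x = xs
        samples.append(x.copy())
    return np.array(samples)

# Target: standard 2-D Gaussian, J(x) = ||x||^2 / 2 (illustrative).
rng = np.random.default_rng(2)
S = hmc_sample(lambda x: x, lambda x: 0.5 * x @ x,
               np.zeros(2), 2000, 0.2, 10, rng)
print(np.allclose(S.mean(axis=0), 0.0, atol=0.15))   # → True
```

Replacing the quadratic `J` and `grad_J` with a 4DVar cost functional and its adjoint-based gradient yields exactly the sampler used by the smoother.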
The HMC sampling Algorithm 1 tends to explore the state space faster than traditional MCMC, and the acceptance probability of the generated states is close to one. Several enhancements of HMC sampling, such as parallel tempering [18, 40], have been proposed to help the algorithm escape local modes of high probability.
3.2 Sampling smoother algorithm
We now present the HMC smoother (smoothing by sampling) that simultaneously accounts for all observations available within a specific assimilation window to obtain posterior estimates of the initial system state.
Consider the assimilation window $[t_0, t_{\mathrm{F}}]$ with a set of observations $\{\mathbf{y}_k\}$ available at times $t_k$ inside the window, where $t_0 \le t_1 < \cdots < t_m \le t_{\mathrm{F}}$. Under the assumptions discussed in Section 2.2 the posterior (analysis) probability distribution of the initial state takes the form (7). We seek to sample from this posterior distribution using the HMC approach. For this we set the potential energy term in (10b) to be the 4DVar cost functional (7b). Consequently, the target probability distribution coincides with the 4DVar posterior distribution (7), i.e., $\pi(\mathbf{x}_0) = \mathcal{P}^{\mathrm{a}}(\mathbf{x}_0)$.
The smoother works sequentially over consecutive assimilation windows by applying the forecast and analysis (sampling) steps in succession. In the forecast step each state of the analysis ensemble is propagated in time to the end of the previous assimilation window (the beginning of the current window). The result of the forecast step is a forecast ensemble (or just a single background state $\mathbf{x}_0^{\mathrm{b}}$) at the beginning of the current time window, i.e., at $t_0$. One can simply propagate the analysis state (e.g., the mean of the analysis ensemble) to obtain the current background state. However, propagating the full analysis ensemble makes it possible to build an ensemble-based (flow-dependent) background error covariance matrix at the beginning of the current window. This background error covariance matrix includes the errors of the day, and can considerably enhance the quality of the analyses generated by a data assimilation scheme. We assume herein that the full forecast ensemble is generated in the forecast step. In the analysis step, the HMC sampling strategy summarized in Algorithm 1 is applied to obtain the analysis ensemble at the initial time of the current assimilation window.
The HMC sampling smoother is detailed in Algorithm 2.
The generated ensemble of states $\{\mathbf{x}_0^{\mathrm{a}}(e)\}_{e=1,\ldots,N_{\mathrm{ens}}}$ samples the posterior PDF $\mathcal{P}^{\mathrm{a}}(\mathbf{x}_0)$, and can be used to calculate the best estimate of the initial condition of the system (e.g., the ensemble mean $\bar{\mathbf{x}}_0^{\mathrm{a}}$), and to estimate the analysis error covariance matrix $\mathbf{A}_0$:
(17a) \quad \bar{\mathbf{x}}_0^{\mathrm{a}} = \frac{1}{N_{\mathrm{ens}}} \sum_{e=1}^{N_{\mathrm{ens}}} \mathbf{x}_0^{\mathrm{a}}(e),
(17b) \quad \mathbf{A}_0 = \frac{1}{N_{\mathrm{ens}} - 1} \sum_{e=1}^{N_{\mathrm{ens}}} \left( \mathbf{x}_0^{\mathrm{a}}(e) - \bar{\mathbf{x}}_0^{\mathrm{a}} \right) \left( \mathbf{x}_0^{\mathrm{a}}(e) - \bar{\mathbf{x}}_0^{\mathrm{a}} \right)^T.
The forecast and analysis steps are repeated sequentially over subsequent assimilation windows. The propagated ensemble can be used to estimate the analysis covariance at the final time using (17b), so as to provide “flow-dependent” information for the background error covariance matrix used in the subsequent assimilation interval.
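The ensemble statistics (17a)–(17b) amount to a sample mean and an unbiased sample covariance over ensemble members stored as matrix columns; the synthetic two-variable ensemble below is an illustrative assumption.

```python
import numpy as np

def ensemble_stats(X):
    """Ensemble mean (17a) and sample covariance (17b); columns are members."""
    N = X.shape[1]
    xbar = X.mean(axis=1)
    A = X - xbar[:, None]          # anomaly matrix
    cov = A @ A.T / (N - 1)        # unbiased sample covariance
    return xbar, cov

# Synthetic ensemble drawn from a known Gaussian for verification.
rng = np.random.default_rng(3)
X = rng.multivariate_normal([1.0, -2.0],
                            [[2.0, 0.3], [0.3, 0.5]], size=5000).T
xbar, cov = ensemble_stats(X)
print(np.allclose(xbar, [1.0, -2.0], atol=0.1),
      np.allclose(cov, [[2.0, 0.3], [0.3, 0.5]], atol=0.2))
```

For high-dimensional states one would store only the anomaly matrix and never form the full covariance explicitly.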
4 Numerical Experiments
The proposed HMC sampling smoother is tested against the EnKS and the 4DVar schemes on two numerical models. We first illustrate the distinctive features of the HMC smoother with a simple one-dimensional model with a nonlinear observation operator and a bimodal posterior distribution. Next, we employ the shallow water equations on the sphere to test the sampling smoother on a problem relevant to geophysics, and to compare its performance against the conventional 4DVar scheme.
4.1 A one-dimensional model
Consider the following model:
(18a) \quad \frac{dx}{dt} = -\nabla_x U(x),
(18b) \quad U(x) = \left( x^2 - 1 \right)^2,
that describes the position $x$ of a particle on the real line moving under the effect of the potential field (18b). This model is similar to the one used in [5]. The potential field has two local minima, at $x = \pm 1$, which are expected to act as attractors for the particle. The reference initial condition and the background initial condition are chosen such that they lie in the basins of attraction of different equilibria. The background errors are assumed to be normally distributed with zero mean and standard deviation $\sigma_{\mathrm{b}}$.
Synthetic observations are obtained from the reference solution by applying the quadratic observation operator
(19) \quad \mathcal{H}(x) = x^2.
Observation errors are assumed to be Gaussian with zero mean and standard deviation $\sigma_{\mathrm{o}}$. Observations are taken at equally spaced time points inside the simulation window. The posterior distribution of the initial state reads
(20) \quad \mathcal{P}^{\mathrm{a}}(x_0) \propto \exp\left( -\frac{\left( x_0 - x_0^{\mathrm{b}} \right)^2}{2 \sigma_{\mathrm{b}}^2} - \sum_{k=1}^{m} \frac{\left( \mathcal{H}(x_k) - y_k \right)^2}{2 \sigma_{\mathrm{o}}^2} \right),
where $x_k$ is obtained by propagating $x_0$ forward in time, from $t_0$ to $t_k$, using the model (18). The non-normalized posterior density (20) is illustrated in Figure 1.
Traditional assimilation methods, like 4DVar and EnKS, are expected to have difficulties capturing the bimodal nature of the posterior distribution of the initial condition. Since the prior PDF is a Gaussian centered around the background state with standard deviation $\sigma_{\mathrm{b}}$, the right peak in Figure 1 is slightly taller than the left peak. With a Gaussian background prior centered around one of the peaks, a smaller standard deviation would damp the other peak. Capturing only the right peak completely misses the true solution, which is negative. Numerical results presented below show that the proposed HMC smoother is capable of building a representative ensemble from the bimodal posterior distribution. The analysis ensemble can then be used to draw more useful conclusions (e.g., statistics) than what can be obtained from the analysis results of the traditional methods.
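The bimodality induced by a sign-insensitive observation operator can be reproduced with a small sketch. The double-well potential, noise levels, observation times, and truth value used below are illustrative assumptions, not the exact experimental settings of this section.

```python
import numpy as np

# Gradient-flow model in a double-well potential with quadratic
# observations y = x^2 (all parameter values here are assumptions).
def forward(x0, dt=0.01, n_steps=100):
    x, traj = x0, []
    for _ in range(n_steps):
        x = x + dt * 4.0 * x * (1.0 - x * x)   # explicit Euler, dx/dt = -U'(x)
        traj.append(x)
    return traj

sigma_b, sigma_o, xb = 1.0, 0.1, 0.4
obs_idx = [25, 50, 75, 100]
y = [forward(-0.5)[k - 1] ** 2 for k in obs_idx]   # noise-free, truth = -0.5

def log_post(x0):
    traj = forward(x0)
    lp = -0.5 * (x0 - xb) ** 2 / sigma_b ** 2       # Gaussian prior
    for k, yk in zip(obs_idx, y):
        lp += -0.5 * (traj[k - 1] ** 2 - yk) ** 2 / sigma_o ** 2
    return lp

grid = np.linspace(-2, 2, 401)
lp = np.array([log_post(x) for x in grid])
# The quadratic operator cannot distinguish signs: two modes survive.
modes = grid[np.where((lp[1:-1] > lp[:-2]) & (lp[1:-1] > lp[2:]))[0] + 1]
print(len(modes) >= 2, (modes < 0).any() and (modes > 0).any())
```

The prior centered at a positive value makes the positive mode taller, mirroring the asymmetry between the two peaks described above.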
4.1.1 Numerical results with the one-dimensional model
HMC smoothing was carried out to collect an ensemble of members from the posterior (20). We tested several symplectic integrators [7] and found that all show similar behavior. We chose the position Verlet symplectic integrator for all our experiments, due to its minimal computational cost. The Hamiltonian step size $h$, the trajectory length $T$, and the number of steps $m$ were tuned empirically. A number of burn-in steps was used (for this simple model we already know that the forecast state lies in the support of the posterior, and the burn-in steps could be omitted; in general one can incorporate convergence tests to shorten the number of burn-in steps and ensure that the collected samples are from the target distribution). Four states are dropped between consecutive selected states at stationarity to guarantee the independence of the samples.
The histograms of the analysis ensembles obtained with the HMC smoother and with the EnKS are shown in Figure 2. The HMC smoother generates an analysis ensemble that matches the posterior kernel shown in Figure 1, while the EnKS fails to generate an accurate analysis ensemble. The most likely state, however, seems to be located in the correct place.
A single analysis state (best estimate) can be misleading in this case. For multimodal systems one needs to consider more than one analysis, each with an associated probability, in order to better describe the true state of the system.
The 4DVar algorithm is expected to be trapped in a local minimum of the posterior distribution. Since the background state lies in the basin of attraction of the positive equilibrium, and since the observations (19) are insensitive to the sign of the solution, we expect the analysis to follow the behavior of the true solution but with the opposite sign. This is confirmed by the results in Figure 3.
4.2 Shallow water model on the sphere
The shallow water equations have been used extensively as a simplified model of the atmosphere that contains the essential wave propagation mechanisms found in general circulation models (GCMs) [39]. The shallow water equations in spherical coordinates are [1]:
(21a) \quad \frac{\partial u}{\partial t} + \frac{1}{a \cos\theta} \left( u \frac{\partial u}{\partial \lambda} + v \cos\theta\, \frac{\partial u}{\partial \theta} \right) - \left( f + \frac{u \tan\theta}{a} \right) v + \frac{g}{a \cos\theta} \frac{\partial h}{\partial \lambda} = 0,
(21b) \quad \frac{\partial v}{\partial t} + \frac{1}{a \cos\theta} \left( u \frac{\partial v}{\partial \lambda} + v \cos\theta\, \frac{\partial v}{\partial \theta} \right) + \left( f + \frac{u \tan\theta}{a} \right) u + \frac{g}{a} \frac{\partial h}{\partial \theta} = 0,
(21c) \quad \frac{\partial h}{\partial t} + \frac{1}{a \cos\theta} \left( \frac{\partial (hu)}{\partial \lambda} + \frac{\partial (hv \cos\theta)}{\partial \theta} \right) = 0.
Here $f$ is the Coriolis parameter, given by $f = 2\Omega \sin\theta$, where $\Omega$ is the angular speed of the rotation of the Earth. In addition, $h$ represents the height of the homogeneous atmosphere, and $u$ and $v$ are the zonal and meridional wind components, respectively. The latitudinal and longitudinal directions are denoted by $\theta$ and $\lambda$, respectively. The radius of the Earth is denoted by $a$, and $g$ is the acceleration due to gravity. The space discretization is performed using the unstaggered Turkel–Zwas scheme [28]. The discretization has nlon = 72 nodes in the longitudinal direction and nlat = 36 nodes in the latitudinal direction. The semi-discretization in space leads to a system of ordinary differential equations:
(22)   \frac{d\mathbf{x}}{dt} = f(t, \mathbf{x}), \qquad t_0 \le t \le t_F
The vector x ∈ R^n, with n = 3 × nlat × nlon = 7776, contains the discrete zonal wind, meridional wind, and height variables. We perform the time integration using a Runge–Kutta method from the MATLODE suite [15], which also provides sensitivity analysis capabilities.
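The time stepping of the semi-discrete system (22) can be sketched generically. This is not MATLODE itself, only a minimal classical fourth-order Runge–Kutta integrator for an arbitrary right-hand side f(t, x); the choice of RK4 here is an illustrative assumption.

```python
import numpy as np

def rk4_step(f, t, x, dt):
    """One step of the classical fourth-order Runge-Kutta scheme
    applied to the semi-discrete system x'(t) = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + 0.5 * dt, x + 0.5 * dt * k1)
    k3 = f(t + 0.5 * dt, x + 0.5 * dt * k2)
    k4 = f(t + dt, x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(f, t0, x0, dt, n_steps):
    """March the model forward, returning the discrete trajectory."""
    t, x = t0, np.asarray(x0, dtype=float)
    traj = [x]
    for _ in range(n_steps):
        x = rk4_step(f, t, x, dt)
        t += dt
        traj.append(x)
    return np.array(traj)
```

In the actual experiments the right-hand side f would be the Turkel–Zwas spatial discretization of (21); any state vector of length 7776 can be marched forward in the same way.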
4.2.1 Observations and background information
A reference initial condition is used to generate a reference trajectory. Synthetic linear observations are created from the reference trajectory by adding Gaussian noise with zero mean to each of the three components. The observation error standard deviation for the height component is set to a prescribed percentage of the average magnitude of the height field in the reference initial condition, and similarly for the wind components. The initial background state is created by perturbing the reference initial condition with a Gaussian error drawn from the distribution N(0, B0), with B0 a modeled background error covariance matrix. The background error covariance is modeled as follows:
1. Start with a diagonal background error covariance matrix. The standard deviations of the background errors for the height and wind components are set to prescribed percentages of the average magnitudes of the corresponding fields in the reference initial condition.
2. Create a synthetic initial ensemble by adding zero-mean Gaussian noise to the reference initial condition, with covariance set to the initial (diagonal) background error covariance matrix. Apply the ensemble Kalman filter for several cycles with observations assimilated each hour. The synthetic observations are obtained by adding Gaussian noise with zero mean and standard deviation equal to a fixed uncertainty level multiplied by the average magnitude of the corresponding component (height or wind) in the initial condition.
3. Decorrelate the ensemble-based covariances using a decorrelation matrix with a prescribed decorrelation distance.
4. Calculate B0 by averaging the ensemble covariances over the last hours of the run, with one covariance matrix per hour. In this averaged version the background noise levels no longer coincide with the initially prescribed levels.
This method of creating a synthetic initial background error covariance matrix is empirical, but we found that the resulting matrix performs well for several algorithms, including 4DVar. Its quality could be further enhanced by making use of the ensembles generated by the sampling smoother, an idea we will investigate in future work.
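The covariance-construction recipe above can be sketched in code. This is a simplified illustration, not the authors' implementation: the decorrelation matrix is assumed here to be a Gaussian taper exp(-d^2/(2L^2)) applied by pointwise (Schur) product, and the EnKF cycling is represented only by an ensemble of states supplied by the caller.

```python
import numpy as np

def gaussian_decorrelation(dists, L):
    """Decorrelation (localization) weights rho_ij = exp(-d_ij^2 / (2 L^2)),
    an assumed taper; Gaspari-Cohn is another common choice."""
    return np.exp(-0.5 * (dists / L) ** 2)

def localized_ensemble_cov(ensemble, dists, L):
    """Step 3 of the recipe: ensemble covariance tapered by the
    decorrelation matrix via a pointwise (Schur) product.
    `ensemble` stores one member per row."""
    P = np.cov(ensemble, rowvar=False)
    return P * gaussian_decorrelation(dists, L)

def averaged_background_cov(cov_list):
    """Step 4: average the hourly localized covariances to obtain B0."""
    return sum(cov_list) / len(cov_list)
```

The Schur product damps spurious long-range correlations caused by the finite ensemble size while preserving symmetry of the covariance matrix.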
Data assimilation experiments with this model were conducted over three consecutive assimilation windows. The first assimilation window is shorter than the second and third windows, and can be regarded as a spin-up period for the data assimilation system. Hourly (synthetic) observations are available in each of the three windows.
Two experiments were conducted. In the first one the background error covariance matrix B0 is kept fixed for each of the three windows. In the second experiment B0 is updated with information from the generated ensemble according to the following expression:
(23)   \mathbf{B} = (1 - \gamma)\,\mathbf{B}_0 + \gamma\,\mathbf{P}^f
where B is the updated background error covariance matrix, B0 is the fixed version used in the first experiment, and P^f is the forecast error covariance matrix estimated from the generated ensemble. The scalar weight γ is a number in the interval [0, 1]. Selecting γ = 0 ignores the error of the day, while γ = 1 forces the use of only the flow-dependent background error covariance matrix obtained from the ensemble, possibly leading to a singular covariance matrix. In our experiments we chose an intermediate value of γ.
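The hybrid update (23) is a convex combination of the static and flow-dependent covariances, which can be written compactly as:

```python
import numpy as np

def hybrid_background_cov(B0, P_ens, gamma):
    """Hybrid update B = (1 - gamma) * B0 + gamma * P_ens (equation 23).
    gamma = 0 keeps the static covariance; gamma = 1 uses only the
    flow-dependent ensemble covariance (possibly rank deficient)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return (1.0 - gamma) * B0 + gamma * P_ens
```

Since a convex combination of a symmetric positive definite B0 with a symmetric positive semidefinite ensemble covariance remains symmetric positive definite for γ < 1, the update cannot by itself make B singular.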
The error metric used to compare analyses against the reference solution is the root mean squared error (RMSE):
(24)   \mathrm{RMSE} = \sqrt{ \frac{1}{N_{var}} \sum_{i=1}^{N_{var}} \left( x_i - x_i^{ref} \right)^2 }
where x^ref is the reference state of the system and N_var is the number of state variables. The RMSE is calculated hourly along the trajectory over each assimilation window.
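The error metric is a root mean square over all state variables, evaluated at each output time:

```python
import numpy as np

def rmse(x, x_ref):
    """Root mean squared error between an analysis state and the
    reference state (equation 24), averaged over all state variables."""
    x, x_ref = np.asarray(x, dtype=float), np.asarray(x_ref, dtype=float)
    return float(np.sqrt(np.mean((x - x_ref) ** 2)))
```

Calling this on the analysis trajectory at each hour of a window produces the RMSE curves plotted in the figures below.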
4.2.2 Numerical results with the shallow water on the sphere model
The numerical optimization step in 4DVar is carried out using the L-BFGS routine implemented in the Poblano optimization toolbox [17]. The optimization process is stopped when the norm of the gradient or the relative decrease of the function value falls below a prescribed tolerance. The optimization takes at least 45 iterations of L-BFGS to converge for the experiments considered here (see Table 1).
The HMC sampling smoother is used to generate the analysis ensemble in each assimilation window. The symplectic integrator used is position Verlet (12), with empirically tuned step settings. A practically useful recipe is to perturb the step length of the symplectic integrator, a procedure that guarantees that the results obtained are not contingent on one specific choice of step settings [12, 30]. The step length is perturbed with uniform random noise. Notice that the step length is perturbed only once, at the beginning of each Hamiltonian trajectory, and is kept fixed for all subsequent steps of that trajectory. This means that the length of the Hamiltonian trajectory is perturbed for each proposal while the number of steps is kept constant, so that the step size scales accordingly.
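The per-trajectory perturbation can be written as a one-liner; the multiplicative uniform form and the perturbation fraction below are assumptions for illustration.

```python
import numpy as np

def perturbed_step(eps0, frac=0.2, rng=None):
    """Draw a step length once per Hamiltonian trajectory:
    eps = eps0 * (1 + u), with u ~ Uniform(-frac, frac). With the number
    of steps held fixed, this randomizes the total trajectory length so
    results do not hinge on one specific step setting."""
    rng = np.random.default_rng() if rng is None else rng
    return eps0 * (1.0 + rng.uniform(-frac, frac))
```

The drawn value is used for every step of one trajectory and redrawn for the next proposal, exactly as described above.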
A fixed number of burn-in steps is used; in practice we noticed that the HMC smoother converges to the posterior in far fewer steps. Four generated states are discarded between consecutive selected states to promote the independence of the generated ensemble members.
The RMS errors of both the HMC smoother and 4DVar over the three assimilation windows are shown in Figure 4. Panel (a) reports the case where the background error covariance matrix B0 is kept fixed, and panel (b) shows the case where B0 is updated at the beginning of each assimilation window according to equation (23). The quality of the analyses is very similar in both cases; however, in the second case a slight reduction in RMSE is noticed along the entire trajectory. This is apparent in Figure 5, which zooms in on the RMSE results over the second assimilation window.
The HMC smoother samples efficiently from the posterior distribution, and the resulting analysis competes in accuracy with that obtained using 4DVar. Figure 6 shows the three components at the beginning of the first window for the reference solution, the background state, the 4DVar analysis, and the HMC smoother analysis (ensemble mean). The analyses recovered from the noisy background by 4DVar and by the HMC smoother are almost identical.
The assimilation results obtained over the next two windows with B0 kept fixed are shown in Figures 7 and 8. The performance of the two schemes is quite similar, and the HMC smoother analysis competes with the 4DVar analysis.
Updating the background error covariance matrix can, in principle, enhance the performance of both 4DVar and the HMC smoother. For 4DVar, updating B0 results in a lower RMSE, which indicates that in real applications the analysis is expected to be closer to reality. In addition to providing a more accurate prior kernel, updating B0 yields a better mass matrix, which in turn is expected to improve the performance of the smoother. In our experiments the update has a small positive impact on both data assimilation schemes, visible as a reduction of the RMS errors in Figures 4 and 5. The ensemble-based forecast error covariance matrix used to update B0 can be crucial when observations are sparse or not uniformly distributed over the grid, and is therefore well worth the computational overhead of propagating all the analysis ensemble members forward to build the full forecast ensemble. Results of the data assimilation process with the hybrid (updated) background error covariance matrix on the next two windows are shown in Figures 9 and 10.
4.3 Computational considerations
Both 4DVar and the HMC smoother require the same computational infrastructure, namely an adjoint model that computes the gradient of the 4DVar cost functional (7b). This gradient calculation is the computational bottleneck for both methods: it requires a forward propagation of the model and a backward propagation of the adjoint model. In our shallow water test the cost of one adjoint run is approximately 3.5 times the cost of one forward model run, which makes the cost of a gradient evaluation approximately equal to 4.5 forward model runs.
Table 1: Computational cost of the 4DVar and HMC smoother schemes per assimilation window, in equivalent forward model runs.

Data assimilation scheme | Cost                                | Window 1 | Window 2             | Window 3
                         |                                     | Fixed B0 | Fixed B0 | Hybrid B0 | Fixed B0 | Hybrid B0
4DVar                    | Function evaluations                | 151      | 97       | 101       | 96       | 93
                         | Number of iterations                | 49       | 47       | 46        | 46       | 45
                         | Cost in equivalent forward runs     | 322.5    | 261.5    | 262       | 257      | 250.5
HMC smoother             | Number of proposed states           | 530 (all windows)
                         | Cost per proposal                   | 4.5
                         | Cost in equivalent forward runs     | 2,385
The cost of 4DVar depends on the number of iterations and function evaluations required by the optimization algorithm. On the first window the L-BFGS optimizer requires 49 iterations and 151 function evaluations, so the total cost of the 4DVar solution is equivalent to 322.5 forward model runs.
The cost of the HMC smoother depends on the configuration of the chain: the number of burn-in steps, the number of dropped states at stationarity, and the step-size settings of the symplectic integrator. The symplectic integrator determines the number of adjoint runs needed to evaluate the gradient of the cost functional when proposing a new state, while the size of the desired ensemble controls the length of the Markov chain and consequently the total cost of the analysis step. The Verlet integrator (12), used in the current experiments, requires a single adjoint run to propose a new state; the acceptance/rejection criterion requires an additional forward run to evaluate the change in energy. This makes the cost of generating one proposal approximately 4.5 forward model runs. On all assimilation windows, with four states rejected between consecutive selected samples, the HMC sampling smoother generates a total of 530 states to collect the analysis ensemble, at a total cost roughly equal to 2,385 forward model runs.
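The cost bookkeeping above reduces to simple arithmetic. The sketch below assumes an adjoint run costs about 3.5 forward runs (consistent with the 4DVar rows of Table 1) and, for illustration only, an ensemble of 100 members with 30 burn-in steps and four dropped states per kept sample, a configuration that reproduces the 530 proposals reported in Table 1.

```python
def fourdvar_cost(n_func_evals, n_iters, adjoint_cost=3.5):
    """4DVar cost in equivalent forward model runs: one forward run per
    function evaluation plus one adjoint run (gradient) per iteration."""
    return n_func_evals + n_iters * adjoint_cost

def hmc_cost(n_ensemble, burn_in, n_dropped, adjoint_cost=3.5):
    """HMC smoother cost: each proposal needs one adjoint run plus one
    forward run for the Metropolis accept/reject test."""
    n_proposals = burn_in + (n_dropped + 1) * n_ensemble
    return n_proposals * (adjoint_cost + 1.0)
```

With the Table 1 inputs, fourdvar_cost(151, 49) reproduces the first-window 4DVar cost of 322.5, and the illustrative HMC configuration yields 530 × 4.5 = 2,385 forward model runs.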
The computational costs of the two data assimilation schemes, 4DVar and the HMC smoother, on each assimilation window are summarized in Table 1; the total cost of each scheme is given in terms of equivalent forward model runs. On the first window the cost of the HMC smoother is approximately 7.4 times the cost of the 4DVar scheme; on the next two windows the HMC smoother costs roughly 9 times as much as 4DVar. A cost reduction in 4DVar is expected on the later windows because the forecast state is closer to the MAP than on the first window. The higher computational cost of the HMC smoother can be handled more efficiently by parallelizing the sampling scheme and the gradient calculations. Another way to reduce the cost is to replace the burn-in steps with a suboptimal 4DVar solution obtained using a small number of iterations. The cost of the proposed sampling smoother can of course be reduced by decreasing the ensemble size; however, this results in higher sampling error. The impact of the sampling errors can be assessed with the techniques developed in [2, 32].
The increased cost of the HMC smoother could be acceptable in view of the additional useful information it provides: a sample estimate of the analysis probability distribution (and as immediate consequences an analysis error covariance matrix and a flowdependent background error covariance matrix for the next cycle).
5 Conclusion and Future Work
A four-dimensional data assimilation smoother is proposed in this paper. The smoother samples from the posterior distribution using a Hybrid Monte Carlo approach. The 4DVar approach provides a MAP estimate of the true state, but it does not compute a measure of the uncertainty of the analysis. The HMC smoother builds an ensemble approximating the posterior PDF, which can be used to estimate the true state together with the uncertainty in the analysis, e.g., by calculating the ensemble mean and the ensemble-based analysis error covariance matrix. Moreover, propagating the analysis ensemble to the beginning of the next assimilation window provides a forecast ensemble that can be used to construct a flow-dependent background covariance matrix for the new window. Unlike several popular hybrid approaches, the HMC smoother generates an analysis error covariance that is consistent with the analysis state, because both statistics are produced by one consistent data assimilation scheme.
The HMC smoother requires an adjoint of the simulation model and runs on the same computational infrastructure as 4DVar. The computational cost of the HMC smoother is, at present, larger than that of 4DVar; this efficiency issue must be addressed before the HMC smoother becomes fully practical. We are currently investigating several strategies to enhance the performance of the sampling smoother and to reduce its computational cost, including parallelization. We will also test the HMC smoother with nonlinear observation operators, imperfect models, and non-Gaussian errors.
Acknowledgements
This work was partially supported by awards AFOSR FA9550–12–1–0293–DEF, NSF CCF–1218454, NSF DMS–1419003, and by the Computational Science Laboratory at Virginia Tech.
References
 [1] Exshall: A Turkel–Zwas explicit large time-step FORTRAN program for solving the shallow-water equations in spherical coordinates. Computers and Geosciences 17(9), 1311–1343 (1991)
 [2] A posteriori error estimates for variational inverse problems. SIAM/ASA Journal on Uncertainty Quantification Submitted (2014)
 [3] An optimization framework to improve 4DVar data assimilation system performance. Journal of Computational Physics 275(0), 377–389 (2014)
 [4] An efficient implementation of the ensemble Kalman filter based on an iterative Sherman–Morrison formula. Statistics and Computing 25(3) (2015). DOI 10.1007/s11222-014-9454-4
 [5] Alexander, F.J., Eyink, G.L., Restrepo, J.M.: Accelerated Monte Carlo for optimal estimation of time series. Journal of Statistical Physics 119(5–6), 1331–1345 (2005)
 [6] Anderson, J.: An ensemble adjustment Kalman filter for data assimilation. Monthly Weather Review 129, 2884–2903 (2001)
 [7] Attia, A., Sandu, A.: A hybrid Monte Carlo sampling filter for non-Gaussian data assimilation. Quarterly Journal of the Royal Meteorological Society Submitted (2014)
 [8] Bennett, A.F., Chua, B.S.: Open-ocean modeling as an inverse problem: the primitive equations. Monthly Weather Review 122(6), 1326–1336 (1994)
 [9] Beskos, A., Pillai, N., Roberts, G., Sanz-Serna, J.M., Stuart, A.: Optimal tuning of the hybrid Monte Carlo algorithm. Bernoulli 19(5A), 1501–1534 (2013)
 [10] Beskos, A., Pinski, F., Sanz-Serna, J.M., Stuart, A.: Hybrid Monte Carlo on Hilbert spaces. Stochastic Processes and their Applications 121(10), 2201–2230 (2011)
 [11] Bishop, C., Etherton, B., Majumdar, S.: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly Weather Review 129, 420–436 (2001)
 [12] Blanes, S., Casas, F., Sanz-Serna, J.M.: Numerical integrators for the hybrid Monte Carlo method. arXiv preprint arXiv:1405.3153 (2014)
 [13] Briers, M., Doucet, A., Maskell, S.: Smoothing algorithms for state-space models. Annals of the Institute of Statistical Mathematics 62(1), 61–89 (2010)
 [14] Cheng, H., Jardak, M., Alexe, M., Sandu, A.: A hybrid approach to estimating error covariances in variational data assimilation. Tellus A 62(3), 288–297 (2010)
 [15] D’Augustine, A., Sandu, A.: MATLODE: A MATLAB ODE solver and sensitivity analysis toolbox. ACM Transactions on Mathematical Software (TOMS) Submitted (2015)
 [16] Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Physics Letters B 195(2), 216–222 (1987)
 [17] Dunlavy, D., Kolda, T., Acar, E.: Poblano v1.0: A MATLAB toolbox for gradient-based optimization. Sandia National Laboratories, Tech. Rep. SAND2010-1422 (2010)
 [18] Earl, D.J., Deem, M.W.: Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics 7(23), 3910–3916 (2005)
 [19] Evensen, G.: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research 99(C5), 10,143–10,162 (1994)
 [20] Evensen, G.: The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dynamics 53 (2003)
 [21] Evensen, G., van Leeuwen, P.: Assimilation of Geosat altimeter data for the Agulhas Current using the ensemble Kalman filter with a quasigeostrophic model . Monthly Weather Review 124, 85–96 (1996)
 [22] Evensen, G., van Leeuwen, P.: An ensemble Kalman smoother for nonlinear dynamics . Monthly Weather Review 128, 1852–1867 (2000)
 [23] Evensen, G.: Data Assimilation: The Ensemble Kalman Filter. Springer, Berlin (2007)
 [24] Gustafsson, N., Bojarova, J.: Fourdimensional ensemble variational (4DEnVar) data assimilation for the high resolution limited area model (HIRLAM). Nonlinear Processes in Geophysics 21(4), 745–762 (2014)
 [25] Hunt, B.R., Kalnay, E., Kostelich, E., Ott, E., Patil, D., Sauer, T., Szunyogh, I., Yorke, J., Zimin, A.: Four-dimensional ensemble Kalman filtering. Tellus 56(4), 273–277 (2004)
 [26] Law, K.J.H., Stuart, A.M.: Evaluating data assimilation algorithms. Monthly Weather Review 140(11), 3757–3782 (2012)
 [27] Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E.: Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21(6), 1087–1092 (1953)
 [28] Navon, I.M., De Villiers, R.: The application of the Turkel–Zwas explicit large time-step scheme to a hemispheric barotropic model with constraint restoration. Monthly Weather Review 115(5), 1036–1052 (1987)
 [29] Neal, R.: Probabilistic inference using Markov chain Monte Carlo methods. Department of Computer Science, University of Toronto Toronto, Ontario, Canada (1993)
 [30] Neal, R.: MCMC using Hamiltonian dynamics. Handbook of Markov chain Monte Carlo (2011)
 [31] Nino Ruiz, E., Sandu, A.: A derivative-free trust region framework for variational data assimilation. Computational and Applied Mathematics In print (2015)
 [32] Rao, V., Sandu, A.: A posteriori error estimates for DDDAS inference problems. Procedia Computer Science 29, 1256–1265 (2014)
 [33] Ravela, S., McLaughlin, D.: Fast ensemble smoothing. Ocean Dynamics 57(2), 123–134 (2007)
 [34] Sandu, A., Chai, T.: Chemical data assimilation: an overview. Atmosphere 2(3), 426–463 (2011)
 [35] Sandu, A., Cheng, H.: A subspace approach to data assimilation and new opportunities for hybridization. International Journal for Uncertainty Quantification In print (2015)
 [36] Sanz-Serna, J.M.: Markov chain Monte Carlo and numerical differential equations. In: Current Challenges in Stability Issues for Numerical Differential Equations, pp. 39–88. Springer (2014)
 [37] Sanz-Serna, J.M., Calvo, M.P.: Numerical Hamiltonian Problems, vol. 7. Chapman & Hall, London (1994)
 [38] Sasaki, Y.: Some basic formalisms in numerical variational analysis. Monthly Weather Review 98(12), 875–883 (1970)
 [39] StCyr, A., Jablonowski, C., Dennis, J., Tufo, H., Thomas, S.: A comparison of two shallow water models with nonconforming adaptive grids. Monthly Weather Review 136, 1898–1922 (2008)
 [40] Swendsen, R.H., Wang, J.S.: Replica Monte Carlo simulation of spin-glasses. Physical Review Letters 57(21), 2607 (1986)
 [41] Trémolet, Y.: Accounting for an imperfect model in 4DVar. Quarterly Journal of the Royal Meteorological Society 132(621), 2483–2504 (2006)
 [42] Whitaker, J., Hamill, T.M.: Ensemble data assimilation without perturbed observations. Monthly Weather Review 130, 1913–1924 (2002)