A Time-parallel Approach to Strong-constraint Four-dimensional Variational Data Assimilation
Abstract
A parallel-in-time algorithm based on an augmented Lagrangian approach is proposed to solve four-dimensional variational (4D-Var) data assimilation problems. The assimilation window is divided into multiple subintervals, which allows the cost function and gradient computations to be parallelized. Solution continuity equations across the interval boundaries are added as constraints. The augmented Lagrangian approach leads to a different formulation of the variational data assimilation problem than weak-constraint 4D-Var. A combination of serial and parallel 4D-Var runs to increase performance is also explored. The methodology is illustrated on data assimilation problems with the Lorenz-96 and shallow water models.
Computational Science Laboratory Technical Report CSL-TR-18-2015
July 11, 2019
Vishwas Rao and Adrian Sandu
“A Time-parallel Approach to Strong-constraint Four-dimensional Variational Data Assimilation”
Computational Science Laboratory
Computer Science Department
Virginia Polytechnic Institute and State University
Blacksburg, VA 24060
Phone: (540) 231-2193
Fax: (540) 231-6075
Email: sandu@cs.vt.edu
Web: http://csl.cs.vt.edu
Innovative Computational Solutions 
1 Introduction
Predicting the behavior of complex dynamical systems, such as the atmosphere, requires using information from observations to decrease the uncertainty in the forecast. Data assimilation combines information from a numerical model, prior knowledge, and observations (all with associated errors) in order to obtain an improved estimate of the true state of the system. Data assimilation is an important application of dynamic data-driven application systems (DDDAS [5], or InfoSymbiotic systems), where measurements of the physical system are used to constrain simulation results.
Two approaches to data assimilation have gained widespread popularity: variational and ensemble-based methods. The ensemble-based methods are rooted in statistical theory, whereas the variational approach is derived from optimal control theory. The variational approach formulates data assimilation as a nonlinear optimization problem constrained by a numerical model. The initial conditions (as well as boundary conditions, forcing, or model parameters) are adjusted to minimize the discrepancy between the model trajectory and a set of time-distributed observations. In real-time operational settings the data assimilation process is performed in cycles: observations within an assimilation window are used to obtain an optimal trajectory, which provides the initial condition for the next time window, and the process is repeated in the subsequent cycles. The variational methodology is adopted by most national and international numerical weather forecast centers to provide the initial state for their forecast models.
Performing nonlinear optimization in the 4D-Var framework is an inherently sequential process. Computer architectures progressively incorporate more parallelism, while maintaining roughly constant processor speeds. As our understanding of the physics improves, the computer models become increasingly complex. Advanced and scalable parallel algorithms to solve 4D-Var need to be developed in order to continue performing data assimilation in real time. This challenge has been partially addressed by exploring parallelism in the spatial dimension. Trémolet and Le Dimet [30] have shown how variational data assimilation can be used to couple models and to perform parallelization in space for the assimilation process. Rantakokko [21] considers different data distribution strategies to perform parallel variational data assimilation in the spatial dimension. A scalable approach for three-dimensional variational (3D-Var) data assimilation is presented in [12], where parallelism is achieved by dividing the global problem into multiple local 3D-Var subproblems. Multiple copies of a modified 3D-Var problem, which ensures feasibility at the boundaries of the subdomains, are solved across processors, and the global 3D-Var minimum is obtained by collecting the local minima.
An important challenge associated with computing solutions to the 4D-Var problem is parallelization in the temporal dimension. Fisher [14] addresses this challenge with a saddle-point formulation that solves the optimality equations directly. Aguiar et al. [1] apply the augmented Lagrangian approach to constrained inference problems in the context of graphical models. The approach proposed herein uses the augmented Lagrangian framework in the context of 4D-Var data assimilation. The most computationally expensive components of 4D-Var, namely the cost function and gradient evaluations, are performed in a time-parallel manner.
The remainder of this paper is organized as follows: Section 2 introduces data assimilation and the traditional 4D-Var approach. Section 3 formulates the 4D-Var problem in an augmented Lagrangian framework to expose the time parallelism in the cost function and gradient evaluations. Section 4 gives a detailed description of the parallel assimilation algorithm. Section 5 presents numerical results with the small, chaotic Lorenz-96 model and a relatively large shallow water on the sphere model. Concluding remarks and future research directions are discussed in Section 6.
2 Four-dimensional variational data assimilation
Data assimilation (DA) is the fusion of information from priors, imperfect model predictions, and noisy data, to obtain a consistent description of the true state of a physical system [11, 17, 27, 26]. The best estimate that optimally fuses all these sources of information is called the analysis.
The prior information encapsulates our current knowledge of the system. Usually the prior information is contained in a background estimate of the state and the corresponding background error covariance matrix.
The model captures our knowledge about the physical laws that govern the evolution of the system. The model evolves an initial state $x_0$ at the initial time $t_0$ to future states $x_i$ at future times $t_i$. A general model equation is represented as follows:
(1)   $x_i = \mathcal{M}_{t_0 \to t_i}(x_0), \qquad i = 1, \dots, N.$
Observations are noisy snapshots of reality available at discrete time instances. Specifically, measurements $y_i$ of the physical state are taken at times $t_i$, $i = 1, \dots, N$. The model state is related to observations by the following relation:
(2)   $y_i = \mathcal{H}(x_i) + \varepsilon_i, \qquad i = 1, \dots, N.$
The observation operator $\mathcal{H}$ maps the model state space onto the observation space. The observation error term $\varepsilon_i$ accounts for both measurement and representativeness errors. Measurement errors are due to imperfect sensors. Representativeness errors are due to the inaccuracies of the mathematical and numerical approximations inherent to the model.
Variational methods solve the data assimilation problem in an optimal control framework, where one finds the control variable that minimizes the mismatch between the model forecasts and the observations. Strong-constraint 4D-Var assumes that the model (1) is perfect [27, 26]. The control parameters are the initial conditions $x_0$, which uniquely determine the state of the system at all future times via the model equation (1). The background state $x_0^{\mathrm{b}}$ is the prior best estimate of the initial conditions $x_0$, and has an associated background error covariance matrix $B_0$. Observations $y_i$ at times $t_i$ have the corresponding observation error covariance matrices $R_i$, $i = 1, \dots, N$. The 4D-Var problem provides the estimate $x_0^{\mathrm{a}}$ of the true initial conditions as the solution of the following optimization problem
(3)   $x_0^{\mathrm{a}} = \arg\min_{x_0}\; \mathcal{J}(x_0)$
with the following cost function:
$\mathcal{J}(x_0) = \frac{1}{2}\, (x_0 - x_0^{\mathrm{b}})^T B_0^{-1} (x_0 - x_0^{\mathrm{b}}) + \frac{1}{2} \sum_{i=1}^{N} \big(\mathcal{H}(x_i) - y_i\big)^T R_i^{-1} \big(\mathcal{H}(x_i) - y_i\big).$
The first term of the cost function quantifies the departure of the solution from the background state $x_0^{\mathrm{b}}$ at the initial time $t_0$. The second term measures the mismatch between the forecast trajectory (model solutions $x_i$) and the observations $y_i$ at all times $t_i$ in the assimilation window. The weighting matrices $B_0$ and $R_i$ need to be predefined, and their quality influences the accuracy of the resulting analysis.
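As a concrete sketch, the strong-constraint cost function can be evaluated along the following lines (Python with NumPy; the function and argument names are hypothetical illustrations, and the paper's own implementation is in Matlab):

```python
import numpy as np

def fourdvar_cost(x0, xb, B_inv, traj, obs, R_inv_list, H=lambda x: x):
    """Strong-constraint 4D-Var cost: background misfit at t_0 plus
    observation misfits summed over the assimilation window.

    traj[i-1] holds the model state x_i at observation time t_i (the
    trajectory launched from x0); obs and R_inv_list hold the matching
    observations y_i and inverse observation error covariances R_i^{-1}."""
    d = x0 - xb
    cost = 0.5 * d @ B_inv @ d                 # background term
    for xi, yi, Ri_inv in zip(traj, obs, R_inv_list):
        r = H(xi) - yi                         # innovation at t_i
        cost += 0.5 * r @ Ri_inv @ r           # observation term
    return cost
```

A gradient-based solver then minimizes this function over $x_0$, with the gradient supplied by the adjoint model.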
Weak-constraint 4D-Var [27] removes the perfect-model assumption by allowing a model error $\eta_i$. Under the assumption that the model errors are normally distributed, $\eta_i \sim \mathcal{N}(0, Q_i)$, the weak-constraint 4D-Var solution is the unconstrained minimizer of the cost function
$\mathcal{J}(x_0, \dots, x_N) = \frac{1}{2}\, (x_0 - x_0^{\mathrm{b}})^T B_0^{-1} (x_0 - x_0^{\mathrm{b}}) + \frac{1}{2} \sum_{i=1}^{N} \big(\mathcal{H}(x_i) - y_i\big)^T R_i^{-1} \big(\mathcal{H}(x_i) - y_i\big) + \frac{1}{2} \sum_{i=1}^{N} \big(x_i - \mathcal{M}_{t_{i-1} \to t_i}(x_{i-1})\big)^T Q_i^{-1} \big(x_i - \mathcal{M}_{t_{i-1} \to t_i}(x_{i-1})\big).$
The control variables are the states $x_0, \dots, x_N$ of the system at all times in the assimilation window.
In this paper we focus on the strong-constraint formulation (3). The minimizer of (3) is computed iteratively using gradient-based numerical optimization methods. First-order adjoint models provide the gradient of the cost function [8], while second-order adjoint models provide Hessian-vector products (e.g., for Newton-type methods). The methodology for building and using various adjoint models for optimization, sensitivity analysis, and uncertainty quantification is discussed in [9, 26]. Various strategies to improve the 4D-Var data assimilation system are described in [10]. A procedure to estimate the impact of observation and model errors is developed in [22, 24]. A framework to perform derivative-free variational data assimilation using the trust-region framework is given in [25].
The iterative solution of (3) is highly sequential: first, one iteration follows the other; next, the forward and adjoint models are run sequentially forward and backward in time, respectively. In order to reveal additional parallelism, the solution of the 4D-Var problem is approached here using the augmented Lagrangian framework. In this framework the assimilation window is divided into multiple subintervals, and the model constraints are explicitly imposed at the interval boundaries. This approach bears similarities with the Parareal approach, which exploits time parallelism in the solution of ordinary differential equations [23].
3 4D-Var solution by the augmented Lagrangian approach
The 4D-Var cost function (2) is minimized subject to the generic model constraints (1). The model equations can also be written as
(6)   $x_i = \mathcal{M}_{t_{i-1} \to t_i}(x_{i-1}), \qquad i = 1, \dots, N,$
where $\mathcal{M}_{t_{i-1} \to t_i}$ represents the model solution operator that propagates the state at $t_{i-1}$ to the state at $t_i$. The minimizer of (2) under the constraints (6) is the unconstrained minimizer of the Lagrangian
(7)   $\mathcal{L}(x_0, \dots, x_N, \lambda) = \mathcal{J}(x_0, \dots, x_N) + \sum_{i=1}^{N} \lambda_i^T \big(x_i - \mathcal{M}_{t_{i-1} \to t_i}(x_{i-1})\big).$
To expose time parallelism the assimilation window is divided into $N$ subintervals, namely,
(8)   $[t_0, t_N] = [t_0, t_1] \cup [t_1, t_2] \cup \dots \cup [t_{N-1}, t_N].$
The forward model and adjoint model states at the interval boundaries are denoted by
(9)   $x_0, x_1, \dots, x_N \quad \text{and} \quad \lambda_1, \dots, \lambda_N,$
respectively.
respectively. We denote by the optimal solution and by the optimal value of the Lagrange multiplier in (7).
The augmented Lagrangian [31, Section 17.3] associated with (2) and (6) reads:
$\mathcal{L}_{\mu}(x_0, \dots, x_N, \lambda) = \mathcal{J} + \sum_{i=1}^{N} \lambda_i^T c_i + \frac{\mu}{2} \sum_{i=1}^{N} c_i^T P_i\, c_i, \qquad c_i = x_i - \mathcal{M}_{t_{i-1} \to t_i}(x_{i-1}),$
where the $P_i$'s are error scaling matrices. This is the (regular) Lagrangian for the problem that minimizes the cost function
$\mathcal{J} + \frac{\mu}{2} \sum_{i=1}^{N} c_i^T P_i\, c_i$
subject to the model constraints (6). This constrained minimization is equivalent to (3), since the additional penalty term is zero along the constraints. Note that, unlike the unconstrained minimization of the cost function (2), this is a constrained minimization problem.
The original 4D-Var problem (3) is solved in the augmented Lagrangian framework by performing a sequence of unconstrained minimizations
(12)   $x^{(k)} = \arg\min_{x}\; \mathcal{L}_{\mu^{(k)}}\big(x;\, \lambda^{(k)}\big), \qquad k = 1, 2, \dots$
If $\mu^{(k)} \to \infty$ then $x^{(k)} \to x^{\ast}$, and the solution error decreases with increasing $\mu^{(k)}$ [31, Section 17.3].
The optimization proceeds in cycles of inner and outer iterations. Inner iterations solve the optimization problem (12) for particular values of $\lambda^{(k)}$ and $\mu^{(k)}$. After each solution of (12) the outer iteration is completed by updating the Lagrange multiplier approximation and the penalty parameter, as follows:
(13a)   $\lambda_i^{(k+1)} = \lambda_i^{(k)} + \mu^{(k)}\, P_i\, c_i\big(x^{(k)}\big), \qquad i = 1, \dots, N,$
(13b)   $\mu^{(k+1)} = \rho\, \mu^{(k)}, \qquad \rho > 1.$
The penalty parameter is progressively increased by a constant factor $\rho$ in order to impose the model constraints (6) more and more accurately. Other strategies to update $\lambda$ and $\mu$ can be used [6].
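One outer-iteration update in the spirit of (13) can be sketched as follows (Python; identity scaling matrices are assumed for simplicity, and the growth factor `rho` and function names are hypothetical):

```python
import numpy as np

def outer_update(lambdas, states, model_steps, mu, rho=10.0):
    """First-order multiplier update and penalty increase (sketch):
    lambda_i <- lambda_i + mu * c_i,  mu <- rho * mu,
    where c_i is the continuity defect on subinterval i and the
    scaling matrices P_i are taken as the identity."""
    new_lambdas = []
    for i, M_i in enumerate(model_steps):
        c = states[i + 1] - M_i(states[i])   # continuity defect c_i
        new_lambdas.append(lambdas[i] + mu * c)
    return new_lambdas, rho * mu
```

As the outer iterations proceed, the growing penalty forces the continuity defects toward zero while the multipliers absorb the constraint forces.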
3.1 Augmented Lagrangian optimization
Figure 1 illustrates the convergence process of the augmented Lagrangian 4D-Var. The assimilation window is divided into multiple subintervals (8). The control vector for the optimization process contains the state vector at the beginning of each of the subintervals (8). The initial value of the control vector contains the background states at the beginning of each subinterval, and therefore corresponds to a continuous curve. The outer iterations start with a small value of the penalty parameter $\mu$ that imposes the constraints (6) only loosely. Consequently, the solutions during the first outer iterations show large discontinuities at the interval boundaries. Subsequently, $\mu$ is increased and, as a result, the constraints are satisfied more accurately. This results in a smooth solution curve that closely resembles the serial 4D-Var solution.
4 The parallel algorithm
Most of the computational time required by the 4D-Var solution is spent in the multiple cost function and gradient evaluations. In the augmented Lagrangian 4D-Var formulation (12)–(13) the forward and adjoint models can be run in parallel over the subintervals, as explained next.
4.1 Parallel-in-time runs of the forward model
The value of the augmented Lagrangian cost function can be computed in parallel. Specifically, on each subinterval $[t_{i-1}, t_i]$ in (8) a forward solution is computed starting from the initial value $x_{i-1}$ in (9). The forward model runs on different subintervals can be carried out concurrently. The computational steps are detailed in Algorithm 1.
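These concurrent subinterval runs can be sketched as follows (Python threads stand in for the Matlab parallel workers used in the paper; `model_step` and the step counts are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def propagate(model_step, x_start, n_steps):
    """Run the forward model over one subinterval from its own initial value."""
    x = x_start
    for _ in range(n_steps):
        x = model_step(x)
    return x

def parallel_forward(model_step, starts, n_steps):
    """Launch the subinterval integrations concurrently: each subinterval
    starts from its own control-vector component, independently of the
    others, so all forward runs proceed at the same time."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(propagate, model_step, x0, n_steps)
                   for x0 in starts]
        return [f.result() for f in futures]
```

The results at the subinterval endpoints are then combined with the continuity defects to assemble the augmented Lagrangian value.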
4.2 Parallel-in-time runs of the adjoint model
The gradient of the augmented Lagrangian with respect to the forward model states is given by:
(14a)   $\nabla_{x_0} \mathcal{L}_{\mu} = B_0^{-1}\, (x_0 - x_0^{\mathrm{b}}) - {M'}_{0,1}^{T} \big(\lambda_1 + \mu\, P_1\, c_1\big),$
(14b)   $\nabla_{x_i} \mathcal{L}_{\mu} = {H'}_i^{T} R_i^{-1} \big(\mathcal{H}(x_i) - y_i\big) + \lambda_i + \mu\, P_i\, c_i - {M'}_{i,i+1}^{T} \big(\lambda_{i+1} + \mu\, P_{i+1}\, c_{i+1}\big), \qquad i = 1, \dots, N-1,$
(14c)   $\nabla_{x_N} \mathcal{L}_{\mu} = {H'}_N^{T} R_N^{-1} \big(\mathcal{H}(x_N) - y_N\big) + \lambda_N + \mu\, P_N\, c_N,$
where the tangent linear operators of the observation and model solution operators are
$H'_i = \mathcal{H}'(x_i) \quad \text{and} \quad M'_{i,i+1} = \mathcal{M}'_{t_i \to t_{i+1}}(x_i),$
respectively, and their transposes are the corresponding adjoint operators. Using the notation
$\delta_i = \lambda_i + \mu\, P_i\, c_i,$
the gradient (14) can be written compactly as
$\nabla_{x_i} \mathcal{L}_{\mu} = {H'}_i^{T} R_i^{-1} \big(\mathcal{H}(x_i) - y_i\big) + \delta_i - {M'}_{i,i+1}^{T}\, \delta_{i+1}.$
4.3 Initial solution guess
The optimization needs to start with some initial guess for $x$ and $\lambda$. The initial guess for the state is obtained by performing a serial forward integration using the background value for the initial conditions, $x_0 = x_0^{\mathrm{b}}$. The initial value for the adjoint variable could be obtained by running the adjoint model once, serially, along the background trajectory. In our experiments we choose the simpler, and less expensive, initialization $\lambda^{(0)} = 0$.
4.4 Updating Lagrange multipliers
In order to accelerate the convergence of the optimization process we replace the standard updates (13) with the strategy proposed in [15]. The new update process uses information from the previous two iterations instead of just one. In outline, the penalty parameter is chosen and the iteration counter is initialized; the subproblem (12) is solved; and the updated Lagrange multiplier is computed from the multipliers and constraint values of the two most recent outer iterations. It is important to note that the above procedure requires the values of $\lambda$ from two successive outer iterations, namely $\lambda^{(k-1)}$ and $\lambda^{(k)}$.
5 Numerical experiments
We study the performance of the parallel implementation of the augmented Lagrangian 4D-Var algorithm using the Lorenz-96 model with 40 variables [19] and a shallow water on the sphere model with approximately 8,000 variables [29].
5.1 Lorenz-96 model
The Lorenz-96 model [19] is given by
(15)   $\frac{dx_j}{dt} = (x_{j+1} - x_{j-2})\, x_{j-1} - x_j + F, \qquad j = 1, \dots, 40,$
with periodic boundary conditions and the forcing term $F = 8$ [19]. We use synthetic observations generated by perturbing the reference trajectory with normal noise with mean zero and a standard deviation prescribed as a fixed fraction of the average magnitude of the reference solution. The background uncertainty is set to a similar fraction of the average magnitude of the reference solution. The background and observation error covariance matrices are assumed to be diagonal. A vector of equidistant components was integrated forward in time for 200 time steps, and the final state is taken as the reference initial condition for the experiments.
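A minimal sketch of the model tendency and a time stepper follows (Python/NumPy; the classical RK4 stepper here is illustrative, not necessarily the integrator used in the experiments):

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Lorenz-96 tendency: dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F,
    with periodic indexing (np.roll handles the wrap-around)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One classical fourth-order Runge-Kutta step of the Lorenz-96 model."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

Iterating `rk4_step` from a perturbed state produces the chaotic reference trajectories from which the synthetic observations are sampled.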
5.2 Shallow water model on the sphere
The shallow water equations have been used extensively as a simple model of the atmosphere since they contain the essential wave propagation mechanisms found in general circulation models [29]. The shallow water equations in spherical coordinates are:
(16a)   $\frac{\partial u}{\partial t} = -\frac{1}{a \cos\theta} \left( u \frac{\partial u}{\partial \lambda} + v \cos\theta\, \frac{\partial u}{\partial \theta} \right) + \left( f + \frac{u \tan\theta}{a} \right) v - \frac{g}{a \cos\theta} \frac{\partial h}{\partial \lambda},$
(16b)   $\frac{\partial v}{\partial t} = -\frac{1}{a \cos\theta} \left( u \frac{\partial v}{\partial \lambda} + v \cos\theta\, \frac{\partial v}{\partial \theta} \right) - \left( f + \frac{u \tan\theta}{a} \right) u - \frac{g}{a} \frac{\partial h}{\partial \theta},$
(16c)   $\frac{\partial h}{\partial t} = -\frac{1}{a \cos\theta} \left( \frac{\partial (h u)}{\partial \lambda} + \frac{\partial (h v \cos\theta)}{\partial \theta} \right).$
Here $f$ is the Coriolis parameter, given by $f = 2\Omega \sin\theta$, where $\Omega$ is the angular speed of the rotation of the Earth; $h$ is the height of the homogeneous atmosphere; $u$ and $v$ are the zonal and meridional wind components, respectively; $\theta$ and $\lambda$ are the latitudinal and longitudinal directions, respectively; $a$ is the radius of the Earth; and $g$ is the gravitational constant. The space discretization is performed using the unstaggered Turkel-Zwas scheme [20, 16]. The discretized spherical grid has nlon = 36 nodes in the longitudinal direction and nlat = 72 nodes in the latitudinal direction. The semi-discretization in space leads to a discrete model of the form (6), in which the zonal wind, meridional wind, and height variables are combined into a single state vector $x$. We perform the time integration using an adaptive time-stepping algorithm. For the chosen tolerance the average time step size of the time integrator is 180 seconds. A reference initial condition is used to generate a reference trajectory.
Synthetic observation errors at various times are normally distributed with mean zero and a diagonal observation error covariance matrix, with entries corresponding to prescribed standard deviations for the $u$, $v$, and $h$ components. We construct a flow-dependent background error covariance matrix as described in [3, 2]. The standard deviation of the background errors for the height component is 2% of the average magnitude of the height field in the reference initial condition. The standard deviation of the background errors for the wind components is 15% of the average magnitude of the wind fields in the reference initial condition.
5.3 Experimental setup
All numerical experiments are carried out in Matlab. The parallel implementations are developed using Matlab’s parallel toolbox. We compare the performance of the proposed parallel implementation with that of the standard 4DVar. The accuracy of numerical solutions is measured by the root mean square error (RMSE) with respect to a reference solution. The RMSE is given by
(17)   $\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{n}\, \big\| x_i^{\mathrm{a}} - x_i^{\mathrm{ref}} \big\|^2 },$
where the reference solution $x^{\mathrm{ref}}$ and the analysis $x^{\mathrm{a}}$ are propagated forward in time using the full model, and the difference is measured at all times throughout the assimilation window.
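The RMSE computation can be sketched as follows (Python; assumes the trajectories are stored as lists of state vectors at matching times):

```python
import numpy as np

def rmse(analysis_traj, reference_traj):
    """Root mean square error of the analysis trajectory against the
    reference, averaged over all state components and all times in the
    assimilation window."""
    diffs = [xa - xr for xa, xr in zip(analysis_traj, reference_traj)]
    mean_sq = np.mean([np.mean(d ** 2) for d in diffs])  # average over space and time
    return np.sqrt(mean_sq)
```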
Computing the cost functions and gradients is the most expensive part of the 4D-Var algorithm, and hence it is necessary that their computation be scalable. To evaluate the gradient and cost function we carry out numerical integrations of the forward and adjoint models using MATLODE [4]. MATLODE is a Matlab version of FATODE, which was developed in Fortran [32]. The optimization is carried out using the L-BFGS solver [18] implemented in the Poblano optimization toolbox developed at Sandia National Laboratories [13].
5.4 Results with the Lorenz-96 model
Figure 1 illustrates the convergence of the augmented Lagrangian 4D-Var iterations. The intermediate solutions are discontinuous at the subinterval boundaries. The corresponding errors, with respect to the traditional 4D-Var solution, are shown in Figure 2. They decrease quickly to zero, showing that the augmented Lagrangian 4D-Var solution converges to the solution of traditional 4D-Var.
The errors of the augmented Lagrangian and traditional 4D-Var solutions with respect to the reference solution are shown in Figure 3. The reference solution is obtained by propagating the reference initial condition using the forward model (15). The sequential and parallel 4D-Var algorithms require different numbers of gradient and cost function evaluations, depending on the number of observations. The weak scalability results are presented in Figure 4. As the length of the assimilation window increases, the number of subintervals increases, and so does the number of observations (one per subinterval). The number of cores on which the parallel algorithm is run increases so as to remain equal to the number of subintervals. The parallel algorithm is scalable in the weak sense: the total computational time increases very slowly with increasing problem size. The most time-consuming calculations are the cost function and gradient evaluations, which require running the forward and adjoint models, respectively. The results shown in Figure 5 confirm the good weak scalability of the cost function and gradient computations.
5.5 Results with the shallow water model
Figure 6 shows the weak scalability of cost function and gradient evaluations with the shallow water model. It can be seen that in both cases the parallel computational time is nearly constant with increasing problem size (number of subintervals) and a proportional increase in the number of cores (the number of cores is equal to the number of subintervals). Figure 7 presents the work-precision diagrams, i.e., the evolution of solution accuracy (RMSE) with an increasing number of iterations (increasing CPU time). The initial iterates of the parallel 4D-Var solution proceed rapidly, but afterwards the convergence slows down. At this stage the penalty parameter is fairly large and the optimization problem becomes more difficult for the L-BFGS algorithm. The performance can be improved by replacing L-BFGS with algorithms specially tailored to solve optimization problems in the augmented Lagrangian framework [7]. An alternative strategy is to use a hybrid method: employ parallel 4D-Var for several iterations in the beginning, then continue with traditional serial 4D-Var. Here we perform two outer iterations with small values of $\mu$ and then use this solution as an initial guess for the serial 4D-Var. This strategy improves the performance considerably, as seen in Figure 7(a). Table 1 provides the computational times for the parallel and serial 4D-Var algorithms. The serial and parallel 4D-Var algorithms require different numbers of gradient and cost function evaluations; in the hybrid methodology we perform a few iterations of parallel 4D-Var followed by iterations of serial 4D-Var. The final RMSE over the assimilation window is the same for the serial 4D-Var and hybrid methods. We notice a steady increase in speedup as the problem size (number of subintervals) is increased.
It is possible to further improve the performance of the parallel algorithm by using second-derivative information in the form of Hessian-vector products and employing Newton-type methods [9].
No. of subintervals  Serial time [sec.]  Parallel/Hybrid time [sec.]  Speedup
5   8,515   10,834  0.786
7   15,745  12,782  1.232
9   22,277  12,971  1.756
6 Conclusions and future work
This work presents an augmented Lagrangian framework to perform strong-constraint 4D-Var data assimilation in parallel. The assimilation window is split into subintervals; cost function and gradient evaluations, which are the main components of the algorithm, are performed by running the forward and the adjoint models in parallel across different subintervals. To the best of our knowledge this is the first work that uses an augmented Lagrangian approach to data assimilation, and the first to propose a time-parallel implementation of strong-constraint 4D-Var.
Future work will focus on tuning the optimization procedure to improve performance on large-scale problems, e.g., data assimilation with the weather research and forecast model [28]. The size of the control variable in the augmented Lagrangian framework increases with the number of subintervals and can become a bottleneck for the optimization. One possible strategy to overcome this is to perform the optimization on a coarse grid and use the projected solution as an initial guess for the fine grid. A natural extension to our methodology is to use the augmented Lagrangian framework to expose space-time parallelism in the data assimilation problem, so as to create more parallel tasks and improve the overall scalability. Space parallelism in a penalty formulation has been recently discussed in [12]. We will consider the use of optimization algorithms that are specifically tuned to work well in the augmented Lagrangian framework [7]. Next, Hessian information can be used to accelerate the convergence significantly when the iterates are close to a minimum. In order to implement this it is useful to explore the construction of second-order adjoint models that compute the Hessian-vector products in parallel.
Acknowledgements
This work was supported in part by the awards AFOSR FA9550–12–1–0293–DEF, AFOSR 12264006, NSF DMS–1419003, NSF CCF–1218454, and by the Computational Science Laboratory at Virginia Tech.
References
 [1] P. Aguiar, E. P. Xing, M. Figueiredo, N. A. Smith, and A. Martins. An augmented Lagrangian approach to constrained MAP inference. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 169–176. ACM, 2011.
 [2] A. Attia, V. Rao, and A. Sandu. A hybrid MonteCarlo sampling smoother for four dimensional data assimilation. Ocean Dynamics, Submitted, 2014.
 [3] A. Attia, V. Rao, and A. Sandu. A sampling approach for four dimensional data assimilation. In Proceedings of the Dynamic Data Driven environmental System Science Conference, 2014.
 [4] A. Augustine and A. Sandu. MATLODE: A MATLAB ODE solver and sensitivity analysis toolbox. ACM TOMS, Submitted, 2015.
 [5] A. Aved, F. Darema, and E. Blasch. Dynamic data driven application systems. www.1dddas.org, 2014.
 [6] E. G. Birgin and J. M. Martínez. Improving ultimate convergence of an augmented Lagrangian method. Optimization Methods and Software, 23(2):177–195, 2008.
 [7] E. G. Birgin and J. M. Martínez. Practical augmented Lagrangian methods for constrained optimization, volume 10. SIAM, 2014.
 [8] D. G. Cacuci, M. IonescuBujor, and I. M. Navon. Sensitivity and uncertainty analysis, volume II: applications to largescale systems, volume 2. CRC Press, 2005.
 [9] A. Cioaca, M. Alexe, and A. Sandu. Second-order adjoints for solving PDE-constrained optimization problems. Optimization Methods and Software, 27(4-5):625–653, 2012.
 [10] A. Cioaca and A. Sandu. An optimization framework to improve 4DVar data assimilation system performance. Journal of Computational Physics, 275:377–389, 2014.
 [11] R. Daley. Atmospheric data analysis, volume 2. Cambridge University Press, 1993.
 [12] L. D’Amore, R. Arcucci, L. Carracciuolo, and A. Murli. A scalable approach for variational data assimilation. Journal of Scientific Computing, 61(2):239–257, 2014.
 [13] D. M. Dunlavy, T. G. Kolda, and E. Acar. Poblano v1.0: A MATLAB toolbox for gradientbased optimization. Technical report, Sandia National Laboratories, 2010.
 [14] M. Fisher. Parallelization of 4DVar in the time dimension using a saddlepoint algorithm. Slides., 2013.
 [15] B. He and X. Yuan. On the acceleration of augmented Lagrangian method for linearly constrained optimization. optimization online, 2010.
 [16] I. M. Navon and R. De Villiers. The application of the Turkel-Zwas explicit large time-step scheme to a hemispheric barotropic model with constraint restoration. Monthly Weather Review, 115(5):1036–1052, 1987.
 [17] E. Kalnay. Atmospheric modeling, data assimilation, and predictability. Cambridge University Press, 2003.
 [18] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503–528, 1989.
 [19] E. N. Lorenz. Predictability: A problem partly solved. In Proceedings of Seminar on Predictability, pages 40–58, 1996.
 [20] I. M. Navon and J. Yu. Exshall: A TurkelZwas explicit large timestep FORTRAN program for solving the shallowwater equations in spherical coordinates. Computers and Geosciences, 17(9):1311 – 1343, 1991.
 [21] J. Rantakokko. Strategies for parallel variational data assimilation. Parallel Computing, 23(13):2017 – 2039, 1997.
 [22] V. Rao and A. Sandu. Aposteriori error estimates for inverse problems. SIAM/ASA Journal on Uncertainty Quantification, Submitted, 2014.
 [23] V. Rao and A. Sandu. An adjointbased scalable algorithm for timeparallel integration. Journal of Computational Science, 5(2):76 – 84, 2014. Empowering Science through Computing + BioInspired Computing.
 [24] V. Rao and A. Sandu. A posteriori error estimates for dddas inference problems. Procedia Computer Science, 29:1256–1265, 2014.
 [25] E. D. N. Ruiz and A. Sandu. A derivativefree trust region framework for variational data assimilation. Journal of Computational and Applied Mathematics, 2015.
 [26] A. Sandu, D. N. Daescu, G. R. Carmichael, and T. Chai. Adjoint sensitivity analysis of regional air quality models. Journal of Computational Physics, 204(1):222–252, 2005.
 [27] A. Sandu and T. Chai. Chemical data assimilation—an overview. Atmosphere, 2(3):426–463, 2011.
 [28] W. C. Skamarock, J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, and J. G. Powers. A description of the Advanced Research WRF version 2. Technical report, DTIC Document, 2005.
 [29] A. St-Cyr, C. Jablonowski, J. M. Dennis, H. M. Tufo, and S. J. Thomas. A comparison of two shallow water models with nonconforming adaptive grids. Monthly Weather Review, 136:1898–1922, 2008.
 [30] Y. Trémolet and F.X. Le Dimet. Parallel algorithms for variational data assimilation and coupling models. Parallel Computing, 22(5):657 – 674, 1996.
 [31] S. J. Wright and J. Nocedal. Numerical optimization, volume 2. Springer New York, 1999.
 [32] H. Zhang and A. Sandu. FATODE: A library for forward, adjoint, and tangent linear integration of ODEs. SIAM Journal on Scientific Computing, 36(5):C504–C523, 2014.