1 Introduction

Predicting Flow Reversals in a Computational Fluid Dynamics Simulated Thermosyphon using Data Assimilation

Abstract

A thermal convection loop is a circular chamber filled with water, heated on the bottom half and cooled on the top half. With sufficiently large forcing of heat, the direction of fluid flow in the loop oscillates chaotically, forming an analog to the Earth’s weather. As is the case for state-of-the-art weather models, we only observe the statistics over a small region of state space, making prediction difficult. To overcome this challenge, data assimilation methods, and specifically ensemble methods, use the computational model itself to estimate the uncertainty of the model to optimally combine these observations into an initial condition for predicting the future state. First, we build and verify four distinct DA methods. Then, a computational fluid dynamics simulation of the loop and a reduced order model are both used by these DA methods to predict flow reversals. The results contribute to a testbed for algorithm development.

\defensedate

\titlecontents

section [3.8em] \contentslabel2.3em \titlerule*[1pc].\contentspage

\makeacceptance{dedication}

in dedication to

my parents, Donna and Kevin Reagan, for their unwavering support.

{acknowledgements}

I would like to thank Professors Chris Danforth and Peter Dodds for their outstanding advising, as well as Professors Darren Hitt and Yves Dubief for their valuable feedback. This work was made possible by funding from the Mathematics and Climate Research Network and Vermont NASA EPSCoR program. I would like to thank my fellow students for all of their help and toleration through the past two years. And finally I would like to thank my girlfriend Sam Spisiak, whose love makes this thesis worth writing.

\mainmatter\titlecontents

section[3.5em]\titlerule*[1pc].\contentspage \titlecontentssubsection[5em]

Chapter 1 Introduction

In this chapter we explore the current state of numerical weather prediction, in particular data assimilation, along with an introduction to computational fluid dynamics and reduced order experiments.

1.1 Introduction

Prediction of the future state of complex systems is integral to the functioning of our society. Some of these systems include weather \shortciteweather-violence2013, health \shortciteginsberg2008detecting, the economy \shortcitesornette2006predictability, marketing \shortciteasur2010predicting and engineering \shortcitesavely 1972. For weather in particular, this prediction is made using supercomputers across the world in the form of numerical weather model integrations taking our current best guess of the weather into the future. The accuracy of these predictions depend on the accuracy of the models themselves, and the quality of our knowledge of the current state of the atmosphere.

Model accuracy has improved with better meteorological understanding of weather processes and advances in computing technology. To solve the initial value problem, techniques developed over the past 50 years are now broadly known as data assimilation. Formally, data assimilation is the process of using all available information, including short-range model forecasts and physical observations, to estimate the current state of a system as accurately as possible \shortciteyang2006.

We employ a toy climate experiment as a testbed for improving numerical weather prediction algorithms, focusing specifically on data assimilation methods. This approach is akin to the historical development of current methodologies, and provides a tractable system for rigorous analysis. The experiment is a thermal convection loop, which by design simplifies our problem into the prediction of convection. The dynamics of thermal convection loops have been explored under both periodic \shortcitekeller1966 and chaotic \shortcitewelander1967,creveling1975stability,gorman1984,gorman1986,ehrhard1990dynamical,yuen1999,jiang2003,burroughs2005reduced,desrayaud2006numerical,yang2006,ridouane2010 regimes. A full characterization of the computational behaivor of a loop under flux boundary conditions by Louisos et. al. describes four regimes: chaotic convection with reversals, high Ra aperiodic stable convection, steady stable convection, and conduction/quasi-conduction \shortcitelouisos2013. For the remainder of this work, we focus on the chaotic flow regime.

Computational simulations of the thermal convection loop are performed with the open-source finite volume C++ library OpenFOAM \shortcitejasak2007. The open-source nature of this software enables its integration with the data assimilation framework that this work provides.

1.2 History of NWP

The importance of Vilhelm Bjerknes in early developments in NWP is described by Thompson’s “Charney and the revival of NWP” \shortcitethompson1990:

It was not until 1904 that Vilhelm Bjerknes - in a remarkable manifesto and testament of deterministic faith - stated the central problem of NWP. This was the first explicit, coherent, recognition that the future state of the atmosphere is, in principle, completely determined by its detailed initial state and known boundary conditions, together with Newton’s equations of motion, the Boyle-Charles-Dalton equation of state, the equation of mass continuity, and the thermodynamic energy equation. Bjerknes went further: he outlined an ambitious, but logical program of observation, graphical analysis of meterological data and graphical solution of the governing equations. He succeeded in persuading the Norwegians to support an expanded network of surface observation stations, founded the famous Bergen School of synoptic and dynamic meteorology, and ushered in the famous polar front theory of cyclone formation. Beyond providing a clear goal and a sound physical approach to dynamical weather prediction, V. Bjerknes instilled his ideas in the minds of his students and their students in Bergen and in Oslo, three of of whom were later to write important chapters in the development of NWP in the US (Rossby, Eliassen, and Fjörtoft).

It then wasn’t until 1922 that Lewis Richardson suggested a practical way to solve these equations. Richardson used a horizontal grid of about 200km, and 4 vertical layers in a computational mesh of Germany and was able to solve this system by hand \shortciterichardson1965. Using the observations available, he computed the time derivative of pressure in the central box, predicting a change of 146 hPa, whereas in reality there was almost no change. This discrepancy was due mostly to the fact that his initial conditions (ICs) were not balanced, and therefore included fast-moving gravity waves which masked the true climate signal \shortcitekalnay2003. Regardless, had the integration had continued, it would have “exploded,” because together his mesh and timestep did not satisfy the Courant-Friedrichs-Lewy (CFL) condition, defined as follows. Solving a partial differential equation using finite differences, the CFL requires the Courant number to be less than one for a convergent solution. The dimensionless Courant number relates the movement of a quantity to the size of the mesh, written for a one-dimensional problem as for the velocity, the timestep, and the mesh size.

Early on, the problem of fast traveling waves from unbalanced IC was solved by filtering the equations of motion based on the quasi-geostrophic (slow time) balance. Although this made running models feasible on the computers available during WW2, the incorporation of the full equations proved necessary for more accurate forecasts.

The shortcomings of our understanding of the subgrid-scale phenomena do impose a limit on our ability to predict the weather, but this is not the upper limit. The upper limit of predictability exists for even a perfect model, as Edward Lorenz showed in with a three variable model found to exhibit sensitive dependence on initial conditions \shortcitelorenz1963.

Numerical weather prediction has since constantly pushed the boundary of computational power, and required a better understanding of atmospheric processes to make better predictions. In this direction, we make use of advanced computational resources in the context of a simplified atmospheric experiment to improve prediction skill.

1.3 IC determination: data assimilation

Areas as disparate as quadcopter stabilization \shortciteachtelik2009visual to the tracking of ballistic missle re-entry \shortcitesiouris1997tracking use data assimilation. The purpose of data assimilation in weather prediction is defined by Talagrand as “using all the available information, to determine as accurately as possible the state of the atmospheric (or oceanic) flow.” \shortcitetalagrand1997assimilation One such data assimilation algorithm, the Kalman filter, was originally implemented in the navigation system of Apollo program \shortcitekalman1961new,savely1972.

Data assimilation algorithms consist of a 3-part cycle: predict, observe, and assimilate. Formally, the data assimilation problem is solved by minimizing the initial condition error in the presence of specific constraints. The prediction step involves making a prediction of the future state of the system, as well as the error of the model, in some capacity. Observing systems come in many flavors: rawindsomes and satellite irradiance for the atmosphere, temperature and velocity reconstruction from sensors in experiments, and sampling the market in finance. Assimilation is the combination of these observations and the predictive model in such a way that minimizes the error of the initial condition state, which we denote the analysis.

The difficulties of this process are multifaceted, and are addressed by parameter adjustments of existing algorithms or new approaches altogether. The first problem is ensuring the analysis respects the core balances of the model. In the atmosphere this is the balance of Coriolis force and pressure gradient (geostrophic balance), and in other systems this can arise in the conservation of physical (or otherwise) quantities.

In real applications, observations are made with irregular spacing in time and often can be made much more frequently than the length of data assimilation cycle. However the incorporation of more observations will not always improve forecasts: in the case where these observations can only be incorporated at the assimilation time, the forecasts can degrade in quality \shortcitekalnay20074. Algorithms which are capable of assimilating observations at the time they are made have a distinct advantage in this respect. In addition, the observations are not always direct measurements of the variables of the model. This can create non-linearities in the observation operator, the function that transforms observations in the physical space into model space. In Chapter 2 we will see where a non-linear operator complicates the minimization of the analysis error.

As George Box reminds us, “all models are wrong” \shortcitebox1978statistics. The incorporation of model bias into the assimilation becomes important with imperfect models, and although this issue is well-studied, it remains a concern \shortciteallgaier2012empirical.

The final difficulty we need to address is the computational cost of the algorithms. The complexity of models creates issues for both the computation speed and storage requirements, an example of each follows. The direct computation of the model error covariance, integrating a linear tangent model which propagates errors in the model forward in time, amounts to model integrations where is the number of variables in the model. For large systems, e.g. for global forecast models, this is well beyond reach. Storing a covariance matrix at the full rank of model variables is not possible even for a model of the experiment we will address, which has 240,000 model variables. Workarounds for these specific problems include approximating the covariance matrix (e.g. with a static matrix, or updating it using an ensemble of model runs), and reformulating the assimilation problem such that the covariance is only computed in observation space, which is typically orders of magnitude smaller.

1.4 NWP and fluid dynamics: computational models

As early as 1823, the basic equations governing flow of a viscous, heat-conducting fluid were derived from Newton’s law of motion \shortcitenavier,stokes. These are the Navier-Stokes equations. The equations of geophysical fluid dynamics describe the flow in Earth’s atmosphere, differing from the Navier-Stokes equations by incorporating the Coriolis effect and neglecting the vertical component of advection. In both cases, the principle equations are the momentum equations, continuity equation and energy equation.

As stated, our interest here is not the accuracy of computational fluid dynamic simulations themselves, but rather their incorporation into the prediction problem. For this reason, we limit the depth of exploration of available CFD models.

We chose to perform simulations in the open-source C++ library OpenFOAM. Because the code is open-source, we can modify it as necessary for data assimilation, and of particular importance becomes the discretized grid.

There are two main directions in which we do explore the computational model: meshing and solving. To solve the equations of motion numerically, we first must discretize them in space by creating the computational grid, which we refer to as a mesh. Properly designing and characterizing the mesh is the basis of an accurate solution. Still, without the use of (notoriously unreliable) turbulence models, and without a grid small enough to resolve all of the scales of turbulence, we inevitably fail to capture the effects of sub-grid scale phenomena. In weather models, where grid sizes for global models are in the 1-5km range horizontally, this is a known problem and is an open area of research. In addition to the laminar equations, to account for turbulence, we explore include the model, model, and large eddy simulations (in 3-dimensions only).

Secondly, the choice of solving technique is a source of numerical and algorithmic uncertainty. The schemes which OpenFOAM has available are dissapative and are limited in scope. That is, small perturbations experience damping (such as the analysis perturbations to an ensemble members in ensemble Kalman filters). Without consideration this problem, the ensemble does not adequately sample the uncertainty of the model to obtain an approximation of the model error growth.

We investigate these meshing and solving techniques in greater detail in Chapter 3.

1.5 Toy climate experiments

Mathematical and computational models span from simple to kitchen-sink, depending on the phenomenon of interest. Simple, or “toy”, models and experiments are important to gain theoretical understanding of the key processes at play. An example of such a conceptual model used to study climate is the energy balance model, Figure 1.1.

With a simple model, it is possible to capture the key properties of simple concepts (e.g. the greenhouse effect in the above energy balance model). For this reason they have been, and continue to be, employed throughout scientific disciplines. The “kitchen-sink” models, like the Earth System Models (ESMs) used to study climate, incorporate all of the known physical, chemical, and biological processes for the fullest picture of the system. In the ESM case, the simulations represent our computational laboratory.

We study a model that is more complex than the EBM, but is still analytically tractable, known as the Lorenz 1963 system (Lorenz 63 for short). In 1962, Barry Saltzmann attempted to model convection in a Rayleigh-Bérnard cell by reducing the equations of motion into their core processes \shortcitesaltzman1962finite. Then in 1963 Edward Lorenz reduced this system ever further to 3 equations, leading to his landmark discovery of deterministic non-periodic flow \shortcitelorenz1963. These equations have since been the subject of intense study and have changed the way we view prediction and determinism, remaining the simple system of choice for examining nonlinear behaivor today [\citeauthoryearKalnay, Li, Miyoshi, YANG, and BALLABRERA-POYKalnay et al.2007].

Thermal convection loops are one of the few known experimental implementations of Lorenz’s 1963 system of equations. Put simply, a thermal convection loop is a hula-hoop shaped tube filled with water, heated on the bottom half and cooled on the top half. As a non-mechanical heat pump, thermal convection loops have found applications for heat transfer in solar hot water heaters \shortcitebelessiotis2002analytical, cooling systems for computers \shortcitebeitelmal2002two, roads and railways that cross permafrost \shortcitecherry2006next, nuclear power plants \shortcitedetman1968thermos,beine1992integrated,kwant1992prism, and other industrial applications. Our particular experiment consists of a plastic tube, heated on the bottom by a constant temperature water bath, and cooled on top by room temperature air.

Varying the forcing of temperature applied to the bottom half of the loop, there are three experimentally accessible flow regimes of interest to us: conduction, steady convection, and unsteady convection. The latter regime exhibits the much studied phenomenon of chaos in its changes of rotational direction. This regime is analytically equivalent the Lorenz 1963 system with the canonical parameter values listed in Table B.6. Edward Lorenz discovered and studied chaos most of his life, and is said to have jotted this simple description on a visit to the University of Maryland [\citeauthoryearDanforthDanforth2009]:

Chaos: When the present determines the future, but the approximate present does not approximately determine the future.

In chapter 3, we present a derivation of Ehrhard-Müller Equations governing the flow in the loop following the work of Kameron Harris (2011). These equations are akin to the Lorenz 63 system, as we shall see, and the experimental implementation is described in more detail in the first section of Chapter 3.

In chapter 4 we present results for the predictive skill tests and summarize the successes of this work and our contributions to the field. We find that below a critical threshold of observations, predictions are useless, where useless forecasts are defined as those with RMSE greater than 70% of climatological variance. The skill of forecasts is also sensitive to the choice of covariance localization, which we discuss in more detail in chapter 4. And finally, we present “foamLab,” a test-bed for DA which links models, DA techniques, and localization in a general framework with the ability to extend these results and test experimental methods for DA within different models.

Chapter 2 Data Assimilation

The goal of this chapter is to introduce and detail the practical implementation of state-of-the-art data assimilation algorithms. We derive the general form for the problem, and then talk about how each filter works. Starting with the simplest examples, we formulate the most advanced 4-dimensional Variational Filter and 4-dimensional Local Ensemble Transform Kalman Filter to implement with a computational fluid dynamics model. For a selection of these algorithms, we illustrate some of their properties with a MATLAB implementation of the Lorenz 1963 three-variable model:

 dxdt =σ(y−x) dydt =ρx−y−xz dzdt =xy−βz

The cannonical choice of and , produce the well known butterfly attractor, is used for all examples here (Figure 2.1).

2.1 Basic derivation, OI

We begin by deriving the equations for optimal interpolation (OI) and for the Kalman Filter with one scalar observation. The goal of our derivation is to highlight which statistical information is useful, which is necessary, and what assumptions are generally made in the formulation of more complex algorithms.

Beginning with the one variable , which could represent the horizontal velocity at one grid-point, say we make two independent observations of : and . If the true state of is given by , then the goal is to produce an analysis of which we call , that is our best guess of .

Denote the errors of by respectively, such that

 U1,2=Ut+ϵ1,2.

For starters, we assume that we do not know . Next, assume that the errors vary around the truth, such that they are unbiased. This is written as

 E[U1,2−Ut]=E[ϵ1,2]=0.

As of yet, our best guess of would be the average , and for any hope of knowing well we would need many unbiased observations at the same time. This is unrealistic, and to move forward we assume that we know the variance of each . With this knowledge, which we write

 E[ϵ21,2]=σ21,2. (2.1)

Further assume that these errors are uncorrelated

 E[ϵ1ϵ1]=0, (2.2)

which in the case of observing the atmosphere, is hopeful.

We next attempt a linear fit of these observations to the truth, and equipped with the knowledge of observation variances we can outperform an average of the observations. For , our best guess of , we write this linear fit as

 Ua=α1U1+α2U2. (2.3)

If is to be our best guess, we desire it to unbiased

 E[Ua]=E[Ut]⇒α1+α2=1

and to have minimal error variance .

At this point, note all of the assumptions that we have made: independent observations, unbiased instruments, knowledge of error variance, and uncorrelated observation errors. Of these, knowing the error variances (and in multiple dimensions, error covariance matrices) will become our focus, and better approximations of these errors have been largely responsible for recent improvements in forecast skill.

Solving 2.3, by substituting , we have

 σ2a =E[(α1(U1−Ut)+(1−α1)(U2−Ut))2] =α21E[ϵ21]+(1−α21)E[ϵ22]+2α1(1−α1)E[ϵ1ϵ2] =α21σ21+(1−α21)σ22

To minimize with respect to we compute of the above and solve for 0 to obtain

 α1=σ222(σ22−σ21)

and similarly for .

We can also state the relationship for as a function of by substituting our form for into the second step of 2.3 to obtain:

 1σ2a=1σ21+1σ22. (2.4)

This last formula says that given accurate statistical knowledge of , the precision of the analysis is the sum of the precision of the measurements.

We can now formulate the simplest data assimilation cycle by using our forecast model to generate a background forecast and setting . We then assimilate the observations by setting and solving for the analysis velocity as

 Ua=Ub+W(Uo−Ub). (2.5)

The difference is commonly reffered to as the “innovation” and represents the new knowledge from the observations with respect to the background forecast. The optimal weight is then given by 2.4 as

 W=σ2bσ2b−σ2o=σ2b(σ2b−σ2o)−1. (2.6)

The analysis error variance and precision are defined as before, and following Kalnay (2003) for convenience we can rewrite this in terms of as:

 σ2a=(1−W)σ2b. (2.7)

We now generalize the above description for models of more than one variable to obtain the full equations for OI. This was originally done by Aerology et al (1960), and independently by Gandin (1965). The following notation will be used for the OI and the other multivariate filters. Consider a field of grid points, where there may be variables at each point for pressure, hydrostatic pressure, temperature, and three dimensions of velocity. We combine all the variables into one list of length where is the product of the number of grid points and the number of model variables. This list is ordered by grid point and then by variable (e.g. for a field with just variables pressure and temperature ). Denote the observations as , where the ordering is the same as but the dimension is much smaller (e.g. by a factor of for NWP) and the observations are not strictly at the locations of or even direct measurements of the variables in . In our experiment, temperature measurements are in the model variable , but are understood as the average of the temperature of the grid points surrounding it within some chosen radius. Examples of indirect and spatially heterogeneous observations in NWP are radar reflectivities, Doppler shifts, satellite radiances, and global positioning system atmospheric refractivities \shortcitekalnay2003.

We can now write the analysis with similar notation as for the scalar case with the addition of the observation operator which transforms model variables into observation space:

 xa =xb+Wd (2.8) d =y0−H(xb) (2.9) ϵa =xt−xa. (2.10)

Note that is not required to be linear, but here we write as an matrix for a linear transformation.

Again, our goal is to obtain the weight matrix that minimizes the analysis error . In Kalman Filtering, the weight matrix is called the gain matrix . First, we assume that the background and observations are unbiased.

 E[ϵb] =E[xb]−E[xt]=0 E[ϵo] =E[yo]−E[H(xt)]=0.

This a reasonable assumption, and if it does not hold true, we can account for this as a part of the analysis cycle \shortcitedee1998data.

Define the forecast error covariances as

 Pa =A=E[ϵaϵTa] (2.11) Pb =B=E[ϵbϵTb] (2.12) Po =R=E[ϵoϵTo]. (2.13)

Linearize the observation operator

 H(x+∂x)=H(x)+H∂x (2.14)

where are the entries of . Since the background is typically assumed to be a reasonably good forecast, it is convenient to write our equations in terms of changes to . The observational increment is thus

 d =yo−H(xb)=yo−H(xt+(xb−xt)) (2.15) =yo−H(xt)H(xb−xt)=ϵo−Hϵb. (2.16)

To solve for the weight matrix assume that we know and , and that their errors are uncorrelated:

 E[ϵoϵTb]=E[ϵbϵTo]=0. (2.17)

We now use the method of best linear unbiased estimation (BLUE) to derive in the equation , which approximates . Since and the well known BLUE solution to this simple equation, the that minimizes is given by

 W =E[(xt−xb)(yo−Hxb)T](E[(yo−Hxb)(y−Hxb)T])−1 =E[(−ϵb)(ϵo−Hϵb)T](E[(ϵo−Hϵb)(ϵo−Hϵb)T])−1 Missing \left or extra \right =(E[−ϵbϵTo]+E[ϵbϵTb]HT)⋅ Missing \left or extra \right =BHT(R+HBHT)−1

Now we solve for the analysis error covariance using from above and . This is (with the final bit of sneakiness using ):

 Pa =E[ϵaϵTa]=E[(ϵb+Wd)(ϵb+Wd)T] =E[(ϵb+W(ϵo−Hϵb))(ϵb+W(ϵo−Hϵb))T] =E[(ϵb+W(ϵo−Hϵb))(ϵTb+(ϵTO−ϵTbHT)WT)] =E[ϵbϵTb+ϵb(ϵTo−ϵTbHT)WT+W(ϵo−Hϵb)ϵTb+ W(ϵo−Hϵb)(ϵTO−ϵTbHT)WT] =E[ϵbϵTb]+E[ϵb(ϵTo−ϵTbHT)]WT+WE[(ϵo−Hϵb)ϵTb]+ Missing \left or extra \right =B−BHTWT−WHB+ WE[ϵoϵTo−ϵoϵTbHT−HϵbϵTo+HϵbϵTbHT]WT =B−BHTWT−WHB+WRW+WHBHTWT =B−WHB+(WR−BHT)WT+WHBHTWT =(I−WH)B

Note that in this formulation, we have assumed that (the observation error covariance) accounts for both the instrumental error and the “representativeness” error, a term coined by Andrew Lorenc to describe the fact that the model does not account for subgrid-scale variability \shortcitelorenc1981global. To summarize, the equations for OI are

 W =BHT(R+HBHT)−1 (2.18) Pa =(I−WH)B (2.19)

As a final note, it is also possible to derive the “precision” of the analysis but in practice it is impractical to compute the inverse of the full matrix , so we omit this derivation. In general is also not possible to solve for for the whole model, and localization of the solution is achieved by considering each grid point and the observations within some set radius of this. For different models, this equates to different assumptions on the correlations of observations, which we do not consider here.

2.2 3D-Var

Simply put, 3D-Var is the variational (cost-function) approach to finding the analysis. It has been shown that 3D-var solves the same statistical problem as OI \shortcitelorenc1986analysis. The usefulness of the variational approach comes from the computational efficiency, when solved with an iterative method. This is why the 3D-Var analysis scheme became the most widely used data assimilation technique in operational weather forecasting, only to be succeeded by 4D-Var today.

The formulation of the cost function can be done in a Bayesian framework \shortcitepurser1984new or using maximum likelihood approach (below). First, it is not difficult to see that solving for for where we define as

 J(U)=12[(U−U1)2σ21+(U−U2)2σ22] (2.20)

gives the same solution as obtained through least squares in our scalar assimilation. The formulation of this cost function obtained through the ML approach is simple \shortciteedwards1984likelihood. Consider the likelihood of true value given an observation with standard deviation :

 Lσ1(U|U1)=pσ1(U1|U)=1√2πσ1e−(T1−T)22σ21. (2.21)

The most likely value of given two observations is the value which maximizes the joint likelihood (the product of the likelihoods). Since the logarithm is increasing, the most likely value must also maximize the logarithm of the joint likelihood:

 maxUlnLσ1,σ2(U|U1,U2)=maxU[a−(U−U1)2σ21−(U−U2)2σ22] (2.22)

and this is equivalent to minimizing the cost function above.

Jumping straight the multi-dimensional analog of this, a result of Andrew Lorenc \shortcitelorenc1986analysis, we define the multivariate 3D-Var as finding the that minimizes the cost function

 J(x)=(x−xb)TB−1(x−xb)+(yo+H(x))TR(yo−H(x)). (2.23)

The formal minimization of can be obtained by solving for , and the result from Eugenia Kalnay is \shortcitekalnay2003:

 xa=xb+(B−1+HTR−1H)−1HTR−1(yo−H(xb)). (2.24)

The principle difference between the Physical Space Analysis Scheme (PSAS) and 3D-Var is that the cost function is solved in observation space, because the formulation is so similar we do not derive it here. If there are less observations than there are model variables, the PSAS is a more efficient method.

Finally we note for 3D-Var that operationally can be estimated with some flow dependence with the National Meteorological Center (NMC) method, i.e.

 B≈αE[(xf(48h)−xf(24h))(xf(48h)−xf(24h))T] (2.25)

with on the order of 50 different short range forecasts. Up to this point, we have assumed knowledge of a static model error covariance matrix , or estimated it using the above NMC method. In the following sections, we explore advanced methods that compute or estimate a changing model error covariance as part of the data assimilation cycle.

2.3 Extended Kalman Filter

The Extended Kalman Filter (EKF) is the “gold standard” of data assimilation methods \shortcitekalnay2003. It is the Kalman Filter “extended” for application to a nonlinear model. It works by updating our knowledge of error covariance matrix for each assimilation window, using a Tangent Linear Model (TLM), whereas in a traditional Kalman Filter the model itself can update the error covariance. The TLM is precisely the model (written as a matrix) that transforms a perturbation at time to a perturbation at time , analytically equivalent to the Jacobian of the model. Using the notation of Kalnay [\citeauthoryearKalnayKalnay2003], this amounts to making a forecast with the nonlinear model , and updating the error covariance matrix with the TLM , and adjoint model

 xf(ti) =Mi−1[xa(ti−1)] Pf(ti) =Li−1Pa(ti−1)LTi−1+Q(ti−1)

where is the noise covariance matrix (model error). In the experiments with Lorenz 63 presented in this section, since our model is perfect. In NWP, must be approximated, e.g. using statistical moments on the analysis increments \shortcitedanforth2007estimating,li2009accounting.

The analysis step is then written as (for the observation operator):

 xa(ti) =xf(ti)+Kidi (2.26) Pa(ti) =(I−KiHi)Pf(ti) (2.27)

where

 di=yoi−H[xf(ti)]

is the innovation. The Kalman gain matrix is computed to minimize the analysis error covariance as

 Ki=Pf(ti)HTi[Ri+HiPf(ti)HT]−1

where is the observation error covariance.

Since we are making observations of the truth with random normal errors of standard deviation , the observational error covariance matrix is a diagonal matrix with along the diagonal.

The most difficult, and most computationally expensive, part of the EKF is deriving and integrating the TLM. For this reason, the EKF is not used operationally, and later we will turn to statistical approximations of the EKF using ensembles of model forecasts. Next, we go into more detail regarding the TLM.

2.3.1 Coding the TLM

The TLM is the model which advances an initial perturbation at timestep to a final perturbation at timestep . The dynamical system we are interested in, Lorenz ’63, is given as a system of ODE’s:

 dxdt=F(x).

We integrate this system using a numerical scheme of our choice (in the given examples we use a second-order Runge-Kutta method), to obtain a model discretized in time.

 x(t)=M[x(t0)].

Introducing a small perturbation , we can approximate our model applied to with a Taylor series around :

 M[x(t0)+y(t0)] =M[x(t0)]+∂M∂xy(t0)+O[y(t0)2] Extra open brace or missing close brace

We can then solve for the linear evolution of the small perturbation as

 dydt=Jy (2.28)

where is the Jacobian of . We can solve the above system of linear ordinary differential equations using the same numerical scheme as we did for the nonlinear model.

One problem with solving the system of equations given by Equation 2.28 is that the Jacobian matrix of discretized code is not necessarily identical to the discretization of the Jacobian operator for the analytic system. This is a problem because we need to have the TLM of our model , which is the time-space discretization of the solution to . We can apply our numerical method to the to obtain explicitly, and then take the Jacobian of the result. This method is, however, prohibitively costly, since Runge-Kutta methods are implicit. It is therefore desirable to take the derivative of the numerical scheme directly, and apply this differentiated numerical scheme to the system of equations to obtain the TLM. A schematic of this scenario is illustrated in Figure 2.3. To that the derivative of numerical code for implementing the EKF on models larger than 3 dimensions (i.e. global weather models written in Fortan), automatic code differentiation is used \shortciteautodiff1981.

To verify our implementation of the TLM, we propagate a small error in the Lorenz 63 system and plot the difference between that error and the TLM predicted error, for each variable (Figure 2.4).

2.4 Ensemble Kalman Filters

The computational cost of the EKF is mitigated through the approximation of the error covariance matrix from the model itself, without the use of a TLM.

One such approach is the use of a forecast ensemble, where a collection of models (ensemble members) are used to statistically sample model error propagation. With ensemble members spanning the model analysis error space, the forecasts of these ensemble members are then used to estimate the model forecast error covariance. In the limit of infinite ensemble size, ensemble methods are equivalent to the EKF \shortciteevensen2003ensemble. There are two prevailing methodologies for maintaining independent ensemble members: (1) adding errors to the observation used by each forecast and assimilating each ensemble individually or (2) distributing ensemble initial conditions from the analysis prediction. In the first method, the random error added to the observation can be truly random \shortciteharris2011predicting or be drawn from a distribution related to the analysis covariance. The second method distributes ensemble members around the analysis forecast with a distribution drawn to sample the analysis covariance deterministically. It has been show that the perturbed observation approach introduces additional sampling error that reduces the analysis covariance accuracy and increases the probability of of underestimating analysis error covariance \shortcitehamill2001distance. For the examples given, we demonstrate both methods.

With a finite ensemble size this method is only an approximation and therefore in practice it often fails to capture the full spread of error. To better capture the model variance, additive and multiplicative inflation factors are used to obtain a good estimate of the error covariance matrix (Section 2.6). The spread of ensemble members in the variable of the Lorenz model, as distance from the analysis, can be seen in Figure 2.5.

The only difference between this approach and the EKF, in general, is that the forecast error covariance is computed from the ensemble members, without the need for a tangent linear model:

 Extra open brace or missing close brace

In computing the error covariance from the ensemble, we wish to add up the error covariance of each forecast with respect to the mean forecast. But this would underestimate the error covariance, since the forecast we’re comparing against was used in the ensemble average (to obtain the mean forecast). Therefore, to compute the error covariance matrix for each forecast, that forecast itself is excluded from the ensemble average forecast.

We can see the classic spaghetti of the ensemble with this filter implemented on Lorenz 63 in Figure 2.6.

We denote the forecast within an ensemble filter as the average of the individual ensemble forecasts, and an explanation for this choice is substantiated by Burgers \shortciteburgers1998analysis. The general EnKF which we use is most similar to that of Burgers. Many algorithms based on the EnKF have been proposed and include the Ensemble Transform Kalman Filter (ETKF) \shortciteott2004local, Ensemble Analysis Filter (EAF) \shortciteanderson2001new, Ensemble Square Root Filter (EnSRF) \shortcitetippett2003ensemble, Local Ensemble Kalman Filter (LEKF) \shortciteott2004local, and the Local Ensemble Transform Kalman Filter (LETKF) \shortcitehunt2007efficient. We explore some of these in the following sections. A comprehensive overview through 2003 is provided by Evensen \shortciteevensen2003ensemble.

2.4.1 Ensemble Transform Kalman Filter (ETKF)

The ETKF introduced by Bishop is one type of square root filter, and we present it here to provide background for the formulation of the LETKF \shortcitebishop2001adaptive. For a square root filter in general, we begin by writing the covariance matrices as the product of their matrix square roots. Because are symmetric positive-definite (by definition) we can write

 Pa=ZaZTa  ,   Pf=ZfZTf (2.29)

for the matrix square roots of respectively. We are not concerned that this decomposition is not unique, and note that must have the same rank as which will prove computationally advantageous. The power of the SRF is now seen as we represent the columns of the matrix as the difference from the ensemble members from the ensemble mean, to avoid forming the full forecast covariance matrix . The ensemble members are updated by applying the model to the states such that an update is performed by

 Zf=MZa. (2.30)

Per \shortcitetippett2003ensemble, the solution for can be solved by perturbed observations

 Za=(I−KH)Zf+KW (2.31)

where is the observation perturbation of Gaussian random noise. This gives the analysis with the expected statistics

 ⟨ZaZTa⟩ =(I−KH)Pf(I−KH)T+KRKT (2.32) =Pa (2.33)

but as mentioned earlier, this has been shown to decrease the resulting covariance accuracy.

In the SRF approach, the “Potter Method” provides a deterministic update by rewriting

 Pa =ZaZTa=(I−PfHT(R+HPfHT)−1H)Pf (2.34) =Zf(I−ZfHT(HZfZTfHT+R)−1HZf)ZTf (2.35) =Zf(I−VD−1ZT)ZTf. (2.36)

We have defined and . Now for the ETFK step, we use the Sherman-Morrison-Woodbury identity to show

 I−ZD−1ZT=(I+ZfHTR−1HZf)−1. (2.37)

The matrix on the right hand side is practical to compute because it is of dimension of the rank of the ensemble, assuming that the matrix inverse is available. This is the case in our experiment, as the observation are uncorrelated. The analysis update is thus

 Za=ZfC(Γ+I)−1/2, (2.38)

where is the eigenvalue decomposition of . From \shortcitetippett2003ensemble we note that the matrix of orthonormal vectors is not unique.

Then the analysis perturbation is calculated from

 Za=ZfXU, (2.39)

where and is an arbitrary orthogonal matrix.

In this filter, the analysis perturbations are assumed to be equal to the backgroud perturbations post-multiplied by a transformation matrix so that that the analysis error covariance satisfies

 Pa=(1−KH)Pf.

The analysis covariance is also written

 Pf=1P−1(Xa(Xa)T=Xb^A(Xb)T

where . The analysis perturbations are where .

To summarize, the steps for the ETKF are to (1) form , assuming is easy, and (2) compute its eigenvalue decomposition, and apply it to .

2.4.2 Local Ensemble Kalman Filter (LEKF)

The LEKF implements a strategy that becomes important for large simulations: localization. Namely, the analysis is computed for each grid-point using only local observations, without the need to build matrices that represent the entire analysis space. This localization removes long-distance correlations from and allows greater flexibility in the global analysis by allowing different linear combinations of ensemble members at different spatial locations \shortcitekalnay20074.

Difficulties are presented however, in the form of computational grid dependence for the localization procedure. The toroidal shape of our experiment, when discretized by OpenFOAM’s mesher and grid numbering optimized for computational bandwidth for solving, does not require that adjacent grid points have adjacent indices. In fact, they are decidedly non-adjacent for certain meshes, with every other index corresponding to opposing sides of the loops. We overcome this difficulty by placing a more stringent requirement on the mesh of localization of indexing, and extracting the adjacency matrix for use in filtering.

The general formulation of the LEKF by Ott \shortciteott2004local goes as follows, quoting directly:

1. Globally advance each ensemble member to the next analysis timestep. Steps 2-5 are performed for each grid point

2. Create local vectors from each ensemble member

3. Project that point’s local vectors from each ensemble member into a low dimensional subspace as represented by perturbations from the mean

4. Perform the data assimilation step to obtain a local analysis mean and covariance

5. Generate local analysis ensemble of states

6. Form a new global analysis ensemble from all of the local analyses

7. Wash, rinse, and repeat.

Figure 2.7 depicts this cycle in greater detail, with similar notations to that used thus far \shortciteott2004local.

In the reconstructed global analysis from local analyses, since each of the assimilations were performed locally, we cannot be sure that there is any smoothness, or physical balance, between adjacent points. To ensure a unique solution of local analysis perturbations that respects this balance, the prior background forecasts are used to constrain the construction of local analysis perturbations to minimize the difference from the background.

2.4.3 Local Ensemble Transform Kalman Filter (LETKF)

Proposed by Hunt in 2007 with the stated objective of computational efficiency, the LETKF is named from its most similar algorithms from which it draws \shortcitehunt2007efficient. With the formulation of the LEKF and the ETKF given, the LETKF can be described as a synthesis of the advantages of both of these approaches. This is the method that was sufficiently efficient for implementation on the full OpenFOAM CFD model of 240,000 model variables, and so we present it in more detail and follow the notation of Hunt et al (2007). As in the LEKF, we explicitly perform the analysis for each grid point of the model. The choice of observations to use for each grid point can be selected a priori, and tuned adaptively. The simplest choice is to define a radius of influence for each point, and use observations within that. In addition to this simple scheme, Hunt proposes weighing observations by their proximity to smoothly transition across them. The goal is to avoid sharp boundaries between neighboring points.

Regarding the LETKF analysis problem as the construction of an analysis ensemble from the background ensemble, where we specify that the mean of the analysis ensemble minimizes a cost function, we state the equations in terms of a cost function. The LETKF is thus

 {xb(i):i=1..k}       LETKF      −−−−−−−−−−−−−−−−→{xa(i):i=1..k} (2.40)

where

 J(x)= (x−¯¯¯xb)T(Pb)T(x−¯¯¯xb) +(yo−H(x))TR−1(yo−H(x)).

Since we have defined for the matrix , the rank of is at most , and the inverse of is not defined. To ameliorate this, we consider the analysis computation to take place in the subspace spanned by the columns of (the ensemble states), and regard as a linear transformation from a -dimensional space into . Letting be a vector in , then belongs to and . We can therefore write the cost function as

 ~J= 1k−1wTw+(yo−H(¯xb+Xbw))T ×R−1(yo−H(¯¯¯xb+Xbw))

and it has been shown that this is an equivalent formulation by \shortcitehunt2007efficient.

Finally, we linearize by applying it to each and interpolating. This is accomplished by defining

 yb(i)=H(xb(i)) (2.41)

such that

 H(¯¯¯xb−Xbw)=¯¯¯yb−Ybw (2.42)

where is the matrix whose columns are for the average of the over . Now we rewrite the cost function for the linearize observation operator as

 ~J∗= (k−1)wTw+(yo−¯¯¯yb−Ybw)T ×R−1(yo−¯¯¯yb−Ybw)

This cost function minimum is the form of the Kalman Filter as we have written it before:

 ¯¯¯¯¯wa =~Pa(Yb)TR−1(yo−¯¯¯yb), ~Pa Missing \left or extra \right

In model space these are

 ~xa =¯¯¯xb+Xb¯¯¯¯¯wa Pa =Xb~Pa(Xb)T.

We move directly to a detailed description of the computational implementation of this filter. Starting with a collection of background forecast vectors , we perform steps 1 and 2 in a global variable space, then steps 3-8 for each grid point:

1. apply to to form , average the for , and form .

2. similarly form . (now for each grid point)

3. form the local vectors

4. compute (perhaps by solving

5. compute where is a tunable covariance inflation factor

6. compute

7. compute and add it to the column of

8. multiply by each and add to get to complete each grid point

9. combine all of the local analysis into the global analysis

The greatest implementation difficulty in the OpenFOAM model comes back to the spatial discretization and defining the local vectors for each grid point.

Four Dimensional Local Ensemble Transform Kalman Filter (4D-LETKF) We expect that extending the LETKF to incorporate observations from within the assimilation time window will become useful in the context of our target experiment high-frequency temperature sensors. This is implemented without great difficulty on top of the LETKF, by projecting observations as they are made into the space with the observation time-specific matrix and concatenating each successive observation onto . Then with the background forecast at the time , we compute and , concatenating onto and , respectively.

2.5 4D-Var

The formulation of 4D-Var aims to incorporate observations at the time they are observed. This inclusion of observations has resulted in significant improvements in forecast skill, cite. The next generation of supercomputers that will be used in numerical weather prediction will allow for increased model resolution. The main issues with the 4D-Var approach include attempting to parallelize what is otherwise a sequential algorithm, and the computational difficulty of the strong-constraint formulation with many TLM integrations, with work being done by Fisher and Andersson (2001). The formulation of 4D-Var is not so different from 3D-Var, with the inclusion of a TLM and adjoint model to assimilation observations made within the assimilation window. At the ECMWF, 4D-Var is run on a long-window cycle (24 and 48 hours) so that many observations are made within the time window. Generally, there are two main 4D-Var formulations in use: strong-constraint and weak-constaint. The strong-constraint formulation uses the forecast model as a strong constraint, and is therefore easier to solve. From Kalnay (2007) we have the standard weak-constaint formulation with superscript denoting the timestep within the assimilation window (say there are observation time steps within the assimilation window):

 J(x0)= Missing \left or extra \right (Ri)−1(Hi(xi)−yi).

The weak-constraint formulation has only recently been implemented operationally. This is \shortcitekalnay20074:

 J(x0,β)= 12(x0−x0b)TB−1(x0−x0b)+12N∑i=1 (Hi(xi+β)−yi)T(Ri)−1(Hi(xi+β)−yi) +12βTQ−1β=Jb+Jo+Jβ.

Future directions of operational data assimilation combine what is best about 4D-Var and ensemble-based algorithms, and these methods are reffered to as ensemble-variational hybrids. We have already discussed a few such related approaches: the use of the Kalman gain matrix used within the 3D-Var cost function, the “NCEP” method for 3D-Var, and inflating the ETKF covariance with the 3D-Var static (climatological) covariance .

2.5.1 Ensemble-variational hybrids

Ensemble-variational hybrid approaches benefit from flow dependent covariance estimates obtained from an ensemble method, and the computational efficiency of iteratively solving a cost function to obtain the analysis, are combined. An ensemble of forecasts is used to update the static error covariance in the formulation of 4D-Var (this could be done in 3D-Var as well) with some weighting from the climatological covariance and the ensemble covariance. This amounts to a formulation of

 B=αPf+(1−α)B (2.43)

for some between 0 and 1. Combining the static error covariance and the flow dependent error covariance is an active area of research \shortciteclayton2012operational.

2.6 Performance with Lorenz 63

With the EKF, EnKF, ETKF, and EnSRF algorithms coded as above, we present some basic results on their performance. First, we see that using a data assimilation algorithm is highly advantageous in general (Figure 2.8). Of course we could think to use the observations directly when DA is not used, but since we only observing the variable (and in real situations measurements are not so direct), this is not prudent.

Next, we test the performance of each filter against increasing assimilation window length. As we would expect, longer windows make forecasting more challenging for both algorithms. The ensemble filter begins to out-perform the extended Kalman filter for longer window lengths because the ensemble more accurately captures the nonlinearity of the model.

Each of the filters is sensitive to the size of the covariance matrix, and in some cases to maintain the numerical stability of the assimilation it is necessary to use additive and/or multiplicative covariance inflation. In all filters multiplicative error is performed with inflation factor by

 K←(1+Δ)K. (2.44)

Additive inflation is performed for the EKF as in Yang et al (2006) and Harris et al (2011) where uniformly distributed random numbers in are added to the diagonal. This is accomplished in MATLAB by drawing a vector of length of uniformly distributed numbers in , multiplying it by , and adding it to the diagonal:

 K←K+diag(μ⋅ν). (2.45)

For the EnKF we perform additive covariance inflation to the analysis state itself for each ensemble:

 x←x+ν. (2.46)

The result of searching over these parameters for the optimal values to minimize the analysis RMS error differs for each window length, so here here we present the result for a window of length 390 seconds for the ETKF. In Figure 2.10 we see that there is a clear region of parameter space for which the filter performs with lower RMS, although this region appears to extend into values of .

Chapter 3 Thermal Convection Loop Experiment

The thermosyphon, a type of natural convection loop or non-mechanical heat pump, can be likened to a toy model of climate \shortciteharris2011predicting. In this chapter we present both physical and computationally simulated thermosyphons, the equations governing their behavior, and results of the computational experiment.

3.1 Physical Thermal Convection Loops

Under certain conditions, thermal convection loops operate as non-mechanical heat pumps, making them useful in many applications. In Figures 3.3 and 3.2 we see four such applications. Solar hot water heaters warm water using the sun’s energy, resulting in a substantial energy savings in solar-rich areas \shortciteWSJthermo. Geothermal systems heat and cool buildings with the Earth’s warmth deep underground, and due to low temperature differences are often augmented with a physical pump. CPU coolers allow modern computers to operate at high clock rates, while reducing the noise and energy consumption of fans \shortciteCPUThermo. Ground thermosyphons transfer heat away from buildings built on fragile permafrost, which would otherwise melt in response to a warming climate, and destroy the building \shortciteXu2008experimental. The author notes that these systems rely on natural convection, and when augmented with pumps the convection is mixed or forced, and would necessitate different models.

3.1.1 Ehrhard-Müller Equations

The reduced order system describing a thermal convection loop was originally derived by Gorman \shortcitegorman1986 and Ehrhard and Müller \shortciteehrhard1990dynamical. Here we present this three dimensional system in non-dimensionalized form. In Appendix B we present a more complete derivation of these equations, following the derivation of Harris \shortciteharris2011predicting. For the mean fluid velocity, temperature different , and deviation from conductive temperature profile, respectively, these equations are:

 dx1dt=α(x2−x1), (3.1) dx2dt=βx1−x2(1+Kh(|x1|))−x1x3, (3.2) dx3dt=x1x2−x3(1+Kh(|x1|)). (3.3)

The parameters and , along with scaling factors for time and each model variable can be fit to data using standard parameter estimation techniques.

Tests of the data assimilation algorithms in Chapter 2 were performed with the Lorenz 63 system, which is analogous to the above equations with Lorenz’s , and . From these tests, we have found the optimal data assimilation parameters (inflation factors) for predicting time-series with this system. We focus our efforts on making prediction using computational fluid dynamics models. For a more complete attempt to predict thermosyphon flow reversals using this reduced order system, see the work of Harris et al (2012).

3.1.2 Experimental Setup

Originally built (and subsequently melted) by Dr. Chris Danforth at Bates College, the University of Vermont maintains experimental thermosyphons. Operated by Dave Hammond, UVM’s Scientific Electronics Technician, the experimental thermosyphons access the chaotic regime of state space found in the principled governing equations. We quote the detailed setup from Darcy Glenn’s undergraduate thesis \shortciteglenn2013:

The [thermosyphon] was a bent semi-flexible plastic tube with a 10-foot heating rope wrapped around the bottom half of the upright circle. The tubing used was light-transmitting clear THV from McMaster-Carr, with an inner diameter of 7/8 inch, a wall thickness of 1/16 inch, and a maximum operating temperature of 200F. The outer diameter of the circular thermosyphon was 32.25 inches. This produced a ratio of about 1:36 inner tubing radius to outside thermosyphon radius (Danforth 2001). There were 1 inch ’windows’ when the heating cable was coiled in a helix pattern around the outside of the tube, so the heating is not exactly uniform. The bottom half was then insulated using aluminum foil, which allowed fluid in the bottom half to reach 176F. A forcing of 57 V, or 105 Watts, was required for the heating cable so that chaotic motion was observed. Temperature was measured at the 3 o’clock and 9 o’clock positions using unsheathed copper thermocouples from Omega.

We first test our ability to predict this experimental thermosyphon using synthetic data, and describe the data in more detail in Chapter 4. Synthetic data tests are the first step, since the data being predicted is generated by the predictive model. We consider the potential of up to 32 synthetic temperature sensors in the loop, and velocity reconstruction from particle tracking, to gain insight into which observations will make prediction possible.

3.2 Computational Experiment

In this section, we first present the governing equations for the flow in our thermal convection loop experiment. A spatial and temporal discretization of the governing equations is then necessary so that they may be solved numerically. After discretization, we must specify the boundary conditions. With the mesh and boundary conditions in place, we can then simulate the flow with a computational fluid dynamics solver.

We now discuss the equations, mesh, boundary conditions, and solver in more detail. With these considerations, we present our simulations of the thermosyphon.

3.2.1 Governing Equations

We consider the incompressible Navier-Stokes equations with the Boussinesq approximation to model the flow of water inside a thermal convection loop. Here we present the main equations that are solved numerically, noting the assumptions that were necessary in their derivation. In Appendix C a more complete derivation of the equations governing the fluid flow, and the numerical for our problem, is presented. In standard notation, for the velocity in the direction, respectively, the continuity equation for an incompressible fluid is

 ∂u∂x+∂v∂y+∂w∂z=0. (3.4)

The momentum equations, presented compactly in tensor notation with bars representing averaged quantities, are given by

 ρref(∂¯ui∂t+∂∂xj(¯uj¯ui))=−∂¯p∂xi+μ∂¯ui∂x2j+¯ρgi (3.5)

for the reference density, the density from the Boussinesq approximation, the pressure, the viscocity, and gravity in the -direction. Note that for since gravity is assumed to be the direction. The energy equation is given by

 ∂∂t(ρ¯¯¯e)+∂∂xj