Leveraging Bayesian Analysis To Improve Accuracy of Approximate Models
Abstract
We focus on improving the accuracy of an approximate model of a multiscale dynamical system that uses a set of parameterdependent terms to account for the effects of unresolved or neglected dynamics on resolved scales. We start by considering various methods of calibrating and analyzing such a model given a few wellresolved simulations. After presenting results for various point estimates and discussing some of their shortcomings, we demonstrate (a) the potential of hierarchical Bayesian analysis to uncover previously unanticipated physical dependencies in the approximate model, and (b) how such insights can then be used to improve the model. In effect parametric dependencies found from the Bayesian analysis are used to improve structural aspects of the model. While we choose to illustrate the procedure in the context of a closure model for buoyancydriven, variabledensity turbulence, the statistical nature of the approach makes it more generally applicable. Towards addressing issues of increased computational cost associated with the procedure, we demonstrate the use of a neural network based surrogate in accelerating the posterior sampling process and point to recent developments in variational inference as an alternative methodology for greatly mitigating such costs. We conclude by suggesting that modern validation and uncertainty quantification techniques such as the ones we consider have a valuable role to play in the development and improvement of approximate models.
keywords:
Bayesian analysis, reduced order modeling, turbulence modeling, Reynoldsaveraged Navier Stokes, neural network, surrogate modeling1 Introduction
Natural and engineered systems that exhibit multiscale behavior due to coupling of subsystems with different spatial and temporal scales are commonplace. For example, interactions between wave modes, interactions between vortical modes, and coupled interactions between wave and vortical modes gives rise to complex phenomena in an array of fluid dynamic problems ranging from homogeneous isotropic turbulence, to common and engineering instances of turbulent fluid flow, to rotatingstratified turbulence relevant to flows on global and astrophysical scales. In a different setting, interactions between fast vibrations and slow conformational changes lead to complex behavior in molecular dynamics relevant to protein folding, and so on.
If we represent the comprehensive mathematical model of such a complex, multiscale, dynamical system, (the full NavierStokes equations and the full forcefield based molecular dynamics respectively in the two examples above) symbolically as
(1.1) 
where represents the state vector, the tendency of the state vector, is a set of physical parameters, and the superscript stands for wellresolved, then it is almost invariably the case that the computational complexity of such a model prevents it from being used routinely in design and analysis of the system of interest. Therefore, a pragmatic consideration in seeking an approximate model is to reduce the computational cost of the orginal comprehensive model by limiting the associated number of degrees of freedom, while keeping the error in relevant quantities of interest (QoI) small. In the context of (1.1), an approximate model may be symbolically represented as
(1.2) 
where superscript stands for reducedorder, dimension of dimension of , and is a set of modeling parameters.
We note that a Reduced Order Model (ROM) is one kind of an approximate model that is popular in the engineering fields and industry wherein the original governing equations are projected on to a reducedorder basis, but where the basis vectors are determined from snapshot data, e.g., using Principal Component Analysis (PCA) (e.g., see Jolliffe, 2011, and references therein), or Dynamic Mode Decomposition (DMD) (e.g., see Schmid, 2010, and others), etc., and where the snapshot data is obtained by solving the original system. In the ROM context, the model parameters may be thought of as associated with terms such as closures (as in, e.g., Wang et al., 2012, and others), etc.. We also note that an approximate model may be based entirely on a datadriven approach. For example, in the field of molecular dynamics (MD), it is common to use an approximate approach such as a Markov State Model (MSM) to permit longer simulation times. In this approach the conformation space is first partitioned, based on MD simulation data, into discrete subspaces, by using techniques such as clustering or PCA. And then, in such a reduced subspace, molecular kinetics itself is reduced to transitions between the identified discrete subspaces that are governed by transition probabilities that may be parameterized and learnt, again, from MD simulation data (e.g., see Deuflhard et al., 2000; Pande et al., 2010; Schütte et al., 2011; Gerber and Horenko, 2017, and references therein).
1.1 Approximate Models of Turbulent Flows
While further combinations of such datadriven and equationbased techniques are clearly possible (e.g., see Dorrestijn et al., 2013), we shift attention to the specific context of a turbulent flow. A common objective in the study of such a flow, is to be able to accurately compute QoIs of practical relevance. However, in a turbulent flow not only are the relevant fields (velocity, density, and others) threedimensional, timedependent and random, but there is a large range of time and length scales. Whereas the domain size () and geometry directly affect the flow, the Kolmogorov length—a measure of the smallest scales of turbulence—scales as Re and the Kolmogorov time scales as Re Pope (2001). Here, Re is the Reynolds number, a nondimensional number that characterizes the relative effects of nonlinearity and viscosity, and it is large in turbulent flows.
Thus, if all the scales were to be resolved, as in the Direct Numerical Simulation (DNS) procedure, the computational cost would scale as Re. Here, computational cost is estimated as scaling with the number of operations , with , where the cube accounts for threedimensionality of the flow and rather than since DNS practitioners commonly use explicit timestepping with CourantFrederickLevy number of O(1). In this case, the computational timestep goes down as . Such steep scaling of computational cost with Reynolds number for a fully resolved simulation makes it feasible only for low to moderate Reynolds numbers, meaning to say prohibitively expensive for realistic flows of interest.
Computational intractability of being able to fully resolve a (high Reynolds number) turbulent flow, leads naturally to the question of modeling only a limited range of scales that are of direct interest. In this context, the procedure of averaging the governing equations over a scale of the order of the smallest scale of interest leads to the closure problem. Here by closure we refer to how the neglected degrees of freedom affect the evolution of the resolved degrees of freedom.
In the Large Eddy Simulation (LES) approach (e.g., see Pope, 2001), equations are solved for filtered fields (with a filter scale ) that are representative of the largerscale turbulent motions and these equations include a (closure) model for how the unresolved scales affect the resolved scales. While the computational cost of this scalerestricted model is seen straightforwardly to scale as , the requirement of to lie within the inertial range of turbulence (a range of scales within which dynamics is selfsimilar) renders its computational cost still too high for realistic flows given current computational resources.
For these reasons, in order to obtain a model that attains acceptable levels of accuracy while being computationally affordable, an ensembleaverage (Reynoldsaverage) of the original NavierStokes equations (ReynoldsAveraged Navier Stokes or RANS equations) is considered. Since the RANS equations solve only for the ensembleaveraged fields, a solution procedure using these equations is not required to resolve either the inertial range (cf. LES) or the dissipative range (cf. DNS). The RANS equations are therefore amenable to being solved using much coarser resolutions. The flip side of solving the RANS equations at coarse resolutions is that in order to attain an acceptable level of accuracy, the effects of a large set of unrepresented scales (on resolved scales) have to be explicitly modeled. Such closure issues notwithstanding, the RANS approach has emerged as the method of choice (e.g., see Pope, 2001) in both understanding realistic turbulent flows and to address practical issues of design, optimization, and operations in the context of such flows. The RANS approach may be written symbolically as
(1.3) 
where is the ensemble (or Reynolds) average of (=), so that if , = 0, represent the closure terms, and for convenience the vector of modeling parameters in the RANS approach is represented by .
The modeling of closures in the LES and RANS approaches to modeling turbulent flows is central to the accuracy and efficiency of these methods. Such modeling is based largely on phenomenological understanding of prototypical turbulent flows by subject matter experts and is mostly deterministic. However, stochastic methods that typically use stochastic models of turbulent fluid flow to evolve probability density functions of turbulent fields such as velocity have also been used to address the closure problem (e.g., see Pope, 2001), and datadriven stochastic closures are also being considered (e.g., see Duan and Nadiga, 2007; Nadiga, 2008; Plant and Craig, 2008, and others). In either case, however, improvement of a model, in terms of reducing modelform related error has almost exclusively been the domain of subject matter experts and theoreticians. Thus, whereas a computational approach for such model improvement currently does not exist, we present one.
1.2 Structural and Parametric Uncertainty in the Approximate Model
When additional explicit model terms, such as in (1.3), are used to achieve a simplified approximation of the full system, the form of such a closure is a choice that is made and is, therefore, not unique. It also follows that the OoIs in the approximate model depend on the choice of the form of the closure. We refer to this as structural or modelform uncertainty/error. After the choice of the form of the closure has been made, the values of the parameters influence the accuracy of the approximate model, leading to parameter uncertainty/error. Even when additional explicit model terms, such as in (1.3), are not used, it is easy to imagine hypothetically introducing such tendencies as a means of reducing the discrepency between the average of and . Indeed such terms are routine when neural networks are used to improve the model, but with the crucial difference that the dimension of the dimension of in such a case (and consequently structural uncertainty is less important). Once the choice of the form of the closure has been made, we next consider the process of its validation and calibration given parametric uncertainty.
1.3 Model Calibration and Validation
The calibration and validation of a model, e.g., as in the RANS approach to modeling turbulent flows is based either on experimental data or observations or data from numerical studies that resolve dynamics and physics of the full range of relevant scales. In the case of some idealized turbulent flows, such calibration data may come from DNS of the flow. Here, by calibration, we refer to the assignment or adjustment of values of model parameters, in (1.3), so as to bring model prediction of QoIs, , in certain scenarios into agreement with known values of QoIs, , for those scenarios. And, validation refers to the process of determining the degree to which the model, (1.3), is an accurate representation of the real world, here (1.1), from the perspective of the intended uses of the model (Roache, 1998; Babuska and Oden, 2004; Farrell et al., 2015).
Different approaches to calibration and validation exist. However, calibration and validation of RANS turbulence models has traditionally relied on point estimates—that is one optimal set of parameters are sought that best fits the calibration data (e.g., see Pope, 2001; Schwarzkopf et al., 2016):
(1.4) 
However, it is not guaranteed that such calibration is optimal. For example, there may be multiple, possibly very different multiway balances of phenomena that lead to similar evolutions of QoIs. In the turbulent flow context, different balances of dissipation, transport, and decay processes could be consistent with the evolution of turbulent moments as specified by given calibration data.
In contrast to approaches that aim to find a single optimal set of parameters, a Bayesian framework models the model parameters as random variables and seeks to approximate the joint distribution of such variables. In other words, it estimates the posterior probability of given : . In so doing, the methodology gives consideration to the possibility that different balances of key modeled processes can explain the calibrating data. That is, the Bayesian framework integrally permits consideration of parameter uncertainty in the calibration and validation of a model. Furthermore, the Bayesian approach allows for better integration of both prior knowledge and probabilistic structure into the calibration and validation process. We are, therefore, interested in examining the utility of a Bayesian approach to calibration and validation, and what advantages it may hold over the traditional point estimate based approaches in developing insights into improving the model under consideration.
A few recent studies have used Bayesian estimation techniques to calibrate RANS models. For example, Oliver and Moser (2011), and Cheung et al. (2011) use Bayesian uncertainty analysis to calibrate and intercompare four wellknown RANS models: the BaldwinLomax model Baldwin and Lomax (1978), the SA model Spalart and Allmaras (1992), the Chien  model Chien (1982), and the model Durbin (1991); Edeling et al. (2014b, a) used Bayesian estimates of parameter variability in the LaunderSharma model Launder and Sharma (1974) as a means to estimate errors in RANS simulations; Farrell et al. (2015) present an adaptive modeling algorithm for selection and validation of models, however, in the domain of atomistic systems. In other work that uses a Bayesian framework in the context of RANS models, Edeling et al. (2014b, a), note that the distribution of model parameters provides information on error associated with the model: when the joint probability density function (PDF) of the model parameters is propagated through the model, the distribution of the QoIs can be used to provide confidence bounds for the QoIs.
In addition to such uses of the Bayesian methodology, we think that the Bayesian methodology has a useful role to play in the development and improvement of the approximate or reducedorder descriptions—a role that should be thought of as complementary to that of subject matter expertise that remains central to developing approximate models. Indeed, we show how Bayesian analysis of a RANS model that uses DNS (or equivalently any other calibration data) can uncover unanticipated dependencies which in turn point to structural deficiencies in the model and specific ways in which the model can be improved.
Organization of the rest of the article is as follows. In Sec. 2 we discuss the specifics of the problem we consider, the fully resolved model, the specific approximate model we consider, and the QoIs. In Sec. 3 we discuss the details about the methods we use before presenting respective results. This includes details about point estimates and Bayesian estimation, how we go about improving the model and details about the neural networksurrogate strategy that we propose for situations wherein the approximate model itself may still be expensive enough to prohibit its direct use in Bayesian analysis. This is followed by a brief section that summarizes the work and concludes.
2 The Problem Setting, the Fully Resolved Model, and the Approximate Model
Buoyancydriven turbulence is commonplace in a wide variety of naturallyoccuring flows (e.g., oceanatmosphere dynamics, astrophysics, mantle convection etc.) and engineering flows (e.g., ranging from smokestacks to combustion to inertialconfinement fusion), and encompassing both, a wide range of instabilities and rich phenomenology. Homogeneous RayleighTaylor (hRT) turbulence is a particular idealization of buoyancydriven turbulence. The initial condition for the onset of hRT turbulence consists of isolated regions in the domain of interest being occupied by two miscible fluids at rest that have different densities, and the flow evolves in accordance with the resultant buoyancy force.
2.1 The Fully Resolved Model
If the flow is incompressible (lowspeed) and the densities of the two fluids are and , the fullorder governing equations (symbolically represented by Eqn. 1.1) for this problem are the NavierStokes equations along with the species mass fraction transport equations.
Species mass fraction transport equations
Given mass fractions and for the two fluids which sum to unity (i.e., ) , the density of the mixture can be written as:
(2.5) 
The species mass fraction transport equations assuming Fickian diffusion are:
(2.6) 
The continuity equations is obtained by summing over . Nonzero divergence of velocity results from mixing due to the change in specific volume, :
(2.7) 
Here is the diffusion coefficient and it is assumed to be constant.
NaverStokes equations
After nondimensionalization, using an arithmetic mean of the two densities for the reference density , a reference velocity , and a reference length , the wellresolved model equations are:
(2.8)  
(2.9)  
(2.10) 
and where . In the above equations, the nondimensional state variables are the density , direction velocity , and pressure . The nondimensional parameters in the above equations are the Reynolds number , Schmidt number , and Froude number , defined by:
(2.11) 
and where is gravitational acceleration, and is for dynamic viscosity (assumed constant and equal for both fluids). In the above equations, we note that (a) because we consider large density differences, the flow is not amenable to the commonly used Boussinesq approximation, leading to the full density being used consistently in all the terms of the momentum equations, and (b) even though we consider lowspeed dynamicallyincompressible flows, the velocity field is not divergencefree because molecular mixing leads to changes in the specific volume.
Equations (2.8)(2.10) are solved using the Direct Numerical Simulation procedure at four different values of Atwood number: 0.05, 0.25, 0.50, and 0.75, where the Atwood number is the normalized density ratio. The reader is referred to Livescu and Ristorcelli (2007) for details. These values of the Atwood number characterize the initial conditions, when the fluids are completely segregated. As the fluids molecularly mix, the evolving Atwood number can be calculated as the largest value of the (normalized) density difference between pairs of points in the flow. Such an evolving Atwood number decreases with time.
A brief phenomenological description of the flow evolution is as follows, and the reader is referred to Livescu and Ristorcelli (2007) for further details. The unstable nature of the initial distribution of density endows it with high potential energy, and the ensuing buoyancydriven instability leads to a conversion of potential energy into kinetic energy. The flow subsequently transitions to turbulence. Once turbulent, the flow can be described in terms of the evolution of singlepoint secondorder turbulent correlations. The relevant turbulent correlations in this setting are the Favre average Reynolds stresses (, where , with being the Favre or densityweighted averaged velocity ) and the turbulent kinetic energy (), the turbulent mass flux which is the correlation between density and velocity fluctuations (), and the densityspecificvolume covariance . The densityspecific volume correlation is largest at initial times and begins to decay as the turbulence grows and the fluids mix (see solid lines in Fig. 2). In particular, at these initial times, the mean pressure gradient that develops in response to gravity couples to , to produce turbulent mass flux , which then couples back with the pressure gradient to generate turbulent kinetic energy. At early to intermediate times, the Reynolds stresses (and turbulent kinetic energy), the turbulent mass flux, and turbulent dissipation all grow. Eventually, as the fluids mix, they reach peak values at different times and eventually decay asymptotically to zero.
In this problem setting, the QoIs are the various second order turbulent correlations described previously, and a length scale as discussed further in the next section. Temporal evolution of some of the QoIs, as given by the fully resolved model, is shown in solid lines in Fig. 2.
2.2 The Approximate Model
In this context, and in continuation of the long history of turbulence model development, Schwarzkopf et al. (2011) showed that the single point turbulence equations developed by Besnard et al. (1992) (BHR model) could be applied to a range of selfsimilar turbulent mixing flows generated from different instabilities such as RayleighTaylor, KelvinHelmholtz, and RichtmyerMeshkov without changing the model coefficients. Key to the wide applicability of the model developed by Besnard et al. (1992) was their consideration of a transport equation for density specific volume covariance . This, in conjunction with full consideration of the Reynolds stress transport, allowed the proper description of anisotropy that is fundamental to buoyancydriven turbulence. Nevertheless, a shortcoming of this model was its inability to properly represent turbulence growth rates encountered in settings that involve transient evolution of buoyancydriven turbulence, as commonly encountered in flows dominated by the RaleighTaylor instability. This shortcoming is better understood by considering the popular  class of turbulence models wherein the transport term is slaved to the decay length scale (by coefficients and ). Consequently, like in the  model, profiles of dissipation and transport of Reynolds stress (and densityspecific volume covariance) are scaled versions of each other (in the BHR model). This is in contrast to the different nature of the scaling of transport and dissipation that is exhibited in the selfsimilar regime of RayleighTaylor turbulence in Livescu et al. (2009). To remedy this shortcoming, Schwarzkopf et al. (2016) introduced a second turbulent length scale.
The RANS turbulence model discussed above contains several coefficients. These coefficients need to be calibrated so that a common set of model coefficients will allow reasonable comparisons of statistics over a wide range of turbulent flows, ranging from incompressible flows with single fluids and mixtures of different density fluids (variable density flows) to flows over shock waves. Schwarzkopf et al. (2016) follow a recipe for calibrating the coefficients that considers a sequence of simple flow configurations. The simple flow configurations considered were such that most of the terms in the equations could be neglected and the remaining nonzero terms were (mostly) different for each of the configurations considered. The specific sequence of canonical flows considered in the calibration process consisted of homogeneous isotropic decaying turbulence, homogeneous buoyancydriven decaying turbulence, homogeneous shear (including Rapid Distortion Theory at high shear rates), wall bounded flow, RayleighTaylor (RT) driven turbulence, shear driven turbulence, and shocked isotropic turbulence.
A RANS turbulence model simultaneously represents a number of physical processes (e.g., production dissipation, return to isotropy and rapid distortion of second order turbulent velocity correlations and others). Therefore, it seems important that the calibration process should account for the possibility that different combinations of such processes can lead to QoIs, either as observed in experiments or as computed in DNS. This amounts to requiring a comprehensive consideration of uncertainty in the coefficients (parametric uncertainty). The traditional calibration process adopted in Schwarzkopf et al. (2016), by producing one optimal combination of coefficient values, fails to account for such parametric uncertainty. In this sense, it is clear that a calibration process that accounts for such uncertainty will be better than the calibration process adopted in Schwarzkopf et al. (2016). While we are currently working on a comprehensive Bayesian calibration of the RANS turbulence model that includes a full complement of test cases (e.g., as considered in Schwarzkopf et al. (2016)) that will not be the focus of the current article. The focus of this study, instead, is on demonstrating how the diagnostics resulting from the Bayesian calibration and analysis can be leveraged to improve turbulence modeling. For this, it suffices to consider a suite of homogeneous buoyancydriven or RayleighTaylor (hRT) turbulence test cases.
The RANS turbulence model equations for hRT turbulence is a set of ordinary differential equations:
(2.12)  
(2.13)  
(2.14)  
(2.15)  
(2.16) 
Here is time, is the mean material density, is the mean pressure gradient in the direction of gravity, , , , and are the Favre average Reynolds stress, turbulent kinetic energy, mean densityweighted velocity fluctuation, and densityspecificvolume, respectively, as defined previously. is the turbulent decay length scale. The remaining terms denoted by subscripted s are model parameters and consist of , related to rapid return to isotropy; related to slow return to isotropy; , related to rapid decay of mass flux; , related to slow decay of mass flux; , related to rapid growth/decay for hRT; and , related to decay of densityspecific volume. and turbulent kinetic energy are the QoIs resulting from this model. We refer readers to Schwarzkopf et al. (2016) for further details about the RANS turbulence model. For convenience, this set of equations may be written as
(2.17) 
where is the vector of turbulent secondorder moments of interest of the primitive variables , and is the set of parameters. In this setting, the closure term in (1.3) is given by the turbulence model:
(2.18) 
Here it is also understood that the RANS turbulence model is an approximation of
(2.19) 
but where is not explicitly known because of the dependence of on yet higher order moments and so on. As such, the closures introduced in (1.3) to approximate (1.1), equivalently approximations introduced in (2.17) to approximate (2.19), invariably lead to model error that may be attributed to structural inadequacy/error and parametric errors as discussed in the introduction. While we are interested in improving the accuracy of (1.3) by improving , since is determined from as in (2.18), in the rest of this article, unless otherwise stated, we refer to the RANS turbulence model given by the set of equations (2.12) to (2.16) or equivalently (2.17) as the approximate model we consider and wish to improve.
3 Methods and Results
There are multiple ways in which the set of parameters in (2.122.16) can be estimated given a set of data such as data from a few DNSs. For convenience, we broadly categorize the estimation methods based on two considerations: first, based on whether the estimation procedure pools all of the sample data together or not (see below), and, secondly, based on whether the method produces a pointestimate or a probabilistic or interval estimate. Since the distinction between methods based on the second consideration is evident, we do not devote a separate section to point them out; rather we discuss relevant details in the respective sections that present the results.
3.1 Pooled and Unpooled Analysis
To motivate the distinction between pooled and unpooled analysis, we note that we are interested in analyzing the RANS turbulence model given DNS simulations that have been previously performed at four different Atwood numbers: . The aim in performing such an analysis is to then examine the resulting pointestimates or posterior distribution of parameters in the turbulence model as a means to gain insights into the turbulence model itself. For example, if an a priori unexpected dependency is found in the study, it would lead us to further investigate the origin of the dependency in terms of turbulence phenomenology. If this investigation leads us to conclude that the dependency is a shortcoming of the model, further followup would consist of finding appropriate (possibly structural) modifications of the turbulence model that will remove the dependency while not introducing yet other “unexpected/unwanted” dependencies.
To this end, we note that the sample data is at different Atwood numbers, a parameter that characterizes the strength of the drive that is forcing the turbulent flow. At the same time, the RANS turbulence model in (2.122.16) does not explicitly depend on Atwood number. That is because of the local nature of the differential equation form of the RANS turbulence model; the RANS turbulence model subsumes such dependencies through its dependence on local turbulent correlations. That is, the RANS approach aims to model the flow evolution based purely on local behavior of density and related quantities. As such, assuming that the given turbulence model is correct leads us to consider the sample data as coming from the same population, and to estimate a single set of parameters for the different Atwood numbers. We call this the pooled analysis. A comparison of the QoIs with this point estimate would then allow us to estimate the effectiveness of the turbulence model.
If the RANS estimates of the QoIs do not compare well to the DNS estimates, two possibilities exist: either the sample data did not come from the same population (still implying a deficiency in the turbulence model) or the turbulence model itself is (more seriously) deficient. In order to examine the possibility that the sample data do not come from the same population, we need to perform an “unpooled” analysis. That is, we need to assume that the data at different Atwood numbers are coming from different populations, and redo the analyses to obtain a separate estimate for the set of turbulence parameters at each Atwood number. We call this the unpooled analysis. We note that while indeed partial pooling is also possible and useful in certain circumstances, it suffices to consider the two end members, viz. fully pooled or pooled and unpooled analysis, for our present purpose.
If the QoIs compare well with DNS with the unpooled analysis, then the dependence of the parameters on Atwood number would have to be investigated further to be able to remove it from the turbulence model. Needless to mention, a more serious shortcoming of the turbulence model would be implicated in the case of a poor comparison between the RANS and DNS estimates of QoIs on performing unpooled analysis.
3.2 Point Estimates and Results
In this category of estimation methods, the RANS model is used in conjuction with calibration data to calculate a single set of values for the turbulence parameters—a set that serves as the best estimate for those parameters given the data. Figure 1 shows five different point estimates. The first estimate, in red, corresponds to that of Schwarzkopf et al., 2016. This may be considered as an example of a pooled pointestimate. Other pooled pointestimates are not shown to avoid clutter.
In the second approach all quantities other than the parameters in (2.122.16) are evaluated from the DNSs to obtain a set of algebraic equations in the unknown parameters. These equations are then solved in the leastsquares sense to obtain the second, now unpooled, pointestimate of the parameter set (at each Atwood number for which the DNSs were available). This is shown in blue.
For the third estimate (green), an optimal set of parameters is sought at each Atwood number while integrating the RANS model. We note that we implemented estimation procedures using both forward sensitivities and adjoint sensitivities of the QoIs with respect to the parameters . Differences in these estimates were not significant, as expected, while noting that the adjoint sensitivities become more attractive when the number of parameters is larger than in the present case.
Next, we consider unpooled “maximum likelihood”estimates (MLE). The likelihood function is given by (3.23) and is described further in the section on Bayesian analysis. Two different estimates, (in cyan and magenta) are considered in this case to highlight another issue with pointestimates—their dependence on the initial value provided to the search algorithm.
Experiment  At=0.05  At=0.25  At=0.50  At=0.75  

1  Schwarzkopf et al. (pooled)  0.408  1.726  0.92  2.258 
2  Unpooled DNS Only  0.144  0.645  1.088  1.792 
3  Unpooled RANS Integration  0.023  0.143  0.293  0.384 
4  Unpooled Max. Likelihood 1  0.029  0.156  0.302  0.392 
5  Unpooled Max. Likelihood 2  0.069  0.166  0.385  0.476 
6  Pooled Bayesian  0.060  0.224  0.678  1.596 
7  Unpooled Bayesian  0.029  0.160  0.327  0.487 
8  Pooled Modified Bayesian  0.028  0.169  0.500  0.439 
9  Unpooled Modified Bayesian  0.029  0.160  0.331  0.498 
10  DNNSurrogate Bayesian  0.028  0.159  0.326  0.515 
The corresponding behavior of the QoIs obtained on (re)integrating the RANS model using two of the five parameter estimates discussed above (one pooled and one unpooled) are shown in Fig. 2. First, the estimate from Schwarzkopf et al. (2016) is seen to result in poor comparisons of the RANS and DNS estimates of the QoIs. To a first order, the poor performance of the first estimate could be attributed to the fact that the calibration procedure of Schwarzkopf et al. (2016) considers a number of testflows and produces one set of parameters to be used across them. Next, the second (now unpooled) estimate produces similarly poor comparisons in the QoIs (see Table 1). The poor performance of the second estimate can be understood as due to not fully considering the dynamical nature of the RANS model when estimating the parameter values (e.g., see Nadiga and Livescu, 2007). With the third estimate, the correspondance between RANS and DNS estimates of the QoIs is seen to be good. This is also true of the fourth and fifth (both unpooled) estimates; see Table 1.
The uniformlyweighted combined residuals for the five QoIs at different Atwood numbers for these five estimates are presented in the first five rows of Table 1, Here the residuals are the differences between the DNS and RANS estimates of the QoIs. These residuals are consistent with the above characterization of the parameter estimates.
The good correspondance between RANS and DNS estimates of QoIs seen in Fig. 4 with the third, fourth, and fifth estimates establishes that the turbulence model being considered has the phenomenlogy required to represent the class of flows being considered. Furthermore, along the lines of reasoning for the poor performance of the first (pooled) estimate, the better performance of the third through fifth estimates may again be understood as due to the fact that the calibration was specific to each of the different Atwood number cases (unpooled).
Next, as discussed in Sec. 4.1, the favorable outcome with some of the unpooled estimates leads us to focus attention on the variation of the parameter estimates with Atwood number. Large variations with Atwood number are seen in most of the six parameters when the third through fifth estimates are considered. This would suggest that multiple aspects of the turbulence model have to be modified. However, as good as the match is between DNS and RANS at these estimates, we note that this estimation procedure does not account for parametric uncertainty discussed previously. We therefore turn our attention next to probabilistic estimates.
3.3 Bayesian Analysis Methodology
In the introduction and in the preceding parts of this section, we briefly considered the role of calibration in the overall process of validation and discussed how uncertainty has an important role to play. Here, we outline the Bayesian approach to calibration wherein uncertainty is quantified in a probabilistic sense. In the Bayesian framework, calibration of a model is formulated as an inference problem. Thus, the probability of (the vector of) the model parameters , given experimental or DNS data or QoIs , is determined using the Bayes rule as
(3.20)  
(3.21) 
In the above equation, it is common to call

the prior distribution of the model parameters.

the likelihood function, and

, the posterior distribution of the model parameters
An analytical approach to Bayesian estimation requires the integration of the normalization term (denominator in (3.20). This is difficult to evaluate when the probability model structure is complex. However, algorithms based on the MonteCarlo Markov Chain (MCMC) allow for discretely sampling points from the posterior distribution, bypassing the need for the integration of the normalization factor. The only requirement for performing MCMC based sampling is that a function proportional to the original probability distribution be specified. This is shown in (3.21). To assist in the illustration of the Bayesian estimation process, we further make precise the QoIs vector as follows:

is the th QoI vector from the DNS simulation data. where is the number of QoIs in the problem. is the total number of timesteps. Denote this as the truth vector.

, results from RANS model, given model parameters where is number of RANS model parameters. Denote this as the model vector.
Likelihood function and discrepancy model
We parameterize the likelihood function as a multivariate Gaussian density function centered around the truth vector. For the th QoI,
(3.22)  
(3.23) 
Here is the covariance matrix (or equivalently is the precision matrix) for the th QoI. That is, when comparing to , we model the discrepancy between them that arises due to various reasons including model error as , with and where represents the assumed covariance structure of the discrepency. We present results here for a simple parameterization of as , and where is a new hyperparameter and is the identity matrix. Clearly, while more complicated parameterizations such as Gaussian processes (e.g., see Oliver and Moser, 2011; Nadiga and Urban, 2018) and other forms can be used for the parameterization of the covariance, the simple form for the discrepancy that we use here, including its independence of follows from our a priori lack of knowledge about such dependencies including a priori lack of knowledge of model error. Finally, a limited amount of experimentation with the more complicated forms of the covariance showed robustness of the results we find with respect to the form of covariance.
Finally, we define the likelihood function of all QoIs as a sum of the individual likelihood functions, i.e.,
(3.24) 
where is now augmented to include , the dimensional vector of hyperparameters : . Finally, given the parameter vector , is obtained by accurate numerical integration of (1.3).
Prior distributions
To ensure robustness of the analysis and results presented, all computations were performed with two different forms for the prior distribution for parameters , a normal distribution and a uniform distribution. The priors are always centered at the previous estimates of Schwarzkopf et al. (2016). As mentioned in the introduction, if the processes associated with the parameters play an important role in the class of flows we consider, we use weak, uninformative and diffuse priors; for other parameters, the width of the priors are broadly determined by physical bounds such as trying to limit regions of negative turbulent kinetic energy. Broad normal distributions are used for .
MCMC using Delayed Rejection Adaptive Metropolis (DRAM)
We use the DRAM algorithm Haario et al. (2006), which is an improved MCMC algorithm, for drawing samples from the posterior distribution. DRAM combines two ideas that improve on the MetropolisHasting type MCMC algorithm, Delayed Rejection Tierney and Mira (1999) and Adaptive Metropolis Haario et al. (2001), whose efficiency in many scenarios outperform the original methods. The pseudocode for the algorithm is given in Algorithm 2, and we limit to two.
3.4 Results from Pooled Bayesian Inference
As discussed in Sec. 2, a singlepoint RANS turbulence model models all turbulent phenomena in a local fashion. That is, each of the turbulent processes is parameterized in terms of the local values of relevant variables and their gradients. Thus, given such locality assumptions, it is possible to imagine that the effects of Atwood number would enter implicitly through the gradients of density and gradients of other variables that are dependent on density. It may, therefore, be anticipated that the coefficients themselves should not depend on the Atwood number. Consequently, we first perform such a calibration.
The marginals of the posterior distribution of parameters resulting from such a calibration is shown in the heavy dashed magenta line in the panels of Fig. 3. It is seen in this figure that the DNS data have moved the prior distributions—centered at the pointestimate of Schwarzkopf et al. (2016)—significantly away from that point estimate and greatly narrowed the distributions from their prior values for three of the six parameters. In order to ensure robustness of these inferences, the pooled analysis was repeated with flat (uniform distributions with large bounds) priors; the results did not change significantly and the results are not shown to avoid redundancy.
The comparison between RANS and DNS estimates of the QoIs are shown for only one of the Atwood numbers (0.75) in the left panel of Fig. 4, again in order to limit the number of figures; large differences are seen and this is true at other Atwood numbers as well. The reader is referred to the lower five rows of Table 1 for the actual size of the residuals at different Atwood numbers for the different Bayesian calibrations considered.
As discussed earlier, the large differences seen in the QoIs leads to two possiblities: either that the model has serious deficiencies or that the model has possibly a less serious deficiency in being unable to simultaneously properly represent flows at different Atwood numbers.
3.5 Results from Unpooled Bayesian Inference
In order to find out the origin of the model deficiency and its significance, following the discussion in Sec. 3.1, we next consider unpooled or independent calibration of the RANS model at each of the Atwood numbers. The posterior distribution of parameters resulting from this unpooled calibration is shown in solid lines in Fig. 3. As in the pooled analysis and consistent with that inference, a significant shift away from the prior distribution, both in terms of mean and variance, is seen in these cases as well. Again, computations with a uniform prior distribution with large bounds did not produce any significant difference ensuring robustness of the inferences.
Comparisons of the RANS model fits to the DNS QoIs are shown in the right panel of fig. 4 (for the same Atwood number as in the left panel). The fits are seen to be much better than with pooled inference (left panel).
Parameter  

JSDivergence  0.051  0.035  0.037  0.085  1.125  0.100 
.
3.6 Variation of the Posterior Distribution of Parameters with Atwood Number and its Quantification
Both the unpooled Bayesian inference and the unpooled point estimate presented earlier succeed in fitting the DNS estimates of the QoIs well. However, a significant difference between the unpooled Bayesian and unpooled pointestimate is that, while with the unpooled point estimate four of the six parameters displayed large variations with Atwood number, it is seen that with the unpooled Bayesian inference only one of the parameters displays significant variations with Atwood number (lower middle panel of Fig. 3 vs. red, cyan, and turquoise lines in Fig. 2).
While it is visually clear that distributions of under different Atwood numbers are distinctly different, we quantify the distributional shift of the posterior PDF of model parameters resulting from the change in Atwood number using the general JensenShannon divergence. The general JensonShannon divergence is a distance metric that measures the similarity between probability distributions as:
(3.25) 
where are the weights for each probability distribution . Here, for the four different Atwood numbers, and we choose uniform weighting: . is the Shannon entropy for distribution . The Shannon entropy is defined as:
(3.26) 
The values are tabulated in Table. 2.
The dependence of parameter on Atwood number seen in the lower middle panel of Fig. 3 is better visualized in Fig. 5. A similar plot for the computations with a uniform form for the priors shows little difference.
Next, we note that, after performing the above Bayesian analysis, in order to further investigate the reason for the different dependencies we found with the leastsquares pointestimate and the Bayesian inference, we considered the maximum likelihood estimate (MLE) which uses the same likelihood function as in the Bayesian analysis. Those estimates are shown in Fig. 1 and were discussed previously.
Finally, on a related note, in the probabilistic estimation context, we show the marginal posterior distributions for ease of visualization. However, the joint distribution can have a more complicated structure than in one that may be imagined by composing the marginal distributions alone. This is the case when, as is typical, the turbulence modeling parameters are not all independent. This is consistent with, and is borne out by, the fact that none of the point estimates (see Fig. 1) happen to capture the mode of the joint distribution that emerges from the Bayesian analysis. Indeed, after the Bayesian analyses were conducted, a limited amount of experimentation with “basinhopping” extensions to gradientbased point estimation methods did not help such an “hybrid” approach to capture the mode of the Bayesian analysis either. Such hybrid approaches, however, need to be further investigated.
3.7 Improving the Turbulence Model Based on Results of Bayesian Analysis
As discussed above, pooled and unpooled Bayesian analyses of the RANS model given a few DNSs at different Atwood numbers helped reveal a dependence of Atwood number. That is, unlike with the point estimates, large and systematic variations were confined to just one of the parameters when Bayesian calibration was used. While is is clearly beyond the scope of this article to dwell on various optimization stragegies and their respective advantages and disadvantages, we note that in contrast to point estimates that typically rely on a local search algorithm, a global sampling strategy that underlies the Bayesian methodology is one of the reasons why the latter framework is more robust. The flip side of this is a large added computational cost and which we will address shortly. We now turn our attention to how the results of the Bayesian analyses can be leveraged to improve the turbulence model.
A straightforward way to do this would be to include or build in the discovered variation of with Atwood number. However, as previously discussed, Atwood number itself is either a feature of the initial condition or, at later times, a twopoint or global feature of the flow. As such, encoding the Atwood number dependence of a parameter would be inappropriate in the onepoint turbulence closure model that is being considered. Therefore, we look into being able to modify the turbulence model based on a local variable/correlation instead.
We discussed earlier that in this buoyancydriven flow, the Atwood number is a measure of the strength of the forcing. Next, towards identifying localvariable proxies for Atwood number, consider the temporal variation of various quantities in Figs. 2 and 4. It is sufficient to consider the DNSs for this purpose (and confine attention to any one panel in each of the two figures). In these figures, it is seen (a) that the only nonzero variable/correlation in the initial condition is , (b) that the magnitude of increases with At (Fig. 2 is at At = 0.50 and Fig. 4 is at At=0.75). These observations suggest that may be a useful local proxy for Atwood number. Consequently, we examine the variation of the maximum of (equivalently its initial value since it decays monotonically with time) with Atwood number and find a quadratic dependence ().
The identification of a local proxy for Atwood number paves the way for making the transition to a structural modification of the turbulence model starting from the discovered parameteric dependence: we now consider a new term in (2.16) of the form
(3.27) 
We repeat the pooled and unpooled Bayesian analysis of the modified turbulence model. The posterior probability distributions of the parameters are shown in Fig. 6. First, the systematic dependence of parameter (lowermiddle panel) on Atwood number that is seen in the corresponding panel of Fig. 3 is now absent. Second, no new dependence is seen to be introduced in the unpooled analysis. Furthermore, the RANS fits for the QoIs at each Atwood number are good for both pooled and unpooled calibrations. This can be seen in rows 8 and 9 of Table 1. While the good fits of the QoIs for the unpooled analysis verify that that aspect of the original model is retained, the modified turbulence model now allows, for the first time, the pooled calibration to fit the QoIs at each of the four Atwood numbers well (using a single set of parameters). This is a significant improvement of the turbulence model.
3.8 Surrogate RANS model using neural networks
As mentioned earlier, a disadvantage of the Bayesian approach is the added computational cost. The particular nature of the turbulent flow that we chose to analyze in this article was such that the RANS model could be integrated cheaply. As such, the added computational cost involved in the Bayesian analysis was not much of an issue. However, other turbulent flow settings may not be as forgiving in that a single integration/realization of the RANS model could be computationally expensive enough that the added computational cost of the Bayesian analysis may be prohibitive. New computationallyefficient sampling techniques may help alleviate this problem. However, we consider a possible alternate stragegy—that of a surrogate RANS model.
Artificial Neural Networks (ANN) have been studied extensively for their predictive capabilities in the context of machine learning, and a feedforward, multilayered neural network has been shown to be capable of approximating a continuous function arbitrarily well Cybenko (1989). Given our current problem, the use of an ANN for response surface exploration is of particular interest to us; this subject has been wellexplored (e.g., see Anjum et al., 1997).
In this study, we use an ANN to predict the temporal trajectories of turbulent correlation (see the RANS governing equations (2.122.16)) given the parameters . That is, we use the ANN to establish a mapping from the RANS parameters to the trajectories of tubulent correlations. Below, we briefly discuss some of the main aspects of developing the surrogate model including data compression, architecture of the neural network, and validation of training quality using a heldout test set.
Compression of the output vector:
While we are interested in predicting the full temporal trajectories of the QoIs (e.g., the trajectories shown in Fig. 4), the temporal trajectories contain a large number of time steps resultingly in a highdimensional output vector. At the same time, the smoothness of these trajectories suggests temporal oversampling and leads us to consider principle components as a means of compression. Therefore, we reduce the machine learning problem to predicting the PCA coefficients, and recover/reconstruct the full trajectories using the PCA bases used for the decomposition. Since four PCA components are found to explain in excess of 99% of the variance, we restrict ourselves to learning four components for each of five QoIs.
Architecture of the Neural Network:
We use a deep fullyconnected neural network (DNN, a.k.a., multilayer perceptron) with 10 hidden layers and with each layer consisting of 30 neuron units. The size of the input layer corresponds to that of the parameter vector and given the compression discussed above, the size of the ouput layer is 20.
Validation on test set:
We evaluate the performance of our neural network by using a heldout test set. The size of the test set is held fixed at 10% the size of the training set and both sets are sampled from the same distribution.
Results:
After training the ANN, we conducted unpooled Bayesian calibration of the RANS model, but now using the the DNNRANS surrogate instead of integrations of the actual RANS model. Representative sample results are shown in Fig. 7. With the DNNsurrogate, it is seen that not only are the fits to the DNS estimates of the QoIs good, but that the variation of the parameter with Atwood number is also well approximated.
In order to avoid a long digression, we skip details, but note that convergence of a Markov chain to the target posterior distribution is typically slow. For this reason, chains of length a million or more samples are typically used to ensure robustness of inferences. For the training/testing phase of the DNNsurrogate, we used a set of 500 integrations of the RANS model. After the DNNsurrogate has been trained it is fairly cheap to evaluate it. Thus in a situation where individual RANS integrations are costly this strategy can lead to significant computational savings. We do not dwell on actual differences in computational costs in this proofofprinciple demonstration for the reason that the RANS model itself is presently computationally cheap. We expect the very low end of cost advantages of the surrogate approach in cases where individual RANS integrations are expensive to be ten or more; more realistic estimates require actually working through such a case.
4 Discussion and Conclusion
In this study, we considered various methods of analyzing and calibrating an approximate model of a complex multiscale sytem given a few wellresolved simulations. We first considered various pointestimate approaches. These ranged from using previously published point estimates, to new ones that used ordinary least squares and maximum likelihood estimates. While it was found that some of the unpooled point estimates were able to fit the QoIs well, all pooled point estimates and some unpooled point estimates produced poor estimates of the QoIs. We rationalized the behavior of the point estimates by appealing to the effects of pooling versus unpooling of the different Atwood number cases and the effects of considering or not considering the dynamical nature of the approximate model.
Next we considered the behavior of the parameters themselves in the cases that produced good estimates of the QoIs. Not only were the parameter estimates themselves different, depending on the method used, but their variation with Atwood number was also very different. For these reasons, the main insight into the approximate model provided by the various point estimates was limited to indicating that the model under consideration contained enough of the relevant phenomenology to properly represent the particular flow that we consider, but that they had to be individually calibrated. While valuable, it did not point in further specific ways how the model could be improved.
We next conducted Bayesian analysis of the approximate model using the same data as was used for the point estimates. Not surprisingly, pooled and unpooled Bayesian analyses first verified and confirmed the insight provided by the point estimate approaches. That is that while the approximate model could produce QoIs that compare well with DNS estimates at each Atwood number, they are unable to do so using a single set of, now probabilistic or interval estimates. Next, however, this approach provided further specific pointers towards how this shortcoming could be overcome. Compared to the point estimates where different variations with Atwood number were produced with different variants of the estimators, the Bayesian analyses produced similar posterior distributions at all Atwood numbers for all but one of the model parameters.
The minimization of parameter variability with Atwood number may be thought of as the key contribution of the Bayesian analyses as far as insights into the workings of the approximate model is concerned. This is because, in contrast to the wide range of parameter variations seen in point estimates, the systematic variation of the posterior distribution of a minimal set of parameters (in this case, a single one) suggests that the former is likely an artifact of the pointestimate approach to calibration. Furthermore, a small set of systematic parameter variations allows for an immediate improvement of the model (as we demonstrate), with initial improvements being based purely on statistical grounds. However, such improvements could then foster better physics or dynamicsbased improvements. Given the nonuniqueness of closures, if multiple such improvements should arise, then it is possible that one can use the principle of Occam’s razor in conjunction with plausibility to choose from among them Farrell et al. (2015).
Next, we consider two aspects of computational cost associated with the procedure that we have considered and find useful: first the relative difference in cost between point estimates and probablistic estimates and next, the cost of the approximate model itself. Not surprisingly, probablistic estimates when performed using sampling are costlier than point estimates. However, this difference in cost can be greatly reduced by using variational methods of probabilistic inference, an area that has seen major improvements in the last couple of years. We present a few details below.
When using randomwalk (MetropolisHastings) based algorithms for generating members of the Markov chain, the stepsize of MCMC scales inversely as the dimension of (number of parameters being inferred): if h is the stepsize, dimension() Gelman et al. (1996). Consequently, when the dimension of is large, a larger number of MCMC steps have to be taken so that cost scales as where is the computational cost of a step. Thus, for a reasonable chain length, the computational cost of the MCMCbased scheme scales as, say, . However, by similar straightforward scaling arguments, the cost of a point estimate scales only as, say, , leading to the probabilistic estimate being costlier by a factor of about . So for , as in the present study, the cost difference is about 100 and the study was computationally feasible. However, this becomes a problem when is large.
Recent improvements in computational inference methods, however, can be brought to bear on this issue. For example, computational cost of Hamiltonian Monte Carlo only scales as, say, (e.g., see Beskos et al., 2013). Even better, variational inference methods have now been developed that bring the cost of probabilistic inference to levels comparable to that of point estimates (e.g., see Blei et al., 2017). However, it has to be noted that with variational inference, an additional error is introduced since optimal parameters for the source distributions that best match the actual posterior distribution will be found, rather than the actual posterior distribution itself as we currently compute. Nevertheless, we think that the ideas presented in this article can be scaled up to larger problems by the use of probabilistic inference (but with caveats such as the one mentioned above).
Next, we consider the issue of computational cost of one step in the inference procedure, viz. above. It is dominated by the cost of evaluating the likelihood function which in turn requires an integration of the approximate model. If an individual integration of the approximate model itself is computationally intensive (while still being orders of magnitude cheaper than wellresolved simulations) then conducting either point estimate based analysis or probablistic estimate based analysis can be computationally expensive. Towards reducing such costs, we explored the use of an artificial neural network as a surrogate model for the RANS solver and we demonstrated the feasibility of using such a ANNbased surrogate in conducting the Bayesian analysis: the Bayesian analysis using the ANNsurrogate recovered similar parametric dependencies as the Bayesian analysis that was performed using RANS integrations. Thus in a situation where individual integrations of the approximate model are themselves costly we anticipate that this strategy can lead to significant enough computational savings that the Bayesian analysis can be performed.
In other considerations, both the wellresolved model and the approximate model were deterministic in the case we chose to illustrate the process of improving the approximate model using probabilistic or interval inference. However, since the methods that we use are agnostic about the deterministic or stochastic nature of either the wellresolved model or the approximate model, application of this methodology to situations wherein either the wellresolved model or the approximate model is stochastic is easily achieved on taking into consideration prior information available about the nature of the stochasticity involved. That is, for example, we anticipate that the procedure for model improvement should work in the setting of the Markov State Model or other such models in the context of data coming from either a deterministic or stochastic, but wellresolved model. We also note that we have leveraged certain aspects of Bayesian estimation to serve the pupose of model improvement. It remains to be seen if other aspects resulting from Bayesian estimation, such as ensemble characteristics, can also be exploited for similar purposes.
Bayesian analysis can be used in many ways to help with the task of reducedorder modeling but, the fundamental aspect of Bayesian analysis that permits its use in such a fashion is related to its ability to create knowledge (e.g., see Babuska and Oden, 2004; Edeling et al., 2014b, a; Farrell et al., 2015). A few recent studies have highlighted such uses and include using Bayesian approaches to estimating and characterizing errors and uncertainty in RANS models and comparing and selecting from among different models. We add to this nascent body of literature by showing how Bayesian analysis can be leveraged to improve models themselves: the pooled and unpooled Bayesian analysis we conducted revealed unanticipated parameteric dependencies, and then we closed the analysisimprovement loop by using the discovered dependency to effect a structural modification of the model that removed the dependency while not introducing others and in a fashion that is consistent with the modeling approach (see also Nadiga and Urban, 2019; DeGennaro et al., 2018). Finally, we anticipate that this methodology will also be useful in improving reducedorder models when such models include other parameterdependent model terms that serve to either stabilize or improve the fidelity of the model.
Acknowledgements
We thank the referees for their extensive comments and suggestions. The presentation of this article has benefitted greatly from them. This research was funded by the US Department of Energy, in part under the Laboratory Directed Research and Development program at the Los Alamos National Laboratory (LANL), and in part by the Mix and Burn research initiative under the Physics and Engineering Models (PEM) component of the Advanced Simulation and Computing Program (ASC) at LANL.
References
References
 Anjum et al. [1997] M Farooq Anjum, Imran Tasadduq, and Khaled AlSultan. Response surface methodology: A neural network approach. European Journal of Operational Research, 101(1):65–73, 1997.
 Babuska and Oden [2004] Ivo Babuska and J Tinsley Oden. Verification and validation in computational engineering and science: basic concepts. Computer Methods in Applied Mechanics and Engineering, 193(36):4057–4066, 2004.
 Baldwin and Lomax [1978] Barrett Baldwin and Harvard Lomax. Thinlayer approximation and algebraic model for separated turbulentflows. In 16th aerospace sciences meeting, page 257, 1978.
 Beskos et al. [2013] Alexandros Beskos, Natesh Pillai, Gareth Roberts, JesusMaria SanzSerna, Andrew Stuart, et al. Optimal tuning of the hybrid monte carlo algorithm. Bernoulli, 19(5A):1501–1534, 2013.
 Besnard et al. [1992] Didier Besnard, FH Harlow, RM Rauenzahn, and C Zemach. Turbulence transport equations for variabledensity turbulence and their relationship to twofield models. Technical report, Los Alamos National Lab., NM (United States), 1992.
 Blei et al. [2017] David M Blei, Alp Kucukelbir, and Jon D McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.
 Cheung et al. [2011] Sai Hung Cheung, Todd A Oliver, Ernesto E Prudencio, Serge Prudhomme, and Robert D Moser. Bayesian uncertainty analysis with applications to turbulence modeling. Reliability Engineering & System Safety, 96(9):1137–1149, 2011.
 Chien [1982] KueiYuan Chien. Predictions of channel and boundarylayer flows with a lowreynoldsnumber turbulence model. AIAA journal, 20(1):33–38, 1982.
 Cybenko [1989] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
 DeGennaro et al. [2018] Anthony M DeGennaro, Nathan M Urban, Balasubramanya T Nadiga, and Terry Haut. Model structural inference using local dynamic operators. arXiv preprint arXiv:1803.03731; International Journal for Uncertainty Quantification (to appear), 2018.
 Deuflhard et al. [2000] Peter Deuflhard, Wilhelm Huisinga, Alexander Fischer, and Ch Schütte. Identification of almost invariant aggregates in reversible nearly uncoupled markov chains. Linear Algebra and its Applications, 315(13):39–59, 2000.
 Dorrestijn et al. [2013] Jesse Dorrestijn, Daan T Crommelin, A Pier Siebesma, and Harm JJ Jonker. Stochastic parameterization of shallow cumulus convection estimated from highresolution model data. Theoretical and Computational Fluid Dynamics, 27(12):133–148, 2013.
 Duan and Nadiga [2007] Jinqiao Duan and Balasubramanya Nadiga. Stochastic parameterization for large eddy simulation of geophysical flows. Proceedings of the American Mathematical Society, 135(4):1187–1196, 2007.
 Durbin [1991] Paul A Durbin. Nearwall turbulence closure modeling without âdamping functionsâ. Theoretical and Computational Fluid Dynamics, 3(1):1–13, 1991.
 Edeling et al. [2014a] WN Edeling, Pasquale Cinnella, and Richard P Dwight. Predictive rans simulations via bayesian modelscenario averaging. Journal of Computational Physics, 275:65–91, 2014a.
 Edeling et al. [2014b] WN Edeling, Pasquale Cinnella, Richard P Dwight, and Hester Bijl. Bayesian estimates of parameter variability in the k– turbulence model. Journal of Computational Physics, 258:73–94, 2014b.
 Farrell et al. [2015] Kathryn Farrell, J Tinsley Oden, and Danial Faghihi. A bayesian framework for adaptive selection, calibration, and validation of coarsegrained models of atomistic systems. Journal of Computational Physics, 295:189–208, 2015.
 Gelman et al. [1996] Andrew Gelman, Gareth O Roberts, Walter R Gilks, et al. Efficient metropolis jumping rules. Bayesian statistics, 5(599608):42, 1996.
 Gerber and Horenko [2017] Susanne Gerber and Illia Horenko. Toward a direct and scalable identification of reduced models for categorical processes. Proceedings of the National Academy of Sciences, page 201612619, 2017.
 Haario et al. [2001] Heikki Haario, Eero Saksman, Johanna Tamminen, et al. An adaptive metropolis algorithm. Bernoulli, 7(2):223–242, 2001.
 Haario et al. [2006] Heikki Haario, Marko Laine, Antonietta Mira, and Eero Saksman. Dram: efficient adaptive mcmc. Statistics and Computing, 16(4):339–354, 2006.
 Jolliffe [2011] Ian Jolliffe. Principal component analysis. Springer, 2011.
 Launder and Sharma [1974] BE Launder and BI Sharma. Application of the energydissipation model of turbulence to the calculation of flow near a spinning disc. Letters in heat and mass transfer, 1(2):131–137, 1974.
 Livescu and Ristorcelli [2007] Daniel Livescu and JR Ristorcelli. Buoyancydriven variabledensity turbulence. Journal of Fluid Mechanics, 591:43–71, 2007.
 Livescu et al. [2009] Daniel Livescu, JR Ristorcelli, RA Gore, SH Dean, WH Cabot, and AW Cook. Highreynolds number rayleigh–taylor turbulence. Journal of Turbulence, 10:N13, 2009.
 Nadiga and Urban [2019] Balasubramanya T Nadiga and Nathan M Urban. Improved representation of ocean heat content in energy balance models. Climatic Change, pages 1–14, 2019.
 Nadiga [2008] BT Nadiga. Orientation of eddy fluxes in geostrophic turbulence. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 366(1875):2489–2508, 2008.
 Nadiga and Livescu [2007] BT Nadiga and D Livescu. Instability of the perfect subgrid model in implicitfiltering large eddy simulation of geostrophic turbulence. Physical Review E, 75(4):046303, 2007.
 Nadiga and Urban [2018] BT Nadiga and NM Urban. Dependence of inferred climate sensitivity on the discrepancy model. arXiv preprint arXiv:1809.04068, 2018.
 Oliver and Moser [2011] Todd A Oliver and Robert D Moser. Bayesian uncertainty quantification applied to rans turbulence models. In Journal of Physics: Conference Series, volume 318, page 042032. IOP Publishing, 2011.
 Pande et al. [2010] Vijay S Pande, Kyle Beauchamp, and Gregory R Bowman. Everything you wanted to know about markov state models but were afraid to ask. Methods, 52(1):99–105, 2010.
 Plant and Craig [2008] RS Plant and George C Craig. A stochastic parameterization for deep convection based on equilibrium statistics. Journal of the Atmospheric Sciences, 65(1):87–105, 2008.
 Pope [2001] Stephen B Pope. Turbulent flows, 2001.
 Roache [1998] Patrick J Roache. Verification and validation in computational science and engineering. Hermosa, 1998.
 Schmid [2010] Peter J Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of fluid mechanics, 656:5–28, 2010.
 Schütte et al. [2011] Christof Schütte, Frank Noé, Jianfeng Lu, Marco Sarich, and Eric VandenEijnden. Markov state models based on milestoning. The Journal of chemical physics, 134(20):05B609, 2011.
 Schwarzkopf et al. [2016] JD Schwarzkopf, D Livescu, JR Baltzer, RA Gore, and JR Ristorcelli. A twolength scale turbulence model for singlephase multifluid mixing. Flow, Turbulence and Combustion, 96(1):1–43, 2016.
 Schwarzkopf et al. [2011] John D Schwarzkopf, Daniel Livescu, Robert A Gore, Rick M Rauenzahn, and J Raymond Ristorcelli. Application of a secondmoment closure model to mixing processes involving multicomponent miscible fluids. Journal of Turbulence, 12:N49, 2011.
 Spalart and Allmaras [1992] PRaA Spalart and S1 Allmaras. A oneequation turbulence model for aerodynamic flows. In 30th aerospace sciences meeting and exhibit, page 439, 1992.
 Tierney and Mira [1999] Luke Tierney and Antonietta Mira. Some adaptive monte carlo methods for bayesian inference. Statistics in medicine, 18(1718):2507–2515, 1999.
 Wang et al. [2012] Zhu Wang, Imran Akhtar, Jeff Borggaard, and Traian Iliescu. Proper orthogonal decomposition closure models for turbulent flows: a numerical comparison. Computer Methods in Applied Mechanics and Engineering, 237:10–26, 2012.