A Frequentist Approach to Computer Model Calibration
Abstract
This paper considers the computer model calibration problem and provides a general frequentist solution. Under the proposed framework, the data model is semiparametric with a nonparametric discrepancy function which accounts for any discrepancy between the physical reality and the computer model. In an attempt to solve a fundamentally important (but often ignored) identifiability issue between the computer model parameters and the discrepancy function, this paper proposes a new and identifiable parametrization of the calibration problem. It also develops a twostep procedure for estimating all the relevant quantities under the new parameterization. This estimation procedure is shown to enjoy excellent rates of convergence and can be straightforwardly implemented with existing software. For uncertainty quantification, bootstrapping is adopted to construct confidence regions for the quantities of interest. The practical performance of the proposed methodology is illustrated through simulation examples and an application to a computational fluid dynamics model.
Keywords: bootstrap; inverse problem; model misspecification; semiparametric modeling; surrogate model; uncertainty analysis
1 Introduction
In many scientific studies, complex mathematical models, implemented as computer codes, are often used to model the physical reality (see, e.g., Santner et al., 2003; Fang et al., 2010). Such computer codes are also known as computer models, and can only be executed when certain model parameters are prespecified. The goal of computer model calibration is to find the model parameter values that allow the computer model to best reproduce the physical reality.
In the computer model calibration problem (Kennedy and O’Hagan, 2001), an output is observed from the physical reality at locations of a variate input :
where is the measurement error for the th observation. It is assumed that the user can select the values of the design locations . A computer model , also called the simulator, can be used to approximate the physical reality when its model parameter is selected to be close to the unknown ideal value . To account for the discrepancy between the simulator and the physical reality, one can introduce a discrepancy function and the model of the experimental data becomes
(1) 
To use (1) in practice, one first needs to estimate and , which requires evaluation of at many different values of and . However, the evaluation of is often very computationally expensive due to the complex nature of the mathematical models. This complication can be alleviated via the use of a surrogate model, often referred to as an emulator, for the simulator (e.g., Currin et al., 1991; Kennedy and O’Hagan, 2001; Higdon et al., 2004; Drignei and Morris, 2006; Conti et al., 2009; Reich et al., 2009). Typically a Gaussian process (GP) is assumed for the emulator to allow for a flexible model of the simulator. Moreover, the discrepancy function is also often modeled by a GP. In order to construct an emulator, an additional set of outputs ’s generated from the simulator is obtained at design locations . Thus, there are two observed data sets: the experimental data set obtained from the physical reality and the simulator data set generated by the simulator . Note that the experimental and simulator data sets are fundamentally different. The simulator data are generated solely from the simulator and so they are not directly related to the physical reality. Moreover, the designs of the two data sets do not have to match. That is, does not necessarily equal to either in value or sampling distribution. In this setting, the goal is to estimate , and .
Traditionally, this estimation problem is solved within a Bayesian framework (e.g., Kennedy and O’Hagan, 2001; Higdon et al., 2004; Oakley and O’Hagan, 2004; Bayarri et al., 2007; Higdon et al., 2008; Storlie et al., 2014), partly because of the GP’s ability to incorporate uncertainty about the surrogate model for , and a similar ability to provide uncertainty about the discrepancy function . However, in this paper we are interested in solving this problem from a purely frequentist perspective with a nonparametric discrepancy function, and at the same time also accounting for uncertainty in the model parameters, the surrogate model, and the discrepancy function. To the best of our knowledge, there has been no previous attempt to solve the computer model calibration problem in this manner. Yet, there are several reasons for doing this:

The proposed approach is conceptually clean and simple, easy to understand, and can be paired with any choice of surrogate model.

It delivers a complementary calibration result to the Bayesian approach, allowing each approach to provide some qualitative and quantitative confirmation of the other.

Researchers using computational models may be opposed to the Bayesian calibration approach because of the complex prior assumptions of GP and of an identifiability issue between the surrogate model and the discrepancy function (see Section 2.1). Consequently they may prefer the proposed frequentist approach instead.
The proposed frequentist approach explicitly accounts for all potentially important sources of uncertainty, and is a viable alternative to the Bayesian approach. While any statistical model makes assumptions, there are fewer assumptions necessary in the proposed approach than in the Bayesian counterpart. For example, prior distributions on the surrogate model and the discrepancy function are replaced with “smoothness” assumptions and an emphasis is intuitively placed on the ability to predict the experimental data via crossvalidation. Empirical evidence shows that the proposed frequentist approach tends to outperform existing Bayesian methods on the examples considered.
Lastly we remark that there are some frequentist solutions for the computer model calibration problem. The most common one involves obtaining the maximum likelihood estimate (MLE) for directly by evaluating sequentially in an optimization routine (e.g., Vecchia and Cooley, 1987; Jones et al., 1998; Huang et al., 2006). If is computationally expensive then as before a surrogate model could be used in place of for the purpose of obtaining an MLE for . However, this latter approach must be used with caution as to not ignore the estimation uncertainty in the surrogate model for , which can often be substantial. Also, unlike the proposed approach, most previous frequentist methods suffer from the same shortcoming: no discrepancy modeling is included in their formulation (i.e., ). While some surrogate models may be very good approximations, no model is perfect and neglecting the discrepancy can be a major pitfall (Bayarri et al., 2007; Brynjarsdóttir and O’Hagan, 2014). One notable exception is the pioneering work by Joseph and Melkote (2009), which substantially improves previous frequentist model calibration methods by assuming a parametric form for the discrepancy function . Our work brings frequentist calibration approaches to to the next level in several respects: (i) we allow for a nonparametric (or semiparametric) to provide a practical representation of ignorance for the discrepancy; (ii) we present a rigorous justification for the proposed estimates, and (iii) we deliver a simple, yet general procedure to account for the uncertainty in the surrogate model, the model parameters, and the discrepancy. As discussed in detail below, the use of a nonparametric gives rise to an identifiability issue and some theoretical challenges, which separates our work from others.
The rest of this article is organized as follows. In Section 2 we introduce an identifiable definition of the calibration problem, and provide a practical estimation procedure. Section 3 illustrates how the bootstrap methodology can be applied to provide uncertainty quantification for parameters of interest. Theoretical backup for our estimation procedure is presented in Section 4. The practical performance of our methodology is illustrated via a real life computational fluid dynamics model in Section 5, while concluding remarks are given in Section 6. Additional material including simulation experiments and technical details are provided in a separate online supplemental document.
2 The Proposed Approach
Consider the semiparametric model (1) for the physical reality. Despite its popularity under the Bayesian framework, this model is not identifiable in the frequentist regime, where and are treated as fixed but unknown quantities to be estimated. In the following we will first discuss this nonidentifiability issue and provide intuitive and identifiable definitions for and under model (1) (Sections 2.1). We then develop an efficient method for estimating these quantities when the simulator is known (Section 2.2) and unknown (Section 2.3). As to be seen below, the proposed frequentist framework is very general and covers many practical situations. Its estimation procedure can be paired with any existing optimization techniques and nonparametric regression methods, therefore provides an effective and flexible approach to estimate and with convenient implementation.
2.1 An Identifiable Formulation
In model (1), the discrepancy function is assumed to be an unconstrained smooth function, and as such will be estimated using nonparametric regression techniques. To see the nonidentifiability of (1), consider two different values and for , and write and . As both and give the same distribution for , model (1) in general is unidentifiable. In the Bayesian paradigm, with the help of suitable priors, the posterior distributions for and are technically welldefined. However, this unidentifiability still poses many problems at a more foundational level. For example, it forbids any meaningful construction of uncertainty measures for or , as these target quantities do not have unique definitions. We also note that previous frequentist methods bypass this issue by setting .
Now we provide natural and sensible definitions of and to achieve identifiable modeling. Write the spaces of model parameters and inputs as and , respectively. We propose the following model for the physical reality :
where is the simulator and is the discrepancy function, both to be modeled by smooth functions. We define
(2) 
where the distribution characterizes a weighting scheme of . For simplicity, we assume that is the sampling distribution of . This is mostly sensible since the sampling design, being user controllable, would characterize one’s attention to over different values of . Thus this should also be reflected in the definition of . For common sampling design schemes, is intuitively defined as an identity function and thus is the minimizer of the integrated squared error between and . However, if a weighting scheme, other than the sampling distribution of , is intended, our procedure can be modified easily by introducing a simple reweighting (to defined below in (4)) to estimate the corresponding .
Under a mild regularity assumption (Assumption 5 below), the solution of the minimization problem in (2) is unique, and hence it is straightforward to see that both and are identifiable.
The definition (2) for is logical and natural, as it aligns with the intuition that should be the value that makes closest to , and is used to account for the remainder. Using other arguments or motivation, it is of course possible to provide alternative identifiable definitions for and . For example, Brynjarsdóttir and O’Hagan (2014) argue that the the “best fitting” model is not always the most desirable. However, our definition (2) leads directly to a straightforward and easily implementable estimation procedure, as to be described below. We also note that our definition is in the same spirit as Walker (2013).
2.2 Estimation when the Simulator is Known
Suppose we observe the output of the physical reality at locations ; i.e., , , where is the th observation error. These errors are assumed to be independent and have mean 0. For simplicity, we assume the variate ’s have been scaled such that . With the above modeling of , the observations are assumed to follow
(3) 
For the estimation of and , we first consider a simpler situation for which can be assumed known or evaluated rapidly. For cases when this assumption is not true, we will estimate with a second set of samples and the details are described later in Section 2.3.
The definitions of and naturally motivate a twostep estimation procedure:

Estimation of : compute the estimate of as the solution to the following minimization problem
(4) 
Estimation of : estimate by applying any nonparametric regression method to the “data” .
This estimation procedure has its beauty in flexibility and ease of implementation. It can be coupled with any (global) minimization technique in Step 1 and any nonparametric regression method in Step 2. For example, for convenience one could adopt an existing and fast offtheshelf minimization routine for Step 1, and wavelet technique for Step 2 if one believes is mostly smooth with a few sharp jumps. Also, as the estimation of and that of are separated, there is no need to rerun the minimization in Step 1 when choosing the smoothing parameter in the nonparametric regression step. Thus in general this estimation procedure can be made very fast computationally with suitable choices of minimization and nonparametric regression techniques. For all the numerical illustrations in this paper, we adopt the genetic optimization using derivative (Sekhon and Mebane, 1998) for Step 1 and smoothing spline ANOVA (Wahba, 1990) for Step 2. The theoretical support for using these estimators are provided in Section 4 under a fixed design setting.
2.3 Estimation when the Simulator is Approximated with an Emulator
This subsection handles the situation when the simulator is unknown or expensive to run. As mentioned in introduction, a common strategy to overcome this issue is to approximate with a surrogate model, also known as an emulator. The emulator is estimated nonparametrically from a second set of observations obtained by running the simulator for different combinations of inputs and model parameters . Write the simulator output at design locations as . They are assumed to follow
where ’s are independent random errors with mean zero. The underlying physical model is typically continuous and in theory these ’s should not be needed. However, they are included here to allow for numerical jitter in the simulator evaluations due to various reasons such as convergence criteria.
The proposed approach proceeds as follows. We first use to fit an emulator via a nonparametric regression method such as SSANOVA (Wahba, 1990). Denote the resulting emulator as . We then treat as fixed and replace by in (3) when estimating and via the estimation procedure proposed in Section 2.2. We note that the parameter can be constrained to a particular domain, as is often done (albeit with the ability to be more informative) in the Bayesian calibration approach via a prior distribution.
In the above description the estimation of and is done separately. One could alternatively perform a joint estimation by combining the two estimation problems into one semiparametric optimization problem. However, in this case (as in the Bayesian approach) the experimental data will influence the estimation of the emulator. This could be beneficial by providing a smaller variance to the emulator, but it could also be problematic by inserting a large bias into the emulator toward the reality function. The more conservative approach taken here eliminates this potential bias by removing the influence of the experimental data on the emulator.
To the best of our knowledge, this approach to obtain a point estimate for the calibration problem with discrepancy ( and ) for a computationally demanding simulator has not been attempted until now. The above description does not yet account for the uncertainty in the estimation of the emulator, model parameters, or the discrepancy function. However, this issue can easily be addressed via the bootstrap method (see, e.g., Efron and Tibshirani, 1993), to be described next.
3 Uncertainty Quantification using Bootstrap
In computer modeling, bootstrap has been applied successfully to quantify the uncertainty in the emulator for the purpose of sensitivity analysis (SA) and uncertainty analysis (UA) for computationally demanding simulators (e.g., Storlie et al., 2009, 2013). It is therefore expected that bootstrap will also provide equally successful results for the current calibration problem. However, it is noted that this calibration problem is far more complicated than SA and UA due to the additional estimation of and .
Let the point estimates of the unknown parameters in model (1) be obtained as described above and denoted as , , and . These define an estimate for the data generating process for both the simulator data and experimental data . A bootstrap sample for the calibration problem can be generated with the following steps:

(Optional) Resample the designs in both data sets if the data were generated under random designs.

Produce bootstrap samples by resampling centered residuals.

Reestimate the parameters to obtain bootstrap estimates of , , and . Denote them as , , and , , respectively.
The resulting bootstrap sample of the estimates can be used to obtain a bootstrap confidence region for most quantities of interest.
In calibration problems, confidence intervals for the elements of and pointwise confidence bands for are usually of interest. For example, to obtain a confidence interval for the first element of , one can find the and sample quantiles from , where represents the first element of for . Denote these quantiles as and , respectively. The required confidence interval is then given by . A confidence interval for a prediction of the physical reality at any new input and the pointwise confidence band for can be obtained in a similar fashion.
Since our estimation procedure involves nonparametric regression, the impact of bias may lead to incorrect asymptotic coverage of the aforementioned bootstrap confidence regions (see, e.g., Härdle and Bowman, 1988; Hall, 1992a, b). In the literature, there are two common strategies for correcting the coverage: undersmoothing and oversmoothing. As shown in Hall (1992a), undersmoothing is a simpler and more effective strategy than oversmoothing. Our estimation procedure can be easily modified to incorporate undersmoothing; e.g., by choosing a smaller smoothing parameter. However, the gain in practical performance of these strategies are usually small and most of these strategies involve another adhoc choice of the amount of under or oversmoothing. Moreover, it is not uncommon to ignore this bias issue, essentially resulting in the use of nonadjusted confidence regions as described above; see, e.g., Efron and Tibshirani (1993) and Ruppert et al. (2003). To keep the approach simple and adaptable to a wide class of nonparametric regression methods, we recommend using the nonadjusted confidence regions.
4 Theoretical Results
This section provides theoretical support to the proposed estimation procedure presented in Section 2. First recall that the estimation of depends on a second independent sample generated from the simulator of size . In practice this sample is typical much larger than the sample obtained from the physical reality; i.e., . Thus, it is reasonable to assume that approaches infinity at a faster rate than in the asymptotic framework. If fast enough, the asymptotics of and would be similar to those under known . Therefore, for simplicity and to speed up the development, in the following we assume is known and derive the asymptotic properties of and defined in Section 2.2.
Write and as and respectively to address their dependence on . In the following, we assume that are fixed and use to denote their empirical distribution function. In addition, , and represent the norm, the norm and the Euclidean norm respectively. For two functions and , let and . With slight notation abuse, we also write and . Lastly, write , and for a function . And, use to represent the th order derivative of with respect to for .
When deriving asymptotic results for similar statistical problems, it is relatively common to assume an independent and identically distributed (i.i.d.) random design, as it is easier than a fixed design to work with. However, for most practical calibration problems, the design is either fixed or correlated (e.g., Latin Hypercube sampling). Therefore the following results are developed under a fixed design, despite it is a more challenging setting than the i.i.d. random design. Note that model (3) is a semiparametric model and we first approach the parametric part and establish the consistency of in (see Theorem 1), where the difficulty lies in the existence of the discrepancy function . The effect is similar to a regression model with misspecification.
As for the nonparametric part, , we adopt the framework of Section 10.1 of van de Geer (2000) for penalized least squares estimation. We extend Theorem 10.2 of van de Geer (2000) to obtain the asymptotic behavior of (see Lemma 2 of the supplemental document and Theorem 2), taking into account the effect of estimation error of . Let the class of functions to which belongs be . We suppose that is a cone. Under van de Geer’s framework, the general form of the estimate of is
(5) 
where , , is a pseudonorm on . The is known as the smoothing parameter.
As an illustration, we provide the convergence rate of for if a penalized smoothing spline is used (see Corollary 1). This requires an additional orthogonality argument for the application of Theorem 2. We will write as when .
Below are the assumptions needed for our theoretical results.
Assumption 1 (Error structure).
and for . Also, are uniformly subGaussian; that is, there exist and such that
Assumption 2 (Parameter space).
is a totally bounded dimensional Euclidean space. That is, there exists an such that for all .
Assumption 3 (Function class ).

There exists a such that for all .

is twice continuously differentiable with respect to in a neighborhood of . and are continuous with respect to over this neighborhood.

and are bounded uniformly over a neighborhood of .
Assumption 4 (Convergence of design).

.

Elements of are , where
are the second derivative of evaluated at and that of respectively.

Euclidean norm of the first derivative of evaluated at , , is .
Assumption 5 (Identification).
is strictly positive definite.
Assumption 6 (Discrepancy function).

is bounded.

.

There exist and such that , for all and , where
and is the entropy of for the metric (Definition 2.1 of van de Geer, 2000).

.
The two main theorems and a corollary now follow. Their proofs can be found in Section S3 of the supplemental document.
Theorem 2 (Rate of convergence of ).
5 Application to a Computational Fluid Dynamics Model
In this section, the proposed approach is applied to a computational fluid dynamics (CFD) model of a bubbling fluidized bed (Lane et al., 2014). The experimental apparatus used to produce the field data used here is the the Carbon Capture Unit (C2U) housed at National Energy Technology Laboratory. The C2U unit is a benchtop carbon capture system, designed to mimic a postcombustion capture device that could be applied at a coal fired power plant. Section S1 of the supplemental document provides an illustration of the C2U system. A gas mixture (i.e., flue gas) flows through the bottom of the adsorber pictured on the right of the figure shown in Section S1 of the supplemental document and into the bed of solid sorbent, resulting in fluidization of the sorbent. At low temperature (C), CO will chemically bond to the solid sorbent and be effectively lifted out of the gas mixture. The solid sorbent would then circulate out of the adsorber, be stripped of CO, and flow back into the adsorber. However, in this example, the goal was to isolate the fluid dynamics of the bubbling fluidized bed, and thus there is only nitrogen gas flowing through the solid sorbent material in the bed (i.e., no CO adsorption is taking place). These data were collected as part of Department of Energy’s (DOE’s) Carbon Capture Simulation Initiative (CCSI) (Miller et al., 2014). The open source CFD code Multiphase Flow with Interphase eXchanges (MFIX) (Benyahia et al., 2012) was used as the simulator of the bubbling fluidized bed. The experimental setup and the MFIX model used to simulate it are fully documented in Lai et al. (2014). Below, only an abridged description of the data is provided.
The variables involved in the experimental data are the input variables flow rate (, FRate) and bed temperature (, Temp), and the output variable () is the pressure drop at location P3820 (i.e., the pressure drop across the bubbling fluidized bed). The pressure drop output is the time averaged value of the pressure drop once it was oscillating in steady state. The P3820 pressure drop was measured on the physical C2U system at a design of 44 distinct input settings. A total of 60 MFIX simulation cases were also designed and run for the purpose of emulator estimation. Both data sets are available online at the journal website. The MFIX model parameters involved in the calibration for this case were ResPP (): the particleparticle coefficient of restitution, ResPW (): the particlewall coefficient of restitution, FricAngPP (): the particleparticle friction angle, FricAngPW (): the particlewall friction angle, PBVF (): Packed bed void fraction, and PartSize (): the effective particle diameter of the sorbent material. The allowable ranges of the model parameters were chosen to be the same as those devised in Lai et al. (2014), mostly from literature review.
Table 1 provides the estimated values along with 95% confidence intervals and credible intervals, respectively, for the proposed fbootrs method and the bssanova method, both are numerically tested by simulations described in Section S2 of the supplemental document:

fbootrs: The proposed frequentist method coupled with the bootstrap procedure of Section 3 with resampling of the design (i.e., keep Step 1).

bssanova: Calibration of computational models via Bayesian smoothing spline ANOVA (Storlie et al., 2014).
Both methods largely agree on their respective estimates and CIs, which provides some confirmation of the result. The first five parameters have fairly wide CIs relative to their allowable ranges, indicating that most of the range of these parameters produces reasonable model results. However (effective particle size) does have tighter CIs and it seems as though values closer to 117 are preferred.
Figure 1 provides a visual summary for the simulator fit to the experimental data along with confidence bands (accounting for uncertainty in the emulator and the value of ). The simulator with discrepancy (i.e., reality) predictions are provided as well. The pressure drop () is plotted against Temp () for six distinct values of FRate (). The experimental data is also provided (along with the estimated 2 measurement error bars). The experimental data was binned into the closest value of the six displayed FRate for display purposes. It is clear that the discrepancy is trending upwards as FRate increases. Figure 2 makes this relationship more explicit by isolating the discrepancy main effect functions across FRate and Temp, respectively. While it is evident that there is some statistically significant model discrepancy here, such discrepancy is relatively small when considering the magnitude of the pressure drop: the discrepancy is on the order of 0.05 kPa, while the pressure drop is on the order of 0.72 kPa, thus a relative error of roughly 7%. Thus, for practical purposes MFIX can be used for prediction of a bubbling fluidized bed, knowing that the model form discrepancy is negligible.
6 Concluding Remarks
In this work, we have provided a frequentist framework for computer model calibration. This framework applies a general semiparametric data model with an emulator for expensive simulators and a discrepancy function, which allows discrepancy between the simulator and the physical reality. Despite the flexibility of the model, our proposed framework gives identifiable parametrizations for both the model parameters and the discrepancy function. A practical estimation procedure and theoretical guarantees are provided for the proposed framework. Finally, a bootstrapping approach to provide uncertainty quantification for virtually any quantity of interest has also been developed. Due to the simplicity of the proposed calibration framework and the corresponding bootstrap, this approach can be easily coupled with a variety of optimization methods and/or emulators, which is beneficial to practitioners.
Three objectives of calibration are identified by Brynjarsdóttir and O’Hagan (2014): to study the true values of the physical parameters, to predict the physical system’s behaviour within the scope of the observed data (i.e., interpolation) and outside the scope of the data (i.e., extrapolation). It is important to examine the usefulness of the proposed approach with respect to these objectives.
The proposed approach was not designed to, in general, address the first objective, unless the discrepancy is negligible and represents the physical parameters. The proposed approach, however, should work well for the second interpolation objective, although at times it may not be clear what is the phyiscal meaning of . The third objective (extrapolation) is much more challenging. As pointed out by Brynjarsdóttir and O’Hagan (2014), the key to successful extrapolation requires that both and be meaningfully defined, and that accurate prior information for them are available. Under these situations Bayesian approach would be useful, and it is not advisable to use the proposed approach, especially when is large.
This paper is not suggesting that the proposed approach is correct for all problems. However, when there is no good prior information about some of the model parameters and/or discrepancy, other approaches may not be applicable or could produce unreliable results. It is in such cases that the proposed approach can be beneficial above and beyond the general benefits provided by the analytic study of its properties.
References
 Bayarri et al. (2007) Bayarri, M. J., Berger, J. O., Paulo, R., Sacks, J., Cafeo, J. A., Cavendish, J., Lin, C.H. and Tu, J. (2007) A framework for validation of computer models. Technometrics, 49, 138–154.
 Benyahia et al. (2012) Benyahia, S., Syamlal, M. and O’Brien, T. (2012) Summary of MFIX equations 20121. National Energy Technology Laboratory Technical Report. URL https://mfix.netl.doe.gov/documentation/MFIXEquations20121.pdf.
 Brynjarsdóttir and O’Hagan (2014) Brynjarsdóttir, J. and O’Hagan, A. (2014) Learning about physical parameters: The importance of model discrepancy. Inverse Problems, 30, 114007.
 Conti et al. (2009) Conti, S., Gosling, J. P., Oakley, J. and O’Hagan, A. (2009) Gaussian process emulation of dynamic computer codes. Biometrika, 96, 663–676.
 Currin et al. (1991) Currin, C., Mitchell, T., Morris, M. and Ylvisaker, D. (1991) Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association, 86, 953–963.
 Drignei and Morris (2006) Drignei, D. and Morris, M. D. (2006) Empirical Bayesian analysis for computer experiments involving finitedifference codes. Journal of the American Statistical Association, 101, 1527–1536.
 Efron and Tibshirani (1993) Efron, B. and Tibshirani, R. J. (1993) An introduction to the bootstrap. New York: Chapman & Hall.
 Fang et al. (2010) Fang, K.T., Li, R. and Sudjianto, A. (2010) Design and modeling for computer experiments. Boca Raton: Chapman & Hall.
 Hall (1992a) Hall, P. (1992a) Effect of bias estimation on coverage accuracy of bootstrap confidence intervals for a probability density. The Annals of Statistics, 20, 675–694.
 Hall (1992b) Hall, P. (1992b) On bootstrap confidence intervals in nonparametric regression. The Annals of Statistics, 20, 695–711.
 Härdle and Bowman (1988) Härdle, W. and Bowman, A. W. (1988) Bootstrapping in nonparametric regression: Local adaptive smoothing and confidence bands. Journal of the American Statistical Association, 83, 102–110.
 Higdon et al. (2008) Higdon, D., Gattiker, J., Williams, B. and Rightley, M. (2008) Computer model validation using highdimensional output. Journal of the American Statistical Association, 103, 570–583.
 Higdon et al. (2004) Higdon, D., Kennedy, M., Cavendish, J. C., Cafeo, J. A. and Ryne, R. D. (2004) Combining field data and computer simulations for calibration and prediction. SIAM Journal on Scientific Computing, 26, 448–466.
 Huang et al. (2006) Huang, D., Allen, T. T., Notz, W. I. and Zeng, N. (2006) Global optimization of stochastic blackbox systems via sequential kriging metamodels. Journal of Global Optimization, 34, 441–466.
 Jones et al. (1998) Jones, D. R., Schonlau, M. and Welch, W. J. (1998) Efficient global optimization of expensive blackbox functions. Journal of Global optimization, 13, 455–492.
 Joseph and Melkote (2009) Joseph, V. R. and Melkote, S. N. (2009) Statistical adjustments to engineering models. Journal of Quality Technology, 41, 362–375.
 Kennedy and O’Hagan (2001) Kennedy, M. C. and O’Hagan, A. (2001) Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B, 63, 425–464.
 Lai et al. (2014) Lai, K., Xu, Z., Pan, W., Shadle, L., Storlie, C., Dietiker, J., Li, T., Dartevelle, S. and Sun, X. (2014) Hierarchical calibration and validation of highfidelity CFD models with C2U experiments. Milestone Report.
 Lane et al. (2014) Lane, W., Storlie, C., Montgomery, C. and Ryan, E. (2014) Numerical modeling and uncertainty quantification of a bubbling fluidized bed with immersed horizontal tubes. Powder Technology, 253, 733–743.
 Miller et al. (2014) Miller, D., Syamlal, M., Mebane, D., Storlie, C., Bhattacharyya, D., Sahinidis, N., Agarwal, D., Tong, C., Zitney, S., Sarkar, A., Sun, X., Sundaresan, S., Ryan, E., Engel, D., and Dale, C. (2014) Carbon capture simulation initiative: A case study in multiscale modeling and new challenges. Annual Review of Chemical and Biomolecular Engineering, 5, 301–323.
 Oakley and O’Hagan (2004) Oakley, J. and O’Hagan, A. (2004) Probabilistic sensitivity analysis of complex models: A Bayesian approach. Journal of the Royal Statistical Society: Series B, 66, 751–769.
 Reich et al. (2009) Reich, B. J., Storlie, C. B. and Bondell, H. D. (2009) Variable selection in Bayesian smoothing spline ANOVA models: Application to deterministic computer codes. Technometrics, 51, 110–120.
 Ruppert et al. (2003) Ruppert, D., Wand, M. P. and Carroll, R. J. (2003) Semiparametric Regression. New York: Cambridge University Press.
 Santner et al. (2003) Santner, T., Williams, B. and Notz, W. (2003) The Design and Analysis of Computer Experiments. New York: Springer.
 Sekhon and Mebane (1998) Sekhon, J. S. and Mebane, W. R. (1998) Genetic optimization using derivatives. Political Analysis, 7, 187–210.
 Storlie et al. (2014) Storlie, C. B., Lane, W. A., Ryan, E. M., Gattiker, J. R. and Higdon, D. M. (2014) Calibration of computational models with categorical parameters and correlated outputs via Bayesian smoothing spline ANOVA. Journal of the American Statistical Association. To appear.
 Storlie et al. (2013) Storlie, C. B., Reich, B. J., Helton, J. C., Swiler, L. P. and Sallaberry, C. J. (2013) Analysis of computationally demanding models with continuous and categorical inputs. Reliability Engineering & System Safety, 113, 30–41.
 Storlie et al. (2009) Storlie, C. B., Swiler, L. P., Helton, J. C. and Sallaberry, C. J. (2009) Implementation and evaluation of nonparametric regression procedures for sensitivity analysis of computationally demanding models. Reliability Engineering & System Safety, 94, 1735–1763.
 van de Geer (2000) van de Geer, S. (2000) Empirical Processes in Mestimation. New York: Cambridge University Press.
 Vecchia and Cooley (1987) Vecchia, A. V. and Cooley, R. L. (1987) Simultaneous confidence and prediction intervals for nonlinear regression models with application to a groundwater flow model. Water Resources Research, 23, 1237–1250.
 Wahba (1990) Wahba, G. (1990) Spline Models for Observational Data. Philadelphia: SIAM.
 Walker (2013) Walker, S. G. (2013) Bayesian inference with misspecified models. Journal of Statistical Planning and Inference, 143, 1621–1633.