A theoretical framework for calibration in computer models: parametrization, estimation and convergence properties Tuo’s research is partially sponsored by the Office of Advanced Scientific Computing Research; U.S. Department of Energy, project No. ERKJ259 “A mathematical environment for quantifying uncertainty: Integrated and optimized at the extreme scale.” The work was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. De-AC05-00OR22725. Tuo’s work is also supported by the National Center for Mathematics and Interdisciplinary Sciences, CAS and NSFC 11271355. Wu’s research is supported by NSF DMS-1308424 and DOE DE-SC0010548.

# A theoretical framework for calibration in computer models: parametrization, estimation and convergence properties ††thanks: Tuo’s research is partially sponsored by the Office of Advanced Scientific Computing Research; U.S. Department of Energy, project No. ERKJ259 “A mathematical environment for quantifying uncertainty: Integrated and optimized at the extreme scale.” The work was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. De-AC05-00OR22725. Tuo’s work is also supported by the National Center for Mathematics and Interdisciplinary Sciences, CAS and NSFC 11271355. Wu’s research is supported by NSF DMS-1308424 and DOE DE-SC0010548.

Rui Tuo Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China 100190; and Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 (tuorui@amss.ac.cn).    C. F. Jeff Wu School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332 (jeffwu@isye.gatech.edu).
###### Abstract

Calibration parameters in deterministic computer experiments are those attributes that cannot be measured or available in physical experiments. Kennedy and O’Hagan [18] suggested an approach to estimate them by using data from physical experiments and computer simulations. A theoretical framework is given which allows us to study the issues of parameter identifiability and estimation. We define the -consistency for calibration as a justification for calibration methods. It is shown that a simplified version of the original KO method leads to asymptotically -inconsistent calibration. This -inconsistency can be remedied by modifying the original estimation procedure. A novel calibration method, called the calibration, is proposed and proven to be -consistent and enjoys optimal convergence rate. A numerical example and some mathematical analysis are used to illustrate the source of the -inconsistency problem.

Key words.    computer experiments, uncertainty quantification, Gaussian process, reproducing kernel Hilbert space

AMS subject classifications.    62P30, 62A01, 62F12

## 1 Introduction

Because of the advances in complex mathematical models and fast computer codes, experiments on a computer, or referred to as computer experiments in the statistical literature, have become popular in engineering and scientific investigations. Computer simulations can be much faster or less costly than running physical experiments. Furthermore, physical experiments can be difficult to conduct as in the detonation of explosive materials or even infeasible when only rare events like land slide or hurricane are observed. Therefore computer simulations can be a stand-alone tool or combined with (typically smaller) data from physical experiments or field observations. There are many successful applications of computer experiments as reported in the literature. For a review of the general methodologies and examples, see the books by [22], and [9], and the November 2009 issue of Technometrics, which was devoted to computer experiments.

In this paper we consider the situations in which both physical experiments/observations and computer simulations are conducted and some input variables in the computer code are either unknown or unmeasured in the physical experiment. We refer to them as calibration parameters. From the responses in physical experiments alone, we cannot estimate the true values of the calibration parameters. We can run the computer codes by choosing selected values of the calibration parameters. From the combined data of the two sources, we can make inference about the parameters. That is, we can use the physical responses to calibrate the computer model. Apart from the calibration parameters, control variables are also involved as in standard computer experiments [22].

We use a spot welding example from [4] to illustrate the control variables and calibration parameters. In resistance spot welding, two sheets of metal are compressed by water-cooled copper electrodes under an applied load, . A direct current of magnitude is supplied to the sheets by two electrodes to create localized heating at the interface (called “faying surface”) between the two sheets. The heat produced by the current flow across the faying surface leads to melting, and, after cooling, a weld ”nugget” is formed. The size of nugget is taken as the response because it gives a good measure of the strength of the weld. Here and are considered as control variables. The contact resistance at the faying surface is a calibration parameter because it cannot be measured in physical experiments but can be used as an input variable to a finite element code called ANSYS.

In their pioneering work Kennedy and O’Hagan [18] proposed a model to link the two data sources by employing Gaussian process models, which are commonly used in the computer experiments literature. Since its publication, this approach has received a great deal of attention in the statistical literature. See [3, 4, 13, 14, 15, 16, 26], among others. It has seen a variety of applications, including hydrology, radiological protection, cylinder implosion, spot welding, micro-cutting and climate prediction, which were reported in the papers mentioned above and also in [11] and [19]. In spite of its importance as a methodology and significant practical impact, there has been no theoretical research on its modeling and estimation strategies. The main purpose of this paper is to provide a theoretical framework that facilitates the study of issues of parametrization, estimation and modeling in the Kennedy-O’Hagan formulation. For simplicity, we shall refer to Kennedy-O’Hagan as KO in the rest of the paper.

The paper is organized as follows. Some basic notation and terminology are given in Section LABEL:sec:preliminaries. A new theoretical framework for the calibration problem and its connection to function approximation via Gaussian process modeling are given in Section LABEL:sec:calibration. In particular, the lack of identifiability of the calibration parameters is discussed and a well-defined notion of calibration parameters is proposed by using the distance projection. The -consistency is defined as a justification for calibration methods. The KO modeling strategy is discussed in Section LABEL:Sec_least_native_norm. In order to provide a clean and workable mathematical analysis, we consider in Section LABEL:Sec_KO some simplifications of their original formulation. One is to drop the prior, which should not affect the general conclusions of our work because the information in the prior becomes negligible as the data gets larger. Thus we shall refer to calibration based on this simplification as the KO calibration. A key result is Theorem LABEL:Th_theta1, which states that the likelihood calibration is asymptotically -inconsistent according to our definition of the true calibration parameters. A numerical example is given to show that the -inconsistency can have a dramatic effect in small samples and some mathematical analysis is given to shed some light on why this happens. See Section LABEL:Sec_KO and LABEL:sec:numerical. To rectify the -inconsistency problem, a modification of the KO calibration is proposed in Section LABEL:Sec_ModifiedKO by introducing a scale parameter into its correlation function. When the scale parameter converges to at a certain rate, -consistency is restored (see Theorem LABEL:Th_calibration). The convergence rate of the original (unmodified) KO calibration is given in Theorem LABEL:Th_rateKO of Section LABEL:sec:rate. To achieve both -consistency and optimal convergence rate, we introduce a new method called least distance calibration in Section LABEL:Sec_L2deterministic and prove such properties in Theorem LABEL:Th_L2 for the case of cheap code, i.e., when the computer code can be evaluated with no cost. Its extension to expensive code is given in Theorem LABEL:Th_expensive of Section LABEL:Sec_Expensive. The convergence rate is slower than that in Theorem LABEL:Th_L2 because, in expensive code, there is cost in evaluating the code and an unknown function associated with the code needs to be estimated from data. Concluding remarks are given in Section LABEL:sec:discussion. Technical details are given in two appendices. Throughout the paper, mathematical tools and results in native space [27] are extensively used.

## 2 Preliminaries

For a convex and compact region , let be the set of continuous functions over . For a multiple index , define . Given and function , we denote the partial derivatives of by

 Dαf:=∂|α|∂xα11⋯∂xαddf.

For integer , define . For a function over , define the norm as and the Sobolev norm as

 ∥f∥Hk(Ω)=√∑|α|≤k∥Dαf∥2L2(Ω). \hb@xt@.01(2.1)

The Sobolev space consists of functions with finite norm value. The definition of the Sobolev spaces can be extended to the case where is a real number. Such spaces are called the fractional Sobolev spaces and we refer to [1] for details.

Functional approximation methods play an important role in the estimation of the calibration parameters. In this article, we consider the method of kernel interpolation [10]. This method provides a good functional approximation when the design consists of scattered points, i.e., the design points do not have any regular structure.

Suppose is a smooth function over and are observed. A kernel interpolator is built as follows. First choose a symmetric positive definite function over . Two common choices for are the squared exponential family (also referred to as the Gaussian correlation family), with

 Φ(s,t;ϕ)=exp{−ϕ∥s−t∥2} \hb@xt@.01(2.2)

and the Matérn family [24], with

 Φ(s,t;ν,ϕ)=1Γ(ν)2ν−1(2√νϕ∥s−t∥)νKν(2√νϕ∥s−t∥), \hb@xt@.01(2.3)

where is the modified Bessel function of the second kind. Let . Since the function is positive definite, the matrix is also positive definite. Thus the linear system about

 Y=Φu \hb@xt@.01(2.4)

has a unique solution , where . For , let

 ^y(x)=n∑i=1uiΦ(x,xi). \hb@xt@.01(2.5)

It can be verified that indeed interpolates ’s.

We call the kernel stationary if depends only on the difference . Another special case of the kernel interpolation is the radial basis function interpolation [5], in which the kernel function depends only on the distance as in (LABEL:Gaussian) or (LABEL:matern). The choice of the kernel function is critical to the performance of the interpolation. Cross-validation is a common method for choosing a suitable kernel function, see [22, 20].

In computer experiments, Gaussian process models are widely used as surrogate models for unknown functions [21]. There is a known relationship between the kernel interpolation and the Gaussian process prediction [2]. Suppose is a Gaussian process on with mean 0 and covariance function . Then given , the predictive mean of for any is

 E[z(x)|Z]=Φ(x,x)TΦ−1Z, \hb@xt@.01(2.6)

where . It can be seen that in (LABEL:interpolator) has the same form as the predictive mean in (LABEL:Gaussianprocess). For details, see the book [22].

## 3 Calibration Problem

Suppose we have a physical system with a vector of control variables as its input. Denote the input domain of by , which is a convex and compact subset of . The response of this system given is denoted as . We call the physical system deterministic if is a fixed value for each , and stochastic if is random for some . To study the response surface, we conduct experiments on a selected set of points . The set is called the experimental design or design for brevity.

We also have a computer code to simulate the physical system. The input of this computer code consists of two types of variables: the control variable and the calibration variable . The latter represents inherent attributes of the physical system, which cannot be controlled in the physical experiment. Denote the input domain for by , which is a compact subset of . The computer code gives a deterministic function of and , denoted as .

Computer experiments are usually much less costly than the corresponding physical experiments. In an ideal situation, a computer run only takes a short time so that we can run the computer code as many times as we want. Mathematically speaking, we call a computer code cheap if the functional form for is known. However, computer runs may be time-consuming so that we can only evaluate on a set of design points. In this case, an estimate based on the observed values (and the corresponding input values) is needed and we call the computer code expensive.

In many cases, the true value of the calibration parameters cannot be measured physically. For instance, material properties like porosity and permeability are important computer inputs in computational material simulations, which cannot be measured directly in physical experiments. A standard approach to identify those parameters is to use physical responses to adjust the computer outputs. Use of the physical response and computer output to estimate the calibration parameters is referred to as calibration.

### 3.1 Kennedy-O’Hagan Method

Kennedy and O’Hagan [18] was the first to propose a Bayesian framework for the estimation of the calibration parameters. The original version of the Kennedy-O’Hagan method works for stochastic systems with expensive computer codes. Denote the physical response by , for . Kennedy and O’Hagan [18] supposes that the physical response follows independent normal distribution. Specifically, they suggests that

 yp(xi)=ζ(xi)+ei, \hb@xt@.01(3.1)

where and with an unknown . Kennedy and O’Hagan [18] denotes the “true” value of the calibration parameter by and proposes the following model to link and

 ζ(⋅)=ρys(⋅,θ0)+δ(⋅), \hb@xt@.01(3.2)

where is an unknown regression coefficient, is an unknown discrepancy function. Kennedy and O’Hagan [18] claims that is a nonzero function because the computer code is built based on certain assumptions or simplifications which do not match the reality exactly. Thus and are related via the model:

 yp(x)=ρys(x,θ0)+δ(x)+e. \hb@xt@.01(3.3)

As is typically done in the literature of computer experiments, they assume that and are independent realizations of Gaussian processes. The use of Gaussian process modeling in computer experiment problems can be traced back to [21]. In Gaussian process modeling, we usually choose the Gaussian or Matérn correlation functions (see (LABEL:Gaussian) and (LABEL:matern)) and regard the parameters like and as unknown models parameters. Then can be estimated from (LABEL:KOmodel) through a Bayesian approach.

### 3.2 L2 Distance Projection

The aim of this work is to establish a theoretical framework for the calibration problems from a frequentist point of view, i.e., we regard in (LABEL:KOdeterministic) as deterministic functions. For simplicity, we rewrite (LABEL:KOdeterministic) as

 ζ(⋅)=ys(⋅,θ0)+δ(⋅). \hb@xt@.01(3.4)

This does not make much difference because we can regard the term as the computer output with calibration parameters .

From now on, we suppose the physical system is deterministic, i.e., or equivalently . Then under the framework of [18], the calibration problem can be formulated as

 yp(x)=ys(x,θ0)+δ(x), \hb@xt@.01(3.5)

where is the “true” value of the calibration parameter and is the discrepancy between and .

From a frequentist perspective, in (LABEL:lackindentifiability) is unidentifiable, because the pair cannot be uniquely determined even if and are known. This identifiability issue is discussed in [3, 4, 13] and other papers.

The main purpose of this section is to provide a rigorous theoretical study on the estimation of calibration parameters. Given the lack of identifiability for , we need to find a well-defined parameter in order to study the estimation problem. A standard approach when the model parameters are unidentifiable is to redefine the “true” parameter value as one that minimizes the “distance” between the model and the observed data. First define

 ϵ(x,θ):=yp(x)−ys(x,θ). \hb@xt@.01(3.6)

Here we adopt the norm in defining the distance. If another distance measure is chosen, the proposed mathematical framework can be pursued in a similar manner.

###### Definition 3.1

The distance projection of is given by

 θ∗=argminθ∈Θ∥ϵ(⋅,θ)∥L2(Ω), \hb@xt@.01(3.7)

where is defined in (LABEL:epsilon). For brevity, we will also refer to as the projection.

In Definition LABEL:Def_1 we find a value that minimizes the discrepancy between the physical and computer observations, because the true value of is not estimable. The value given by (LABEL:least_distance) minimizes the average predictive error given by the computer code. This is relevant and useful since our interest lies in the prediction of the physical response. One justification for choosing the norm comes from the common use of the quadratic loss criterion. Suppose we want to predict the physical response at a new point and is uniformly distributed over . Then the expected quadratic loss given is

 ∫Ω(yp(x)−ys(x,θ))2dx=∥ϵ(⋅,θ)∥2L2(Ω). \hb@xt@.01(3.8)

Thus the value minimizing is the distance projection .

The value of depends on the norm that is used to measure the discrepancy between the physical and the computer observations. If the norm is generalized to an norm or some weighted version, the results in the paper are not affected. Detailed discussion on this will be deferred to Section LABEL:sec:discussion. In (LABEL:least_distance) we implicitly assumes that the (global) minimizer of is unique. We believe that this is a reasonable assumption in many calibration problems. The definition of the projection also put the parameter estimation problem into an “engineering validation” framework. However, we still keep the wording “calibration” because it is the standard terminology since the foundation work by Kennedy and O’Hagan [18]. For a related discussion, we refer to [7, 17].

Since the functional forms for and are unknown, cannot be obtained by solving (LABEL:least_distance). For the problems with cheap computer code, we know and the function values of over a set of design points, denoted as . For the problems with expensive computer code, we know only and the function values of over the design points for the computer simulation, denoted as . Call a (deterministic) estimator of , if depends only on for cheap code or on for expensive code. For fixed and , let be a sequence of estimates given by a sequence of designs (given by either or ). Then is said to be -consistent if tends to as the designs become dense over or . The term “consistent” or “consistency” is a misnomer but we keep it here because of its statistical implication.

## 4 Frequentist Properties of the Kennedy-O’Hagan Model

In this section we examine the frequentist properties of the calibration model by [18]. Our theoretical analysis shows that the method is -inconsistent. We also construct some examples to show that the Kennedy-O’Hagan method may produce unreasonable answers.

### 4.1 Simplified KO calibration

In order to conduct a rigorous mathematical analysis, we make the following simplifications to the Kennedy-O’Hagan method. We refer to this simplified version as the simplified KO calibration, or KO calibration for brevity.

1. The computer code is cheap.

2. The physical system is deterministic.

3. Without loss of generality, we can assume (, otherwise the unknown can be regarded as a calibration parameter). The discrepancy function is a realization of a Gaussian process with mean 0 and the covariance function , where is an unknown parameter and the function is known.

4. Maximum likelihood estimation (MLE) is used to estimate instead of Bayesian analysis.

The assumption (i) on cheap code will be relaxed for the calibration in Section LABEL:Sec_Expensive. The assumption (ii) on deterministic physical experiment can be relaxed but will require a separate treatment. Further remarks are deferred to the end of Section LABEL:sec:discussion. In assumption (iv), we only consider the likelihood portion of the Bayesian formulation in order to have a clean and workable mathematical analysis. As will be discussed in Section LABEL:sec:discussion, this simplification should not affect the general message one may draw regarding the original Bayesian formulation. Not to break the flow, further comments and justifications for the assumptions will be deferred to the concluding section.

Under these assumptions, the functions are known for . Then the likelihood function given in [18] can be simplified and it can be shown that the log-likelihood function for here is given by

 l(θ,σ2;Y)=−n2logσ2−12log|Φ|−12σ2ϵ(x,θ)TΦ−1ϵ(x,θ), \hb@xt@.01(4.1)

where , and denotes the determinant of . For details on the likelihood functions of Gaussian process models, we refer to [22, 20].

Our study will employ the reproducing kernel Hilbert spaces (also called the native spaces) as the mathematical tool [27]. Given a symmetric and positive definite function , define the linear space

 FΦ(Ω)={N∑i=1βiΦ(⋅,xi):N∈\mathdsN,βi∈R,xi∈Ω}

and equip this space with the bilinear form

 ⟨N∑i=1βiΦ(⋅,xi),M∑j=1γjΦ(⋅,yj)⟩Φ:=N∑i=1M∑j=1βiγjΦ(xi,yj).

Define the native space as the closure of under the inner product . The inner product of , denoted as , is induced by . Define the native norm as . Some required properties of reproducing kernel Hilbert spaces are given in Appendix LABEL:App_RKHS under Propositions LABEL:prop_power_function-LABEL:prop_scale. Their equations are numbered as (LABEL:integral_equation) to (LABEL:norm_bound).

Direct calculation shows that the maximum likelihood estimation (MLE) for is

 ^θKO=argminθ∈Θϵ(x,θ)TΦ−1ϵ(x,θ). \hb@xt@.01(4.2)

For fixed , let be the kernel interpolator for given by (LABEL:interpolator), i.e.,

 ^ϵ(⋅,θ)=Φ(⋅,x)TΦ−1ϵ(x,θ). \hb@xt@.01(4.3)

From the definition of the native norm, we have

 ∥^ϵ(⋅,θ)∥2NΦ(Ω)=ϵ(x,θ)TΦ−1ΦΦ−1ϵ(x,θ)=ϵ(x,θ)TΦ−1ϵ(x,θ),

which, together with (LABEL:MLE), gives

 ^θKO=argminθ∈Θ∥^ϵ(⋅,θ)∥2NΦ(Ω). \hb@xt@.01(4.4)

Now we study the asymptotic properties for the KO calibration model. To this end, we require the design points to become dense over . This property is measured by the fill distance.

###### Definition 4.1

For a design , define its fill distance as

 h(D):=maxx∈Ωminxi∈D∥xi−x∥. \hb@xt@.01(4.5)

We use to denote the estimator under . Theorem LABEL:Th_theta1 gives the limiting value of .

###### Theorem 4.2

Suppose there exists , such that

 ϵ(x,θ)=∫ΩΦ(x,t)vθ(t)dt \hb@xt@.01(4.6)

for any . Moreover, suppose , has continuous second order derivatives, and there exists a unique such that

 ∥ϵ(⋅,θ′)∥NΦ(Ω)=infθ∈Θ∥ϵ(⋅,θ)∥NΦ(Ω). \hb@xt@.01(4.7)

Then , provided that as .

The condition implies that lies in a subset of . See (LABEL:integral_equation) and (LABEL:v) in Appendix LABEL:App_RKHS and [27] for further discussions.

### 4.2 L2 norm or native norm?

Because in general , according to Definition LABEL:Def_1, the KO calibration is not -consistent. A bigger issue is whether is an appropriate definition for the calibration parameters. For example, suppose we adopt in (LABEL:thetaprime) as the “true” calibration parameters. Then the KO calibration can be declared consistent according to Theorem LABEL:Th_theta1. The question is whether can be used as a legitimate definition for the calibration parameters. Here we remind that Kennedy and O’Hagan [18] states the goal of calibration is “… adjusting the unknown parameter until the outputs of the (computer) model fit the observed data”. However, we will give an example, backed by mathematical theory, to show that the result given by KO calibration may not agree with their original purpose.

In view of the convergence result in Theorem LABEL:Th_theta1, we first study how different the limiting value of the KO calibration is from . To address this question, we consider the difference between the two norms and . This difference is related to the eigenvalues of the integral operator defined as

 κ(f)=∫ΩΦ(⋅,x)f(x)dx, \hb@xt@.01(4.8)

for . Denote the eigenvalues of by . Let be the eigenfunction associated with with . Then it can be shown that

 ∥fi∥2NΦ(Ω)=⟨fi,λ−1ifi⟩2L2(Ω)=λ−1i, \hb@xt@.01(4.9)

where the first equality follows from (LABEL:v) and the fact that . It is known in functional analysis that is a compact operator and therefore [8]. Thus yields that as . This leads to

 supf∈NΦ(Ω)∥f∥NΦ(Ω)∥f∥L2(Ω)=∞, \hb@xt@.01(4.10)

which implies that there are functions with arbitrarily small norm while their norm is bounded away from zero. Therefore, by Definition LABEL:Def_1, the KO calibration can give results that are far from the distance projection. The following example shows that this effect can indeed be dramatic.

###### Example 1

Consider a calibration problem with a three-level calibration parameter. Let , . By using a numerical method, we obtain the eigenvalue and eigenfunction of defined in . The first and second eigenvalues are and . For a better visual effect, we use the eigenfunctions whose norms are . We plot the first and second eigenfunctions of in Figure LABEL:Fig_eigen. We also plot for later comparison. Suppose we have three different computer codes. Denote the discrepancy between the physical response and each of the computer output by , and respectively. Suppose are the three functions given in Figure LABEL:Fig_eigen, i.e., and are the first and second eigenfunction of , and is . Then which code is the best? From and , the third computer code is the best according to Definition LABEL:Def_1.

However, by using a Gaussian process model with the same correlation function , we get a different result. By , maximizing the likelihood function is equivalent to minimizing the pivoted sum of square (PSS): , where for . We choose a space-filling design of 11 points, given by for . The PSSs are for , for , and for . Thus the KO calibration will choose the first code because it has the smallest PSS value. This demonstrates that the likelihood-based method can give very different rankings of the competing codes from the projection. From Figure LABEL:Fig_eigen we can see that is smaller than and for every , i.e., the point-wise prediction error for the third code is uniformly smaller than the first two. Therefore, the KO calibration made a wrong choice. This also gives a good justification for choosing the norm rather than the native norm in Definition LABEL:Def_1.

To give a more general explanation for the phenomenon in Example LABEL:ex_eigen, we consider the situations where the Matérn correlation functions defined by (LABEL:matern) are used. Corollary LABEL:coro_matern in Appendix LABEL:App_RKHS shows that for the Matérn correlation functions, the reproducing kernel Hilbert space equals to the (fractional) Sobolev space and the two norms are equivalent for , where the Sobolev norm is defined by (LABEL:sobolev). Note that the Sobolev norm can be big for a function with wild oscillation even when its norm is small (the same as that shown by (LABEL:unbounded)). Thus, the Sobolev norm, which is equivalent to the native norm in this context, is not a good measure of discrepancy, because we only care about the magnitude of the discrepancy, not its oscillation. Therefore it is not suitable to use the KO calibration in most practical problems. For Gaussian correlation function, this problem is even more serious because the reproducing kernel Hilbert spaces generated by Gaussian kernels can be embedded into any Sobolev space (which can be shown by Proposition LABEL:prop_fourier).

The phenomenon shown in Example LABEL:ex_eigen can also be interpreted with the help of the Karhunen-Loève expansion of Gaussian processes. Suppose is a Gaussian process with mean 0 and covariance function . Let be the eigenvalues of integral operator in (LABEL:kappa) with and be the eigenfunction associated with with . Then the Karhunen-Loève theorem states that admits the follow expansion

 z(x)=∞∑i=1λiZifi(x), \hb@xt@.01(4.11)

where ’s are independent and identically distributed standard normal random variables, and the convergence is in . The expression (LABEL:KLexpansion) explains why is more likely to be a realization of among other functions with the same norm, i.e., yields the greatest likelihood value. To see this, we truncate (LABEL:KLexpansion) at a sufficiently large , denoted as , and obtain

 z(x)≈K∑i=1λiZifi(x). \hb@xt@.01(4.12)

Since forms an orthogonal basis in , we can approximate any function in by

 f(x)≈K∑i=1⟨f,fi⟩L2[−1,1]fi(x).

Thus the statement “ is a realization of ” approximately yields . This event has a probability density proportional to

 p(λiZi=⟨f,fi⟩L2[−1,1]):=exp{−K∑i=1⟨f,fi⟩2L2[−1,1]/(2λ2i)}, \hb@xt@.01(4.13)

because . Thus (LABEL:likelihoodf) can be regarded as a multiply of the probability density of sampling from approximately. Now we can find the function with the largest density value among a set of with the same norm, say without loss of generality. Let us assume , which is true for the covariance function we discussed in Example LABEL:ex_eigen. Because , the function maximizes should satisfy and for . Such a function is .

### 4.3 Numerical Study on Kennedy-O’Hagan Method with Estimated Correlation Function

To give a more realistic comparison, we extend the study in Example LABEL:ex_eigen by considering the frequentist version of the original Kennedy-O’Hagan method, in which the function is estimated as well. Specifically, we suppose that depends on a model parameter , denoted as . Then the log-likelihood function is

 ~{}~{}~{}~{}~{}~{}~{}l(θ,σ2,ϕ;Y)=−n2logσ2−12log|Φϕ|−12σ2ϵ(x,θ)TΦ−1ϕϵ(x,θ), \hb@xt@.01(4.14)

where . By substituting the analytical form of into (LABEL:philikelihood), we obtain the log-likelihood function with respect to :

 l(θ,ϕ;Y)=−n2logϵ(x,θ)TΦ−1ϕϵ(x,θ)−12log|Φϕ|. \hb@xt@.01(4.15)

As in Section LABEL:Sec_KO, we estimate by using the maximum likelihood. In this subsection we present some numerical results, which show that even when is estimated, the frequentist KO method still suffers from the problem demonstrated in Example LABEL:ex_eigen.

###### Example 1 (continued)

We use the same true functions and design points as in Example LABEL:ex_eigen. We compute the log-likelihood functions given in (LABEL:phiMLE) with . Denote the log-likelihood function in (LABEL:phiMLE) by , where correspond to the candidate functions . The functions are plotted in Figure LABEL:Fig_likelihood for . From the figure we can see that . Therefore the frequentist KO method will pick , which gives the same (incorrect) result as in Example LABEL:ex_eigen.

It can be seen from Figure (LABEL:Fig_likelihood) that the log-likelihood functions and are monotonic decreasing. This suggests that if ranges over , the MLE of can be even smaller. Unfortunately, we are not able to calculate the likelihood value for a very small because the correlation matrix becomes nearly singular. Our current numerical experience show that likelihood value keeps growing as decreases. We conjecture that the likelihood function is unbounded in this case, although we are not able to prove it so far. From Figure (LABEL:Fig_likelihood) it can also be seen that if we fix a relatively large , say , the likelihood values give a correct order of the discrepancy. This is not occasional. In Section LABEL:Sec_ModifiedKO we will prove that such a modified version of the KO model leads to -consistent estimation.

## 5 Asymptotic Results: Cheap Code

Theorem LABEL:Th_theta1 is the first asymptotic result we present in this article. In this section we shall study other convergence properties, assuming that the computer code is cheap as in Section LABEL:Sec_KO.

### 5.1 Modified KO Calibration

Given the wide-spread use of the Gaussian process modeling in calibration problems (as in the KO model), a fundamental question is whether we can modify it to rectify its -inconsistency problem. For convenience we assume a stationary Gaussian process model . The correlation of is given by

 R(x)=Corr(Y(⋅+x),Y(⋅)),

where is a positive definite kernel over . The Fourier transform [23] is a useful tool for studying stationary kernels. We will use the notation instead of for simplicity.

###### Definition 5.1

For define the Fourier transform by

 ~f(w):=(2π)−d/2∫Rdf(x)e−iw%Txdx.

From Theorem LABEL:Th_theta1, it can be seen that the KO calibration is not -consistent if the correlation function is fixed. In order to construct -consistent calibration, we should use a sequence of functions indexed by , denoted by . From (LABEL:v), the norm becomes only when , where denotes the Dirac delta function. We need the convergence in order to obtain -consistency. An easy way to achieve this convergence is to introduce a scale parameter. Suppose is a family of correlation functions on with . Call a scale parameter if for any and any . Most correlation families like Gaussian or Matérn family satisfy these conditions.

Write . Let be the estimate of given by the KO calibration using correlation function under design , referred to as the modified KO calibration. The -consistency requires . But to ensure the convergence of the interpolation, cannot diverge too fast. The next theorem suggests that the modified KO calibration is -consistent if we choose a suitable increasing rate for . Define the convolution for any . We list the required regularity conditions before stating the theorem.

1. , where is the gradient of with respect to .

2. There exists such that .

3. is integrable and , where is the fourier transform of .

###### Theorem 5.2

Suppose conditions (A1-A3) are satisfied and has continuous derivatives. Then if and .

Although the modified KO calibration is -consistent, its implementation relies on a prespecified sequence . For a calibration problem with a fixed sample size, there is no theoretical guidelines on choosing the value. A more practical procedure is given in Section LABEL:Sec_L2deterministic, namely, a novel calibration method that is -consistent and does not rely on the choice of the kernel function.

### 5.2 Convergence Rate

Under the assumption of Theorem LABEL:Th_theta1, we can employ (LABEL:improved) in Proposition LABEL:prop_power_function to show that the interpolation error given by a kernel with derivatives is equivalent to . Given this rate, Theorem LABEL:Th_rateKO shows that the KO calibration, which converges to in Theorem LABEL:Th_theta1, reaches the same rate. Let .

###### Theorem 5.3

Under the conditions of Theorem LABEL:Th_theta1, suppose has continuous derivatives. We assume that is an interior point of and there exist a neighborhood of , and functions such that

 ∂ϵ∂θi(x,θ)=∫ΩΦ(x,t)Divθ(t)dt, \hb@xt@.01(5.1) ∂2ϵ∂θi∂θj(x,θ)=∫ΩΦ(x,t)Dijvθ(t)dt, \hb@xt@.01(5.2)

for and . Moreover,

 supθ∈U,1≤i,j≤q{∥Divθ∥L2(Ω),∥Dijvθ∥L2(Ω)}<∞, and \hb@xt@.01(5.3) ∂2∂θ∂θT∥ϵ(⋅,θ′)∥2NΦ(Ω) is % invertible. \hb@xt@.01(5.4)

Then .

The conditions (LABEL:v1) and LABEL:v2 enhance the condition (LABEL:vexist) in Theorem LABEL:Th_theta1 by assuming the differentiability of and the interchangeability of a derivative and an integral (differentiation under the integral sign; i.e.,Leibniz integral rule).

Noting that consistency is a necessary requirement for an estimator, we would like to find an estimator that is consistent and attains the same convergence rate as in Theorem LABEL:Th_rateKO. In the following subsection we find an estimator that guarantees -consistency and full efficiency.

### 5.3 Least L2 Distance Calibration

Let be the kernel interpolator defined in for under design . Define the least distance calibration by

 ^θL2(D)=argminθ∈Θ∥^yp(⋅)−ys(⋅,θ)∥L2(Ω). \hb@xt@.01(5.5)

For brevity, we will also refer to it as the calibration. [13] used the norm in a different context, i.e., choosing an optimal tuning parameter value to minimize the norm of the observed discrepancy . Theorem LABEL:Th_L2 shows that converges to the projection at the optimal rate.

###### Theorem 5.4

Suppose is the unique solution to (LABEL:least_distance) and an interior point of ; ; is invertible; has continuous derivatives; and there exists a neighborhood of such that and for . Then as ,

 ∥^θL2(Dn)−θ∗∥=O(hk(Dn)). \hb@xt@.01(5.6)

Furthermore, if there exists such that for all , then the convergence rate can be improved to

 ∥^θL2(Dn)−θ∗∥=O(h2k(Dn)). \hb@xt@.01(5.7)

By comparing the results and conditions in Theorems LABEL:Th_rateKO and LABEL:Th_L2, we can make the following observations. First, and enjoy the convergence rate under similar conditions. Second, the calibration has the additional property that, even under much less restrictive conditions, it still enjoys convergence, though at a slower rate. But this slower rate is optimal under these conditions because the interpolator has the same convergence rate given by (LABEL:ordinary).

## 6 Least L2 Distance Calibration for Expensive Code

Now we turn to the case of expensive computer code for which cannot be evaluated for infinitely many times. In this situation we need another surrogate model for . Note that the input space for a computer run is . Let be the set of design points for the computer experiment with its fill distance . Suppose is convex. Choose a positive definite function over . For kernel and design , let be the interpolate for defined by (LABEL:interpolator). Then we can define the calibration in a similar way:

 ^θL2(D,G) := argminθ∈Θ∥^yp(⋅)−^ys(⋅,θ)∥L2(Ω). \hb@xt@.01(6.1)

Note that the only difference from the definition in (LABEL:L2calibration) is the replacement of by its interpolate in (LABEL:L2calibrationexpensive).

We want to study the asymptotic behavior of the calibration for expensive computer code. First we need to extend the definition of in (LABEL:least_distance) to

 θ∗(G) := argminθ∈Θ∥yp(⋅)−^ys(⋅,θ)∥L2(Ω).

Then we have

 ∥^θL2(D,G)−θ∗∥≤∥^θL2(D,G)−θ∗(G)∥+∥θ∗(G)−θ∗∥. \hb@xt@.01(6.2)

If we regard the interpolate as the true computer output, can be viewed as an “ projection”. Following similar steps as in the proof of Theorem LABEL:Th_L2, we can prove that

 ∥^θL2(Dn,Gn)−θ∗(Gn)∥=O(hk(Dn)) \hb@xt@.01(6.3)

under the same conditions in Theorem LABEL:Th_L2. It remains to find a bound for . The following theorem shows that its rate is slower than that in Theorem LABEL:Th_L2.

###### Theorem 6.1

Suppose is the unique solution to (LABEL:least_distance); has continuous derivatives with ; is an interior point of ; ; and is invertible. Then as .

Theorem LABEL:Th_expensive, together with (LABEL:triangle)-(LABEL:expensiveterm1), yields the following result on the convergence rate of the calibration for expensive computer code.

###### Theorem 6.2

Under the assumptions of Theorems LABEL:Th_L2 and LABEL:Th_expensive,

 ∥^θL2(Dn,Gn)−θ∗∥=O(max(hk(Dn),hk