Physics-Informed CoKriging: A Gaussian-Process-Regression-Based Multifidelity Method for Data-Model Convergence
Abstract
In this work, we propose a new Gaussian process regression (GPR)-based multifidelity method: physics-informed CoKriging (CoPhIK). In CoKriging-based multifidelity methods, the quantities of interest are modeled as linear combinations of multiple parameterized stationary Gaussian processes (GPs), and the hyperparameters of these GPs are estimated from data via optimization. In CoPhIK, we construct a GP representing low-fidelity data using physics-informed Kriging (PhIK), and model the discrepancy between low- and high-fidelity data using a parameterized GP whose hyperparameters are identified via optimization. Our approach reduces the cost of optimization for inferring hyperparameters by incorporating partial physical knowledge. We prove that physical constraints in the form of deterministic linear operators are satisfied up to an error bound. Furthermore, we combine CoPhIK with a greedy active learning algorithm for guiding the selection of additional observation locations. The efficiency and accuracy of CoPhIK are demonstrated by reconstructing the partially observed modified Branin function, reconstructing the sparsely observed state of a steady-state heat transport problem, and learning a conservative tracer distribution from sparse tracer concentration measurements.
Keywords: physics-informed, Gaussian process regression, CoKriging, multifidelity, active learning, error bound.
1 Introduction
Gaussian processes (GPs) are a widely used tool in applied mathematics, statistics, and machine learning for regression, classification, and optimization [14, 40, 44]. GP regression (GPR), also known as Kriging in geostatistics, constructs a statistical model of a partially observed process by assuming that its observations are a realization of a GP. A GP is uniquely described by its mean and covariance function (also known as a kernel). In standard (referred to here as data-driven) GPR, parameterized forms of the mean and covariance functions are usually assumed, and the hyperparameters of these functions (e.g., variance and correlation length) are estimated from data by maximizing the log marginal likelihood of the data. GPR is closely related to kernel machines in machine learning, but it provides a richer characterization of the result, as it includes uncertainty estimates [48]. GPs are also connected to infinite neural networks, that is, networks with an infinite number of hidden units [29].
There are several variants of GPR, including simple, ordinary, and universal Kriging [21]. Ordinary Kriging is the most widely used GPR method. It assumes stationarity of the random field, including constant mean and variance, and a prescribed stationary covariance function. The stationarity assumption reduces the number of hyperparameters and the model complexity. In contrast, in universal Kriging, the mean is modeled as a linear combination of basis functions [1], which increases the number of unknown parameters and may lead to non-convex optimization problems. Although the assumption of stationarity may not be suitable for some applications, it is often necessary because there are usually not enough data to compute accurate estimates of nonstationary mean and covariance functions. Progress has been made in incorporating physical knowledge into kernels, e.g., [42, 18, 6, 7, 38, 39], by computing kernels for systems governed by linear and weakly nonlinear (allowing accurate linearization) ordinary and partial differential equations. Such kernels are computed by substituting a GPR approximation of the system's state variables into the governing equation and obtaining a system of equations for the kernel hyperparameters. For complex linear systems, computing the kernel in this way can become prohibitively expensive, and for strongly nonlinear systems, it may not be possible at all.
In our previous work [52], we proposed the physics-informed Kriging method (PhIK), which incorporates (partial) physical knowledge into GPR. In modeling complex systems, it is common to treat unknown parameters and fields as random parameters and fields, and the resulting realizations of the state of the system are employed to study the uncertainty of the model or the system. The goal of PhIK is to exploit the information about the system provided by these realizations to assimilate observations. In PhIK, such random realizations are used to compute the prior mean and covariance. A similar idea is used in the ensemble Kalman filter (EnKF) [13], whose "filtering step" formula is equivalent to the PhIK prediction, Eq. (2.15). Whereas EnKF introduces uncertainty mainly through the observation noise and evolves an ensemble of state variables drawn from the posterior distribution of the previous time step, PhIK utilizes the stochasticity in models and directly uses simulation outputs for prediction, without redrawing the ensemble at each time step. Not only does PhIK provide a prediction or reconstruction in the form of the posterior mean, it also performs uncertainty reduction (UR) in the form of the posterior variance. More importantly, the PhIK posterior mean satisfies linear physical constraints with a bounded error [52], which is critical for guaranteeing the predictive value of the method. The main drawback of PhIK is that it is highly dependent on the physical model, because the prior mean and covariance are determined entirely by the model and are not informed by data. Therefore, if the physical model is incorrect, the convergence of PhIK to the true solution with an increasing number of observations is slower than that of data-driven GPR.
In this work, we propose a physics-informed CoKriging (CoPhIK) method, an extension of the CoKriging-based multifidelity framework [20, 15] to physics-informed Kriging. In this context, the direct observations of a physical system are considered high-fidelity data, and the stochastic physical model outputs are treated as low-fidelity data. CoPhIK uses PhIK to construct a GP $Y_{\mathrm{L}}$ that regresses the low-fidelity data, and uses another parameterized GP $Y_{\mathrm{d}}$ to model the discrepancy between low- and high-fidelity data by assuming a specific kernel; it then infers the hyperparameters of $Y_{\mathrm{d}}$ via optimization. Subsequently, CoPhIK uses a linear combination of $Y_{\mathrm{L}}$ and $Y_{\mathrm{d}}$ to represent the high-fidelity data. The mean and covariance in CoPhIK integrate physical model outputs and observation data; therefore, CoPhIK is expected to be more accurate than PhIK in some applications (e.g., the first two numerical examples in Section 3). On the other hand, due to the introduction of the GP $Y_{\mathrm{d}}$, CoPhIK may lose some capacity for satisfying physical constraints relative to PhIK, as will be shown in the error estimate provided by Theorem 2.2.
This work is organized as follows: Section 2 summarizes the GPR framework and physicsinformed Kriging (Sections 2.1 to 2.3), and introduces the CoPhIK method (Section 2.4). Section 3 provides three numerical examples to demonstrate the efficiency of the proposed method. Conclusions are presented in Section 4.
2 Methodology
We begin this section by reviewing the general GPR framework [48], the ordinary Kriging method based on the assumption of a stationary GP [14], and the PhIK method [52]. Then, we introduce the modified PhIK and CoPhIK methods.
2.1 GPR framework
We consider the spatial dependence of a scalar state of a physical system. Let $D \subseteq \mathbb{R}^d$, $d \in \mathbb{N}$, be the spatial domain, $y(x): D \to \mathbb{R}$ denote the state of interest, and let $y^{(1)}, \ldots, y^{(N)}$ denote observations of $y$ collected at the observation locations $\mathbf{X} = \{x^{(i)}\}_{i=1}^{N}$, where $y^{(i)} = y(x^{(i)})$. The observations are arranged into the observation vector $\mathbf{y} = \big(y^{(1)}, \ldots, y^{(N)}\big)^\top$. We aim to predict $y$ at any new location $x^* \in D$. The GPR method assumes that the observation vector $\mathbf{y}$ is a realization of the $N$-dimensional random vector $\mathbf{Y} = \big(Y(x^{(1)}), \ldots, Y(x^{(N)})\big)^\top$ with multivariate Gaussian distribution,
where $Y(x, \omega): D \times \Omega \to \mathbb{R}$ is a GP defined on the probability space $(\Omega, \mathcal{F}, P)$. Of note, the spatial coordinate $x$ can be considered as a parameter for the GP, such that $Y(x, \cdot)$ is a Gaussian random variable for any $x \in D$. For brevity, we denote $Y(x, \cdot)$ by $Y(x)$. The GP $Y$ is usually represented in GP notation as
(2.1) $Y(x) \sim \mathcal{GP}\big(\mu(x), k(x, x')\big),$
where $\mu(x)$ and $k(x, x')$ are the mean and covariance functions
(2.2) $\mu(x) = \mathbb{E}\big[Y(x)\big],$
(2.3) $k(x, x') = \mathrm{Cov}\big(Y(x), Y(x')\big) = \mathbb{E}\big[\big(Y(x) - \mu(x)\big)\big(Y(x') - \mu(x')\big)\big].$
The variance of $Y(x)$ is $\sigma^2(x) = k(x, x)$, and its standard deviation is $\sigma(x)$. The covariance matrix of the random vector $\mathbf{Y}$ is then given by
(2.4) $C_{ij} = k\big(x^{(i)}, x^{(j)}\big), \qquad i, j = 1, \ldots, N.$
When the functions $\mu(x)$ and $k(x, x')$ are parameterized, their hyperparameters are identified by maximizing the log marginal likelihood of the observations (see examples in Section 2.2) [48]
(2.5) $\ln L = -\frac{1}{2} \big(\mathbf{y} - \boldsymbol{\mu}\big)^\top C^{-1} \big(\mathbf{y} - \boldsymbol{\mu}\big) - \frac{1}{2} \ln |C| - \frac{N}{2} \ln 2\pi,$
where $\boldsymbol{\mu} = \big(\mu(x^{(1)}), \ldots, \mu(x^{(N)})\big)^\top$.
The GPR prediction at $x^*$ consists of the posterior distribution $y(x^*) \sim \mathcal{N}\big(\hat{y}(x^*), \hat{s}^2(x^*)\big)$, with posterior mean and variance given by
(2.6) $\hat{y}(x^*) = \mu(x^*) + \mathbf{c}^\top C^{-1} \big(\mathbf{y} - \boldsymbol{\mu}\big),$
(2.7) $\hat{s}^2(x^*) = \sigma^2(x^*) - \mathbf{c}^\top C^{-1} \mathbf{c},$
where $\mathbf{c}$ is the vector of covariances
(2.8) $\mathbf{c} = \big(k(x^*, x^{(1)}), \ldots, k(x^*, x^{(N)})\big)^\top.$
In practice, it is common to employ the posterior mean $\hat{y}(x^*)$ as the prediction. The variance $\hat{s}^2(x^*)$ is often called the mean squared error (MSE) of the prediction because $\hat{s}^2(x^*) = \mathbb{E}\big[\big(\hat{y}(x^*) - Y(x^*)\big)^2\big]$ [14]. Consequently, $\hat{s}(x^*)$ is called the root mean squared error (RMSE).
To account for observation noise, one can model the noise as independent and identically distributed (iid) Gaussian random variables with zero mean and variance $\delta^2$, and replace $C$ in Eqs. (2.5) to (2.8) with $C + \delta^2 I$, where $I$ is the $N \times N$ identity matrix. In this study, we assume that observations of $y$ are noiseless. If $C$ is not invertible or its condition number is very large, one can add a small regularization term $\alpha I$, where $\alpha$ is a small positive real number, to $C$ so that it becomes full rank. Adding the regularization term is equivalent to assuming iid observation noise with variance $\alpha$.
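As a concrete illustration, the posterior formulas Eqs. (2.6)-(2.8) together with the regularization term just discussed can be sketched as follows. This is illustrative code, not taken from any particular library; the function and variable names are ours, and the small jitter added to the diagonal plays the role of the regularization term:

```python
import numpy as np

def gpr_posterior(X, y, x_star, mu, kernel, alpha=1e-10):
    """Posterior mean and variance of Y(x*) given observations y at rows of X.

    mu     : mean function, mu(x) -> float
    kernel : covariance function, kernel(x, x') -> float
    alpha  : small jitter added to the diagonal of C for numerical stability
    """
    N = X.shape[0]
    # Covariance matrix of the observations, with regularization
    C = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    C += alpha * np.eye(N)
    # Vector of covariances between x* and the observation locations, Eq. (2.8)
    c = np.array([kernel(x_star, X[i]) for i in range(N)])
    mu_X = np.array([mu(X[i]) for i in range(N)])
    w = np.linalg.solve(C, y - mu_X)
    y_hat = mu(x_star) + c @ w                                # Eq. (2.6)
    s2 = kernel(x_star, x_star) - c @ np.linalg.solve(C, c)   # Eq. (2.7)
    return y_hat, max(s2, 0.0)
```

In the noiseless case the posterior mean interpolates the data: evaluated at an observed location, it returns the corresponding observation and the posterior variance is numerically zero.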
2.2 Stationary GPR
In the widely used ordinary Kriging method, a stationary GP is assumed. Specifically, $\mu(x)$ is set as a constant $\mu$, and $k(x, x') = \sigma^2 c(x, x')$, where $c(x, x')$ is a prescribed correlation function. Consequently, the standard deviation $\sigma$ is a constant. Popular forms of kernels include polynomial, exponential, Gaussian (squared-exponential), and Matérn functions. For example, the Gaussian kernel can be written as $c(x, x') = \exp\big(-\tfrac{1}{2} \|x - x'\|_l^2\big)$, where the weighted norm is defined as $\|x - x'\|_l^2 = \sum_{i=1}^{d} (x_i - x_i')^2 / l_i^2$. The constant $\sigma^2$ and the correlation lengths along each direction, $l_i$, $i = 1, \ldots, d$, are the hyperparameters of the Gaussian kernel.
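The Gaussian kernel with per-direction correlation lengths can be written as a short function (a minimal sketch; the function name and argument conventions are ours):

```python
import numpy as np

def gaussian_kernel(x, xp, sigma2, l):
    """k(x, x') = sigma^2 * exp(-||x - x'||_l^2 / 2), with the weighted norm
    ||x - x'||_l^2 = sum_i (x_i - x'_i)^2 / l_i^2."""
    x, xp, l = np.asarray(x), np.asarray(xp), np.asarray(l)
    return sigma2 * np.exp(-0.5 * np.sum((x - xp) ** 2 / l ** 2))
```

At $x = x'$ the kernel returns the variance $\sigma^2$; it is symmetric in its arguments and decays with distance on the scale set by the correlation lengths $l_i$.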
For a stationary kernel, the covariance matrix of observations can be written as $C = \sigma^2 \Psi$, where $\Psi_{ij} = c\big(x^{(i)}, x^{(j)}\big)$; for the Gaussian kernel, $\Psi_{ij} = \exp\big(-\tfrac{1}{2} \|x^{(i)} - x^{(j)}\|_l^2\big)$. In the maximum likelihood estimation (MLE) framework, the estimators of $\mu$ and $\sigma^2$, denoted as $\hat{\mu}$ and $\hat{\sigma}^2$, are
(2.9) $\hat{\mu} = \frac{\mathbf{1}^\top \Psi^{-1} \mathbf{y}}{\mathbf{1}^\top \Psi^{-1} \mathbf{1}}, \qquad \hat{\sigma}^2 = \frac{(\mathbf{y} - \mathbf{1}\hat{\mu})^\top \Psi^{-1} (\mathbf{y} - \mathbf{1}\hat{\mu})}{N},$
where $\mathbf{1}$ is a vector of $N$ ones. The hyperparameters $l_i$ are estimated by maximizing the log marginal likelihood, Eq. (2.5). The prediction of $y$ at location $x^*$ is
(2.10) $\hat{y}(x^*) = \hat{\mu} + \boldsymbol{\psi}^\top \Psi^{-1} (\mathbf{y} - \mathbf{1}\hat{\mu}),$
where $\boldsymbol{\psi}$ is a vector of correlations between the observed data and the prediction, given by
$\boldsymbol{\psi} = \big(c(x^*, x^{(1)}), \ldots, c(x^*, x^{(N)})\big)^\top,$
and the MSE of the prediction is
(2.11) $\hat{s}^2(x^*) = \hat{\sigma}^2 \big(1 - \boldsymbol{\psi}^\top \Psi^{-1} \boldsymbol{\psi}\big).$
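The ordinary-Kriging estimators and prediction formulas above can be sketched as follows (illustrative code with our own naming; a small jitter is added to the correlation matrix for numerical stability):

```python
import numpy as np

def ordinary_kriging(X, y, x_star, corr):
    """Ordinary Kriging prediction at x_star.

    corr : stationary correlation function c(x, x') with c(x, x) = 1
    Returns the prediction and its MSE.
    """
    N = X.shape[0]
    Psi = np.array([[corr(X[i], X[j]) for j in range(N)] for i in range(N)])
    Psi += 1e-10 * np.eye(N)                                 # jitter
    one = np.ones(N)
    # MLE estimators of the constant mean and the variance
    mu_hat = (one @ np.linalg.solve(Psi, y)) / (one @ np.linalg.solve(Psi, one))
    r = y - mu_hat * one
    sigma2_hat = (r @ np.linalg.solve(Psi, r)) / N
    # Correlations between x_star and the observed locations
    psi = np.array([corr(x_star, X[i]) for i in range(N)])
    y_hat = mu_hat + psi @ np.linalg.solve(Psi, r)           # prediction
    s2 = sigma2_hat * (1.0 - psi @ np.linalg.solve(Psi, psi))  # MSE
    return y_hat, s2
```

For constant data the estimated mean equals that constant and the prediction is exact everywhere, since the residual vector vanishes.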
A more general approach to GPR is to employ parameterized nonstationary covariance kernels. Nonstationary kernels can be obtained by modifying stationary covariance kernels, e.g., [41, 27, 32, 37, 4, 28], or from neural networks with specific activation functions, e.g., [29, 34], among other approaches. Many of these approaches assume a specific functional form for the correlation function, chosen according to expert knowledge. The key computational challenge in these data-driven GPR methods is the optimization step of maximizing the (log marginal) likelihood. In many practical cases, this is a non-convex optimization problem, and the condition number of $C$ or $\Psi$ can be quite large. Another fundamental challenge is that parameterized models for the mean and covariance usually do not account for physical constraints, and therefore require a large amount of data to accurately model the physics.
2.3 PhIK
The recently proposed PhIK method [52] takes advantage of existing expert knowledge in the form of stochastic physics-based models. These stochastic models of physical systems include random parameters or random fields to reflect the lack of understanding (of physical laws) or knowledge (of coefficients, parameters, etc.) of the real system. Monte Carlo (MC) simulations of the stochastic physical model can be conducted to generate an ensemble of state variables, from which the mean and covariance are estimated. These mean and covariance estimates are then employed to construct a GP model of the state variables. As such, there is no need to assume a specific parameterized covariance kernel or to solve an optimization problem for the hyperparameters of the kernel.
Given $M$ realizations of a stochastic model $u(x; \omega)$, $x \in D$, $\omega \in \Omega$, denoted as $\{Y^m(x)\}_{m=1}^{M}$, we build the following GP model:
(2.12) $Y(x) \sim \mathcal{GP}\big(\mu_{\mathrm{MC}}(x), k_{\mathrm{MC}}(x, x')\big),$
where $\mu_{\mathrm{MC}}$ and $k_{\mathrm{MC}}$ are the ensemble mean and covariance functions
(2.13) $\mu_{\mathrm{MC}}(x) = \frac{1}{M} \sum_{m=1}^{M} Y^m(x), \qquad k_{\mathrm{MC}}(x, x') = \frac{1}{M-1} \sum_{m=1}^{M} \big(Y^m(x) - \mu_{\mathrm{MC}}(x)\big)\big(Y^m(x') - \mu_{\mathrm{MC}}(x')\big).$
The covariance matrix of observations can be estimated as
(2.14) $C_{\mathrm{MC}} = \frac{1}{M-1} \sum_{m=1}^{M} \big(\mathbf{Y}^m - \boldsymbol{\mu}_{\mathrm{MC}}\big)\big(\mathbf{Y}^m - \boldsymbol{\mu}_{\mathrm{MC}}\big)^\top,$
where $\mathbf{Y}^m = \big(Y^m(x^{(1)}), \ldots, Y^m(x^{(N)})\big)^\top$ and $\boldsymbol{\mu}_{\mathrm{MC}} = \big(\mu_{\mathrm{MC}}(x^{(1)}), \ldots, \mu_{\mathrm{MC}}(x^{(N)})\big)^\top$. The prediction and MSE at location $x^*$ are
(2.15) $\hat{y}(x^*) = \mu_{\mathrm{MC}}(x^*) + \mathbf{c}_{\mathrm{MC}}^\top C_{\mathrm{MC}}^{-1} \big(\mathbf{y} - \boldsymbol{\mu}_{\mathrm{MC}}\big),$
(2.16) $\hat{s}^2(x^*) = \hat{\sigma}_{\mathrm{MC}}^2(x^*) - \mathbf{c}_{\mathrm{MC}}^\top C_{\mathrm{MC}}^{-1} \mathbf{c}_{\mathrm{MC}},$
where $\hat{\sigma}_{\mathrm{MC}}^2(x^*) = k_{\mathrm{MC}}(x^*, x^*)$ is the variance of the set $\{Y^m(x^*)\}_{m=1}^{M}$, and $\mathbf{c}_{\mathrm{MC}} = \big(k_{\mathrm{MC}}(x^*, x^{(1)}), \ldots, k_{\mathrm{MC}}(x^*, x^{(N)})\big)^\top$.
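The PhIK construction can be sketched as follows: the ensemble mean and covariance are computed directly from the realizations and then used in the standard Kriging prediction formulas. This is illustrative code with our own naming; a small jitter is added because the ensemble covariance matrix is rank-deficient whenever M - 1 < N:

```python
import numpy as np

def phik_predict(Y_ens_obs, Y_ens_star, y, jitter=1e-8):
    """PhIK prediction and MSE at a location x*.

    Y_ens_obs  : (M, N) array; row m holds realization m at the N observation locations
    Y_ens_star : (M,) array; realizations evaluated at x*
    y          : (N,) observation vector
    """
    M, N = Y_ens_obs.shape
    mu = Y_ens_obs.mean(axis=0)              # ensemble mean at observations
    mu_star = Y_ens_star.mean()              # ensemble mean at x*
    D = Y_ens_obs - mu                       # (M, N) deviations from the mean
    C = D.T @ D / (M - 1) + jitter * np.eye(N)          # ensemble covariance matrix
    c = D.T @ (Y_ens_star - mu_star) / (M - 1)          # ensemble covariance with x*
    w = np.linalg.solve(C, y - mu)
    y_hat = mu_star + c @ w                             # PhIK prediction
    s2 = np.var(Y_ens_star, ddof=1) - c @ np.linalg.solve(C, c)  # PhIK MSE
    return y_hat, s2
```

No kernel hyperparameters are fitted: all prior statistics come from the model ensemble.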
It was demonstrated in [52] that PhIK predictions satisfy linear physical constraints up to an error bound that depends on the numerical error, the discrepancy between the physical model and the real system, and the smallest eigenvalue of the matrix $C_{\mathrm{MC}}$. Linear physical constraints include periodic, Dirichlet, or Neumann boundary conditions, and linear equations $\mathcal{L} y(x) = g(x)$, where $\mathcal{L}$ is a linear differential or integral operator. For example, let $u(x; \omega)$ be a stochastic model of the velocity potential for an incompressible flow, i.e., $\nabla \cdot \big(\nabla u(x; \omega)\big) = 0$; then PhIK guarantees that $\nabla \hat{y}(x)$ is a divergence-free field.
In PhIK, the MC simulation of the stochastic physical model for computing $\mu_{\mathrm{MC}}$ and $k_{\mathrm{MC}}$ can be replaced by more efficient approaches, such as quasi-Monte Carlo [30], multilevel Monte Carlo (MLMC) [17], probabilistic collocation [49], analysis of variance (ANOVA) [50], compressive sensing [51], moment-equation and PDF methods [45, 3], and the bi-fidelity method [53]. Linear physical constraints are preserved if $\mu_{\mathrm{MC}}$ and $k_{\mathrm{MC}}$ are computed as linear combinations of the realizations $\{Y^m(x)\}_{m=1}^{M}$. As an example, we present the MLMC-based PhIK [52] in Appendix A.
Further, for a given ensemble $\{Y^m(x)\}_{m=1}^{M}$, the matrix $C_{\mathrm{MC}}$ and vector $\boldsymbol{\mu}_{\mathrm{MC}}$ in Eq. (2.5) are fixed. Thus, the log marginal likelihood is fixed. We can modify PhIK by introducing a correction to $\mu_{\mathrm{MC}}$ to increase the likelihood. Specifically, we replace $\mu_{\mathrm{MC}}(x)$ by $\rho\, \mu_{\mathrm{MC}}(x)$, where $\rho$ is a constant. Then, taking the derivative of $\ln L$ with respect to $\rho$ and setting it to zero yields
(2.17) $\rho = \frac{\boldsymbol{\mu}_{\mathrm{MC}}^\top C_{\mathrm{MC}}^{-1} \mathbf{y}}{\boldsymbol{\mu}_{\mathrm{MC}}^\top C_{\mathrm{MC}}^{-1} \boldsymbol{\mu}_{\mathrm{MC}}}.$
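Setting the derivative of the log marginal likelihood with respect to the rescaling constant to zero gives the closed-form estimate $\rho = (\boldsymbol{\mu}_{\mathrm{MC}}^\top C_{\mathrm{MC}}^{-1} \mathbf{y}) / (\boldsymbol{\mu}_{\mathrm{MC}}^\top C_{\mathrm{MC}}^{-1} \boldsymbol{\mu}_{\mathrm{MC}})$, which can be computed as (a minimal sketch; names are ours):

```python
import numpy as np

def modified_phik_rho(mu_mc, C_mc, y):
    """rho = (mu^T C^{-1} y) / (mu^T C^{-1} mu): the maximizer of the log
    marginal likelihood when the prior mean is rescaled to rho * mu_MC."""
    Cinv_mu = np.linalg.solve(C_mc, mu_mc)   # C^{-1} mu, computed once
    return (Cinv_mu @ y) / (Cinv_mu @ mu_mc)
```

Note that if the observations already agree with the ensemble mean up to a scalar factor, the factor is recovered exactly.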
This modification has the potential to increase the accuracy of the prediction, but it may also violate some physical constraints, e.g., Dirichlet boundary conditions. We name this method modified PhIK, and provide the following theorem on how well it preserves linear physical constraints.
Theorem 2.1.
Assume that a stochastic model $u(x; \omega)$ defined on $D \times \Omega$ ($D \subseteq \mathbb{R}^d$) satisfies $\big\| \mathcal{L} u(x; \omega) - g(x) \big\| \leq \epsilon$ for any $\omega \in \Omega$, where $\mathcal{L}$ is a deterministic bounded linear operator, $g(x)$ is a well-defined function on $D$, and $\|\cdot\|$ is a well-defined function norm. Let $\{Y^m(x)\}_{m=1}^{M}$ be a finite number of realizations of $u(x; \omega)$, i.e., $Y^m(x) = u(x; \omega^m)$. Then, the prediction $\hat{y}(x)$ from modified PhIK satisfies
(2.18) $\big\| \mathcal{L} \hat{y}(x) - g(x) \big\| \leq |\rho|\, \epsilon + |\rho - 1|\, \|g(x)\| + 2\epsilon \sqrt{\frac{M}{M-1}} \sum_{i=1}^{N} |a_i|\, \sigma_{\mathrm{MC}}\big(x^{(i)}\big),$
where $\sigma_{\mathrm{MC}}\big(x^{(i)}\big)$ is the standard deviation of the data set $\{Y^m(x^{(i)})\}_{m=1}^{M}$ for fixed $x^{(i)}$, $a_i$ is the $i$-th entry of $C_{\mathrm{MC}}^{-1}\big(\mathbf{y} - \rho \boldsymbol{\mu}_{\mathrm{MC}}\big)$, and $\rho$ is given by Eq. (2.17).
Proof.
The modified PhIK prediction can be written as
(2.19) $\hat{y}(x) = \rho\, \mu_{\mathrm{MC}}(x) + \sum_{i=1}^{N} a_i\, k_{\mathrm{MC}}\big(x, x^{(i)}\big),$
where $a_i$ is the $i$-th entry of $C_{\mathrm{MC}}^{-1}\big(\mathbf{y} - \rho \boldsymbol{\mu}_{\mathrm{MC}}\big)$. According to Theorem 2.1 and Corollary 2.2 in [52], $\big\| \mathcal{L} \mu_{\mathrm{MC}}(x) - g(x) \big\| \leq \epsilon$ and $\big\| \mathcal{L} k_{\mathrm{MC}}(x, x^{(i)}) \big\| \leq 2\epsilon \sqrt{M/(M-1)}\, \sigma_{\mathrm{MC}}(x^{(i)})$. Therefore, by the triangle inequality,
$\big\| \mathcal{L} \hat{y}(x) - g(x) \big\| \leq |\rho|\, \big\| \mathcal{L} \mu_{\mathrm{MC}}(x) - g(x) \big\| + |\rho - 1|\, \|g(x)\| + \sum_{i=1}^{N} |a_i|\, \big\| \mathcal{L} k_{\mathrm{MC}}(x, x^{(i)}) \big\| \leq |\rho|\, \epsilon + |\rho - 1|\, \|g(x)\| + 2\epsilon \sqrt{\frac{M}{M-1}} \sum_{i=1}^{N} |a_i|\, \sigma_{\mathrm{MC}}\big(x^{(i)}\big).$
∎
2.4 CoPhIK
CoKriging was originally formulated to compute predictions of sparsely observed states of physical systems by leveraging observations of other states or parameters of the system [43, 22]. Recently, it has been employed for constructing multifidelity models [20, 25, 35], and has been applied in various areas, e.g., [24, 5, 33]. In this work, we propose a novel multifidelity method, CoPhIK, that integrates PhIK and CoKriging by combining numerical simulations and highfidelity observations. Our multifidelity method is based on Kennedy and O’Hagan’s CoKriging framework presented in [20, 14].
We briefly review the formulation of CoKriging for two-level multifidelity modeling in [15]. Suppose that we have high-fidelity data $\mathbf{y}_{\mathrm{H}} = \big(y_{\mathrm{H}}^{(1)}, \ldots, y_{\mathrm{H}}^{(N_{\mathrm{H}})}\big)^\top$ (e.g., accurate measurements of states) at locations $\mathbf{X}_{\mathrm{H}} = \{x_{\mathrm{H}}^{(i)}\}_{i=1}^{N_{\mathrm{H}}}$, and low-fidelity data $\mathbf{y}_{\mathrm{L}} = \big(y_{\mathrm{L}}^{(1)}, \ldots, y_{\mathrm{L}}^{(N_{\mathrm{L}})}\big)^\top$ (e.g., simulation results) at locations $\mathbf{X}_{\mathrm{L}} = \{x_{\mathrm{L}}^{(i)}\}_{i=1}^{N_{\mathrm{L}}}$, where $x_{\mathrm{H}}^{(i)}, x_{\mathrm{L}}^{(i)} \in D$. By concatenating the observation locations and data, respectively, i.e., $\widetilde{\mathbf{X}} = \{\mathbf{X}_{\mathrm{L}}, \mathbf{X}_{\mathrm{H}}\}$ and $\widetilde{\mathbf{y}} = \big(\mathbf{y}_{\mathrm{L}}^\top, \mathbf{y}_{\mathrm{H}}^\top\big)^\top$, we can construct a multivariate GP via Kriging as detailed in [2]. Kennedy and O'Hagan proposed an alternative formulation of CoKriging based on the auto-regressive model for $Y_{\mathrm{H}}$:
(2.20) $Y_{\mathrm{H}}(x) = \rho\, Y_{\mathrm{L}}(x) + Y_{\mathrm{d}}(x),$
where $\rho \in \mathbb{R}$ is a regression parameter and $Y_{\mathrm{d}}(x)$ is a GP that models the difference between $Y_{\mathrm{H}}$ and $\rho Y_{\mathrm{L}}$. This model assumes that
(2.21) $\mathrm{Cov}\big(Y_{\mathrm{H}}(x), Y_{\mathrm{L}}(x') \,\big|\, Y_{\mathrm{L}}(x)\big) = 0 \quad \text{for all } x' \neq x.$
It was shown in [31] that the assumption of Eq. (2.21) implies the auto-regressive model of Eq. (2.20). The covariance matrix of observations, $\widetilde{C}$, is then given by
(2.22) $\widetilde{C} = \begin{pmatrix} C_{\mathrm{L}}(\mathbf{X}_{\mathrm{L}}, \mathbf{X}_{\mathrm{L}}) & \rho\, C_{\mathrm{L}}(\mathbf{X}_{\mathrm{L}}, \mathbf{X}_{\mathrm{H}}) \\ \rho\, C_{\mathrm{L}}(\mathbf{X}_{\mathrm{H}}, \mathbf{X}_{\mathrm{L}}) & \rho^2\, C_{\mathrm{L}}(\mathbf{X}_{\mathrm{H}}, \mathbf{X}_{\mathrm{H}}) + C_{\mathrm{d}}(\mathbf{X}_{\mathrm{H}}, \mathbf{X}_{\mathrm{H}}) \end{pmatrix},$
where $C_{\mathrm{L}}$ is the covariance matrix based on GP $Y_{\mathrm{L}}$'s kernel $k_{\mathrm{L}}(x, x')$, and $C_{\mathrm{d}}$ is the covariance matrix based on GP $Y_{\mathrm{d}}$'s kernel $k_{\mathrm{d}}(x, x')$. One can assume parameterized forms for these kernels (e.g., the Gaussian kernel) and then simultaneously identify their hyperparameters along with $\rho$ by maximizing the following log marginal likelihood:
(2.23) $\ln L = -\frac{1}{2} \big(\widetilde{\mathbf{y}} - \widetilde{\boldsymbol{\mu}}\big)^\top \widetilde{C}^{-1} \big(\widetilde{\mathbf{y}} - \widetilde{\boldsymbol{\mu}}\big) - \frac{1}{2} \ln \big|\widetilde{C}\big| - \frac{N_{\mathrm{L}} + N_{\mathrm{H}}}{2} \ln 2\pi,$
where $\widetilde{\boldsymbol{\mu}}$ is the concatenation of the mean vectors of $Y_{\mathrm{L}}$ at $\mathbf{X}_{\mathrm{L}}$ and $Y_{\mathrm{H}}$ at $\mathbf{X}_{\mathrm{H}}$.
Alternatively, one can employ the following two-step approach [15, 14]:

Use Kriging to construct $Y_{\mathrm{L}}$ using $\{\mathbf{X}_{\mathrm{L}}, \mathbf{y}_{\mathrm{L}}\}$.

Denote $\mathbf{d} = \mathbf{y}_{\mathrm{H}} - \rho\, \mathbf{y}_{\mathrm{L}}(\mathbf{X}_{\mathrm{H}})$, where $\mathbf{y}_{\mathrm{L}}(\mathbf{X}_{\mathrm{H}})$ are the values of $\mathbf{y}_{\mathrm{L}}$ at the locations common to those of $\mathbf{X}_{\mathrm{H}}$; then construct $Y_{\mathrm{d}}$ using $\{\mathbf{X}_{\mathrm{H}}, \mathbf{d}\}$ via Kriging.
The posterior mean and variance of $Y_{\mathrm{H}}$ at $x^*$ are given by
(2.24) $\hat{y}_{\mathrm{H}}(x^*) = \mu_{\mathrm{H}}(x^*) + \widetilde{\mathbf{c}}^\top \widetilde{C}^{-1} \big(\widetilde{\mathbf{y}} - \widetilde{\boldsymbol{\mu}}\big),$
(2.25) $\hat{s}^2(x^*) = \rho^2 \sigma_{\mathrm{L}}^2(x^*) + \sigma_{\mathrm{d}}^2(x^*) - \widetilde{\mathbf{c}}^\top \widetilde{C}^{-1} \widetilde{\mathbf{c}},$
where $\mu_{\mathrm{H}}(x^*) = \rho\, \mu_{\mathrm{L}}(x^*) + \mu_{\mathrm{d}}(x^*)$, $\mu_{\mathrm{L}}$ is the mean of $Y_{\mathrm{L}}$, $\mu_{\mathrm{d}}$ is the mean of $Y_{\mathrm{d}}$, $\widetilde{\boldsymbol{\mu}} = \big(\mu_{\mathrm{L}}(\mathbf{X}_{\mathrm{L}})^\top, \mu_{\mathrm{H}}(\mathbf{X}_{\mathrm{H}})^\top\big)^\top$, $\widetilde{\mathbf{c}} = \big(\mathbf{c}_{\mathrm{L}}^\top, \mathbf{c}_{\mathrm{H}}^\top\big)^\top$, and
(2.26) $\mathbf{c}_{\mathrm{L}} = \rho\, \big(k_{\mathrm{L}}(x^*, x_{\mathrm{L}}^{(1)}), \ldots, k_{\mathrm{L}}(x^*, x_{\mathrm{L}}^{(N_{\mathrm{L}})})\big)^\top,$
(2.27) $\mathbf{c}_{\mathrm{H}} = \rho^2\, \big(k_{\mathrm{L}}(x^*, x_{\mathrm{H}}^{(1)}), \ldots, k_{\mathrm{L}}(x^*, x_{\mathrm{H}}^{(N_{\mathrm{H}})})\big)^\top + \big(k_{\mathrm{d}}(x^*, x_{\mathrm{H}}^{(1)}), \ldots, k_{\mathrm{d}}(x^*, x_{\mathrm{H}}^{(N_{\mathrm{H}})})\big)^\top,$
where $\sigma_{\mathrm{L}}^2(x^*) = k_{\mathrm{L}}(x^*, x^*)$ and $\sigma_{\mathrm{d}}^2(x^*) = k_{\mathrm{d}}(x^*, x^*)$. Here, we have neglected a small contribution to the MSE (see [14]).
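The two-step construction above can be sketched as follows. This simplified version chains the two Kriging predictors, predicting with $\rho\,\hat{y}_{\mathrm{L}}(x^*) + \hat{y}_{\mathrm{d}}(x^*)$, instead of assembling the joint covariance of Eqs. (2.24)-(2.27); all function and variable names are illustrative:

```python
import numpy as np

def krige(X, y, corr):
    """Build an ordinary-Kriging predictor from data (X, y) and a
    correlation function corr(x, x')."""
    N = X.shape[0]
    Psi = np.array([[corr(X[i], X[j]) for j in range(N)] for i in range(N)])
    Psi += 1e-10 * np.eye(N)                 # jitter for numerical stability
    one = np.ones(N)
    mu_hat = (one @ np.linalg.solve(Psi, y)) / (one @ np.linalg.solve(Psi, one))
    w = np.linalg.solve(Psi, y - mu_hat * one)
    def predict(x_star):
        psi = np.array([corr(x_star, X[i]) for i in range(N)])
        return mu_hat + psi @ w
    return predict

def cokriging_two_step(X_L, y_L, X_H, y_H, corr_L, corr_d, rho=1.0):
    """Step 1: Krige the low-fidelity data. Step 2: Krige the residual
    d = y_H - rho * y_L(X_H). Predict rho * Y_L + Y_d."""
    f_L = krige(X_L, y_L, corr_L)
    d = y_H - rho * np.array([f_L(x) for x in X_H])
    f_d = krige(X_H, d, corr_d)
    return lambda x_star: rho * f_L(x_star) + f_d(x_star)
```

When the high-fidelity locations are a subset of the low-fidelity ones, the chained predictor interpolates the high-fidelity data there, since both Kriging stages interpolate their own data.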
Now we describe the CoPhIK method. We set $\rho = 1$ to simplify the formulas and computation. We employ PhIK to construct the GP $Y_{\mathrm{L}}$ using $M$ realizations $\{Y^m(x)\}_{m=1}^{M}$ of a stochastic model $u(x; \omega)$ on $D$. Specifically, we set $\mu_{\mathrm{L}}(x) = \mu_{\mathrm{MC}}(x)$ and $k_{\mathrm{L}}(x, x') = k_{\mathrm{MC}}(x, x')$, where $\mu_{\mathrm{MC}}$ and $k_{\mathrm{MC}}$ are given by Eq. (2.13). The GP model $Y_{\mathrm{d}}$ is constructed using the same approach as in the second step of the Kennedy and O'Hagan CoKriging framework. In other words, CoPhIK replaces the first step of their framework with PhIK, and follows the same procedure for the second step.
We proceed to describe the construction of $Y_{\mathrm{d}}$ in more detail. First, we set $\mathbf{y}_{\mathrm{L}}(\mathbf{X}_{\mathrm{H}}) = \boldsymbol{\mu}_{\mathrm{MC}}(\mathbf{X}_{\mathrm{H}})$, so that $\mathbf{d} = \mathbf{y}_{\mathrm{H}} - \boldsymbol{\mu}_{\mathrm{MC}}(\mathbf{X}_{\mathrm{H}})$. The reason for this choice is that $\boldsymbol{\mu}_{\mathrm{MC}}(\mathbf{X}_{\mathrm{H}})$ is the most probable observation of the GP $Y_{\mathrm{L}}$. Next, we need to assume a specific form of the kernel function $k_{\mathrm{d}}$. Without loss of generality, in the following theoretical analysis and computational examples, we use the stationary Gaussian kernel model and a constant $\mu_{\mathrm{d}}$. Once $\mathbf{d}$ is computed and the forms of $\mu_{\mathrm{d}}$ and $k_{\mathrm{d}}$ are chosen, $Y_{\mathrm{d}}$ can be constructed as in ordinary Kriging. At this point, all components in $\widetilde{C}$ and $\widetilde{\mathbf{y}}$ are specified except for $\mathbf{y}_{\mathrm{L}}$ in $\widetilde{\mathbf{y}}$. We set $\mathbf{y}_{\mathrm{L}}$ as the realization from the ensemble $\{Y^m\}_{m=1}^{M}$ that maximizes the log marginal likelihood of Eq. (2.23). The algorithm is summarized in Algorithm 1.
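A simplified CoPhIK sketch with $\rho = 1$: the ensemble mean plays the role of the low-fidelity prediction (so no hyperparameter optimization is needed for the low-fidelity GP), and only the discrepancy data are modeled by a parameterized GP via ordinary Kriging. This omits the joint covariance assembly and the likelihood-based selection of the low-fidelity observation vector; all names are ours:

```python
import numpy as np

def cophik_sketch(X_H, y_H, mu_mc, x_star, corr_d):
    """Predict y_H(x*) ~ mu_MC(x*) + d_hat(x*), where the discrepancy
    d = y_H - mu_MC(X_H) is modeled by ordinary Kriging with the
    parameterized kernel corr_d.

    mu_mc : callable returning the ensemble mean at a given location
    """
    N = X_H.shape[0]
    d = y_H - np.array([mu_mc(x) for x in X_H])        # discrepancy data
    Psi = np.array([[corr_d(X_H[i], X_H[j]) for j in range(N)] for i in range(N)])
    Psi += 1e-10 * np.eye(N)                           # jitter
    one = np.ones(N)
    mu_d = (one @ np.linalg.solve(Psi, d)) / (one @ np.linalg.solve(Psi, one))
    w = np.linalg.solve(Psi, d - mu_d * one)
    psi = np.array([corr_d(x_star, X_H[i]) for i in range(N)])
    return mu_mc(x_star) + mu_d + psi @ w
```

If the model output is biased by a smooth offset, the discrepancy GP absorbs it: for a constant offset, the correction is exact even away from the high-fidelity locations.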