Dimension Reduction for Gaussian Process Emulation: An Application to the Influence of Bathymetry on Tsunami Heights

Dimension Reduction for Gaussian Process Emulation: An Application to the Influence of Bathymetry on Tsunami Heights

Xiaoyu Liu and Serge Guillas
Department of Statistical Science, University College London
Abstract

High accuracy complex computer models, also called simulators, require large resources in time and memory to produce realistic results. Statistical emulators are computationally cheap approximations of such simulators. They can be built to replace simulators for various purposes, such as the propagation of uncertainties from inputs to outputs or the calibration of some internal parameters against observations. However, when the input space is of high dimension, the construction of an emulator can become prohibitively expensive. In this paper, we introduce a joint framework merging emulation with dimension reduction in order to overcome this hurdle. The gradient-based kernel dimension reduction technique is chosen due to its ability to drastically decrease dimensionality with little loss in information. The Gaussian process emulation technique is combined with this dimension reduction approach. Theoretical properties of the approximation are explored. Our proposed approach provides an answer to the dimension reduction issue in emulation for a wide range of simulation problems that cannot be tackled using existing methods. The efficiency and accuracy of the proposed framework is demonstrated theoretically, and compared with other methods on an elliptic partial differential equation (PDE) problem. We finally present a realistic application to tsunami modeling. The uncertainties in the bathymetry (seafloor elevation) are modeled as high-dimensional realizations of a spatial process using a geostatistical approach. Our dimension-reduced emulation enables us to compute the impact of these uncertainties on resulting possible tsunami wave heights near-shore and on-shore. Considering an uncertain earthquake source, we observe a significant increase in the spread of uncertainties in the tsunami heights due to the contribution of the bathymetry uncertainties to the overall uncertainty budget. These results highlight the need to include the effect of uncertainties in the bathymetry in tsunami early warnings and risk assessments.


Keywords: Dimension reduction, Gaussian process, statistical emulation, uncertainty quantification

1 Introduction

Simulators are widely employed to reproduce physical processes and explore their behavior, in fields such as fluid dynamics or climate modeling. To characterize the impact of the uncertainties in the boundary conditions or the parameterizations of the underlying physical processes, a sufficient number of simulations are required. However, when the simulators are computationally expensive, as it is the case for high accuracy simulations, the task can become extremely costly or even prohibitive. One prevailing way to overcome this hurdle is to construct statistical surrogates, namely emulators, to approximate the computer simulators in a probabilistic way [29]. Emulators are trained on a relatively small number of well chosen simulations, i.e. a design of computer experiments. Outputs at any input can be predicted at little computational cost with emulators. One can then employ emulators for any subsequent purposes such as uncertainty propagation, sensitivity analysis and calibration.

With high-dimensional inputs, say beyond 20 dimensions (usually in the hundreds or thousands), a large design is usually required to explore the input space, typically in the order of 10 times the number of dimensions, for a reasonable level of approximation. One would face serious computational problems since the original simulator cannot be run many times or is very expensive to run. Advanced designs such as Latin Hypercubes or new sequential designs [4] that are more efficient than Latin Hypercubes only partially alleviate the issue. As a result, methods that adequately reduce the dimension of the input space are required, as high-dimensional inputs are often present in computer models, e.g. as boundary conditions like the bathymetry (i.e. seafloor elevation) in tsunami modeling. Some approaches ignore high-dimensional inputs and add stochastic terms to account for their contribution [19, 25]. These methods are easy to implement and effective in some applications. However, repeated simulations at the same input parameters that are encoded in the emulation are often required to estimate the variability due to those parameters that are ignored. The variability estimates are often restricted to the second moments, and the input-output relationships over the ignored inputs are not clear. Constantine et al. [6] proposed to find rotations of the input space with the strongest variability in the gradients of the simulators and constructed a response surface on such a low-dimensional active subspace. This Active Subspace (AS) approach has been demonstrated to be effective theoretically and numerically. Constantine and Gleich [5] studied further the properties of the Monte Carlo approximation of the subspace. However, this method requires the calculation of a sufficient number of gradients explicitly, which unfortunately prevents its use in many applications. The gradients are often unavailable in many realistic simulators, and typically intractable for systems of mixed PDEs or multi-physics simulations. Even in the rare situations where gradients are computable numerically, the computational cost of obtaining them could be prohibitive.

The concept of active subspace is closely related to the sufficient dimension reduction (SDR) [7, 9] and effective dimension reduction (EDR) [22] in the statistical community. Given an explanatory variable (input) and response variable (output), the aim of SDR (or EDR) is to find the directions in the subspace of that contain sufficient information about the response for statistical inference. More specifically, a SDR where satisfies where and are conditional probability density functions with respect to and respectively. The EDR approach aims to specifically find a linear projection matrix onto a -dimensional subspace such that and

(1)

Several methods have been developed to find SDR including nonparametric approaches such as sliced inverse regression (SIR) [22], minimum average variance estimation (MAVE) [30], and parametric approaches like principal fitted components (PFC) [8, 10]. In this paper, we adopt the gradient-based kernel dimension reduction (gKDR) [15] to construct low-dimensional approximations to the simulators. The gKDR approach does not require any strong assumptions on the variables and distributions. The response variable can be of arbitrary type: continuous or discrete, univariate or multivariate. Unlike the active subspace method, gradients are not required to be computed explicitly but are estimated non-parametrically and implicitly using stable kernel methods. Our proposed approach therefore provides an answer to the dimension reduction issue in emulation for a wide range of problems that cannot be tackled using existing methods at the moment. Moreover, the gKDR approach ends up with an eigen-problem without any needs of elaborate numerical optimization and thus can be applied to large and high-dimensional problems.

We introduce a joint framework to approximate the high-dimensional simulators by combining statistical emulators with the gKDR approach. Deterministic simulators are considered here, however the framework could potentially be applied to stochastic simulators, with additional treatments to the stochastic effect in the emulation, see e.g. [18]. Throughout this paper, the mainstream Gaussian process (GP) emulators are employed for illustration. But the general framework and most of the results would potentially hold for other emulation techniques.

The paper is organized as follows. Section 2 and 3 review GP emulation and the gKDR approach respectively. In Section 4, a joint framework of dimension reduction combined with emulation is proposed and some theoretical properties are established. Section 5 contains the numerical experiments on an elliptic PDE and an application to the propagation of uncertainties in the bathymetry to tsunami wave heights, as well as comparison with alternative methods. Section 6 consists of some conclusive discussion.

2 Gaussian process emulation

A Gaussian process is a collection of random variables such that any finite subset of these variables follow a joint Gaussian distribution [27]. It is widely used in various scientific fields. Here we briefly review some basics of its application in statistical emulation.

A deterministic simulator with multivariate input and univariate output can be represented as . The GP emulator assumes that the simulator output can be modeled with a Gaussian process. It is commonly assumed that the mean can be written as , where is a -vector of pre-defined regression functions and the coefficients . In practice, a constant or linear form for the regression functions would perform well. The covariance between two simulator outputs and is usually represented as , where the positive scalar parameter is the process variance and is the correlation function. One common choice for the correlation function is the squared-exponential correlation where controls the correlation lengths.

Suppose that the simulator is run at inputs and the outputs are respectively. We may need to impose a prior for the parameter in the mean function . One of the popular choices is a Gaussian prior, , which forms a conjugate prior with the GP likelihood. At any desired inputs , the respective outputs are denoted by . Then letting and , the predictive process of (also known as kriging prediction) given the observed data and the covariance function is

(2)

with

(3)

and

(4)

where represents the hyperparameters from the covariance function, , and are , and matrices respectively with the associated -th entry as , and , and .

We can see that the output at any desired input predicted using a GP emulator is a distribution rather than a single value. This could be used to estimate the uncertainty introduced into the prediction with emulator and to evaluate the confidence about the prediction. However, the hyperparameters is usually unknown and needs to be specified properly. It is possible to make a fully Bayesian inference with appropriate prior . But this usually requires costly MCMC approach for the analytically intractable posterior. In practice, a computationally cheap alternative is often employed by specifying the hyperparameters at the most probable values. This could be done by maximizing the marginal likelihood,

(5)

Then the prediction can be completed by plugging in .

Usually there is no sufficient information about the parameter , hence a vague prior can be imposed by letting and , where is the matrix of zeros. In this case, the conditional mean and covariance of the predictive process are respectively

(6)

and

(7)

where . This is closely related to the -process [26] when a weak prior for that is assumed with the mean function and the covariance function , where contains the parameters in the correlation function .

When is used with a continuous correlation function, such as the squared-exponential correlation, the emulator interpolates through the training data, i.e. and at the training points . When a nugget term is included, this is no longer true. A nugget term can be included, e.g. to mitigate numerical instabilities or account for the stochastic terms in simulations [3]. The correlation function can be extended with the addition of a nugget as where is the nugget term, and is the indicator function that takes if and otherwise. The associated correlation matrix is , where is the identity matrix.

In practice, the error in the prediction of GP emulator depends on the number of training data points. As there are more and more training data points, the GP emulator will be expected to recover the simulator. There are several theoretical results on how well the GP emulator can approximate the simulator in the literature. For example, given training samples that are quasi-uniformly distributed on , the error can be bounded [13] as for any in some function space over , typically a Hilbert space, where controls the smoothness of the function and is a constant depending on the dimension . This result suggests that provides arbitrarily high approximation order when , i.e. is infinitely smooth. However, this rate decreases as the dimension increases and the constant also grows with . Hence, the approximation deteriorates in very high dimensions. This implies that more evaluations of the simulator are required to train an accurate emulator when the number of input parameters increases and the associated computational cost of constructing an emulator could increase dramatically as a result. Therefore, it is desirable to reduce the dimension of the problem from the perspectives of both accuracy and efficiency.

3 Gradient-based kernel dimension reduction

For a set , a symmetric kernel is positive-definite if for any and . Then a positive-definite kernel on is uniquely associated with a Hilbert space consisting of functions on such that 1) ; 2) the linear hull of is dense in ; 3) for any and where is the inner product in . Because the third property implies that the kernel reproduces any function , the Hilbert space is called the reproducing kernel Hilbert space (RKHS) associated with . Let be a random vector on the domain , and and be positive definite-kernels on and with respective RKHS and . We shortly present the salient facts about the gKDR approach.

Fukumizu and Leng [15] noted that for any , there exists a function on such that

(8)

Then, under mild assumptions, we have, for any ,

(9)

On the other hand, defining the cross-covariance operator as the operator such that

(10)

holds for all , , and using the fact that

(11)

if for any [14], we obtain

(12)

Equating the two expressions above yields for ,

(13)

Therefore, the dimension reduction projection matrix is formed as the eigenvectors associated with the nontrivial eigenvalues of the matrix .

Given i.i.d. samples , the matrix can be approximated with [15] that contains the first eigenvectors of the following symmetric matrix,

(14)

where and are the Gram matrices with the -entry as and respectively, .

Sometimes there may not exist such a sufficient subspace rigorously so that , or we may want to select less dimensions for later analysis even in cases where such a subspace exists in order to achieve a more stringent reduction (albeit with a small loss). For convenience, we slightly reformulate the gKDR approach into a more general form without any change to the results in [15]. Let be an matrix with , satisfying . In fact, if there exists a matrix satisfying , we can just set , where is an matrix such that and the column vectors of are orthogonal to those of ; otherwise, and .

Following the same procedure as before, it is easy to see that

(15)

If there exists satisfying with , for any , hence the respective columns correspond to the zero eigenvalues of . The projection matrix does not depend on the value of , while the nontrivial eigenvalues vary with . Therefore, we obtain the following eigen-decomposition

(16)

4 Joint emulation with dimension reduction

The gKDR approach is now applied together with GP emulation, to construct a low-dimensional approximation to a simulator. Thus the following procedure is employed to emulate a high-dimensional simulator.

Step 1. Given a set of simulator’s runs , estimate the projection matrix using the gKDR approach.

Step 2. Split into , where consists of the first columns of corresponding to the largest eigenvectors.

Step 3. Design a set of runs of the simulator, e.g. based on the reduced space , and construct an emulator using the lower dimensional pairs .

In Step 1, sufficient samples are needed to estimate accurately. The theoretical results in [15] on the convergence rate of would provide some insights. In practice, the number of directions that have a major influence may also affect the sample size needed. Step 2 requires an appropriate selection of to construct an efficient and effective emulator. The samples to train the emulator in Step 3 can be different (e.g. additional runs) from those already collected to find in Step 1. There is a benefit in terms of design arising from the dimension reduction. Indeed, in step 3, the design can be built to explore the reduced space of possible but the actual inputs of the simulator are of the corresponding high-dimensional values of , as the dimensions left out are deemed unimportant.

4.1 Approximation properties

We now explore some theoretical properties of the low-dimensional approximation to a simulator using the gKDR approach. For any , if is known exactly, we have the eigen-decomposition . Suppose that the eigenvectors and eigenvalues are partitioned as

(17)

where with consists of the first largest eigenvalues, is the matrix whose columns are the associated eigenvectors. Then for any , we can define the projected coordinates by and . Our proposed approach suggests to make inference on based on instead of the full explanatory variable . The following proposition establishes an error bound for such approximation.

Proposition 1.

For any and , we approximate by for any such that . The approximation error is bounded as follows:

(18)

where is a constant depending on the domain of , are positive constants relating to and .

Proof.

Let , and for . Following [2], for any , we can define a bounded linear operator on the Hilbert space such that

(19)

Its adjoint is a mapping , defined by the relation for any and . Then we have

(20)

Because is arbitrage, it must hold for any that

Defining , it is easy to see that . From the derivation of the gKDR approach, the derivative of w.r.t is just and . We denote the range of an operator as and its kernel (null space) as . The space is finite-dimensional and hence closed, so we have the decomposition . In particular, for any , there is and such that . Hence we obtain

(21)

Given the projection of coordinates from to and , we can write

(22)

The gradient of w.r.t can be obtained by the chain rule as

(23)

where relates to . Then it is easy to see that

(24)

where the positive constants depend on and , for . Similarly, we have

(25)

where the positive constants depend on and , for .

We now infer based on rather than with . For any , we have

(26)

Therefore, for any fixed , we estimate with for any , i.e.

(27)

Note that for any fixed , the original function is a function of only , while the approximation is in fact the average of over all possible , and so is not a function of . The Poincaré inequality yields

(28)

where is a constant depending on the domain of . ∎

When represents a sufficient dimension reduction, for , which implies that exactly. Though the result is presented with conditional mean for any , it is not limited to the first moment only. For characteristic kernels such as the popular Gaussian RBF kernel and the Laplace kernel , probabilities are uniquely determined by their means on the associated RKHS [15]; see also [17] for a definition of the distance between probabilities using their means.

In practice, cannot be known exactly. We can only estimate a perturbed version instead using the eigen-decomposition of . Under some mild conditions, converges in probability to with order for some [15]. As a result, we have the following result.

Proposition 2.

For any and , we approximate by for every such that . Then we have

(29)

where is a constant depending on the domain of and the are positive constants related to and .

Proof.

Denoting and , the convergence result on [15] entails that for any , there exists a constant and such that for any ,

(30)

Then there exists such that for any ,

(31)

where are the eigenvalues of ; where can be chosen as .

The distance between subspaces that are spanned by columns of and , denoted by and respectively, can be defined as [16]

(32)

Using Corollary of [16], we have

(33)

Hence We also note that .

Then for any , we have the following approximation to ,

(34)

Letting and following the same procedure as Proposition 1, for any fixed we have,

(35)

where is some constant. Since , we have

(36)

The result holds by plugging the respective terms. ∎

The approximation procedure generates an “innovative simulator” on the reduced input space of , which is however not deterministic. Suppose there are two distinct inputs and with the respective outputs . It may happen that , i.e. the approximated simulator may yield different outputs given the same input. The low-dimensional stochastic simulator can nevertheless be emulated, for example using GP with nugget effect, assuming that the influence of the dropped components is relatively small and simple enough to be captured by the nugget. The overall approximate error of the final emulator to can be decomposed into , where the first term in the right hand side is due to the low-dimensional approximation which has been investigated in Proposition 2, and the second term depends on the emulation procedure.

4.2 Choice of parameters and structural dimension

When applying the proposed framework for emulation, several parameters need to be specified properly, e.g. the parameters in the kernels and the regularization parameter . The cross validation approach can be used to tune such parameters as in many nonparametric statistical methods. In addition, it is also required to choose an appropriate structural dimension to construct an accurate emulator.

One of the possible ways is to choose within the dimension reduction procedure. Fukumizu and Leng [15] pointed out that it might not be practical to select based on asymptotic analysis of some test statistics, as in many existing dimension reduction techniques, when the dimension is high and the sample size is small. They mentioned that the ratio of the sum of the largest eigenvalues over the sum of all the eigenvalues, , might be useful in identifying the conditional independence of and given . In addition, Proposition 2 shows that the approximation error decreases as a function of . As discussed in [5], might be chosen such that is maximized. However, we may notice that the approximation error also depends on the squares of the eigenvalues with some unknown weights. Therefore it seems to be not very practical to select solely based on the eigenvalues.

On the other hand, Fukumizu and Leng [15] suggested to select based on the subsequent utilization of rather than the dimension reduction procedure when dimension reduction serves as a pre-processing step. For example the ultimate goal of our proposed framework here is to construct an accurate emulator, hence it is intuitive to select the structural dimension that produces the best predictive performance. Therefore, in the following numerical studies, we select as well as other parameters for the gKDR approach using simple trial-and-error or more formal cross validation approach based on the predictive accuracy of the respective emulators.

5 Numerical simulations

In this section we conduct two numerical studies. In the first study, the proposed emulation framework using the gKDR approach is compared with several alternatives of dimension reduction and the full emulation on a PDE problem. This problem set up allows the computation of gradients explicitly. In the second study, we illustrate the emulation framework with an application to tsunami modeling; we also provide a comparison to other methods, except AS which cannot be applied. Throughout the simulations, the GPML code using maximum likelihood method implemented by [27] is employed for the emulation assuming a linear form mean function with intercept, and a squared exponential correlation function.

5.1 Study 1: elliptic PDE with explicit gradients available

In this example, we investigate the elliptic PDE problem with random coefficients as studied in [6]. Let satisfy the linear elliptic PDE

The homogeneous Dirichlet boundary conditions are set on the left, top and bottom boundary (denoted by ) of the spatial domain of , and a homogeneous Neumann boundary condition is imposed on the right side of the spatial domain denoted . The coefficients are modeled by a truncated Karhunen-Loeve (KL) type expansion

(37)

where the are i.i.d. standard Normal random variables, and are the eigenpairs of the correlation operator

(38)

The target value is a linear function of the solution

(39)

The problem is discretized using a finite element method on a triangulation mesh, then and can be computed as a forward and adjoint problem; see [6] for more details. We choose and examine two cases of the correlation lengths or . Therefore, the original input space is with standard Normal distribution and the output is univariate.

The gKDR approach is applied to reduce the dimension of the problem using samples. We also compare with several popular alternative dimension reduction techniques: AS (here possible due to the explicit gradients), SIR, SIR-II [22], sliced average variance estimation (SAVE) [11], MAVE and PFC. After reducing the dimension of the problem, the GP emulator is trained using a Latin Hypercube design of points on the reduced -dimensional space so that the whole procedure needs samples in total using each dimension reduction method. For comparison, we also emulate the problem on the original -dimensional input space directly with samples, which is the full emulation. The gKDR approach is implemented in Matlab by K. Fukumizu, see http://www.ism.ac.jp/~fukumizu/. The Matlab code for AS and solving the PDE by [6] is available on https://bitbucket.org/paulcon/active-subspace-methods-in-theory-and-practice. For SIR, SAVE and PFC, the codes are provided in the Matlab LDR-package (https://sites.google.com/site/lilianaforzani/ldr-package), and SIR-II is implemented by simply modifying the SIR code. For MAVE, the Matlab code by Y. Xia is available from http://www.stat.nus.edu.sg/~staxyc/. The associated parameters in some methods, such as the kernel and regularization parameters for gKDR, the number of slices for the sliced methods and the degree of polynomial basis for PFC, are chosen in a simple trial-and-error way by trying several values and selecting the best.

The final emulators are used to make prediction on a testing set of evaluations that differ from the training set, where and is drawn randomly from the standard Normal distribution. The predictive performance is measured by the normalized predictive root-mean-square-error (PRMSE)

where is the prediction (predictive mean) using emulation. The associated computing time is also recorded with three parts: T1 for running the simulator, T2 for estimating the dimension reduction and T3 for training the emulator and making prediction. Note that T1 includes the time devoted to run the simulator times when using all the dimension reduction methods. It also includes the time used to compute the gradients for AS and the additional runs for full emulation. T2 is zero for full emulation since there is no dimension reduction. T3 also includes the time for running the simulator times on the designed points except full emulation.

In this study, we choose and . Table 1 presents the results on a testing set with evaluations using different emulation approaches and Fig. 1 shows an example of the associated computing time when and . Compared with the full emulation results, by reducing the dimension properly, the predictive accuracy can be improved, especially when the correlation length is long (). Also, as a result, the computing time for training GP emulator (T3) decreases dramatically. In terms of predictive accuracy, AS naturally performs the best, as it is using exact gradients, followed by gKDR. MAVE, SIR and PFC are better than SIR-II and SAVE, but PFC does not work very well when and MAVE spends more computing time on dimension reduction. Most methods yield smaller errors for than , except PFC and full emulation. Unlike the other techniques, AS employs exact gradients which explains its lead in performance. However, as shown in Fig. 1, the computing time T1 for AS is about two orders of magnitude longer than the others making the method most computationally expensive. Moreover, computing gradients is sometimes impossible, e.g. for the tsunami simulation in the next study. This restricts the applicability of AS method to a few applications. To summarize, when the exact gradients are computable the proposed gKDR approach is able to produce comparable results (though not as good) as the AS method that uses exact gradients, and outperforms the other SDR methods in most cases. However, the computational cost of applying gKDR is much less than that employing AS. In fact, gKDR not only is able to find the SDR accurately and efficiently, but also can be applied in a wide range of scenarios where complicated variable types or very high dimensions are involved. The next application into tsunami simulation provides a snapshot of its wide capability when there are few applicable alternatives.

d gKDR AS SIR SIR-II SAVE MAVE PFC Full
1
2
3
4
5
1
2
3
4
5
Table 1: Study 1: Normalized PRMSEs at testing sites using emulation on the full input space (Full) or combined with different dimension reduction techniques.
Figure 1: Computing time (in seconds) for the emulation using different approaches when and .

5.2 Study 2: tsunami emulation where no gradients available

Here we apply the proposed general framework to investigate the impact of uncertainties in the bathymetry on tsunami modeling, where the bathymetry is included as a high-dimensional input.

A synthetic bathymetry surface is created in the coordinate system to conduct tsunami simulations as shown in Fig. 2 (a). For simplicity, we assume that the seabed elevation only vary along the first coordinate . Though simple, it still captures the typical continental characteristics: the continental shelf spans from shore line () to around km at the water depth of around m; the continental slope is between km and km with water depth of m; west of km it is the deep ocean with water depth of m .

Figure 2: (a) Synthetic bathymetry; (b) seabed uplift when m; (c) gauge sites.

To model uncertainties and mimic the realistic boat tracks of oceanic surveys, some irregular lines are drawn. We consider two levels of survey density which are denoted by survey level 1 and 2 respectively. Considering that the surveys are usually constrained within budgets, the total lengths of the two level surveys are fixed at and km. To account for different possible survey traces, samples of boat tracks are drawn at each level of survey density; see in Fig. 3 three samples per level for illustration. In this study, we only consider the impact of the uncertainties in the bathymetry within the area