# A model and variance reduction method for computing statistical outputs of stochastic elliptic partial differential equations

###### Abstract

We present a model and variance reduction method for the fast and reliable computation of statistical outputs of stochastic elliptic partial differential equations. Our method consists of three main ingredients: (1) the hybridizable discontinuous Galerkin (HDG) discretization of elliptic partial differential equations (PDEs), which allows us to obtain high-order accurate solutions of the governing PDE; (2) the reduced basis method for a new HDG discretization of the underlying PDE to enable real-time solution of the parameterized PDE in the presence of stochastic parameters; and (3) a multilevel variance reduction method that exploits the statistical correlation among the different reduced basis approximations and the high-fidelity HDG discretization to accelerate the convergence of the Monte Carlo simulations. The multilevel variance reduction method provides efficient computation of the statistical outputs by shifting most of the computational burden from the high-fidelity HDG approximation to the reduced basis approximations. Furthermore, we develop a posteriori error estimates for our approximations of the statistical outputs. Based on these error estimates, we propose an algorithm for optimally choosing both the dimensions of the reduced basis approximations and the sizes of Monte Carlo samples to achieve a given error tolerance. We provide numerical examples to demonstrate the performance of the proposed method.

###### keywords:

Model reduction, variance reduction, reduced basis method, a posteriori error estimation, hybridizable discontinuous Galerkin method, multilevel Monte Carlo method, stochastic elliptic PDEs

Journal: Journal of Computational Physics

## 1 Introduction

The analysis of physical systems is often carried out by mathematical modeling and numerical simulation. For a given system, the corresponding mathematical model requires certain input data, such as material properties, forcing terms, boundary conditions and geometry information. For many problems of interest, input data are not known precisely. In such cases, one may need to consider input data as random variables and represent them in probabilistic terms. Mathematical models represented by partial differential equations with random input data are known as stochastic partial differential equations (SPDEs). Uncertainty in the input data may come from different sources. The physical system may have some intrinsic variability, for example, uncertainty in the gust loads on an aircraft, or wind and seismic loading on civil structures. It is also possible that we are unable to effectively characterize the physical system with a mathematical model because, for instance, we may have errors in geometry, roughness of surfaces, or multiscale behavior that we are unable to capture. Therefore, there is a growing need to represent the uncertainty in the data and effectively propagate it through the mathematical model. The goal of this probabilistic approach is to compute statistics of some observable outputs (quantities of interest), which are usually defined as functionals of the solution of the underlying SPDE.

There exist a number of different approaches to solve SPDEs and retrieve the statistics of the output. The most common approach is to use Monte Carlo (MC) methods Fishman (1996); Hammersley et al. (1965). Monte Carlo methods only need repeated evaluations of the output functional of the solution of the SPDE for different instantiations of the random input. The main advantage of Monte Carlo methods is that their convergence rate is independent of the dimension of the stochastic space, namely, the number of random variables. Their main drawback, however, is a slow convergence rate, which demands a large number of realizations to achieve accurate results. As a result, a number of techniques such as quasi-Monte Carlo methods Niederreiter (1992); Caflisch (1998), Latin hypercube sampling Loh (1996); Stein (1987), variance reduction methods Caflisch (1998) and multilevel Monte Carlo Giles (2008) have been proposed to alleviate the slow convergence rate of standard Monte Carlo methods.

Another approach is stochastic Galerkin methods, first introduced by Ghanem et al. in Ghanem and Spanos (1991), which generalize the theory of Wiener-Hermite polynomial chaos expansion Wiener (1938) and combine it with a finite element method to model uncertainty in an SPDE. In this approach, the random variables are treated as additional dimensions of the problem and projected onto a stochastic space spanned by a set of orthogonal polynomials. The problem is then reduced to a system of deterministic equations, which couple the physical and stochastic dimensions. This methodology has proven to be very effective when solving partial differential equations (PDEs) in a broad range of applications, such as diffusion problems and heat conduction Ghanem (1999); Xiu and Karniadakis (2002, 2003b), structural dynamics Ghanem and Sarkar (2002), transport in random media Ghanem (1998) and fluid dynamics Xiu and Karniadakis (2003a); Chen et al. (2005). The advantage of these methods is that they converge exponentially fast for a sufficiently regular solution field Babuška et al. (2004, 2005); Deb et al. (2001). However, their main drawback is that their computational complexity grows combinatorially with the number of random variables and the number of expansion terms. As a consequence, they are not effective for solving problems with a large number of random variables.

A more recent approach is stochastic collocation methods (SCM), first introduced in Mathelin and Hussaini (2003) and further developed in Xiu and Hesthaven (2005). The main idea is to compute deterministic solutions of the SPDE for certain instantiations of the random variables and then construct an interpolation function to approximate the response over the stochastic space. When the interpolation procedure is performed on tensor grids, these methods suffer from exponential growth with the dimensionality of the stochastic space. To economize the interpolation process in large dimensions, Smolyak sparse grids Smolyak (1963) were introduced for elliptic problems Nobile et al. (2008b); Xiu and Hesthaven (2005), parabolic problems Nobile and Tempone (2009) and natural convection problems Ganapathysubramanian and Zabaras (2007). In Babuška et al. (2007), sparse grids were shown to achieve exponential convergence for problems with smooth solutions. However, like polynomial chaos expansions, sparse grids still suffer from the curse of dimensionality in the sense that the number of grid points grows rapidly with the dimension of the stochastic space. Recently, anisotropy and adaptivity on sparse grids Gerstner and Griebel (2003); Klimke (2006) have been used in SCM Ganapathysubramanian and Zabaras (2007); Nobile et al. (2008a) to mitigate the elevated cost in high dimensions.

In this paper, we present a model and variance reduction method for the fast and reliable computation of statistical outputs of stochastic elliptic partial differential equations. Our method consists of three main ingredients: (1) the hybridizable discontinuous Galerkin (HDG) discretization of elliptic partial differential equations (PDEs), which allows us to obtain high-order accurate solutions of the governing PDE; (2) a reduced basis method for the HDG discretization of the underlying PDE to enable real-time solution of the parameterized PDE in the presence of stochastic parameters; and (3) a multilevel variance reduction method that exploits the statistical correlation among the different reduced basis approximations and the high-fidelity HDG discretization to accelerate the convergence rate of the Monte Carlo simulations. The multilevel variance reduction method provides efficient computation of the statistical outputs by shifting most of the computational burden from the high-fidelity HDG approximation to the reduced basis approximations. Although the three ingredients of our approach exist in the literature, the main contribution of this paper is to put these methodologies into a unified framework that combines all of their strengths to tackle stochastic elliptic PDEs. Another important contribution of the paper is to develop a posteriori error bounds for the estimates of the statistical outputs and to introduce an algorithm for optimally choosing the dimensions of the reduced basis approximations and the sizes of MC samples to achieve a given error tolerance. Last but not least, we present a new HDG formulation that enables the efficient construction of reduced basis approximations for the HDG discretization of parameterized PDEs.

The HDG method was first introduced in Cockburn et al. (2009a) for elliptic problems, subsequently analyzed in Cockburn et al. (2008, 2009b, 2010), and later extended to a wide variety of PDEs Nguyen et al. (2009a, b); Cockburn et al. (2011); Nguyen et al. (2010b, a, c, 2011a, 2011c, 2011b); Nguyen and Peraire (2012); Ueckermann and Lermusiaux (2010). The HDG method is particularly effective for solving elliptic PDEs because it possesses several unique features that distinguish it from other DG methods. First, it reduces the number of globally coupled unknowns to those required to represent the trace of the approximate solution on the element boundaries, thereby resulting in a smaller global system than other DG methods. Second, the method provides optimal convergence rates for both the solution and the flux. And third, its flux superconvergence properties can be exploited to devise a local postprocessing that increases the convergence rate of the approximate solution by one order. These advantages are the main driver for the development of the Reduced Basis (RB) method for the HDG discretization of parameterized PDEs. While the RB method is well developed for the standard finite element discretization of parameterized PDEs Noor and Peters (1980); Machiels et al. (2000); Prud’homme et al. (2002); Grepl and Patera (2005); Grepl (2005); Veroy et al. (2003); Veroy and Patera (2005), the RB method for the HDG approximation of parameterized PDEs has not been considered before. The HDG discretization has multiple field variables and various equivalent weak formulations, which make the application of the RB method not straightforward.

Recently, the RB method has been applied to standard continuous Galerkin finite element solutions of stochastic elliptic PDEs Boyaval et al. (2010); Haasdonk et al. (2013); Chen et al. (2014). In this approach, the stochastic PDE is first reformulated as a parameterized PDE over the coefficients of the Karhunen-Loève expansion of the random fields. The reduced basis approximation and associated a posteriori error estimation are then developed for the resulting parameterized PDE. Finally, the output statistics and their error estimates are computed with an MC simulation Boyaval et al. (2010); Haasdonk et al. (2013) or a stochastic collocation approach Chen et al. (2014). These approaches, which involve the RB method and its a posteriori error bounds to evaluate the output instead of the original finite element discretization, have been shown to outperform both standard MC and stochastic collocation. In this paper, we extend the previous work Boyaval et al. (2010); Haasdonk et al. (2013) in several important ways. We will use the HDG method to construct the RB approximation. We will adopt the multilevel Monte Carlo strategy Giles (2008); Barth et al. (2011); Cliffe et al. (2011); Teckentrup et al. (2013) and demonstrate a significant computational gain relative to the standard MC approach. Moreover, we will provide a posteriori error estimates for our prediction of the statistical outputs without involving a posteriori error bounds for the RB approximation. This feature will broaden the applicability of our approach to a wide variety of stochastic PDEs for which a posteriori error bounds for the RB approximation are either not available or too expensive to compute.

According to the central limit theorem Feller (1968), the error in a Monte Carlo estimation of the expectation of an output is proportional to the square root of the ratio of the variance of the output and the number of samples. Therefore, in order to reduce the error one can increase the number of samples and/or decrease the variance of the output. Because increasing the number of samples leads to higher computational cost, various techniques such as the control variates method Boyaval (2012); Caflisch (1998); Hammersley et al. (1965), the multilevel Monte Carlo method Giles (2008); Barth et al. (2011); Cliffe et al. (2011); Teckentrup et al. (2013), and the multi-fidelity Monte Carlo method Ng et al. (2012) have been proposed to reduce the variance of the output. The control variates method reduces the variance of the output by making use of the correlation between the output and a surrogate. The multi-fidelity Monte Carlo method makes use of the statistical correlation between the low-fidelity (surrogate) and high-fidelity outputs to reduce the number of high-fidelity evaluations needed to achieve a given accuracy of interest. The multilevel Monte Carlo method applies the principle of control variates to a sequence of lower fidelity outputs (multigrid approximations) to estimate the statistics of the high-fidelity output. Likewise, our method applies the principle of control variates to the HDG approximation and a sequence of reduced basis approximations, thereby shifting the computational burden from the high-fidelity HDG discretization to the lower fidelity RB approximations.

This article is organized as follows. In Section 2, we introduce a stochastic elliptic boundary value problem and describe a new weak HDG formulation particularly suited for the reduced basis method. In Section 3, we describe a reduced basis method for the HDG approximation of the stochastic elliptic boundary value problem. In Section 4, we develop a multilevel Monte Carlo method that incorporates the HDG approximation and its reduced basis models into a unified framework to provide rapid reliable computation of the statistical outputs. In Section 5, we present numerical results to demonstrate the performance of the proposed method. Finally, in Section 6, we discuss some directions for future research.

## 2 The Hybridizable Discontinuous Galerkin Method

### 2.1 Problem statement

Let $D \subset \mathbb{R}^d$ be a regular domain with Lipschitz boundary $\partial D$. We consider the following stochastic boundary value problem: find a function $u$ such that

(1a) $-\nabla \cdot (\kappa\, \nabla u) + \lambda u = f$ in $D$,

(1b) $\kappa\, \nabla u \cdot \boldsymbol{n} + \mu u = g$ on $\partial D$,

where $f$ is the source term, $\kappa$ is the diffusion coefficient, $\lambda$ is the Helmholtz parameter, $\mu$ is the Robin coefficient, $g$ is the boundary data, and $\boldsymbol{n}$ is the unit outward normal on $\partial D$. In this problem, one or more than one of the quantities $(f, \kappa, \lambda, \mu, g)$ are stochastic functions. For simplicity of exposition we shall assume that $\kappa$ is a real stochastic function and that $(f, \lambda, \mu, g)$ are deterministic. The generalization to the case where one or more of $(f, \lambda, \mu, g)$ are stochastic is straightforward. Note that since we allow $(f, \lambda, \mu, g)$ to be complex-valued functions, the solution $u$ is in general a complex stochastic function.

We next introduce a probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is the set of outcomes, $\mathcal{F}$ is the $\sigma$-algebra of the subsets of $\Omega$, and $P$ is the probability measure. If $X$ is a real random variable in $(\Omega, \mathcal{F}, P)$ and $A \in \mathcal{F}$ a probability event, we denote the expectation of $X$ by $E[X]$ and the probability of $A$ by $P[A]$. We will consider random functions in $L^2_P(\Omega)$ equipped with the following norm

$$\|X\|_{L^2_P(\Omega)} = \left(E[X^2]\right)^{1/2}.$$

We will assume that $f \in L^2(D)$ and that $\kappa$ is bounded and strictly positive, i.e., there exist constants $\kappa_{\min}$ and $\kappa_{\max}$ such that

$$0 < \kappa_{\min} \le \kappa(x, \omega) \le \kappa_{\max} < \infty, \quad \text{a.e. } x \in D, \ \text{a.s. } \omega \in \Omega.$$

We next assume that the random function $\kappa$ can be written in the following form

$$\kappa(x, \omega) = \bar{\kappa}(x) + \sum_{q=1}^{Q} \kappa_q(x)\, y_q(\omega),$$

where $\bar{\kappa}$ is the expectation of $\kappa$, the $\kappa_q$ are uniformly bounded real functions, and the $y_q(\omega)$ for $q = 1, \ldots, Q$ are mutually independent random variables with zero mean. In addition, we assume that each of the $y_q$ is bounded in the interval $\Gamma_q$ with a uniformly bounded probability density function $\rho_q : \Gamma_q \to \mathbb{R}$. It thus follows that, with a slight overloading of notation, we can write $\kappa$ in the form

$$\kappa(x, y) = \bar{\kappa}(x) + \sum_{q=1}^{Q} \kappa_q(x)\, y_q, \qquad y = (y_1, \ldots, y_Q) \in \Gamma,$$

where $\Gamma \equiv \prod_{q=1}^{Q} \Gamma_q$ and $\rho(y) \equiv \prod_{q=1}^{Q} \rho_q(y_q)$.
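To make the affine representation concrete, the following minimal sketch samples a random field of this form in one spatial dimension. The mean field, the functions $\kappa_q$, and the choice $\Gamma_q = [-1, 1]$ with uniform densities are all hypothetical stand-ins, chosen only so that the positivity assumption holds by construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D illustration of the affine expansion
#   kappa(x, y) = kappa_bar(x) + sum_q kappa_q(x) * y_q,
# with y_q independent, zero-mean, bounded in [-1, 1].
Q = 4
x = np.linspace(0.0, 1.0, 101)

kappa_bar = 2.0 + 0.0 * x   # mean field, chosen so kappa stays positive
kappa_q = [0.2 / (q + 1) * np.cos((q + 1) * np.pi * x) for q in range(Q)]

def sample_kappa(rng):
    """Draw y uniformly on [-1,1]^Q and evaluate kappa(., y)."""
    y = rng.uniform(-1.0, 1.0, size=Q)
    return kappa_bar + sum(yq * kq for yq, kq in zip(y, kappa_q)), y

kappa, y = sample_kappa(rng)

# Uniform lower bound kappa_min <= kappa(x, y) for all x and all y in Gamma
kappa_min = kappa_bar.min() - sum(np.abs(kq).max() for kq in kappa_q)
assert kappa_min > 0.0 and kappa.min() >= kappa_min
```

The bound at the end mirrors the assumption $0 < \kappa_{\min} \le \kappa$: since each $|y_q| \le 1$, the field can never drop below $\bar{\kappa} - \sum_q \max_x |\kappa_q(x)|$.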

Therefore, the solution of (1) can be written as a function of $y$, namely, $u(x, y)$. Now let $\ell^O$ be a bounded linear functional. We introduce a random output $s$ defined as

$$s(y) = \ell^O(u(\cdot, y)).$$

We are interested in evaluating the expectation and variance of $s$ as

$$E[s] = \int_\Gamma s(y)\, \rho(y)\, dy, \qquad V[s] = E[s^2] - \left(E[s]\right)^2,$$

where $E[s^2] = \int_\Gamma s^2(y)\, \rho(y)\, dy$. Below we describe the hybridizable discontinuous Galerkin method for solving the model problem (1) and the Monte Carlo simulation for computing estimates of $E[s]$ and $V[s]$.

### 2.2 HDG discretization

We begin by rewriting the governing equation (1) as a first-order system

(2a) $\boldsymbol{q} - \nabla u = 0$ in $D$,

(2b) $-\nabla \cdot (\kappa\, \boldsymbol{q}) + \lambda u = f$ in $D$,

(2c) $\kappa\, \boldsymbol{q} \cdot \boldsymbol{n} + \mu u = g$ on $\partial D$.

The physical domain $D$ is triangulated into elements $K$ forming a mesh $\mathcal{T}_h$ satisfying the standard finite element conditions Ciarlet (1978). Then, letting $\partial \mathcal{T}_h \equiv \{\partial K : K \in \mathcal{T}_h\}$ and denoting by $\mathcal{E}_h$ the set of the faces of the elements in $\mathcal{T}_h$, we seek a vector approximation $\boldsymbol{q}_h \in \boldsymbol{V}_h$ to $\boldsymbol{q}$, a scalar approximation $u_h \in W_h$ to $u$, and a scalar approximation $\hat{u}_h \in M_h$ to the trace of $u$ on element boundaries, where

$$\boldsymbol{V}_h = \left\{\boldsymbol{v} \in [L^2(D)]^d : \boldsymbol{v}|_K \in [\mathcal{P}^k(K)]^d, \ \forall K \in \mathcal{T}_h\right\},$$

$$W_h = \left\{w \in L^2(D) : w|_K \in \mathcal{P}^k(K), \ \forall K \in \mathcal{T}_h\right\},$$

$$M_h = \left\{\eta \in L^2(\mathcal{E}_h) : \eta|_F \in \mathcal{P}^k(F), \ \forall F \in \mathcal{E}_h\right\},$$

and $\mathcal{P}^k(S)$ is a space of complex-valued polynomials of degree at most $k$ on $S$. Note that functions in $M_h$ are defined only on the faces of the elements, hence they are single valued. We introduce the following inner products

$$(v, w)_{\mathcal{T}_h} = \sum_{K \in \mathcal{T}_h} (v, w)_K, \qquad \langle v, w \rangle_{\partial \mathcal{T}_h} = \sum_{K \in \mathcal{T}_h} \langle v, w \rangle_{\partial K},$$

where $(v, w)_S = \int_S v\, \bar{w}$ whenever $S$ is a domain in $\mathbb{R}^d$, and $\langle v, w \rangle_S = \int_S v\, \bar{w}$ whenever $S$ is a domain in $\mathbb{R}^{d-1}$. For vector-valued functions $\boldsymbol{v}$ and $\boldsymbol{w}$, the integrals are similarly defined with the integrand being the dot product $\boldsymbol{v} \cdot \bar{\boldsymbol{w}}$. Note that $\bar{w}$ denotes the complex conjugate of $w$.

The HDG approximations $(\boldsymbol{q}_h, u_h, \hat{u}_h) \in \boldsymbol{V}_h \times W_h \times M_h$ are determined by requiring that

(3a) $(\boldsymbol{q}_h, \boldsymbol{v})_{\mathcal{T}_h} + (u_h, \nabla \cdot \boldsymbol{v})_{\mathcal{T}_h} - \langle \hat{u}_h, \boldsymbol{v} \cdot \boldsymbol{n} \rangle_{\partial \mathcal{T}_h} = 0,$

(3b) $(\kappa\, \boldsymbol{q}_h, \nabla w)_{\mathcal{T}_h} + (\lambda u_h, w)_{\mathcal{T}_h} - \langle \widehat{\kappa \boldsymbol{q}}_h \cdot \boldsymbol{n}, w \rangle_{\partial \mathcal{T}_h} = (f, w)_{\mathcal{T}_h},$

(3c) $\langle \widehat{\kappa \boldsymbol{q}}_h \cdot \boldsymbol{n}, \eta \rangle_{\partial \mathcal{T}_h} + \langle \mu\, \hat{u}_h, \eta \rangle_{\partial D} = \langle g, \eta \rangle_{\partial D},$

hold for all $(\boldsymbol{v}, w, \eta) \in \boldsymbol{V}_h \times W_h \times M_h$, where the numerical flux $\widehat{\kappa \boldsymbol{q}}_h \cdot \boldsymbol{n}$ is defined as

(4) $\widehat{\kappa \boldsymbol{q}}_h \cdot \boldsymbol{n} = \kappa\, \boldsymbol{q}_h \cdot \boldsymbol{n} - \tau\, (u_h - \hat{u}_h) \quad \text{on } \partial \mathcal{T}_h.$

Here $\tau$ is the so-called stabilization parameter, a global constant with dimensions of $\kappa/\ell$, where $\ell$ is the characteristic lengthscale. We set $\tau = 1$ since we do not consider multiple physical scales in this work. Further discussions on $\tau$ may be found in Cockburn et al. (2009a); Nguyen et al. (2009a). By substituting (4) into (3) we obtain that $(\boldsymbol{q}_h, u_h, \hat{u}_h) \in \boldsymbol{V}_h \times W_h \times M_h$ satisfies

(5a) $(\boldsymbol{q}_h, \boldsymbol{v})_{\mathcal{T}_h} + (u_h, \nabla \cdot \boldsymbol{v})_{\mathcal{T}_h} - \langle \hat{u}_h, \boldsymbol{v} \cdot \boldsymbol{n} \rangle_{\partial \mathcal{T}_h} = 0,$

(5b) $(\kappa\, \boldsymbol{q}_h, \nabla w)_{\mathcal{T}_h} + (\lambda u_h, w)_{\mathcal{T}_h} - \langle \kappa\, \boldsymbol{q}_h \cdot \boldsymbol{n} - \tau (u_h - \hat{u}_h), w \rangle_{\partial \mathcal{T}_h} = (f, w)_{\mathcal{T}_h},$

(5c) $\langle \kappa\, \boldsymbol{q}_h \cdot \boldsymbol{n} - \tau (u_h - \hat{u}_h), \eta \rangle_{\partial \mathcal{T}_h} + \langle \mu\, \hat{u}_h, \eta \rangle_{\partial D} = \langle g, \eta \rangle_{\partial D},$

for all $(\boldsymbol{v}, w, \eta) \in \boldsymbol{V}_h \times W_h \times M_h$. This completes the definition of the HDG method.

The above weak formulation of the HDG method involves three field variables, namely, $\boldsymbol{q}_h$, $u_h$, and $\hat{u}_h$. However, the first two equations (5a) and (5b) allow us to write both $\boldsymbol{q}_h$ and $u_h$ in terms of $\hat{u}_h$ at the element level due to the fact that our approximation spaces are discontinuous. Therefore, we can substitute both $\boldsymbol{q}_h$ and $u_h$ from the first two equations into the last equation (5c) to obtain a weak formulation in terms of $\hat{u}_h$ only: find $\hat{u}_h \in M_h$ such that

(6) $a_h(\hat{u}_h, \eta; y) = b_h(\eta), \quad \forall\, \eta \in M_h.$

Here we omit the derivation of the bilinear form $a_h$ and the linear functional $b_h$. Instead we refer the reader to Nguyen et al. (2009a) for a detailed discussion. The reduced weak formulation (6) gives rise to the following linear system

(7) $\mathbb{K}\, \hat{\mathbf{u}} = \mathbf{b},$

where $\hat{\mathbf{u}}$ is the vector containing the degrees of freedom of $\hat{u}_h$. Because $\hat{u}_h$ is single valued on the faces of the finite element mesh, it has significantly fewer degrees of freedom than $u_h$. As a result, the global matrix system (7) of the HDG method can be much smaller than that of other DG methods. This results in significant savings in terms of computational time and memory storage.
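The static condensation just described can be illustrated on a generic block system: the element-local unknowns are eliminated via a Schur complement, leaving a much smaller system for the trace unknowns, after which the local unknowns are recovered cheaply. The sketch below uses small random stand-in matrices, not the actual HDG operators:

```python
import numpy as np

rng = np.random.default_rng(1)

# Generic sketch of static condensation: eliminate element-local
# unknowns u (block A) in favor of the trace unknowns uhat (block D).
# Block names and sizes are illustrative, not the paper's matrices.
n_local, n_trace = 8, 3
A = rng.standard_normal((n_local, n_local)) + n_local * np.eye(n_local)
B = rng.standard_normal((n_local, n_trace))
C = rng.standard_normal((n_trace, n_local))
D = rng.standard_normal((n_trace, n_trace)) + n_trace * np.eye(n_trace)
f = rng.standard_normal(n_local)
g = rng.standard_normal(n_trace)

# Full coupled system: [A B; C D] [u; uhat] = [f; g]
K = np.block([[A, B], [C, D]])
sol = np.linalg.solve(K, np.concatenate([f, g]))

# Condensed (Schur complement) system in uhat only, then local recovery of u
S = D - C @ np.linalg.solve(A, B)
uhat = np.linalg.solve(S, g - C @ np.linalg.solve(A, f))
u = np.linalg.solve(A, f - B @ uhat)

assert np.allclose(np.concatenate([u, uhat]), sol)
```

In the HDG setting the block playing the role of `A` is block-diagonal over the elements, so its inversion is a collection of independent small local solves, which is what makes the condensed trace system so economical.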

It turns out that although the formulation (6) results in the smallest possible system, it is not ideal as the starting point for our reduced basis method. Substituting the first two equations (5a) and (5b) into the last equation (5c) involves the inverse of the material coefficients $\kappa$ and $\lambda$, which renders the bilinear form $a_h$ nonaffine in the material coefficients. Although nonaffine parameter dependence can be treated by using the empirical interpolation method Barrault et al. (2004) or the best points interpolation method Nguyen et al. (2008), such treatment incurs additional cost and is unnecessary. We are going to derive a new weak formulation of the HDG method, which is suited for the reduced basis method.

### 2.3 A new weak formulation of the HDG method

We begin by deriving a weak formulation of the HDG method upon which our reduced basis method is constructed. To this end, we introduce two lifting operators $\boldsymbol{R} : W_h \to \boldsymbol{V}_h$ and $\boldsymbol{S} : M_h \to \boldsymbol{V}_h$ defined as

(8a) $(\boldsymbol{R}(w), \boldsymbol{v})_{\mathcal{T}_h} = -(w, \nabla \cdot \boldsymbol{v})_{\mathcal{T}_h}, \quad \forall\, \boldsymbol{v} \in \boldsymbol{V}_h,$

(8b) $(\boldsymbol{S}(\eta), \boldsymbol{v})_{\mathcal{T}_h} = \langle \eta, \boldsymbol{v} \cdot \boldsymbol{n} \rangle_{\partial \mathcal{T}_h}, \quad \forall\, \boldsymbol{v} \in \boldsymbol{V}_h.$

It thus follows from (5a) and (8) that we can express $\boldsymbol{q}_h$ as a function of $u_h$ and $\hat{u}_h$ as

(9) $\boldsymbol{q}_h = \boldsymbol{R}(u_h) + \boldsymbol{S}(\hat{u}_h).$

By substituting (9) into (5b) and (5c) we arrive at the following weak formulation: find $(u_h, \hat{u}_h) \in W_h \times M_h$ such that

$$\left(\kappa\, (\boldsymbol{R}(u_h) + \boldsymbol{S}(\hat{u}_h)), \boldsymbol{R}(w) + \boldsymbol{S}(\eta)\right)_{\mathcal{T}_h} + (\lambda u_h, w)_{\mathcal{T}_h} + \langle \tau (u_h - \hat{u}_h), w - \eta \rangle_{\partial \mathcal{T}_h} + \langle \mu\, \hat{u}_h, \eta \rangle_{\partial D} = (f, w)_{\mathcal{T}_h} + \langle g, \eta \rangle_{\partial D}$$

for all $(w, \eta)$ in $W_h \times M_h$. By setting the $\mathcal{N}$-dimensional approximation space to be $X_h \equiv W_h \times M_h$, $\underline{v} \equiv (w, \eta)$, and $\underline{u}_h \equiv (u_h, \hat{u}_h)$, we obtain that, for any $y \in \Gamma$, $\underline{u}_h(y) \in X_h$ satisfies

(10) $a(\underline{u}_h(y), \underline{v}; y) = \ell(\underline{v}), \quad \forall\, \underline{v} \in X_h,$

where the bilinear form $a$ and the linear functional $\ell$ are given by

(11a) $a(\underline{u}, \underline{v}; y) = (\kappa(\cdot, y)\, \boldsymbol{q}(\underline{u}), \boldsymbol{q}(\underline{v}))_{\mathcal{T}_h} + (\lambda u, w)_{\mathcal{T}_h} + \langle \tau (u - \hat{u}), w - \eta \rangle_{\partial \mathcal{T}_h} + \langle \mu\, \hat{u}, \eta \rangle_{\partial D},$

(11b) $\ell(\underline{v}) = (f, w)_{\mathcal{T}_h} + \langle g, \eta \rangle_{\partial D},$

for all $\underline{u} = (u, \hat{u})$ and $\underline{v} = (w, \eta)$ in $X_h$, where $\boldsymbol{q}(\underline{v}) \equiv \boldsymbol{R}(w) + \boldsymbol{S}(\eta)$. We note that the bilinear form (11a) is affine in $\kappa$. Furthermore, if $\lambda$, $\mu$, and $\tau$ are real with $\lambda \ge 0$, $\mu \ge 0$, and $\tau > 0$, the form is symmetric and coercive. Henceforth, the choice $\bar{y} = (0, \ldots, 0)$ (for which $\kappa(\cdot, \bar{y}) = \bar{\kappa}$) allows us to equip the approximation space $X_h$ with the inner product $(\underline{w}, \underline{v})_X \equiv a(\underline{w}, \underline{v}; \bar{y})$ and the induced norm $\|\underline{v}\|_X \equiv \sqrt{(\underline{v}, \underline{v})_X}$.

We now substitute the expression of $\kappa(x, y)$ from Section 2.1 into (11) to express $a$ as

(12) $a(\underline{u}, \underline{v}; y) = a_0(\underline{u}, \underline{v}) + \sum_{q=1}^{Q} y_q\, a_q(\underline{u}, \underline{v}),$

where the bilinear forms are given by $a_0(\underline{u}, \underline{v}) \equiv (\bar{\kappa}\, \boldsymbol{q}(\underline{u}), \boldsymbol{q}(\underline{v}))_{\mathcal{T}_h} + (\lambda u, w)_{\mathcal{T}_h} + \langle \tau (u - \hat{u}), w - \eta \rangle_{\partial \mathcal{T}_h} + \langle \mu\, \hat{u}, \eta \rangle_{\partial D}$ and $a_q(\underline{u}, \underline{v}) \equiv (\kappa_q\, \boldsymbol{q}(\underline{u}), \boldsymbol{q}(\underline{v}))_{\mathcal{T}_h}$ for $q = 1, \ldots, Q$. Therefore, we can write the weak formulation (10) as follows: for any $y \in \Gamma$, $\underline{u}_h(y) \in X_h$ satisfies

(13) $a_0(\underline{u}_h(y), \underline{v}) + \sum_{q=1}^{Q} y_q\, a_q(\underline{u}_h(y), \underline{v}) = \ell(\underline{v}), \quad \forall\, \underline{v} \in X_h.$

Finally, we evaluate our realization output as

$$s_h(y) = \ell^O(\underline{u}_h(y)),$$

where the linear functional $\ell^O$ is obtained from the HDG discretization of the output functional introduced in Section 2.1. The key point of the new HDG formulation (13) for an efficient performance of the reduced basis method is the affine representation (12). This aspect is of crucial importance, and the main reason we prefer (13) to the reduced weak formulation (6) for constructing the reduced basis approximation. Furthermore, the new formulation is optimal in terms of degrees of freedom, since we no longer account for the gradient $\boldsymbol{q}_h$. Finally, even though the parameter-independent matrices arising from (12) are used for the reduced basis approximation, the solution $\underline{u}_h(y)$ is never computed as the solution of the full system (13). Instead, we can invoke again the discontinuity of the approximation spaces to write $u_h$ in terms of $\hat{u}_h$. This common strategy in HDG methods enables us to solve for the global degrees of freedom of $\hat{u}_h$ only and then recover $u_h$ efficiently.

### 2.4 Monte Carlo sampling with the HDG method

We are interested in evaluating statistics of the output $s_h$ such as its expectation and variance. Let $S_M \equiv \{y_m, \ 1 \le m \le M\}$ be a set of $M$ random samples drawn in the parameter space $\Gamma$ with the probability density function $\rho$. We evaluate the following outputs

(14) $s_h(y_m) = \ell^O(\underline{u}_h(y_m)), \quad 1 \le m \le M.$

The Monte Carlo-HDG (MC-HDG) estimates of the expectation and variance can be computed, respectively, as

(15) $\widehat{E}_M[s_h] = \frac{1}{M} \sum_{m=1}^{M} s_h(y_m), \qquad \widehat{V}_M[s_h] = \frac{1}{M-1} \sum_{m=1}^{M} \left(s_h(y_m) - \widehat{E}_M[s_h]\right)^2.$

We shall assume that $s_h(y)$ is indistinguishable from $s(y)$ for any $y \in \Gamma$. Moreover, it is a known result that the estimators in (15) are unbiased and converge in distribution; in particular,

$$\sqrt{M}\left(\widehat{E}_M[s_h] - E[s_h]\right) \to \mathcal{N}\left(0, V[s_h]\right) \quad \text{as } M \to \infty.$$

Confidence intervals can be constructed employing the central limit theorem (CLT); that is, for all $\gamma \in (0, 1)$ we have

(16a) $\displaystyle \lim_{M \to \infty} P\left[\left|E[s_h] - \widehat{E}_M[s_h]\right| \le \Delta^E_M\right] = \gamma,$

(16b) $\displaystyle \lim_{M \to \infty} P\left[\left|V[s_h] - \widehat{V}_M[s_h]\right| \le \Delta^V_M\right] = \gamma,$

where

(17) $\displaystyle \Delta^E_M = c_\gamma \sqrt{\frac{\widehat{V}_M[s_h]}{M}}, \qquad c_\gamma = \sqrt{2}\, \operatorname{erf}^{-1}(\gamma),$

and $\Delta^V_M$ is the analogous CLT half-width for the variance estimator. Therefore, in order to guarantee that $\left|E[s_h] - \widehat{E}_M[s_h]\right|$ is bounded by a specified error tolerance $\epsilon$ with a high probability (say, greater than $99.7\%$), we need to take $c_\gamma = 3$ and $M \ge 9\, \widehat{V}_M[s_h] / \epsilon^2$. As a result, $M$ can be very large for a small error tolerance. Hence, the evaluations (14)-(15) can be very demanding.
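The CLT sample-size rule above can be exercised on a cheap synthetic output. In the sketch below, the function `s` is a hypothetical stand-in for the HDG output (not the paper's model problem); a small pilot run estimates the variance, from which the rule $M \ge c_\gamma^2 \widehat{V}_M / \epsilon^2$ fixes the number of samples:

```python
import numpy as np

rng = np.random.default_rng(2)

# Plain Monte Carlo estimation of E[s] with a CLT-based sample-size rule.
# The output s(y) is a smooth synthetic functional of y in [-1,1]^Q,
# standing in for the HDG output.
def s(y):
    return np.exp(0.5 * y.sum(axis=-1))

Q, eps, c_gamma = 4, 1e-2, 3.0   # tolerance eps at ~99.7% confidence

# Pilot run to estimate the output variance
y_pilot = rng.uniform(-1.0, 1.0, size=(1000, Q))
V_pilot = s(y_pilot).var(ddof=1)

# CLT: choose M so the half-width c_gamma * sqrt(V/M) falls below eps
M = int(np.ceil(c_gamma**2 * V_pilot / eps**2))
y = rng.uniform(-1.0, 1.0, size=(M, Q))
samples = s(y)
E_hat = samples.mean()
half_width = c_gamma * np.sqrt(samples.var(ddof=1) / M)
```

For this separable stand-in the exact mean is $\prod_{q=1}^{4} \int_{-1}^{1} e^{y/2}\, dy/2 = (\sinh(1/2)/(1/2))^4$, so the estimate can be checked directly against the truth.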

The remaining goals of this paper are as follows. On one hand, we develop a reduced basis (RB) method for rapid reliable approximation of the stochastic HDG output $s_h(y)$ for any given parameter vector $y$ in $\Gamma$. On the other hand, we develop a multilevel variance reduction method to accelerate the convergence of the Monte Carlo simulation by exploiting the exponentially fast convergence of the RB output $s_N$ to the high-fidelity HDG output $s_h$ as a function of the RB dimension $N$. These two ingredients enable very fast reliable computation of the statistical outputs at a computational cost several orders of magnitude lower than that of the MC-HDG approach. We describe the reduced basis approach in Section 3 and the multilevel variance reduction method in Section 4.

## 3 Reduced Basis Method

We consider a “primal-dual” formulation Pierce and Giles (2000); Nguyen et al. (2005) particularly well-suited to good approximation and error characterization of the output. To this end, we introduce the dual problem of (13): given $y \in \Gamma$, the dual solution $\underline{\psi}_h(y) \in X_h$ satisfies

$$a(\underline{v}, \underline{\psi}_h(y); y) = -\ell^O(\underline{v}), \quad \forall\, \underline{v} \in X_h.$$

The dual problem plays an important role in improving the convergence rate of both the RB output and the associated error bound.

We next assume that we are given orthonormalized basis functions $\underline{\zeta}^{pr}_n, \underline{\zeta}^{du}_n \in X_h$, $1 \le n \le N_{\max}$, such that $(\underline{\zeta}^{pr}_m, \underline{\zeta}^{pr}_n)_X = (\underline{\zeta}^{du}_m, \underline{\zeta}^{du}_n)_X = \delta_{mn}$. We define the associated hierarchical RB spaces as

$$X^{pr}_N = \operatorname{span}\{\underline{\zeta}^{pr}_n, \ 1 \le n \le N\}, \qquad X^{du}_N = \operatorname{span}\{\underline{\zeta}^{du}_n, \ 1 \le n \le N\}, \qquad 1 \le N \le N_{\max}.$$

In practice, the spaces $X^{pr}_N$ and $X^{du}_N$ consist of orthonormalized primal and dual solutions at selected parameter values generated by a Greedy sampling procedure Rozza et al. (2008); Grepl (2005); Veroy et al. (2003). For our present purpose, however, $X^{pr}_N$ and $X^{du}_N$ can in fact represent any sequence of (low-dimensional) hierarchical approximation spaces. We then apply the Galerkin projection for both the primal and dual problems: given $y \in \Gamma$, we find a primal RB approximation $\underline{u}_N(y) \in X^{pr}_N$ satisfying

(18) $a(\underline{u}_N(y), \underline{v}; y) = \ell(\underline{v}), \quad \forall\, \underline{v} \in X^{pr}_N,$

and a dual RB approximation $\underline{\psi}_N(y) \in X^{du}_N$ satisfying

$$a(\underline{v}, \underline{\psi}_N(y); y) = -\ell^O(\underline{v}), \quad \forall\, \underline{v} \in X^{du}_N.$$

We can now evaluate the RB realization output as

$$s_N(y) = \ell^O(\underline{u}_N(y)) - \ell(\underline{\psi}_N(y)) + a(\underline{u}_N(y), \underline{\psi}_N(y); y).$$

As discussed below, the online computational cost of evaluating the RB output depends only on $N$ and $Q$. Hence, for small $N$ and $Q$, the RB approximation can be significantly less expensive than the HDG approximation.
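A minimal sketch of the Galerkin projection onto a reduced basis space may help fix ideas. It uses random symmetric positive definite stand-in matrices for an affine parametrized system, snapshots at a few training parameters, and an orthonormalized snapshot basis; none of the matrices or sizes come from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimal reduced basis sketch for an affine parametrized system
#   A(y) u = f,  A(y) = A0 + sum_q y_q * Aq,  output s(y) = l . u.
# The matrices are random stand-ins, not an HDG discretization.
n, Q, N = 60, 3, 6
M0 = rng.standard_normal((n, n))
A0 = M0 @ M0.T + n * np.eye(n)                  # SPD "mean" operator
Aq = []
for _ in range(Q):
    Mq = rng.standard_normal((n, n))
    Aq.append(0.05 * (Mq + Mq.T))               # small symmetric perturbations
f = rng.standard_normal(n)
l = rng.standard_normal(n)

def A(y):
    return A0 + sum(yq * Aq_ for yq, Aq_ in zip(y, Aq))

# Snapshots at a few parameter points -> orthonormal basis Z (n x N)
Y_train = rng.uniform(-1, 1, size=(N, Q))
snapshots = np.stack([np.linalg.solve(A(y), f) for y in Y_train], axis=1)
Z, _ = np.linalg.qr(snapshots)

def s_N(y):
    """Galerkin-projected output: solve only an N x N reduced system."""
    uN = np.linalg.solve(Z.T @ A(y) @ Z, Z.T @ f)
    return l @ (Z @ uN)

def s_h(y):
    """Full ("high-fidelity") output for comparison."""
    return l @ np.linalg.solve(A(y), f)

y_test = rng.uniform(-1, 1, size=Q)
```

For a test parameter outside the training set the reduced output closely tracks the full output, because the solution manifold of this mildly perturbed system is well captured by a handful of snapshots.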

The RB output is then used as an approximation to the HDG output in the Monte Carlo simulation. The Monte Carlo-Reduced Basis (MC-RB) estimates of the expectation and variance of the output of interest are given by

$$\widehat{E}_M[s_N] = \frac{1}{M} \sum_{m=1}^{M} s_N(y_m), \qquad \widehat{V}_M[s_N] = \frac{1}{M-1} \sum_{m=1}^{M} \left(s_N(y_m) - \widehat{E}_M[s_N]\right)^2,$$

for the same set of samples $S_M$. Since the RB approximation is constructed upon the HDG approximation, these quantities actually approximate the MC-HDG estimates $\widehat{E}_M[s_h]$ and $\widehat{V}_M[s_h]$. We next develop a posteriori error bounds for our MC-RB estimates relative to the MC-HDG estimates.

### 3.1 A posteriori error estimation

We note from (18) that the residuals $r^{pr}(\cdot\,; y)$ and $r^{du}(\cdot\,; y)$ associated with $\underline{u}_N(y)$ and $\underline{\psi}_N(y)$, respectively, are given by

$$r^{pr}(\underline{v}; y) = \ell(\underline{v}) - a(\underline{u}_N(y), \underline{v}; y), \qquad r^{du}(\underline{v}; y) = -\ell^O(\underline{v}) - a(\underline{v}, \underline{\psi}_N(y); y),$$

for all $\underline{v} \in X_h$. The dual norm of the primal residual and the dual norm of the dual residual are given by

$$\|r^{pr}(\cdot\,; y)\|_{X'} = \sup_{\underline{v} \in X_h} \frac{|r^{pr}(\underline{v}; y)|}{\|\underline{v}\|_X}, \qquad \|r^{du}(\cdot\,; y)\|_{X'} = \sup_{\underline{v} \in X_h} \frac{|r^{du}(\underline{v}; y)|}{\|\underline{v}\|_X}.$$

It is a standard result Nguyen et al. (2005); Rozza et al. (2008) that

$$|s_h(y) - s_N(y)| \le \Delta^s_N(y) \equiv \frac{\|r^{pr}(\cdot\,; y)\|_{X'}\, \|r^{du}(\cdot\,; y)\|_{X'}}{\beta_{\mathrm{LB}}(y)},$$

where $\beta_{\mathrm{LB}}(y)$ is a positive lower bound for the Babuška “inf-sup” stability constant defined as

$$\beta(y) = \inf_{\underline{w} \in X_h} \sup_{\underline{v} \in X_h} \frac{|a(\underline{w}, \underline{v}; y)|}{\|\underline{w}\|_X\, \|\underline{v}\|_X},$$

that is, the minimum (generalized) singular value associated with the differential operator. It is critical to note that the output error (and output error bound) vanishes as the product of the primal and dual errors (bounds), and hence much more rapidly than either the primal or dual error.

It thus follows that we can bound the errors in the MC-RB estimates relative to the MC-HDG estimates as

(19) $\displaystyle \left|\widehat{E}_M[s_h] - \widehat{E}_M[s_N]\right| \le \widehat{\Delta}^E_M \equiv \frac{1}{M} \sum_{m=1}^{M} \Delta^s_N(y_m),$

and

(20) $\displaystyle \left|\widehat{V}_M[s_h] - \widehat{V}_M[s_N]\right| \le \widehat{\Delta}^V_M \equiv \frac{1}{M-1} \sum_{m=1}^{M} \left(\Delta^s_N(y_m) + \widehat{\Delta}^E_M\right) \left(\Delta^s_N(y_m) + \widehat{\Delta}^E_M + 2 \left|s_N(y_m) - \widehat{E}_M[s_N]\right|\right).$

It should be stated that this error bound is rather pessimistic, and that a more precise bound can be obtained by introducing suitable dual problems to recover a quadratically convergent bound for the variance, as reported in Haasdonk et al. (2013). We can also bound the difference between the RB expected value and the true value. To this end, we note from the triangle inequality that

(21) $\displaystyle \left|E[s_h] - \widehat{E}_M[s_N]\right| \le \left|E[s_h] - \widehat{E}_M[s_h]\right| + \left|\widehat{E}_M[s_h] - \widehat{E}_M[s_N]\right|.$

Following from (16a), (17), (19), (20), and (21) we define the error bound

(22) $\displaystyle \Delta^E_{M,N} \equiv c_\gamma \sqrt{\frac{\widehat{V}_M[s_N] + \widehat{\Delta}^V_M}{M}} + \widehat{\Delta}^E_M,$

such that $\left|E[s_h] - \widehat{E}_M[s_N]\right| \le \Delta^E_{M,N}$ holds with probability $\gamma$ in the limit $M \to \infty$.

Clearly, the error bound (22) comprises two terms: the first term is due to the MC sampling, while the second term is due to the RB approximation.

### 3.2 Computational strategy

The linearity and parametric affinity of the problem allow for an efficient Offline-Online decomposition strategy. The Offline stage (parameter independent, computationally intensive, but performed only once) comprises the greedy search for the selection of parameter values, the computation of snapshots associated with the HDG approximation space at the selected parameter values, and the formation and storage of several parameter-independent small matrices and vectors. The Online stage (parameter dependent, performed multiple times) evaluates $s_N(y)$ and its error bound $\Delta^s_N(y)$ for any new $y \in \Gamma$ with complexity independent of the dimension $\mathcal{N}$ of the HDG approximation space. The implications are twofold: first, if $N$ and $Q$ are indeed small, we shall achieve very fast output evaluation, usually several orders of magnitude faster than the HDG output; second, we may choose the HDG approximation very conservatively (to effectively eliminate the error between the exact output and the HDG output) without adversely affecting the Online (marginal) cost. We refer the reader to Nguyen et al. (2005); Prud’homme et al. (2002) for a more thorough description of the Offline-Online procedure.

It is clear that the error in the RB expected value and its error bound depend on $N$ and $M$, as well as on the parameter dimension $Q$. Typically, both the error and its error bound decrease very rapidly as a function of $N$, but very slowly as a function of the number of samples $M$. Hence, $M$ should be chosen very large, while $N$ can be chosen to be much smaller. Indeed, the (Online) computational cost to evaluate the RB expected value and its error bound scales as $O(M(N^3 + Q^2 N^2))$. Since both $N$ and $Q$ are typically very small, the RB method can effect significant savings relative to the HDG method. Nevertheless, its performance can be affected by the accuracy of the RB outputs and the sharpness of the RB error bounds.
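The Offline-Online split for an affine system can be sketched in a few lines: all work proportional to the large dimension happens once, and each online query touches only objects of size $N$. The matrices below are random placeholders, assumed only for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Offline-online split for an affine reduced system, with random
# stand-in matrices: all O(n)-sized work happens once offline; each
# online query manipulates only N x N objects.
n, Q, N = 200, 3, 5
A0 = 4.0 * np.eye(n)
Aq = [np.diag(rng.uniform(-0.2, 0.2, n)) for _ in range(Q)]
f = rng.standard_normal(n)
Z, _ = np.linalg.qr(rng.standard_normal((n, N)))   # placeholder basis

# --- Offline (once, n-dependent) ---
A0_N = Z.T @ A0 @ Z
Aq_N = [Z.T @ M @ Z for M in Aq]
f_N = Z.T @ f

# --- Online (per parameter, independent of n) ---
def reduced_solve(y):
    AN = A0_N + sum(yq * Mq for yq, Mq in zip(y, Aq_N))   # O(Q N^2)
    return np.linalg.solve(AN, f_N)                        # O(N^3)

uN = reduced_solve(rng.uniform(-1, 1, Q))
```

The per-query cost visible here, assembling $Q$ reduced matrices and solving one $N \times N$ system, is exactly why the Online evaluation is independent of the HDG dimension.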

## 4 Model and Variance Reduction Method

### 4.1 Control variates principle

We first review the essential idea of control variates, which will serve as a building block for our method. Let $X$ be a random variable. We would like to estimate the expected value $E[X]$. Suppose that we have another random variable $Y$ and that its expected value $E[Y]$ is either known or inexpensive to compute. We then introduce a new random variable

$$Z = X - \alpha\, (Y - E[Y]),$$

where $\alpha$ is a deterministic coefficient. It is obvious that $E[Z] = E[X]$ for any choice of $\alpha$. However, the variance of $Z$ is different from that of $X$. Specifically, we have

$$V[Z] = V[X] - 2\alpha\, \mathrm{Cov}[X, Y] + \alpha^2\, V[Y],$$

where $\mathrm{Cov}[X, Y]$ is the covariance of $X$ and $Y$. It can be easily shown that the following choice

$$\alpha^* = \frac{\mathrm{Cov}[X, Y]}{V[Y]}$$

is optimal in the sense that it minimizes the variance of $Z$. With this choice, we have

$$V[Z] = \left(1 - \rho^2_{XY}\right) V[X],$$

where $\rho_{XY} = \mathrm{Cov}[X, Y] / \sqrt{V[X]\, V[Y]}$ is the correlation coefficient of $X$ and $Y$. It is clear that if $X$ and $Y$ are highly correlated (i.e., $|\rho_{XY}|$ is close to $1$) then $V[Z]$ is much smaller than $V[X]$. In that case, the MC simulation of $E[Z]$ converges significantly faster than that of $E[X]$ according to the CLT.

In summary, control variate methods estimate $E[X]$ by using the “surrogate” expected value $E[Y]$ and sampling the reduced-variance variable $Z$. When the same principle is applied recursively to estimate $E[Y]$, the resulting method is called multilevel control variates.
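The variance reduction promised by the optimal coefficient is easy to verify numerically. The sketch below uses synthetic correlated data (not outputs of any PDE solver), with a surrogate whose mean is known exactly:

```python
import numpy as np

rng = np.random.default_rng(5)

# Control variate demo on synthetic data: X is the quantity of
# interest, Y a correlated surrogate with known mean E[Y] = 0.
M = 200_000
Y = rng.standard_normal(M)
X = Y + 0.1 * rng.standard_normal(M)   # highly correlated with Y

alpha = np.cov(X, Y)[0, 1] / Y.var()   # optimal coefficient alpha*
Z = X - alpha * (Y - 0.0)              # subtract (Y - E[Y]); E[Z] = E[X]

# V[Z] = (1 - rho^2) V[X]: here rho^2 ~ 1/1.01, so the variance
# drops by roughly two orders of magnitude.
assert Z.var() < 0.05 * X.var()
```

Sampling `Z` instead of `X` therefore reaches a given CLT half-width with roughly a hundredth of the samples, which is precisely the mechanism the next section exploits with the RB output as the surrogate.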

### 4.2 Two-level Monte Carlo sampling

We now apply the above idea to compute an estimate of $E[s_h]$, where $s_h$ is the stochastic output obtained by using the HDG method to solve the underlying stochastic PDE as described in Section 2. To achieve this goal, we introduce

$$Z = s_h - \alpha\, (s_N - E[s_N]),$$

where $s_N$ is the RB output developed in Section 3 for some $1 \le N \le N_{\max}$. Because $s_N$ generally approximates $s_h$ very well, the two outputs are highly correlated. Therefore, we choose $\alpha = 1$ as we expect that the optimal value $\alpha^*$ is close to $1$. With this choice, we obtain

(23) $E[s_h] = E[s_h - s_N] + E[s_N].$

The underlying premise here is that the two expectation terms on the right hand side can be computed efficiently by MC simulations owing to variance reduction and model reduction: the first term requires a small number of samples because its variance is generally very small, while the second term is less expensive to evaluate because it involves the RB output.

In particular, let $S_{M_0} \equiv \{y_m, \ 1 \le m \le M_0\}$ and $S_{M_1} \equiv \{\tilde{y}_m, \ 1 \le m \le M_1\}$ be two independent sets of random samples drawn in $\Gamma$ with the probability density function $\rho$. We calculate our Model and Variance Reduction (MVR) unbiased estimate of $E[s_h]$ as

(24) $\widehat{E}_{M_0, M_1}[s_h] = \widehat{E}_{M_0}[s_h - s_N] + \widehat{E}_{M_1}[s_N],$

where

(25) $\displaystyle \widehat{E}_{M_0}[s_h - s_N] = \frac{1}{M_0} \sum_{m=1}^{M_0} \left(s_h(y_m) - s_N(y_m)\right), \qquad \widehat{E}_{M_1}[s_N] = \frac{1}{M_1} \sum_{m=1}^{M_1} s_N(\tilde{y}_m).$

We note that our approach computes an estimate of $E[s_h]$, while the MC-RB approach described in the previous section computes an estimate of $E[s_N]$.
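The two-level estimator (24)-(25) can be sketched end to end with synthetic outputs. Here `s_h` is a hypothetical stand-in for the high-fidelity output and `s_N` a surrogate that deviates from it by a small smooth error, mimicking a well-trained RB output; a few expensive samples handle the low-variance correction term while many cheap samples handle the surrogate term:

```python
import numpy as np

rng = np.random.default_rng(6)

# Two-level sketch of E[s_h] ~ mean(s_h - s_N) + mean(s_N) on
# independent sample sets; the outputs are synthetic stand-ins.
def s_h(y):                        # "high-fidelity" output
    return np.exp(0.5 * y.sum(axis=-1))

def s_N(y):                        # "reduced basis" output: s_h + small error
    return s_h(y) * (1.0 + 1e-3 * np.sin(y.sum(axis=-1)))

Q, M0, M1 = 4, 200, 100_000        # few expensive samples, many cheap ones
y0 = rng.uniform(-1, 1, (M0, Q))   # samples for the correction term
y1 = rng.uniform(-1, 1, (M1, Q))   # independent samples for the RB term

E_mvr = (s_h(y0) - s_N(y0)).mean() + s_N(y1).mean()

E_true = (np.sinh(0.5) / 0.5) ** 4   # exact E[s_h] for this stand-in
```

Because the correction $s_h - s_N$ has tiny variance, $M_0$ can stay small even though each of its samples is expensive; the cheap surrogate term absorbs the large $M_1$ needed to beat down the MC error.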