Accelerating stochastic collocation methods for partial differential equations with random input data ^{†}^{†}thanks: This material is based upon work supported in part by the U.S. Air Force of Scientific Research under grant number 1854V52112; by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under contract numbers ERKJ259, and ERKJE45; and by the Laboratory Directed Research and Development program at the Oak Ridge National Laboratory, which is operated by UTBattelle, LLC, for the U.S. Department of Energy under Contract DEAC0500OR22725.
Abstract
This work proposes and analyzes a generalized acceleration technique for decreasing the computational complexity of using stochastic collocation (SC) methods to solve partial differential equations (PDEs) with random input data. The SC approaches considered in this effort consist of sequentially constructed multidimensional Lagrange interpolant in the random parametric domain, formulated by collocating on a set of points so that the resulting approximation is defined in a hierarchical sequence of polynomial spaces of increasing fidelity. Our acceleration approach exploits the construction of the SC interpolant to accelerate the underlying ensemble of deterministic solutions. Specifically, we predict the solution of the parametrized PDE at each collocation point on the current level of the SC approximation by evaluating each sample with a previously assembled lower fidelity interpolant, and then use such predictions to provide deterministic (linear or nonlinear) iterative solvers with improved initial approximations. As a concrete example, we develop our approach in the context of SC approaches that employ sparse tensor products of globally defined Lagrange polynomials on nested onedimensional ClenshawCurtis abscissas. This work also provides a rigorous computational complexity analysis of the resulting fully discrete sparse grid SC approximation, with and without acceleration, which demonstrates the effectiveness of our proposed methodology in reducing the total number of iterations of a conjugate gradient solution of the finite element systems at each collocation point. Numerical examples include both linear and nonlinear parametrized PDEs, which are used to illustrate the theoretical results and the improved efficiency of this technique compared with several others.
tochastic and parametric PDEs, stochastic collocation, highdimensional approximation, uncertainty quantification, sparse grids, multivariate polynomial approximation, iterative solvers, conjugate gradient method
65N30, 65N35, 65N12, 65N15, 65C20
1 Introduction
Modern approaches for predicting the behavior of physical and engineering problems, and assessing risk and informing decision making in manufacturing, economic forecasting, public policy, and human welfare, rely on mathematical modeling followed by computer simulation. Such predictions are obtained by constructing models whose solutions describe the phenomenon of interest, and then using computational methods to approximate the outputs of the models. Thus, the solution of a mathematical model can be viewed as a mapping from available input information onto a desired output of interest; predictions obtained through computational simulations are merely approximations of the images of the inputs, that is, of the output of interest. There are several causes for possible discrepancies between observations and approximate solutions obtained via computer simulations. The mathematical model may not, and usually does not, provide a totally faithful description of the phenomenon being modeled. Additionally, when an application is considered, the mathematical models need to be provided with input data, such as coefficients, forcing terms, initial and boundary conditions, geometry, etc. This input data may be affected by a large amount of uncertainty due to intrinsic variability or the difficulty in accurately characterizing the physical system.
Such uncertainties can be included in the mathematical model by adopting a probabilistic setting, provided enough information is available for a complete statistical characterization of the physical system. In this effort we assume our mathematical model is described by a partial differential equation (PDE) and the random input data are modeled as finite dimensional random fields, parameterized by a vector of dimension , consisting of uncorrelated realvalued random variables. Therefore, the goal of the mathematical and computational analysis becomes the approximation of the solution map , or statistical moments (mean, variance, covariance, etc.) of the solution or some quantity of interest (QoI) of the system, given the probability distribution of the input random data. A major challenge associated with developing approximation techniques for such problems involves alleviating the curse of dimensionality, by which the computational complexity of any naïve polynomial approach will grow exponentially with the dimension of the parametric domain.
Monte Carlo (MC) methods (see, e.g., [17]) are the most popular approaches for approximating highdimensional integrals, based on independent realizations , , of the parameterized PDE; approximations of the expectation or other QoIs are obtained by averaging over the corresponding realizations of that quantity. The resulting numerical error is proportional to , thus achieving convergence rates independent of dimension , but requiring a very large number of samples to achieve reasonably small errors. Other ensemblebased methods, including quasiMC (QMC) and important sampling (see [29, 24, 39] and the references therein), have been devised to produce increase convergence rates, e.g., proportional to , however, the function increases with dimension . Moreover, since both MC and QMC are quadrature techniques for QoIs, neither have the ability to simultaneously approximate the solution map , required by a large class of applications.
In the last decade, two global polynomial approaches have been proposed that often feature much faster convergence rates: intrusive stochastic Galerkin (SG) methods, constructed from predefined orthogonal polynomials [19, 44], or best term and quasioptimal approaches [9, 12, 14, 6], and nonintrusive stochastic collocation (SC) methods, constructed from (sparse) Lagrange interpolating polynomials [1, 31, 30], or discrete projections [27, 28]. These methods exploit the underlying regularity of the PDE solution map with respect to the parameters , evident in a wide class of highdimensional applications, to construct an approximate solution, and differ only in the choice of basis.
For both SG and SC approaches, the overall computational cost grows rapidly with increasing dimension. A recent development for alleviating such complexity and accelerating the convergence of parameterized PDE solutions is to utilize multilevel methods (see e.g., multilevel Monte Carlo (MLMC) methods [20, 11, 4, 40, 5] and the multilevel stochastic collocation (MLSC) approach [41]). The main ingredient to multilevel methods is the exploitation of a hierarchical sequence of spatial approximations to the underlying PDE, which are then combined with discretizations in parameter space in such a way as to minimize the overall computational cost. The approximation of the solution on the finest mesh is represented by the approximation on the coarsest mesh plus a sequence of “correction” terms. The resulting decrease in complexity with the use of multilevel methods results from the fact that the dominant behavior of the solution can be captured with cheap simulations on coarse meshes, so that the number of expensive simulations computed on fine meshes can be considerably reduced.
Nonetheless, the dominant cost in applying any uncertainty quantification (UQ) approach lies in the solution of the underlying parametrized linear/nonlinear PDEs, for a given value of the random inputs. Such solutions are often computed using iterative solvers, e.g., conjugate gradient (CG) methods for symmetric positivedefinite linear systems, generalized minimal residual method (GMRES) for nonsymmetric linear systems [37], and fixedpoint iteration methods[36] for nonlinear PDEs. However, many highfidelity, multiphysics models can exhaust the resources of the largest machines with a single instantiation and, as such, are not practical for even the most advanced UQ techniques. As such, several methods for improving the performance of iterative solvers have been proposed; especially preconditioner and subspace methods for iterative Krylov solvers. A strategy that utilizes shared search directions for solving a collection of linear systems based on the CG method is proposed in [8]. In [33] a technique called Krylov recycling was introduced to solve sets of linear systems sequentially, based on ideas adapted from restarted and truncated GMRES (see [38] and the references therein). This approach was later applied to the linear systems that arise from SG approximations that use the socalled doubly orthogonal bases to solve stochastic paramterized PDEs [25] . In addition, several preconditioners have been developed that improve the performance of solving the large linear systems resulting from SG approximations that employ standard orthogonal polynomials [18, 35, 16, 21].
On the other hand, when a general linear solver is employed to solve the underling SG or SC approximation, it is straightforward to see that improved initial approximations can significantly reduce the number of iterations required to reach a prescribed accuracy. A sequential orthogonal expansion is utilized in [18, 34] such that a low resolution solution provides an initial guess for the solution of the system with an enriched basis. However, at each step, all the expansion coefficients must be explicitly recomputed, resulting in increased costs. Similarly, in [21] an extension of a meanbased preconditioner is applied to each linear system coming from a sequential SC approach, wherein the solution of the th system is given as the initial vector for the th system. This approach, as well as the Krylov recycling method, impose an ordering of the linear systems that appear in the SC approximation. Consequently, new approaches are needed to amortize the cost of expensive simulations by reusing both deterministic and stochastic information across multiple ensembles of solutions.
In this work, we propose to improve the computational efficiency of nonintrusive approximations, by focusing on SC approaches that sequentially construct a multidimensional Lagrange interpolant in a hierarchical sequence of polynomial spaces of increasing fidelity. As opposed to multilevel methods that reduce the overall computational burden by taking advantage of a hierarchical spatial approximation, our approach exploits the structure of the SC interpolant to accelerate the underlying ensemble of deterministic solutions. Specifically, we predict the solution of the parametrized PDE at each collocation point on the current level of the SC approximation by evaluating each sample with a previously assembled lower fidelity interpolant, and then use such predictions to provide deterministic (linear or nonlinear) iterative solvers with improved initial approximations. As a particular application, we pose this acceleration technique in the context of hierarchical SC methods that employ sparse tensor products of globally defined Lagrange polynomials [31, 30], on nested onedimensional ClenshawCurtis abscissas. However, the same idea can be extended to other nonintrusive collocation approaches including orthogonal polynomials [44], as well as piecewise local and wavelet polynomials expansions [7, 22].
The sparse grid SC approximation considered in this work produces a sequence of interpolants, where a new set of collocation points is added on each level in order to increase the accuracy of the interpolant. For each newly added collocation point on the current level, we predict the solution of the underlying deterministic PDE using the most up to date sparse grid interpolant available; the previous level’s interpolant. We then use the prediction as the starting point of the iterative solver. The uniform convergence of the sparse grid interpolant to the true solution results in an increasingly accurate initial guess as the level increases, so that the overall complexity of the SC method can be dramatically reduced. We apply our novel approach in the context of solving both linear and nonlinear stochastic PDEs, wherein, we assume that the parameterized systems are solved by some existing linear or nonlinear iterative method. Furthermore, in the linear case, this technique can also be used to efficiently generate improved preconditioners for linear systems associated to the collocation points on higher levels, which further accelerates the convergence rate of the underlying solver.
The outline of this paper is as follows: We begin by describing the class of parameterized linear and nonlinear stochastic PDEs under consideration in §2. In §3 we describe our acceleration technique in the context of general stochastic collocation methods, defined on a hierarchical sequence of polynomial spaces, for approximating both linear and nonlinear stochastic elliptic PDEs using nonlinear iterative solvers. In §4 we briefly recall the sparse grid SC method, where the sparse grid interpolant is constructed with the use of nested onedimensional ClenshawCurtis abscissas. The theoretical convergence rates, with respect to the level of the interpolant and the degrees of freedom are shown in §4.1. In §4.2 we provide a rigorous computational complexity analysis of the resulting fully discrete sparse grid SC approximation, with and without acceleration, used to demonstrate the effectiveness of our proposed methodology in reducing the total number of iterations of a conjugate gradient solution of the finite element systems at each collocation point. Finally, in §5 we provide several numerical examples, including both moderately largedimensional linear and nonlinear parametrized PDEs, which are used to illustrate the theoretical results and the improved efficiency of this technique compared with several others.
2 Problem setting
Let , be a bounded domain and let denote a complete probability space with sample space , algebra , and probability measure . Define as a differential operator that depends on a coefficient with and . Analogously, the forcing term can be assumed to be a random field as well. In general, and belong to different probability spaces but, for economy of notation, we simply denote the stochastic dependences in the same probability space. Consider the stochastic boundary value problem. Find a random function such that, a.e. in , the following equations hold:
(1) 
where is a suitable boundary condition. We denote by a Banach space and assume the underlying random input data are chosen so that the corresponding stochastic system (1) is wellposed and has a unique solution , the function space given by
In this setting, the approximation space consists of Banachspace valued functions that have finite th order moments. Two example problems posed in this setting are given as follows. {example} (Linear elliptic problem). Find a random field such that a.e.
(2) 
where denotes the gradient operator with respect to the spatial variable . The wellposedness of (2) is guaranteed in with uniformly elliptic, i.e.,
(3) 
and square integrable, i.e.,
(Nonlinear elliptic problem). For , find a random field such that a.e.
(4) 
The wellposedness of (4) is guaranteed in with as in Example 2 and [31].
In many applications, the source of randomness can be approximated with only a finite number of uncorrelated, or even independent, random variables. For instance, the random input data and in (1) may have a piecewise representation, or in other applications may have spatial variation that can be modeled as a correlated random field, making them amenable to approximation by a KarhunenLoève (KL) expansion [26]. In practice, one has to truncate such expansions according to the desired accuracy of the simulation. As such, we make the following assumption regarding the random input data and (cf [23, 31]).
(Independence and finite dimensional noise). The random fields and have the form:
where is a vector of independent and uncorrelated realvalued random variables.
We note that Assumption 2 and the DoobDynkin lemma [32] guarantee that and are Borelmeasurable functions of the random vector . In our setting, we denote by the image of the random variable , and set , where . If the distribution measure of is absolutely continuous with respect to Lebesgue measure, then there exists a joint probability density function of denoted by \sϱ(y): Γ→R_+, with ϱ(y) = ∏_n=1^N ϱ_n(y_n) ∈L^∞(Γ). \fTherefore, based on Assumption 2, the probability space is mapped to , where is the Borel algebra on and is a probability measure on . By assuming the solution of (1) is measurable with respect to and , the DoobDynkin lemma guarantees that can also be characterized by the same random vector , i.e.,
where is defined by
Note that the above integral will be replaced by the essential supremum when :
2.1 Weak formulation
In what follows, we treat the solution to (1) as a parameterized function of the dimensional random variables . This leads to a Galerkin weak formulation [23] of the PDE in (1), with respect to both physical and parameter space, i.e., seek such that
where are linear operators independent of , while the operators are linear for , and nonlinear for . Moreover, since the solution can be viewed as a mapping , for convenience we may omit the dependence on and write to emphasize the dependence of on . As such, we may also write the problem (1) in the alternative weak form
(5) 
Therefore, the stochastic boundaryvalue problem (1) has been converted into a deterministic parametric problem (5). The acceleration technique proposed in §3 and the sparsegrid SC method discussed in §4 will be based on the solution of the weak form (5) above.
3 Accelerating stochastic collocation methods
Our acceleration scheme will be proposed in the context of both linear and nonlinear elliptic PDEs. A general SC approach requires the semidiscrete solution at a set of collocation points , given by
(6) 
Here is a predefined finite element basis of , and for , the coefficient vector is the solution of the following system of equations:
with and defined as above. Note that (3) is equivalent to (5) with the nonlinear operators subtracted on the right hand side. When , the PDE is linear, and a standard FEM discretization leads to a linear system of equations.
For , we denote by an interpolation operator that utilizes collocation points, defined by . More generally, assume that we have a family of interpolation operators , which for each approximates the solution in polynomial spaces \sP_1(Γ) ⊂…⊂P_L(Γ)⊂P_L+1(Γ)⊂…⊂L^2_ϱ(Γ), \fof increasing fidelity, defined on sets of sample points . Assume further that the fully discrete solution has Lagrange interpolating form
(8) 
where is a basis for . The approximation (8) can be constructed by solving for independently at each sample point . In §4, we construct a specific example of an interpolation scheme satisfying (8), namely global sparse grid collocation.
For each , the bulk of the computational cost in using (8) goes into solving the systems of equations (3) corresponding to each collocation point . Since the systems are independent and deterministic, they can be solved separately using existing FEM solvers, providing a straightforward path to parallelization compared to intrusive methods such as stochastic Galerkin methods. In this work, we consider iterative solvers for the system in (3), and propose an acceleration scheme to reduce the total number of iterations necessary to the collection of systems over the set of sample parameters.
Denoting by the output of the selected iterative solver for the system (3), for the semidiscrete solution is approximated by
where we define , and therefore the final SC approximation is given by a perturbation of (8), i.e.,
(9) 
We observe that the performance of the underlying iterative solver can be improved by proposing a good initial guess, denoted , or constructing an effective preconditioner to reduce the condition number of the system. Here, we propose our approach for improving initial deterministic approximations, remarking that the same idea can be also utilized to construct preconditioners. To start the iterative solver for the system in (3), it is common to use a zero initial guess, i.e., . However, we can predict the solution at level using lower level approximations to construct improved initial solutions . Assume that we first obtain by collocating solutions to (3) over . Then at level , for each new point , the initial guess can be given by interpolating the solutions from level , i.e.,
(10) 
For a convergent interpolation scheme, we expect the necessary number of iterations to compute to become smaller as the level increases to an overall maximum level, denoted . As such, the construction of the desired solution is accelerated through the intermediate solutions . Note that this approach reduces computational cost by improving initial guesses, but does not depend on the specific solver used. Thus, our scheme may be combined with other techniques for accelerating convergence, such as faster nonlinear solvers or better preconditioners. When the underlying PDE is nonlinear with respect to , iterative solvers are commonly used for the solution of (3). In Algorithm 1, we outline the acceleration procedure described above, using a general nonlinear iterative method for the solution of (3).
Algorithm 1: The accelerated SC algorithm 
Goal: Compute 
1:Define and
2:for do
3: for do
4: Compute the initial guess according to (10):
5:
6: Initialize:
7: repeat
8: Compute residual :
9: for do
10:
11: end for
12: Update the solution:
13:
14: until
15:
16: end for
17:end for

The efficiency of the proposed algorithm will depend crucially on the number of times the iterative solver is utilized, i.e., how many sample points are in the set for each level . In fact, if the sample points are not nested, it could be the case that , and the algorithm may be very inefficient. Hence, in the following sections we will assume: {assumption} Assume that the multidimensional point sets are nested, i.e.,
Then , and we can construct the intermediate solutions using a subset of the information needed to approximate .
In §4 we construct an interpolant using a point set which satisfies Assumption 3. Next, we give several examples of Algorithm 1, using iterative solvers for both nonlinear and linear elliptic PDEs. {example} Consider the weak form of the nonlinear elliptic PDE in Example 2, letting , , , and (note that , ). When using the fixed point iterative method in Algorithm 1, for the update step we define
where the matrix is defined by
(10) 
With , this update is equivalent to solving the following linear system
to update to at the th iteration. Note that each iteration of the solver in Algorithm 1 requires the solution of this linear system, which is not accelerated by our algorithm.
As a special case of the example above, consider the weak form of the linear elliptic problem in Example 2 with , , and in (3). Due to the linearity, at each collocation point the solution can be approximated by solving the following linear system
(11) 
with as in (10), and for . Under our assumptions on the coefficient , the linear system (11) is symmetric positive definite, and we can use the CG method [37] to find its solution. For , by recursively defining
we get the update function
Recall the following wellknown error estimate for CG:
(12) 
where denotes the condition number of , is the vector of initial guess and is the output of the th iteration of the CG solver. As opposed to Example 3, for this example Algorithm 1 accelerates the solution of the linear system (11).
To evaluate the efficiency of the accelerated SC method, we define cost metrics for the construction of standard and accelerated SC approximations. In general, the computational cost in floating point operations (flops) is the total number iterations to solve (3) summed over each of the sample points—denoted by and for the standard and accelerated SC methods, respectively—multiplied by the cost of performing one iteration, denoted . Let be the additional cost of interpolation incurred by using the accelerated initial vectors (10). Then, we define
(13) 
for the standard SC approach, and
(14) 
for the accelerated SC approximation, respectively.
In Example 3 the discretization of the linear PDE leads to sparse systems of equations of size . When solving these systems with a CG solver, and are the sum of solver iterations contributed from each sample system. In this case, the cost of one iteration is just the cost of one matrix vector product, i.e., , where depends on the domain and the type of finite element basis. {remark} (Relationship to multilevel methods). Multilevel methods reduce the complexity of stochastic sampling methods by balancing errors and computational cost across a sequence of stochastic and spatial approximations. Let , , be a sequence of semidiscrete approximations built in nested spaces, i.e., . Multilevel methods are based on the following identity:
Letting denote the chosen method of stochastic approximation, a general multilevel method might be written as
The main idea is that highly resolved, expensive stochastic approximations, e.g., , in combination with coarse deterministic approximations, that is, , and vice versa. In a similar way, collocation with nested grid points provides a natural multilevel hierarchy which we use in our method to accelerate each PDE solve (10). A combination of these methods could involve using our algorithm to accelerate the construction of the operators , as well as reusing information from level to level, thus improving further the performance of SC methods. {remark} (Interpolation costs). Note that many adaptive interpolation schemes already require evaluation of the intermediate interpolation operators as in (10), e.g., to compute residual error estimators. Thus, these methods will incur the interpolation cost even in the case of zero initial vectors. Furthermore, for most nonlinear problems the deterministic solver is expensive, thus reducing the number of iterations is the most important element in reducing the cost. In each of these settings, we can define the cost metrics to simply be and .
(Hierarchical preconditioner construction). When solving linear systems using iterative methods, convergence properties can be improved by considering the condition number of the system. As with initial vectors, an interpolation algorithm can be used to construct good, cheap preconditioners. We consider preconditioner algorithms where an explicit preconditioner matrix, or its inverse, is constructed. In this case, for some low collocation level , we construct a strong preconditioner, , for each individual iterative solver, . Then, these lower level preconditioners are interpolated for the subsequent levels. More specifically, for , and , we use the preconditioner
(15) 
Numerical illustrations of this approach are given in §5.
4 Applications to sparse grid stochastic collocation
In this section, we provide a specific example of an interpolation scheme satisfying the assumptions described in §3, i.e., a generalized sparse grid SC approach for a fixed level . In what follows, we briefly review the construction of sparse grid interpolants, and rigorously analyze the approximation errors and the complexities of both the standard and accelerated SC approaches, in order to demonstrate the improved efficiency of the proposed acceleration technique when applied to iterative linear solvers.
The fully discrete SC approximation is built by polynomial interpolation of the semidiscrete solution on an appropriate set of collocation points in . In our setting, such an interpolation scheme is based on a sparse tensor products of onedimensional Lagrange interpolating polynomials with global support. Specifically, in the onedimensional case, , we introduce a sequence of Lagrange interpolation operators , with the space of degree polynomials over . Given a general function , these operators are defined by
Here represents the resolution level of the operator, denotes the number of interpolation points on level , and for ,
are the global Lagrange polynomials of degree associated with the point set . To satisfy Assumption 3, we need nestedness of the onedimensional sets, i.e., , which is determined by the choice of interpolation points and the definition of . In addition, we remark that similar constructions for can be built based on wavelets [22] or other locally supported polynomial functions [23].
In the multidimensional case, i.e., , using the convention that , we introduce the difference operator
(16) 
and define the multiindex . The desired approximation is defined by a linear combination of tensorproduct operators (16) over a set of multiindices, determined by the condition , for , and a strictly increasing function. For , we now define the generalized SC operator by
(17)  
where is a multiindex with , , and represents the approximation level. This approximation lives in the tensor product polynomial space given by
where the multiindex set is defined as follows
Here , and is the left inverse of (see [2]).
Specific choices for the onedimensional growth rate and the function are needed to define the multiindex set and the corresponding polynomial space for the approximation. In this work, we construct the interpolant in (17) using the anisotropic Smolyak construction, i.e.,
(18) 
where is a vector of weights reflecting the anisotropy of the system, i.e., the relative importance of each dimension, with (see [30] for more details). Our analysis does not depend strongly on this choice of and , and we could use other functions, e.g., and define the anisotropic tensor product approximation.
When is a bounded domain in , a common choice is the ClenshawCurtis abcsissas [10] given by the sets of extrema of Chebyshev polynomials including the endpoint extrema. For a sample set of any size , the abscissas in the standard domain are given by
(19) 
By taking and letting grow according to the rule in (18), one gets a sequence of nested sets for . In addition, with defined as in (18), the resulting set of dimensional abscissas is a ClenshawCurtis sparse grid. Other nested families of sparse grids can be constructed from, e.g., the Leja points [13], GaussPatterson [42], etc. {remark} (Specific Choice of ). For the remainder of the paper, we will assume that the functions and are given as in (18), and use an underlying ClenshawCurtis sparse grid. For simplicity, we will also only consider isotropic collocation methods, i.e. . We then lighten the notation by defining .
Construction of the approximation requires evaluation of on a set of collocation points with cardinality . In our case, since the onedimensional point sets are nested, i.e., for , so that the multidimensional point set used by is given by
and the nested structure is preserved, i.e., , to satisfy assumption 3. Define the difference of the sets , and the number of new collocation points . With this nestedness condition, the approximation is a Lagrange interpolating polynomial [31], and thus (17) can be rewritten as a linear combination of Lagrange basis functions,
(20)  
where the index set is defined by
For a given and , this represents the subset of multiindices corresponding to the tensorproduct operators in (17) with the supporting point