
# Accelerating stochastic collocation methods for partial differential equations with random input data ††thanks: This material is based upon work supported in part by the U.S. Air Force Office of Scientific Research under grant number 1854-V521-12; by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under contract numbers ERKJ259 and ERKJE45; and by the Laboratory Directed Research and Development program at the Oak Ridge National Laboratory, which is operated by UT-Battelle, LLC, for the U.S. Department of Energy under Contract DE-AC05-00OR22725.

D. Galindo Joint Institute for Computational Sciences, University of Tennessee, 1 Bethel Valley Road, Oak Ridge, TN 37831 (dgalind1@utk.edu)    P. Jantsch Department of Mathematics, University of Tennessee, Knoxville, TN 37996 (pjantsch@vols.utk.edu)    C. G. Webster Department of Computational and Applied Mathematics, Oak Ridge National Laboratory, Oak Ridge, TN 37831 (webstercg@ornl.gov).    G. Zhang Department of Computational and Applied Mathematics, Oak Ridge National Laboratory, Oak Ridge, TN 37831 (zhangg@ornl.gov).
###### Abstract

This work proposes and analyzes a generalized acceleration technique for decreasing the computational complexity of using stochastic collocation (SC) methods to solve partial differential equations (PDEs) with random input data. The SC approaches considered in this effort consist of a sequentially constructed multi-dimensional Lagrange interpolant in the random parametric domain, formulated by collocating on a set of points so that the resulting approximation is defined in a hierarchical sequence of polynomial spaces of increasing fidelity. Our acceleration approach exploits the construction of the SC interpolant to accelerate the underlying ensemble of deterministic solutions. Specifically, we predict the solution of the parametrized PDE at each collocation point on the current level of the SC approximation by evaluating each sample with a previously assembled lower fidelity interpolant, and then use such predictions to provide deterministic (linear or nonlinear) iterative solvers with improved initial approximations. As a concrete example, we develop our approach in the context of SC approaches that employ sparse tensor products of globally defined Lagrange polynomials on nested one-dimensional Clenshaw-Curtis abscissas. This work also provides a rigorous computational complexity analysis of the resulting fully discrete sparse grid SC approximation, with and without acceleration, which demonstrates the effectiveness of our proposed methodology in reducing the total number of iterations of a conjugate gradient solution of the finite element systems at each collocation point. Numerical examples include both linear and nonlinear parametrized PDEs, which are used to illustrate the theoretical results and the improved efficiency of this technique compared with several others.

stochastic and parametric PDEs, stochastic collocation, high-dimensional approximation, uncertainty quantification, sparse grids, multivariate polynomial approximation, iterative solvers, conjugate gradient method

AMS subject classifications: 65N30, 65N35, 65N12, 65N15, 65C20

## 1 Introduction

Modern approaches for predicting the behavior of physical and engineering problems, and assessing risk and informing decision making in manufacturing, economic forecasting, public policy, and human welfare, rely on mathematical modeling followed by computer simulation. Such predictions are obtained by constructing models whose solutions describe the phenomenon of interest, and then using computational methods to approximate the outputs of the models. Thus, the solution of a mathematical model can be viewed as a mapping from available input information onto a desired output of interest; predictions obtained through computational simulations are merely approximations of the images of the inputs, that is, of the output of interest. There are several causes for possible discrepancies between observations and approximate solutions obtained via computer simulations. The mathematical model may not, and usually does not, provide a totally faithful description of the phenomenon being modeled. Additionally, when an application is considered, the mathematical models need to be provided with input data, such as coefficients, forcing terms, initial and boundary conditions, geometry, etc. This input data may be affected by a large amount of uncertainty due to intrinsic variability or the difficulty in accurately characterizing the physical system.

Such uncertainties can be included in the mathematical model by adopting a probabilistic setting, provided enough information is available for a complete statistical characterization of the physical system. In this effort we assume our mathematical model is described by a partial differential equation (PDE) and the random input data are modeled as finite-dimensional random fields, parameterized by a vector $\boldsymbol{y} = (y_1, \ldots, y_N)$ of dimension $N$, consisting of uncorrelated real-valued random variables. Therefore, the goal of the mathematical and computational analysis becomes the approximation of the solution map $\boldsymbol{y} \mapsto u(\boldsymbol{y})$, or statistical moments (mean, variance, covariance, etc.) of the solution or some quantity of interest (QoI) of the system, given the probability distribution of the input random data. A major challenge associated with developing approximation techniques for such problems involves alleviating the curse of dimensionality, by which the computational complexity of any naïve polynomial approach will grow exponentially with the dimension $N$ of the parametric domain.

Monte Carlo (MC) methods (see, e.g., [17]) are the most popular approaches for approximating high-dimensional integrals, based on independent realizations $u(\cdot, \boldsymbol{y}_j)$, $j = 1, \ldots, M$, of the parameterized PDE; approximations of the expectation or other QoIs are obtained by averaging over the corresponding realizations of that quantity. The resulting numerical error is proportional to $1/\sqrt{M}$, thus achieving convergence rates independent of the dimension $N$, but requiring a very large number of samples to achieve reasonably small errors. Other ensemble-based methods, including quasi-MC (QMC) and importance sampling (see [29, 24, 39] and the references therein), have been devised to produce increased convergence rates, e.g., proportional to $\log(M)^{r(N)}/M$; however, the function $r(N)$ increases with the dimension $N$. Moreover, since both MC and QMC are quadrature techniques for QoIs, neither has the ability to simultaneously approximate the solution map $\boldsymbol{y} \mapsto u(\boldsymbol{y})$, required by a large class of applications.

In the last decade, two global polynomial approaches have been proposed that often feature much faster convergence rates: intrusive stochastic Galerkin (SG) methods, constructed from pre-defined orthogonal polynomials [19, 44], or best $M$-term and quasi-optimal approaches [9, 12, 14, 6], and non-intrusive stochastic collocation (SC) methods, constructed from (sparse) Lagrange interpolating polynomials [1, 31, 30], or discrete projections [27, 28]. These methods exploit the underlying regularity of the PDE solution map with respect to the parameters $\boldsymbol{y}$, evident in a wide class of high-dimensional applications, to construct an approximate solution, and differ only in the choice of basis.

For both SG and SC approaches, the overall computational cost grows rapidly with increasing dimension. A recent development for alleviating such complexity and accelerating the convergence of parameterized PDE solutions is to utilize multilevel methods (see e.g., multilevel Monte Carlo (MLMC) methods [20, 11, 4, 40, 5] and the multilevel stochastic collocation (MLSC) approach [41]). The main ingredient of multilevel methods is the exploitation of a hierarchical sequence of spatial approximations to the underlying PDE, which are then combined with discretizations in parameter space in such a way as to minimize the overall computational cost. The approximation of the solution on the finest mesh is represented by the approximation on the coarsest mesh plus a sequence of “correction” terms. The resulting decrease in complexity with the use of multilevel methods results from the fact that the dominant behavior of the solution can be captured with cheap simulations on coarse meshes, so that the number of expensive simulations computed on fine meshes can be considerably reduced.

Nonetheless, the dominant cost in applying any uncertainty quantification (UQ) approach lies in the solution of the underlying parametrized linear/nonlinear PDEs, for a given value of the random inputs. Such solutions are often computed using iterative solvers, e.g., conjugate gradient (CG) methods for symmetric positive-definite linear systems, the generalized minimal residual method (GMRES) for non-symmetric linear systems [37], and fixed-point iteration methods [36] for nonlinear PDEs. However, many high-fidelity, multi-physics models can exhaust the resources of the largest machines with a single instantiation and, as such, are not practical for even the most advanced UQ techniques. As such, several methods for improving the performance of iterative solvers have been proposed, especially preconditioners and subspace methods for iterative Krylov solvers. A strategy that utilizes shared search directions for solving a collection of linear systems based on the CG method is proposed in [8]. In [33] a technique called Krylov recycling was introduced to solve sets of linear systems sequentially, based on ideas adapted from restarted and truncated GMRES (see [38] and the references therein). This approach was later applied to the linear systems that arise from SG approximations that use the so-called doubly orthogonal bases to solve parameterized stochastic PDEs [25]. In addition, several preconditioners have been developed that improve the performance of solving the large linear systems resulting from SG approximations that employ standard orthogonal polynomials [18, 35, 16, 21].

On the other hand, when a general linear solver is employed to solve the underlying SG or SC approximation, it is straightforward to see that improved initial approximations can significantly reduce the number of iterations required to reach a prescribed accuracy. A sequential orthogonal expansion is utilized in [18, 34] such that a low resolution solution provides an initial guess for the solution of the system with an enriched basis. However, at each step, all the expansion coefficients must be explicitly recomputed, resulting in increased costs. Similarly, in [21] an extension of a mean-based preconditioner is applied to each linear system coming from a sequential SC approach, wherein the solution of the $(j{-}1)$-th system is given as the initial vector for the $j$-th system. This approach, as well as the Krylov recycling method, imposes an ordering of the linear systems that appear in the SC approximation. Consequently, new approaches are needed to amortize the cost of expensive simulations by reusing both deterministic and stochastic information across multiple ensembles of solutions.

In this work, we propose to improve the computational efficiency of non-intrusive approximations, by focusing on SC approaches that sequentially construct a multi-dimensional Lagrange interpolant in a hierarchical sequence of polynomial spaces of increasing fidelity. As opposed to multilevel methods that reduce the overall computational burden by taking advantage of a hierarchical spatial approximation, our approach exploits the structure of the SC interpolant to accelerate the underlying ensemble of deterministic solutions. Specifically, we predict the solution of the parametrized PDE at each collocation point on the current level of the SC approximation by evaluating each sample with a previously assembled lower fidelity interpolant, and then use such predictions to provide deterministic (linear or nonlinear) iterative solvers with improved initial approximations. As a particular application, we pose this acceleration technique in the context of hierarchical SC methods that employ sparse tensor products of globally defined Lagrange polynomials [31, 30], on nested one-dimensional Clenshaw-Curtis abscissas. However, the same idea can be extended to other non-intrusive collocation approaches including orthogonal polynomials [44], as well as piecewise local polynomial and wavelet expansions [7, 22].

The sparse grid SC approximation considered in this work produces a sequence of interpolants, where a new set of collocation points is added on each level in order to increase the accuracy of the interpolant. For each newly added collocation point on the current level, we predict the solution of the underlying deterministic PDE using the most up-to-date sparse grid interpolant available, namely the previous level's interpolant. We then use the prediction as the starting point of the iterative solver. The uniform convergence of the sparse grid interpolant to the true solution results in an increasingly accurate initial guess as the level increases, so that the overall complexity of the SC method can be dramatically reduced. We apply our novel approach in the context of solving both linear and nonlinear stochastic PDEs, wherein we assume that the parameterized systems are solved by some existing linear or nonlinear iterative method. Furthermore, in the linear case, this technique can also be used to efficiently generate improved preconditioners for linear systems associated to the collocation points on higher levels, which further accelerates the convergence rate of the underlying solver.

The outline of this paper is as follows: We begin by describing the class of parameterized linear and nonlinear stochastic PDEs under consideration in §2. In §3 we describe our acceleration technique in the context of general stochastic collocation methods, defined on a hierarchical sequence of polynomial spaces, for approximating both linear and nonlinear stochastic elliptic PDEs using nonlinear iterative solvers. In §4 we briefly recall the sparse grid SC method, where the sparse grid interpolant is constructed with the use of nested one-dimensional Clenshaw-Curtis abscissas. The theoretical convergence rates, with respect to the level of the interpolant and the degrees of freedom are shown in §4.1. In §4.2 we provide a rigorous computational complexity analysis of the resulting fully discrete sparse grid SC approximation, with and without acceleration, used to demonstrate the effectiveness of our proposed methodology in reducing the total number of iterations of a conjugate gradient solution of the finite element systems at each collocation point. Finally, in §5 we provide several numerical examples, including both moderately large-dimensional linear and nonlinear parametrized PDEs, which are used to illustrate the theoretical results and the improved efficiency of this technique compared with several others.

## 2 Problem setting

Let $D \subset \mathbb{R}^d$, $d \in \{1,2,3\}$, be a bounded domain and let $(\Omega, \mathcal{F}, P)$ denote a complete probability space with sample space $\Omega$, $\sigma$-algebra $\mathcal{F}$, and probability measure $P$. Define $\mathcal{L}$ as a differential operator that depends on a coefficient $a(x,\omega)$ with $x \in \overline{D}$ and $\omega \in \Omega$. Analogously, the forcing term $f = f(x,\omega)$ can be assumed to be a random field as well. In general, $a$ and $f$ belong to different probability spaces but, for economy of notation, we simply denote the stochastic dependences in the same probability space. Consider the stochastic boundary value problem: find a random function $u : \overline{D} \times \Omega \to \mathbb{R}$ such that, $P$-a.e. in $\Omega$, the following equations hold:

$$\begin{cases} \mathcal{L}(a)(u) = f & \text{in } D, \\ u = g & \text{on } \partial D, \end{cases} \qquad (1)$$

where $g$ is a suitable boundary condition. We denote by $W(D)$ a Banach space and assume the underlying random input data are chosen so that the corresponding stochastic system (1) is well-posed and has a unique solution $u \in L^q_P(\Omega; W(D))$, the function space given by

$$L^q_P(\Omega; W(D)) := \left\{ u : \overline{D} \times \Omega \to \mathbb{R} \;\Big|\; u \text{ is strongly measurable and } \int_\Omega \|u\|^q_{W(D)} \, dP(\omega) < +\infty \right\}.$$

In this setting, the approximation space consists of Banach-space valued functions that have finite $q$-th order moments. Two example problems posed in this setting are given as follows.

{example}

(Linear elliptic problem). Find a random field $u : \overline{D} \times \Omega \to \mathbb{R}$ such that, $P$-a.e. in $\Omega$,

$$\begin{cases} -\nabla \cdot \big(a(x,\omega) \nabla u(x,\omega)\big) = f(x,\omega) & \text{in } D \times \Omega, \\ u(x,\omega) = 0 & \text{on } \partial D \times \Omega, \end{cases} \qquad (2)$$

where $\nabla$ denotes the gradient operator with respect to the spatial variable $x \in D$. The well-posedness of (2) is guaranteed in $W(D) = H^1_0(D)$ with $a(x,\omega)$ uniformly elliptic, i.e.,

$$P\big(\omega \in \Omega : a_{\min} \le a(x,\omega) \le a_{\max} \ \ \forall x \in \overline{D}\big) = 1 \quad \text{with } a_{\min}, a_{\max} \in (0,\infty), \qquad (3)$$

and $f$ square integrable, i.e.,

$$\int_D \mathbb{E}[f^2] \, dx := \int_D \int_\Omega f^2(x,\omega) \, dP(\omega) \, dx < +\infty.$$
{example}

(Nonlinear elliptic problem). For $k \in \mathbb{N}$, find a random field $u : \overline{D} \times \Omega \to \mathbb{R}$ such that, $P$-a.e. in $\Omega$,

$$\begin{cases} -\nabla \cdot \big(a(x,\omega) \nabla u(x,\omega)\big) + u(x,\omega) |u(x,\omega)|^k = f(x,\omega) & \text{in } D, \\ u(x,\omega) = 0 & \text{on } \partial D. \end{cases} \qquad (4)$$

The well-posedness of (4) is guaranteed in $W(D) = H^1_0(D)$ with $a$ as in Example 2 and $f$ square integrable [31].

In many applications, the source of randomness can be approximated with only a finite number of uncorrelated, or even independent, random variables. For instance, the random input data $a$ and $f$ in (1) may have a piecewise representation, or in other applications may have spatial variation that can be modeled as a correlated random field, making them amenable to approximation by a Karhunen-Loève (KL) expansion [26]. In practice, one has to truncate such expansions according to the desired accuracy of the simulation. As such, we make the following assumption regarding the random input data $a$ and $f$ (cf. [23, 31]).

{assumption}

(Independence and finite dimensional noise). The random fields $a$ and $f$ have the form:

$$a(x,\omega) = a(x, \boldsymbol{y}(\omega)) \quad \text{and} \quad f(x,\omega) = f(x, \boldsymbol{y}(\omega)) \quad \text{on } D \times \Omega,$$

where $\boldsymbol{y}(\omega) = (y_1(\omega), \ldots, y_N(\omega))$ is a vector of independent and uncorrelated real-valued random variables.

We note that Assumption 2 and the Doob-Dynkin lemma [32] guarantee that $a$ and $f$ are Borel-measurable functions of the random vector $\boldsymbol{y}$. In our setting, we denote by $\Gamma_n = y_n(\Omega)$ the image of the random variable $y_n$, and set $\Gamma = \prod_{n=1}^N \Gamma_n$, where $N \in \mathbb{N}$. If the distribution measure of $\boldsymbol{y}$ is absolutely continuous with respect to the Lebesgue measure, then there exists a joint probability density function of $\boldsymbol{y}$ denoted by

$$\varrho(\boldsymbol{y}) : \Gamma \to \mathbb{R}_+, \quad \text{with} \quad \varrho(\boldsymbol{y}) = \prod_{n=1}^N \varrho_n(y_n) \in L^\infty(\Gamma).$$

Therefore, based on Assumption 2, the probability space $(\Omega, \mathcal{F}, P)$ is mapped to $(\Gamma, \mathcal{B}(\Gamma), \varrho(\boldsymbol{y}) d\boldsymbol{y})$, where $\mathcal{B}(\Gamma)$ is the Borel $\sigma$-algebra on $\Gamma$ and $\varrho(\boldsymbol{y}) d\boldsymbol{y}$ is a probability measure on $\mathcal{B}(\Gamma)$. By assuming the solution $u$ of (1) is measurable with respect to $a$ and $f$, the Doob-Dynkin lemma guarantees that $u$ can also be characterized by the same random vector $\boldsymbol{y}$, i.e.,

$$u(x,\omega) = u(x, y_1(\omega), \ldots, y_N(\omega)) \in L^q_\varrho(\Gamma; W(D)),$$

where $L^q_\varrho(\Gamma; W(D))$ is defined by

$$L^q_\varrho(\Gamma; W(D)) = \left\{ u : \overline{D} \times \Gamma \to \mathbb{R} \;\Big|\; u \text{ strongly measurable and } \int_\Gamma \|u\|^q_{W(D)} \, \varrho(\boldsymbol{y}) \, d\boldsymbol{y} < \infty \right\}.$$

Note that the above integral is replaced by the essential supremum when $q = \infty$:

$$L^\infty(\Gamma; W(D)) = \left\{ u : \overline{D} \times \Gamma \to \mathbb{R} \;\Big|\; u \text{ strongly measurable and } \operatorname*{ess\,sup}_{\boldsymbol{y} \in \Gamma} \|u(\boldsymbol{y})\|_{W(D)} < \infty \right\}.$$

### 2.1 Weak formulation

In what follows, we treat the solution $u$ to (1) as a parameterized function of the $N$-dimensional random vector $\boldsymbol{y} \in \Gamma$. This leads to a Galerkin weak formulation [23] of the PDE in (1), with respect to both physical and parameter space, i.e., seek $u \in L^q_\varrho(\Gamma; W(D))$ such that

$$\int_\Gamma \int_D \Bigg( \sum_{\nu \in \Lambda_1 \cup \Lambda_2} S_\nu(u; \boldsymbol{y}) \, T_\nu(v) \Bigg) \varrho \, dx \, d\boldsymbol{y} = \int_\Gamma \int_D f \, v \, \varrho \, dx \, d\boldsymbol{y}, \quad \forall v \in L^q_\varrho(\Gamma; W(D)),$$

where $T_\nu$ are linear operators independent of $\boldsymbol{y}$, while the operators $S_\nu$ are linear for $\nu \in \Lambda_1$, and nonlinear for $\nu \in \Lambda_2$. Moreover, since the solution can be viewed as a mapping $u : \Gamma \to W(D)$, for convenience we may omit the dependence on $x$ and write $u(\boldsymbol{y})$ to emphasize the dependence of $u$ on $\boldsymbol{y}$. As such, we may also write the problem (1) in the alternative weak form

$$\int_D \Bigg( \sum_{\nu \in \Lambda_1 \cup \Lambda_2} S_\nu(u(\boldsymbol{y}); \boldsymbol{y}) \, T_\nu(v) \Bigg) dx = \int_D f(\boldsymbol{y}) \, v \, dx, \quad \forall v \in W(D), \ \varrho\text{-a.e. in } \Gamma. \qquad (5)$$

Therefore, the stochastic boundary-value problem (1) has been converted into a deterministic parametric problem (5). The acceleration technique proposed in §3 and the sparse-grid SC method discussed in §4 will be based on the solution of the weak form (5) above.

## 3 Accelerating stochastic collocation methods

Our acceleration scheme will be proposed in the context of both linear and nonlinear elliptic PDEs. A general SC approach requires the semi-discrete solution $u_h$ at a set of collocation points $\{\boldsymbol{y}_{L,j}\}_{j=1}^{M_L} \subset \Gamma$, given by

$$u_h(x, \boldsymbol{y}_{L,j}) = \sum_{i=1}^{M_h} c_{L,j,i} \, \varphi_i(x), \quad j = 1, \ldots, M_L. \qquad (6)$$

Here $\{\varphi_i(x)\}_{i=1}^{M_h}$ is a predefined finite element basis of $W_h(D) \subset W(D)$, and for $j = 1, \ldots, M_L$, the coefficient vector $\boldsymbol{c}_{L,j} = (c_{L,j,1}, \ldots, c_{L,j,M_h})^\top$ is the solution of the following system of equations:

$$\sum_{i=1}^{M_h} c_{L,j,i} \int_D \sum_{\nu \in \Lambda_1} S_\nu(\varphi_i; \boldsymbol{y}_{L,j}) \, T_\nu(\varphi_{i'}) \, dx = \int_D f(\boldsymbol{y}_{L,j}) \, \varphi_{i'} - \sum_{\nu \in \Lambda_2} S_\nu\Bigg( \sum_{i=1}^{M_h} c_{L,j,i} \varphi_i; \boldsymbol{y}_{L,j} \Bigg) T_\nu(\varphi_{i'}) \, dx, \quad i' = 1, \ldots, M_h,$$

with $S_\nu$ and $T_\nu$ defined as above. Note that this system is equivalent to (5) with the nonlinear operators subtracted on the right-hand side. When $\Lambda_2 = \emptyset$, the PDE is linear, and a standard FEM discretization leads to a linear system of equations.

For $L \in \mathbb{N}$, we denote by $\mathcal{I}_L$ an interpolation operator that utilizes $M_L$ collocation points. More generally, assume that we have a family of interpolation operators $\{\mathcal{I}_L\}_{L \ge 1}$, which for each $L$ approximates the solution in polynomial spaces

$$\mathcal{P}_1(\Gamma) \subset \cdots \subset \mathcal{P}_L(\Gamma) \subset \mathcal{P}_{L+1}(\Gamma) \subset \cdots \subset L^2_\varrho(\Gamma),$$

of increasing fidelity, defined on sets of sample points $\mathcal{H}_L = \{\boldsymbol{y}_{L,j}\}_{j=1}^{M_L}$. Assume further that the fully discrete solution has the Lagrange interpolating form

$$u_{h,L}(x, \boldsymbol{y}) := \mathcal{I}_L[u_h](x, \boldsymbol{y}) = \sum_{j=1}^{M_L} \Bigg( \sum_{i=1}^{M_h} c_{L,j,i} \, \varphi_i(x) \Bigg) \Psi_{L,j}(\boldsymbol{y}), \qquad (8)$$

where $\{\Psi_{L,j}\}_{j=1}^{M_L}$ is a basis for $\mathcal{P}_L(\Gamma)$. The approximation (8) can be constructed by solving for $u_h$ independently at each sample point $\boldsymbol{y}_{L,j}$. In §4, we construct a specific example of an interpolation scheme satisfying (8), namely global sparse grid collocation.

For each $L$, the bulk of the computational cost in using (8) goes into solving the systems of equations (3) corresponding to each collocation point $\boldsymbol{y}_{L,j}$. Since the systems are independent and deterministic, they can be solved separately using existing FEM solvers, providing a straightforward path to parallelization compared to intrusive methods such as stochastic Galerkin methods. In this work, we consider iterative solvers for the system in (3), and propose an acceleration scheme to reduce the total number of iterations necessary to solve the collection of systems over the set of sample parameters.

Denoting by $\tilde{\boldsymbol{c}}_{L,j}$ the output of the selected iterative solver for the system (3), for $j = 1, \ldots, M_L$ the semi-discrete solution is approximated by

$$u_h(x, \boldsymbol{y}_{L,j}) = \sum_{i=1}^{M_h} c_{L,j,i} \varphi_i(x) \approx \tilde{u}_h(x, \boldsymbol{y}_{L,j}) = \sum_{i=1}^{M_h} \tilde{c}_{L,j,i} \varphi_i(x),$$

where we define $\tilde{\boldsymbol{c}}_{L,j} = (\tilde{c}_{L,j,1}, \ldots, \tilde{c}_{L,j,M_h})^\top$, and therefore the final SC approximation is given by a perturbation of (8), i.e.,

$$\tilde{u}_{h,L}(x, \boldsymbol{y}) := \sum_{j=1}^{M_L} \Bigg( \sum_{i=1}^{M_h} \tilde{c}_{L,j,i} \, \varphi_i(x) \Bigg) \Psi_{L,j}(\boldsymbol{y}). \qquad (9)$$

We observe that the performance of the underlying iterative solver can be improved by proposing a good initial guess, denoted $\boldsymbol{c}^{(0)}_{L,j}$, or constructing an effective preconditioner to reduce the condition number of the system. Here, we propose our approach for improving initial deterministic approximations, remarking that the same idea can also be utilized to construct preconditioners. To start the iterative solver for the system in (3), it is common to use a zero initial guess, i.e., $\boldsymbol{c}^{(0)}_{L,j} = \boldsymbol{0}$. However, we can predict the solution at level $L$ using lower-level approximations to construct improved initial solutions. Assume that we first obtain $\tilde{u}_{h,L-1}$ by collocating solutions to (3) over $\mathcal{H}_{L-1}$. Then at level $L$, for each new point $\boldsymbol{y}_{L,j}$, the initial guess can be given by interpolating the solutions from level $L-1$, i.e.,

$$\boldsymbol{c}^{(0)}_{L,j} = \big( \tilde{u}_{h,L-1}(x_1, \boldsymbol{y}_{L,j}), \ldots, \tilde{u}_{h,L-1}(x_{M_h}, \boldsymbol{y}_{L,j}) \big)^\top = \sum_{j'=1}^{M_{L-1}} \tilde{\boldsymbol{c}}_{L-1,j'} \, \Psi_{L-1,j'}(\boldsymbol{y}_{L,j}). \qquad (10)$$

For a convergent interpolation scheme, we expect the necessary number of iterations to compute $\tilde{\boldsymbol{c}}_{L,j}$ to become smaller as the level $L$ increases to an overall maximum level, denoted $L_{\max}$. As such, the construction of the desired solution $\tilde{u}_{h,L_{\max}}$ is accelerated through the intermediate solutions $\tilde{u}_{h,1}, \ldots, \tilde{u}_{h,L_{\max}-1}$. Note that this approach reduces computational cost by improving initial guesses, but does not depend on the specific solver used. Thus, our scheme may be combined with other techniques for accelerating convergence, such as faster nonlinear solvers or better preconditioners. When the underlying PDE is nonlinear with respect to $u$, iterative solvers are commonly used for the solution of (3). In Algorithm 1, we outline the acceleration procedure described above, using a general nonlinear iterative method for the solution of (3).
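
To make the procedure concrete, the following minimal sketch applies the warm-start idea to a toy one-parameter family of symmetric positive definite systems; the matrix $A(y)$, the load vector, and the coefficient $2+y$ are invented stand-ins for the FEM systems (3), and a single random parameter replaces the sparse grid. Each new Clenshaw-Curtis point is solved by CG twice, once from a zero initial vector and once from the previous level's Lagrange interpolant as in (10), and the total iteration counts are compared.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, maxit=1000):
    """Plain conjugate gradient; returns the solution and the iteration count."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for k in range(maxit):
        if np.sqrt(rs) < tol:
            return x, k
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, maxit

def cc_points(l):
    """Nested Clenshaw-Curtis abscissas on [-1, 1]: m(1)=1, m(l)=2**(l-1)+1."""
    if l == 1:
        return np.array([0.0])
    m = 2 ** (l - 1) + 1
    j = np.arange(1, m + 1)
    return -np.cos(np.pi * (j - 1) / (m - 1))

def lagrange_predict(nodes, values, y):
    """Evaluate the Lagrange interpolant of the stored coefficient vectors at y."""
    out = np.zeros(values.shape[1])
    for j in range(len(nodes)):
        psi = np.prod([(y - nodes[i]) / (nodes[j] - nodes[i])
                       for i in range(len(nodes)) if i != j])
        out += psi * values[j]
    return out

# Toy parameterized SPD system A(y) c = f: a 1D Laplacian plus a y-dependent
# reaction term stands in for the FEM system at each collocation point.
n = 50
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
A = lambda y: T + (2.0 + y) * np.eye(n)   # hypothetical coefficient a(y) = 2 + y

total_zero, total_warm, sols = 0, 0, {}
for L in range(1, 5):
    prev_nodes = np.array(sorted(sols))            # previous-level points
    prev_vals = np.array([sols[p] for p in sorted(sols)])
    for y in cc_points(L):
        key = round(float(y), 12)
        if key in sols:
            continue                               # nested grids: already solved
        if len(prev_nodes) > 0:
            x0 = lagrange_predict(prev_nodes, prev_vals, key)  # guess (10)
        else:
            x0 = np.zeros(n)
        _, k_zero = cg(A(key), f, np.zeros(n))     # standard zero initial vector
        x, k_warm = cg(A(key), f, x0)              # accelerated initial vector
        total_zero += k_zero
        total_warm += k_warm
        sols[key] = x

print(total_warm, "<", total_zero)
```

Because the interpolated initial guesses become more accurate as the level grows, the warm-started iteration total should come out strictly smaller than the zero-start total.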

The efficiency of the proposed algorithm will depend crucially on the number of times the iterative solver is utilized, i.e., how many sample points are in the set $\mathcal{H}_L$ for each level $L$. In fact, if the sample points are not nested, it could be the case that $\mathcal{H}_{L-1} \cap \mathcal{H}_L = \emptyset$, and the algorithm may be very inefficient. Hence, in the following sections we will assume:

{assumption}

Assume that the multidimensional point sets are nested, i.e.,

$$\mathcal{H}_1 \subset \mathcal{H}_2 \subset \cdots \subset \mathcal{H}_{L_{\max}} \subset \Gamma.$$

Then $\mathcal{H}_{L-1} \subset \mathcal{H}_L$, and we can construct the intermediate solutions using a subset of the information needed to approximate $\tilde{u}_{h,L_{\max}}$.

In §4 we construct an interpolant using a point set which satisfies Assumption 3. Next, we give several examples of Algorithm 1, using iterative solvers for both nonlinear and linear elliptic PDEs.

{example}

Consider the weak form of the nonlinear elliptic PDE in Example 2, letting $\Lambda_1 = \{1\}$, $\Lambda_2 = \{2\}$, $S_1(u;\boldsymbol{y}) = a(\boldsymbol{y})\nabla u$, and $S_2(u;\boldsymbol{y}) = u|u|^s$ (note that $T_1(v) = \nabla v$, $T_2(v) = v$). When using the fixed point iterative method in Algorithm 1, for the update step we define

$$S\big(\boldsymbol{r}^{(1)}_{L,j}, \ldots, \boldsymbol{r}^{(k)}_{L,j}\big) = A^{-1}_{L,j} \, \boldsymbol{r}^{(k)}_{L,j},$$

where the matrix $A_{L,j}$ is defined by

$$[A_{L,j}]_{i,i'} = \int_D a(\boldsymbol{y}_{L,j}) \, \nabla \varphi_{i'} \cdot \nabla \varphi_i \, dx, \quad \text{for } i, i' = 1, \ldots, M_h. \qquad (10)$$

With this choice of update function, each step is equivalent to solving the following linear system

$$\int_D a(\boldsymbol{y}_{L,j}) \, \nabla u^{(k+1)}_{h,L} \cdot \nabla v \, dx = \int_D \Big[ f(\boldsymbol{y}_{L,j}) - u^{(k)}_{h,L}(\boldsymbol{y}_{L,j}) \, \big| u^{(k)}_{h,L}(\boldsymbol{y}_{L,j}) \big|^s \Big] v \, dx \quad \forall v \in W_h(D),$$

to update $u^{(k)}_{h,L}$ to $u^{(k+1)}_{h,L}$ at the $k$-th iteration. Note that each iteration of the solver in Algorithm 1 requires the solution of this linear system, which is not accelerated by our algorithm.
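
A minimal numerical sketch of this fixed-point iteration, on an invented one-dimensional finite difference stand-in for the discrete system; the values of $a$, $s$, and $f$ are hypothetical sample values at one collocation point, not taken from the paper:

```python
import numpy as np

n = 30
h = 1.0 / (n + 1)
T = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2  # 1D stiffness
a, s = 1.5, 2                      # hypothetical sample values of a(y_{L,j}) and s
f = np.ones(n)
A = a * T                          # linear part, assembled once per sample point

u = np.zeros(n)                    # zero (or interpolated) initial guess u^{(0)}
for it in range(100):
    rhs = f - u * np.abs(u) ** s   # nonlinear term moved to the right-hand side
    u_new = np.linalg.solve(A, rhs)  # one linear solve per fixed-point iteration
    if np.max(np.abs(u_new - u)) < 1e-12:
        u = u_new
        break
    u = u_new

res = A @ u + u * np.abs(u) ** s - f   # residual of the discrete nonlinear system
print(np.max(np.abs(res)))
```

For this mild nonlinearity the map is a strong contraction, so the iteration settles in a handful of steps; a warm start from (10) would shrink that count further.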

{example}

As a special case of the example above, consider the weak form of the linear elliptic problem in Example 2 with $\Lambda_2 = \emptyset$, $S_1(u;\boldsymbol{y}) = a(\boldsymbol{y})\nabla u$, and $T_1(v) = \nabla v$ in (3). Due to the linearity, at each collocation point $\boldsymbol{y}_{L,j}$ the solution can be approximated by solving the following linear system

$$A_{L,j} \, \boldsymbol{c}_{L,j} = \boldsymbol{f}_{L,j}, \qquad (11)$$

with $A_{L,j}$ as in (10), and $[\boldsymbol{f}_{L,j}]_{i'} = \int_D f(\boldsymbol{y}_{L,j}) \varphi_{i'} \, dx$ for $i' = 1, \ldots, M_h$. Under our assumptions on the coefficient $a$, the linear system (11) is symmetric positive definite, and we can use the CG method [37] to find its solution. For $k \ge 1$, by recursively defining

$$\boldsymbol{p}^{(k)}_{L,j} = \boldsymbol{r}^{(k)}_{L,j} - \sum_{k' < k} \frac{\boldsymbol{p}^{(k')\top}_{L,j} A_{L,j} \, \boldsymbol{r}^{(k)}_{L,j}}{\boldsymbol{p}^{(k')\top}_{L,j} A_{L,j} \, \boldsymbol{p}^{(k')}_{L,j}} \, \boldsymbol{p}^{(k')}_{L,j},$$

we get the update function

$$S\big(\boldsymbol{r}^{(1)}_{L,j}, \ldots, \boldsymbol{r}^{(k)}_{L,j}\big) = \frac{\boldsymbol{p}^{(k)\top}_{L,j} \boldsymbol{r}^{(k)}_{L,j}}{\boldsymbol{p}^{(k)\top}_{L,j} A_{L,j} \, \boldsymbol{p}^{(k)}_{L,j}} \; \boldsymbol{p}^{(k)}_{L,j}.$$

Recall the following well-known error estimate for CG:

$$\big\| \boldsymbol{c}_{L,j} - \boldsymbol{c}^{(k)}_{L,j} \big\|_{A_{L,j}} \le 2 \left( \frac{\sqrt{\kappa_{L,j}} - 1}{\sqrt{\kappa_{L,j}} + 1} \right)^{k} \big\| \boldsymbol{c}_{L,j} - \boldsymbol{c}^{(0)}_{L,j} \big\|_{A_{L,j}}, \qquad (12)$$

where $\kappa_{L,j}$ denotes the condition number of $A_{L,j}$, $\boldsymbol{c}^{(0)}_{L,j}$ is the vector of initial guesses, and $\boldsymbol{c}^{(k)}_{L,j}$ is the output of the $k$-th iteration of the CG solver. As opposed to Example 3, for this example Algorithm 1 accelerates the solution of the linear system (11).
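
The estimate (12) is easy to verify numerically. The sketch below builds a random symmetric positive definite matrix with a prescribed spectrum, so that $\kappa$ is known exactly, runs plain CG from a zero initial vector, and checks the computed $A$-norm errors against the bound; the size and spectrum are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.linspace(1.0, 50.0, n)          # prescribed spectrum: kappa = 50
A = Q @ np.diag(eigs) @ Q.T               # SPD with known condition number
b = rng.standard_normal(n)
c = np.linalg.solve(A, b)                 # reference solution
kappa = eigs[-1] / eigs[0]
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

def a_norm(v):
    return np.sqrt(v @ A @ v)

x = np.zeros(n); r = b - A @ x; p = r.copy(); rs = r @ r
e0 = a_norm(c - x)                        # initial A-norm error
ok = True
for k in range(1, 21):
    Ap = A @ p
    alpha = rs / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    rs_new = r @ r
    p = r + (rs_new / rs) * p
    rs = rs_new
    err = a_norm(c - x)
    ok = ok and err <= 2 * rho**k * e0 + 1e-12   # bound (12), round-off slack

print(ok)
```

In practice CG converges well below this worst-case bound, which is exactly why a smaller initial error $\|\boldsymbol{c}_{L,j} - \boldsymbol{c}^{(0)}_{L,j}\|_{A_{L,j}}$ from (10) translates directly into fewer iterations for a fixed tolerance.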

To evaluate the efficiency of the accelerated SC method, we define cost metrics for the construction of standard and accelerated SC approximations. In general, the computational cost in floating point operations (flops) is the total number of iterations to solve (3), summed over each of the sample points (denoted by $K_{\text{zero}}$ and $K_{\text{acc}}$ for the standard and accelerated SC methods, respectively), multiplied by the cost of performing one iteration, denoted $C_{\text{iter}}$. Let $C_{\text{int}}$ be the additional cost of interpolation incurred by using the accelerated initial vectors (10). Then, we define

$$C_{\text{zero}} = C_{\text{iter}} \, K_{\text{zero}}, \qquad (13)$$

for the standard SC approach, and

$$C_{\text{acc}} = C_{\text{iter}} \, K_{\text{acc}} + C_{\text{int}}, \qquad (14)$$

for the accelerated SC approximation, respectively.
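
The break-even point between (13) and (14) is immediate: acceleration pays off exactly when $C_{\text{iter}}(K_{\text{zero}} - K_{\text{acc}}) > C_{\text{int}}$. A toy comparison with hypothetical counts:

```python
def accelerated_is_cheaper(c_iter, k_zero, k_acc, c_int):
    """Cost models (13)-(14): C_zero = C_iter*K_zero, C_acc = C_iter*K_acc + C_int."""
    c_zero = c_iter * k_zero
    c_acc = c_iter * k_acc + c_int
    return c_acc < c_zero

# Hypothetical numbers: acceleration wins whenever the saved iterations
# outweigh the interpolation overhead C_int.
print(accelerated_is_cheaper(c_iter=9 * 10_000, k_zero=5_000, k_acc=2_000, c_int=10**7))
```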

In Example 3 the discretization of the linear PDE leads to sparse systems of equations of size $M_h \times M_h$. When solving these systems with a CG solver, $K_{\text{zero}}$ and $K_{\text{acc}}$ are the sums of solver iterations contributed by each sample system. In this case, the cost of one iteration is just the cost of one matrix-vector product, i.e., $C_{\text{iter}} = c \, M_h$, where $c$ depends on the domain and the type of finite element basis.

{remark}

(Relationship to multilevel methods). Multilevel methods reduce the complexity of stochastic sampling methods by balancing errors and computational cost across a sequence of stochastic and spatial approximations. Let $u_{h_k}$, $k = 0, \ldots, K$, be a sequence of semi-discrete approximations built in nested spaces, i.e., $W_{h_0}(D) \subset \cdots \subset W_{h_K}(D)$. Multilevel methods are based on the following identity:

$$u_{h_K} = \sum_{k=0}^{K} \big( u_{h_k} - u_{h_{k-1}} \big), \quad \text{with } u_{h_{-1}} := 0.$$

Letting $Q_L$ denote the chosen method of stochastic approximation at level $L$, a general multilevel method might be written as

$$u^{(ML)}_K = \sum_{k=0}^{K} Q_{L_{K-k}} \big[ u_{h_k} - u_{h_{k-1}} \big].$$

The main idea is that highly resolved, and hence expensive, stochastic approximations are paired with coarse, cheap deterministic approximations, and vice versa. In a similar way, collocation with nested grid points provides a natural multilevel hierarchy which we use in our method to accelerate each PDE solve (10). A combination of these methods could involve using our algorithm to accelerate the construction of the operators $Q_{L_{K-k}}$, as well as reusing information from level to level, thus further improving the performance of SC methods.

{remark}

(Interpolation costs). Note that many adaptive interpolation schemes already require evaluation of the intermediate interpolation operators as in (10), e.g., to compute residual error estimators. Thus, these methods will incur the interpolation cost $C_{\text{int}}$ even in the case of zero initial vectors. Furthermore, for most nonlinear problems the deterministic solver is expensive, so reducing the number of iterations is the most important element in reducing the cost. In each of these settings, we can define the cost metrics to simply be $C_{\text{zero}} = C_{\text{iter}} K_{\text{zero}}$ and $C_{\text{acc}} = C_{\text{iter}} K_{\text{acc}}$.

{remark}

(Hierarchical preconditioner construction). When solving linear systems using iterative methods, convergence properties can be improved by reducing the condition number of the system. As with initial vectors, an interpolation algorithm can be used to construct good, cheap preconditioners. We consider preconditioning algorithms where an explicit preconditioner matrix, or its inverse, is constructed. In this case, for some low collocation level $L_{PC}$, we construct a strong preconditioner $P_{L_{PC},j}$ for each individual iterative solver, $j = 1, \ldots, M_{L_{PC}}$. Then, these lower-level preconditioners are interpolated for the subsequent levels. More specifically, for $L > L_{PC}$ and $j = 1, \ldots, M_L$, we use the preconditioner

$$\tilde{P}_{L,j} := \tilde{P}(\boldsymbol{y}_{L,j}) = \sum_{j'=1}^{M_{L_{PC}}} P_{L_{PC},j'} \, \Psi_{L_{PC},j'}(\boldsymbol{y}_{L,j}). \qquad (15)$$

Numerical illustrations of this approach are given in §5.
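
As a sketch of (15) in the simplest case, one can store Jacobi (inverse-diagonal) preconditioners at a few low-level points and interpolate them entrywise; the parameterized matrix below and the choice of the $L_{PC}$ grid are hypothetical illustrations, not the paper's test problems.

```python
import numpy as np

n = 20
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
D = np.diag(np.linspace(1.0, 3.0, n))
A = lambda y: T + np.exp(y) * D            # hypothetical parameterized matrix

# Exact Jacobi preconditioners (inverse diagonals) at the level-L_PC points.
y_pc = np.array([-1.0, 0.0, 1.0])          # hypothetical L_PC grid
P_pc = np.array([1.0 / np.diag(A(y)) for y in y_pc])

def interp_precond(y):
    """Entrywise Lagrange interpolation of the stored preconditioners, as in (15)."""
    out = np.zeros(n)
    for j, yj in enumerate(y_pc):
        psi = np.prod([(y - yi) / (yj - yi) for i, yi in enumerate(y_pc) if i != j])
        out += psi * P_pc[j]
    return out

y_new = 0.3                                # a higher-level collocation point
exact = 1.0 / np.diag(A(y_new))            # Jacobi preconditioner built from scratch
approx = interp_precond(y_new)             # interpolated preconditioner (15)
rel = np.max(np.abs(approx - exact) / exact)
print(rel)
```

The interpolated diagonal lands within a few percent of the exact Jacobi preconditioner at the new point, at the cost of a basis evaluation instead of a fresh assembly; the same pattern applies to richer factorized preconditioners.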

## 4 Applications to sparse grid stochastic collocation

In this section, we provide a specific example of an interpolation scheme satisfying the assumptions described in §3, i.e., a generalized sparse grid SC approach for a fixed level $L$. In what follows, we briefly review the construction of sparse grid interpolants, and rigorously analyze the approximation errors and the complexities of both the standard and accelerated SC approaches, in order to demonstrate the improved efficiency of the proposed acceleration technique when applied to iterative linear solvers.

The fully discrete SC approximation is built by polynomial interpolation of the semi-discrete solution on an appropriate set of collocation points in $\Gamma$. In our setting, such an interpolation scheme is based on sparse tensor products of one-dimensional Lagrange interpolating polynomials with global support. Specifically, in the one-dimensional case, $N = 1$, we introduce a sequence of Lagrange interpolation operators $U^{m(l)} : C^0(\Gamma) \to \mathcal{P}_{m(l)-1}(\Gamma)$, with $\mathcal{P}_{m(l)-1}(\Gamma)$ the space of degree $m(l)-1$ polynomials over $\Gamma$. Given a general function $v \in C^0(\Gamma)$, these operators are defined by

$$U^{m(l)}[v](y) = \sum_{j=1}^{m(l)} v\big(y^l_j\big) \, \psi^l_j(y).$$

Here $l \ge 1$ represents the resolution level of the operator, $m(l)$ denotes the number of interpolation points on level $l$, and

$$\psi^l_j(y) = \prod_{\substack{i=1 \\ i \ne j}}^{m(l)} \frac{y - y^l_i}{y^l_j - y^l_i} \quad \text{for } j = 1, \ldots, m(l),$$

are the global Lagrange polynomials of degree $m(l)-1$ associated with the point set $\vartheta^l = \{y^l_1, \ldots, y^l_{m(l)}\}$. To satisfy Assumption 3, we need nestedness of the one-dimensional sets, i.e., $\vartheta^l \subset \vartheta^{l+1}$, which is determined by the choice of interpolation points and the definition of $m(l)$. In addition, we remark that similar constructions for $U^{m(l)}$ can be built based on wavelets [22] or other locally supported polynomial functions [23].
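
A quick sketch of the one-dimensional operator $U^{m(l)}$ on Clenshaw-Curtis points, checking the standard exactness property that an interpolant on $m(l)$ nodes reproduces polynomials of degree up to $m(l)-1$ (the test polynomial is an arbitrary illustrative choice):

```python
import numpy as np

def cc_points(m):
    """m Clenshaw-Curtis abscissas on [-1, 1] (formula (19))."""
    if m == 1:
        return np.array([0.0])
    j = np.arange(1, m + 1)
    return -np.cos(np.pi * (j - 1) / (m - 1))

def interp(nodes, v, y):
    """One-dimensional Lagrange interpolant U^{m(l)}[v](y) on the given nodes."""
    total = 0.0
    for j, yj in enumerate(nodes):
        psi = 1.0                      # Lagrange basis polynomial psi_j at y
        for i, yi in enumerate(nodes):
            if i != j:
                psi *= (y - yi) / (yj - yi)
        total += v(yj) * psi
    return total

nodes = cc_points(5)                   # level l = 3: m(l) = 5 points
poly = lambda y: 3 * y**4 - y + 0.5    # degree m(l) - 1 = 4: reproduced exactly
ys = np.linspace(-1.0, 1.0, 7)
err = max(abs(interp(nodes, poly, y) - poly(y)) for y in ys)
print(err)
```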

In the multi-dimensional case, i.e., $N > 1$, using the convention that $U^{m(0)} = 0$, we introduce the difference operator

$$\Delta^{m(l_1)} \otimes \cdots \otimes \Delta^{m(l_N)} = \bigotimes_{n=1}^{N} \Big( U^{m(l_n)} - U^{m(l_n - 1)} \Big), \qquad (16)$$

and define the multi-index $\boldsymbol{l} = (l_1, \ldots, l_N) \in \mathbb{N}^N_+$. The desired approximation is defined by a linear combination of tensor-product operators (16) over a set of multi-indices, determined by the condition $g(\boldsymbol{l}) \le L$, for $L \in \mathbb{N}$ and $g : \mathbb{N}^N_+ \to \mathbb{N}$ a strictly increasing function. For $v \in C^0(\Gamma)$, we now define the generalized SC operator by

$$\mathcal{I}^{m,g}_L[v](\boldsymbol{y}) = \sum_{g(\boldsymbol{l}) \le L} \big( \Delta^{m(l_1)} \otimes \cdots \otimes \Delta^{m(l_N)} \big)[v](\boldsymbol{y}) = \sum_{g(\boldsymbol{l}) \le L} \ \sum_{\boldsymbol{i} \in \{0,1\}^N} (-1)^{|\boldsymbol{i}|} \big( U^{m(l_1 - i_1)} \otimes \cdots \otimes U^{m(l_N - i_N)} \big)[v](\boldsymbol{y}), \qquad (17)$$

where $\boldsymbol{i} = (i_1, \ldots, i_N)$ is a multi-index with $|\boldsymbol{i}| = i_1 + \cdots + i_N$, and $L \in \mathbb{N}$ represents the approximation level. This approximation lives in the tensor product polynomial space given by

$$\mathcal{P}_{\Lambda^{m,g}_L} = \operatorname{span}\Bigg\{ \prod_{n=1}^{N} y^{l_n}_n \;\Big|\; \boldsymbol{l} \in \Lambda^{m,g}_L \Bigg\},$$

where the multi-index set $\Lambda^{m,g}_L$ is defined as follows:

$$\Lambda^{m,g}_L = \Big\{ \boldsymbol{l} \in \mathbb{N}^N \;\Big|\; g\big( m^{\dagger}(\boldsymbol{l} + \boldsymbol{1}) \big) \le L \Big\}.$$

Here $\boldsymbol{l} + \boldsymbol{1} = (l_1 + 1, \ldots, l_N + 1)$, and $m^{\dagger}$ is the left inverse of $m$ (see [2]).

Specific choices for the one-dimensional growth rate $m(l)$ and the function $g(\boldsymbol{l})$ are needed to define the multi-index set and the corresponding polynomial space for the approximation. In this work, we construct the interpolant in (17) using the anisotropic Smolyak construction, i.e.,

$$m(1) = 1, \quad m(l) = 2^{l-1} + 1 \ \text{ for } l > 1, \quad \text{and} \quad g(\boldsymbol{l}) = \sum_{n=1}^{N} \frac{\alpha_n}{\alpha_{\min}} (l_n - 1), \qquad (18)$$

where $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_N)$ is a vector of positive weights reflecting the anisotropy of the system, i.e., the relative importance of each dimension, with $\alpha_{\min} = \min_n \alpha_n$ (see [30] for more details). Our analysis does not depend strongly on this choice of $m$ and $g$, and we could use other functions, e.g., $m(l) = l$, to define an anisotropic tensor product approximation.
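
The rules in (18) are straightforward to code; a small sketch (with the isotropic case as a sanity check):

```python
def m(l):
    """Growth rule (18): m(1) = 1 and m(l) = 2**(l-1) + 1 for l > 1."""
    return 1 if l == 1 else 2 ** (l - 1) + 1

def g(l, alpha):
    """Level function (18): g(l) = sum_n (alpha_n / alpha_min) * (l_n - 1)."""
    a_min = min(alpha)
    return sum(a / a_min * (ln - 1) for a, ln in zip(alpha, l))

print([m(l) for l in range(1, 6)])   # the doubling rule: 1, 3, 5, 9, 17
print(g((3, 1), (1.0, 1.0)))         # isotropic case reduces to sum(l_n - 1)
```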

When $\Gamma$ is a bounded domain in $\mathbb{R}^N$, a common choice is the Clenshaw-Curtis abscissas [10], given by the sets of extrema of Chebyshev polynomials, including the end-point extrema. For a sample set of any size $m(l) > 1$, the abscissas in the standard domain $[-1,1]$ are given by

$$\vartheta^l = \bigg\{ y^l_j \in [-1,1] \;\bigg|\; y^l_j = -\cos\Big( \frac{\pi (j-1)}{m(l) - 1} \Big) \ \text{ for } j = 1, \ldots, m(l) \bigg\}. \qquad (19)$$

By taking $y^1_1 = 0$ and letting $m(l)$ grow according to the rule in (18), one gets a sequence of nested sets $\vartheta^l \subset \vartheta^{l+1}$ for $l \ge 1$. In addition, with $g$ defined as in (18), the resulting set of $N$-dimensional abscissas is a Clenshaw-Curtis sparse grid. Other nested families of sparse grids can be constructed from, e.g., the Leja points [13], Gauss-Patterson points [42], etc.

{remark}

(Specific choice of $m$ and $g$). For the remainder of the paper, we will assume that the functions $m$ and $g$ are given as in (18), and use an underlying Clenshaw-Curtis sparse grid. For simplicity, we will also only consider isotropic collocation methods, i.e., $\alpha_1 = \cdots = \alpha_N$. We then lighten the notation by defining $\mathcal{I}_L := \mathcal{I}^{m,g}_L$.
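
The nestedness claim is easy to verify directly from formula (19); the sketch below compares successive levels as sets (rounding guards against floating point noise in the cosines):

```python
import numpy as np

def theta(l):
    """Clenshaw-Curtis set (19) at level l with m(1)=1, m(l)=2**(l-1)+1."""
    m = 1 if l == 1 else 2 ** (l - 1) + 1
    if m == 1:
        return {0.0}
    j = np.arange(1, m + 1)
    return {round(float(x), 12) for x in -np.cos(np.pi * (j - 1) / (m - 1))}

# With the doubling rule, each level's abscissas contain the previous level's.
nested = all(theta(l) <= theta(l + 1) for l in range(1, 5))
print(nested)
```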

Construction of the approximation $\mathcal{I}_L[v]$ requires evaluation of $v$ on a set of collocation points with cardinality $M_L$. In our case, the one-dimensional point sets are nested, i.e., $\vartheta^l \subset \vartheta^{l+1}$ for $l \ge 1$, so that the multi-dimensional point set used by $\mathcal{I}_L$ is given by

$$\mathcal{H}_L = \bigcup_{g(\boldsymbol{l}) = L} \big( \vartheta^{l_1} \otimes \cdots \otimes \vartheta^{l_N} \big),$$

and the nested structure is preserved, i.e., $\mathcal{H}_{L-1} \subset \mathcal{H}_L$, satisfying Assumption 3. Define the difference of the sets $\Delta\mathcal{H}_L = \mathcal{H}_L \setminus \mathcal{H}_{L-1}$, and the number of new collocation points $\Delta M_L = \#(\Delta\mathcal{H}_L)$. With this nestedness condition, the approximation $\mathcal{I}_L[v]$ is a Lagrange interpolating polynomial [31], and thus (17) can be rewritten as a linear combination of Lagrange basis functions,

$$\mathcal{I}_L[v](\boldsymbol{y}) = \sum_{j=1}^{M_L} v(\boldsymbol{y}_{L,j}) \, \Psi_{L,j}(\boldsymbol{y}), \quad \text{with} \quad \Psi_{L,j}(\boldsymbol{y}) = \sum_{\boldsymbol{l} \in J(L,j)} \ \sum_{\boldsymbol{i} \in \{0,1\}^N} (-1)^{|\boldsymbol{i}|} \prod_{n=1}^{N} \psi^{\,l_n - i_n}_{k_n(j)}(y_n), \qquad (20)$$

where the index set $J(L,j)$ is defined by

For a given $L$ and $j$, this represents the subset of multi-indices corresponding to the tensor-product operators in (17) with the supporting point