Convergence of Sparse Collocation for Functions of Countably Many Gaussian Random Variables (with Application to Elliptic PDEs)

Convergence of Sparse Collocation for Functions of Countably Many Gaussian Random Variables (with Application to Elliptic PDEs)

Oliver G. Ernst, Björn Sprungk and Lorenzo Tamellini
Department of Mathematics, TU Chemnitz, Germany
Istituto di Matematica Applicata e Tecnologie Informatiche “E. Magenes” del CNR, Pavia, Italy
Abstract

We give a convergence proof for the approximation by sparse collocation of Hilbert-space-valued functions depending on countably many Gaussian random variables. Such functions appear as solutions of elliptic PDEs with lognormal diffusion coefficients. We outline a general -convergence theory based on previous work by Bachmayr et al. [4] and Chen [9] and establish an algebraic convergence rate for sufficiently smooth functions assuming a mild growth bound for the univariate hierarchical surpluses of the interpolation scheme applied to Hermite polynomials. We verify specifically for Gauss-Hermite nodes that this assumption holds and also show algebraic convergence w.r.t. the resulting number of sparse grid points for this case. Numerical experiments illustrate the dimension-independent convergence rate.

Keywords: random PDEs, parameteric PDEs, lognormal diffusion coefficient, best--term approximation, sparse grids, stochastic collocation, high-dimensional approximation, high-dimensional interpolation, Gauss-Hermite points.

Mathematics Subject Classification: 65D05, 65D15, 65C30, 60H25.

1 Introduction

The elliptic diffusion problem

(1)

with a random diffusion coefficient with respect to an underlying probability space has become the standard model problem for numerical methods for solving random PDEs. For modeling reasons the diffusion field is often taken to have a lognormal probability law, which complicates both the study of the well-posedness of the problem [8, 24, 19, 31] as well as the analysis of approximation methods. One of the challenges is that the most common parametrization of a Gaussian random field – the Karhunen-Loève expansion [2, 22] – involves a countable number of standard normal random variables

(2)

where and i.i.d. for , leading to an elliptic PDE with a countably infinite number of random parameters .

Besides the stochastic Galerkin method [22, 29] the most common methods for approximating the solution of such random or parametric elliptic PDEs are polynomial collocation methods. Early works on such methods for random PDEs considered a finite (if large) number of random parameters, a setting also referred to as finite-dimensional noise [44, 3, 37, 36]. In this case the parametric representation of is typically obtained by truncating a series expansion of the random field such as (2).

The analysis of the problem involving an infinite number of random variables was first discussed by Cohen, DeVore and Schwab in [14, 15] in the simpler setting in which the diffusion field , rather than its logarithm as in (2), is expanded in a series. This results in an affine dependence of on the random variables , which are, moreover, assumed to have bounded support. In this framework the convergence of the best -term approximation of the solution of the diffusion equation by Taylor and Legendre series was shown to be independent of the number of random variables; this result was further refined in the recent paper [5]. Employing the theoretical concepts stated in [14, 15], Chkifa, Cohen and Schwab analyze in [11] collocation methods based on Lagrange interpolation with Leja points for problems with diffusion coefficients depending linearly on an infinite number of bounded random variables, which are adaptive in the polynomial degree as well as the number of active dimensions or random variables, respectively. The adaptive algorithm itself is related to the earlier work [21]. Each interpolatory approximation gives rise to a quadrature scheme, and in [39] Schillings and Schwab consider sparse adaptive quadrature schemes in the same setting of [11] in connection with approximating expectations with respect to posterior measures in Bayesian inference. Extensions to the case where the diffusion coefficient depends non-linearly on an infinite number of random variables with bounded support was discussed in [12].

Returning to the original lognormal diffusion problem, i.e., with expanded as in (2) and depending on random variables with unbounded support, Hoang and Schwab [26] have obtained convergence results on best -term approximation by Hermite polynomials. These were recently extended by Bachmayr et al. [4] using a different analytical approach employing a weighted -summability of the coefficients of the Hermite expansion of the solution and their relation to partial derivatives. The theoretical tools provided in [4] enabled a convergence analysis for adaptive sparse quadrature [9] employing, e.g., Gauss-Hermite nodes for Banach space-valued functions of countably many Gaussian random variables.

In this paper we address the convergence of sparse polynomial collocation for functions of infinitely many Gaussian random variables, such as the solution to the lognormal diffusion problem (1). Specifically, we follow the approach of [4] and [9] to prove an algebraic convergence rate with respect to the number of grid points for sparse collocation based on Gauss-Hermite interpolation nodes in the case of countably many variables. In particular, the result applies to solutions of (1) where is a lognormal random field. In addition, we highlight the common ideas surrounding sparse collocation found in the works mentioned above. The convergence result in terms of the number of collocation points is obtained in two steps: we first link the error to the size of the multi-index set definining the sparse collocation and then derive a bound on the number of points in the associated sparse grid. This procedure has been followed also in all the above-mentioned work analyzing the convergence of sparse grid quadrature and collocation schemes. An alternative strategy which instead links the error directly to the number of collocation points by introducing the so-called “profits” of each component of the sparse grids, has been discussed in [34, 25], albeit only in the case of random variables with bounded support.

We remark that, besides the classical node families such as Gauss-Hermite and Genz-Keister [20] for quadrature and interpolation on with respect to a Gaussian measure, Jakeman and Narayan [32] have introduced weighted Leja points—a generalization of the classical Leja point construction (see e.g. [30, 17] and references therein) to unbounded domains and arbitrary weight functions. Moreover, they have proved that these node sets possess the correct asymptotic distribution of interpolation nodes and illustrate their computational potential in numerical experiments. Note that such weighted Leja points provide a nested and linearly growing sequence of interpolation nodes. The analysis of sparse collocation based on normal Leja points, i.e., weighted Leja points for a Gaussian measure, is an interesting topic for future research.

The remainder of the paper is organized as follows. In the next section we introduce the general setting and notation and construct the sparse grid collocation operator based on univariate Lagrange interpolation operators. Section 3 is devoted to the convergence analysis of such operators. First, we outline in Subsection 3.1 the general approaches to prove algebraic convergence rates as they can be found in the works mentioned above. Later, we follow in Subsection 3.2 the approach of [4, 9] and derive sufficient conditions for the underlying univariate interpolation nodes in order to obtain such rates when approximating “countably-variate” functions of certain smoothness. Finally, in Subsection 3.3 we verify these conditions for Gauss-Hermite nodes, provide bounds for the number of nodes in the resulting sparse grids, and state a convergence result with respect to this number. Section 4 comes back to our motivation and comments on the application to random elliptic PDEs before we verify our theoretical findings in Section 5 for a simple boundary value problem in one spatial dimension. We draw final conclusions in Section 6.

2 Setting and Sparse Collocation

We consider functions defined on a parameter domain taking values in a separable real Hilbert space with inner product and norm . As our interest lies in the approximation of the dependence of on by multivariate polynomials based on Lagrange interpolation, a minimal requirement is that point evaluation of at any be well-defined. Stronger smoothness requirements on become necessary when deriving convergence rate estimates for the approximations.

We introduce a probability measure on the measurable space as the countable product measure of standard Gaussian measures on , i.e.,

(3)

and denote by the space of all (equivalence classes of) functions with finite second moments with respect to in the sense that

which forms a Hilbert space with inner product

In the following we require

Assumption A1.

Let where . There holds (for a measurable extension of to ) that .

It is shown, e.g., in [40, Theorem 2.5], that the countable tensor product of Hermite polynomials forms an orthonormal basis of . Under Assumption A1 we therefore have

(4)

where and denotes the univariate Hermite orthonormal polynomial of degree as well as

2.1 Sparse Polynomial Collocation

The construction of sparse collocation operators below is based on sequences of univariate Lagrange interpolation operators mapping into the set of univariate polynomials of degree at most . Thus,

where denote the Lagrange fundamental polynomials of degree associated with the set of distinct interpolation nodes .

Remark 1.

It may also be of interest to consider sequences of interpolation operators with a more general degree of polynomial exactness where is indecreasing and , see for instance [44, 3, 37, 36, 35, 34]. However, we restrict ourselves to for simplicity.

We also introduce the detail operators

where we set , and observe that

Tensorization

For any multi-index the (full) tensor product interpolation operator is defined by

(5)

where ranges over all points in the Cartesian product

(6)

and where

(7)

is a multivariate polynomial of (total) degree . Note that ; in particular, since all but a finite number of factors in (6) and (7) are equal to one so that the corresponding products can be regarded as finite. The tensor product interpolation operator maps into the multivariate (tensor product) polynomial space

(8)

Note that, since both the univariate polynomial sets of Lagrange fundamental polynomials and the Hermite orthonormal polynomials form a basis of , equivalent characterizations are

In order for the tensor product interpolation operator to be applicable also to functions defined only on a subset , we assume the interpolation nodes to all lie in :

Assumption A2.

Let denote the domain from Assumption A1. For all the Cartesian products of nodal sets given in (6) satisfy

In the following we denote by the set of all mappings from to . In analogy to (5) we define for any multi-index the tensorized detail operator

Finally, we associate with a finite subset the multivariate polynomial space

(9)

and define the associated sparse (polynomial) collocation operator by

(10)

We will see that is exact on under some natural assumptions on the multi-index set , for which we first recall some basic definitions given in [13, 11, 12].

Partial orderings and monotone sets of multi-indices

We define a partial ordering on by

as well as
and introduce the relation

We shall call a set of multi-indices monotone if and together imply that also . Finally, for a multi-index we define its rectangular envelope by

Note that for is a finite (and monotone) set with cardinality

(11)

2.2 Polynomial Exactness of Sparse Collocation

The introduction of the rectangular envelope of a multi-index permits a convenient characterization of monotone multi-index sets and the associated polynomial space introduced in (9).

Proposition 2.

If is monotone, then

Proof.

Since for all the set on the left is obviously a subset of that on the right. Conversely, given for some , the definition of implies , which in turn implies by the monotonicity of . Moreover, monontonicity also implies

where monontonicity is required for the two last equalities. ∎

In view of Proposition 2, for a multi-index set represents a sparsification of . In particular, the full tensor product polynomial space coincides with for . Similarly, the full tensor approximation operator defined in (5) can be expressed as .

Proposition 3.

Let be a finite and monotone set. Then for all . In particular, for all we have for .

Proof.

Observe first that, for any such that we have

It suffices to prove the assertions for all monomials in . For any must satisfy and therefore , proving the second assertion. We conclude that

where the third equality follows from the fact that for all due to the monotonicity of . The proof concludes with

Note that the third equality is obtained by rewriting a (finite) product of sums: since there exists an such that for . For such we have and therefore

Proposition 3 can be seen as an extension of [6, Proposition 1] to general monotone multi-index sets as well as an extension of [13, Theorem 6.1] and [11, Theorem 2.1] to interpolation operators with non-nested node sets. As mentioned in [13, p. 89], if the set is not monotone then will not be exact on in general. However, the exactness on is a crucial property in the subsequent convergence analysis and we therefore choose to work exclusively with monotone sets .

2.3 Sparse Grid Associated with

The construction of for consists of a linear combination of tensor product interpolation operators requiring the evaluation of at certain multivariate nodes. We shall refer to the collection of these nodes as the sparse grid associated with . For a monotone and finite set there holds

(12)

because for we have

i.e., for computing we need to evaluate at

Since is a monotone set, the resulting sparse grid for is

We remark that the unisolvence on of point evaluation on is discussed in [13, Theorem 6.1].

3 Convergence Analysis

In this section we analyze the error

where denotes the norm in , is assumed to satisfy Assumption A1 and is required to be monotone and finite. Our first goal here is to establish a convergence rate for the error of for a nested sequence of monotone subsets of with , i.e.,

(13)

where may depend on as well as the univariate nodal sets. The line of proof we present here follows and builds upon the works [9, 26, 4]. We complement this convergence rate with a bound on the number of collocation points associated with a given multi-index set.

3.1 General Convergence Results

The subsequent error analysis for the sparse collocation operator is based on the representation of multivariate functions in the orthonormal basis of multivariate Hermite polynomials . We shall therefore examine the worst-case approximation error of any applied to a given multivariate Hermite basis polynomial . To this end we define

(14)

This quantity is finite since for and hence

where the maximum is taken over a finite set. The quantities also measure the deviation of the error of oblique projection from that of orthogonal projection, as these numbers would all be zero or one if is replaced with the -orthogonal projection onto . Moreover, we obtain the following bound:

Proposition 4.

For all the quantity defined in (14) satisfies

In particular, if for the univariate Hermite polynomials there exists and such that

(15)

where we have denoted the univariate Gaussian measure again by , then

(16)
Proof.

In view of Proposition 3 we have and, particularly, for , since . Therefore

giving

Moreover, if (15) holds, then

where we have used (11) and in the last inequality. ∎

Remark 5.

Bounds such as (15) can often be found in the sparse collocation or sparse quadrature literature, e.g., for quadrature operators applied to Hermite polynomials [9], norms of quadrature operators on bounded domains [39] or Lebesgue constants for Leja points [12]. Numerical estimates for the specific case of Genz-Keister points have been provided in [7].

The following lemma provides a natural starting point for bounding the approximation error of for monotone subsets . The proof follows the same line of argument as the proof of [9, Lemma 3.2].

Lemma 6 (cf. [9, Lemma 3.2]).

For a finite and monotone subset there holds

(17)
Proof.

Due to the monotonicity of we can apply Proposition 3 and obtain

Building on Lemma 6 the approximation error may be further bounded given summability results for the sequence . The key result here is known as Stechkin’s lemma which provides a decay rate for the -tail of an -summable sequence for and is due to Stechkin [41] for (cf. also [13, Lemma 3.6]).

Lemma 7 (Stechkin).

Let and let

be a sequence of nonnegative numbers. Then for denoting the set of multi-indices corresponding to the largest elements , there holds

(18)

The index sets in Stechkin’s lemma associated with the largest sequence elements are not necessarily monotone and, therefore Lemma 6 and Lemma 7 can not be combined to bound the error without additional assumptions. An obvious way to ensure monotonicity of the sets in Stechkin’s lemma is to assume the sequence to be nonincreasing, i.e.,

This leads to

Theorem 8.

Let Assumptions A1 and A2 be satisfied and let there exist a nonincreasing sequence with such that

Then there exists a nested sequence of finite and monotone subsets with such that (13) holds with rate .

We will provide a proof below. The convergence analysis in [12, 39] for sparse quadrature and interpolation in case of bounded follows Theorem 8, although sometimes hidden in the details. There the authors employ explicit bounds on the norms of the Legendre or Taylor coefficients of to construct a dominating and nonincreasing sequence , .

In our setting it is, however, not always possible to derive explicit bounds on the norm of the Hermite coefficients . In [4] a technique was developed which relies on somewhat implicit bounds on via a weighted -summability property. We adapt this approach to the current setting in

Theorem 9.

Let Assumptions A1 and A2 be satisfied and let there exist a sequence of positive numbers such that

(19)

as well as another nonincreasing sequence , , for which

Then there exists a nested sequence of finite and monotone subsets with such that (13) holds with rate .

Proof of Theorem 8 and Theorem 9.

Let be the set of multi-indices corresponding to the largest elements of . Then each is monotone and the sequence can be chosen to be nested.

If the assumption of Theorem 8 hold, we can apply Lemma 6 and Stechkin’s lemma with to obtain

where .

If the assumptions of Theorem 9 hold, Lemma 6 combined with the Cauchy-Schwarz inequality and Stechkin’s lemma for give

where now , respectively. ∎

Remark 10.

Another application of the weighted -summability property (19) is the analysis of sparse quadrature given in [9], where the author employs the slightly different estimate

After showing that the series on the right is bounded and applying Stechkin’s lemma to , this yields the same convergence rate as stated in Theorem 9.

Remark 11.

We mention that sparse collocation attains a smaller convergence rate than best -term approximation in case the assumptions of Theorem 9 hold. Namely, under these assumptions the best -term rate is , see [4, Theorem 1.2]. This reduced convergence rate is not caused by the additional factors in the error analysis of sparse collocation. The reason for the slower rate is missing orthogonality: in the proof of Lemma 6 we could not apply Parseval’s identity and had to use the triangle inequality to bound the error. This led to bounds in terms of rather than as in the case of orthogonal projections, e.g., best -term approximations.

We emphasize that the construction of such a nonincreasing, -summable dominating sequence is by no means trivial. Without the first property we can not conclude that the multi-index sets occurring in Stechkin’s lemma are monotone, which in turn is needed to use Lemma 6 as the starting point of our error analysis. Of course, we could consider monotone envelopes of , but their size can grow quite rapidly with (e.g., polynomially or even faster, see counterexample below). Moreover, it is not at all obvious that for a sequence there exists a dominating and nonincreasing . In particular, we provide the following counterexample: let and define , by

i.e.,  . Then . The smallest positive nonincreasing sequence dominating is given by , see [13, Section 3.8]. In our case, we get

and, thus,

Although the example is somewhat pathological, it illustrates that for a -summable nonincreasing dominating sequence need not exist.

3.2 Sufficient Conditions for Weighted Summability and Majorization

We now follow the strategy of Theorem 9 and study under which requirements the assumptions of Theorem 9 hold. To this end we recall a result from [4] for weighted -summability of Hermite coefficients given the following smoothness conditions on :

Assumption A3.

Let satisfy Assumption A1. There exists an integer and a sequence of positive numbers , , such that

  1. for any with the (weak) partial derivative exists and satisfies ,

  2. there holds

    (20)

    where and .

Observe that the sum in (20) is actually a series, because has infinitly many components and therefore there are countably many vectors such that . Assumption A3(a) states that we require a finite order of partial differentiability of , i.e., up to order with respect to each variable , and, maybe more importantly, Assumption A3(b) asks for a weighted square-summability of the -norms of the corresponding partial derivatives. The latter, in particular, implies bounds of the form

since otherwise the summability requirement (20) would not hold. Recalling that this bound implies that, e.g., the -norm of the derivative , , decays if .

The following result shows that the smoothness condition of Assumption A3 implies the first condition (19) of Theorem 9:

Theorem 12 (cf. [4, Theorem 3.1]).

Let Assumption A3 be satisfied. Then, with the weights

(21)

where

there holds

(22)

(We mention in passing that in [4] the assertion of Theorem 12 was actually proven without requiring that both series in (22) be finite.) To apply Theorem 9 it remains to prove the existence of a nonincreasing and -summable sequence which dominates , . Since the are explicitly given in (21), this boils down to the question, how fast the projection errors are allowed to grow. As it turns out, a polynomial growth w.r.t. as given in (16) in Proposition 4 is sufficient. We therefore state the following lemma, which is strongly based on the techniques developed in the proofs of [4, Lemma 5.1] and [9, Lemma 3.4].

Lemma 13.

Let there exists a and a such that

Then for any increasing sequence such that for a and for any there exists a nonincreasing sequence such that

where is as in (21).

Proof.

We start with constructing the dominating sequence