Convergence of Sparse Collocation for Functions of Countably Many Gaussian Random Variables (with Application to Elliptic PDEs)
Abstract
We give a convergence proof for the approximation by sparse collocation of Hilbert-space-valued functions depending on countably many Gaussian random variables.
Such functions appear as solutions of elliptic PDEs with lognormal diffusion coefficients.
We outline a general convergence theory based on previous work by Bachmayr et al. [4] and Chen [9] and establish an algebraic convergence rate for sufficiently smooth functions assuming a mild growth bound for the univariate hierarchical surpluses of the interpolation scheme applied to Hermite polynomials.
We verify that this assumption holds in particular for Gauss-Hermite nodes and, for this case, also show algebraic convergence with respect to the resulting number of sparse grid points.
Numerical experiments illustrate the dimension-independent convergence rate.
Keywords:
random PDEs,
parametric PDEs,
lognormal diffusion coefficient,
best n-term approximation,
sparse grids,
stochastic collocation,
high-dimensional approximation,
high-dimensional interpolation,
Gauss-Hermite points.
Mathematics Subject Classification: 65D05, 65D15, 65C30, 60H25.
1 Introduction
The elliptic diffusion problem
(1)  $-\nabla \cdot (a \nabla u) = f \ \text{in } D, \qquad u = 0 \ \text{on } \partial D,$
with a random diffusion coefficient $a$ with respect to an underlying probability space has become the standard model problem for numerical methods for solving random PDEs. For modeling reasons the diffusion field is often taken to have a lognormal probability law, which complicates both the study of the well-posedness of the problem [8, 24, 19, 31] and the analysis of approximation methods. One of the challenges is that the most common parametrization of a Gaussian random field – the Karhunen-Loève expansion [2, 22] – involves a countable number of standard normal random variables
(2)  $\log a(x, \omega) = \sum_{m \in \mathbb{N}} y_m(\omega) \, \phi_m(x),$
where the $\phi_m$ are given real-valued functions on $D$ and the $y_m$ are i.i.d. standard normal random variables for $m \in \mathbb{N}$, leading to an elliptic PDE with a countably infinite number of random parameters $y = (y_m)_{m \in \mathbb{N}}$.
Besides the stochastic Galerkin method [22, 29] the most common methods for approximating the solution of such random or parametric elliptic PDEs are polynomial collocation methods. Early works on such methods for random PDEs considered a finite (if large) number of random parameters, a setting also referred to as finitedimensional noise [44, 3, 37, 36]. In this case the parametric representation of is typically obtained by truncating a series expansion of the random field such as (2).
The analysis of the problem involving an infinite number of random variables was first discussed by Cohen, DeVore and Schwab in [14, 15] in the simpler setting in which the diffusion field , rather than its logarithm as in (2), is expanded in a series. This results in an affine dependence of on the random variables , which are, moreover, assumed to have bounded support. In this framework the convergence of the best n-term approximation of the solution of the diffusion equation by Taylor and Legendre series was shown to be independent of the number of random variables; this result was further refined in the recent paper [5]. Employing the theoretical concepts stated in [14, 15], Chkifa, Cohen and Schwab analyze in [11] collocation methods based on Lagrange interpolation with Leja points for problems with diffusion coefficients depending linearly on an infinite number of bounded random variables; these methods are adaptive in the polynomial degree as well as in the number of active dimensions or random variables, respectively. The adaptive algorithm itself is related to the earlier work [21]. Each interpolatory approximation gives rise to a quadrature scheme, and in [39] Schillings and Schwab consider sparse adaptive quadrature schemes in the same setting as [11] in connection with approximating expectations with respect to posterior measures in Bayesian inference. Extensions to the case where the diffusion coefficient depends nonlinearly on an infinite number of random variables with bounded support were discussed in [12].
Returning to the original lognormal diffusion problem, i.e., with expanded as in (2) and depending on random variables with unbounded support, Hoang and Schwab [26] have obtained convergence results on best n-term approximation by Hermite polynomials. These were recently extended by Bachmayr et al. [4] using a different analytical approach employing a weighted summability of the coefficients of the Hermite expansion of the solution and their relation to partial derivatives. The theoretical tools provided in [4] enabled a convergence analysis for adaptive sparse quadrature [9] employing, e.g., Gauss-Hermite nodes for Banach-space-valued functions of countably many Gaussian random variables.
In this paper we address the convergence of sparse polynomial collocation for functions of infinitely many Gaussian random variables, such as the solution to the lognormal diffusion problem (1). Specifically, we follow the approach of [4] and [9] to prove an algebraic convergence rate with respect to the number of grid points for sparse collocation based on Gauss-Hermite interpolation nodes in the case of countably many variables. In particular, the result applies to solutions of (1) where the diffusion coefficient is a lognormal random field. In addition, we highlight the common ideas surrounding sparse collocation found in the works mentioned above. The convergence result in terms of the number of collocation points is obtained in two steps: we first link the error to the size of the multiindex set defining the sparse collocation and then derive a bound on the number of points in the associated sparse grid. This procedure has also been followed in all of the above-mentioned works analyzing the convergence of sparse grid quadrature and collocation schemes. An alternative strategy, which instead links the error directly to the number of collocation points by introducing the so-called "profits" of each component of the sparse grid, has been discussed in [34, 25], albeit only in the case of random variables with bounded support.
We remark that, besides classical node families such as Gauss-Hermite and Genz-Keister [20] for quadrature and interpolation on $\mathbb{R}$ with respect to a Gaussian measure, Jakeman and Narayan [32] have introduced weighted Leja points, a generalization of the classical Leja point construction (see, e.g., [30, 17] and references therein) to unbounded domains and arbitrary weight functions. Moreover, they have proved that these node sets possess the correct asymptotic distribution of interpolation nodes and have illustrated their computational potential in numerical experiments. Note that such weighted Leja points provide a nested and linearly growing sequence of interpolation nodes. The analysis of sparse collocation based on normal Leja points, i.e., weighted Leja points for a Gaussian measure, is an interesting topic for future research.
The remainder of the paper is organized as follows. In the next section we introduce the general setting and notation and construct the sparse grid collocation operator based on univariate Lagrange interpolation operators. Section 3 is devoted to the convergence analysis of such operators. First, we outline in Subsection 3.1 the general approaches to proving algebraic convergence rates as they can be found in the works mentioned above. Then, in Subsection 3.2, we follow the approach of [4, 9] and derive sufficient conditions on the underlying univariate interpolation nodes in order to obtain such rates when approximating "countably-variate" functions of certain smoothness. Finally, in Subsection 3.3 we verify these conditions for Gauss-Hermite nodes, provide bounds for the number of nodes in the resulting sparse grids, and state a convergence result with respect to this number. Section 4 returns to our motivation and comments on the application to random elliptic PDEs before we verify our theoretical findings in Section 5 for a simple boundary value problem in one spatial dimension. We draw final conclusions in Section 6.
2 Setting and Sparse Collocation
We consider functions defined on a parameter domain taking values in a separable real Hilbert space with inner product and norm . As our interest lies in the approximation of the dependence of on by multivariate polynomials based on Lagrange interpolation, a minimal requirement is that point evaluation of at any be welldefined. Stronger smoothness requirements on become necessary when deriving convergence rate estimates for the approximations.
We introduce a probability measure on the measurable space as the countable product measure of standard Gaussian measures on , i.e.,
(3)  $\mu := \bigotimes_{m \in \mathbb{N}} N(0, 1),$
and denote by the space of all (equivalence classes of) functions with finite second moments with respect to in the sense that
which forms a Hilbert space with inner product
In the following we require
Assumption A1.
Let where . There holds (for a measurable extension of to ) that .
It is shown, e.g., in [40, Theorem 2.5], that the countable tensor product of Hermite polynomials forms an orthonormal basis of . Under Assumption A1 we therefore have
(4) 
where and denotes the univariate Hermite orthonormal polynomial of degree as well as
2.1 Sparse Polynomial Collocation
The construction of sparse collocation operators below is based on sequences of univariate Lagrange interpolation operators mapping into the set of univariate polynomials of degree at most . Thus,
where denote the Lagrange fundamental polynomials of degree associated with the set of distinct interpolation nodes .
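As an illustration, this univariate building block can be sketched in a few lines of Python (the helper names are ours, not the paper's; note that numpy's `hermgauss` uses the weight $e^{-x^2}$, so its nodes are rescaled by $\sqrt{2}$ to match the standard normal density):

```python
import numpy as np

def lagrange_interpolate(nodes, values, x):
    """Evaluate the Lagrange interpolant through (nodes, values) at x."""
    nodes = np.asarray(nodes, dtype=float)
    result = 0.0
    for k, yk in enumerate(values):
        others = np.delete(nodes, k)
        # Lagrange fundamental polynomial l_k associated with node k
        result += yk * np.prod((x - others) / (nodes[k] - others))
    return result

def gauss_hermite_nodes(n):
    """n+1 Gauss-Hermite nodes, rescaled from numpy's weight exp(-x^2)
    to the standard normal weight exp(-x^2/2)."""
    x, _ = np.polynomial.hermite.hermgauss(n + 1)
    return x * np.sqrt(2.0)

# The degree-n interpolation operator reproduces polynomials of degree <= n:
n = 3
nodes = gauss_hermite_nodes(n)
p = lambda x: 1.0 + 2.0 * x - x ** 3
assert abs(lagrange_interpolate(nodes, p(nodes), 0.7) - p(0.7)) < 1e-10
```

The final assertion checks the defining exactness property of the univariate operator on polynomials of degree at most the number of nodes minus one.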
Remark 1.
We also introduce the detail operators
where we set , and observe that
Tensorization
For any multiindex the (full) tensor product interpolation operator is defined by
(5) 
where ranges over all points in the Cartesian product
(6) 
and where
(7) 
is a multivariate polynomial of (total) degree . Note that ; in particular, all but a finite number of factors in (6) and (7) are equal to one, so that the corresponding products can be regarded as finite. The tensor product interpolation operator maps into the multivariate (tensor product) polynomial space
(8) 
Note that, since both the univariate polynomial sets of Lagrange fundamental polynomials and the Hermite orthonormal polynomials form a basis of , equivalent characterizations are
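A direct (unoptimized) evaluation of the tensorized interpolant along the lines of (5)-(7) might look as follows; the function names are illustrative only:

```python
import itertools
import numpy as np

def tensor_interpolate(nodes_1d, f, x):
    """Evaluate the full tensor-product interpolant at the point x;
    nodes_1d[m] holds the univariate interpolation nodes in dimension m."""
    dims = range(len(nodes_1d))
    total = 0.0
    # sum over every node of the Cartesian-product grid, cf. (6)
    for idx in itertools.product(*(range(len(nodes_1d[m])) for m in dims)):
        weight = 1.0
        for m in dims:
            nm = np.asarray(nodes_1d[m], dtype=float)
            others = np.delete(nm, idx[m])
            # product of univariate Lagrange fundamental polynomials, cf. (7)
            weight *= np.prod((x[m] - others) / (nm[idx[m]] - others))
        total += f([nodes_1d[m][idx[m]] for m in dims]) * weight
    return total

# The tensor operator is exact on the tensor polynomial space, here for
# a polynomial of coordinate degree (2, 1):
nodes = [[-1.0, 0.0, 1.0], [-0.5, 0.5]]
g = lambda z: (1 + z[0] ** 2) * (2 - 3 * z[1])
assert abs(tensor_interpolate(nodes, g, [0.3, -0.8]) - g([0.3, -0.8])) < 1e-12
```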
In order for the tensor product interpolation operator to be applicable also to functions defined only on a subset , we assume the interpolation nodes to all lie in :
Assumption A2.
In the following we denote by the set of all mappings from to . In analogy to (5) we define for any multiindex the tensorized detail operator
Finally, we associate with a finite subset the multivariate polynomial space
(9) 
and define the associated sparse (polynomial) collocation operator by
(10) 
We will see that is exact on under some natural assumptions on the multiindex set , for which we first recall some basic definitions given in [13, 11, 12].
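For a monotone set, the sum of tensorized detail operators in (10) admits the well-known combination-technique representation as a finite linear combination of full tensor operators with integer coefficients. The following sketch (our own illustration, not taken from the paper) computes these coefficients for a set of d-dimensional multiindices:

```python
import itertools

def combination_coefficients(Lambda):
    """Combination-technique coefficients for a monotone multiindex set:
    the sparse operator equals sum_{nu in Lambda} c_nu * I_nu with
    c_nu = sum over e in {0,1}^d with nu + e in Lambda of (-1)^|e|."""
    S = set(map(tuple, Lambda))
    d = len(next(iter(S)))
    coeffs = {}
    for nu in S:
        c = 0
        for e in itertools.product((0, 1), repeat=d):
            if tuple(n + k for n, k in zip(nu, e)) in S:
                c += (-1) ** sum(e)
        if c != 0:
            coeffs[nu] = c
    return coeffs

# total-degree-2 set in two dimensions (monotone)
Lambda = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
c = combination_coefficients(Lambda)
# the coefficients sum to one, so constant functions are reproduced
assert sum(c.values()) == 1
assert c == {(1, 0): -1, (0, 1): -1, (2, 0): 1, (1, 1): 1, (0, 2): 1}
```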
Partial orderings and monotone sets of multiindices
We define a partial ordering on by
as well as  
and introduce the relation  
We shall call a set of multiindices monotone if and together imply that also . Finally, for a multiindex we define its rectangular envelope by
Note that for is a finite (and monotone) set with cardinality
(11) 
2.2 Polynomial Exactness of Sparse Collocation
The introduction of the rectangular envelope of a multiindex permits a convenient characterization of monotone multiindex sets and the associated polynomial space introduced in (9).
Proposition 2.
If is monotone, then
Proof.
Since for all the set on the left is obviously a subset of that on the right. Conversely, given for some , the definition of implies , which in turn implies by the monotonicity of . Moreover, monotonicity also implies
where monotonicity is required for the last two equalities. ∎
In view of Proposition 2, for a multiindex set represents a sparsification of . In particular, the full tensor product polynomial space coincides with for . Similarly, the full tensor approximation operator defined in (5) can be expressed as .
Proposition 3.
Let be a finite and monotone set. Then for all . In particular, for all we have for .
Proof.
Observe first that, for any such that we have
It suffices to prove the assertions for all monomials in . For any must satisfy and therefore , proving the second assertion. We conclude that
where the third equality follows from the fact that for all due to the monotonicity of . The proof concludes with
Note that the third equality is obtained by rewriting a (finite) product of sums: since there exists an such that for . For such we have and therefore
∎
Proposition 3 can be seen as an extension of [6, Proposition 1] to general monotone multiindex sets as well as an extension of [13, Theorem 6.1] and [11, Theorem 2.1] to interpolation operators with non-nested node sets. As mentioned in [13, p. 89], if the set is not monotone, then will not be exact on in general. However, exactness on is a crucial property in the subsequent convergence analysis, and we therefore choose to work exclusively with monotone sets .
2.3 Sparse Grid Associated with
The construction of for consists of a linear combination of tensor product interpolation operators requiring the evaluation of at certain multivariate nodes. We shall refer to the collection of these nodes as the sparse grid associated with . For a monotone and finite set there holds
(12) 
because for we have
i.e., for computing we need to evaluate at
Since is a monotone set, the resulting sparse grid for is
We remark that the unisolvence on of point evaluation on is discussed in [13, Theorem 6.1].
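Since the Gauss-Hermite families considered later are non-nested, the sparse grid in (12) is simply the union of the tensor grids of all multiindices in the set. A sketch of this assembly (reusing the illustrative node helper from before, and merging points that coincide up to rounding):

```python
import itertools
import numpy as np

def gauss_hermite_nodes(i):
    """i+1 Gauss-Hermite nodes for the standard normal weight."""
    x, _ = np.polynomial.hermite.hermgauss(i + 1)
    return x * np.sqrt(2.0)

def sparse_grid(Lambda, nodes_1d=gauss_hermite_nodes):
    """Union over nu in Lambda of the tensor grids X_nu, cf. (12); no
    nestedness of the univariate node families is assumed, so coinciding
    points from different tensor grids are merged by rounding."""
    points = set()
    for nu in Lambda:
        for pt in itertools.product(*(nodes_1d(v) for v in nu)):
            points.add(tuple(round(c, 12) for c in pt))
    return points

# total-degree-2 set in two dimensions
Lambda = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
grid = sparse_grid(Lambda)
# Gauss-Hermite node sets of different sizes share only the origin,
# which yields 13 distinct points here
assert len(grid) == 13
```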
3 Convergence Analysis
In this section we analyze the error
where denotes the norm in , is assumed to satisfy Assumption A1 and is required to be monotone and finite. Our first goal here is to establish a convergence rate for the error of for a nested sequence of monotone subsets of with , i.e.,
(13) 
where may depend on as well as the univariate nodal sets. The line of proof we present here follows and builds upon the works [9, 26, 4]. We complement this convergence rate with a bound on the number of collocation points associated with a given multiindex set.
3.1 General Convergence Results
The subsequent error analysis for the sparse collocation operator is based on the representation of multivariate functions in the orthonormal basis of multivariate Hermite polynomials . We shall therefore examine the worstcase approximation error of any applied to a given multivariate Hermite basis polynomial . To this end we define
(14) 
This quantity is finite since for and hence
where the maximum is taken over a finite set. The quantities also measure the deviation of the error of oblique projection from that of orthogonal projection, as these numbers would all be zero or one if is replaced with the orthogonal projection onto . Moreover, we obtain the following bound:
Proposition 4.
For all the quantity defined in (14) satisfies
In particular, if for the univariate Hermite polynomials there exists and such that
(15) 
where we have denoted the univariate Gaussian measure again by , then
(16) 
Proof.
Remark 5.
Bounds such as (15) can often be found in the sparse collocation or sparse quadrature literature, e.g., for quadrature operators applied to Hermite polynomials [9], norms of quadrature operators on bounded domains [39], or Lebesgue constants for Leja points [12]. Numerical estimates for the specific case of Genz-Keister points have been provided in [7].
The following lemma provides a natural starting point for bounding the approximation error of for monotone subsets . The proof follows the same line of argument as the proof of [9, Lemma 3.2].
Lemma 6 (cf. [9, Lemma 3.2]).
For a finite and monotone subset there holds
(17) 
Proof.
Building on Lemma 6 the approximation error may be further bounded given summability results for the sequence . The key result here is known as Stechkin's lemma, which provides a decay rate for the tail of a summable sequence for and is due to Stechkin [41] for (cf. also [13, Lemma 3.6]).
Lemma 7 (Stechkin).
Let and let
be a sequence of nonnegative numbers. Then for denoting the set of multiindices corresponding to the largest elements , there holds
(18) 
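The bound (18) is easy to test numerically. The following snippet (our own sanity check, with the generic exponents written as p and q as in the standard statement of the lemma) verifies it for a concrete nonincreasing sequence:

```python
import numpy as np

# Stechkin: if (c_k) is l^p-summable, 0 < p < q, then the l^q norm of the
# tail after removing the n largest entries is bounded by
# n^{-(1/p - 1/q)} times the l^p norm of the whole sequence.
p, q = 1.0, 2.0
c = 1.0 / np.arange(1, 5001) ** 1.5           # nonincreasing, l^1-summable
lp_norm = np.sum(c ** p) ** (1.0 / p)
for n in (10, 100, 1000):
    tail = np.sum(c[n:] ** q) ** (1.0 / q)    # l^q norm of the tail
    assert tail <= n ** (-(1.0 / p - 1.0 / q)) * lp_norm
```

Since the sequence is already sorted in decreasing order, the index set of the n largest elements is simply the first n indices here.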
The index sets in Stechkin's lemma associated with the largest sequence elements are not necessarily monotone, and therefore Lemma 6 and Lemma 7 cannot be combined to bound the error without additional assumptions. An obvious way to ensure monotonicity of the sets in Stechkin's lemma is to assume the sequence to be nonincreasing, i.e.,
This leads to
Theorem 8.
We will provide a proof below. The convergence analysis in [12, 39] for sparse quadrature and interpolation in the case of bounded follows Theorem 8, although this is sometimes hidden in the details. There the authors employ explicit bounds on the norms of the Legendre or Taylor coefficients of to construct a dominating and nonincreasing sequence , .
In our setting it is, however, not always possible to derive explicit bounds on the norm of the Hermite coefficients . In [4] a technique was developed which relies on somewhat implicit bounds on via a weighted summability property. We adapt this approach to the current setting in
Theorem 9.
Proof of Theorem 8 and Theorem 9.
Let be the set of multiindices corresponding to the largest elements of . Then each is monotone and the sequence can be chosen to be nested.
Remark 10.
Another application of the weighted summability property (19) is the analysis of sparse quadrature given in [9], where the author employs the slightly different estimate
After showing that the series on the right is bounded and applying Stechkin’s lemma to , this yields the same convergence rate as stated in Theorem 9.
Remark 11.
We mention that sparse collocation attains a smaller convergence rate than best n-term approximation when the assumptions of Theorem 9 hold. Namely, under these assumptions the best n-term rate is , see [4, Theorem 1.2]. This reduced convergence rate is not caused by the additional factors in the error analysis of sparse collocation. Rather, the reason for the slower rate is missing orthogonality: in the proof of Lemma 6 we could not apply Parseval's identity and had to use the triangle inequality to bound the error. This led to bounds in terms of rather than as in the case of orthogonal projections, e.g., best n-term approximations.
We emphasize that the construction of such a nonincreasing, summable dominating sequence is by no means trivial. Without the first property we cannot conclude that the multiindex sets occurring in Stechkin's lemma are monotone, which in turn is needed in order to use Lemma 6 as the starting point of our error analysis. Of course, we could consider monotone envelopes of , but their size can grow quite rapidly with (e.g., polynomially or even faster, see the counterexample below). Moreover, it is not at all obvious that for a sequence there exists a dominating and nonincreasing . In particular, we provide the following counterexample: let and define , by
i.e., . Then . The smallest positive nonincreasing sequence dominating is given by , see [13, Section 3.8]. In our case, we get
and, thus,
Although the example is somewhat pathological, it illustrates that for a summable nonincreasing dominating sequence need not exist.
3.2 Sufficient Conditions for Weighted Summability and Majorization
We now follow the strategy of Theorem 9 and study under which requirements the assumptions of Theorem 9 hold. To this end we recall a result from [4] for weighted summability of Hermite coefficients given the following smoothness conditions on :
Assumption A3.
Let satisfy Assumption A1. There exists an integer and a sequence of positive numbers , , such that

for any with the (weak) partial derivative exists and satisfies ,

there holds
(20) where and .
Observe that the sum in (20) is actually a series, because has infinitely many components and therefore there are countably many vectors such that . Assumption A3(a) states that we require a finite order of partial differentiability of , i.e., up to order with respect to each variable , and, perhaps more importantly, Assumption A3(b) asks for a weighted square-summability of the norms of the corresponding partial derivatives. The latter, in particular, implies bounds of the form
since otherwise the summability requirement (20) would not hold. Recalling that this bound implies that, e.g., the norm of the derivative , , decays if .
The following result shows that the smoothness condition of Assumption A3 implies the first condition (19) of Theorem 9:
Theorem 12 (cf. [4, Theorem 3.1]).
(We mention in passing that in [4] the assertion of Theorem 12 was actually proven without requiring that both series in (22) be finite.) To apply Theorem 9 it remains to prove the existence of a nonincreasing and summable sequence which dominates , . Since the are explicitly given in (21), this boils down to the question of how fast the projection errors are allowed to grow. As it turns out, a polynomial growth w.r.t. as given in (16) in Proposition 4 is sufficient. We therefore state the following lemma, which is strongly based on the techniques developed in the proofs of [4, Lemma 5.1] and [9, Lemma 3.4].
Lemma 13.
Suppose there exist a and a such that
Then for any increasing sequence such that for a and for any there exists a nonincreasing sequence such that
where is as in (21).
Proof.
We start with constructing the dominating sequence