Convergence of Sparse Collocation for Functions of Countably Many Gaussian Random Variables (with Application to Elliptic PDEs)
We give a convergence proof for the approximation by sparse collocation of Hilbert-space-valued functions depending on countably many Gaussian random variables.
Such functions appear as solutions of elliptic PDEs with lognormal diffusion coefficients.
We outline a general L²-convergence theory based on previous work by Bachmayr et al. and Chen, and establish an algebraic convergence rate for sufficiently smooth functions, assuming a mild growth bound for the univariate hierarchical surpluses of the interpolation scheme applied to Hermite polynomials.
We verify that this assumption holds specifically for Gauss-Hermite nodes and, for this case, also show algebraic convergence with respect to the resulting number of sparse grid points.
Numerical experiments illustrate the dimension-independent convergence rate.
Keywords: lognormal diffusion coefficient.
Mathematics Subject Classification: 65D05, 65D15, 65C30, 60H25.
1 Introduction
The elliptic diffusion problem
with a random diffusion coefficient with respect to an underlying probability space has become the standard model problem for numerical methods for solving random PDEs. For modeling reasons the diffusion field is often taken to have a lognormal probability law, which complicates both the study of the well-posedness of the problem [8, 24, 19, 31] and the analysis of approximation methods. One of the challenges is that the most common parametrization of a Gaussian random field – the Karhunen-Loève expansion [2, 22] – involves a countable number of standard normal random variables
where the expansion involves i.i.d. standard normal random variables, leading to an elliptic PDE with a countably infinite number of random parameters.
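To make the parametrization concrete, the following Python sketch evaluates one realization of a truncated lognormal coefficient of the kind in (2). The particular expansion functions (a hypothetical m^(-2) sin(m*pi*x) decay) and the truncation to 50 terms are assumptions made for this illustration only, not the model analyzed in the paper.

```python
import numpy as np

def lognormal_coefficient(x, xi, decay=2.0):
    """Evaluate a truncated lognormal field a(x) = exp(sum_m b_m(x) xi_m)
    on a grid x, with hypothetical expansion functions
    b_m(x) = m^(-decay) * sin(m * pi * x)."""
    m = np.arange(1, len(xi) + 1)
    # b has shape (number of terms, number of grid points)
    b = m[:, None] ** (-decay) * np.sin(np.outer(m, np.pi * x))
    return np.exp(xi @ b)

rng = np.random.default_rng(0)
xi = rng.standard_normal(50)        # i.i.d. N(0,1) parameters (truncated to 50)
x = np.linspace(0.0, 1.0, 101)
a = lognormal_coefficient(x, xi)
```

Since the field is an exponential, every realization is strictly positive, which is what makes the lognormal model attractive for diffusion coefficients despite the unbounded parameter domain.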
Besides the stochastic Galerkin method [22, 29], the most common methods for approximating the solution of such random or parametric elliptic PDEs are polynomial collocation methods. Early works on such methods for random PDEs considered a finite (if large) number of random parameters, a setting also referred to as finite-dimensional noise [44, 3, 37, 36]. In this case the parametric representation of is typically obtained by truncating a series expansion of the random field such as (2).
The analysis of the problem involving an infinite number of random variables was first discussed by Cohen, DeVore and Schwab in [14, 15] in the simpler setting in which the diffusion field , rather than its logarithm as in (2), is expanded in a series. This results in an affine dependence of on the random variables , which are, moreover, assumed to have bounded support. In this framework the convergence of the best N-term approximation of the solution of the diffusion equation by Taylor and Legendre series was shown to be independent of the number of random variables; this result was further refined in the recent paper . Employing the theoretical concepts stated in [14, 15], Chkifa, Cohen and Schwab analyze in  collocation methods based on Lagrange interpolation with Leja points for problems with diffusion coefficients depending linearly on an infinite number of bounded random variables; these methods are adaptive both in the polynomial degree and in the number of active dimensions, i.e., random variables. The adaptive algorithm itself is related to the earlier work . Each interpolatory approximation gives rise to a quadrature scheme, and in  Schillings and Schwab consider sparse adaptive quadrature schemes in the same setting as  in connection with approximating expectations with respect to posterior measures in Bayesian inference. Extensions to the case where the diffusion coefficient depends non-linearly on an infinite number of random variables with bounded support were discussed in .
Returning to the original lognormal diffusion problem, i.e., with expanded as in (2) and depending on random variables with unbounded support, Hoang and Schwab  have obtained convergence results on best N-term approximation by Hermite polynomials. These were recently extended by Bachmayr et al.  using a different analytical approach employing a weighted ℓ²-summability of the coefficients of the Hermite expansion of the solution and their relation to partial derivatives. The theoretical tools provided in  enabled a convergence analysis for adaptive sparse quadrature  employing, e.g., Gauss-Hermite nodes for Banach-space-valued functions of countably many Gaussian random variables.
In this paper we address the convergence of sparse polynomial collocation for functions of infinitely many Gaussian random variables, such as the solution to the lognormal diffusion problem (1). Specifically, we follow the approach of  and  to prove an algebraic convergence rate with respect to the number of grid points for sparse collocation based on Gauss-Hermite interpolation nodes in the case of countably many variables. In particular, the result applies to solutions of (1) where is a lognormal random field. In addition, we highlight the common ideas surrounding sparse collocation found in the works mentioned above. The convergence result in terms of the number of collocation points is obtained in two steps: we first link the error to the size of the multi-index set defining the sparse collocation and then derive a bound on the number of points in the associated sparse grid. This procedure has also been followed in all of the above-mentioned works analyzing the convergence of sparse grid quadrature and collocation schemes. An alternative strategy, which instead links the error directly to the number of collocation points by introducing the so-called “profits” of each component of the sparse grid, has been discussed in [34, 25], albeit only in the case of random variables with bounded support.
We remark that, besides the classical node families such as Gauss-Hermite and Genz-Keister  for quadrature and interpolation on ℝ with respect to a Gaussian measure, Jakeman and Narayan  have introduced weighted Leja points, a generalization of the classical Leja point construction (see e.g. [30, 17] and references therein) to unbounded domains and arbitrary weight functions. Moreover, they have proved that these node sets possess the correct asymptotic distribution of interpolation nodes and have illustrated their computational potential in numerical experiments. Note that such weighted Leja points provide a nested sequence of interpolation nodes growing linearly with the level. The analysis of sparse collocation based on normal Leja points, i.e., weighted Leja points for a Gaussian measure, is an interesting topic for future research.
The remainder of the paper is organized as follows. In the next section we introduce the general setting and notation and construct the sparse grid collocation operator from univariate Lagrange interpolation operators. Section 3 is devoted to the convergence analysis of such operators. First, we outline in Subsection 3.1 the general approaches for proving algebraic convergence rates as they can be found in the works mentioned above. Subsequently, we follow in Subsection 3.2 the approach of [4, 9] and derive sufficient conditions on the underlying univariate interpolation nodes in order to obtain such rates when approximating “countably-variate” functions of certain smoothness. Finally, in Subsection 3.3 we verify these conditions for Gauss-Hermite nodes, provide bounds for the number of nodes in the resulting sparse grids, and state a convergence result with respect to this number. Section 4 returns to our motivation and comments on the application to random elliptic PDEs, before we verify our theoretical findings in Section 5 for a simple boundary value problem in one spatial dimension. We draw conclusions in Section 6.
2 Setting and Sparse Collocation
We consider functions defined on a parameter domain taking values in a separable real Hilbert space with inner product and norm . As our interest lies in the approximation of the dependence of on by multivariate polynomials based on Lagrange interpolation, a minimal requirement is that point evaluation of at any be well-defined. Stronger smoothness requirements on become necessary when deriving convergence rate estimates for the approximations.
We introduce a probability measure on the measurable space as the countable product measure of standard Gaussian measures on , i.e.,
and denote by the space of all (equivalence classes of) functions with finite second moments with respect to in the sense that
which forms a Hilbert space with inner product
In the following we require
Let where . There holds (for a measurable extension of to ) that .
where and denotes the univariate Hermite orthonormal polynomial of degree as well as
2.1 Sparse Polynomial Collocation
The construction of sparse collocation operators below is based on sequences of univariate Lagrange interpolation operators mapping into the set of univariate polynomials of degree at most . Thus,
where denote the Lagrange fundamental polynomials of degree associated with the set of distinct interpolation nodes .
We also introduce the detail operators
where we set , and observe that
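The univariate building blocks can be sketched in code: a Lagrange interpolation operator on a given nodal set and the detail operators, whose telescoping sum recovers the interpolation operator of highest level. Using k+1 Gauss-Hermite nodes at level k is one concrete (non-nested) choice made for this illustration; any family of distinct nodes would do.

```python
import numpy as np

def lagrange_interp(nodes, fvals, x):
    """Evaluate the Lagrange interpolant through (nodes, fvals) at points x."""
    x = np.asarray(x, dtype=float)
    L = np.ones((len(nodes), x.size))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if i != j:
                L[i] *= (x - xj) / (xi - xj)   # Lagrange fundamental polynomial
    return np.asarray(fvals) @ L

def gh_nodes(n):
    """n Gauss-Hermite nodes for the standard Gaussian weight (probabilists')."""
    return np.polynomial.hermite_e.hermegauss(n)[0]

def detail(f, k, x):
    """Detail operator Delta_k = I_k - I_{k-1} (with I_{-1} := 0), where the
    level-k operator I_k interpolates on k + 1 Gauss-Hermite nodes."""
    nk = gh_nodes(k + 1)
    Ik = lagrange_interp(nk, f(nk), x)
    if k == 0:
        return Ik
    nkm1 = gh_nodes(k)
    return Ik - lagrange_interp(nkm1, f(nkm1), x)

# Telescoping: Delta_0 + ... + Delta_4 equals the level-4 interpolant I_4.
x = np.linspace(-2.0, 2.0, 7)
f = np.cos
telescoped = sum(detail(f, k, x) for k in range(5))
n4 = gh_nodes(5)
level4 = lagrange_interp(n4, f(n4), x)
```

The telescoping identity holds exactly for any choice of nodal sets, which is what the sparse construction below exploits.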
For any multi-index the (full) tensor product interpolation operator is defined by
where ranges over all points in the Cartesian product
is a multivariate polynomial of (total) degree . Note that, in particular, all but a finite number of factors in (6) and (7) are equal to one, so that the corresponding products can be regarded as finite. The tensor product interpolation operator maps into the multivariate (tensor product) polynomial space
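A tensor product interpolation operator of the kind in (5) can be sketched as follows; the choice of nu[j] + 1 Gauss-Hermite nodes in coordinate j is again an assumption for the illustration. The final check demonstrates exactness on a polynomial of coordinate degree at most nu.

```python
import numpy as np
from itertools import product

def gh_nodes(n):
    """n Gauss-Hermite nodes for the standard Gaussian weight."""
    return np.polynomial.hermite_e.hermegauss(n)[0]

def tensor_interp(f, nu, y):
    """Evaluate the tensor-product interpolant I_nu f at the point y, where
    coordinate j uses nu[j] + 1 Gauss-Hermite nodes."""
    grids = [gh_nodes(k + 1) for k in nu]
    val = 0.0
    for idx in product(*[range(len(g)) for g in grids]):
        node = [g[i] for g, i in zip(grids, idx)]
        w = 1.0  # product of univariate Lagrange fundamental polynomials
        for j, (g, i) in enumerate(zip(grids, idx)):
            for m, xm in enumerate(g):
                if m != i:
                    w *= (y[j] - xm) / (g[i] - xm)
        val += w * f(node)
    return val

# Exactness on a polynomial of coordinate degree <= nu = (2, 1):
poly = lambda z: z[0] ** 2 * z[1] + 3.0
approx = tensor_interp(poly, (2, 1), np.array([0.3, -0.7]))
exact = poly([0.3, -0.7])
```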
Note that, since both the univariate polynomial sets of Lagrange fundamental polynomials and the Hermite orthonormal polynomials form a basis of , equivalent characterizations are
In order for the tensor product interpolation operator to be applicable also to functions defined only on a subset , we assume the interpolation nodes to all lie in :
In the following we denote by the set of all mappings from to . In analogy to (5) we define for any multi-index the tensorized detail operator
Finally, we associate with a finite subset the multivariate polynomial space
and define the associated sparse (polynomial) collocation operator by
Partial orderings and monotone sets of multi-indices
We define a partial ordering on by
as well as
and introduce the relation
We shall call a set of multi-indices monotone if and together imply that also . Finally, for a multi-index we define its rectangular envelope by
Note that for is a finite (and monotone) set with cardinality
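These combinatorial notions are easy to make concrete. The sketch below checks monotonicity of a finite multi-index set (it suffices to check the one-step predecessors of each element) and enumerates the rectangular envelope of a multi-index, whose cardinality is the product of the entries plus one.

```python
from itertools import product

def is_monotone(Lambda):
    """A finite multi-index set is monotone (downward closed) if it contains,
    with each nu, every componentwise-smaller mu; checking the one-step
    predecessors of each element suffices."""
    S = set(Lambda)
    for nu in S:
        for j, nj in enumerate(nu):
            if nj > 0 and nu[:j] + (nj - 1,) + nu[j + 1:] not in S:
                return False
    return True

def rect_envelope(nu):
    """Rectangular envelope of nu: all mu with mu <= nu componentwise."""
    return list(product(*[range(k + 1) for k in nu]))

R = rect_envelope((2, 1, 3))   # cardinality (2+1)*(1+1)*(3+1) = 24
```

A rectangular envelope is always monotone, while a set such as {(0,0), (2,0)} is not, since it misses the intermediate index (1,0).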
2.2 Polynomial Exactness of Sparse Collocation
The introduction of the rectangular envelope of a multi-index permits a convenient characterization of monotone multi-index sets and the associated polynomial space introduced in (9).
If is monotone, then
Since for all the set on the left is obviously a subset of that on the right. Conversely, given for some , the definition of implies , which in turn implies by the monotonicity of . Moreover, monotonicity also implies
where monotonicity is required for the last two equalities. ∎
In view of Proposition 2, for a multi-index set represents a sparsification of . In particular, the full tensor product polynomial space coincides with for . Similarly, the full tensor approximation operator defined in (5) can be expressed as .
Let be a finite and monotone set. Then for all . In particular, for all we have for .
Observe first that, for any such that we have
It suffices to prove the assertions for all monomials in . For any must satisfy and therefore , proving the second assertion. We conclude that
where the third equality follows from the fact that for all due to the monotonicity of . The proof concludes with
Note that the third equality is obtained by rewriting a (finite) product of sums: since there exists an such that for . For such we have and therefore
Proposition 3 can be seen as an extension of [6, Proposition 1] to general monotone multi-index sets as well as an extension of [13, Theorem 6.1] and [11, Theorem 2.1] to interpolation operators with non-nested node sets. As mentioned in [13, p. 89], if the set is not monotone then will not be exact on in general. However, the exactness on is a crucial property in the subsequent convergence analysis and we therefore choose to work exclusively with monotone sets .
2.3 Sparse Grid Associated with
The construction of for consists of a linear combination of tensor product interpolation operators requiring the evaluation of at certain multivariate nodes. We shall refer to the collection of these nodes as the sparse grid associated with . For a monotone and finite set there holds
because for we have
i.e., for computing we need to evaluate at
Since is a monotone set, the resulting sparse grid for is
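The sparse grid associated with a monotone set can be enumerated by collecting the tensor grids of those multi-indices that actually contribute, i.e., those with nonzero combination coefficient. The sketch below assumes k+1 Gauss-Hermite nodes per level (a non-nested family, as in the setting of this paper).

```python
import numpy as np
from itertools import product

def gh_nodes(n):
    """n Gauss-Hermite nodes, as a hashable tuple for set-based deduplication."""
    return tuple(np.polynomial.hermite_e.hermegauss(n)[0])

def sparse_grid(Lambda):
    """Union of the tensor grids contributing to the sparse collocation
    operator: only multi-indices nu with nonzero combination coefficient
    c_nu = sum_{e in {0,1}^d : nu + e in Lambda} (-1)^|e| are kept."""
    S = set(Lambda)
    d = len(next(iter(S)))
    points = set()
    for nu in S:
        c = sum((-1) ** sum(e) for e in product((0, 1), repeat=d)
                if tuple(n + b for n, b in zip(nu, e)) in S)
        if c != 0:
            points |= set(product(*[gh_nodes(k + 1) for k in nu]))
    return points

# Total-degree set {nu : nu_1 + nu_2 <= 1} in two dimensions: the grid
# consists of the origin plus two Gauss-Hermite points on each axis.
grid = sparse_grid([(0, 0), (1, 0), (0, 1)])
```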
We remark that the unisolvence on of point evaluation on is discussed in [13, Theorem 6.1].
3 Convergence Analysis
In this section we analyze the error
where denotes the norm in , is assumed to satisfy Assumption A1 and is required to be monotone and finite. Our first goal here is to establish a convergence rate for the error of for a nested sequence of monotone subsets of with , i.e.,
where may depend on as well as the univariate nodal sets. The line of proof we present here follows and builds upon the works [9, 26, 4]. We complement this convergence rate with a bound on the number of collocation points associated with a given multi-index set.
3.1 General Convergence Results
The subsequent error analysis for the sparse collocation operator is based on the representation of multivariate functions in the orthonormal basis of multivariate Hermite polynomials . We shall therefore examine the worst-case approximation error of any applied to a given multivariate Hermite basis polynomial . To this end we define
This quantity is finite since for and hence
where the maximum is taken over a finite set. The quantities also measure the deviation of the error of oblique projection from that of orthogonal projection, as these numbers would all be zero or one if is replaced with the -orthogonal projection onto . Moreover, we obtain the following bound:
For all the quantity defined in (14) satisfies
In particular, if for the univariate Hermite polynomials there exists and such that
where we have denoted the univariate Gaussian measure again by , then
Bounds such as (15) can often be found in the sparse collocation or sparse quadrature literature, e.g., for quadrature operators applied to Hermite polynomials , norms of quadrature operators on bounded domains  or Lebesgue constants for Leja points . Numerical estimates for the specific case of Genz-Keister points have been provided in .
The following lemma provides a natural starting point for bounding the approximation error of for monotone subsets . The proof follows the same line of argument as the proof of [9, Lemma 3.2].
Lemma 6 (cf. [9, Lemma 3.2]).
For a finite and monotone subset there holds
Due to the monotonicity of we can apply Proposition 3 and obtain
Building on Lemma 6, the approximation error may be further bounded given summability results for the sequence . The key result here, known as Stechkin’s lemma, provides a decay rate for the -tail of an -summable sequence for and is due to Stechkin  for (cf. also [13, Lemma 3.6]).
Lemma 7 (Stechkin).
Let and let
be a sequence of nonnegative numbers. Then for denoting the set of multi-indices corresponding to the largest elements , there holds
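Stechkin’s estimate is easy to verify numerically for a finite section of a sequence: after removing the N largest entries, the ℓq-norm of the remainder is bounded by N^(-(1/p - 1/q)) times the ℓp-norm of the whole sequence, for 0 < p <= q. The sketch below checks this for the sequence 1/k² with p = 1 and q = 2 (an arbitrary choice for illustration).

```python
import numpy as np

def stechkin_holds(a, N, p, q):
    """Check Stechkin's estimate for a finite nonnegative sequence a:
    the l^q-norm of the tail beyond the N largest entries is at most
    N^(-(1/p - 1/q)) * ||a||_{l^p}, for 0 < p <= q."""
    a = np.sort(np.asarray(a, dtype=float))[::-1]   # nonincreasing order
    tail = np.sum(a[N:] ** q) ** (1.0 / q)
    bound = N ** (-(1.0 / p - 1.0 / q)) * np.sum(a ** p) ** (1.0 / p)
    return tail <= bound * (1.0 + 1e-12)            # tolerance for rounding

checks = [stechkin_holds([1.0 / k ** 2 for k in range(1, 2001)], N,
                         p=1.0, q=2.0)
          for N in (1, 5, 50, 500)]
```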
The index sets in Stechkin’s lemma associated with the largest sequence elements are not necessarily monotone and, therefore, Lemma 6 and Lemma 7 cannot be combined to bound the error without additional assumptions. An obvious way to ensure monotonicity of the sets in Stechkin’s lemma is to assume the sequence to be nonincreasing, i.e.,
This leads to
We will provide a proof below. The convergence analysis in [12, 39] for sparse quadrature and interpolation in case of bounded follows Theorem 8, although sometimes hidden in the details. There the authors employ explicit bounds on the norms of the Legendre or Taylor coefficients of to construct a dominating and nonincreasing sequence , .
In our setting it is, however, not always possible to derive explicit bounds on the norm of the Hermite coefficients . In  a technique was developed which relies on somewhat implicit bounds on via a weighted -summability property. We adapt this approach to the current setting in
Let be the set of multi-indices corresponding to the largest elements of . Then each is monotone and the sequence can be chosen to be nested.
After showing that the series on the right is bounded and applying Stechkin’s lemma to , this yields the same convergence rate as stated in Theorem 9.
We mention that sparse collocation attains a smaller convergence rate than best N-term approximation in case the assumptions of Theorem 9 hold. Namely, under these assumptions the best N-term rate is , see [4, Theorem 1.2]. This reduced convergence rate is not caused by the additional factors in the error analysis of sparse collocation. The reason for the slower rate is the missing orthogonality: in the proof of Lemma 6 we could not apply Parseval’s identity and had to use the triangle inequality to bound the error. This led to bounds in terms of rather than as in the case of orthogonal projections, e.g., best N-term approximations.
We emphasize that the construction of such a nonincreasing, -summable dominating sequence is by no means trivial. Without the first property we cannot conclude that the multi-index sets occurring in Stechkin’s lemma are monotone, which in turn is needed to use Lemma 6 as the starting point of our error analysis. Of course, we could consider monotone envelopes of , but their size can grow quite rapidly with (e.g., polynomially or even faster, see the counterexample below). Moreover, it is not at all obvious that for a sequence there exists a dominating and nonincreasing . In particular, we provide the following counterexample: let and define , by
i.e., . Then . The smallest positive nonincreasing sequence dominating is given by , see [13, Section 3.8]. In our case, we get
Although the example is somewhat pathological, it illustrates that a -summable, nonincreasing dominating sequence need not exist.
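The mechanism behind such counterexamples can be visualized computationally. For a finite section of a nonnegative sequence, the smallest nonincreasing dominating sequence is the running maximum from the right; if the original sequence has rare large "spikes", the dominating sequence must hold each spike value over a long plateau, which inflates its summability. The spike pattern below (powers of two, with slowly decaying spike heights) is a hypothetical choice for illustration, not the exact sequence of the counterexample above.

```python
import numpy as np

def smallest_dominating(a):
    """b_k = max_{j >= k} a_j: the smallest nonincreasing sequence that
    dominates a finite section of a nonnegative sequence a."""
    return np.maximum.accumulate(np.asarray(a, dtype=float)[::-1])[::-1]

# Spikes of height 1/log(k+2) at k = 1, 2, 4, ..., 8192; tiny values elsewhere.
k = np.arange(1, 10001)
spikes = np.isin(k, 2 ** np.arange(14))
a = np.where(spikes, 1.0 / np.log(k + 2.0), 1e-12)
b = smallest_dominating(a)
```

Here the sum of a stays small while the plateaus of b contribute terms of size roughly 2^j / log(2^j) each, so the dominating sequence loses summability even though a itself is summable.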
3.2 Sufficient Conditions for Weighted Summability and Majorization
We now follow the strategy of Theorem 9 and study under which requirements the assumptions of Theorem 9 hold. To this end we recall a result from  for weighted -summability of Hermite coefficients given the following smoothness conditions on :
Let satisfy Assumption A1. There exists an integer and a sequence of positive numbers , , such that
for any with the (weak) partial derivative exists and satisfies ,
where and .
Observe that the sum in (20) is actually a series, because has infinitely many components and therefore there are countably many vectors such that . Assumption A3(a) states that we require a finite order of partial differentiability of , i.e., up to order with respect to each variable , and, perhaps more importantly, Assumption A3(b) asks for a weighted square-summability of the -norms of the corresponding partial derivatives. The latter, in particular, implies bounds of the form
since otherwise the summability requirement (20) would not hold. Recalling that this bound implies that, e.g., the -norm of the derivative , , decays if .
Theorem 12 (cf. [4, Theorem 3.1]).
Let Assumption A3 be satisfied. Then, with the weights
(We mention in passing that in  the assertion of Theorem 12 was actually proven without requiring that both series in (22) be finite.) To apply Theorem 9 it remains to prove the existence of a nonincreasing and -summable sequence which dominates , . Since the are explicitly given in (21), this boils down to the question of how fast the projection errors are allowed to grow. As it turns out, polynomial growth with respect to as given in (16) in Proposition 4 is sufficient. We therefore state the following lemma, which is strongly based on the techniques developed in the proofs of [4, Lemma 5.1] and [9, Lemma 3.4].
Assume there exist a and a such that
Then for any increasing sequence such that for a and for any there exists a nonincreasing sequence such that
where is as in (21).
We start by constructing the dominating sequence