Parametric PDEs: Sparse or lowrank approximations?^{†}^{†}thanks: This work has been supported by ERC AdG 338977 BREAD, DFG SFBTransregio 40, DFG Research Group 1779, the Excellence Initiative of the German Federal and State Governments (RWTH Aachen Distinguished Professorship), by DARPABAA1513, and by the Hausdorff Center of Mathematics, University of Bonn.
Abstract
We consider adaptive approximations of the parametertosolution map for elliptic operator equations depending on a large or infinite number of parameters, comparing approximation strategies of different degrees of nonlinearity: sparse polynomial expansions, general lowrank approximations separating spatial and parametric variables, and hierarchical tensor decompositions separating all variables. We describe corresponding adaptive algorithms based on a common generic template and show their nearoptimality with respect to natural approximability assumptions for each type of approximation. A central ingredient in the resulting bounds for the total computational complexity are new operator compression results for the case of infinitely many parameters. We conclude with a comparison of the complexity estimates based on the actual approximability properties of classes of parametric model problems, which shows that the computational costs of optimized lowrank expansions can be significantly lower or higher than those of sparse polynomial expansions, depending on the particular type of parametric problem.
MSC 2010: 41A46, 41A63, 42C10, 65D99, 65J10, 65N12, 65N15
Keywords: parameterdependent PDEs, lowrank approximations, sparse polynomial expansions, a posteriori error estimates, adaptive methods, complexity bounds
1 Introduction
Complex design, optimization, or uncertainty quantification tasks based on parameter dependent families of PDEs arise in virtually all branches of science and engineering. Typical scenarios are models whose physical properties – such as diffusivity, transport velocity or domain geometry – are described by a finite number of real parameter values. In certain instances, one may even encounter infinitely many parameters of decreasing influence. This occurs for instance in the case of a random stochastic diffusion field represented by an infinite expansion in a given basis. The development and analysis of numerical strategies for capturing the dependence of the PDE on the parameters has been the subject of intensive research efforts in recent years.
1.1 Problem formulation
The problems that are addressed in this paper have the following general form. Let be a separable Hilbert space. We consider a parametric operator of the form
(1.1) 
where or in the finite or infinite dimensional case, respectively. In the infinite dimensional case, we require that the above series converges in for any . We assume uniform boundedness and ellipticity of over the parameter domain, that is
(1.2) 
for some , which implies in particular that is boundedly invertible uniformly in , with
(1.3) 
We also consider parametric data , and for each , we define the solution to the equation
(1.4) 
A guiding example is provided by affinely parametrized diffusion problems of the form
(1.5) 
with homogeneous Dirichlet boundary conditions, posed in the weak sense on a spatial domain . In this particular case of frequent interest, the data is independent of . The validity of (1.2) is then usually ensured by the uniform ellipticity assumption
(1.6) 
for some . We then have , and the corresponding operators for are defined by
for .
Thus, is typically a function space defined over some physical domain , with , for example the Sobolev space in the above case of second order elliptic equations with homogeneous Dirichlet boundary conditions. Therefore the solution may either be viewed as the Hilbert space valued map
(1.7) 
which acts from to or as the scalar valued map
(1.8) 
where and are referred to as the spatial and parametric variables.
Approximating such solution maps amounts to approximating functions of a large or even infinite number of variables. In applications, one is often interested in specific functionals of the solution. Here we focus on the basic question of approximating the entire solution map in an appropriate norm.
The guiding questions, to be made precise below, are the following: What are the most suitable approximations to cope with the high dimensionality in problems of the form (1.4), and what features of problems (1.4) favor certain approaches over others? Moreover, at what numerical cost can one find these approximations, and how do these costs depend on particular features of the given problem? To address the latter question, for each setting we construct adaptive computational methods that exhibit nearoptimal complexity, in a sense to be made precise below.
1.2 Sparse and lowrank approximability
Before addressing any concrete numerical schemes, we discuss basic concepts of approximations for the solution map in (1.7). We focus on the meansquare error for an approximation , where
for a given probability measure over . In what follows, we assume that is the uniform probability measure on . The results carry over, however, to other product measures on . The following types of approximation make essential use of the tensor product structure of the Bochner space where .
Sparse polynomial expansions.
A first approach to approximating is to employ an a priorily chosen basis , and compute the as the corresponding coefficients of this approximation. One prominent example of this approach are orthogonal polynomial expansion methods, see e.g. [22, 23, 39, 49]. In this case, the parametric functions are picked from the set of tensorized Legendre polynomials
(1.9) 
with the univariate Legendre polynomial sequence normalized in . The functions are an orthonormal basis of , where is in the case or the set of finitely supported sequences of nonnegative integers in the case , that is
(1.10) 
One thus has
(1.11) 
Then, one natural choice is the best term approximation obtained by restricting the above expansion to the set of indices corresponding to the largest , since this set minimizes the error among all possible choices of term truncations. This strategy for generating sparse polynomial approximations in the context of parametric PDEs was first introduced and analyzed in [14, 15]. In practice, the set is not accessible, but provides a benchmark for the performance of algorithms.
This representational complexity, however, does not yet determine the resulting computational complexity, since the coefficients in (1.11) in turn need to be approximated as well. For instance, one may choose a fixed basis of and expand
in (1.11), where . The simplest strategy is to use the same discretization for all by selecting a finite , which yields the approximation
Using instead an independently adapted spatial discretization for each corresponds to adaptive sparse polynomial approximations of the form
(ASP) 
with . It is natural to quantify the complexity of such an approximation by the number of activated degrees of freedom . Here one can again ask for best term approximations, now with respect to the fixed basis , obtained by minimizing the error over all with . This now results in a fully discrete approximation.
Lowrank approximation.
More generally, one may consider approximations of the form
(1.12) 
where and are functions of the spatial and parametric variable, respectively. This contains (1.11) as a special case, but we now allow also to be arbitrary functions that are not given a priori, but adapted to the given problem.
The shortest expansion of the form (1.12) that achieves a prescribed error in is given by truncation of the HilbertSchmidt decomposition of interpreted as the operator
(1.13) 
acting from to . In this context, we define as the rank of the operator , so that in particular with a representation by separable terms as in (1.12) has . The functions and are given by the left and right singular functions, respectively, which yield the optimal rank approximation of in .
This particular system of basis functions is a natural benchmark as it minimizes the rank required to ensure a meansquare accuracy . However, it is not obvious how to compute sufficiently good approximations of these basis functions at affordable cost, a point to be taken up again later.
The methods considered in this paper are based on computing approximations of both and . A lowrank approximation trying to approximately realizing a truncated HilbertSchmidt decomposition would be a first example for this category aiming at meeting the above mentioned benchmark. In this case the error caused by truncation should ideally be balanced against the error in approximating the unknown basis functions .
Note that there exist alternative approaches for deriving computable expansions of the form (1.12), where only the functions and their span are constructed, which we comment on in §1.4.
To obtain numerically realizable approximations, we may again use bases of and as in (ASP) and consider expansions for and to arrive at fully discrete lowrank approximations of the form
(LR) 
with , , .
Hierarchical tensor decompositions.
One may as well go beyond the HilbertSchmidt decomposition (1.12) and consider higherorder lowrank tensor representations that correspond to further decompositions of the factors in (1.12). For simplicity, at this point let us consider this in the finitedimensional case , possibly after truncating the expansion (1.1) for . Introducing an additional tensor decomposition of the factors , we obtain the general approximations in subspacebased tensor formats,
(STF) 
where each is a function of the individual variable . The minimal such that can be represented in the form (STF) are called multilinear ranks of .
We confine our discussion to hierarchical tensor representations (with the tensor train format as a special case), see e.g. [26, 30, 41], where the highorder core tensor is further decomposed in terms of lowerorder tensors, based on matricizations of . For instance, if
(1.14) 
one has a factorized representation of the form
(1.15) 
in terms of the tensors , , of order at most three, and only these loworder tensors need to be stored and manipulated.
1.3 Guiding questions
In the different types of approximation outlined above, the degrees of freedom enter in varying degrees of nonlinearity. More strongly nonlinear approximations (STF) with hierarchical decomposition (1.15) can potentially yield more strongly compressed representations, in the sense that the number of degrees of freedom required for a target accuracy in scales more favorably. Handling this stronger compression in the computation of such representations, however, leads to additional difficulties, and the number of required operations may in fact scale less favorably than .
Here we aim for algorithms which, for each of the above types of approximation, are guaranteed to achieve any prescribed accuracy , and which are universal. This means that they do not require a priori knowledge on the approximability of the solution (e.g., on the decay of coefficients), but adjust to such approximability automatically. This goes hand in hand with a mechanism for obtaining a posteriori error bounds, making use only of the given data. This leads us to our first guiding question:
(I) For a given parametric problem and approximation format (ASP), (LR) or (STF), can one contrive a universal numerical scheme that can achieve any given target accuracy , with approximate solutions close to the minimum required representation complexity , and can be related to and hence to ?
The minimum required representation complexity can be expressed in terms of the intrinsic approximability properties of the parametrized solutions in each of the formats. The corresponding required number of operations also depends on the problem data that are used in the solution process. We construct algorithms, based on a common generic strategy for (ASP), (LR), and (STF), which are nearoptimal in this regard. With such algorithms at hand, a natural further question is the following.
(II) Which of the approximation types (ASP), (LR), or (STF) is best suited for a given parametric problem, in the sense of leading to the smallest growth of as ?
This amounts to asking for which parametric problems the investment into approximations of higher structural nonlinearity pays off, or conversely, for which problems possible gains in approximation efficiency are offset by more demanding computations. We address this point by analyses of the approximability of model problems, complemented by numerical experiments, with conclusions depending on the particular problem type.
For problems with finitely many parameters that are each of comparable influence, hierarchical tensor representations of the form (STF) with (1.15) turn out to be clearly advantageous. In the case of an anisotropic dependence on infinitely many parameters, for representative model problems we demonstrate that (ASP) can in general yield faster convergence than (LR) or (STF). The particular structure of such infinite parameter expansions also turns out to have a major influence on the efficiency of the adaptive schemes.
1.4 Relation to previous work
There is a variety of results on the convergence of sparse polynomial expansions (1.11), see, e.g., [14, 15, 5]. Furthermore, some estimates are available that include multilevel spatial discretizations and hence provide upper bounds for the error of best term approximation (ASP), see, e.g., [15, 16]. Concerning our question (II), there are only few specialized results comparing the different approximation formats. In the case of general bivariate functions, a systematic comparison between sparse grids and low rank approximation is discussed in [28], showing in particular that for Sobolev classes the latter does not bring any improvement. In the case of highdimensional functions associated to parametric PDEs, possible gains by lowrank approximations have been identified in [37, 2] by exploiting the particular structure of the problem, all concerning the case of finitely many parameters.
There are various approaches for generating sparse polynomial expansions, for instance based on collocation [1, 8] or adaptive Taylor expansion [10]. Note that these strategies do not currently yield a posteriori error bounds for the computed solutions, and their performance is thus described by a priori estimates which may not be sharp.
The adaptive methods proposed in [18, 19], based on finite element discretization for the spatial variable, yields a posteriori error bounds for the full approximations. However, the complexity bounds proven in [19] are given only in terms of the resulting finite element meshes.
Adaptive schemes using waveletbased spatial discretizations, which yield approximations of the form (ASP), have been studied by Gittelson [24, 25]. In this case, bounds for the complete computational complexity are proven which, however, do not fully comply with the approximability properties of the solution.
Reduced basis and POD methods [44, 32, 37, 43] correspond to expansions of the form (1.12), where only the spatial basis elements spanning are explicitly computed in an offline stage. Then, in an online stage, for any given , the approximate solution is defined as the Galerkin projection of on the space . For known variants of these methods, accuracy guarantees in the respective norms (where reduced basis methods usually aim at the error in norm ) require a sufficiently dense sampling of the parameter domain. This becomes prohibitive for large , and one only obtains a posteriori bounds for the resulting error in each given .
In methods based on higherorder tensor representations, instead of sampling in the parameter domain, one also approximates as in (STF), at the price of additional approximability requirements as in (1.15). A variety of schemes have been proposed that operate on fixed discretizations [34, 36, 33, 40], which do not yield information on the discretization error. Based on [18], an adaptive scheme for hierarchical tensor approximation is proposed in [20]. It provides rigorous a posteriori bounds for the approximation error, but is not proven to converge.
1.5 Novelty of the paper and outline
Question (I) is addressed in sections §2 to §5. A generic algorithm is described in §2 based on the work in [6], which is guaranteed to converge without any a priori assumptions on the solution. Furthermore, it yields rigorous a posteriori error bounds, using only information on the problem data. Suitable specifications cover all above mentioned types of approximations (ASP), (LR), and (STF). The scheme is formulated in a general sequence space framework, using a discretization of the space through a basis with elements of the form . Here, is a given Riesz basis of (for example, a wavelet basis in the case where is a Sobolev space) and is the previously described multivariate Legendre basis. The algorithm performs an iteration in the sequence space . It involves at each step specific routines and aiming at controlling the rank of the current approximation as well as the number of degrees of freedom in each of its factors, respectively.
We then describe realizations of this generic algorithm corresponding to two distinct settings. In §3 we apply the algorithm for the generation of approximations (STF) in the setting of finitely many parametric variables. In this case the routine is based on a truncation of a hierarchical singular value decomposition of the coefficient tensor. We analyze the performance of the algorithms for classes described by the decay of the corresponding singular values and joint sparsity of the corresponding singular vectors.
§4 and §5 are devoted to the case of anisotropic dependence on infinitely many parameters in the diffusion problem (1.5). In §4 we analyze a specialized version of Algorithm 1 producing term sparse Legendre expansions, see (ASP). In this version the routine is simply the identity, and hence Algorithm 1 agrees with the adaptive solver developed and analyzed in [12]. In §5 we consider, in the same setting as in §4, a solver for approximations (LR). In this case the routine is based on standard SVD truncation. The corresponding notions of approximability are analogous to those arising in §3.
A key ingredient in §4 and §5 is the adaptive approximation of the operator based on matrix compression results in Appendix A. Here we obtain new estimates for wavelettype multilevel expansions of the parametrized coefficients that are more favorable than what is known for KarhunenLoèvetype expansions. Our further algorithmic developments also require substantially weaker assumptions on the in (1.1) than the methods in [18, 20], which require summability of . By the new operator compression results, we establish, in particular, computational complexity estimates for (ASP) which significantly improve on those of similar schemes in [24, 25].
Based on these complexity estimates, question (II) is then addressed in §6. While the presented algorithms are guaranteed to converge, the corresponding computational cost can only be quantified in terms of approximability properties of solutions . In §6, we study the corresponding properties, which are different for each realization of the scheme, in representative examples of parametric problems of the form (1.5). In particular, for a certain class of such problems, we prove that the best term Legendre approximation is already asymptotically nearoptimal among all rank approximations. For other examples, we prove that optimized lowrank approximations can achieve significantly better complexity than best term Legendre approximations. This is illustrated further by numerical tests, demonstrating that these observations also hold for more involved model problems.
2 A generic algorithm
In this section, we follow the approach developed in [6], by first reformulating the general equation (1.4) in a sequence space, and then introducing a generic resolution algorithm based on this equivalent formulation.
We first notice that (1.4) may also be written as
(2.1) 
where is elliptic and boundedly invertible from to and can be defined in a weak sense by
(2.2) 
We assume that , so that there exists a unique solution .
Given a Riesz basis of , we tensorize it with the orthonormal basis of . The resulting system is a Riesz basis of , which we now use to discretize (2.1). For this purpose, we define the matrices
(2.3) 
where is set to be the identity on , and the right hand side column vector
(2.4) 
We thus obtain an equivalent problem
(2.5) 
on , where
(2.6) 
and is the coordinate vector of in the basis .
Regarding as the column index of the infinite matrix , we denote by the columns of , which are precisely the basis representations of the Legendre coefficients .
In what follows we always denote by the norm on the respective index set which could be , or , or the corresponding operator norm when this is clear from the context. Since is a Riesz basis for we have uniformly in , which together with boundedness and ellipticity of implies that is bounded and elliptic on and that we have
(2.7) 
with uniform constants. On account of (2.7), solving (2.5) approximately up to some target accuracy is equivalent to solving (2.5) in to essentially the same accuracy.
As a further consequence, one can find a fixed positive such that , ensuring that a simple Richardson iteration converges with a fixed error reduction rate per step. This serves as the conceptual starting point for the adaptive lowrank approximation scheme introduced in [6] as given in Algorithm 1.
This basic algorithmic template can be used to produce various types of sparse and lowrank approximations, with appropriate choices of the subroutines , , , and .
The procedures and are independent of the considered and , and satisfy
(2.8) 
for any and any compactly supported . Here is intended to reduce the support of the sequence , whereas reduces the rank of in a lowrank tensor representation. The particular realizations of these routines depend on the dimensionality of the problem and on the type of approximation. We shall use the constructions given in [6]. In the case of the sparse approximations considered in §4, is chosen as the identity, and Algorithm 1 essentially reduces to the method analyzed in [12].
The routines and are assumed to satisfy, for compactly supported and any , the requirements
(2.9) 
Their construction not only depends on the type of approximation, but also on the specific problem under consideration. These two routines are indeed the main driver of adaptivity in Algorithm 1, and a major part of what follows concerns the construction of in different scenarios.
It hinges on the compression of matrices by exploiting their nearsparsity in certain basis representations. We use the following notion introduced in [11]: A biinfinite matrix is called compressible if there exist matrices with entries per row and column and such that
(2.10) 
and where the sequences and are summable. Here we always assume .
Remark 2.1.
At this point, we record for later usage a particular feature of that arises as a consequence of our choice of tensor product orthogonal polynomials for the parameterdependence: The approximate application of is facilitated by the fact that the matrices are bidiagonal. That is, in view of the threeterm recurrence relation
(2.11) 
where
(2.12) 
one has whenever , providing
(2.13) 
with the Kronecker sequence .
3 Hierarchical tensor representations in the case of finitely many parameters
We begin by considering the setting
(3.1) 
where and .
Here we are interested in the case that all coordinates in have comparable influence. As illustrated in §6, a direct sparse Legendre expansion of over will then in general be infeasible already for moderately large . However, one may as well exploit Cartesian product structure in , regarding as a higherorder tensor, and use corresponding hierarchical lowrank representations. As we shall detail in what follows, the results of [6] can be adapted to this problem in a rather straightforward manner.
It will be convenient to introduce a numbering of tensor modes as follows: , , …, . We additionally introduce the notation
The representations of higherorder tensors which we consider are built on the HilbertSchmidt case via matricizations: for each nonempty , any induces a compact operator .
In terms of the left singular vectors of , , we obtain the HOSVD representation [38] in the Tucker format [47, 48],
(3.2) 
Here the tensor of order is referred to as core tensor, and as the multilinear ranks of .
The hierarchical tensor format [31], on which the variant of our scheme described in this section is based, can be interpreted as a further decomposition of into tensors of order at most three. This decomposition is obtained using further matricizations of the tensor according to a recursive decomposition of the set of modes into a binary tree, which we denote by . For simplicity, we focus in our exposition on linear trees corresponding to factorizations (1.15), where
(3.3) 
For each , the rank of the corresponding matricization is denoted by , its singular values by . Here for all , and we set
(3.4) 
In this section we specialize the generic template Algorithm 1 to produce approximate solutions to (2.1) of the form (STF) with core tensor in hierarchical form as in (1.15). More precisely, the output is given in the form of a tensor of order , supported on with finite . This tensor is given (assuming as in (3.3)) in the following form: let for and for , then
The adaptive scheme identifies, in an intertwined fashion, the ranks , , the sets , as well as the coefficient tensors and of . The function represented by has precisely the form (STF), where
The hierarchical format can offer substantially more favorable complexity characteristics for large than (3.2). The left singular vectors of the involved matricizations yield a hierarchical singular value decomposition [26]. We refer also to [21, 35, 31, 27, 30] for detailed expositions regarding the finitely supported case (see also [41, 42] for the related tensor train representation), and to [6] for analogous results for tensors in sequence spaces, with notation analogous to the present paper.
For quantifying the approximability of tensors on in terms of the best selection of finite as above, a pivotal role is played by the quantitites
(3.5) 
They are introduced in [6] and called contractions in analogy to the terminology in tensor analysis. An efficient evaluation (without any dimensional summations) is possible due to the relation
(3.6) 
where are the mode singular values of . As in our previous notation, we abbreviate , .
3.1 Adaptive scheme
In the present case, we consider Algorithm 1 with the routines and for the hierarchical format as given in [6, Rem. 15].
is based on a truncation of a hierarchical singular value decomposition up to a prescribed accuracy , which can be ensured based on the norm of omitted singular values of matricizations. We denote this operation by . As shown in [26], it satisfies the quasioptimality property
(3.7) 
with the inequality between ranks as defined in (3.4) to be understood componentwise.
retains the degrees of freedom for each mode that correspond to the largest contractions (3.5). Let be such that is nonincreasing. Denote for by the array obtained by retaining all entries of corresponding to indices in , while replacing all others by zero. Given , we define the product set
where , , are chosen to such that is minimal subject to the condition
(3.8) 
Noting that the left side in (3.8) is an upper bound for , we define as a numerical realization of , for which one has an analogous quasioptimality property as in (3.7) with constant .
Furthermore, as defined in (2.6) is in the present case of finitely many parameters a finite sum of Kronecker product operators,
which considerably simplifies the construction of the corresponding routine . The action of can thus increase each hierarchical rank of its argument at most by a factor of .
Remark 3.1.
In contrast to the case considered in [7], here the Hilbert space on which the problem is posed is endowed with a cross norm. As a consequence, the isomorphism that takes to its coefficients with respect to the tensor product basis is of Kronecker rank one. The original lowrank structure (1.1) of is therefore preserved in the representation (2.6) of the problem.
Consequently, the routine that adaptively approximates the action of can be obtained following the generic construction given in [6], provided that the operators and acting on each mode have the required compressibility properties. Recall that by (2.13), the infinite matrices are bidiagonal, and hence do not require any further approximation. To use the construction of [6], we thus only need that the operators acting on the spatial variables are compressible.
3.2 Convergence analysis
Our complexity results aim at the following type of statements: given a certain approximability of the solution, the algorithm recovers the corresponding convergence rates without their explicit knowledge.
To describe these approximability properties, we now recall the definition of approximation classes to quantify the convergence of hierarchical lowrank approximations from [6], in terms of the hierarchical rank defined by (3.4). Let be positive and strictly increasing with and as , for let
where – that is, the errors of approximation by hierarchical ranks at most – can be bounded in terms of the hierarchical singular values for . We introduce the approximation classes
We restrict our considerations to that satisfy
which corresponds to a restriction to at most exponential growth; our considerations would still apply for faster growth, but lead to less sharp results.
For an approximation of bounded support to , the number of nonzero coefficients required in each tensor mode to achieve a certain accuracy depends on the best term approximability of the sequences .
This approximability by sparse sequences is quantified by the classical approximation classes , where and is a countable index set, comprised of all for which the quasinorm
(3.9) 
is finite.
In particular, if , then these sequences can be approximated within accuracy by finitely supported sequences with nonzero entries. In what follows, we do not explicitly specify the index set in the spaces when this is clear from the context.
We analyze the complexity of the algorithm under the following benchmark assumptions, see the discussion in §6.1.
Assumptions 3.2.
For the hierarchical tensor approximation in the case (3.1) of parametric variables, we assume the following:

, , for an .

, where with , .

The , , are compressible for an , and hence there exist matrices with entries per row and column and such that , and where the sequences and