Approximation of high-dimensional parametric PDEs

# Approximation of high-dimensional parametric PDEs

Albert Cohen and Ronald DeVore This research was supported by the ONR contracts N00014-11-1-0712 and N00014-12-1-0561, the NSF grant DMS 1222715, the Institut Universitaire de France and the ERC advanced grant BREAD.
###### Abstract

Parametrized families of PDEs arise in various contexts such as inverse problems, control and optimization, risk assessment, and uncertainty quantification. In most of these applications, the number of parameters is large or perhaps even infinite. Thus, the development of numerical methods for these parametric problems is faced with the possible curse of dimensionality. This article is directed at (i) identifying and understanding which properties of parametric equations allow one to avoid this curse and (ii) developing and analyzing effective numerical methodd which fully exploit these properties and, in turn, are immune to the growth in dimensionality.

The first part of this article studies the smoothness and approximability of the solution map, that is, the map where is the parameter value and is the corresponding solution to the PDE. It is shown that for many relevant parametric PDEs, the parametric smoothness of this map is typically holomorphic and also highly anisotropic in that the relevant parameters are of widely varying importance in describing the solution. These two properties are then exploited to establish convergence rates of -term approximations to the solution map for which each term is separable in the parametric and physical variables. These results reveal that, at least on a theoretical level, the solution map can be well approximated by discretizations of moderate complexity, thereby showing how the curse of dimensionality is broken. This theoretical analysis is carried out through concepts of approximation theory such as best -term approximation, sparsity, and -widths. These notions determine a priori the best possible performance of numerical methods and thus serve as a benchmark for concrete algorithms.

The second part of this article turns to the development of numerical algorithms based on the theoretically established sparse separable approximations. The numerical methods studied fall into two general categories. The first uses polynomial expansions in terms of the parameters to approximate the solution map. The second one searches for suitable low dimensional spaces for simultaneously approximating all members of the parametric family. The numerical implementation of these approaches is carried out through adaptive and greedy algorithms. An a priori analysis of the performance of these algorithms establishes how well they meet the theoretical benchmarks.

## 1 Overview

### 1.1 Parametric and stochastic PDEs

Partial differential equations (PDEs) are commonly used to model complex systems in a variety of physical contexts. When solving a given PDE, one typically fixes certain parameters: the shape of the physical domain, the diffusion or velocity field, the source term, the flux or reaction law, etc. We use the terminology parametric PDEs when some of these parameters are allowed to vary over a certain range of interest. When treating such parametric PDEs, one is interested in finding the solution for all parameters in the range of interest.

To describe such problems in their full generality, we adopt the formulation

 P(u,a)=0, (1.1)

where denotes the parameters, is the unknown of the problem, and

 P:V×X→W, (1.2)

is a linear or nonlinear partial differential operator, with a triplet of Banach spaces. We assume that the parameter ranges over a compact set , and that for any in this range there exists a unique solution to (1.1). This allows us to define the solution map

 u:a↦u(a), (1.3)

which acts from onto and is well defined over . We also define the solution manifold as the family

 M=u(A)={u(a):a∈A}, (1.4)

which gathers together all solutions as the parameter varies within its range.

One simple guiding example, which will be often used throughout this article, is the linear elliptic equation

 −div(a∇u) = f,on D, u = 0on ∂D, (1.5)

set on a Lipschitz domain . Here, we fix the right side as a real valued function and consider real valued diffusion coefficients as the parameter. The corresponding operator is therefore given by

 P(u,a)=f+div(a∇u). (1.6)

A possible choice for the triplet of spaces is then

 (X,V,W)=(L∞(D),H10(D),H−1(D)). (1.7)

Indeed, if , and , one then defines as an element of by requiring that

 ⟨P(u,a),v⟩=⟨f,v⟩+∫Da∇u⋅∇v,v∈V, (1.8)

where is the duality bracket between and . Lax-Milgram theory ensures the existence and uniqueness of a solution to (1.1) from , if for some , the diffusion satisfies the ellipticity condition

 a(x)≥r,x∈D. (1.9)

Therefore, a typical parameter range is a set , which in addition is assumed to be compact in .

Although elementary, the above example gathers important features that are present in other relevant examples of parametric PDEs. In particular, the solution map acts from an infinite dimensional space into another infinite dimensional space. Also note that, while the operator of (1.6) is linear both in and (up to the constant additive term ) the solution map is nonlinear. Because of the high dimensionality of the parameter space , such problems represent a significant challenge when trying to capture this map numerically. One objective of this article is to understand which properties of this map allow for a successful numerical treatment. Concepts such as holomorphy, sparsity, and adaptivity are at the heart of our development.

The solution map may also be viewed as a function

 (x,a)↦u(x,a) (1.10)

of both the physical variable and the parametric variable . The parametric variable has a particular status because its different instances are uncoupled: for any fixed instance , we may solve the PDE exactly or approximately and therefore compute for all values of , while ignoring all other values . This plays an important role for certain numerical methods which are based on solving the parametric PDE for different particular values of , since this task can be parallelized.

Parametric PDEs occur in a variety of modeling contexts. We draw the following major distinctions in their setting:

• Deterministic modeling: the parameters are deterministic design or control variables which may be tuned by the user so that the solution , or some quantity of interest , has prescribed properties. For instance, if the elliptic equation (1.5) is used to model the heat conduction in a component produced by an industrial process, one may want to design the material in order to minimize the heat flux on a certain part on the boundary , in which case the quantity of interest to be optimized is

 Q(u)=∫Γa∇u⋅n, (1.11)

where is the outer normal. This amounts to optimizing the function

 a↦F(a):=Q(u(a)). (1.12)

over .

• Stochastic modeling: the parameters are random variables with prescribed probability laws, which account for uncertainties in the model. Therefore has a certain probability distribution supported on . One is then typically interested in the resulting probabilistic properties of the solution , which is itself a random variable over with values in , or in the probabilistic properties of a quantity of interest . For instance, if the elliptic equation (1.5) is used to model oil or ground water diffusion, a common way to deal with the uncertainty of the underground porous media is to define as a random field with some prescribed law. Then one might want to estimate the mean solution

 ¯¯¯u:=E(u), (1.13)

or its variance

 V(u):=E(∥u−¯¯¯u∥2V), (1.14)

or the average flux through a certain interface , that is, with as in (1.11), or the probability that this flux exceeds a certain quantity, etc.

In both the deterministic and stochastic settings, the given application may require evaluating for a large number of instances of the parameter . This is the case, for instance, when using a descent method for optimizing a quantity of interest in the deterministic framework, or a Monte-Carlo method for evaluating an expectation in the stochastic framework. Since each individual instance is the solution of a PDE, its exact evaluation is typically out of reach. Instead, each query for is approximately evaluated by a numerical solver, which may itself be computationally intensive for high accuracy.

In order to significantly reduce the number of computations needed for attaining a prescribed accuracy, alternate strategies, commonly referred to as reduced modeling, have been developed. Understanding which of these strategies are effective, in the case where the parameter has large or infinite dimension, and why, is the subject of this article.

### 1.2 Affine representation of the parameters

So far our description of the set of parameters allows it to be any compact subset of . An important ingredient in both our theoretical and numerical developments is to identify any through a sequence of real numbers. We are especially interested in affine representations of . We say that a sequence of functions is an affine representer for , or representer for short, if we can write each as

 a=a(y)=¯¯¯a+∑j≥1yjψj,y:=(yj)j≥1,yj=yj(a), (1.15)

where the are real numbers, is a fixed function from , and the series converges in the norm of for each . We are making a slight abuse of notation here since we use to represent a general element of and also use to represent the map

 a:y↦a(y), (1.16)

from to . But the meaning will always be clear from the context.

It is easy to see that for any compact set in a Banach space affine representers exist. For example, if has a Schauder basis then any such basis will be a representer. Even if does not admit a Schauder basis, as is the case for our example , we can still find representers as follows. Choose any . Since is compact there exist finite dimensional spaces , with , such that

 dist(K,Xn)X:=supa∈Kminb∈Xn∥a−b∥X→0asn→∞. (1.17)

We can also take the spaces to be nested, that is

 Xn⊂Xn+1,n≥0. (1.18)

Let be any basis for and define . The sequence

 ψj:=ϕj−Nn,n,j=Nn,…,Nn+1, (1.19)

contains each of the bases for all . Given any , let be a best approximation in to from ,with . Then, we can write

 a=¯a+∞∑n=1(an−an−1). (1.20)

Each term is in and hence can be written as a linear combination of the . Therefore, is a representer for .

Affine representations (1.15) often occur in the natural formulation of the parametric problem. For instance, if the diffusion coefficient in (1.5) is piecewise constant over a fixed partition of the physical domain , then it is natural to set

 a(y)=¯¯¯a+d∑j=1yj\largeχDj, (1.21)

where is a constant and the are the characteristic functions of the subdomains . Similarly, if the parameter describes the shape of the boundary of the physical domain in a computer-aided design setting, a typical format is

 a(y)=¯¯¯a+d∑j=1yjBj, (1.22)

where represents a nominal shape and the are B-spline functions associated to control points. In these two examples, is finite, yet possibly very large.

In the statistical context, if is a second order random field over a domain , a frequently used choice in (1.15) is , the average field, and , the Karhunen-Loeve basis, that is, the eigenfunctions of the covariance operator

 v↦Rav:=∫DCa(⋅,x)v(x)dx,Ca(z,x):=E((a(z)−¯¯¯a(z))(a(x)−¯¯¯a(x)). (1.23)

Then, the resulting scalar variables are centered and uncorrelated, that is, and when .

Even if an affine representation of the form (1.15) is not given in the formulation of the problem, one can be derived by taking any representation system in the Banach space . For example, if admits a Schauder basis, then one can take any such basis for and arrive at such an expansion. In classical spaces , such as or Sobolev spaces, standard systems of approximation, such as Fourier series, splines, or wavelets can be used.

The advantage of the representation (1.15) is that can now be identified through the sequence . When considering all , we obtain a family of such sequences. Note that this family can be quite complicated. In order to simplify matters, we normalize the , so that for any ,

 supa∈A|yj(a)|=1. (1.24)

Such a renormalization is usually possible because is compact and depends continuously on . After this normalization, for each , the sequence belongs to the infinite dimensional cube

 U:=[−1,1]N. (1.25)

Notice that taking a general sequence from this cube, there may not be an with , . Also, if is not a basis , the representation (1.15) may not be unique. We define

 UA:={(yj)j≥1∈U:∑j≥1yjψj∈A}. (1.26)

We are mainly interested in representers , for which

 ¯a+∑j≥1yjψj (1.27)

converges in for each . We call such representers complete. In this case, we may define

 a(U):={a=a(y)=¯¯¯a+∑j≥1yjψj:(yj)j≥1∈U}, (1.28)

so that

 A⊂a(U). (1.29)

A typical case of a complete representer is when is a sequence in .

Once an affine representation has been chosen, the initial solution map becomes equivalent to the map which is defined on . With an abuse of notation, we write this new solution map as

 y↦u(y):=u(a(y)). (1.30)

This is a Banach space valued function of an infinite number of variables. Note that in the case where the affine representation has a finite number of terms, the range of is . However, the infinite dimensional case subsumes the finite dimensional case, since the latter may be viewed as a particular case with for .

In the case of a complete representer, is defined on all of . However, we do not know whether the solution map is defined on all of . To guarantee this, the following assumption will be used often.

Assumption A: The parameter set has a complete representer and the solution map is well defined on the whole set , or equivalently the solution map is well defined on the whole set .

This assumption naturally holds when the set is exactly defined as .

### 1.3 Smoothness of the solution map

One objective of this article is to develop efficient numerical approximations to the solution maps of (1.3) or (1.30). One of the main difficulties is that these maps are high or infinite dimensional, in the sense that the dimension of the variable or is high or infinite. In order to understand what might be good strategies for constructing such approximations, we need first to understand the inherent properties of these maps that might allow us to circumvent this difficulty.

We initiate such a program in §2, where we first analyze the smoothness of the solution map . In the case of the elliptic equation (1.5), it is easily seen that this map is not only infinitely differentiable, but also admits a holomorphic extension to certain subdomains of the complex valued . We propose two general approaches which allow us to establish similar holomorphy properties for other relevant instances of linear and nonlinear parametric PDEs. One first approach is based on the Ladyzenskaia-Babushka-Brezzi theory. It applies to a range of linear PDEs where the operator and the right hand side have holomorphic dependence in . These include parabolic and saddle-point problems, such as the heat equations or the Stokes problem, with parameter in the diffusion term, similar to (1.5). One second approach is based on the implicit function theorem in complex valued Banach spaces. In contrast to the first approach, it can be applied to certain nonlinear PDEs.

Using the affine representation (1.15) of , we then study the solution map under Assumption A, which means that it is defined on the whole of . In addition to holomorphy, an important property of can be extracted from the affine representation (1.15). The functions appearing in (1.15) have norms of varying size. Since the variable is scaled to be in , when is small, this variable has a reduced effect on the variations of . Thus the variables are not democratic, but rather they have varying importance. In other words, the map is highly anisotropic. More specifically, we derive holomorphic extension results for this map on certain multivariate complex domains of tensor product type. In particular, we consider polydiscs of the general form

 Uρ:=⊗{|zj|≤ρj}={z=(zj)j≥1∈CN:|zj|≤ρj} (1.31)

where is a positive sequence which serves to describe the anisotropy of the solution map. We also consider polyellipses which deviate less far from the real axis. These holomorphy domains play a key role in the derivation of approximation results.

###### Remark 1.1

While we are generally interested in real valued solutions to the parametric PDE (1.1), corresponding to real valued parameters or , our analysis of holomorphic smoothness leads us naturally to complex valued solutions, corresponding to complex valued parameters. For this reason, the spaces are always assumed to be complex valued Banach spaces throughout this paper.

### 1.4 Approximation of the solution map

Reduced modeling methods seek to take advantage of the properties of the solution maps or such as the holomorphy and anisotropy mentioned above. These properties suggest strategies for approximating these map by simple functions in which the physical variables and the parametric variable or are separated and hence take the form

 (x,a)↦un(x,a):=n∑i=1vi(x)ϕi(a), (1.32)

or

 (x,y)↦un(x,y):=n∑i=1vi(x)ϕi(y), (1.33)

where are functions of living in the solution space and are functions of or with values in or .

We may view as a rank approximation to , in analogy with low rank approximation of matrices. We adopt the notations

 a↦un(a)=un(⋅,a)andy↦un(y)=un(⋅,y), (1.34)

for the above approximations.

Let us discuss the potential accuracy of separable approximations of the form (1.32). If our objective is to capture for all with a prescribed accuracy , this means that we search for an error bound in the uniform sense, i.e., of the form

 ∥u−un∥L∞(A,V):=supa∈A∥u(a)−un(a)∥V≤ε(n). (1.35)

For certain applications, in particular in the stochastic framework, we may instead decide to measure the error on average, for instance by searching for an error bound in the mean square sense,

 E(∥u−un∥2V):=∥u−un∥2L2(A,V,μ)=∫A∥u(a)−un(a)∥2Vdμ(a)≤ε(n)2, (1.36)

where is the probability measure for the distribution of over . Since is a probability measure, one has for any ,

 ∥v∥L2(A,V,μ)≤∥v∥L∞(A,V). (1.37)

Therefore the uniform bound is stronger than the average bound, in the sense that (1.35) implies (1.36) with the same value of .

Likewise, for the approximation of the map , we may search for a uniform bound

 ∥u−un∥L∞(UA,V):=supy∈UA∥u(y)−un(y)∥V≤ε(n) (1.38)

or a mean square bound

 E(∥u−un∥2V):=∥u−un∥2L2(UA,V,μ)=∫UA∥u(y)−un(y)∥2Vdμ(y)≤ε(n)2, (1.39)

where is the probability measure for the distribution of over .

###### Remark 1.2

We do not indicate the measure in our notation or , since in all relevant examples considered in this article we always consider the exact supremum over in or over in , rather than the essential supremum.

For any , the approximation belongs to

 Vn:=span{v1,…,vn}, (1.40)

which is a fixed -dimensional subspace of . Ideal benchmarks for the performance of separable expansions of the form (1.32) may thus be defined by selecting optimal -dimensional spaces for the approximation of in either a uniform or an average sense.

For uniform error bounds, this benchmark is given by the concept of Kolmogorov’s -width, which is well known in approximation theory. If is a compact set in a Banach space , we define its Kolmogorov -width as

 dn(K)V:=infdim(Vn)≤nsupv∈ÊKminw∈Vn∥v−w∥V. (1.41)

This quantity, first introduced in [57], describes the best achievable accuracy, in the norm of , when approximating all possible elements of by elements from a linear -dimensional space . Obviously, the optimal choice of for approximation of in a uniform sense corresponds to the space, if it exists, that reaches the above infimum when is taken to be the solution manifold . The best achievable error in the uniform sense is thus given by

 ε(n):=dn(M)V. (1.42)

There exist many other notions of widths that are used to measure the size of compact sets. Here we only work with the above one and refer to [74] for a more general treatment.

For mean square bounds, in the case where is a Hilbert space, the corresponding benchmark is related to the concept of principal component analysis, which is of common use in statistics. By choosing an arbitrary orthonormal basis of , we may expand according to

 u(a)=∑i≥0ziei,zi=zi(a):=⟨u(a),ei⟩V. (1.43)

Approximation of from an -dimensional space of is then equivalent to the approximation of from an -dimensional space of . The optimal space is then obtained through the study of the correlation operator

 R=(Ri,j)i,j≥0,Ri,j:=E(zi(a)zj(a))=∫a∈Azi(a)zj(a)dμ(a). (1.44)

This operator is symmetric, positive, compact and of trace class. It therefore admits an orthonormal basis of eigenvectors associated to a positive, non-increasing and summable sequence of eigenvalues. The space

 Gn:=span{g1,…,gn}, (1.45)

minimizes over all -dimensional spaces the mean square error between and its orthogonal projection , with

 E(∥z−PGnz∥2ℓ2)=∑k>nλk. (1.46)

In turn, the optimal space is spanned by the functions

 vk:=∑i≥0gk,iei,k=1,…,n, (1.47)

where is the -th component of , that is, . The best achievable error in the mean square sense is thus given by

 ε(n)2:=∑k>nλk. (1.48)

Note that, in view of (1.37), one has the comparison

 ∑k>nλk≤dn(M)2V. (1.49)

The above described optimal spaces are usually out of reach, both from an analytic and computational point of view, and it is therefore interesting to consider sub-optimal approximations. In addition, when considering the solution map , the tensor product structure of allows us to consider approximations based on further separation between the parametric variables, that is, where each function in (1.33) is itself a product of univariate functions of the different . The simplest example of such approximations are multivariate polynomials, which have the general form

 un(x,y)=∑ν∈Λnvν(x)yν,yν:=∏j≥1yνjj, (1.50)

where each index is a finitely supported sequence of positive integers, or equivalently such that , and is a set of such sequences.

In §3, we obtain such polynomial approximations by taking finite portions of infinite polynomial expansions of . Here, we work under Assumption A, which means that is defined on the whole of . We consider two types of expansions:

• Power series the form

 u(y)=∑ν∈Ftνyν, (1.51)

where is the set of all finitely supported sequence of positive integers.

• Orthogonal series of the form

 u(y)=∑ν∈FwνPν(y),Pν(y):=∏j≥1Pνj(yj), (1.52)

where is the Legendre polynomial of degree defined on .

Using the holomorphy and anisotropy properties of the solution map established in §2 for specific classes of parametric PDEs, we derive a priori bounds on the -norms and of the coefficients which appear in these expansions. In this way, we are able to establish algebraic convergence rates for certain truncations of the above expansions, where is the cardinality of the truncation set .

One critical aspect of the truncation strategy is that we retain the largest coefficients, which is a form of nonlinear approximation, also known as sparse or best -term approximation. With such a choice for , the exponent in the convergence rate is related to the available summability for of the -norms of the coefficients in the considered infinite expansion. In particular, for uniform approximation estimates, that is, in , one has

 s=1p−1, (1.53)

once the -summability of these -norms has been proven. The main result from §3 shows that, under suitable assumptions, the summability of the sequence implies that the norms of the coefficients in the expansions (1.51) or (1.52) are also summable.

The fact that we obtain the algebraic convergence rate despite the infinite dimensional nature of the variable reveals that the curse of dimensionality can be avoided in the approximation of relevant parametric PDEs.

### 1.5 The n-widths of the solution manifold M

In both the uniform or mean square ways of measuring error, as described above, the success of reduced modeling can be proven if converges to zero sufficiently fast as . Thus, the study of these widths constitutes a major subject in the theoretical justification of reduced modeling.

A common way to measure the widths of compact sets in classical spaces is to embed into an appropriate smoothness space such as a Sobolev or Besov space. For example, in our model parametric elliptic equation (1.5), the space is , and this approach would lead us to examine the Sobolev smoothness of the individual functions for . Classical elliptic regularity theory says that for smooth domains the smoothness of can be inferred from the smoothness of the right side . However, for general domains, there are severe limits on this regularity due to the irregularity of the boundary. Therefore, bounding the decay of widths of through such regularity results will generally prove only slow decay for and therefore is not useful for obtaining the fast decay rates we seek. Indeed, recall, that classical smoothness spaces, such as Sobolev or Besov spaces of order in variables, have widths that decay, at best, like as when these widths are measured in an norm for some . For example, it is known that if is the unit ball of , then its -width in satisfies

 cdn−m/d≤dn(K)L∞≤Cdn−m/d,n≥1. (1.54)

Likewise, if is the unit ball of , then its -width in satisfies

 cdn−(m−1)/d≤dn(K)H1≤Cdn−(m−1)/d,n≥1. (1.55)

Also note that the regularity of , as measured by membership in Sobolev and Besov spaces, is closely related to the performance of piecewise polynomial approximation such as that used in finite element methods and is the reason why these algorithms have rather slow convergence. So the fast decay of to zero cannot be obtained by such an approach.

The widths go to zero fast not because the individual elements in are smooth in the physical variable, but rather because is the image of the solution map which is, as previously discussed, smooth and anisotropic in the parametric variable. However, let us remark that it is by no means trivial to deduce fast decay for from this fact alone, because of what is called the curse of dimensionality. Namely, approximation rates for a given target function are generally proved by showing that the target function, in our case the function , has sufficiently high regularity. But, in high dimensions, regularity by itself is usually not enough. Indeed, returning to the bounds (1.55), we see that the large dimension affects the approximation rate in two detrimental ways. The exponent in the smoothness rate is divided by and the constants are known to grow exponentially with increasing . In our case can even be infinite and this makes the derivation of approximation rates a subtle problem.

In §4, we discuss general principles for estimating by above the -widths of the solution manifold . One immediate consequence of the approximation results in §3 is that, for specific classes of parametric PDEs, when Assumption AÊ holds and , then these widths decay at least like with given by (1.53). We extend this analysis to the general case when Assumption AÊ does not necessarily hold. Since the manifold is not directly accessible, one would like to understand what properties of , which is assumed to be completely known to us, imply decay rates for . We show that the asymptotic decay of the -width of in is related to that of the -width of in . This follows from general results on widths of images of compact sets under holomorphic maps. Namely, if is holomorphic in a neighborhood of a general compact set , then we prove the following result on the -width of the image in :

 supn≥1nrdn(A)X<∞⇒supn≥1nsdn(M)V<∞,s

This result shows that from the view point of preserving -widths, holomorphic maps behave almost as good as linear maps, up a loss of in the convergence rate. One open problem is to understand if this loss is sharp or could be improved.

### 1.6 Numerical methods for reduced modeling

The above mentioned results on holomorphic extensions, sparse expansions for , and -widths of the manifold , can be thought of as theoretical justifications for the role of reduced modeling in solving parametric and stochastic equations. They provide evidence that reduced modeling numerical methods should yield significant computational savings over traditional methods such as finite element solvers for parametric problems or Monte Carlo methods for stochastic problems. However, they do not constitute actual numerical methods.

The second part of our article turns to the construction of numerical algorithms motivated by these theoretical results. Such algorithms compute specific separable approximations of the form (1.32) or (1.33) for a given value of at an affordable computational cost. Our objective, in this regard, is not to give an exhaustive description of numerical reduced modeling methods and their numerous variants. Rather, our main focus is to introduce some important representative examples of these methods for which an a priori analysis quantifies the gains in numerical performance of these methods.

One important distinction is between non-adaptive and adaptive methods. In the first ones, the choice of the functions or of used in (1.32) or (1.33), for a given value of is made in an a priori manner, if possible using the available information on the problem. In the second ones, the computations executed for lower values of are exploited in order to monitor the choice at stage . One desirable feature of an adaptive algorithm is that it should be incremental or greedy: only one new function or is added at each stage to the previously selected functions which are left unchanged. Adaptive algorithms are known to often perform better than their non-adaptive counterpart, but their convergence analysis is usually more delicate.

A second important distinction between the various numerical methods is whether they are non-intrusiveÊ or intrusive. A non-intrusive algorithm builds on an existing exact or approximate solver for the PDE which may be computationally expensive. It derives approximations of the form (1.32) or (1.33) by choosing instances or and using the values or computed by the solver. Non-intrusive algorithms may be implemented even when this solver is a black box, and therefore with a possibly limited knowledge on the exact PDE model. An intrusive algorithm, on the other hand, directly exploits the precise form of the PDE for computing the approximation (1.32) or (1.33), and therefore requires the full knowledge of the PDE model for its implementation.

It should be noted that instances as well as the functions used in (1.32) or (1.33) can only be computed with a certain level of spatial discretization, for example resulting from a finite element solver. In such a case they belong to a finite dimensional space . If the same finite element space is used to discretize all instances up to a precision that is satisfactory for the user, this means that we are actually trying to capture the approximate solution maps

 a↦uh(a)∈Vh, (1.57)

or

 y↦uh(y)∈Vh. (1.58)

The analysis of the performance of numerical reduced modeling methods needs to incorporate the additional error produced by this discretization.

### 1.7 Sparse polynomial approximation algorithms

One first class of methods that we analyze consists in finding a numerically computable polynomial approximation of the form (1.50). There are two major tasks in constructing such a numerical approximation: (i) find good truncation sets and (ii) numerically compute an approximation to the coefficients for each . By far, the most significant issue in numerical methods based on polynomial expansions is to find a good choice for the sets . If everything was known to us, we would simply take for the set of indices corresponding to the largest -norms of the coefficients in (1.51) or (1.52). However, finding such an optimal would require in principle that we compute the coefficients for all values which is obviously out of reach. In addition, the structure of the optimal set can be quite complicated. One saving factor is that our analysis in §3 gives a priori bounds on the size of these coefficients. These bounds can be used in order to make an a priori selection of the sets . This is a non-adaptive approach, which generally gives suboptimal performance due to the possible lack of sharpness in the a priori bounds. This leads one to try to enhance performance by combining the a priori bounds together with an adaptive selection of the sets .

The numerical methods are facilitated by imposing that the selected index sets are downward closed (or lower sets), i.e. satisfy the following property:

 ν∈Λnand~ν≤ν⇒~ν∈Λn, (1.59)

where means that for all .

In §6, we discuss algorithms which compute the polynomial approximation by an interpolation process. These algorithms are non-intrusive and apply to a broad scope of problems. One key issue is the choice of the interpolation points, which is facilitated by the following result: if is a sequence of pairwise distinct points in and if , then for any downward closed set , any polynomial of the form (1.50) is uniquely characterized by its value on the grid . This allows us to construct the interpolation in a hierarchical manner, by simultaneously incrementing the polynomial space and the interpolation grid. The sets can either be a priori chosen based on the bounds on the coefficients established in §3 or adaptively generated. These sets generally differ from the ideal sets corresponding to the largest coefficients (which may not fullfill the downward closedness property). We show that certain choices of the univariate sequence , known as Leja or -Leja points lead to stable interpolation processes (in the sense that the Lebesgue constant have moderate growth with the number of points) allowing us to retrieve by interpolation the same algebraic convergence rate which are proved for the best polynomial approximations.

In §7, we discuss another class of algorithms which recursively compute the exact coefficients in the Taylor series of the approximate solution map (1.58). These algorithms are intrusive and they only apply to problems where is linear both in and up to a constant term, such as the elliptic equation (1.5) which serves a guiding example. The recursive computation is facilitated by imposing that the index sets in the truncated Taylor expansion are downward closed. Similar to the interpolation algorithm from §6, the sets can either be a priori chosen based on the available bounds on the coefficients established in §3, or adaptively generated. One main result shows that adaptive algorithms based on a so-called bulk chasing procedure have the same convergence rate as the one which is established when using the index sets corresponding to the largest Taylor coefficients.

### 1.8 Reduced basis algorithms

The second class of methods that we analyze seeks to find, in an offline stage, a set of functions for which the resulting -dimensional space is close to an optimal linear -dimensional approximation space. For mean square estimates, one such approach, known as proper orthogonal decomposition, consists in building the functions based on an approximation of the exact covariance operator (1.44) computed from a sufficiently dense sampling of the random solution . Another approach, which targets uniform estimates, is the reduced basis method, which consists in generating by a selection of particular solution instances chosen from a very large set of potential candidates. In both cases, the offline stage is potentially very costly.

Once such a space is chosen, one builds an online solver, such that for any given , the approximate solution is an element from . There are several possibilities on how to build this online solver. The most prominent of these is to take the Galerkin projection of onto which consist of finding by solving the system of equations

 ⟨P(un(a),a),w⟩=0,w∈Vn (1.60)

for a suitable duality product . This online computation determines for each the values for which appears in (1.32). The advantage of using the Galerkin solver, is that, for certain problems such as elliptic ones, it is known to give the best error in approximating by elements of when error is measured in the norm of . Its disadvantage its computational cost in finding given the query . For this reason other projections onto are also studied, some of these based on interpolation.

The key issue when using reduced basis methods is how to find a good space , i.e. how to find good basis functions. In §8, we discuss an elementary greedy strategy for the offline selection of the instances , that consists in picking the -th instance which deviates the most from the space generated from the previously selected ones. The approximation error

 σn(M):=supv∈Minfw∈Vn∥v−w∥, (1.61)

produced by such spaces may be significantly larger than the ideal benchmark of the -width of the solution manifold for a given value of . However, a striking result is that both are comparable in terms of rate of decay: for any , there is a constant such that

 supn≥1nsσn(M)≤Cssupn≥1nsdn(M)V. (1.62)

Similar results are established for exponential convergence rates.

While both classes of numerical methods aim to construct separable approximations of the form (1.32) or (1.33), there is a significant distinction between them in the way they organize computation. For the first class of polynomial approximation methods, the offline stage fixes the polynomial functions through the selection of the set and computes the coefficients . Then the online stage is in some sense trivial since it simply computes through the linear combination (1.50). For the second class, the online stage still requires solving PDE approximately in the chosen reduced space . This offline/online splitting makes it difficult to draw a fair comparison between the different methods from the point of view of computational time vs accuracy.

Notation for constants: Numerous multiplicative constants appear throughout this paper, for example in convergence estimates. We use the generic notation , which may therefore change value between different formulas, and if necessary we indicate the parameters on which depends. We use a more specific notation if we want to express the dependence of the constant with respect to a cetain parameter (for example the dimension in (1.55)) or if we want to refer to this specific constant later in the paper.

### 1.9 Historical orientation

Numerical methods for parametric and stochastic PDEs using polynomials (or other approximation tools) in the parametric variable have been widely studied since the 1990’s. We refer in particular to [40, 46, 55, 56, 90] for general introductions to these approaches, and to [1, 41, 37, 53, 54, 83, 84] for related work prior to the results exposed in our paper.

The approximation results presented in §3 have been obtained by the authors and their co-authors in a series of paper [23, 24, 19], and the results in §4 on the evaluation of -widths are from [25]. These results establish for the first time convergence rates immune to the curse of dimensionality, in the sense that they hold with infinitely many variables, see also [44] for a survey dealing in particular with these issues. In a similar infinite dimensional framework, and not covered in our paper, let us mention the following related works: (i) similar holomorphy and approximation results are established in [47, 48, 58] for specific type of PDEs and control problems, (ii) approximation of integrals by quadratures is discussed in [60, 61], (iii) inverse problems are discussed in [78, 79, 82], following the Bayesian perspective from [87], and (iv) diffusion problems with lognormal coefficients are treated in [50, 43, 45].

The sparse interpolation method presented in §6 is introduced and studied in [18]. See also [2, 4, 71, 72] for related work on collocation methods. Other non-intrusive methods are based on least-square regression as discussed in [16, 34, 36, 69], or on pseudo-spectral approximation as discussed in [91, 26]. The Taylor approximation algorithm presented in §7 is introduced and studied in [17]. Other intrusive methods based on Galerkin projection are discussed in [4, 5, 23, 42].

Reduced basis methods have been studied since the 1980’s [73]. The greedy algorithms presented in §8 have been introduced and discussed in [89, 66, 67, 68, 88], and their convergence analysis was given in [6] and [32]. We refer to [52] for a general introduction on the related POD method, which is not discussed in our paper.

## 2 Holomorphic extensions

In this section, we discuss smoothness properties of the solution maps and which are central to the development of efficient numerical methods that are immune to the curse of dimensionality. We show that, under suitable assumptions on the parametric PDE, these maps admit holomorphic extensions to certain complex domains. Recall that a map from a complex Banach space to a second complex Banach space is said to be holomorphic on an open set if for each , has a Frechet derivative at . Here is a linear operator mapping to such that

 ∥F(x+h)−F(x)−dF(x)h∥Y=o(∥h∥X),h∈X. (2.1)

### 2.1 Extension of a↦u(a) for the model elliptic equation

In order to formulate results on holomorphy, we need to introduce existence-uniqueness theory for solutions to (1.1) in the case that is complex valued.

We begin with our guiding example of the elliptic equation (1.5). We now consider

 (X,V,W)=(L∞(D),H10(D),H−1(D)) (2.2)

as spaces of complex valued functions, and extend the standard variational formulation to such spaces in a straightforward manner: for a given , and with , find such that

 ∫Da∇u⋅∇v=⟨f,v⟩W,V,v∈V, (2.3)

where in the left integrand,

 ∇u⋅∇v:=m∑i=1∂xiu¯¯¯¯¯¯¯¯¯∂xiv, (2.4)

is the standard hilbertian inner product, and is the anti-duality pairing between and , which when is given by the hilbertian inner product

 ⟨f,v⟩=∫Df¯¯¯v. (2.5)

We recall that

 ∥v∥V:=∥∇v∥L2(D), (2.6)

and

 ∥v∥W:=sup{⟨v,w⟩:∥w∥V≤1}. (2.7)

This is therefore a particular case of a linear problem with the following general variational formulation. Let denote the set of all sesquilinear forms defined on and let be the set of all antilinear functionals defined on , i.e., is the antidual of . We define the following norm on :

 ∥B∥:=sup∥v∥V≤1, ∥w∥V≤1|B(v,w)|. (2.8)

Problem: Given and , find such that

 B(u,v)=L(v),∀v∈V. (2.9)

The existence-uniqueness theory for such problems can be proven from the complex version of Lax-Milgram theorem given in Theorem 2.1. This theorem is a particular case of Theorem 2.2 proved below. To formulate this theorem and for later use, we introduce the notation for the space of all linear operators mapping the Banach space into the Banach space with its usual norm

 ∥T∥L(X,Y):=supx∈X∥Tx∥Y∥x∥X. (2.10)

Given , one has that is an anti-linear functional and hence for any , there is a such that

 B(u,v)=⟨Bu,v⟩W,V,v∈V, (2.11)

where is the anti-duality pairing between and . Therefore, is a linear operator from into and its norm is the same as that of :

 ∥B∥L(V,W)=∥B∥. (2.12)

So the operator is bounded and hence continuous. The problem (2.9) is equivalent to the equation

 Bu=L, (2.13)

set in . With this notation and remarks in hand, we can now state the complex version of the Lax-Milgram theorem.

###### Theorem 2.1

Assume that, is a sesquilinear form on such that

 |B(u,u)|≥α∥u∥2V,u∈V, (2.14)

for some . Then, is is invertible and its inverse satisfies

 ∥B−1∥L(W,V)≤1α. (2.15)

Thus, for each , the problem (2.9) has a unique solution which satisfies the a priori estimate

 ∥uL∥V≤∥L∥Wα. (2.16)

For the particular and given by the left and right side of (2.3), the ellipticity condition (2.14) holds with under the assumption

 R(a(x))≥r,x∈D, (2.17)

since the latter implies, for all ,

 |B(v,v)|≥R(B(v,v))=∫DR(a)|∇v|2≥r∥∇v∥2L2=r∥v∥2V. (2.18)

Therefore, for any we may extend the solution map of the elliptic problem (1.5) to the complex domain

 Dr:={a∈X:R(a)≥r}, (2.19)

with the uniform bound

 ∥u∥L∞(Dr,V)=sup{∥u(a)∥V:a∈Dr}≤∥f∥Wr. (2.20)

This extension is therefore defined on the open set .

The fact that this map is holomorphic immediately follows by viewing it as a chain of holomorphic maps: introducing for any the operator acting from into , we can decompose into the chain of maps

 a↦B(a)↦B(a)−1↦B(a)−1f=u(a). (2.21)

The first and third maps are continuous linear and therefore holomorphic, from into and from onto respectively. The second map is the operator inversion which is holomorphic at any invertible .

For further purposes, it is interesting to compute the Frechet complex derivative for the elliptic problem (1.5). Fix an and let be such that . Then, the solution map is also defined at . Substracting the variational formulations (2.3) for and , we find that

 ∫Da∇(u(a+h)−u(a))⋅∇v=−∫Dh∇u(a+h)⋅∇v. (2.22)

We first use this identity to obtain a Lipschitz continuity bound: by taking and taking the real part of both sides, we find that

 r∥u(a+h)−u(a)∥2V≤∥h∥X∥u(a+h)∥V∥u(a+h)−u(a)∥V≤2∥f∥Wr∥h∥X∥u(a+h)−u(a)∥V,

and therefore

 ∥u(a+h)−u(a)∥V≤C∥h∥X,C:=2∥f∥Wr2. (2.23)

We next show that can be defined as the solution to the problem

 ∫Da∇w⋅∇v=−∫Dh∇u(a)⋅∇v,v∈V, (2.24)

which is well-posed in the sense of the above Lax-Milgram theorem. Indeed, on the one hand, the dependence of on is linear and it is continuous because taking we find that

 ∥w∥V≤C∥h∥X,C:=∥u(a)∥Vr. (2.25)

On the other hand, the remainder is the solution to

 ∫Da∇g⋅∇v=∫Dh(∇u(a)−∇u(a+h))⋅∇v,v∈V, (2.26)

which by taking and using (2.23) gives the quadratic bound

 ∥g∥V≤1r∥h∥X∥u(a)−u(a+h)∥V≤C∥h∥2X,C:=2∥f∥Wr3. (2.27)

This confirms that .

### 2.2 Extensions by the Ladyzhenskaya-Babushka-Brezzi theory

Our next goal is to treat more general linear parametric problems that are not necessarily elliptic. In particular, we have in mind parabolic problems such as the heat equation, or saddle points problems such as the Stokes equations. In order to formulate this general class of problems, we suppose that and are two complex Hilbert spaces with inner products and , respectively. So, for example, for every ,