
A generalized sampling and preconditioning scheme for sparse approximation of polynomial chaos expansions

John D. Jakeman, Akil Narayan, and Tao Zhou
Abstract.

In this paper we propose an algorithm for recovering sparse orthogonal polynomial expansions using stochastic collocation. Our approach is motivated by the desire to use generalized polynomial chaos expansions (PCE) to quantify uncertainty in models subject to uncertain input parameters. The standard sampling approach for recovering sparse polynomials is to use Monte Carlo (MC) sampling of the density of orthogonality. However, MC methods result in poor function recovery when the polynomial degree is high. Here we propose a general algorithm that can be applied to any admissible weight function on a bounded domain and a wide class of exponential weight functions defined on unbounded domains. Our proposed algorithm samples with respect to the weighted equilibrium measure of the parametric domain and subsequently solves a preconditioned $\ell^1$-minimization problem, where the weights of the diagonal preconditioning matrix are given by evaluations of the Christoffel function. We present theoretical analysis to motivate the algorithm, and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest. Numerical examples are also provided that demonstrate that our proposed Christoffel Sparse Approximation algorithm leads to comparable or improved accuracy even when compared with Legendre- and Hermite-specific algorithms.

John D. Jakeman. Computer Science Research Institute, Sandia National Laboratories, 1450 Innovation Parkway, SE, Albuquerque, NM 87123
Akil Narayan. Scientific Computing and Imaging (SCI) Institute and Mathematics Department, University of Utah, 72 S Central Campus Drive, Salt Lake City, UT 84112. A. Narayan was partially supported by AFOSR FA9550-15-1-0467 and DARPA N660011524053.
Tao Zhou. Institute of Computational Mathematics, Chinese Academy of Sciences, Beijing, China. T. Zhou's work was supported by the National Natural Science Foundation of China (Award Nos. 91130003 and 11571351).

1. Introduction

Quantifying the effect of uncertain model parameters on model output is essential to building the confidence of stakeholders in the predictions of that model. When a simulation model is computationally expensive to run, building an approximation of the response of the model output to variations in the model input is often an efficient means of quantifying parametric uncertainty. In this paper we consider the polynomial approximation of a function (model) $f(\xi)$, where $\xi$ is a finite-dimensional random variable with associated probability density function $\rho(\xi)$. Specifically, we will approximate $f$ with a generalized Polynomial Chaos Expansion (PCE), which consists of a polynomial basis whose elements are orthogonal under the weight $\rho$ [21, 47].

The stochastic Galerkin [21, 47] and stochastic collocation [1, 46] methods are the two main approaches for approximating PCE coefficients. In this paper we focus on stochastic collocation because it allows the computational model to be treated as a black box. Stochastic collocation proceeds in two steps: (i) running the computational model with a set of realizations $\{\xi^{(i)}\}_{i=1}^{M}$ of the random parameters; and (ii) constructing an approximation of the corresponding model output. Pseudo-spectral projection [12, 13], sparse grid interpolation [2, 8, 25], orthogonal least interpolation [35], and least squares [32, 45] are stochastic collocation methods that have been used effectively to build polynomial approximations in many situations.

Recently, compressed sensing via $\ell^1$-minimization techniques [10, 11, 14, 15] has been shown to be an effective means of approximating PCE coefficients from a small number of function samples [6, 17, 24, 38, 50]. These methods are most effective when the number of non-zero terms in the PCE approximation of the model output is small (i.e. sparsity) or the magnitude of the PCE coefficients decays rapidly (i.e. compressibility). The efficacy of sparse polynomial approximation, however, is heavily reliant on the sampling strategy used to generate the random variable ensemble $\{\xi^{(i)}\}_{i=1}^{M}$. The most common approach is to draw samples from the probability measure of the random variables; however, the accuracy of the function approximation decreases as the polynomial degree increases. To improve sparse recovery for high-degree polynomials, the authors of [42] proposed sampling from the Chebyshev distribution and applying an appropriate preconditioning to the polynomial Vandermonde-type matrix. This method, which we will hereafter refer to as the asymptotic sampling method for bounded variables, can however only be applied to polynomials that are orthogonal with respect to densities of bounded random variables; moreover, theoretical and numerical results [49] have shown that the accuracy of the proposed procedure degrades with increasing parameter dimension.

Recently the authors of [23] developed a sampling strategy that attempts to overcome the limitations of the probabilistic sampling and Chebyshev sampling methods. Their so-called coherence optimal sampling strategy can be applied to a large class of orthogonal polynomials, and performs well for both low- and high-degree polynomials, in both low and high parameter dimensions. The scheme uses Markov Chain Monte Carlo (MCMC) sampling to draw samples from a measure that attempts to minimize the coherence parameter of the orthogonal polynomial system.

In this paper we present another general sampling strategy for accurate sparse recovery of orthogonal polynomials, which we call Christoffel Sparse Approximation (CSA). The CSA algorithm is applicable for bounded and unbounded domains, with tensor-product or more general non-tensor-product weights and domains. The algorithm is based on a similar algorithm for discrete least-squares approximation that we introduced in [34].

The essential idea of the CSA algorithm is to sample from a certain equilibrium measure that is associated with the random parameter density and state space. We use the Christoffel function to generate preconditioning weights, which are then used to solve a standard weighted $\ell^1$-minimization problem. In Section 3 we present this algorithm, the Christoffel Sparse Approximation (CSA) method, and give various formulas for the sampling density.

In Section 4 we prove that the CSA-based algorithm can successfully recover univariate sparse and compressible solutions. We show that for bounded univariate domains, we can accomplish this with an optimal number of samples, up to logarithmic factors. For unbounded univariate domains, we pay an additional penalty, requiring a number of samples that grows like $\sqrt{n}$, where $n$ is the maximum polynomial degree in the basis. Although we only present theoretical results for univariate $s$-term approximation using CSA, the numerical results presented in Section 6 demonstrate that the CSA algorithm performs comparably to or better than existing $\ell^1$-minimization algorithms in a number of practical settings, including high-dimensional multivariate settings.

Finally, we note that our results are useful outside the context of this paper: for example, the three bounds of Section 5 (Theorems 5.1.A, 5.1.B, and 5.1.C) are used in quantifying sampling count criteria for recovery when randomly sub-sampling from a tensor-product Gauss quadrature grid. This idea, using our results, is explored in [22], and builds on the idea presented in [44].

2. Background

2.1. Polynomial chaos expansions

Polynomial Chaos methods represent both the model inputs and the model output as expansions of orthonormal polynomials of a random variable $\eta$. Specifically we represent the random inputs as

(1) $\xi_i = \sum_{n=1}^{\infty} c_{i,n}\, \phi_n(\eta), \qquad i = 1, \ldots, d,$

and the model output as

(2) $f(\xi) \approx \sum_{n=1}^{N} \alpha_n\, \phi_n(\xi),$

where for the coefficients we use the vector notation $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_N)^T$. We refer to (1) and (2) as polynomial chaos expansions (PCE). The PCE basis functions $\phi_n$ are tensor products of univariate polynomials, chosen to be orthonormal with respect to the distribution of the random vector $\xi$. That is,

$\int_{\Gamma} \phi_j(\xi)\, \phi_k(\xi)\, \rho(\xi)\, \mathrm{d}\xi = \delta_{j,k},$

where $\Gamma$ is the range of the random variables.

The random variable (germ) $\eta$ of the PCE is typically related to the distribution of the input variables. For example, if a one-dimensional input variable is uniform, then $\eta$ is also chosen to be uniform on $[-1,1]$ and the $\phi_n$ are chosen to be Legendre polynomials orthonormal on that interval. For simplicity and without loss of generality, we will assume that $\eta$ has the same distribution as $\xi$, and thus we can use the two variables interchangeably (up to a linear transformation which we will ignore).

Any function $f \in L^2_\rho$ can be represented by an infinite PCE, and this expansion converges in the $L^2_\rho$ sense under relatively mild conditions on the distribution of $\xi$ [20]. However in practice an infinite PCE must be truncated to a form like (2). The most common approach is to set a degree $k$ and retain only the multivariate polynomials of total degree at most $k$. Rewriting (2) using the typical multi-dimensional index notation

(3) $f(\xi) \approx \sum_{\lambda \in \Lambda} \alpha_\lambda\, \phi_\lambda(\xi), \qquad \phi_\lambda(\xi) = \prod_{i=1}^{d} \phi_{\lambda_i}(\xi_i),$

the total degree basis of degree $k$ is given by

(4) $\Lambda_k^d = \left\{ \lambda \in \mathbb{N}_0^d \,:\, \|\lambda\|_1 \le k \right\}.$

Here we denote the space of polynomials of total degree at most $k$ by $P_k^d$. The number of terms $N$ in this total degree basis is

$N = \operatorname{card}\, \Lambda_k^d = \binom{d + k}{d}.$
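As a concrete illustration (ours, not the paper's), the index set $\Lambda_k^d$ and its cardinality can be enumerated directly; the sketch below assumes only the definitions above.

```python
from itertools import product
from math import comb

def total_degree_indices(d, k):
    """Enumerate multi-indices lambda in N_0^d with |lambda|_1 <= k."""
    return [lam for lam in product(range(k + 1), repeat=d) if sum(lam) <= k]

indices = total_degree_indices(d=3, k=4)
# The count matches the binomial formula N = card(Lambda) = C(d + k, d).
assert len(indices) == comb(3 + 4, 3)  # = 35
```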

The rate of convergence is dependent on the regularity of the response surface. If $f$ is analytic with respect to the random variables, then (2) converges exponentially with the polynomial degree in the $L^2_\rho$-sense [5].

2.2. $\ell^1$-minimization

The coefficients of a polynomial chaos expansion can be approximated effectively using $\ell^1$-minimization. Specifically, given a small set of unstructured realizations $\{\xi^{(1)}, \ldots, \xi^{(M)}\}$, with corresponding model outputs $\boldsymbol{f} = \left(f(\xi^{(1)}), \ldots, f(\xi^{(M)})\right)^T$, we would like to find a solution that satisfies

$\boldsymbol{V} \boldsymbol{\alpha} \approx \boldsymbol{f},$

where $\boldsymbol{\alpha} \in \mathbb{R}^N$ denotes the vector of PCE coefficients and $\boldsymbol{V} \in \mathbb{R}^{M \times N}$ denotes the Vandermonde-like matrix with entries $V_{in} = \phi_n(\xi^{(i)})$, $i = 1, \ldots, M$, $n = 1, \ldots, N$.

When the model is high-dimensional and computationally expensive, the number of model simulations that can be generated is typically much smaller than the number of unknown PCE coefficients, i.e. $M \ll N$. Under these conditions, finding the PCE coefficients is ill-posed and we must impose some form of regularization to obtain a unique solution. One way to regularize is to enforce sparsity in a PCE.

A polynomial chaos expansion is defined as $s$-sparse when $\|\boldsymbol{\alpha}\|_0 := \#\{n : \alpha_n \neq 0\} \le s$, i.e. the number of non-zero coefficients does not exceed $s$. Under certain conditions, $\ell^1$-minimization provides a means of identifying sparse coefficient vectors from a limited amount of simulation data. (In practice, not many simulation models will be truly sparse, but PCE are often compressible, that is, the magnitude of the coefficients decays rapidly or, alternatively, most of the PCE variance is concentrated in a few terms. Compressible vectors are well-approximated by sparse vectors, and thus the coefficients of compressible PCE can also be recovered accurately using $\ell^1$-minimization.) Specifically, $\ell^1$-minimization attempts to find the dominant PCE coefficients by solving the optimization problem

(5) $\hat{\boldsymbol{\alpha}} = \operatorname*{argmin}_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_1 \quad \text{subject to} \quad \|\boldsymbol{W} \boldsymbol{V} \boldsymbol{\alpha} - \boldsymbol{W} \boldsymbol{f}\|_2 \le \varepsilon,$

where $\boldsymbol{W}$ is a diagonal matrix with entries chosen to enhance the recovery properties of $\ell^1$-minimization, and $\varepsilon$ is a noise/tolerance parameter that allows the data to slightly deviate from the PCE. This $\ell^1$-minimization problem is often referred to as Basis Pursuit Denoising (BPDN). The problem obtained by setting $\varepsilon = 0$, to enforce interpolation, is termed Basis Pursuit.
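As a minimal illustration of how (5) can be solved in practice (our sketch, not the paper's implementation), note that BPDN is a second-order cone program; here it is posed with CVXPY for synthetic $\boldsymbol{V}$, $\boldsymbol{W}$, and data.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
M, N = 30, 100
V = rng.standard_normal((M, N))      # stand-in for the Vandermonde-like matrix
W = np.eye(M)                        # diagonal preconditioner (identity here)
alpha_true = np.zeros(N)
alpha_true[[3, 17, 42]] = [1.0, -2.0, 0.5]   # a 3-sparse coefficient vector
f = V @ alpha_true
eps = 1e-6                           # noise tolerance; eps = 0 is Basis Pursuit

alpha = cp.Variable(N)
constraints = [cp.norm(W @ (V @ alpha - f), 2) <= eps]
cp.Problem(cp.Minimize(cp.norm(alpha, 1)), constraints).solve()
print("recovered support:", np.flatnonzero(np.abs(alpha.value) > 1e-4))
```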

2.3. Near-optimal sparse recovery

For simplicity we first consider the situation when the function $f$ is assumed to be exactly $s$-sparse in the PCE basis $\{\phi_n\}_{n=1}^{N}$. In this case we take the noise tolerance $\varepsilon$ in (5) to be zero, so that we are enforcing exact interpolation. The intent of the computational optimization problem (5) is to recover a sparse representation. Such a representation could be recovered by solving the more difficult analytical problem

(6) $\hat{\boldsymbol{\alpha}} = \operatorname*{argmin}_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_0 \quad \text{subject to} \quad \boldsymbol{W} \boldsymbol{V} \boldsymbol{\alpha} = \boldsymbol{W} \boldsymbol{f}.$

This NP-hard problem is not computationally feasible, so the above optimization is usually not performed. However, if the matrix $\boldsymbol{W}\boldsymbol{V}$ satisfies the restricted isometry property (RIP) [10, 9], then the desired sparse solution coincides with the computable minimal $\ell^1$-norm solution from (5). One is then led to wonder how systems satisfying the RIP can be constructed. The RIP essentially states that $\boldsymbol{W}\boldsymbol{V}$ has a limited ability to elongate or contract the $\ell^2$ norm of $s$-sparse vectors. This can be roughly interpreted as requiring (i) that the rows of $\boldsymbol{W}\boldsymbol{V}$ are approximately orthonormal, and (ii) that the entries of $\boldsymbol{W}\boldsymbol{V}$ are not too concentrated in any subset of entries and instead that the total mass is roughly equidistributed. One expects that property (i) can be satisfied if one constructs $\boldsymbol{V}$ by choosing the $\phi_n$ as an orthonormal basis and sampling parameter values (rows) according to the measure of orthogonality. A quantifiable way to enforce (ii) is via the concept of coherence. It was shown in [40] that if the $\phi_n$ are an orthonormal basis, then if the number of rows $M$ of the system satisfies

(7) $M \ge C\, L^2\, s\, \log^3(s)\, \log(N), \qquad L := \sup_{n \le N} \|\phi_n\|_{\infty},$

then the system satisfies the RIP, and thus the computable solution from (5) (with $\varepsilon = 0$) coincides with the $s$-sparse solution. Above, the infinity norm is taken over the support of the sampling measure. The parameter $L$ is called the mutual coherence, and is a bound on the entry-wise maximum of the matrix $\boldsymbol{V}$. Up to logarithmic factors this sample count is the best one can hope for, and so constructing matrices that attain the RIP is an important task.

Unfortunately, for almost all PCE expansions, the mutual coherence is not $O(1)$, and usually grows as the maximum polynomial degree grows. (The notable exception is when the PCE basis corresponds to the Chebyshev polynomials; in this case the basis elements have $L^\infty$ norms that are uniformly bounded with respect to the polynomial degree.) However, it was noted in [42] that for a particular PCE basis, the Legendre polynomial basis, the functions admit an envelope function $g$ such that $|\phi_n(x)| \le C\, g(x)$ for all $n$, where $C$ is a normalization constant. The idea presented in [42] is then to multiply the basis elements by $1/g$ so that they become uniformly bounded, with coherence parameter $L = O(1)$. In other words, solve the problem

(8) $\hat{\boldsymbol{\alpha}} = \operatorname*{argmin}_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_1 \quad \text{subject to} \quad \|\boldsymbol{W} \boldsymbol{V} \boldsymbol{\alpha} - \boldsymbol{W} \boldsymbol{f}\|_2 \le \varepsilon,$

where $\boldsymbol{W}$ is a diagonal matrix with entries given by the (inverse) envelope function, $W_{ii} = 1/g(\xi^{(i)})$. This motivates the matrix $\boldsymbol{W}$ introduced in (5). In order to retain orthonormality of the rows of $\boldsymbol{W}\boldsymbol{V}$, a Monte Carlo sampling procedure cannot sample from the original orthogonality density $\rho$, but must instead sample from the biased density $v \propto g^2 \rho$.

With $N$ the size of the dictionary, we exploit the idea above in this paper using the (inverse) envelope function

(9) $w_N(\xi) := \frac{N}{\sum_{n=1}^{N} \phi_n^2(\xi)}.$

When the basis elements correspond to a PCE, the function $w_N$ is known as the (scaled) Christoffel function from the theory of orthogonal polynomials. Note that in the context of the problem (8), the weight defined by (9) means that the preconditioner is simply formed using inverse row norms of $\boldsymbol{V}$.
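In code, evaluating (9) requires nothing beyond the rows of $\boldsymbol{V}$. A sketch for a univariate dictionary of Legendre polynomials follows; the factor $\sqrt{2j+1}$ orthonormalizes numpy's Legendre polynomials with respect to the uniform probability density $\rho = 1/2$ on $[-1,1]$, a convention we adopt for this example.

```python
import numpy as np
from numpy.polynomial.legendre import legvander

def christoffel_preconditioner(x, n):
    """Evaluate the scaled Christoffel function (9) for the orthonormal
    Legendre dictionary {phi_0, ..., phi_n} at the sample points x."""
    V = legvander(x, n)                            # rows: [P_0(x_i), ..., P_n(x_i)]
    V = V * np.sqrt(2.0 * np.arange(n + 1) + 1.0)  # orthonormalize: phi_j = sqrt(2j+1) P_j
    w = (n + 1) / np.sum(V**2, axis=1)             # w_N(x_i) = N / sum_j phi_j(x_i)^2
    W = np.diag(np.sqrt(w))                        # diagonal preconditioner for (5)
    return V, W

V, W = christoffel_preconditioner(np.linspace(-1, 1, 5), n=10)
```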

With the weight matrix specified, we need only to determine the biased density $v$. If we use the index set $\Lambda$ with $N = \operatorname{card}\Lambda$, then under mild conditions on $\rho$ and $\Gamma$, there is a unique probability measure $\mu^\star$ such that $w_N(\xi)\,\mathrm{d}\mu^\star(\xi) \approx \rho(\xi)\,\mathrm{d}\xi$, where $\approx$ becomes equality, interpreted in the appropriate sense, as $N \to \infty$. This measure $\mu^\star$ is the weighted pluripotential equilibrium measure [4].

This essentially completes a high-level description of the CSA algorithm: sample from an equilibrium measure, and precondition/weight with the Christoffel function. The detailed algorithm is given in Section 3, where more precise formulas regarding sampling densities are shown. Section 4 provides convergence results for the univariate case – the multivariate case requires bounds on values of Christoffel-function-weighted polynomials, which we are currently unable to provide. Nevertheless, our numerical results in Section 6 show that the CSA algorithm is competitive in various situations, both with a standard unweighted Monte Carlo approach (sampling from the density $\rho$), and with other approaches in the literature that have been specially tailored for certain distributions.

Finally, while much of our discussion above concerns noiseless recovery of exactly sparse functions, the efficacy of all the procedures above essentially generalizes to “approximately sparse” (or compressible) signals in the presence of noisy data. Indeed, Theorem 4.1 presented in Section 4 provides bounds on the error of the $s$-term approximation recovered by the CSA algorithm in the presence of noisy data.

3. Christoffel sparse approximation

The Christoffel sparse approximation (CSA) algorithm solves the preconditioned BPDN problem (5) to accurately recover sparse orthogonal polynomial expansions from limited data. Given $\xi$ and its distribution, along with an index set $\Lambda$, the CSA method consists of five steps which are outlined here and summarized in Algorithm 1:

  1. sample $\{\xi^{(i)}\}_{i=1}^{M}$ iid with respect to the probability density $v$ of an equilibrium measure, which depends on the orthogonality weight $\rho$. When $\xi$ is a random variable with unbounded state space $\Gamma$, then $v$ depends on $n$, the maximum polynomial degree of the index set $\Lambda$ defining the dictionary. In this case we write $v_n$.

  2. evaluate the function $f$ at the selected samples

  3. form the Vandermonde-like matrix $\boldsymbol{V}$ with entries $V_{i\lambda} = \phi_\lambda(\xi^{(i)})$

  4. form the diagonal preconditioning matrix $\boldsymbol{W}$ using the values $w_N(\xi^{(i)})$, which are evaluations of the (scaled) Christoffel function from the $\rho$-orthogonal polynomial family $\{\phi_\lambda\}$, with $W_{ii} = \sqrt{w_N(\xi^{(i)})}$. (See (9).)

  5. solve the preconditioned $\ell^1$-minimization problem (5) to approximate the polynomial coefficients

input : Weight/density function $\rho$ with associated orthonormal polynomial family $\{\phi_\lambda\}_{\lambda \in \Lambda}$, index set $\Lambda$, function $f$
output : Expansion coefficients $\hat{\boldsymbol{\alpha}}$ such that $f \approx \sum_{\lambda \in \Lambda} \hat{\alpha}_\lambda \phi_\lambda$
1 Generate $M$ iid samples $\{\xi^{(i)}\}_{i=1}^{M}$ from the equilibrium density $v_n$;
2 Assemble $\boldsymbol{f} = \left(f(\xi^{(1)}), \ldots, f(\xi^{(M)})\right)^T$;
3 Form matrix $\boldsymbol{V}$ with entries $V_{i\lambda} = \phi_\lambda(\xi^{(i)})$;
4 Compute weights $\boldsymbol{W}$ with entries $W_{ii} = \sqrt{w_N(\xi^{(i)})}$;
5 Compute $\hat{\boldsymbol{\alpha}} = \operatorname{argmin}\left\{ \|\boldsymbol{\alpha}\|_1 \,:\, \|\boldsymbol{W}\boldsymbol{V}\boldsymbol{\alpha} - \boldsymbol{W}\boldsymbol{f}\|_2 \le \varepsilon \right\}$;
Algorithm 1 Christoffel Sparse Approximation (CSA)
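To make the five steps concrete, the following self-contained sketch runs the whole CSA pipeline for a univariate Legendre dictionary, where the equilibrium density of Step 1 is the Chebyshev density (10) discussed in Section 3.1.1 below; the test function and solver choice are our own illustrative assumptions.

```python
import numpy as np
import cvxpy as cp
from numpy.polynomial.legendre import legvander

rng = np.random.default_rng(1)
n, M = 40, 60                      # maximum degree and number of samples
N = n + 1

# Step 1: iid samples from the (arcsine/Chebyshev) equilibrium density.
x = 2.0 * rng.beta(0.5, 0.5, size=M) - 1.0

# Step 2: evaluate f; here f is itself a sparse Legendre expansion, for testing.
phi = lambda t: legvander(t, n) * np.sqrt(2.0 * np.arange(N) + 1.0)
alpha_true = np.zeros(N)
alpha_true[[2, 7, 23]] = [1.0, -0.5, 0.25]
f = phi(x) @ alpha_true

# Steps 3-4: Vandermonde-like matrix and Christoffel preconditioner (9).
V = phi(x)
W = np.diag(np.sqrt(N / np.sum(V**2, axis=1)))

# Step 5: preconditioned l1 minimization (5).
a = cp.Variable(N)
cp.Problem(cp.Minimize(cp.norm(a, 1)),
           [cp.norm(W @ (V @ a - f), 2) <= 1e-6]).solve()
print("max coefficient error:", np.max(np.abs(a.value - alpha_true)))
```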

The sampling function $v_n$ is a density that depends on the density $\rho$ of $\xi$ and on $\Gamma$. Owing to the discussion at the end of Section 2.3, we require $w_N\, v_n \approx \rho$. As $N \to \infty$, one has equality (interpreted appropriately) with a unique limit given by the weighted pluripotential equilibrium measure. Said another way, the asymptotic behavior of the Christoffel function with respect to the orthogonality density $\rho$ turns out to be governed by the weighted pluripotential equilibrium measure [4].

When $\Gamma$ is bounded, our sampling density is the density associated to the unweighted pluripotential equilibrium measure on the domain $\Gamma$, regardless of $\rho$. When $\Gamma$ is unbounded and $\rho$ is an exponential weight, then $v_n$ is a scaled version of the density associated to the $\sqrt{\rho}$-weighted pluripotential equilibrium measure on the domain $\Gamma$; the scaling factor depends on $n$. We detail these sampling techniques for various canonical weights and domains in the coming discussion. The main purpose of this section is to describe the sampling density for the CSA algorithm; once this is completed, the remainder of the algorithm follows a standard preconditioned recovery procedure in compressive sampling.

In this paper we use the equilibrium measure as a tool for performing simulations, and so omit details concerning its definition or properties. Standard references for pluripotential theory are [27] and Appendix B of [43]. Much of the material presented here defining the sampling density is also in [34].

Given a domain $\Gamma \subseteq \mathbb{R}^d$, let $Q$ be a certain function defining a weight function $w = e^{-Q}$ on $\Gamma$. This new weight function in general can be distinct from the probability density $\rho$, but in the cases we describe below they are related. The weight function is associated with the $w$-weighted pluripotential equilibrium measure on $\Gamma$, denoted by $\mu_{\Gamma, w}$, which is a unique probability measure. When $w \equiv 1$, we use the notation $\mu_\Gamma = \mu_{\Gamma, 1}$. This measure, loosely speaking, carries information about the extremal behavior of weighted polynomials on $\Gamma$. Particular examples of this measure are given in the following sections. On bounded domains, we will identify $w \equiv 1$, and define our sampling density to correspond directly to $\mu_\Gamma$. On unbounded domains with exponential weights, we will identify $w = \sqrt{\rho}$, and define our sampling density to be a dilated version of $\mu_{\Gamma, w}$.

As noted above, the particular sampling strategy (via the weighted equilibrium measure) differs on bounded versus unbounded domains, so we discuss them individually here. The main difference between the bounded and unbounded cases is that for the bounded case, $v$ does not depend on $\Lambda$, whereas for the unbounded case it does. The state of knowledge for the univariate case is essentially complete, whereas much is still unknown about the general weighted multivariate case.

The purpose of the following sections is to describe the sampling density used in step 1 above. The remaining steps have straightforward and identical implementations in all cases below: perform the optimization (5) with the weight matrix entries given by (9).

3.1. Univariate sampling

In the univariate case, we have an essentially complete description of the various sampling measures. This is a direct result of historically successful analysis and characterization of the weighted potential equilibrium measure. For the bounded domain case, we sample from the (unweighted) potential equilibrium measure, and for the unbounded case we sample from expanded versions of the weighted potential equilibrium measure.

3.1.1. Bounded intervals

When $\xi$ is scalar and takes values on the finite interval $[-1,1]$, then we sample from the unweighted potential equilibrium measure, which is the arcsine measure having the “Chebyshev” density

(10) $v(x) = \frac{1}{\pi \sqrt{1 - x^2}}, \qquad x \in (-1, 1).$

Sampling from this density for the purpose of compressive sampling is not new [42]. This density corresponds to a standard symmetric Beta distribution (mapped to $[-1,1]$), and is readily sampled using available computational packages. Note that the sampling density here is independent of the weight function $\rho$ defining the PCE basis.
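Concretely, one Beta draw per sample suffices; a two-line numpy sketch:

```python
import numpy as np

rng = np.random.default_rng()
# Beta(1/2, 1/2) on [0, 1], mapped to [-1, 1], has density 1/(pi*sqrt(1 - x^2)).
samples = 2.0 * rng.beta(0.5, 0.5, size=1000) - 1.0
```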

3.1.2. Exponential densities on $\mathbb{R}$

Now we consider the case that $\xi$ is unbounded with $\Gamma = \mathbb{R}$. For a scalar $\alpha > 1$, we assume that $\rho$ is an exponential probability density weight of the form

$\rho(x) = \exp\left(-|x|^{\alpha}\right), \qquad x \in \mathbb{R}.$

Note that we need a normalization constant to make $\rho$ a probability density, but the constant does not affect any of the discussion below so we omit it. Here, we take $w = \sqrt{\rho}$. Our assumption $\alpha > 1$ is a standard assumption in the theory of orthogonal polynomials, and relates to conditions ensuring $L^2_\rho$-completeness of polynomials. We use $n$ to denote the maximum polynomial degree of the index set $\Lambda$.

Associated to the parameter $\alpha$ (more pedantically, associated to $w = \sqrt{\rho}$), we define the following constants:

(11) $a_n^W := \gamma_\alpha\, n^{1/\alpha}, \qquad n \ge 1,$

where $\gamma_\alpha$ is an explicit constant depending only on $\alpha$.

These are the Mhaskar–Rakhmanov–Saff numbers associated with the weight [31, 39]. We add the superscript $W$ to indicate that these numbers are associated with weights having support on the whole real line $\mathbb{R}$. These numbers delineate a compact interval on which a weighted polynomial achieves its $L^\infty$ supremum. We define the following intervals:

(12) $I_n^W := \left[-a_n^W,\ a_n^W\right].$

The probability density function of the $w$-weighted equilibrium measure is given by

(13) $v^{W_\alpha}(x) := \frac{\alpha}{\pi} \int_{|x|}^{1} \frac{t^{\alpha - 1}}{\sqrt{t^2 - x^2}}\, \mathrm{d}t, \qquad x \in [-1, 1],$

where the density is taken with respect to Lebesgue measure on $[-1,1]$. See, e.g., [43]; a summary table of explicit weights for various $\alpha$ is also given in [33].

Given this density, and the index set $\Lambda$ which forms our compressive sampling dictionary, the CSA sampling density is formed by linearly mapping the density of $v^{W_\alpha}$ to $I_n^W$:

(14) $v_n(x) := \frac{1}{a_n^W}\, v^{W_\alpha}\!\left(\frac{x}{a_n^W}\right), \qquad x \in I_n^W.$

We note that in the particularly important case of $\alpha = 2$, corresponding to a PCE basis of Hermite polynomials, we have $a_n^W = \sqrt{2n}$, and

$v_n(x) = \frac{1}{\pi n}\, \sqrt{2n - x^2}, \qquad x \in \left[-\sqrt{2n},\ \sqrt{2n}\right],$

which is another standard, symmetric Beta distribution (a semicircle density), and so easily sampled.

While (13) seems relatively complicated, in fact this density behaves essentially like the semicircle density above. I.e., for any $\alpha > 1$, there are positive constants $c_1, c_2$ depending only on $\alpha$ such that

$c_1\, \sqrt{1 - x^2} \;\le\; v^{W_\alpha}(x) \;\le\; c_2\, \sqrt{1 - x^2}, \qquad x \in [-1, 1].$

See Theorem 1.10 of [30].
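For the Hermite case $\alpha = 2$ the recipe is explicit: the mapped density (14) is the semicircle on $[-\sqrt{2n}, \sqrt{2n}]$, i.e., a Beta(3/2, 3/2) variable rescaled. A sketch using the identification $a_n^W = \sqrt{2n}$ from above:

```python
import numpy as np

def sample_hermite_equilibrium(n, M, rng=np.random.default_rng()):
    """Draw M iid samples from the expanded semicircle density (14), alpha = 2.

    Beta(3/2, 3/2) mapped to [-1, 1] has density (2/pi) sqrt(1 - x^2);
    dilation by a_n = sqrt(2n) then gives the density on [-sqrt(2n), sqrt(2n)].
    """
    s = 2.0 * rng.beta(1.5, 1.5, size=M) - 1.0
    return np.sqrt(2.0 * n) * s
```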

3.1.3. Exponential densities on $[0, \infty)$

In this section we consider the case of $\Gamma = [0, \infty)$. We assume that $\rho$ is a one-sided exponential weight of the form

$\rho(x) = \exp\left(-x^{\alpha}\right), \qquad x \in [0, \infty).$

Again we have $w = \sqrt{\rho}$. We assume that $\alpha > \tfrac{1}{2}$, and again take $n$ to be the maximum polynomial degree of the index set $\Lambda$.

Associated to the parameter $\alpha$, we define the following constants:

(15) $a_n^K := \gamma_\alpha^K\, n^{1/\alpha}, \qquad n \ge 1,$

where $\gamma_\alpha^K$ is an explicit constant depending only on $\alpha$. These are the Mhaskar–Rakhmanov–Saff numbers associated with the weight $\sqrt{\rho}$ on $[0, \infty)$.

The superscript $K$ indicates that these numbers are associated with weights having support on the half real line. We define the following intervals:

(16) $I_n^K := \left[0,\ a_n^K\right].$

Consider the probability density function of the weighted equilibrium measure, given by

(17) $v^{K_\alpha}(x) := \frac{\alpha}{\pi \sqrt{x}} \int_{x}^{1} \frac{s^{\alpha - 1}}{\sqrt{s - x}}\, \mathrm{d}s, \qquad x \in (0, 1].$

Given this density, and the index set $\Lambda$ which forms our compressive sampling dictionary, we form the CSA sampling density by mapping the density of $v^{K_\alpha}$ to $I_n^K$:

(18) $v_n^K(x) := \frac{1}{a_n^K}\, v^{K_\alpha}\!\left(\frac{x}{a_n^K}\right), \qquad x \in I_n^K.$

We note in particular that for the case of $\alpha = 1$, corresponding to a PCE basis of Laguerre polynomials, we have $a_n^K = 4n$, and:

(19) $v_n^K(x) = \frac{1}{2\pi n}\, \sqrt{\frac{4n - x}{x}}, \qquad x \in (0, 4n].$

This is just an asymmetric Beta distribution on $[0, 4n]$.

The formula (17) looks cumbersome, but this measure essentially behaves like the whole-line measure (13). In particular, we have

$v^{K_\alpha}(x) = \frac{1}{\sqrt{x}}\, v^{W_{2\alpha}}\!\left(\sqrt{x}\right), \qquad x \in (0, 1].$

See [28]. Thus, we can actually sample from the measure $v^{K_\alpha}$ by instead transforming samples from $v^{W_{2\alpha}}$. Given $\alpha$, let $Y$ be a random variable whose distribution is $v^{W_{2\alpha}}$. Then $Y^2$ has distribution $v^{K_\alpha}$. Therefore, sampling from the half-line equilibrium measures can be reduced to the problem of sampling from the whole-line measure (13).
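For $\alpha = 1$ (the Laguerre case) this squaring trick is a two-liner: the whole-line measure with exponent $2\alpha = 2$ is the semicircle, and dilating by $a_n^K = 4n$ produces draws from (19). A sketch under those identifications:

```python
import numpy as np

def sample_laguerre_equilibrium(n, M, rng=np.random.default_rng()):
    """Samples from the expanded half-line density (19), a Marchenko-Pastur-type law.

    Y ~ semicircle on [-1, 1] (the alpha = 2 whole-line measure), so Y^2 follows
    the alpha = 1 half-line measure on [0, 1]; dilating by a_n = 4n gives (19).
    """
    y = 2.0 * rng.beta(1.5, 1.5, size=M) - 1.0
    return 4.0 * n * y**2
```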

3.2. Multivariate sampling

In general very little is known (other than existence/uniqueness) about the (un)weighted pluripotential equilibrium measure on sets in $\mathbb{R}^d$ with $d > 1$. We detail some special cases below. We note in particular that essentially nothing is known about the weighted case, and so our descriptions of the sampling density on unbounded domains with exponential weights are conjectures.

3.2.1. The bounded domain $[-1,1]^d$

Let $\xi$ take values on the domain $\Gamma = [-1,1]^d$, for any weight function $\rho$. In this case, the CSA sampling density corresponds to the unweighted equilibrium measure on $[-1,1]^d$; the density of this measure is given by

$v(x) = \prod_{i=1}^{d} \frac{1}{\pi \sqrt{1 - x_i^2}},$

which is a tensor-product Chebyshev density on $[-1,1]^d$.

3.2.2. Convex, symmetric bounded domains

If $\xi$ takes values on a compact set $\Gamma \subset \mathbb{R}^d$, our CSA sampling density corresponds to the unweighted equilibrium measure on $\Gamma$. There is little that can be said about this measure in general, but the following special cases are known:

  • [3]: if $\Gamma$ is convex, then the equilibrium measure Lebesgue density exists and is bounded above and below relative to $\operatorname{dist}(x, \partial\Gamma)^{-1/2}$, and is thus “Chebyshev-like”. (We use $\operatorname{dist}$ to denote Euclidean distance between sets, with $\partial\Gamma$ the boundary of $\Gamma$.)

  • [3]: if $\Gamma$ is the unit ball in $\mathbb{R}^d$, then the equilibrium measure density is given by

    $v(x) = \frac{c_d}{\sqrt{1 - \|x\|^2}},$

    where $c_d$ is a normalization constant. Note that sampling from this density is relatively straightforward (see the sketch following this list): Let $X$ be a standard normal random variable in $\mathbb{R}^d$, and let $B$ be a univariate Beta random variable on $[0,1]$ with shape parameters $d/2$ and $1/2$. Then $\sqrt{B}\, X / \|X\|_2$ has the desired distribution on the unit ball in $\mathbb{R}^d$. (To see this, note that the radius needs to have density proportional to $r^{d-1}(1 - r^2)^{-1/2}$. Some manipulation on this distribution shows that the radius has the distribution of $\sqrt{B}$. The factor $X/\|X\|_2$ is a directional factor, sampling uniformly on the boundary of the unit ball.)

  • [48]: if $\Gamma$ is the unit simplex in $\mathbb{R}^d$, then $v(x) = c_d \left( x_1 \cdots x_d \left(1 - \sum_{i=1}^{d} x_i\right) \right)^{-1/2}$, with $c_d$ a normalizing constant. This density may also be sampled (see the sketch following this list): it is, in fact, the density of the first $d$ coordinates of a $(d+1)$-dimensional Dirichlet distribution with all parameters equal to $1/2$, whose $(d+1)$'st coordinate equals $1 - \sum_{i=1}^{d} x_i$. Therefore, let $Z$ be the previously mentioned $(d+1)$-dimensional Dirichlet random vector. Form $X$ by truncating the last entry in this vector; then $X$ is a $d$-dimensional random variable with density $v$.
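Both recipes above translate directly into numpy (this is the sketch referenced in the two preceding bullets); the Beta and Dirichlet parameters are the ones stated there.

```python
import numpy as np

rng = np.random.default_rng()

def sample_ball_equilibrium(d, M):
    """Equilibrium measure on the unit ball: density prop. to (1 - |x|^2)^(-1/2)."""
    X = rng.standard_normal((M, d))
    direction = X / np.linalg.norm(X, axis=1, keepdims=True)  # uniform on the sphere
    B = rng.beta(d / 2.0, 0.5, size=(M, 1))                   # radius^2 ~ Beta(d/2, 1/2)
    return np.sqrt(B) * direction

def sample_simplex_equilibrium(d, M):
    """Equilibrium measure on the unit simplex via a truncated Dirichlet(1/2,...,1/2)."""
    Z = rng.dirichlet(np.full(d + 1, 0.5), size=M)            # (d+1)-variate Dirichlet
    return Z[:, :d]                                           # drop the last coordinate
```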

To the best of our knowledge, these cases are essentially a complete description of the current state of knowledge about the unweighted equilibrium measure on domains in .

3.2.3. The domain $\mathbb{R}^d$ with Gaussian density

Let $\Gamma = \mathbb{R}^d$ with Gaussian weight $\rho(x) = c \exp\left(-\|x\|^2\right)$, with $c$ a normalizing constant. In [34], we prescribe sampling according to the density

$v(x) = C_d \left(1 - \frac{\|x\|^2}{2}\right)^{d/2}, \qquad \|x\| \le \sqrt{2},$

with $C_d$ a normalization constant. We conjectured in [34] that this is the weighted equilibrium measure associated to this choice of $\rho$. Like the univariate unbounded case with $\alpha = 2$, we expand the samples by the square root of the maximum polynomial degree in the index set $\Lambda$. The following is a concrete way to sample from this expanded density:

  1. Compute $n$, the maximum polynomial degree in the index set $\Lambda$. For the total degree space of order $k$, this is equal to $k$.

  2. Generate a vector $X \in \mathbb{R}^d$ of independent normally distributed random variables.

  3. Draw a scalar sample $B$ from the Beta distribution on $[0,1]$, with distribution parameters $d/2$ and $d/2 + 1$.

  4. Set $\xi = \sqrt{2n}\, \sqrt{B}\, \frac{X}{\|X\|_2}$.

The above procedure generates samples on the Euclidean ball of radius $\sqrt{2n}$ in $\mathbb{R}^d$. We emphasize that our methodology samples from a density that is only a conjecture for the correct equilibrium measure.
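The four steps above, in code. The Beta parameters $(d/2,\ d/2 + 1)$ and the radius $\sqrt{2n}$ are our reading of the construction, chosen to be consistent with the conjectured density above; since that density is itself only a conjecture, this sampler should be treated accordingly.

```python
import numpy as np

def sample_gaussian_equilibrium(d, n, M, rng=np.random.default_rng()):
    """Conjectured (expanded) equilibrium samples for the Gaussian density on R^d.

    Direction: uniform on the unit sphere.  Radius: sqrt(2n) * sqrt(B) with
    B ~ Beta(d/2, d/2 + 1), which matches a density ~ (1 - |y|^2)^(d/2) on the
    unit ball before dilation.  (Parameter choices assumed, not proven.)
    """
    X = rng.standard_normal((M, d))
    direction = X / np.linalg.norm(X, axis=1, keepdims=True)
    B = rng.beta(d / 2.0, d / 2.0 + 1.0, size=(M, 1))
    return np.sqrt(2.0 * n) * np.sqrt(B) * direction
```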

3.2.4. The domain $[0, \infty)^d$ with exponential density

Let $\xi$ take values on $\Gamma = [0, \infty)^d$ with associated probability density $\rho(x) = c \exp\left(-\sum_{i=1}^{d} x_i\right)$, where $c$ is a normalizing constant. Here we sample from the density function

$v(x) = C_d \left(\prod_{i=1}^{d} x_i\right)^{-1/2} \left(1 - \sum_{i=1}^{d} x_i\right)^{d/2},$

supported on the unit simplex, with $C_d$ a normalization constant. As we conjectured in [34], this is the equilibrium measure associated to this choice of $\rho$. We expand the samples by the maximum polynomial degree $n$ in the index set $\Lambda$ (like the unbounded univariate case). The following is a concrete way to sample from this expanded density:

  1. Compute $n$, the maximum polynomial degree in the index set $\Lambda$. For the total degree space of order $k$, this is equal to $k$.

  2. Generate a $(d+1)$-variate Dirichlet random variable $Z$ whose parameters match the density above (see the sketch below for one consistent choice).

  3. Truncate the last ($(d+1)$'th) entry of $Z$ to obtain $X$.

  4. Set $\xi = n\, X$.

The above procedure generates samples on the set of points in $[0, \infty)^d$ whose $\ell^1$ norm is less than or equal to $n$.
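A sketch of the steps above. The Dirichlet parameters are not fully legible in our source, so the choice below — all entries $1/2$ with final entry $d/2 + 1$ — is an assumption, made so that $d = 1$ reduces to the Beta(1/2, 3/2) profile of the Laguerre density (19) up to dilation.

```python
import numpy as np

def sample_exponential_equilibrium(d, n, M, rng=np.random.default_rng()):
    """Conjectured (expanded) equilibrium samples for exp(-sum(x)) on [0, inf)^d.

    Dirichlet parameters (1/2, ..., 1/2, d/2 + 1) are an assumption; for d = 1
    they reproduce the Beta(1/2, 3/2) shape of the Laguerre case (19).
    """
    params = np.concatenate([np.full(d, 0.5), [d / 2.0 + 1.0]])
    Z = rng.dirichlet(params, size=M)
    return n * Z[:, :d]      # truncate the last entry, then expand by n
```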

4. Analysis for univariate systems

Our analysis in this section assumes that $\xi$ is a scalar. (That is, we consider the one-dimensional case here.) We show that a CSA-based algorithm can successfully recover sparse and compressible solutions. On bounded domains, we can accomplish this with an optimal number of samples, up to logarithmic factors. For unbounded domains, we pay an additional penalty, requiring a number of samples that grows like $\sqrt{n}$. To establish these results, we use the analysis for bounded orthonormal systems presented in [40]. Using this analysis, the unbounded penalty of $\sqrt{n}$ is sharp. The CSA method also introduces some error terms stemming from the fact that the PCE basis is only approximately orthonormal under our sampling strategy; these error terms are similar to those obtained for least-squares in [34].

With $\xi$ a scalar, we consider approximation with a dictionary of PCE basis elements up to degree $n$, corresponding to a total of $N = n+1$ basis elements. To be explicit, in this section we assume

(20) $\Lambda = \{0, 1, \ldots, n\}, \qquad N = n + 1.$

Recall that in the unbounded case our sampling depends on $n$. When the number of dominant coefficients is $s$, we will be concerned with recovery of these $s$ dominant coefficients using $M$ samples. We use the CSA procedure detailed at the beginning of Section 3 for sampling, which proposes the following sampling densities for $\xi$:

  1. (CSA-a) When $\xi$ has a Beta distribution with shape parameters $\alpha + 1$ and $\beta + 1$ (corresponding to Jacobi polynomial parameters $\alpha$ and $\beta$), we sample from the Chebyshev density (10). This corresponds to $n$-independent sampling according to $v$ defined in (10), with $\Gamma = [-1, 1]$.

  2. (CSA-b) When $\xi$ is a two-sided exponential random variable ($\Gamma = \mathbb{R}$) with density $\rho(x) = \exp(-|x|^{\alpha})$, for $\alpha > 1$, we sample from the expanded equilibrium measure whose density is $v_n$ with support $I_n^W$, defined in (14) and (12), respectively.

  3. (CSA-c) When $\xi$ is a one-sided exponential random variable ($\Gamma = [0, \infty)$) with density $\rho(x) = \exp(-x^{\alpha})$, for $\alpha > 1/2$, we sample from the expanded equilibrium measure whose density is $v_n^K$ with support $I_n^K$, defined in (18) and (16), respectively.

In any of the three cases above, our convergence statement requires definition of an additional matrix, which is a Gramian matrix corresponding to the Christoffel-function-weighted PCE basis elements:

(21) $(\boldsymbol{G})_{j,k} := \int_{\Gamma} w_N(x)\, \phi_j(x)\, \phi_k(x)\, v_n(x)\, \mathrm{d}x, \qquad 0 \le j, k \le n.$

Note that the matrix $\boldsymbol{G}$ defined above is positive-definite, and any fixed entry of this matrix converges to the corresponding entry in the identity matrix as $n \to \infty$ [34]. Our result below uses an induced norm for matrices, $\|\cdot\|$, which is the maximum vector norm over columns of the matrix. For the symmetric positive-definite matrix $\boldsymbol{G}$, we use $\boldsymbol{G}^{1/2}$ to denote its unique symmetric positive-definite square root.

Theorem 4.1.

Suppose that $M$ sampling points $\{\xi^{(i)}\}_{i=1}^{M}$ are drawn iid according to the equilibrium measure density $v_n$ associated with the probability density $\rho$, and consider the matrix $\boldsymbol{V}$ with entries $V_{ij} = \phi_j(\xi^{(i)})$ and the diagonal matrix $\boldsymbol{W}$ with entries given by (9). Assume that the number of samples satisfies

(22) $M \ge C\, \left\|\boldsymbol{G}^{-1/2}\right\|^2 L\, s\, \log^3(s)\, \log(N),$

where $\boldsymbol{G}$ is defined in (21) and the factor $L$ has the following behavior:

  1. (CSA-a) There is a constant $C(\alpha, \beta)$ such that $L \le C(\alpha, \beta)$, uniformly in $n$.

  2. (CSA-b) There is a constant $C(\alpha)$ such that $L \le C(\alpha)\, \sqrt{n}$, uniformly in $n$.

  3. (CSA-c) There is a constant $C(\alpha)$ such that $L \le C(\alpha)\, \sqrt{n}$, uniformly in $n$.

Then with probability exceeding $1 - N^{-\gamma \log^3(s)}$, for a universal constant $\gamma > 0$, the following holds for all polynomials $f = \sum_{j=0}^{n} \alpha_j \phi_j$. Suppose that noisy sample values $f_i^\delta = f(\xi^{(i)}) + \delta_i$ are observed, with $\frac{1}{\sqrt{M}}\|\boldsymbol{W}\boldsymbol{\delta}\|_2 \le \eta$. Then the coefficient vector $\boldsymbol{G}^{1/2}\boldsymbol{\alpha}$ is recoverable to within a factor of its best $s$-term approximation error and to a factor of the noise level by solving the inequality-constrained $\ell^1$-minimization problem

(23) $\hat{\boldsymbol{z}} = \operatorname*{argmin}_{\boldsymbol{z}} \|\boldsymbol{z}\|_1 \quad \text{subject to} \quad \frac{1}{\sqrt{M}}\left\| \boldsymbol{W}\boldsymbol{V}\boldsymbol{G}^{-1/2}\boldsymbol{z} - \boldsymbol{W}\boldsymbol{f}^\delta \right\|_2 \le \eta.$

The error between $\boldsymbol{G}^{1/2}\boldsymbol{\alpha}$ and the recovered solution $\hat{\boldsymbol{z}}$ satisfies

(24a) $\left\| \boldsymbol{G}^{1/2}\boldsymbol{\alpha} - \hat{\boldsymbol{z}} \right\|_2 \le C_1\, \frac{\sigma_s\left(\boldsymbol{G}^{1/2}\boldsymbol{\alpha}\right)_1}{\sqrt{s}} + D_1\, \left\|\boldsymbol{G}^{1/2}\right\|\, \eta,$
(24b) $\left\| \boldsymbol{G}^{1/2}\boldsymbol{\alpha} - \hat{\boldsymbol{z}} \right\|_1 \le C_2\, \sigma_s\left(\boldsymbol{G}^{1/2}\boldsymbol{\alpha}\right)_1 + D_2\, \left\|\boldsymbol{G}^{-1/2}\right\|\, \sqrt{s}\, \eta,$

where $\sigma_s(\cdot)_1$ is the best $s$-term approximation error of a vector in the $\ell^1$ norm, and $C_1, C_2, D_1, D_2$ are universal constants.

The proof is essentially just a convergence result for sparse solutions from a basis pursuit algorithm using bounds on an orthonormal system as established in [40], with the required bounds on Christoffel-weighted polynomials provided by Theorems 5.1.A–5.1.C later. The full proof of the above Theorem is presented in Appendix D.

Remark 4.1.

We expect that the case corresponding to the one-dimensional Beta distribution (Jacobi polynomials) can actually be extended to almost any kind of bounded weight function on a compact interval. (The essential results are in, e.g., [19].)

Remark 4.2.

For the CSA-b case, the behavior of $L$ with respect to $n$ is sharp; i.e., near the endpoints of $I_n^W$ the behavior is sharp, and at $x = 0$ the behavior is sharp. See the comments following Theorem 5.1.B. A similar statement holds for the CSA-c case.

Remark 4.3.

The above results generalize to tensor-product domains and weights, if one uses tensor-product sampling from the respective univariate densities. Then the sample count criterion in (22) is the same, but the factor $L$ is a product of such factors, each corresponding to the respective univariate bound.

Note that the result is stated in terms of recovery of $\boldsymbol{G}^{1/2}\boldsymbol{\alpha}$, and not $\boldsymbol{\alpha}$. This is done because the PCE basis is only orthonormal under $v_n$ when transformed according to the Gramian $\boldsymbol{G}$. Note that the actual CSA algorithm described in Section 3 performs recovery of $\boldsymbol{\alpha}$, which is not the statement of the Theorem above.

Nevertheless, we empirically observe that $\boldsymbol{G}$ is not only close to the identity [34], but also that $\boldsymbol{G}^{1/2}\boldsymbol{\alpha}$ has an approximately sparse representation. Thus, our empirical observation is that minimizing with the objective $\boldsymbol{z} = \boldsymbol{G}^{1/2}\boldsymbol{\alpha}$ is similar to minimizing with objective $\boldsymbol{\alpha}$. When $\boldsymbol{\alpha}$ is sparse and $\boldsymbol{G}$ is close to the identity, then $\hat{\boldsymbol{z}}$ and $\hat{\boldsymbol{\alpha}}$ should likewise be similar.

The penalty factor $\left\|\boldsymbol{G}^{-1/2}\right\|$ appearing in both the sampling condition (22) and the estimate (24b) is likewise small. We show various values for this penalty in Figure 1, which we observe to be $O(1)$ for various densities on both bounded and unbounded domains. The term $\left\|\boldsymbol{G}^{1/2}\right\|$ appearing in (24a) is computed empirically in [34] and observed to be $O(1)$ for essentially all values of $n$ and all relevant shape parameters.

Figure 1. Left: $\left\|\boldsymbol{G}^{-1/2}\right\|$ for Jacobi polynomials with symmetric parameters $\alpha = \beta$. Right: $\left\|\boldsymbol{G}^{-1/2}\right\|$ for two exponential-type densities: Hermite polynomials with $\rho(x) = \exp(-x^2)$ on $\mathbb{R}$, and Laguerre polynomials with $\rho(x) = \exp(-x)$ on $[0, \infty)$.

To motivate our observation that $\boldsymbol{G}$ is in fact quite close to the identity, we have the following optimal result for a bounded uniform random variable, corresponding to a PCE basis of Legendre polynomials.

Corollary 4.1.

Assume the setup of Theorem 4.1 in the CSA-a (bounded) case, with Jacobi parameters $\alpha = \beta = 0$. This corresponds to a uniform random variable with a Legendre PCE basis. Then assuming

$M \ge C\, s\, \log^3(s)\, \log(N),$

the solution $\hat{\boldsymbol{z}}$ to

$\hat{\boldsymbol{z}} = \operatorname*{argmin}_{\boldsymbol{z}} \|\boldsymbol{z}\|_1 \quad \text{subject to} \quad \frac{1}{\sqrt{M}}\left\| \boldsymbol{W}\boldsymbol{V}\boldsymbol{G}^{-1/2}\boldsymbol{z} - \boldsymbol{W}\boldsymbol{f}^\delta \right\|_2 \le \eta$

satisfies the error bounds (24a) and (24b).
Proof.

Under the assumption that $\xi$ is a scalar uniform random variable, the main result of [7] states that $w_N(x)\, \phi_j^2(x) \le C$ for all $j \le n$, $n$, and $x \in [-1,1]$. Application of Theorem 4.1 with $L \le C$ is the desired conclusion. ∎

5. Bounds on Christoffel-weighted polynomials

Theorem 4.1 requires knowledge of bounds on Christoffel-weighted polynomials. We provide these bounds here, but leave the proofs to the appendix due to their technical nature.

As with Section 4, we restrict our attention to the univariate case here, with a dictionary defined by (20). Given this definition of $\Lambda$, the Christoffel function is uniquely defined by (9). For the univariate case, we express explicit dependence on the maximum polynomial degree $n$ by writing

$w_{n+1}(x) = \frac{n + 1}{\sum_{j=0}^{n} \phi_j^2(x)}.$

We use the subscript $n + 1$ to keep consistency with other literature. This notation is used in all of the following theorems. As before, we ignore any normalization factors that are necessary to make $\rho$ a probability density since their presence does not affect the results in any way.

Theorem 5.1.A.

Assume $\xi$ is a Beta-distributed random variable on $[-1, 1]$, with shape parameters $\alpha, \beta > -1$, having density

$\rho(x) = c_{\alpha,\beta}\, (1 - x)^{\alpha} (1 + x)^{\beta}.$

Then, uniformly in $0 \le j \le n$,

(25) $\sup_{x \in [-1,1]} w_{n+1}(x)\, \phi_j^2(x) \le C(\alpha, \beta).$

The uniform boundedness of these weighted polynomials has already been investigated for compressive sampling using essentially the same kind of weight [42]. The proof is contained in Appendix A.

We now consider the more difficult CSA-b case, where $\xi$ has a two-sided exponential density.

Theorem 5.1.B.

Assume $\xi$ is an exponentially-distributed random variable on $\mathbb{R}$, having density with shape parameter $\alpha > 1$ given by

$\rho(x) = c_\alpha \exp\left(-|x|^{\alpha}\right).$

Then, uniformly in $0 \le j \le n$,

$\sup_{x \in \mathbb{R}} w_{n+1}(x)\, \phi_j^2(x) \le L(n),$

where

(26) $L(n) = C(\alpha)\, \sqrt{n}.$

Some remarks on this are in order: first, in the important case $\alpha = 2$, corresponding to a PCE basis of Hermite polynomials, we have the bound $L(n) = C\sqrt{n}$, and thus this weighting does not produce a bound that is uniform in $n$. The bound above is sharp with respect to the prescription of $L(n)$. For example, taking $j = n$ and using formulas (39a), (38), and (40) from Appendix B, one obtains a matching lower bound near the endpoints of $I_n^W$, so that the stated growth is a lower bound there. However, for the special case at $x = 0$, we use (40) and (39a) to conclude a second lower bound, and thus $L(n)$ must be larger than both of these quantities. It turns out that these are in fact the dominating behaviors, and so $L(n)$ behaves exactly according to the maximum of the two bounds above. The proof of this is given in Appendix B. The one-sided CSA-c case is similar.

Theorem 5.1.C.

Assume $\xi$ is an exponentially-distributed random variable on $[0, \infty)$, having density with shape parameter $\alpha > 1/2$ given by

$\rho(x) = c_\alpha \exp\left(-x^{\alpha}\right), \qquad x \in [0, \infty).$

Then, uniformly in $0 \le j \le n$,

$\sup_{x \in [0,\infty)} w_{n+1}(x)\, \phi_j^2(x) \le L(n),$

where $L(n)$ is defined by (26) and satisfies

(27) $L(n) \le C(\alpha)\, \sqrt{n}.$

The proof is given in Appendix C. Just as with Theorem 5.1.B, the dependence of the exponent is sharp, as can be observed by manipulating the cases $j = n$, and at $x = 0$, by using (49a) and Lemmas C.3 and C.4. The result above holds for the more general case considering a family of polynomials orthonormal under the weight $x^{\gamma} \exp(-x^{\alpha})$ with $\gamma > -1$ and $\alpha > 1/2$. (That is, the exponent on $n$ is still $1/2$, independent of $\gamma$.) To establish this, one need only augment the proof in Appendix C to include the factor $x^{\gamma}$ in the weight; the necessary results generalizing the cited Lemmas in Appendix C are in [26, 28, 29].

6. Results

In the following we will numerically compare the performance of CSA with other popular sampling strategies for compressive sampling of PCE. Recall that the random variable $\xi$ has probability density $\rho$, and we attempt to recover a sparse expansion from a multi-index dictionary $\Lambda$. We let $n$ denote the maximum univariate degree in $\Lambda$. I.e.,

$n := \max_{\lambda \in \Lambda} \|\lambda\|_{\infty}.$

We use the following terms to describe the recovery procedures tested in this section.

  • CSA – This is the algorithm proposed in this paper. We sample $\{\xi^{(i)}\}_{i=1}^{M}$ as iid realizations of the weighted pluripotential equilibrium measure. I.e., we sample from the measure with associated density $v_n$ as described in Sections 3.1 and 3.2. We define weights as evaluations of the Christoffel function associated to the PCE basis, as defined in (9).

  • MC – This is the “naïve” approach where we sample $\xi^{(i)}$ iid from the orthogonality density $\rho$, define the weights $\boldsymbol{W} = \boldsymbol{I}$ (no preconditioning), and subsequently solve (5).

  • Asymptotic – This is a sampling procedure designed to approximate the asymptotic envelope of the PCE basis. Essentially, this method prescribes Chebyshev sampling on tensor-product bounded domains, and uniform sampling inside a hypersphere of a certain radius when $\xi$ is a Gaussian random variable. We detail these asymptotic cases below in Sections 6.1 and 6.2; these methods were proposed in [23], building on ideas from [42, 49].

The following sections describe the “asymptotic” sampling strategy mentioned above.

6.1. Asymptotic sampling for Beta random variables: Chebyshev sampling

In the pioneering work [42] it was shown that, for polynomials orthogonal with respect to a certain class of weight functions defined on bounded domains, random samples drawn from the Chebyshev measure can be used with preconditioned $\ell^1$-minimization to produce Vandermonde matrices with small restricted isometry constants. The major idea is that the weighted polynomials corresponding to the preconditioned system have envelopes that are constant, and are thus bounded. Using results in [40], this boundedness can be used to show RIP properties. Since Jacobi polynomials (orthogonal with respect to the Beta distribution on $[-1,1]$ with weight function $(1-x)^{\alpha}(1+x)^{\beta}$ for $\alpha, \beta \ge -1/2$) have an envelope that behaves in an analogous fashion, the Chebyshev sampling and corresponding weighting likewise produces a sampling matrix with good RIP properties.

Let $U = (U_1, \ldots, U_d)$ be a vector of independent uniform random variables on $[0,1]$; then the Chebyshev sampling method generates samples according to

$\xi_j = \cos\left(\pi\, U_j\right), \qquad j = 1, \ldots, d.$

These samples are then used with the preconditioning weights

$w_i = \left(\frac{\rho(\xi^{(i)})}{v(\xi^{(i)})}\right)^{1/2},$

with $v$ the tensor-product Chebyshev density, to solve the preconditioned $\ell^1$-minimization problem (5). This is the “asymptotic” sampling strategy prescribed in [23].
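A sketch of this sampler for the Legendre case; the envelope-based weight $\prod_j (1 - \xi_j^2)^{1/4}$ is the standard choice for Legendre preconditioning and is our assumption here for the general statement above.

```python
import numpy as np

def chebyshev_samples_and_weights(d, M, rng=np.random.default_rng()):
    """Asymptotic (Chebyshev) sampling for bounded PCE: xi_j = cos(pi * U_j)."""
    U = rng.uniform(0.0, 1.0, size=(M, d))
    X = np.cos(np.pi * U)                        # tensor-product Chebyshev samples
    w = np.prod((1.0 - X**2) ** 0.25, axis=1)    # Legendre envelope preconditioner
    return X, w
```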

6.2. Asymptotic sampling for Gaussian random variables

The Chebyshev sampling method is not applicable to unbounded random variables. In [23] an asymptotic sampling scheme was proposed for sparse Hermite polynomial approximation of functions parameterized by Gaussian random variables. Asymptotic sampling draws random samples from an envelope that behaves like the asymptotic (in order) envelope for the Hermite polynomials as the polynomial degree goes to infinity.

Let $X$ be a vector of $d$ independent standard normal random variables and let $U$ be an independent uniformly distributed random variable on $[0,1]$; then we generate samples according to

$\xi = r\, U^{1/d}\, \frac{X}{\|X\|_2},$

where the radius $r$ grows like the square root of $k$, and $k$ is the order of the total degree polynomial space. These asymptotic Gaussian samples are uniformly distributed on the $d$-dimensional ball of radius $r$ and are used with preconditioning weights generated from the Gaussian envelope

$w(\xi) = \exp\left(-\|\xi\|_2^2 / 2\right).$

Note the radius here is essentially the same as the CSA radius prescription in Section 3.2.3. (In contrast to Gaussian asymptotic sampling, the CSA algorithm does not sample from the uniform density.)
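A sketch of the Hermite asymptotic sampler. The precise radius prescription is not legible in our source, so it is left as an explicit argument (it grows like the square root of the degree), and the Gaussian envelope weight $\exp(-\|\xi\|^2/2)$ is likewise our assumption.

```python
import numpy as np

def gaussian_asymptotic_samples(d, radius, M, rng=np.random.default_rng()):
    """Uniform samples in the d-ball of the given radius, with Hermite-envelope weights."""
    X = rng.standard_normal((M, d))
    direction = X / np.linalg.norm(X, axis=1, keepdims=True)
    U = rng.uniform(0.0, 1.0, size=(M, 1))
    samples = radius * U ** (1.0 / d) * direction        # uniform in the ball
    weights = np.exp(-0.5 * np.sum(samples**2, axis=1))  # precondition weights
    return samples, weights
```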

6.3. Manufactured sparse solutions

In this section we investigate the performance of the CSA method when used to recover randomly generated $s$-sparse vectors $\boldsymbol{\alpha}$ from noiseless data, such that $\boldsymbol{f} = \boldsymbol{V}\boldsymbol{\alpha}$. Specifically we construct an $s$-sparse vector by selecting