# Generation and application of multivariate polynomial quadrature rules

## Abstract

The search for multivariate quadrature rules of minimal size with a specified polynomial accuracy has been the topic of many years of research. Finding such a rule allows accurate integration of moments, which play a central role in many aspects of scientific computing with complex models. The contribution of this paper is twofold. First, we provide novel mathematical analysis of the polynomial quadrature problem that provides a lower bound for the minimal possible number of nodes in a polynomial rule with specified accuracy. We give concrete but simplistic multivariate examples where a minimal quadrature rule can be designed that achieves this lower bound, along with situations that showcase when it is not possible to achieve this lower bound. Our second main contribution comes in the formulation of an algorithm that is able to efficiently generate multivariate quadrature rules with positive weights on non-tensorial domains. Our tests show success of this procedure in up to 20 dimensions. We test our method on applications to dimension reduction and chemical kinetics problems, including comparisons against popular alternatives such as sparse grids, Monte Carlo and quasi Monte Carlo sequences, and Stroud rules. The quadrature rules computed in this paper outperform these alternatives in almost all scenarios.

Corresponding author: J. D. Jakeman (jdjakem@sandia.gov). Funding: see acknowledgements.

**Keywords:** multivariate integration, moment-matching quadrature, uncertainty quantification, dimension reduction

## 1 Introduction

Let $D \subseteq \mathbb{R}^d$ be a domain with nonempty interior. Given a finite, positive measure $\mu$ on $D$ and a $\mu$-measurable function $f$, our main goal is computation of

$$\int_D f(x) \, d\mu(x).$$

The need to evaluate such integrals arises in many areas, including finance, stochastic programming, robust design, and uncertainty quantification. Typically these integrals are approximated numerically by quadrature rules of the form

$$\int_D f(x) \, d\mu(x) \approx \sum_{m=1}^{M} w_m f(x_m). \tag{1}$$

Examples of common quadrature rules for high-dimensional integration are Monte Carlo and quasi Monte Carlo methods [12, 14, 20, 27], and Smolyak integration rules [7, 8, 26]. Quadrature rules are typically designed and constructed in deference to some notion of approximation optimality. The particular approximation optimality that we seek in this paper is based on polynomial approximation.

Our goal is to find a set of quadrature nodes $\{x_j\}_{j=1}^{M} \subset D$ and a corresponding set of weights $\{w_j\}_{j=1}^{M}$ such that

$$\sum_{j=1}^{M} w_j p(x_j) = \int_D p(x) \, d\mu(x), \qquad p \in P_\Lambda, \tag{2}$$

where $\Lambda$ is a multi-index set of size $N$, and $P_\Lambda$ is an $N$-dimensional polynomial space defined by $\Lambda$. (We make this precise later.) $P_\Lambda$ can be a relatively “standard” space, such as the space of all $d$-variate polynomials up to a given finite degree, or a more intricate space such as those defined by balls in index space, or hyperbolic cross spaces.

In this paper we will present a method for numerically generating polynomial-based cubature rules. This paper provides two major contributions to the existing literature. Firstly, we provide a lower bound on the number of points that make up a polynomial quadrature rule. Our analysis is straightforward, but to the authors' knowledge this is the first reported bound of its kind. Our second contribution is a numerical method for generating quadrature rules that are exact for a set of arbitrary polynomial moments. Our method has the following features:

• Positive quadrature rules are generated.

• Any measure whose moments are computable is admissible. Many existing quadrature methods only apply to tensor-product measures. Our method constructs quadrature rules for measures with, for example, non-linear correlation between variables.

• Analytical or sample-based moments may be used. In some settings it may be possible to compute moments of a measure exactly, but in other settings only samples from the measure are available. For example, one may wish to integrate a function using Markov Chain Monte Carlo-generated samples from a posterior of a Bayesian inference problem.

• A quadrature that is faithful to arbitrary sets of moments may be generated. Many quadrature methods are exact for certain polynomial spaces, for example total-degree or sparse grid spaces. However, some functions may be more accurately represented by alternative polynomial spaces, such as hyperbolic cross spaces. In these situations it may be more prudent to construct rules that can match a customized set of moments.

• Efficient integration of ridge functions is possible. Some high-dimensional functions can be represented by a small number of linear combinations of the input variables. In this case it is more efficient to integrate these functions over this lower-dimensional coordinate space. Such a dimension-reducing transformation typically induces a new measure on a non-tensorial space of lower-dimensional variables. For example, a high-dimensional uniform probability measure on a hypercube may be transformed into a non-uniform measure on a zonotope (a multivariate polygon).

Our algorithm falls into the class of moment-matching methods. There have been some recent attempts at generating quadrature rules using moment matching via optimization approaches. These methods frequently either start with a small candidate set and add points until moments are matched, or start with a large set of candidate points and reduce them until no more points can be removed without numerically violating the moment conditions [25, 16, 34]. These approaches sometimes go by other names, such as scenario generation or scenario reduction methods.

This paper presents a quadrature/scenario reduction moment-matching method based upon the work in . The method in  comprises two steps. The first step generates a quadrature rule with at most $N$ points, the existence of which is guaranteed by Tchakaloff’s theorem. The second step uses this quadrature rule as an initial guess for a local gradient-based optimization procedure that searches for a quadrature rule with fewer points.

The initial quadrature rule is generated by drawing a large number of points from the domain of integration and then solving an inequality-constrained linear program to find a quadrature rule with positive weights that matches all desired moments. Similar approaches can be found in [1, 4]. The points comprising this initial quadrature rule are then grouped into clusters. A new approximate quadrature rule is then formed by combining points and weights within a cluster into a single point and weight.

Our numerical method differs from that work in the following ways: (i) we use a different formulation of the linear program to generate the initial quadrature rule; (ii) we show numerically that this linear program only needs to be solved to very limited accuracy; (iii) we present an automated way of selecting the clusters from the initial rule – in the original work no method for clustering points is presented; (iv) we provide extensive numerical testing of our method in a number of settings.

The theoretical contributions of this paper include a lower bound on the number of points in the final quadrature rule, and a definition of quasi-optimality in terms of this bound. We also provide a simple means of testing whether the quadrature rules generated by any method are quasi-optimal.

The remainder of the paper is structured as follows. In Section 2 we introduce some nomenclature, define quasi-optimality, and present a means to verify whether a quadrature rule is quasi-optimal. We also use these theoretical tools to analytically derive quasi-optimal rules for additive functions. In Section 4 we detail our algorithm for generating quadrature rules. We then present a wide range of numerical examples, in Section 5, which explore the properties of the quadrature rules that our numerical algorithm can generate.

We give a concise description of some existing quadrature rules. There are numerous approaches for computing polynomial-based quadrature rules so our goal is not to be comprehensive, but instead to concentrate on rules that are related to our proposed method.

For univariate functions Gaussian quadrature is one of the most commonly used approaches. The nodes of a Gaussian quadrature rule are prescribed by the roots of the polynomials orthogonal with respect to the measure $\mu$. The resulting quadrature rule is always positive (meaning the weights are all positive), and the rules are optimal in the sense that, given an $M$-point Gaussian quadrature rule, which has degree of exactness $2M-1$, no rule with fewer points can integrate all polynomials of degree $2M-1$.

When integrating multivariate functions with respect to tensor-product measures on a hypercube, accurate and efficient quadrature rules can be found by taking tensor products of one-dimensional Gaussian quadrature rules. These rules are optimal for functions that can be represented exactly in tensor-product polynomial spaces. However, the use of such quadrature rules is limited to a small number of dimensions, say 3-4, because the number of points in the rule grows exponentially with dimension.

Sparse grid quadrature methods have been successfully used as an alternative to tensor-product quadrature for multivariate functions [7, 8, 26]. The number of points in a sparse grid rule grows much more slowly with dimension for a fixed level of accuracy. Unlike tensor-product rules, however, the quadrature weights will not all be positive. Sparse grid quadrature delays the curse of dimensionality by focusing on integrating polynomial spaces that have high-degree univariate terms but low-degree interaction terms.

High-dimensional cubature rules can often be more effective than sparse grid rules when integrating functions that are well represented by total-degree polynomials. These rules have positive weights and typically consist of a very small number of points. However, such highly effective cubature rules are difficult to construct and have only been derived for a specific set of measures, integration domains, and polynomial degrees of exactness [13, 29, 36].

Tensor-product integration schemes generally produce approximations to the integral (1) whose accuracy scales like $M^{-r/d}$, where $r$ indicates the maximum order of continuous partial derivatives in any direction. This convergence rate illustrates both the blessing of smoothness – regularity accelerates convergence exponentially – along with the curse of dimensionality – convergence is exponentially hampered by large dimension. Sparse grids built from univariate quadrature rules have a similar error, scaling like $M^{-r} (\log M)^{(d-1)(r+1)}$.

A contrasting approach is given by Monte Carlo (MC) and quasi Monte Carlo (QMC) methods. These approximations produce convergence rates of $M^{-1/2}$ and $M^{-1} (\log M)^d$, respectively. MC points are random, selected as independent and identically-distributed realizations of a random variable, and QMC points are deterministically generated as sequences that minimize discrepancy.

## 2 Quasi-optimality in multivariate polynomial quadrature

The goal of this section is to mathematically codify relationships between the number of quadrature points $M$ and the dimension $N$ of the polynomial space $P_\Lambda$. In particular, we provide a theoretical lower bound for $M$ for a fixed $\Lambda$. This lower bound can be used to define a notion of optimality in polynomial quadrature rules. We also provide related characterizations of optimal quadrature rules in both one and several dimensions.

### 2.1 Notation

With $d \in \mathbb{N}$ fixed, we consider a positive measure $\mu$ on $\mathbb{R}^d$ with support $D$. This support may be unbounded. The inner product and norm are defined as

$$\langle f, g \rangle_\mu \coloneqq \int_D f(x) g(x) \, d\mu(x), \qquad \|f\|_\mu^2 = \langle f, f \rangle_\mu, \qquad L_\mu^2(D) = \left\{ f : D \to \mathbb{R} \;\middle|\; \|f\|_\mu^2 < \infty \right\}.$$

To prevent degeneracy of polynomials with respect to $\mu$ and to ensure finite moments of $\mu$, we assume

$$0 < \|p\|_\mu < \infty, \tag{3}$$

for all nontrivial algebraic polynomials $p$. One can guarantee the lower inequality if, for example, there is any open Euclidean ball in $D$ inside which $\mu$ has a density function.

A point $x \in \mathbb{R}^d$ has coordinate representation $x = \left( x^{(1)}, \ldots, x^{(d)} \right)$; for a multi-index $\alpha \in \mathbb{N}_0^d$ with coordinates $\alpha = (\alpha_1, \ldots, \alpha_d)$, we have $x^\alpha = \prod_{j=1}^d \left( x^{(j)} \right)^{\alpha_j}$ and $|\alpha| = \sum_{j=1}^d \alpha_j$. We let $\Lambda \subset \mathbb{N}_0^d$ denote a multi-index set of finite size $|\Lambda| = N$.

If $\alpha, \beta \in \mathbb{N}_0^d$ are any two multi-indices and $k \ge 0$ is a scalar, we define $\alpha + \beta$, $k\alpha$, and the relation $\alpha \le \beta$ componentwise; the partial ordering $\alpha \le \beta$ holds if all the componentwise conditions hold. We define the following standard properties and operations on multi-index sets.

**Definition.** Let $\Lambda$ and $\Theta$ be two multi-index sets, and let $k \ge 0$ be a scalar.

1. The sum of two multi-index sets is

$$\Lambda + \Theta = \{\alpha + \beta \mid \alpha \in \Lambda, \; \beta \in \Theta\}.$$

2. The expression $k\Lambda$ is defined as

$$k\Lambda = \{k\alpha \mid \alpha \in \Lambda\}.$$

Note that this need not be a set of multi-indices (e.g., when $k = \tfrac{1}{2}$).

3. $\Lambda$ is a downward closed set if $\alpha \in \Lambda$ implies that $\beta \in \Lambda$ for all $\beta \le \alpha$.

4. For any finite $\Lambda$, $\overline{\Lambda}$ is the smallest downward closed set containing $\Lambda$,

$$\overline{\Lambda} = \{\alpha \in \mathbb{N}_0^d \mid \alpha \le \beta \text{ for some } \beta \in \Lambda\}.$$

5. $\Lambda$ is convex if for any $\alpha, \beta \in \Lambda$ and any $t \in [0, 1]$, then $t\alpha + (1-t)\beta \in \Lambda$ whenever $t\alpha + (1-t)\beta \in \mathbb{N}_0^d$.

With our notation, scalar multiplication is not consistent with Minkowski addition. In particular we have $2\Lambda \neq \Lambda + \Lambda$ in general. If $\Lambda$ is downward closed, then $\overline{2\Lambda} \subseteq \Lambda + \Lambda$.

The polynomial space $P_\Lambda$ is defined by a given multi-index set $\Lambda$:

$$P_\Lambda = \mathrm{span} \{ x^\alpha \mid \alpha \in \Lambda \}, \qquad |\Lambda| = N,$$

and has dimension $N$ in $L_\mu^2(D)$ under the assumption (3). Note that we make no particular assumptions on the structure of $\Lambda$; i.e., we do not assume $\Lambda$ is downward closed, but much of our theory and all our numerical examples use downward closed index sets.

On $\mathbb{N}_0^d$, we will make use of the $\ell^p$ norm for $0 \le p \le \infty$, and the associated ball of radius $r$ to define index sets. These sets are defined by

$$B_p(r) = \{ \alpha \in \mathbb{N}_0^d \mid \|\alpha\|_p \le r \}.$$

The norms are defined for $p = 0$, $0 < p < \infty$, and $p = \infty$ by, respectively,

$$\|\alpha\|_0 = \sum_{j=1}^d \mathbb{1}_{\alpha_j \neq 0}, \qquad \|\alpha\|_p^p = \sum_{j=1}^d \alpha_j^p, \qquad \|\alpha\|_\infty = \max_{1 \le j \le d} \alpha_j.$$

The index sets $B_p(r)$ are all downward closed, and are convex if $p \ge 1$. The set $B_p(r)$ coincides with the total-degree index set of degree $r$ when $p = 1$.
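The index-set constructions above are easy to enumerate programmatically. The following sketch (an illustration, not part of the paper's method) builds $B_p(r)$ for $0 < p \le \infty$ by brute force over the bounding box $\{0, \ldots, \lfloor r \rfloor\}^d$:

```python
from itertools import product

def ball_indices(d, r, p):
    """Enumerate multi-indices alpha in N_0^d with ||alpha||_p <= r,
    for 0 < p <= infinity, by scanning the box {0, ..., floor(r)}^d."""
    out = []
    for alpha in product(range(int(r) + 1), repeat=d):
        if p == float('inf'):
            ok = max(alpha) <= r
        else:
            ok = sum(a ** p for a in alpha) <= r ** p
        if ok:
            out.append(alpha)
    return out

# Total-degree set of degree 2 in two variables (p = 1) has 6 indices;
# the tensor-product set of degree 2 (p = infinity) has 9.
print(len(ball_indices(2, 2, 1)), len(ball_indices(2, 2, float('inf'))))
```

For $0 < p < 1$ the same scan produces the hyperbolic-cross-type sets mentioned in the introduction.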

With $\Lambda$ fixed, the theoretical and computational tractability of computing a solution to (2) depends on $d$, $M$, and $N$. In particular, it is unreasonable to expect that $N$ can be arbitrarily large for a fixed $M$; if this were true then $\mu$ could be approximated to arbitrary accuracy by a sum of $M$ Dirac delta distributions, which would allow us to violate the lower inequality in (3). There is a strong heuristic that motivates the possible size of $M$: the weights $\{w_m\}_{m=1}^M$ represent $M$ degrees of freedom, and varying the nodes $x_m \in D \subseteq \mathbb{R}^d$ ($1 \le m \le M$) represents an additional $dM$ degrees of freedom. For an $N$-dimensional space $P_\Lambda$, (2) can be enforced with $N$ constraints. Thus we expect for general $\Lambda$ that $N$ (and thus $|\Lambda|$) must be small enough to satisfy

$$|\Lambda| = N \le (d+1) M. \tag{4}$$

We will show that this heuristic does not always produce a faithful bound on sizes of quadrature rules; we provide instead a strict lower bound on the number of points in a quadrature rule for a given $\Lambda$. To proceed we require the notion of ‘half-sets’.

**Definition.** Let $\Lambda$ be a finite, nontrivial, downward-closed set. A multi-index set $\Theta$ is

1. a half-set for $\Lambda$ if $\Theta + \Theta \subseteq \Lambda$;

2. a maximal half-set for $\Lambda$ if it is a half-set of maximal size, i.e., if $|\Theta| = L$, with

$$L = L(\Lambda) = \max \left\{ |\Theta| \;:\; \Theta \text{ a multi-index set satisfying } \Theta + \Theta \subseteq \Lambda \right\}. \tag{5}$$

We call $L$ the maximal half-set size of $\Lambda$. Recall that $2\Theta \neq \Theta + \Theta$ in general, so the terminology “half” should not be conflated with the operation of halving each index in an index set. If $\Lambda$ is not downward closed, it may not have any half-sets. However, all nontrivial downward-closed sets have at least one nontrivial half-set (the zero set $\{0\}$ is one such half-set). Thus maximal half-sets always exist in this case, but they are not necessarily unique. An example in $d = 2$ illustrates this non-uniqueness:

$$\Lambda = B_0(1) \cap B_1(2) = \{(0,0), (0,1), (0,2), (1,0), (2,0)\}, \qquad \Theta_1 = \{(0,0), (1,0)\}, \quad \Theta_2 = \{(0,0), (0,1)\}.$$

We have $L(\Lambda) = 2$, and both $\Theta_1$ and $\Theta_2$ are maximal half-sets for $\Lambda$.

If $\Lambda$ is both downward-closed and convex, then its maximal half-set is unique and easily computed.

**Theorem.** Let $\Lambda$ be convex and downward-closed. Then its maximal half-set is unique, given by $\Theta^* = \{\alpha \in \mathbb{N}_0^d \mid 2\alpha \in \Lambda\}$.

###### Proof.

Let $\Theta$ be any half-set for $\Lambda$. Then for any $\theta \in \Theta$, we have $\theta + \theta \in \Lambda$, so that $2\theta \in \Lambda$. Thus $\Theta \subseteq \Theta^*$, showing that any half-set must be contained in $\Theta^*$.

Now let $\theta_1, \theta_2 \in \Theta^*$. Then $2\theta_1, 2\theta_2 \in \Lambda$. Thus,

$$\theta_1 + \theta_2 = \tfrac{1}{2}(2\theta_1) + \tfrac{1}{2}(2\theta_2) \in \Lambda,$$

where the set inclusion holds by convexity of $\Lambda$. Thus, $\Theta^*$ is itself a half-set; by the previous observation that it also dominates any half-set, it must be the unique largest (maximal) half-set. ∎

We can now state one of the main results of this section: the number $L$ in (5) is a lower bound on the size of any quadrature rule satisfying (2).

**Theorem.** Let $\Lambda$ be a finite downward-closed set, and suppose that an $M$-point quadrature rule exists satisfying (2). Then $M \ge L$, with $L$ the maximal half-set size defined in (5).

###### Proof.

Let $\Theta$ be any set satisfying $\Theta + \Theta \subseteq \Lambda$ and $|\Theta| = L$, with $L$ defined in (5). We choose any size-$L$ $\mu$-orthonormal basis for $P_\Theta$:

$$P_\Theta = \mathrm{span} \{q_j\}_{j=1}^{L}, \qquad \int q_j(x) q_k(x) \, d\mu(x) = \delta_{k,j}.$$

Note that $q_j$ and $q_k$ have monomial expansions

$$q_j(x) = \sum_{\alpha \in \Theta} c_\alpha x^\alpha, \qquad q_k(x) = \sum_{\alpha \in \Theta} d_\alpha x^\alpha,$$

for fixed $j, k$ and some constants $c_\alpha$ and $d_\alpha$. This implies

$$q_j(x) q_k(x) = \sum_{\alpha, \beta \in \Theta} c_\alpha d_\beta x^{\alpha + \beta} = \sum_{\alpha \in (\Theta + \Theta)} f_\alpha x^\alpha.$$

Since $\Theta$ is a half-set for $\Lambda$, then $q_j q_k \in P_\Lambda$ and is therefore exactly integrated by the quadrature rule (2).

Then consider the $L \times L$ matrix $G$ defined as

$$G = V^T W V, \qquad (G)_{j,k} = \sum_{m=1}^{M} w_m q_j(x_m) q_k(x_m), \qquad (V)_{m,k} = q_k(x_m), \qquad (W)_{m,n} = w_m \delta_{m,n}.$$

The matrices $V$ and $W$ are $M \times L$ and $M \times M$, respectively. Since the quadrature rule exactly integrates each product $q_j q_k$, then $G = I$, the $L \times L$ identity matrix. Thus, the matrix product $V^T W V$ has rank $L$. If $M < L$ then, e.g., $\mathrm{rank}(V) \le M < L$, and so it is not possible that $\mathrm{rank}(G) = L$. ∎

This result shows that if $M < L(\Lambda)$, then an $M$-point quadrature rule satisfying (2) cannot exist. This is nontrivial information. As an example, let $d = 2$, and consider

$$\Lambda = B_\infty(2) = \{(0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2)\},$$

corresponding to the tensor-product space of degree 2. Since $N = 9$, the heuristic (4) suggests that it is possible to find a rule with only 3 points. However, let $\Theta = B_\infty(1)$. Then $\Theta + \Theta = B_\infty(2) = \Lambda$ and $|\Theta| = 4$. Thus, no 3-point rule that is accurate on $P_\Lambda$ can exist.
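The half-set size in this example can be checked programmatically. The sketch below (illustrative only) uses the characterization from the theorem above, namely that for a convex, downward-closed $\Lambda$ the maximal half-set is $\{\alpha : 2\alpha \in \Lambda\}$:

```python
def max_half_set(Lam):
    """Maximal half-set of a convex, downward-closed multi-index set,
    via the characterization Theta = {alpha : 2*alpha in Lam}."""
    Lam = set(Lam)
    return {a for a in Lam if tuple(2 * x for x in a) in Lam}

# Lambda = B_inf(2) in d = 2: the tensor-product degree-2 set.
Lam = [(i, j) for i in range(3) for j in range(3)]
Theta = max_half_set(Lam)
print(sorted(Theta))   # the four indices of B_inf(1), so L = 4 > 9/3 = 3
```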

Another observation from Theorem 2.2 is that we may justifiably call an $M$-point quadrature rule optimal if $M = L(\Lambda)$, in the sense that one cannot achieve the same accuracy with a smaller number of nodes.

**Definition.** Let $\Lambda$ be a downward-closed multi-index set. We call an $M$-point quadrature rule quasi-optimal for $\Lambda$ if it satisfies (2) with $M = L$, with $L$ defined in (5).

We call these rules quasi-optimal because, given a quasi-optimal rule for $\Lambda$, it may be possible to generate a quadrature rule of equal size that is accurate on an index set that is strictly larger than $\Lambda$ (see Section 5.3.1). Quasi-optimal quadrature rules are not necessarily unique and sometimes do not exist. However, the weights for quasi-optimal quadrature rules have a precise behavior.

Under the assumption (3), we can find an orthonormal basis for $P_\Lambda$:

$$\langle q_j, q_k \rangle_\mu = \delta_{j,k}, \qquad P_\Lambda = \mathrm{span} \{q_j\}_{j=1}^{N}.$$

Given this orthonormal basis, define

$$\lambda_\Lambda(x) = \frac{1}{\sum_{j=1}^{N} q_j^2(x)}.$$

The quantity $\lambda_\Lambda$ depends on $\Lambda$, $\mu$, and $x$, but not on the particular basis we have chosen: through a unitary transform we may map $\{q_j\}_{j=1}^N$ into any other orthonormal basis for $P_\Lambda$, but this transformation leaves the quantity above unchanged. The weights for a quasi-optimal quadrature rule on $\Lambda$ are evaluations of $\lambda_\Theta$, where $\Theta$ is a maximal half-set for $\Lambda$.

**Theorem.** Let $\Lambda$ be a finite downward-closed set, and let an $L$-point quadrature rule be quasi-optimal for $\Lambda$. Then

$$w_j = \lambda_\Theta(x_j), \qquad j = 1, \ldots, L, \tag{6}$$

where $\Theta$ is a(ny) maximal half-set for $\Lambda$.

###### Proof.

Let $\ell_j$, $j = 1, \ldots, L$, be the cardinal Lagrange interpolants from the space $P_\Theta$ on the nodes $\{x_k\}_{k=1}^{L}$. These cardinal interpolants satisfy $\ell_j(x_k) = \delta_{j,k}$. The weights must be positive since

$$w_j = \sum_{k=1}^{L} w_k \ell_j^2(x_k) = \int \ell_j^2(x) \, d\mu(x) > 0, \tag{7}$$

where the second equality uses the fact that the quadrature rule is exact on $P_\Lambda \supseteq P_\Theta \cdot P_\Theta$. By (7), $\sqrt{W}$ is well-defined. With $V$ and $W$ defined in the proof of Theorem 2.2, we have $V^T W V = I$, and both matrices are square ($L \times L$). Thus, $\sqrt{W} V$ is an orthogonal matrix, so that

$$\sqrt{W} V V^T \sqrt{W}^T = I \quad \Longrightarrow \quad V V^T = W^{-1}.$$

Taking the diagonal components of the left- and right-hand sides shows that $\sum_{k=1}^{L} q_k^2(x_j) = 1/w_j$, i.e., $w_j = \lambda_\Theta(x_j)$. ∎
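The weight formula (6) can be verified numerically in one dimension, where (as shown in the next section) the quasi-optimal rule is Gauss quadrature. The sketch below is illustrative and assumes the uniform probability measure $d\mu = dx/2$ on $[-1, 1]$; it checks that the Gauss-Legendre weights equal $\lambda_\Theta$ evaluated at the nodes:

```python
import numpy as np

M = 5
# Gauss-Legendre rule, renormalized for the probability measure dx/2:
x, w = np.polynomial.legendre.leggauss(M)
w = w / 2.0   # leggauss weights sum to 2 (the mass of dx on [-1, 1])

def lam(x):
    """lambda_Theta(x) = 1 / sum_{n<M} q_n(x)^2, with q_n = sqrt(2n+1) P_n
    the orthonormal Legendre polynomials for dx/2."""
    vals = np.array([np.polynomial.legendre.legval(x, np.eye(M)[n])
                     for n in range(M)])          # rows: P_0, ..., P_{M-1}
    qsq = (2 * np.arange(M) + 1)[:, None] * vals**2
    return 1.0 / qsq.sum(axis=0)

print(np.max(np.abs(w - lam(x))))   # agreement to machine precision
```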

## 3 Theoretical examples and counterexamples of quasi-optimal quadrature

To provide further insight into our definition of quasi-optimality, we investigate the consequences of the above theoretical results through some examples.

### 3.1 Univariate rules

The example in this section shows (i) that optimal quadrature rules in one dimension are Gauss quadrature rules, and (ii) that among many quasi-optimal rules, some may be capable of accurately integrating more polynomials than others.

Let $d = 1$ with $\Lambda = \{0, 1, \ldots, N-1\}$. The maximal half-set for $\Lambda$ is $\Theta = \{0, 1, \ldots, \lfloor (N-1)/2 \rfloor\}$, and thus

$$L(\Lambda) = \left\lfloor \frac{N-1}{2} \right\rfloor + 1.$$

We consider two cases, that of even $N$, and of odd $N$.

Suppose $N$ is even. Then a quasi-optimal quadrature rule has $M = L = N/2$ abscissae, and exactly integrates $N$ (linearly independent) polynomials; i.e., the $M$-point rule exactly integrates polynomials up to degree $N - 1 = 2M - 1$. It is well-known that this quadrature rule is unique: it is the $M$-point $\mu$-Gauss-Christoffel quadrature rule.

With $\{q_j\}_{j \ge 0}$ a family of $\mu$-orthonormal polynomials, with $\deg q_j = j$, the abscissae are the zeros of $q_M$, and the weights are given by (6). This quadrature rule can be efficiently computed from eigenvalues and eigenvectors of the symmetric tridiagonal Jacobi matrix associated with $\mu$. Thus, this quadrature rule can be computed without resorting to optimization routines.
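As an illustration of the Jacobi-matrix computation, the following Golub-Welsch sketch treats the Legendre case, i.e., the uniform probability measure $dx/2$ on $[-1, 1]$; the recurrence coefficients used here are specific to that assumed measure:

```python
import numpy as np

def gauss_legendre(M):
    """M-point Gauss rule for dx/2 on [-1, 1] via eigendecomposition of
    the symmetric tridiagonal Jacobi matrix (Golub-Welsch). For Legendre,
    the diagonal is zero and the off-diagonal is b_n = n / sqrt(4n^2 - 1)."""
    n = np.arange(1, M)
    b = n / np.sqrt(4.0 * n**2 - 1.0)
    J = np.diag(b, 1) + np.diag(b, -1)
    nodes, V = np.linalg.eigh(J)
    weights = V[0, :]**2          # squared first components of eigenvectors
    return nodes, weights

x, w = gauss_legendre(4)
# A 4-point rule is exact for polynomials up to degree 2*4 - 1 = 7:
print(np.dot(w, x**6) - 1.0 / 7.0)   # integral of x^6 dx/2 is 1/7
```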

Now suppose $N$ is odd. Then a quasi-optimal quadrature rule has $M = L = (N+1)/2$ abscissae, and exactly integrates $N$ polynomials; i.e., the $M$-point rule exactly integrates polynomials up to degree $N - 1 = 2M - 2$. Clearly a Gauss-Christoffel rule is a quasi-optimal rule in this case. However, by our definition there are an (uncountably) infinite number of quasi-optimal rules in this case. In particular, for arbitrary $c \in \mathbb{R}$, the zero set of the polynomial

$$q_M(x) - c \, q_{M-1}(x) \tag{8}$$

corresponds to the abscissae of a quasi-optimal quadrature rule, with the weights given by (6). As in the Gauss-Christoffel case, computation of these quadrature rules can be accomplished via eigendecompositions of symmetric tridiagonal matrices.

In classical scientific computing scenarios, such $M$-point rules with polynomial accuracy of degree $2M - 2$ have been called Gauss-Radau rules, and are traditionally associated with situations when $D$ is an interval and one of the quadrature abscissae is collocated at one of the endpoints of this interval.

### 3.2 Quasi-optimal multivariate quadrature: explicit construction

This section furnishes an example where a quasi-optimal multivariate quadrature rule can be explicitly constructed.

Consider a tensorial domain and probability measure, i.e., suppose

$$\mu = \bigotimes_{j=1}^{d} \mu_j, \qquad D = \mathrm{supp} \, \mu_1 \times \cdots \times \mathrm{supp} \, \mu_d,$$

where the $\mu_j$ are univariate probability measures. We let $\Lambda = \Lambda_n$ for any $n \in \mathbb{N}$, where, with $e_j$ the cardinal $j$'th unit vector in $\mathbb{N}_0^d$, $\Lambda_n$ is the set of indices of the form $k e_j$ for $0 \le k \le n$; the size of $\Lambda_n$ is $N = nd + 1$. Note that this set would arise in situations where one seeks to approximate an unknown multivariate function as a sum of univariate functions; this is rarely a reasonable assumption in practice. However, in this case we can explicitly construct optimal quadrature rules.

The heuristic (4) suggests that we can construct a rule satisfying (2) if we use

$$M \ge n \left( \frac{d}{d+1} \right) + \frac{1}{d+1}$$

nodes, which is approximately $n$ nodes. However, we can achieve this with fewer nodes, only $\lfloor n/2 \rfloor + 1$, independent of $d$. Here the heuristic (4) is too pessimistic when $d > 1$.

Associated to each $\mu_j$, we need the corresponding system of orthonormal polynomials and the associated univariate $\lambda$ function. For each $j = 1, \ldots, d$, let $q_{i,j}$, $i \ge 0$, denote a $\mu_j$-orthonormal polynomial family with $\deg q_{i,j} = i$. Define

$$\lambda_{n,j}(\cdot) = \frac{1}{\sum_{i=0}^{n} q_{i,j}^2(\cdot)}.$$

Note that $q_{0,j} = 1$ for all $j$ since each $\mu_j$ is a probability measure.

Consider the index sets $\Theta_j = \{ k e_j \mid 0 \le k \le \lfloor n/2 \rfloor \}$, for $j = 1, \ldots, d$, where $0 \cdot e_j$ is the origin in $\mathbb{N}_0^d$. Each $\Theta_j$ is a maximal half-set for $\Lambda$, and $L$ in (5) is given by $L = \lfloor n/2 \rfloor + 1$.

To construct a quasi-optimal quadrature rule achieving (2) with $M = L$ nodes, we note that the weights $w_m$, $m = 1, \ldots, M$, must be given by (6), which holds for any maximal half index set $\Theta_j$; i.e., it must simultaneously hold for all $j$. Thus,

$$w_m = \lambda_{\Theta_j}(x_m) = \lambda_{\lfloor n/2 \rfloor, j}\left( x_m^{(j)} \right), \qquad j = 1, \ldots, d, \tag{9}$$

for $m = 1, \ldots, M$. This implies in particular that the coordinates of each node $x_m$ must satisfy

$$\lambda_{\lfloor n/2 \rfloor, 1}\left( x_m^{(1)} \right) = \lambda_{\lfloor n/2 \rfloor, 2}\left( x_m^{(2)} \right) = \cdots = \lambda_{\lfloor n/2 \rfloor, d}\left( x_m^{(d)} \right).$$

We can satisfy this condition in certain cases. Suppose $\mu_1 = \mu_2 = \cdots = \mu_d$; then $\lambda_{\lfloor n/2 \rfloor, j}$ is the same function for every $j$, and so we can satisfy (9) by setting all coordinates of each node equal. Thus the nodes of a quasi-optimal quadrature rule could lie in $D$ along the graph of the line defined by

$$x^{(1)} = x^{(2)} = \cdots = x^{(d)}.$$

In order to satisfy the integration condition (2) we need to distribute the nodes in an appropriate way. Having effectively reduced the problem to one dimension, this is easily done: we choose a Gauss-type quadrature rule as described in the previous section. Let the $j$th coordinates of the quadrature nodes be the $M$-point Gauss quadrature nodes for $\mu_j$, i.e.,

$$\left\{ x_1^{(j)}, x_2^{(j)}, \ldots, x_M^{(j)} \right\} = q_{M,j}^{-1}(0).$$

This then uniquely defines the nodes, and the weights are uniquely defined since we have satisfied (9).

Thus a “diagonal” Gauss quadrature rule, with equal coordinate values for each abscissa, is an optimal rule in this case.
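The diagonal construction is simple to check numerically. In the sketch below (illustrative; the uniform probability measure on $[-1, 1]^d$ is our assumed example), a 3-point diagonal Gauss rule in 5 dimensions reproduces every additive moment $\int (x^{(j)})^k \, d\mu$ for $k \le 2M - 1$:

```python
import numpy as np

d, M = 5, 3
x1, w1 = np.polynomial.legendre.leggauss(M)
w1 = w1 / 2.0                           # probability measure dx/2 per coordinate
nodes = np.tile(x1[:, None], (1, d))    # "diagonal" rule: all coordinates equal

def exact_moment(k):
    # Moment of (x^(j))^k for the uniform probability measure on [-1, 1]:
    return 0.0 if k % 2 else 1.0 / (k + 1)

errs = [abs(np.dot(w1, nodes[:, j]**k) - exact_moment(k))
        for j in range(d) for k in range(2 * M)]
print(max(errs))   # every additive moment is matched
```

Note the rule uses only 3 points in 5 dimensions, versus the roughly $n = 5$ points suggested by the heuristic (4).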

### 3.3 Quasi-optimal multivariate quadrature: non-uniqueness

The previous example, with an additional assumption, allows us to construct many distinct quasi-optimal quadrature rules. Again take $\Lambda = \Lambda_n$ for $n \in \mathbb{N}$, and let $\mu = \bigotimes_{j=1}^d \mu_j$ with identical univariate measures $\mu_1 = \cdots = \mu_d$.

To this add the assumption that $\mu_1$ is a symmetric measure; i.e., for any measurable set $A$, $\mu_1(A) = \mu_1(-A)$. In this case the univariate orthogonal polynomials $q_{i,1}$ are even (odd) functions if $i$ is even (odd). Thus, the zero set of $q_{M,1}$ is symmetric around 0, and $\lambda_{n,1}$ is always an even function. With the quasi-optimal set of nodes defined in the previous section, let

$$y_m^{(j)} = \sigma_j x_m^{(j)}, \qquad \sigma_j \in \{-1, +1\},$$

for a fixed but arbitrary sign sequence $(\sigma_1, \ldots, \sigma_d)$. Using the above properties, one can show that the nodes $y_m$ and the weights $w_m$ define a quadrature rule satisfying (2), and of course with the same number of nodes as the quasi-optimal rule from the previous section.

By varying the $\sigma_j$ we can create $2^{d-1}$ distinct distributions of nodes (a global sign flip maps the symmetric node set to itself), thus showing that at least this many quasi-optimal rules exist.

### 3.4 Quasi-optimal multivariate quadrature: non-existence

We again use the setup of Section 3.2, but this time to illustrate that it is possible for quasi-optimal rules to not exist. We consider $d = 2$, take $\Lambda = \Lambda_3$, and let $\mu = \mu_1 \otimes \mu_2$ for two univariate probability measures $\mu_1$ and $\mu_2$. However, this time we let these measures be different:

$$d\mu_1(t) = \tfrac{1}{2} \, dt, \quad \mathrm{supp} \, \mu_1 = [-1, 1], \qquad d\mu_2(t) = \tfrac{3}{4}(1 - t^2) \, dt, \quad \mathrm{supp} \, \mu_2 = [-1, 1].$$

With our choice of $\Lambda$, we have $L = 2$ with two maximal half-sets:

$$\Theta_1 = \{(0,0), (1,0)\}, \qquad \Theta_2 = \{(0,0), (0,1)\}.$$

In this simple case the explicit $\mu_1$- and $\mu_2$-orthonormal families take the expressions

$$q_{0,1}(t) = 1, \quad q_{1,1}(t) = \sqrt{3} \, t, \qquad q_{0,2}(t) = 1, \quad q_{1,2}(t) = \frac{\sqrt{5}}{2} \, t,$$

so that, associated to $\Theta_1$ and $\Theta_2$ respectively, we have the functions

$$\lambda_{1,1}(t) = \frac{1}{1 + 3t^2}, \qquad \lambda_{1,2}(t) = \frac{4}{4 + 5t^2}.$$

Since by (9) we require $\lambda_{1,1}\left( x_m^{(1)} \right) = \lambda_{1,2}\left( x_m^{(2)} \right)$, this implies that

$$3 \left( x_m^{(1)} \right)^2 = \frac{5}{4} \left( x_m^{(2)} \right)^2, \qquad m = 1, 2. \tag{10a}$$

However, the condition (2) also implies for our choice of $\Lambda$ that

$$\int_{-1}^{1} p(t) \, \tfrac{1}{2} \, dt = \sum_{m=1}^{2} w_m p\left( x_m^{(1)} \right), \qquad p \in \mathrm{span}\{1, t, t^2, t^3\},$$

$$\int_{-1}^{1} p(t) \, \tfrac{3}{4}(1 - t^2) \, dt = \sum_{m=1}^{2} w_m p\left( x_m^{(2)} \right), \qquad p \in \mathrm{span}\{1, t, t^2, t^3\}.$$

The conditions above imply that $\left\{ x_m^{(1)} \right\}_{m=1}^{2}$ and $\left\{ x_m^{(2)} \right\}_{m=1}^{2}$ must be the nodes of the 2-point $\mu_1$- and $\mu_2$-Gauss quadrature rules, respectively, which are both unique. Thus $\left\{ x_m^{(1)} \right\}_{m=1}^{2} = \left\{ \pm \sqrt{3}/3 \right\}$ and $\left\{ x_m^{(2)} \right\}_{m=1}^{2} = \left\{ \pm 1/\sqrt{5} \right\}$, so that

$$3 \left( x_m^{(1)} \right)^2 = 5 \left( x_m^{(2)} \right)^2, \qquad m = 1, 2. \tag{10b}$$

We arrive at a contradiction: simultaneous satisfaction of (10a) and (10b) implies that all coordinates of the abscissae are 0, which cannot satisfy (2). Thus, no quasi-optimal quadrature rule for this example exists.

## 4 Numerically generating multivariate quadrature rules

In this section we describe our proposed algorithm for generating multivariate quadrature rules; the algorithm generates rules having significantly fewer points than the number of moments being matched. We will refer to such rules as reduced quadrature rules. We will compare the number of nodes in the rules our method generates with the lower bound defined in Section 2, along with the heuristic bound (4).

Our method is an adaptation of the method presented in . The authors there showed that one can recover a univariate Gauss quadrature rule as the solution to an infinite-dimensional linear program (LP) over nonnegative measures. Let a finite index set $\Lambda$ be given with $|\Lambda| = N_\Lambda$, and suppose that $\{p_j\}_{j=1}^{N_\Lambda}$ is any basis for $P_\Lambda$. In addition, let $r$ be a polynomial on $D$; we seek a positive Borel measure $\nu$ on $D$ solving

$$\text{minimize} \quad \int r(x) \, d\nu(x) \tag{11}$$

$$\text{subject to} \quad \int p_j(x) \, d\nu(x) = \int p_j(x) \, d\mu(x), \qquad j = 1, \ldots, N_\Lambda. \tag{12}$$

The restriction that $\nu$ is a positive measure enters as a constraint in the optimization problem. A nontrivial result is that a positive, discrete measure solving this problem exists with at most $N_\Lambda$ points of support. Such a solution immediately yields a positive quadrature rule, with nodes and weights given by the support points and point masses of $\nu$. Unfortunately, the above optimization problem is NP-hard, and so the authors in  propose a two-step procedure for computing an approximate solution. The first step solves a finite-dimensional linear program similar to (11) to find a positive quadrature rule whose number of points exceeds the desired target. In the second step, the points are clustered into $M$ groups, where $M$ is automatically chosen by the algorithm. The clusters are then used to form a set of averaged points and weights, which form an approximate quadrature rule. This approximate quadrature rule is refined using a local gradient-based method to optimize a moment-matching objective. The precise algorithm is outlined in detail in Appendix A.

In this paper we adopt a similar two-step procedure to compute reduced quadrature rules. We outline these two steps in detail in the following sections. Pseudo-code for generating reduced quadrature rules is presented in Algorithm B, contained in Appendix B.

### 4.1 Generating an initial condition

Let an index set $\Lambda$ be given with $|\Lambda| = N_\Lambda$, and suppose that $\{p_j\}_{j=1}^{N_\Lambda}$ is a basis for $P_\Lambda$. We seek to find a discrete measure by choosing a large candidate mesh $\{x_k\}_{k=1}^{S}$, with $S$ points on $D$, and solving the minimization problem

$$\begin{aligned} \text{minimize} \quad & \sum_{k=1}^{S} |v_k| \\ \text{subject to} \quad & \sum_{k=1}^{S} v_k \, p_j(x_k) = \int p_j(x) \, d\mu(x), \quad j = 1, \ldots, N_\Lambda, \\ & v_k \ge 0, \quad k = 1, \ldots, S. \end{aligned} \tag{13}$$

The optimization is over the scalars $v_k$. With $v$ a vector containing the $v_k$, the non-zero coefficients of the solution then define a quadrature rule with $M$ points, where $M$ is the number of non-zero coefficients. The points corresponding to the non-zero coefficients are the quadrature points, and the coefficients themselves are the weights.
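The finite-dimensional program (13) can be prototyped with an off-the-shelf LP solver. The sketch below uses SciPy's `linprog` in place of the solver used in the paper, with the uniform measure on $[-1, 1]^2$ and a total-degree-3 monomial basis as assumed examples; a basic (vertex) LP solution has at most $N_\Lambda$ nonzero weights, which become the initial rule:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, degree, S = 2, 3, 2000
X = rng.uniform(-1.0, 1.0, (S, d))        # candidate mesh on D = [-1, 1]^2

# Monomial basis of the total-degree space, and its exact moments for
# the uniform probability measure on [-1, 1]^2:
idx = [(a, c) for a in range(degree + 1) for c in range(degree + 1 - a)]
A = np.array([X[:, 0]**a * X[:, 1]**c for a, c in idx])        # N x S matrix
mom = np.array([0.0 if (a % 2 or c % 2) else 1.0 / ((a + 1) * (c + 1))
                for a, c in idx])

# (13): minimize sum_k v_k subject to A v = mom, v >= 0
res = linprog(c=np.ones(S), A_eq=A, b_eq=mom, bounds=(0, None), method='highs')
keep = res.x > 1e-10
print(keep.sum(), np.abs(A[:, keep] @ res.x[keep] - mom).max())
```

Here the random mesh, seed, and tolerance `1e-10` are illustrative choices, not values prescribed by the paper.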

This $\ell^1$-minimization problem, as well as the residual-based linear program used by , becomes increasingly difficult to solve as the size of $\Lambda$ and the dimension of the space increase. The ability to find a solution is highly sensitive to the candidate mesh. Through extensive experimentation we found that by solving (13) approximately via

$$\begin{aligned} \text{minimize} \quad & \sum_{k=1}^{S} v_k \\ \text{subject to} \quad & \left( \sum_{j=1}^{N_\Lambda} \left( \sum_{k=1}^{S} v_k \, p_j(x_k) - \int p_j(x) \, d\mu(x) \right)^2 \right)^{1/2} \le \epsilon, \\ & v_k \ge 0, \quad k = 1, \ldots, S, \end{aligned} \tag{14}$$

for some $\epsilon > 0$, we were able to find solutions that, although possibly inaccurate with respect to the moment-matching criterion, could be used as an initial condition for the local optimization to find an accurate reduced quadrature rule. This is discussed further in Section 5.1. We were not able to find such approximate initial conditions using the linear program used in ; we show results supporting this in Section 5.

We solved (14) using least angle regression with a LASSO modification, whilst enforcing positivity of the coefficients. This algorithm iteratively adds and removes positive weights until the residual falls below $\epsilon$, or until no new weights can be added without violating the positivity constraints. This allows one to drive $\epsilon$ to small values without requiring an a priori estimate of the smallest achievable residual.

In our examples, the candidate mesh is selected by generating uniformly random samples over the integration domain $D$, regardless of the integration measure. Better sampling strategies for generating the candidate mesh undoubtedly exist, but these strategies will be $\mu$- and $D$-dependent, and are not the central focus of this paper. Our limited experimentation here suggested that there was only marginal benefit from exploring this issue.

### 4.2 Finding a reduced quadrature rule

Once an initial condition has been found by solving (14), we then use a simple greedy clustering algorithm to generate an initial condition for a local gradient-based moment-matching optimization.

#### Clustering

The greedy clustering algorithm finds the point with the smallest weight and combines it with its nearest neighbor. These two points are then replaced by a single point whose coordinates are a convex combination of the two clustered points, where the convex weights correspond to the quadrature weights output by the LP algorithm. The newly clustered point set has one less point than before. This process is repeated.

We terminate the iterative clustering procedure when a desired number of points $M$ is reached. At the termination of the clustering algorithm, a set of points $\{x_m\}_{m=1}^{M}$ and weights $\{w_m\}_{m=1}^{M}$ defining an approximate quadrature rule is returned. Pseudo-code describing this greedy clustering algorithm is given in Algorithm B in Appendix B.
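A minimal sketch of this clustering step (illustrative only; the input points and weights below are hypothetical LP output, not data from the paper):

```python
import numpy as np

def greedy_cluster(X, w, M):
    """Repeatedly merge the smallest-weight point with its nearest
    neighbor, replacing the pair by their weight-proportional convex
    combination, until only M points remain."""
    X, w = X.astype(float).copy(), w.astype(float).copy()
    while len(w) > M:
        i = int(np.argmin(w))                      # smallest weight
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf
        j = int(np.argmin(dist))                   # its nearest neighbor
        X[j] = (w[i] * X[i] + w[j] * X[j]) / (w[i] + w[j])
        w[j] = w[i] + w[j]
        X, w = np.delete(X, i, axis=0), np.delete(w, i)
    return X, w

X0 = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9]])
w0 = np.array([0.1, 0.3, 0.4, 0.2])
Xc, wc = greedy_cluster(X0, w0, 2)
print(Xc, wc)   # two clustered points; the total weight is conserved
```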

The ability to find an accurate reduced quadrature rule is dependent on the choice of the number of specified clusters . As stated in Section 2, there is a strong heuristic that couples the dimension , the number of matched moments , and the number of points . For general and given we set the number of clusters to be

 M = \frac{N}{d+1}. \qquad (15)

As shown in Section 2 this heuristic will not always produce a quadrature rule that exactly integrates all specified moments. Moreover, the heuristic may overestimate the actual number of points in a quasi-optimal quadrature rule.

It is tempting to set to the lower bound value , defined in (5). However, a sobering result of our experimentation is that in all our experiments we were never able to numerically find a quadrature rule with fewer points than that specified by (15); therefore, we could not find any rules with points where . However, we were able to identify situations in which the heuristic underestimated the requisite number of points in a reduced quadrature rule. For example, if one wants to match the moments of a total-degree basis of degree one must use at least points. This lower bound is typically violated by the heuristic for low degree and high dimension . For example, for and we have using the heuristic, yet the theoretical lower bound from Theorem 2.2 requires . In this case our theoretical analysis sets a lower bound that is more informative than the heuristic (15).
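The tension between the heuristic and the lower bound can be checked numerically. The sketch below assumes the heuristic (15) is M = N/(d+1) rounded up, with N the dimension of the total-degree space, and compares it against the classical lower bound for positive rules exact on degree-p polynomials, namely the dimension of the degree-⌊p/2⌋ polynomial space; the specific degree and dimension chosen are illustrative.

```python
from math import comb, ceil

def heuristic_points(p, d):
    """Degree-of-freedom heuristic (15), rounded up to an integer:
    M = N / (d + 1), where N = C(p + d, d) counts total-degree-p moments."""
    return ceil(comb(p + d, d) / (d + 1))

def classical_lower_bound(p, d):
    """Classical lower bound (assumed form): a positive rule exact on all
    polynomials of degree p needs at least C(floor(p/2) + d, d) points."""
    return comb(p // 2 + d, d)

# Low degree, high dimension: the heuristic falls below the lower bound.
p, d = 2, 100
print(heuristic_points(p, d), classical_lower_bound(p, d))  # prints: 51 101
```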

#### Local-optimization

The approximate quadrature rule , generated by the clustering algorithm is used as an initial guess for the following local optimization

 \text{minimize} \quad \sum_{j=1}^{N} \left( \int_D p_j(x) \, d\mu(x) - \sum_{m=1}^{M} w_m p_j(x_m) \right)^2  \qquad (16a)
 \text{subject to} \quad x_m \in D \ \text{ and } \ w_m \geq 0,  \qquad (16b)

The objective defined by (16a) is a polynomial in the nodes and weights, and its gradient is easily computed:

 \frac{df}{dx_k^{(s)}} = -\sum_{i=0}^{n-1} \left[ w_k \frac{dp_i(x_k)}{dx_k^{(s)}} \left( q(p_i) - \sum_{j=1}^{M} w_j p_i(x_j) \right) \right]  \qquad (17a)
 \frac{df}{dw_k} = -\sum_{i=0}^{n-1} \left[ p_i(x_k) \left( q(p_i) - \sum_{j=1}^{M} w_j p_i(x_j) \right) \right],  \qquad (17b)

for k = 1, …, M and s = 1, …, d, and where q(p_i) = \int_D p_i(x) \, d\mu(x) denotes the exact moment of p_i.

We use a gradient-based nonlinear least squares method to solve the local optimization problem. Defining the optimization tolerance ε, the procedure exits when either f_t − f_{t+1} < ε or ‖g_t‖ < ε, where f_t and f_{t+1} are the values of the objective at steps t and t+1 respectively, and g_t is the value of the gradient scaled to respect the constraints (16b).

This local optimization procedure, in conjunction with the cluster-based initial guess, can frequently find a quadrature rule of the size determined by the degree-of-freedom heuristic (15). However, in some cases a quadrature rule with that many points cannot be found for very high accuracy requirements (in all experiments we say that a quadrature rule is found if the iterations yield an objective below the tolerance). In these situations one might be able to find such a rule using another initial condition. (Recall that the initial condition provided by the linear program is found using a randomized candidate mesh.) However, we found it more effective to simply increment the size of the desired quadrature rule by one point. This can be done repeatedly until a quadrature rule with the desired accuracy is reached. While one might fear that this procedure requires many increments, in practice very few increments were ever needed. This is described in more detail in Section 5.1.
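The local moment-matching step can be illustrated in one dimension. The sketch below uses scipy.optimize.least_squares (the optimizer named in the text) with bound constraints playing the role of (16b), and recovers the two-point Gauss-Legendre rule for the uniform probability measure on [-1, 1]; the monomial basis, initial guess, and dimension are illustrative choices, not the paper's setup.

```python
import numpy as np
from scipy.optimize import least_squares

# Moment-matching least squares in 1D: unknowns stacked as [x_1, x_2, w_1, w_2].
# Exact moments of x^0..x^3 for dmu = dx/2 on [-1, 1].
M, n_moments = 2, 4
exact = np.array([1.0, 0.0, 1.0 / 3.0, 0.0])

def residuals(z):
    x, w = z[:M], z[M:]
    # One residual per matched moment, as in the objective (16a).
    return np.array([w @ x**j for j in range(n_moments)]) - exact

z0 = np.concatenate([np.array([-0.3, 0.4]), np.full(M, 0.5)])  # rough guess
lb = np.concatenate([np.full(M, -1.0), np.zeros(M)])   # x in D, w >= 0
ub = np.concatenate([np.full(M, 1.0), np.full(M, np.inf)])
sol = least_squares(residuals, z0, bounds=(lb, ub))
print(np.sort(sol.x[:M]))   # nodes approach +/- 1/sqrt(3)
```

The bounds keep the weights nonnegative and the nodes inside the domain throughout the iteration, which is how the constraints (16b) are respected in practice.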

Note that both generating the initial condition using (14) and solving the local optimization problem (16a) involve matching moments. We assume in this paper that moments are available and given; in our examples we compute these moments analytically (or to machine precision with high-order quadrature), unless otherwise specified.

## 5 Numerical results

In this section we explore the numerical properties of the multivariate quadrature algorithm proposed in Section 4. First we numerically compare the performance of our algorithm with other popular quadrature strategies for tensor-product measures. We then demonstrate the flexibility of our approach for computing quadrature rules for non-tensor-product measures, for which many existing approaches are not applicable. Finally, we investigate the utility of our reduced quadrature rules for high-dimensional integration by leveraging selected moment matching and dimension reduction.

Our tests will compare the method in this paper (Section 4), which we call Reduced Quadrature, to other standard quadrature methods. We summarize these methods below.

• Monte Carlo – The integration rule

 \int_D f(x) \, d\mu(x) \approx \frac{1}{M} \sum_{m=1}^{M} f(X_m),

where X_m are independent and identically distributed samples from the probability measure μ.

• Sparse grid – The integration rule for the uniform measure over given by a multivariate sparse grid rule generated from a univariate Clenshaw-Curtis quadrature rule.

• Cubature – Stroud cubature rules of degree 2, 3, and 5. These rules again integrate on with respect to the uniform measure.

• Sobol – A quasi-Monte Carlo Sobol sequence  for approximating integrals on using the uniform measure.

• Reduced quadrature – The method in this paper, described in Section 4.

• quadrature – The “initial guess” for the Reduced quadrature algorithm, using the solution of the LP algorithm summarized in Section 4.1.

The Sparse grid, Sobol, and cubature methods are usually applicable only for integrating over with the uniform measure. When has a density with support , we will use these methods to evaluate by integrating with respect to the uniform measure on , where we assign on .
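The reweighting described above can be sketched as follows: a uniform-measure rule on the box is applied to the integrand multiplied by the density. The density, integrand, and midpoint rule below are illustrative stand-ins for the measures and uniform-measure rules used in the experiments.

```python
import numpy as np

# If mu has density rho supported inside the box D (rho set to zero outside),
# then int f dmu = int_D f(x) rho(x) dx, so any uniform-measure rule
# {(x_m, w_m)} on D can be applied to the product g = f * rho.
def integrate_with_uniform_rule(f, rho, nodes, weights):
    return sum(w * f(x) * rho(x) for x, w in zip(nodes, weights))

# Toy 1D check: rho(x) = 2x on [0,1] (a probability density), f(x) = x.
# Exact value: int_0^1 x * 2x dx = 2/3. A composite midpoint rule stands in
# for the sparse grid / Sobol rules used as baselines in the paper.
nodes = (np.arange(100) + 0.5) / 100.0
weights = np.full(100, 1.0 / 100.0)
est = integrate_with_uniform_rule(lambda x: x, lambda x: 2.0 * x, nodes, weights)
print(est)
```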

### 5.1 Tensor product measures

Our reduced quadrature approach can generate quadrature rules for non-tensor-product measures, but in this section we investigate the performance of our algorithm in the more standard setting of tensor-product measures.

We begin by discussing the computational cost of computing our quadrature rules. Specifically, we compute quadrature rules for the uniform probability measure on in up to 10 dimensions for all total-degree spaces with degree at most 20 or with subspace dimension at most 3003. In all cases we were able to generate an efficient quadrature rule using our approach for which the number of quadrature points was equal to or only slightly larger ( points) than the number of points suggested by the heuristic (15). The number of points in each quadrature rule, the number of moments matched , the size of a quasi-optimal rule , the number of points suggested by the heuristic (15), and the number of iterations taken by the nonlinear least squares algorithm are presented in Table 1.

The final two columns of Table 1, the number of points in the reduced quadrature rules and the number of iterations used by the local optimization, are given as ranges because the final result is sensitive to the random candidate sets used to generate the initial condition. The ranges represent the minimum and maximum number of points and iterations used to generate the largest quadrature rules for each dimension over 10 different initial candidate meshes. We observe only very minor variation in the number of points, but greater sensitivity in the number of iterations.

The number of iterations needed by the local optimization to compute the quadrature rules was always a reasonable number: at most a small multiple of the number of moments being matched. However, the largest rules did take several hours to compute on a single core machine due to the cost of running the optimizer (we used scipy.optimize.least_squares) and forming the Jacobian matrix of the objective.

Profiling the simulation revealed that the long run times were due to the cost of evaluating the Jacobian (17) of the objective function and the singular value decomposition repeatedly called by the optimizer. This run-time could probably be reduced by using a Krylov-based nonlinear least squares algorithm and by computing the Jacobian in parallel. We expect that such improvements would allow one to construct such rules in higher dimensions in a reasonable amount of time; however, even our unoptimized code was able to compute such rules on a single core in a moderate number of dimensions.

To compare the performance of our reduced quadrature rules to existing algorithms, consider the corner-peak function often used to test quadrature methods,

 f_{CP}(\Vx) = \left( 1 + \sum_{i=1}^{d} c_i x_i \right)^{-(d+1)}, \qquad \Vx \in \Gamma = [0,1]^d. \qquad (18)

The coefficients can be used to control the variability over and the effective dimensionality of this function. We generate each randomly from the interval and subsequently normalize them so that . For a uniform random variable on , the mean and variance of can be computed analytically; these values correspond to computing integrals over with the uniform measure.
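The corner-peak integrand (18) and its randomly generated coefficients can be sketched as follows; the normalization target for the coefficients (1.0 below) and the random seed are illustrative assumptions, since the paper's normalization constant is not restated in this excerpt. A crude Monte Carlo estimate stands in for the analytic mean.

```python
import numpy as np

def corner_peak(x, c):
    """Corner-peak test integrand (18): (1 + c . x)^{-(d+1)} on [0,1]^d."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    return (1.0 + x @ c) ** -(d + 1)

rng = np.random.default_rng(7)
d = 3
c = rng.uniform(0.0, 1.0, d)
c *= 1.0 / c.sum()                 # normalize so the coefficients sum to 1

# Crude Monte Carlo reference for the mean over the uniform measure on [0,1]^d.
samples = rng.uniform(0.0, 1.0, (200_000, d))
mc_mean = corner_peak(samples, c).mean()
print(mc_mean)
```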

Figure  1 plots the convergence of the error in the mean of the corner-peak function using the reduced quadrature rules generated for the uniform probability measure on with . Because generating the initial condition requires randomly sampling over the domain of integration for a given degree, we generate 10 quadrature rules and plot both the median error and the minimum and maximum errors. There is sensitivity to the random candidate set used to generate the initial condition, however for each of the 10 repetitions a quadrature rule was always found and the error in the approximation of the mean of using these rules converges exponentially fast.

For a given number of samples the error in the reduced quadrature rule estimate of the mean is significantly lower than the error of a Clenshaw-Curtis-based sparse grid. It is also more accurate than Sobol sequences up to 10 dimensions. Furthermore, the reduced quadrature rule is competitive with the Stroud degree-2, 3, and 5 cubature rules. However, unlike the Stroud rules, the polynomial exactness of the reduced quadrature rule is not restricted to low degrees (shown here), nor is it restricted to total-degree spaces (as discussed in Section 5.3.1).

In Figure 1 we also plot the error of the quadrature rule used as the initial condition for the local optimization that generates the reduced rule. As expected, the error of the initial condition is larger than that of the final reduced rule for a given number of points. Moreover, it becomes increasingly difficult to find an accurate initial condition as the degree increases. In two dimensions, as the degree is increased past 13, the error in the initial condition begins to increase. Similar degradation in accuracy also occurs in the other, higher-dimensional examples. However, even when the accuracy of the initial condition is extremely low, it still provides a good starting point for the local optimization, allowing us to accurately generate the reduced quadrature rule.

Figure 1: Convergence of the error in the mean of the corner-peak function fCP (18) computed using the reduced quadrature rule generated for the uniform measure. Convergence is shown for d=2,3,4,5,10, which are respectively shown left to right starting from the top left. Solid lines represent the median error of the total-degree quadrature rules for 10 different initial conditions. Transparent bands represent the minimum and maximum errors of the same 10 repetitions.

### 5.2 Beyond tensor product measures

Numerous methods have been developed for multivariate quadrature with tensor-product measures; however, much less attention has been given to computing quadrature rules for arbitrary measures, for example those that exhibit nonlinear correlation between dimensions. In this setting Monte Carlo quadrature is the most popular choice.

#### Chemical reaction model

The reduced quadrature method we have developed can generate quadrature rules to integrate over arbitrary measures. Consider the following model of competing species absorbing onto a surface out of a gas phase

 \frac{du_1}{dt} = x_1 z - c u_1 - 4 d u_1 u_2, \qquad \frac{du_2}{dt} = 2 x_2 z^2 - 4 d u_1 u_2, \qquad \frac{du_3}{dt} = e z - f u_3,
 z = 1 - u_1 - u_2 - u_3, \qquad u_1(0) = u_2(0) = u_3(0) = 0. \qquad (19)

The constants , , , and are fixed at the nominal values , , , and . The parameters and will vary over a domain endowed with a non-tensor-product probability measure . Viewing as a random variable with probability density , we are interested in computing the mean of the mass fraction of the third species at seconds.
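A minimal sketch of simulating the surface-reaction model with scipy.integrate.solve_ivp follows. Because the nominal constants and parameter values are not restated in this excerpt, the values of c, d, e, f, x1, x2 below are purely illustrative placeholders, and the vacant-site closure z = 1 - u1 - u2 - u3 is an assumed form of the model.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder values only -- the paper's nominal constants are not reproduced
# in this excerpt.
c, d_, e, f = 0.04, 1.0, 0.36, 0.016   # fixed constants (illustrative)
x1, x2 = 1.0, 1.0                      # varying parameters (illustrative)

def rhs(t, u):
    u1, u2, u3 = u
    z = 1.0 - u1 - u2 - u3             # vacant surface fraction (assumed form)
    return [x1 * z - c * u1 - 4.0 * d_ * u1 * u2,
            2.0 * x2 * z**2 - 4.0 * d_ * u1 * u2,
            e * z - f * u3]

# Integrate from zero initial mass fractions and read off the third species.
sol = solve_ivp(rhs, (0.0, 100.0), [0.0, 0.0, 0.0], rtol=1e-8, atol=1e-10)
print(sol.y[2, -1])                    # mass fraction of the third species
```

A quantity of interest such as the mean of u3 at the final time would then be obtained by applying a quadrature rule to this solver output over the varying parameters.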

We will consider two test problems, problem and problem , each defined by its own domain and measure . We therefore have two rectangular domains and with measures and , respectively. The domains and the measures are defined via affine mapping of a canonical domain and measure:

 \dxμ(x) =Cexp(−(1