Generation and application of multivariate polynomial quadrature rules
Abstract
The search for multivariate quadrature rules of minimal size with a specified polynomial accuracy has been the topic of many years of research. Finding such a rule allows accurate integration of moments, which play a central role in many aspects of scientific computing with complex models. The contribution of this paper is twofold. First, we provide novel mathematical analysis of the polynomial quadrature problem that provides a lower bound for the minimal possible number of nodes in a polynomial rule with specified accuracy. We give concrete but simplistic multivariate examples where a minimal quadrature rule can be designed that achieves this lower bound, along with situations that showcase when it is not possible to achieve this lower bound. Our second main contribution comes in the formulation of an algorithm that is able to efficiently generate multivariate quadrature rules with positive weights on nontensorial domains. Our tests show success of this procedure in up to 20 dimensions. We test our method on applications to dimension reduction and chemical kinetics problems, including comparisons against popular alternatives such as sparse grids, Monte Carlo and quasi Monte Carlo sequences, and Stroud rules. The quadrature rules computed in this paper outperform these alternatives in almost all scenarios.
J.D. Jakeman (jdjakem@sandia.gov). Funding: see acknowledgements.
Multivariate integration, Moment-matching quadrature, Uncertainty quantification, Dimension reduction
1 Introduction
Let be a domain with nonempty interior. Given a finite, positive measure on and a measurable function , our main goal is computation of
The need to evaluate such integrals arises in many areas, including finance, stochastic programming, robust design and uncertainty quantification. Typically these integrals are approximated numerically by quadrature rules of the form
(1) 
Examples of common quadrature rules for high-dimensional integration are Monte Carlo and quasi-Monte Carlo methods [12, 14, 20, 27], and Smolyak integration rules [7, 8, 26]. Quadrature rules are typically designed and constructed in deference to some notion of approximation optimality. The particular approximation optimality that we seek in this paper is based on polynomial approximation.
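As a concrete instance of the quadrature form (1), the short sketch below applies a two-point Gauss-Legendre rule for the uniform probability measure on [-1, 1]; the integrand is an arbitrary illustrative choice:

```python
import numpy as np

# two-point Gauss-Legendre rule for the uniform probability measure on [-1, 1]
nodes = np.array([-1.0, 1.0]) / np.sqrt(3.0)
weights = np.array([0.5, 0.5])

def quad(f):
    # I[f] is approximated by the weighted sum of point evaluations
    return weights @ f(nodes)

approx = quad(lambda x: x**2)   # exact value of the integral is 1/3
```

This two-point rule integrates all polynomials up to degree three exactly, a property revisited in the discussion of Gaussian quadrature below.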
Our goal is to find a set of quadrature nodes , , and a corresponding set of weights such that
(2) 
where is a multiindex set of size , and is an dimensional polynomial space defined by . (We make this precise later.) can be a relatively “standard” space, such as the space of all variate polynomials up to a given finite degree, or more intricate spaces such as those defined by balls in index space, or hyperbolic cross spaces.
In this paper we will present a method for numerically generating polynomial-based cubature rules with the following properties:

Positive quadrature rules are generated.

Any measure for which moments are computable is applicable. Many existing quadrature methods only apply to tensorproduct measures. Our method constructs quadrature rules for measures with, for example, nonlinear correlation between variables.

Analytical or sample-based moments may be used. In some settings it may be possible to compute moments of a measure exactly, but in other settings only samples from the measure are available. For example, one may wish to integrate a function using Markov chain Monte Carlo samples generated from the posterior of a Bayesian inference problem.

A quadrature that is faithful to arbitrary sets of moments may be generated. Many quadrature methods are exact for certain polynomial spaces, for example totaldegree or sparse grid spaces. However, some functions may be more accurately represented by alternative polynomial spaces, such as hyperbolic cross spaces. In these situations it may be more prudent to construct rules that can match a customized set of moments.

Efficient integration of ridge functions is possible. Some highdimensional functions can be represented by a small number of linear combinations of the input variables. In this case it is more efficient to integrate these functions over this lowerdimensional coordinate space. Such a dimensionreducing transformation typically induces a new measure on a nontensorial space of lowerdimensional variables. For example, a highdimensional uniform probability measure on a hypercube may be transformed into a nonuniform measure on a zonotope (a multivariate polygon).
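To illustrate the dimension-reduction setting in the last item, the sketch below pushes uniform samples on a hypercube through a linear map; the matrix `A` is an arbitrary stand-in for the ridge directions. The image points populate a zonotope in the reduced coordinates and carry an induced, non-uniform measure:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 10, 2                      # ambient and reduced dimensions (illustrative)
A = rng.normal(size=(d, D))       # hypothetical ridge directions

X = rng.uniform(-1.0, 1.0, size=(5000, D))   # uniform samples on the hypercube
Y = X @ A.T                                  # induced samples on a zonotope
```

Every image point lies inside the box with half-widths given by the row-wise absolute sums of `A`, which is one way to see that the zonotope is a bounded polytope.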
Our algorithm falls into the class of moment-matching methods. There have been some recent attempts at generating quadrature using moment matching via optimization approaches. These methods frequently either start with a small candidate set and add points until moments are matched [18], or start with a large set of candidate points and reduce them until no more points can be removed without numerically violating the moment conditions [25, 16, 34]. These approaches sometimes go by other names, such as scenario generation or scenario reduction methods.
This paper presents a quadrature/scenario reduction moment matching method based upon the work in [25]. The method in [25] comprises two steps. The first step generates a quadrature rule with points, the existence of which is guaranteed by Tchakaloff’s theorem [32]. The second step uses this quadrature rule as an initial guess for a local gradient-based optimization procedure that searches for a quadrature rule with points.
The initial quadrature rule is generated by drawing a large number of points from the domain of integration and then solving an inequalityconstrained linear program to find a quadrature rule with positive weights that matches all desired moments. Similar approaches can be found in [1, 4]. The points comprising this initial quadrature rule are then grouped into clusters. A new approximate quadrature rule is then formed by combining points and weights within a cluster into a single point and weight.
Our numerical method differs from [25] in the following ways: (i) we use a different formulation of the linear program to generate the initial quadrature rule; (ii) we show numerically that this linear program only needs to be solved with very limited accuracy; (iii) we present an automated way of selecting the clusters from the initial rule – in [25] no method for clustering points is presented; (iv) we provide extensive numerical testing of our method in a number of settings.
The theoretical contributions of this paper include a lower bound on the number of points in the final quadrature rule, and a definition of quasioptimality in terms of this bound. We also provide a simple means of testing whether the quadrature rules generated by any method are quasioptimal.
The remainder of the paper is structured as follows. In Section 2 we introduce some nomenclature, define quasioptimality, and present a means to verify if a quadrature rule is quasioptimal. We also use these theoretical tools to analytically derive quasioptimal rules for additive functions. In Section 4 we detail our algorithm for generating quadrature rules. We then present a wide range of numerical examples, in Section 5, which explore the properties of the quadrature rules that our numerical algorithm can generate.
1.1 Existing quadrature rules
We give a concise description of some existing quadrature rules. There are numerous approaches for computing polynomialbased quadrature rules so our goal is not to be comprehensive, but instead to concentrate on rules that are related to our proposed method.
For univariate functions Gaussian quadrature is one of the most commonly used approaches. Nodes associated to Gaussian quadrature rules are prescribed by roots of the polynomials orthogonal to the measure [31]. The resulting quadrature rule is always positive (meaning the weights are all positive) and the rules are optimal in the sense that given a Gaussian quadrature rule of degree of exactness , no rule with fewer points can be used to integrate all degree polynomials.
When integrating multivariate functions with respect to tensorproduct measures on a hypercube, accurate and efficient quadrature rules can be found by taking tensor products of one-dimensional Gaussian quadrature rules. These rules will be optimal for functions that can be represented exactly by tensorproduct polynomial spaces of degree . However the use of such quadrature rules is limited to a small number of dimensions, say three or four, because the number of points in the rule grows exponentially with dimension.
Sparse grid quadrature methods have been successfully used as an alternative to tensorproduct quadrature for multivariate functions [7, 8, 26]. The number of points in sparse grid rules grows logarithmically with dimension for a fixed level of accuracy. Unlike tensorproduct rules, however, the quadrature weights will not all be positive. Sparse grid quadrature delays the curse of dimensionality by focusing on integrating polynomial spaces that have high-degree univariate terms but low-degree interaction terms.
Highdimensional cubature rules can often be more effective than sparse grid rules when integrating functions that are well represented by total-degree polynomials. These rules have positive weights and typically consist of a very small number of points. However, such highly effective cubature rules are difficult to construct and have only been derived for a specific set of measures, integration domains and polynomial degrees of exactness [13, 29, 36].
Tensorproduct integration schemes generally produce approximations to the integral (1) whose accuracy scales like , where indicates the maximum order of continuous partial derivatives in any direction [21]. This convergence rate illustrates both the blessing of smoothness – regularity accelerates convergence exponentially – and the curse of dimensionality – convergence is exponentially hampered by large dimension. Sparse grids consisting of univariate quadrature rules with points have a similar error, scaling like [7].
A contrasting approach is given by Monte Carlo (MC) and quasi Monte Carlo (QMC) approaches. These approximations produce convergence rates of and , respectively [3]. MC points are random, selected as independent and identicallydistributed realizations of a random variable, and QMC points are deterministically generated as sequences that minimize discrepancy.
2 Quasioptimality in multivariate polynomial quadrature
The goal of this section is to mathematically codify relationships between the number of quadrature points and the dimension of the polynomial space . In particular, we provide a theoretical lower bound for for a fixed . This lower bound can be used to define a notion of optimality in polynomial quadrature rules. We also provide related characterizations of optimal quadrature rules in both one and several dimensions.
2.1 Notation
With fixed, we consider a positive measure on with support . This support may be unbounded. The inner product and norm are defined as
To prevent degeneracy of polynomials with respect to and to ensure finite moments of , we assume
(3) 
for all algebraic polynomials . One can guarantee the lower inequality if, for example, there is any open Euclidean ball in inside which has a density function.
A point in has coordinate representation ; for a multiindex with coordinates , we have and . We let denote a multiindex set of finite size .
If are any two multiindices and , we define , , and componentwise. The partial ordering is true if all the componentwise conditions are true. We define the following standard properties and operations on multiindex sets: {definition} Let and be two multiindex sets, and let .
 [labelwidth=3.5cm,itemindent=1em,leftmargin=!]

The sum of two multiindex sets is

The expression is defined as
Note that this need not be a set of multiindices.

is a downward closed set if implies that for all .

For any finite , is the smallest downward closed set containing ,

is convex if for any and any , then .
With our notation, scalar multiplication is not consistent with Minkowski addition. In particular we have in general. If is downward closed, then .
The polynomial space is defined by a given multiindex :
and has dimension in under the assumption (3). Note that we make no particular assumptions on the structure of . I.e., we do not assume is downward closed, but much of our theory and all our numerical examples use downward closed index sets.
On , we will make use of the norm for , and the associated ball of radius to define index sets. These sets are defined by
The norms are defined for , , and by, respectively,
These index sets are all downward closed, and are convex if . The set equals when .
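The index sets above can be enumerated directly from the norm condition; the sketch below brute-forces over the bounding box, so it is only suitable for modest dimension and degree:

```python
import itertools

def lp_ball_indices(d, degree, p):
    """Multi-indices lam with ||lam||_p <= degree.

    p = 1 gives the total-degree space, p = inf the tensor-product space,
    and 0 < p < 1 a hyperbolic-cross-type space.
    """
    out = []
    for lam in itertools.product(range(degree + 1), repeat=d):
        if p == float("inf"):
            norm = max(lam)
        else:
            norm = sum(l**p for l in lam) ** (1.0 / p)
        if norm <= degree + 1e-10:   # small tolerance for floating-point norms
            out.append(lam)
    return out
```

For example, in two dimensions with degree 2 the total-degree set has 6 indices while the tensor-product set has 9, and the hyperbolic-cross-type sets are smaller still.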
2.2 Quasioptimal quadrature
With fixed, the theoretical and computational tractability of computing a solution to (2) depends on , , and . In particular, it is unreasonable to expect that can be arbitrarily large; if this were true then can be approximated to arbitrary accuracy by a sum of Dirac delta distributions, which would allow us to violate the lower inequality in (3). There is a strong heuristic that motivates the possible size of : The set represents degrees of freedom, and varying () represents an additional degrees of freedom. For an dimensional space , (2) can be ensured with constraints. Thus we expect for general that (and thus ) must be small enough to satisfy
(4) 
We will show that this heuristic does not always produce a faithful bound on sizes of quadrature rules; we provide instead a strict lower bound on the number of points in a quadrature rule for a given . To proceed we require the notion of ‘halfsets’. {definition} Let be a finite, nontrivial, downwardclosed set. A multiindex set is

a halfset for if

a maximal halfset for if it is a halfset of maximal size. I.e., if , with
(5)
We call the maximal halfset size of . Recall that so that the terminology “half” should not be conflated with the operation of halving each index in an index set. If is not downward closed, it may not have any halfsets. However, every nontrivial downward closed set has at least one nontrivial halfset (the set containing only the zero multiindex is one such halfset). Thus maximal halfsets always exist in this case, but they are not necessarily unique. An example in illustrates this nonuniqueness:
We have , and both and are maximal half sets for .
If is both downwardclosed and convex, then its maximal halfset is unique and easily computed. {theorem} Let be convex and downwardclosed. Then its maximal halfset is unique, given by .
Proof.
Let be any halfset for . Then for any , we have , so that . Thus , showing that any halfset must be contained in .
Now let . Then . Thus, by convexity
where the set inclusion holds by convexity of . Thus, is itself a halfset; by the previous observation that it also dominates any halfset, then it must be the unique largest (maximal) halfset. ∎
We can now state one of the main results of this section: The number in (5) is a lower bound on the size of any quadrature rule satisfying (2). {theorem} Let be a finite downwardclosed set, and suppose that an point quadrature rule exists satisfying (2). Then , with the maximal halfset size defined in (5).
Proof.
Let be any set satisfying and , with defined in (5). We choose any size orthonormal basis for :
Note that and have monomial expansions
for fixed and some constants and . This implies
Since is a half set for , then and is therefore exactly integrated by the quadrature rule (2).
Then consider the matrix defined as
The matrices and are and , respectively. Since the quadrature rule exactly integrates , then , the identity matrix. Thus, the matrix product has rank . If then, e.g., and so it is not possible that . ∎
This result shows that if , then an point quadrature rule satisfying (2) cannot exist. This is nontrivial information. As an example, let , and consider
corresponding to the tensorproduct space of degree 2. Since , the heuristic (4) suggests that it is possible to find a rule with only 3 points. However, let . Then and . Thus, no 3point rule that is accurate on can exist.
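The half-set condition can be checked directly from the definition: a candidate set S is a halfset for Λ exactly when the sum-set S + S lies inside Λ. The brute-force sketch below (feasible only for small examples) confirms the tensorproduct degree-2 example above, along with its total-degree analogue:

```python
import itertools

def is_half_set(S, Lam):
    # S is a half-set for Lam if the sum-set S + S is contained in Lam
    return all(tuple(a + b for a, b in zip(s, t)) in Lam
               for s in S for t in S)

def maximal_half_set_size(Lam):
    # brute force over subsets, largest first; only viable for small examples
    Lam = set(Lam)
    elems = sorted(Lam)
    for r in range(len(elems), 0, -1):
        for S in itertools.combinations(elems, r):
            if is_half_set(S, Lam):
                return r
    return 0

tensor2 = [(i, j) for i in range(3) for j in range(3)]               # tensor degree 2
total2 = [(i, j) for i in range(3) for j in range(3) if i + j <= 2]  # total degree 2
```

For the tensorproduct degree-2 set the maximal halfset has 4 indices (matching the impossibility of a 3-point rule noted above), while for the total-degree-2 set it has 3.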
Another observation from Lemma 2.2 is that we may justifiably call an point quadrature rule optimal if , in the sense that one cannot achieve the same accuracy with a smaller number of nodes. {definition} Let be a downwardclosed multiindex set. We call an point quadrature rule quasioptimal for if it satisfies (2) with , with defined in (5). We call these sets quasioptimal because, given a quasioptimal rule for , it may be possible to generate a quadrature rule of equal size that is accurate on an index set that is strictly larger than (see Section 5.3.1). Quasioptimal quadrature rules are not necessarily unique and sometimes do not exist. However, the weights for quasioptimal quadrature rules have a precise behavior.
Under the assumption (3), we can find an orthonormal basis for :
Given this orthonormal basis, define
The quantity depends on , , and , but not on the particular basis we have chosen: Through a unitary transform we may map into any other orthonormal basis for , but this transformation leaves the quantity above unchanged. The weights for a quasioptimal quadrature rule on are evaluations of , where is a maximal halfset for . {theorem} Let be a finite downwardclosed set, and let an point quadrature rule be quasioptimal for . Then
(6) 
where is a(ny) maximal half set for .
Proof.
Let , , be the cardinal Lagrange interpolants from the space on the nodes .
(7) 
where the second equality uses the fact that the quadrature rule is exact on . By (7), then is welldefined. With and defined in the proof of Lemma 2.2, then , and both matrices are square (). Thus, is an orthogonal matrix, so that
Taking the diagonal components of the left and righthand side shows that . ∎
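As a univariate sanity check of the weight formula (6): for the uniform probability measure on [-1, 1], the Gauss-Legendre weights coincide with the reciprocal of the sum of squared orthonormal polynomials over a maximal halfset (here, degrees below n) evaluated at the nodes:

```python
import numpy as np

n = 5
x, w = np.polynomial.legendre.leggauss(n)  # Gauss-Legendre for Lebesgue measure on [-1, 1]
w = w / 2.0                                # renormalise to the uniform probability measure

# orthonormal Legendre polynomials for this measure satisfy p_k = sqrt(2k + 1) * P_k;
# weights are reciprocals of the summed squares over the half set k < n
christoffel = sum((2 * k + 1) * np.polynomial.legendre.legval(x, [0] * k + [1])**2
                  for k in range(n))
recovered_w = 1.0 / christoffel
```

The recovered weights agree with the classical Gauss-Legendre weights to machine precision.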
3 Theoretical examples and counterexamples of quasioptimal quadrature
To provide further insight into our definition of quasioptimality, we investigate the consequences of the above theoretical results through some examples.
3.1 Univariate rules
The example in this section shows (i) that optimal quadrature rules in one dimension are Gauss quadrature rules, and (ii) that among the many quasioptimal rules, some may be capable of accurately integrating more polynomials than others.
Let with . The maximal halfset for is , and thus
We consider two cases, that of even , and of odd .
Suppose is even. Then a quasioptimal quadrature rule has abscissae, and exactly integrates (linearly independent) polynomials. I.e., the point rule exactly integrates polynomials up to degree . It is wellknown that this quadrature rule is unique: it is the GaussChristoffel quadrature rule [31].
With a family of orthonormal polynomials, with , then the abscissae are the zeros of , and the weights are given by (6). This quadrature rule can be efficiently computed from eigenvalues and eigenvectors of the symmetric tridiagonal Jacobi matrix associated with [10]. Thus, this quadrature rule can be computed without resorting to optimization routines.
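The eigendecomposition route just mentioned (the Golub-Welsch procedure) can be sketched for the Legendre case; the recurrence coefficients below are specific to the uniform probability measure on [-1, 1]:

```python
import numpy as np

def gauss_from_jacobi(n):
    # Golub-Welsch for the uniform probability measure on [-1, 1]: the nodes
    # are eigenvalues of the symmetric tridiagonal Jacobi matrix and the
    # weights come from the first components of its eigenvectors
    k = np.arange(1, n)
    b = k / np.sqrt(4.0 * k**2 - 1.0)   # Legendre three-term recurrence coefficients
    J = np.diag(b, 1) + np.diag(b, -1)
    nodes, V = np.linalg.eigh(J)
    weights = V[0, :]**2                # measure has total mass 1
    return nodes, weights

nodes4, weights4 = gauss_from_jacobi(4)
```

The resulting 4-point rule has positive weights summing to one and integrates polynomials up to degree 7 exactly.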
Now suppose is odd. Then a quasioptimal quadrature rule has abscissae, and exactly integrates polynomials. I.e., the point rule exactly integrates polynomials up to degree . Clearly a GaussChristoffel rule is a quasioptimal rule in this case. However, by our definition there are an (uncountably) infinite number of quasioptimal rules in this case [31]. In particular, for arbitrary , the zero set of the polynomial
(8) 
corresponds to the abscissae of a quasioptimal quadrature rule, with the weights given by (6). As in the GaussChristoffel case, computation of these quadrature rules can be accomplished via eigendecompositions of symmetric tridiagonal matrices [19].
In classical scientific computing scenarios, sets of point rules with polynomial accuracy of degree have been called GaussRadau rules, and are traditionally associated with situations when is an interval and one of the quadrature abscissae is collocated at one of the endpoints of this interval [9].
3.2 Quasioptimal multivariate quadrature
This section furnishes an example where a quasioptimal multivariate quadrature rule can be explicitly constructed.
Consider a tensorial domain and probability measure, i.e., suppose
where are univariate probability measures. We let for any . With the cardinal ’th unit vector in , then is the set of indices of of the form for ; the size of is . Note that this would arise in situations where one seeks to approximate an unknown multivariate function as a sum of univariate functions; this is rarely a reasonable assumption in practice. However, in this case we can explicitly construct optimal quadrature rules.
The heuristic (4) suggests that we can construct a rule satisfying (2) if we use
nodes, which is approximately nodes. However, we can achieve this with fewer nodes, only , independent of . Here the heuristic (4) is too pessimistic when .
Associated to each , we need the corresponding system of orthonormal polynomials and the univariate function. For each , let , denote an orthonormal polynomial family with . Define
Note that for all since each is a probability measure.
Consider the index sets , for , where is the origin in . Each is a maximal halfset for , and in (5) is given by .
To construct a quasioptimal quadrature rule achieving with nodes, we note that the weights , must be given by (6), which holds for any maximal half index set . I.e., it must simultaneously hold for all . Thus,
(9) 
for . This implies in particular that the coordinates for node must satisfy
We can satisfy this condition in certain cases. Suppose ; then and so we can satisfy (9) by setting for all . Thus the nodes for a quasioptimal quadrature rule could lie along the graph of the line defined by
In order to satisfy the integration condition (2) we need to distribute the nodes in an appropriate way. Having effectively reduced the problem to one dimension, this is easily done: we choose a Gausstype quadrature rule as described in the previous section. Let the th coordinate of the quadrature rule be the point Gauss quadrature nodes for , i.e.,
This then uniquely defines , and is likewise uniquely determined since we have satisfied (9).
Thus a “diagonal” Gauss quadrature rule, , with equal coordinate values for each abscissa, is an optimal rule in this case.
3.3 Quasioptimal multivariate quadrature: nonuniqueness
The previous example, with an additional assumption, allows us to construct distinct quasioptimal quadrature rules. Again take for , and let with identical univariate measures .
To this we add the assumption that is a symmetric measure; i.e., for any set , then . In this case the univariate orthogonal polynomials are even (odd) functions if is even (odd). Thus, the zero set is symmetric around 0, and is always an even function. With the quasioptimal set of nodes defined in the previous section, let
for a fixed but arbitrary sequence of signs . Using the above properties, one can show that the nodes and the weights define a quadrature rule satisfying (2), which of course has the same number of nodes as the quasioptimal rule from the previous section.
By varying the , we can create distinct distributions of nodes, thus showing that at least this many quasioptimal rules exist.
3.4 Quasioptimal multivariate quadrature: nonexistence
We again use the setup of Section 3.2, but this time to illustrate that it is possible for quasioptimal rules not to exist. We consider , and take , and let for two univariate probability measures and . However, this time we let these measures be different:
With our choice of , we have with two maximal halfsets:
In this simple case the explicit  and orthonormal families take the expressions
so that associated to and , respectively, we have the functions
Since by (9) we require , this implies that
(10a)  
However, the condition (2) also implies for our choice of that  
The conditions above imply that and be nodes for the 2point  and Gauss quadrature rules, respectively, which are both unique. Thus, , and . Hence we have the equality  
(10b) 
We arrive at a contradiction: simultaneous satisfaction of (10a) and (10b) implies all coordinates of the abscissae are 0, which cannot satisfy (2). Thus, no quasioptimal quadrature rule for this example exists.
4 Numerically generating multivariate quadrature rules
In this section we describe our proposed algorithm for generating multivariate quadrature rules; the algorithm generates rules having significantly fewer points than the number of moments being matched. We will refer to such rules as reduced quadrature rules. We will compare the number of nodes our rules can generate with the lower bound defined in Section 2, along with the heuristic bound (4).
Our method is an adaptation of the method presented in [25]. The authors there showed that one can recover a univariate Gauss quadrature rule as the solution to an infinitedimensional linear program (LP) over nonnegative measures. Let a finite index set be given with , and suppose that is any basis for . In addition, let be a polynomial on such that ; we seek a positive Borel measure on solving
minimize  (11)  
subject to  (12) 
The restriction that is a positive measure will enter as a constraint into the optimization problem. A nontrivial result is that a positive measure solution to this problem exists with . Such a solution immediately yields a positive quadrature rule with nodes and weights given by . Unfortunately, the above optimization problem is NP-hard, and so the authors in [25] propose a two-step procedure for computing an approximate solution. The first step solves a finite-dimensional linear program similar to (11) to find a point positive quadrature rule with . In the second step, the points are clustered into groups, where is automatically chosen by the algorithm. The clusters are then used to form a set of averaged points and weights, which form an approximate quadrature rule. This approximate quadrature rule is refined using a local gradient-based method to optimize a moment-matching objective. The precise algorithm is outlined in detail in Appendix A.
In this paper we adopt a similar two-step procedure to compute reduced quadrature rules. We outline these two steps in detail in the following sections. Pseudocode for generating reduced quadrature rules is presented in Algorithm B in Appendix B.
4.1 Generating an initial condition
Let an index set be given with , and suppose that is a basis for . We seek to find a discrete measure by choosing a large candidate mesh, , with points on and solving the minimization problem
(13) 
The optimization is over the scalars . The nonzero coefficients define a quadrature rule with points: the candidate points corresponding to the nonzero coefficients are the quadrature points and the coefficients themselves are the weights.
This minimization problem, as well as the residual-based linear program used by [25], becomes increasingly difficult to solve as the size of and the dimension of the space increase. The ability to find a solution is highly sensitive to the candidate mesh. Through extensive experimentation we found that by solving (13) approximately via
(14) 
for some , we were able to find solutions that, although possibly inaccurate with respect to the moment matching criterion, could be used as an initial condition for the local optimization to find an accurate reduced quadrature rule. This is discussed further in Section 5.1. We were not able to find such approximate initial conditions using the linear program used in [25]; we show results supporting this in Section 5.
We solved (14) using least angle regression with a LASSO modification [33] whilst enforcing positivity of the coefficients. This algorithm iteratively adds and removes positive weights until , or until no new weights can be added without violating the positivity constraints. This allows one to drive to small values without requiring an a priori estimate of .
In our examples, the candidate mesh is selected by generating uniformly random samples over the integration domain , regardless of the integration measure. Better sampling strategies for generating the candidate mesh undoubtedly exist, but these strategies will be  and dependent, and are not the central focus of this paper. Our limited experimentation here suggested that there was only marginal benefit from exploring this issue.
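To illustrate the structure of this initial-rule step, the sketch below substitutes a nonnegative least-squares solve (scipy's `nnls`) for the positivity-constrained LAR/LASSO solver described above; the monomial basis, mesh size, and degree are arbitrary choices made for the example. Like the LP, the nonnegative solution is sparse, with at most as many nonzero weights as matched moments:

```python
import itertools
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
d, degree, M = 2, 3, 200   # dimension, total degree, candidate-mesh size (illustrative)

# monomial basis of the total-degree space (a stand-in for the paper's basis)
idx = [lam for lam in itertools.product(range(degree + 1), repeat=d)
       if sum(lam) <= degree]

X = rng.uniform(-1.0, 1.0, size=(M, d))                     # random candidate mesh
V = np.array([[np.prod(x**np.array(lam)) for lam in idx] for x in X])

# exact moments of the uniform probability measure on [-1, 1]^2
def mom1(k):
    return 0.0 if k % 2 else 1.0 / (k + 1)
m = np.array([np.prod([mom1(k) for k in lam]) for lam in idx])

# nonnegative least squares: positive weights matching the moments
w, res = nnls(V.T, m)
support = w > 1e-12
```

The candidate points carrying nonzero weight form the initial quadrature rule that is passed on to the clustering step.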
4.2 Finding a reduced quadrature rule
Once an initial condition has been found by solving (14) we then use a simple greedy clustering algorithm to generate an initial condition for a local gradientbased momentmatching optimization.
Clustering
The greedy clustering algorithm finds the point with the smallest weight and combines it with its nearest neighbor. These two points are then replaced by a point whose coordinates are a convex combination of the two clustered points, with convex coefficients proportional to the weights output by the LP algorithm. The newly clustered point set has one point fewer than before. This process is repeated.
We terminate the iterative clustering procedure when a desired number of points is reached. At termination, a set of points and weights , defining an approximate quadrature rule is returned. Pseudocode describing this greedy clustering algorithm is given in Algorithm B in Appendix B.
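A minimal sketch of this greedy clustering step, assuming Euclidean nearest neighbours and the weight-proportional merge rule described above:

```python
import numpy as np

def greedy_cluster(X, w, K):
    # repeatedly merge the smallest-weight point with its nearest neighbour,
    # replacing the pair by a weight-proportional convex combination
    X, w = X.copy(), w.copy()
    while len(w) > K:
        i = int(np.argmin(w))
        d2 = np.sum((X - X[i])**2, axis=1)
        d2[i] = np.inf
        j = int(np.argmin(d2))
        x_new = (w[i] * X[i] + w[j] * X[j]) / (w[i] + w[j])
        w_new = w[i] + w[j]
        keep = [k for k in range(len(w)) if k not in (i, j)]
        X = np.vstack([X[keep], x_new])
        w = np.append(w[keep], w_new)
    return X, w

rng = np.random.default_rng(0)
X0 = rng.uniform(-1.0, 1.0, size=(20, 2))
w0 = rng.uniform(0.01, 1.0, size=20)
Xc, wc = greedy_cluster(X0, w0, 5)
```

Note that the merge preserves the total weight, so the constant function remains exactly integrated by the clustered rule.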
The ability to find an accurate reduced quadrature rule is dependent on the choice of the number of specified clusters . As stated in Section 2, there is a strong heuristic that couples the dimension , the number of matched moments , and the number of points . For general and given we set the number of clusters to be
(15) 
As shown in Section 2, this heuristic will not always produce a quadrature rule that exactly integrates all specified moments. Moreover, the heuristic may overestimate the actual number of points in a quasioptimal quadrature rule.
It is tempting to set to the lower bound value , defined in (5). However, a sobering result of our experimentation is that in all our experiments we were never able to numerically find a quadrature rule with fewer points than that specified by (15); therefore, we could not find any rules with points where . However, we were able to identify situations in which the heuristic underestimated the requisite number of points in a reduced quadrature rule. For example, if one wants to match the moments of a total degree basis of degree one must use at least points. This lower bound is typically violated by the heuristic for low-degree and high-dimension . E.g., for and we have using the heuristic, yet the theoretical lower bound for from Theorem 2.2 requires . In this case our theoretical analysis sets a lower bound for that is more informative than the heuristic (15).
Localoptimization
The approximate quadrature rule , generated by the clustering algorithm is used as an initial guess for the following local optimization
minimize  (16a)  
subject to  (16b) 
The objective defined by (16a) is a polynomial and its gradient can easily be computed
(17a)  
(17b) 
for and , and where .
We use a gradientbased nonlinear least squares method to solve the local optimization problem. Defining the optimization tolerance , the procedure exits when either or , where and are the values of the objective at steps and respectively, and is the value of the gradient scaled to respect the constraints (16b).
This local optimization procedure, in conjunction with the cluster-based initial guess, can frequently find a quadrature rule of size as determined by the degree-of-freedom heuristic (15). However, in some cases a quadrature rule with points cannot be found for very high accuracy requirements (in all experiments we say that a quadrature rule is found if the iterations yield ). In these situations one might be able to find an point rule using another initial condition. (Recall the initial condition provided by minimization is found using a randomized candidate mesh.) However, we found it more effective to simply increment the size of the desired quadrature rule to . This can be done repeatedly until a quadrature rule with the desired accuracy is reached. While one might fear that this procedure requires incrementing by a large number, we found that no more than a total of increments were ever needed. This is described in more detail in Section 5.1.
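A minimal sketch of this refinement step, using scipy's bound-constrained `least_squares` in place of the paper's nonlinear least-squares solver, on a univariate toy problem; the initial guess, domain, and degree are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares

# refine a 2-point rule to match the degree-3 moments of the uniform
# probability measure on [-1, 1]
n, degree = 2, 3
target = np.array([0.0 if k % 2 else 1.0 / (k + 1) for k in range(degree + 1)])

def residual(z):
    # unknowns are the nodes followed by the weights
    x, w = z[:n], z[n:]
    return np.array([w @ x**k for k in range(degree + 1)]) - target

z0 = np.array([-0.3, 0.4, 0.5, 0.5])      # crude clustered initial guess
lb = np.array([-1.0, -1.0, 0.0, 0.0])      # nodes inside the domain, weights >= 0
ub = np.array([1.0, 1.0, np.inf, np.inf])
sol = least_squares(residual, z0, bounds=(lb, ub))
```

From this initial guess the solver converges to the unique 2-point rule matching these moments, i.e. the Gauss rule of the previous sections, with nonnegative weights enforced by the bounds.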
Note that both generating the initial condition using (14) and solving the local optimization problem (16) involve matching moments. We assume in this paper that moments are available; in our examples we compute these moments analytically (or to machine precision with high-order quadrature), unless otherwise specified.
5 Numerical results
In this section we will explore the numerical properties of the multivariate quadrature algorithm we have proposed in Section 4. First we will numerically compare the performance of our algorithm with other popular quadrature strategies for tensor-product measures. We will then demonstrate the flexibility of our approach for computing quadrature rules for non-tensor-product measures, for which many existing approaches are not applicable. Finally we will investigate the utility of our reduced quadrature rules for high-dimensional integration by leveraging selected moment matching and dimension reduction.
Our tests will compare the method in this paper (Section 4), which we call Reduced Quadrature, to other standard quadrature methods. We summarize these methods below.

Monte Carlo – The integration rule $\frac{1}{N} \sum_{i=1}^{N} f(x^{(i)})$, where the $x^{(i)}$ are independent and identically distributed samples from the probability measure.

Sparse grid – The integration rule for the uniform measure over given by a multivariate sparse grid rule generated from a univariate Clenshaw-Curtis quadrature rule [7].

Cubature – Stroud cubature rules of degree 2, 3, and 5 [29]. These rules again integrate on with respect to the uniform measure.

Sobol – A quasi-Monte Carlo Sobol sequence [27] for approximating integrals on using the uniform measure.

Reduced quadrature – The method in this paper, described in Section 4.

LP quadrature – The “initial guess” for the Reduced quadrature algorithm, using the solution of the LP algorithm summarized in Section 4.1.
The Sparse grid, Sobol, and cubature methods are usually applicable only for integrating over with the uniform measure. When has a density with support , we will use these methods to evaluate by integrating with respect to the uniform measure on , where we assign on .
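The Monte Carlo and Sobol' rules above both use equal weights; they differ only in how the nodes are generated. The following illustration (not from the paper; the smooth integrand and the dimension are stand-ins) estimates a mean over $[-1,1]^d$ under the uniform probability measure with both methods:

```python
import numpy as np
from scipy.stats import qmc

# Smooth stand-in integrand whose exact mean under the uniform
# probability measure on [-1, 1]^d is known in closed form.
def f(x):
    return np.exp(x.sum(axis=1) / x.shape[1])

d, n = 5, 1024
rng = np.random.default_rng(42)
x_mc = rng.uniform(-1.0, 1.0, size=(n, d))     # i.i.d. uniform nodes
# Scrambled Sobol' nodes, mapped from [0, 1]^d to [-1, 1]^d.
x_qmc = 2.0 * qmc.Sobol(d, scramble=True, seed=42).random(n) - 1.0
mean_mc = f(x_mc).mean()    # both rules use equal weights 1/n
mean_qmc = f(x_qmc).mean()
```

For measures supported on a hypercube with a nonuniform density, the same nodes can be reused by folding the density into the integrand, as described above.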
5.1 Tensor product measures
Our reduced quadrature approach can generate quadrature rules for non-tensor-product measures, but in this section we investigate the performance of our algorithm in the more standard setting of tensor-product measures.
We begin by discussing the computational cost of computing our quadrature rules. Specifically, we compute quadrature rules for the uniform probability measure on in up to 10 dimensions for all total-degree spaces with degree at most 20 or with subspace dimension at most 3003. In all cases we were able to generate an efficient quadrature rule for which the number of quadrature points was equal to, or only slightly larger than ( points), the number suggested by the heuristic (15). The number of points in each quadrature rule, the number of moments matched , the size of a quasi-optimal rule , the number of points suggested by the heuristic (15), and the number of iterations taken by the nonlinear least-squares algorithm are presented in Table 1.
The final two columns of Table 1, the number of points in the reduced quadrature rules and the number of iterations used by the local optimization, are given as ranges because the final result is sensitive to the random candidate sets used to generate the initial condition. The ranges presented in the table represent the minimum and maximum number of points and iterations used to generate the largest quadrature rules in each dimension over 10 different initial candidate meshes. We observe only very minor variation in the number of points, but more sensitivity in the number of iterations.
The number of iterations needed by the local optimization to compute the quadrature rules was always reasonable: at most a small multiple of the number of moments being matched. However, the largest rules did take several hours to compute on a single-core machine due to the cost of running the optimizer (we used scipy.optimize.least_squares) and forming the Jacobian matrix of the objective.
Profiling the simulation revealed that the long run times were due to the cost of evaluating the Jacobian (17) of the objective function and of the singular value decomposition repeatedly called by the optimizer. This runtime could likely be reduced by using a Krylov-based nonlinear least-squares algorithm and by computing the Jacobian in parallel. We expect that such improvements would allow one to construct rules in higher dimensions in a reasonable amount of time; however, even our unoptimized code was able to compute such rules on a single core in a moderate number of dimensions.
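The Krylov-based variant suggested above is available directly in SciPy: least_squares can swap its dense SVD-based trust-region subproblem solver for the iterative LSMR solver via tr_solver="lsmr". A toy two-component residual stands in for the moment-matching objective here:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy residual with a zero-residual solution near (0.525, 0.724);
# it stands in for the much larger moment-matching residual vector.
def residual(z):
    return np.array([z[0] ** 2 + z[1] - 1.0, z[0] - z[1] ** 2])

# tr_solver="lsmr" solves each trust-region subproblem iteratively,
# avoiding dense factorizations of the Jacobian.
sol = least_squares(residual, x0=np.array([0.5, 0.5]),
                    method="trf", tr_solver="lsmr")
```

For large sparse Jacobians, one can also pass jac_sparsity so that only the nonzero structure is exploited; for the moment-matching problem this avoids the singular value decompositions observed in the profiling.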
To compare the performance of our reduced quadrature rules to existing algorithms, consider the corner-peak function often used to test quadrature methods [6]:
$$ f(x) = \Big( 1 + \sum_{i=1}^{d} c_i x_i \Big)^{-(d+1)} \tag{18} $$
The coefficients can be used to control the variability and the effective dimensionality of this function. We generate each coefficient randomly from the interval and subsequently normalize them so that . For a uniform random variable on , the mean and variance of can be computed analytically; these values correspond to computing integrals over with the uniform measure.
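A short sketch of this test function, assuming the standard Genz corner-peak form $f(x) = (1 + \sum_i c_i x_i)^{-(d+1)}$ on $[0,1]^d$; the normalization target for the coefficients (summing to 1) is an illustrative choice, not the paper's value:

```python
import numpy as np

def corner_peak(x, c):
    """Genz corner-peak function evaluated at rows of x."""
    x = np.atleast_2d(x)                      # shape (n_samples, d)
    return (1.0 + x @ c) ** (-(x.shape[1] + 1))

rng = np.random.default_rng(1)
d = 4
c = rng.uniform(0.0, 1.0, d)
c /= c.sum()                                  # normalize the coefficients
vals = corner_peak(rng.uniform(0.0, 1.0, (100, d)), c)
```

Smaller coefficient sums make the function flatter and easier to integrate, which is why the normalization controls the difficulty of the test.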
Figure 1 plots the convergence of the error in the mean of the corner-peak function using the reduced quadrature rules generated for the uniform probability measure on with . Because generating the initial condition requires randomly sampling over the domain of integration, for each degree we generate 10 quadrature rules and plot the median error along with the minimum and maximum errors. There is sensitivity to the random candidate set used to generate the initial condition; however, in each of the 10 repetitions a quadrature rule was always found, and the error in the approximation of the mean using these rules converges exponentially fast.
For a given number of samples, the error in the reduced quadrature rule estimate of the mean is significantly lower than the error of a Clenshaw-Curtis-based sparse grid. It is also more accurate than Sobol sequences in up to 10 dimensions. Furthermore, the reduced quadrature rule is competitive with the Stroud degree 2, 3, and 5 cubature rules. However, unlike the Stroud rules, the polynomial exactness of the reduced quadrature rule is restricted neither to low degrees (shown here) nor to total-degree spaces (as discussed in Section 5.3.1).
In Figure 1 we also plot the error of the quadrature rule used as the initial condition for the local optimization that generates the reduced rule. As expected, the error of the initial condition is larger than that of the final reduced rule for a given number of points. Moreover, it becomes increasingly difficult to find an accurate initial condition as the degree increases. In two dimensions, as the degree increases past 13, the error in the initial condition begins to grow. Similar degradation in accuracy also occurs in the other higher-dimensional examples. However, even when the accuracy of the initial condition is extremely low, it still provides a good starting point for the local optimization, allowing us to accurately generate the reduced quadrature rule.
5.2 Beyond tensor product measures
Numerous methods have been developed for multivariate quadrature with tensor-product measures; however, much less attention has been given to computing quadrature rules for arbitrary measures, for example those that exhibit nonlinear correlation between dimensions. In this setting Monte Carlo quadrature is the most popular method of choice.
Chemical reaction model
The reduced quadrature method we have developed can generate quadrature rules to integrate over arbitrary measures. Consider the following model of competing species adsorbing onto a surface out of a gas phase:
\begin{align}
\dot{u} &= a z - c u - 4 d u v, \nonumber\\
\dot{v} &= 2 b z^2 - 4 d u v, \tag{19}\\
\dot{w} &= e z - f w, \nonumber
\end{align}
where $z = 1 - u - v - w$ denotes the vacant surface fraction.
The constants , , , and are fixed at the nominal values , , , and , respectively. The parameters and will vary over a domain endowed with a non-tensor-product probability measure . Viewing these parameters as a random variable with probability density , we are interested in computing the mean of the mass fraction of the third species at seconds.
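A minimal sketch of integrating such a model, assuming the three-species surface-reaction ODE takes the standard form $\dot{u} = az - cu - 4duv$, $\dot{v} = 2bz^2 - 4duv$, $\dot{w} = ez - fw$ with $z = 1 - u - v - w$; the rate constants below are illustrative placeholders, not the paper's fixed or uncertain values:

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, a, b, c, d, e, f):
    u, v, w = y                        # surface mass fractions of the three species
    z = 1.0 - u - v - w                # vacant surface fraction
    du = a * z - c * u - 4.0 * d * u * v
    dv = 2.0 * b * z * z - 4.0 * d * u * v
    dw = e * z - f * w
    return [du, dv, dw]

# Placeholder rate constants (a, b, c, d, e, f); in the paper four are
# fixed at nominal values and the remaining two are uncertain.
params = (1.6, 20.75, 0.04, 1.0, 0.36, 0.016)
sol = solve_ivp(rhs, (0.0, 100.0), [0.0, 0.0, 0.0], method="LSODA",
                args=params, rtol=1e-8, atol=1e-10)
w_final = sol.y[2, -1]                 # third-species mass fraction at final time
```

A quadrature estimate of the mean of w_final would evaluate this solve at each quadrature node (i.e., each realization of the uncertain parameters) and form the weighted sum.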
We will consider two test problems, problem and problem , each defined by its own domain and measure . We therefore have two rectangular domains and with measures and , respectively. The domains and the measures are defined via an affine mapping of a canonical domain and measure: