Adaptive-Sparse Polynomial Dimensional Decomposition Methods for High-Dimensional Stochastic Computing
This article presents two novel adaptive-sparse polynomial dimensional decomposition (PDD) methods for solving high-dimensional uncertainty quantification problems in computational science and engineering. The methods entail global sensitivity analysis for retaining important PDD component functions, and a full- or sparse-grid dimension-reduction integration or quasi Monte Carlo simulation for estimating the PDD expansion coefficients. A unified algorithm, endowed with two distinct ranking schemes for grading component functions, was created for their numerical implementation. The fully adaptive-sparse PDD method is comprehensive and rigorous, leading to the second-moment statistics of a stochastic response that converges to the exact solution when the tolerances vanish. A partially adaptive-sparse PDD method, obtained through regulated adaptivity and sparsity, is economical and is, therefore, expected to solve practical problems with numerous variables. Compared with past developments, the adaptive-sparse PDD methods do not require its truncation parameter(s) to be assigned a priori or arbitrarily. The numerical results reveal that an adaptive-sparse PDD method achieves a desired level of accuracy with considerably fewer coefficients compared with existing PDD approximations. For a required accuracy in calculating the probabilistic response characteristics, the new bivariate adaptive-sparse PDD method is more efficient than the existing bivariately truncated PDD method by almost an order of magnitude. Finally, stochastic dynamic analysis of a disk brake system was performed, demonstrating the ability of the new methods to tackle practical engineering problems.
keywords:ANOVA, HDMR, PDD, stochastic dynamics, uncertainty quantification
Uncertainty quantification, an emerging multidisciplinary field blending physical and mathematical sciences, characterizes the discrepancy between model-based simulations and physical reality in terms of the statistical moments, probability law, and other relevant properties of a complex system response. For practical applications, encountering a large number of input random variables is not uncommon, where an output function of interest, defined algorithmically via expensive finite-element analysis (FEA) or similar numerical calculations, is all too often expensive to evaluate. The most promising stochastic methods available today are perhaps the collocation deb01 (); zabaras07 () and polynomial chaos expansion (PCE) ghanem91 (); xiu02 () methods, including sparse-grid techniques holtz08 (), which have found many successful applications. However, for truly high-dimensional systems, they require astronomically large numbers of terms or coefficients, succumbing to the curse of dimensionality bellman57 (). Therefore, alternative computational methods capable of exploiting low effective dimensions of multivariate functions, such as the polynomial dimensional decomposition (PDD) method, are most desirable. Readers, not familiar with but interested in PDD, are referred to the authors’ past works rahman08 (); rahman09 (); rahman10 (); rahman11 ().
For practical applications, the PDD must be truncated with respect to and , where and define the largest degree of interactions among input variables and largest order of orthogonal polynomials, respectively, retained in a concomitant approximation. These truncation parameters depend on the dimensional structure and nonlinearity of a stochastic response. The higher the values of and , the higher the accuracy, but also the computational cost that is endowed with an th- or th-order polynomial computational complexity. However, the dimensional hierarchy or nonlinearity, in general, is not known a priori. Therefore, indiscriminately assigning the truncation parameters is not desirable, nor is it possible to do so when a stochastic solution is obtained via complex numerical algorithms. In which case, one must perform these truncations automatically by progressively drawing in higher-variate or higher-order contributions as appropriate. Furthermore, all -variate component functions of PDD may not contribute equally or even appreciably to be considered in the resulting approximation. Hence, a sparse approximation, expelling component functions with negligible contributions, should be considered as well.
Addressing some of the aforementioned concerns have led to adaptive versions of the cut-high-dimensional model representation (cut-HDMR) zabaras10 () and the anchored decomposition karniadakis12 (), employed in conjunction with the sparse-grid collocation methods, for solving stochastic problems in fluid dynamics. Several adaptive variants of the PCE blatman08 (); li98 (); wan05 () method have also appeared. It is important to clarify that the cut-HDMR and anchored decompositions are the same as the referential dimensional decomposition (RDD) rahman11c (); rahman11b (). Therefore, both adaptive methods essentially employ RDD for multivariate function approximations, where the mean values of random input are treated as the reference or anchor point a premise originally proposed by Xu and Rahman xu05 (). The developments of these adaptive methods were motivated by the fact that an RDD approximation requires only function evaluations, as opposed to high-dimensional integrals required for an ANOVA Dimensional Decomposition (ADD) approximation. However, a recent error analysis rahman11b () reveals sub-optimality of RDD approximations, meaning that an RDD approximation, regardless of how the reference point is chosen, cannot be better than an ADD approximation for identical degrees of interaction. The analysis also finds ADD approximations to be exceedingly more precise than RDD approximations at higher-variate truncations. In addition, the criteria implemented in existing adaptive methods are predicated on retaining higher-variate component functions by examining the second-moment properties of only univariate component functions, where the largest degree of interaction and polynomial order in the approximation are still left to the user’s discretion, instead of being determined automatically based on the problem being solved. Therefore, more intelligently derived adaptive-sparse approximations and decompositions rooted in ADD or PDD should be explored by developing relevant criteria and acceptable error thresholds. These enhancements, some of which are indispensable, should be pursued without sustaining significant additional cost.
This paper presents two new adaptive-sparse versions of the PDD method – the fully adaptive-sparse PDD method and a partially adaptive-sparse PDD method – for solving high-dimensional stochastic problems commonly encountered in computational science and engineering. The methods are based on (1) variance-based global sensitivity analysis for defining the pruning criteria to retain important PDD component functions; (2) a full- or sparse-grid dimension-reduction integration or quasi Monte Carlo simulation (MCS) for estimating the PDD expansion coefficients. Section 2 briefly describes existing dimensional decompositions, including PDD and its -variate, th-order approximation, to be contrasted with the proposed methods. Two adaptive-sparse PDD methods are formally presented in Section 3, along with a computational algorithm and a flowchart for numerical implementation of the method. Two different approaches for calculating the PDD coefficients, one emanating from dimension-reduction integration and the other employing quasi MCS, are explained in Section 4. Section 5 presents three numerical examples for probing the accuracy, efficiency, and convergence properties of the proposed methods, including a comparison with the existing PDD methods. Section 6 reports a large-scale stochastic dynamics problem solved using a proposed adaptive-sparse method. Finally, conclusions are drawn in Section 7.
2 Dimensional Decompositions
Let , , , and represent the sets of positive integer (natural), non-negative integer, real, and non-negative real numbers, respectively. For , denote by the -dimensional Euclidean space, by the -dimensional multi-index space, and by the set of real-valued matrices. These standard notations will be used throughout the paper.
Let be a complete probability space, where is a sample space, is a -field on , and is a probability measure. With representing the Borel -field on , , consider an -valued random vector , which describes the statistical uncertainties in all system and input parameters of a high-dimensional stochastic problem. The probability law of is completely defined by its joint probability density function . Assuming independent coordinates of , its joint probability density is expressed by a product of marginal probability density functions of , , defined on the probability triple with a bounded or an unbounded support on . For a given , defines the marginal density function of .
2.1 ANOVA Dimensional Decomposition
Let ), a real-valued, measurable transformation on , define a stochastic response to a high-dimensional random input and represent a Hilbert space of square-integrable functions with respect to the induced generic measure supported on . The ANOVA dimensional decomposition, expressed by the recursive form efron81 (); rahman11b (); sobol03 ()
is a finite, hierarchical expansion in terms of its input variables with increasing dimensions, where is a subset with the complementary set and cardinality , and is a -variate component function describing a constant or the interactive effect of , , a subvector of , on when or . The summation in Equation (1) comprises terms, with each term depending on a group of variables indexed by a particular subset of , including the empty set .
The ADD component functions , , have two remarkable properties: (1) the component functions, , , have zero means; and (2) two distinct component functions and , where , , and , are orthogonal rahman11b (). However, the ADD component functions are difficult to obtain, because they require calculation of high-dimensional integrals.
2.2 Referential Dimensional Decomposition
Consider a reference point and the associated Dirac measure . The referential dimensional decomposition is created when replaces the probability measure in Equations (1)-(3), leading to the recursive form
also known as cut-HDMR rabitz99 (), anchored decomposition hickernell96 (), and anchored-ANOVA decomposition griebel10 (), with the latter two referring to the reference point as the anchor. Xu and Rahman introduced Equations (4)-(6) with the aid of Taylor series expansion, calling them dimension-reduction xu04 () and decomposition xu05 () methods for statistical moment and reliability analyses, respectively, of mechanical systems. Compared with ADD, RDD lacks orthogonal features, but its component functions are easier to obtain as they only involve function evaluations at a chosen reference point.
2.3 Polynomial Dimensional Decomposition
Let be a set of orthonormal polynomial basis functions in the Hilbert space that is consistent with the probability measure of , where . For a given , , , denote a product probability triple by , and the associated space of square integrable -variate component functions of by , which is a Hilbert space. Since the joint density of is separable (independence), i.e., , the product polynomial , where , a -dimensional multi-index with -norm , constitutes an orthonormal basis in .
providing an exact, hierarchical expansion of in terms of an infinite number of coefficients or orthonormal polynomials. All component functions , , in Equation (7) have zero means and satisfy the orthogonal properties of the ADD. Therefore, PDD can be viewed as the polynomial version of ADD, inheriting all desirable properties of ADD.
2.4 Truncated Dimensional Decompositions
The three dimensional decompositions ADD, RDD and PDD are grounded on a fundamental conjecture known to be true in many real-world applications: given a high-dimensional function , its -variate component functions decay rapidly with respect to , leading to accurate lower-variate approximations of . Indeed, given the integers and for all , the truncated dimensional decompositions
respectively, describe -variate ADD, RDD, and PDD approximations, which for include interactive effects of at most input variables , , on . It is elementary to show that when and/or , , , and converge to in the mean-square sense, generating a hierarchical and convergent sequence of approximation of from each decomposition.
2.4.1 ADD and RDD Errors
For ADD or RDD to be useful, what are the approximation errors committed by and in Equations (10) and (11)? More importantly, for a given , which approximation between ADD and RDD is better? Since the RDD approximation depends on the reference point , no analytical error analysis is possible if is deterministic or arbitrarily chosen. However, if follows the same probability measure of , then the error committed by an RDD approximation on average can be compared with the error from an ADD approximation, as follows.
Let be a random vector with the joint probability density function of the form , where is the marginal probability density function of the th coordinate of . Define two second-moment errors
committed by the -variate ADD and RDD approximations, respectively, of . Then the lower and upper bounds of the expected error from the -variate RDD approximation, expressed in terms of the error from the -variate ADD approximation, are
See Theorem 4.12 and Corollary 4.13 of Rahman rahman11b (). ∎
Theorem 1 reveals that the expected error from the univariate () RDD approximation is at least four times larger than the error from the univariate ADD approximation. In contrast, the expected error from the bivariate () RDD approximation can be eight or more times larger than the error from the bivariate ADD approximation. Given an arbitrary truncation, an ADD approximation is superior to an RDD approximation. In addition, the RDD approximations may perpetrate very large errors at upper bounds when there exist a large number of variables and appropriate conditions. Therefore, existing adaptive methods zabaras10 (); karniadakis12 () anchored in RDD approximations should be used with caveat. Furthermore, the authors advocate using PDD for adaptivity, but doing so engenders its own computational challenges, to be explained in the forthcoming sections.
2.4.2 Statistical Moments of PDD
Applying the expectation operator on and and noting the zero-mean and orthogonal properties of PDD component functions, the mean rahman10 ()
of the -variate, th-order PDD approximation matches the exact mean , regardless of or , and the approximate variance rahman10 ()
is calculated as the sum of squares of the expansion coefficients from the -variate, th-order PDD approximation of . Clearly, the approximate variance approaches the exact variance rahman10 ()
of when and . The mean-square convergence of is guaranteed as , and its component functions are all members of the associated Hilbert spaces.
The -variate, th-order PDD approximation in Equation (12) contains
number of PDD coefficients and corresponding orthonormal polynomials. Therefore, the computational complexity of a truncated PDD is polynomial, as opposed to exponential, thereby alleviating the curse of dimensionality to some extent.
Constructing a PDD approximation by pre-selecting and/or , unless they are quite small, is computationally intensive, if not impossible, for high-dimensional uncertainty quantification. In other words, the existing PDD is neither scalable nor adaptable, which is crucial for solving industrial-scale stochastic problems. A requisite theoretical basis and innovative numerical algorithms for overcoming these limitations are presented in Section 3.
The PDD approximation and its second-moment analysis require the expansion coefficients , which, according to their definition in Equation (8), involve various -dimensional integrals over . For large , a full numerical integration employing an -dimensional tensor product of a univariate quadrature formula is computationally prohibitive. This is one drawback of ADD and PDD, since their component functions entail calculating high-dimensional integrals. Therefore, novel dimension-reduction integration schemes or sampling techniques, to be described in Section 4, are needed to estimate the coefficients efficiently.
2.4.3 PDD versus PCE Approximations
in terms of random orthonormal polynomials , , , of input variables with increasing dimensions, where and , , , are the PDD expansion coefficients. In contrast, a th-order PCE approximation of , where , has the representation ghanem91 ()
in terms of random polynomial chaoses , , of input variables with increasing orders, where and are the PCE expansion coefficients. The polynomial chaoses are various combinations of tensor products of sets of univariate orthonormal polynomials. Therefore, both expansions share the same orthonormal polynomials, and their coefficients require evaluating similar high-dimensional integrals.
The PDD and PCE when truncated are not the same. In fact, two important observations jump out readily. First, the terms in the PCE approximation are organized with respect to the order of polynomials. In contrast, the PDD approximation is structured with respect to the degree of interaction between a finite number of random variables. Therefore, significant differences may exist regarding the accuracy, efficiency, and convergence properties of their truncated sum or series. Second, if a stochastic response is highly nonlinear, but contains rapidly diminishing interactive effects of multiple random variables, the PDD approximation is expected to be more effective than the PCE approximation. This is because the lower-variate (univariate, bivariate, etc.) terms of the PDD approximation can be just as nonlinear by selecting appropriate values of in Equation (20). In contrast, many more terms and expansion coefficients are required to be included in the PCE approximation to capture such high nonlinearity.
In reference to a past study rahman11 (), consider two mean-squared errors, and , owing to the -variate, th-order PDD approximation and th-order PCE approximation , respectively, of . For a class of problems where the interactive effects of input variables on a stochastic response get progressively weaker as , then the PDD and PCE errors for identical expansion orders can be weighed against each other. For this special case, set and assume that , where , , . Then it can be shown that , demonstrating larger error from the PCE approximation than from the PDD approximation rahman11 (). In the limit, when , , regardless of the values of the expansions coefficients. In other words, the -variate, th-order PDD approximation cannot be worse than the th-order PCE approximation. When and , , , , are not negligible and arbitrary, numerical convergence analysis is required for comparing these two errors. Indeed, numerical analyses of mathematical functions or simple dynamic systems reveal markedly higher convergence rates of the PDD approximation than the PCE approximation rahman11 (). From the comparison of computational efforts, required to estimate with the same precision the frequency distributions of complex dynamic systems, the PDD approximation can be significantly more efficient than the PCE approximation rahman11 ().
3 Proposed Adaptive-Sparse PDD Methods
3.1 Global Sensitivity Indices
The global sensitivity analysis quantifies how an output function of interest is influenced by individual or subsets of input variables, illuminating the dimensional structure lurking behind a complex response. Indeed, these sensitivity indices have been used to rank variables, fix unessential variables, and reduce dimensions of large-scale problems rahman11d (); sobol01 (). The authors propose to exploit these indices, developed in conjunction with PDD, for adaptive-sparse PDD approximations as follows.
representing the fraction of the variance of contributed by the ADD component function . Since , there exist such indices, adding up to . Applying the Fourier-polynomial approximation of , that is, Equation (7), and noting the properties of orthonormal polynomials, the component variance
of is the sum of squares of its PDD expansion coefficients. When the right side of Equation (23) is truncated at , where , and then used to replace the numerator of Equation (22), the result is an th-order approximation
which approaches as . Given , consider two approximate global sensitivity indices and for such that . Then the normalized index, defined by
represents the relative change in the approximate global sensitivity index when the largest polynomial order increases from to . The sensitivity indices and provide an effective means to truncate the PDD in Equation (9) both adaptively and sparsely.
3.2 The Fully Adaptive-Sparse PDD Method
Let and denote two non-negative error tolerances that specify the minimum values of and , respectively. Then a fully adaptive-sparse PDD approximation
of is formed by the subset of PDD component functions, satisfying two inclusion criteria: (1) , and (2) for all and . The first criterion requires the contribution of an -th order polynomial approximation of towards the variance of to exceed in order to be accommodated within the resultant truncation. The second criterion identifies the augmentation in the variance contribution from evoked by a single increment in the polynomial order and determines if it surpasses . In other words, these two criteria ascertain which interactive effects between two or more input random variables are retained and dictate the largest order of polynomials in a component function, formulating a fully adaptive-sparse PDD approximation.
When compared with the PDD in Equation (9), the adaptive-sparse PDD approximation in Equation (26) filters out the relatively insignificant component functions with a scant compromise on the accuracy of the resulting approximation. Furthermore, there is no need to pre-select the truncation parameters of the existing PDD approximation. The level of accuracy achieved by the fully adaptive-sparse PDD is meticulously controlled through the tolerances and . The lower the tolerance values, the higher the accuracy of the approximation. It is elementary to show that the mean-squared error in the fully adaptive-sparse PDD approximation disappears when the tolerances vanish, that is, approaches as , .
3.3 A Partially Adaptive-Sparse PDD Method
Based on the authors’ past experience, an -variate PDD approximation, where , is adequate, when solving real-world engineering problems, with the computational cost varying polynomially (-order) with respect to the number of variables rahman08 (); rahman09 (). As an example, consider the selection of for solving a stochastic problem in 100 dimensions by a bivariate PDD approximation, comprising bivariate component functions. If all such component functions are included, then the computational effort for even a full bivariate PDD approximation may exceed the computational budget allocated to solving this problem. But many of these component functions contribute little to the probabilistic characteristics sought and can be safely ignored. Similar conditions may prevail for higher-variate component functions. Henceforth, define an -variate, partially adaptive-sparse PDD approximation
of , which is attained by subsuming at most -variate component functions, but fulfilling two relaxed inclusion criteria: (1) for , and (2) for . Again, the same two criteria are used for the degree of interaction and the order of orthogonal polynomial, but the truncations are restricted to at most -variate component functions of .
An -variate, partially adaptive-sparse PDD approximation behaves differently from the -variate, th-order PDD approximation. While the latter approximation includes a sum containing at most -variate component functions, the former approximation may or may not include all such component functions, depending on the tolerance . For , an -variate, partially adaptive-sparse PDD will again trim the component functions with meager contributions. However, unlike converging to , converges to the -variate ADD approximation , when , . If , then both partially and fully adaptive-sparse PDD approximations coincide for identical tolerances.
As , in the mean square sense. Given a rate at which , the variance of an -variate ADD component function, decreases with , what can be inferred on how fast converges to ? Proposition 1 and subsequent discussions provide some insights.
If the variance of a zero-mean ADD component function diminishes according to , where , and and are two real-valued constants, then the mean-squared error committed by , , is
When the equality holds, decays strictly monotonically with respect to for any rate parameter . The higher the value of , the faster converges to in the mean-square sense.
3.4 Stochastic Solutions
3.4.1 Second-Moment Properties
Applying the expectation operator on and and recognizing the zero-mean and orthogonal properties of PDD component functions, the means
of fully and partially adaptive-sparse PDD approximations both also agree with the exact mean for any , , and . However, the respective variances, obtained by applying the expectation operator on and , vary according to
where the squares of the expansion coefficients are summed following the same two pruning criteria discussed in the preceding subsections. Equations (29)-(31) provide closed-form expressions of the approximate second-moment properties of any square-integrable function in terms of the PDD expansion coefficients.
When , the right sides of Equations (30) and (18) coincide, whereas the right side of Equation (31) approaches that of Equation (17) for . As a consequence, the variance from the fully adaptive-sparse PDD approximation converges to the exact variance of as and . In contrast, the variance from the -variate, partially adaptive-sparse PDD approximation does not follow suit, as it converges to the variance of the -variate, th-order PDD approximation as and , provided that . Therefore, the fully adaptive-sparse PDD approximation is more rigorous than a partially adaptive-sparse PDD approximation, but the latter can be more useful than the former when solving practical engineering problems and will be demonstrated in the Numerical Examples and Application sections.
3.4.2 Probability Distribution
Although the PDD approximations are mean-square convergent, Equations (26) and (27) can also be used to estimate higher-order moments and probability distributions, including rare-event probabilities, of sufficiently smooth stochastic responses. In this paper, the probability distribution of was approximated by performing Monte Carlo simulation of and/or . This simulation of the PDD approximation should not be confused with crude Monte Carlo simulation. The crude Monte Carlo method, which commonly requires numerical calculations of for input samples can be expensive or even prohibitive, particularly when the sample size needs to be very large for estimating small failure probabilities. In contrast, the Monte Carlo simulation embedded in a PDD approximation requires evaluations of simple analytical functions. Therefore, an arbitrarily large sample size can be accommodated in the PDD approximation.
It is also possible to estimate the probability distribution of from the knowledge of the cumulant generating function of a PDD approximation, provided that it exists, and then exploit the saddle point approximation for obtaining an exponential family of approximate distributions. Readers interested in this alternative approach are referred to the authors’ ongoing work on stochastic sensitivity analysis ren13 ().
It is important to emphasize that the two truncation criteria proposed are strictly based on variance as a measure of output uncertainty. They are highly relevant when the second-moment properties of complex response is desired. For higher-order moments or rare-event probabilities, it is possible to develop alternative sensitivity indices and related pruning criteria. They are not considered here.
3.5 Numerical Implementation
The application of fully and partially adaptive-sparse PDD approximations described by Equations (26) and (27) requires selecting PDD component functions , and assigning largest orders of their orthogonal polynomial expansions efficiently such that and . This section presents a unified computational algorithm and an associated flowchart developed to accomplish numerical implementation of the two proposed methods.
3.5.1 A Unified Algorithm
The iterative process for constructing an adaptive-sparse PDD approximation, whether full or partial, comprises two main stages: (1) continue incrementing the polynomial order for a chosen component function unless the criterion fails; and (2) continue selecting the component functions , , unless the criterion fails. These two stages are first executed over all univariate PDD component functions , , before progressing to all bivariate component functions , , and so on, until for the fully adaptive-sparse PDD approximation or until for a partially adaptive-sparse PDD approximation, where is specified by the user. The implementation details of the iterative process is described in Algorithm 1 and through the flowchart in Figure 1.
The first stage of the algorithm presented is predicated on accurate calculations of the sensitivity indices and , which require the variance of as noted by Equations (24) and (25). Since there exist an infinite number of expansion coefficients emanating from all PDD component functions, calculating the variance exactly from Equation (18) is impossible. To overcome this quandary, the authors propose to estimate the variance by utilizing all PDD expansion coefficients available at a juncture of the iterative process. For instance, let be an element of the index set , which comprises the subsets of selected so far at a given step of the iterative process. Then the approximate variance
replacing the exact variance in Equations (24) and (25) facilitates an effective iterative scheme for estimating and as well. Equation (32) was implemented in the proposed algorithm, as explained in Algorithm 1 and Figure 1.