Stochastic Least-Squares Petrov–Galerkin Method for Parameterized Linear Systems
Abstract
We consider the numerical solution of parameterized linear systems where the system matrix, the solution, and the right-hand side are parameterized by a set of uncertain input parameters. We explore spectral methods in which the solutions are approximated in a chosen finite-dimensional subspace. It has been shown that the stochastic Galerkin projection technique fails to minimize any measure of the solution error [20]. As a remedy for this, we propose a novel stochastic least-squares Petrov–Galerkin (LSPG) method. The proposed method is optimal in the sense that it produces the solution that minimizes a weighted norm of the residual over all solutions in a given finite-dimensional subspace. Moreover, the method can be adapted to minimize the solution error in different weighted norms by simply applying a weighting function within the least-squares formulation. In addition, a goal-oriented seminorm induced by an output quantity of interest can be minimized by defining a weighting function as a linear functional of the solution. We establish optimality and error bounds for the proposed method, and extensive numerical experiments show that the weighted LSPG methods outperform other spectral methods in minimizing corresponding target weighted norms.
K. Lee, K. Carlberg, and H. C. Elman
Keywords: stochastic Galerkin, least-squares Petrov–Galerkin projection, residual minimization, spectral projection
AMS subject classifications: 35R60, 60H15, 60H35, 65C20, 65N30, 93E24
1 Introduction
Forward uncertainty propagation for parameterized linear systems is important in a range of applications to characterize the effects of uncertainties on the output of computational models. Such parameterized linear systems arise in many important problems in science and engineering, including stochastic partial differential equations (SPDEs) where uncertain input parameters are modeled as a set of random variables (e.g., diffusion/ground water flow simulations where diffusivity/permeability is modeled as a random field [15, 30]). It has been shown [11] that intrusive methods (e.g., stochastic Galerkin [2, 10, 13, 19, 28]) for uncertainty propagation can lead to smaller errors for a fixed basis dimension, compared with non-intrusive methods [27] (e.g., sampling-based methods [3, 14, 17], stochastic collocation [1, 21]).
The stochastic Galerkin method combined with generalized polynomial chaos (gPC) expansions [29] seeks a polynomial approximation of the numerical solution in the stochastic domain by enforcing a Galerkin orthogonality condition, i.e., the residual of the parameterized linear system is forced to be orthogonal to the span of the stochastic polynomial basis with respect to an inner product associated with an underlying probability measure. The Galerkin projection scheme is popular for its simplicity (i.e., the trial and test bases are the same) and its optimality in terms of minimizing an energy norm of solution errors when the underlying PDE operator is symmetric positive definite. In many applications, however, it has been shown that the stochastic Galerkin method does not exhibit any optimality property [20]. That is, it does not produce solutions that minimize any measure of the solution error. In such cases, the stochastic Galerkin method can lead to poor approximations and nonconvergent behavior.
To address this issue, we propose a novel optimal projection technique, which we refer to as the stochastic least-squares Petrov–Galerkin (LSPG) method. Inspired by the successes of LSPG methods in nonlinear model reduction [8, 9, 7], finite element methods [4, 5, 16], and iterative linear solvers (e.g., GMRES, GCR) [22], we propose, as an alternative to enforcing the Galerkin orthogonality condition, to directly minimize the residual of a parameterized linear system over the stochastic domain in a (weighted) norm. The stochastic LSPG method produces an optimal solution for a given stochastic subspace and guarantees that the norm of the residual monotonically decreases as the stochastic basis is enriched. In addition to producing monotonically convergent approximations as measured in the chosen weighted norm, the method can also be adapted to target output quantities of interest (QoI); this can be accomplished by employing a weighted norm for least-squares minimization that coincides with the (semi)norm of the error in the chosen QoI.
In addition to proposing the stochastic LSPG method, this study shows that specific choices of weighting functions lead to equivalence between the stochastic LSPG method and both the stochastic Galerkin method and the pseudospectral method [26, 27]. We demonstrate the effectiveness of this method with extensive numerical experiments on various SPDEs. The results show that the proposed LSPG technique significantly outperforms the stochastic Galerkin method when the solution error is measured in different weighted norms. We also show that the proposed method can effectively minimize the error in target QoIs.
An outline of the paper is as follows. Section 2 formulates parameterized linear algebraic systems and reviews conventional spectral approaches for computing numerical solutions. Section 3 develops a residual minimization formulation based on least-squares methods and its adaptation to the stochastic LSPG method. We also provide proofs of optimality and monotonic convergence behavior of the proposed method. Section 4 provides error analysis for stochastic LSPG methods. Section 5 demonstrates the efficiency and the effectiveness of the proposed methods by testing them on various benchmark problems. Finally, Section 6 outlines some conclusions.
2 Spectral methods for parameterized linear systems
We begin by introducing a mathematical formulation of parameterized linear systems and briefly reviewing the stochastic Galerkin and the pseudospectral methods, which are spectral methods for approximating the numerical solutions of such systems.
2.1 Problem formulation
Consider a parameterized linear system

(1) $A(\xi)\,x(\xi) = b(\xi),$

where $A:\Gamma\rightarrow\mathbb{R}^{n\times n}$, and $x,\,b:\Gamma\rightarrow\mathbb{R}^{n}$. The system is parameterized by a set of stochastic input parameters $\xi \equiv (\xi_1,\ldots,\xi_s)$. Here, $\omega\in\Omega$ is an elementary event in a probability space and the stochastic domain is denoted by $\Gamma \equiv \Gamma_1\times\cdots\times\Gamma_s$, where $\Gamma_i \equiv \xi_i(\Omega)$. We are interested in computing a spectral approximation $\tilde{x}(\xi)$ of the numerical solution in an $n_\psi$-dimensional subspace spanned by a finite set of polynomials $\{\psi_i(\xi)\}_{i=1}^{n_\psi}$ such that $\tilde{x}(\xi)\approx x(\xi)$, i.e.,

(2) $\tilde{x}(\xi) = \sum_{i=1}^{n_\psi} x_i\,\psi_i(\xi) = (\Psi(\xi)^T \otimes I_n)\,\bar{x},$

where $x_i\in\mathbb{R}^{n}$ with $i = 1,\ldots,n_\psi$ are unknown coefficient vectors, $\bar{x} \equiv [x_1^T,\ldots,x_{n_\psi}^T]^T\in\mathbb{R}^{n n_\psi}$ is the vertical concatenation of these coefficient vectors, $\Psi \equiv [\psi_1,\ldots,\psi_{n_\psi}]^T$ is a concatenation of the polynomial basis, $\otimes$ denotes the Kronecker product, and $I_n$ denotes the identity matrix of dimension $n$. Typically, the "stochastic" basis consists of products of univariate polynomials: $\psi_i(\xi) = \prod_{k=1}^{s}\sigma_{i_k}(\xi_k)$, where $\sigma_{i_k}$ are univariate polynomials, $(i_1,\ldots,i_s)$ is a multi-index, and $i_k$ represents the degree of a polynomial in $\xi_k$. The dimension $n_\psi$ of the stochastic subspace depends on the number of random variables $s$, the maximum polynomial degree $p$, and the construction of the polynomial space (e.g., a total-degree space that contains polynomials with total degree up to $p$, for which $n_\psi = \binom{s+p}{p}$). By substituting $x$ with $\tilde{x}$ in (1), the residual can be defined as

(3) $r(\bar{x};\xi) \equiv b(\xi) - A(\xi)\,(\Psi(\xi)^T\otimes I_n)\,\bar{x},$

where $r:\mathbb{R}^{n n_\psi}\times\Gamma\rightarrow\mathbb{R}^{n}$.
It follows from (2) and (3) that our goal now is to compute the unknown coefficients $\bar{x}$ of the solution expansion. We briefly review two conventional approaches for doing so: the stochastic Galerkin method and the pseudospectral method. In the following, $\rho(\xi)$ denotes an underlying measure of the stochastic space and

(4) $\langle u, v\rangle \equiv \int_\Gamma u(\xi)\,v(\xi)\,\rho(\xi)\,d\xi,$
(5) $\mathbb{E}[u] \equiv \int_\Gamma u(\xi)\,\rho(\xi)\,d\xi$

define an inner product between scalar-valued functions $u$ and $v$ with respect to $\rho$ and the expectation of $u$, respectively. The $L^2(\Gamma)$ norm of a vector-valued function $v:\Gamma\rightarrow\mathbb{R}^{n}$ is defined as

(6) $\|v\|_{L^2(\Gamma)} \equiv \mathbb{E}[v^T v]^{1/2}.$

Typically, the polynomial basis is constructed to be orthonormal in the inner product, i.e., $\langle\psi_i,\psi_j\rangle = \delta_{ij}$, where $\delta_{ij}$ denotes the Kronecker delta.
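When the underlying measure is a standard Gaussian, the normalized probabilists' Hermite polynomials form such an orthonormal basis. The following Python sketch (illustrative only; not part of the paper's implementation) constructs this basis from the three-term recurrence and verifies orthonormality with Gauss–Hermite quadrature:

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def hermite_basis(xi, degree):
    """Orthonormal probabilists' Hermite polynomials He_k(xi)/sqrt(k!),
    satisfying E[psi_i(xi) psi_j(xi)] = delta_ij for xi ~ N(0, 1)."""
    xi = np.atleast_1d(np.asarray(xi, dtype=float))
    psi = np.zeros((degree + 1, xi.size))
    psi[0] = 1.0
    if degree >= 1:
        psi[1] = xi
    for k in range(1, degree):
        psi[k + 1] = xi * psi[k] - k * psi[k - 1]  # three-term recurrence
    norms = np.sqrt([math.factorial(k) for k in range(degree + 1)])
    return psi / norms[:, None]

# Verify orthonormality with Gauss-Hermite quadrature for the N(0,1) density.
pts, wts = hermegauss(20)
wts = wts / np.sqrt(2.0 * np.pi)   # hermegauss weights integrate against exp(-x^2/2)
P = hermite_basis(pts, 3)
gram = (P * wts) @ P.T             # Gram matrix of E[psi_i psi_j] values
assert np.allclose(gram, np.eye(4), atol=1e-12)
```

Because the 20-point rule integrates polynomials up to degree 39 exactly, the Gram matrix equals the identity to machine precision.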
2.2 Stochastic Galerkin method
The stochastic Galerkin method computes the unknown coefficients $\bar{x}$ of $\tilde{x}$ in (2) by imposing orthogonality of the residual (3) with respect to the inner product (4) in the subspace $\mathrm{span}\{\psi_i\}_{i=1}^{n_\psi}$. This Galerkin orthogonality condition can be expressed as follows: Find $\bar{x}\in\mathbb{R}^{n n_\psi}$ such that

(7) $\langle r(\bar{x};\cdot),\,\psi_i\rangle = 0, \quad i = 1,\ldots,n_\psi.$

The condition (7) can be represented in matrix notation as

(8) $\mathbb{E}[(\Psi\otimes I_n)\,r(\bar{x};\cdot)] = 0.$

From the definition of the residual (3), this gives a system of linear equations

(9) $\mathbb{E}[(\Psi\Psi^T)\otimes A]\,\bar{x} = \mathbb{E}[\Psi\otimes b]$

of dimension $n n_\psi$. This yields an algebraic expression for the stochastic-Galerkin approximation

(10) $\bar{x}^{\mathrm{SG}} = \mathbb{E}[(\Psi\Psi^T)\otimes A]^{-1}\,\mathbb{E}[\Psi\otimes b].$

If $A(\xi)$ is symmetric positive definite, the solution of linear system (9) minimizes the solution error in the induced energy norm $\|v\|_A \equiv \mathbb{E}[v^T A\, v]^{1/2}$, i.e.,

(11) $\bar{x}^{\mathrm{SG}} = \arg\min_{\bar{y}\in\mathbb{R}^{n n_\psi}} \|x - (\Psi^T\otimes I_n)\,\bar{y}\|_A^2.$

In general, however, the stochastic-Galerkin approximation does not minimize any measure of the solution error.
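To make the assembly of the Galerkin system (9) concrete, the following Python sketch solves it for a hypothetical one-parameter affine matrix $A(\xi) = A_0 + \xi A_1$ with $\xi\sim\mathcal{N}(0,1)$, a deterministic right-hand side, and an orthonormal Hermite basis; for this basis the matrix of moments $\mathbb{E}[\xi\,\psi_k\psi_j]$ is tridiagonal:

```python
import numpy as np

def galerkin_solve(A0, A1, b, n_psi):
    """Solve E[(Psi Psi^T) (x) A] xbar = E[Psi (x) b] for A(xi) = A0 + xi*A1,
    xi ~ N(0,1), deterministic b, and an orthonormal Hermite basis."""
    n = A0.shape[0]
    # E[psi_k psi_j] = delta_kj, so the A0 contribution is block diagonal.
    K = np.kron(np.eye(n_psi), A0)
    # E[xi psi_k psi_j] is tridiagonal: xi*psi_j = sqrt(j+1) psi_{j+1} + sqrt(j) psi_{j-1}.
    T = np.zeros((n_psi, n_psi))
    for j in range(n_psi - 1):
        T[j + 1, j] = T[j, j + 1] = np.sqrt(j + 1.0)
    K += np.kron(T, A1)
    rhs = np.zeros(n * n_psi)
    rhs[:n] = b                        # a deterministic b lies in the psi_0 component
    xbar = np.linalg.solve(K, rhs)
    return xbar.reshape(n_psi, n)      # row i holds the coefficient vector x_{i+1}
```

As a sanity check, for $A_1 = 0$ the system decouples and the only nonzero coefficient is $x_1 = A_0^{-1} b$.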
2.3 Pseudospectral method
The pseudospectral method directly approximates the unknown coefficients $x_i$ of $\tilde{x}$ in (2) by exploiting orthonormality of the polynomial basis $\{\psi_i\}$. That is, the coefficients can be obtained by projecting the numerical solution onto the orthonormal polynomial basis as

(12) $x_i = \mathbb{E}[x\,\psi_i], \quad i = 1,\ldots,n_\psi,$

which can be expressed as

(13) $\bar{x}^{\mathrm{PS}} = \mathbb{E}[(\Psi\otimes I_n)\,x],$

or equivalently

(14) $\bar{x}^{\mathrm{PS}} = \mathbb{E}[\Psi\otimes(A^{-1}b)].$

The associated optimality property of the approximation, which can be derived from optimality of orthogonal projection, is

(15) $\bar{x}^{\mathrm{PS}} = \arg\min_{\bar{y}\in\mathbb{R}^{n n_\psi}} \|x - (\Psi^T\otimes I_n)\,\bar{y}\|_{L^2(\Gamma)}^2.$

In practice, the coefficients are approximated via numerical quadrature as

(16) $x_i \approx \sum_{k=1}^{n_q} w^{(k)}\,A(\xi^{(k)})^{-1}\,b(\xi^{(k)})\,\psi_i(\xi^{(k)}),$

where $\{\xi^{(k)}, w^{(k)}\}_{k=1}^{n_q}$ are the quadrature points and weights.
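As an illustration of the quadrature rule (16) (a hypothetical example, not from the paper), consider a scalar problem $a(\xi)\,x(\xi) = b(\xi)$ with $\xi\sim U(-1,1)$ and a normalized Legendre basis:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander

def pseudospectral_coeffs(a, b, degree, n_quad=20):
    """Approximate x_i = E[x(xi) psi_i(xi)] by Gauss-Legendre quadrature for the
    scalar problem a(xi) x(xi) = b(xi), xi ~ U(-1,1), normalized Legendre basis."""
    pts, wts = leggauss(n_quad)
    wts = wts / 2.0                          # quadrature weights for the uniform density
    P = legvander(pts, degree).T             # unnormalized Legendre values P_i(pts)
    P = P * np.sqrt(2.0 * np.arange(degree + 1) + 1.0)[:, None]  # now E[psi_i^2] = 1
    x_samples = b(pts) / a(pts)              # "solve" the system at each quadrature point
    return (P * wts) @ x_samples             # weighted sum approximating E[x psi_i]
```

For $a = 1$ and $b(\xi) = \xi$, the exact solution is $x(\xi) = \xi = \psi_2(\xi)/\sqrt{3}$ in this normalization, so the only nonzero coefficient is $1/\sqrt{3}$.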
While stochastic Galerkin leads to an optimal approximation (11) under certain conditions and pseudospectral projection minimizes the $L^2(\Gamma)$ norm of the solution error (15), neither approach provides the flexibility to tailor the optimality properties of the approximation. This may be important in applications where, for example, minimizing the error in a quantity of interest is desired. To address this, we propose a general optimization-based framework for spectral methods that enables the choice of a targeted weighted norm in which the solution error is minimized.
3 Stochastic least-squares Petrov–Galerkin method
As a starting point, we propose a residual-minimizing formulation that computes the coefficients $\bar{x}$ by directly minimizing the $L^2(\Gamma)$ norm of the residual, i.e.,

(17) $\bar{x} = \arg\min_{\bar{y}\in\mathbb{R}^{n n_\psi}} \|x - (\Psi^T\otimes I_n)\,\bar{y}\|_{A^TA}^2,$

where $\|v\|_{A^TA} \equiv \mathbb{E}[v^T A^T A\, v]^{1/2}$. Thus, the $L^2(\Gamma)$ norm of the residual is equivalent to a weighted norm of the solution error. Using (2) and (3), we have

(18) $\bar{x} = \arg\min_{\bar{y}\in\mathbb{R}^{n n_\psi}} \|r(\bar{y};\cdot)\|_{L^2(\Gamma)}^2.$

The definition of the residual (3) allows the objective function in (18) to be written in quadratic form as

(19) $J(\bar{y}) \equiv \bar{y}^T\,\mathbb{E}[(\Psi\Psi^T)\otimes(A^TA)]\,\bar{y} - 2\,\bar{y}^T\,\mathbb{E}[\Psi\otimes(A^Tb)] + \mathbb{E}[b^Tb].$

Noting that the mapping $\bar{y}\mapsto J(\bar{y})$ is convex, the (unique) solution to (18) is a stationary point of $J$ and thus satisfies

(20) $\mathbb{E}[(\Psi\Psi^T)\otimes(A^TA)]\,\bar{x} = \mathbb{E}[\Psi\otimes(A^Tb)],$

which can be interpreted as the normal-equations form of the linear least-squares problem (18).
Consider a generalization of this idea that minimizes the solution error in a targeted weighted norm by choosing a specific weighting function. Let us define a weighting function $W:\Gamma\rightarrow\mathbb{R}^{m\times n}$, where $m\leq n$. Then, the stochastic LSPG method can be written as

(21) $\bar{x}(W) = \arg\min_{\bar{y}\in\mathbb{R}^{n n_\psi}} \|x - (\Psi^T\otimes I_n)\,\bar{y}\|_{(WA)^T(WA)}^2,$

with $\|v\|_{(WA)^T(WA)} \equiv \mathbb{E}[v^T A^T W^T W A\, v]^{1/2}$. Algebraically, this is equivalent to

(22) $\bar{x}(W) = \arg\min_{\bar{y}\in\mathbb{R}^{n n_\psi}} \|W\,r(\bar{y};\cdot)\|_{L^2(\Gamma)}^2.$

We will restrict our attention to the case where $W$ has full row rank and denote $W(\xi)$ by $W$ for simplicity. Now, the algebraic stochastic LSPG problem (22) simplifies to

(23) $\bar{x}(W) = \arg\min_{\bar{y}\in\mathbb{R}^{n n_\psi}} \mathbb{E}\big[(Wb - WA(\Psi^T\otimes I_n)\bar{y})^T (Wb - WA(\Psi^T\otimes I_n)\bar{y})\big].$

The objective function in (23) can be written in quadratic form as

(24) $J_W(\bar{y}) \equiv \bar{y}^T\,\mathbb{E}[(\Psi\Psi^T)\otimes(A^TW^TWA)]\,\bar{y} - 2\,\bar{y}^T\,\mathbb{E}[\Psi\otimes(A^TW^TWb)] + \mathbb{E}[b^TW^TWb].$

As before, because the mapping $\bar{y}\mapsto J_W(\bar{y})$ is convex, the unique solution of (23) corresponds to a stationary point of $J_W$ and thus satisfies

(25) $\mathbb{E}[(\Psi\Psi^T)\otimes(A^TW^TWA)]\,\bar{x}(W) = \mathbb{E}[\Psi\otimes(A^TW^TWb)],$

which is the normal-equations form of the linear least-squares problem (23). This yields the following algebraic expression for the stochastic-LSPG approximation:

(26) $\bar{x}(W) = \mathbb{E}[(\Psi\Psi^T)\otimes(A^TW^TWA)]^{-1}\,\mathbb{E}[\Psi\otimes(A^TW^TWb)].$
Petrov–Galerkin projection. Another way of interpreting the normal equations (25) is that the (weighted) residual $W\,r(\bar{x};\cdot)$ is enforced to be orthogonal to the subspace spanned by the optimal test basis $\{WA\,(\psi_i\otimes e_j)\}$, with $i = 1,\ldots,n_\psi$ and $j = 1,\ldots,n$. That is, this projection is precisely the (least-squares) Petrov–Galerkin projection,

(27) $\mathbb{E}[(\Psi\otimes(WA)^T)\,W\,r(\bar{x};\cdot)] = 0,$

where $e_j$ denotes the $j$th canonical unit vector in $\mathbb{R}^{n}$.
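Once the expectations are discretized by quadrature, the minimization underlying (25) becomes an ordinary linear least-squares problem. The following Python sketch is a hypothetical one-parameter illustration (constant weighting matrix $W$, $\xi\sim\mathcal{N}(0,1)$, orthonormal Hermite basis) that solves it with a standard least-squares routine instead of forming the normal equations:

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def hermite1(xi, n_psi):
    """Orthonormal probabilists' Hermite values [He_k(xi)/sqrt(k!)] for k < n_psi."""
    vals = [1.0, xi]
    for k in range(1, n_psi):
        vals.append(xi * vals[k] - k * vals[k - 1])
    return np.array([vals[k] / math.sqrt(math.factorial(k)) for k in range(n_psi)])

def lspg_solve(A_fun, b_fun, W, n_psi, n_quad=20):
    """Minimize sum_k w_k ||W A(xi_k) xtilde(xi_k) - W b(xi_k)||^2 over gPC coefficients."""
    pts, wts = hermegauss(n_quad)
    wts = wts / np.sqrt(2.0 * np.pi)          # standard-normal quadrature weights
    rows, rhs = [], []
    for xi, w in zip(pts, wts):
        psi = hermite1(xi, n_psi)
        WA = np.sqrt(w) * (W @ A_fun(xi))
        rows.append(np.kron(psi, WA))         # block row (psi^T (x) WA) of the LS operator
        rhs.append(np.sqrt(w) * (W @ b_fun(xi)))
    xbar, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return xbar.reshape(n_psi, -1)            # row i holds the coefficient vector x_{i+1}
```

With $A(\xi) = I$ and $b(\xi) = (\xi, 1)^T$, the exact solution $x(\xi) = (\xi, 1)^T$ lies in the subspace and is recovered: $x_1 = (0,1)^T$, $x_2 = (1,0)^T$, $x_3 = 0$.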
Monotonic convergence. The stochastic least-squares Petrov–Galerkin method is monotonically convergent. That is, as the trial subspace is enriched (by adding polynomials to the basis), the optimal value of the convex objective function monotonically decreases. This is apparent from the LSPG optimization problem (21): Defining

(28) $J^\star(n_\psi) \equiv \min_{\bar{y}\in\mathbb{R}^{n n_\psi}} \|W\,r(\bar{y};\cdot)\|_{L^2(\Gamma)}^2,$

we have $J^\star(n_\psi') \leq J^\star(n_\psi)$ (and hence monotonic convergence) if $n_\psi' \geq n_\psi$ and the bases are nested, i.e., $\mathrm{span}\{\psi_i\}_{i=1}^{n_\psi} \subseteq \mathrm{span}\{\psi_i\}_{i=1}^{n_\psi'}$.
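This behavior is easy to observe numerically. In the sketch below (a hypothetical scalar problem $(2+\xi)\,x(\xi) = 1$ with $\xi\sim U(-1,1)$, $W = 1$, Legendre trial basis), the minimized residual norm is non-increasing as the basis is enriched:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander

def min_residual_norm(degree, n_quad=40):
    """Optimal L2 residual norm of the LSPG fit for (2 + xi) x(xi) = 1, xi ~ U(-1,1)."""
    pts, wts = leggauss(n_quad)
    wts = wts / 2.0                                          # uniform-density weights
    V = legvander(pts, degree)                               # trial basis psi_j(xi_k)
    M = np.sqrt(wts)[:, None] * ((2.0 + pts)[:, None] * V)   # weighted LS operator
    r = np.sqrt(wts)                                         # weighted right-hand side (b = 1)
    _, res2, *_ = np.linalg.lstsq(M, r, rcond=None)          # residual sum of squares
    return float(np.sqrt(res2[0]))

# Nested polynomial spaces => the optimal residual norm cannot increase with degree.
norms = [min_residual_norm(p) for p in range(6)]
assert all(b <= a + 1e-12 for a, b in zip(norms, norms[1:]))
```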
Weighting strategies. Different choices of weighting function $W$ allow LSPG to minimize different measures of the error. We focus on four particular choices:

- $W = L^{-1}$, where $L$ is a Cholesky factor of $A$, i.e., $A = LL^T$. This decomposition exists if and only if $A$ is symmetric positive semidefinite. In this case, LSPG minimizes the energy norm of the solution error ($\|x - \tilde{x}\|_A$) and is mathematically equivalent to the stochastic Galerkin method described in Section 2.2, i.e., $\bar{x}(L^{-1}) = \bar{x}^{\mathrm{SG}}$. This can be seen by comparing (11) and (21) with $W = L^{-1}$, as $A^T W^T W A = A$ in this case.

- $W = I_n$, where $I_n$ is the identity matrix of dimension $n$. In this case, LSPG minimizes the $L^2(\Gamma)$ norm of the residual $\|r(\bar{x};\cdot)\|_{L^2(\Gamma)}$.

- $W = A^{-1}$. In this case, LSPG minimizes the $L^2(\Gamma)$ norm of the solution error $\|x - \tilde{x}\|_{L^2(\Gamma)}$ and is mathematically equivalent to the pseudospectral method described in Section 2.3, i.e., $\bar{x}(A^{-1}) = \bar{x}^{\mathrm{PS}}$.

- $W = FA^{-1}$, where $F$ is a linear functional of the solution associated with a vector of output quantities of interest $q(\xi) \equiv F(\xi)\,x(\xi)$. In this case, LSPG minimizes the $L^2(\Gamma)$ norm of the error in the output quantities of interest $\|F(x - \tilde{x})\|_{L^2(\Gamma)}$.

We again emphasize that two particular choices of the weighting function lead to equivalence between LSPG and existing spectral-projection methods (stochastic Galerkin and pseudospectral projection), i.e.,

(29) $\bar{x}(L^{-1}) = \bar{x}^{\mathrm{SG}}, \qquad \bar{x}(A^{-1}) = \bar{x}^{\mathrm{PS}},$

where the first equality is valid (i.e., the Cholesky decomposition can be computed) if and only if $A$ is symmetric positive semidefinite. Table 1 summarizes the target quantities to minimize, the corresponding LSPG weighting functions, and the method names LSPG($W$).
Table 1: Quantities minimized by LSPG, the corresponding weighting functions, and method names.

Quantity minimized by LSPG | Expression | Weighting function $W$ | Method name
Energy norm of error | $\|x-\tilde{x}\|_A$ | $L^{-1}$ (with $A = LL^T$) | LSPG($L^{-1}$)/SG
$L^2(\Gamma)$ norm of residual | $\|r\|_{L^2(\Gamma)}$ | $I_n$ | LSPG($I$)
$L^2(\Gamma)$ norm of solution error | $\|x-\tilde{x}\|_{L^2(\Gamma)}$ | $A^{-1}$ | LSPG($A^{-1}$)/PS
$L^2(\Gamma)$ norm of error in quantities of interest | $\|F(x-\tilde{x})\|_{L^2(\Gamma)}$ | $FA^{-1}$ | LSPG($FA^{-1}$)
4 Error analysis
If an approximation $\tilde{x}$ satisfies an optimal-projection condition

(30) $\tilde{x} = \arg\min_{\bar{y}\in\mathbb{R}^{n n_\psi}} \|x - (\Psi^T\otimes I_n)\,\bar{y}\|_{\Theta},$

then

(31) $\|x - \tilde{x}\|_{\Theta} \leq \|x - (\Psi^T\otimes I_n)\,\bar{y}\|_{\Theta} \quad \forall\,\bar{y}\in\mathbb{R}^{n n_\psi}.$

Using norm equivalence

(32) $\|v\|_{\bar{\Theta}} \leq C_{\Theta,\bar{\Theta}}\,\|v\|_{\Theta},$

we can characterize the solution error in any alternative norm $\|\cdot\|_{\bar{\Theta}}$ as

(33) $\|x - \tilde{x}\|_{\bar{\Theta}} \leq C_{\Theta,\bar{\Theta}}\,\min_{\bar{y}\in\mathbb{R}^{n n_\psi}} \|x - (\Psi^T\otimes I_n)\,\bar{y}\|_{\Theta}.$

Thus, the error in an alternative norm is controlled by the optimal objective-function value (which can be made small if the trial space admits accurate solutions) and the stability constant $C_{\Theta,\bar{\Theta}}$.
Table 2 reports norm-equivalence constants for the norms considered in this work.

[Table 2: stability constants $C_{\Theta,\bar{\Theta}}$ for each pair of minimized norm $\Theta$ and measured norm $\bar{\Theta}$; each diagonal entry, corresponding to measuring the error in the same norm that is minimized, equals 1.]
This exposes several interesting conclusions. First, if the number of outputs is smaller than $n$, then the null space of $F$ is nontrivial and so $\sigma_{\min}(F) = 0$. This implies that LSPG($FA^{-1}$) will have an unbounded stability constant when the solution error is measured in the other norms, i.e., for the energy norm, the residual norm, and the $L^2(\Gamma)$ norm of the solution error. It will have controlled errors only in the quantity-of-interest seminorm itself, in which case $C_{\Theta,\bar{\Theta}} = 1$. Second, note that for problems with small $\sigma_{\min}(A)$, the norm of the error in the quantities of interest may be large for LSPG($L^{-1}$)/SG or LSPG($I$), while it will remain well behaved for LSPG($A^{-1}$)/PS and LSPG($FA^{-1}$).
5 Numerical experiments
This section explores the performance of the LSPG methods for solving elliptic SPDEs parameterized by one random variable (i.e., $s = 1$). The maximum polynomial degree used in the stochastic space is $p$; thus, the dimension of the stochastic subspace is $n_\psi = p + 1$. In physical space, the SPDE is defined over a two-dimensional rectangular bounded domain $D$, and it is discretized using the finite element method with bilinear ($Q_1$) elements as implemented in the Incompressible Flow and Iterative Solver Software (IFISS) package [24]. Sixteen elements are employed in each dimension, leading to $n = 225$ degrees of freedom excluding boundary nodes. All numerical experiments are performed on an Intel 3.1 GHz i7 CPU with 16 GB RAM using Matlab R2015a.
Measuring weighted norms. For all LSPG methods, the weighted norms can be measured by evaluating the expectations in the quadratic form of the objective function shown in (24). This requires evaluation of three expectations,

(35) $J_W(\bar{x}) = \bar{x}^T A_2\,\bar{x} - 2\,\bar{x}^T a_1 + a_0,$

with

(36) $A_2 \equiv \mathbb{E}[(\Psi\Psi^T)\otimes(A^TW^TWA)],$
(37) $a_1 \equiv \mathbb{E}[\Psi\otimes(A^TW^TWb)],$
(38) $a_0 \equiv \mathbb{E}[b^TW^TWb].$

Note that $a_0$ does not depend on the stochastic-space dimension $n_\psi$. These quantities can be evaluated by numerical quadrature, or analytically if closed-form expressions for those expectations exist. Unless otherwise specified, we compute these quantities using the integral function in Matlab, which performs adaptive numerical quadrature based on the 15-point Gauss–Kronrod quadrature formula [23].
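For a scalar problem, the three expectations (36)-(38) can be assembled directly by quadrature, after which the weighted norm of any candidate solution follows from the quadratic form (35). A minimal Python sketch (assuming a scalar system $a(\xi)x = b(\xi)$, $\xi\sim U(-1,1)$, $W = 1$, Legendre basis; illustrative, not the paper's Matlab implementation):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander

def residual_norm_sq(xbar, a, b, n_quad=40):
    """Evaluate J(xbar) = xbar^T A2 xbar - 2 xbar^T a1 + a0 via Gauss-Legendre quadrature."""
    pts, wts = leggauss(n_quad)
    wts = wts / 2.0                              # weights for the uniform density on [-1,1]
    V = legvander(pts, len(xbar) - 1)            # psi_j(xi_k), unnormalized Legendre
    Av = a(pts)[:, None] * V                     # a(xi_k) psi_j(xi_k)
    A2 = (wts[:, None] * Av).T @ Av              # E[(Psi Psi^T) a^2]
    a1 = (wts[:, None] * Av).T @ b(pts)          # E[Psi a b]
    a0 = np.sum(wts * b(pts) ** 2)               # E[b^2]
    return float(xbar @ A2 @ xbar - 2.0 * xbar @ a1 + a0)
```

With $a = b = 1$, the coefficient vector $(1, 0)$ reproduces the exact solution and the residual norm vanishes, while the zero vector gives $J = \mathbb{E}[b^2] = 1$.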
Error measures. In the experiments, we assess the error in approximate solutions computed using various spectral-projection techniques using four relative error measures (see Table 1):

(39) $\frac{\|x - \tilde{x}\|_A}{\|x\|_A}, \qquad \frac{\|r(\bar{x};\cdot)\|_{L^2(\Gamma)}}{\|b\|_{L^2(\Gamma)}}, \qquad \frac{\|x - \tilde{x}\|_{L^2(\Gamma)}}{\|x\|_{L^2(\Gamma)}}, \qquad \frac{\|F(x - \tilde{x})\|_{L^2(\Gamma)}}{\|Fx\|_{L^2(\Gamma)}}.$
5.1 Stochastic diffusion problems
Consider the steady-state stochastic diffusion equation with homogeneous boundary conditions,

(40) $-\nabla\cdot\big(a(\mathrm{x},\xi)\,\nabla u(\mathrm{x},\xi)\big) = f(\mathrm{x},\xi) \ \text{in}\ D\times\Gamma, \qquad u(\mathrm{x},\xi) = 0 \ \text{on}\ \partial D\times\Gamma,$

where the diffusivity $a(\mathrm{x},\xi)$ is a random field and $\mathrm{x}\in D$. The random field is specified as an exponential of a truncated Karhunen-Loève (KL) expansion [18] with covariance kernel $C(\mathrm{x}_1,\mathrm{x}_2) = \exp(-\|\mathrm{x}_1-\mathrm{x}_2\|_1/c)$, where $c$ is the correlation length, i.e.,

(41) $a(\mathrm{x},\xi) = \exp\big(\mu + \sigma\sqrt{\lambda_1}\,a_1(\mathrm{x})\,\xi\big),$

where $\mu$ and $\sigma^2$ are the mean and variance of $\log a$ and $(\lambda_1, a_1(\mathrm{x}))$ is the first eigenpair in the KL expansion. After applying the spatial (finite-element) discretization, the problem can be reformulated as a parameterized linear system of the form (1), where $A(\xi)$ is a parameterized stiffness matrix obtained from the weak form of the problem whose $(i,j)$ element is $\int_D a(\mathrm{x},\xi)\,\nabla\phi_i(\mathrm{x})\cdot\nabla\phi_j(\mathrm{x})\,d\mathrm{x}$ (with standard finite element basis functions $\{\phi_i\}$) and $b(\xi)$ is a parameterized right-hand side whose $i$th element is $\int_D f(\mathrm{x},\xi)\,\phi_i(\mathrm{x})\,d\mathrm{x}$. Note that $A(\xi)$ is symmetric positive definite for this problem; thus LSPG($L^{-1}$)/SG is a valid projection scheme (the Cholesky factorization exists) and is equal to stochastic Galerkin projection.
Output quantities of interest. We consider output quantities of interest $q(\xi) = F(\xi)\,x(\xi)$ that are random linear functionals of the solution, where $F(\xi)$ is of dimension $n_o\times n$, having one of the following forms:

- $F(\xi) = g(\xi)\,F_0$: each entry of $F_0\in\mathbb{R}^{n_o\times n}$ is drawn from a uniform distribution and $g(\xi)$ is a scalar-valued function of $\xi$. The resulting output QoI, $q(\xi) = g(\xi)\,F_0\,x(\xi)$, is a vector-valued function of dimension $n_o$.

- $F = \mathbf{1}^T M$: $M$ is a mass matrix defined via $M_{ij} = \int_D \phi_i(\mathrm{x})\,\phi_j(\mathrm{x})\,d\mathrm{x}$. The output QoI is a scalar-valued function $q(\xi) = \mathbf{1}^T M\,x(\xi)$, which approximates a spatial average of the solution.
Diffusion problem 1: Lognormal random coefficient and deterministic forcing
In this example, we take $\xi$ in (41) to follow a standard normal distribution and the forcing term $f$ is deterministic. Because $\xi$ is normally distributed, normalized Hermite polynomials (orthonormal with respect to the Gaussian measure) are used as the polynomial basis $\{\psi_i\}$.
Figure 1 reports the relative errors (39) associated with solutions computed using the four LSPG methods (LSPG($L^{-1}$)/SG, LSPG($I$), LSPG($A^{-1}$)/PS, and LSPG($FA^{-1}$)) for varying polynomial degree $p$. Here, we consider the random output QoI of the first form, $q(\xi) = g(\xi)\,F_0\,x(\xi)$. This result shows that three methods (LSPG($L^{-1}$)/SG, LSPG($I$), and LSPG($A^{-1}$)/PS) monotonically converge in all four error measures, whereas LSPG($FA^{-1}$) does not. This is an artifact of rank deficiency in $F$, which leads to $\sigma_{\min}(F) = 0$; as a result, all stability constants associated with LSPG($FA^{-1}$) in Table 2 are unbounded, implying lack of error control. Figure 1 also shows that each LSPG method minimizes its targeted error measure for a given stochastic-subspace dimension (e.g., LSPG($I$) minimizes the $L^2(\Gamma)$ norm of the residual); this is also evident from Table 2, as each stability constant realizes its minimum value (equal to one) when the measured norm coincides with the minimized norm. Table 3 shows actual values of the stability constants for this problem and explains the behavior of all LSPG methods. For example, the first column of Table 3 shows that the stability constant increases in the order LSPG($L^{-1}$)/SG, LSPG($I$), LSPG($A^{-1}$)/PS, LSPG($FA^{-1}$), which is reflected in Figure 1(a).
Table 3: Stability constants for this problem (rows: norm minimized; columns: norm measured).

1 | 26.43 | 2.06 | 11644.22
2.06 | 1 | 4.25 | 24013.48
26.43 | 698.53 | 1 | 5646.32
… | … | … | 1
The results in Figure 1 do not account for computational costs. This point is addressed in Figure 2, which shows the relative errors as a function of CPU time. As we would like to devise a method that minimizes both the error and computational time, we examine a Pareto front (black dotted line) in each error measure. For a fixed value of $p$, LSPG($A^{-1}$)/PS is the fastest method because it does not require solution of a coupled system of linear equations of dimension $n n_\psi$, which is required by the other three LSPG methods (LSPG($L^{-1}$)/SG, LSPG($I$), and LSPG($FA^{-1}$)). As a result, pseudospectral projection (LSPG($A^{-1}$)/PS) generally yields the best overall performance in practice, even when it produces larger errors than the other methods for a fixed value of $p$. Also, for a fixed value of $p$, LSPG($L^{-1}$)/SG is faster than LSPG($I$) because the weighted stiffness matrix obtained from the finite element discretization is sparser than $A^TA$. That is, the number of nonzero entries to be evaluated for LSPG($L^{-1}$)/SG in numerical quadrature is smaller than that for LSPG($I$), and exploiting this sparsity structure in the numerical quadrature causes LSPG($L^{-1}$)/SG to be faster than LSPG($I$).
Diffusion problem 2: Lognormal random coefficient and random forcing
This example uses the same random field (41), but instead employs a random forcing term.
Figure 4 shows that this trend applies to other methods as well when effectiveness is viewed with respect to CPU time; each technique exhibits monotonic convergence in its tailored error measure only. Moreover, the Pareto fronts (black dotted lines) in each subgraph of Figure 4 show that the LSPG method tailored for a particular error measure is Pareto optimal in terms of minimizing the error and computational wall time. In the next experiments, we examine goal-oriented LSPG($FA^{-1}$) for a varying number of output quantities of interest $n_o$ and its effect on the stability constants. Figure 5 reports three error measures computed using all four LSPG methods. For LSPG($FA^{-1}$), the first linear functional of Section 5.1 is applied with a varying number of outputs $n_o$. When $n_o \geq n$, LSPG($FA^{-1}$) and LSPG($A^{-1}$)/PS behave similarly in all three weighted norms. This is because when $n_o \geq n$, we have $\sigma_{\min}(F) > 0$, so the stability constants associated with LSPG($FA^{-1}$) in Table 2 are bounded. Figure 6 reports relative errors in the quantity of interest associated with linear functionals for two different choices of the scalar function $g(\xi)$. Note that LSPG($L^{-1}$)/SG and LSPG($I$) fail to converge, whereas LSPG($A^{-1}$)/PS and LSPG($FA^{-1}$) converge, which can be explained by the stability constants in Table 2 for the linear operator of this problem. LSPG($FA^{-1}$) converges monotonically and produces the smallest error (for a fixed polynomial degree $p$) of all the methods, as expected.