Stochastic Least-squares Petrov–Galerkin Method for Parameterized Linear SystemsThis work was supported by the U.S. Department of Energy Office of Advanced Scientific Computing Research, Applied Mathematics program under award DEC-SC0009301 and by the U.S. National Science Foundation under grant DMS1418754.

# Stochastic Least-squares Petrov–Galerkin Method for Parameterized Linear Systems1

## Abstract

We consider the numerical solution of parameterized linear systems where the system matrix, the solution, and the right-hand side are parameterized by a set of uncertain input parameters. We explore spectral methods in which the solutions are approximated in a chosen finite-dimensional subspace. It has been shown that the stochastic Galerkin projection technique fails to minimize any measure of the solution error [20]. As a remedy for this, we propose a novel stochastic least-squares Petrov–Galerkin (LSPG) method. The proposed method is optimal in the sense that it produces the solution that minimizes a weighted -norm of the residual over all solutions in a given finite-dimensional subspace. Moreover, the method can be adapted to minimize the solution error in different weighted -norms by simply applying a weighting function within the least-squares formulation. In addition, a goal-oriented semi-norm induced by an output quantity of interest can be minimized by defining a weighting function as a linear functional of the solution. We establish optimality and error bounds for the proposed method, and extensive numerical experiments show that the weighted LSPG methods outperforms other spectral methods in minimizing corresponding target weighted norms.

s
\newsiamthm

claimClaim \newsiamremarkremRemark \newsiamremarkexplExample \newsiamremarkhypothesisHypothesis \headersStochastic Least-Squares Petrov–Galerkin Method K. Lee, K. Carlberg, and H. C. Elman

tochastic Galerkin, least-squares Petrov–Galerkin projection, residual minimization, spectral projection

{AMS}

35R60, 60H15, 60H35, 65C20, 65N30, 93E24

## 1 Introduction

Forward uncertainty propagation for parameterized linear systems is important in a range of applications to characterize the effects of uncertainties on the output of computational models. Such parameterized linear systems arise in many important problems in science and engineering, including stochastic partial differential equations (SPDEs) where uncertain input parameters are modeled as a set of random variables (e.g., diffusion/ground water flow simulations where diffusivity/permeability is modeled as a random field [15, 30]). It has been shown [11] that intrusive methods (e.g., stochastic Galerkin [2, 10, 13, 19, 28]) for uncertainty propagation can lead to smaller errors for a fixed basis dimension, compared with non-intrusive methods [27] (e.g., sampling-based methods [3, 14, 17], stochastic collocation [1, 21]).

The stochastic Galerkin method combined with generalized polynomial chaos (gPC) expansions [29] seeks a polynomial approximation of the numerical solution in the stochastic domain by enforcing a Galerkin orthogonality condition, i.e., the residual of the parameterized linear system is forced to be orthogonal to the span of the stochastic polynomial basis with respect to an inner product associated with an underlying probability measure. The Galerkin projection scheme is popular for its simplicity (i.e., the trial and test bases are the same) and its optimality in terms of minimizing an energy norm of solution errors when the underlying PDE operator is symmetric positive definite. In many applications, however, it has been shown that the stochastic Galerkin method does not exhibit any optimality property [20]. That is, it does not produce solutions that minimize any measure of the solution error. In such cases, the stochastic Galerkin method can lead to poor approximations and non-convergent behavior.

To address this issue, we propose a novel optimal projection technique, which we refer to as the stochastic least-squares Petrov–Galerkin (LSPG) method. Inspired by the successes of LSPG methods in nonlinear model reduction [8, 9, 7], finite element methods [4, 5, 16], and iterative linear solvers (e.g., GMRES, GCR) [22], we propose, as an alternative to enforcing the Galerkin orthogonality condition, to directly minimize the residual of a parameterized linear system over the stochastic domain in a (weighted) -norm. The stochastic LSPG method produces an optimal solution for a given stochastic subspace and guarantees that the -norm of the residual monotonically decreases as the stochastic basis is enriched. In addition to producing monotonically convergent approximations as measured in the chosen weighted -norm, the method can also be adapted to target output quantities of interest (QoI); this can be accomplished by employing a weighted -norm used for least-squares minimization that coincides with the -(semi)norm of the error in the chosen QoI.

In addition to proposing the stochastic LSPG method, this study shows that specific choices of weighting functions lead to equivalence between the stochastic LSPG method and both the stochastic Galerkin method and the pseudo-spectral method [26, 27]. We demonstrate the effectiveness of this method with extensive numerical experiments on various SPDEs. The results show that the proposed LSPG technique significantly outperforms the stochastic Galerkin when the solution error is measured in different weighted -norms. We also show that the proposed method can effectively minimize the error in target QoIs.

An outline of the paper is as follows. Section 2 formulates parameterized linear algebraic systems and reviews conventional spectral approaches for computing numerical solutions. Section 3 develops a residual minimization formulation based on least-squares methods and its adaptation to the stochastic LSPG method. We also provide proofs of optimality and monotonic convergence behavior of the proposed method. Section 4 provides error analysis for stochastic LSPG methods. Section 5 demonstrates the efficiency and the effectiveness of the proposed methods by testing them on various benchmark problems. Finally, Section 6 outlines some conclusions.

## 2 Spectral methods for parameterized linear systems

We begin by introducing a mathematical formulation of parameterized linear systems and briefly reviewing the stochastic Galerkin and the pseudo-spectral methods , which are spectral methods for approximating the numerical solutions of such systems.

### 2.1 Problem formulation

Consider a parameterized linear system

 A(ξ)u(ξ)=b(ξ), (1)

where , and . The system is parameterized by a set of stochastic input parameters . Here, is an elementary event in a probability space and the stochastic domain is denoted by where . We are interested in computing a spectral approximation of the numerical solution in an -dimensional subspace spanned by a finite set of polynomials such that , i.e.,

 u(ξ)≈~u(ξ)=nψ∑i=1¯uiψi(ξ)=(ψT(ξ)⊗Inx)¯u, (2)

where with are unknown coefficient vectors, is the vertical concatenation of these coefficient vectors, is a concatenation of the polynomial basis, denotes the Kronecker product, and denotes the identity matrix of dimension . Note that . Typically, the “stochastic” basis consists of products of univariate polynomials: where are univariate polynomials, is a multi-index and represents the degree of a polynomial in . The dimension of the stochastic subspace depends on the number of random variables , the maximum polynomial degree , and a construction of the polynomial space (e.g., a total-degree space that contains polynomials with total degree up to , ). By substituting with in (1), the residual can be defined as

 r(¯u;ξ) \coloneqqb(ξ)−A(ξ)nψ∑i=1¯uiψi(ξ)=b(ξ)−(ψT(ξ)⊗A(ξ))¯u, (3)

where .

It follows from (2) and (3) that our goal now is to compute the unknown coefficients of the solution expansion. We briefly review two conventional approaches for doing so: the stochastic Galerkin method and the pseudo-spectral method. In the following, denotes an underlying measure of the stochastic space and

 ⟨g,h⟩ρ ≡∫Γg(ξ)h(ξ)ρ(ξ)dξ, (4) E[g] ≡∫Γg(ξ)ρ(ξ)dξ, (5)

define an inner product between scalar-valued functions and with respect to and the expectation of , respectively. The -norm of a vector-valued function is defined as

 ∥v∥22 ≡nx∑i=1∫Γv2i(ξ)ρ(ξ)dξ=E[vTv]. (6)

Typically, the polynomial basis is constructed to be orthogonal in the inner product, i.e., , where denotes the Kronecker delta.

### 2.2 Stochastic Galerkin method

The stochastic Galerkin method computes the unknown coefficients of in (2) by imposing orthogonality of the residual (3) with respect to the inner product in the subspace . This Galerkin orthogonality condition can be expressed as follows: Find such that

 ⟨ri(¯uSG),ψj⟩ρ=E[ri(¯uSG)ψj]=0,i=1,…,nx, j=1,…,nψ, (7)

where . The condition (7) can be represented in matrix notation as

 E[ψ⊗r(¯uSG)]=0. (8)

From the definition of the residual (3), this gives a system of linear equations

 E[ψψT⊗A]¯uSG=E[ψ⊗b], (9)

of dimension . This yields an algebraic expression for the stochastic-Galerkin approximation

 ~uSG(ξ)=(ψ(ξ)T⊗Inx)E[ψψT⊗A]−1E[ψ⊗Au]. (10)

If is symmetric positive definite, the solution of linear system (9) minimizes the solution error in the -induced energy norm , i.e.,

 ~uSG(ξ)=argminx∈(Snψ)nx∥e(x)∥2A. (11)

In general, however, the stochastic-Galerkin approximation does not minimize any measure of the solution error.

### 2.3 Pseudo-spectral method

The pseudo-spectral method directly approximates the unknown coefficients of in (2) by exploiting orthogonality of the polynomial basis . That is, the coefficients can be obtained by projecting the numerical solution onto the orthogonal polynomial basis as

 ¯uPSi=E[uψi],i=1,…,nψ, (12)

which can be expressed as

 ¯uPS=E[ψ⊗A−1b], (13)

or equivalently

 ~uPS(ξ)=(ψ(ξ)T⊗Inx)E[ψ⊗u]. (14)

The associated optimality property of the approximation, which can be derived from optimality of orthogonal projection, is

 ~uPS(ξ)=argminx∈(Snψ)nx∥e(x)∥22. (15)

In practice, the coefficients are approximated via numerical quadrature as

 ¯uPSi=E[uψi]=nq∑k=1u(ξ(k))ψi(ξ(k))wk=nq∑k=1(A−1(ξ(k))f(ξ(k)))ψi(ξ(k))wk, (16)

where are the quadrature points and weights.

While stochastic Galerkin leads to an optimal approximation (11) under certain conditions and pseudo-spectral projection minimizes the -norm of the solution error (15), neither approach provides the flexibility to tailor the optimality properties of the approximation. This may be important in applications where, for example, minimizing the error in a quantity of interest is desired. To address this, we propose a general optimization-based framework for spectral methods that enables the choice of a targeted weighted -norm in which the solution error is minimized.

## 3 Stochastic least-squares Petrov–Galerkin method

As a starting point, we propose a residual-minimizing formulation that computes the coefficients by directly minimizing the -norm of the residual, i.e.,

 ~uLSPG(ξ)=argminx∈(Snψ)nx∥f−Ax∥22=argminx∈(Snψ)nx∥e(x)∥2ATA, (17)

where . Thus, the -norm of the residual is equivalent to a weighted -norm of the solution error. Using (2) and (3), we have

 ¯uLSPG=argmin¯x∈Rnxnψ∥r(¯x)∥22. (18)

The definition of the residual (3) allows the objective function in (18) to be written in quadratic form as

 ∥r(¯x)∥22=∥f−(ψT⊗A)¯x∥22=¯xTE[ψψT⊗ATA]¯x−2E[ψ⊗ATf]T¯x+E[fTf]. (19)

Noting that the mapping is convex, the (unique) solution to (18) is a stationary point of and thus satisfies

 E[ψψT⊗ATA]¯uLSPG=E[ψ⊗ATf], (20)

which can be interpreted as the normal-equations form of the linear least-squares problem (18).

Consider a generalization of this idea that minimizes the solution error in a targeted weighted -norm by choosing a specific weighting function. Let us define a weighting function , where and . Then, the stochastic LSPG method can be written as

 ~uLSPG(M)(ξ)=argminx∈(Snψ)nx∥M(b−Ax)∥22=argminx∈(Snψ)nx∥e(x)∥2ATMTMA, (21)

with . Algebraically, this is equivalent to

 ¯uLSPG(M)=argmin¯x∈Rnxnψ∥Mr(¯x)∥22=argmin¯x∈Rnxnψ∥(Mξ⊗Mx)(1⊗b−(ψT⊗A)¯x)∥22=argmin¯x∈Rnxnψ∥Mξ⊗(Mxb)−((MξψT)⊗(MxA))¯x∥22. (22)

We will restrict our attention to the case and denote by for simplicity. Now, the algebraic stochastic LSPG problem (22) simplifies to

 ¯uLSPG(M) =argmin¯x∈Rnxnψ∥Mr(¯x)∥22=argmin¯x∈Rnxnψ∥Mb−(ψT⊗MA)¯x∥22. (23)

The objective function in (23) can be written in quadratic form as

 ∥Mr(¯x)∥22=¯xTE[(ψψT⊗MATMTMA)]¯x−2(E[ψ⊗ATMTMf])T¯x+E[bTMTMb]. (24)

As before, because the mapping is convex, the unique solution of (23) corresponds to a stationary point of and thus satisfies

 E[ψψT⊗ATMTMA]¯uLSPG(M)=E[ψ⊗ATMTMf], (25)

which is the normal-equations form of the linear least-squares problem (23). This yields the following algebraic expression for the stochastic-LSPG approximation:

 ~uLSPG(M)(ξ)=(ψ(ξ)T⊗Inx)E[ψψT⊗ATMTMA]−1E[ψ⊗ATMTMAu]. (26)

Petrov–Galerkin projection. Another way of interpreting the normal equations (25) is that the (weighted) residual is enforced to be orthogonal to the subspace spanned by the optimal test basis with and . That is, this projection is precisely the (least-squares) Petrov–Galerkin projection,

 E[ϕT(b−(ψT⊗MA)¯uLSPG(M))]=0, (27)

where .

Monotonic Convergence. The stochastic least-squares Petrov-Galerkin is monotonically convergent. That is, as the trial subspace is enriched (by adding polynomials to the basis), the optimal value of the convex objective function monotonically decreases. This is apparent from the LSPG optimization problem (21): Defining

 ~uLSPG′(M)(ξ)=argminx∈(Snψ+1)nx∥M(b−Ax)∥22, (28)

we have (and ) if .

Weighting strategies. Different choices of weighting function allow LSPG to minimize different measures of the error. We focus on four particular choices:

1. , where is a Cholesky factor of , i.e., . This decomposition exists if and only if is symmetric positive semidefinite. In this case, LSPG minimizes the energy norm of the solution error () and is mathematically equivalent to the stochastic Galerkin method described in Section 2.2, i.e., . This can be seen by comparing (11) and (21) with , as in this case.

2. , where is the identity matrix of dimension . In this case, LSPG minimizes the -norm of the residual .

3. . In this case, LSPG minimizes the -norm of solution error . This is mathematically equivalent to the pseudo-spectral method described in Section 2.3, i.e., , which can be seen by comparing (15) and (21) with .

4. where is a linear functional of the solution associated with a vector of output quantities of interest. In this case, LSPG minimizes the -norm of the error in the output quantities of interest .

We again emphasize that two particular choices of the weighting function lead to equivalence between LSPG and existing spectral-projection methods (stochastic Galerkin and pseudo-spectral projection), i.e.,

 ~uLSPG(C−1)=~uSG,~uLSPG(A−1)=~uPS, (29)

where the first equality is valid (i.e., the Cholesky decomposition can be computed) if and only if is symmetric positive semidefinite. Table 1 summarizes the target quantities to minimize (i.e., , the corresponding LSPG weighting functions, and the method names LSPG().

## 4 Error analysis

If an approximation satisfies an optimal-projection condition

 ~u=argminx∈(Snψ)nx∥e(x)∥2Θ, (30)

then

 ∥e(~u)∥2Θ=minx∈(Snψ)nx∥e(x)∥2Θ. (31)

Using norm equivalence

 ∥x∥2Θ′≤C∥x∥2Θ, (32)

we can characterize the solution error in any alternative norm as

 ∥e(~u)∥2Θ′≤Cminx∈(Snψ)nx∥e(x)∥2Θ. (33)

Thus, the error in an alternative norm is controlled by the optimal objective-function value (which can be made small if the trial space admits accurate solutions) and the stability constant .

Table 2 reports norm-equivalence constants for the norms considered in this work. Here, we have defined

 σmin(M)≡infx∈(L2(Γ))nx∥Mx∥2/∥x∥2,σmax(M)≡supx∈(L2(Γ))nx∥Mx∥2/∥x∥2. (34)

This exposes several interesting conclusions. First, if , then the null space of is nontrivial and so . This implies that LSPG(), for which , will have an undefined value of when the solution error is measured in other norms, i.e., for , , and . It will have controlled errors only for , in which case . Second, note that for problems with small , the norm in the quantities of interest may be large for the LSPG()/SG, or LSPG(), while it will remain well behaved for LSPG()/PS and LSPG().

## 5 Numerical experiments

This section explores the performance of the LSPG methods for solving elliptic SPDEs parameterized by one random variable (i.e., ). The maximum polynomial degree used in the stochastic space is ; thus, the dimension of is . In physical space, the SPDE is defined over a two-dimensional rectangular bounded domain , and it is discretized using the finite element method with bilinear () elements as implemented in the Incompressible Flow and Iterative Solver Software (IFISS) package [24]. Sixteen elements are employed in each dimension, leading to degrees of freedom excluding boundary nodes. All numerical experiments are performed on an Intel 3.1 GHz i7 CPU, 16 GB RAM using Matlab R2015a.

Measuring weighted -norms. For all LSPG methods, the weighted -norms can be measured by evaluating the expectations in the quadratic form of the objective function shown in (24). This requires evaluation of three expectations

 ∥Mr(¯x)∥22\coloneqq¯xTT1¯x−2TT2¯x+T3, (35)

with

 T1\coloneqq E[(ψψT⊗ATMTMTA)]∈Rnxnψ×nxnψ, (36) T2\coloneqq E[ψ⊗ATMTMb]∈Rnxnψ, (37) T3\coloneqq E[bTMTMb]∈R. (38)

Note that does not depend on the stochastic-space dimension . These quantities can be evaluated by numerical quadrature or analytically if closed-form expressions for those expectations exist. Unless otherwise specified, we compute these quantities using the integral function in Matlab, which performs adaptive numerical quadrature based on the 15-point Gauss–Kronrod quadrature formula [23].

Error measures. In the experiments, we assess the error in approximate solutions computed using various spectral-projection techniques using four relative error measures (see Table 1):

 ηr(x)\coloneqq∥e(x)∥2ATA∥f∥22,ηe(x)\coloneqq∥e(x)∥22∥u∥22,ηA(x)\coloneqq∥e(x)∥2A∥u∥2A,ηQ(x)\coloneqq∥Fe(x)∥22∥Fu∥22. (39)

### 5.1 Stochastic diffusion problems

Consider the steady-state stochastic diffusion equation with homogeneous boundary conditions,

 {−∇⋅(a(x,ξ)∇u(x,ξ))=f(x,ξ) in D×Γu(x,ξ)=0 on ∂D×Γ, (40)

where the diffusivity is a random field and . The random field is specified as an exponential of a truncated Karhunen-Loève (KL) expansion [18] with covariance kernel, , where is the correlation length, i.e.,

 a(x,ξ)≡exp(μ+σa1(x)ξ), (41)

where are the mean and variance of and is the first eigenfunction in the KL expansion. After applying the spatial (finite-element) discretization, the problem can be reformulated as a parameterized linear system of the form (1), where is a parameterized stiffness matrix obtained from the weak form of the problem whose -element is (with standard finite element basis functions) and is a parameterized right-hand side whose th element is . Note that is symmetric positive definite for this problem; thus LSPG()/SG is a valid projection scheme (the Cholesky factorization exists) and is equal to stochastic Galerkin projection.

Output quantities of interest. We consider output quantities of interest () that are random linear functionals of the solution and is of dimension having the form:

• with : each entry of is drawn uniformly from and is a scalar-valued function of . The resulting output QoI, , is a vector-valued function of dimension .

• : is a mass matrix defined via . The output QoI is a scalar-valued function , which approximates a spatial average .

#### Diffusion problem 1: Lognormal random coefficient and deterministic forcing

In this example, we take in (41) to follow a standard normal distribution (i.e., and ) and is deterministic. Because is normally distributed, normalized Hermite polynomials (orthogonal with respect to ) are used as polynomial basis .

Figure 1 reports the relative errors (39) associated with solutions computed using four LSPG methods (LSPG()/SG, LSPG(), LSPG()/PS, and LSPG()) for varying polynomial degree . Here, we consider the random output QoI, i.e., , , and . This result shows that three methods (LSPG()/SG, LSPG(), and LSPG()/PS) monotonically converge in all four error measures, whereas LSPG() does not. This is an artifact of rank deficiency in , which leads to ; as a result, all stability constants for which in Table 2 are unbounded, implying lack of error control. Figure 1 also shows that each LSPG method minimizes its targeted error measure for a given stochastic-subspace dimension (e.g., LSPG minimizes the -norm of the residual); this is also evident from Table 2, as the stability constant realizes its minimum value () for . Table 3 shows actual values of the stability constant of this problem and well explains the behaviors of all LSPG methods. For example, the first column of Table 3 shows that the stability constant is increasing in the order (LSPG()/SG, LSPG(), LSPG()/PS, and LSPG()), which is represented in Figure (a)a.

The results in Figure 1 do not account for computational costs. This point is addressed in Figure 2, which shows the relative errors as a function of CPU time. As we would like to devise a method that minimizes both the error and computational time, we examine a Pareto front (black dotted line) in each error measure. For a fixed value of , LSPG()/PS is the fastest method because it does not require solution of a coupled system of linear equations of dimension which is required by the other three LSPG methods (LSPG()/SG, LSPG(), and LSPG()). As a result, pseudo-spectral projection (LSPG()/PS) generally yields the best overall performance in practice, even when it produces larger errors than other methods for a fixed value of . Also, for a fixed value of , LSPG()/SG is faster than LSPG() because the weighted stiffness matrix obtained from the finite element discretization is sparser than . That is, the number of nonzero entries to be evaluated for LSPG()/SG in numerical quadrature is smaller than the ones for LSPG(), and exploiting this sparsity structure in the numerical quadrature causes LSPG()/SG to be faster than LSPG().

#### Diffusion problem 2: Lognormal random coefficient and random forcing

This example uses the same random field (41), but instead employs a random forcing term2 . Again, follows a standard normal distribution and normalized Hermite polynomials are used as polynomial basis. We consider the second output QoI, . As shown in Figure 3, the stochastic Galerkin method fails to converge monotonically in three error measures as the stochastic polynomial basis is enriched. In fact, it exhibits monotonic convergence only in the error measure it minimizes (for which monotonic convergence is guaranteed).

Figure 4 shows that this trend applies to other methods as well when effectiveness is viewed with respect to CPU time; each technique exhibits monotonic convergence in its tailored error measure only. Moreover, the Pareto fronts (black dotted lines) in each subgraph of Figure 4 shows that the LSPG method tailored for a particular error measure is Pareto optimal in terms of minimizing the error and computational wall time. In the next experiments, we examine goal-oriented LSPG() for varying number of output quantities of interest and its effect on the stability constant . Figure 5 reports three error measures computed using all four LSPG methods. For LSPG(), the first linear function is applied with and a varying number of outputs . When , LSPG() and LSPG()/PS behave similarly in all three weighted -norms. This is because when , then , so the stability constants for in Table 2 are bounded. Figure 6 reports relative errors in the quantity of interest associated with linear functionals for two different functions , and . Note that LSPG()/SG and LSPG() fail to converge, whereas LSPG()/PS and LSPG() converge, which can be explained by the stability constant in Table 2 where and for the linear operator of this problem. LSPG() converges monotonically and produces the smallest error (for a fixed polynomial degree ) of all the methods as expected.