# A Low-Rank Multigrid Method for the Stochastic Steady-State Diffusion Problem

This work was supported by the U.S. Department of Energy Office of Advanced Scientific Computing Research, Applied Mathematics program under award DE-SC0009301 and by the U.S. National Science Foundation under grant DMS1418754.

###### Abstract

We study a multigrid method for solving large linear systems of equations with tensor product structure. Such systems are obtained from stochastic finite element discretization of stochastic partial differential equations such as the steady-state diffusion problem with random coefficients. When the variance in the problem is not too large, the solution can be well approximated by a low-rank object. In the proposed multigrid algorithm, the matrix iterates are truncated to low rank to reduce memory requirements and computational effort. The method is proved convergent with an analytic error bound. Numerical experiments show its effectiveness in solving the Galerkin systems compared to the original multigrid solver, especially when the number of degrees of freedom associated with the spatial discretization is large.

H. C. Elman and T. Su

Key words. stochastic finite element method, multigrid, low-rank approximation

AMS subject classifications. 35R60, 60H15, 60H35, 65F10, 65N30, 65N55

## 1 Introduction

Stochastic partial differential equations (SPDEs) arise from physical applications where the parameters of the problem are subject to uncertainty. Discretization of SPDEs gives rise to large linear systems of equations which are computationally expensive to solve. These systems are in general sparse and structured. In particular, the coefficient matrix can often be expressed as a sum of tensor products of smaller matrices [6, 13, 14]. For such systems it is natural to use an iterative solver where the coefficient matrix is never explicitly formed and matrix-vector products are computed efficiently. One way to further reduce costs is to construct low-rank approximations to the desired solution. The iterates are truncated so that the solution method handles only low-rank objects in each iteration. This idea has been used to reduce the costs of iterative solution algorithms based on Krylov subspaces. For example, a low-rank conjugate gradient method was given in [9], and low-rank generalized minimal residual methods have been studied in [2, 10].

In this study, we propose a low-rank multigrid method for solving the Galerkin systems. We consider a steady-state diffusion equation with random diffusion coefficient as a model problem, and we use the stochastic finite element method (SFEM, see [1, 7]) for the discretization of the problem. The resulting Galerkin system has tensor product structure and, moreover, quantities used in the computation, such as the solution sought, can be expressed in matrix format. It has been shown that such systems admit low-rank approximate solutions [3, 9]. In our proposed multigrid solver, the matrix iterates are truncated to have low rank in each iteration. We derive an analytic bound for the error of the solution and show the convergence of the algorithm. We demonstrate using benchmark problems that the low-rank multigrid solver is often more efficient than a solver that does not use truncation, and that it is especially advantageous in reducing computing time for large-scale problems.

An outline of the paper is as follows. In Section 2 we state the problem and briefly review the stochastic finite element method and the multigrid solver for the stochastic Galerkin system from which the new technique is derived. In Section 3 we discuss the idea of low-rank approximation and introduce the multigrid solver with low-rank truncation. A convergence analysis of the low-rank multigrid solver is also given in this section. The results of numerical experiments are shown in Section 4 to test the performance of the algorithm, and some conclusions are drawn in the last section.

## 2 Model problem

Consider the stochastic steady-state diffusion equation with homogeneous Dirichlet boundary conditions

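$$
\begin{cases}
-\nabla\cdot\big(c(x,\omega)\,\nabla u(x,\omega)\big) = f(x) & \text{in } D\times\Omega,\\
u(x,\omega) = 0 & \text{on } \partial D\times\Omega.
\end{cases}
\tag{2.1}
$$

Here $D$ is a spatial domain and $\Omega$ is a sample space with $\sigma$-algebra $\mathcal{F}$ and probability measure $P$. The diffusion coefficient $c(x,\omega): D\times\Omega\to\mathbb{R}$ is a random field. We consider the case where the source term $f$ is deterministic. The stochastic Galerkin formulation of Eq. 2.1 uses a weak formulation: find $u(x,\omega)\in V = H_0^1(D)\otimes L^2(\Omega)$ satisfying

$$
\int_\Omega\int_D c(x,\omega)\,\nabla u\cdot\nabla v\,\mathrm{d}x\,\mathrm{d}P = \int_\Omega\int_D f\,v\,\mathrm{d}x\,\mathrm{d}P
\tag{2.2}
$$

for all $v\in V$. The problem is well posed if $c(x,\omega)$ is bounded and strictly positive, i.e.,

$$
0 < c_{\min} \le c(x,\omega) \le c_{\max} < \infty \quad \text{a.e. in } D\times\Omega,
$$

so that the Lax-Milgram lemma establishes existence and uniqueness of the weak solution.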

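We will assume that the stochastic coefficient $c(x,\omega)$ is represented as a truncated Karhunen-Loève (KL) expansion [11, 12], in terms of a finite collection of uncorrelated random variables $\{\xi_l\}_{l=1}^m$:

$$
c(x,\omega) \approx c_0(x) + \sum_{l=1}^m \sqrt{\lambda_l}\,c_l(x)\,\xi_l(\omega),
\tag{2.3}
$$

where $c_0(x)$ is the mean function, $(\lambda_l, c_l(x))$ is the $l$th eigenpair of the covariance function $r(x,y)$, and the eigenvalues are assumed to be in non-increasing order. In Section 4 we will further assume these random variables are independent and identically distributed. Let $\rho(\xi)$ be the joint density function and $\Gamma$ be the joint image of $\{\xi_l\}_{l=1}^m$. The weak form of Eq. 2.1 is then given as follows: find $u(x,\xi)\in W = H_0^1(D)\otimes L^2_\rho(\Gamma)$ such that

$$
\int_\Gamma \rho(\xi)\int_D c(x,\xi)\,\nabla u\cdot\nabla v\,\mathrm{d}x\,\mathrm{d}\xi = \int_\Gamma \rho(\xi)\int_D f\,v\,\mathrm{d}x\,\mathrm{d}\xi
\tag{2.4}
$$

for all $v\in W$.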
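To make the truncated KL expansion of Eq. 2.3 concrete, the following sketch builds a discrete analogue on a one-dimensional grid. Everything specific here is an assumption made for illustration and is not taken from the paper: the interval $[0,1]$, the exponential covariance, the uniform-weight (Nyström-type) discretization of the covariance operator, the uniform distribution of the random variables on $[-\sqrt{3},\sqrt{3}]$, and the function names `truncated_kl_modes` and `sample_coefficient`.

```python
import numpy as np

def truncated_kl_modes(x, m, sigma=0.1, corr_len=1.0):
    """Leading m eigenpairs of a discretized exponential covariance (assumed form)."""
    h = x[1] - x[0]  # uniform grid spacing, used as the quadrature weight
    cov = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
    lam, vec = np.linalg.eigh(h * cov)       # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:m]        # keep the m largest, non-increasing
    # Rescale so that each mode has unit discrete L2 norm: h * sum(mode**2) = 1.
    return lam[order], vec[:, order] / np.sqrt(h)

def sample_coefficient(lam, modes, c0=1.0, rng=None):
    """One realization of c0 + sum_l sqrt(lam_l) * mode_l * xi_l."""
    rng = np.random.default_rng() if rng is None else rng
    # Uniform variables on [-sqrt(3), sqrt(3)] have zero mean and unit variance.
    xi = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=lam.size)
    return c0 + modes @ (np.sqrt(lam) * xi)

x = np.linspace(0.0, 1.0, 101)
lam, modes = truncated_kl_modes(x, m=5)
c = sample_coefficient(lam, modes, rng=np.random.default_rng(0))
```

The faster the eigenvalues `lam` decay, the fewer terms are needed for a given accuracy, which is the regime where a small number of random variables $m$ suffices.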

### 2.1 Stochastic finite element method

We briefly review the stochastic finite element method as described in [1, 7]. This method approximates the weak solution of Eq. 2.1 in a finite-dimensional subspace

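$$
V^{hp} = S^h\otimes T^p = \mathrm{span}\{\phi(x)\psi(\xi) \mid \phi\in S^h,\ \psi\in T^p\},
\tag{2.5}
$$

where $S^h$ and $T^p$ are finite-dimensional subspaces of $H_0^1(D)$ and $L^2_\rho(\Gamma)$. We will use quadrilateral elements and piecewise bilinear basis functions $\{\phi(x)\}$ for the discretization of the physical space $H_0^1(D)$, and generalized polynomial chaos [17] for the stochastic basis functions $\{\psi(\xi)\}$. The latter are $m$-dimensional orthogonal polynomials whose total degree does not exceed $p$. The orthogonality relation means

$$
\int_\Gamma \psi_r(\xi)\,\psi_s(\xi)\,\rho(\xi)\,\mathrm{d}\xi = \delta_{rs}.
$$

For instance, Legendre polynomials are used if the random variables have uniform distribution with zero mean and unit variance. The number of degrees of freedom in $T^p$ is

$$
N_\xi = \frac{(m+p)!}{m!\,p!}.
$$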
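The dimension of the space of polynomials in $m$ variables of total degree at most $p$ is the binomial coefficient $\binom{m+p}{p}$, which gives the number of stochastic degrees of freedom. A minimal sketch for evaluating it (the function name `num_stochastic_dofs` is illustrative only):

```python
from math import comb

def num_stochastic_dofs(m: int, p: int) -> int:
    """Number of m-variate polynomials of total degree <= p in a basis: (m+p)!/(m! p!)."""
    return comb(m + p, p)

print(num_stochastic_dofs(5, 3))  # 56
```

The count grows quickly with both $m$ and $p$, which is one reason the Galerkin system becomes large.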

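Given the subspace, one can now write the SFEM solution as a linear combination of the basis functions,

$$
u_{hp}(x,\xi) = \sum_{j=1}^{N_x}\sum_{s=1}^{N_\xi} u_{js}\,\phi_j(x)\,\psi_s(\xi),
\tag{2.6}
$$

where $N_x$ is the dimension of the subspace $S^h$. Substituting Eqs. 2.3 and 2.6 into Eq. 2.4, and taking the test function as any basis function $\phi_k(x)\psi_r(\xi)$, results in the Galerkin system: find $\mathbf{u}\in\mathbb{R}^{N_xN_\xi}$ such that

$$
A\mathbf{u} = \mathbf{f}.
\tag{2.7}
$$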

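The coefficient matrix $A$ can be represented in tensor product notation [14],

$$
A = G_0\otimes K_0 + \sum_{l=1}^m G_l\otimes K_l,
\tag{2.8}
$$

where $\{K_l\}_{l=0}^m$ are the stiffness matrices and $\{G_l\}_{l=0}^m$ correspond to the stochastic part, with entries

$$
\begin{aligned}
G_0(r,s) &= \int_\Gamma \psi_r(\xi)\psi_s(\xi)\rho(\xi)\,\mathrm{d}\xi, &
K_0(j,k) &= \int_D c_0(x)\,\nabla\phi_j(x)\cdot\nabla\phi_k(x)\,\mathrm{d}x,\\
G_l(r,s) &= \int_\Gamma \xi_l\,\psi_r(\xi)\psi_s(\xi)\rho(\xi)\,\mathrm{d}\xi, &
K_l(j,k) &= \int_D \sqrt{\lambda_l}\,c_l(x)\,\nabla\phi_j(x)\cdot\nabla\phi_k(x)\,\mathrm{d}x,
\end{aligned}
\tag{2.9}
$$

for $l = 1,\ldots,m$; $r,s = 1,\ldots,N_\xi$; $j,k = 1,\ldots,N_x$. The right-hand side can be written as a tensor product of two vectors:

$$
\mathbf{f} = g_0\otimes f_0,
\tag{2.10}
$$

where

$$
\begin{aligned}
g_0(r) &= \int_\Gamma \psi_r(\xi)\rho(\xi)\,\mathrm{d}\xi, & r &= 1,\ldots,N_\xi,\\
f_0(j) &= \int_D f(x)\,\phi_j(x)\,\mathrm{d}x, & j &= 1,\ldots,N_x.
\end{aligned}
\tag{2.11}
$$

Note that in the Galerkin system Eq. 2.7, the matrix $A$ is symmetric and positive definite. It is also blockwise sparse (see Fig. 2.1) due to the orthogonality of $\{\psi_r(\xi)\}$. The size of the linear system is in general very large ($N_xN_\xi$). For such a system it is suitable to use an iterative solver. Multigrid methods are among the most effective iterative solvers for the solution of discretized elliptic PDEs, capable of achieving convergence rates that are independent of the mesh size, with computational work growing only linearly with the problem size [8, 15].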

### 2.2 Multigrid

In this subsection we discuss a geometric multigrid solver proposed in [4] for the solution of the stochastic Galerkin system Eq. 2.7. For this method, the mesh size $h$ varies for different grid levels, while the polynomial degree $p$ is held constant, i.e., the fine grid space and coarse grid space are defined as

$$
V^{hp} = S^h\otimes T^p, \qquad V^{2h,p} = S^{2h}\otimes T^p,
\tag{2.12}
$$

respectively. Then the prolongation and restriction operators are of the form

$$
\mathscr{P} = I\otimes P, \qquad \mathscr{R} = I\otimes P^T,
\tag{2.13}
$$

where $P$ is the same prolongation matrix as in the deterministic case. On the coarse grid we only need to construct matrices $\{K_l^{2h}\}_{l=0}^m$, and

$$
A^{2h} = G_0\otimes K_0^{2h} + \sum_{l=1}^m G_l\otimes K_l^{2h}.
\tag{2.14}
$$

The matrices $\{G_l\}_{l=0}^m$ are the same for all grid levels.

The complete multigrid method works as follows. In each iteration, we apply one multigrid cycle (Vcycle) for the residual equation

$$
A\mathbf{c}^{(i)} = \mathbf{r}^{(i)} = \mathbf{f} - A\mathbf{u}^{(i)}
\tag{2.15}
$$

and update the solution $\mathbf{u}^{(i)}$ and residual $\mathbf{r}^{(i)}$. The Vcycle function is called recursively. On the coarsest grid level ($h = h_0$) we form the matrix $A$ and solve the linear system directly. The system is of order $O(N_\xi)$ since $A\in\mathbb{R}^{N_xN_\xi\times N_xN_\xi}$, where $N_x$ is a very small number on the coarsest grid. The smoothing function (Smooth) is based on a matrix splitting $A = Q - Z$ and stationary iteration

$$
\mathbf{u}_{s+1} = \mathbf{u}_s + Q^{-1}(\mathbf{f} - A\mathbf{u}_s),
\tag{2.16}
$$

which we assume is convergent, i.e., the spectral radius $\rho(I - Q^{-1}A) < 1$. The algorithm is run until the specified relative tolerance or maximum number of iterations is reached. It is shown in [4] that for $f\in L^2(D)$, the convergence rate of this algorithm is independent of the mesh size $h$, the number of random variables $m$, and the polynomial degree $p$.
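The structure of one cycle applied to the residual equation can be illustrated on a much simpler deterministic problem. The sketch below is a two-grid cycle (pre-smoothing, exact coarse solve, no post-smoothing) for a one-dimensional Poisson matrix. It is an illustration under several assumptions that are not taken from the paper: the 1D model matrix, the damped Jacobi smoother with weight $2/3$, linear interpolation as prolongation, and a Galerkin coarse operator $P^TAP$ in place of a rediscretized coarse matrix. All function names are hypothetical.

```python
import numpy as np

def poisson_1d(n):
    """Tridiagonal matrix tridiag(-1, 2, -1) on n interior nodes (mesh scaling omitted)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def prolongation_1d(n_coarse):
    """Linear interpolation from n_coarse to 2 * n_coarse + 1 interior nodes."""
    P = np.zeros((2 * n_coarse + 1, n_coarse))
    for j in range(n_coarse):
        P[2 * j, j] = 0.5       # left fine neighbour
        P[2 * j + 1, j] = 1.0   # fine node coinciding with coarse node j
        P[2 * j + 2, j] = 0.5   # right fine neighbour
    return P

def smooth(A, u, f, steps, omega=2.0 / 3.0):
    """Stationary iteration u <- u + Q^{-1}(f - A u) with Q = diag(A) / omega."""
    d = np.diag(A)
    for _ in range(steps):
        u = u + omega * (f - A @ u) / d
    return u

def two_grid_cycle(A, P, r, nu=2):
    """Approximate solution c of A c = r: nu pre-smoothing steps plus coarse correction."""
    c = smooth(A, np.zeros_like(r), r, nu)
    r_coarse = P.T @ (r - A @ c)          # restrict the remaining residual
    A_coarse = P.T @ A @ P                # Galerkin coarse operator (assumption)
    return c + P @ np.linalg.solve(A_coarse, r_coarse)

n_coarse = 15
P = prolongation_1d(n_coarse)
A = poisson_1d(2 * n_coarse + 1)
f = np.ones(A.shape[0])
u = np.zeros_like(f)
r = f.copy()
for _ in range(50):                       # maximum number of iterations
    u = u + two_grid_cycle(A, P, r)       # update the solution with the correction
    r = f - A @ u                         # update the residual
    if np.linalg.norm(r) <= 1e-8 * np.linalg.norm(f):
        break
```

In the stochastic setting the same loop is applied with $\mathscr{P} = I\otimes P$, so only the spatial grid is coarsened.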


## 3 Low-rank approximation

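In this section we consider a technique designed to reduce computational effort, in terms of both time and memory use, using low-rank methods. We begin with the observation that the solution vector of the Galerkin system Eq. 2.7

$$
\mathbf{u} = [u_1^T, u_2^T, \ldots, u_{N_\xi}^T]^T \in \mathbb{R}^{N_xN_\xi}, \qquad u_s = [u_{1s}, u_{2s}, \ldots, u_{N_xs}]^T,
$$

can be restructured as a matrix

$$
U = \mathrm{mat}(\mathbf{u}) = [u_1, u_2, \ldots, u_{N_\xi}] \in \mathbb{R}^{N_x\times N_\xi}.
\tag{3.1}
$$

Then Eq. 2.7 is equivalent to a system in matrix format,

$$
\mathcal{A}(U) = F,
\tag{3.2}
$$

where

$$
\begin{aligned}
\mathcal{A}(U) &= K_0UG_0^T + \sum_{l=1}^m K_lUG_l^T,\\
F &= \mathrm{mat}(\mathbf{f}) = f_0\,g_0^T.
\end{aligned}
\tag{3.3}
$$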

It has been shown in [3, 9] that the “matricized” version of the solution $U$ can be well approximated by a low-rank matrix when $N_xN_\xi$ is large. Evidence of this can be seen in Fig. 3.1, which shows the singular values of the exact solution $U$ for the benchmark problem discussed in Section 4. In particular, the singular values decay exponentially, and low-rank approximate solutions can be obtained by dropping terms from the singular value decomposition corresponding to small singular values.
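The equivalence between the Kronecker-form system Eq. 2.7 and the matricized system Eq. 3.2 rests on the identity $(G\otimes K)\,\mathrm{vec}(U) = \mathrm{vec}(KUG^T)$, with $\mathrm{vec}$ stacking columns. The following sketch checks this numerically on small random matrices; the sizes and the name `apply_matricized` are chosen only for illustration, and the random matrices merely stand in for the stiffness and stochastic matrices.

```python
import numpy as np

def apply_matricized(K_list, G_list, U):
    """Matrix-free operator: sum over l of K_l U G_l^T."""
    return sum(K @ U @ G.T for K, G in zip(K_list, G_list))

rng = np.random.default_rng(0)
n_x, n_xi, m = 6, 4, 2
K_list = [rng.standard_normal((n_x, n_x)) for _ in range(m + 1)]
G_list = [rng.standard_normal((n_xi, n_xi)) for _ in range(m + 1)]
U = rng.standard_normal((n_x, n_xi))

# Explicit Kronecker-sum matrix, formed here only for verification.
A = sum(np.kron(G, K) for G, K in zip(G_list, K_list))
lhs = A @ U.flatten(order="F")                          # column-stacked vec(U)
rhs = apply_matricized(K_list, G_list, U).flatten(order="F")
same = np.allclose(lhs, rhs)
```

In practice only `apply_matricized` would be used, since forming `A` requires storage of order $(N_xN_\xi)^2$ for dense matrices.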

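Now we use low-rank approximation in the multigrid solver for Eq. 3.2. Let $U^{(i)}$ be the $i$th iterate, expressed in matricized format (in the sequel, we use $\mathbf{u}^{(i)}$ and $U^{(i)}$ interchangeably to represent the equivalent vectorized or matricized quantities), and suppose $U^{(i)}$ is represented as the outer product of two rank-$k$ matrices, i.e., $U^{(i)} \approx V^{(i)}W^{(i)T}$, where $V^{(i)}\in\mathbb{R}^{N_x\times k}$, $W^{(i)}\in\mathbb{R}^{N_\xi\times k}$. This factored form is convenient for implementation and can be readily used in basic matrix operations. For instance, the sum of two matrices gives

$$
V_1^{(i)}W_1^{(i)T} + V_2^{(i)}W_2^{(i)T} = [V_1^{(i)},\, V_2^{(i)}]\,[W_1^{(i)},\, W_2^{(i)}]^T.
\tag{3.4}
$$

Similarly, $\mathcal{A}(V^{(i)}W^{(i)T})$ can also be written as an outer product of two matrices:

$$
\begin{aligned}
\mathcal{A}(V^{(i)}W^{(i)T}) &= \sum_{l=0}^m (K_lV^{(i)})(G_lW^{(i)})^T\\
&= [K_0V^{(i)},\, K_1V^{(i)},\, \ldots,\, K_mV^{(i)}]\,[G_0W^{(i)},\, G_1W^{(i)},\, \ldots,\, G_mW^{(i)}]^T.
\end{aligned}
\tag{3.5}
$$

If $V^{(i)}$ and $W^{(i)}$ are used to represent iterates in the multigrid solver and $k \ll \min(N_x, N_\xi)$, then both memory and computational (matrix-vector products) costs can be reduced, from $O(N_xN_\xi)$ to $O((N_x+N_\xi)k)$. Note, however, that the ranks of the iterates may grow due to matrix additions. For example, in Eq. 3.5 the rank may increase from $k$ to $(m+1)k$ in the worst case. A way to prevent this from happening, and also to keep costs low, is to truncate the iterates and force their ranks to remain low.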
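The two factored operations above, the sum of two low-rank matrices and the application of the matricized operator to a low-rank matrix, amount to concatenating factor blocks. A minimal sketch, with hypothetical helper names `lr_add` and `lr_apply` and random matrices standing in for the actual stiffness and stochastic matrices:

```python
import numpy as np

def lr_add(V1, W1, V2, W2):
    """Factors of V1 W1^T + V2 W2^T; the number of columns is the sum of both ranks."""
    return np.hstack([V1, V2]), np.hstack([W1, W2])

def lr_apply(K_list, G_list, V, W):
    """Factors of sum_l K_l (V W^T) G_l^T; the number of columns grows by len(K_list)."""
    return (np.hstack([K @ V for K in K_list]),
            np.hstack([G @ W for G in G_list]))

rng = np.random.default_rng(0)
n_x, n_xi, k, m = 30, 10, 2, 3
V1, V2 = rng.standard_normal((n_x, k)), rng.standard_normal((n_x, k))
W1, W2 = rng.standard_normal((n_xi, k)), rng.standard_normal((n_xi, k))
K_list = [rng.standard_normal((n_x, n_x)) for _ in range(m + 1)]
G_list = [rng.standard_normal((n_xi, n_xi)) for _ in range(m + 1)]

Vs, Ws = lr_add(V1, W1, V2, W2)             # 2k columns
AV, AW = lr_apply(K_list, G_list, V1, W1)   # (m + 1) k columns
```

The growth in the number of columns after each operation is what the truncation step is meant to undo.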

### 3.1 Low-rank truncation

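Our truncation strategy is derived using an idea from [9]. Assume $\tilde{X} = \tilde{V}\tilde{W}^T$, $\tilde{V}\in\mathbb{R}^{N_x\times\tilde{k}}$, $\tilde{W}\in\mathbb{R}^{N_\xi\times\tilde{k}}$, and $\tilde{X}$ is truncated to rank $k$ with $X = VW^T$, $V\in\mathbb{R}^{N_x\times k}$, $W\in\mathbb{R}^{N_\xi\times k}$, and $k < \tilde{k}$. First, compute the QR factorization for both $\tilde{V}$ and $\tilde{W}$,

$$
\tilde{V} = Q_{\tilde{V}}R_{\tilde{V}}, \qquad \tilde{W} = Q_{\tilde{W}}R_{\tilde{W}}, \qquad \text{so that } \tilde{X} = Q_{\tilde{V}}R_{\tilde{V}}R_{\tilde{W}}^TQ_{\tilde{W}}^T.
\tag{3.6}
$$

The matrices $R_{\tilde{V}}$ and $R_{\tilde{W}}$ are of size $\tilde{k}\times\tilde{k}$. Next, compute a singular value decomposition (SVD) of the small matrix $R_{\tilde{V}}R_{\tilde{W}}^T$:

$$
R_{\tilde{V}}R_{\tilde{W}}^T = \hat{V}\,\mathrm{diag}(\sigma_1,\ldots,\sigma_{\tilde{k}})\,\hat{W}^T,
\tag{3.7}
$$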

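where $\sigma_1\ge\cdots\ge\sigma_{\tilde{k}}$ are the singular values in descending order. We can truncate to a rank-$k$ matrix where $k$ is specified using either a relative criterion for singular values,

$$
\sqrt{\sigma_{k+1}^2+\cdots+\sigma_{\tilde{k}}^2} \le \epsilon_{\mathrm{rel}}\sqrt{\sigma_1^2+\cdots+\sigma_{\tilde{k}}^2},
\tag{3.8}
$$

or an absolute one,

$$
k = \max\{k \mid \sigma_k \ge \epsilon_{\mathrm{abs}}\}.
\tag{3.9}
$$

Then the truncated matrices can be written in MATLAB notation as

$$
V = Q_{\tilde{V}}\hat{V}(:,1\!:\!k), \qquad W = Q_{\tilde{W}}\hat{W}(:,1\!:\!k)\,\mathrm{diag}(\sigma_1,\ldots,\sigma_k).
\tag{3.10}
$$

Note that the low-rank matrices obtained from Eq. 3.8 and Eq. 3.9 satisfy

$$
\|X - \tilde{X}\|_F \le \epsilon_{\mathrm{rel}}\|\tilde{X}\|_F
\tag{3.11}
$$

and

$$
\|X - \tilde{X}\|_F \le \epsilon_{\mathrm{abs}}\sqrt{\tilde{k}-k},
\tag{3.12}
$$

respectively. The right-hand side of Eq. 3.12 is bounded by $\sqrt{N_\xi}\,\epsilon_{\mathrm{abs}}$ since in general $N_\xi < N_x$. The total cost of this computation is $O((N_x+N_\xi+\tilde{k})\tilde{k}^2)$. In the case where $\tilde{k}$ becomes larger than $N_\xi$, we compute instead a direct SVD for $\tilde{X}$, which requires a matrix-matrix product to compute $\tilde{X}$ and an SVD, with smaller total cost $O(N_xN_\xi\tilde{k}+N_xN_\xi^2)$.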
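The QR-then-SVD truncation described in this subsection can be sketched in a few lines of NumPy. The precise form of the two criteria is an assumption made for this sketch: the relative variant keeps the smallest rank whose discarded singular values have Frobenius norm at most `tol` times that of the full matrix, and the absolute variant drops every singular value below `tol`. The function name `truncate` is hypothetical, and the sketch assumes the incoming rank does not exceed either matrix dimension.

```python
import numpy as np

def truncate(V, W, tol, relative=True):
    """Truncate X = V W^T to lower rank without forming X; returns new factors."""
    Qv, Rv = np.linalg.qr(V)                  # reduced QR, Rv is square
    Qw, Rw = np.linalg.qr(W)
    Vh, s, Wh_t = np.linalg.svd(Rv @ Rw.T)    # SVD of the small core matrix
    if relative:
        # tail[j] = sqrt(s[j]^2 + s[j+1]^2 + ...), the error of a rank-j approximation
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]
        k = int(np.sum(tail > tol * tail[0]))
    else:
        k = int(np.sum(s >= tol))
    # The singular values are absorbed into the second factor.
    return Qv @ Vh[:, :k], Qw @ (Wh_t[:k, :].T * s[:k])

rng = np.random.default_rng(0)
V = rng.standard_normal((50, 8))
W = rng.standard_normal((20, 8)) * 10.0 ** (-np.arange(8))  # decaying column scales
X = V @ W.T
Vr, Wr = truncate(V, W, tol=1e-4, relative=True)
Va, Wa = truncate(V, W, tol=1e-3, relative=False)
err_rel = np.linalg.norm(X - Vr @ Wr.T)       # Frobenius norm by default
err_abs = np.linalg.norm(X - Va @ Wa.T)
```

Only the two tall factors and a small square core are ever factorized, which is where the cost advantage over a direct SVD of the full matrix comes from when the incoming rank is small.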

### 3.2 Low-rank multigrid

The multigrid solver with low-rank truncation uses truncation operators $\mathcal{T}_{\mathrm{rel}}$ and $\mathcal{T}_{\mathrm{abs}}$, which are defined using a relative and an absolute criterion, respectively. In each iteration, one multigrid cycle (Vcycle) is applied to the residual equation. Since the overall magnitudes of the singular values of the correction matrix $C^{(i)}$ decrease as $U^{(i)}$ converges to the exact solution (see Fig. 3.2 for example), it is suitable to use a relative truncation tolerance inside the Vcycle function. In the smoothing function (Smooth), the iterate is truncated after each smoothing step using a relative criterion

(3.13) |

where $A$, $\mathbf{u}_0$, and $\mathbf{f}$ are arguments of the Vcycle function, and $\mathbf{r}_0 = \mathbf{f} - A\mathbf{u}_0$ is the residual at the beginning of each V-cycle. Inside the Vcycle function, the residual is truncated via a more stringent relative criterion

(3.14) |

where $h$ is the mesh size. In the main while loop, an absolute truncation criterion Eq. 3.9 with tolerance $\epsilon_{\mathrm{abs}}$ is used and all the singular values of $U^{(i)}$ below $\epsilon_{\mathrm{abs}}$ are dropped. The algorithm is terminated either when the largest singular value of the residual matrix $R^{(i)}$ is smaller than $\epsilon_{\mathrm{abs}}$ or when the multigrid solution reaches the specified accuracy.


Note that the post-smoothing is not explicitly required in either the low-rank multigrid algorithm or the multigrid algorithm without truncation, and we include it just for the sake of completeness. Also, in the low-rank multigrid algorithm, if the smoothing operator has the form $\mathscr{S} = S_1\otimes S_2$, then for any matrix with a low-rank factorization $X = VW^T$, application of the smoothing operator gives

$$
\mathscr{S}(X) = S_2\,VW^T S_1^T = (S_2V)(S_1W)^T,
\tag{3.15}
$$

so that the result is again the outer product of two matrices of the same low rank. The prolongation and restriction operators Eq. 2.13 are implemented in a similar manner. Thus, the smoothing and grid-transfer operators do not affect the ranks of matricized quantities in the low-rank multigrid algorithm.
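The rank-preserving property of Kronecker-structured operators can be checked directly. The sketch below assumes the convention $(S_1\otimes S_2)\,\mathrm{vec}(X) = \mathrm{vec}(S_2XS_1^T)$ with column stacking; the helper name `apply_kron_factored` and the random test matrices are illustrative only. The same helper covers a prolongation of the form $I\otimes P$ by passing an identity as the first factor.

```python
import numpy as np

def apply_kron_factored(S1, S2, V, W):
    """Factors of (S1 kron S2) applied to X = V W^T, i.e. (S2 V)(S1 W)^T."""
    return S2 @ V, S1 @ W

rng = np.random.default_rng(1)
n_x, n_xi, k = 7, 5, 2
V = rng.standard_normal((n_x, k))
W = rng.standard_normal((n_xi, k))
S1 = rng.standard_normal((n_xi, n_xi))
S2 = rng.standard_normal((n_x, n_x))

SV, SW = apply_kron_factored(S1, S2, V, W)

# Dense reference, formed only to verify the factored result.
vec_x = (V @ W.T).flatten(order="F")
dense = (np.kron(S1, S2) @ vec_x).reshape((n_x, n_xi), order="F")
match = np.allclose(SV @ SW.T, dense)

# Grid transfer acting on the spatial factor only (P is a stand-in matrix).
P = rng.standard_normal((2 * n_x + 1, n_x))
PV, PW = apply_kron_factored(np.eye(n_xi), P, V, W)
```

Both results keep exactly `k` columns, so no truncation is needed after these steps.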

### 3.3 Convergence analysis

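In order to show that the low-rank multigrid algorithm is convergent, we need to know how truncation affects the contraction of error. Consider the case of a two-grid algorithm for the linear system $A\mathbf{u} = \mathbf{f}$, where the coarse-grid solve is exact and no post-smoothing is done. Let $\bar{A}$ be the coefficient matrix on the coarse grid, let $\mathbf{e}^{(i)} = \mathbf{u} - \mathbf{u}^{(i)}$ be the error associated with $\mathbf{u}^{(i)}$, and let $\mathbf{r}^{(i)} = \mathbf{f} - A\mathbf{u}^{(i)}$ be the residual. It is shown in [4] that if no truncation is done, the error after a two-grid cycle becomes

$$
\mathbf{e}^{(i+1)} = (A^{-1} - \mathscr{P}\bar{A}^{-1}\mathscr{R})\,A\,(I - Q^{-1}A)^{\nu}\,\mathbf{e}^{(i)}
\tag{3.16}
$$

and

$$
\|\mathbf{e}^{(i+1)}\|_A \le C\,\eta(\nu)\,\|\mathbf{e}^{(i)}\|_A,
\tag{3.17}
$$

where $\nu$ is the number of pre-smoothing steps, $C$ is a constant, and $\eta(\nu)\to 0$ as $\nu\to\infty$. The proof consists of establishing the smoothing property

$$
\|A(I - Q^{-1}A)^{\nu}\mathbf{y}\|_2 \le \eta(\nu)\,\|\mathbf{y}\|_A \quad \text{for all } \mathbf{y}\in\mathbb{R}^{N_xN_\xi},
\tag{3.18}
$$

and the approximation property

$$
\|(A^{-1} - \mathscr{P}\bar{A}^{-1}\mathscr{R})\mathbf{y}\|_A \le C\,\|\mathbf{y}\|_2 \quad \text{for all } \mathbf{y}\in\mathbb{R}^{N_xN_\xi},
\tag{3.19}
$$

and applying these bounds to Eq. 3.16.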

Now we derive an error bound for the low-rank multigrid algorithm. The result is presented in two steps. First, we consider the Vcycle function only; the following lemma shows the effect of the relative truncations defined in Eqs. 3.13 and 3.14.

###### Lemma 1

Let and let be the associated error. Then

(3.20) |

where, for small enough and large enough , independent of the mesh size .

###### Proof

For , let be the quantity computed after application of the smoothing operator at step before truncation, and let be the modification obtained from truncation by of Eq. 3.13. For example,

(3.21) |

Denote the associated error as . From Eq. 3.13, we have

(3.22) |

Similarly, after smoothing steps,

(3.23) | ||||

where

(3.24) |

Inside the Vcycle function of the low-rank multigrid algorithm, the residual is truncated via Eq. 3.14, so that

(3.25) |

Let . Referring to Eqs. 3.23 and 3.16, we can write the error associated with as

(3.26) | ||||

Applying the approximation property Eq. 3.19 gives

(3.27) |

Using the fact that for any matrix ,

(3.28) |

we get

(3.29) | ||||

where is the spectral radius. We have used the fact that is a symmetric matrix (assuming is symmetric). Define . Then Eqs. 3.25 and 3.24 imply that

(3.30) | ||||

On the other hand,

(3.31) | ||||

Combining Eqs. 3.31, 3.30, 3.27, 3.26 and 3.17, we conclude that

(3.32) |

where

(3.33) |

Note that , is bounded by a constant, and is of order [14]. Thus, for small enough and large enough , is bounded below 1 independent of .

Next, we adjust this argument by considering the effect of the absolute truncations in the main while loop. In the low-rank multigrid algorithm, the Vcycle is used for the residual equation, and the updated solution and residual are each truncated using an absolute truncation criterion as in Eq. 3.9. Thus, at the $i$th iteration ($i > 0$), the residual passed to the Vcycle function is in fact a perturbed residual, i.e.,

(3.34) |

It follows that in the first smoothing step,

(3.35) |

and this introduces an extra term in (see Eq. 3.23),

(3.36) |

As in the derivation of Eq. 3.29, we have

(3.37) |

In the case of a damped Jacobi smoother (see Eq. 4.4), is bounded by a constant. Denote . Also note that . Then Eqs. 3.31 and 3.30 are modified to

(3.38) | ||||

and

(3.39) |

As we truncate the updated solution , we have

(3.40) |

Let

(3.41) |

From Eqs. 3.41, 3.40, 3.39 and 3.38, we conclude with the following theorem:

###### Theorem 3.1

Let $\mathbf{e}^{(i)} = \mathbf{u} - \mathbf{u}^{(i)}$ denote the error at the $i$th iteration of the low-rank multigrid algorithm. Then

(3.42) |

where for large enough and small enough , and is bounded by a constant. Also, Eq. 3.42 implies that

(3.43) |

i.e., the $A$-norm of the error for the low-rank multigrid solution at the $i$th iteration is bounded by . Thus, the low-rank multigrid algorithm converges until the $A$-norm of the error becomes as small as .
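The mechanism behind this statement, a contraction perturbed by a fixed truncation error at every step, can be illustrated with a scalar recursion. The sketch assumes, for illustration only, that the one-step bound has the generic form $a_{i+1} \le \rho\,a_i + \delta$ with $0 \le \rho < 1$; the names `rho`, `delta`, and `e0` are hypothetical stand-ins and are not the constants of Theorem 3.1. Unrolling the recursion gives $a_i \le \rho^i a_0 + \frac{1-\rho^i}{1-\rho}\,\delta$, so the bound decays geometrically until it stagnates near $\delta/(1-\rho)$.

```python
def iterate_bound(rho, delta, e0, n_iter):
    """Apply a_{i+1} = rho * a_i + delta repeatedly, starting from a_0 = e0."""
    bounds = [e0]
    for _ in range(n_iter):
        bounds.append(rho * bounds[-1] + delta)
    return bounds

def closed_form(rho, delta, e0, i):
    """Unrolled recursion: rho^i * e0 + (1 - rho^i) / (1 - rho) * delta."""
    return rho**i * e0 + (1.0 - rho**i) / (1.0 - rho) * delta

rho, delta, e0 = 0.2, 1e-6, 1.0
bounds = iterate_bound(rho, delta, e0, 30)
floor = delta / (1.0 - rho)   # level at which the bound stagnates
```

Reducing the truncation tolerance lowers the stagnation level but generally increases the ranks that must be carried.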

It can be shown that the same result holds if post-smoothing is used. Also, the convergence of full (recursive) multigrid with these truncation operations can be established following an inductive argument analogous to that in the deterministic case (see, e.g., [5, 8]). In addition, in the low-rank multigrid algorithm, the truncation of the residual imposes a stopping criterion, i.e.,

(3.44) | ||||

In Section 4 we will vary the value of $\epsilon_{\mathrm{abs}}$ and see how the low-rank multigrid solver works compared with the multigrid algorithm in which no truncation is done.

###### Remark 1

It is shown in [14] that for Eq. 2.7, with constant mean and standard deviation ,

(3.45) |

where is the maximal root of an orthogonal polynomial of degree , and is a constant independent of , , and . If Legendre polynomials on the interval are used, . Since both and in Theorem 3.1 are related to , the convergence rate of the low-rank multigrid algorithm will depend on . However, if the eigenvalues decay fast, this dependence is negligible.

###### Remark 2

If instead a relative truncation is used in the while loop so that

(3.46) |

then a similar convergence result can be derived, and the algorithm stops when

(3.47) |

However, the relative truncation in general results in a larger rank for , and the improvement in efficiency will be less significant.