
# Reduction-based Algebraic Multigrid for Upwind Discretizations

*This work was performed under the auspices of the U.S. Department of Defense through the National Defense Science & Engineering Graduate Fellowship Program, the U.S. Department of Energy under grant numbers (SC) DE-FC02-03ER25574 and (NNSA) DE-NA0002376, and Lawrence Livermore National Laboratory under contract B614452.*

Thomas A. Manteuffel (Department of Applied Mathematics, University of Colorado at Boulder), Stephen McCormick (Department of Applied Mathematics, University of Colorado at Boulder), Steffen Münzenmaier (Fakultät für Mathematik, Universität Duisburg-Essen), John Ruge (Department of Applied Mathematics, University of Colorado at Boulder), Ben S. Southworth (Department of Applied Mathematics, University of Colorado at Boulder)
###### Abstract

Algebraic multigrid (AMG) is a go-to solver for symmetric positive definite linear systems resulting from the discretization of elliptic PDEs, or the spatial discretization of parabolic PDEs. For diffusion-like problems, the time to solution typically scales linearly with the number of unknowns. However, convergence theory and most variations of AMG rely on the matrix being symmetric positive definite. Hyperbolic PDEs, which arise often in large-scale scientific simulations, remain a challenge for AMG, as well as for other fast linear solvers, in part because the resulting linear systems are often highly nonsymmetric. Here, a new reduction-based AMG method is developed for upwind discretizations of hyperbolic PDEs and other matrices with triangular structure, based on the concept of "ideal restriction" (AMGir). Theory is presented motivating the reduction-based approach and connecting it in a big-picture sense to other nonsymmetric AMG methods. The model problems considered are variations of a steady-state transport equation, with source terms, discontinuities, and non-constant flow. AMGir is shown to be an effective and scalable solver for various discontinuous, upwind discretizations, with unstructured meshes, and up to 6th-order finite elements, expanding the applicability of AMG to a new class of problems. AMGir is implemented in PyAMG [8] and available at https://github.com/ben-s-southworth/pyamg/tree/air.


## 1 Introduction

Algebraic multigrid (AMG) is among the fastest classes of algorithms for solving linear systems that result from the discretization of partial differential equations (PDEs) of elliptic type [10]. When applicable, AMG converges in linear complexity with the number of degrees-of-freedom (DOFs) and scales well in parallel, up to hundreds of thousands of processors [7]. Originally, AMG was designed for symmetric positive definite (SPD) linear systems, and it performs best when applied to the discretization of elliptic PDEs, or the spatial discretization of a parabolic PDE. Although this constitutes a large class of problems that arise in scientific simulations, many problems of interest remain that are difficult for AMG, or any other "fast" solver. Because of this, solutions to such problems are generally obtained through Krylov methods or direct solves, each of which is more computationally expensive than desired. Efforts have been made to develop more robust AMG solvers applicable to difficult SPD systems such as anisotropic diffusion [33, 51], and to nonsymmetric matrix equations [13, 33, 38, 50, 56], among others. Despite improvements in the state of the art, one class of problems that arises in scientific computation, but lacks robust algebraic solvers, is upwind discretizations of hyperbolic PDEs.

In contrast to elliptic and parabolic PDEs, the solution of hyperbolic PDEs lies on characteristic curves, and the solution at any point depends only on the solution upwind along its given characteristic. This allows for very steep gradients or “fronts” to propagate through the domain in the direction of characteristics. Typical continuous finite-element or finite-difference discretizations often struggle to capture such behavior because fronts are effectively discontinuities in the discrete setting, and not necessarily grid-aligned. It is worth noting that recent work introducing a functional scaling for a continuous least-squares finite-element discretization suggests that continuous discretizations can effectively capture fronts in the steady-state transport equation [32]. Nevertheless, due to the discontinuous-like behavior and the flow of information in a single direction along characteristics, discontinuous upwind discretizations are a natural approach to discretizing many hyperbolic PDEs [14, 35, 36, 45]. For a fully upwinded discretization, the resulting matrix has a block-triangular structure in some (although potentially unknown) ordering. Although solving a block-triangular matrix is an easy task in the serial setting, current approaches in parallel are limited by scalability.

Solving triangular systems in a traditional context using a forward or backward solve is a strictly sequential operation and, thus, does not scale well in parallel. In the dense matrix setting, this offers limited opportunity for parallelism. However, for sparse matrices, scheduling algorithms have been developed that can add some level of parallelism to this process [3, 5, 28, 29]. In a sparse triangular matrix, there exists at least one DOF that is independent of all other DOFs and can be eliminated from the system. Then, by the nature of sparsity, we expect multiple DOFs to depend only on this first DOF, and these can be computed in parallel; in a discretization sense, they are the DOFs immediately downwind. Once new DOFs have been computed, DOFs downwind of them can be computed, and so on until the solution is obtained. This algorithm requires only $\mathcal{O}(nnz)$ floating-point operations for a sparse matrix with $nnz$ nonzero entries, but the available parallelism depends on the structure of the matrix. Much of the research on increased parallel performance for such problems has focused on shared-memory environments such as GPUs [28, 29].
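The level-scheduling idea described above can be sketched in a few lines. The following Python/SciPy sketch is purely illustrative (the function names are our own, not from any particular library): it groups the rows of a sparse lower-triangular matrix into levels of mutually independent rows, then solves level by level.

```python
import numpy as np
from scipy.sparse import csr_matrix

def level_schedule(L):
    """Group rows of a sparse lower-triangular matrix into 'levels':
    rows in the same level depend only on rows in earlier levels,
    so all rows within a level could be solved in parallel."""
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    for i in range(n):
        deps = [level[j] for j in L.indices[L.indptr[i]:L.indptr[i + 1]] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    return [np.flatnonzero(level == k) for k in range(level.max() + 1)]

def scheduled_solve(L, b):
    """Forward solve Lx = b one level at a time; the loop over `rows`
    is the parallelizable batch of independent unknowns."""
    x = np.zeros(len(b))
    for rows in level_schedule(L):
        for i in rows:  # independent rows: parallel in a real implementation
            s, d = b[i], 1.0
            for idx in range(L.indptr[i], L.indptr[i + 1]):
                j = L.indices[idx]
                if j == i:
                    d = L.data[idx]
                else:
                    s -= L.data[idx] * x[j]
            x[i] = s / d
    return x

L = csr_matrix(np.array([[2., 0, 0, 0], [1, 3, 0, 0], [0, 1, 4, 0], [1, 0, 2, 5]]))
b = np.array([2., 5., 9., 13.])
x = scheduled_solve(L, b)
```

The number of levels, rather than the matrix dimension, bounds the sequential depth; for a discretized hyperbolic problem the levels correspond to "wavefronts" sweeping downwind.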

One problem and discretization that requires many parallel sparse triangular solves is the so-called “diffusion synthetic acceleration” (DSA) approach to solving the linear Boltzmann transport equation (so-called “synthetic acceleration” is simply preconditioning) [1, 16, 26, 45]. The full linear problem is seven dimensional, necessitating highly parallel solvers for even moderately refined grids, and is fundamental to studying neutron and radiation transport. DSA is effectively a two-level multigrid algorithm. A “transport sweep” acts as a surrogate for relaxation, and consists of discretizing in angle and solving a steady-state transport equation over all angles in the spatial domain. An accurate transport sweep puts error in the range of a surrogate “coarse-grid” diffusion solve, and the process iterates back and forth [1]. AMG is a fast and scalable solver for the diffusion solve, but the transport sweep consists of solving many discretizations of a hyperbolic PDE, often based on an upwind discretization [14, 35, 36, 45], and is difficult for most iterative solvers [52].

Similar scheduling algorithms to those used for sparse triangular solves on GPUs have been developed in distributed-memory environments to perform "transport sweeps" [2, 6, 16, 26]. On a structured mesh, these algorithms were shown to attain an optimal theoretical scaling over all angles as a function of the number of processors and the problem size [2]. Although this is reasonable scaling in parallel, it is suboptimal on the so-called "road to exascale," as a steadily increasing number of processors become available. On irregular meshes, scheduling for the transport sweep becomes more difficult and this theoretical bound cannot be achieved. The sweeping algorithm has been shown to be a bottleneck in large simulation codes [44]. On the other hand, AMG scales logarithmically with the number of processors in parallel [7] and linearly with respect to DOFs, even on irregular meshes. An effective AMG algorithm for triangular systems could thus prove faster than known scheduling algorithms for transport sweeps in a highly parallel setting; theoretical estimates for representative processor counts suggest on the order of a 15-fold improvement in parallel efficiency. Of course, the leading constants may change this balance in either direction, especially on irregular meshes, but this suggests there is a reasonable processor count beyond which AMG will outperform a traditional transport sweep.

Traditional AMG is based on minimization principles in the $A$-norm or energy norm, $\|x\|_A := \langle Ax, x\rangle^{1/2}$. If $A$ is not SPD, such a norm is not well-defined, and the bulk of the theory motivating AMG (for example, [10, 22, 55]) no longer holds. Nevertheless, AMG has still been applied to nonsymmetric linear systems. For some nonsymmetric problems, letting restriction be the transpose of interpolation in a traditional AMG sense remains a robust solver [48]. However, at times this can produce coarse-grid matrices that are not amenable to recursive application of the method; for example, the diagonals could change sign or even be zero. Such situations were the original motivation for this work. There have also been efforts to develop AMG methods and underlying theory targeting general nonsymmetric matrices [13, 33, 39, 50, 56]. Nevertheless, determining when each of these approaches is appropriate requires experience and experimentation, which can limit their use and reliability outside of the AMG community.

In this work, we develop a novel, reduction-based AMG method to solve systems with triangular structure, with a particular focus on upwind discretizations of hyperbolic PDEs. Although geometric multigrid has been applied to upwind discretizations using the well-known line-smoother approach [43] and to other hyperbolic-type problems [4, 57], an effective AMG algorithm needs no geometric information, structured grid, or fixed/known direction over which to relax. For these reasons, an effective AMG solver is much more robust and non-intrusive, in the sense that it can easily be used by software packages, regardless of their meshing, discretization, etc. The model problem considered here is the steady-state transport equation, which comes up in large-scale simulation of neutron and radiation transport [2, 6, 16, 26, 44]. Although this work is focused on algorithm theory and development, the long-term goal is a highly scalable parallel solver for upwind discretizations and other systems with triangular structure.

Results of our new algorithm significantly outperform current state-of-the-art iterative solvers in the context of steady-state transport [52]. Even for high-order, discontinuous discretizations on unstructured grids, our method is able to achieve an order-of-magnitude residual reduction in 1–2 iterations. The method likely extends to other systems with triangular structure, such as directed acyclic graphs, other hyperbolic PDEs, or certain full space-time discretizations of hyperbolic PDEs, but this is a topic for future study.

Background on AMG, along with theoretical motivation for the new method and its relation to existing algorithms, is discussed in Section 2. The underlying premise is that certain “ideal” operators can lead to a reduction-based AMG method; in particular, we develop an AMG algorithm based on approximating ideal restriction (AMGir). Section 3 introduces a natural way to approximate ideal operators for matrices with triangular structure, as well as theoretical results on the corresponding multigrid error propagation. The algorithm and possible variations are presented in Section 4, and numerical results and scaling studies in Section 5. Finally, conclusions and future directions of research will be discussed in Section 6.

## 2 Background and motivation

### 2.1 Algebraic multigrid

Algebraic multigrid applied to the linear system $Ax = b$, with $A \in \mathbb{R}^{n \times n}$, consists of two processes: relaxation and coarse-grid correction, typically designed to be complementary in the sense that they reduce error associated with different parts of the spectrum. Relaxation takes the form

$$x_{k+1} = x_k + M^{-1}(b - Ax_k), \tag{1}$$

where $M$ is some approximation to $A$ such that the action of $M^{-1}$ can easily be computed. For example, $M$ may be the diagonal of $A$ (Jacobi) or the upper or lower triangular part of $A$ (Gauss-Seidel). Error associated with large eigenvalues of $A$, generally corresponding to geometrically high-frequency error that cannot be well-represented on a coarse grid, is targeted through relaxation. Coarse-grid correction is then built to reduce error not reduced by relaxation, typically algebraically smooth error, that is, modes associated with small eigenvalues. Coarse-grid correction takes the form

$$x_{k+1} = x_k + P(RAP)^{-1}R(b - Ax_k), \tag{2}$$

where $P$ and $R$ are interpolation and restriction operators, respectively, transferring between the current grid and the next coarser grid in the AMG hierarchy.

A two-level cycle is then given by combining the coarse-grid correction in (2) with pre- and post-relaxation steps as in (1), resulting in a two-grid error-propagation operator of the form

$$E_{TG} = (I - M^{-1}A)\left(I - P(RAP)^{-1}RA\right)(I - M^{-T}A). \tag{3}$$

The goal in building an AMG solver is to bound $\|E_{TG}\|$ in some norm, such that AMG iterations are guaranteed to converge quickly. For SPD problems, $\|x\|_A := \langle Ax, x\rangle^{1/2}$ (with $\langle \cdot, \cdot \rangle$ denoting the Euclidean inner product) defines a norm, referred to as the $A$-norm or energy norm. Typically, restriction is then defined as $R = P^T$, wherein the coarse-grid correction is exactly an $A$-orthogonal projection onto $\operatorname{Range}(P)$, the range of interpolation. For the nonsymmetric systems considered here, $\langle Ax, x\rangle$ no longer defines a valid norm. Various efforts have been made to develop a framework for convergence of nonsymmetric problems [13, 39], but choosing $R$ such that coarse-grid correction is an orthogonal projection (in some inner product) is often not feasible in practice. Thus, a restriction operator must instead be chosen such that the oblique projection, $P(RAP)^{-1}RA$, has desirable properties for AMG convergence. Here, the concept of "ideal restriction" is theoretically motivated in Section 2.2, followed by a practical approximation in Section 3 and an algorithm outline in Section 4.
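As a concrete, purely illustrative instance of (3), the following numpy sketch assembles $E_{TG}$ for the 1D Poisson matrix with weighted Jacobi relaxation, linear interpolation, and $R = P^T$. The grid size and relaxation weight are assumptions made for this example only, not choices made in this paper.

```python
import numpy as np

n = 15
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D Poisson model problem
M = np.diag(np.diag(A)) / (2.0 / 3.0)                 # weighted Jacobi, omega = 2/3

# Linear interpolation with C-points at odd indices 1, 3, ..., 13
P = np.zeros((n, n // 2))
for c in range(n // 2):
    f = 2 * c + 1
    P[f, c] = 1.0
    P[f - 1, c] += 0.5
    P[f + 1, c] += 0.5
R = P.T                                               # restriction = P^T (SPD case)

I = np.eye(n)
CGC = I - P @ np.linalg.solve(R @ A @ P, R @ A)       # coarse-grid correction
E_TG = (I - np.linalg.inv(M) @ A) @ CGC @ (I - np.linalg.inv(M).T @ A)
rho = max(abs(np.linalg.eigvals(E_TG)))               # two-grid convergence factor
```

For this SPD model problem, `rho` is well below 1, reflecting the complementarity of relaxation and coarse-grid correction described above.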

The computational cost or complexity of an AMG algorithm is typically measured in work units (WU), where one WU is the cost to perform a sparse matrix-vector multiply on the finest level in the AMG hierarchy. This motivates two additional complexity measures in AMG, operator complexity (OC) and cycle complexity (CC). Operator complexity gives the cost in WUs to perform a sparse matrix-vector multiply on each level in the hierarchy, while CC gives the cost in WUs to perform one AMG iteration, including pre- and post-relaxation, computing and restricting the residual, and coarse-grid correction. Let $A_\ell$, $P_\ell$, and $R_\ell$ be operators on the $\ell$th level of the AMG hierarchy, and let $|\cdot|$ denote the number of nonzeros in a matrix. Then

$$\text{OC} = \frac{\sum_\ell |A_\ell|}{|A_0|}, \qquad \text{CC} = \frac{\sum_\ell \left(\nu_{\text{pre}} + \nu_{\text{post}} + 1\right)|A_\ell| + |P_\ell| + |R_\ell|}{|A_0|},$$

where $\nu_{\text{pre}}$ and $\nu_{\text{post}}$ are the number of pre- and post-relaxations, respectively, and $\ell = 0$ is the finest level in the hierarchy. Such measures lead to two additional objective measures of AMG performance: the effective convergence factor (ECF), $\rho_{\text{eff}}$, which measures the residual reduction factor for the equivalent of one WU, and the work-per-digit-of-accuracy, $\chi_{\text{wpd}}$, which measures the WUs necessary to achieve an order-of-magnitude reduction in the residual. For convergence factor $\rho$ and cycle complexity CC,

$$\rho_{\text{eff}} := \rho^{1/\text{CC}}, \qquad \chi_{\text{wpd}} := \frac{-\text{CC}}{\log_{10}(\rho)} = \frac{-1}{\log_{10}(\rho_{\text{eff}})}.$$

We use these objective measures to motivate parts of the AMGir algorithm, and to consider the performance and scaling of AMGir in Section 5. Although these are good measures of serial performance, they do not reflect parallel efficiency of the algorithm.
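These definitions translate directly into code. Below is a small, self-contained Python helper (the function names are ours, for illustration) that computes OC, CC, $\rho_{\text{eff}}$, and $\chi_{\text{wpd}}$ from per-level nonzero counts:

```python
import math

def complexities(nnz_A, nnz_P, nnz_R, nu_pre=1, nu_post=1):
    """OC and CC from nonzero counts per level; level 0 is the finest.
    nnz_P/nnz_R have one fewer entry than nnz_A (no transfers on coarsest)."""
    OC = sum(nnz_A) / nnz_A[0]
    CC = (sum((nu_pre + nu_post + 1) * a for a in nnz_A)
          + sum(nnz_P) + sum(nnz_R)) / nnz_A[0]
    return OC, CC

def effective_measures(rho, CC):
    """Effective convergence factor and work-per-digit-of-accuracy."""
    rho_eff = rho ** (1.0 / CC)
    chi_wpd = -CC / math.log10(rho)
    return rho_eff, chi_wpd

# Hypothetical 3-level hierarchy: nnz counts per level for A, P, R
OC, CC = complexities([100, 40, 10], [30, 8], [30, 8])
rho_eff, chi = effective_measures(0.1, CC)
```

Note that at $\rho = 0.1$, one iteration gains exactly one digit of accuracy, so $\chi_{\text{wpd}}$ equals the cycle complexity.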

### 2.2 Ideal restriction

In a classical AMG sense, suppose that a CF-splitting has been performed on the current grid, that is, all degrees-of-freedom (DOFs) have been designated as either a coarse-point (C-point) or a fine-point (F-point). C-points each correspond to a DOF on the next coarser grid in the AMG hierarchy, and are interpolated and restricted between grids by injection (i.e., by value) [10]. Let $n_f$ denote the number of F-points and $n_c$ the number of C-points, and, for example, let $A_{ff}$ denote the block of F-to-F connections in $A$. Then, operators $A$, $P$, and $R$ can be written in the following block forms:

$$A = \begin{pmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{pmatrix}, \qquad P = \begin{pmatrix} W \\ I \end{pmatrix}, \qquad R = \begin{pmatrix} Z & I \end{pmatrix}, \tag{4}$$

where $W$ interpolates to F-points on the fine grid via linear combinations of coarse-grid DOFs, and $Z$ restricts F-point residuals. Note that (4) implicitly assumes the same CF-splitting for $P$ and $R$, although the sparsity patterns for nonzero elements of $W$ and $Z$ may be different. For notation, define the coarse-grid operator and the oblique projection onto $\operatorname{Range}(P)$, respectively, as

$$K := RAP = ZA_{ff}W + A_{cf}W + ZA_{fc} + A_{cc}, \tag{5}$$
$$\pi := P(RAP)^{-1}RA.$$

Here, $K$ is used to denote the coarse-grid operator instead of the traditional notation $A_c$, to avoid confusion with subscripts denoting C-points.

###### Lemma 1 (Schur-complement coarse grid)

Let $A$ take the block form as in (4), with $A_{ff}$ nonsingular. Define

$$R_{\text{ideal}} := \begin{pmatrix} -A_{cf}A_{ff}^{-1} & I \end{pmatrix}, \qquad P_{\text{ideal}} := \begin{pmatrix} -A_{ff}^{-1}A_{fc} \\ I \end{pmatrix}.$$

Then, for all $R$ and $P$ of the form in (4),

$$RAP_{\text{ideal}} = R_{\text{ideal}}AP = R_{\text{ideal}}AP_{\text{ideal}} = S_A,$$

where $S_A := A_{cc} - A_{cf}A_{ff}^{-1}A_{fc}$ is the Schur complement of $A_{ff}$ in $A$. That is, the coarse-grid operator, $K$, is given by the Schur complement of $A$.

###### Proof

Suppose $R = R_{\text{ideal}}$. Then, $Z = -A_{cf}A_{ff}^{-1}$ and

$$R_{\text{ideal}}AP = ZA_{ff}W + A_{cf}W + ZA_{fc} + A_{cc} = -A_{cf}W + A_{cf}W - A_{cf}A_{ff}^{-1}A_{fc} + A_{cc} = S_A.$$

Identical steps show the equivalent result for $P_{\text{ideal}}$.
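Lemma 1 is easy to check numerically. The following numpy sketch (random blocks; the sizes and diagonal shift are illustrative choices) verifies that any $R$, $P$ of the form (4), paired with the ideal operators, reproduce the Schur complement:

```python
import numpy as np

rng = np.random.default_rng(0)
nf, nc = 6, 3
Aff = rng.standard_normal((nf, nf)) + 5 * np.eye(nf)  # shift keeps A_ff nonsingular
Afc = rng.standard_normal((nf, nc))
Acf = rng.standard_normal((nc, nf))
Acc = rng.standard_normal((nc, nc))
A = np.block([[Aff, Afc], [Acf, Acc]])

# Ideal operators and Schur complement from Lemma 1
P_ideal = np.vstack([-np.linalg.solve(Aff, Afc), np.eye(nc)])
R_ideal = np.hstack([-Acf @ np.linalg.inv(Aff), np.eye(nc)])
S_A = Acc - Acf @ np.linalg.solve(Aff, Afc)

# Arbitrary W, Z in the companion transfer operators of the form (4)
W = rng.standard_normal((nf, nc))
Z = rng.standard_normal((nc, nf))
P = np.vstack([W, np.eye(nc)])
R = np.hstack([Z, np.eye(nc)])
```

All three products $RAP_{\text{ideal}}$, $R_{\text{ideal}}AP$, and $R_{\text{ideal}}AP_{\text{ideal}}$ agree with $S_A$ to machine precision, independent of the arbitrary $W$ and $Z$.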

The operator $R_{\text{ideal}}$ defined in Lemma 1 is referred to here as "ideal restriction," a natural extension of the well-known "ideal interpolation" operator, $P_{\text{ideal}}$, also defined in Lemma 1. Note that if $A$ is symmetric, $R_{\text{ideal}} = P_{\text{ideal}}^T$. Ideal interpolation is well-motivated under classical AMG theory (for example, see [22]). Here, we show that ideal restriction and ideal interpolation each have significance in the nonsymmetric setting as well. In Theorem 1, ideal restriction is shown to be ideal in a certain sense, albeit not the same sense for which ideal interpolation acquired its name. Thus, consider error propagation for a coarse-grid correction applied to the current error vector, $e = (e_f, e_c)^T$:

$$(I - \pi)e = \left[\begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} - \begin{pmatrix} W \\ I \end{pmatrix} K^{-1} \begin{pmatrix} Z & I \end{pmatrix} \begin{pmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{pmatrix}\right] \begin{pmatrix} e_f \\ e_c \end{pmatrix} \tag{6}$$
$$= \begin{pmatrix} \left[I - WK^{-1}(ZA_{ff} + A_{cf})\right]e_f - WK^{-1}(ZA_{fc} + A_{cc})e_c \\ \left[I - K^{-1}(ZA_{fc} + A_{cc})\right]e_c - K^{-1}(ZA_{ff} + A_{cf})e_f \end{pmatrix}.$$

Now, let $e_f = We_c + \delta_f$, that is, let $e_f$ be interpolated from $e_c$ with some error term $\delta_f$. Then, plugging $e_f = We_c + \delta_f$ into the fine-grid block and using the expansion of $K$ in (5) to write the coarse-grid block as $K^{-1}(ZA_{ff} + A_{cf})(We_c - e_f)$ gives:

$$(I - \pi)e = \begin{pmatrix} \left[I - WK^{-1}(ZA_{ff} + A_{cf})\right](We_c + \delta_f) - WK^{-1}(ZA_{fc} + A_{cc})e_c \\ K^{-1}(ZA_{ff} + A_{cf})(We_c - e_f) \end{pmatrix}$$
$$= \begin{pmatrix} \left[I - WK^{-1}(ZA_{ff} + A_{cf})\right]\delta_f + We_c - WK^{-1}Ke_c \\ -K^{-1}(ZA_{ff} + A_{cf})\delta_f \end{pmatrix} = \begin{pmatrix} \left[I - WK^{-1}(ZA_{ff} + A_{cf})\right]\delta_f \\ -K^{-1}(ZA_{ff} + A_{cf})\delta_f \end{pmatrix}. \tag{7}$$

The following theorem follows from (7).

###### Theorem 1 (Ideal restriction)

For a given CF-splitting, assume that $A_{ff}$ is nonsingular and let $A$ take the block form as given in (4). Assume that C-points are interpolated and restricted by injection in a classical-AMG sense (4). Then, an exact coarse-grid correction at C-points is attained for all error vectors $e$ if and only if

$$R = R_{\text{ideal}} := \begin{pmatrix} -A_{cf}A_{ff}^{-1} & I \end{pmatrix}. \tag{8}$$

Furthermore, the error in the coarse-grid correction is given by

$$(I - \pi)e = \begin{pmatrix} e_f - We_c \\ 0 \end{pmatrix}, \tag{9}$$

where $W$ is the arbitrary interpolation block from (4). Finally, a coarse-grid correction using $R_{\text{ideal}}$ followed by an exact solve on F-points results in an exact two-grid method, independent of $W$.

###### Proof

From (7), the error in coarse-grid correction applied to some vector $e$, restricted to C-points, is given by $-K^{-1}(ZA_{ff} + A_{cf})\delta_f$. Noting that there does not exist a $W$ such that $\delta_f = 0$ for all vectors $e$, it follows that the C-point error vanishes for all $e$ if and only if $K^{-1}(ZA_{ff} + A_{cf}) = 0$, that is, $ZA_{ff} + A_{cf} = 0$. Given that $A_{ff}$ is nonsingular, this holds if and only if $Z = -A_{cf}A_{ff}^{-1}$. The error shown in (9) follows directly from plugging $Z = -A_{cf}A_{ff}^{-1}$ into (7). An exact solve on F-points would eliminate this error, providing an exact two-grid method.

###### Corollary 1 (Ideal interpolation)

An exact solve on F-points followed by a coarse-grid correction using so-called "ideal interpolation," $P_{\text{ideal}}$, as defined in Lemma 1, gives an exact two-level method, independent of $Z$.

###### Proof

An initial exact solve on F-point rows means that the residual at F-point rows is zero. For a given error vector $e$, this is equivalent to updating $e_f$ such that $A_{ff}e_f + A_{fc}e_c = 0$, or $e_f = -A_{ff}^{-1}A_{fc}e_c$. Plugging this into the coarse-grid correction expansion in (6), along with $W = -A_{ff}^{-1}A_{fc}$ and $K = S_A$ (see Lemma 1), gives

$$(I - \pi)e = 0.$$

It follows from Theorem 1 and Corollary 1 that $R_{\text{ideal}}$ and $P_{\text{ideal}}$ each lead to an exact two-level method, independent of the accompanying interpolation and restriction operators, respectively, when coupled with an exact solve on F-points. Such a two-level method is in fact a reduction algorithm, as solving $Ax = b$ is reduced to solving two smaller systems. Note that the ordering of solving the coarse- and fine-grid problems is fundamental to achieving reduction. That is, to achieve reduction, the F-point solve must follow coarse-grid correction with $R_{\text{ideal}}$, while the F-point solve must precede coarse-grid correction with $P_{\text{ideal}}$. Background on reduction algorithms and ideal restriction in the multigrid context is given in Sections 2.3 and 3.

A natural corollary of Theorem 1 is that if we define "trivial interpolation" as $W = 0$, then error propagation of coarse-grid correction with $R_{\text{ideal}}$ and trivial interpolation is an $\ell^2$-orthogonal projection. Similarly, if we define "trivial restriction" as $Z = 0$, then residual propagation of coarse-grid correction with $P_{\text{ideal}}$ and trivial restriction is an $\ell^2$-orthogonal projection. This is consistent with the proofs of Theorem 1 and Corollary 1, that is, $P_{\text{ideal}}$ is based on eliminating residuals and $R_{\text{ideal}}$ on eliminating error. Despite orthogonality, in practice, choosing $W = 0$ for interpolation of F-points does not always provide the best convergence factors. This is discussed in greater detail in Section 4.6.

### 2.3 Relation to existing AMG methods

The goal of a reduction-based algorithm is simply to take a problem and split it into smaller sub-problems. One way to enable reduction is through block row-elimination, where $A$ can be written as

$$\begin{pmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{pmatrix} = \begin{pmatrix} I & 0 \\ A_{cf}A_{ff}^{-1} & I \end{pmatrix} \begin{pmatrix} A_{ff} & 0 \\ 0 & S_A \end{pmatrix} \begin{pmatrix} I & A_{ff}^{-1}A_{fc} \\ 0 & I \end{pmatrix}. \tag{10}$$

Using (10), solving $Ax = b$ is reduced to solving two systems, of size $n_f \times n_f$ and $n_c \times n_c$, the latter of which involves the Schur complement, $S_A$. Unfortunately, (10) is generally impractical as shown because of the need to compute the inverse of $A_{ff}$ and solve the typically dense Schur-complement system. However, such block decompositions have been used to motivate many algorithms, and a new approximation to inverting (10) can be found in Section 3.3.
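The factorization (10) can likewise be verified directly. The numpy sketch below (random blocks; sizes are illustrative) also shows the implied two-stage solve: one system with $A_{ff}$ and one with $S_A$, plus cheap triangular block updates.

```python
import numpy as np

rng = np.random.default_rng(1)
nf, nc = 5, 3
Aff = rng.standard_normal((nf, nf)) + 4 * np.eye(nf)  # shift keeps A_ff nonsingular
Afc = rng.standard_normal((nf, nc))
Acf = rng.standard_normal((nc, nf))
Acc = rng.standard_normal((nc, nc))
A = np.block([[Aff, Afc], [Acf, Acc]])

S_A = Acc - Acf @ np.linalg.solve(Aff, Afc)           # Schur complement
Lo = np.block([[np.eye(nf), np.zeros((nf, nc))],
               [Acf @ np.linalg.inv(Aff), np.eye(nc)]])
D = np.block([[Aff, np.zeros((nf, nc))],
              [np.zeros((nc, nf)), S_A]])
Up = np.block([[np.eye(nf), np.linalg.solve(Aff, Afc)],
               [np.zeros((nc, nf)), np.eye(nc)]])

# Solve Ax = b via (10): two smaller solves plus unit-triangular block solves
b = rng.standard_normal(nf + nc)
y = np.linalg.solve(Lo, b)
z = np.concatenate([np.linalg.solve(Aff, y[:nf]),     # n_f x n_f system
                    np.linalg.solve(S_A, y[nf:])])    # n_c x n_c Schur system
x = np.linalg.solve(Up, z)
```

The block factorization reproduces $A$ exactly, and the staged solve reproduces the solution of $Ax = b$.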

The goal of this paper is to develop a reduction-based AMG algorithm for highly nonsymmetric linear systems. However, Schur-complement coarse-grid operators and reduction-based multigrid are not new to the multigrid literature. Reduction-based multigrid has been considered in many contexts, starting in the geometric multigrid setting [24, 47]. A high-performance variation of geometric multigrid based on a reduction point of view is the SMG solver in the hypre library, which uses a sparse approximation to $A_{ff}^{-1}$ that reproduces the action of $A_{ff}^{-1}$ on the constant vector [23]. The SMG solver, however, assumes knowledge of an underlying structured grid, and forms symmetric, Galerkin coarse grids. More recently, an adaptive, two-level reduction-based AMG method was proposed in [11, 30], where an approximation to $A_{ff}^{-1}$ is developed through an adaptive process. This work provided interesting theoretical results; however, they are based in the $A$-norm, and thus again confined to the SPD setting. As seen in Section 2.2, a reduction-based algorithm is invariably linked to a Schur-complement coarse-grid operator. Other literature exists that is not explicitly marketed as reduction-based, but targets a Schur-complement coarse grid, for example, [15, 31, 34, 37, 38, 49]. Finally, the multigrid-reduction-in-time parallel-in-time algorithm [18, 19, 20] is also based on (10), and in [20, 25] it was shown that the well-known parareal parallel-in-time method [25] is equivalent to a two-level multigrid-reduction-in-time method.

Algebraic multigrid methods have also been developed that target nonsymmetric systems, but the theory is limited and the development of effective solvers is ongoing. The reduction-based decomposition given in (10) was considered in [38] for nonsymmetric M-matrices resulting from convection-diffusion discretizations. An AMG algorithm for nonsymmetric systems was proposed in [56] based on (10), where it is shown that, coupled with an exact solve on F-points, convergence in the spectral sense is bounded based on how well the coarse grid approximates the Schur complement.

Ideal restriction and interpolation are approximated in [56] by performing a constrained minimization over a fixed sparsity pattern for $R$ and $P$, where known near-null-space modes are interpolated exactly, and the remaining degrees of freedom are used to approximate the ideal operators. Such an approach is similar to the constrained minimization approach for nonsymmetric systems used in [33, 42], which approximates the ideal operators of the normal equations $A^TA$ and $AA^T$ for $P$ and $R$, respectively. Although reduction is not achieved for ideal operators based on $A^TA$ and $AA^T$, such operators are well-motivated under traditional AMG convergence theory [33]. In fact, in the symmetric setting, approximating ideal operators for $A$ vs. $A^TA$ through constrained energy-minimization produces nearly identical performance [33]. As an extension to the nonsymmetric setting, it is hypothesized that the AMG solvers in [33, 56] would achieve similar performance, although such a comparison has not yet been done. Regardless, the constrained approximation of ideal operators as in [33, 56] is a broadly applicable solver that relies heavily on constraints for good convergence [33, 53].

This leads to the novelty of this work. Here, we show that matrices with triangular structure are amenable to a reduction-type algorithm (Section 3), and we develop a natural way to approximate ideal operators for triangular matrices. The resulting solver does not rely on constraints for strong convergence, and is based only on an appropriate CF-splitting and the approximation of $A_{ff}^{-1}$, which are discussed in Section 3.

## 3 Triangular structure and approximate ideal restriction

In Section 2.2, theory was developed to motivate ideal restriction and ideal interpolation for nonsymmetric linear systems. However, ideal operators are typically not formed in practice due to the cost of forming $A_{ff}^{-1}$. This section shows that for block-triangular matrices, there is a natural way to approximate $A_{ff}^{-1}$. Furthermore, approximating ideal operators in coarse-grid correction, coupled with Jacobi F-relaxation, gives a nilpotent multigrid error-propagation operator, that is, convergence to the exact solution is ultimately guaranteed (Section 3.1). Coupled with an appropriate CF-splitting (Section 3.2), this leads to a fast and practical solver for matrices with triangular structure.

Although direct solves of triangular matrices do not scale well in parallel, direct solve considerations suggest that a triangular matrix is amenable to a reduction-based algorithm, because each step in a forward or backward solve is effectively reduction by eliminating one DOF. Thus, the goal here is to develop a reduction-based solver for triangular systems, independent of matrix ordering, based on the concept of "approximate ideal restriction" (AIR). Moving forward, assume that $A$ is lower triangular with unit diagonal in some ordering. For theoretical purposes, we assume that $A$ is actually ordered to be lower triangular, but it is important to note that the results presented here are independent of this ordering, which is used only to simplify proofs. An in-depth discussion on the handling of block-triangular matrices is given in Section 4.1.

Let $A_{ff} = I - L_{ff}$, where $-L_{ff}$ is the strictly lower triangular part of $A_{ff}$; $L_{ff}$ is strictly lower triangular and is, thus, nilpotent. Hence, $A_{ff}^{-1}$ can be written as a finite Neumann expansion:

$$A_{ff}^{-1} = (I - L_{ff})^{-1} = \sum_{i=0}^{n} L_{ff}^i. \tag{11}$$

An order-$k$ approximation to $A_{ff}^{-1}$ is then given by truncating (11): $\Delta^{(k)} := \sum_{i=0}^{k} L_{ff}^i$, for some $k \geq 0$. The error in $\Delta^{(k)}$ can be measured as $I - \Delta^{(k)}A_{ff} = L_{ff}^{k+1}$. Noting this, let $R = \begin{pmatrix} -A_{cf}\Delta^{(k)} & I \end{pmatrix}$ and consider its action on $A$:

$$RA = \begin{pmatrix} A_{cf}\left(I - \Delta^{(k)}A_{ff}\right) & \;\; A_{cc} - A_{cf}\Delta^{(k)}A_{fc} \end{pmatrix} = \begin{pmatrix} A_{cf}L_{ff}^{k+1} & \;\; A_{cc} - A_{cf}\Delta^{(k)}A_{fc} \end{pmatrix}.$$

Since $L_{ff}$ is nilpotent, approximating ideal restriction via a truncated Neumann expansion of $A_{ff}^{-1}$ can be seen as equivalent to approximating the action of $R_{\text{ideal}}$ on $A$, namely, trying to set $RA$ equal to zero within the F-point columns. For certain structured matrices, the $k$th-order Neumann expansion exactly eliminates the contribution of F-points within a fixed distance (depending on $k$) for a given C-point (row of $R$). In fact, this was the original motivation for AIR: eliminating the contribution of error at F-points to the coarse-grid right-hand side.
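The truncated Neumann series is easy to sanity-check numerically. In the sketch below (size and random entries are illustrative), $\Delta^{(k)}$ is built by accumulating powers of $L_{ff}$, the truncation error is confirmed to be $L_{ff}^{k+1}$, and the expansion becomes exact once all nonzero powers are included:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
Lff = np.tril(rng.standard_normal((n, n)), k=-1)  # strictly lower -> nilpotent
Aff = np.eye(n) - Lff                             # unit-diagonal lower triangular

def neumann(Lff, k):
    """Order-k truncation of (I - Lff)^{-1} = sum_i Lff^i, Eq. (11)."""
    acc, power = np.eye(len(Lff)), np.eye(len(Lff))
    for _ in range(k):
        power = power @ Lff
        acc = acc + power
    return acc

# Truncation error after k terms is exactly Lff^{k+1} (here k = 2)
err2 = np.eye(n) - neumann(Lff, 2) @ Aff
```

Since $L_{ff}^n = 0$, the order-$(n-1)$ truncation recovers $A_{ff}^{-1}$ exactly; in practice only small $k$ is affordable, and the approximation quality rests on the CF-splitting.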

Numerical results presented in Section 5 are supported by two properties of the AMGir error-propagation operator. In a reduction context, the error-propagation operator in the multilevel setting is nilpotent, ensuring an (eventually) exact method (Section 3.1). In addition to the nilpotency, an appropriate CF-splitting such that the off-diagonal elements of $A_{ff}$ are small additionally ensures effective error reduction at indices that are not "reduced" in a given iteration (Section 3.2). The latter is likely the primary reason for strong convergence, although both are important.

### 3.1 Nilpotency of error propagation

Lemma 2 and Corollary 2 below are presented as important pieces in motivating AMGir. Corollary 2 is fundamental to a multilevel implementation of AMGir, showing that as long as $A_{ff}^{-1}$ in ideal restriction and interpolation is approximated with a lower-triangular matrix, the coarse-grid operator retains the same triangular structure as the fine-grid operator. Thus, the same algorithms for forming $R$ and $P$ are appropriate in a recursive, multilevel fashion. For given ideal restriction and interpolation approximations, coupled with Jacobi relaxation on F-points, Lemma 3 shows that the two-grid error-propagation operator is nilpotent. Theorem 2 is the primary convergence result of this section, showing that a full multilevel implementation has a nilpotent error-propagation operator. A trivial corollary is that convergence to the exact solution is guaranteed within $n$ iterations. Although a bound of $n$ iterations is far removed from the typical linear or near-linear complexity of AMG, such a bound is at least of theoretical interest.

###### Lemma 2

Let $A$ be lower triangular and suppose the DOFs of $A$ have been partitioned into C-points and F-points. Then, if $\Delta$ is a lower-triangular approximation to $A_{ff}^{-1}$, the product $A_{cf}\Delta A_{fc}$ is strictly lower triangular and, thus, nilpotent. The same holds for $A_{cf}\hat{\Delta}A_{fc}$, where $\hat{\Delta}$ is any lower-triangular matrix.

###### Proof

Let C-points be given by $\{c_i\}$, where $c_i$ denotes the global index of the $i$th C-point, and likewise let F-points be given by $\{f_i\}$. Furthermore, assume $\{c_i\}$ and $\{f_i\}$ are in increasing order, that is, $c_i < c_j$ for $i < j$, and that the global indices of $A$ are ordered such that $A$ is lower triangular, that is, $a_{ij} = 0$ for $j > i$. In terms of paths in the graph of $A$, this means that from a given node $i$, there exist paths only to nodes $j \geq i$.

For $A_{cf}\Delta A_{fc}$ to be strictly lower triangular, we must have $(A_{cf}\Delta A_{fc})_{ij} = 0$ for all $j \geq i$. This can be shown by considering paths between the $j$th and $i$th C-points in the graph of $A_{cf}\Delta A_{fc}$.

1. Starting at $c_j$, $A_{fc}$ can only contain paths from $c_j$ to F-points $f > c_j$.

2. Then, $\Delta$ can only contain paths from $f$ to F-points $f' \geq f$.

3. Finally, $A_{cf}$ maps from F-points $f'$ to C-points $c_i > f'$.

Thus, the only possible paths in the graph of $A_{cf}\Delta A_{fc}$ are from points $c_j$ to $c_i > c_j$. It follows that if $c_i \leq c_j$ or, equivalently, $i \leq j$, then $(A_{cf}\Delta A_{fc})_{ij} = 0$.

###### Corollary 2

Let $A$ be lower triangular, let $\Delta_R$ and $\Delta_P$ be lower-triangular approximations to $A_{ff}^{-1}$, and define $R = \begin{pmatrix} -A_{cf}\Delta_R & I \end{pmatrix}$ and $P = \begin{pmatrix} -\Delta_P A_{fc} \\ I \end{pmatrix}$. Then, the coarse-grid operator (5) is lower triangular, with diagonal given by that of $A_{cc}$.

###### Proof

First note that

$$K := RAP = A_{cc} - A_{cf}\left(\Delta_R + \Delta_P - \Delta_R A_{ff}\Delta_P\right)A_{fc}.$$

We know that $\Delta_R + \Delta_P - \Delta_R A_{ff}\Delta_P$ is lower triangular. It follows from Lemma 2 that $A_{cf}(\Delta_R + \Delta_P - \Delta_R A_{ff}\Delta_P)A_{fc}$ is strictly lower triangular. Thus, $K$ is lower triangular with the same diagonal as $A_{cc}$.
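Corollary 2 can be observed numerically. The sketch below builds a dense unit-diagonal lower-triangular $A$, picks a CF-splitting, forms $\Delta_R$ and $\Delta_P$ as Neumann truncations of different orders, and checks that $K = RAP$ is lower triangular with the diagonal of $A_{cc}$. The size, splitting, and truncation orders are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10
A = np.tril(rng.standard_normal((n, n)), k=-1) + np.eye(n)  # unit-diag lower tri
Cpts = np.array([0, 3, 5, 8])
Fpts = np.setdiff1d(np.arange(n), Cpts)
Aff = A[np.ix_(Fpts, Fpts)]; Afc = A[np.ix_(Fpts, Cpts)]
Acf = A[np.ix_(Cpts, Fpts)]; Acc = A[np.ix_(Cpts, Cpts)]

Lff = np.eye(len(Fpts)) - Aff
trunc = lambda k: sum(np.linalg.matrix_power(Lff, i) for i in range(k + 1))
DR, DP = trunc(1), trunc(2)        # lower-triangular approximations to Aff^{-1}

R = np.hstack([-Acf @ DR, np.eye(len(Cpts))])
P = np.vstack([-DP @ Afc, np.eye(len(Cpts))])
Ablk = np.block([[Aff, Afc], [Acf, Acc]])
K = R @ Ablk @ P                   # coarse-grid operator, Eq. (5)
```

Because the sorted CF-splitting preserves the global ordering, the strictly lower structure of $A_{cf}(\cdot)A_{fc}$ shows up exactly in the computed $K$.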

###### Lemma 3 (Nilpotent two-grid AMG)

Let $A$ be lower triangular in some ordering, let $\Delta_R$ and $\Delta_P$ be lower-triangular approximations to $A_{ff}^{-1}$, and define $R = \begin{pmatrix} -A_{cf}\Delta_R & I \end{pmatrix}$ and $P = \begin{pmatrix} -\Delta_P A_{fc} \\ I \end{pmatrix}$. Then, the error-propagation operator of the two-grid method with coarse-grid correction and Jacobi F-relaxation is strictly lower triangular and, thus, nilpotent.

###### Proof

Define C-points as the set $\{c_i\}$, where $c_i$ is the global index of the $i$th C-point, and likewise for F-points $\{f_i\}$. Furthermore, let $\{c_i\}$ and $\{f_i\}$ be in increasing order, that is, $c_1 < c_2 < \cdots$, and let global indices of $A$ be ordered such that $A$ is lower triangular, that is, $a_{ij} = 0$ if $j > i$. Then, for a given node $i$ in the graph of $A$, there only exist paths to nodes $j \leq i$. It follows that the off-diagonal block $A_{fc}$ only maps from C-points $c_j$ to F-points $f_i > c_j$, and similarly for $A_{cf}$.

The two-grid AMG error-propagation operator with relaxation error propagator $G$ takes the form $G\left(I - P(RAP)^{-1}RA\right)$. First consider the coarse-grid correction term:

$$
\begin{aligned}
I - P(RAP)^{-1}RA &= \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} - \begin{pmatrix} -\Delta_P A_{fc} \\ I \end{pmatrix} K^{-1}\begin{pmatrix} -A_{cf}\Delta_R & I \end{pmatrix}\begin{pmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{pmatrix} \\
&= \begin{pmatrix} I + \Delta_P A_{fc}K^{-1}A_{cf}\left(I - \Delta_R A_{ff}\right) & \Delta_P A_{fc}K^{-1}\left(A_{cc} - A_{cf}\Delta_R A_{fc}\right) \\ -K^{-1}A_{cf}\left(I - \Delta_R A_{ff}\right) & I - K^{-1}\left(A_{cc} - A_{cf}\Delta_R A_{fc}\right) \end{pmatrix}.
\end{aligned} \tag{12}
$$

Note that $A_{fc}K^{-1}A_{cf}$ is strictly lower triangular by Lemma 2, and $\Delta_P$ and $I - \Delta_R A_{ff}$ are lower triangular. Then, the action of the lower-left block in (12) can be seen as three steps, mapping $f_j \mapsto f_k \mapsto c_l \mapsto c_i$, where $f_j \leq f_k < c_l \leq c_i$. A similar result holds for the upper-right block, showing that, in the global ordering, the off-diagonal blocks of (12) are strictly lower triangular. For the lower-right block, it follows from Corollary 2 and Lemma 2 that $K^{-1}\left(A_{cc} - A_{cf}\Delta_R A_{fc}\right)$ is lower triangular with unit diagonal. Subtracting from the identity gives a strictly lower triangular block. Finally, given that the product of a strictly lower triangular matrix and a lower triangular matrix is strictly lower triangular, it follows from Lemma 2 that $\Delta_P A_{fc}K^{-1}A_{cf}\left(I - \Delta_R A_{ff}\right)$ is strictly lower triangular. Then, the upper-left block is lower triangular with unit diagonal.

Thus, the error-propagation operator for coarse-grid correction is lower triangular with unit diagonal in the FF-block, and strictly lower triangular in the other blocks. Furthermore, the error-propagation matrix for Jacobi relaxation on F-points is strictly lower triangular in the FF-block and lower triangular elsewhere. It follows that the AMGir two-grid error-propagation operator, given by the product of F-Jacobi relaxation and coarse-grid correction, is a strictly lower triangular and, thus, nilpotent operator.
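As an illustration (ours, not part of the paper), the two-grid propagator of Lemma 3 can be assembled explicitly and its nilpotency checked for a small arbitrary example:

```python
import numpy as np

# Illustration of Lemma 3 (ours): assemble the two-grid error propagator,
# F-point Jacobi relaxation composed with coarse-grid correction, and
# check that it is nilpotent.
rng = np.random.default_rng(2)
n = 9
A = np.tril(rng.standard_normal((n, n))) + 5.0 * np.eye(n)
C = np.array([0, 3, 6])
F = np.array([i for i in range(n) if i not in C])
nf, nc = len(F), len(C)

A_ff, A_fc = A[np.ix_(F, F)], A[np.ix_(F, C)]
A_cf, A_cc = A[np.ix_(C, F)], A[np.ix_(C, C)]
A_perm = np.block([[A_ff, A_fc], [A_cf, A_cc]])

Delta = np.diag(1.0 / np.diag(A_ff))  # order-0 Neumann approximation to inv(A_ff)
R = np.hstack([-A_cf @ Delta, np.eye(nc)])
P = np.vstack([-Delta @ A_fc, np.eye(nc)])
K = R @ A_perm @ P

# Error propagator of coarse-grid correction, then F-point Jacobi on top.
E_cgc = np.eye(n) - P @ np.linalg.solve(K, R @ A_perm)
G = np.eye(n)
G[:nf, :] -= np.diag(1.0 / np.diag(A_ff)) @ A_perm[:nf, :]  # relax F equations only

E = G @ E_cgc  # two-grid error propagator
assert np.allclose(np.linalg.matrix_power(E, n), 0.0)  # nilpotent
```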

###### Theorem 2 (Nilpotent AMG)

Let $A$ be lower triangular in some ordering, let $\Delta_R$ and $\Delta_P$ be lower-triangular approximations to $A_{ff}^{-1}$ on each level, and define $R$ and $P$ as in Lemma 3. Then, the multilevel error-propagation operator consisting of coarse-grid correction and Jacobi F-relaxation on each level is strictly lower triangular, and thus nilpotent.

###### Proof

In proving a nilpotent error propagation for a multilevel algorithm, we proceed in a recursive fashion. Consider the error propagation of coarse-grid correction with an inexact coarse-grid solve. Error propagation of coarse-grid correction takes the form $e_{\text{new}} = e_{\text{old}} - Px_c$, for some coarse-grid correction $x_c$. In the case of an exact coarse-grid solve, $x_c = \overline{x}_c := K^{-1}RAe_{\text{old}}$ and $e_{\text{new}} = \left(I - PK^{-1}RA\right)e_{\text{old}}$. For an inexact coarse-grid solve, the correction can be written as some perturbation of the exact solve, $x_c = \overline{x}_c - E_c\overline{x}_c$, where the error in the correction is given by $E_c\overline{x}_c$. The error propagation of the inexact coarse-grid solve, that is, the coarse-grid V-cycle $E_c$, operates on the error $\overline{x}_c - x_c^{(0)}$, that is, $\overline{x}_c - x_c = E_c\left(\overline{x}_c - x_c^{(0)}\right)$, for some initial guess $x_c^{(0)}$. Assuming a zero initial guess on the coarse-grid problem, as is standard in AMG, then

$$
\begin{aligned}
e_{\text{new}} &= e_{\text{old}} - P\left(\overline{x}_c - E_c\overline{x}_c\right) \\
&= e_{\text{old}} - P\left(I - E_c\right)\overline{x}_c \\
&= e_{\text{old}} - P\left(I - E_c\right)K^{-1}RAe_{\text{old}} \\
&= \left[\left(I - PK^{-1}RA\right) + PE_cK^{-1}RA\right]e_{\text{old}}.
\end{aligned}
$$

Lemma 3 showed that $I - PK^{-1}RA$ is strictly lower triangular when coupled with Jacobi F-relaxation. Now, we must show that the error term $PE_cK^{-1}RA$ is also strictly lower triangular. Given that $A$ is lower triangular, it is sufficient to show that $PE_cK^{-1}R$ is strictly lower triangular. Expanding, we get

$$
PE_cK^{-1}R = \begin{pmatrix} \Delta_P A_{fc}E_cK^{-1}A_{cf}\Delta_R & -\Delta_P A_{fc}E_cK^{-1} \\ -E_cK^{-1}A_{cf}\Delta_R & E_cK^{-1} \end{pmatrix}. \tag{13}
$$

An analogous result to Lemma 3 confirms that if $E_c$ is strictly lower triangular, then so is (13).

Let $E^{(\ell)}$ be the error-propagation operator for a V-cycle starting on the $\ell$th level of the hierarchy. On the coarsest level of the hierarchy, say level $n_\ell$, $E^{(n_\ell)} = 0$, because the solve is exact. Then, on level $n_\ell - 1$, $PE^{(n_\ell)}K^{-1}RA = 0$ is trivially strictly lower triangular. It follows that the error-propagation operator on level $n_\ell - 1$, $E^{(n_\ell - 1)}$, is strictly lower triangular. On level $n_\ell - 2$, $PE^{(n_\ell - 1)}K^{-1}RA$ is again strictly lower triangular, because $PE^{(n_\ell - 1)}K^{-1}R$ is strictly lower triangular and $A$ is lower triangular. By induction, the error-propagation operator for a multilevel hierarchy with $n_\ell$ levels is strictly lower triangular and, thus, nilpotent.

For triangular systems, it is easy to see that simple Jacobi relaxation also produces a nilpotent error-propagation operator. The significance of Lemma 3 and Theorem 2 is in showing analytically that (at least in the spectral sense) coarse-grid correction does not cause divergent behavior, and that AMGir is asymptotically robust for triangular systems.
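The observation about plain Jacobi is immediate to verify numerically (a minimal sketch of ours, with an arbitrary lower-triangular test matrix):

```python
import numpy as np

# Quick check (ours): for lower-triangular A, plain Jacobi already has
# nilpotent error propagation, since I - D^{-1}A is strictly lower triangular.
rng = np.random.default_rng(7)
n = 8
A = np.tril(rng.standard_normal((n, n))) + 2.0 * np.eye(n)
E_jac = np.eye(n) - np.diag(1.0 / np.diag(A)) @ A
assert np.allclose(np.triu(E_jac), 0.0)
assert np.allclose(np.linalg.matrix_power(E_jac, n), 0.0)
```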

### 3.2 CF-splitting and $L_{ff}$

Although nilpotency of error propagation is a nice property to have, it does not necessarily imply fast convergence. To understand what else contributes to good convergence, we further examine the error-propagation operator. First note that error propagation for Jacobi F-relaxation is given by $I - D_{ff}^{-1}A_{ff} = D_{ff}^{-1}L_{ff}$, where $A_{ff} = D_{ff} - L_{ff}$, with $D_{ff}$ the diagonal and $-L_{ff}$ the strictly lower-triangular part of $A_{ff}$. Letting $\Delta_R = \Delta_P = \Delta^{(k)}$ for some $k \geq 0$, the following identities hold, which we will need later:

$$
\begin{aligned}
K &= A_{cc} - A_{cf}\Delta^{(2k+1)}A_{fc}, \\
\left(\Delta^{(2k+1)}\right)^{-1} &= \left(I + \left(L_{ff}D_{ff}^{-1}\right)^{k+1}\right)^{-1}\left(\Delta^{(k)}\right)^{-1}, \\
\left(\Delta^{(2k+1)}\right)^{-1}\Delta^{(k)} &= I + \sum_{j=1}^{n}\left(-\left(L_{ff}D_{ff}^{-1}\right)^{k+1}\right)^{j}.
\end{aligned} \tag{14}
$$
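The identities in (14) can be checked numerically. The sketch below (ours) assumes the splitting $A_{ff} = D_{ff} - L_{ff}$ and the truncated Neumann expansion $\Delta^{(k)} = \sum_{j=0}^{k}\left(D_{ff}^{-1}L_{ff}\right)^{j}D_{ff}^{-1}$; the matrix and sizes are arbitrary:

```python
import numpy as np

# Numerical check of the identities in (14) (sketch; sizes arbitrary),
# assuming the splitting A_ff = D_ff - L_ff and the order-k Neumann
# expansion Delta^(k) = sum_{j=0}^{k} (D^{-1} L)^j D^{-1}.
rng = np.random.default_rng(3)
nf, k = 7, 2
A_ff = np.tril(rng.standard_normal((nf, nf)), k=-1) + 3.0 * np.eye(nf)
D = np.diag(np.diag(A_ff))
L = D - A_ff  # minus the strictly lower-triangular part of A_ff
D_inv = np.linalg.inv(D)

def neumann(order):
    """Order-`order` truncated Neumann expansion of inv(A_ff)."""
    X, term, total = D_inv @ L, np.eye(nf), np.eye(nf)
    for _ in range(order):
        term = term @ X
        total = total + term
    return total @ D_inv

Dk, D2k1 = neumann(k), neumann(2 * k + 1)

# Delta^(2k+1) = Delta^(k) (I + (L D^{-1})^{k+1})
M = np.linalg.matrix_power(L @ D_inv, k + 1)
assert np.allclose(D2k1, Dk @ (np.eye(nf) + M))
# I - Delta^(k) A_ff = (D^{-1} L)^{k+1}
assert np.allclose(np.eye(nf) - Dk @ A_ff,
                   np.linalg.matrix_power(D_inv @ L, k + 1))
```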

For ease of notation, let $\Delta := \Delta^{(k)}$ and $\widetilde{\Delta} := \Delta^{(2k+1)}$. Then, using (14) and a Woodbury inverse expansion for $K^{-1}$, we get

$$
\begin{aligned}
K^{-1}\left(A_{cc} - A_{cf}\Delta A_{fc}\right) &= \left(A_{cc}^{-1} + A_{cc}^{-1}A_{cf}\left(\widetilde{\Delta}^{-1} - A_{fc}A_{cc}^{-1}A_{cf}\right)^{-1}A_{fc}A_{cc}^{-1}\right)\left(A_{cc} - A_{cf}\Delta A_{fc}\right) \\
&= \left(I + A_{cc}^{-1}A_{cf}\left(\widetilde{\Delta}^{-1} - A_{fc}A_{cc}^{-1}A_{cf}\right)^{-1}A_{fc}\right)\left(I - A_{cc}^{-1}A_{cf}\Delta A_{fc}\right) \\
&= I + A_{cc}^{-1}A_{cf}\left(\widetilde{\Delta}^{-1} - A_{fc}A_{cc}^{-1}A_{cf}\right)^{-1}\left(I - \widetilde{\Delta}^{-1}\Delta\right)A_{fc} \\
&= I - A_{cc}^{-1}A_{cf}\left(\widetilde{\Delta}^{-1} - A_{fc}A_{cc}^{-1}A_{cf}\right)^{-1}\sum_{j=1}^{n}\left(-\left(L_{ff}D_{ff}^{-1}\right)^{k+1}\right)^{j}A_{fc}.
\end{aligned} \tag{15}
$$

Plugging (15) and the identity $I - \Delta A_{ff} = \left(D_{ff}^{-1}L_{ff}\right)^{k+1}$ into the coarse-grid correction (12) shows that error propagation for coarse-grid correction can be written as

$$
\begin{aligned}
e_f^{(1)} &= \left(I + \Delta A_{fc}K^{-1}A_{cf}\left(D_{ff}^{-1}L_{ff}\right)^{k+1}\right)e_f^{(0)} \\
&\quad + \Delta A_{fc}\left(I - A_{cc}^{-1}A_{cf}\left(\widetilde{\Delta}^{-1} - A_{fc}A_{cc}^{-1}A_{cf}\right)^{-1}\sum_{j=1}^{n}\left(-\left(L_{ff}D_{ff}^{-1}\right)^{k+1}\right)^{j}A_{fc}\right)e_c^{(0)}, \\
e_c^{(1)} &= -K^{-1}A_{cf}\left(D_{ff}^{-1}L_{ff}\right)^{k+1}e_f^{(0)} + A_{cc}^{-1}A_{cf}\left(\widetilde{\Delta}^{-1} - A_{fc}A_{cc}^{-1}A_{cf}\right)^{-1}\sum_{j=1}^{n}\left(-\left(L_{ff}D_{ff}^{-1}\right)^{k+1}\right)^{j}A_{fc}e_c^{(0)}.
\end{aligned}
$$

Define $S := \widetilde{\Delta}^{-1} - A_{fc}A_{cc}^{-1}A_{cf}$, $X := \left(D_{ff}^{-1}L_{ff}\right)^{k+1}$, and $Y := \left(L_{ff}D_{ff}^{-1}\right)^{k+1}$. Then, if we consider only the highest-order (that is, $j = 1$) term of the sums above, error propagation takes the form

$$
\begin{pmatrix} e_f^{(1)} \\ e_c^{(1)} \end{pmatrix} \approx \begin{pmatrix} I + \Delta A_{fc}K^{-1}A_{cf}X & \Delta A_{fc}\left(I + A_{cc}^{-1}A_{cf}S^{-1}YA_{fc}\right) \\ -K^{-1}A_{cf}X & -A_{cc}^{-1}A_{cf}S^{-1}YA_{fc} \end{pmatrix}\begin{pmatrix} e_f^{(0)} \\ e_c^{(0)} \end{pmatrix}. \tag{16}
$$

The key thing to notice in (16) is that each error update on C-points is hit with a $(k+1)$st power of $L_{ff}$ (up to diagonal scalings by $D_{ff}^{-1}$). Note that $\ell$ iterations of F-Jacobi relaxation have error propagation

$$
\begin{pmatrix} e_f^{(i+\ell)} \\ e_c^{(i+\ell)} \end{pmatrix} = \begin{pmatrix} \left(D_{ff}^{-1}L_{ff}\right)^{\ell} & -\Delta^{(\ell-1)}A_{fc} \\ 0 & I \end{pmatrix}\begin{pmatrix} e_f^{(i)} \\ e_c^{(i)} \end{pmatrix}. \tag{17}
$$

Then, coupling coarse-grid correction (16) with $\ell$ iterations of pre- or post-F-relaxation scales the error at all points by at least $\min\{\ell, k+1\}$ powers of $D_{ff}^{-1}L_{ff}$.
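The relaxation identity (17) amounts to composing single F-Jacobi sweeps; a quick numerical check (ours, with arbitrary blocks):

```python
import numpy as np

# Check of (17) (sketch, arbitrary blocks): composing ell sweeps of
# F-point Jacobi gives FF-block (D_ff^{-1} L_ff)^ell and FC-block
# -Delta^(ell-1) A_fc.
rng = np.random.default_rng(4)
nf, nc, ell = 6, 3, 3
A_ff = np.tril(rng.standard_normal((nf, nf)), k=-1) + 3.0 * np.eye(nf)
A_fc = rng.standard_normal((nf, nc))
D_inv = np.diag(1.0 / np.diag(A_ff))
L = np.diag(np.diag(A_ff)) - A_ff  # A_ff = D_ff - L_ff

# One sweep of F-Jacobi on the error, in (F, C) block ordering.
G1 = np.block([[np.eye(nf) - D_inv @ A_ff, -D_inv @ A_fc],
               [np.zeros((nc, nf)), np.eye(nc)]])
G = np.linalg.matrix_power(G1, ell)

# Order-(ell-1) truncated Neumann expansion of inv(A_ff).
Delta = sum(np.linalg.matrix_power(D_inv @ L, j) for j in range(ell)) @ D_inv

assert np.allclose(G[:nf, :nf], np.linalg.matrix_power(D_inv @ L, ell))
assert np.allclose(G[:nf, nf:], -Delta @ A_fc)
```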

Such a scaling of error by powers of $D_{ff}^{-1}L_{ff}$ is a complementary contribution to the strong convergence factors achieved by AMGir. Ensuring that $D_{ff}^{-1}L_{ff}$ is not only nilpotent, but also has very small elements, is key to achieving a fast reduction in error. Here, we focus on matrices that result from the discretization of differential operators, in which case off-diagonal entries are typically expected to be smaller in magnitude than the diagonal. In fact, the traditional AMG motivation for a CF-splitting, picking F-points such that $A_{ff}$ is well-conditioned, is also appropriate for AMGir. A classical AMG coarsening approach targets $A_{ff}$ being strictly diagonally dominant, which is equivalent to minimizing the size of elements in $D_{ff}^{-1}L_{ff}$. Figure 1 demonstrates that, for a model problem introduced in Section 5, $D_{ff}^{-1}L_{ff}$ has very small row sums, although very few rows of $A_{ff}$ are diagonal (in which case the corresponding row of $L_{ff}$ is empty). Similar results hold on all levels of the hierarchy, as well as for other discretizations and finite-element orders.

Small row sums of $D_{ff}^{-1}L_{ff}$ reduce error efficiently for most rows. For rows without small row sums in $D_{ff}^{-1}L_{ff}$, we rely on the redistribution of error through the operators $A_{cf}$, $A_{fc}$, and $K^{-1}$, as well as on the nilpotency of error propagation, to reduce error. Specifically, larger row sums in $D_{ff}^{-1}L_{ff}$ typically occur near the boundary of the domain, and near the boundary a nilpotent error-propagation operator reaches the exact solution in a small number of iterations.

### 3.3 Schur-complement preconditioning

This work focuses on a traditional AMG framework. However, preconditioners are often developed by approximating the block LDU decomposition of $A$ in (10). An analogous result to Theorem 2 holds for the block preconditioner

$$
\hat{M} = \begin{pmatrix} I & 0 \\ A_{cf}\Delta_R & I \end{pmatrix}\begin{pmatrix} \Delta^{-1} & 0 \\ 0 & \hat{K} \end{pmatrix}\begin{pmatrix} I & \Delta_P A_{fc} \\ 0 & I \end{pmatrix}, \tag{18}
$$

where $\Delta$, $\Delta_R$, and $\Delta_P$ are lower-triangular approximations to $A_{ff}^{-1}$, and $\hat{K}$ is some approximation to the Schur complement (not necessarily $K = RAP$); that is, the multilevel error-propagation operator, $I - \hat{M}^{-1}A$, is nilpotent. Letting $\Delta = \Delta_R = \Delta_P = \Delta^{(k)}$ be an order-$k$ truncated Neumann expansion as in (11) and $\hat{K} = A_{cc} - A_{cf}\Delta A_{fc}$ gives the relation $I - \Delta A_{ff} = \left(D_{ff}^{-1}L_{ff}\right)^{k+1}$, where $A_{ff} = D_{ff} - L_{ff}$. Some linear algebra then shows that

$$
I - \hat{M}^{-1}A = \begin{pmatrix} I + \Delta A_{fc}\hat{K}^{-1}A_{cf} & 0 \\ -\hat{K}^{-1}A_{cf} & 0 \end{pmatrix}\begin{pmatrix} \left(D_{ff}^{-1}L_{ff}\right)^{k+1} & 0 \\ 0 & 0 \end{pmatrix}. \tag{19}
$$

Note that here we have only assumed $\hat{K}$ to be nonsingular. An interesting result of (19) is that the error after each iteration only depends on the previous error at F-points; that is, at each iteration, C-point error is eliminated and then reacquired based on error at F-points. In the case of lower-triangular $\hat{K}$, (18) can act as a preconditioner with very similar properties to AMGir. Although we do not numerically explore preconditioning with $\hat{M}$ here, its theoretical similarity to AMGir is worth demonstrating for future work.
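A small numerical sketch (ours) of the block preconditioner (18), taking the Schur approximation $\hat{K} = A_{cc} - A_{cf}\Delta A_{fc}$ as an assumption of the sketch, with the matrix and CF-splitting chosen arbitrarily:

```python
import numpy as np

# Sketch of the block preconditioner (18) with the Schur approximation
# K_hat = A_cc - A_cf Delta A_fc (an assumption of this sketch): the error
# propagator I - M_hat^{-1} A has zero C-point columns and, for
# lower-triangular A, is nilpotent.
rng = np.random.default_rng(5)
n, k = 9, 1
A = np.tril(rng.standard_normal((n, n))) + 4.0 * np.eye(n)
C = np.array([2, 5, 8])
F = np.array([i for i in range(n) if i not in C])
nf, nc = len(F), len(C)
A_ff, A_fc = A[np.ix_(F, F)], A[np.ix_(F, C)]
A_cf, A_cc = A[np.ix_(C, F)], A[np.ix_(C, C)]
A_perm = np.block([[A_ff, A_fc], [A_cf, A_cc]])

D_inv = np.diag(1.0 / np.diag(A_ff))
L = np.diag(np.diag(A_ff)) - A_ff
Delta = sum(np.linalg.matrix_power(D_inv @ L, j) for j in range(k + 1)) @ D_inv
K_hat = A_cc - A_cf @ Delta @ A_fc

M_hat = (np.block([[np.eye(nf), np.zeros((nf, nc))],
                   [A_cf @ Delta, np.eye(nc)]])
         @ np.block([[np.linalg.inv(Delta), np.zeros((nf, nc))],
                     [np.zeros((nc, nf)), K_hat]])
         @ np.block([[np.eye(nf), Delta @ A_fc],
                     [np.zeros((nc, nf)), np.eye(nc)]]))

E = np.eye(n) - np.linalg.solve(M_hat, A_perm)
assert np.allclose(E[:, nf:], 0.0)                     # C-point error eliminated
assert np.allclose(np.linalg.matrix_power(E, n), 0.0)  # nilpotent
```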

###### Remark 1

An infinite Neumann expansion (11) can be used to approximate $A_{ff}^{-1}$ for general non-triangular matrices, and algorithms similar to (16) and (19) developed. However, in the case of triangular matrices, an order-$k$ Neumann approximation to $A_{ff}^{-1}$ is exact on the diagonal and first $k$ sub-diagonals. For general matrices, such accuracy cannot be guaranteed. Even in the case of symmetric positive definite matrices, where off-diagonal entries of the inverse decay exponentially fast for well-conditioned $A$ [12], a truncated Neumann expansion does not necessarily provide a good approximation to $A_{ff}^{-1}$ (or its diagonal, for that matter). The application of AMGir to matrices with non-triangular components is ongoing work.
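The claimed accuracy of the truncated Neumann expansion for triangular matrices is easy to confirm numerically (a sketch of ours, with arbitrary sizes):

```python
import numpy as np

# Illustration of Remark 1 (sketch, arbitrary sizes): for a lower-triangular
# matrix, the order-k Neumann expansion of inv(A_ff) is exact on the
# diagonal and the first k sub-diagonals.
rng = np.random.default_rng(6)
nf, k = 8, 2
A_ff = np.tril(rng.standard_normal((nf, nf)), k=-1) + 2.0 * np.eye(nf)
D_inv = np.diag(1.0 / np.diag(A_ff))
L = np.diag(np.diag(A_ff)) - A_ff
Delta = sum(np.linalg.matrix_power(D_inv @ L, j) for j in range(k + 1)) @ D_inv

err = np.linalg.inv(A_ff) - Delta
band = np.triu(np.tril(err), -k)  # diagonal and first k sub-diagonals of the error
assert np.allclose(band, 0.0)
```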

## 4 Algorithm

Algorithmically, AMGir takes on the traditional structure of an AMG method. Building on the theoretical motivation provided in Section