Reduction-based Algebraic Multigrid for Upwind Discretizations†

†This work was performed under the auspices of the U.S. Department of Defense through the National Defense Science & Engineering Graduate Fellowship Program, the U.S. Department of Energy under grant numbers (SC) DE-FC02-03ER25574 and (NNSA) DE-NA0002376, and Lawrence Livermore National Laboratory under contract B614452.
Abstract
Algebraic multigrid (AMG) is a go-to solver for symmetric positive definite (SPD) linear systems resulting from the discretization of elliptic PDEs, or the spatial discretization of parabolic PDEs. For diffusion-like problems, the time to solution typically scales linearly with the number of unknowns. However, convergence theory and most variations of AMG rely on the matrix being symmetric positive definite. Hyperbolic PDEs, which arise often in large-scale scientific simulations, remain a challenge for AMG, as well as for other fast linear solvers, in part because the resulting linear systems are often highly nonsymmetric. Here, a new reduction-based AMG method is developed for upwind discretizations of hyperbolic PDEs and other matrices with triangular structure, based on the concept of “ideal restriction” (AMGir). Theory is presented motivating the reduction-based approach and connecting it, in a big-picture sense, to other nonsymmetric AMG methods. The model problems considered are variations of a steady-state transport equation, with source terms, discontinuities, and non-constant flow. AMGir is shown to be an effective and scalable solver for various discontinuous, upwind discretizations, with unstructured meshes, and up to 6th-order finite elements, expanding the applicability of AMG to a new class of problems. AMGir is implemented in PyAMG [8] and available at https://github.com/benssouthworth/pyamg/tree/air.
\newsiamremark{claim}{Claim} \newsiamremark{remark}{Remark} \newsiamremark{expl}{Example} \newsiamremark{conjecture}{Conjecture} \headers{Reduction-based AMG for Upwind Discretizations}{Southworth et al.}
1 Introduction
Algebraic multigrid (AMG) is among the fastest classes of algorithms for solving linear systems that result from the discretization of partial differential equations (PDEs) of elliptic type [10]. When applicable, AMG converges in linear complexity with the number of degrees-of-freedom (DOFs), and scales in parallel like $\mathcal{O}(\log n)$, up to hundreds of thousands of processors [7]. Originally, AMG was designed for symmetric positive definite (SPD) linear systems, and performs best when applied to the discretization of elliptic PDEs, or the spatial discretization of a parabolic PDE in time. Although this constitutes a large class of problems that arise in scientific simulations, many problems of interest remain that are difficult for AMG, or any other “fast” solver, to solve. Because of this, solutions to such problems are generally obtained through Krylov methods or direct solves, each of which is more computationally expensive than desired. Efforts have been made to develop more robust AMG solvers applicable to difficult SPD systems such as anisotropic diffusion [33, 51], and nonsymmetric matrix equations [13, 33, 38, 50, 56], among others. Despite improvements in the state-of-the-art, one class of problems that comes up in scientific computation, but lacks robust algebraic solvers, is upwind discretizations of hyperbolic PDEs.
In contrast to elliptic and parabolic PDEs, the solution of hyperbolic PDEs lies on characteristic curves, and the solution at any point depends only on the solution upwind along its given characteristic. This allows for very steep gradients or “fronts” to propagate through the domain in the direction of characteristics. Typical continuous finite-element or finite-difference discretizations often struggle to capture such behavior because fronts are effectively discontinuities in the discrete setting, and not necessarily grid-aligned. It is worth noting that recent work introducing a functional scaling for a continuous least-squares finite-element discretization suggests that continuous discretizations can effectively capture fronts in the steady-state transport equation [32]. Nevertheless, due to the discontinuous-like behavior and the flow of information in a single direction along characteristics, discontinuous upwind discretizations are a natural approach to discretizing many hyperbolic PDEs [14, 35, 36, 45]. For a fully upwinded discretization, the resulting matrix has a block-triangular structure in some (although potentially unknown) ordering. Although solving a block-triangular matrix is an easy task in the serial setting, current approaches in parallel are limited in scalability.
Solving triangular systems in a traditional context using a forward or backward solve is a strictly sequential operation and, thus, does not scale well in parallel. In the dense matrix setting, this offers limited opportunity for parallelism. However, for sparse matrices, scheduling algorithms have been developed that can add some level of parallelism to this process [3, 5, 28, 29]. In a sparse triangular matrix, there exists at least one DOF, say $u_0$, that is independent of all other DOFs and can be eliminated from the system. Then, by nature of a sparse matrix, we expect multiple DOFs to depend only on $u_0$, and these can then be computed in parallel. In a discretization sense, these would be DOFs immediately downwind from $u_0$. Once new DOFs have been computed, DOFs downwind of them can be computed, and so on until the solution is obtained. This algorithm requires only $\mathcal{O}(nnz)$ floating point operations for a sparse matrix with $nnz$ nonzero entries, but the parallelism depends on the structure of the matrix. Much of the research on increased parallel performance for such problems is focused on shared memory environments such as GPUs [28, 29].
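The level-scheduling idea described above can be sketched concretely. The following is a minimal serial model of the approach (function and variable names are ours, not from any of the cited scheduling papers): rows are grouped into “levels” by the length of their longest dependency chain, and all rows within a level are mutually independent, so a parallel implementation could process each level's rows concurrently.

```python
import numpy as np
from scipy.sparse import csr_matrix

def level_scheduled_solve(L, b):
    """Solve L x = b for a sparse lower-triangular L with nonzero diagonal.

    Rows are grouped into "levels": level 0 holds rows that depend on no
    other unknown, and level k holds rows whose dependencies all lie in
    levels < k. Rows within a level are independent; a parallel solver
    would map the inner loop over `rows` to threads.
    """
    n = L.shape[0]
    indptr, indices, data = L.indptr, L.indices, L.data
    # Level of each row = longest dependency chain ending at that row.
    level = np.zeros(n, dtype=int)
    for i in range(n):
        for jj in range(indptr[i], indptr[i + 1]):
            j = indices[jj]
            if j != i:
                level[i] = max(level[i], level[j] + 1)
    x = np.zeros(n)
    for lev in range(level.max() + 1):
        rows = np.where(level == lev)[0]
        for i in rows:  # independent rows: parallelizable in principle
            s, diag = 0.0, 1.0
            for jj in range(indptr[i], indptr[i + 1]):
                j = indices[jj]
                if j == i:
                    diag = data[jj]
                else:
                    s += data[jj] * x[j]
            x[i] = (b[i] - s) / diag
    return x
```

As the text notes, the achievable parallelism is entirely determined by the level structure of the matrix: a bidiagonal matrix has one row per level (no parallelism), while a well-branched dependency graph has many.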
One problem and discretization that requires many parallel sparse triangular solves is the so-called “diffusion synthetic acceleration” (DSA) approach to solving the linear Boltzmann transport equation (so-called “synthetic acceleration” is simply preconditioning) [1, 16, 26, 45]. The full linear problem is seven-dimensional, necessitating highly parallel solvers for even moderately refined grids, and is fundamental to studying neutron and radiation transport. DSA is effectively a two-level multigrid algorithm. A “transport sweep” acts as a surrogate for relaxation, and consists of discretizing in angle and solving a steady-state transport equation over all angles in the spatial domain. An accurate transport sweep puts error in the range of a surrogate “coarse-grid” diffusion solve, and the process iterates back and forth [1]. AMG is a fast and scalable solver for the diffusion solve, but the transport sweep consists of solving many discretizations of a hyperbolic PDE, often based on an upwind discretization [14, 35, 36, 45], and is difficult for most iterative solvers [52].
Similar scheduling algorithms to those used for sparse triangular solves on GPUs have been developed in distributed memory environments to perform “transport sweeps” [2, 6, 16, 26]. On a structured mesh, these algorithms were shown to have an optimal theoretical scaling over all angles of $\mathcal{O}(P^{1/d})$, for $P$ processors and a problem of dimension $d$ [2]. Although this is reasonable scaling in parallel, it is suboptimal on the so-called “road to exascale,” as a steadily increasing number of processors become available. On irregular meshes, scheduling for the transport sweep becomes more difficult and this theoretical bound cannot be achieved. The sweeping algorithm has been shown to be a bottleneck in large simulation codes [44]. On the other hand, AMG scales like $\mathcal{O}(\log P)$ in parallel [7] and $\mathcal{O}(n)$ with respect to DOFs, even on irregular meshes. An effective AMG algorithm for triangular systems could prove faster than known scheduling algorithms for transport sweeps in a highly parallel setting. For example, for $d = 3$ and $P = 10^6$, $P^{1/3} = 100$, while $\log P$ is $\mathcal{O}(10)$, yielding a factor of 15 improvement in parallel efficiency. Of course, the leading constants may change this balance in either direction, especially on irregular meshes, but this suggests there is a reasonable $P$ for which AMG will outperform a traditional transport sweep.
Traditional AMG is based on minimization principles in the $A$-norm or energy norm, $\|\mathbf{x}\|_A := \langle A\mathbf{x}, \mathbf{x}\rangle^{1/2}$. If $A$ is not SPD, such a norm is not well-defined, and the bulk of the theory motivating AMG (for example, [10, 22, 55]) no longer holds. Nevertheless, AMG has still been applied to nonsymmetric linear systems. For some nonsymmetric problems, letting restriction be the transpose of interpolation in a traditional AMG sense remains a robust solver [48]. However, at times this can produce coarse-grid matrices that are not amenable to recursive application of the method; for example, the diagonals could change sign or even be zero. Such situations were the original motivation for this work. There have also been efforts to develop AMG methods and underlying theory targeting general nonsymmetric matrices [13, 33, 39, 50, 56]. Nevertheless, determining when each of these approaches is appropriate requires experience and experimentation, which can limit their use and reliability outside of the AMG community.
In this work, we develop a novel, reduction-based AMG method to solve systems with triangular structure, with a particular focus on upwind discretizations of hyperbolic PDEs. Although geometric multigrid has been applied to upwind discretizations using the well-known line-smoother approach [43] and other hyperbolic-type problems [4, 57], an effective AMG algorithm does not need geometric information, a structured grid, or a fixed/known direction over which to relax. For these reasons, an effective AMG solver is much more robust and nonintrusive, in the sense that it can easily be used by software packages, regardless of their meshing, discretization, etc. The model problem considered here is the steady-state transport equation, which comes up in large-scale simulation of neutron and radiation transport [2, 6, 16, 26, 44]. Although this work is focused on algorithm theory and development, the long-term goal is a highly scalable parallel solver for upwind discretizations and other systems with triangular structure.
Results of our new algorithm significantly outperform current state-of-the-art iterative solvers in the context of steady-state transport [52]. Even for high-order, discontinuous discretizations on unstructured grids, our method is able to achieve an order-of-magnitude residual reduction in 1–2 iterations. The method likely extends to other systems with triangular structure such as directed acyclic graphs, other hyperbolic PDEs, or certain full space-time discretizations of hyperbolic PDEs as well, but this is a topic for future study.
Background on AMG, along with theoretical motivation for the new method and its relation to existing algorithms, is discussed in Section 2. The underlying premise is that certain “ideal” operators can lead to a reduction-based AMG method; in particular, we develop an AMG algorithm based on approximating ideal restriction (AMGir). Section 3 introduces a natural way to approximate ideal operators for matrices with triangular structure, as well as theoretical results on the corresponding multigrid error propagation. The algorithm and possible variations are presented in Section 4, and numerical results and scaling studies in Section 5. Finally, conclusions and future directions of research are discussed in Section 6.
2 Background and motivation
2.1 Algebraic multigrid
Algebraic multigrid applied to the linear system $A\mathbf{x} = \mathbf{b}$, $A \in \mathbb{R}^{n \times n}$, consists of two processes: relaxation and coarse-grid correction, typically designed to be complementary in the sense that they reduce error associated with different parts of the spectrum. Relaxation takes the form
(1) \[ \mathbf{x} \leftarrow \mathbf{x} + M^{-1}\bigl(\mathbf{b} - A\mathbf{x}\bigr), \]
where $M$ is some approximation to $A$ such that the action of $M^{-1}$ can be easily computed. For example, $M$ may be the diagonal of $A$ (Jacobi) or the upper or lower triangular block of $A$ (Gauss-Seidel). Error associated with large eigenvalues of $A$, generally corresponding to geometrically high-frequency error that cannot be well-represented on a coarse grid, is targeted through relaxation. Coarse-grid correction is then built to reduce error not reduced by relaxation, typically algebraically smooth error, that is, modes associated with small eigenvalues. Coarse-grid correction takes the form
(2) \[ \mathbf{x} \leftarrow \mathbf{x} + P\,(RAP)^{-1}R\bigl(\mathbf{b} - A\mathbf{x}\bigr), \]
where $P$ and $R$ are interpolation and restriction operators, respectively, between the current grid and the next coarser grid in the AMG hierarchy, with coarse-grid operator $RAP$.
A two-level cycle is then given by combining the coarse-grid correction in (2) with pre- and post-relaxation steps as in (1), resulting in a two-grid error-propagation operator of the form
(3) \[ E_{TG} = \bigl(I - M^{-1}A\bigr)^{\nu_2}\bigl(I - P(RAP)^{-1}RA\bigr)\bigl(I - M^{-1}A\bigr)^{\nu_1}, \]
where $\nu_1$ and $\nu_2$ denote the number of pre- and post-relaxation sweeps.
The goal in building an AMG solver is to bound $\|E_{TG}\|$ in some norm, such that AMG iterations are guaranteed to converge quickly. For SPD problems, $\|\mathbf{x}\|_A := \langle A\mathbf{x}, \mathbf{x}\rangle^{1/2}$ (with $\langle \cdot, \cdot \rangle$ denoting the Euclidean inner product) defines a norm, referred to as the $A$-norm or energy norm. Typically, restriction is then defined as $R = P^T$, wherein the coarse-grid correction is exactly an orthogonal projection onto $\operatorname{Range}(P)$, the range of $P$. For the nonsymmetric systems considered here, $\langle A\mathbf{x}, \mathbf{x}\rangle$ no longer defines a valid norm. Various efforts have been made to develop a framework for convergence of nonsymmetric problems [13, 39], but choosing $R$ such that coarse-grid correction is an orthogonal projection (in some inner product) is often not feasible in practice. Thus, a restriction operator must instead be chosen such that the oblique projection, $P(RAP)^{-1}RA$, has desirable properties for AMG convergence. Here, the concept of “ideal restriction” is theoretically motivated in Section 2.2, followed by a practical approximation in Section 3 and an algorithm outline in Section 4.
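As a concrete illustration of (1)-(3), a minimal dense two-grid cycle, using weighted Jacobi relaxation and an exact coarse-grid solve, might look as follows (a generic sketch, not code from PyAMG; the damping factor 2/3 is a conventional choice for model problems):

```python
import numpy as np

def two_grid_cycle(A, R, P, x, b, nu1=1, nu2=1, omega=2.0 / 3.0):
    """One two-grid cycle: nu1 weighted-Jacobi pre-relaxations,
    coarse-grid correction with coarse operator R A P, then nu2
    post-relaxations, mirroring Eqs. (1)-(2)."""
    Dinv = omega * np.diag(1.0 / np.diag(A))      # M^{-1} for weighted Jacobi
    for _ in range(nu1):
        x = x + Dinv @ (b - A @ x)                # pre-relaxation, Eq. (1)
    K = R @ A @ P                                 # coarse-grid operator
    x = x + P @ np.linalg.solve(K, R @ (b - A @ x))   # correction, Eq. (2)
    for _ in range(nu2):
        x = x + Dinv @ (b - A @ x)                # post-relaxation
    return x
```

Iterating this cycle on a 1D Poisson matrix with standard linear interpolation and $R = P^T$ converges rapidly, consistent with the SPD theory referenced above.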
The computational cost or complexity of an AMG algorithm is typically measured in work units (WU), where one WU is the cost to perform a sparse matrix-vector multiply on the finest level in the AMG hierarchy. This motivates two additional complexity measures in AMG, operator complexity (OC) and cycle complexity (CC). Operator complexity gives the cost in WUs to perform a sparse matrix-vector multiply on each level in the hierarchy, while CC gives the cost in WUs to perform one AMG iteration, including pre- and post-relaxation, computing and restricting the residual, and coarse-grid correction. Let $A_\ell$, $R_\ell$, and $P_\ell$ be operators on the $\ell$th level of the AMG hierarchy, and let $nnz(\cdot)$ denote the number of nonzeros in a matrix. Then

\[ \text{OC} = \frac{\sum_{\ell} nnz(A_\ell)}{nnz(A_0)}, \qquad \text{CC} = \frac{\sum_{\ell}\bigl[(\nu_1 + \nu_2 + 1)\,nnz(A_\ell) + nnz(R_\ell) + nnz(P_\ell)\bigr]}{nnz(A_0)}, \]

where $\nu_1$ and $\nu_2$ are the number of pre- and post-relaxations, respectively, and $\ell = 0$ is the finest level in the hierarchy. Such measures lead to two additional objective measures of AMG performance, the effective convergence factor (ECF), $\rho_{\text{eff}}$, which measures the residual reduction factor for the equivalent of one WU, and work-per-digit-of-accuracy, WPD, which measures the WUs necessary to achieve an order-of-magnitude reduction in the residual. For convergence factor $\rho$ and cycle complexity CC,

\[ \rho_{\text{eff}} = \rho^{1/\text{CC}}, \qquad \text{WPD} = \frac{-\text{CC}}{\log_{10}\rho}. \]
We use these objective measures to motivate parts of the AMGir algorithm, and to consider the performance and scaling of AMGir in Section 5. Although these are good measures of serial performance, they do not reflect parallel efficiency of the algorithm.
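These measures are straightforward to compute from an assembled hierarchy. A small sketch, where the per-level nonzero counts are made-up illustrative values, and the cycle-cost accounting assumes $\nu_1 + \nu_2$ relaxations plus one residual computation per level and one application each of $R_\ell$ and $P_\ell$:

```python
import numpy as np

def complexity_measures(hierarchy_nnz, nu1=1, nu2=1, rho=0.1):
    """Illustrative AMG cost measures.

    hierarchy_nnz: list of (nnz_A, nnz_R, nnz_P) per level, finest first.
    One work unit (WU) = one matvec with the finest-level operator.
    """
    nnz0 = hierarchy_nnz[0][0]
    # Operator complexity: cost of one matvec on every level, in WUs.
    oc = sum(a for a, _, _ in hierarchy_nnz) / nnz0
    # Cycle complexity: relaxations + residual + restriction + interpolation.
    cc = sum((nu1 + nu2 + 1) * a + r + p for a, r, p in hierarchy_nnz) / nnz0
    ecf = rho ** (1.0 / cc)            # effective convergence factor
    wpd = -cc / np.log10(rho)          # work per digit of accuracy
    return oc, cc, ecf, wpd
```

Note that for $\rho = 0.1$ (one digit of accuracy per iteration), WPD reduces to CC, as expected.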
2.2 Ideal restriction
In a classical AMG sense, suppose that a CF-splitting has been performed on the current grid, that is, all degrees-of-freedom (DOFs) have been designated as either a coarse point (C-point) or a fine point (F-point). C-points each correspond to a DOF on the next coarser grid in the AMG hierarchy, and are interpolated and restricted between grids by injection (i.e., by value) [10]. Let $n_f$ denote the number of F-points and $n_c$ the number of C-points, and, for example, let $A_{ff}$ denote the block of F-to-F connections in $A$. Then, operators $A$, $P$, and $R$ can be written in the following block forms:
(4) \[ A = \begin{pmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{pmatrix}, \qquad P = \begin{pmatrix} W \\ I \end{pmatrix}, \qquad R = \begin{pmatrix} Z & I \end{pmatrix}, \]
where $W$ interpolates to F-points on the fine grid via linear combinations of coarse-grid DOFs, and $Z$ restricts F-point residuals. Note that (4) implicitly assumes the same CF-splitting for $R$ and $P$, although the sparsity patterns for the nonzero elements of $W$ and $Z$ may be different. For notation, define the coarse-grid operator and oblique projection onto $\operatorname{Range}(P)$, respectively, as
(5) \[ K := RAP, \qquad \Pi := PK^{-1}RA. \]
Here, $K$ is used to denote the coarse-grid operator instead of the traditional notation $A_c$, to avoid confusion with subscripts denoting C-points.
Lemma 1 (Schur-complement coarse grid)
Let $A$ take the block form in (4) with $A_{ff}$ nonsingular, and define
\[ R_{ideal} := \begin{pmatrix} -A_{cf}A_{ff}^{-1} & I \end{pmatrix}, \qquad P_{ideal} := \begin{pmatrix} -A_{ff}^{-1}A_{fc} \\ I \end{pmatrix}. \]
Then, for any $P$ and $R$ of the form (4), $R_{ideal}AP = RAP_{ideal} = A_{cc} - A_{cf}A_{ff}^{-1}A_{fc}$, the Schur complement of $A$ with respect to $A_{ff}$.
Proof
Suppose $R = R_{ideal}$. Then,
\[ R_{ideal}A = \begin{pmatrix} -A_{cf}A_{ff}^{-1} & I \end{pmatrix}\begin{pmatrix} A_{ff} & A_{fc} \\ A_{cf} & A_{cc} \end{pmatrix} = \begin{pmatrix} 0 & A_{cc} - A_{cf}A_{ff}^{-1}A_{fc} \end{pmatrix}, \]
and
\[ R_{ideal}AP = \begin{pmatrix} 0 & A_{cc} - A_{cf}A_{ff}^{-1}A_{fc} \end{pmatrix}\begin{pmatrix} W \\ I \end{pmatrix} = A_{cc} - A_{cf}A_{ff}^{-1}A_{fc}. \]
Identical steps show the equivalent result for $P_{ideal}$.
The operator $R_{ideal}$ defined in Lemma 1 is referred to here as “ideal restriction,” a natural extension of the well-known “ideal interpolation” operator, $P_{ideal}$, also defined in Lemma 1. Note that if $A$ is symmetric, $R_{ideal} = P_{ideal}^T$. Ideal interpolation is well-motivated under classical AMG theory (for example, see [22]). Here, we show that ideal restriction and ideal interpolation each have significance in the nonsymmetric setting as well. In Theorem 1, ideal restriction is shown to be ideal in a certain sense, albeit not the same sense for which ideal interpolation acquired its name. Thus, consider error propagation for a coarse-grid correction applied to the current error vector, $\mathbf{e} = (\mathbf{e}_f, \mathbf{e}_c)$:
(6) \[ (I - \Pi)\mathbf{e} = \begin{pmatrix} \mathbf{e}_f \\ \mathbf{e}_c \end{pmatrix} - \begin{pmatrix} W \\ I \end{pmatrix}K^{-1}\begin{pmatrix} ZA_{ff} + A_{cf} & ZA_{fc} + A_{cc} \end{pmatrix}\begin{pmatrix} \mathbf{e}_f \\ \mathbf{e}_c \end{pmatrix}. \]
Now, let $\mathbf{e}_f = W\mathbf{e}_c + \delta\mathbf{e}_f$, that is, let $\mathbf{e}_f$ be interpolated from $\mathbf{e}_c$ with some error term $\delta\mathbf{e}_f$, and recall $K = (ZA_{ff} + A_{cf})W + ZA_{fc} + A_{cc}$. Then, plugging into the fine-grid block, and into the coarse-grid block, gives:
(7) \[ (I - \Pi)\mathbf{e} = \begin{pmatrix} \delta\mathbf{e}_f + W\mathbf{v}_c \\ \mathbf{v}_c \end{pmatrix}, \qquad \mathbf{v}_c := -K^{-1}\bigl(ZA_{ff} + A_{cf}\bigr)\delta\mathbf{e}_f. \]
The following theorem follows from (7).
Theorem 1 (Ideal restriction)
For a given CF-splitting, assume that $A_{ff}$ is full rank and let $A$ take the block form as given in (4). Assume that C-points are interpolated and restricted by injection in a classical-AMG sense (4). Then, an exact coarse-grid correction at C-points is attained for all $\mathbf{e}$ if and only if
(8) \[ Z = -A_{cf}A_{ff}^{-1}, \]
that is, $R = R_{ideal}$. Furthermore, the error in the coarse-grid correction is then given by
(9) \[ (I - \Pi)\mathbf{e} = \begin{pmatrix} \delta\mathbf{e}_f \\ 0 \end{pmatrix}, \]
where $\delta\mathbf{e}_f := \mathbf{e}_f - W\mathbf{e}_c$. Finally, a coarse-grid correction using $R_{ideal}$ followed by an exact solve on F-points results in an exact two-grid method, independent of $W$.
Proof
From (7), the error after coarse-grid correction applied to some vector $\mathbf{e}$ and restricted to C-points is given by $-K^{-1}(ZA_{ff} + A_{cf})\delta\mathbf{e}_f$. Noting that there does not exist a $W$ such that $\delta\mathbf{e}_f = 0$ for all vectors $\mathbf{e}$, it follows that this error is zero for all $\mathbf{e}$ if and only if $ZA_{ff} + A_{cf} = 0$. Given that $A_{ff}$ is nonsingular, $ZA_{ff} + A_{cf} = 0$ if and only if $Z = -A_{cf}A_{ff}^{-1}$. The error shown in (9) follows directly from plugging $Z = -A_{cf}A_{ff}^{-1}$ into (7). An exact solve on F-points would eliminate this error, providing an exact two-grid method.
Corollary 1 (Ideal interpolation)
An exact solve on F-points followed by a coarse-grid correction using so-called “ideal interpolation,” $P_{ideal}$, as defined in Lemma 1, gives an exact two-level method, independent of $Z$.
Proof
After an exact solve on F-points, the error satisfies $A_{ff}\mathbf{e}_f + A_{fc}\mathbf{e}_c = 0$, that is, $\mathbf{e}_f = -A_{ff}^{-1}A_{fc}\mathbf{e}_c$ and $\mathbf{e} = P_{ideal}\mathbf{e}_c \in \operatorname{Range}(P_{ideal})$. By Lemma 1, the coarse-grid operator is $K = RAP_{ideal}$ for any $R$ of the form (4), so coarse-grid correction annihilates this error:
\[ \bigl(I - P_{ideal}K^{-1}RA\bigr)P_{ideal}\mathbf{e}_c = P_{ideal}\mathbf{e}_c - P_{ideal}K^{-1}\bigl(RAP_{ideal}\bigr)\mathbf{e}_c = 0, \]
independent of $Z$.
It follows from Theorem 1 and Corollary 1 that $R_{ideal}$ and $P_{ideal}$ each lead to an exact two-level method, independent of the accompanying interpolation and restriction operators, respectively, when coupled with an exact solve on F-points. Such a two-level method is in fact a reduction algorithm, as solving $A\mathbf{x} = \mathbf{b}$ is reduced to solving two smaller systems. Note that the ordering of solving the coarse- and fine-grid problems is fundamental to achieving reduction. That is, to achieve reduction, the F-point solve must follow coarse-grid correction with $R_{ideal}$, while the F-point solve must precede coarse-grid correction with $P_{ideal}$. Background on reduction algorithms and ideal restriction in the multigrid context is given in Sections 2.3 and 3.
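The reduction property is easy to verify numerically. The dense sketch below (names ours; $A_{ff}^{-1}$ is applied exactly only for illustration) performs a coarse-grid correction with ideal restriction followed by an exact F-point solve; the interpolation weights $W$ never enter the final answer, consistent with Theorem 1:

```python
import numpy as np

def ideal_restriction_solve(A, f_idx, c_idx, b):
    """Exact two-level reduction solve: coarse-grid correction with ideal
    restriction R_ideal = [-A_cf A_ff^{-1}, I], followed by an exact solve
    on F-points. Any interpolation F-block W would give the same answer,
    since the F-point solve overwrites the interpolated F values.
    """
    Aff = A[np.ix_(f_idx, f_idx)]
    Afc = A[np.ix_(f_idx, c_idx)]
    Acf = A[np.ix_(c_idx, f_idx)]
    Acc = A[np.ix_(c_idx, c_idx)]
    # Coarse-grid operator: R_ideal A P = Schur complement, for any W (Lemma 1).
    S = Acc - Acf @ np.linalg.solve(Aff, Afc)
    # Restrict the right-hand side with R_ideal and solve for C-point values.
    xc = np.linalg.solve(S, b[c_idx] - Acf @ np.linalg.solve(Aff, b[f_idx]))
    # Exact F-point solve with C-point values held fixed.
    xf = np.linalg.solve(Aff, b[f_idx] - Afc @ xc)
    x = np.zeros(A.shape[0])
    x[f_idx], x[c_idx] = xf, xc
    return x
```

This is exactly block elimination: the C-point values are recovered from the Schur complement, and the F-point solve finishes the reduction.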
A natural corollary of Theorem 1 is that if we define “trivial interpolation” as $P = \begin{pmatrix} 0 \\ I \end{pmatrix}$, then error propagation of coarse-grid correction with $R_{ideal}$ and this $P$ is an orthogonal projection. Similarly, if we define “trivial restriction” as $R = \begin{pmatrix} 0 & I \end{pmatrix}$, then residual propagation of coarse-grid correction with $P_{ideal}$ and this $R$ is an orthogonal projection. This is consistent with the proofs of Theorem 1 and Corollary 1, that is, $R_{ideal}$ is based on eliminating residuals and $P_{ideal}$ on eliminating error. Despite orthogonality, in practice, choosing $W = 0$ for interpolation of F-points does not always provide the best convergence factors. This is discussed in greater detail in Section 4.6.
2.3 Relation to existing AMG methods
The goal of a reduction-based algorithm is simply to take a problem and split its solution into smaller subproblems. One way to enable reduction is through block row-elimination, where $A$ can be written as
(10) \[ A = \begin{pmatrix} I & 0 \\ A_{cf}A_{ff}^{-1} & I \end{pmatrix}\begin{pmatrix} A_{ff} & 0 \\ 0 & S \end{pmatrix}\begin{pmatrix} I & A_{ff}^{-1}A_{fc} \\ 0 & I \end{pmatrix}, \qquad S := A_{cc} - A_{cf}A_{ff}^{-1}A_{fc}. \]
Using (10), solving $A\mathbf{x} = \mathbf{b}$ is reduced to solving two systems of size $n_f$ and $n_c$, the latter of which is the Schur-complement system in $S$. Unfortunately, (10) is generally impractical as shown because of the need to compute the inverse of $A_{ff}$ and to solve the typically dense Schur-complement system. However, such block decompositions have been used to motivate many algorithms, and a new approximation to inverting (10) can be found in Section 3.3.
The goal of this paper is to develop a reduction-based AMG algorithm for highly nonsymmetric linear systems. However, Schur-complement coarse-grid operators and reduction-based multigrid are not new to the multigrid literature. Reduction-based multigrid has been considered in many contexts, starting in the geometric multigrid setting [24, 47]. A high-performance variation of geometric multigrid based on a reduction point of view is the SMG solver in the Hypre library, which uses a sparse approximation to $A_{ff}^{-1}$ that reproduces the action of $A_{ff}^{-1}$ on the constant vector [23]. The SMG solver, however, assumes knowledge of an underlying structured grid, and forms symmetric, Galerkin coarse grids. More recently, an adaptive, two-level reduction-based AMG method was proposed in [11, 30], where an approximation to $A_{ff}^{-1}$ is developed through an adaptive process. This work provided interesting theoretical results; however, they are based in the energy norm, and thus again confined to the SPD setting. As seen in Section 2.2, a reduction-based algorithm is invariably linked to a Schur-complement coarse-grid operator. Other literature exists that is not explicitly marketed as reduction-based, but targets a Schur-complement coarse grid, for example, [15, 31, 34, 37, 38, 49]. Finally, the multigrid-reduction-in-time parallel-in-time algorithm [18, 19, 20] is also based on (10), and in [20, 25] it was shown that the well-known parareal parallel-in-time method [25] is equivalent to a two-level multigrid-reduction-in-time method.
Algebraic multigrid methods have also been developed that target nonsymmetric systems, but the theory is limited and the development of effective solvers is ongoing. The reduction-based decomposition given in (10) was considered in [38] for nonsymmetric M-matrices resulting from convection-diffusion discretizations. An AMG algorithm for nonsymmetric systems was proposed in [56] based on (10), where it is shown that, coupled with an exact solve on F-points, convergence in the spectral sense is bounded based on how well the coarse grid approximates the Schur complement.
Ideal restriction and interpolation are approximated in [56] by performing a constrained minimization over a fixed sparsity pattern for $R$ and $P$, where known near-null-space modes are interpolated exactly, and the remaining degrees of freedom are used to approximate the ideal operators. Such an approach is similar to the constrained minimization approach for nonsymmetric systems used in [33, 42], which approximates the ideal operators of $\sqrt{A^TA}$ and $\sqrt{AA^T}$ for $P$ and $R$, respectively. Although reduction is not achieved for ideal operators based on $\sqrt{A^TA}$ and $\sqrt{AA^T}$, such operators are well-motivated under traditional AMG convergence theory [33]. In fact, in the symmetric setting, approximating ideal operators for $A$ vs. $\sqrt{A^TA}$ through constrained energy minimization produces nearly identical performance [33]. As an extension to the nonsymmetric setting, it is hypothesized that the AMG solvers in [33, 56] would achieve similar performance, although such a comparison has not yet been done. Regardless, the constrained approximation of ideal operators such as in [33, 56] is a broadly applicable solver that relies heavily on constraints for good convergence [33, 53].
This leads to the novelty of this work. Here, we show that matrices with triangular structure are amenable to a reduction-type algorithm (Section 3), and we develop a natural way to approximate ideal operators for triangular matrices. The resulting solver does not rely on constraints for strong convergence, and is based only on an appropriate CF-splitting and the approximation of $A_{ff}^{-1}$, which are discussed in Section 3.
3 Triangular structure and approximate ideal restriction
In Section 2.2, theory was developed to motivate ideal restriction and ideal interpolation for nonsymmetric linear systems. However, ideal operators are typically not formed in practice due to the complexity of forming $A_{ff}^{-1}$. This section shows that for block-triangular matrices, there is a natural way to approximate $A_{ff}^{-1}$. Furthermore, approximating ideal operators in coarse-grid correction, coupled with Jacobi F-relaxation, gives a nilpotent multigrid error-propagation operator, that is, convergence to the exact solution is ultimately guaranteed (Section 3.1). Coupled with an appropriate CF-splitting (Section 3.2), this leads to a fast and practical solver for matrices with triangular structure.
Although direct solves of triangular matrices do not scale well in parallel, direct solve considerations suggest that a triangular matrix is amenable to a reduction-based algorithm, because each step in a forward or backward solve is effectively reduction by eliminating one DOF. Thus, the goal here is to develop a reduction-based solver for triangular systems, independent of matrix ordering, based on the concept of “approximate ideal restriction” (AIR). Moving forward, assume that $A$ is lower triangular with unit diagonal in some ordering. For theoretical purposes, we assume that $A$ is actually ordered to be lower triangular, but it is important to note that the results presented here are independent of this ordering; it is only used to simplify proofs. An in-depth discussion on the handling of block-triangular matrices is given in Section 4.1.
Let $A_{ff} = I - N$, where $-N$ is the strictly lower triangular part of $A_{ff}$, and $N$ is, thus, nilpotent. Hence, $A_{ff}^{-1}$ can be written as a finite Neumann expansion:
(11) \[ A_{ff}^{-1} = (I - N)^{-1} = \sum_{j=0}^{n_f - 1} N^j. \]
An order-$k$ approximation to $A_{ff}^{-1}$ is then given by truncating (11): $\hat{A}_{ff}^{-1} := \sum_{j=0}^{k} N^j$, for some $k \geq 0$. The error in $\hat{A}_{ff}^{-1}$ can be measured as $A_{ff}^{-1} - \hat{A}_{ff}^{-1} = \sum_{j=k+1}^{n_f - 1} N^j$. Noting that $R_{ideal}A = \begin{pmatrix} 0 & S \end{pmatrix}$, let $\hat{R} := \begin{pmatrix} -A_{cf}\hat{A}_{ff}^{-1} & I \end{pmatrix}$ and consider its action on $A$:
\[ \hat{R}A = \begin{pmatrix} A_{cf} - A_{cf}\hat{A}_{ff}^{-1}A_{ff} & A_{cc} - A_{cf}\hat{A}_{ff}^{-1}A_{fc} \end{pmatrix} = \begin{pmatrix} A_{cf}N^{k+1} & A_{cc} - A_{cf}\hat{A}_{ff}^{-1}A_{fc} \end{pmatrix}, \]
where we have used $\hat{A}_{ff}^{-1}A_{ff} = \bigl(\sum_{j=0}^{k}N^j\bigr)(I - N) = I - N^{k+1}$.
Since $N$ is nilpotent, approximating ideal restriction via a truncated Neumann expansion of $A_{ff}^{-1}$ can be seen as equivalent to approximating the action of $R_{ideal}$ on $A$, namely, trying to set $\hat{R}A$ equal to zero within F-point columns. For certain structured matrices, the $k$th-order Neumann expansion exactly eliminates the contribution of F-points within graph distance $k+1$ of a given C-point (row of $\hat{R}$). In fact, this was the original motivation for AIR: eliminating the contribution of error at F-points to the coarse-grid right-hand side.
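The truncation error $A_{cf}N^{k+1}$ can be observed directly. A dense NumPy sketch (function and variable names are ours), assuming $A$ has unit diagonal and is permuted to (F, C) block ordering:

```python
import numpy as np

def neumann_restriction(A_perm, nf, k):
    """Build R = [-A_cf (sum_{j=0}^k N^j), I] for A_perm in (F, C) block
    ordering with unit diagonal, where A_ff = I - N and N is strictly
    lower triangular (hence nilpotent). Also returns N^{k+1}, which
    controls the error in the F-point columns of R A.
    """
    Aff = A_perm[:nf, :nf]
    Acf = A_perm[nf:, :nf]
    nc = A_perm.shape[0] - nf
    N = np.eye(nf) - Aff
    Zhat = np.zeros((nf, nf))
    Npow = np.eye(nf)
    for _ in range(k + 1):          # truncated Neumann series
        Zhat += Npow
        Npow = Npow @ N             # loop ends with Npow = N^{k+1}
    R = np.hstack([-Acf @ Zhat, np.eye(nc)])
    # F-point columns of R A equal A_cf N^{k+1}; they vanish once N^{k+1} = 0.
    return R, Npow
```

Once $k + 1$ reaches the nilpotency index of $N$, the F-point columns of $\hat{R}A$ are exactly zero and $\hat{R}$ coincides with $R_{ideal}$.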
Numerical results presented in Section 5 are based on two properties of the AMGir error-propagation operator. In a reduction context, the error-propagation operator in the multilevel setting is nilpotent, ensuring an (eventually) exact method (Section 3.1). In addition to the nilpotency, an appropriate CF-splitting such that off-diagonal elements of $A_{ff}$ are small additionally ensures effective error reduction at indices that are not “reduced” in a given iteration (Section 3.2). The latter is likely the primary reason for strong convergence, although both are important.
3.1 Nilpotency of error propagation
Lemma 2 and Corollary 2 below are presented as important pieces in motivating AMGir. Corollary 2 is fundamental to a multilevel implementation of AMGir, showing that as long as $A_{ff}^{-1}$ in ideal restriction and interpolation is approximated with a lower triangular matrix, the coarse-grid operator retains the same triangular structure as the fine grid. Thus, the same algorithms for forming $R$ and $P$ are appropriate in a recursive and multilevel fashion. For given ideal restriction and interpolation approximations, coupled with Jacobi relaxation on F-points, Lemma 3 shows that the two-grid error-propagation operator is nilpotent. Theorem 2 is the primary convergence result of this section, showing that a full multilevel implementation has a nilpotent error-propagation operator. A trivial corollary is that convergence to the exact solution is guaranteed within $n$ iterations for a system of $n$ DOFs. Although a bound of $n$ iterations is far removed from the typical linear or near-linear complexity of AMG, such a bound is at least of theoretical interest.
Lemma 2
Let $A$ be lower triangular and suppose the DOFs of $A$ have been partitioned into C-points and F-points. Then, if $\hat{A}_{ff}^{-1}$ is lower triangular, $A_{cf}\hat{A}_{ff}^{-1}A_{fc}$ is nilpotent and strictly lower triangular. The same holds for $A_{cf}BA_{fc}$, where $B$ is any lower triangular matrix over the F-point indices.
Proof
Let C-points be given by $\{c_i\}$, where $c_i$ denotes the global index of the $i$th C-point, and likewise F-points by $\{f_i\}$. Furthermore, assume $\{c_i\}$ and $\{f_i\}$ are in increasing order, that is, $c_i < c_{i+1}$ and $f_i < f_{i+1}$, and that global indices of $A$ are ordered such that $A$ is lower triangular, that is, $A_{ij} = 0$ if $j > i$. In terms of paths in the graph of $A$, this means that from a given node $j$, there only exist paths to nodes $i \geq j$.

For $A_{cf}BA_{fc}$ to be strictly lower triangular, we must have $(A_{cf}BA_{fc})_{ij} = 0$ for all $i \leq j$. This can be shown by considering paths between the $j$th and $i$th C-points in the graph of $A_{cf}BA_{fc}$.

Starting at $c_j$, $A_{fc}$ can only contain paths from $c_j$ to F-points $f_k > c_j$.

Then, $B$ can only contain paths from $f_k$ to F-points $f_l \geq f_k$.

Finally, $A_{cf}$ maps from F-points $f_l$ to C-points $c_i > f_l$.

Thus, the only possible paths in the graph of $A_{cf}BA_{fc}$ are from points $c_j$ to $c_i > c_j$. It follows that if $c_i \leq c_j$, or, equivalently, $i \leq j$, then $(A_{cf}BA_{fc})_{ij} = 0$.
Corollary 2
Let $A$ be lower triangular, let $\hat{Z}_R$ and $\hat{Z}_P$ be lower triangular approximations to $A_{ff}^{-1}$, and define $R = \begin{pmatrix} -A_{cf}\hat{Z}_R & I \end{pmatrix}$ and $P = \begin{pmatrix} -\hat{Z}_PA_{fc} \\ I \end{pmatrix}$. Then, the coarse-grid operator $K = RAP$ (5) is lower triangular with diagonal given by that of $A_{cc}$.
Proof
First note that
\[ K = RAP = A_{cc} - A_{cf}\hat{Z}_RA_{fc} - A_{cf}\hat{Z}_PA_{fc} + A_{cf}\hat{Z}_RA_{ff}\hat{Z}_PA_{fc}. \]
We know that $A_{cc}$ is lower triangular. It follows from Lemma 2 that each of the remaining terms is strictly lower triangular, because $\hat{Z}_R$, $\hat{Z}_P$, and $\hat{Z}_RA_{ff}\hat{Z}_P$ are lower triangular. Thus, $K$ is lower triangular with the same diagonal as $A_{cc}$.
Lemma 3 (Nilpotent twogrid AMG)
Let $A$ be lower triangular in some ordering, let $\hat{Z}_R$ and $\hat{Z}_P$ be lower triangular approximations to $A_{ff}^{-1}$, and define $R = \begin{pmatrix} -A_{cf}\hat{Z}_R & I \end{pmatrix}$ and $P = \begin{pmatrix} -\hat{Z}_PA_{fc} \\ I \end{pmatrix}$. Then, the error-propagation operator of the two-grid AMG method with coarse-grid correction and Jacobi F-relaxation is strictly lower triangular, and thus nilpotent.
Proof
Define C-points as the set $\{c_i\}$, where $c_i$ is the global index of the $i$th C-point, and likewise F-points $\{f_i\}$. Furthermore, let $\{c_i\}$ and $\{f_i\}$ be in increasing order, that is, $c_i < c_{i+1}$ and $f_i < f_{i+1}$, and let global indices of $A$ be ordered such that $A$ is lower triangular, that is, $A_{ij} = 0$ if $j > i$. Then, for a given node $j$ in the graph of $A$, there only exist paths to nodes $i \geq j$. It follows that the off-diagonal block $A_{fc}$ only maps from $c_j$ to F-points $f_k > c_j$, and similarly for $A_{cf}$.
The two-grid AMG error-propagation operator with relaxation scheme $I - M^{-1}A$ looks like $(I - M^{-1}A)(I - PK^{-1}RA)$. First consider the coarse-grid correction term:
(12) \[ I - PK^{-1}RA = \begin{pmatrix} I + \hat{Z}_PA_{fc}K^{-1}A_{cf}\bigl(I - \hat{Z}_RA_{ff}\bigr) & \hat{Z}_PA_{fc}K^{-1}\bigl(A_{cc} - A_{cf}\hat{Z}_RA_{fc}\bigr) \\ -K^{-1}A_{cf}\bigl(I - \hat{Z}_RA_{ff}\bigr) & I - K^{-1}\bigl(A_{cc} - A_{cf}\hat{Z}_RA_{fc}\bigr) \end{pmatrix}. \]
Note that, in the global ordering, $A_{cf}$ and $A_{fc}$ are strictly lower triangular, and $K^{-1}$, $\hat{Z}_R$, and $\hat{Z}_P$ are lower triangular. Then, the action of the lower-left block in (12) can be seen as three steps, mapping $f_j \mapsto f_k \mapsto c_l \mapsto c_i$, where $c_i \geq c_l > f_k \geq f_j$. A similar result holds for the upper-right block, showing that, in the global ordering, the off-diagonal blocks of (12) are strictly lower triangular. For the lower-right block, it follows from Corollary 2 and Lemma 2 that $K^{-1}(A_{cc} - A_{cf}\hat{Z}_RA_{fc})$ is lower triangular with unit diagonal. Subtracting from the identity gives a strictly lower triangular block. Finally, given that the product of a strictly lower triangular matrix and a lower triangular matrix is strictly lower triangular, it follows from Lemma 2 that $\hat{Z}_PA_{fc}K^{-1}A_{cf}(I - \hat{Z}_RA_{ff})$ is strictly lower triangular. Then, the upper-left block is lower triangular with unit diagonal.
Thus, the error-propagation operator for coarse-grid correction is lower triangular with unit diagonal in the FF-block, and strictly lower triangular in the other blocks. Furthermore, the error-propagation matrix for Jacobi relaxation on F-points,
\[ \begin{pmatrix} I - D_{ff}^{-1}A_{ff} & -D_{ff}^{-1}A_{fc} \\ 0 & I \end{pmatrix}, \]
is strictly lower triangular in the FF-block and lower triangular elsewhere. It follows that the AIR two-grid error-propagation operator given by the product of F-Jacobi relaxation and coarse-grid correction is a strictly lower triangular and, thus, nilpotent, operator.
Theorem 2 (Nilpotent AMG)
Let $A$ be lower triangular in some ordering, let $\hat{Z}_R$ and $\hat{Z}_P$ be lower triangular approximations to $A_{ff}^{-1}$ on each level, and define $R = \begin{pmatrix} -A_{cf}\hat{Z}_R & I \end{pmatrix}$ and $P = \begin{pmatrix} -\hat{Z}_PA_{fc} \\ I \end{pmatrix}$ on each level. Then, the error-propagation operator of the multilevel AMG method with coarse-grid correction and Jacobi F-relaxation is strictly lower triangular, and thus nilpotent.
Proof
In proving nilpotent error propagation for a multilevel algorithm, we proceed in a recursive fashion. Consider the error propagation of coarse-grid correction with an inexact coarse-grid solve. Error propagation of coarse-grid correction takes the form $\mathbf{e} \mapsto \mathbf{e} - P\mathbf{x}_c$, for some coarse-grid correction $\mathbf{x}_c$. In the case of an exact coarse-grid solve, $\mathbf{x}_c = K^{-1}RA\mathbf{e}$ and error propagation is given by $I - PK^{-1}RA$. For an inexact coarse-grid solve, the correction can be written as some perturbation of the exact solve, $\tilde{\mathbf{x}}_c = \mathbf{x}_c - \boldsymbol{\delta}_c$, where the error in the correction is given by $\boldsymbol{\delta}_c$. The error propagation of the inexact coarse-grid solve, that is, the coarse-grid V-cycle, then operates on the coarse-grid error, that is, $\boldsymbol{\delta}_c = E_c(\mathbf{x}_c - \mathbf{x}_c^{(0)})$, for some initial guess $\mathbf{x}_c^{(0)}$ and coarse-grid error propagator $E_c$. Assuming a zero initial guess on the coarse-grid problem, as is standard in AMG, then $\boldsymbol{\delta}_c = E_cK^{-1}RA\mathbf{e}$, and error propagation of the inexact coarse-grid correction is given by
\[ \bigl(I - PK^{-1}RA\bigr) + PE_cK^{-1}RA. \]
Lemma 3 showed that $I - PA_c^{-1}RA$ is strictly lower triangular when coupled with Jacobi F-relaxation. Now, we must show that the error term $P\mathcal{E}_cA_c^{-1}RA$ is also strictly lower triangular. Given that $P$ is lower triangular, it is sufficient to show that $\mathcal{E}_cA_c^{-1}RA$ is strictly lower triangular. Expanding, we get
(13) $\mathcal{E}_cA_c^{-1}RA = \mathcal{E}_cA_c^{-1}\begin{bmatrix} A_{cf} - ZA_{ff} & A_{cc} - ZA_{fc} \end{bmatrix}.$
An analogous result to Lemma 3 confirms that if $\mathcal{E}_c$ is strictly lower triangular, then so is (13).
Let $\mathcal{E}_\ell$ be the error-propagation operator for a V-cycle starting on the $\ell$th level of the hierarchy. On the coarsest level of the hierarchy, say level $L$, $\mathcal{E}_L = 0$, because the solve is exact. Then, on level $L-1$, the error term $P\mathcal{E}_LA_c^{-1}RA$ vanishes and, by Lemma 3, $\mathcal{E}_{L-1}$ is strictly lower triangular. It follows that the error-propagation operator on level $L-2$, $\mathcal{E}_{L-2}$, is strictly lower triangular. On level $L-3$, $\mathcal{E}_{L-3}$ is again strictly lower triangular, because $\mathcal{E}_{L-2}$ is strictly lower triangular and $P$ lower triangular. By induction, the error-propagation operator for a multilevel hierarchy with $L$ levels is strictly lower triangular and, thus, nilpotent.
For triangular systems, it is easy to see that simple Jacobi relaxation also produces a nilpotent error-propagation operator. The significance of Lemma 3 and Theorem 2 is in showing analytically that (at least in the spectral sense) coarse-grid correction does not cause divergent behavior, and AMGir is asymptotically robust for triangular systems.
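The Jacobi observation is immediate to demonstrate: for lower triangular $A$, the error propagation $I - D^{-1}A$ is strictly lower triangular, so Jacobi reaches the exact solution in at most $n$ sweeps. A minimal sketch (illustrative, not from PyAMG):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = np.tril(rng.random((n, n)), -1) + 2.0 * np.eye(n)  # lower triangular
b = rng.random(n)
x_exact = np.linalg.solve(A, b)

Dinv = np.diag(1.0 / np.diag(A))
x = np.zeros(n)
for _ in range(n):               # n sweeps suffice, since (I - D^{-1}A)^n = 0
    x = x + Dinv @ (b - A @ x)
assert np.allclose(x, x_exact)
```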
3.2 CF-splitting and $I - D_{ff}^{-1}A_{ff}$
Although nilpotency of error propagation is a nice property to have, it does not necessarily imply fast convergence. To understand what else is contributing to good convergence, we further examine the error-propagation operator. First note that error propagation for Jacobi F-relaxation is given by $\begin{bmatrix} I - D_{ff}^{-1}A_{ff} & -D_{ff}^{-1}A_{fc} \\ 0 & I \end{bmatrix}$. Letting $A_{ff} = D_{ff} - L_{ff}$ for some strictly lower triangular $L_{ff}$, then the following identities hold, which we will need later:
(14) $G := I - D_{ff}^{-1}A_{ff} = D_{ff}^{-1}L_{ff}, \qquad A_{ff}^{-1} = (I - G)^{-1}D_{ff}^{-1} = \sum_{k=0}^{\infty} G^kD_{ff}^{-1}.$
For ease of notation, let $Z = A_{cf}\hat{A}_{ff}^{-1}$ and $W = -\hat{A}_{ff}^{-1}A_{fc}$, where $\hat{A}_{ff}^{-1} = \sum_{k=0}^{p}G^kD_{ff}^{-1}$ is the order-$p$ truncated Neumann expansion (11) of $A_{ff}^{-1}$. Then, using (14) and a Woodbury inverse expansion for $A_c^{-1} = (RAP)^{-1}$, we get
(15) $A_c^{-1} = S^{-1} - S^{-1}A_{cf}G^{2(p+1)}A_{ff}^{-1}A_{fc}S^{-1} + \cdots,$
where $S = A_{cc} - A_{cf}A_{ff}^{-1}A_{fc}$ denotes the Schur complement. Plugging (15) and the identity $A_{cf} - ZA_{ff} = A_{cf}G^{p+1}$ into the coarse-grid correction (12) shows that error propagation for coarse-grid correction can be written as
$I - PA_c^{-1}RA = \begin{bmatrix} I - WA_c^{-1}A_{cf}G^{p+1} & -WA_c^{-1}\left(A_{cc} - ZA_{fc}\right) \\ -A_c^{-1}A_{cf}G^{p+1} & I - A_c^{-1}\left(A_{cc} - ZA_{fc}\right) \end{bmatrix}.$
Define $B_f = \hat{A}_{ff}^{-1}A_{fc}$, $B_c = S^{-1}A_{cf}$, and $K = A_{ff}^{-1}A_{fc}$. Then, if we consider the highest-order term of (15), error propagation takes the form
(16) $I - PA_c^{-1}RA = \begin{bmatrix} I + B_fB_cG^{p+1} & B_f\left(I + B_cG^{p+1}K\right) \\ -B_cG^{p+1} & -B_cG^{p+1}K \end{bmatrix} + \mathcal{O}\!\left(G^{2(p+1)}\right).$
The key thing to notice in (16) is that each error update on C-points is hit with $G^{p+1}$. Note that $\nu$ iterations of F-Jacobi relaxation has error propagation
(17) $\begin{bmatrix} I - D_{ff}^{-1}A_{ff} & -D_{ff}^{-1}A_{fc} \\ 0 & I \end{bmatrix}^{\nu} = \begin{bmatrix} G^{\nu} & -\left(\sum_{k=0}^{\nu-1}G^{k}\right)D_{ff}^{-1}A_{fc} \\ 0 & I \end{bmatrix}.$
Then, coupling coarse-grid correction (16) with $\nu$ iterations of pre- or post-F-relaxation scales the error at all points by at least $\min(\nu,\,p+1)$ powers of $G$.
Such a scaling of error by powers of $G$ is a complementary contribution to the strong convergence factors achieved by AMGir. Ensuring that $G$ is not only nilpotent, but also has very small elements, is key to achieving a fast reduction in error. Here, we focus on matrices that result from the discretization of differential operators, in which case off-diagonal entries are typically expected to be smaller in magnitude than the diagonal. In fact, the traditional AMG motivation for a CF-splitting, picking F-points such that $A_{ff}$ is well-conditioned, is also appropriate for AMGir. A classical AMG coarsening approach targets $A_{ff}$ to be strictly diagonally dominant, which is equivalent to minimizing the size of elements in $G$. Figure 1 demonstrates that, for a model problem introduced in Section 5, $G$ has very small row sums, although very few rows of $A_{ff}$ are diagonal (in which case the corresponding row of $G$ is empty). Similar results hold on all levels of the hierarchy, as well as for other discretizations and finite-element orders.
Small row sums of $G$ reduce error efficiently for most rows. For rows without small row sums in $G$, we rely on the redistribution of error through the operators $B_f$, $B_c$, and $K$, as well as the nilpotency of error propagation, to reduce error. Specifically, larger row sums in $G$ typically occur near the boundary of the domain. Near the boundary, a nilpotent error-propagation operator reaches the exact solution in $\mathcal{O}(1)$ iterations.
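Two facts used above are easy to verify numerically: for lower triangular $A_{ff}$, the matrix $G = I - D_{ff}^{-1}A_{ff}$ is strictly lower triangular (hence nilpotent), and the order-$p$ truncated Neumann expansion satisfies $I - \hat{A}_{ff}^{-1}A_{ff} = G^{p+1}$, so each F-sweep and coarse-grid update gains powers of $G$. A small NumPy sketch (illustrative names, random model matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
nf, p = 10, 2
Aff = np.tril(rng.random((nf, nf)), -1) + 2.0 * np.eye(nf)  # lower triangular
Dinv = np.diag(1.0 / np.diag(Aff))

G = np.eye(nf) - Dinv @ Aff
assert np.allclose(np.triu(G), 0.0)            # G is strictly lower triangular

# order-p truncated Neumann expansion of A_ff^{-1}
Ahat_inv = sum(np.linalg.matrix_power(G, k) for k in range(p + 1)) @ Dinv
# its error satisfies I - Ahat_inv A_ff = G^{p+1}
assert np.allclose(np.eye(nf) - Ahat_inv @ Aff,
                   np.linalg.matrix_power(G, p + 1))
```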
3.3 Schur-complement preconditioning
This work focuses on a traditional AMG framework. However, preconditioners are often developed by approximating the block decomposition of $A$ in (10). An analogous result to Theorem 2 holds for the block preconditioner
(18) $M = \begin{bmatrix} \hat{A}_{ff} & 0 \\ A_{cf} & \hat{S} \end{bmatrix},$
where $\hat{A}_{ff}$ is a lower-triangular approximation to $A_{ff}$ and $\hat{S}$ is some approximation to the Schur complement (not necessarily $S$ itself); that is, the multilevel error-propagation operator, $I - M^{-1}A$, is nilpotent. Letting $\hat{A}_{ff}^{-1}$ be an order-$p$ truncated Neumann expansion as in (11) and $\hat{S} = A_{cc} - A_{cf}\hat{A}_{ff}^{-1}A_{fc}$ gives the relation $I - \hat{A}_{ff}^{-1}A_{ff} = G^{p+1}$, where $G = I - D_{ff}^{-1}A_{ff}$. Some linear algebra then shows that
(19) $I - M^{-1}A = \begin{bmatrix} G^{p+1} & -\hat{A}_{ff}^{-1}A_{fc} \\ -\hat{S}^{-1}A_{cf}G^{p+1} & 0 \end{bmatrix}.$
Note that here we have only assumed $\hat{S}$ to be nonsingular. An interesting result of (19) is that the error after each iteration only depends on previous error at F-points, that is, at each iteration, C-point error is eliminated and then acquired based on error at F-points. In the case of lower triangular $\hat{S}$, (18) can act as a preconditioner with very similar properties to AMGir. Although we do not numerically explore preconditioning with (18) here, its theoretical similarity to AMGir is worth demonstrating for future work.
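The block structure of (19) can be checked directly: with $\hat{S} = A_{cc} - A_{cf}\hat{A}_{ff}^{-1}A_{fc}$, the CC-block of $I - M^{-1}A$ vanishes for any nonsingular $\hat{S}$, and the FF-block is $G^{p+1}$. A NumPy sketch, with F- and C-points interleaved in the global ordering and all names illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 11, 1
A = np.tril(rng.random((n, n)), -1) + 2.0 * np.eye(n)  # lower triangular model
F, C = np.arange(0, n, 2), np.arange(1, n, 2)
nf = len(F)
Aff, Afc = A[np.ix_(F, F)], A[np.ix_(F, C)]
Acf, Acc = A[np.ix_(C, F)], A[np.ix_(C, C)]

Dinv = np.diag(1.0 / np.diag(Aff))
G = np.eye(nf) - Dinv @ Aff
Ahat_inv = sum(np.linalg.matrix_power(G, k) for k in range(p + 1)) @ Dinv
S_hat = Acc - Acf @ Ahat_inv @ Afc                 # approximate Schur complement

M = np.zeros((n, n))
M[:nf, :nf] = np.linalg.inv(Ahat_inv)              # \hat{A}_ff
M[nf:, :nf] = Acf
M[nf:, nf:] = S_hat
A_cf_ord = np.block([[Aff, Afc], [Acf, Acc]])      # A in CF ordering
E = np.eye(n) - np.linalg.solve(M, A_cf_ord)       # preconditioned error prop.

assert np.allclose(E[:nf, :nf], np.linalg.matrix_power(G, p + 1))  # FF-block
assert np.allclose(E[nf:, nf:], 0.0)               # CC-block vanishes
```

The vanishing CC-block is exactly the statement that C-point error at each iteration is acquired only from F-point error.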
Remark 1
An infinite Neumann expansion (11) can be used to approximate $A_{ff}^{-1}$ for general, non-triangular matrices, and algorithms similar to (16) and (19) developed. However, in the case of triangular matrices, an order-$p$ Neumann approximation to $A_{ff}^{-1}$ is exact on the diagonal and first $p$ subdiagonals. For general matrices, such accuracy cannot be guaranteed. Even in the case of symmetric positive definite matrices, where off-diagonal entries of $A_{ff}^{-1}$ decay exponentially fast for well-conditioned $A_{ff}$ [12], a truncated Neumann expansion does not necessarily provide a good approximation to $A_{ff}^{-1}$ (or its diagonal, for that matter). The application of AMGir to matrices with non-triangular components is ongoing work.
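The exactness claim for triangular matrices follows because the truncation error $\sum_{k>p} G^kD_{ff}^{-1}$ vanishes on the diagonal and first $p$ subdiagonals ($G$ being strictly lower triangular). A quick NumPy check (illustrative model matrix):

```python
import numpy as np

rng = np.random.default_rng(5)
nf, p = 12, 3
Aff = np.tril(rng.random((nf, nf)), -1) + 2.0 * np.eye(nf)  # lower triangular
Dinv = np.diag(1.0 / np.diag(Aff))
G = np.eye(nf) - Dinv @ Aff

# order-p truncated Neumann approximation to A_ff^{-1}
Ahat_inv = sum(np.linalg.matrix_power(G, k) for k in range(p + 1)) @ Dinv

err = np.linalg.inv(Aff) - Ahat_inv
# entries with i - j <= p (diagonal and first p subdiagonals) are exact
assert np.allclose(np.triu(err, -p), 0.0)
```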
4 Algorithm
Algorithmically, AMGir takes on the traditional structure of an AMG method. Building on the theoretical motivation provided in Section