# Aggregation-based Multilevel Methods for Lattice QCD

## Abstract

In Lattice QCD computations a substantial amount of work is spent in solving the Dirac equation. In the recent past it has been observed that conventional Krylov solvers tend to critically slow down for large lattices and small quark masses. We present a Schwarz alternating procedure (SAP) multilevel method as a solver for the Clover improved Wilson discretization of the Dirac equation. This approach combines two components (SAP and algebraic multigrid) that have separately been used in lattice QCD before. In combination with a bootstrap setup procedure we show that considerable speed-up over conventional Krylov subspace methods for realistic configurations can be achieved.

Multilevel Methods
\FullConference The XXIX International Symposium on Lattice Field Theory - Lattice 2011

July 10-16, 2011

Squaw Valley, Lake Tahoe, California

## 1 Introduction

The most costly task in lattice QCD computations is the solution of large sparse linear systems of equations

(1.1) |

where is a discretization of the Dirac operator. Here we consider the Wilson discretization which couples only nearest neighbors and depends on a gauge field and a mass parameter . Usually is calculated by a Krylov subspace method (e.g. CGN, GCR, BiCGStab). Those methods suffer from critical slowing down when approaching the critical mass as well as lattice spacing . Thus it is of utmost importance to develop preconditioners for these methods that remedy these scaling problems.

In the recent past preconditioners based on domain decomposition (DD) for the solution of (1.1) have been proposed in [1]. Although DD methods excel in supercomputing environments due to their high inherent parallelism they are unable to remedy the scaling problems completely unless they are combined with a multilevel approach. Thus we combine the DD approach with an algebraic multigrid hierarchy based on a bootstrap aggregation framework [2, 3]. Our approach is similar in construction to the one introduced in [4, 5, 6] where it has been shown that using such algebraic multigrid approaches can remedy the scaling problems in QCD computations. As in [4, 5, 6] we obtain a multilevel hierarchy using non-smoothed aggregation. The difference is that we replace the multigrid smoother by a DD approach, expecting a gain in efficiency on highly parallel machines.

## 2 Domain Decomposition

Domain decomposition methods were developed as iterative solvers for linear systems arising from discretizations of PDEs. The main idea consists of solving the system on the whole domain by repeatedly solving smaller systems with less degrees of freedom on smaller subdomains.

Consider a block decomposition of a lattice (Figure 2 illustrates a 2D example). The corresponding trivial embeddings and block solvers are denoted by and . Note that the trivial embedding is just the restriction of the identity on to . Then one iteration of a domain decomposition method consists of solving each of the block systems , interleaved with a number of residual updates . In the two extreme cases where one does only one residual update before solving all block systems or when the residual is updated after each block solution, the corresponding error propagators are given by

(2.1) |

These methods go back to H. Schwarz [7] and thus are called additive Schwarz and multiplicative Schwarz method, respectively. The block systems of the additive variant can be solved simultaneously while the multiplicative variant is inherently sequential. Its advantage is that it spreads the information faster on the lattice as a solution of a block system uses previous solutions of other block systems.

The two methods can be combined to exploit advantages of both methods. Coloring the blocks such that no adjacent blocks have the same color, a residual update on one block no longer influences the residual on blocks of the same color. Thus it suffices to perform the update once for each color. All blocks of the same color can then be computed simultaneously as described in Algorithm 2.1. Such a DD approach has been applied to solve (1.1) in [1], where it has been named Schwarz Alternating Procedure (SAP).

Typically the solution of the block system is approximated by a few iterations of a Krylov subspace method (e.g. GMRES), and the DD method itself is in turn used as a preconditioner for a (flexible) Krylov subspace method. As illustrated in Figure 2 we observe that SAP is able to reduce error components belonging to a large part of the spectrum very well but a small part belonging to eigenvectors (EVs) to small eigenvalues (EWs) remains intractable. For larger configurations the number of EWs with small magnitude of the Dirac operator gets larger, which yields an explanation why SAP is not able to remedy the scaling problem as the number of intractable eigenvectors increases as well. Though, the seen behavior of damping large EVs is desirable for an iterative method to be used as a smoother in a multigrid method and motivated us to use it in this context.

## 3 Algebraic Multigrid

A multigrid method typically consists of a simple iterative method called smoother and complementary coarse grid correction. As motivated in section 2 we deem SAP suitable for the use as a smoother since it is cheap to compute and reduces the error efficiently on a large part of the spectrum. The main idea of multigrid is to treat the error that is left after a few iterations of the smoother within a smaller subspace where the troublesome error components can be approximated. More precisely we want to define and solve a “coarse” linear system

(3.1) |

with a much smaller operator in order to reduce the error on the critical part of (1.1). To this purpose we have to define linear maps to restrict information based on the current residual to the subspace and a linear map to interpolate the information that we obtain from solving (3.1) back to where (1.1) is given. This yields a subspace correction

(3.2) |

with the corresponding error propagator . As should resemble the action of D on the troublesome subspace approximated by , the action of is chosen as the action of on interpolated vectors which are restricted afterwards. Formally this amounts to a Petrov-Galerkin formulation of the coarse operator as . With this choice of the error propagator of the subspace correction is given by . In order to benefit from such a subspace correction, solving (3.1) has to be much cheaper than solving (1.1). That is should be small compared to , and should be sparse. As the dimension of the troublesome subspace grows with (cf. [8]) we do not want to fix (like in deflation methods) but want to find a sparse description of on that subspace.

Once is found a basic two level algorithm consists of the alternating application of smoother and subspace correction. This procedure can be recursively extended by formulating a two level algorithm of this kind for the computation of (3.1) until we get an operator which is small enough to solve (3.1) directly.

Aggregation Based Interpolation: We decided to adjust an aggregation based interpolation, that in turn yields the subspace correction, as the complementary component to the SAP smoother. Due to the fact that the coarse grid correction in (3.2) only acts on error components in , it should approximate the subspace spanned by eigenvectors to eigenvalues of small magnitude of (cf. Figure 2). In the multigrid literature, is built from right, and from left EVs corresponding to EWs of small magnitude of . Due to the spectral properties of the Wilson Dirac operator it is natural to choose . Furthermore, with additional assumptions on its structure we can choose according to [5]. Therefore we define the aggregates in such a way that there exists such that . Note, that with these assumptions on the error propagator of the subspace correction (3.2) is given by .

With this structure of in mind, we define its entries based on a set of test vectors whose span approximates the troublesome subspace and a set of aggregates .The aggregates can be realized as another block decomposition of the lattice. Note that the DD smoother and the interpolation do not have to share a common block decomposition.

The interpolation is then given by decomposing the test vectors over the aggregates (cf. Figure 3). Hence is a linear map from the coarse grid to the fine grid, defined by

where is the -th unit vector. In order to have , the test vectors are locally orthonormalized over the aggregates. Note that due to the aggregation structure of and the sparsity/connection structure of resembles the one of , i.e., the corresponding graphs of the operators are regular four dimensional grids. Thus we can apply (3.2) recursively to (3.1) and obtain a hierarchy of coarser grids and coarser operators. This construction of is similar to constructions found in [4, 5, 6, 8]. More precisely, the structure of the interpolation operators is identical but the test vectors used to build them and thus the actions of the operators are different.

Bootstrap Setup: A critical part of the construction of an efficient multigrid hierarchy is the computation of the test vectors used in the definition of . The setup procedure we employ for this task is divided into two parts. We start with a set of random test vectors and apply a small number of smoother iterations to them. During this procedure the smoothed test vectors are kept orthonormal. After that a temporary operator on the next coarser grid is computed according to the aggregation based construction. This procedure is continued recursively on the coarser grids until we get an operator on the coarsest grid. Herein we restrict smoothed test vectors with to the next coarser grid. In what follows we omit the level indices for the sake of simplicity.

Since the subspace should contain a significant part of the eigenvectors to eigenvalues of small magnitude of the fine grid space , the EWs of small magnitude of should approximate those of . For the corresponding EVs the relations and imply . Thus the second part of our setup procedure starts with the calculation of approximations to the smallest EVs and EWs of by means of harmonic Ritz vectors and values. The approximate EVs for are successively interpolated to the next finer grid and smoothed towards their harmonic Ritz values with some steps of SAP with iteration matrix . In this case the local inverses for the pair are given by

The resulting set of vectors and the old test vectors are again reduced to vectors. In order to preserve the most significant information, the smallest singular values and their corresponding vectors of the matrix are calculated and the corresponding linear combinations of the vectors of are the final vectors . They define the interpolation from the next coarser grid to the current grid and the operator on the next coarser grid. The second part of the setup executes this procedure exactly once, starting on the coarsest grid.

## 4 Results

Our adaptive DD multilevel solver (MG-DD) has been implemented in C using the parallelization interface of MPI. Krylov subspace methods have been implemented within a common framework for a fair comparison. For the numerical experiments we have combined our multigrid solver against CGN, i.e. CG on the normal equation. All results were produced with a two level method. The stopping criterion was to reduce the initial residual norm by a factor of at least . The solutions of the block systems within SAP were approximated by iterations of GMRES and the coarse grid equation (3.1) was approximately solved with iterations of GMRES. For the outer flexible GMRES routine we chose a restart length of . All results have been computed on Juropa at the Jülich Supercomputing Centre.

FGMRES | |||

+ MG-DD | CGN | ||

setup | timings | 45.22s | - |

solve | iterations | 9 | 4476 |

timings | 6.97s | 121.82s | |

total | timings | 52.19s | 121.82s |

Table 1: Clover Wilson-Dirac, , (), , generated using public code with parameters from L. Del Debbio [9], block-size , coarsening , cores (Juropa at JSC). FGMRES + MG-DD CGN setup timings 24.85s - solve iterations 17 9181 timings 5.05s 119.04s total timings 29.90s 119.04s

Table 2: Wilson-Dirac 2HEX smeared tree level improved Clover, , (), , provided by the BMW collaboration [10, 11], block-size , coarsening , cores (Juropa at JSC).

In the cases shown our multigrid solver was and times faster than CGN. Including the setup time we are still twice and four times as fast as CGN. Since the setup has to be done only once the benefits of our approach are larger the more right hand sides there are to be solved.

## 5 Summary and Outlook

The developed method combining DD techniques and algebraic multigrid shows great potential to speed-up calculations of propagators in lattice QCD. Even for single right hand sides our method outperforms conventional Krylov subspace methods with the potential of an even more significant speed-up when solving for many right hand sides. This result is mainly due to the introduction of the highly parallel DD smoother and the bootstrap setup into the algebraic multigrid method. While the first speeds up the setup of the method and the subsequent solution the latter significantly speeds up the setup of the multigrid hierarchy, which in general is a bottleneck in algebraic multigrid methods especially compared to setup-free Krylov subspace methods. We are currently working on an optimized version of the code that should be able to run on large-scale parallel machines and extend our testing of the method towards larger lattices and lighter quark masses. In the near future we plan to incorporate our algorithm into the production codes of our collaborators within SFB TR55.

Acknowledgments: This work is funded by Deutsche Forschungsgemeinschaft (DFG) Transregional Collaborative Research Centre 55 (SFB TR55).

### References

- M. Luscher, “Solution of the Dirac equation in lattice QCD using a domain decomposition method,” Comput.Phys.Commun. 156 (2004) 209–220, arXiv:hep-lat/0310048 [hep-lat].
- A. Brandt, J. Brannick, K. Kahl, and I. Livshits, “Bootstrap AMG.,” SIAM J. Sci. Comput. 33 no. 2, (2011) 612–632.
- M. Brezina, R. Falgout, S. MacLachlan, T. Manteuffel, S. McCormick, and J. Ruge, “Adaptive smoothed aggregation (SA) multigrid,” SIAM Review 47, No. 2 (2005) 317–346.
- J. Osborn, R. Babich, J. Brannick, R. Brower, M. Clark, et al., “Multigrid solver for clover fermions,” PoS LATTICE2010 (2010) 037, arXiv:1011.2775 [hep-lat].
- R. Babich, J. Brannick, R. Brower, M. Clark, T. Manteuffel, et al., “Adaptive multigrid algorithm for the lattice Wilson-Dirac operator,” Phys.Rev.Lett. 105 (2010) 201602, arXiv:1005.3043 [hep-lat].
- J. Brannick, R. Brower, M. Clark, J. Osborn, and C. Rebbi, “Adaptive Multigrid Algorithm for Lattice QCD,” Phys.Rev.Lett. 100 (2008) 041601, arXiv:0707.4018 [hep-lat].
- H. Schwarz, “Gesammelte mathematische Abhandlungen,” Vierteljahrschrift Naturforsch. Ges. Zürich (1870) 272–286.
- M. Luscher, “Local coherence and deflation of the low quark modes in lattice QCD,” JHEP 0707 (2007) 081, arXiv:0706.2298 [hep-lat].
- L. Del Debbio, L. Giusti, M. Lüscher, R. Petronzio, and N. Tantalo, “QCD with light Wilson quarks on fine lattices (I): First experiences and physics results,” JHEP 0702 (2007) 056, arXiv:hep-lat/0610059 [hep-lat].
- S. Durr, Z. Fodor, C. Hoelbling, S. Katz, S. Krieg, et al., “Lattice QCD at the physical point: light quark masses,” Phys.Lett. B701 (2011) 265–268, arXiv:1011.2403 [hep-lat].
- S. Durr, Z. Fodor, C. Hoelbling, S. Katz, S. Krieg, et al., “Lattice QCD at the physical point: Simulation and analysis details,” JHEP 1108 (2011) 148, arXiv:1011.2711 [hep-lat].