An iterative method to compute the sign function of a non-Hermitian matrix and its application to the overlap Dirac operator at nonzero chemical potential
Abstract
The overlap Dirac operator in lattice QCD requires the computation of the sign function of a matrix. While this matrix is usually Hermitian, it becomes non-Hermitian in the presence of a quark chemical potential. We show how the action of the sign function of a non-Hermitian matrix on an arbitrary vector can be computed efficiently on large lattices by an iterative method. A Krylov subspace approximation based on the Arnoldi algorithm is described for the evaluation of a generic matrix function. The efficiency of the method is spoiled when the matrix has eigenvalues close to a function discontinuity. This is cured by adding a small number of critical eigenvectors to the Krylov subspace, for which we propose two different deflation schemes. The ensuing modified Arnoldi method is then applied to the sign function, which has a discontinuity along the imaginary axis. The numerical results clearly show the improved efficiency of the method. Our modification is particularly effective when the action of the sign function of the same matrix has to be computed many times on different vectors, e.g., if the overlap Dirac operator is inverted using an iterative method.
Keywords: overlap Dirac operator, quark chemical potential, sign function, non-Hermitian matrix, iterative methods
PACS: 02.60.Dc, 11.15.Ha, 12.38.Gc
1 Introduction
The only systematic nonperturbative approach to quantum chromodynamics (QCD) is the numerical simulation of the theory on a finite space-time lattice. For a long time, the implementation of chiral symmetry on the lattice posed serious problems [NN], but these problems have recently been solved in a number of complementary ways. Perhaps the most prominent solution is the overlap Dirac operator [overlap], which provides an exact solution of the Ginsparg-Wilson relation [Ginsparg:1981bj]. However, the price one has to pay for this solution is the numerical computation of the sign function of a large sparse matrix. Here and in the following, computing some function f of a matrix A is a shorthand for computing y = f(A)x for a given vector x, i.e., determining the action of f(A) on the vector x. Typically A is Hermitian, and efficient methods to compute its sign function have been developed for this case [Neuberger:1998my, vandenEshof:2002ms].
The phase diagram of QCD is currently being explored experimentally in relativistic heavy ion collisions and theoretically in lattice simulations and model calculations [Misha06]. To describe QCD at nonzero density, a quark chemical potential μ is introduced in the QCD Lagrangian. If this chemical potential is implemented in the overlap operator [Bloch:2006cd], the matrix loses its Hermiticity, and one is faced with the problem of computing the sign function of a non-Hermitian sparse matrix. On a small lattice this can be done by performing a full diagonalization and using the spectral definition of a matrix function (see Eq. (4) below), but on larger lattices one needs to resort to iterative methods to keep the computation time and memory requirements under control.
The purpose of this paper is to introduce such an iterative method. In the next section we describe the non-Hermitian problem in more detail and briefly discuss the sign function for non-Hermitian matrices. In Sec. 4 we propose an Arnoldi-based method to construct a Krylov subspace approximation of a generic matrix function. The efficiency of this method is poor when computing the sign function of a matrix having eigenvalues with small absolute real parts, which is caused by the discontinuity of the sign function along the imaginary axis. In Sec. 5 we enhance the Arnoldi method by taking into account exact information about these critical eigenvalues. We use the method to compute the sign function occurring in the overlap Dirac operator of lattice QCD, and in Sec. 6 we discuss the results obtained for two different lattice sizes.
2 The overlap operator and the sign function
The overlap formulation of the Dirac operator is a rigorous method to preserve chiral symmetry at finite lattice spacing in a vector-like gauge theory. Its construction is based on successive developments described in seminal papers by Kaplan, Shamir, Furman, Narayanan and Neuberger [Kaplan:1992bt, Shamir:1993zy, Furman:1994ky, overlap]. In the presence of a nonzero quark chemical potential μ, the massless overlap Dirac operator is given by [Bloch:2006cd]

(1)   D_ov(μ) = 1 + γ5 sgn(H_W(μ)),

where sgn stands for the sign function, H_W(μ) = γ5 D_W(μ), D_W(μ) is the Wilson-Dirac operator at nonzero chemical potential [Hasenfratz, Kogut:1983ia] with negative Wilson mass m_W, γ5 = γ1 γ2 γ3 γ4, and the γν with ν = 1, …, 4 are the Dirac gamma matrices in Euclidean space. The Wilson-Dirac operator is a discretized version of the continuum Dirac operator that avoids the replication of fermion species which occurs when a naive discretization of the derivative operator is used. It is given by
(2)   D_W(μ) = 1 − κ Σ_{i=1}^{3} (T_i⁺ + T_i⁻) − κ (e^{μ} T_4⁺ + e^{−μ} T_4⁻),
      with (T_ν⁺)_{y,x} = (1 + γν) U_{x,ν} δ_{y,x+ν̂}  and  (T_ν⁻)_{y,x} = (1 − γν) U†_{x−ν̂,ν} δ_{y,x−ν̂},
where κ = 1/(8 + 2 m_W) and U_{x,ν} ∈ SU(3) is the matrix associated with the link connecting the lattice site x to x + ν̂. The exponential factors e^{±μ} multiplying the temporal hops are responsible for the non-Hermiticity of the operator. The quark field at each lattice site corresponds to 12 variables: 3 color components × 4 Dirac spinor components.
For μ ≠ 0 the argument H_W(μ) of the sign function becomes non-Hermitian, and we need to define the sign function for this case. Consider first a generic matrix A ∈ ℂ^{n×n} with no particular symmetry properties and a function f. Let Γ be a collection of closed contours in ℂ such that f is analytic inside and on Γ and such that Γ encloses the spectrum of A. Then the function f(A) of the matrix can be defined by [Dunford]

(3)   f(A) = (1/2πi) ∮_Γ f(z) (z I − A)^{−1} dz,

where the integral is defined componentwise and I denotes the identity matrix.
From this definition it is easy to derive a spectral function definition, even if the matrix is non-Hermitian. If the matrix A is diagonalizable, i.e., A = U Λ U^{−1} with a diagonal eigenvalue matrix Λ = diag(λ1, …, λn) and an invertible matrix U, then

(4)   f(A) = U f(Λ) U^{−1}

with

(5)   f(Λ) = diag(f(λ1), …, f(λn)).
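For small matrices, the spectral definition (4)-(5) can be applied directly. The following sketch (our own illustration, with hypothetical function names; it anticipates the sign convention sgn(λ) = sign(Re λ) of Eq. (9) below) implements it with NumPy:

```python
import numpy as np

def matrix_sign_spectral(A):
    """Matrix sign via the spectral definition f(A) = U f(Lambda) U^{-1},
    assuming A is diagonalizable with no eigenvalues on the imaginary axis.
    For the sign function, f(lambda) = sign(Re(lambda))."""
    lam, U = np.linalg.eig(A)
    f_lam = np.where(lam.real > 0, 1.0, -1.0)   # sgn of each eigenvalue
    return U @ np.diag(f_lam) @ np.linalg.inv(U)

# Small non-Hermitian example with one eigenvalue on each side of the
# imaginary axis (here the eigenvalues are real: approx. 2.10 and -3.10)
A = np.array([[2.0, 1.0],
              [0.5, -3.0]])
S = matrix_sign_spectral(A)
```

Replacing f_lam by f(lam) for any other function f gives the general definition (4)-(5); as discussed in Sec. 3, this full diagonalization is only viable for small matrices.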
If A cannot be diagonalized, a more general spectral definition of f(A) can be derived from Eq. (3) using the Jordan decomposition A = U J U^{−1} with J = diag(J_1, …, J_p), where each Jordan block has the form

(6)   (J_i)_{kl} = λ_i δ_{kl} + δ_{k+1,l}.

Then,

(7)   f(A) = U f(J) U^{−1},   f(J) = diag(f(J_1), …, f(J_p)),

where

(8)   [f(J_i)]_{k,k+j} = f^{(j)}(λ_i) / j!,   j = 0, …, s_i − k,

with s_i the size of the Jordan block J_i, see [Golub]. The superscripts denote derivatives of the function with respect to its argument.
Non-Hermitian matrices typically have complex eigenvalues, and applying Eq. (4) or (7) to the sign function in Eq. (1) requires the evaluation of the sign of a complex number. The sign function needs to satisfy [sgn(z)]² = 1 and, for real x ≠ 0, sgn(x) = x/|x|. These properties are satisfied if one defines

(9)   sgn(z) = z / √(z²) = sign(Re z),
where in the last equality the cut of the square root is chosen along the negative real axis. Using the definition (9) in the spectral definition (7) yields a matrix sign function in agreement with that used in [Denman76, Rob80, Kenney91, Hig94]. Indeed, based on the general Jordan decomposition

(10)   A = U diag(J⁺, J⁻) U^{−1},

where J⁺ represents the Jordan blocks corresponding to the eigenvalues with positive real part and J⁻ those corresponding to the eigenvalues with negative real part, the spectral definition (7) for the sign function becomes

(11)   sgn(A) = U diag(+I, −I) U^{−1},

see also [HoJo]. This definition agrees with the result one obtains when deriving Eq. (1) from the domain-wall fermion formalism at μ ≠ 0 in the limit in which the extent of the fifth dimension goes to infinity [Bloch:2007xi].
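Definition (9) is easy to check numerically: the principal branch of the complex square root (the branch NumPy implements) has its cut along the negative real axis, so z/√(z²) reproduces the sign of the real part. A small sanity check (ours, not from the paper):

```python
import numpy as np

def sgn(z):
    """sgn(z) = z / sqrt(z^2), with the cut of the square root along the
    negative real axis; equals sign(Re z) for Re z != 0."""
    z = complex(z)
    return z / np.sqrt(z * z)

# points on both sides of the imaginary axis, including one very close to it
vals = [1 + 5j, -1 + 5j, 2 - 3j, -0.01 + 100j]
signs = [sgn(z) for z in vals]
```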
For any square matrix A without purely imaginary eigenvalues we have [sgn(A)]² = 1, and a short calculation [Bloch:2006cd] shows that for this reason the overlap operator as defined in Eq. (1) satisfies the Ginsparg-Wilson relation

(12)   {D_ov, γ5} = D_ov γ5 D_ov.

If H_W is Hermitian, the polar factor of H_W coincides with sgn(H_W), and this fact has been used successfully to develop efficient iterative methods for computing the action of the matrix sign function on a vector [Borici:lanczos]. However, if H_W is non-Hermitian, the polar factor and the sign function differ in general, and the polar factor no longer squares to one. Thus, for μ ≠ 0, replacing sgn(H_W(μ)) by the polar factor in the definition of the overlap operator in Eq. (1) not only changes the operator but also violates the Ginsparg-Wilson relation, as can also be seen in numerical experiments. We conclude that the definition given in Eq. (1) is the correct formulation of the overlap operator for μ ≠ 0.
3 Direct and iterative methods
A numerical implementation of the sign function using the spectral definition (4) is only possible for small matrices, as a full diagonalization becomes too expensive as the matrix grows. As an alternative, matrix-based iterative algorithms for the computation of the matrix sign function have been around for many years [Denman76, Kenney91, Kenney95, Higham97]. Although these algorithms are much faster than the direct implementation of the spectral definition for medium-sized problems, they still require the storage and manipulation (i.e., inversions and/or multiplications) of the entire matrix. This is feasible for medium-sized matrices, but becomes too expensive for the very large matrices occurring in typical lattice QCD simulations: even for the minimum lattice size required for a physically relevant problem, the matrix dimension is already very large. Even though these QCD matrices are sparse, the iterative procedure computing the sign function fills the matrix as the iterations proceed, and the method eventually becomes prohibitively expensive.
Therefore, another iterative method is required, one which does not produce an approximation to the full sign matrix itself, but rather an approximation to the vector sgn(A)x, i.e., to the operation of the sign matrix on an arbitrary vector. Many QCD applications only require the knowledge of this product for a number of selected source vectors. For instance, some low-energy properties of QCD can be described by the lowest-lying eigenvalues of the Dirac operator. These eigenvalues can efficiently be found by an iterative eigenvalue solver like ARPACK [arpack], which only requires the computation of matrix-vector multiplications. Analogously, the propagation of fermions is computed by inverting the Dirac operator on a selected number of source vectors b, i.e., by solving the linear systems D_ov x = b. These inversions are also performed using iterative linear solvers requiring only matrix-vector multiplications.
Such iterative methods, mostly from the class of Krylov subspace methods, are already extensively used for the solution of eigenvalue problems, linear systems, and for function evaluations [Vor88, Drus98] with Hermitian matrices. There, the ancestor of all methods is the Lanczos method, of which many variants and improvements have been built over the years. The Lanczos method makes use of short recurrences to build an orthonormal basis in the Krylov subspace.
Krylov subspace methods are also used for non-Hermitian matrices in the context of eigenvalue problems (a popular example being the restarted Arnoldi method of ARPACK), for the solution of linear systems, and even for the evaluation of the exponential function [gallopoulos89parallel, hochbruck]. The two most widely used methods are the Arnoldi method and the two-sided Lanczos method. In contrast to the Hermitian case, the Arnoldi method requires long recurrences to construct an orthonormal basis for the Krylov subspace, while the two-sided Lanczos method uses two short recurrence relations, but at the cost of losing orthogonality.
In the next section we present an Arnoldi-based method to compute a generic function of a non-Hermitian matrix. The application of the two-sided Lanczos method to this problem will be investigated in a separate publication.
4 Arnoldi function approximation for a non-Hermitian matrix
From the spectral definition (4) it follows that f(A) is identical to some polynomial P(A) of degree at most n − 1 in A. Indeed, the unique interpolation polynomial P which interpolates f at the n distinct eigenvalues of A satisfies f(A) = P(A), as follows immediately from Eq. (4). If A has nontrivial Jordan blocks, this is still true, but the interpolation is now in the Hermite sense, where at each eigenvalue P interpolates f and its derivatives up to order one less than the size of the largest corresponding Jordan block, see [HoJo].
Hence, for an arbitrary vector x,

(13)   f(A) x = P(A) x.
The idea is to construct an approximation to f(A)x using a polynomial of degree k − 1 with k ≪ n. One possibility is to construct a good polynomial approximation g ≈ f such that g(A)x ≈ f(A)x. This would yield a single polynomial approximation operating on any vector to compute f(A)x. One such example is the expansion in terms of Chebyshev polynomials up to degree k − 1. Although Chebyshev polynomials are very close to the minimax polynomial for a function approximation over an appropriate ellipse in the complex plane, one can do better for matrix function approximations. First of all, one only needs a good approximation to f at the eigenvalues of A, and secondly, one can use information about the source vector x to improve the polynomial approximation. The vector x can be decomposed using the (generalized) eigenvectors of A as a complete basis, and clearly some eigendirections will be more relevant than others in the function approximation. A fixed interpolation polynomial does not use any information about the vector x; furthermore, the only feature of A usually taken into account by such polynomial approximations is the extent of its spectrum.
Indeed, there exists a best polynomial approximation of degree at most k − 1, which is readily defined as the orthogonal projection of f(A)x on the Krylov subspace K_k(A, x) = span{x, Ax, …, A^{k−1}x}. If V_k is an n × k matrix whose columns form an orthonormal basis of K_k(A, x), then V_k V_k† is a projector on the Krylov subspace, and the projection is V_k V_k† f(A) x.
The operation on x of any polynomial in A of degree smaller than k is also an element of K_k(A, x), and the projection corresponds to the polynomial which minimizes ‖f(A)x − P(A)x‖₂ over all such polynomials P. Clearly, this approximation explicitly takes into account information about A and x.
The problem, however, is that to find this best polynomial approximation of degree k − 1 one already has to know the answer f(A)x. Therefore, we need a method to approximate the projected vector V_k V_k† f(A) x. This can be done by one of the Krylov subspace methods mentioned in Sec. 3. Here, we use the Arnoldi algorithm to construct an orthonormal basis v_1, …, v_k of the Krylov subspace using the long recurrence
(14)   A V_k = V_k H_k + h_{k+1,k} v_{k+1} e_k^T,

where V_k = (v_1 | ⋯ | v_k), H_k is the k × k upper Hessenberg matrix (upper triangular + one subdiagonal) of recurrence coefficients h_{ij}, e_k = (0, …, 0, 1)^T is the kth unit vector, and v_{k+1} is the (k+1)th basis vector of the Krylov subspace. The projection of f(A)x on K_k(A, x) can be written as
(15)   V_k V_k† f(A) x.

Making use of x = β V_k e_1, where β = ‖x‖₂, Eq. (15) becomes

(16)   β V_k V_k† f(A) V_k e_1.
From Eq. (14) it is easy to see that

(17)   V_k† A V_k = H_k,

as V_k† V_k = I and V_k† v_{k+1} = 0. Therefore it seems natural to introduce the approximation [gallopoulos89parallel]

(18)   V_k† f(A) V_k ≈ f(V_k† A V_k) = f(H_k)

in Eq. (16), which finally yields

(19)   f(A) x ≈ β V_k f(H_k) e_1.
This expression is just a linear combination of the Arnoldi vectors v_1, …, v_k, where the coefficients are given by β times the first column of f(H_k). Saad [saad:209] showed that the approximation (19) corresponds to replacing the polynomial interpolating f at the eigenvalues of A by the lower-degree polynomial which interpolates f at the eigenvalues of H_k, which are also called the Ritz values of A in the Krylov subspace. The hope is that for k not too large the approximation (19) will be a suitable approximation for f(A)x. The approximation (19) replaces the computation of f(A) by that of f(H_k), where H_k is of much smaller size than A. The evaluation of (the first column of) f(H_k) can be implemented using a spectral decomposition, or another suitable evaluation method [Denman76, Kenney91, Kenney95, Higham97].
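The construction of Eqs. (14)-(19) can be sketched compactly. The code below is our own illustration (not the authors' production code): it builds the Arnoldi basis with full Gram-Schmidt orthogonalization and evaluates f(H_k) through the spectral definition (4)-(5). For the sign function, f would be given by Eq. (9) and f(H_k) would in practice be evaluated by an iterative method such as Roberts' iteration discussed in Sec. 5; here we test the machinery with the exponential, whose Krylov approximation converges quickly:

```python
import numpy as np

def arnoldi(A, x, k):
    """Long-recurrence Arnoldi:  A V_k = V_k H_k + h_{k+1,k} v_{k+1} e_k^T."""
    n = x.shape[0]
    V = np.zeros((n, k), dtype=complex)
    H = np.zeros((k, k), dtype=complex)
    beta = np.linalg.norm(x)
    V[:, 0] = x / beta
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):          # orthogonalize against all previous v_i
            H[i, j] = np.vdot(V[:, i], w)
            w = w - H[i, j] * V[:, i]
        if j + 1 < k:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta

def funm_spectral(M, f):
    """f(M) via the spectral definition (4)-(5); M must be diagonalizable."""
    lam, U = np.linalg.eig(M)
    return U @ np.diag(f(lam)) @ np.linalg.inv(U)

def funm_krylov(A, x, k, f):
    """Approximate f(A) x by  beta * V_k f(H_k) e_1,  Eq. (19)."""
    V, H, beta = arnoldi(A, x, k)
    return beta * (V @ funm_spectral(H, f)[:, 0])

rng = np.random.default_rng(0)
n = 40
A = rng.standard_normal((n, n)) / np.sqrt(n)
x = rng.standard_normal(n).astype(complex)

approx = funm_krylov(A, x, 20, np.exp)
exact = funm_spectral(A.astype(complex), np.exp) @ x
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

Only matrix-vector products with A are needed; the dense k × k problem for f(H_k) is cheap when k ≪ n.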
The long recurrences of the Arnoldi method make it much slower per iteration than the Lanczos method used for Hermitian matrices. Nevertheless, our first results showed consistent convergence properties when computing the matrix sign function: as the size of the Krylov subspace increases, the method converges to within machine accuracy. More precisely, for the sign function the method shows a seesaw profile corresponding to even-odd numbers of Krylov vectors, as was previously noticed for the Hermitian case as well [vandenEshof:2002ms]. This even-odd pattern is related to the sign function being odd in its argument. The seesaw effect is completely removed, and convergence becomes smooth, when using the harmonic Ritz values as described in Ref. [vandenEshof:2002ms] for Hermitian matrices and extended to non-Hermitian matrices in Ref. [hochbruck]. Alternatively, one can just as well restrict the calculations to even-sized Krylov subspaces.
Unfortunately, in the case of the sign function the Arnoldi method described above has a very poor efficiency when some of the eigenvalues are small. In the case considered in Fig. 3 (see Sec. 6 below), the size of the Krylov subspace has to be taken very large to reach good accuracy (see the top pane of Fig. 3). A discussion and resolution of this problem are given in the next section.
5 Deflation
5.1 Introduction
For Hermitian matrices, it is well known that the computation of the sign function can be improved by deflating the eigenvalues smallest in absolute value [vandenEshof:2002ms]. The reason why this is crucial, and specific to the sign function, is the discontinuity of the sign function at zero. For non-Hermitian matrices the situation is analogous, since the sign function now has a discontinuity along the imaginary axis. A necessary condition for the method described in Sec. 4 to be efficient is the ability to approximate f well at the eigenvalues of A by a low-order polynomial. If the gap between the eigenvalues of A to the left and to the right of the imaginary axis is small, no low-order polynomial exists which is accurate enough at all eigenvalues.
The idea is to resolve this problem by treating these critical eigenvalues exactly, and performing the Krylov subspace approximation on a deflated space.
In the Hermitian case, deflation is straightforward. A function of a Hermitian matrix A with eigenvalues λ_i and orthonormal eigenvectors ψ_i (i = 1, …, n) can be written as

(20)   f(A) = Σ_{i=1}^{n} f(λ_i) ψ_i ψ_i†,

and its operation on an arbitrary vector x as

(21)   f(A) x = Σ_{i=1}^{n} f(λ_i) ψ_i (ψ_i† x).

If the m critical eigenvalues of A and their corresponding eigenvectors have been computed, one can split the vector space into two orthogonal subspaces and write an arbitrary vector as x = x_∥ + x_⊥, where x_∥ = Σ_{i=1}^{m} ψ_i (ψ_i† x) and x_⊥ is orthogonal to ψ_1, …, ψ_m. Eq. (21) can then be rewritten as

(22)   f(A) x = Σ_{i=1}^{m} f(λ_i) ψ_i (ψ_i† x) + f(A) x_⊥.
The first term on the right-hand side of Eq. (22) can be computed exactly, and the second term can be approximated by applying the Arnoldi method of Sec. 4 to x_⊥. As the vector x_⊥ does not contain any contribution in the eigenvector directions corresponding to the critical eigenvalues, the polynomial approximation no longer needs to interpolate f at the eigenvalues closest to the function discontinuity in order to approximate f(A)x_⊥ well. Therefore, after deflation, lower-degree polynomials will yield the required accuracy, and a smaller Krylov subspace can be used in the approximation. In theory, the orthonormality of the eigenvectors guarantees that the Krylov subspace will be perpendicular to the deflated subspace, but in practice numerical inaccuracies could require us to reorthogonalize the subspaces during the construction of the Krylov subspace.
For non-Hermitian matrices the (generalized) eigenvectors are no longer orthonormal, and it is not immediately clear how to deflate a critical subspace. The matrix functions as defined in (4) or (7) involve the inverse of the matrix U of basis vectors, and no simple decomposition into orthogonal subspaces can be performed.
In the remainder of this section, we will develop two alternative deflation schemes for the non-Hermitian case, using a composite subspace generated by adding a small number of critical eigenvectors to the Krylov subspace. This idea of an augmented Krylov subspace method has been used in the iterative solution of linear systems for some time, see, e.g., Ref. [Saad97]. Since in computational practice one will never encounter nontrivial Jordan blocks, we assume in the following, for simplicity, that the matrix A is diagonalizable.
5.2 Schur deflation
We construct the composite subspace Ω_m + K_k(A, x), i.e., the sum of the subspace Ω_m spanned by the eigenvectors corresponding to the m critical eigenvalues of A and the Krylov subspace K_k(A, x). The aim is to make an approximation similar to that of Eq. (19), but to treat the contribution of the critical eigenvalues to the sign function exactly, so that the size of the Krylov subspace can be kept small.

Assume that the m critical eigenvalues and the corresponding right eigenvectors of A have been determined using an iterative eigenvalue solver like the one implemented in ARPACK. From these one can easily construct m Schur vectors and the corresponding m × m upper triangular matrix T satisfying

(23)   A S = S T,

where S is the n × m matrix formed by the orthonormal Schur vectors, and the diagonal elements of T are the eigenvalues corresponding to the Schur vectors. These Schur vectors form an orthonormal basis of the eigenvector subspace Ω_m, which is invariant under the operation of A.
After constructing the m-dimensional subspace Ω_m, we run a modified Arnoldi method to construct an orthonormal basis of the composite subspace Ω_m + K_k(A, x). That is, each Arnoldi vector is orthogonalized not only against the previous ones, but also against the Schur vectors s_1, …, s_m. In analogy to (14), this process can be summarized as

(24)   h_{j+1,j} v_{j+1} = A v_j − Σ_{i=1}^{m} s_i (s_i† A v_j) − Σ_{i=1}^{j} v_i h_{ij},   h_{ij} = v_i† A v_j.

Here, the columns of V_k = (v_1 | ⋯ | v_k) form an orthonormal basis of the projection of the Krylov subspace onto the orthogonal complement of Ω_m. In particular, v_1 = x_⊥ / ‖x_⊥‖₂, where x_⊥ = (1 − S S†) x is the projection of x onto the orthogonal complement of Ω_m. Again, H_k = (h_{ij}) is an upper Hessenberg matrix.

Note that the orthogonality of V_k with respect to S has to be enforced explicitly during the Arnoldi iterations, as the operation of A on a vector in the projected Krylov subspace in general acquires a contribution belonging to Ω_m, i.e., the projected Krylov subspace is not invariant under the operation of A. This is a consequence of the non-orthogonality of the eigenvectors of A.
Defining

(25)   Ĥ = [ T  T₁₂ ; 0  H_k ],   T₁₂ ≡ S† A V_k,

the matrix Ĥ satisfies a relation similar to Eq. (17), namely

(26)   (S | V_k)† A (S | V_k) = Ĥ,
and the function approximation derived in Sec. 4 can be used here as well. We briefly repeat the steps of Sec. 4. With W ≡ (S | V_k), the operation of the matrix function on the vector x can be approximated by its projection on the composite subspace,

(27)   f(A) x ≈ W W† f(A) x,

and because x lies in the composite subspace, i.e., x = W W† x,

(28)   W W† f(A) x = W [ W† f(A) W ] W† x.

As W† A W satisfies Eq. (26), we can introduce the same approximation as in Eq. (18),

(29)   W† f(A) W ≈ f(W† A W) = f(Ĥ),

and substituting Eq. (29) in Eq. (28) we construct the function approximation

(30)   f(A) x ≈ W f(Ĥ) W† x.
Because of the block structure (25) of the composite Hessenberg matrix Ĥ, the matrix f(Ĥ) can be written as

(31)   f(Ĥ) = [ f(T)  Y ; 0  f(H_k) ].

The upper left corner is the function of the triangular Schur matrix T (which is again triangular), and the lower right corner is the (dense) matrix function of the Arnoldi Hessenberg matrix H_k. The upper right corner Y reflects the coupling between the two subspaces and is given by the solution of the Sylvester equation

(32)   T Y − Y H_k = f(T) T₁₂ − T₁₂ f(H_k),

with T₁₂ = S† A V_k, which follows from the identity Ĥ f(Ĥ) = f(Ĥ) Ĥ. Combining (30) and (31), we obtain
(33)   f(A) x ≈ S f(T) (S† x) + ‖x_⊥‖₂ [ S Y + V_k f(H_k) ] e_1,

where we used V_k† x = ‖x_⊥‖₂ e_1. Therefore only f(T) and the first columns of Y and f(H_k), i.e., the first columns of f(Ĥ), are needed to evaluate (33). This information can be computed using the spectral definition (4) or some other suitable method [Denman76, Kenney91, Kenney95, Higham97]. In the case of the sign function we chose to use Roberts' iterative method [Rob80]
(34)   X_{n+1} = (X_n + X_n^{−1}) / 2,   X_0 = M,

which converges quadratically to sgn(M). Roberts' method is applied to compute sgn(T) and sgn(H_k). The matrix Y is computed by solving the Sylvester equation (32) using the method described in Appendix A.
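Roberts' iteration (34) is straightforward to implement. The sketch below is our own illustration (tolerance, seed, and test matrix are arbitrary choices); it builds a small matrix with prescribed eigenvalues on both sides of the imaginary axis, so the exact sign is known:

```python
import numpy as np

def sign_roberts(M, tol=1e-14, maxit=50):
    """Roberts' iteration X_{n+1} = (X_n + X_n^{-1}) / 2 with X_0 = M,
    converging quadratically to sgn(M) provided M has no purely
    imaginary eigenvalues."""
    X = np.array(M, dtype=complex)
    for _ in range(maxit):
        X_new = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(X_new - X, 1) <= tol * np.linalg.norm(X_new, 1):
            return X_new
        X = X_new
    return X

# Test matrix with two eigenvalues on each side of the imaginary axis,
# so sgn(M) = P diag(1, -1, 1, -1) P^{-1} exactly.
rng = np.random.default_rng(2)
D = np.diag([2 + 1j, -1 + 3j, 0.7 - 0.5j, -0.4 - 2j])
P = np.eye(4) + 0.2 * rng.standard_normal((4, 4))
M = P @ D @ np.linalg.inv(P)
S = sign_roberts(M)
exact = P @ np.diag([1, -1, 1, -1]) @ np.linalg.inv(P)
```

In the algorithm above, (34) is only applied to the small matrices T and H_k, never to the large operator A itself.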
In the implementation one has to be careful to compute the deflated eigenvectors to high enough accuracy, as this limits the overall accuracy of the function approximation. When computing sgn(A)x for several x, the partial Schur decomposition (23) and the triangular matrix sgn(T) need to be computed only once. Only the modified Arnoldi method must be repeated for each new vector x.
We summarize our algorithm for approximating sgn(A)x:

(i) Determine the right eigenvectors corresponding to the m critical eigenvalues of A using ARPACK. Construct and store the corresponding Schur matrix S and the upper triangular matrix T. The columns of S form an orthonormal basis of the subspace Ω_m.

(ii) Compute the triangular matrix sgn(T) using (34).

(iii) Compute x_⊥ = (1 − S S†) x.

(iv) Construct an orthonormal basis for the projected Krylov subspace using the modified Arnoldi method. The basis is constructed iteratively by orthogonalizing each new Krylov vector with respect to S and to all previous Arnoldi vectors, and is stored as the columns of a matrix V_k. Also build the upper Hessenberg matrix H_k with entries h_{ij} = v_i† A v_j and the coupling matrix T₁₂ = S† A V_k.

(v) Compute sgn(H_k) using (34) and solve the Sylvester equation (32) for Y.

(vi) Compute the approximation to sgn(A)x using formula (33).

If sgn(A)x has to be computed for several x, only steps (iii)-(vi) need to be repeated for each vector.
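The role of the Sylvester equation (32) can be illustrated with the standard identity for functions of block upper triangular matrices: if A = [B, C; 0, D], then f(A) = [f(B), Y; 0, f(D)] with B Y − Y D = f(B) C − C f(D). The sketch below is ours; SciPy's solve_sylvester stands in for the method of Appendix A, and the test matrices are constructed so that the eigenvalue real parts are known:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def sign_spectral(M):
    """Matrix sign via the spectral definition; M must be diagonalizable."""
    lam, U = np.linalg.eig(M)
    return U @ np.diag(np.where(lam.real > 0, 1.0, -1.0)) @ np.linalg.inv(U)

rng = np.random.default_rng(3)
p, q = 3, 4
# Block triangular test matrix; eigenvalues of B (D) have positive (negative)
# real parts by construction, so here sgn(B) = I and sgn(D) = -I.
B = np.triu(rng.standard_normal((p, p)), 1) + np.diag([1.0, 2.0, 3.0])
D = np.triu(rng.standard_normal((q, q)), 1) + np.diag([-1.0, -2.0, -0.5, -3.0])
C = rng.standard_normal((p, q))
A = np.block([[B, C], [np.zeros((q, p)), D]])

SB, SD = sign_spectral(B), sign_spectral(D)
# Sylvester equation for the coupling block:  B Y - Y D = sgn(B) C - C sgn(D)
Y = solve_sylvester(B, -D, SB @ C - C @ SD)
S_blocks = np.block([[SB, Y], [np.zeros((q, p)), SD]])
S_direct = sign_spectral(A)
```

The Sylvester equation is uniquely solvable because the spectra of the two diagonal blocks are disjoint, which is guaranteed here since they lie on opposite sides of the imaginary axis.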
5.3 LR-deflation
An alternative deflation in the same composite subspace can be constructed using both the left and right eigenvectors corresponding to the critical eigenvalues. This deflation algorithm is a natural extension of the method described in Sec. 5.1 from the Hermitian to the non-Hermitian case. A similar idea has been used for the iterative solution of linear systems [MoZh04, Hasenfratz:2005tt].
Assume that the m critical eigenvalues of A have been computed together with their corresponding left and right eigenvectors by some iterative method like the one provided by ARPACK. The right eigenvectors satisfy

(35)   A R = R Λ,

with Λ = diag(λ1, …, λm) the diagonal eigenvalue matrix for the critical eigenvalues and R the n × m matrix of right eigenvectors (stored as columns). Similarly, the left eigenvectors obey

(36)   L† A = Λ L†,

where L is the n × m matrix containing the left eigenvectors (also stored as columns). For a non-Hermitian matrix, the left and right eigenvectors corresponding to different eigenvalues are orthogonal (for degenerate eigenvalues, linear combinations of the eigenvectors can be formed such that this orthogonality property remains valid in general). Furthermore, if the eigenvectors are normalized such that l_i† r_j = δ_{ij}, then clearly L† R = I, and R L† is an oblique projector on the subspace Ω_m spanned by the right eigenvectors.

Let us now decompose x as

(37)   x = x_∥ + x_⊥,   x_∥ = R L† x,   x_⊥ = (1 − R L†) x,

where x_∥ is an oblique projection of x on Ω_m.
Applying f(A) to the decomposition (37) yields

(38)   f(A) x = f(A) x_∥ + f(A) x_⊥.

The first term on the right-hand side can be evaluated exactly using

(39)   f(A) x_∥ = R f(Λ) L† x,

which follows from Eq. (35), while the second term can be approximated by applying the Arnoldi method described in Sec. 4 to the vector x_⊥. An orthonormal basis is constructed in the Krylov subspace K_k(A, x_⊥) using the recurrence

(40)   A V_k = V_k H_k + h_{k+1,k} v_{k+1} e_k^T,

where V_k = (v_1 | ⋯ | v_k) and v_1 = x_⊥ / ‖x_⊥‖₂. By construction, x_⊥ has no components along the critical (right) eigendirections, and successive operations of A will yield no contributions along these directions either, hence K_k(A, x_⊥) does not mix with Ω_m. In principle, numerical inaccuracies accumulated during the Arnoldi iterations might make it necessary to occasionally re-extract spurious components along the critical eigendirections. However, this turned out not to be necessary in our numerical calculations.
Applying the Arnoldi approximation (19) to Eq. (38) yields the final approximation

(41)   f(A) x ≈ R f(Λ) L† x + β V_k f(H_k) e_1,   β = ‖x_⊥‖₂.

Note that again only the first column of f(H_k) is needed to evaluate Eq. (41). As before, f(H_k) has to be computed using some suitable method; for the sign function, sgn(H_k) can be efficiently computed using Roberts' algorithm (34).
We summarize our algorithm for approximating sgn(A)x in the LR-deflation scheme:

(i) Determine the left and right eigenvectors corresponding to the m critical eigenvalues of A using ARPACK. Store the corresponding eigenvector matrices L and R.

(ii) Compute sgn(λ_i) (i = 1, …, m) for the critical eigenvalues.

(iii) Compute x_⊥ = (1 − R L†) x.

(iv) Construct an orthonormal basis for the Krylov subspace K_k(A, x_⊥) using the Arnoldi recurrence. The basis is constructed iteratively by orthogonalizing each new Krylov vector with respect to all previous Arnoldi vectors, and is stored as the columns of a matrix V_k. Also build the upper Hessenberg matrix H_k.

(v) Compute (the first column of) sgn(H_k) using Roberts' iterative method (34).

(vi) Compute the approximation to sgn(A)x using Eq. (41).

If sgn(A)x has to be computed for several x, only steps (iii)-(vi) need to be repeated for each vector.
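The LR-deflation identities (37)-(39) can be checked numerically. In the toy example below (ours, on a random complex test matrix), the left eigenvectors are obtained from the inverse of the full right-eigenvector matrix rather than from ARPACK, which automatically enforces the normalization L†R = I; the second term of Eq. (41) is evaluated exactly for checking purposes, where the algorithm would use the Arnoldi approximation:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 50, 6
A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
x = rng.standard_normal(n).astype(complex)

lam, R_full = np.linalg.eig(A)
Linv = np.linalg.inv(R_full)            # rows of R_full^{-1} are left eigenvectors
idx = np.argsort(np.abs(lam.real))[:m]  # "critical": smallest |Re(lambda)|
Lam = lam[idx]
Rm = R_full[:, idx]                     # right eigenvectors (columns), matrix R
LmH = Linv[idx, :]                      # L^dagger, normalized so L^dagger R = I

sgn = lambda z: np.where(z.real > 0, 1.0, -1.0)

# Oblique decomposition (37): x = R (L^dagger x) + x_perp
c = LmH @ x
x_perp = x - Rm @ c

# Exact reference, and the deflated formula (41) with the Krylov term
# replaced by an exact evaluation on x_perp
exact = R_full @ (sgn(lam) * (Linv @ x))
deflated = Rm @ (sgn(Lam) * c) + R_full @ (sgn(lam) * (Linv @ x_perp))

# x_perp carries no critical components, and A x_perp does not either,
# so the Krylov subspace built from x_perp never mixes with Omega_m
crit_x = np.linalg.norm(LmH @ x_perp)
crit_Ax = np.linalg.norm(LmH @ (A @ x_perp))
rel_err = np.linalg.norm(deflated - exact) / np.linalg.norm(exact)
```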
5.4 Discussion
We briefly compare the two deflation schemes. Although both use the same composite subspace, they yield different function approximations resulting from a different subspace decomposition.
In the Schur deflation, the composite subspace is decomposed into a sum of two orthogonal subspaces which are coupled by the matrix Y, while in the LR-deflation the subspaces no longer mix, at the expense of losing orthogonality between the two subspaces.
Accordingly, the two schemes introduce different approximations of f(A)x: the Schur deflation approximates the orthogonal projection of the solution vector on the total composite space using Eq. (29), while the LR-deflation first extracts the critical eigendirections and only approximates the orthogonal projection of the residual vector on the Krylov subspace using Eq. (19). Therefore, in the Schur deflation the components of f(A)x along the Schur vectors become more accurate as the Krylov subspace grows, while in the LR-deflation the components of f(A)x along the critical eigendirections are computed exactly, independently of the size of the Krylov subspace. This is probably the reason for the observation that, for fixed m and k, the LR-deflation is slightly more accurate than the Schur deflation, see Fig. 4 below.
Numerically, the LR-deflation has two advantages over the Schur deflation. First, its Arnoldi method does not require the deflated directions to be projected out of the Krylov subspace, because the subspaces do not mix. Second, this absence of coupling between the subspaces means that the LR-scheme has no analog of the Sylvester equation (32).
A downside of the LR-deflation is that both left and right eigenvectors need to be computed in the initialization phase, whereas the Schur deflation only needs the right eigenvectors. Hence, the Schur deflation will have a shorter initialization phase, unless one needs to operate with both A and its adjoint A†, in which case both sets of eigenvectors are needed anyway (for the adjoint, the roles of the left and right eigenvectors are interchanged).
In the next section, we will present numerical results obtained with our modified Arnoldi method.
6 Results
We implemented the modified Arnoldi method proposed in the previous section to compute the sign function occurring in the overlap Dirac operator (1) of lattice QCD.
First we discuss the critical eigenvalues of the Wilson-Dirac operator, which are needed for deflation and have to be computed once for any given gauge configuration. Deflation is needed because of the existence of eigenvalues close to the imaginary axis. In Fig. 1 we show the spectrum of γ5 D_W(μ) for the smaller and the larger of our two lattices, using the same parameters (in particular the same chemical potential and gauge coupling) as in Ref. [Bloch:2006cd]. These complete spectra were computed with LAPACK [lapack]. This is a very costly calculation, especially for the larger lattice, which was done for the sole purpose of this numerical investigation but cannot be performed routinely during production runs.
Although the eigenvalues of interest for deflation in the case of the sign function are those with smallest absolute real parts, we decided to deflate the eigenvalues with smallest magnitude instead. Numerically the latter are more easily determined, and both choices yield almost identical deflations for the Wilson-Dirac operator at nonzero chemical potential. The reason is that, as long as the chemical potential does not grow too large, the spectrum looks like a very narrow bowtie-shaped strip along the real axis (see Fig. 1), so that the sets of eigenvalues with smallest absolute real parts and with smallest magnitudes coincide.
In practice we compute the m eigenvalues of A = γ5 D_W(μ) with smallest magnitude with ARPACK. This package has an option to retrieve the eigenvalues with smallest magnitude without performing an explicit inversion, which would be very expensive in this case. However, the use of this option requires the wanted eigenvalues to be near the boundary of the convex hull of the spectrum. From Fig. 1 it is clear that the eigenvalues closest to the origin are deep inside the interior of the convex hull and do not satisfy this requirement. Therefore we opted to compute the eigenvalues with smallest magnitude of the squared operator A². Clearly the eigenvalues with smallest magnitude will be the same for both operators, as |λ²| = |λ|². The eigenvalues of A² are given by (x + iy)² = x² − y² + 2ixy, where x and y are the real and imaginary parts of the eigenvalues of A. The spectra of A² for the two lattices are shown in Fig. 2, and clearly the eigenvalues with smallest magnitudes are now close to the boundary of the convex hull of the spectrum, so that ARPACK can find them more easily. Since in this approach we square the matrix, there is the potential danger that small eigenvalues get spoiled just by the additional numerical roundoff introduced when applying A twice. However, this should be noticeable only if these eigenvalues are comparable in size to the roundoff error. Our calculations are not affected by this problem.
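The squaring trick can be checked on a random test matrix (ours, standing in for γ5 D_W): the spectrum of A² is the elementwise square of the spectrum of A, so selecting the smallest-magnitude eigenvalues of A² selects exactly the squares of the smallest-magnitude eigenvalues of A:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 80, 4
A = rng.standard_normal((n, n)) / np.sqrt(n)

lam = np.linalg.eigvals(A)
lam_sq = np.linalg.eigvals(A @ A)

# The spectrum of A^2 is the elementwise square of the spectrum of A ...
a = np.sort_complex(lam ** 2)
b = np.sort_complex(lam_sq)

# ... so the m smallest magnitudes of the squared spectrum are the squares
# of the m smallest magnitudes of the original spectrum
small_A = np.sort(np.abs(lam))[:m]
small_A2 = np.sort(np.abs(lam_sq))[:m]
```

In the actual computation one would pass A² to ARPACK (e.g. as an operator applying A twice) and keep the eigenvectors, which are shared by A and A² for generic spectra.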
Obviously there is a trade-off between the number of deflated eigenvalues and the size of the Krylov subspace. A useful piece of information in this context is the ratio of the magnitude of the largest deflated eigenvalue to that of the largest overall eigenvalue, which is given in Table 1 for different numbers of deflated eigenvalues. A comparison of the values for the two lattice sizes indicates that the number of eigenvalues with a magnitude below a given ratio grows proportionally with the lattice volume. This is consistent with a spectral density of the small eigenvalues that is proportional to the lattice volume. This property is also apparent in Figs. 1 and 2, as the contours enclosing the spectra remain unchanged when the volume is increased. As a first guess we therefore expect that scaling the number of deflated eigenvalues with the volume should yield comparable convergence properties of the method for various lattice sizes.
  deflated eigenvalues   smaller lattice   larger lattice
           2                 0.0040           0.0016
           4                 0.0056           0.0026
           8                 0.0119           0.0035
          16                 0.0183           0.0052
          32                 0.0351           0.0084
          64                 0.0631           0.0155
         128                   —              0.0286
The convergence of the method is illustrated in Fig. 3, where the accuracy of the approximation is shown as a function of the Krylov subspace size for two different lattice sizes. The various curves correspond to different numbers of deflated eigenvalues. The results in the figure were computed using the LR-deflation scheme; the Schur deflation yields similar results.
Without deflation the Krylov subspace method would be numerically unusable because of the large Krylov subspaces required. Clearly, deflation greatly improves the efficiency of the numerical method: as more eigenvalues are deflated, smaller Krylov subspaces suffice to achieve a given accuracy.
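For reference, the undeflated Krylov approximation itself can be sketched in a few dozen lines. The matrix and vector below are invented toy data, and $f(z)=z^2$ replaces the sign function so that $f(H_k)$ can be formed without an eigensolver; with this polynomial choice the Arnoldi approximation $f(A)b \approx \beta\, V_k f(H_k) e_1$ becomes exact as soon as the subspace dimension exceeds the polynomial degree:

```python
# Minimal sketch of the (undeflated) Arnoldi approximation
# f(A) b ~= beta * V_k f(H_k) e_1, in pure Python for a toy dense matrix.
# f(z) = z^2 stands in for the sign function, so f(H_k) is just H_k @ H_k.

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def dot(u, v):  # Euclidean inner product (conjugation for complex safety)
    return sum(a.conjugate() * b for a, b in zip(u, v))

def arnoldi(A, b, k):
    """k steps of Arnoldi: returns beta = ||b||, basis V, and k x k H."""
    beta = abs(dot(b, b)) ** 0.5
    V = [[x / beta for x in b]]
    H = [[0.0] * k for _ in range(k)]
    for j in range(k - 1):
        w = matvec(A, V[j])
        for i in range(j + 1):                  # modified Gram-Schmidt
            H[i][j] = dot(V[i], w)
            w = [wi - H[i][j] * vi for wi, vi in zip(w, V[i])]
        H[j + 1][j] = abs(dot(w, w)) ** 0.5
        V.append([x / H[j + 1][j] for x in w])
    w = matvec(A, V[k - 1])                     # last column of H
    for i in range(k):
        H[i][k - 1] = dot(V[i], w)
    return beta, V, H

A = [[2.0, 1.0, 0.0, 0.0],
     [0.5, 1.0, 0.3, 0.0],
     [0.0, 0.2, -1.0, 0.4],
     [0.0, 0.0, 0.1, -2.0]]
b = [1.0, 0.0, 1.0, 0.0]

k = 3
beta, V, H = arnoldi(A, b, k)
# f(H_k) = H_k^2, of which only the first column is needed.
H2 = [[sum(H[i][l] * H[l][j] for l in range(k)) for j in range(k)] for i in range(k)]
approx = [beta * sum(V[l][i] * H2[l][0] for l in range(k)) for i in range(len(b))]

exact = matvec(A, matvec(A, b))                 # f(A) b with f(z) = z^2
print(max(abs(a - e) for a, e in zip(approx, exact)) < 1e-10)  # True
```

For the sign function the small matrix $f(H_k)$ must instead be evaluated via a (Schur or eigenvalue) decomposition of $H_k$, but the surrounding Arnoldi machinery is the same; the deflation schemes of the text then shrink the $k$ needed for a given accuracy.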
Furthermore, the deflation efficiency grows with increasing lattice volume. Consider reaching a fixed accuracy on the smaller lattice with 25 deflated eigenvalues, and on the larger lattice with a comparable deflation of 128 critical eigenvalues. Although the matrix size is more than 5 times larger for the larger lattice, the Krylov subspace only has to be expanded by a factor of 1.2 to achieve the same accuracy (when the number of deflated eigenvalues is scaled proportionally to the volume, so that the ratio of Table 1 remains approximately constant for both lattice sizes).
In Fig. 4 we compare the accuracy of the two deflation schemes described in Secs. 5.2 and 5.3. For an equal number of deflated eigenvalues and equal Krylov subspace size, the LR-deflation appears to be systematically slightly more accurate than the Schur deflation.
To assess the quality of the modified Arnoldi approximation, it is interesting to compare the approximations (33) and (41) with the best approximation in the composite subspace, which is the orthogonal projection (27) of the exact result onto this subspace,
(42) 
The relative accuracy of this projection is also shown in Fig. 4. It is encouraging that the modified Arnoldi approximation is quite close to this exact projection.
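The statement that the orthogonal projection is the best approximation from a subspace is a standard fact and can be illustrated in a few lines (toy vectors in the Euclidean norm, invented for this sketch):

```python
# The orthogonal projection P x of a vector x onto a subspace is the best
# approximation to x from that subspace: perturbing the expansion
# coefficients can only increase the error (Pythagoras).
def dot(u, v):
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm(v):
    return abs(dot(v, v)) ** 0.5

# Orthonormal basis of a 2-dimensional subspace of R^4 via Gram-Schmidt.
a1 = [1.0, 1.0, 0.0, 0.0]
a2 = [0.0, 1.0, 1.0, 0.0]
q1 = [x / norm(a1) for x in a1]
r = dot(q1, a2)
w = [x - r * q for x, q in zip(a2, q1)]
q2 = [x / norm(w) for x in w]

x = [1.0, 2.0, 3.0, 4.0]
c1, c2 = dot(q1, x), dot(q2, x)
Px = [c1 * u + c2 * v for u, v in zip(q1, q2)]      # orthogonal projection
best = norm([xi - pi for xi, pi in zip(x, Px)])

# Any other element of the subspace has a strictly larger error.
worse = norm([xi - ((c1 + 0.1) * u + c2 * v) for xi, u, v in zip(x, q1, q2)])
print(best < worse)  # True
```

The modified Arnoldi approximation is not this exact projection (which would require knowing the exact result), but Fig. 4 shows it comes close to this lower bound on the error.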
smaller lattice, Schur deflation
initialization time: 14.1 s
  Krylov size   Arnoldi (s)   sign of H (s)   total (s)
      100          0.18           0.03           0.23
      200          0.59           0.21           0.81
      300          1.22           0.52           1.75
      400          2.05           1.08           3.16
      500          3.12           1.79           4.93
      600          4.37           2.90           7.31
      700          5.88           4.57          10.49
      800          7.56           6.69          14.28
      900          9.50           9.38          18.92
     1000         11.63          12.68          24.36

smaller lattice, LR-deflation
initialization time: 27.5 s
  Krylov size   Arnoldi (s)   sign of H (s)   total (s)
      100          0.12           0.03           0.15
      200          0.45           0.20           0.66
      300          1.01           0.49           1.51
      400          1.77           1.02           2.82
      500          2.76           1.69           4.47
      600          3.94           2.77           6.74
      700          5.36           4.40           9.79
      800          6.96           6.44          13.44
      900          8.84           9.10          17.98
     1000         10.84          12.33          23.21

larger lattice, Schur deflation
initialization time: 884 s
  Krylov size   Arnoldi (s)   sign of H (s)   total (s)
      100          2.03           0.05           2.13
      200          5.16           0.22           5.45
      300          9.27           0.56           9.91
      400         14.59           1.15          15.85
      500         20.95           2.09          23.17
      600         28.12           3.35          31.61
      700         36.81           5.17          42.15
      800         46.32           7.39          53.88
      900         56.83          10.37          67.39
     1000         68.29          13.88          82.39

larger lattice, LR-deflation
initialization time: 1713 s
  Krylov size   Arnoldi (s)   sign of H (s)   total (s)
      100          0.66           0.03           0.75
      200          2.39           0.15           2.62
      300          5.16           0.42           5.69
      400          9.01           0.94          10.06
      500         13.96           1.73          15.84
      600         20.03           2.80          22.98
      700         27.09           4.44          31.70
      800         35.09           6.49          41.78
      900         44.38           9.10          53.70
     1000         54.74          12.36          67.34
In Table 2 we show the CPU time used by the modified Arnoldi method for the Schur and LR-deflation schemes. The times needed to construct the orthonormal basis of the Krylov subspace according to Eqs. (24) and (40) and to compute the sign function of the Arnoldi Hessenberg matrices are tabulated separately. All times were measured with a fixed number of deflated eigenvalues for each lattice size.
The larger CPU times required by the Schur deflation mainly reflect the additional orthogonalization of the Arnoldi vectors with respect to the Schur vectors. The time needed to compute the sign function of the Hessenberg matrix is also slightly larger for the Schur deflation, as it involves the additional solution of the Sylvester equation (32). For the same reasons, varying the number of deflated eigenvalues for a given lattice size will only significantly change the timings for the Schur deflation (this dependence is not shown in the table).
To summarize, the LR-deflation scheme has somewhat better accuracy and requires less CPU time per iteration than the Schur deflation. The one advantage of the Schur deflation is that it only requires the initial computation of the right eigenvectors, while the LR-deflation requires the computation of both left and right eigenvectors. The time needed to compute the critical eigenvectors is given in the headers of the four blocks of Table 2. The choice of deflation scheme therefore depends on the number of vectors for which the sign function needs to be computed. This will be the topic of future work on nested iterative methods for non-Hermitian matrices occurring in the inversion of the overlap operator. Of course, as mentioned in Sec. 5.4, if one needs to apply both the sign function and its adjoint, the LR-deflation will be the better choice.
7 Conclusion
In this paper we have proposed an algorithm to approximate the action of a function of a non-Hermitian matrix on an arbitrary vector when some of the eigenvalues of the matrix lie in a region of the complex plane close to a discontinuity of the function.
The method approximates the solution vector in a composite subspace, i.e., a Krylov subspace augmented by the eigenvectors corresponding to a small number of critical eigenvalues. In this composite subspace two deflation variants are presented, based on different subspace decompositions: the Schur deflation uses two coupled orthogonal subspaces, while the LR-deflation uses two decoupled but non-orthogonal subspaces.
The subspace decompositions are then used to compute Arnoldi-based function approximations in which the contribution of the critical eigenvalues is taken into account explicitly. This deflation of critical eigenvalues allows for a smaller Krylov subspace and is crucial for the efficiency of the method.
For the sign function, deflation is particularly important because of the discontinuity along the imaginary axis. The method was applied to the overlap Dirac operator of lattice QCD at nonzero chemical potential, where deflation was shown to clearly enhance the efficiency of the method. If the overlap Dirac operator is inverted using some iterative method, each iteration requires the application of the sign function to some vector. In such a situation the cost of computing the critical eigenvectors, which is incurred just once, is far outweighed by the reduced cost of each evaluation of the sign function. However, an important question that deserves further study is how the optimal number of deflated eigenvectors depends on the volume and how this influences the initialization time. This question could become relevant for performance on large volumes.
As mentioned above, our next steps include the application of the two-sided Lanczos method to this approximation problem for non-Hermitian matrices, and the investigation of nested iterative methods for non-Hermitian matrices. Work in these directions is in progress.
Appendix A Sylvester equation
In this appendix we describe a particularly simple algorithm to solve the special Sylvester equation
$$T Y - Y H = C , \qquad (43)$$
where $T$ is an upper triangular $m \times m$ matrix, $H$ is an upper Hessenberg $k \times k$ matrix with nonzero subdiagonal entries, and the right-hand side $C$ and the unknown matrix $Y$ are $m \times k$ matrices. Classical methods to solve the Sylvester equation when $T$ and $H$ are full matrices are formulated in [BaSt72, GoNaVL79]. For triangular $T$ and $H$ the Sylvester equation can easily be solved by direct substitution, see, e.g., [Golub]. In principle, this algorithm could also be applied to Eq. (43) if the upper Hessenberg matrix $H$ is first transformed into triangular form using a Schur decomposition. Here we present a more efficient approach, which can be regarded as a natural extension of the algorithm for the triangular Sylvester equation to the case where one of the matrices is upper Hessenberg instead of triangular. Blocking would also be possible (cf., e.g., [Geijn]), but since the solution of the Sylvester equation accounts only for a small portion of the overall time we did not pursue this issue further.
Let $y_i$ and $c_i$ denote the $i$-th rows of $Y$ and $C$, respectively. Row $i$ of Eq. (43) then reads
$$t_{ii} y_i + \sum_{j=i+1}^{m} t_{ij} y_j - y_i H = c_i . \qquad (44)$$
This matrix equation can be solved row by row from bottom to top, since Eq. (44) can be solved for row $i$ once the rows $i+1, \dots, m$ are known,
$$y_i \left( t_{ii} I - H \right) = \hat{c}_i , \qquad (45)$$
with $\hat{c}_i = c_i - \sum_{j=i+1}^{m} t_{ij} y_j$.
Inside row $i$ one can solve for the element $y_{i,l+1}$ as a function of the elements to its left,
$$y_{i,l+1} = \frac{1}{h_{l+1,l}} \Big( t_{ii}\, y_{il} - \sum_{j=1}^{l} h_{jl}\, y_{ij} - \hat{c}_{il} \Big) , \qquad (46)$$
for columns $l = 1, \dots, k-1$. From Eq. (46) it follows that all elements of row $i$ can be written as
$$y_{il} = \alpha_l\, y_{i1} + \beta_l , \qquad (47)$$
where the coefficients $\alpha_l$ and $\beta_l$ can be computed explicitly from the recurrence relations
$$\alpha_{l+1} = \frac{1}{h_{l+1,l}} \Big( t_{ii}\, \alpha_l - \sum_{j=1}^{l} h_{jl}\, \alpha_j \Big) , \qquad \beta_{l+1} = \frac{1}{h_{l+1,l}} \Big( t_{ii}\, \beta_l - \sum_{j=1}^{l} h_{jl}\, \beta_j - \hat{c}_{il} \Big) , \qquad (48)$$
starting from $\alpha_1 = 1$, $\beta_1 = 0$. After substituting Eq. (47) with the known coefficients (48), element $l = k$ of Eq. (45) can be solved for $y_{i1}$,
$$y_{i1} = \frac{\hat{c}_{ik} + \sum_{j=1}^{k} h_{jk}\, \beta_j - t_{ii}\, \beta_k}{t_{ii}\, \alpha_k - \sum_{j=1}^{k} h_{jk}\, \alpha_j} . \qquad (49)$$
Once $y_{i1}$ is known, all other elements of row $i$ can be computed using Eq. (47) with the coefficients (48).
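A direct transcription of this row-by-row substitution into code makes the index bookkeeping concrete. The following pure-Python sketch uses 0-based indexing and invented toy matrices; it assumes the subdiagonal of the Hessenberg matrix is nonzero and that no diagonal entry of the triangular matrix is an eigenvalue of the Hessenberg one (so that each row has a unique solution):

```python
# Sketch of the row-by-row substitution for the Sylvester equation
# T Y - Y H = C, with T (m x m) upper triangular and H (k x k) upper
# Hessenberg. Rows of Y are solved from bottom to top; inside a row,
# all elements are expressed through the first one.

def solve_sylvester(T, H, C):
    m, k = len(T), len(H)
    Y = [[0.0] * k for _ in range(m)]
    for i in reversed(range(m)):
        # Right-hand side of row i with the already-known rows moved over.
        ch = [C[i][l] - sum(T[i][j] * Y[j][l] for j in range(i + 1, m))
              for l in range(k)]
        t = T[i][i]
        # y_{i,l} = alpha[l] * y_{i,0} + beta[l]  (column recurrences)
        alpha, beta = [1.0], [0.0]
        for l in range(k - 1):
            s_a = t * alpha[l] - sum(H[j][l] * alpha[j] for j in range(l + 1))
            s_b = t * beta[l] - sum(H[j][l] * beta[j] for j in range(l + 1)) - ch[l]
            alpha.append(s_a / H[l + 1][l])
            beta.append(s_b / H[l + 1][l])
        # The last column fixes the first element of the row.
        num = ch[k - 1] + sum(H[j][k - 1] * beta[j] for j in range(k)) - t * beta[k - 1]
        den = t * alpha[k - 1] - sum(H[j][k - 1] * alpha[j] for j in range(k))
        y0 = num / den
        Y[i] = [alpha[l] * y0 + beta[l] for l in range(k)]
    return Y

# Verify on a small toy example: build C from a known Y, then recover it.
T = [[2.0, 0.3, -0.1],
     [0.0, -1.5, 0.4],
     [0.0, 0.0, 0.7]]
H = [[0.5, 0.2, 0.1],
     [1.0, -0.3, 0.4],
     [0.0, 0.8, 0.6]]
Yref = [[1.0, 2.0, -1.0],
        [0.0, 1.0, 3.0],
        [2.0, -2.0, 0.5]]
C = [[sum(T[i][l] * Yref[l][j] for l in range(3))
      - sum(Yref[i][l] * H[l][j] for l in range(3)) for j in range(3)]
     for i in range(3)]

Y = solve_sylvester(T, H, C)
err = max(abs(Y[i][j] - Yref[i][j]) for i in range(3) for j in range(3))
print(err < 1e-10)  # True
```

Each row costs $O(k^2)$ operations for the recurrences, so the whole solve is $O(mk^2 + m^2 k)$, the same order as the triangular substitution algorithm it extends.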
Acknowledgments
This work was supported in part by DFG grants FOR465WE2332/42 and Fr755/151. JB would like to thank Thomas Kaltenbrunner for useful discussions.