Recursive Integral Method with Cayley Transformation

Recursive Integral Method with Cayley Transformation

Abstract

Recently, a non-classical eigenvalue solver, called RIM, was proposed to compute (all) eigenvalues in a region on the complex plane. Without solving any eigenvalue problem, it tests if a region contains eigenvalues using an approximate spectral projection. Regions that contain eigenvalues are subdivided and tested recursively until eigenvalues are isolated with a specified precision. This makes RIM an eigensolver distinct from all existing methods. Furthermore, it requires no a priori spectral information. In this paper, we propose an improved version of RIM for non-Hermitian eigenvalue problems. Using Cayley transformation and Arnoldi’s method, the computation cost is reduced significantly. Effectiveness and efficiency of the new method are demonstrated by numerical examples and compared with ’eigs’ in Matlab.

1 Introduction

We consider the non-Hermitian eigenvalue problem

 Ax=λBx, (1)

where and are large sparse matrices. Here can be singular. Such eigenvalue problems arise in many scientific and engineering applications [6, 11, 13] as well as in emerging areas such as data analysis in social networks .

The problem of interest in this paper is to find (all) eigenvalues in a given region on the complex plane without any spectral information, i.e., the number and distribution of eigenvalues in are not known.

In a recent work , we developed an eigenvalue solver RIM (recursive integral method). RIM, which is essentially different from all the existing eigensolvers, is based on spectral projection and domain decomposition. Briefly speaking, given a region whose boundary is a simple closed curve, RIM computes an indicator using spectral projection defined by a Cauchy contour integral on . The indicator is used to decide if contains eigenvalue(s). When the answer is positive, is divided into sub-regions and indicators for these sub-regions are computed. The procedure continues until the size of the region is smaller than the specified precision (e.g., ). The centers of the regions are the approximations of eigenvalues.

To be specific, for , the resolvent of the matrix pencil is defined as as

 Rz(A,B):=(A−zB)−1. (2)

Let be a simple closed curve lying in the resolvent set of on . Spectral projection for (1) is given by

 P(A,B)=12πi∫Γ(A−zB)−1dz. (3)

Given a random vector , it is well-known that projects onto the generalized eigenspace associated with the eigenvalues enclosed by . Clearly, is zero if there is no eigenvalue(s) inside , and nonzero otherwise.

RIM differs from classical eigensolvers  and recently developed integral based methods[5, 4]. It simply computes an indictor of a region using the approximation to ,

 Pf≈12πiW∑j=1ωjxj, (4)

where ’s are quadrature weights and ’s are the solutions of the linear systems

 (A−zjB)xj=f,j=1,…,W. (5)

Recall that if there is no eigenvalue inside , then for all .

In practice, one needs a threshold to distinguish between and . The indicator needs to be robust enough to treat the following problems:

• The randomly selected may have only a small component in the range of (which we write as ), in which case may be small even when there are eigenvalues in ;

• Since a quadrature rule is used to approximate , the indicator will not be zero (and may not even be very small) when there is no eigenvalue in .

The strategy of RIM in  selects a small threshold based on substantial experimentation, i.e., contains no eigenvalue if . This choice of threshold for discarding a region systematically leans towards further investigation of regions that may potentially contain eigenvalues. Of course, such a strategy leads to more computation cost.

To understand the first problem, consider an orthonormal basis for , which coincides with the eigenspace associated with all the eigenvalues within . For a random ,

 |Pf|=∣∣f|R(P)∣∣=∣∣ ∣∣M∑j=1ajϕj∣∣ ∣∣=(M∑j=1a2i)1/2, (6)

where . Since is random, could be very small. The solution in  is to normalize and project again. The indicator is set to be

 δS:=∣∣ ∣∣P(Pf|Pf|)∣∣ ∣∣. (7)

Analytically, . But numerical approximations to and may differ significantly.

The following is the basic algorithm for RIM:

• RIM

• Input: matrices , region , precision , threshold , random vector .

• Output: generalized eigenvalue(s) inside

• Compute using (7).

• Decide if contains eigenvalue(s).

• If , then exit.

• Otherwise, compute the size of .

• If ,

• partition into subregions .

• for

• RIM.

• end

• If ,

• set to be the center of .

• output and exit.

To compute , one needs to solve many linear systems

 (A−zjB)xj=f (8)

parameterized by . In , the Matlab linear solver ‘' is used to solve (8). This is certainly not efficient!

In this paper, we propose a new version of RIM, called RIM-C, to improve the efficiency. The contributions include: 1) Cayley transformation and Arnoldi’s method to speedup linear solves for the parameterized system (8); and 2) a new indicator to improve the robustness and efficiency. The rest of the paper is arranged as follows. In Section 2, we present how to incorporate Cayley transformation and the Arnoldi’s method into RIM. In Section 3, we introduce a new indicator to decide if a region contains eigenvalues. Section 4 contains the new algorithm and some implementation details. Numerical examples are presented in Section 5. We end up the paper with some conclusions and future works in Section 6.

2 Cayley Transformation and Arnoldi’s Method

2.1 Cayley transformation

The computation cost of RIM mainly comes from solving the linear systems (8) to compute the spectral projection . In particular, when the method zooms in around an eigenvalue, it needs to solve linear systems for many close ’s. This is done one by one in the first version of RIM . It is clear that the computation cost will be greatly reduced if one can take the advantage of the parametrized linear systems of same structure.

Without loss of generality, we consider a family of linear systems

 (A−zB)x=f, (9)

where is a complex number. When is nonsingular, multiplication of on both sides of (9) leads to

 (B−1A−zI)x=B−1f. (10)

Given a matrix , a vector , and a non-negative integer , the Krylov subspace is defined as

 Km(M;b):=span{b,Mb,…,Mm−1b}. (11)

The shift-invariant property of Krylov subspaces says that

 Km(aM+bI;b)=Km(M;b), (12)

where and are two scalars. Thus the Krylov subspace of is the same as , which is independent of .

The above derivation fails when is singular. Fortunately, this can be fixed by Cayley transformation . Assume that is not a generalized eigenvalue and . Multiplying both sides of (9) with

 (A−σB)−1, (13)

one obtains that

 (A−σB)−1f = (A−σB)−1(A−zB)x = (A−σB)−1(A−σB+(σ−z)B)x = (I+(σ−z)(A−σB)−1B)x.

Let and . Then (9) becomes

 (I+(σ−z)M)x=b. (14)

From (12), the Krylov subspace is the same as .

2.2 Analysis of the pre-conditioners

Now we look at the connection between two pre-conditioners and . Assume that is non-singular. Let be an eigenvalue of . Then is an eigenvalue of

 (A−σB)−1(A−zB).

The spectrum of might spread over the complex plane such that Krylov subspace based iterative methods may not converge. However, after Cayley transformation, when becomes large, will cluster around (see Fig. 1 for matrices and of Example 1 in Section 5). Similar result holds when is singular. Note that when approaches , will be very large in magnitude. When approaches , goes to zero. When is away from and , is . The key here is that the spectrum of (2.1) has a cluster of eigenvalues around and only a few isolated eigenvalues, which favors fast convergence in Krylov subspace. Figure 1: Matrices A and B are from Example 1 in Section 5. Left: Spectrum of original problem. Right: Spectrum after Cayley transformation.

2.3 Arnoldi Method for Linear Systems

The computation cost can be significantly reduced by exploiting (14). Consider the orthogonal projection method for

 Mx=b.

Let the initial guess be . One seeks an approximate solution in of dimension by imposing the Galerkin condition 

 (b−Mxm)⊥Km(M;b). (15)

The basic Arnoldi’s process (Algorithm 6.1 of ) is as follows.

• Choose a vector of norm

• for

• ,

• ,

• , if stop

• .

Let be the orthogonal matrix with column vectors and be the Hessenberg matrix whose nonzero entries are defined as above. From Proposition 6.6 of , one has that

 MVm=VmHm+vm+1hm+1,meTm (16)

such that

 span{col(Vm)}=Km(M;b).

Let . The Galerkin condition (15) becomes

 VTmb−VTmMVmy=0. (17)

Since (see Proposition 6.5 of ), the following holds:

 Hmy=VTmb.

From the construction of , . Let . Then

 y=βH−1me1. (18)

Consequently, the residual of the approximated solution can be written as

 ∥b−Mxm∥2=hm+1,m|eTmy|. (19)

Due to the shift invariant property, one has that

 {I+(σ−z)M}Vm=Vm(I+(σ−z)Hm)+(σ−z)vm+1hm+1,meTm. (20)

By imposing a Galerkin condition similar to (15), we have that

 VTmb−VTm{I+(σ−z)M}Vmy=0, (21)

which implies

 {I+(σ−z)Hm}y=βe1. (22)

From (19), one has that

 ∥b−{I+(σ−z)M}xm∥2=(σ−z)hm+1,m|eTmy|. (23)

Matrix is an matrix and is an upper Hessenberg matrix such that . Once and are constructed by Arnoldi’s process, they can be used to solve (22) for different ’s with residual given by (23). The residual can be monitored with a little extra cost.

Next we explain how the Arnoldi’s process is incorporated in RIM. To solve (8) for quadrature points ’s, one chooses a proper shift . Following (14), one has that

 (I+(σ−zj)M)xj=b, (24)

where and .

From (20) and (22),

 yj = β(I+(σ−zj)Hm)−1e1, (25) xj ≈ Vmyj, Pf ≈ 12πi∑wjVmyj. (26)

Hence the Krylov subspace for can be used to solve many linear systems associated with ’s close to .

3 An Efficient Indicator

Another critical problem of RIM is to how to define the indicator . As seen above, the indicator in  defined by (7) is to project a random vector twice. One needs to solve linear systems with different right hand sides, i.e., and . Consequently, two Krylov subspaces, rather than one, are constructed for a single shift .

In this section, we propose a new indicator that avoids the construction of two Krylov subspaces. The indicator stills needs to resolve the two problems (P1 and P2) in Section 1. The idea is to approximate with different sets of trapezoidal quadrature points by taking the advantage of the Cayley transformation and Arnoldi’s method discussed in the previous section.

Let be the approximation of with quadrature points. It is well-known that trapezoidal quadratures of a periodic function converges exponentially [14, Section 4.6.5], i.e.,

 ∣∣Pf−Pf|n∣∣=O(e−Cn),

where C is a constant depending on . The spectral projection satisfies

 Pf|n{≠0if there are % eigenvalues inside S,≈0no eigenvalue inside S.

For a large enough , one has that

 ∣∣Pf|2n0∣∣∣∣Pf|n0∣∣=⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩|Pf|+O(e−C2n)|Pf|+O(e−Cn)if there are eigenvalues inside S,O(e−C2n)O(e−Cn)=O(e−Cn)no eigenvalue inside S.

The new indicator is set to be

 δS=|Pf2n0|/|Pfn0|. (27)

A threshold value is also needed to decide if there exists eigenvalue in or not. If , is said to be admissible, i.e., there exists eigenvalue(s) in . The value is chosen based on numerical experimentation. Due to (25) - (26), the computation cost to evaluate the new indicator is not expensive.

4 The New Algorithm

Now we are ready to give the algorithm in detail. It starts with several shifts ’s distributed in uniformly. The associated Krylov subspaces are constructed and stored. For a quadrature point , the algorithm first attempts to solve the linear system (9) using the Krylov subspace with shift closest to . If the residual is larger than the given precision , a Krylov subspace with a new shift is constructed, stored and used to solve the linear system. Briefly speaking, the algorithm constructed some Krylov subspaces with different ’s. These subspaces are then used to solve the linear system for all quadrature points ’s. From (25) and (26), instead of solving a family of linear systems of size , the algorithm solves linear systems of reduced size for most ’s. This is the key idea to speed up RIM. We denote this improved version of RIM by RIM-C (RIM with Cayley transformation).

Given a search region and a normalized random vector , we compute the indicator using (27). Without loss of generality, is assumed to be a square. We set in (27). If , is divided uniformly into regions. The indicators of these regions are computed. This process continues until the size of the region is smaller than .

1. Algorithm RIM-C:

2. RIM-C

3. Input:

• : matrices

• : search region in

• : a random vector

• : precision

• : residual threshold

• : indicator threshold

• : size of Krylov subspace

• : number of quadrature points

4. Output:

• generalized eigenvalues inside

1. Choose several ’s uniformly in and construct Krylov subspaces

2. Compute using (27).

• Let be a quadrature point.

• Check if the linear system can be solved using the existing Krylov subspaces with residual less than .

• Otherwise, choose a new , construct a new Krylov subspace to solve the linear system.

3. Decide if each contains eigenvalues(s).

• If , exit.

• Compute the size of , .

• If , uniformly partition into subregions

• for to

• call RIM-C

• end

• Otherwise, output the eigenvalue and exit.

5 Numerical Examples

In this section, RIM-C (implemented in Matlab) is employed to compute all the eigenvalues in a given region. To the authors’ knowledge, there exists no eigensolver doing exactly the same thing. We compare RIM-C with ‘eigs’ in Matlab (IRAM: Implicitly Restarted Arnoldi Method ). Although the comparison seems to be unfair to both methods, it gives some idea about the performance of RIM-C.

The matrices for Examples 1-5 come from a finite element discretization of the transmission eigenvalue problem [7, 3] using different mesh size . Therefore, the spectra of these problems are similar. For Matlab function ‘eigs(A,B,K,SIGMA)’, ‘K’ and ‘SIGMA’ denote the number of eigenvalues to compute and the shift, respectively. For RIM-C, the size of Krylov space is set to be , , , , and . All the examples are computed on a Macbook pro with 16 Gb memory and 3 GHz Intel Core i7.

Example 1: The matrices and are (mesh size ). The search region . For ‘eigs’, the ‘shift’ is set to be . For this problem, it is known that there exist eigenvalues in . Therefore, ‘K’ is set to be . Note that RIM-C does not need this information. The results are shown in Table 1. Both RIM-C and ‘eigs’ compute eigenvalues and they are consistent. ‘eigs’ uses less time than RIM-C.

Example 2: Matrices and are (mesh size ). Let . For ‘eigs’, ’shift’ is set to be . Again, it is known in advance that there are eigenvalues in . Hence ‘K’ is set to be . The results are shown in Table 2. Both methods compute same eigenvalues and ‘eigs’ is faster.

Example 3: Matrices and are matrices (mesh size ). Let . There are eigenvalues in . It is well-known that the performance of ‘eigs’ is highly dependent on ‘shift’. In Table 3, we show the time used by RIM-C and ‘eigs’ with different shifts ’shift =5, 10, 15’. Notice that when the shift is not good, ‘eigs’ uses much more time. In practice, good shifts are not known in advance.

Example 4: We consider a larger problem: and are . Let (mesh size ). There are eigenvalues in . The results are in Table 4. This example, again, shows that for larger problems without any spectrum information, the performance of RIM-C is quite stable and consistent. However, the performance of ‘eigs’ varies a lot with different ’shifts’.

Example 5: This example demonstrates the effectiveness and robustness of the new indicator. The same matrices in Example 3 () are used. Consider three regions and . has one eigenvalue inside. has two eigenvalues inside. contains no eigenvalue. Table 5 shows the indicators of these three regions computed using (27). It is seen that the indicator is different when there are eigenvalues inside the region and when there are no eigenvalues.

Table 6 shows the means, minima, maxima, and standard deviations of indicators of these three regions computed using random vectors. The indicators are consistent for different random vectors.

Example 6: The last example shows the potential of RIM-C to treat large matrices. The sparse matrices are of arising from a finite element discretization of localized quantum states in random media . RIM-C computed real eigenvalues in , shown in the right picture of Fig. 2.

6 Conclusions and Future Works

This purposes of this paper is to compute (all) the eigenvalues of a large sparse non-Hermitian problem in a given region. We propose a new eigensolver RIM-C, which is an improved version of the recursive integral method using spectrum projection. RIM-C uses Cayley transformation and Arnoldi method to reduce the computation cost.

To the authors’ knowledge, RIM-C is the only eigensolver for this particular purpose. As we mentioned, the comparison of RIM-C and ‘eigs’ is unfair to both methods. However, the numerical results do show that RIM-C is effective and has the potential to treat large scale problems.

Currently, the algorithm is implemented in Matlab. A parallel version using C++ is under development. For the time being, RIM-C only computes eigenvalues, which is good enough for some applications. However, adding a component to give the associated eigenvectors is necessary for other applications. It would also be useful to provide the multiplicity of an eigenvalue. These are our future works to make RIM-C a robust efficient eigensolver.

References

1. D.N. Arnold, G. David, D. Jerison, S. Mayboroda and M. Filoche, Effective confining potential of quantum states in disordered media. Physical Review Letters, 116(5), 2016, 056602.
2. R. Huang, A. Struthers, J. Sun and R. Zhang, Recursive integral method for transmission eigenvalues. Journal of Computational Physics, Vol. 327, 830 - 840, 2016.
3. J. Sun, Iterative methods for transmission eigenvalues. SIAM J. Numer. Anal., 49(5),1860-1874, 2011.
4. E. Polizzi, Density-matrix-based algorithms for solving eigenvalue problems. Phys. Rev. B., 79, 115112, 2009.
5. T. Sakurai and H. Sugiura, A projection method for generalized eigenvalue problems using numerical integration. J. Comput. Appl. Math, 159(1), 119-128, 2003.
6. G.H. Golub and H.A. van der Vorst, Eigenvalue computation in the 20th century. J. Comput. Appl. Math, 123(1-2), 35-65, 2000.
7. X. Ji, J. Sun and T. Turner, A mixed finite element method for Helmholtz Transmission eigenvalues. ACM Trans. Math. Software, 38(4), Article No. 29, 2012.
8. S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.U. Hwang, Complex Networks: Structure and Dynamics. Physics Reports, 424(4-5), 175-308, 2006.
9. K. Meerbergen, The Solution of Parametrized Symmetric Linear Systems. SIAM Journal on Matrix Analysis and Applications, 24(4),1038-1059, 2003.
10. Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd Ed., Society for Industrial and Applied Mathematics, 2003.
11. Y. Saad, Numerical Methods for Large Eigenvalue Problems, 2nd Ed., Society for Industrial and Applied Mathematics, 2011.
12. R.B. Lehoucq, D.C. Sorensen and C. Yang, ARPACK User’s Guide – Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1998.
13. J. Sun and A. Zhou, Finite Element Methods for Eigenvalue Problems, Chapman and Hall/CRC, 2016.
14. P.J. Davis and P. Rabinowitz, Methods of Numerical Integration, 2nd Ed., Academic Press, Inc., Orlando, FL, 1984.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters   