Convergence analysis of GMRES for the Helmholtz equation via pseudospectrum ††thanks: This work was supported by the Alfred Kordelin’s Foundation and Academy of Finland projects 13267297 and 14341
Most finite element methods for solving time-harmonic wave - propagation problems lead to a linear system with a non-normal coefficient matrix. The non-normality is due to boundary conditions and losses. One way to solve these systems is to use a preconditioned iterative method. Detailed mathematical analysis of the convergence properties of these methods is important for developing new and understanding old preconditioners. Due to non-normality, there is currently very little existing literature in this direction. In this paper, we study the convergence of GMRES for such systems by deriving inclusion and exclusion regions for the pseudospectrum of the coefficient matrix. All analysis is done a priori by relating the properties of the weak problem to the coefficient matrix. The inclusion is derived from the stability properties of the problem and the exclusion is established via field of values and boundedness of the weak form. The derived tools are applied to estimate the pseudospectrum of time-harmonic Helmholtz equation with first-order absorbing boundary conditions, with and without a shifted-Laplace preconditioner.
Several different strategies for discretizing time-harmonic wave propagation problems using finite elements have been proposed in the literature. For typical problems, most of these strategies lead to a linear system with a large, sparse, indefinite and non-normal coefficient matrix. The indefiniteness is due to the wave-nature of the problem and the non-normality arises either from losses or truncation of infinite domains to finite ones. The large size of the system is due to the number of degrees of freedom required to resolve an oscillating solution. Because of their properties, the linear systems related to time-harmonic wave propagation problems are difficult to solve. Memory is an issue with direct solvers and lack of efficient preconditioners with iterative ones.
In order to develop new preconditioners and to understand old ones, it is important to know their effect on the convergence properties of the applied iterative method. Unfortunately, the convergence of iterative methods for linear systems with a non-normal coefficient matrix is a difficult subject of study. When the non-normality is significant, the iterative properties can be very different from the ones indicated by eigenvalues, [21, 10]. Similar difficulties are met with other properties related to the non-normal matrices, e.g., behavior of matrix exponentials cannot be predicted by eigenvalues, . Determining when the non-normality has a significant impact to iterative properties is complicated. The simplest way to estimate the impact is to compute one of the commonly used scalar measures of non-normality, e.g., , the conditioning of eigenvectors or the conditioning of individual eigenvalues, . However, except for the first one, these measures are not computable for large matrices. In addition, they can vary considerably even for relatively small systems .
The convergence of preconditioned iterative methods has been extensively studied in the context of finite element methods, . However, majority of the research has focused on real valued symmetric positive definite problems. The finite element discretization of these problems also leads to symmetric positive definite linear systems, which are solved using the preconditioned conjugate gradient method (PCG). The aim in the analysis of these methods is to estimate the convergence rate before computations. Only few of the existing works deal with indefinite linear systems, [28, 9, 2, 3, 18], and even fewer with non-normal indefinite ones, [25, 12, 26].
Most preconditioners for finite element discretizations of elliptic weak problems have been analyzed by using the abstract framework of Schwarz methods, . This framework is based on studying the properties of the underlying weak problem instead of the linear system. The convergence of PCG is related to the weak form via Rayleigh quotients. Such analysis is done in the inner product induced by the bilinear form. As Rayleigh quotients are the first step in the existing analysis, it does not carry over to complex valued, indefinite, non-normal linear systems. Such systems require a different set of analytics tools.
There currently exists three different ways to analyze iterative properties of a non-normal matrix : to study the field of values (FOV), pseudospectrum, or to include conditioning of eigenvectors to the convergence estimates. For time-harmonic wave-equations, estimating eigenvector conditioning before the matrices are constructed seems to be complicated and thus this approach is not suitable for our purposes. FOV has been applied to analyze the preconditioned time-harmonic Helmholtz equation e.g. in . However, FOV is always a convex set containing all eigenvalues of the matrix. As we will see, the spectrum of the problems we are interested in curls around the origin making FOV based methods unsuitable for our purposes. In contrast, the pseudospectrum can be a non-convex set and as we will show it can be estimated a priori, making it the best option of the three for this work.
In this paper, we study pseudospectrum as a tool for relating the properties of the weak problem to the convergence of the GMRES method. We derive convergence estimates for GMRES by establishing inclusion and exclusion regions for the pseudospectrum. The exclusion region is derived from the stability estimates of the weak problem and the inclusion region is based on then relation between pseudospectrum and FOV. In several cases, an inclusion for FOV can be easily obtained based on continuity properties of the weak form. All analysis is done a priori, so that the regions can be obtained without constructing the actual matrices or performing computations with them. The derived bounds are explicit in the relevant parameters of the problem, e.g., mesh size, wave-number and the losses. The presented analysis relies on general properties of the weak problem, stability and continuity so it is possible that it can be applied to other preconditioners and problems.
The paper is organized as follows. We begin with some preliminaries and proceed to give estimates relating pseudospectrum to a weak problem. After establishing these abstract results, we apply them to three example problems. We begin the examples by considering the Poisson problem, which is included for easy reference on what kind of information the derived estimates can deliver. Then we apply the presented tools to time-harmonic Helmholtz equation with and without a shifted-Laplace preconditioner. We end the paper with a discussion of the presented material.
Our model problem is: Find such that
where is some finite element space, is a sesquilinear form and an antilinear functional. The finite element space is spanned by basis so that every function admits the representation
in which the vector of coefficients . Problem (1) leads to the linear system
where , and . Hence, the sesquilinear form and the matrix are related as
where - is the conjugate transpose. The properties of the matrix will depend on the properties of the sesquilinear form and the basis functions via the above equation.
We will describe the actual problem and discretization in detail in Section 4. For now, let us note that when the sesquilinear form is related to the time-harmonic Helmholtz equation with absorbing boundary conditions, the matrix can be very large. This is due to the facet that the finite element mesh size has to be sufficiently fine before finite element method can produce accurate results, see [13, 14]. Typical engineering rule of thumb is to use ten degrees of freedom per one wave-length. For example, a cube for which each dimension is ten wave-lengths long requires one to use degrees of freedom, this is, or larger.
In the following, we assume that problem (1) has a unique solution and admits some kind of a stability estimate. Stability estimates are typically derived under additional assumptions on the domain and the antilinear functional . In general, the functional can be from the space , where is the dual space of . As such functionals can be quite badly behaving, stability estimates are often derived under the assumption , where . In this spirit, we make the following assumption.
Let be a Hilbert space, , and be the unique solution to problem (1). Then there exists a constant independent of and such that
where is a norm on and .
The pseudospectrum of a matrix , , is a family of sets depending on a parameter . The sets in the family are defined as
in which is the standard spectral norm. When the matrix is singular, we define . The notation is also used for the Euclidian norm of a vector. Clearly, the pseudospectrum can also be characterized as
in which we denote the smallest singular value of a matrix as .
The pseudospectrum was independently proposed by several authors as an extension of the spectrum, suitable to study the properties non-normal matrices, . The pseudospectrum has been extensively studied in the literature, see e.g. [24, 23, 17]. In the following, we write , when the matrix is clear from the context.
In the derivation of the inclusion region, we take advantage on the relation between FOV and pseudospectrum. The FOV of a matrix is defined as the set
The set is convex, compact and contains all eigenvalues of . As we will see, coarse inclusion for can be obtained by using it’s close relation with the sesquilinear form. We postpone stating the relation between pseudospectrum and FOV to Section 3, where we have introduced sufficient notation for proving it.
Both pseudospectrum and FOV can be related to convergence of the GMRES method, [5, 10, 21]. The approximation error for the solution generated by GMRES on step is measured as , where is the residual, . There holds that
in which is the space of monic polynomials of degree . The matrix valued polynomial in the above minimization problem can be evaluated using Dunford integral [27, 5]. Let the open set be such that and is the union of rectifiable positively oriented Jordan curves. The set is the spectrum of . Application of the Dunford integral gives
This integral can be used to derive estimates for equation (5). Let satisfy the assumptions made on the set and in addition let
This is, . In our case, is an approximation for the pseudospectral set. Estimating the integral gives
As we will illustrate in Section 4, this bound is useful for deriving worst case behavior of the GMRES convergence rate.
The convergence bound (8) is meaningful only if one can solve the complex polynomial minimization problem. Typically, the set is replaced with a larger set on which the minimization problem can be solved analytically. In simple cases, can replaced with a circular or an elliptical domain, . Due to the constraint , this approach gives useful information only when the circle or ellipsoid containing does not contain the origin. When this is the case, one can try to apply so-called bratwurst shaped domains . As the name suggests, a bratwurst shaped domain can curl around the origin and it can be used to derive convergence estimates for the minimization problem. The bratwurst shaped domains can be applied with the inclusion and exclusion regions derived in this paper. However, the construction given in  is not simple, and cannot yield easy to use a priori bounds.
In order to verify the analytically derived inclusion and exclusion results, we compute examples of the pseudospectral sets. Several different strategies for computing pseudospectrum have been proposed, see  and references therein. Several software packages, such as EigTool, are also freely available 222see the Pseudospectral Gateway, http://www.cs.ox.ac.uk/pseudospectra/.
To have full control over the computation of the pseudospectrum, we have chosen to use our own implementation of GRID - approach to compute pseudospectral sets. In the GRID-approach, a mesh is placed in the complex plane and the norm of the resolvent is computed for each grid point. The computed data is used to isolines describing the set. In the simplest case, the norm is computed as the largest singular value of the matrix . Clearly, such an approach is very expensive for large number of points and large matrices. The process can be sped up by adapting the computational grid to the resolvent norm or by using a suitable matrix factorization to speed up the evaluation of the largest singular value. We have opted to speed up the computation by using an adaptive strategy to refine the computational grid. An initial triangular grid is placed in the complex plane. The grid is iteratively refined to conform to the shape of the resolvent norm. We use a refinement strategy based on splitting triangles intersecting with pre-specified level sets of the resolvent norm. This guarantees higher resolution at interesting regions of the complex plane.
3 Abstract Framework
In this section,we derive inclusion and exclusion regions for the pseudospectral set. For this purpose, it is easier to bound the complement of , i.e.
The inclusion and exclusion regions will lead to a set containing the pseudospectrum. If the boundary of this set is a rectifiable Jordan curve, it can be used in connection with equation (8) to compute convergence estimates for the GMRES method. The exclusion will be a disc around the origin. For the results to be meaningful, the exclusion should not be fully contained in the inclusion. If this is the case, the polynomial minimization problem in equation (8) does not tend to zero and the bound does not provide useful information. This has to be studied separately for each problem.
When is non-singular, the matrix norm in equation (9) is defined as
To eliminate the inverse and to establish a connection to the weak problem, we define an auxiliary vector such that
Estimates for the resolvent norm are derived using the auxiliary variable. First, we establish the stability bound . When is bounded from above, this implies that is non-singular. In this case, the auxiliary vector is uniquely defined and
The norm (10) can be estimated using the stability estimate for as
We begin by taking advantage of the stability of the weak problem, Assumption 2.1. Due to the duality between coefficient vectors and functions, stability of the weak problem implies stability of the linear system. As all finite dimensional norms are equal, there exists positive constants independent of such that
When the derived framework is applied to a specific problem, and are typically dependent on the mesh size. The dependency of these constants on relevant problem parameters are discussed in Section 4. Combining these norm equivalences with Assumption 2.1 leads to the following corollary.
Let and be such that . Then there holds that
Let be such that
where is inner product on . Using Cauchy-Schwarz inequality and the norm equivalence given in equation (13) there holds that . Via this construction, vector defines an antilinear functional on as . By the definition of the dual norm and Cauchy-Schwarz inequality
It follows that
The above Corollary essentially gives a lower bound for the smallest singular value of . There holds that
so, that . Corollary 3.1 can be used to derive exclusion region near the origin. We give here a direct proof that fits well to the framework of the paper. Same result can be established from the lower bound for the smallest singular value by using Theorem 3 from .
The inclusion is obtained by relating pseudospectrum to FOV. The following Theorem is proven e.g. in, . For completeness, we give a proof using the notation used in this Section.
Let in which
Then there holds that .
The auxiliary variable is defined as
Testing the above equation with gives
Using Cauchy-Schwarz inequality gives
By the definition of FOV(A) in equation (4) there holds that
Theorem 3.2 gives tools for deriving an inclusion for the pseudospectrum. The FOV is directly related to the boundedness properties of the sesquilinear form of the original problem. This relation arises from the connection . The simplest estimate follows from boundedness of the sesquilinear form. Assume that there exists such that
Then there holds that
This is a very crude estimate, but it demonstrates how FOV can be bounded in simple cases. However, as we will see, more refined estimates are required to avoid inclusion of zero to the approximate pseudospectrum.
In this section, we demonstrate the presented theory with three examples. In all examples, we assume that is a bounded domain with Lipschitz continuous boundary. We use standard notation for Sobolev spaces, see .
The finite element space is defined as
where is a shape regular triangular or tetrahedral partition of , . This is, is the space of first order Lagrange finite elements. The space of first order polynomials over set is denoted by and the mesh-size by , respectively.
The presented theoretical results are independent of the domain, but the actual numerical examples are computed on . The meshes used in the tests are generated from a coarse mesh with approximately 100 nodes using uniform refinement. The coarse mesh is called level one mesh, once refined coarse mesh as a level two mesh and so on.
Throughout this Section, are generic positive constants independent of mesh size , solution, load, and parameters of the weak problem, if not otherwise stated. They may depend on the shape regularity constant of the partition and the domain .
4.1 Poisson equation
We begin by considering the finite element discretization of the Poisson equation: Find such that
In which and . This is
so that . We use the standard -norm
for the space .
It is straightforward to see that the matrix related to problem (17) is symmetric and positive definite, . The convergence of iterative methods for such linear systems can be analyzed using much easier techniques than pseudospectrum. However, such a simple example is useful for demonstrating what kind of information on GMRES convergence can be obtained based on the inclusion and exclusion results.
Pseudospectrum of a normal matrix can be easily computed from it’s eigenvalues. All normal matrices are unitary diagonalizable, hence there exists a diagonal and a unitary such that . Based on this expansion, there holds that
Thus, pseudospectrum of any normal matrix is a union of discs centered around it’s eigenvalues ,
The pseudospectrum for level one mesh is visualized in Figure 1 for different values of .
Next, we derive inclusion and exclusion regions using Theorem 3.1 and 3.2. First, we need to establish a stability estimate satisfying Assumption 2.1. As we are interested in mesh size explicit bounds, we use -explicit norm equivalences instead of equation (13). For the Poisson problem, stability estimate follows from the weak problem (17) by using Poincare-Friedrichs inequality. Let be the solution to (17) then there exists a constant such that
Following this stability estimate, we choose the space as and . Norm equivalences between -, - and the Euclidian norm can be derived in the finite element space using the scaling argument and inverse inequality, . There exists and such that
Now, we can apply Corollary 3.1 to derive a stability constant for the linear system arising from the weak problem (17). Let be such that . Then by Corollary 3.1 and the -explicit norm equivalences, there exists a constant such that
Application of Theorem 3.1 gives the following exclusion near the origin,
We proceed by deriving an inclusion for , which together with Theorem 3.2 gives inclusion for . It is easy to derive the estimates
So that . An application of Theorem 3.2 gives the inclusion , in which
The above inclusion and exclusion regions give us an approximation of pseudospectrum ,
Where the constant is independent of and . To exclude the origin from this approximate pseudospectrum, we have to choose the parameter as . In this case, the length of the boundary curve around the approximate pseudospectrum satisfies for some independent of and . When combined with equation (8) approximate pseudospectrum gives the GMRES convergence bound
Although the estimate could be optimized with respect to parameter , we have chosen , which gives correct asymptotic behavior with respect to . Using this and , the circle based bound leads to the estimate
When the termination criteria for GMRES is chosen such that the relative residual satisfies , the above estimate gives the required number of iterations as
Our approximate pseudospectrum cannot capture the behavior of for very small values of . For example in the current case, the exact pseudospectrum is composed of small discs with boundary length . Let be such that the discs generating the pseudospectrum do not intersect. Any finite union of disjoint disks satisfies the conditions placed on the set in the Dunford integral. Using equation (6) we obtain the estimate
which is valid for sufficiently small . Here is the number of disjoint eigenvalues of . For quasi-uniform meshes, there exists such that so that
Combining the above estimate with equation (5) gives
This estimate based on the exact set has a different multiplicative term in comparison to equation (20). Interestinly, for , multiplicative term is smaller, for it is equivalent and for bigger. Regardless of the multiplicative constant, the estimate (22) can deliver improved convergence number estimates. The best possible bound can be obtained at the limit , when the minimization problem can be solved using Chebychev polynomials, see e.g. . Based on the FOV, the condition number . We obtain an estimate for the number of iterations
The main difference between the estimates (21) and (23) is in in the power of the mesh size . For the particular problem, this difference is due to the fact, that the set cannot capture the behaviour of the pseudospectrum for small . For complicated problems, such knowledge is very difficult to come by and one has to be satisfied with worst case estimates, such as equation (21). The second difference between the two estimates is in the additive terms. These additive terms are relevant only when tolerance is of the same order of magnitude with , which requires usage of very fine mesh sizes
4.2 Helmholtz equation with absorbing boundary conditions
The Helmholtz equation with first-order absorbing boundary conditions is a more realistic example for the analysis presented in this paper. The weak problem reads: Find such that
The parameter , and . The inner product is the standard -inner product over . The stability of this problem has been analyzed in domains excluding any resonant behavior, .
Let be a bounded, star shaped domain with a smooth boundary and let be the solution to problem (24). Then there exists a constant independent of ,, and such that
in which the norm is defined as
The finite element approximation is defined as: Find such that
When the solution has -regularity, the existence of a unique solution to this problem can be guaranteed, when the mesh size requirement is satisfied, [13, 14, 19]. In this case, there exists a constant C such that the a priori error estimate
Due to the boundary term , problem (24) leads to a linear system with a non-normal coefficient matrix. As the boundary term depends on , it is complicated to determine if the non-normality is meaningful or not. In addition, due to the relation between the wave-number and the mesh size it is difficult to study the asymptotic behaviour of GMRES, when tends to infinity.
This discrete stability estimate holds under the following assumptions.
Assume that is a bounded, star shaped domain with a smooth boundary, the mesh size is such that and the solution to problem (24) has -regularity.
Following the discrete stability result (28) we choose the space as with the norm . The - explicit norm equivalences given in equation (18) can be used for this space. As we are interested in wavenumber and the mesh size explicit estimates, we use the -dependent norm given in equation (26) for the space V. Norm equivalences for this -dependent norm are easily established using equation (18) and (19) as
for some . Application of Corollary 3.1 gives the stability estimate for the coefficient vector
Using Theorem 3.1 leads to the exclusion region
around the origin. To obtain an inclusion, we again derive an inclusion for and apply Theorem 3.2. The sesquilinear form satisfies the boundedness estimates
for some . The estimate between the - and Euclidian norm used in above is derived using identical techniques as used for proving inequality (18). When Assumption 4.1 is satisfied, combining the two boundedness estimates leads to the inclusion
This set contains the origin, so it cannot be used to derive GMRES convergence bounds. In this case, the presented theory is genuinely required to understand GMRES convergence.
To validate the derived inclusion and exclusion regions, we have computed examples from the exact set for using mesh levels three and four. The results are visualized in Figures 2 and 3. Although, the - shaped domain used in computations does not have smooth boundary nor -regularity, the actual pseudospectral set is in good agreement with our theoretical results. Most importantly, when is sufficiently large, the pseudospectrum curls around the origin as predicted. Due to the solution having less that -regularity, the requirement on the mesh size on -shaped domain just takes the form , for some , depending on regularity of the exact solution.
The approximate pseudospectrum could also be used to to derive convergence estimate for GMRES method using Bratwurst shaped domains to solve the minimization problem. However, as a preconditioner would always be applied, the current case is not very interesting hence we do not proceed further with it.
4.3 Shifted-Laplace preconditioned Helmholtz equation
The analysis of inclusion and exclusion regions is more complicated, when a preconditioner is applied to speed up the convergence of the GMRES method. Several different preconditioners have been proposed for problem (24), see e.g. . We consider here the shifted-Laplace preconditioner . This preconditioner is based on solving an auxiliary problem on each step of the iteration. The auxiliary problem is defined as: For a given find such that
The sesquilinear form in the above equation is given as
in which is as defined in Section 4.2 and . This is, a loss term is added to the sesquilinear from defined in equation (25). The addition of the loss term leads to a stability estimate on the finite element space independent of the mesh size. Choosing in equation (31) gives
Taking imaginary part leads to
Now, using Cauchy-Schwarz inequality and norm equivalence (18) gives
for some . The matrix form of the preconditioner is denoted as , where . Hence, the problem to be solved by the GMRES method is
The rationale behind using shifted-Laplace preconditioners is that when a sufficiently large loss term is added, the action of the preconditioner can be efficiently evaluated using a multigrid method, . When applied directly to solve the original problem (24), multigrid methods face two challenges, . The standard smoothing iteration is not stable and the coarse grid correction has to be made on a sufficiently fine mesh. The introduction of a loss term has been analyzed in  for a problem with zero Dirichlet boundary conditions. In this case, additional losses improve the multigrid solver by allowing the coarse grid correction to be made on a coarser mesh. The coarse grid depends on the loss term, hence there is a tradeoff between the number of GMRES iterations and the cost of applying the preconditioner. Typically, the loss parameter is chosen as . For simplicity, we will consider here only the exact preconditioner. This gives good insight on what one can expect from the inexact case.
As we will see, a shifted-Laplace preconditioner can eliminate the mesh size dependency from the pseudospectral set. This is, the inclusion and exclusion regions are independent of the applied mesh size. This is a desired property, as the mesh size dependency in the non-preconditioned case leads quickly to an unbearably large number of iterations. The exclusion regions will, however depend on the ratio of and .
The shifted-Laplace preconditioner has been previously analyzed in  by estimating the location of the eigenvalues. The existing analysis is not explicit in and does not take the non-normality into account. In addition, the previous work does not include the exclusion region around the origin, which we can obtain using Theorem 3.1. and the stability result given in equation (28).
To study the shifted-Laplace preconditioner, we interpret the matrix as the matrix form of the sesquilinear form , where is as defined in equation (25) and in equation (31). A suitable stability estimate for this sesquilinear form is established by the following Corollary.
Let be such that
In addition, let Assumption 4.1 be satisfied. Then there exists a constant independent of ,,, and such that
The above stability estimate is given in the norm . Hence, we choose this as the norm of the space . The above Corollary also suggest to choose the space as previously. With these choices, a direct application of Theorem 3.1 gives the exclusion
When , the preconditioner solves the problem exactly and . As the constant in above is is independent of and , setting , leads to . A field of values based inclusion can be obtained as follows. There holds that
This is, the FOV is located inside the set .
The polynomial minimization problem in the GMRES convergence bound (8) does not give any information on the convergence, when the approximate pseudospectrum is an annulus surrounding the origin. To apply the FOV based estimate, one has to explicitly know the constants in derived inclusion and exclusion regions to guarantee that this cannot happen. The constant in the inclusion for FOV is related to the norm equivalence between and Euclidian norm. It is easy to see, that , where is the mass matrix. In typical cases , so that derived inclusion is not useful when and the dimension of the exclusion tends to zero when grows.
Due to the close relation between the preconditioner and the original problem, we can estimate the pseudospectrum using a problem specific technique.
There exists a positive constant such that
There holds that and . Using the identity for any , it follows that
Let be such that
As in Section 3, we establish the stability estimate . When is finite, this estimate yields the desired bound. Testing with any gives
Assuming that and dividing by yields
By adding an subtracting a suitable term, the above can be written as
Choosing , using the identity and taking imaginary part gives
When and , this is
the coefficient of the - term is positive and we obtain the estimate