# Low-rank updates of matrix functions

Bernhard Beckermann Laboratoire Painlevé UMR 8524, UFR Mathématiques, Univ. Lille, 59655 Villeneuve d’Ascq, France. E-mail: bbecker@math.univ-lille1.fr. Supported in part by the Labex CEMPI (ANR-11-LABX-0007-01).    Daniel Kressner MATHICSE-ANCHP, École Polytechnique Fédérale de Lausanne, Station 8, 1015 Lausanne, Switzerland. E-mail: daniel.kressner@epfl.ch    Marcel Schweitzer MATHICSE-ANCHP, École Polytechnique Fédérale de Lausanne, Station 8, 1015 Lausanne, Switzerland. E-mail: marcel.schweitzer@epfl.ch. The work of Marcel Schweitzer has been supported by the SNSF project Low-rank updates of matrix functions and fast eigenvalue solvers.
July 23, 2019
###### Abstract

We consider the task of updating a matrix function when the matrix is subject to a low-rank modification. In other words, we aim at approximating for a matrix of rank . The approach proposed in this paper attains efficiency by projecting onto tensorized Krylov subspaces produced by matrix-vector multiplications with and . We prove the approximations obtained from steps of the proposed methods are exact if is a polynomial of degree at most and use this as a basis for proving a variety of convergence results, in particular for the matrix exponential and for Markov functions. We illustrate the performance of our method by considering various examples from network analysis, where our approach can be used to cheaply update centrality and communicability measures.

Key words. matrix function, low-rank update, Krylov subspace method, tensorized Krylov subspace, matrix exponential, Markov function, graph communicability

AMS subject classifications. 15A16, 65D30, 65F30, 65F60

## 1 Introduction

This work is concerned with the problem of updating a matrix function when is subject to a low-rank modification. More specifically, given of rank we aim at computing the difference

 f(A+D)−f(A), \hb@xt@.01(1.1)

at a cost significantly smaller than the cost of computing from scratch. Throughout this work, we assume that is analytic on some domain containing and , the spectra of and .

The generality of (LABEL:eq:matrixfunupdate) includes a wide scope of applications. We will particularly focus on measuring network properties, such as centrality and communicability, via applying the matrix exponential to the adjacency matrix of an undirected graph [18]. Removing/inserting individual edges or nodes in the graph corresponds to rank- updates of . Being able to solve (LABEL:eq:matrixfunupdate) efficiently therefore allows to quickly update these measures. In fact, this task is explicitly needed when measuring the betweenness of a node [18, Sec. 2.3].

For , the well-known Sherman-Morrison-Woodbury formula implies that the difference (LABEL:eq:matrixfunupdate) has rank and can be directly obtained from applying to the low-rank factors of . In particular, when has rank and can be written as with vectors and , we have

 (A+bc∗)−1−A−1=−11+c∗A−1bA−1bc∗A−1. \hb@xt@.01(1.2)

Unless is a scalar multiple of the identity matrix [27, Theorem 1.35], there is no such simple relation between and for a general analytic function . In the case of a rational function of degree , Bernstein and Van Loan [11] show that has rank at most and provide an explicit formula for the rank- correction. This construction is based on explicit Krylov bases and potentially prone to numerical instability for larger degrees.

The Cauchy integral representation

 f(A)=12πi∫Γf(z)(zI−A)−1dz \hb@xt@.01(1.3)

provides a link between matrix functions and (shifted) inverses. In [43], a combination of numerical quadrature with (LABEL:eq:sherman) has been explored for approximating . However, such an approach suffers from a number of drawbacks. Most importantly, the choice of a contour that is equally good for and is highly nontrivial, in particular in the non-Hermitian case. Also, the evaluation of the approximation amounts to the solution of a differently shifted linear system for each quadrature point. Although not used in our algorithms, integral representations will play a role in their convergence analysis, see Section LABEL:sec:convergence_integral.

The approach proposed in this work avoids the need for choosing a contour or an explicit rational approximation of . Inspired by existing Krylov subspace methods for matrix equations [41] and linear systems with tensor product structure [31], we make use of tensorized subspaces. More specifically, given orthonormal bases for -dimensional Krylov subspaces involving the matrices and the starting vectors , we construct an approximation of the form

 f(A+bc∗)−f(A)≈UmXm(f)V∗m,

with a well-chosen small matrix . In turn, when , the computational cost is dominated by matrix-vector products with and . As we will demonstrate for a variety of examples, this dramatically reduces the computational effort compared to computing from scratch, even when only selected quantities – such as the diagonal or trace of – are required.

The rest of this paper is organized as follows. In section LABEL:sec:algorithms, we introduce Krylov subspace algorithms for approximating the update (LABEL:eq:matrixfunupdate) when is a matrix of rank one, both for the Hermitian and the non-Hermitian case, and discuss the extension to higher rank. In section LABEL:sec:exactness, we prove that the approximation obtained from steps of our Krylov subspace algorithm is exact when is a polynomial of degree at most . This result forms the basis of the convergence analysis presented in sections LABEL:sec:convergenceanalysis and LABEL:sec:convergence_integral. Precisely, the convergence of our Krylov approximations is linked to certain polynomial approximation problems in section LABEL:sec:convergenceanalysis, while section LABEL:sec:convergence_integral contains results that are based on exploiting an integral representation of , e.g., for general analytic functions and for Markov functions. Applications for our methods are described in Section LABEL:sec:experiments, with special focus on up-/down-dating of communicability measures in network analysis, and the efficiency of our approach is illustrated by numerical experiments from this area. Concluding remarks are given in Section LABEL:sec:conclusion.

We use to denote the Euclidean norm of a vector or the induced spectral norm of a matrix.

## 2 Krylov projection algorithms

In the following, we assume that or at least the part relevant to the application (e.g., the diagonal of ) have already been computed. For the moment, we continue to assume a rank- modification, that is, we consider the approximation of for vectors . In Remark LABEL:remark:higherrank below, we will comment on the extension to higher ranks.

Because it is conceptually simpler, we first discuss the Hermitian case: and .

The first step of our algorithm consists of constructing an orthonormal basis for a Krylov subspace of with starting vector . Given , such a Krylov subspace takes the form

 Um:=\@fontswitchKm(A,b)=span{b,Ab,A2b,…,Am−1b}.

For simplicity, we suppose that has dimension , which is generically satisfied. Applying steps of the Lanczos method [32] (possibly with reorthogonalization) results in the Lanczos relation

 AUm=UmGm+βm+1um+1e∗m, \hb@xt@.01(2.1)

where the columns of form an orthonormal basis of , the matrix is tridiagonal, and denotes the th unit vector of length . The first column of is a scalar multiple of and, without loss of generality, we may assume that with the first unit vector .

The second step of our algorithm then chooses an approximation in the tensorized subspace , that is,

 f(A+bb∗)−f(A)≈UmXm(f)U∗m \hb@xt@.01(2.2)

for some matrix . In the spirit of Krylov subspace methods for matrix equations [41], we choose as the solution of the compressed problem:

 Xm(f)=f(U∗m(A+bb∗)Um)−f(U∗mAUm)=f(Gm+∥b∥2e1e∗1)−f(Gm), \hb@xt@.01(2.3)

where we assume that is defined on the spectra of and . Below, in Theorem LABEL:the:polynomial_exactness, we will see that this choice of leads to an approximation that is exact when is a polynomial of degree at most .

The resulting method is summarized in Algorithm LABEL:alg:krylovhermitian. The tridiagonal matrix from (LABEL:eq:arnoldi_relationU) is built from the orthogonalization coefficients as

 Gm=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣α1β2β2α2β3⋱⋱⋱βm−1αm−1βmβmαm⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦.

A trivial modification of this algorithm can be used to approximate .

Algorithm LABEL:alg:krylovhermitian represents the most basic form of the Lanczos process; in particular, it does not employ any reorthogonalization. It is well known that such short recurrences may suffer from severe loss of orthogonality in the presence of round-off errors, so that it can be advisable to use reorthogonalization strategies [37, 40]. The most straight-forward to retain numerical orthogonality amongst the basis vectors is to store all basis vectors and perform full reorthogonalization in each step of the method.

An alternative is to perform no reorthogonalization at all. In contrast to, e.g., the CG method for linear systems, there are no short recurrences for the iterates available. In turn, this requires to use a two-pass strategy for forming if one wants to avoid storing the full basis . In a first pass, the tridiagonal matrix is assembled (while storing only three basis vectors at a time and discarding the older ones) and is formed. In a second pass, the basis vectors are computed anew (again only storing three at a time), and the parts of interest of (e.g., its diagonal) are gradually computed. This approach doubles the number of matrix-vector products but reduces the storage requirement from to .

###### Example 2.1

We illustrate the (possible) impact of reorthogonalization on convergence. Consider the diagonal matrices , where the eigenvalues of are equidistantly spaced in the interval and the eigenvalues of are logarithmically spaced in . We approximate where and is a random vector of unit norm using Lanczos without reorthogonalization and with full reorthogonalization, as explained above. The resulting convergence curves are shown in Figure LABEL:fig:reo. For , both methods are observed to behave identically, which is well explained by the fact that the eigenvalue distribution of does not favor the convergence of Ritz values, which has a close link to loss of orthogonality [35, 37]. For , the eigenvalue distribution favors the convergence of Ritz values to larger eigenvalues and, in turn, convergence degrades after some time when using no reorthogonalization. Still, the method eventually converges to the same accuracy, it just needs more iterations. This phenomenon is also known from the conjugate gradient method in finite precision arithmetic [35] and when approximating  [15]. To avoid that orthogonality issues distort our findings, we use the Lanczos method with full reorthogonalization for all numerical experiments in the rest of this paper.

The computational cost of steps of Algorithm LABEL:alg:krylovhermitian is as follows:

• matrix-vector products with (2, when the two-pass approach is used), requiring operations for a sparse matrix with nonzeros;

• operations for the orthogonalization procedure when using full reorthogonalization (and only when the two-pass approach is used, because in each step orthogonalization against only two previous basis vectors is performed)

• the computation of , which requires the evaluation of two functions of matrices, which, depending on the function , typically needs at most operations.

One should avoid forming explicitly and we expect that this is actually not needed in most applications involving a large sparse matrix . For example, the computation of communicability measures discussed in Section LABEL:sec:network only requires the diagonal entries of , which can be computed directly from , with operations.

In the general non-Hermitian case, we construct orthonormal bases for the two (polynomial) Krylov subspaces and . Applying the Arnoldi method with reorthogonalization results in the Arnoldi relations (LABEL:eq:arnoldi_relationU) and, additionally,

 A∗Vm=VmHm+hm+1,mvm+1e∗m, \hb@xt@.01(2.4)

where the columns of form an orthonormal basis of . Note that both and are now upper Hessenberg matrices. The approximation is chosen in the tensorized subspace , that is,

 f(A+bc∗)−f(A)≈UmXm(f)V∗m \hb@xt@.01(2.5)

for some matrix . Concerning the choice of , it turns out that the non-Hermitian analogue of (LABEL:eq:Xm_hermitian_difference) does not have favorable theoretical properties. For example, the polynomial exactness property mentioned above for (LABEL:eq:Xm_hermitian_difference) and proven in section LABEL:sec:exactness does not hold for such a choice. We will use a different choice, motivated by the following simple result.

###### Lemma 2.2

Let , and . Define the block matrix

 A:=[Abc∗0A+bc∗]. \hb@xt@.01(2.6)

Then

 f(A)=[f(A)f(A+bc∗)−f(A)0f(A+bc∗)].

Proof. Letting denote a contour that encloses both and , the Cauchy integral formula (LABEL:eq:contourintro) applied to yields

 f(A) = 12πi∫Γf(z)(zI−A)−1dz \hb@xt@.01(2.7) = 12πi∫Γf(z)[(zI−A)−1−(zI−A)−1bc∗(zI−A−bc∗)−10(zI−A−bc∗)−1]dz = [f(A)−12πi∫Γf(z)(zI−A)−1bc∗(zI−A−bc∗)−1dz0f(A+bc∗)].

On the other hand, combining the Cauchy integral formula for and with the second resolvent identity yields

 f(A+bc∗)−f(A) = 12πi∫Γf(z)[(zI−A−bc∗)−1−(zI−A)−1]dz \hb@xt@.01(2.8) = −12πi∫Γf(z)(zI−A−bc∗)−1bc∗(zI−A)−1dz = −12πi∫Γf(z)(zI−A)−1bc∗(zI−A−bc∗)−1dz,

which matches the (1,2) block of (LABEL:eq:auxintegral) and thus completes the proof.

The result of Lemma LABEL:lem:block shows that the desired update is contained in the (1,2) block of . This motivates us to choose as the (1,2) block of applied to the compression of onto :

 f([Um00Vm]∗[Abc∗0A+bc∗][Um00Vm]) \hb@xt@.01(2.9) =

where we again assume that is defined on the spectra of and . Using Lemma LABEL:lem:block, it is straightforward to see that this choice of coincides in the Hermitian case with the one from section LABEL:sec:hermitian.

Algorithm LABEL:alg:krylovnonhermitian summarizes the proposed procedure, where we omit the algorithmic details for the Arnoldi method for the sake of brevity; see, e.g., [21].

The computational effort of Algorithm LABEL:alg:krylovnonhermitian compares to that of Algorithm LABEL:alg:krylovhermitian (with full reorthogonalization) as follows. In contrast to the Hermitian case, we now need to build two Krylov spaces instead of one, meaning that the number of matrix vector products necessary for steps of the method increases from to . The orthogonalization cost is the same as in the Hermitian case (with full reorthogonalization). As the number of operations needed for approximating a (dense) matrix function often grows cubically in the matrix size it is typically about four times as expensive to compute one matrix function of size than computing two matrix functions of size . The fact that the matrix for which we need to evaluate in Algorithm LABEL:alg:krylovnonhermitian is non-Hermitian and not tridiagonal may increase the cost further. However, as long as , the main cost of Algorithm LABEL:alg:krylovnonhermitian consists of performing matrix-vector products, so that we can expect it to take roughly twice the computation time of Algorithm LABEL:alg:krylovhermitian for a problem of the same size and structure.

###### Remark 2.3

There are two different ways of extending our approach from a rank-one modification to a general rank- modification with and :

1. By letting and denote the columns of and , respectively, we can write as a sum of rank-one matrices, , e.g., by computing a singular value decomposition of and then apply times Algorithm LABEL:alg:krylovnonhermitian in order to subsequently incorporate each of the rank-one modifications. Note that the th step of this procedure requires working with the Krylov subspaces and which in general do not coincide with and .

Some care is required in order to make use of Algorithm LABEL:alg:krylovhermitian for a Hermitian rank- matrix . After a preprocessing step (see, e.g., [6, Section 2.3]) one can write . First, Algorithm LABEL:alg:krylovhermitian is applied to incorporate the first terms followed by a slight modification of this algorithm to incorporate the last terms

2. Block Krylov subspaces [24, 36, 42] offer a conceptually different way of dealing with a rank- modification. Instead of the Arnoldi method, a block Arnoldi method is used for computing orthonormal bases of the block Krylov subspaces and . The approximation of is computed by projecting onto the tensorized block Krylov subspace , leading to a straightforward extension of Algorithm LABEL:alg:krylovnonhermitian. Block Krylov subspace methods are more complicated to implement, see, e.g., [24] for some of the issues one has to take into account, but they offer (at least) one major advantage. Even though the product of with an matrix requires the same number of operations as individual matrix vector products, it often performs much faster on a computer, benefitting from a more “cache-friendly” memory access pattern, see, e.g., [3]. Again some care is required in the symmetric case, to derive a block variant of Algorithm LABEL:alg:krylovhermitian.

### 2.3 Stopping criterion

A simple stopping criterion for Algorithm LABEL:alg:krylovhermitian or Algorithm LABEL:alg:krylovnonhermitian is based on the error estimate

 ∥f(A+bc∗)−UmXm(f)V∗m∥≈∥Um+dXm+d(f)V∗m+d−UmXm(f)V∗m∥ \hb@xt@.01(2.10)

for some small integer . This error estimate is similar in spirit to error estimates for linear systems proposed in [22, 34]. Evaluating the right-hand side of (LABEL:eq:error_estimate_difference) only requires forming the coefficient matrices defined in (LABEL:eq:krylov_update_hermitian) or (LABEL:eq:krylov_update_nonhermitian), because

 ∥Um+dXm+d(f)V∗m+d−UmXm(f)V∗m∥ = ∥∥∥Um+dXm+d(f)V∗m+d−Um+d[[]ccXm(f)000]V∗m+d∥∥∥ = ∥∥∥Xm+d(f)−[Xm(f)000]∥∥∥.

We illustrate the behavior of the error estimator (LABEL:eq:error_estimate_difference) by means of a simple numerical experiment.

###### Example 2.4

We choose as the standard finite difference discretization of the two-dimensional Laplace operator on the unit grid, and as vectors with random, normal distributed entries. We use our proposed Krylov subspace algorithm to approximate and show the convergence history as well as the proposed error estimator for in Figure LABEL:fig:error_estimates. We observe that the difference-based error estimators follow the norm of the exact error very closely and thus give very accurate results already for such small values of .

We remark that the error estimator (LABEL:eq:error_estimate_difference) is clearly heuristic and can be overly optimistic, especially in situations where the iteration (almost) stagnates, although it works very well in most practical situations. Of course, when already iterations of the method have been performed, one will typically use the approximation from the th step instead of the approximation from the th step (for which the error is estimated), as it can be expected to be a more accurate approximation. In fact, in Example LABEL:ex:error_estimate, for the error of the iterate lies below the error estimate (LABEL:eq:error_estimate_difference) for all but four iterations.

## 3 Exactness properties of Krylov subspace approximations for the update

In this section, we show that the approximation (LABEL:eq:krylov_update_nonhermitian) is exact when is a polynomial of degree at most . This serves two purposes: On the one hand, this justifies the particular choice of approximation we made. On the other hand, this provides the fundamental basis for performing our convergence analyses in section LABEL:sec:convergenceanalysis.

As a tool for proving polynomial exactness, we utilize the following matrix identity, which does not seem to be widely known.

###### Proposition 3.1

Let . Then, for any ,

 Mj−Nj=j−1∑k=0Nj−1−k(M−N)Mk.

Proof. The proof is by induction on . For , the assertion trivially holds. Suppose now that the assertion holds for :

 Mj−1−Nj−1=j−2∑k=0Nj−2−k(M−N)Mk. \hb@xt@.01(3.1)

Multiplying both sides of (LABEL:eq:matrix_identity_proof) by from the left and adding , we find

 Mj+NMj−1−Nj=Mj+j−2∑k=0Nj−1−k(M−N)Mk.

Finally, subtracting on both sides and noting that completes the proof.

###### Theorem 3.2

Let , . Then the Krylov subspace approximation returned by Algorithm LABEL:alg:krylovnonhermitian is exact for all , where denotes the space of all polynomials of degree at most , i.e.,

 p(A+bc∗)−p(A)=UmXm(p)V∗m.

Proof. By linearity, it suffices to show that the result holds for every monomial , . We recall that is the -block of the matrix

 [Gm∥b∥∥c∥e1e∗10(H∗m+∥c∥V∗mbe∗1)]j.

In particular, . By (LABEL:eq:Xm_nonhermitian), the result of the theorem is shown if we can establish

 (A+bc∗)j−Aj=UmXm(pj)V∗m. \hb@xt@.01(3.2)

Considering

 [Gm∥b∥∥c∥e1e∗10(H∗m+∥c∥V∗mbe∗1)]j=[Gj−1mXm(pj−1)0(H∗m+∥c∥V∗mbe∗1)j−1][GmXm(p1)0(H∗m+∥c∥V∗mbe∗1)],

we find the relation

 Xm(pj)=Gj−1mXm(p1)+Xm(pj−1)(H∗m+∥c∥V∗mbe∗1).

Resolving this recursion gives

 Xm(pj)=j−1∑k=0Gj−1−km(∥b∥∥c∥e1e∗1)(H∗m+∥c∥V∗mbe∗1)k. \hb@xt@.01(3.3)

Recalling that is the compression of onto , a well-known polynomial exactness property (see, e.g., [39, 17]) of Krylov subspace approximations yields

 ∥b∥UmGℓme1=Aℓb for % all ℓ=0,…,m−1. \hb@xt@.01(3.4)

Similarly, by noting that is the compression of onto , we obtain

 ∥c∥e∗1(H∗m+∥c∥V∗mbe∗1)ℓV∗m=c∗(A+bc∗)ℓ for all ℓ=0,…,m−1. \hb@xt@.01(3.5)

Multiplying (LABEL:eq:Ej_recursive) by from the left and by from the right then gives

 UmXm(pj)V∗m=j−1∑k=0UmGj−1−km(∥b∥∥c∥e1e∗1)(H∗m+∥c∥V∗mbe∗1)kV∗m

which by (LABEL:eq:polynomial_exactness_vector1)–(LABEL:eq:polynomial_exactness_vector2) gives for the relation

 UmXm(pj)V∗m=j−1∑k=0Aj−1−kbc∗(A+bc∗)k=(A+bc∗)j−Aj,

where we used Proposition LABEL:pro:matrix_identity in the second equality. This establishes (LABEL:eq:exactpower) and thus completes the proof.

## 4 Convergence analysis based on polynomial approximation problems

In this section, we give bounds for the error of the approximations (LABEL:eq:krylov_update_hermitian) and (LABEL:eq:krylov_update_nonhermitian). The results are based on the polynomial exactness property from Theorem LABEL:the:polynomial_exactness and connect the approximation quality of (LABEL:eq:krylov_update_hermitian) and (LABEL:eq:krylov_update_nonhermitian) to certain polynomial approximation problems.

The convergence results in this section rely on a theorem by Crouzeix [12, 13], for which we first recall some basic concepts. The field of values (or numerical range) is defined as the set

 W(A):={x∗Ax:∥x∥=1},

which is a convex and compact subset of containing all eigenvalues of . Further, we make use of the supremum norm

 ∥f∥E:=supz∈E|f(z)|

on a subset for which is defined. Using these definitions, Crouzeix’s theorem states that

 ∥f(A)∥≤C∥f∥E \hb@xt@.01(4.1)

with a constant for any function which is analytic in a neighborhood of . Notice that if is Hermitian then we can compute in terms of the eigenvalues of , and thus obviously (LABEL:eq:crouzeix) holds with .

We now use (LABEL:eq:crouzeix) together with the polynomial exactness property proven in Theorem LABEL:the:polynomial_exactness to obtain bounds on the error at the th step of our method:

 Em(f):=f(A+bc∗)−f(A)−UmXm(f)V∗m. \hb@xt@.01(4.2)
###### Theorem 4.1

Let be Hermitian and let be defined on a compact set containing . Then, the error (LABEL:eq:convergence_error) with and from (LABEL:eq:Xm_hermitian_difference) satisfies

 ∥Em(f)∥≤4minp∈Πm∥f−p∥E.

Proof. By the polynomial exactness property of stated in Theorem LABEL:the:polynomial_exactness, we have

 Em(f)=Em(f)−p(A+bb∗)+p(A)+UmXm(p)U∗m=Em(f−p)

for any polynomial . For arbitrary we thus have

 ∥Em(f)∥ = ∥Em(f−p)∥ \hb@xt@.01(4.3) = ∥(f−p)(A+bb∗)−(f−p)(A)−UmXm(f−p)U∗m∥ ≤ ∥(f−p)(A+bb∗)∥−∥(f−p)(A)∥−∥UmXm(f−p)U∗m∥ ≤ 2∥f−p∥E+∥UmXm(f−p)U∗m∥,

where the last inequality follows from (LABEL:eq:crouzeix). By (LABEL:eq:krylov_update_hermitian) and (LABEL:eq:Xm_hermitian_difference), we have

 UXm(f−p)U∗m=Um((f−p)(Gm+∥b∥2e1e∗1)−(f−p)(Gm))U∗m.

In turn,

 ∥UmXm(f−p)U∗m∥=∥Xm(f−p)∥≤2∥f−p∥E, \hb@xt@.01(4.4)

where, using and , we again applied (LABEL:eq:crouzeix). Inserting (LABEL:eq:proof_fp2) into (LABEL:eq:proof_fp) and taking the minimum over all completes the proof.

Theorem LABEL:the:convergence_polynomial_rational allows to derive convergence bounds from known polynomial approximation results for analytic functions. The obtained convergence rates will typically be exponential for functions which are analytic in a neighborhood of and superlinear for entire functions like the exponential.

To extend Theorem LABEL:the:convergence_polynomial_rational to the non-Hermitian case, we have to assume that is analytic on the field of values of the block matrix from (LABEL:eq:calA).

###### Theorem 4.2

Let and let be analytic in a neighborhood of a compact set containing . Then, the error (LABEL:eq:convergence_error), with , , and computed by Algorithm LABEL:alg:krylovnonhermitian, satisfies

 ∥Em(f)∥≤2Cminp∈Πm∥f−p∥E

with a constant .

Proof. By Lemma LABEL:lem:block, the update is the (1,2) block of , which we denote by . Letting , we note that the columns of are orthonormal and

 UmXm(f)V∗m=[Wmf(W∗mAWm)W∗m]1,2

holds by the definition of .

As in the proof of Theorem LABEL:the:convergence_polynomial_rational we use the fact that for any . Thus, we obtain for arbitrary that

 ∥Em(f)∥ = ∥Em(f−p)∥ = ∥(f−p)(A+bc∗)−(f−p)(A)−UmXm(f−p)V∗m∥ ≤ ∥[(f−p)(A)]1,2∥+∥[Wm(f−p)(W∗mAWm)W∗m]1,2∥ ≤ ∥(f−p)(A)∥+∥(f−p)(W∗mAWm)∥ ≤ 2C∥f−p∥E

where the last inequality follows from Crouzeix’s theorem, using . Taking the minimum over all gives the desired result.

Note that the matrix from Theorem LABEL:the:convergence_polynomial_rational_nonhermitian can be easily block-diagonalized:

 T−1AT=[AA+bc∗],withT=[II 0I],

with the matrix having the very modest -norm condition number . Unfortunately, this does not seem to admit any meaningful conclusion about the numerical range of . In fact, we are not aware of any tight relationship between and the numerical ranges of , . Writing

 A=[A00A]+[0bc∗0bc∗]=[A00A]+[bb][0∗,c∗]

we obtain from [29, Section 1.0.1 & Property 1.2.10] the inclusion

 W(A)⊆W(A)+W(uv∗),u=[bb],v=[0c]

where refers to the Minkowski sum of sets. In general, the field of values of a rank-one matrix is an ellipse with focal points and and minor semi-axis , which in our special case amounts to focal points and and minor semi-axis .

In Section LABEL:sec:convdiff, we will illustrate for an example of practical relevance that can have rather undesirable properties, to the extent that Theorem LABEL:the:convergence_polynomial_rational_nonhermitian is of little help in understanding the convergence of our algorithms. In the next section, we therefore derive convergence bounds by an approach that avoids the dependence on and only depends on and . While the second set may still be larger than , it is at least always smaller than .

## 5 Convergence results based on integral representations

We begin by considering results based on the Cauchy integral formula in Section LABEL:subsec:cauchy and then focus on the special case of Markov functions in Section LABEL:subsec:markov.

### 5.1 Convergence results based on the Cauchy integral formula

Let be analytic on a domain containing the eigenvalues of