Regularizing Effects of the Krylov Iterative Solvers CGME and LSMR For Linear Discrete Ill-Posed Problems with an Application to Truncated Randomized SVDsThis work was supported in part by the National Science Foundation of China (No. 11771249)

# Regularizing Effects of the Krylov Iterative Solvers CGME and LSMR For Linear Discrete Ill-Posed Problems with an Application to Truncated Randomized SVDs††thanks: This work was supported in part by the National Science Foundation of China (No. 11771249)

Zhongxiao Jia Department of Mathematical Sciences, Tsinghua University, 100084 Beijing, China. (jiazx@tsinghua.edu.cn)
###### Abstract

For the large-scale linear discrete ill-posed problem or with contaminated by Gaussian white noise, there are four commonly used Krylov solvers: LSQR and its mathematically equivalent CGLS, the Conjugate Gradient (CG) method applied to , CGME, the CG method applied to or with , and LSMR, the minimal residual (MINRES) method applied to . These methods have intrinsic regularizing effects, where the number of iterations plays the role of the regularization parameter. In this paper, we analyze the regularizing effects of CGME and LSMR and establish a number of results, which include the filtered SVD expansion of CGME iterates and prove that the 2-norm filtering best possible regularized solutions by CGME are less accurate those by LSQR and the 2-norm filtering best regularized solutions by LSMR are at least as accurate as those obtained by LSQR. We also prove that the semi-convergence of CGME always occurs no later than that of LSQR and the semi-convergence of LSMR always occurs no sooner than that of LSQR. In the meantime, using the analysis approach for CGME, we substantially improve a fundamental result on the accuracy of the truncated rank approximate SVD of generated by randomized algorithms. Numerical experiments confirm our theory on CGME and LSMR.

Key words. Discrete ill-posed, rank approximations, TSVD solution, semi-convergence, regularized solution, regularizing effects, CGME, LSMR, LSQR, CGLS

AMS subject classifications. 65F22, 15A18, 65F10, 65F20, 65R32, 65J20, 65R30

## 1 Introduction and Preliminaries

Consider the linear discrete ill-posed problem

 minx∈Rn∥Ax−b∥\,\ or \ Ax=b,   A∈Rm×n, b∈Rm, \hb@xt@.01(1.1)

where the norm is the 2-norm of a vector or matrix, and is extremely ill conditioned with its singular values decaying to zero without a noticeable gap. Since the results in this paper hold for both the overdetermined () and underdtermined () cases, we simply assume that . (LABEL:eq1) typically arises from the discretization of the first kind Fredholm integral equation

 Kx=(Kx)(t)=∫Ωk(s,t)x(t)dt=g(s)=g, s∈Ω⊂Rq, \hb@xt@.01(1.2)

where the kernel and are known functions, while is the unknown function to be sought. If is non-degenerate and satisfies the Picard condition, there exists the unique square integrable solution ; see [10, 22, 24, 38]. Here for brevity we assume that and belong to the same set with . Applications include image deblurring, signal processing, geophysics, computerized tomography, heat propagation, biomedical and optical imaging, groundwater modeling, and many others; see, e.g., [1, 9, 10, 24, 36, 37, 38, 40, 48]. The right-hand side is assumed to be contaminated by a Gaussian white noise , caused by measurement, modeling or discretization errors, where is noise-free and . Because of the presence of noise and the extreme ill-conditioning of , the naive solution of (LABEL:eq1) bears no relation to the true solution , where denotes the Moore-Penrose inverse of a matrix. Therefore, one has to use regularization to extract a best possible approximation to .

For a Gaussian white noise , throughout the paper, we always assume that satisfies the discrete Picard condition with some constant for arbitrarily large [1, 13, 20, 21, 22, 24, 37]. Without loss of generality, assume that . Then a dominating regularization approach is to solve the problem

 minx∈Rn∥Lx∥  subject to  ∥Ax−b∥≤τ∥e∥ \hb@xt@.01(1.3)

with slightly [22, 24], where is a regularization matrix and its suitable choice is based on a-prior information on . Typically, is either the identity matrix or the scaled discrete approximation of a first or second order derivative operator.

We are concerned with the case in this paper. If , (LABEL:posed), in principle, can be transformed into the standard-form problem (LABEL:posed) [22, 24], which is a 2-norm filtering regularization problem. The solutions of (LABEL:eq1) and (LABEL:posed) can be fully analyzed by the singular value decomposition (SVD) of :

 A=U(Σ0)VT, \hb@xt@.01(1.4)

where and are orthogonal, with the singular values assumed to be simple, the superscript denotes the transpose of a matrix or vector, and denotes a zero matrix. With (LABEL:eqsvd), we have

 xnaive=n∑i=1uTibσivi=n∑i=1uTibtrueσivi+n∑i=1uTieσivi=xtrue+n∑i=1uTieσivi \hb@xt@.01(1.5)

with .

The discrete Picard condition means that, on average, the Fourier coefficients decay faster than and enables regularization to compute useful approximations to , which results in the following popular model that is used throughout Hansen’s books [22, 24] and the references therein as well as the current paper:

 |uTibtrue|=σ1+βi,  β>0, i=1,2,…,n, \hb@xt@.01(1.6)

where is a model parameter that controls the decay rates of .

The covariance matrix of the Gaussian white noise is , and the expected value . With (LABEL:eqsvd), it holds that , so that and ; see, e.g., [22, p.70-1] and [24, p.41-2]. The noise thus affects more or less equally. With (LABEL:picard), relation (LABEL:eq4) shows that, for large singular values, is dominant relative to . Once from some onwards, the noise dominates , and the terms overwhelm for small singular values and thus must be suppressed. The transition point is such that

 |uTk0b|≈|uTk0btrue|>|uTk0e|≈η, |uTk0+1b|≈|uTk0+1e|≈η; \hb@xt@.01(1.7)

see [24, p.42, 98] and a similar description [22, p.70-1].

The truncated SVD (TSVD) method [20, 22, 24] is a reliable and commonly used method for solving small to modest sized (LABEL:posed), and it solves a sequence of problems

 min∥x∥  subject to  ∥Akx−b∥=min \hb@xt@.01(1.8)

starting with onwards, where is a best rank approximation to with respect to the 2-norm with , and ; it holds that [3, p.12], and solves (LABEL:tsvd), called the TSVD regularized solution.

Based on the above properties of the Gaussian white noise and the discrete Picard condition (LABEL:picard), we know from [22, p.70-1] and [24, p.71,86-8,95] that

 xtsvdk=A†kb=⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩k∑i=1uTibσivi≈k∑i=1uTibtrueσivi,k≤k0;k∑i=1uTibσivi≈k0∑i=1uTibtrueσivi+k∑i=k0+1uTieσivi,k>k0 \hb@xt@.01(1.9)

and is the 2-norm filtering best TSVD regularized solution to (LABEL:eq1). has the minimal error . The index plays the role of the regularization parameter. It has been observed and justified that essentially has the minimum 2-norm error; see [47], [21], [22, p.109-11] and [24, Sections 4.2 and 4.4]. Naturally, we can take as the standard reference when assessing the regularization ability of a 2-norm filtering regularization method. A number of parameter-choice methods have been developed for finding , such as the discrepancy principle, the L-curve criterion, and the generalized cross validation (GCV) [22, 24].

For large, the TSVD method is generally too demanding, and only iterative regularization methods are computationally viable. Krylov iterative solvers are a major class of methods that compute iterates from a sequence of low dimensional Krylov subspaces to approximate [1, 10, 14, 17, 22, 24, 38]. Of them, the CGLS or CGNR method [15, 26] and its mathematically equivalent LSQR method [42], the CGME or CGNE method [3, 4, 6, 17, 18] and the LSMR method [4, 5, 12] have been commonly used. These methods are deterministic 2-norm filtering regularization methods and have been intensively studied [1, 8, 14, 17, 18, 22, 24, 27, 28], and they have general regularizing effects and exhibit semi-convergence [40, p.89]; see also [3, p.314], [4, p.733], [22, p.135] and [24, p.110]: The iterates first converge to , then the noise starts to deteriorate the iterates so that they start to diverge from and instead converge to . If we stop at the right time, then we have a regularization method, where the iteration number plays the role of the regularization parameter.

The behavior of ill-posed problems depends on the decay rate of . The following characterization of the degree of ill-posedness of (LABEL:eq1) was introduced in [29] and has been widely used [1, 10, 22, 24]: If with , , then (LABEL:eq1) is severely ill-posed; if , then (LABEL:eq1) is mildly or moderately ill-posed for or . We mention that the requirement must be met for a linear compact operator equation [19, 22].

Hanke and Hansen [19] address that a strict proof of the regularizing properties of conjugate gradients is extremely difficult; see also [23]. Much work has been done on the regularizing effects of LSQR and CGLS; see, e.g., [11, 18, 22, 24, 27, 28, 30, 32, 33, 34, 43, 46]. It has long been known (cf. [19, 22, 23, 24]) that if the singular values of the projection matrices involved in LSQR, called the Ritz values, approximate the large singular values in natural order then the regularizing effects of LSQR are as good as those of the TSVD method and the two methods can compute 2-norm filtering best possible regularized solutions with the same accuracy. As we will see clearly, the same results hold for CGME and LSMR when the singular values of projection matrices approximate the large singular values of and in this order, respectively.

If a regularized solution of (LABEL:eq1) is at least as accurate as , it is called a best possible regularized solution. If the regularized solution by an iterative solver at semi-convergence is such a best possible one, then the solver is said to have the full regularization. Otherwise, the solver has only the partial regularization. This definition is introduced in [30, 32]. In terms of it, a fundamental concern is: Do CGLS, LSQR, CGME and LSMR have the full or partial regularization for severely, moderately and mildly ill-posed problems?

For the cases that are simple and multiple, the author [32, 33, 34] has proved that, for LSQR, the Ritz values converge to the large singular values of in natural order and Lanczos bidiagonalization always generates a near best rank approximation until for the severely and moderately ill-posed problems with suitable and , meaning that LSQR and CGLS have the full regularization. If such desired properties fail to hold, nothing has been known on the full or partial regularization of LSQR. Nevertheless, beyond ones’ common expectation, numerical experiments on a number of 1D and 2D real-life ill-posed problems have demonstrated that LSQR always has the full regularization [32, 33]. Obviously, the regularization theory on LSQR is still far from complete, and its in-depth exploration deserves high attention.

In this paper, we analyze the regularization of CGME and LSMR under the assumption that all the singular values are simple. We establish a number of results from several viewpoints, and prove that the regularization ability of CGME is generally inferior to that of LSQR, that is, the 2-norm filtering best regularized solutions obtained by CGME at semi-convergence are generally less accurate than those obtained by LSQR. Particularly, we derive the filtered SVD expansion of CGME iterates, and prove that the semi-convergence of CGME always occurs no later than that of LSQR and can be much earlier than the latter. In the meantime, we show how to extract a rank approximation from the rank approximation to generated in CGME at iteration , which is as accurate as the rank approximation in LSQR. Based on such rank approximation at iteration , we propose a modified CGME (MCGME) method whose regularization ability is shown to be very comparable to that of LSQR. For LSMR, we present a number of results and prove that its regularizing effects are at least as good as those of LSQR and the two methods compute the 2-norm filtering best regularized solutions with essentially the same accuracy. We also show that the semi-convergence of LSMR always occurs no sooner than that of LSQR.

As a windfall, motivated by our analysis approach used for CGME, we improve a fundamental result, Theorem 9.3 presented in Halko et al. [16], on the accuracy of the truncated rank SVD approximation to generated by randomized algorithms, which have formed a highly intensive topic and have been used in numerous disciplines over the years. As remarked by Halko et al. in [16] (cf. Remark 9.1 there), their bound appears “conservative, but a complete theoretical understanding lacks”. Our new bounds for the approximation accuracy are unconditionally sharper than theirs. The new bounds perfectly explain why their bound appears to be an overestimate.

The paper is organized as follows. In Section LABEL:methods, we review LSQR, CGME and LSMR. In Section LABEL:lsqr, we review some results on LSQR in [32, 33, 34] and take them as a comparison standard to assess the regularizing effects of CGME and LSMR. In Section LABEL:cgme, we derive a number of regularizing properties of CGME and propose the MCGME method. In Section LABEL:randomappro, we consider the accuracy of the truncated rank randomized SVD approximation [16] and present our sharper bounds. In Section LABEL:lsmr, we study the regularization ability of LSMR. In Section LABEL:numer, we report numerical experiments to confirm our theory. We conclude the paper in Section LABEL:conclu.

Throughout the paper, we denote by the dimensional Krylov subspace generated by the matrix and the vector , and by the bold letter the zero matrix with orders clear from the context.

## 2 The LSQR, CGME and LSMR algorithms

These three algorithms are all based on the Lanczos bidiagonalization process, which computes two orthonormal bases and of and for , respectively. We describe the process as Algorithm 1.

Algorithm 1:  -step Lanczos bidiagonalization process

• Take , and define .

• For

Algorithm 1 can be written in the matrix form

 AQk =Pk+1Bk, \hb@xt@.01(2.1) ATPk+1 =QkBTk+αk+1qk+1(e(k+1)k+1)T, \hb@xt@.01(2.2)

where denotes the -th canonical basis vector of , , and

 Bk=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝α1β2α2β3⋱⋱αkβk+1⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠∈R(k+1)×k. \hb@xt@.01(2.3)

It is known from (LABEL:eqmform1) that

 Bk=PTk+1AQk. \hb@xt@.01(2.4)

Algorithm 1 cannot break down before step when , are simple since is supposed to have nonzero components in the directions of . The singular values of , called the Ritz values of with respect to the left and right subspaces and , are all simple.

Write and . At iteration , LSQR [42] solves

 ∥Axlsqrk−b∥=minx∈VRk∥Ax−b∥

for the iterate

 xlsqrk=Qkylsqrk  with  ylsqrk=argminy∈Rk∥Bky−β1e(k+1)1∥=β1B†ke(k+1)1, \hb@xt@.01(2.5)

where is the first canonical basis vector of , and decreases monotonically with respect to .

CGME [4, 17, 18, 27, 28] is the CG method implicitly applied to or and , and it solves the problem

 ∥xnaive−xcgmek∥=minx∈VRk∥xnaive−x∥

for the iterate . The error norm decreases monotonically with respect to . Let be the matrix consisting of the first rows of , i.e.,

 ¯Bk=PTkAQk. \hb@xt@.01(2.6)

Then the CGME iterate

 xcgmek=Qkycgmek  with  ycgmek=β1¯B−1ke(k)1 \hb@xt@.01(2.7)

and with the -th canonical vector of .

LSMR [4, 12] is mathematically equivalent to MINRES [41] applied to the normal equation , and it solves

 ∥AT(b−Axlsmrk)∥=minx∈VRk∥AT(b−Ax)∥

for the iterate . The residual norm of the normal equation decreases monotonically with respect to , and the iterate

 xlsmrk=Qkylsmrk  with  ylsmrk=argminy∈Rk∥(BTkBk,αk+1βk+1e(k)k)Ty−α1β1e(k+1)1∥. \hb@xt@.01(2.8)

## 3 Some results on LSQR in [32, 33, 34]

From and (LABEL:yk) we have

 xlsqrk=QkB†kPTk+1b, \hb@xt@.01(3.1)

which is the minimum 2-norm solution to the problem that perturbs in (LABEL:eq1) to its rank approximation . Recall that . Analogous to (LABEL:tsvd), LSQR now solves a sequence of problems

 min∥x∥   subject to   ∥Pk+1BkQTkx−b∥=min \hb@xt@.01(3.2)

for starting with onwards, where in (LABEL:eq1) is replaced by a rank approximation of it. Clearly, if is a near best rank approximation to with an approximate accuracy and the singular values of approximate the large in natural order for , then LSQR has the same regularization ability as the TSVD method and thus has the full regularization because (i) and are the regularized solutions to the two perturbed problems of (LABEL:eq1) that replace by two rank approximations with the same quality, respectively; (ii) and solve the two essentially same regularization problems (LABEL:tsvd) and (LABEL:lsqrreg), respectively.

If the fail to approximate the large in natural order for some , at which the semi-convergence of LSQR occurs, then we must have , and vice versa; see [32, Theorem 3.1]. In this case, although there has been no definitive conclusion on the full or partial regularization of LSQR hitherto, the analysis on the TSVD method and the Tikhonov regularization method [22, 24] clearly shows that the core requirement on a regularization method depend on its ability to acquire the dominant SVD components of and meanwhile suppress the SVD components associated with the small singular values. Therefore, the more accurate the rank approximation is and the singular values of a projection matrix are approximations to some of the large singular values of , the better the regularization ability of the method has, so that the best regularized solution obtained by it is more accurate.

Define

 γlsqrk=∥A−Pk+1BkQTk∥, \hb@xt@.01(3.3)

which measures the accuracy of the rank approximation to involved in LSQR. Since the best rank approximation satisfies , we have

 γlsqrk≥σk+1.

Let us define a near best rank approximation to . is called a near best rank approximation to if is closer to than to :

 σk+1≤γlsqrk<σk+σk+12. \hb@xt@.01(3.4)

The author [33] has derived accurate estimates for and some approximation properties of for the three kinds of ill-posed problems. The results have shown that, for severely and moderately ill-posed problems with for suitable and and for , must be a near best rank approximation to , and the Ritz values approximate the large singular values of in natural order. This means that LSQR has the full regularization for these two kinds of problems once and suitably. However, for moderately ill-posed problems with not enough and mildly ill-posed problems, is generally not a near best rank approximation, and the Ritz values do not approximate the large singular values of in natural order for some . Furthermore, for the ill-posed problems with and , the situation is subtle. The author [34] has shown that a near best rank approximation to does not mean the approximations of the Ritz values to the large singular values in natural order.

In particular, the author [33, Theorem 6.1] has shown that

 γlsqrk =∥Gk∥ \hb@xt@.01(3.5)

with

 Gk =⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝αk+1βk+2αk+2βk+3⋱⋱αnβn+1⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠∈R(n−k+1)×(n−k), \hb@xt@.01(3.6)
 αk+1 <γlsqrk, βk+2<γlsqrk, k=1,2,…,n−1, \hb@xt@.01(3.7) γlsqrk+1 <γlsqrk, k=1,2,…,n−2. \hb@xt@.01(3.8)

These notation and results will be frequently used later.

## 4 The regularization of CGME

Note that . We obtain

 xcgmek=Qk¯B−1kPTkb. \hb@xt@.01(4.1)

Therefore, analogous to (LABEL:tsvd) and (LABEL:lsqrreg), CGME solves a sequence of problems

 min∥x∥   subject to   ∥Pk¯BkQTkx−b∥=min \hb@xt@.01(4.2)

for the regularized solution starting with onwards, where in (LABEL:eq1) is replaced by a rank approximation of it.

Just as LSQR, if is a near best rank approximation to and the singular values of approximate the large ones of in natural order for , then CGME has the full regularization.

By (LABEL:eqmform1), (LABEL:eqmform2) and (LABEL:Bk), the rank approximation involved in LSQR is

 Pk+1BkQTk=AQkQTk. \hb@xt@.01(4.3)

By (LABEL:gammak), we have For CGME, by (LABEL:eqmform2) and (LABEL:bbar), we obtain

 Pk+1PTk+1A =Pk+1(BkQTk+αk+1e(k+1)k+1qTk+1) =Pk+1(Bk,αk+1e(k+1)k+1)QTk+1 =Pk+1¯Bk+1QTk+1. \hb@xt@.01(4.4)

Therefore, is the solution to (LABEL:cgmereg) in which the rank approximation to is , whose approximation accuracy is

 γcgmek=∥A−Pk¯BkQTk∥=∥(I−PkPTk)A∥. \hb@xt@.01(4.5)
###### Theorem 4.1

For the rank approximations to , , with the definition we have

 γlsqrk<γcgmek<γlsqrk−1 , \hb@xt@.01(4.6) γcgmek+1<γcgmek. \hb@xt@.01(4.7)

Proof. We give two proofs of the upper bound in (LABEL:cgmelowup). The first is as follows. Since , from (LABEL:eqmform2) we obtain

 (γlsqrk)2 =∥A−Pk+1BkQTk∥2 =∥Pk+1PTk+1A−Pk+1BkQTk+(I−Pk+1PTk+1)A∥2 =max∥y∥=1∥((Pk+1PTk+1A−Pk+1BkQTk)+(I−Pk+1PTk+1)A)y∥2 =max∥y∥=1∥Pk+1PTk+1(Pk+1PTk+1A−Pk+1BkQTk)y+(I−Pk+1PTk+1)Ay∥2 =max∥y∥=1(∥Pk+1PTk+1(Pk+1PTk+1A−Pk+1BkQTk)y∥2+∥(I−Pk+1PTk+1)Ay∥2) =max∥y∥=1(∥Pk+1(PTk+1A−BkQTk)y∥2+∥(I−Pk+1PTk+1)Ay∥2) =max∥y∥=1(∥(PTk+1A−BkQTk)y∥2+∥(I−Pk+1PTk+1)Ay∥2) =max∥y∥=1(α2k+1|(e(k+1)k+1)Ty|2+∥(I−Pk+1PTk+1)Ay∥2) >max∥y∥=1∥(I−Pk+1PTk+1)Ay∥2 =∥(I−Pk+1PTk+1)A∥2=(γcgmek+1)2,

which is the upper bound in (LABEL:cgmelowup) by replacing the index with .

Taking in (LABEL:Bk) and augmenting such that is orthogonal, we have

 PTAQn=(Bn0), \hb@xt@.01(4.8)

where all the entries and , , of are positive, and is orthogonal. Then by the orthogonal invariance of the 2-norm we obtain

 γcgmek=∥A−Pk¯BkQTk∥=∥PT(A−Pk¯BkQTk)Qn∥=∥(βk+1e1,Gk)∥ \hb@xt@.01(4.9)

with defined by (LABEL:gk1). It is straightforward to justify that the singular values of strictly interlace those of by noting that is an unreduced symmetric tridiagonal matrix, from which and the lower bound of (LABEL:cgmelowup) follows.

Based on (LABEL:cgmeapp), we can also give the second proof of the upper bound in (LABEL:cgmelowup). Observe from (LABEL:gk1) that is the matrix deleting the first row of . Applying the strict interlacing property of singular values to and , we obtain , which yields the upper bound of (LABEL:cgmelowup).

From (LABEL:cgmeapp), notice that is the matrix deleting the first row of and the first column, which is zero, of the resulting matrix. Applying the strict interlacing property of singular values to and establishes (LABEL:mono).

(LABEL:cgmelowup) indicates that is definitely a less accurate rank approximation to than in LSQR. (LABEL:mono) shows the strict monotonic decreasing property of . Moreover, keep in mind that . Then a combination of it and the results in Section LABEL:lsqr indicates that, unlike in LSQR, there is no guarantee that is a near best rank approximation to even for severely and moderately ill-posed problems, because simply lies between and and there do not exist any sufficient conditions on and that enforce to be closer to , let alone closer to .

In the following we investigate the approximation behavior of the singular values of . Before proceeding, it is necessary to have a closer look at Algorithm 1 and distinguish some subtleties when is rectangular, i.e., , and square, i.e., , respectively.

As we have described in Section LABEL:methods, Algorithm 1 does not break down before step . For the rectangular case , Algorithm 1 is exactly what is presented there, all the and are positive, , and we generate and at step and . As a consequence, by definition (LABEL:leftbidiag), we have

 ¯Bn+1=(Bn,αn+1e(n+1)n+1)=(Bn,0). \hb@xt@.01(4.10)

It is known from (LABEL:fulllb) that the singular values of are identical to the singular values of . Therefore, the singular values of are and zero.

For the square case , however, we must have , that is, the last row of is zero; otherwise, we would obtain an orthonormal matrix , which is impossible since is already an orthogonal matrix. After Algorithm 1 is run to completion, we have

 ¯Bn=PTnAQn,

whose singular values are exactly equal to the singular values of , which are identical to the singular values of .

By the definition (LABEL:leftbidiag) of , from (LABEL:eqmform2) and the above description, for both the rectangular and square cases we obtain

 PTkAATPk=¯Bk¯BTk,k=1,2,…,n∗, \hb@xt@.01(4.11)

with for and for , which are unreduced symmetric tridiagonal matrices. For , the eigenvalues of are just