Regularizing Effects of the Krylov Iterative Solvers CGME and LSMR for Linear Discrete Ill-Posed Problems with an Application to Truncated Randomized SVDs

This work was supported in part by the National Science Foundation of China (No. 11771249).
Abstract
For the large-scale linear discrete ill-posed problem $\min\|Ax-b\|$ or $Ax=b$ with $b$ contaminated by Gaussian white noise, there are four commonly used Krylov solvers: LSQR and its mathematically equivalent CGLS, the Conjugate Gradient (CG) method applied to $A^TAx=A^Tb$; CGME, the CG method applied to $\min\|AA^Ty-b\|$ or $AA^Ty=b$ with $x=A^Ty$; and LSMR, the minimal residual (MINRES) method applied to $A^TAx=A^Tb$. These methods have intrinsic regularizing effects, where the number of iterations plays the role of the regularization parameter. In this paper, we analyze the regularizing effects of CGME and LSMR and establish a number of results, which include the filtered SVD expansion of CGME iterates, and prove that the 2-norm filtering best possible regularized solutions by CGME are less accurate than those by LSQR and that the 2-norm filtering best regularized solutions by LSMR are at least as accurate as those obtained by LSQR. We also prove that the semiconvergence of CGME always occurs no later than that of LSQR and the semiconvergence of LSMR always occurs no sooner than that of LSQR. In the meantime, using the analysis approach for CGME, we substantially improve a fundamental result on the accuracy of the truncated rank $k$ approximate SVD of $A$ generated by randomized algorithms. Numerical experiments confirm our theory on CGME and LSMR.
Key words. Discrete ill-posed, rank $k$ approximation, TSVD solution, semiconvergence, regularized solution, regularizing effects, CGME, LSMR, LSQR, CGLS
AMS subject classifications. 65F22, 15A18, 65F10, 65F20, 65R32, 65J20, 65R30
1 Introduction and Preliminaries
Consider the linear discrete illposed problem
\[
\min_{x\in\mathbb{R}^n}\|Ax-b\| \quad \mbox{or} \quad Ax=b, \qquad A\in\mathbb{R}^{m\times n},\ \ b\in\mathbb{R}^m,
\tag{1.1}
\]
where the norm $\|\cdot\|$ is the 2-norm of a vector or matrix, and $A$ is extremely ill conditioned with its singular values decaying to zero without a noticeable gap. Since the results in this paper hold for both the overdetermined ($m\ge n$) and underdetermined ($m<n$) cases, we simply assume that $m\ge n$. (1.1) typically arises from the discretization of the first kind Fredholm integral equation
\[
\int_{\Omega} k(s,t)x(t)\,dt=f(s),\quad s\in\Omega,
\tag{1.2}
\]
where the kernel $k(s,t)$ and $f(s)$ are known functions, while $x(t)$ is the unknown function to be sought. If $k(s,t)$ is nondegenerate and $f(s)$ satisfies the Picard condition, there exists a unique square integrable solution $x(t)$; see [10, 22, 24, 38]. Here for brevity we assume that $s$ and $t$ belong to the same set $\Omega$. Applications include image deblurring, signal processing, geophysics, computerized tomography, heat propagation, biomedical and optical imaging, groundwater modeling, and many others; see, e.g., [1, 9, 10, 24, 36, 37, 38, 40, 48]. The right-hand side $b=b_{true}+e$ is assumed to be contaminated by a Gaussian white noise $e$, caused by measurement, modeling or discretization errors, where $b_{true}$ is noise-free. Because of the presence of noise and the extreme ill-conditioning of $A$, the naive solution $x_{naive}=A^{\dagger}b$ of (1.1) bears no relation to the true solution $x_{true}=A^{\dagger}b_{true}$, where $\dagger$ denotes the Moore-Penrose inverse of a matrix. Therefore, one has to use regularization to extract a best possible approximation to $x_{true}$.
For a Gaussian white noise $e$, throughout the paper, we always assume that $b_{true}$ satisfies the discrete Picard condition [1, 13, 20, 21, 22, 24, 37]. Then a dominating regularization approach is to solve the problem
\[
\min\|Lx\| \quad \mbox{subject to} \quad \|Ax-b\|\le\tau\|e\|
\tag{1.3}
\]
with $\tau>1$ slightly [22, 24], where $L$ is a regularization matrix whose suitable choice is based on a-priori information on $x_{true}$. Typically, $L$ is either the identity matrix $I$ or the scaled discrete approximation of a first or second order derivative operator.
We are concerned with the case $L=I$ in this paper. If $L\neq I$, (1.3), in principle, can be transformed into the standard-form problem with $L=I$ [22, 24], which is a 2-norm filtering regularization problem. The solutions of (1.1) and (1.3) can be fully analyzed by the singular value decomposition (SVD) of $A$:
\[
A=U\begin{pmatrix}\Sigma\\ \mathbf{0}\end{pmatrix}V^T,
\tag{1.4}
\]
where $U=(u_1,u_2,\ldots,u_m)\in\mathbb{R}^{m\times m}$ and $V=(v_1,v_2,\ldots,v_n)\in\mathbb{R}^{n\times n}$ are orthogonal, $\Sigma=\mathrm{diag}(\sigma_1,\sigma_2,\ldots,\sigma_n)$ with the singular values $\sigma_1>\sigma_2>\cdots>\sigma_n>0$ assumed to be simple, the superscript $T$ denotes the transpose of a matrix or vector, and $\mathbf{0}$ denotes a zero matrix. With (1.4), we have
\[
x_{naive}=A^{\dagger}b=\sum_{i=1}^{n}\frac{u_i^Tb}{\sigma_i}v_i=x_{true}+\sum_{i=1}^{n}\frac{u_i^Te}{\sigma_i}v_i
\tag{1.5}
\]
with $x_{true}=\sum_{i=1}^{n}\frac{u_i^Tb_{true}}{\sigma_i}v_i$.
The discrete Picard condition means that, on average, the Fourier coefficients $|u_i^Tb_{true}|$ decay faster than the singular values $\sigma_i$, and it enables regularization to compute useful approximations to $x_{true}$. This results in the following popular model that is used throughout Hansen's books [22, 24] and the references therein as well as the current paper:
\[
|u_i^Tb_{true}|=\sigma_i^{1+\beta},\quad \beta>0,\quad i=1,2,\ldots,n,
\tag{1.6}
\]
where $\beta$ is a model parameter that controls the decay rate of $|u_i^Tb_{true}|$.
The covariance matrix of the Gaussian white noise $e$ is $\eta^2I$, and the expected value $\mathcal{E}(\|e\|^2)=m\eta^2$. With (1.4), it holds that $\mathcal{E}((u_i^Te)^2)=\eta^2$, so that $|u_i^Te|\approx\eta$ and $\|e\|\approx\sqrt{m}\,\eta$; see, e.g., [22, p.701] and [24, p.412]. The noise $e$ thus affects the Fourier coefficients $u_i^Tb$ more or less equally. With (1.6), relation (1.5) shows that, for large singular values, $|u_i^Tb_{true}|$ is dominant relative to $|u_i^Te|$. Once $|u_i^Tb_{true}|\le|u_i^Te|$ from some $i$ onwards, the noise dominates $u_i^Tb$, and the terms $\frac{u_i^Te}{\sigma_i}v_i$ overwhelm $\frac{u_i^Tb_{true}}{\sigma_i}v_i$ for small singular values and thus must be suppressed. The transition point $k_0$ is such that
\[
|u_{k_0}^Tb|\approx|u_{k_0}^Tb_{true}|>\eta\ge|u_{k_0+1}^Tb_{true}|.
\tag{1.7}
\]
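The interplay between the decaying coefficients $|u_i^Tb_{true}|$ and the flat noise coefficients can be observed on synthetic data. The following numpy sketch is illustrative only: the sizes, the decay rate of the singular values, the model parameter and the noise level are hypothetical choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 200
sigma = np.logspace(0, -12, n)          # severely ill-posed: sigma_i decays to 1e-12
beta = 0.5                              # model parameter in (1.6)
eta = 1e-6                              # noise level of the white noise e

# Random orthogonal matrix of left singular vectors.
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
coef_true = sigma ** (1 + beta)         # |u_i^T b_true| = sigma_i^(1+beta)
b_true = U @ coef_true
e = eta * rng.standard_normal(m)        # Gaussian white noise, E((u_i^T e)^2) = eta^2

coef_noisy = U.T @ (b_true + e)         # Fourier coefficients u_i^T b
# Transition index: last i with |u_i^T b_true| above the noise level eta.
k0 = int(np.nonzero(coef_true > eta)[0].max()) + 1
```

Beyond `k0` the coefficients $u_i^Tb$ level off at roughly the noise level instead of decaying further, which is the plateau one sees in Picard plots.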
The truncated SVD (TSVD) method [20, 22, 24] is a reliable and commonly used method for solving small to modestly sized (1.3), and it solves a sequence of problems
\[
\min\|x\| \quad \mbox{subject to} \quad \|A_kx-b\|=\min
\tag{1.8}
\]
starting with $k=1$ onwards, where $A_k=\sum_{i=1}^{k}\sigma_iu_iv_i^T$ is a best rank $k$ approximation to $A$ with respect to the 2-norm with $\|A-A_k\|=\sigma_{k+1}$; it holds that $x_k^{tsvd}=\sum_{i=1}^{k}\frac{u_i^Tb}{\sigma_i}v_i$ [3, p.12], and it solves (1.8), called the TSVD regularized solution.
Based on the above properties of the Gaussian white noise $e$ and the discrete Picard condition (1.6), we know from [22, p.701] and [24, p.71,868,95] that
\[
k_0=\arg\min_{k}\|x_k^{tsvd}-x_{true}\|,
\tag{1.9}
\]
and $x_{k_0}^{tsvd}$ is the 2-norm filtering best TSVD regularized solution to (1.1): it has the minimal error $\|x_{k_0}^{tsvd}-x_{true}\|$. The index $k$ plays the role of the regularization parameter. It has been observed and justified that $x_{k_0}^{tsvd}$ essentially has the minimum 2-norm error; see [47], [21], [22, p.10911] and [24, Sections 4.2 and 4.4]. Naturally, we can take $x_{k_0}^{tsvd}$ as the standard reference when assessing the regularization ability of a 2-norm filtering regularization method. A number of parameter-choice methods have been developed for finding $k_0$, such as the discrepancy principle, the L-curve criterion, and the generalized cross validation (GCV) [22, 24].
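A compact numerical illustration of the TSVD method and of the optimal truncation index is easy to set up. The sketch below uses a hypothetical severely ill-posed test matrix; `tsvd_solution` is our own helper name, not a routine from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m = n = 64
sigma = np.logspace(0, -10, n)                   # severely ill-posed spectrum
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(sigma) @ V.T

beta, eta = 0.5, 1e-5
x_true = V @ (sigma ** beta)                     # gives |u_i^T b_true| = sigma_i^(1+beta)
b = A @ x_true + eta * rng.standard_normal(m)

def tsvd_solution(A, b, k):
    """TSVD regularized solution x_k = sum_{i=1}^{k} (u_i^T b / sigma_i) v_i."""
    Uf, s, Vt = np.linalg.svd(A)
    return Vt[:k].T @ ((Uf[:, :k].T @ b) / s[:k])

errors = [np.linalg.norm(tsvd_solution(A, b, k) - x_true) for k in range(1, n + 1)]
k_best = int(np.argmin(errors)) + 1              # plays the role of k_0
```

The errors first decrease and then grow sharply once SVD components with $\sigma_i$ below the noise level are included; `k_best` sits near the transition point in (1.7).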
For $A$ large, the TSVD method is generally too demanding, and only iterative regularization methods are computationally viable. Krylov iterative solvers are a major class of methods that compute iterates from a sequence of low dimensional Krylov subspaces to approximate $x_{true}$ [1, 10, 14, 17, 22, 24, 38]. Of them, the CGLS or CGNR method [15, 26] and its mathematically equivalent LSQR method [42], the CGME or CGNE method [3, 4, 6, 17, 18], and the LSMR method [4, 5, 12] have been commonly used. These methods are deterministic 2-norm filtering regularization methods and have been intensively studied [1, 8, 14, 17, 18, 22, 24, 27, 28]. They have general regularizing effects and exhibit semiconvergence [40, p.89]; see also [3, p.314], [4, p.733], [22, p.135] and [24, p.110]: the iterates first converge to $x_{true}$, then the noise starts to deteriorate the iterates so that they start to diverge from $x_{true}$ and instead converge to $x_{naive}$. If we stop at the right time, then we have a regularization method, where the iteration number $k$ plays the role of the regularization parameter.
The behavior of ill-posed problems depends on the decay rate of the $\sigma_i$. The following characterization of the degree of ill-posedness of (1.1) was introduced in [29] and has been widely used [1, 10, 22, 24]: if $\sigma_i=\mathcal{O}(\rho^{-i})$ with $\rho>1$, $i=1,2,\ldots,n$, then (1.1) is severely ill-posed; if $\sigma_i=\mathcal{O}(i^{-\alpha})$, then (1.1) is mildly or moderately ill-posed for $\frac{1}{2}<\alpha\le1$ or $\alpha>1$. We mention that the requirement $\alpha>\frac{1}{2}$ must be met for a linear compact operator equation [19, 22].
Hanke and Hansen [19] point out that a strict proof of the regularizing properties of conjugate gradients is extremely difficult; see also [23]. Much work has been done on the regularizing effects of LSQR and CGLS; see, e.g., [11, 18, 22, 24, 27, 28, 30, 32, 33, 34, 43, 46]. It has long been known (cf. [19, 22, 23, 24]) that if the singular values of the projection matrices involved in LSQR, called the Ritz values, approximate the large singular values of $A$ in natural order, then the regularizing effects of LSQR are as good as those of the TSVD method, and the two methods can compute 2-norm filtering best possible regularized solutions with the same accuracy. As we will see clearly, the same results hold for CGME and LSMR when the singular values of their projection matrices approximate the large singular values of $A$ in this order, respectively.
If a regularized solution of (1.1) is at least as accurate as $x_{k_0}^{tsvd}$, it is called a best possible regularized solution. If the regularized solution by an iterative solver at semiconvergence is such a best possible one, then the solver is said to have the full regularization. Otherwise, the solver has only the partial regularization. This definition was introduced in [30, 32]. In terms of it, a fundamental concern is: do CGLS, LSQR, CGME and LSMR have the full or partial regularization for severely, moderately and mildly ill-posed problems?
For the cases that the singular values of $A$ are simple and multiple, the author [32, 33, 34] has proved that, for LSQR, the Ritz values converge to the large singular values of $A$ in natural order and Lanczos bidiagonalization always generates a near best rank $k$ approximation until the occurrence of semiconvergence for the severely and moderately ill-posed problems with suitable $\rho>1$ and $\alpha>1$, meaning that LSQR and CGLS have the full regularization. If such desired properties fail to hold, nothing has been known on the full or partial regularization of LSQR. Nevertheless, beyond one's common expectation, numerical experiments on a number of 1D and 2D real-life ill-posed problems have demonstrated that LSQR always has the full regularization [32, 33]. Obviously, the regularization theory on LSQR is still far from complete, and its in-depth exploration deserves high attention.
In this paper, we analyze the regularization of CGME and LSMR under the assumption that all the singular values of $A$ are simple. We establish a number of results from several viewpoints, and prove that the regularization ability of CGME is generally inferior to that of LSQR, that is, the 2-norm filtering best regularized solutions obtained by CGME at semiconvergence are generally less accurate than those obtained by LSQR. Particularly, we derive the filtered SVD expansion of CGME iterates, and prove that the semiconvergence of CGME always occurs no later than that of LSQR and can be much earlier than the latter. In the meantime, we show how to extract a rank approximation from the one to $A$ generated by CGME at iteration $k$ that is as accurate as the corresponding rank approximation in LSQR. Based on such rank approximations, we propose a modified CGME (MCGME) method whose regularization ability is shown to be very comparable to that of LSQR. For LSMR, we present a number of results and prove that its regularizing effects are at least as good as those of LSQR and that the two methods compute the 2-norm filtering best regularized solutions with essentially the same accuracy. We also show that the semiconvergence of LSMR always occurs no sooner than that of LSQR.
As a windfall, motivated by our analysis approach used for CGME, we improve a fundamental result, Theorem 9.3 presented in Halko et al. [16], on the accuracy of the truncated rank $k$ SVD approximation to $A$ generated by randomized algorithms, which have formed a highly intensive topic and have been used in numerous disciplines over the years. As remarked by Halko et al. in [16] (cf. Remark 9.1 there), their bound appears "conservative, but a complete theoretical understanding lacks". Our new bounds for the approximation accuracy are unconditionally sharper than theirs, and they perfectly explain why their bound appears to be an overestimate.
The paper is organized as follows. In Section 2, we review LSQR, CGME and LSMR. In Section 3, we review some results on LSQR in [32, 33, 34] and take them as a comparison standard to assess the regularizing effects of CGME and LSMR. In Section 4, we derive a number of regularizing properties of CGME and propose the MCGME method. In Section 5, we consider the accuracy of the truncated rank $k$ randomized SVD approximation [16] and present our sharper bounds. In Section 6, we study the regularization ability of LSMR. In Section 7, we report numerical experiments to confirm our theory. We conclude the paper in Section 8.
Throughout the paper, we denote by $\mathcal{K}_k(C,w)=\mathrm{span}\{w,Cw,\ldots,C^{k-1}w\}$ the $k$ dimensional Krylov subspace generated by the matrix $C$ and the vector $w$, and by the bold letter $\mathbf{0}$ the zero matrix with orders clear from the context.
2 The LSQR, CGME and LSMR algorithms
These three algorithms are all based on the Lanczos bidiagonalization process, which computes two orthonormal bases $\{q_1,q_2,\ldots,q_k\}$ of $\mathcal{K}_k(A^TA,A^Tb)$ and $\{p_1,p_2,\ldots,p_{k+1}\}$ of $\mathcal{K}_{k+1}(AA^T,b)$ for $k=1,2,\ldots$, respectively. We describe the process as Algorithm 1.
Algorithm 1: $k$-step Lanczos bidiagonalization process

1. Take $\beta_1=\|b\|$, $p_1=b/\beta_1$, and define $q_0=\mathbf{0}$.

2. For $j=1,2,\ldots,k$

$r_j=A^Tp_j-\beta_jq_{j-1}$; $\ \alpha_j=\|r_j\|$; $\ q_j=r_j/\alpha_j$;
$z_j=Aq_j-\alpha_jp_j$; $\ \beta_{j+1}=\|z_j\|$; $\ p_{j+1}=z_j/\beta_{j+1}$.
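A straightforward implementation of Algorithm 1 may help make the process concrete. This is a sketch under our own naming (`lanczos_bidiag`), with full reorthogonalization added as a common finite precision safeguard; it returns matrices satisfying $AQ_k=P_{k+1}B_k$.

```python
import numpy as np

def lanczos_bidiag(A, b, k, reorth=True):
    """k-step Lanczos bidiagonalization of A started from b.

    Returns P (k+1 orthonormal columns), the (k+1) x k lower bidiagonal
    matrix B, and Q (k orthonormal columns) with A @ Q = P @ B.
    Full reorthogonalization (reorth=True) keeps the computed bases
    orthonormal in floating point arithmetic.
    """
    m, n = A.shape
    P = np.zeros((m, k + 1))
    Q = np.zeros((n, k))
    B = np.zeros((k + 1, k))
    P[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        r = A.T @ P[:, j]
        if j > 0:
            r -= B[j, j - 1] * Q[:, j - 1]      # subtract beta_j q_{j-1}
        if reorth:
            r -= Q[:, :j] @ (Q[:, :j].T @ r)
        B[j, j] = np.linalg.norm(r)             # alpha_j (1-based)
        Q[:, j] = r / B[j, j]
        z = A @ Q[:, j] - B[j, j] * P[:, j]
        if reorth:
            z -= P[:, :j + 1] @ (P[:, :j + 1].T @ z)
        B[j + 1, j] = np.linalg.norm(z)         # beta_{j+1} (1-based)
        P[:, j + 1] = z / B[j + 1, j]
    return P, B, Q

# Small demonstration on random data (hypothetical, for illustration only).
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 20))
b = rng.standard_normal(30)
P, B, Q = lanczos_bidiag(A, b, 8)
```

The returned $B$ is lower bidiagonal with positive entries, and the columns of $P$ and $Q$ stay orthonormal to machine precision thanks to the reorthogonalization.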
Algorithm 1 can be written in the matrix form
\[
AQ_k=P_{k+1}B_k,
\tag{2.1}
\]
\[
A^TP_{k+1}=Q_kB_k^T+\alpha_{k+1}q_{k+1}e_{k+1}^T,
\tag{2.2}
\]
where $e_{k+1}$ denotes the $(k+1)$-th canonical basis vector of $\mathbb{R}^{k+1}$, $P_{k+1}=(p_1,p_2,\ldots,p_{k+1})$, $Q_k=(q_1,q_2,\ldots,q_k)$, and
\[
B_k=\begin{pmatrix}
\alpha_1 & & & \\
\beta_2 & \alpha_2 & & \\
 & \ddots & \ddots & \\
 & & \beta_k & \alpha_k \\
 & & & \beta_{k+1}
\end{pmatrix}\in\mathbb{R}^{(k+1)\times k}.
\tag{2.3}
\]
It is known from (2.1) that
\[
B_k=P_{k+1}^TAQ_k.
\tag{2.4}
\]
Algorithm 1 cannot break down before step $n$ when the $\sigma_i$, $i=1,2,\ldots,n$, are simple, since $b$ is supposed to have nonzero components in the directions of all the left singular vectors $u_i$ of $A$. The singular values of $B_k$, called the Ritz values of $A$ with respect to the left and right subspaces $\mathcal{K}_{k+1}(AA^T,b)$ and $\mathcal{K}_k(A^TA,A^Tb)$, are all simple.
Write $\mathcal{V}_k=\mathcal{K}_k(A^TA,A^Tb)$ and $\beta_1=\|b\|$. At iteration $k$, LSQR [42] solves
\[
\min_{x\in\mathcal{V}_k}\|Ax-b\|
\]
for the iterate
\[
x_k^{lsqr}=Q_ky_k^{lsqr} \quad \mbox{with} \quad y_k^{lsqr}=\arg\min_{y\in\mathbb{R}^k}\|B_ky-\beta_1e_1^{(k+1)}\|,
\tag{2.5}
\]
where $e_1^{(k+1)}$ is the first canonical basis vector of $\mathbb{R}^{k+1}$, and the residual norm $\|Ax_k^{lsqr}-b\|$ decreases monotonically with respect to $k$.
CGME [4, 17, 18, 27, 28] is the CG method implicitly applied to $AA^Ty=b$ or $\min\|AA^Ty-b\|$ and $x=A^Ty$, and it solves the problem
\[
\min_{x\in\mathcal{V}_k}\|x_{naive}-x\|
\]
for the iterate $x_k^{cgme}$. The error norm $\|x_{naive}-x_k^{cgme}\|$ decreases monotonically with respect to $k$. Let $\bar{B}_k$ be the matrix consisting of the first $k$ rows of $B_k$, i.e.,
\[
B_k=\begin{pmatrix}\bar{B}_k\\ \beta_{k+1}(e_k^{(k)})^T\end{pmatrix}.
\tag{2.6}
\]
Then the CGME iterate
\[
x_k^{cgme}=Q_k\bar{B}_k^{-1}\beta_1e_1^{(k)}
\tag{2.7}
\]
and $\|Ax_k^{cgme}-b\|=\beta_{k+1}|(e_k^{(k)})^T\bar{B}_k^{-1}\beta_1e_1^{(k)}|$ with $e_k^{(k)}$ the $k$-th canonical vector of $\mathbb{R}^k$.
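The two projected problems can be compared in a short numpy experiment. The sketch below is illustrative only (random consistent test data, inline re-implementation of the bidiagonalization with full reorthogonalization): the LSQR iterate comes from the least squares problem with $B_k$, the CGME iterate from the linear system with $\bar{B}_k$, and the two monotonically decreasing quantities mentioned above are recorded.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, steps = 30, 12, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true                        # consistent system, so x_naive = x_true

# Lanczos bidiagonalization with full reorthogonalization.
P = np.zeros((m, steps + 1)); Q = np.zeros((n, steps)); B = np.zeros((steps + 1, steps))
beta1 = np.linalg.norm(b)
P[:, 0] = b / beta1
for j in range(steps):
    r = A.T @ P[:, j] - (B[j, j - 1] * Q[:, j - 1] if j > 0 else 0.0)
    r -= Q[:, :j] @ (Q[:, :j].T @ r)
    B[j, j] = np.linalg.norm(r); Q[:, j] = r / B[j, j]
    z = A @ Q[:, j] - B[j, j] * P[:, j]
    z -= P[:, :j + 1] @ (P[:, :j + 1].T @ z)
    B[j + 1, j] = np.linalg.norm(z); P[:, j + 1] = z / B[j + 1, j]

res_lsqr, err_cgme = [], []
for k in range(1, steps + 1):
    rhs = np.zeros(k + 1); rhs[0] = beta1                    # beta_1 e_1
    y = np.linalg.lstsq(B[:k + 1, :k], rhs, rcond=None)[0]   # LSQR projected problem
    x_lsqr = Q[:, :k] @ y
    x_cgme = Q[:, :k] @ np.linalg.solve(B[:k, :k], rhs[:k])  # CGME iterate
    res_lsqr.append(np.linalg.norm(A @ x_lsqr - b))
    err_cgme.append(np.linalg.norm(x_true - x_cgme))
```

For this consistent, noise-free system both sequences decrease monotonically; for a noisy ill-posed right-hand side they would first decrease and then exhibit semiconvergence.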
3 Some results on LSQR in [32, 33, 34]
From $AQ_k=P_{k+1}B_k$ and (2.5) we have
\[
x_k^{lsqr}=Q_kB_k^{\dagger}\beta_1e_1^{(k+1)}=(P_{k+1}B_kQ_k^T)^{\dagger}b,
\tag{3.1}
\]
which is the minimum 2-norm solution to the problem $\min\|P_{k+1}B_kQ_k^Tx-b\|$ that perturbs $A$ in (1.1) to its rank $k$ approximation $P_{k+1}B_kQ_k^T$. Recall that $\beta_1e_1^{(k+1)}=P_{k+1}^Tb$. Analogous to (1.8), LSQR now solves a sequence of problems
\[
\min\|x\| \quad \mbox{subject to} \quad \|P_{k+1}B_kQ_k^Tx-b\|=\min
\tag{3.2}
\]
for $x_k^{lsqr}$ starting with $k=1$ onwards, where $A$ in (1.1) is replaced by a rank $k$ approximation $P_{k+1}B_kQ_k^T$ of it. Clearly, if $P_{k+1}B_kQ_k^T$ is a near best rank $k$ approximation to $A$ with an approximation accuracy of order $\sigma_{k+1}$ and the singular values of $B_k$ approximate the large $\sigma_i$ in natural order for $k=1,2,\ldots,k_0$, then LSQR has the same regularization ability as the TSVD method and thus has the full regularization, because (i) $x_k^{lsqr}$ and $x_k^{tsvd}$ are the regularized solutions to the two perturbed problems of (1.1) that replace $A$ by two rank $k$ approximations with the same quality, respectively; (ii) $x_k^{lsqr}$ and $x_k^{tsvd}$ solve the two essentially same regularization problems (3.2) and (1.8), respectively.
If the Ritz values fail to approximate the large $\sigma_i$ in natural order for some $k\le k^*$, at which the semiconvergence of LSQR occurs, then we must have $k^*<k_0$, and vice versa; see [32, Theorem 3.1]. In this case, although there has been no definitive conclusion on the full or partial regularization of LSQR hitherto, the analysis of the TSVD method and the Tikhonov regularization method [22, 24] clearly shows that the core requirement on a regularization method depends on its ability to acquire the dominant SVD components of $A$ and meanwhile suppress the SVD components associated with the small singular values. Therefore, the more accurate the rank $k$ approximation is and the better the singular values of the projection matrix approximate some of the large singular values of $A$, the better regularization ability the method has, so that the best regularized solution obtained by it is more accurate.
Define
\[
\gamma_k=\|A-P_{k+1}B_kQ_k^T\|,
\tag{3.3}
\]
which measures the accuracy of the rank $k$ approximation $P_{k+1}B_kQ_k^T$ to $A$ involved in LSQR. Since the best rank $k$ approximation $A_k$ satisfies $\|A-A_k\|=\sigma_{k+1}$, we have $\gamma_k\ge\sigma_{k+1}$.
Let us define a near best rank $k$ approximation to $A$: a rank $k$ matrix $C_k$ is called a near best rank $k$ approximation to $A$ if $\|A-C_k\|$ is closer to $\sigma_{k+1}$ than to $\sigma_k$:
\[
\sigma_{k+1}\le\|A-C_k\|<\frac{\sigma_k+\sigma_{k+1}}{2}.
\tag{3.4}
\]
The author [33] has derived accurate estimates for $\gamma_k$ and some approximation properties of the Ritz values for the three kinds of ill-posed problems. The results have shown that, for severely and moderately ill-posed problems with suitable $\rho>1$ and $\alpha>1$ and for $k=1,2,\ldots,k_0$, $P_{k+1}B_kQ_k^T$ must be a near best rank $k$ approximation to $A$, and the Ritz values approximate the large singular values of $A$ in natural order. This means that LSQR has the full regularization for these two kinds of problems once $\rho$ and $\alpha$ are suitably large. However, for moderately ill-posed problems with $\alpha>1$ not large enough and for mildly ill-posed problems, $P_{k+1}B_kQ_k^T$ is generally not a near best rank $k$ approximation, and the Ritz values do not approximate the large singular values of $A$ in natural order for some $k\le k_0$. Furthermore, for some ill-posed problems the situation is subtle. The author [34] has shown that a near best rank $k$ approximation to $A$ does not imply the approximation of the Ritz values to the large singular values in natural order.
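The behavior of $\gamma_k$ for a severely ill-posed problem can be probed numerically. The following sketch uses hypothetical data (geometric singular value decay with ratio about $3.3$, a right-hand side of Picard type plus white noise) and compares $\gamma_k=\|A-P_{k+1}B_kQ_k^T\|$ with $\sigma_{k+1}$ and $\sigma_k$ for the first few $k$:

```python
import numpy as np

rng = np.random.default_rng(4)
m = n = 24
sigma = np.logspace(0, -12, n)               # severe decay, ratio ~ 3.3 per index
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(sigma) @ V.T
b = U @ (sigma ** 1.5) + 1e-6 * rng.standard_normal(m)   # Picard-type b plus noise

# Bidiagonalization with full reorthogonalization (as in Algorithm 1).
kmax = 6
P = np.zeros((m, kmax + 1)); Q = np.zeros((n, kmax)); B = np.zeros((kmax + 1, kmax))
P[:, 0] = b / np.linalg.norm(b)
for j in range(kmax):
    r = A.T @ P[:, j] - (B[j, j - 1] * Q[:, j - 1] if j > 0 else 0.0)
    r -= Q[:, :j] @ (Q[:, :j].T @ r)
    B[j, j] = np.linalg.norm(r); Q[:, j] = r / B[j, j]
    z = A @ Q[:, j] - B[j, j] * P[:, j]
    z -= P[:, :j + 1] @ (P[:, :j + 1].T @ z)
    B[j + 1, j] = np.linalg.norm(z); P[:, j + 1] = z / B[j + 1, j]

# gamma_k = || A - P_{k+1} B_k Q_k^T ||_2 for k = 1, ..., kmax
gammas = [np.linalg.norm(A - P[:, :k + 1] @ B[:k + 1, :k] @ Q[:, :k].T, 2)
          for k in range(1, kmax + 1)]
```

For this strong decay one observes $\gamma_k$ close to the optimal value $\sigma_{k+1}$, i.e., the rank $k$ approximations are near best in the sense of (3.4); the lower bound $\gamma_k\ge\sigma_{k+1}$ holds for any rank $k$ matrix.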
In particular, the author [33, Theorem 6.1] has shown that
(3.5)
with
(3.6)
(3.7)
(3.8)
This notation and these results will be frequently used later.
4 The regularization of CGME
Note that $\beta_1e_1^{(k)}=P_k^Tb$. We obtain
\[
x_k^{cgme}=Q_k\bar{B}_k^{-1}P_k^Tb=(P_k\bar{B}_kQ_k^T)^{\dagger}b.
\tag{4.1}
\]
Therefore, analogous to (1.8) and (3.2), CGME solves a sequence of problems
\[
\min\|x\| \quad \mbox{subject to} \quad \|P_k\bar{B}_kQ_k^Tx-b\|=\min
\tag{4.2}
\]
for the regularized solution $x_k^{cgme}$ starting with $k=1$ onwards, where $A$ in (1.1) is replaced by a rank $k$ approximation $P_k\bar{B}_kQ_k^T$ of it.
Just as for LSQR, if $P_k\bar{B}_kQ_k^T$ is a near best rank $k$ approximation to $A$ and the singular values of $\bar{B}_k$ approximate the large ones of $A$ in natural order for $k=1,2,\ldots,k_0$, then CGME has the full regularization.
By (2.1), (2.2) and (2.3), the rank $k$ approximation to $A$ involved in LSQR is
\[
A_k^{lsqr}=P_{k+1}B_kQ_k^T.
\tag{4.3}
\]
By (3.3), we have $\gamma_k=\|A-A_k^{lsqr}\|$. For CGME, by (2.2) and (2.6), we obtain
\[
A_k^{cgme}=P_k\bar{B}_kQ_k^T.
\tag{4.4}
\]
Therefore, $x_k^{cgme}$ is the solution to (4.2) in which the rank $k$ approximation to $A$ is $P_k\bar{B}_kQ_k^T$, whose approximation accuracy is
\[
\bar{\gamma}_k=\|A-P_k\bar{B}_kQ_k^T\|.
\tag{4.5}
\]
Theorem 4.1
For the rank $k$ approximations $P_{k+1}B_kQ_k^T$ and $P_k\bar{B}_kQ_k^T$ to $A$, with the definitions (3.3) and (4.5) we have
\[
\gamma_k<\bar{\gamma}_k\le\gamma_{k-1},
\tag{4.6}
\]
\[
\bar{\gamma}_{k+1}<\bar{\gamma}_k.
\tag{4.7}
\]
Proof. We give two proofs of the upper bound in (4.6). The first is as follows. Since $\bar{B}_k=P_k^TAQ_k$, from (2.2) we obtain $\bar{\gamma}_k\le\gamma_{k-1}$, which is the upper bound in (4.6), i.e., (3.3) with the index $k$ replaced by $k-1$.
Taking $B_k$ in (2.3) and augmenting it such that the augmented matrix is orthogonal, we have
(4.8)
where all the entries $\alpha_i$, $i=1,2,\ldots,k$, and $\beta_i$, $i=2,3,\ldots,k+1$, of $B_k$ are positive, and the augmenting factor is orthogonal. Then by the orthogonal invariance of the 2-norm we obtain
(4.9)
with $G_k$ defined by (4.8). It is straightforward to justify that the singular values of $\bar{B}_k$ strictly interlace those of $G_k$ by noting that $G_kG_k^T$ is an unreduced symmetric tridiagonal matrix, from which the lower bound of (4.6) follows.
Based on (4.4), we can also give the second proof of the upper bound in (4.6). Observe from (4.8) that $\bar{B}_k$ is the matrix obtained by deleting the first row of $G_k$. Applying the strict interlacing property of singular values to $G_k$ and $\bar{B}_k$, we obtain $\bar{\gamma}_k\le\gamma_{k-1}$, which yields the upper bound of (4.6).
From (4.4), notice that the relevant matrix is obtained by deleting the first row, and then the first column, which is zero, of the resulting matrix. Applying the strict interlacing property of singular values establishes (4.7).
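The interlacing used above can be checked directly: deleting the last row of the $(k+1)\times k$ lower bidiagonal $B_k$ leaves $\bar{B}_k$, and the singular values of $\bar{B}_k$ strictly interlace those of $B_k$. A minimal numpy sketch with random positive bidiagonal entries (purely illustrative, not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
k = 8
alphas = rng.uniform(0.5, 2.0, k)       # diagonal entries alpha_1, ..., alpha_k
betas = rng.uniform(0.5, 2.0, k)        # subdiagonal entries beta_2, ..., beta_{k+1}

B = np.zeros((k + 1, k))                # (k+1) x k lower bidiagonal B_k
for j in range(k):
    B[j, j] = alphas[j]
    B[j + 1, j] = betas[j]
Bbar = B[:k, :]                         # first k rows: the square matrix Bbar_k

s = np.linalg.svd(B, compute_uv=False)        # singular values of B_k, decreasing
sbar = np.linalg.svd(Bbar, compute_uv=False)  # singular values of Bbar_k
```

Row deletion can only shrink singular values, and for an unreduced bidiagonal matrix the inequalities are strict: $\sigma_i(B_k)>\sigma_i(\bar{B}_k)>\sigma_{i+1}(B_k)$.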
(4.6) indicates that $P_k\bar{B}_kQ_k^T$ is definitely a less accurate rank $k$ approximation to $A$ than $P_{k+1}B_kQ_k^T$ in LSQR. (4.7) shows the strict monotonic decreasing property of $\bar{\gamma}_k$. Moreover, keep in mind that $\gamma_k\ge\sigma_{k+1}$. Then a combination of it and the results in Section 3 indicates that, unlike in LSQR, there is no guarantee that $P_k\bar{B}_kQ_k^T$ is a near best rank $k$ approximation to $A$ even for severely and moderately ill-posed problems, because $\bar{\gamma}_k$ simply lies between $\gamma_k$ and $\gamma_{k-1}$ and there do not exist any sufficient conditions on $\rho$ and $\alpha$ that enforce $\bar{\gamma}_k$ to be closer to $\gamma_k$, let alone closer to $\sigma_{k+1}$.
In the following we investigate the approximation behavior of the singular values of $\bar{B}_k$. Before proceeding, it is necessary to have a closer look at Algorithm 1 and distinguish some subtleties when $A$ is rectangular, i.e., $m>n$, and square, i.e., $m=n$, respectively.
As we have described in Section 2, Algorithm 1 does not break down before step $n$. For the rectangular case $m>n$, Algorithm 1 is exactly what is presented there: all the $\alpha_j$ and $\beta_{j+1}$, $j=1,2,\ldots,n$, are positive, and we generate $q_n$ and $p_{n+1}$ at step $n$. As a consequence, by definition (2.6), we have
\[
A=P_{n+1}B_nQ_n^T.
\tag{4.10}
\]
It is known from (4.10) that the singular values of $B_n$ are identical to the singular values of $A$. Therefore, the singular values of the square matrix $(B_n\ \ \mathbf{0})\in\mathbb{R}^{(n+1)\times(n+1)}$ are $\sigma_1,\ldots,\sigma_n$ and zero.
For the square case $m=n$, however, we must have $\beta_{n+1}=0$, that is, the last row of $B_n$ is zero; otherwise, we would obtain a matrix $P_{n+1}\in\mathbb{R}^{n\times(n+1)}$ with $n+1$ orthonormal columns, which is impossible since $P_n$ is already an orthogonal matrix. After Algorithm 1 is run to completion, we have
\[
A=P_n\bar{B}_nQ_n^T,
\]
so that the singular values of $\bar{B}_n$ are identical to the singular values of $A$.
By the definition (2.6) of $\bar{B}_k$, from (2.1) and the above description, for both the rectangular and square cases we obtain
\[
B_k^TB_k=\bar{B}_k^T\bar{B}_k+\beta_{k+1}^2e_k^{(k)}(e_k^{(k)})^T=Q_k^TA^TAQ_k,
\tag{4.11}
\]
with $\beta_{k+1}>0$ for $m>n$ and $\beta_{n+1}=0$ for $m=n$; these are unreduced symmetric tridiagonal matrices. For $k=n$, the eigenvalues of $B_n^TB_n$ are just