Regularizing Effects of the Krylov Iterative Solvers CGME and LSMR for Linear Discrete Ill-Posed Problems with an Application to Truncated Randomized SVDs

This work was supported in part by the National Science Foundation of China (No. 11771249).
For the large-scale linear discrete ill-posed problem $\min\|Ax-b\|$ or $Ax=b$ with $b$ contaminated by Gaussian white noise, there are four commonly used Krylov solvers: LSQR and its mathematically equivalent CGLS, the Conjugate Gradient (CG) method applied to $A^{T}Ax=A^{T}b$; CGME, the CG method applied to $\min\|AA^{T}y-b\|$ or $AA^{T}y=b$ with $x=A^{T}y$; and LSMR, the minimal residual (MINRES) method applied to $A^{T}Ax=A^{T}b$. These methods have intrinsic regularizing effects, where the number of iterations plays the role of the regularization parameter. In this paper, we analyze the regularizing effects of CGME and LSMR and establish a number of results, which include the filtered SVD expansion of CGME iterates. We prove that the 2-norm filtering best possible regularized solutions by CGME are less accurate than those by LSQR, and that the 2-norm filtering best regularized solutions by LSMR are at least as accurate as those obtained by LSQR. We also prove that the semi-convergence of CGME always occurs no later than that of LSQR, and that the semi-convergence of LSMR always occurs no sooner than that of LSQR. In the meantime, using the analysis approach for CGME, we substantially improve a fundamental result on the accuracy of the truncated rank $k$ approximate SVD of $A$ generated by randomized algorithms. Numerical experiments confirm our theory on CGME and LSMR.
Key words. Discrete ill-posed, rank $k$ approximation, TSVD solution, semi-convergence, regularized solution, regularizing effects, CGME, LSMR, LSQR, CGLS
AMS subject classifications. 65F22, 15A18, 65F10, 65F20, 65R32, 65J20, 65R30
1 Introduction and Preliminaries
Consider the linear discrete ill-posed problem
\[
\min_{x\in\mathbb{R}^{n}}\|Ax-b\|\quad\text{or}\quad Ax=b,\qquad A\in\mathbb{R}^{m\times n},\quad b\in\mathbb{R}^{m},
\]
where the norm $\|\cdot\|$ is the 2-norm of a vector or matrix, and $A$ is extremely ill conditioned with its singular values decaying to zero without a noticeable gap. Since the results in this paper hold for both the overdetermined ($m\geq n$) and underdetermined ($m<n$) cases, we simply assume that $m\geq n$. (LABEL:eq1) typically arises from the discretization of the first kind Fredholm integral equation
\[
\int_{\Omega}k(s,t)x(t)\,dt=f(s),\qquad s\in\Omega,
\]
where the kernel $k(s,t)$ and the right-hand side $f(s)$ are known functions, while $x(t)$ is the unknown function to be sought. If $k(s,t)$ is non-degenerate and $f(s)$ satisfies the Picard condition, there exists the unique square integrable solution $x(t)$; see [10, 22, 24, 38]. Here for brevity we assume that $s$ and $t$ belong to the same set $\Omega$. Applications include image deblurring, signal processing, geophysics, computerized tomography, heat propagation, biomedical and optical imaging, groundwater modeling, and many others; see, e.g., [1, 9, 10, 24, 36, 37, 38, 40, 48]. The right-hand side $b=b_{true}+e$ is assumed to be contaminated by a Gaussian white noise $e$, caused by measurement, modeling or discretization errors, where $b_{true}$ is noise-free and $\|e\|\neq 0$. Because of the presence of the noise $e$ and the extreme ill-conditioning of $A$, the naive solution $x_{naive}=A^{\dagger}b$ of (LABEL:eq1) bears no relation to the true solution $x_{true}=A^{\dagger}b_{true}$, where $\dagger$ denotes the Moore-Penrose inverse of a matrix. Therefore, one has to use regularization to extract a best possible approximation to $x_{true}$.
For a Gaussian white noise $e$, throughout the paper, we always assume that $b_{true}$ satisfies the discrete Picard condition $\|A^{\dagger}b_{true}\|\leq C$ with some constant $C$ for arbitrarily large $n$ [1, 13, 20, 21, 22, 24, 37]. Without loss of generality, assume that $Ax_{true}=b_{true}$. Then a dominating regularization approach is to solve the problem
\[
\min\|Lx\|\quad\text{subject to}\quad\|Ax-b\|\leq\tau\|e\|
\]
with $\tau>1$ slightly [22, 24], where $L$ is a regularization matrix and its suitable choice is based on a-priori information on $x_{true}$. Typically, $L$ is either the identity matrix $I$ or the scaled discrete approximation of a first or second order derivative operator.
We are concerned with the case $L=I$ in this paper. If $L\neq I$, (LABEL:posed), in principle, can be transformed into the standard-form problem with $L=I$ [22, 24], which is a 2-norm filtering regularization problem. The solutions of (LABEL:eq1) and (LABEL:posed) can be fully analyzed by the singular value decomposition (SVD) of $A$:
\[
A=U\begin{pmatrix}\Sigma\\ \mathbf{0}\end{pmatrix}V^{T},
\]
where $U=(u_1,u_2,\ldots,u_m)\in\mathbb{R}^{m\times m}$ and $V=(v_1,v_2,\ldots,v_n)\in\mathbb{R}^{n\times n}$ are orthogonal, $\Sigma=\mathrm{diag}(\sigma_1,\sigma_2,\ldots,\sigma_n)$ with the singular values $\sigma_1>\sigma_2>\cdots>\sigma_n>0$ assumed to be simple, the superscript $T$ denotes the transpose of a matrix or vector, and $\mathbf{0}$ denotes a zero matrix. With (LABEL:eqsvd), we have
\[
x_{true}=A^{\dagger}b_{true}=\sum_{i=1}^{n}\frac{u_i^{T}b_{true}}{\sigma_i}v_i.
\]
The discrete Picard condition means that, on average, the Fourier coefficients $|u_i^{T}b_{true}|$ decay faster than the singular values $\sigma_i$; it enables regularization to compute useful approximations to $x_{true}$, and results in the following popular model that is used throughout Hansen's books [22, 24] and the references therein as well as the current paper:
\[
|u_i^{T}b_{true}|=\sigma_i^{1+\beta},\qquad i=1,2,\ldots,n,
\]
where $\beta>0$ is a model parameter that controls the decay rates of $|u_i^{T}b_{true}|$.
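To make the decay model above concrete, the following sketch builds a small synthetic problem whose Fourier coefficients obey the discrete Picard condition. All sizes, decay rates, and the noise level are illustrative assumptions for the demonstration, not values from the paper.

```python
import numpy as np

# Synthetic problem obeying the discrete Picard condition (illustrative only).
rng = np.random.default_rng(0)
n = 64
sigma = np.arange(1, n + 1, dtype=float) ** -2.0   # moderately ill-posed decay
U, _ = np.linalg.qr(rng.standard_normal((n, n)))    # random orthogonal factors
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(sigma) @ V.T

beta = 0.5                              # model parameter beta > 0
coef = sigma ** (1.0 + beta)            # |u_i^T b_true| = sigma_i^(1+beta)
b_true = U @ coef                       # noise-free right-hand side
x_true = V @ (coef / sigma)             # sum of (u_i^T b_true / sigma_i) v_i
e = 1e-6 * rng.standard_normal(n)       # Gaussian white noise
b = b_true + e                          # contaminated right-hand side
```

Because $1+\beta>1$, the coefficients `coef` decay strictly faster than `sigma`, so the expansion of `x_true` converges while the naive solution of the noisy system would not.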
The covariance matrix of the Gaussian white noise $e$ is $\mathcal{E}(ee^{T})=\eta^{2}I$, and the expected value $\mathcal{E}(\|e\|^{2})=m\eta^{2}$. With (LABEL:eqsvd), it holds that $\mathcal{E}((u_i^{T}e)^{2})=\eta^{2}$, so that $|u_i^{T}e|\approx\eta$, $i=1,2,\ldots,n$; see, e.g., [22, p.70-1] and [24, p.41-2]. The noise $e$ thus affects the coefficients $u_i^{T}b=u_i^{T}b_{true}+u_i^{T}e$ more or less equally. With (LABEL:picard), relation (LABEL:eq4) shows that, for large singular values, $|u_i^{T}b_{true}|$ is dominant relative to $|u_i^{T}e|$. Once $|u_i^{T}b_{true}|<|u_i^{T}e|$ from some $i$ onwards, the noise dominates $u_i^{T}b$, and the terms $\frac{u_i^{T}e}{\sigma_i}v_i$ overwhelm $\frac{u_i^{T}b_{true}}{\sigma_i}v_i$ for small singular values and thus must be suppressed. The transition point $k_0$ is such that
\[
|u_{k_0}^{T}b|\approx|u_{k_0}^{T}b_{true}|>\eta\approx|u_{k_0+1}^{T}e|\approx|u_{k_0+1}^{T}b|.
\]
The truncated SVD (TSVD) method solves the problem
\[
\min\|x\|\quad\text{subject to}\quad\|A_kx-b\|=\min
\]
for $k=1,2,\ldots,n$ starting with $k=1$ onwards, where $A_k=U_k\Sigma_kV_k^{T}$ is a best rank $k$ approximation to $A$ with respect to the 2-norm with $U_k=(u_1,\ldots,u_k)$, $V_k=(v_1,\ldots,v_k)$ and $\Sigma_k=\mathrm{diag}(\sigma_1,\ldots,\sigma_k)$; it holds that $\|A-A_k\|=\sigma_{k+1}$ [3, p.12], and
\[
x_k^{tsvd}=\sum_{i=1}^{k}\frac{u_i^{T}b}{\sigma_i}v_i
\]
solves (LABEL:tsvd), called the TSVD regularized solution.
and $x_{k_0}^{tsvd}$ is the 2-norm filtering best TSVD regularized solution to (LABEL:eq1); it has the minimal error $\|x_{k_0}^{tsvd}-x_{true}\|$. The index $k$ plays the role of the regularization parameter. It has been observed and justified that $x_{k_0}^{tsvd}$ essentially has the minimum 2-norm error; see, e.g., [22, p.109-11] and [24, Sections 4.2 and 4.4]. Naturally, we can take $x_{k_0}^{tsvd}$ as the standard reference when assessing the regularization ability of a 2-norm filtering regularization method. A number of parameter-choice methods have been developed for finding $k_0$, such as the discrepancy principle, the L-curve criterion, and the generalized cross validation (GCV) [22, 24].
For $A$ large, the TSVD method is generally too demanding, and only iterative regularization methods are computationally viable. Krylov iterative solvers are a major class of methods that compute iterates from a sequence of low dimensional Krylov subspaces to approximate $x_{true}$ [1, 10, 14, 17, 22, 24, 38]. Of them, the CGLS or CGNR method [15, 26] and its mathematically equivalent LSQR method, the CGME or CGNE method [3, 4, 6, 17, 18], and the LSMR method [4, 5, 12] have been commonly used. These methods are deterministic 2-norm filtering regularization methods and have been intensively studied [1, 8, 14, 17, 18, 22, 24, 27, 28]; they have general regularizing effects and exhibit semi-convergence [40, p.89]; see also [3, p.314], [4, p.733], [22, p.135] and [24, p.110]: the iterates first converge to $x_{true}$; then the noise starts to deteriorate the iterates, so that they diverge from $x_{true}$ and instead converge to the naive solution $A^{\dagger}b$. If we stop at the right time, then we have a regularization method, where the iteration number $k$ plays the role of the regularization parameter.
The behavior of ill-posed problems depends on the decay rate of the singular values $\sigma_j$. The following characterization of the degree of ill-posedness of (LABEL:eq1) has been widely used [1, 10, 22, 24]: if $\sigma_j=\mathcal{O}(\rho^{-j})$ with $\rho>1$, $j=1,2,\ldots,n$, then (LABEL:eq1) is severely ill-posed; if $\sigma_j=\mathcal{O}(j^{-\alpha})$, then (LABEL:eq1) is mildly or moderately ill-posed for $\frac{1}{2}<\alpha\leq 1$ or $\alpha>1$. We mention that the requirement $\alpha>\frac{1}{2}$ must be met for a linear compact operator equation [19, 22].
Hanke and Hansen point out that a strict proof of the regularizing properties of conjugate gradients is extremely difficult. Much work has been done on the regularizing effects of LSQR and CGLS; see, e.g., [11, 18, 22, 24, 27, 28, 30, 32, 33, 34, 43, 46]. It has long been known (cf. [19, 22, 23, 24]) that if the singular values of the projection matrices involved in LSQR, called the Ritz values, approximate the large singular values of $A$ in natural order, then the regularizing effects of LSQR are as good as those of the TSVD method, and the two methods can compute 2-norm filtering best possible regularized solutions with the same accuracy. As we will see clearly, the same results hold for CGME and LSMR when the singular values of their projection matrices approximate the large singular values of $A$ in this order.
If a regularized solution of (LABEL:eq1) is at least as accurate as $x_{k_0}^{tsvd}$, it is called a best possible regularized solution. If the regularized solution by an iterative solver at semi-convergence is such a best possible one, then the solver is said to have the full regularization; otherwise, the solver has only the partial regularization. This definition was introduced in [30, 32]. In terms of it, a fundamental concern is: do CGLS, LSQR, CGME and LSMR have the full or the partial regularization for severely, moderately and mildly ill-posed problems?
For the cases that the singular values of $A$ are simple and multiple, the author [32, 33, 34] has proved that, for LSQR, the Ritz values converge to the large singular values of $A$ in natural order, and Lanczos bidiagonalization always generates a near best rank $k$ approximation to $A$ until the occurrence of semi-convergence, for the severely and moderately ill-posed problems with suitable $\rho>1$ and $\alpha>1$, meaning that LSQR and CGLS have the full regularization. If such desired properties fail to hold, nothing has been known on the full or partial regularization of LSQR. Nevertheless, beyond one's common expectation, numerical experiments on a number of 1D and 2D real-life ill-posed problems have demonstrated that LSQR always has the full regularization [32, 33]. Obviously, the regularization theory on LSQR is still far from complete, and its in-depth exploration deserves high attention.
In this paper, we analyze the regularization of CGME and LSMR under the assumption that all the singular values of $A$ are simple. We establish a number of results from several viewpoints, and prove that the regularization ability of CGME is generally inferior to that of LSQR; that is, the 2-norm filtering best regularized solutions obtained by CGME at semi-convergence are generally less accurate than those obtained by LSQR. Particularly, we derive the filtered SVD expansion of CGME iterates, and prove that the semi-convergence of CGME always occurs no later than that of LSQR and can be much earlier than the latter. In the meantime, we show how to extract a rank $k$ approximation to $A$ from the rank $k$ approximation generated by CGME at iteration $k$, which is as accurate as the rank $k$ approximation in LSQR. Based on such rank $k$ approximations, we propose a modified CGME (MCGME) method whose regularization ability is shown to be very comparable to that of LSQR. For LSMR, we present a number of results and prove that its regularizing effects are at least as good as those of LSQR and that the two methods compute the 2-norm filtering best regularized solutions with essentially the same accuracy. We also show that the semi-convergence of LSMR always occurs no sooner than that of LSQR.
As a windfall, motivated by our analysis approach for CGME, we improve a fundamental result, Theorem 9.3 presented in Halko et al., on the accuracy of the truncated rank $k$ SVD approximation to $A$ generated by randomized algorithms, which have formed a highly intensive research topic and have been used in numerous disciplines over the years. As remarked by Halko et al. (cf. Remark 9.1 there), their bound appears “conservative, but a complete theoretical understanding lacks”. Our new bounds for the approximation accuracy are unconditionally sharper than theirs, and they perfectly explain why their bound appears to be an overestimate.
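For concreteness, the basic randomized scheme analyzed by Halko et al. (range finder, projection, truncation) can be sketched as follows. The function name, the oversampling parameter `p`, and the test matrix are illustrative assumptions, not specifics from the paper.

```python
import numpy as np

def randomized_truncated_svd(A, k, p, rng):
    """Minimal sketch of the basic randomized SVD scheme: Gaussian range
    finder with oversampling p, projection, then rank-k truncation."""
    Omega = rng.standard_normal((A.shape[1], k + p))  # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)                    # orthonormal range basis
    B = Q.T @ A                                       # small projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Ub[:, :k], s[:k], Vt[:k]               # truncated rank-k factors

rng = np.random.default_rng(2)
m, n, k, p = 200, 100, 10, 5
sigma = np.arange(1, n + 1, dtype=float) ** -2.0      # decaying spectrum
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(sigma) @ V.T

Uk, sk, Vtk = randomized_truncated_svd(A, k, p, rng)
err = np.linalg.norm(A - Uk @ np.diag(sk) @ Vtk, 2)   # >= sigma_{k+1} always
```

Since the computed approximation has rank at most $k$, the 2-norm error can never drop below $\sigma_{k+1}$; for rapidly decaying spectra it typically stays within a small factor of that optimum, which is the phenomenon the sharper bounds address.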
The paper is organized as follows. In Section LABEL:methods, we review LSQR, CGME and LSMR. In Section LABEL:lsqr, we review some results on LSQR in [32, 33, 34] and take them as a comparison standard to assess the regularizing effects of CGME and LSMR. In Section LABEL:cgme, we derive a number of regularizing properties of CGME and propose the MCGME method. In Section LABEL:randomappro, we consider the accuracy of the truncated rank $k$ randomized SVD approximation and present our sharper bounds. In Section LABEL:lsmr, we study the regularization ability of LSMR. In Section LABEL:numer, we report numerical experiments to confirm our theory. We conclude the paper in Section LABEL:conclu.
Throughout the paper, we denote by $\mathcal{K}_k(C,w)=\mathrm{span}\{w,Cw,\ldots,C^{k-1}w\}$ the $k$ dimensional Krylov subspace generated by the matrix $C$ and the vector $w$, and by the bold letter $\mathbf{0}$ the zero matrix with order clear from the context.
2 The LSQR, CGME and LSMR algorithms
These three algorithms are all based on the Lanczos bidiagonalization process, which computes two orthonormal bases $\{q_1,q_2,\ldots,q_k\}$ and $\{p_1,p_2,\ldots,p_{k+1}\}$ of $\mathcal{K}_k(A^{T}A,A^{T}b)$ and $\mathcal{K}_{k+1}(AA^{T},b)$ for $k=1,2,\ldots$, respectively. We describe the process as Algorithm 1.
Algorithm 1: -step Lanczos bidiagonalization process
1. Take $\beta_1=\|b\|$, $p_1=b/\beta_1$, and define $q_0=\mathbf{0}$.
2. For $j=1,2,\ldots,k$:
   $r_j=A^{T}p_j-\beta_jq_{j-1}$; $\alpha_j=\|r_j\|$; $q_j=r_j/\alpha_j$;
   $z_j=Aq_j-\alpha_jp_j$; $\beta_{j+1}=\|z_j\|$; $p_{j+1}=z_j/\beta_{j+1}$.
Algorithm 1 can be written in the matrix form
\[
AQ_k=P_{k+1}B_k,\qquad A^{T}P_{k+1}=Q_kB_k^{T}+\alpha_{k+1}q_{k+1}e_{k+1}^{T},
\]
where $e_{k+1}$ denotes the $(k+1)$-th canonical basis vector of $\mathbb{R}^{k+1}$, $P_{k+1}=(p_1,p_2,\ldots,p_{k+1})$, $Q_k=(q_1,q_2,\ldots,q_k)$, and
\[
B_k=\begin{pmatrix}
\alpha_1 & & & \\
\beta_2 & \alpha_2 & & \\
 & \ddots & \ddots & \\
 & & \beta_k & \alpha_k \\
 & & & \beta_{k+1}
\end{pmatrix}\in\mathbb{R}^{(k+1)\times k}.
\]
It is known from (LABEL:eqmform1) that $B_k=P_{k+1}^{T}AQ_k$, the projection of $A$ onto the underlying left and right subspaces.
Algorithm 1 cannot break down before step $n$ when the singular values $\sigma_i$, $i=1,2,\ldots,n$, are simple, since $b$ is supposed to have nonzero components in the directions of the left singular vectors of $A$. The singular values of $B_k$, called the Ritz values of $A$ with respect to the left and right subspaces $\mathrm{span}\{P_{k+1}\}$ and $\mathrm{span}\{Q_k\}$, are all simple.
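Algorithm 1 can be sketched in a few lines. The sketch below is a minimal illustration that omits the reorthogonalization a practical implementation needs, and it checks the governing relation $AQ_k=P_{k+1}B_k$ on random data.

```python
import numpy as np

def golub_kahan(A, b, k):
    """k-step Lanczos (Golub-Kahan) bidiagonalization as in Algorithm 1;
    a minimal sketch without reorthogonalization."""
    m, n = A.shape
    P = np.zeros((m, k + 1)); Q = np.zeros((n, k)); B = np.zeros((k + 1, k))
    P[:, 0] = b / np.linalg.norm(b)
    q_prev, beta = np.zeros(n), 0.0
    for j in range(k):
        r = A.T @ P[:, j] - beta * q_prev          # gives alpha_j q_j
        alpha = np.linalg.norm(r)
        Q[:, j] = r / alpha; B[j, j] = alpha
        z = A @ Q[:, j] - alpha * P[:, j]          # gives beta_{j+1} p_{j+1}
        beta = np.linalg.norm(z)
        P[:, j + 1] = z / beta; B[j + 1, j] = beta
        q_prev = Q[:, j]
    return P, B, Q

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 20))
b = rng.standard_normal(30)
P, B, Q = golub_kahan(A, b, 8)       # A @ Q equals P @ B up to roundoff
```

In floating point arithmetic the computed bases lose orthogonality as Ritz values converge, which is why production codes reorthogonalize; for a few steps on well-conditioned random data the plain recurrence suffices.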
Write $\mathcal{V}_k=\mathcal{K}_k(A^{T}A,A^{T}b)$ and $\beta_1=\|b\|$. At iteration $k$, LSQR solves
\[
\min_{x\in\mathcal{V}_k}\|Ax-b\|
\]
for the iterate
\[
x_k^{lsqr}=Q_ky_k\quad\text{with}\quad y_k=\arg\min_{y\in\mathbb{R}^{k}}\|B_ky-\beta_1e_1\|,
\]
where $e_1$ is the first canonical basis vector of $\mathbb{R}^{k+1}$, and the residual norm $\|Ax_k^{lsqr}-b\|$ decreases monotonically with respect to $k$.
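The characterization of the LSQR iterate as the least squares minimizer over $\mathcal{K}_k(A^{T}A,A^{T}b)$ can be checked directly. The sketch below builds the Krylov basis by Gram-Schmidt instead of bidiagonalization (the two span the same subspace in exact arithmetic) and verifies the monotone decrease of the residual norms; all problem data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 40, 25
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def krylov_basis(A, v, k):
    """Orthonormal basis of K_k(A^T A, v) via modified Gram-Schmidt; spans
    the same subspace as Lanczos bidiagonalization in exact arithmetic."""
    Qk = np.zeros((A.shape[1], k))
    Qk[:, 0] = v / np.linalg.norm(v)
    for j in range(1, k):
        w = A.T @ (A @ Qk[:, j - 1])
        for i in range(j):                      # orthogonalize against basis
            w -= (Qk[:, i] @ w) * Qk[:, i]
        Qk[:, j] = w / np.linalg.norm(w)
    return Qk

def lsqr_iterate(A, b, k):
    """LSQR iterate: the least squares minimizer over K_k(A^T A, A^T b)."""
    Qk = krylov_basis(A, A.T @ b, k)
    y, *_ = np.linalg.lstsq(A @ Qk, b, rcond=None)
    return Qk @ y

# Residual norms decrease monotonically with k because the subspaces are
# nested; semi-convergence concerns the error, not the residual.
res = [np.linalg.norm(A @ lsqr_iterate(A, b, k) - b) for k in range(1, 11)]
```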
At iteration $k$, CGME solves
\[
\min_{x\in\mathcal{V}_k}\|A^{\dagger}b-x\|
\]
for the iterate $x_k^{cgme}$. The error norm $\|A^{\dagger}b-x_k^{cgme}\|$ decreases monotonically with respect to $k$. Let $\bar{B}_k$ be the matrix consisting of the first $k$ rows of $B_k$, i.e.,
\[
B_k=\begin{pmatrix}\bar{B}_k\\ \beta_{k+1}e_k^{T}\end{pmatrix}.
\]
Then the CGME iterate is
\[
x_k^{cgme}=Q_k\bar{y}_k\quad\text{with}\quad\bar{y}_k=\beta_1\bar{B}_k^{-1}e_1,
\]
where $e_1$ and $e_k$ are the first and $k$-th canonical basis vectors of $\mathbb{R}^{k}$.
From (LABEL:eqmform1) and (LABEL:yk) we have
\[
x_k^{lsqr}=Q_kB_k^{\dagger}\beta_1e_1=(P_{k+1}B_kQ_k^{T})^{\dagger}b,
\]
which is the minimum 2-norm solution to the problem that perturbs $A$ in (LABEL:eq1) to its rank $k$ approximation $P_{k+1}B_kQ_k^{T}$. Recall that $b=\beta_1P_{k+1}e_1$. Analogous to (LABEL:tsvd), LSQR now solves a sequence of problems
\[
\min\|x\|\quad\text{subject to}\quad\|P_{k+1}B_kQ_k^{T}x-b\|=\min
\]
for $k=1,2,\ldots$ starting with $k=1$ onwards, where $A$ in (LABEL:eq1) is replaced by a rank $k$ approximation of it. Clearly, if $P_{k+1}B_kQ_k^{T}$ is a near best rank $k$ approximation to $A$ with an approximation accuracy $\mathcal{O}(\sigma_{k+1})$ and the singular values of $B_k$ approximate the large $\sigma_i$ in natural order for $k=1,2,\ldots,k_0$, then LSQR has the same regularization ability as the TSVD method and thus has the full regularization, because (i) $x_k^{lsqr}$ and $x_k^{tsvd}$ are the regularized solutions to the two perturbed problems of (LABEL:eq1) that replace $A$ by two rank $k$ approximations of the same quality, respectively, and (ii) $x_k^{lsqr}$ and $x_k^{tsvd}$ solve the two essentially same regularization problems (LABEL:tsvd) and (LABEL:lsqrreg), respectively.
If the Ritz values fail to approximate the large singular values of $A$ in natural order for some $k$ at which the semi-convergence of LSQR occurs, then the semi-convergence must occur before iteration $k_0$, and vice versa; see [32, Theorem 3.1]. In this case, although there has been no definitive conclusion on the full or partial regularization of LSQR hitherto, the analysis of the TSVD method and the Tikhonov regularization method [22, 24] clearly shows that the core requirement on a regularization method depends on its ability to acquire the dominant SVD components of $A$ and meanwhile suppress the SVD components associated with the small singular values. Therefore, the more accurate the rank $k$ approximation is and the better the singular values of the projection matrix approximate some of the large singular values of $A$, the better regularization ability the method has, so that the best regularized solution obtained by it is more accurate.
Define
\[
\gamma_k=\|A-P_{k+1}B_kQ_k^{T}\|,
\]
which measures the accuracy of the rank $k$ approximation $P_{k+1}B_kQ_k^{T}$ to $A$ involved in LSQR. Since the best rank $k$ approximation $A_k$ satisfies $\|A-A_k\|=\sigma_{k+1}$, we have
\[
\gamma_k\geq\sigma_{k+1}.
\]
Let us define a near best rank $k$ approximation to $A$: a rank $k$ matrix $C_k$ is called a near best rank $k$ approximation to $A$ if $\|A-C_k\|$ is closer to $\sigma_{k+1}$ than to $\sigma_k$:
\[
\sigma_{k+1}\leq\|A-C_k\|<\frac{\sigma_k+\sigma_{k+1}}{2}.
\]
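The two facts used here, $\|A-A_k\|=\sigma_{k+1}$ for the best rank $k$ approximation and the near best notion, can be made concrete as follows. The helper `is_near_best`, which encodes a "closer to $\sigma_{k+1}$ than to $\sigma_k$" test, is an illustrative name introduced for this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((30, 20))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # best rank-k approximation
err_best = np.linalg.norm(A - A_k, 2)       # equals sigma_{k+1} (Eckart-Young)

def is_near_best(C_k, A, s, k):
    """True when ||A - C_k|| is at least sigma_{k+1} and closer to
    sigma_{k+1} than to sigma_k (the near best rank-k notion above)."""
    e = np.linalg.norm(A - C_k, 2)
    return s[k] <= e + 1e-12 and abs(e - s[k]) < abs(e - s[k - 1])
```

The best rank $k$ approximation itself trivially passes the test, since its error attains the lower end $\sigma_{k+1}$ of the admissible interval.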
The author has derived accurate estimates for $\gamma_k$ and some approximation properties of $P_{k+1}B_kQ_k^{T}$ for the three kinds of ill-posed problems. The results have shown that, for severely and moderately ill-posed problems with suitable $\rho>1$ and $\alpha>1$, and for $k=1,2,\ldots,k_0$, $P_{k+1}B_kQ_k^{T}$ must be a near best rank $k$ approximation to $A$, and the Ritz values approximate the large singular values of $A$ in natural order. This means that LSQR has the full regularization for these two kinds of problems with $\rho$ and $\alpha$ suitable. However, for moderately ill-posed problems with $\alpha>1$ not large enough and for mildly ill-posed problems, $P_{k+1}B_kQ_k^{T}$ is generally not a near best rank $k$ approximation, and the Ritz values do not approximate the large singular values of $A$ in natural order for some $k\leq k_0$. Furthermore, for certain ill-posed problems the situation is subtle: the author has shown that a near best rank $k$ approximation to $A$ does not imply that the Ritz values approximate the large singular values in natural order.
In particular, sharp quantitative estimates in this regard have been established by the author in [33, Theorem 6.1].
This notation and these results will be used frequently later.
4 The regularization of CGME
Note that $b=\beta_1P_ke_1$. We obtain
\[
x_k^{cgme}=Q_k\bar{B}_k^{-1}\beta_1e_1=(P_k\bar{B}_kQ_k^{T})^{\dagger}b.
\]
Therefore, analogous to (LABEL:tsvd) and (LABEL:lsqrreg), CGME solves a sequence of problems
\[
\min\|x\|\quad\text{subject to}\quad\|P_k\bar{B}_kQ_k^{T}x-b\|=\min
\]
for the regularized solution $x_k^{cgme}$, $k=1,2,\ldots$, starting with $k=1$ onwards, where $A$ in (LABEL:eq1) is replaced by the rank $k$ approximation $P_k\bar{B}_kQ_k^{T}$ of it.
Just as for LSQR, if $P_k\bar{B}_kQ_k^{T}$ is a near best rank $k$ approximation to $A$ and the singular values of $\bar{B}_k$ approximate the large ones of $A$ in natural order for $k=1,2,\ldots,k_0$, then CGME has the full regularization.
By (LABEL:eqmform1), (LABEL:eqmform2) and (LABEL:Bk), the rank $k$ approximation to $A$ involved in LSQR is $P_{k+1}B_kQ_k^{T}$. By (LABEL:gammak), we have $\gamma_k=\|A-P_{k+1}B_kQ_k^{T}\|$. For CGME, by (LABEL:eqmform2) and (LABEL:bbar), we obtain the rank $k$ approximation $P_k\bar{B}_kQ_k^{T}$ to $A$. Therefore, $x_k^{cgme}$ is the solution to (LABEL:cgmereg) in which the rank $k$ approximation to $A$ is $P_k\bar{B}_kQ_k^{T}$, whose approximation accuracy is
\[
\bar{\gamma}_k=\|A-P_k\bar{B}_kQ_k^{T}\|.
\]
For the two rank $k$ approximations $P_{k+1}B_kQ_k^{T}$ and $P_k\bar{B}_kQ_k^{T}$ to $A$, with the definition of $\bar{\gamma}_k$ above we have
\[
\sigma_{k+1}\leq\gamma_k<\bar{\gamma}_k\leq\gamma_{k-1}
\]
and the strict monotonic decreasing property
\[
\bar{\gamma}_{k+1}<\bar{\gamma}_k.
\]
Proof. We give two proofs of the upper bound in (LABEL:cgmelowup). The first is as follows. Since $\bar{B}_{k+1}=(B_k,\alpha_{k+1}e_{k+1})$, from (LABEL:eqmform2) we obtain $\bar{\gamma}_{k+1}\leq\gamma_k$,
which is the upper bound in (LABEL:cgmelowup) by replacing the index $k+1$ with $k$.
Taking $k=n$ in (LABEL:Bk) and augmenting the matrix such that the augmented one is orthogonal, we have
where all the entries $\alpha_i$ and $\beta_i$ of the bidiagonal matrix are positive, and the augmented matrix is orthogonal. Then by the orthogonal invariance of the 2-norm we obtain
with $G_k$ defined by (LABEL:gk1). It is straightforward to justify that the singular values of $\bar{B}_k$ strictly interlace those of $B_k$ by noting that the associated matrix is an unreduced symmetric tridiagonal one, from which the lower bound of (LABEL:cgmelowup) follows.
Based on (LABEL:cgmeapp), we can also give the second proof of the upper bound in (LABEL:cgmelowup). Observe from (LABEL:gk1) that the former matrix is the one obtained by deleting the first row of the latter. Applying the strict interlacing property of singular values to the two matrices, we obtain the desired inequality, which yields the upper bound of (LABEL:cgmelowup).
From (LABEL:cgmeapp), notice that the relevant matrix is obtained by deleting the first row and then the first column, which is zero, of the resulting matrix. Applying the strict interlacing property of singular values then establishes (LABEL:mono).
(LABEL:cgmelowup) indicates that $P_k\bar{B}_kQ_k^{T}$ is definitely a less accurate rank $k$ approximation to $A$ than $P_{k+1}B_kQ_k^{T}$ in LSQR, and (LABEL:mono) shows the strict monotonic decreasing property of $\bar{\gamma}_k$. Moreover, keep in mind that $\gamma_k\geq\sigma_{k+1}$. A combination of this and the results in Section LABEL:lsqr indicates that, unlike in LSQR, there is no guarantee that $P_k\bar{B}_kQ_k^{T}$ is a near best rank $k$ approximation to $A$ even for severely and moderately ill-posed problems, because $\bar{\gamma}_k$ simply lies between $\gamma_k$ and $\gamma_{k-1}$, and there do not exist any sufficient conditions that force $\bar{\gamma}_k$ to be closer to $\gamma_k$, let alone closer to $\sigma_{k+1}$.
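The ordering $\sigma_{k+1}\leq\gamma_k\leq\bar{\gamma}_k$ of the LSQR and CGME approximation accuracies can be observed numerically. The sketch below uses a plain bidiagonalization without reorthogonalization and an illustrative synthetic ill-posed problem; it is a demonstration, not a robust implementation.

```python
import numpy as np

def golub_kahan(A, b, k):
    """k-step Lanczos bidiagonalization (minimal sketch, no reorthogonalization)."""
    m, n = A.shape
    P = np.zeros((m, k + 1)); Q = np.zeros((n, k)); B = np.zeros((k + 1, k))
    P[:, 0] = b / np.linalg.norm(b)
    q_prev, beta = np.zeros(n), 0.0
    for j in range(k):
        r = A.T @ P[:, j] - beta * q_prev
        alpha = np.linalg.norm(r); Q[:, j] = r / alpha; B[j, j] = alpha
        z = A @ Q[:, j] - alpha * P[:, j]
        beta = np.linalg.norm(z); P[:, j + 1] = z / beta; B[j + 1, j] = beta
        q_prev = Q[:, j]
    return P, B, Q

rng = np.random.default_rng(6)
n = 60
sigma = np.arange(1, n + 1, dtype=float) ** -2.0       # illustrative spectrum
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(sigma) @ V.T
b = A @ rng.standard_normal(n) + 1e-8 * rng.standard_normal(n)

k = 5
P, B, Q = golub_kahan(A, b, k)
# LSQR's rank-k approximation uses the full (k+1) x k bidiagonal B_k;
# CGME's uses only its first k rows (B[:k]) and the first k left vectors.
gamma_k = np.linalg.norm(A - P @ B @ Q.T, 2)
gamma_bar = np.linalg.norm(A - P[:, :k] @ B[:k] @ Q.T, 2)
```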
In the following we investigate the approximation behavior of the singular values of $\bar{B}_k$. Before proceeding, it is necessary to have a closer look at Algorithm 1 and distinguish some subtleties when $A$ is rectangular, i.e., $m>n$, and square, i.e., $m=n$, respectively.
As we have described in Section LABEL:methods, Algorithm 1 does not break down before step $n$. For the rectangular case $m>n$, Algorithm 1 is exactly what is presented there, all the $\alpha_j$ and $\beta_{j+1}$, $j=1,2,\ldots,n$, are positive, and we generate $q_n$ and $p_{n+1}$ at step $n$. As a consequence, by definition (LABEL:leftbidiag), we have
\[
AQ_n=P_{n+1}B_n.
\]
It is known from (LABEL:fulllb) that the singular values of $B_n$ are identical to the singular values of $A$; the associated square matrix thus has the singular values $\sigma_1,\sigma_2,\ldots,\sigma_n$ and zero.
For the square case $m=n$, however, we must have $\beta_{n+1}=0$, that is, the last row of $B_n$ is zero; otherwise, we would obtain a matrix $P_{n+1}\in\mathbb{R}^{n\times(n+1)}$ with orthonormal columns, which is impossible since $P_n$ is already an orthogonal matrix. After Algorithm 1 is run to completion, we have
\[
AQ_n=P_n\bar{B}_n,
\]
whose matrix $\bar{B}_n$ has singular values identical to the singular values of $A$.
By the definition (LABEL:leftbidiag) of $\bar{B}_k$, from (LABEL:eqmform2) and the above description, for both the rectangular and square cases we obtain
with for and for , which are unreduced symmetric tridiagonal matrices. For , the eigenvalues of are just