Some Results on the Regularization of LSQR for Large-Scale Discrete Ill-Posed ProblemsThis work was supported in part by the National Basic Research Program of China 2011CB302400 and the National Science Foundation of China (No. 11371219)

# Some Results on the Regularization of LSQR for Large-Scale Discrete Ill-Posed Problems

## Abstract

LSQR, a Lanczos bidiagonalization based Krylov subspace iterative method, and its mathematically equivalent CGLS applied to normal equations system, are commonly used for large-scale discrete ill-posed problems. It is well known that LSQR and CGLS have regularizing effects, where the number of iterations plays the role of the regularization parameter. However, it has long been unknown whether the regularizing effects are good enough to find best possible regularized solutions. Here a best possible regularized solution means that it is at least as accurate as the best regularized solution obtained by the truncated singular value decomposition (TSVD) method. In this paper, we establish bounds for the distance between the -dimensional Krylov subspace and the -dimensional dominant right singular space. They show that the Krylov subspace captures the dominant right singular space better for severely and moderately ill-posed problems than for mildly ill-posed problems. Our general conclusions are that LSQR has better regularizing effects for the first two kinds of problems than for the third kind, and a hybrid LSQR with additional regularization is generally needed for mildly ill-posed problems. Exploiting the established bounds, we derive an estimate for the accuracy of the rank approximation generated by Lanczos bidiagonalization. Numerical experiments illustrate that the regularizing effects of LSQR are good enough to compute best possible regularized solutions for severely and moderately ill-posed problems, stronger than our theory predicts, but they are not for mildly ill-posed problems and additional regularization is needed.

I
\slugger

simaxxxxxxxxx–x

ll-posed problem, regularization, Lanczos bidiagonalization, LSQR, CGLS, hybrid

{AMS}

65F22, 65J20, 15A18

## 1 Introduction

We consider the iterative solution of large-scale discrete ill-posed problems

 (1) minx∈Rn∥Ax−b∥,   A∈Rm×n, b∈Rm,  m≥n,

where the norm is the 2-norm of a vector or matrix, and the matrix is extremely ill conditioned with its singular values decaying gradually to zero without a noticeable gap. This kind of problem arises in many science and engineering areas, such as signal processing and image restoration, typically when discretizing Fredholm integral equations of the first-kind [20, 22]. In particular, the right-hand side is affected by noise, caused by measurement or discretization errors, i.e.,

 b=^b+e,

where represents the Gaussian white noise vector and denotes the noise-free right-hand side, and it is supposed that . Because of the presence of noise in and the ill-conditioning of , the naive solution of (1) is meaningless and far from the true solution , where the superscript denotes the Moore-Penrose generalized inverse of a matrix. Therefore, it is necessary to use regularization to determine a best possible approximation to [14, 18, 20, 22].

The solution of (1) can be analyzed by the SVD of :

 (2) A=U(Σ0)VT,

where and are orthogonal matrices, and the entries of the diagonal matrix are the singular values of , which are assumed to be simple throughout the paper and labelled in decreasing order . With (2), we obtain

 (3) xnaive=n∑i=1uTibσivi=n∑i=1uTi^bσivi+n∑i=1uTieσivi=xtrue+n∑i=1uTieσivi.

Throughout the paper, we assume that satisfies the discrete Picard condition: On average, the coefficients decay faster than the singular values. To be definitive, for simplicity we assume that these coefficients satisfy a widely used model in the literature, e.g., [20, p. 81, 111 and 153] and [22, p. 68]:

 (4) ∣uTi^b∣=σ1+βi,  β>0, i=1,2,…,n.

Let be the transition point such that and [22, p. 98]. Then the TSVD method computes

 xTSVDk=⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩k∑i=1uTibσivi≈k∑i=1uTi^bσivi,k≤k0;k∑i=1uTibσivi≈k0∑i=1uTi^bσivi+k∑i=k0+1uTieσivi,k>k0,

which can be written as , the solution of the modified problem that replaces by its best rank approximation in (1), where , and . Remarkably, is the minimum-norm least squares solution of the perturbed problem that replaces in (1) by its best rank approximation , and the best possible TSVD solution of (1) by the TSVD method is [22, p. 98]. A number of approaches have been proposed for determining , such as discrepancy principle, discrete L-curve and generalized cross validation; see, e.g., [1, 2, 20, 27, 35] for comparisons of the classical and new ones. In our numerical experiments, we use the L-curve criterion in the TSVD method and hybrid LSQR. The TSVD method has been widely studied; see, e.g., [4, 20, 22, 29].

For a small and moderate (1), the TSVD method has been used as a general-purpose reliable and efficient numerical method for solving (1). As a result, we will take the TSVD solution as a standard reference when assessing the regularizing effects of iterative solvers and accuracy of iterates under consideration in this paper.

As well known, it is generally not feasible to compute SVD when (1) is large. In this case, one typically projects (1) onto a sequence of low dimensional Krylov subspaces and gets a sequence of iterative solutions [18, 20, 22, 37]. The Conjugate Gradient (CG) method has been used when is symmetric definite [18]. As a CG-type method applied to the semidefinite linear system or the normal equations system , the CGLS algorithm has been studied; see [6, 20, 22] and the references therein. The LSQR algorithm [33], which is mathematically equivalent to CGLS, has attracted great attention, and is known to have regularizing effects and exhibits semi-convergence (see [20, p. 135], [22, p. 110], and also [5, 19, 23, 32]): The iterates tend to be better and better approximations to the exact solution and their norms increase slowly and the residual norms decrease. In later stages, however, the noise starts to deteriorate the iterates, so that they will start to diverge from and instead converge to the naive solution , while their norms increase considerably and the residual norms stabilize. Such phenomenon is due to the fact that a projected problem inherits the ill-conditioning of (1). That is, as the iterations proceed, the noise progressively enters the solution subspace, so that a small singular value of the projected problem appears and the regularized solution is deteriorated.

As far as an iterative solver for solving (1) is concerned, a central problem is whether or not a pure iterative solver has already obtained a best possible regularized solution at semi-convergence, namely whether or not the regularized solution at semi-convergence is at least as accurate as . As it appears, for Krylov subspace based iterative solvers, their regularizing effects critically rely on how well the underlying -dimensional Krylov subspace captures the -dimensional dominant right singular subspace of . The richer information the Krylov subspace contains on the -dimensional dominant right singular subspace, the less possible a small Ritz value of the resulting projected problem appears and thus the better regularizing effects the solver has. To precisely describe the regularizing effects of an iterative solver, we introduce the term of full or partial regularization: If the iterative solver itself computes a best possible regularized solution at semi-convergence, it is said to have the full regularization; in this case, no additional regularization is needed. Here, as defined in the abstract, a best possible regularized solution means that it is at least as accurate as the best regularized solution obtained by the truncated singular value decomposition (TSVD) method. Otherwise, it is said to have the partial regularization; in this case, in order to compute a best possible regularized solution, its hybrid variant, e.g., a hybrid LSQR, is needed that combines the solver with additional regularization [5, 13, 28, 30, 31, 32], which aims to remove the effects of small Ritz values, and expand the Krylov subspace until it captures all the dominant SVD components needed and the method obtains a best possible regularized solution. The study of the regularizing effects of LSQR and CGLS has been receiving intensive attention for years; see [20, 22] and the references therein. However, there has yet been no definitive result or assertion on their full or partial regularization.

To proceed, we need the following definition of the degree of ill-posedness, which follows Hofmann’s book [24] and has been commonly used in the literature, e.g., [20, 22]: If there exists a positive real number such that the singular values satisfy , then the problem is termed as mildly or moderately ill-posed if or ; if with considerably, , then the problem is termed severely ill-posed. It is clear that the singular values of a severely ill-posed problem decay exponentially at the same rate , while those of a moderately or mildly ill-posed problem decay more and more slowly at the decreasing rate approaching one with increasing , which, for the same , is smaller for the moderately ill-posed problem than it for the mildly ill-posed problem.

Other minimum-residual methods have also gained attention for solving (1). For problems with symmetric, MINRES and its preferred variant MR-II are alternatives and have been shown to have regularizing effects [18]. When is nonsymmetric and multiplication with is difficult or impractical to compute, GMRES and its preferred variant RRGMRES are candidates [10, 30]. The hybrid approach based on the Arnoldi process was first introduced in [11], and has been studied in [9, 11, 12, 28]. Recently, Gazzola et al. [15, 16, 17, 31] have studied more methods based on the Lanczos bidiagonalization, the Arnoldi process and the nonsymmetric Lanczos process for the severely ill-posed problem (1). They have described a general framework of the hybrid methods and present Krylov-Tikhonov methods with different parameter choice strategies employed.

In this paper, we focus on LSQR. We derive bounds for the 2-norm distance between the underlying -dimensional Krylov subspace and the -dimensional right singular space. There has been no rigorous and quantitative result on the distance before. The results indicate that the -dimensional Krylov subspace captures the -dimensional dominant right singular space better for severely and moderately ill-posed problems than for mildly ill-posed problems. As a result, LSQR has better regularizing effects for the first two kinds of problems than for the third kind. By the bounds and the analysis on them, we draw a definitive conclusion that LSQR generally has only the partial regularization for mildly ill-posed problems, so that a hybrid LSQR with additional explicit regularization is needed to compute a best possible regularized solution. We also use the bounds to derive an estimate for the accuracy of the rank approximation, generated by Lanczos bidiagonalization, to , which is closely related to the regularization of LSQR. Our results help to further understand the regularization of LSQR, though they appear less sharp. In addition, we derive a bound on the diagonal entries of the bidiagonal matrices generated by the Lanczos bidigonalization process, showing how fast they decay. Numerical experiments confirm our theory that LSQR has only the partial regularization for mildly ill-posed problems and a hybrid LSQR is needed to compute best possible regularized solutions. Strikingly, the experiments demonstrate that LSQR has the full regularization for severely and moderately ill-posed problems. Our theory gives a partial support for the observed general phenomena. Throughout the paper, all the computation is assumed in exact arithmetic. Since CGLS is mathematically equivalent to LSQR, all the assertions on LSQR apply to CGLS.

This paper is organized as follows. In Section 2, we describe the LSQR algorithm, and then present our theoretical results on LSQR with a detailed analysis. In Section 3, we report numerical experiments to justify the partial regularization of LSQR for mildly ill-posed problems. We also report some definitive and general phenomena observed. Finally, we conclude the paper in Section 4.

Throughout the paper, we denote by the -dimensional Krylov subspace generated by the matrix and the vector , by the Frobenius norm of a matrix, and by the identity matrix with order clear from the context.

## 2 The regularization of LSQR

LSQR for solving (1) is based on the Lanczos bidiagonalization process, which starts with and, at step (iteration) , computes two orthonormal bases and of the Krylov subspaces and , respectively.

Define the matrices and . Then the -step Lanczos bidiagonalization can be written in the matrix form

 (5) AQk = Pk+1Bk, (6) ATPk+1 = QkBTk+αk+1qk+1eTk+1,

where denotes the -th canonical basis vector of and the quantities and denote the diagonal and subdiagonal elements of the lower bidiagonal matrix , respectively. At iteration , LSQR computes the solution with

 y(k)=argminy∈Rk∥∥b∥e1−Bky∥.

Note that . We get

 (7) x(k)=Qky(k)=∥b∥QkB†ke1=QkB†kPTk+1b.

As stated in the introduction, LSQR exhibits semi-convergence at some iteration: The iterates become better approximations to until some iteration , and the noise will dominate the after that iteration. The iteration number plays the role of the regularization parameter. However, semi-convergence does not necessarily mean that LSQR finds a best possible regularized solution as may become ill-conditioned before but does not yet contain all the needed dominant SVD components of . In this case, in order to get a best possible regularized solution, one has to use a hybrid LSQR method, as described in the introduction. The significance of (7) is that the LSQR iterates can be interpreted as the minimum-norm least squares solutions of the perturbed problems that replace in (1) by its rank approximations , whose nonzero singular values are just those of . If the singular values of approximate the large singular values of in natural order for , then LSQR must have the full regularization, and the regularized solution is best possible and is as comparably accurate as the best possible regularized solution by the TSVD method.

Hansen’s analysis [20, p. 146] shows that the LSQR iterates have the filtered SVD expansions:

 x(k)=n∑i=1f(k)iuTibσivi,

where , and are the singular values of . In our context, if we have for some , the factors , are not small, meaning that is already deteriorated and becomes a poorer regularized solution, namely, LSQR surely does not have full regularization. As a matter of fact, in terms of the best possible solution , it is easily justified that the full regularization of LSQR is equivalent to requiring that the singular values of approximate the largest singular values of in natural order for , so it is impossible to have for .

The regularizing effects of LSQR critically depend on what mainly contains and provides. Note that the eigenpairs of are the squares of singular values and right singular vectors of , and the tridiagonal matrix is the projected matrix of onto the subspace , which is obtained by applying the symmetric Lanczos tridiagonalization process to starting with [6]. We have a general claim deduced from [6, 34] and exploited widely in [20, 22]: The more information the subspace contains on the dominant right singular vectors, the more possible and accurate the Ritz values approximate the largest singular values of ; on the other hand, the less information it contains on the other right singular vectors, the less accurate a small Ritz value is if it appears. For our problem, since the small singular values of are clustered and close to zero, it is expected that a small Ritz value will show up as grows large, and it starts to appear more late when contains less information on the other right singular vectors. In this sense, we say that LSQR has better regularizing effects since contains more dominant SVD components.

Using the definition of canonical angles between the two subspaces and of the same dimension [36, p. 250], we have the following theorem, which shows how well the subspace , on which LSQR and CGLS work, captures the -dimensional dominant right singular space.

{theorem}

Let the SVD of be (2), and assume that its singular values are distinct and satisfy with . Let be the subspace spanned by the columns of , and . Then

 (8) ∥sinΘ(Vk,Vsk)∥=∥Δk∥√1+∥Δk∥2

with the matrix to be defined by (10) and

 (9) ∥Δk∥F≤σk+1σkmaxnj=k+1|uTjb|minkj=1|uTjb|√k(n−k)(1+O(e−2α)), k=1,2,…,n−1.

Proof. Let consist of the first columns of defined in (2). We see is spanned by the columns of the matrix with

 Unknown environment '%

Partition the matrices and as follows:

 D=(D100D2),   Tk=(Tk1Tk2),

where . Since is a Vandermonde matrix with distinct for , it is nonsingular. Thus, by the SVD of , we have

 Kk(ATA,ATb)=span{VDTk}=span{V([]cD1Tk1D2Tk2)}=span{V(IΔk)}

with

 (10) Δk=D2Tk2T−1k1D−11.

Define . Then and the columns of form an orthonormal basis of .

Write . By definition, we obtain

 ∥sinΘ(Vk,Vsk)∥ =∥∥∥(V⊥k)TZk(ZTkZk)−12∥∥∥ =∥∥∥(V⊥k)TV(IΔk)(I+ΔTkΔk)−12∥∥∥ =∥Δk(I+ΔTkΔk)−12∥=∥Δk∥√1+∥Δk∥2,

which proves (8) and indicates that is monotonically increasing with respect to .

We next estimate . We have

 ∥Δk∥ Missing or unrecognized delimiter for \right (11) =σk+1σkmaxnj=k+1|vTjb|minkj=1|vTjb|∥∥Tk2T−1k1∥∥F.

So we need to estimate . It is easily justified that the -th column of consists of the coefficients of the Lagrange polynomial

 L(k)i(λ)=k∏j=1,j≠iσ2j−λσ2j−σ2i

that interpolates the elements of the -th canonical basis vector at the abscissas . Consequently, the -th column of is

 Tk2T−1k1e(k)i=(L(k)i(σ2k+1),…,L(k)i(σ2n))T,

from which we obtain

 (12) Tk2T−1k1=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝L(k)1(σ2k+1)L(k)2(σ2k+1)…L(k)k(σ2k+1)L(k)1(σ2k+2)L(k)2(σ2k+2)…L(k)k(σ2k+2)⋮⋮⋮L(k)1(σ2n)L(k)2(σ2n)…L(k)k(σ2n)⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠.

Since is monotonic for , it is bounded by . Furthermore, let . Then for and we have

 |L(k)i(0)| ≤|L(k)i0(0)|=k∏j=1,j≠i0∣∣ ∣∣σ2jσ2j−σ2i0∣∣ ∣∣=i0−1∏j=1σ2jσ2j−σ2i0⋅k∏j=i0+1σ2jσ2i0−σ2j =i0−1∏j=111−O(e−2(i0−j)α)k∏j=i0+11O(e2(j−i0)α)−1 =i0−1∏j=111−O(e−2(i0−j)α)k∏j=i0+111−O(e−2(j−i0)α)1k∏j=i0+1O(e2(j−i0)α) (13) =(1+i0∑j=1O(e−2jα))(1+k−i0+1∑j=1O(e−2jα))k∏j=i0+1O(e2(j−i0)α)

by absorbing those higher order terms into . Note that in the above numerator we have

 1+i0∑j=1O(e−2jα)=1+O(i0∑j=1e−2jα)=1+O(e−2α1−e−2α(1−e−2i0α)),

and

 1+k−i0+1∑j=1O(e−2jα)=1+O(k−i0+1∑j=1e−2jα)=1+O(e−2α1−e−2α(1−e−2(k−i0+1)α)).

It is then easily seen that their product is

 1+O(2e−2α1−e−2α)+O((e−2α1−e−2α)2)=1+O(2e−2α1−e−2α)=1+O(e−2α).

On the other hand, by definition, the denominator in (13) is exactly one for , and it is strictly bigger than one for . Therefore, for any , we have . From this and (12) it follows that

 ∥∥Tk2T−1k1∥∥F≤√k∥∥Tk2T−1k1e(k)k∥∥≤√k(n−k)|L(k)i0(0)|=√k(n−k)(1+O(e−2α)).

Therefore, for and considerably, from (11) we have

 (14) ∥Δk∥F ≤σk+1σkmaxnj=k+1|uTjb|minkj=1|uTjb|√k(n−k)(1+O(e−2α)).\endproof

Remark 2.1   We point out that (9) should not be sharp. As we have seen from the proof, the factor seems intrinsic and unavoidable, but the factor in (9) is an overestimate and can certainly be reduced. (14) is an overestimate since for not near to is considerably smaller than , but we replace all them by their maximum . In fact, our derivation clearly illustrates that the smaller is, the smaller than .

Recall the discrete Picard condition (4). Then

 (15) ck=maxnj=k+1|uTjb|minkj=1|uTjb|=maxnj=k+1(|uTj^b+uTje|)minkj=1(|uTj^b+uTje|)≈σ1+βk+1+|uTk+1e|σ1+βk+|uTke|.

We observe that almost remains constant for . For , note that all the almost remain the same. Thus, we have , meaning that does not capture as well as it does for .

Remark 2.2   The theorem can be extended to moderately ill-posed problems with the singular values considerably and not big since, in a similar manner to the proof of Theorem 2, we can obtain by the first order Taylor expansion

 |L(k)i0(0)| ≈|L(k)k(0)|=k−1∏j=1σ2jσ2j−σ2k =k−1∏j=111−O((jk)2α)≈1+k−1∑j=1O((jk)2α)=O(1),

which, unlike for severely ill-posed problems, depends on and increases slowly with for considerably. However, for mildly ill-posed problems, from above we have considerably for .

Remark 2.3  A combination of (8) and (9) and the above analysis indicate that captures better for severely ill-posed problems than for moderately ill-posed problems. There are two reasons for this. The first is that the factors are basically fixed constants for severely ill-posed problems as increases, and they are smaller than the counterparts for moderately ill-posed problems unless the degree of its ill-posedness is far bigger than one and small. The second is that the factor is smaller for severely ill-posed problems than the factor for moderately ill-posed problems for the same .

Remark 2.4   The situation is fundamentally different for mildly ill-posed problems: Firstly, we always have substantially for and any , which is considerably bigger than for moderately ill-posed problems for the same . Secondly, defined by (15) is closer to one than that for moderately ill-posed problems for . Thirdly, for the same noise level and , we see from the discrete Picard condition (4) and the definition of that is bigger for a mildly ill-posed problem than that for a moderately ill-posed problem. All of them show that captures considerably better for severely and moderately ill-posed problems than for mildly ill-posed problems for . In other words, our results illustrate that contains more information on the other right singular vectors for mildly ill-posed problems, compared with severely and moderately ill-posed problems. The bigger , the more it contains. Therefore, captures more effectively for severely and moderately ill-posed problems than mildly ill-posed problems. That is, contains more information on the other right singular vectors for mildly ill-posed problems, making the appearance of a small Ritz value more possible before and LSQR has better regularizing effects for the first two kinds of problems than for the third kind. Note that LSQR, at most, has the full regularization, i.e., there is no Ritz value smaller than for , for severely and moderately ill-posed problems. Our analysis indicates that LSQR generally has only the partial regularization for mildly ill-posed problem and a hybrid LSQR should be used.

Remark 2.5   Relation (9) and indicate that captures better for severely ill-posed problems than for moderately ill-posed problems. There are two reasons for this. First, the all the are basically a fixed constant for severely ill-posed problems, which is smaller than those ratios for moderately ill-posed problems unless is rather big and small. Second, the quantities for severely ill-posed problems are smaller than the corresponding for moderately ill-posed problems.

Let us investigate more and get insight into the regularization of LSQR. Define

 (16) γk=∥∥A−Pk+1BkQTk∥∥,

which measures the quality of the rank approximation to . Based on (9), we can derive the following estimate for .

{theorem}

Assume that (1) is severely or moderately ill posed. Then

 (17) σk+1≤γk≤σk+1+σ1∥sinΘ(Vk,Vsk)∥.

Proof. Let be the best rank approximation to with respect to the 2-norm, where , and . Since is of rank , the lower bound in (17) is trivial by noting that . We now prove the upper bound. From (5), we obtain

 ∥∥A−Pk+1BkQTk∥∥ =∥∥A−AQkQTk∥∥.

It is easily known that with having orthonormal columns. Then by the definition of we obtain

 ∥∥A−AQkQTk∥∥ =∥∥(A−UkΣkVTk+UkΣkVTk)(I−QkQTk)∥∥ ≤∥∥(A−UkΣkVTk)(I−QkQTk)∥∥+∥∥UkΣkVTk(I−QkQTk)∥∥ ≤σk+1+∥Σk∥∥∥VTk(I−QkQTk)∥∥ =σk+1+σ1∥sinΘ(Vk,Vsk)∥.\endproof

Numerically, it has been extensively observed in the literature that the decay as fast as and, more precisely, for severely ill-posed problems; see, e.g., [3, 16, 17]. They mean that the are very good rank approximations to . Recall that the TSVD method generates the best regularized solution . As a result, if , the LSQR iterate is reasonably close to the TSVD solution for is reasonably small. This means that LSQR has the full regularization and does not need any additional regularization to improve . As our experiments will indicate in detail, these observed phenomena are of generality for both severely and moderately ill-posed problems and thus should have strong theoretical supports. Compared to the observations, our (17) appears to be a considerrable overestimate.

We next present some results on appearing in (6). If , the Lanczos bidiagonalization process terminates, and we have found exact singular triples of [26]. In our context, since has only simple singular values and has components in all the left singular vectors, early termination is impossible in exact arithmetic, but small is possible. We aim to investigate how fast decays. We first give a refinement of a result in [17].

{theorem}

Let be the SVD of , where and are orthogonal, and , and define and . Then

 (18) A~Vk−~UkΘk = 0, (19) ∥∥AT~Uk−~VkΘTk∥∥ =