On Regularizing Effects of MINRES and MR-II for Large-Scale Symmetric Discrete Ill-Posed Problems
For large scale symmetric discrete ill-posed problems, MINRES and MR-II are often used iterative regularization solvers. We call a regularized solution best possible if it is at least as accurate as the best regularized solution obtained by the truncated singular value decomposition (TSVD) method. In this paper, we analyze their regularizing effects and establish the following results: (i) the filtered SVD expression are derived for the regularized solutions by MINRES; (ii) a hybrid MINRES that uses explicit regularization within projected problems is needed to compute a best possible regularized solution to a given ill-posed problem; (iii) the th iterate by MINRES is more accurate than the th iterate by MR-II until the semi-convergence of MINRES, but MR-II has globally better regularizing effects than MINRES; (iv) bounds are obtained for the 2-norm distance between an underlying -dimensional Krylov subspace and the -dimensional dominant eigenspace. They show that MR-II has better regularizing effects for severely and moderately ill-posed problems than for mildly ill-posed problems, and a hybrid MR-II is needed to get a best possible regularized solution for mildly ill-posed problems; (v) bounds are derived for the entries generated by the symmetric Lanczos process that MR-II is based on, showing how fast they decay. Numerical experiments confirm our assertions. Stronger than our theory, the regularizing effects of MR-II are experimentally shown to be good enough to obtain best possible regularized solutions for severely and moderately ill-posed problems.
Keywords: Symmetric ill-posed problem, regularization, partial regularization, full regularization, semi-convergence, MR-II, MINRES, LSQR, hybrid.
Mathematics Subject Classifications (2010): 65F22, 65J20, 15A18.
Consider the large scale discrete linear ill-posed problem
where is symmetric and extremely ill conditioned with its singular values decaying gradually to zero without a noticeable gap. This kind of problem arises in many science and engineering areas . The right-hand side is noisy and typically affected by a white noise, caused by measurement, modeling or discretization errors, i.e.,
where represents a white noise vector and denotes the noise-free right-hand side, and it is supposed that . Because of the presence of noise in and the high ill-conditioning of , the naive solution of (1.1) is far from the true solution and meaningless. Therefore, one needs to use regularization to determine a regularized solution so that it is close to as much as possible [8, 10].
For symmetric, its SVD is closely related to its spectral decomposition as follows:
where and are orthogonal, whose columns are the left and right singular vectors of , respectively, the diagonal matrix with the singular values labeled as , is a signature matrix such that with the the eigenvalues of , and . Obviously, with the sign depending on . With (1.2), we can express the naive solution of (1.1) as
Throughout the paper, we assume that satisfies the discrete Picard condition [8, 10]: On average, the coefficients decay faster than the singular values . This is a necessary hypothesis that controls the size of and makes regularization possible to find meaningful approximations to [8, 10]. For the sake of simplicity, we assume that they satisfy a widely used model [8, 10]:
where with and the first columns of and , respectively, , and the Moore-Penrose generalized inverse of a matrix. Obviously, is the minimum 2-norm least squares solution of the perturbed problem that replaces in (1.1) by its best rank approximation .
Let denote the transition point such that and , which divides the eigenvalues or equivalently the singular values into the dominant or large ones for and the small ones for . It is known from [8, p. 176] and [10, p. 86-88] that the best TSVD regularized solution is , which consists of the dominant SVD components of , i.e., the dominant spectral components corresponding to the first large eigenvalues in magnitude when is symmetric. A number of approaches have been proposed for determining , such as discrepancy principle, the discrete L-curve criterion and the generalized cross validation (GCV); see, e.g., [1, 4, 8, 15, 21] for comparisons of the classical and new ones. In our numerical experiments, we do this using the L-curve criterion in the TSVD method and Krylov iterative methods.
The TSVD method is important in its own right and plays a central role in analyzing the standard-form Tikhonov regularization [8, 10]. It and the standard-form Tikhonov regularization expand their solutions in the basis vectors and produce very similar solutions with essentially the minimum 2-norm error; see [8, p. 109-111] and [10, Sections 4.2 and 4.4]. Therefore, the TSVD method can get a best regularized solution to (1.1), and it has long been used as a general-purpose reliable and efficient numerical method for solving a small and/or moderate sized (1.1) [8, 10]. As a result, we will take the TSVD solution as a standard reference when assessing the regularizing effects of iterative solvers and accuracy of iterates under consideration in this paper.
For (1.1) is large, it is generally impractical to compute the spectral decomposition of . In this case, one typically solves it iteratively via some Krylov subspace methods [8, 10]. For (1.1) with a general matrix , the mathematically equivalent LSQR  and CGLS  have been most commonly used for years and have been shown to have intrinsic regularizing effects [6, 8, 10]. They exhibit the semi-convergence; see , [8, p. 135], [10, p. 110]: The iterates tend to be better and better approximations to the exact solution and their norms increase slowly and the residual norms decrease. In later stages, however, the noise starts to deteriorate the iterates, so that they will start to diverge from and instead converge to , while their norms increase considerably and the residual norms stabilize. For LSQR, the semi-convergence is due to the fact that the projected problem at some iteration starts to inherit the ill-conditioning of (1.1), that is, the noise progressively enters the solution subspace, so that a small singular value of the projected problem appears and the regularized solution is deteriorated [8, 10].
As far as an iterative solver for solving (1.1) is concerned, a central problem is whether or not it has already obtained a best possible regularized solution at semi-convergence. Here, as defined in the abstract, a best possible regularized solution means that it is at least as accurate as the best TSVD solultion . This problem has been intensively studied but has had no definitive solutions. For Krylov iterative solvers, their regularizing effects critically rely on how well the underlying -dimensional Krylov subspace captures the dominant right singular vectors of [8, 10]. The richer information the Krylov subspace contains on the dominant right singular vectors, the less possible it is that the resulting projected problem has a small singular value. That is, the solvers capture the large SVD components of more effectively, and thus have better regularizing effects.
To precisely measure the regularizing effects, we introduce the term of full or partial regularization. If a pure iterative solver itself computes a best possible regularized solution at semi-convergence, it is said to have the full regularization; in this case, no additional regularization is necessary. Otherwise, it is said to have the partial regularization; in this case, a sophisticated hybrid variant is needed that combines the solver with some additional regularization in order to improve the accuracy of the regularized solution by the iterative solver at semi-convergence [8, 10]. It appears that the regularizing effects are closely related to the degree of ill-posedness of the problem. To this end, we introduce the following definition of the degree of ill-posedness, which follows Hofmann’s book  and has been commonly used in the literature, e.g., [8, 10]: If there exists a positive real number such that the singular values satisfy , the problem is termed as mildly or moderately ill-posed if or ; if with considerably, , the problem is termed severely ill-posed. Clearly, the singular values of a severely ill-posed problem decay exponentially at the same rate , while those of a moderately or mildly ill-posed problem decay more and more slowly at the decreasing rate approaching one with increasing , which, for the same , is smaller for the moderately ill-posed problem than it for the mildly ill-posed problem.
For symmetric, its -dimensional dominant eigenspace is identical to the -dimensional dominant left and right singular subspaces. In this case, MINRES and its variant MR-II are natural alternatives to LSQR and CGLS [5, 7, 10]. MR-II was originally designed for solving singular and inconsistent linear systems, and it uses the starting vector and restricts the resulting Krylov subspace to the range of . Thus, the iterates are orthogonal to the null space of , and MR-II computes the minimum 2-norm least squares solution [3, 5]. For (1.1), we are not interested in such solution but a regularized solution that is close to as much as possible. MINRES and MR-II have been shown to have regularizing effects and exhibit semi-convergence [11, 14, 16], and MR-II usually provides better regularized solutions than MINRES. Intuitively, this is because the noise in the initial Krylov vector is filtered by multiplication with [6, 14]. Different implementations associated with MR-II have been studied [5, 17].
In this paper, we first prove that the MINRES iterates are filtered SVD solutions, similar to the form of the LSQR iterates. Based on this result, we show why MINRES, in general, has only the partial regularization, independent of the degree of ill-posedness of (1.1). As a result, a hybrid MINRES that combines MINRES with a regularization method applied to the lower dimensional projected problems should be used to compute a best possible regularized solution; see [10, Section 6.4] for details. Afterwards, we take a closer look at the regularization of MINRES and MR-II in more detail, which, from a new perspective, shows that MINRES has only the partial regularization. We prove that, though MR-II has globally better regularizing effects than MINRES, the th MINRES iterate is always more accurate than the th MR-II iterate until the semi-convergence of MINRES. In a manner different from those used in [14, 16], we then analyze the regularizing effects of MR-II and draw some definitive conclusions. We establish bounds for the 2-norm distance between the underlying -dimensional Krylov subspace and the -dimensional dominant eigenspace. The bounds indicate that the Krylov subspace better captures the -dimensional dominant eigenspace for severely and moderately ill-posed problems than for mildly ill-posed problems. As a consequence, MR-II has better regularizing effects for the first two kinds of problems than for the third kind, for which MR-II has only the partial regularization. We then use the results to derive an estimate for the accuracy of the rank approximation generated by MR-II to . Finally, we derive estimates for the entries generated by the symmetric Lanczos process that MR-II is based on, and show how fast they decay.
The paper is organized as follows. In Section 2, we describe MINRES and MR-II. In Section 3, we prove that the MINRES iterates are filtered SVD solutions, followed by an analysis on the regularizing effects of MINRES. In Section 4, we compare the regularizing effects of MINRES and MR-II, and shed light on some new features of them. In Section 5, we present our theoretical results on MR-II with a detailed analysis. In Section 6, we numerically confirm our theory that MINRES has only the partial regularization for a general ill-posed problem and its hybrid variant is needed. Also, we experimentally illustrate that MR-II has the full regularization for severely and moderately ill-posed problems, which is stronger than our theory, and it has the partial regularization for mildly ill-posed problems. We also compare MR-II with LSQR, demonstrating that MR-II is as effective as and at least twice as efficient as LSQR. We conclude the paper in Section 7.
Throughout the paper, we denote by the -dimensional Krylov subspace generated by the matrix and the vector , by and the 2-norm of a matrix or vector and the Frobenius norm of a matrix, respectively, and by the identity matrix with order clear from the context.
2 MINRES and MR-II
MINRES  is based on the symmetric Lanczos process that constructs an orthonormal basis of the Krylov subspace . Let . The -step symmetric Lanczos process can be written in the matrix form
where has orthonormal columns which form , and is a tridiagonal matrix with its leading submatrix symmetric.
At iteration , MINRES solves for the iterate with
where is the first canonical vector of dimension . For our analysis purpose, it is important to write
which is the minimum 2-norm least squares solution of the perturbed problem that replace in (1.1) by its rank approximation .
MR-II  is a variant of MINRES applied to which excludes the noisy . The method is based on the -step symmetric Lanczos process
where has orthonormal columns with , is a tridiagonal matrix with the diagonals , the subdiagonals and the superdiagonals , and the first rows of is symmetric. The columns of form an orthonormal basis of . Mathematically, since the eigenvalues of are simple and has nonzero components in the directions of all the eigenvectors of , the Lanczos process can be run to steps without breakdown, i.e., and .
At iteration , MR-II solves for the iterate with
Similar to (2.2), we have the expression
which is the minimum 2-norm least squares solution of the perturbed problem that replace in (1.1) by its rank approximation .
The significance of (2.2) and (2.5) lies that the MINRES and MR-II iterates are the minimum 2-norm least squares solutions of the perturbed problems that replace in (1.1) by its rank approximations and , respectively, whose nonzero singular values are just those of and , respectively. If the singular values of or approximate the large singular values of in natural order for , then and are near best rank approximations to with accuracy similar to that of the best rank approximation . If this is the case, MINRES and MR-II must have the full regularization, and and are best possible regularized solutions and are as accurate as the best TSVD regularized solution .
3 The regularizing effects of MINRES
Similar to the CGLS and LSQR iterates [8, p. 146], we can establish the following result on the MINRES iterates.
For MINRES to solve (1.1) with the starting vector , the th iterate has the form
where the filters with , and , are the harmonic Ritz values of with respect to and labeled as .
Proof. From , the residual of the MINRES iterate can be written as
where the residual polynomial has the form
with the the harmonic Ritz values of with respect to . From (3.2), we get
Substituting into the above gives
Relation (3.1) shows that the MINRES iterate has a filtered SVD expansion. For a general symmetric , the harmonic Ritz values have an attractive feature: they usually favor extreme eigenvalues of , provided that a Krylov subspace contains substantial information on all the eigenvectors . In our current context, if at least a small harmonic Ritz value in magnitude starts to appear for some , i.e., , the corresponding filter factors , , are not small, meaning that is already deteriorated. On the other hand, if no small harmonic Ritz value in magnitude appears before , the are expected to become better approximations to until . Unfortunately, since includes the noisy , which contains non-negligible components of corresponding to small eigenvalues , it is generally possible that a small harmonic Ritz value can appear for . This demonstrates that, in general, MINRES only has the partial regularization and cannot obtain a best possible regularized solution.
4 Regularization relationships between MINRES and MR-II
It was known a long time ago that MR-II has better regularizing effects than MINRES, that is, MR-II obtains a more accurate regularized solution than MINRES does . Such phenomenon is simply due to the fact that for MINRES includes the noisy and for MR-II contains less information on corresponding to small eigenvalues in magnitude since the noise in the starting vector is filtered by multiplication with . Previously, we have given an analysis on the regularizing effects of MINRES and shown that a hybrid MINRES is generally needed for an ill-posed problem, independent of the degree of ill-posedness of (1.1). Next we shed more light on the regularization of MINRES, compare it with MR-II, and reveal some new features of them.
To simplify our discussions, without loss of generality, we can well assume that for a standard nonsingular linear system, the smaller residual, the more accurate the approximate solution is. Given the residual minimization property of MINRES and MR-II, one might be confused that, since , the th MINRES iterate should be at least as accurate as the th MR-II iterate . This is true for solving the standard linear system where the right-hand side is supposed to be exact, but it is nontrivial and depends for solving an ill-posed problem, for which the is noisy and we are concerned with regularized approximations to the true solution other than the naive solution . Our previous analysis has shown that a small harmonic Ritz value generally appears for MINRES before some iteration , causing that MINRES has only the partial regularization. On the other hand, however, note that the regularized solutions by MINRES converge to until the semi-convergence of MINRES. As a result, because , the th MINRES iterate is more accurate than the th MR-II iterate only until the semi-convergence of MINRES.
We can also explain the partial regularization of MINRES in terms of the rank approximation to as follows: Since the -dimensional dominant eigenspace of is identical to its -dimensional dominant left and right singular subspaces, contains substantial information on all the . As a result, it is generally possible that the projected matrix has a singular value smaller than for some . This means that is a poor rank approximation to , causing, from (2.1), that is generally large, i.e., is already deteriorated. Conversely, if no singular value of is smaller than and the semi-convergence of MINRES does not yet occur, the MINRES iterate should be at least as accurate as the MR-II iterate because of .
In summary, we need to use a hybrid MINRES with the TSVD method or the standard-form Tikhonov regularization applied to the projected problem in (2.1) to expand the Krylov subspace until it contains all the dominant spectral components and a best regularized solution is found, in which the additional regularization aims to remove the effects of small singular values of , similar to the hybrid LSQR see [10, Section 6.4].
5 Regularizing effects of MR-II
Before proceeding, we point out that, unlike (3.1) for the MINRES iterates , we have found that the MR-II iterates do not have filtered SVD expansions of similar form. Even so, we can establish a number of other results that help to better understand the regularization of MR-II. We first investigate a fundamental problem: how well does the underlying subspace capture the dimensional dominant eigenspace of ? This problem is of basic importance because it critically affects the accuracy of as a rank approximation to .
In terms of the definition of canonical angles between the two subspaces and of the same dimension [22, p. 250], we present the following result.
Proof. Note that is spanned by the columns of the matrix with
Partition and as follows:
where . Since is a Vandermonde matrix with distinct for , it is nonsingular. Noting , we have
Define . Then , and the columns of form an orthonormal basis of .
Write . By definition, we obtain
which proves (5.1).
We next estimate and establish upper bound for the right-hand side of (5.1). We have
We now estimate . It is easily justified that the th column of consists of the coefficients of the Lagrange polynomial
that interpolates the elements of the th canonical basis vector at the abscissas . Consequently, the th column of is
from which we obtain
For a fixed satisfying , let . Then we have
Therefore, for and , making use of Taylor series expansions, we get
by absorbing those higher order terms into . Note that in the above numerator we have
It is easy to check that for any the product of the above three terms is no more than
By definition, the factor in the denominator of (5.7), which is exactly one when , and it is bigger than one when ; the other factor is between and . Therefore, for any and , we have
From this estimate and (5.5) it follows that
As a result, for , from (5.4) we have
Remark 5.1 Trivially, we have
But in our context it is impossible to have since is not a zero matrix. We have seen from the proof that the factor in it is intrinsic and unavoidable in (5.2). But the factor in (5.2) is an overestimate and can certainly be reduced. The reason is that (5.10) is an overestimate since for not near to is considerably smaller than , but we replace all them by the maximum . In fact, our derivation before (5.8) and (5.9) when replacing by clearly illustrates that the smaller is, the smaller than , .
Recall the discrete Picard condition (1.4). Then the coefficients