Estimation of the Regularization Parameter in Linear Discrete Ill-Posed Problems Using the Picard Parameter†
†This research was supported by the Israel Science Foundation (grant No. 132/14) and by a Rosa and Emilio Segré Research Award.
Abstract
Accurate determination of the regularization parameter in inverse problems remains an analytical challenge, mainly because of the considerable difficulty of separating the unknown noise from the signal. We present a new approach for determining the parameter for the general-form Tikhonov regularization of linear ill-posed problems. In our approach the parameter is found by approximate minimization of the distance between the unknown noiseless data and the data reconstructed from the regularized solution. We approximate this distance by employing the Picard parameter to separate the noise from the data in the coordinate system of the generalized SVD. A simple and reliable algorithm for the estimation of the Picard parameter enables an accurate implementation of the above procedure. We demonstrate the effectiveness of our method on several numerical examples.¹
¹A MATLAB-based implementation of the proposed algorithms can be found at https://www.weizmann.ac.il/condmat/superc/software/.
Key words. ill-posed problem, inverse problem, generalized SVD, Picard parameter, Tikhonov regularization, regularization parameter
AMS subject classifications. 65R30, 65R32, 65F22
1 Introduction
The Tikhonov regularization method [34] is one of the most widely applied methods for the solution of linear ill-posed problems. It is well known that the accuracy of the solution obtained by Tikhonov regularization depends crucially on the chosen regularization parameter. This parameter is often obtained using one of the following methods: Generalized Cross-Validation (GCV) [37, 11], the L-curve [15, 21], quasi-optimality [1, 2], Stein's Unbiased Risk Estimate (SURE) [32, 30], or others. However, none of the above-mentioned methods consistently finds a near-optimal regularization parameter for all test problems and noise realizations in our numerical examples. In particular, for rank-deficient problems the above-mentioned methods tend to produce solutions that differ significantly from the true solutions.
We can state the problem formally as follows. Given an ill-conditioned matrix $A \in \mathbb{C}^{m \times n}$ and a vector $b \in \mathbb{C}^{m}$ contaminated by noise, we solve the linear system

(1.2)   $Ax = b.$
Linear discrete ill-posed problems of the form (1.2) arise in a variety of settings, including the discretization of Fredholm integral equations of the first kind [23, 12, 5, 35, 16], image deblurring problems [20, 8, 40, 4, 27], machine learning algorithms [3, 36, 31, 6, 39] and more. The method of Tikhonov regularization replaces the original ill-posed problem (1.2) with the minimization problem
(1.3)   $x_{\lambda} = \arg\min_{x} \left\{ \|Ax - b\|_2^2 + \lambda^2 \|Lx\|_2^2 \right\},$

where $\|\cdot\|_2$ is the Euclidean norm, $L$ is a regularization matrix and $\lambda > 0$ is a regularization parameter. The problem (1.3) is said to be in standard form if $L = I$, where $I$ is the identity matrix, and in general form if $L \neq I$ [22, 13, 10]. The development of an accurate and reliable method for the determination of the regularization parameter $\lambda$ is the main subject of this paper.
The present work is closely related to that of O'Leary [26] and Taroudaki and O'Leary [33], who suggest a method for near-optimal estimation of the regularization parameter for standard-form Tikhonov regularization. The essence of this method is to determine the regularization parameter by approximate minimization of the mean-square error (MSE)

(1.4)   $\mathrm{MSE}(\lambda) = \|x_{\lambda} - \bar{x}\|_2^2,$
where $\bar{x}$ is the least-squares solution of the unperturbed system $A\bar{x} = \bar{b}$, $\bar{b}$ is the unperturbed data vector and $x_{\lambda}$ is the regularized solution of (1.3). Since the vector $\bar{x}$ is not known in practice, the authors of [33] approximate the MSE (1.4) by separating signal from noise using the coordinates of the perturbed data $b$ with respect to the basis of the left singular vectors of $A$, termed the Fourier coefficients of $b$. Specifically, the authors of [33] demonstrate the existence of an index they term the 'Picard parameter', which separates the noise-dominated coefficients of $b$ from the clean ones. Using the Picard parameter, the SVD expansion of the MSE is split into two parts: one containing the information about the unperturbed data and another containing the noise. The first part is replaced with its expected value, while the second part is rewritten in terms of the known Fourier coefficients of the data. The Picard parameter is either estimated manually in [33], as the index at which the plot of the Fourier coefficients of the data levels off, or, in the case where the noise is Gaussian, by the Lilliefors test, as we explain in LABEL:sec:PictInd.
Although the method developed in [33, 26] provides accurate results in a large percentage of cases and is shown to be competitive with standard methods, it has two significant limitations. First, it does not allow the use of $L \neq I$, which is necessary in many applications in order to incorporate various desirable properties into the solution [7, 14, 22, 13]. In particular, $L$ is often chosen to be a discrete approximation of a derivative operator to control various degrees of smoothness of the solution. The second, and arguably more important, limitation is that the method is inaccurate for some noise realizations due to inaccurate estimation of the Picard parameter. This stems from the algorithm's reliance on applying statistical tests to noisy sequences, which often give inconsistent results.
In this paper we strive to overcome the above limitations. To handle the case $L \neq I$, we replace the SVD of $A$ used in [33, 26] with the generalized singular value decomposition (GSVD) of the pair $(A, L)$ [28, 18]. It is significantly more difficult, however, to minimize the MSE (1.4) in the GSVD basis [33, sect. 2]. For this reason, we replace the MSE with the predictive mean-square error (PMSE)

(1.5)   $\mathrm{PMSE}(\lambda) = \|Ax_{\lambda} - \bar{b}\|_2^2,$
where $Ax_{\lambda}$ is the data reconstructed from the regularized solution, termed the 'predicted data'. For simplicity, we assume that the unperturbed system is consistent, so that $A\bar{x} = \bar{b}$, implying that the PMSE can be written as

(1.6)   $\mathrm{PMSE}(\lambda) = \|A(x_{\lambda} - \bar{x})\|_2^2.$
The PMSE (1.5) has a simple expansion in terms of the GSVD basis, but it does not measure the error in the solution directly as the MSE does. In principle, it is therefore possible that an algorithm successfully minimizing the PMSE will produce a suboptimal solution with a high MSE in some problems. Nevertheless, the PMSE has approximately the same minimizer as the MSE in a variety of settings [25, 9], and the two minimizers were shown to coincide under certain assumptions [38]. To the best of our knowledge, however, a full characterization of the cases in which the minimizers of the MSE and PMSE are equal is unavailable. Therefore, we also provide a method for approximately minimizing the MSE as in [33], but for the more general case $L \neq I$. We then show that the expansion of the MSE in terms of the GSVD is numerically unstable and hence the accuracy of its approximation is limited. It is therefore advantageous to minimize the PMSE in cases where its minimizer is known to be close to that of the MSE, as it can be approximated better and hence leads to a better choice of $\lambda$.
To determine $\lambda$, we write the PMSE in terms of the GSVD of $(A, L)$ under very relaxed assumptions about the sizes and ranks of the matrices involved in (1.3). Next, we approximate the PMSE by estimating the Picard parameter, splitting the GSVD expansion and modifying the noise-dependent terms as in [33, 26]. The regularization parameter is found by minimization of the resulting approximation of the PMSE. We term this procedure for the determination of $\lambda$ the Series Splitting (SS) method.
As an alternative to the SS method, approximate minimization of (1.5) can be performed using a more general two-step approach. Specifically, we can obtain an approximation of the unperturbed data by applying an accurate filter, based on the Picard parameter, to the data $b$, substitute this approximation for $\bar{b}$ in (1.5), and minimize the resulting norm. Our implementation of the filter employs the Picard parameter to remove the noise-dominated components of the data in the GSVD coordinate system. The advantage of this approach is its generalizability to filters and regularization methods other than Tikhonov regularization (1.3), as it requires only a data filter and an algorithm for calculating the regularized solution given a regularization parameter. We term this method the Data Filtering (DF) approach.
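The two steps of the DF approach can be sketched as follows in the special case $L = I$, where the GSVD reduces to the ordinary SVD. This is an illustrative sketch only: the function name, the grid search over the regularization parameter, and the zero-based index `k0` are our assumptions, not part of the paper's algorithm.

```python
import numpy as np

def df_regularization_parameter(A, b, k0, lambdas):
    """Data Filtering (DF) sketch: remove the noise-dominated SVD
    coefficients (indices >= k0) to build a filtered data estimate,
    then pick the lambda whose Tikhonov prediction is closest to it."""
    U, sv, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.conj().T @ b            # Fourier coefficients of the data
    beta_clean = beta.copy()
    beta_clean[k0:] = 0.0            # filter: keep the signal-dominated part
    b_hat = U @ beta_clean           # filtered ("denoised") data estimate
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        phi = sv**2 / (sv**2 + lam**2)   # Tikhonov filter factors (L = I)
        pred = U @ (phi * beta)          # predicted data A x_lambda
        err = np.linalg.norm(pred - b_hat)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```

In practice the minimization over the continuous parameter would be done with a one-dimensional optimizer rather than a fixed grid.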
According to our numerical examples, the accuracy of the Picard parameter estimation algorithm of [33] is somewhat limited. We suggest that it can be significantly improved by a simple modification. Specifically, we propose to test the sequence of the Fourier coefficients in the order reverse to the one proposed in [33] and at a higher confidence level. The performance of this algorithm can also be improved by providing upper and lower bounds on the Picard parameter, to limit the number of required tests. However, in spite of the improvement in accuracy, the algorithm remains prone to errors due to its reliance on the noisy sequence of the Fourier coefficients of the data. To avoid the dependence on noisy sequences altogether, we go a step further and propose a new method that relies on averages of the squared moduli of the Fourier coefficients. We prove that the sequence of these averages decreases with increasing index of the Fourier coefficients, eventually levelling off at the value of the noise variance at the Picard parameter. In the resulting setting both the noise variance and the Picard parameter can be estimated reliably by detecting the levelling off of the above sequence of averages.
The SS method is closely related to the SURE and GCV methods, both of which approximately minimize (1.5). In contrast to the SS and DF methods, GCV and SURE do not split the sum in the GSVD expansion of (1.5). Moreover, we show that both of them rely on replacing the whole sum in the expansion of (1.5) with its expected value, which results in an approximation less accurate than the one achievable with the SS and DF methods. We provide a detailed comparison of the methods in a series of numerical examples.
The structure of this paper is as follows. In Section 2 we formulate the problem of Tikhonov regularization and solve it using the generalized singular value decomposition (GSVD). In Section 3, we develop the SS and DF methods for approximate minimization of the PMSE, together with the algorithms for estimation of the Picard parameter. In Section 4, we discuss the SURE and GCV methods. Finally, in Section 5 we present the results of the numerical simulations.
2 Formulation of the problem
We solve the linear ill-posed problem (1.2) by Tikhonov regularization with a general regularization matrix $L$. Throughout the paper, we make the following assumptions:
1. The problem (1.3) has a unique solution for any $\lambda > 0$. This implies $\mathcal{N}(A) \cap \mathcal{N}(L) = \{0\}$, where $\mathcal{N}(\cdot)$ denotes the null space of a matrix (see [18, sect. 5.1.1]).

2. The data vector $b$ is perturbed by an additive noise $e$, so that $b = \bar{b} + e$, and the components of $e$ are independent random variables taken from a normal distribution with zero mean and constant variance $\sigma^2$.

3. The unperturbed system is consistent, so that $A\bar{x} = \bar{b}$.

4. The generalized singular values of $(A, L)$ decay to zero with no significant gap, and the smallest generalized singular values cluster at (machine) zero. This property is common for discrete ill-posed problems; see [18, sect. 2.1.2].

5. The problem satisfies the discrete Picard condition [14].

6. The minimizers of the MSE (1.4) and the PMSE (1.5) are close to one another. This has been demonstrated in numerous numerical experiments, such as [25, 9], and can be proved analytically for certain problems, as was done in [38] and [37, Sect. 8.4]. However, to the best of our knowledge, no general characterization of the cases in which the two minimizers are close is available. Therefore, for completeness of the presentation, we provide a method for minimization of the MSE, in addition to the one minimizing the PMSE.
The Tikhonov minimization problem (1.3) is equivalent to the normal equation

(2.7)   $\left( A^{H} A + \lambda^2 L^{H} L \right) x_{\lambda} = A^{H} b,$

where $(\cdot)^{H}$ denotes the conjugate transpose, thereby yielding the Tikhonov solution as

(2.8)   $x_{\lambda} = \left( A^{H} A + \lambda^2 L^{H} L \right)^{-1} A^{H} b.$
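As a brief numerical illustration, the normal-equations solution can be computed by solving the equivalent stacked least-squares problem, which avoids forming $A^{H}A$ explicitly. The function name is ours, and this sketch assumes real-valued data for simplicity.

```python
import numpy as np

def tikhonov_solve(A, b, L, lam):
    """Solve min ||A x - b||^2 + lam^2 ||L x||^2.  The minimizer of the
    stacked least-squares problem below satisfies the normal equation
    (A^T A + lam^2 L^T L) x = A^T b."""
    # Stack [A; lam*L] against [b; 0]; least squares on this system is
    # algebraically equivalent to the regularized normal equations.
    K = np.vstack([A, lam * L])
    rhs = np.concatenate([b, np.zeros(L.shape[0])])
    x, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return x
```

Working with the stacked matrix rather than the normal equations roughly halves the condition number in exponent, which matters for the ill-conditioned matrices considered here.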
We can express the solution (2.8) in a more convenient form, using the GSVD [28] of the pair $(A, L)$. The GSVD of the matrices $A$ and $L$ is given by
(2.9) 
where

- $U$ and $V$ are unitary,

- $X$ is invertible,

- $C$ and $S$ are real diagonal matrices (note that the diagonal entries of $C$ are not the singular values of $A$),

- the remaining blocks in (2.9) are zero and identity matrices of appropriate sizes.
Note that the above zero and identity matrices can be empty. The diagonal entries $c_i$ of $C$ are arranged in decreasing order and the entries $s_i$ of $S$ in increasing order, so that
(2.10) 
The pairs $(c_i, s_i)$ satisfy the identity

(2.11)   $c_i^2 + s_i^2 = 1.$
The quantities $\gamma_i = c_i / s_i$ are the generalized singular values of the pair $(A, L)$. According to (2.10), the sequence $\{\gamma_i\}$ is arranged in decreasing order. Note, however, that the restriction of $c_i$ and $s_i$ to the interval $[0, 1]$ does not apply to $\gamma_i$.
To relate the parameters of the decomposition to the ranks of $A$ and $L$, we observe that
(2.12) 
where $C^{H}C$ and $S^{H}S$ are diagonal matrices of the following form:
(2.13) 
Since (2.12) and (2.13) constitute diagonalizations of $A^{H}A$ and $L^{H}L$, it follows that the ranks of $A$ and $L$ are equal to the numbers of nonzero diagonal elements of $C$ and $S$, respectively, yielding the relations
(2.14) 
For simplicity, we denote the columns of the matrices $U$ and $X$ by $u_i$ and $x_i$, respectively. We also drop the index from the column notation when referring to an entire column. The Fourier coefficients of the data and of the noise with respect to the basis $\{u_i\}$ are denoted by $\beta_i = u_i^{H} b$ and $\eta_i = u_i^{H} e$, respectively. Using these definitions and the decomposition (2.9), the Tikhonov solution can be written as
(2.15) 
If $L$ is nonsingular, the generalized singular values $\gamma_i$ are the ordinary singular values of $AL^{-1}$ [18, sect. 2.1.2], [15]. In particular, if $L = I$, the SVD of $A$ is given by
(2.16) 
where the factors are obtained from the GSVD (2.9). Thus, when $L = I$, our expression for the Tikhonov solution (2.15) and the Fourier coefficients $\beta_i$ and $\eta_i$ coincide with the ones given in [33, 26].
The factors $\phi_i(\lambda) = \gamma_i^2 / (\gamma_i^2 + \lambda^2)$ in (2.15) can be viewed as filters applied to the noisy data coefficients $\beta_i$, since they satisfy $\phi_i(\lambda) \approx 1$ for $\gamma_i \gg \lambda$ and $\phi_i(\lambda) \approx \gamma_i^2 / \lambda^2 \ll 1$ for $\gamma_i \ll \lambda$, and thereby dampen the coefficients $\beta_i$ for large $i$. While these coefficients correspond to the noise, the coefficients $\beta_i$ for small $i$ correspond to the true data and remain almost unchanged. In general, as discussed in [33], the factors $\phi_i(\lambda)$ can be replaced with more general filter factors with a similar dampening effect. While we focus on the Tikhonov filters in this paper, all our subsequent derivations can easily be generalized to arbitrary filter factors such as those considered in [33].
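The dampening behavior of the Tikhonov filter factors can be illustrated with a minimal numerical example; the values of the regularization parameter and of the generalized singular values below are chosen arbitrarily.

```python
import numpy as np

# Tikhonov filter factors phi(gamma) = gamma^2 / (gamma^2 + lambda^2):
# close to 1 where the (generalized) singular value dominates lambda,
# close to gamma^2 / lambda^2 (strong damping) where it does not.
lam = 1e-2
gammas = np.array([1.0, 1e-1, 1e-2, 1e-3, 1e-4])
phi = gammas**2 / (gammas**2 + lam**2)
for g, p in zip(gammas, phi):
    print(f"gamma={g:.0e}  phi={p:.3e}")
```

The transition between the two regimes occurs around $\gamma_i \approx \lambda$, where the factor equals one half; this is what makes $\lambda$ act as a cutoff between signal- and noise-dominated components.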
3 Estimation of the regularization parameter
In this section we consider the problem of choosing a near-optimal value of the regularization parameter $\lambda$ in (2.15). The existence of such a value of $\lambda$ is guaranteed by the discrete Picard condition [14], [18, sect. 4.5], [33, sect. 2.1], which requires the magnitudes of the Fourier coefficients of the unperturbed data, $\bar{\beta}_i = u_i^{H} \bar{b}$, to decay faster than the generalized singular values $\gamma_i$. Since, according to LABEL:assump3, the generalized singular values vanish numerically from some index on, the discrete Picard condition implies the existence of an index $k_0$, called the Picard parameter, such that $\bar{\beta}_i \approx 0$, or equivalently, $\beta_i \approx \eta_i$, for all $i \geq k_0$. This property of the Picard parameter is used below to approximate the PMSE.
3.1 The Series Splitting method
We assess the quality of $x_{\lambda}$ by measuring the distance (1.5) between the unperturbed data $\bar{b}$ and the predicted data

(3.17)   $b_{\lambda} = A x_{\lambda}.$
We can rewrite (1.5) as

(3.18)   $\mathrm{PMSE}(\lambda) = \rho(\lambda) + 2\, T_N(\lambda) + \|e\|_2^2,$

where

(3.19)   $\rho(\lambda) = \|A x_{\lambda} - b\|_2^2$

is the squared residual norm,

(3.20)   $T_N(\lambda) = \Re \left\langle A x_{\lambda} - b,\; e \right\rangle,$

and $\Re$ denotes the real part.

Noting that the term $\|e\|_2^2$ in (3.18) is independent of $\lambda$ and can therefore be neglected, we find that it is sufficient to minimize

(3.21)   $g(\lambda) = \rho(\lambda) + 2\, T_N(\lambda).$
A direct evaluation of $g(\lambda)$ is not possible since the function $T_N$ depends on the unknown noise vector $e$. Nonetheless, we can approximate $T_N$ accurately using the Picard parameter [33, 26]. In the GSVD basis, $T_N$ expands into a sum of terms proportional to $\Re(\beta_i \eta_i^{*})$, where $\eta_i^{*}$ denotes the complex conjugate of $\eta_i$. Recalling that the Picard parameter $k_0$ is the smallest index for which $\beta_i \approx \eta_i$ is satisfied for all $i \geq k_0$, we can split the sequence $\{\beta_i\}$ into two parts: the first, $\{\beta_i\}_{i < k_0}$, contains the information about the unperturbed data, while the second, $\{\beta_i\}_{i \geq k_0}$, contains the noise. Therefore, when $i \geq k_0$ we can approximate the unknown terms $\Re(\beta_i \eta_i^{*})$ in (3.20) by $|\beta_i|^2$. When $i < k_0$, however, the coefficients $\beta_i$ and $\eta_i$ differ significantly, so we choose to approximate the corresponding terms in (3.20) by replacing them with their expected values. Denoting the expected value by $\mathbb{E}[\cdot]$, and noting that $\beta_i = \bar{\beta}_i + \eta_i$, where $\bar{\beta}_i$ is not random, we can deduce using LABEL:assump2 that

(3.22)   $\mathbb{E}\big[ \Re(\beta_i \eta_i^{*}) \big] = \Re\big( \bar{\beta}_i\, \mathbb{E}[\eta_i^{*}] \big) + \mathbb{E}\big[ |\eta_i|^2 \big] = \sigma^2;$

see [20, sect. 6.6], [33, 26]. Therefore, for given $k_0$ and $\sigma^2$ we can approximate $T_N$ by splitting the series for (3.20) similarly to [33] to obtain

(3.23)
The regularization parameter is then found by minimizing (3.21) using the approximation of $T_N$ given by (3.23).
We can show that $k_0$ is confined to a finite interval of indices, and therefore it is always possible to split the series as in (3.23). To justify the lower bound, we note that the null space of $L$ is spanned by the columns $x_i$ of $X$ for which $s_i = 0$, as they constitute a set of linearly independent vectors satisfying $L x_i = 0$. Since these vectors are smooth (by LABEL:assumpNL) and $A$ has a typical smoothing effect [18, p. 21], the corresponding vectors $A x_i$ are also smooth. Therefore, the smooth vector $\bar{b}$ is well represented by the basis vectors with small $i$, while the nonsmooth noise vector $e$ is represented mostly by those with large $i$. Thus, we have $\bar{\beta}_i \neq 0$ for the first indices, implying that $\beta_i \not\approx \eta_i$ there, which yields the lower bound on $k_0$. To justify the upper bound, we note that the last numerically nonzero generalized singular value is, by LABEL:assump3, of the order of the machine zero. Thus, by the discrete Picard condition, the coefficients $\bar{\beta}_i$ beyond the corresponding index are negligible, and the upper bound on $k_0$ follows. If $L = I$ as in [33, 26], the lower bound is trivial and only the upper bound is of consequence. The SS algorithm is summarized in LABEL:alg:SSAlg.
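In the special case $L = I$, where the GSVD reduces to the SVD, the SS procedure above can be sketched as follows. The sketch mirrors the derivation only schematically: the function name, the grid search over the regularization parameter, and the zero-based Picard index `k0` are our assumptions.

```python
import numpy as np

def ss_regularization_parameter(sv, beta, k0, sigma2, lambdas):
    """Series Splitting (SS) sketch in SVD coordinates (L = I).
    Minimizes g(lambda) = rho(lambda) + 2*T_N(lambda), where T_N is
    approximated by splitting its expansion at the Picard parameter:
    the noise-dominated terms (i >= k0) use the data coefficients,
    while the clean terms (i < k0) are replaced by the expected value
    sigma2 of each term."""
    abs_beta2 = np.abs(beta)**2
    best_lam, best_g = None, np.inf
    for lam in lambdas:
        phi = sv**2 / (sv**2 + lam**2)          # Tikhonov filter factors
        rho = np.sum((phi - 1.0)**2 * abs_beta2)    # squared residual norm
        tn = (sigma2 * np.sum(phi[:k0] - 1.0)        # expected-value part
              + np.sum((phi[k0:] - 1.0) * abs_beta2[k0:]))  # noisy part
        g = rho + 2.0 * tn
        if g < best_g:
            best_lam, best_g = lam, g
    return best_lam
```

Components of the residual lying outside the range of the expansion are independent of the parameter and are omitted from the sketch, since they do not affect the location of the minimizer.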
3.2 Approximate minimization of the MSE
As discussed above, our approach approximately minimizes the PMSE (1.5), in contrast to the approach of [33, 26], which minimizes the MSE (1.4) assuming $L = I$. To illustrate the difference between the two approaches, and to give an alternative method of solution for problems whose PMSE and MSE minimizers may not coincide, we repeat the above derivation for the MSE and approximate it in the general case $L \neq I$.
We begin with the observation that $\bar{x}$, the least-squares solution to $A\bar{x} = \bar{b}$, can be expressed in the GSVD basis as
(3.24) 
We can rewrite (3.24) as
(3.25) 
where the first term is the least-squares solution of the perturbed problem (1.2), obtained by substituting $\lambda = 0$ in (2.15), and
(3.26) 
is the least-squares solution of the pure noise problem $Ax = e$. The MSE can then be expanded as
(3.27) 
where
(3.28) 
and
(3.29)  
The first term in (3.27) can be readily evaluated, while the second term can be dropped entirely, as it does not depend on $\lambda$. The third term, $D_N$, cannot be evaluated, since it depends on the coefficients $\eta_i$ of the unknown noise vector $e$, as shown in (3.29), and must therefore be approximated.
To approximate $D_N$ we first rewrite the sums in (3.29) as
(3.30)  
When $i \geq k_0$, we can approximate the coefficients of the noise, $\eta_i$, by the coefficients of the data, $\beta_i$, so that $\eta_i \approx \beta_i$. However, when $i < k_0$ we approximate the terms involving $\eta_i$ in (3.30) by replacing them with their expected values, as we do for $T_N$ in (3.23). The main difference from (3.23) is that, in addition to the terms with $i = j$ for which (3.22) applies, (3.30) contains cross terms with $i \neq j$. However, these terms can be neglected, since LABEL:assump2 implies

(3.31)   $\mathbb{E}\big[ \eta_i \eta_j^{*} \big] = 0, \qquad i \neq j.$
Consequently, we can drop the first and third sums in (3.30), approximate $D_N$ as
(3.32) 
and estimate the minimum of the MSE in (3.27) by minimizing
(3.33) 
The problem with this approach is that (3.33) and (3.32) are numerically unstable due to the division by the entries $c_i$. Specifically, due to the ill-conditioning of $A$, the values $c_i$ decay quickly to zero,² so that the terms that include division by $c_i$ for large $i$ completely dominate the values of (3.33) and (3.32) and, due to finite machine precision, eliminate the contribution of the terms with small $i$. This contrasts with the fact that, for a desirable choice of $\lambda$, the solution should be smooth and its error should therefore depend significantly on the terms with small $i$. To circumvent this problem, we drop the terms with numerically negligible $c_i$ in the sums (3.33) and (3.32), similarly to [33], so that

²More precisely, LABEL:assump4 requires only that the generalized singular values $\gamma_i$ decay to (machine) zero, but since $\gamma_i = c_i / s_i$ and $c_i^2 + s_i^2 = 1$ by (2.11), the values $c_i$ decay to zero as well.
(3.34) 
(3.35) 
and minimize the resulting approximation, thereby retaining only the terms in which $c_i$ is relatively large. Note that if $L = I$, the above method of minimizing the approximate MSE becomes identical to Algorithm 1 in [33] (implemented with the Tikhonov filter factors).
In spite of the numerical stability gained by dropping the terms with small $c_i$, we lose information that might have improved the accuracy of the approximation of the MSE had there been no instability. In contrast, no such division by $c_i$ is necessary for the minimization of (1.5), in which case we can retain all the terms upon the series splitting in (3.23). Thus, we expect our approximation (3.21) of the norm (1.5) to yield a better estimate of $\lambda$ than the approximation minimizing the truncated MSE, in cases where there is no discrepancy between the minimizers of the MSE and the PMSE. This situation is quite common, as it is shown in [37, Sect. 8.4] and [38] that the PMSE (1.5) and the MSE (1.4) have approximately the same minimizer in a variety of settings, and numerical results supporting this are available, e.g., in [25, 9]. We therefore focus below on minimizing the PMSE rather than the MSE, and demonstrate in the numerical examples of Section 5 that this approach yields superior results.
3.3 Estimating the Picard parameter and the variance of the noise
We begin with a brief discussion of the methods for estimation of the Picard parameter $k_0$ suggested in [33]. The Picard parameter can be deduced graphically from the plot of the sequence $\{|\beta_i|\}$ versus $i$. Specifically, due to the discrete Picard condition, the plot of $|\beta_i|$ is expected to decay on average with increasing index and to level off at the Picard parameter. This levelling off can be found manually from the plot; see [33, sect. 2.2]. The drawback of this approach is that, due to the significant variance of the plot, the point at which it levels off cannot be determined unambiguously. In order to reduce this ambiguity, as well as to automate the method, it is suggested in [33, sect. 2.3] to apply the Lilliefors test for normality to subsequences of the Fourier coefficients. Specifically, this method sets $k_0$ to the smallest index starting from which the sequence of coefficients is dominated by Gaussian noise. By applying the Lilliefors test at the 95% confidence level to the subsequences $\{\beta_i\}_{i \geq k}$ for increasing $k$, the Picard parameter is chosen as the smallest index after which the test fails 10 consecutive times. If the test fails immediately, it is proposed to set the Picard parameter past the last coefficient and the noise variance to zero, which signifies that the data is noiseless. Alternative tests that assume distributions different from the Gaussian can be utilized in a similar way [33]. Once $k_0$ is found, the variance can be estimated as the sample variance of the tail sequence $\{\beta_i\}_{i \geq k_0}$ using the expression
(3.36)   $\hat{\sigma}^2 = \frac{1}{m - k_0 + 1} \sum_{i=k_0}^{m} |\beta_i|^2,$
where we note that, according to LABEL:assump2, the noise terms $\eta_i$ have zero mean, and therefore the mean of the sequence $\{\beta_i\}_{i \geq k_0}$ is negligible.
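This variance estimate reduces to a mean of squared moduli over the noise-dominated tail. A minimal sketch, assuming zero-based indexing and a coefficient vector already computed (the function name is ours):

```python
import numpy as np

def noise_variance_estimate(beta, k0):
    """Estimate the noise variance as the mean of |beta_i|^2 over the
    noise-dominated tail i >= k0.  No mean subtraction is needed,
    because the noise coefficients have zero mean."""
    tail = np.abs(beta[k0:])**2
    return tail.mean()
```

The accuracy of the estimate improves with the length of the tail, since the variance of the sample mean shrinks in proportion to the number of averaged terms.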
One can improve the accuracy of the method described above by initializing the estimate of $k_0$ at its lower bound and applying the Lilliefors test to the sequences $\{\beta_i\}_{i \geq k}$ at a 99.9% confidence level. The value of $k_0$ is set to the smallest index $k$ for which the Lilliefors test indicates that the sequence is normally distributed. If the test fails for all admissible $k$, we set the Picard parameter past the last coefficient and the estimated variance to zero. Once $k_0$ is estimated, we can find the variance using (3.36). This modified algorithm is summarized in LABEL:alg:PicIndLilAlg.
The dependence of the estimation of the Picard parameter on statistical tests can be avoided using the following new method. The method is based on an averaging of the Fourier coefficients, which reduces the variance of the examined sequence (see LABEL:fig:VxVSWx(a)), enabling a more reliable automatic detection of the levelling off of these coefficients. We note first that the sequence $\{|\beta_i|^2\}$ decreases on average until it levels off at $k_0$ and then oscillates about $\sigma^2$ with a non-negligible variance. To show this, we observe that
(3.37)   $\mathbb{E}\big[ |\beta_i|^2 \big] = |\bar{\beta}_i|^2 + \sigma^2.$
Therefore, due to the discrete Picard condition, the expected value $\mathbb{E}[|\beta_i|^2]$ must decrease on average with increasing $i$ and become constant, equal to $\sigma^2$, for $i \geq k_0$. While the actual curve of $|\beta_i|^2$ deviates from its expected value, these deviations are random, and thus the curve oscillates about its expected value. This implies that the general trend of $\mathbb{E}[|\beta_i|^2]$ to decrease on average and to level off also applies to $|\beta_i|^2$, which, in its turn, justifies the graphical method of [33]. However, instead of flattening at $\sigma^2$ for $i \geq k_0$, the curve of $|\beta_i|^2$ oscillates about $\sigma^2$ with a non-negligible variance given by
(3.38)   $\mathrm{Var}\big( |\beta_i|^2 \big) = \mu_4 - \sigma^4, \qquad i \geq k_0,$
where $\mu_4$ is the fourth moment of the noise distribution. For the derivation of (3.38) we again use $\beta_i \approx \eta_i$ for $i \geq k_0$. In particular, for Gaussian noise we have $\mu_4 = 3\sigma^4$ and therefore $\mathrm{Var}(|\beta_i|^2) = 2\sigma^4$. Thus, due to the significant variance of the sequence $\{|\beta_i|^2\}$, any estimation of $k_0$ from the levelling off of this sequence is prone to error. Therefore, instead of $|\beta_k|^2$, we consider the sequence of averages given by
(3.39)   $V_k = \frac{1}{m - k + 1} \sum_{i=k}^{m} |\beta_i|^2.$
Since the sequence $\{|\beta_i|^2\}$ decays on average, so does the sequence $\{V_k\}$. To demonstrate this, we note that
(3.40)   $\mathbb{E}[V_k] = \frac{1}{m - k + 1} \sum_{i=k}^{m} \left( |\bar{\beta}_i|^2 + \sigma^2 \right),$
and since $\mathbb{E}[|\beta_i|^2]$ decays on average, so does $\mathbb{E}[V_k]$, as follows from the inequality
(3.41) 
In addition, (3.40) implies that $\mathbb{E}[V_k] = \sigma^2$ for $k \geq k_0$, and therefore the sequence $\{V_k\}$ levels off at $k_0$ and oscillates about $\sigma^2$, similarly to $\{|\beta_k|^2\}$. However, the variance of $V_k$ for $k \geq k_0$ is significantly smaller than that of $|\beta_k|^2$, making its curve significantly flatter and more suitable for the estimation of $k_0$ and $\sigma^2$. Specifically, the variance of $V_k$ for $k \geq k_0$ is given by
(3.42)   $\mathrm{Var}(V_k) = \frac{\mu_4 - \sigma^4}{m - k + 1},$
and for Gaussian noise it is
(3.43)   $\mathrm{Var}(V_k) = \frac{2\sigma^4}{m - k + 1}.$
Since the variance (3.42) decays as $(m - k + 1)^{-1}$, the curve of $V_k$ remains practically flat for a wide range of indices $k \geq k_0$.
In LABEL:fig:VxVSWx we illustrate the difference between the sequences $\{V_k\}$ and $\{|\beta_k|^2\}$ for the estimation of the Picard parameter. Even though both sequences decrease on average until they level off and oscillate about $\sigma^2$, the plot of $V_k$ shown in LABEL:fig:VxVSWx(a) remains almost constant for $k \geq k_0$, whereas $|\beta_k|^2$, shown in LABEL:fig:VxVSWx(b), oscillates with a significant variance. Due to these oscillations, the exact point at which the plot of $|\beta_k|^2$ levels off cannot be determined unambiguously. In contrast, the curve in LABEL:fig:VxVSWx(a) flattens almost completely, and therefore the point at which it levels off is easily identified. Note that the nonzero variance of $V_k$ becomes significant only near the end of the sequence, where $m - k + 1$ is small.
To estimate $k_0$ from the flatness of $\{V_k\}$ we suggest the following simple rule. We set $k_0$ to the smallest index $k$ for which the relative change in $V_k$ is small enough, so that
(3.44)   $\frac{V_k - V_{k + \Delta}}{V_k} \leq \varepsilon$
for some step $\Delta$ and bound $\varepsilon$, where we require $k + \Delta \leq m$ in (3.44). If $V_k$ does not satisfy (3.44) for any admissible $k$, we set the Picard parameter past the last coefficient and the estimated variance to zero. The upper bound on the admissible indices arises both from the bound on $k_0$ discussed in LABEL:sec:minPMSE and from the requirement $k + \Delta \leq m$ in (3.44). Once $k_0$ is found using (3.44), we estimate the variance as $\hat{\sigma}^2 = V_{k_0}$, which follows from (3.36).
Condition (3.44) depends on two free parameters, $\Delta$ and $\varepsilon$. The step size $\Delta$ must be large enough to ensure that the flattening of $V_k$ is not due to random oscillations. However, it also needs to be small enough so as not to exclude the flat part of $\{V_k\}$ from consideration. Similarly, the value of $\varepsilon$ has to be small enough to detect the levelling off of $V_k$, but large enough to account for its small but nonzero variance. In practice, however, we found the criterion (3.44) to be robust to the choice of $\Delta$ and $\varepsilon$. We summarize the above procedure for the estimation of $k_0$ and $\sigma^2$ in LABEL:alg:PicIndAlg. Note that the estimate of $k_0$ from LABEL:alg:PicIndAlg is used as input to LABEL:alg:SSAlg of LABEL:sec:SS.
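The levelling-off criterion above can be sketched in code as follows, assuming zero-based indexing; the function name and the default values of the step and the flatness bound are illustrative only, not the paper's recommended settings.

```python
import numpy as np

def picard_parameter(beta, delta=5, eps=0.05):
    """Estimate the Picard parameter from the averaged tail means
    V_k = mean(|beta_i|^2, i >= k): return the smallest index k at
    which the relative change of V_k over a step of length delta is
    below eps, together with the variance estimate V_k.  If no such
    index exists, the data is treated as noiseless."""
    w = np.abs(beta)**2
    n = len(w)
    # V[k] = average of w[k:], computed with a reversed cumulative sum
    V = np.cumsum(w[::-1])[::-1] / np.arange(n, 0, -1)
    for k in range(n - delta):
        if V[k] > 0 and (V[k] - V[k + delta]) / V[k] <= eps:
            return k, V[k]
    return n, 0.0   # no levelling off detected: noiseless data
```

Because the flat tail of the averaged sequence is nearly constant, the returned index is insensitive to moderate changes of `delta` and `eps`, in line with the robustness observed above.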