Gradient-based methods for sparse recovery*

*October 25, 2009. This material is based upon work supported by the National Science Foundation under Grant 0619080.
The convergence rate is analyzed for the SpaRSA algorithm (Sparse Reconstruction by Separable Approximation) for minimizing a sum $f(x) + \psi(x)$ where $f$ is smooth and $\psi$ is convex, but possibly nonsmooth. It is shown that if $f$ is convex, then the error in the objective function at iteration $k$, for $k$ sufficiently large, is bounded by $a/(b+k)$ for suitable choices of $a$ and $b$. Moreover, if the objective function is strongly convex, then the convergence is R-linear. An improved version of the algorithm based on a cyclic version of the BB iteration and an adaptive line search is given. The performance of the algorithm is investigated using applications in the areas of signal processing and image reconstruction.
AMS subject classifications. 90C06, 90C25, 65Y20, 94A08
Key words. SpaRSA, ISTA, sparse recovery, sublinear convergence, linear convergence, image reconstruction, denoising, compressed sensing, nonsmooth optimization, nonmonotone convergence, BB method
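1 Introduction
In this paper we consider the following optimization problem
$$\min_{x \in \mathbb{R}^n} \ \phi(x) := f(x) + \psi(x), \tag{1.1}$$
where $f: \mathbb{R}^n \to \mathbb{R}$ is a smooth function, and $\psi: \mathbb{R}^n \to \mathbb{R}$ is convex. The function $\psi$, usually called the regularizer or regularization function, is finite for all $x \in \mathbb{R}^n$, but possibly nonsmooth. An important application of (1.1), found in the signal processing literature, is the well-known problem (called basis pursuit denoising in the literature)
$$\min_{x \in \mathbb{R}^n} \ \frac{1}{2}\|Ax - b\|_2^2 + \tau\|x\|_1, \tag{1.2}$$
where $A \in \mathbb{R}^{m \times n}$ (usually $m \le n$), $b \in \mathbb{R}^m$, $\tau > 0$, and $\|\cdot\|_1$ is the $\ell_1$-norm.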
Recently, Wright, Nowak, and Figueiredo [24] introduced the Sparse Reconstruction by Separable Approximation algorithm (SpaRSA) for solving (1.1). The algorithm has been shown to work well in practice. In [24] the authors establish global convergence of SpaRSA. In this paper, we prove an estimate of the form $a/(b+k)$ for the error in the objective function when $f$ is convex. If the objective function is strongly convex, then the convergence of the objective function and the iterates is at least R-linear. A strategy is presented for improving the performance of SpaRSA. It is based on a cyclic Barzilai-Borwein step [8, 9, 13, 19] and an adaptive choice for the reference function value in the line search. The paper concludes with a series of numerical experiments in the areas of signal processing and image reconstruction.
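Throughout the paper $\nabla f(x)$ denotes the gradient of $f$, a row vector. The gradient of $f(x)$, arranged as a column vector, is $g(x)$. The subscript $k$ often represents the iteration number in an algorithm, and $g_k$ stands for $g(x_k)$. $\|\cdot\|$ denotes $\|\cdot\|_2$, the Euclidean norm. $\partial\psi(x)$ is the subdifferential at $x$, a set of row vectors. If $p \in \partial\psi(x)$, then
$$\psi(y) \ge \psi(x) + p(y - x)$$
for all $y \in \mathbb{R}^n$.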
2 The SpaRSA algorithm
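The SpaRSA algorithm, as presented in [24], is as follows:
|Sparse Reconstruction by Separable Approximation (SpaRSA)|
|Given $\eta > 1$, $\sigma \in (0, 1)$, $[\alpha_{\min}, \alpha_{\max}] \subset (0, \infty)$, and starting guess $x_1$. Set $k = 1$.|
|Step 1. Choose $\alpha_0 \in [\alpha_{\min}, \alpha_{\max}]$.|
|Step 2. Set $\alpha = \eta^j\alpha_0$, where $j \ge 0$ is the smallest integer such that $\phi(x_{k+1}) \le \phi_k^R - \frac{\sigma\alpha}{2}\|x_{k+1} - x_k\|^2$, where $x_{k+1} = \arg\min\{\nabla f(x_k)z + \frac{\alpha}{2}\|z - x_k\|^2 + \psi(z) : z \in \mathbb{R}^n\}$.|
|Step 3. If a stopping criterion is satisfied, terminate.|
|Step 4. Set $k \leftarrow k + 1$ and go to Step 1.|
With the GLL (Grippo, Lampariello, and Lucidi) reference value, $\phi_k^R = \phi_k^{\max}$, where
$$\phi_k^{\max} = \max\{\phi(x_{k-j}) : 0 \le j < \min(k, M)\} \tag{2.1}$$
and $M \ge 1$ is the memory. In other words, at iteration $k$, $\phi_k^{\max}$ is the maximum of the $M$ most recent values of the objective function. Suppose that $x_{k+1} = x_k$. The first-order optimality condition for the minimization in Step 2 is $0 \in \nabla f(x_k) + \alpha(x_{k+1} - x_k)^{\mathsf T} + \partial\psi(x_{k+1})$, and it reduces to
$$0 \in \nabla f(x_k) + \partial\psi(x_k).$$
Hence, $x_k$ is a stationary point.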
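The following Python sketch illustrates the iteration on the basis pursuit denoising problem. It assumes $\psi(x) = \tau\|x\|_1$. For this choice, the Step 2 subproblem has the closed-form soft-thresholding solution $x_{k+1} = \mathrm{soft}(x_k - g_k/\alpha, \tau/\alpha)$. The sketch also uses the following:

- the GLL reference value with memory `M`;
- a BB value $s^{\mathsf T}y/s^{\mathsf T}s$ for $\alpha_0$ in Step 1, clamped to $[\alpha_{\min}, \alpha_{\max}]$;
- termination when $\alpha\|x_{k+1} - x_k\|_\infty$ falls below `tol`.

The function names, the defaults, and the stopping test are illustrative assumptions, not the authors' Matlab code. Dense Python lists stand in for the matrix $A$.

```python
import math


def matvec(A, x):
    """Return A x for a dense matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Return A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft(v, t):
    """Soft thresholding, the proximal map of t * ||.||_1."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def bpdn_objective(A, b, tau, x):
    """phi(x) = 0.5 * ||A x - b||^2 + tau * ||x||_1."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(ri * ri for ri in r) + tau * sum(abs(xi) for xi in x)


def bpdn_gradient(A, b, x):
    """Gradient of f(x) = 0.5 * ||A x - b||^2."""
    return rmatvec(A, [ri - bi for ri, bi in zip(matvec(A, x), b)])


def sparsa_bpdn(A, b, tau, x1, M=5, eta=2.0, sigma=1e-2,
                alpha_min=1e-8, alpha_max=1e8, tol=1e-6, max_iter=1000):
    """Nonmonotone SpaRSA sketch for BPDN; returns (x, objective history)."""
    x = list(x1)
    g = bpdn_gradient(A, b, x)
    history = [bpdn_objective(A, b, tau, x)]            # phi(x_1), phi(x_2), ...
    alpha0 = 1.0                                        # initial guess for alpha_0
    for _ in range(max_iter):
        phi_ref = max(history[-M:])                     # GLL reference value
        alpha = min(max(alpha0, alpha_min), alpha_max)  # Step 1 (safeguarded)
        while True:                                     # Step 2: alpha = eta^j * alpha_0
            x_new = soft([xi - gi / alpha for xi, gi in zip(x, g)], tau / alpha)
            s = [xn - xi for xn, xi in zip(x_new, x)]
            ss = sum(si * si for si in s)
            phi_new = bpdn_objective(A, b, tau, x_new)
            if phi_new <= phi_ref - 0.5 * sigma * alpha * ss:
                break
            alpha *= eta
        history.append(phi_new)
        # Step 3 (assumed test); a loose tol keeps the decrease test above rounding level.
        if alpha * max(abs(si) for si in s) <= tol:
            return x_new, history
        g_new = bpdn_gradient(A, b, x_new)
        sy = sum(si * (gn - go) for si, gn, go in zip(s, g_new, g))
        alpha0 = sy / ss                                # BB value s'y / s's (ss > 0 here)
        x, g = x_new, g_new
    return x, history
```

Consider $A$ equal to the $3 \times 3$ identity, $b = (3, -0.5, 1)$ and $\tau = 1$. The sketch returns $x = (2, 0, 0)$, the soft-thresholded data, after two iterations.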
The overall structure of the SpaRSA algorithm is closely related to that of the Iterative Shrinkage Thresholding Algorithm (ISTA) [6, 10, 12, 16, 23]. ISTA, however, employs a fixed choice for $\alpha$ related to the Lipschitz constant for $\nabla f$, while SpaRSA employs a nonmonotone line search. A sublinear convergence result for a monotone line search version of ISTA is given by Beck and Teboulle and by Nesterov. In Section 3 we give a sublinear convergence result for the nonmonotone SpaRSA, while Section 4 gives a linear convergence result when the objective function is strongly convex.
In [24] it is shown that the line search in Step 2 terminates for a finite $j$ when $f$ is Lipschitz continuously differentiable. Here we weaken this condition by requiring Lipschitz continuity of $\nabla f$ only over a bounded set.
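Let $\mathcal{L}$ be the level set defined by
$$\mathcal{L} = \{x \in \mathbb{R}^n : \phi(x) \le \phi(x_1)\}.$$
We make the following assumptions:
(A1) The level set $\mathcal{L}$ is contained in the interior of a compact, convex set $\mathcal{K}$, and $f$ is Lipschitz continuously differentiable on $\mathcal{K}$.
(A2) $\psi$ is convex and $\psi(x)$ is finite for all $x \in \mathbb{R}^n$.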
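Proposition 2.1. Suppose that (A1) and (A2) hold. If $\phi(x_k) \le \phi_k^R \le \phi(x_1)$, then there exists $\bar\alpha$, independent of $k$, with the property that
$$\phi(x_{k+1}) \le \phi_k^R - \frac{\sigma\alpha}{2}\|x_{k+1} - x_k\|^2$$
whenever $\alpha \ge \bar\alpha$, where $x_{k+1}$ is obtained as in Step 2 of SpaRSA.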
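Proof. Let $\Phi_k$ be defined by
$$\Phi_k(z) = f(x_k) + \nabla f(x_k)(z - x_k) + \frac{\alpha}{2}\|z - x_k\|^2 + \psi(z),$$
where $\alpha > 0$. $\Phi_k$ is the sum of a strongly convex quadratic and the convex function $\psi$. It is therefore strongly convex, its level sets are compact, and the minimizer in Step 2 exists. Since $x_{k+1}$ is the minimizer of $\Phi_k$, we have
$$\Phi_k(x_{k+1}) \le \Phi_k(x_k) = \phi(x_k).$$
This is rearranged to obtain
$$\frac{\alpha}{2}\|x_{k+1} - x_k\|^2 \le \psi(x_k) - \psi(x_{k+1}) - \nabla f(x_k)(x_{k+1} - x_k) \le -(\nabla f(x_k) + p)(x_{k+1} - x_k),$$
where $p \in \partial\psi(x_k)$ and the second inequality is the subgradient inequality. Taking norms yields
$$\|x_{k+1} - x_k\| \le \frac{2}{\alpha}\|g_k + p^{\mathsf T}\|.$$
Note that $x_k \in \mathcal{L}$ since $\phi(x_k) \le \phi_k^R \le \phi(x_1)$. Theorem 23.4 and Corollary 24.5.1 in Rockafellar's Convex Analysis, together with the compactness of $\mathcal{L}$, give a constant $c$, independent of $x_k \in \mathcal{L}$, such that $\|g_k + p^{\mathsf T}\| \le c$. Consequently, we have
$$\|x_{k+1} - x_k\| \le \frac{2c}{\alpha}.$$
Since $\mathcal{L}$ is compact and lies in the interior of $\mathcal{K}$, the distance $\delta$ from $\mathcal{L}$ to the boundary of $\mathcal{K}$ is positive. Choose $\beta$ so that $2c/\beta \le \delta$. Hence, $x_{k+1} \in \mathcal{K}$ when $\alpha \ge \beta$, since $x_k \in \mathcal{L}$.
Let $L$ denote the Lipschitz constant for $\nabla f$ on $\mathcal{K}$ and suppose that $\alpha \ge \beta$. Since $x_k \in \mathcal{L} \subset \mathcal{K}$ and $x_{k+1} \in \mathcal{K}$, the convexity of $\mathcal{K}$ implies that the line segment connecting $x_k$ and $x_{k+1}$ lies in $\mathcal{K}$. Proceeding as in [24], a Taylor expansion around $x_k$ yields
$$f(x_{k+1}) \le f(x_k) + \nabla f(x_k)(x_{k+1} - x_k) + \frac{L}{2}\|x_{k+1} - x_k\|^2 = \Phi_k(x_{k+1}) - \psi(x_{k+1}) + \frac{L - \alpha}{2}\|x_{k+1} - x_k\|^2.$$
Adding $\psi(x_{k+1})$ to both sides, we have
$$\phi(x_{k+1}) \le \Phi_k(x_{k+1}) + \frac{L - \alpha}{2}\|x_{k+1} - x_k\|^2. \tag{2.2}$$
Since $\Phi_k(x_{k+1}) \le \phi(x_k) \le \phi_k^R$, the right side of (2.2) is at most $\phi_k^R - \frac{\sigma\alpha}{2}\|x_{k+1} - x_k\|^2$ whenever $L - \alpha \le -\sigma\alpha$, that is, whenever $\alpha \ge L/(1 - \sigma)$. Hence, the proposition holds with
$$\bar\alpha = \max\{\beta, L/(1 - \sigma)\}.$$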
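The claim that sufficient decrease holds for every $\alpha \ge L/(1-\sigma)$ can be checked numerically. The sketch below uses a hypothetical one-dimensional instance: $f(x) = \frac12(ax - b)^2$ and $\psi(x) = \tau|x|$. Here $L = a^2$ and the Step 2 minimizer is a scalar soft threshold. It tests the monotone reference $\phi_k^R = \phi(x_k)$, which is the hardest case since any admissible reference satisfies $\phi_k^R \ge \phi(x_k)$. The grid of points, the scale factors, and the function names are illustrative choices.

```python
import math


def soft(v, t):
    """Scalar soft thresholding, the proximal map of t * |.|."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def threshold_holds(a, b, tau, sigma, scales=(1.0, 1.5, 4.0, 100.0)):
    """Check phi(x+) <= phi(x) - (sigma*alpha/2)(x+ - x)^2 for alpha = scale * L/(1 - sigma).

    Hypothetical instance: f(x) = 0.5*(a*x - b)^2, psi(x) = tau*|x|, so L = a^2.
    """
    L = a * a
    alpha_bar = L / (1.0 - sigma)
    phi = lambda x: 0.5 * (a * x - b) ** 2 + tau * abs(x)
    for i in range(-50, 51):                     # grid of points x in [-5, 5]
        x = i / 10.0
        g = a * (a * x - b)                      # f'(x)
        for scale in scales:
            alpha = scale * alpha_bar
            x_new = soft(x - g / alpha, tau / alpha)   # Step 2 subproblem
            s = x_new - x
            if phi(x_new) > phi(x) - 0.5 * sigma * alpha * s * s + 1e-12:
                return False
    return True
```

Take $a = 2$, $b = 1$, $\tau = 0.3$ and $\sigma = 0.25$. Calling `threshold_holds(2.0, 1.0, 0.3, 0.25)` returns `True`. With `scales=(0.1,)` it returns `False`, which shows why Step 2 may have to increase $\alpha$.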
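Remark 2.2. Suppose $\phi(x_k) \le \phi_k^R \le \phi(x_1)$. In Step 2 of SpaRSA, $x_{k+1}$ is chosen so that $\phi(x_{k+1}) \le \phi_k^R$. Hence, there exists $\phi_{k+1}^R$ such that $\phi(x_{k+1}) \le \phi_{k+1}^R \le \phi(x_1)$ (for example, $\phi_{k+1}^R = \phi_k^R$). In other words, suppose the hypothesis “$\phi(x_k) \le \phi_k^R \le \phi(x_1)$” of Proposition 2.1 is satisfied at step $k$. Then a choice for $\phi_{k+1}^R$ exists which satisfies this hypothesis at step $k+1$.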
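Remark 2.3. We now show that the GLL reference value $\phi_k^{\max}$ satisfies the hypothesis of Proposition 2.1 for each $k$. The condition $\phi(x_k) \le \phi_k^{\max}$ is a trivial consequence of the definition of $\phi_k^{\max}$. Also, by the definition, we have $\phi_1^{\max} = \phi(x_1)$. For $k \ge 1$, $\phi(x_{k+1}) \le \phi_k^{\max}$ according to Step 2 of SpaRSA, so $\phi_{k+1}^{\max} \le \max\{\phi_k^{\max}, \phi(x_{k+1})\} = \phi_k^{\max}$. Hence, $\phi_k^{\max}$ is a nonincreasing function of $k$. In particular, $\phi_k^{\max} \le \phi_1^{\max} = \phi(x_1)$, and all the iterates lie in $\mathcal{L}$.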
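The argument of the remark can be checked in a few lines on a hypothetical objective sequence. The sequence is nonmonotone, but each new value satisfies $\phi(x_{k+1}) \le \phi_k^{\max}$. The GLL reference values are then still nonincreasing and bounded by $\phi(x_1)$. Indices are 0-based in this sketch, so `values[0]` plays the role of $\phi(x_1)$, and the function names are illustrative.

```python
def gll_references(values, M):
    """phi_k^max for 0-based k: the maximum of the M most recent values."""
    return [max(values[max(0, k - M + 1):k + 1]) for k in range(len(values))]


def accepted_by_gll(values, M):
    """True if every new value satisfies phi(x_{k+1}) <= phi_k^max."""
    refs = gll_references(values, M)
    return all(values[k + 1] <= refs[k] for k in range(len(values) - 1))
```

Take `values = [5.0, 4.9, 5.0, 3.0, 4.5, 4.8, 2.0]` and `M = 3`. Then `gll_references` returns `[5.0, 5.0, 5.0, 5.0, 5.0, 4.8, 4.8]`. The objective rises twice, but the reference never does.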
3 Convergence estimate for convex functions
In this section we give a sublinear convergence estimate for the error in the objective function value, assuming $f$ is convex and the assumptions of Proposition 2.1 hold.
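By (A1) and (A2), (1.1) has a solution $x^*$ and an associated objective function value $\phi^* = \phi(x^*)$. Indeed, $\phi$ is continuous and the level set $\mathcal{L}$ is compact. The convergence of the objective function values to $\phi^*$ is a consequence of the analysis in [24]: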
Lemma 3.1. If (A1) and (A2) hold and $\phi_k^R = \phi_k^{\max}$ for every $k$, then
$$\lim_{k \to \infty} \phi(x_k) = \phi^*.$$
Proof. By [24, Lemma 4], the objective function values approach a limit denoted $\bar\phi$. By [24, Theorem 1], all accumulation points of the iterates are stationary points. An accumulation point $\bar x$ exists because $\mathcal{L}$ is compact and, as shown in Remark 2.3, all the iterates lie in $\mathcal{L}$. Since $f$ and $\psi$ are both convex, a stationary point is a global minimizer of $\phi$. Hence, $\bar\phi = \phi(\bar x) = \phi^*$.
Our sublinear convergence result is the following:
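Theorem 3.2. If (A1) and (A2) hold, $f$ is convex, and $\phi_k^R = \phi_k^{\max}$ for all $k$, then there exist constants $a$ and $b$ such that
$$\phi(x_k) - \phi^* \le \frac{a}{b + k}$$
for $k$ sufficiently large.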
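Proof. By Step 2 and Remark 2.3, $x_k$ and $x_{k+1}$ both lie in $\mathcal{L} \subset \mathcal{K}$. Hence the Taylor expansion in the proof of Proposition 2.1 is valid for the accepted step. Applying (2.2) with $\alpha$ replaced by $\alpha_k$ then gives
$$\phi(x_{k+1}) \le \Phi_k(x_{k+1}) + \frac{L}{2}\|s_k\|^2, \tag{3.1}$$
where $s_k = x_{k+1} - x_k$ and $\Phi_k$ is evaluated with $\alpha = \alpha_k$. The nonpositive term $-\alpha_k\|s_k\|^2/2$ has been dropped. Since $x_{k+1}$ minimizes $\Phi_k$ and $f$ is convex, it follows that
$$\Phi_k(x_{k+1}) \le \Phi_k(x) \le f(x) + \psi(x) + \frac{\alpha_k}{2}\|x - x_k\|^2 = \phi(x) + \frac{\alpha_k}{2}\|x - x_k\|^2 \quad \text{for all } x \in \mathbb{R}^n, \tag{3.2}$$
where $\alpha_k$ is the terminating value of $\alpha$ at step $k$. Combining (3.1) and (3.2) gives
$$\phi(x_{k+1}) \le \phi(x) + \frac{\alpha^+}{2}\|x - x_k\|^2 + \frac{L}{2}\|s_k\|^2, \tag{3.3}$$
where $\alpha^+ = \max\{\alpha_{\max}, \eta\bar\alpha\}$ is an upper bound for the $\alpha_k$ implied by Proposition 2.1. Take $x = (1 - \lambda)x_k + \lambda x^*$ for any $\lambda \in [0, 1]$. By the convexity of $\phi$, we have
$$\phi(x) + \frac{\alpha^+}{2}\|x - x_k\|^2 \le (1 - \lambda)\phi(x_k) + \lambda\phi^* + \frac{\alpha^+\lambda^2}{2}\|x_k - x^*\|^2,$$
where we used $x - x_k = \lambda(x^* - x_k)$. Combining this with (3.3) yields
$$\phi(x_{k+1}) \le (1 - \lambda)\phi(x_k) + \lambda\phi^* + \frac{\alpha^+\lambda^2}{2}\|x_k - x^*\|^2 + \frac{L}{2}\|s_k\|^2 \tag{3.4}$$
for any $\lambda \in [0, 1]$. Define
$$\phi_i = \max\{\phi(x_k) : (i - 1)M < k \le iM\}, \quad i \ge 1, \tag{3.5}$$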
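and let $\ell(i)$ denote the index where the maximum is attained. Step 2 of SpaRSA gives $\phi(x_{k+1}) \le \phi_k^{\max}$. An induction on $k$ then shows that $\phi(x_k) \le \phi_i$ for every $k > (i - 1)M$; in particular, $\phi_i$ is a nonincreasing function of $i$. Apply (3.4) with $k = \ell(i+1) - 1 \ge iM$. By the monotonicity of $\phi_i$, we have
$$\phi_{i+1} \le (1 - \lambda)\phi_i + \lambda\phi^* + \frac{\alpha^+\lambda^2}{2}\|x_{\ell(i+1)-1} - x^*\|^2 + \frac{L}{2}\|s_{\ell(i+1)-1}\|^2 \tag{3.6}$$
for any $\lambda \in [0, 1]$. Since both $x_{\ell(i+1)-1}$ and $x^*$ lie in $\mathcal{L}$, it follows that
$$\|x_{\ell(i+1)-1} - x^*\| \le D, \tag{3.7}$$
where $D$ is the diameter of $\mathcal{L}$. Step 2 of SpaRSA implies that
$$\phi(x_{k+1}) \le \phi_k^{\max} - \frac{\sigma\alpha_k}{2}\|s_k\|^2 \le \phi_k^{\max} - \frac{\sigma\alpha_{\min}}{2}\|s_k\|^2,$$
where the second inequality holds because $\alpha_k \ge \alpha_0 \ge \alpha_{\min}$. We take $k = \ell(i+1) - 1$. Exploiting the monotonicity of $\phi_i$ again, we obtain
$$\phi_{i+1} \le \phi_i - \frac{\sigma\alpha_{\min}}{2}\|s_{\ell(i+1)-1}\|^2. \tag{3.8}$$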
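Combining (3.6)–(3.8) gives
$$\phi_{i+1} \le \phi_i - \lambda(\phi_i - \phi^*) + \frac{\alpha^+ D^2}{2}\lambda^2 + \frac{L}{\sigma\alpha_{\min}}(\phi_i - \phi_{i+1}) \tag{3.9}$$
for every $\lambda \in [0, 1]$. The minimum of the right side over $\lambda \in [0, 1]$ is attained with the choice
$$\lambda = \min\left\{1, \frac{\phi_i - \phi^*}{\alpha^+ D^2}\right\}. \tag{3.10}$$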
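As a consequence of Lemma 3.1, $\phi_i$ converges to $\phi^*$. Hence, the minimizing $\lambda$ also approaches 0 as $i$ tends to $\infty$. Choose $i$ large enough that the minimizing $\lambda$ is less than 1. For this minimizing choice of $\lambda$, (3.9) gives
$$\phi_{i+1} \le \phi_i - \frac{(\phi_i - \phi^*)^2}{2\alpha^+ D^2} + \frac{L}{\sigma\alpha_{\min}}(\phi_i - \phi_{i+1}). \tag{3.11}$$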
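Define $e_i = \phi_i - \phi^*$. Subtracting $\phi^*$ from each side of (3.11) gives
$$e_{i+1} \le e_i - \frac{e_i^2}{2\alpha^+ D^2} + \frac{L}{\sigma\alpha_{\min}}(e_i - e_{i+1}).$$
We rearrange this to obtain
$$e_{i+1} \le e_i - c\,e_i^2, \qquad c = \frac{\sigma\alpha_{\min}}{2\alpha^+ D^2(\sigma\alpha_{\min} + L)}. \tag{3.12}$$
By (3.12), $e_{i+1} \le e_i$, which implies that
$$e_{i+1} \le e_i - c\,e_i e_{i+1}.$$
If $e_{i+1} = 0$ for some $i$, then $e_j = 0$ for all $j > i$ and there is nothing to prove. Otherwise, we divide by $e_i e_{i+1} > 0$, that is, form the reciprocal of this last inequality, to obtain
$$\frac{1}{e_{i+1}} \ge \frac{1}{e_i} + c.$$
Applying this inequality recursively gives
$$e_i \le \frac{1}{1/e_{i_0} + c(i - i_0)} \quad \text{for all } i \ge i_0,$$
where $i_0$ is chosen large enough to ensure that the minimizing $\lambda$ in (3.10) is less than 1 for all $i \ge i_0$.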
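Suppose that $k = iM + j$ with $1 \le j \le M$ and $i + 1 \ge i_0$. We have $\phi(x_k) \le \phi_{i+1}$ and $i + 1 \ge k/M$. Therefore
$$\phi(x_k) - \phi^* \le e_{i+1} \le \frac{1}{1/e_{i_0} + c(i + 1 - i_0)} \le \frac{1}{1/e_{i_0} + c(k/M - i_0)} = \frac{M/c}{k + M\left(1/(c\,e_{i_0}) - i_0\right)}.$$
The proof is completed by taking $a = M/c$ and $b = M(1/(c\,e_{i_0}) - i_0)$.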
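Suppose the block errors satisfy $e_{i+1} \le e_i - c\,e_i^2$ with $0 < c\,e_i \le 1$. Then $1/e_i$ grows at least linearly, so $e_i \le 1/(1/e_{i_0} + c(i - i_0))$. Iteration $k$ lies in block $i + 1$ with $i + 1 \ge k/M$. This gives a bound $a/(b + k)$ with $a = M/c$ and $b = M(1/(c\,e_{i_0}) - i_0)$. The sketch below checks both steps on the extreme case of the recursion, where the inequality is replaced by an equality. It is a numerical illustration under that assumption, not part of the proof. The test values of $e_0$, $c$, and $M$ are arbitrary.

```python
def worst_case_errors(e0, c, n):
    """Return e_0, ..., e_n for the extreme recursion e_{i+1} = e_i - c * e_i**2.

    Assumes 0 < c * e0 <= 1, so that every e_i stays nonnegative.
    """
    e = [e0]
    for _ in range(n):
        e.append(e[-1] - c * e[-1] ** 2)
    return e


def sublinear_constants(M, c, e_i0, i0):
    """Constants (a, b) such that e_{i+1} <= a / (b + k) when iM < k <= (i+1)M."""
    return M / c, M * (1.0 / (c * e_i0) - i0)
```

For example, with $e_0 = 1$ and $c = 0.5$, the product $i\,e_i$ stays below $1/c = 2$ for every $i$. This is the $O(1/i)$ behavior.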
4 Convergence estimate for strongly convex functions
In this section we prove that SpaRSA converges R-linearly when $f$ is a convex function and $\phi$ satisfies
$$\phi(x) \ge \phi(x^*) + \mu\|x - x^*\|^2 \tag{4.1}$$
for all $x \in \mathcal{L}$, where $\mu > 0$. Hence, $x^*$ is the unique minimizer of $\phi$. For example, if $f$ is a strongly convex function, then (4.1) holds.
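Theorem 4.1. If (A1) and (A2) hold, $f$ is convex, $\phi$ satisfies (4.1), and $\phi_k^R = \phi_k^{\max}$ for every $k$, then there exist constants $\theta \in [0, 1)$ and $c$ such that
$$\phi(x_k) - \phi^* \le c\,\theta^k\,(\phi(x_1) - \phi^*) \tag{4.2}$$
for every $k$.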
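Proof. Let $\phi_i$ be defined as in (3.5) and let $e_i = \phi_i - \phi^*$. We will show that there exists $\gamma \in (0, 1)$ such that
$$e_{i+1} \le \gamma e_i \quad \text{for all } i \ge 1. \tag{4.3}$$
Let $\beta > 0$ be chosen to satisfy the inequality
$$L\beta < \min\{1, \mu/\alpha^+\}, \tag{4.4}$$
where $L$ is the Lipschitz constant of $\nabla f$ on $\mathcal{K}$. As in the proof of Theorem 3.2, $\alpha^+ = \max\{\alpha_{\max}, \eta\bar\alpha\}$ bounds the $\alpha_k$. We consider two cases.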
Case 1. $\|s_{\ell(i+1)-1}\|^2 \ge \beta e_i$.
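By (3.8), we have
$$\phi_{i+1} \le \phi_i - \frac{\sigma\alpha_{\min}}{2}\|s_{\ell(i+1)-1}\|^2 \le \phi_i - \frac{\sigma\alpha_{\min}\beta}{2}e_i.$$
Subtracting $\phi^*$ from each side, we obtain
$$e_{i+1} \le \left(1 - \frac{\sigma\alpha_{\min}\beta}{2}\right)e_i.$$
Recall that $e_{i+1} \ge 0$. Hence (4.3) holds for any $\gamma \in (0, 1)$ with $\gamma \ge 1 - \sigma\alpha_{\min}\beta/2$.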
Case 2. $\|s_{\ell(i+1)-1}\|^2 < \beta e_i$.
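We utilize the inequality (3.6) but with different bounds for the $\|x_{\ell(i+1)-1} - x^*\|^2$ and $\|s_{\ell(i+1)-1}\|^2$ terms. For the first term, we have
$$\mu\|x_{\ell(i+1)-1} - x^*\|^2 \le \phi(x_{\ell(i+1)-1}) - \phi^* \le \phi_i - \phi^* = e_i.$$
The first inequality is due to (4.1). For the second, the definition of $\ell(i)$ below (3.5) gives $\ell(i+1) - 1 \ge iM$, and $\phi_i$ is monotone nonincreasing. Hence,
$$\|x_{\ell(i+1)-1} - x^*\|^2 \le \frac{e_i}{\mu}. \tag{4.5}$$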
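Inserting in (3.6) the bound (4.5) and the Case 2 requirement yields
$$\phi_{i+1} \le (1 - \lambda)\phi_i + \lambda\phi^* + \left(\frac{\alpha^+\lambda^2}{2\mu} + \frac{L\beta}{2}\right)e_i$$
for all $\lambda \in [0, 1]$. Subtract $\phi^*$ from each side to obtain
$$e_{i+1} \le \left(1 - \lambda + \frac{\alpha^+\lambda^2}{2\mu} + \frac{L\beta}{2}\right)e_i \tag{4.6}$$
for all $\lambda \in [0, 1]$.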
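The $\lambda$ which minimizes the coefficient of $e_i$ in (4.6) is $\lambda = \min\{1, \mu/\alpha^+\}$.
Suppose first that the minimizing $\lambda$ is 1. Then $\mu \ge \alpha^+$, and the minimizing coefficient in (4.6) is
$$\frac{\alpha^+}{2\mu} + \frac{L\beta}{2} \le \frac{1 + L\beta}{2} < 1,$$
since $L\beta < 1$ by (4.4). On the other hand, if the minimizing $\lambda$ is less than 1, then $\mu < \alpha^+$ and the minimizing coefficient is
$$1 - \frac{\mu}{2\alpha^+} + \frac{L\beta}{2} < 1,$$
since $L\beta < \mu/\alpha^+$ by (4.4). Let $\gamma$ be the larger of two quantities: the Case 1 bound $1 - \sigma\alpha_{\min}\beta/2$ and the minimizing coefficient of Case 2, which is positive. With this choice, (4.3) holds, completing its proof.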
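For $iM < k \le (i+1)M$, we have
$$\phi(x_k) - \phi^* \le e_{i+1} \le \gamma^i e_1 \le \gamma^{k/M - 1}(\phi(x_1) - \phi^*).$$
Here $e_1 \le \phi(x_1) - \phi^*$ by Remark 2.3, and $i \ge k/M - 1$. Hence, (4.2) holds with $c = 1/\gamma$ and $\theta = \gamma^{1/M}$. This completes the proof.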
The condition (4.1), combined with (4.2), shows that the iterates converge R-linearly to $x^*$, since $\|x_k - x^*\|^2 \le (\phi(x_k) - \phi^*)/\mu$.
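Suppose the block maxima contract by a factor $\gamma \in (0, 1)$ every $M$ iterations. For $iM < k \le (i+1)M$ we have $\gamma^i \le \gamma^{k/M - 1}$, so the per-iteration constants are $c = 1/\gamma$ and $\theta = \gamma^{1/M}$. The small sketch below checks this conversion numerically. The values of $\gamma$ and $M$ in the test are arbitrary, and the function name is illustrative.

```python
def per_iteration_constants(gamma, M):
    """Turn a block contraction e_{i+1} <= gamma * e_i (blocks of M iterations)
    into constants (c, theta) with gamma**i <= c * theta**k for iM < k <= (i+1)M."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    return 1.0 / gamma, gamma ** (1.0 / M)
```

The per-iteration rate $\theta$ approaches 1 as the memory $M$ grows, even though the block rate $\gamma$ is fixed.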
5 More general reference function values
The GLL reference function value $\phi_k^{\max}$, defined in (2.1), often leads to greater efficiency when $M > 1$, compared to the monotone choice $\phi_k^R = \phi(x_k)$. In practice, even more flexibility in the reference function value can further accelerate convergence. In earlier work, we proved convergence of the nonmonotone gradient projection method whenever the reference function value satisfies the following conditions:
(R1) $\phi_1^R = \phi(x_1)$.
(R2) $\phi(x_k) \le \phi_k^R \le \max\{\phi_{k-1}^R, \phi_k^{\max}\}$ for each $k > 1$.
(R3) $\phi_k^R \le \phi_k^{\max}$ infinitely often.
In that work we also provide a specific choice for $\phi_k^R$ which satisfies (R1)–(R3) and which gave more rapid convergence than the choice $\phi_k^R = \phi_k^{\max}$. To satisfy (R3), we could choose an integer $N > 0$ and simply set $\phi_k^R = \phi_k^{\max}$ every $N$ iterations. Another strategy is closer in spirit to what is used in the numerical experiments. It chooses a decrease parameter $\Delta > 0$ and resets $\phi_k^R = \phi_k^{\max}$ according to whether the objective function has decreased sufficiently, as measured by $\Delta$. We now give convergence results for SpaRSA whenever the reference function value satisfies (R1)–(R3). In the first convergence result which follows, convexity of $f$ is not required.
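The simple periodic way of satisfying (R3) can be sketched as follows. The sketch assumes the following forms of the conditions:

- (R1): $\phi_1^R = \phi(x_1)$;
- (R2): $\phi(x_k) \le \phi_k^R \le \max\{\phi_{k-1}^R, \phi_k^{\max}\}$.

The rule "keep the previous reference between resets", the period $N$, and the function names are illustrative assumptions. This is not the adaptive choice used in the experiments. Indices are 0-based, so $k = 0$ corresponds to $x_1$. The test also checks that these references never exceed the GLL maximum with memory $M + N$, the comparison used in the convergence analysis of this section.

```python
def gll_references(values, M):
    """phi_k^max for 0-based k: the maximum of the M most recent values."""
    return [max(values[max(0, k - M + 1):k + 1]) for k in range(len(values))]


def periodic_references(values, M, N):
    """Hypothetical rule: reset to phi_k^max every N iterations.

    Between resets the previous reference is kept (raised to phi(x_k) if
    needed), so phi(x_k) <= phi_k^R <= max(phi_{k-1}^R, phi_k^max) holds.
    """
    gll = gll_references(values, M)
    refs = []
    for k, v in enumerate(values):
        if k % N == 0:
            refs.append(gll[k])          # reset; at k = 0 this gives (R1)
        else:
            refs.append(max(refs[-1], v))
    return refs


def satisfies_r1_r2(values, refs, M):
    """Check the assumed forms of (R1) and (R2) on a finite sequence."""
    gll = gll_references(values, M)
    if refs[0] != values[0]:
        return False                                   # (R1)
    return all(values[k] <= refs[k] <= max(refs[k - 1], gll[k])
               for k in range(1, len(values)))         # (R2)
```

Take `values = [5.0, 4.0, 4.5, 3.0, 3.5, 2.0, 2.8, 1.5]`, `M = 2` and `N = 3`. The references are `[5.0, 5.0, 5.0, 4.5, 4.5, 4.5, 2.8, 2.8]`. They never fall below the GLL values with memory 2, and they never exceed the GLL values with memory 5.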
Theorem 5.1. If (A1) and (A2) hold and the reference function value $\phi_k^R$ satisfies (R1)–(R3), then the iterates of SpaRSA have a subsequence converging to a limit $\bar x$ satisfying $0 \in \nabla f(\bar x) + \partial\psi(\bar x)$.
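Proof. We first apply Proposition 2.1 to show that Step 2 of SpaRSA is fulfilled for some choice of $j$. This requires that we show $\phi(x_k) \le \phi_k^R \le \phi(x_1)$ for each $k$. This holds for $k = 1$ by (R1). Also, for $k = 1$, we have $\phi_1^{\max} = \phi(x_1)$. Proceeding by induction, suppose that $\phi(x_i) \le \phi_i^R \le \phi(x_1)$ and $\phi_i^{\max} \le \phi(x_1)$ for $i = 1, 2, \ldots, k$. By Proposition 2.1, Step 2 of SpaRSA terminates at a finite $j$ and hence,
$$\phi(x_{k+1}) \le \phi_k^R \le \phi(x_1).$$
It follows that $\phi_{k+1}^{\max} \le \phi(x_1)$ and, by (R2), $\phi(x_{k+1}) \le \phi_{k+1}^R \le \max\{\phi_k^R, \phi_{k+1}^{\max}\} \le \phi(x_1)$. This completes the induction step. Hence, by Proposition 2.1, Step 2 of SpaRSA is fulfilled for a finite $j$ in every iteration.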
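By Step 2 of SpaRSA, we have
$$\phi(x_{k+1}) \le \phi_k^R - \frac{\sigma\alpha_{\min}}{2}\|s_k\|^2,$$
where $s_k = x_{k+1} - x_k$. The third paragraph of the proof of Theorem 2.2 in that earlier work shows the following. Suppose an inequality of this form holds for a reference function value satisfying (R1)–(R3). Then
$$\liminf_{k \to \infty} \|s_k\| = 0.$$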
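Let $k_j$ denote a strictly increasing sequence with the property that $s_{k_j}$ tends to 0 and $x_{k_j}$ approaches a limit denoted $\bar x$. Such a sequence exists because the iterates lie in the compact set $\mathcal{L}$. That is,
$$\lim_{j \to \infty} s_{k_j} = 0 \quad \text{and} \quad \lim_{j \to \infty} x_{k_j} = \bar x.$$
Since $s_{k_j}$ tends to 0, it follows that $x_{k_j+1}$ also approaches $\bar x$. By the first-order optimality conditions for $x_{k_j+1}$, we have
$$0 \in \nabla f(x_{k_j}) + \alpha_{k_j}s_{k_j}^{\mathsf T} + \partial\psi(x_{k_j+1}),$$
where $\alpha_{k_j}$ denotes the value of $\alpha$ in Step 2 of SpaRSA associated with $x_{k_j+1}$. Again, by Proposition 2.1, we have the uniform bound $\alpha_{k_j} \le \max\{\alpha_{\max}, \eta\bar\alpha\}$. Taking the limit as $j$ tends to $\infty$, it follows from Corollary 24.5.1 in Rockafellar's Convex Analysis that
$$0 \in \nabla f(\bar x) + \partial\psi(\bar x).$$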
This completes the proof.
With a small change in (R3), we obtain either sublinear or linear convergence of the entire iteration sequence.
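Theorem 5.2. Suppose the following hold:

- (A1) and (A2) hold;
- $f$ is convex;
- the reference function value $\phi_k^R$ satisfies (R1) and (R2);
- there is $N > 0$ such that, for each $k$,
$$\phi_j^R \le \phi_j^{\max} \quad \text{for some } j \in [k, k + N). \tag{5.1}$$

Then there exist constants $a$ and $b$ such that
$$\phi(x_k) - \phi^* \le \frac{a}{b + k}$$
for $k$ sufficiently large. Moreover, if $\phi$ satisfies the strong convexity condition (4.1), then there exist $\theta \in [0, 1)$ and $c$ such that
$$\phi(x_k) - \phi^* \le c\,\theta^k\,(\phi(x_1) - \phi^*)$$
for every $k$.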
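Proof. Let $k_i$, $i = 1, 2, \ldots$, denote an increasing sequence of integers with two properties. First, $\phi_j^R \le \phi_j^{\max}$ for $j = k_i$. Second, $\phi_j^R \le \phi_{j-1}^R$ when $k_i < j < k_{i+1}$. Such a sequence exists since $\phi_k^R \le \max\{\phi_{k-1}^R, \phi_k^{\max}\}$ for each $k$ and (5.1) holds. Moreover, $k_{i+1} - k_i \le N$. Hence, we have
$$\phi_j^R \le \phi_{k_i}^R \le \phi_{k_i}^{\max} \quad \text{for } k_i \le j < k_{i+1}. \tag{5.2}$$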
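Let us define
$$\bar\phi_k = \max\{\phi(x_{k-j}) : 0 \le j < \min(k, M + N)\}.$$
Given $k$, choose $i$ such that $k_i \le k < k_{i+1}$. Since $k - k_i < N$, every function value maximized to obtain $\phi_{k_i}^{\max}$ is also maximized to obtain $\bar\phi_k$. Therefore
$$\phi_{k_i}^{\max} \le \bar\phi_k. \tag{5.3}$$
Combining (5.2) and (5.3) yields $\phi_k^R \le \bar\phi_k$ for each $k$. In Step 2 of SpaRSA, the iterates are chosen to satisfy the condition
$$\phi(x_{k+1}) \le \phi_k^R - \frac{\sigma\alpha_k}{2}\|x_{k+1} - x_k\|^2.$$
It follows that
$$\phi(x_{k+1}) \le \bar\phi_k - \frac{\sigma\alpha_k}{2}\|x_{k+1} - x_k\|^2.$$
Hence, the iterates also satisfy the GLL condition, but with memory of length $M + N$ instead of $M$. By Theorem 3.2, the iterates converge at least sublinearly. Moreover, if the strong convexity condition (4.1) holds, then the convergence is R-linear by Theorem 4.1.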
6 Computational experiments
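In this section, we compare two implementations. The first is SpaRSA with the GLL reference function value and the BB choice for $\alpha_0$ in Step 1. The second is an adaptive implementation that combines the reference function value given in the appendix of earlier work with a cyclic BB choice for $\alpha_0$. We call this implementation Adaptive SpaRSA. This adaptive choice for $\phi_k^R$ satisfies (R1)–(R3), which ensures convergence in accordance with Theorem 5.1. By a cyclic choice for the BB parameter (see [8, 9, 13, 19]), we mean that the BB value is reused for several iterations. More precisely, let $m \ge 1$ be an integer (the cycle length). For all $l \ge 0$ and $i = 1, \ldots, m$, the value of $\alpha_0$ at iteration $k = ml + i$ is given by
$$\alpha_0 = \alpha_{ml+1}^{BB}, \qquad \alpha_k^{BB} = \frac{s_{k-1}^{\mathsf T}y_{k-1}}{s_{k-1}^{\mathsf T}s_{k-1}},$$
where $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$.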
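A minimal sketch of the cyclic BB rule follows. It assumes the following:

- the BB value is $s^{\mathsf T}y/s^{\mathsf T}s$ with $s = x_k - x_{k-1}$ and $y = g_k - g_{k-1}$;
- the value is refreshed at iterations $k = ml + 1$ and reused for the next $m - 1$ iterations;
- the result is clamped to $[\alpha_{\min}, \alpha_{\max}]$ as Step 1 requires.

The class name, the initial value used before any step is available, and keeping the old value when $s = 0$ are all hypothetical choices.

```python
class CyclicBB:
    """Cyclic BB rule for alpha_0 (hypothetical helper).

    The BB value s'y / s's is refreshed at iterations k = m*l + 1 and reused
    for the following m - 1 iterations, then clamped to [alpha_min, alpha_max].
    """

    def __init__(self, m, alpha_min, alpha_max, alpha_init=1.0):
        self.m = m
        self.alpha_min = alpha_min
        self.alpha_max = alpha_max
        self.current = alpha_init   # used until the first BB value is available
        self.k = 0                  # iteration counter, 1-based after the first call

    def next_alpha0(self, s=None, y=None):
        """Return alpha_0 for the next iteration; s = x_k - x_{k-1}, y = g_k - g_{k-1}."""
        self.k += 1
        if (self.k - 1) % self.m == 0 and s is not None:
            ss = sum(si * si for si in s)
            if ss > 0.0:            # keep the old value if the step is zero
                self.current = sum(si * yi for si, yi in zip(s, y)) / ss
        return min(max(self.current, self.alpha_min), self.alpha_max)
```

With $m = 1$ the rule reduces to the ordinary BB choice, refreshed at every iteration. Larger $m$ reuses each value for a whole cycle.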
The test problems are associated with applications in the areas of signal processing and image reconstruction. All experiments were carried out on a PC using Matlab 7.6 with an AMD Athlon 64 X2 dual core 3 GHz processor and 3 GB of memory running Windows Vista. Version 2.0 of SpaRSA was obtained from Mário Figueiredo’s webpage (http://www.lx.it.pt/mtf/SpaRSA/). The code was run with default parameters. Adaptive SpaRSA was written in Matlab with the following parameter values:
The test problems, such as the basis pursuit denoising problem (1.2), involve a parameter $\tau$. The choice of the cycle length $m$ was based on the value of $\tau$:
As $\tau$ approaches zero, the optimization problem becomes more ill-conditioned, and the convergence speed improves when the cycle length is increased.
The stopping condition for both SpaRSA and Adaptive SpaRSA was
$$\alpha_k\|x_{k+1} - x_k\|_\infty \le \epsilon,$$
where $\alpha_k$ denotes the final value for $\alpha$ in Step 2 of SpaRSA, $\|\cdot\|_\infty$ is the max-norm, and $\epsilon$ is the error tolerance. This termination condition is suggested by Vandenberghe. As pointed out earlier, $x_k$ is a stationary point when $x_{k+1} = x_k$. Other stopping criteria have also been proposed in the literature. In the following tables, “Ax” denotes the number of times that a vector is multiplied by $A$ or $A^{\mathsf T}$, “cpu” is the CPU time in seconds, and “Obj” is the objective function value.
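The first-order condition for the Step 2 subproblem is $0 \in \nabla f(x_k) + \alpha(x_{k+1} - x_k)^{\mathsf T} + \partial\psi(x_{k+1})$. So $p = \alpha(x_k - x_{k+1}) - g_k$ is a subgradient of $\psi$ at $x_{k+1}$. Therefore a small $\alpha\|x_{k+1} - x_k\|_\infty$ means $x_{k+1}$ nearly satisfies the stationarity condition. The sketch below illustrates this for $\psi = \tau\|\cdot\|_1$. It assumes the test has the form $\alpha\|x_{k+1} - x_k\|_\infty \le \epsilon$. The function names and the sample vectors are hypothetical.

```python
import math


def soft(v, t):
    """Soft thresholding, the proximal map of t * ||.||_1."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def prox_step(x, g, alpha, tau):
    """Step 2 subproblem for psi = tau * ||.||_1 (closed form)."""
    return soft([xi - gi / alpha for xi, gi in zip(x, g)], tau / alpha)


def should_stop(alpha, x_new, x, eps):
    """Assumed termination test: alpha * ||x_new - x||_inf <= eps."""
    return alpha * max(abs(a - b) for a, b in zip(x_new, x)) <= eps


def in_l1_subdifferential(p, x, tau, tol=1e-12):
    """Check componentwise that p lies in tau * (subdifferential of ||.||_1 at x)."""
    for pi, xi in zip(p, x):
        if xi != 0.0 and abs(pi - tau * math.copysign(1.0, xi)) > tol:
            return False
        if xi == 0.0 and abs(pi) > tau + tol:
            return False
    return True
```

For example, take $x = (1, -0.2, 0)$, $g = (0.3, -0.1, 0.05)$, $\alpha = 2$ and $\tau = 0.5$. The step gives $x_{k+1} \approx (0.6, 0, 0)$. The vector $\alpha(x - x_{k+1}) - g \approx (0.5, -0.3, -0.05)$ is a valid subgradient of $\tau\|\cdot\|_1$ there. The test is not yet satisfied for $\epsilon = 10^{-3}$.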