Gradient-based methods for sparse recovery October 25, 2009. This material is based upon work supported by the National Science Foundation under Grant 0619080.

# Gradient-based methods for sparse recovery††thanks: October 25, 2009. This material is based upon work supported by the National Science Foundation under Grant 0619080.

William W. Hager hager@math.ufl.eduhager, PO Box 118105, Department of Mathematics, University of Florida, Gainesville, FL 32611-8105. Phone (352) 392-0281. Fax (352) 392-8357.    Dzung T. Phan dphan@math.ufl.edudphan, PO Box 118105, Department of Mathematics, University of Florida, Gainesville, FL 32611-8105. Phone (352) 392-0281. Fax (352) 392-8357.    Hongchao Zhang hozhang@math.lsu.eduhozhang, Department of Mathematics, 140 Lockett Hall, Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803-4918. Phone (225) 578-1982. Fax (225) 578-4276.
###### Abstract

The convergence rate is analyzed for the SpaSRA algorithm (Sparse Reconstruction by Separable Approximation) for minimizing a sum where is smooth and is convex, but possibly nonsmooth. It is shown that if is convex, then the error in the objective function at iteration , for sufficiently large, is bounded by for suitable choices of and . Moreover, if the objective function is strongly convex, then the convergence is -linear. An improved version of the algorithm based on a cycle version of the BB iteration and an adaptive line search is given. The performance of the algorithm is investigated using applications in the areas of signal processing and image reconstruction.

AMS subject classifications. 90C06, 90C25, 65Y20, 94A08

Key words. SpaRSA, ISTA, sparse recovery, sublinear convergence, linear convergence, image reconstruction, denoising, compressed sensing, nonsmooth optimization, nonmonotone convergence, BB method

## 1 Introduction

In this paper we consider the following optimization problem

 minx∈Rnϕ(x):=f(x)+ψ(x), \hb@xt@.01(1.1)

where is a smooth function, and is convex. The function , usually called the regularizer or regularization function, is finite for all , but possibly nonsmooth. An important application of (LABEL:main), found in the signal processing literature, is the well-known problem (called basis pursuit denoising in [7])

 minx∈Rn12∥Ax−b∥22+τ∥x∥1, \hb@xt@.01(1.2)

where (usually ), , , and is the -norm.

Recently, Wright, Nowak, and Figueiredo [24] introduced the Sparse Reconstruction by Separable Approximation algorithm (SpaRSA) for solving (LABEL:main). The algorithm has been shown to work well in practice. In [24] the authors establish global convergence of SpaRSA. In this paper, we prove an estimate of the form for the error in the objective function when is convex. If the objective function is strongly convex, then the convergence of the objective function and the iterates is at least R-linear. A strategy is presented for improving the performance of SpaRSA based on a cyclic Barzilai-Borwein step [8, 9, 13, 19] and an adaptive choice [15] for the reference function value in the line search. The paper concludes with a series of numerical experiments in the areas of signal processing and image reconstruction.

Throughout the paper denotes the gradient of , a row vector. The gradient of , arranged as a column vector, is . The subscript often represents the iteration number in an algorithm, and stands for . denotes , the Euclidean norm. is the subdifferential at , a set of row vectors. If , then

 ψ(x)≥ψ(y)+p(x−y)

for all .

## 2 The SpaRSA algorithm

The SpaRSA algorithm, as presented in [24], is as follows:

Sparse Reconstruction by Separable Approximation (SpaRSA)
Given , , , and starting guess .
Set .
 Step 1. Choose α0∈[αmin,αmax] Step 2. Set α=ηjα0 where j≥0 is the smallest integer such that ϕ(xk+1)≤ϕRk−σα∥xk+1−xk∥2 where xk+1=argmin{∇f(xk)z+α∥z−xk∥2+ψ(z):z∈Rn}. Step 3. If xk+1=xk, terminate. Step 4. Set k=k+1 and go to step 1.

The parameter in [24] was taken to be the BB parameter [1] with safeguards:

 α0=αBBk=min{∥αsk−yk∥:αmin≤α≤αmax} \hb@xt@.01(2.1)

where and . Also, in [24], the reference value is the GLL [14] reference value defined by

 ϕmaxk=max{ϕ(xk−j):0≤j

In other words, at iteration , is the maximum of the most recent values for the objective function. Note that if , then

 0∈∇f(xk)+∂ψ(xk+1)=∇f(xk+1)+∂ψ(xk+1).

Hence, is a stationary point.

The overall structure of the SpaRSA algorithm is closely related to that of the Iterative Shrinkage Thresholding Algorithm (ISTA) [6, 10, 12, 16, 23]. ISTA, however, employs a fixed choice for related to the Lipschitz constant for , while SpaRSA employs a nonmonotone line search. A sublinear convergence result for a monotone line search version of ISTA is given by Beck and Teboulle [2] and by Nesterov [18]. In Section LABEL:convergence we give a sublinear convergence result for the nonmonotone SpaRSA, while Section LABEL:strong_convexity gives a linear convergence result when the objective function is strongly convex.

In [24] it is shown that the line search in Step 2 terminates for a finite when is Lipschitz continuously differentiable. Here we weaken this condition by only requiring Lipschitz continuity over a bounded set.

###### Proposition 2.1

Let be the level set defined by

 \@fontswitchL={x∈Rn:ϕ(x)≤ϕ(x1)}. \hb@xt@.01(2.3)

We make the following assumptions:

• The level set is contained in the interior of a compact, convex set , and is Lipschitz continuously differentiable on .

• is convex and is finite for all .

If , then there exists with the property that

 ϕ(xk+1)≤ϕRk−σα∥xk+1−xk∥2

whenever where is obtained as in Step of SpaRSA.

Proof. Let be defined by

where . Since is a strongly convex quadratic, its level sets are compact, and the minimizer in Step 2 exists. Since is the minimizer of , we have

 Φk(xk+1) = f(xk)+∇f(xk)(xk+1−xk)+α∥xk+1−xk∥2+ψ(xk+1) ≤ Φk(xk)=f(xk)+ψ(xk).

This is rearranged to obtain

 α∥xk+1−xk∥2 ≤ ∇f(xk)(xk−xk+1)+ψ(xk)−ψ(xk+1) ≤ ∇f(xk)(xk−xk+1)+pk(xk−xk+1),

where . Taking norms yields

 ∥xk+1−xk∥≤(∥gk∥+∥pk∥)/α. \hb@xt@.01(2.4)

By Theorem 23.4 and Corollary 24.5.1 in [20] and by the compactness of , there exists a constant , independent of , such that . Consequently, we have

 ∥xk+1−xk∥≤c/α.

Since is compact and lies in the interior of , the distance from to the boundary of is positive. Choose so that . Hence, when , since .

Let denote the Lipschitz constant for on and suppose that . Since and , we have . Moreover, due to the convexity of , the line segment connecting and lies in . Proceeding as in [24], a Taylor expansion around yields

 f(xk+1)≤f(xk)+∇f(xk)(xk+1−xk)+.5λ∥xk+1−xk∥2.

Adding to both sides, we have

 ϕ(xk+1) ≤ Φk(xk+1)+(.5λ−α)∥xk+1−xk∥2 ≤ Φk(xk)+(.5λ−α)∥xk+1−xk∥2 = ϕ(xk)+(.5λ−α)∥xk+1−xk∥2 ≤ ϕRk+(.5λ−α)∥xk+1−xk∥2since ϕ(xk)≤ϕRk ≤ ϕRk−σα∥xk+1−xk∥2if .5λ−α≤−σα.

Hence, the proposition holds with

 ¯α=max{β,λ2(1−σ)}.

###### Remark 1

Suppose . In Step 2 of SpaRSA, is chosen so that . Hence, there exists such that . In other words, if the hypothesis “” of Proposition LABEL:StepProposition is satisfied at step , then a choice for exists which satisfies this hypothesis at step .

###### Remark 2

We now show that the GLL reference value satisfies the condition of Proposition LABEL:StepProposition for each . The condition is a trivial consequence of the definition of . Also, by the definition, we have . For , according to Step 2 of SpaRSA. Hence, is a decreasing function of . In particular, .

## 3 Convergence estimate for convex functions

In this section we give a sublinear convergence estimate for the error in the objective function value assuming is convex and the assumptions of Proposition LABEL:StepProposition hold.

By (A1) and (A2), (LABEL:main) has a solution and an associated objective function value . The convergence of the objective function values to is a consequence of the analysis in [24]:

###### Lemma 3.1

If (A1) and (A2) hold and for every , then

 limk→∞ϕ(xk)=ϕ∗.

Proof. By [24, Lemma 4], the objective function values approach a limit denoted . By [24, Theorem 1], all accumulation points of the iterates are stationary points. An accumulation point exists since is compact and the iterates are all contained in , as shown in Remark LABEL:GLL_OK. Since and are both convex, a stationary point is a global minimizer of . Hence, .

Our sublinear convergence result is the following:

###### Theorem 3.2

If (A1) and (A2) hold, is convex, and for all , then there exist constants and such that

 ϕ(xk)−ϕ∗≤ab+k

for sufficiently large.

Proof. By (LABEL:phiPhi) with replaced by , we have

 ϕ(xk)≤Φk−1(xk)+b0∥sk∥2,b0=.5λ, \hb@xt@.01(3.1)

where . Since minimizes and is convex, it follows that

 Φk−1(xk) = minz∈Rn{f(xk−1)+∇f(xk−1)(z−xk−1)+αk−1∥z−xk−1∥2+ψ(z)} \hb@xt@.01(3.2) ≤ min{f(z)+ψ(z)+αk−1∥z−xk−1∥2:z∈Rn} = min{ϕ(z)+αk−1∥z−xk−1∥2:z∈Rn},

where is the terminating value of at step . Combining (LABEL:h1) and (LABEL:h3) gives

 ϕ(xk)≤min{ϕ(z)+¯β∥z−xk−1∥2:z∈Rn}+b0∥sk∥2, \hb@xt@.01(3.3)

where is an upper bound for the implied by Proposition LABEL:StepProposition. By the convexity of and with for any , we have

 minz∈Rnϕ(z)+¯β∥z−xk−1∥2 ≤ ϕ((1−λ)xk−1+λx∗)+¯βλ2∥xk−1−x∗∥2 ≤ (1−λ)ϕ(xk−1)+λϕ∗+¯βλ2∥xk−1−x∗∥2 = (1−λ)ϕ(xk−1)+λϕ∗+bkλ2,

where . Combining this with (LABEL:h4) yields

 ϕ(xk) ≤ (1−λ)ϕ(xk−1)+λϕ∗+bkλ2+b0∥sk∥2 \hb@xt@.01(3.4) ≤ (1−λ)ϕRk−1+λϕ∗+bkλ2+b0∥sk∥2

for any . Define

 ϕi=max{ϕ(xk):(i−1)M

and let denote the index where the maximum is attained. Since in Step 2 of SpaRSA, it follows that is a nonincreasing function of . By (LABEL:h5) with and by the monotonicity of , we have

 ϕi≤(1−λ)ϕi−1+λϕ∗+bkiλ2+b0∥ski∥2 \hb@xt@.01(3.6)

for any . Since both and lie in , it follows that

 bk=¯β∥xk−1−x∗∥2≤¯β(diameter of \@fontswitchL)2:=b2<∞. \hb@xt@.01(3.7)

Step 2 of SpaRSA implies that

 ∥sk∥2≤(ϕRk−1−ϕ(xk))/b1

where . We take and again exploit the monotonicity of to obtain

 ∥ski∥2≤(ϕi−1−ϕi)/b1. \hb@xt@.01(3.8)

Combining (LABEL:phi_i)–(LABEL:s_k_i) gives

 ϕi≤(1−λ)ϕi−1+λϕ∗+b2λ2+b3(ϕi−1−ϕi),b3=b0/b1, \hb@xt@.01(3.9)

for every , The minimum on the right side is attained with the choice

 λ=min{1,ϕi−1−ϕ∗2b2}. \hb@xt@.01(3.10)

As a consequence of Lemma LABEL:phi_k_converge, converges to . Hence, the minimizing also approaches 0 as tends to . Choose large enough that the minimizing is less than 1. It follows from (LABEL:phi_no_sk) that for this minimizing choice of , we have

 ϕi≤ϕi−1−(ϕi−1−ϕ∗)24b2+b3(ϕi−1−ϕi). \hb@xt@.01(3.11)

Define . Subtracting from each side of (LABEL:h6) gives

 ei ≤ ei−1−e2i−1/(4b2)+b3(ei−1−ei) = (1+b3)ei−1−e2i−1/(4b2)−b3ei.

We arrange this to obtain

 ei≤ei−1−b4e2i−1where b4=14b2(1+b3). \hb@xt@.01(3.12)

By (LABEL:h7) , which implies that

 ei≤ei−1−b4ei−1eiorei≤ei−11+b4ei−1.

We form the reciprocal of this last inequality to obtain

 1ei≥1ei−1+b4.

Applying this inequality recursively gives

 1ei≥1ej+(i−j)b4orei≤ej1+(i−j)b4ej,

where is chosen large enough to ensure that the minimizing in (LABEL:lambda_min) is less than 1 for all .

Suppose that with . Since , we have

 ϕ(xk)−ϕ∗≤ei≤ej1+(i−j)b4ej≤ej1−jb4ej+kb4ej/M.

The proof is completed by taking and .

## 4 Convergence estimate for strongly convex functions

In this section we prove that SpaRSA converges R-linearly when is a convex function and satisfies

 ϕ(y)≥ϕ(x∗)+μ∥y−x∗∥2 \hb@xt@.01(4.1)

for all , where . Hence, is a unique minimizer of . For example, if is a strongly convex function, then (LABEL:StrongConvexity) holds.

###### Theorem 4.1

If (A1) and (A2) hold, is convex, satisfies , and for every , then there exist constants and such that

 ϕ(xk)−ϕ∗≤cθk(ϕ(x1)−ϕ∗) \hb@xt@.01(4.2)

for every .

Proof. Let be defined as in (LABEL:phi). We will show that there exist such that

 ϕi−ϕ∗≤γ(ϕi−1−ϕ∗). \hb@xt@.01(4.3)

Let be chosen to satisfy the inequality

 0

We consider 2 cases.

Case 1. .

By (LABEL:s_k_i), we have

 c1(ϕi−1−ϕ∗)≤(ϕi−1−ϕi)/b1.

This can be rearranged to obtain

 ϕi−ϕ∗≤(1−b1c1)(ϕi−1−ϕ∗),

which yields (LABEL:linear_bound).

Case 2. .

We utilize the inequality (LABEL:phi_i) but with different bounds for the and terms. For , we have

 bk:=¯β∥xk−1−x∗∥2 ≤ ¯βμ(ϕ(xk−1)−ϕ∗)≤¯βμ(ϕRk−1−ϕ∗) ≤ ¯βμ(ϕR(i−1)M−ϕ∗)=b5(ϕi−1−ϕ∗),b5=¯βμ.

The first inequality is due to (LABEL:StrongConvexity) and the last inequality is since is monotone decreasing. By the definition of below (LABEL:phi), it follows that and

 bki≤b5(ϕi−1−ϕ∗). \hb@xt@.01(4.5)

Inserting in (LABEL:phi_i) the bound (LABEL:bk) and the Case 2 requirement yields

 ϕi≤(1−λ)ϕi−1+λϕ∗+b5(ϕi−1−ϕ∗)λ2+b0c1(ϕi−1−ϕ∗)

for all . Subtract from each side to obtain

 ei≤[1+b0c1−λ+b5λ2]ei−1 \hb@xt@.01(4.6)

for all .

The which minimizes the coefficient of in (LABEL:e_i) is

 λ=min{1,12b5}.

If the minimizing is 1, then and the minimizing coefficient in (LABEL:e_i) is

 γ=b0c1+b5≤b0c1+1/2<1

since by (LABEL:c1). On the other hand, if the minimizing is less than 1, then and the minimizing coefficient is

 γ=1+b0c1−14b5<1

since by (LABEL:c1). This completes the proof of (LABEL:linear_bound).

For , we have

 ϕ(xk)−ϕ∗≤ei≤γi−1e1≤1γ(γ1/M)k(ϕ(x1)−ϕ∗).

Hence, (LABEL:linear_convergence) holds with and . This completes the proof.

###### Remark 3

The condition when combined with (LABEL:linear_convergence) shows that the iterates converge R-linearly to .

## 5 More general reference function values

The GLL reference function value , defined in (LABEL:GLL), often leads to greater efficiency when , when compared to the monotone choice . In practice, it is found that even more flexibility in the reference function value can further accelerate convergence. In [15] we prove convergence of the nonmonotone gradient projection method whenever the reference function satisfies the following conditions:

• .

• for each .

• infinitely often.

In [15] we provide a specific choice for which satisfies (R1)–(R3) and which gave more rapid convergence than the choice . To satisfy (R3), we could choose an integer and simply set every iterations. Another strategy, closer in spirit to what is used in the numerical experiments, is to choose a decrease parameter and set if . We now give convergence results for SpaRSA whenever the reference function value satisfies (R1)–(R3). In the first convergence result which follows, convexity of is not required.

###### Theorem 5.1

If (A1) and (A2) hold and the reference function value satisfies (R1)(R3), then the iterates of SpaRSA have a subsequence converging to a limit satisfying .

Proof. We first apply Proposition LABEL:StepProposition to show that Step 2 of SpaRSA is fulfilled for some choice of . This requires that we show for each . This holds for by (R1). Also, for , we have . Proceeding by induction, suppose that and for , 2, , . By Proposition LABEL:StepProposition, Step 2 of SpaRSA terminates at a finite and hence,

 ϕ(xk+1)≤ϕRk≤ϕ(x1).

It follows that and . This completes the induction step, and hence, by Proposition LABEL:StepProposition, it follows that in every iteration, Step 2 of SpaRSA is fulfilled for a finite .

By Step 2 of SpaRSA, we have

 ϕ(xk)≤ϕRk−1−σαmin∥sk∥2,

where . In the third paragraph of the proof of Theorem 2.2 in [15], it is shown that when an inequality of this form is satisfied for a reference function value satisfying (R1)–(R3), then

 liminfk→∞∥sk∥=0.

Let denote a strictly increasing sequence with the property that tends to and approaches a limit denoted . That is,

 limi→∞ski=0andlimi→∞xki=¯x.

Since tends to , it follows that also approaches . By the first-order optimality conditions for , we have

 0∈∇f(xki−1)+2αki(xki−xki−1)+∂ψ(xki), \hb@xt@.01(5.1)

where denotes the value of in Step 2 of SpaRSA associated with . Again, by Proposition LABEL:StepProposition, we have the uniform bound . Taking the limit as tends to , it follows from Corollary 24.5.1 in [20] that

 0∈∇f(¯x)+∂ψ(¯x).

This completes the proof.

With a small change in (R3), we obtain either sublinear or linear convergence of the entire iteration sequence.

###### Theorem 5.2

Suppose that (A1) and (A2) hold, is convex, the reference function value satisfies (R1) and (R2), and there is with the property that for each ,

 ϕRj≤ϕmaxjfor some j∈[k,k+L). \hb@xt@.01(5.2)

Then there exist constants and such that

 ϕ(xk)−ϕ∗≤ab+k

for sufficiently large. Moreover, if satisfies the strong convexity condition , then there exists and such that

 ϕ(xk)−ϕ∗≤cθk(ϕ(x1)−ϕ∗)

for every .

Proof. Let , , denote an increasing sequence of integers with the property that for and when . Such a sequence exists since for each and (LABEL:kL) holds. Moreover, . Hence, we have

 ϕRj≤ϕRki≤ϕmaxki,when ki≤j

Let us define

 ϕmax+j=max{ϕ(xj−i:0≤i

Given , choose such that . Since , the set of function values maximized to obtain is contained in the set of function values maximized to obtain and we have

 ϕmaxki≤ϕmax+j. \hb@xt@.01(5.4)

Combining (LABEL:Rmax) and (LABEL:z2) yields for each . In Step 2 of SpaRSA, the iterates are chosen to satisfy the condition

 ϕ(xk+1)≤ϕRk−σα∥xk+1−xk∥2.

It follows that

 ϕ(xk+1)≤ϕmax+k−σα∥xk+1−xk∥2.

Hence, the iterates also satisfy the GLL condition, but with memory of length instead of . By Theorem LABEL:theorem_sublinear, the iterates converge at least sublinearly. Moreover, if the strong convexity condition holds, then the convergence is R-linear by Theorem LABEL:theorem_linear.

## 6 Computational experiments

In this section, we compare the performance of SpaRSA with the GLL reference function value and the BB choice for in SpaRSA, to that of an adaptive implementation based on the reference function value given in the appendix of [15] and a cyclic BB choice for . We call this implementation Adaptive SpaRSA. This adaptive choice for satisfies (R1)–(R3) which ensures convergence in accordance with Theorem LABEL:liminf. By a cyclic choice for the BB parameter (see [8, 9, 13, 19]), we mean that is reused for several iterations. More precisely, for some integer (the cycle length), and for all , the value of at iteration is given by

 (α0)k=αBB(i−1)m+1.

The test problems are associated with applications in the areas of signal processing and image reconstruction. All experiments were carried out on a PC using Matlab 7.6 with a AMD Athlon 64 X2 dual core 3 Ghz processor and 3GB of memory running Windows Vista. Version 2.0 of SpaSRA was obtained from Mário Figueiredo’s webpage (http://www.lx.it.pt/mtf/SpaRSA/). The code was run with default parameters. Adaptive SpaRSA was written in Matlab with the following parameter values

 αmin=10−30,αmax=1030,η=5,σ=10−4,M=10.

The test problems, such as the basis pursuit denoising problem (LABEL:BPDN), involve a parameter . The choice of the cycle length was based on the value of :

 m=1 if τ≥10−2, otherwise m=3.

As approaches zero, the optimization problem becomes more ill conditioned and the convergence speed improves when the cycle length is increased.

The stopping condition for both SpaRSA and Adaptive SpaRSA was

 αk∥xk+1−xk∥∞≤ϵ,

where denotes the final value for in Step 2 of SpaRSA, is the max-norm, and is the error tolerance. This termination condition is suggested by Vandenberghe in [22]. As pointed out earlier, is a stationary point when . For other stopping criteria, see [16] or [24]. In the following tables, “Ax” denotes the number of times that a vector is multiplied by or , “cpu” is the CPU time in seconds, and “Obj” is the objective function value.

### 6.1 ℓ2−ℓ1 problems

We compare the performance of Adaptive SpaRSA with SpaRSA by solving problems of form (LABEL:BPDN) using the randomly generated data introduced in [17, 24]. The matrix is a random