Orthonormal Expansion \ell_{1}-Minimization Algorithms for Compressed Sensing

# Orthonormal Expansion ℓ1-Minimization Algorithms for Compressed Sensing

Zai Yang, Cishen Zhang, Jun Deng, and Wenmiao Lu To appear in IEEE Transactions on Signal Processing. Copyright (c) 2011 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.Z. Yang, J. Deng and W. Lu are with the School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore (e-mail: yang0248@e.ntu.edu.sg; de0001un@e.ntu.edu.sg; wenmiao@ntu.edu.sg). C. Zhang is with the Faculty of Engineering and Industrial Sciences, Swinburne University of Technology, Hawthorn VIC 3122, Australia (e-mail: cishenzhang@swin.edu.au).
###### Abstract

Compressed sensing aims at reconstructing sparse signals from significantly reduced number of samples, and a popular reconstruction approach is -norm minimization. In this correspondence, a method called orthonormal expansion is presented to reformulate the basis pursuit problem for noiseless compressed sensing. Two algorithms are proposed based on convex optimization: one exactly solves the problem and the other is a relaxed version of the first one. The latter can be considered as a modified iterative soft thresholding algorithm and is easy to implement. Numerical simulation shows that, in dealing with noise-free measurements of sparse signals, the relaxed version is accurate, fast and competitive to the recent state-of-the-art algorithms. Its practical application is demonstrated in a more general case where signals of interest are approximately sparse and measurements are contaminated with noise.

Augmented Lagrange multiplier, Compressed sensing, minimization, Orthonormal expansion, Phase transition, Sparsity-undersampling tradeoff.

## I Introduction

Compressed sensing (CS) aims at reconstructing a signal from its significantly reduced measurements with a priori knowledge that the signal is (approximately) sparse [1, 2, 3]. In CS, the signal of interest is acquired by taking measurements of the form

 \emphb=\emphA\emphxo+\emphe,

where is the sampling matrix, is the vector of our measurements or samples, is the measurement noise, with and being sizes of the signal and acquired samples, respectively. A standard approach to reconstructing the original signal is to solve

which is known as the basis pursuit denoising (BPDN) problem, with . Another frequently discussed approach is to solve the problem in its Lagrangian form

It follows from the knowledge of convex optimization that and are equivalent with appropriate choices of and . In general, decreases as decreases. In the limiting case of , both and converges to the following basis pursuit (BP) problem in noiseless CS:

 (BP)min\emphx∥∥\emphx∥∥1, subject to \emphA\emphx=\emphb.

It is important to develop accurate and computationally efficient algorithms to deal with the minimization problems of high dimensional signals in CS, such as an image of pixels. One popular approach for solving is the iterative soft thresholding (IST) algorithm of the form (stating from ) [4, 5]

 \emphxt+1=Sλ(\emphxt+\emphA′\emphzt),\emphzt=\emphb−\emphA\emphxt, (1)

where denotes the transpose operator and is the soft thresholding operator with a threshold , which will be defined more precisely in Section II-A. IST has a concise form and is easy to implement, but its convergence can be very slow in some cases [5], especially for small . Other algorithms proposed to solve include interior-point method [6], conjugate gradient method[7] and fixed-point continuation[8]. Few algorithms can accurately solve large-scale BPDN problem with a low computational complexity. The -magic package[9] includes a primal log barrier code solving , in which the conjugate gradient method may not find a precise Newton step in the large-scale mode. NESTA[10] approximately solves based on Nesterov’s smoothing technique[11], with continuation.

In the case of strictly sparse signals and noise-free measurements, many fast algorithms have been proposed to exactly reconstruct . One class of algorithms uses the greedy pursuit method, which iteratively refines the support and entries of a sparse solution to yield a better approximation of , such as OMP[12], StOMP[13] and CoSaMP[14]. These algorithms, however, may not produce satisfactory sparsity-undersampling tradeoff compared with because of their greedy operations. As mentioned before, is equivalent to as . Hence, can be solved with high accuracy using algorithms for by setting to a small value, e.g. . IST has attracted much attention because of its simple form. In the case where is small, however, the standard IST in (1) can be very slow. To improve its speed, a fixed-point continuation (FPC) strategy is exploited [8], in which is decreased in a continuation scheme and a -linear convergence rate is achieved. Further, FPC-AS [15] is developed to improve the performance of FPC by introducing an active set, inspired by greedy pursuit algorithms. An alternative approach to improving the speed of IST is to use an aggressive continuation, which takes the form

 \emphxt+1=Sλt(\emphxt+\emphA′\emphzt),\emphzt=\emphb−\emphA\emphxt, (2)

where may decrease in each iteration. The algorithm of this form typically has a worse sparsity-undersampling tradeoff than [16]. Such a disadvantage is partially overcome by approximately message passing (AMP) algorithm[17], in which a modification is introduced in the current residual :

 \emphxt+1=Sλt(\emphxt+\emphA′\emphzt),\emphzt=\emphb−\emphA\emphxt+N∥∥\emphxt∥∥0n\emphzt−1, (3)

where counts the number of nonzero entries of x. It is noted that AMP having the same spasity-undersampling tradeoff as is only established based on heuristic arguments and numerical simulations. Moreover, it cannot be easily extended to deal with more general complex-valued sparse signals, though real-valued signals are only considered in this correspondence.

This correspondence focuses on solving the basis pursuit problem in noiseless CS. We assume that is an identity matrix, i.e., the rows of A are orthonormal. This is reasonable since most fast transforms in CS are of this form, such as discrete cosine transform (DCT), discrete Fourier transform (DFT) and some wavelet transforms, e.g. Haar wavelet transform. Such an assumption has also been used in other algorithms, e.g. NESTA. A novel method called orthonormal expansion is introduced to reformulate based on this assumption. The exact OrthoNormal Expansion minimization (eONE-L1) algorithm is then proposed to exactly solve based on the augmented Lagrange multiplier (ALM) method, which is a convex optimization method.

The relaxed ONE-L1 (rONE-L1) algorithm is further developed to simplify eONE-L1. It is shown that rONE-L1 converges at least exponentially and is in the form of modified IST in (2). In the case of strictly sparse signals and noise-free measurements, numerical simulations show that rONE-L1 has the same sparsity-undersampling tradeoff as does. Under the same conditions, rONE-L1 is compared with state-of-the-art algorithms, including FPC-AS, AMP and NESTA. It is shown that rONE-L1 is faster than AMP and NESTA when the number of measurements is just enough to exactly reconstruct the original sparse signal using minimization. In a general case of approximately sparse signals and noise-contaminated measurements, where AMP is omitted for its poor performance, an example of 2D image reconstruction from its partial-DCT measurements demonstrates that rONE-L1 outperforms NESTA and FPC-AS in terms of computational efficiency and reconstruction quality, respectively.

The rest of the correspondence is organized as follows. Section II introduces the proposed exact and relaxed ONE-L1 algorithms followed by their implementation details. Section III reports the efficiency of our algorithm through numerical simulations in both noise-free and noise-contaminated cases. Conclusions are drawn in Section IV.

## Ii ONE-L1 Algorithms

### Ii-a Preliminary: Soft Thresholding Operator

For , the soft thresholding of with a threshold is defined as:

 Sλ(w)=sgn(w)⋅(|w|−λ)+,

where and

 sgn(w)={w/|w|,w≠0;0,w=0.

The operator can be extended to vector variables by its element-wise operation.

The soft thresholding operator can be applied to solve the following -norm regularized least square problem [4], i.e.,

 Sλ(\emphw)=argmin\emphv{λ∥∥\emphv∥∥1+12∥∥\emphw−\emphv∥∥22}. (4)

where , and is the unique minimizer.

### Ii-B Problem Description

Consider the -minimization problem with the sampling matrix A satisfying that , where I is an identity matrix. We call that A is a partially orthonormal matrix hereafter as its rows are usually randomly selected from an orthonormal matrix in practice, e.g. partial-DCT matrix. Hence, there exists another partially orthonormal matrix , whose rows are orthogonal to those of A, such that is orthonormal. Let . The problem is then equivalent to

 (BPo)min\emphx,\emphp,Γ(\emphp)=\emphb∥∥\emphx∥∥1, subject to Φ\emphx=\emphp,

where is an operator projecting the vector p onto its first entries.

In , the sampling matrix A is expanded into an orthonormal matrix . It corresponds to the scenario where the full sampling is carried out with the sampling matrix and p is the vector containing all measurements. Note that only b, as a part of p, is actually observed. To expand the sampling matrix A into an orthonormal matrix is a key step to show that the ALM method exactly solves and, hence, . The next subsection describes the proposed algorithm, referred to as orthonormal expansion -minimization.

### Ii-C ALM Based ONE-L1 Algorithms

In this subsection we solve the -minimization problem using the ALM method[18, 19]. The ALM method is similar to the quadratic penalty method except an additional Lagrange multiplier term. Compared with the quadratic penalty method, ALM method has some salient properties, e.g. the ease of parameter tuning and the convergence speed. The augmented Lagrangian function is

 L(\emphx,\emphp,\emphy,μ)=∥∥\emphx∥∥1+⟨\emphp−Φ\emphx,\emphy⟩+μ2∥∥\emphp−Φ\emphx∥∥22, (5)

where Lagrange multiplier , and is the inner product of . Eq. (5) can be expressed as follows:

 L(\emphx,\emphp,\emphy,μ)=∥∥\emphx∥∥1+μ2∥∥\emphp−Φ\emphx+μ−1\emphy∥∥22−12μ∥∥\emphy∥∥22. (6)

Subsequently, we have the following optimization problem :

 (SP)min\emphx,\emphp,Γ(\emphp)=\emphbL(\emphx,\emphp,\emphy,μ).

Instead of solving , let us consider the two related problems

 (SP1)min\emphxL(\emphx,\emphp,\emphy,μ),

and

 (SP2)min\emphp,Γ(\emphp)=\emphbL(\emphx,\emphp,\emphy,μ).

Note that problem is similar to the -regularized problem in (4). In general, cannot be directly solved using the soft thresholding operator as that in (4) since there is a matrix product of and x in the term of -norm. However, the soft thresholding operator does apply to the special case where is orthonormal. Given for any , we can apply the soft thresholding to obtain

 Sμ−1(Φ′(\emphp+μ−1\emphy))=argmin\emphxL(\emphx,\emphp,\emphy,μ). (7)

To solve , we let to obtain , i.e.,

 [\emphb¯¯¯¯Γ(Φ\emphx−μ−1\emphy)]=argmin\emphp,Γ(\emphp)=bL(\emphx,\emphp,\emphy,μ), (8)

where is the operator projecting the variable to its last entries. As a result, an iterative solution of is stated in the following Lemma 1.

###### Lemma 1

For fixed y and , the iterative algorithm given by

 \emphxj+1=Sμ−1(Φ′(\emphpj+μ−1\emphy)), (9) (10)

converges to an optimal solution of .

{proof}

Denote as , for simplicity. By the optimality and uniqueness of and , we have and the equality holds if and only if . Hence, the sequence is bounded and converges to a constant , i.e., as . Since the sequence is also bounded by , there exists a sub-sequence such that as , where is an accumulation point of . Correspondingly, , and moreover, .

We now use contradiction to show that is a fixed point of the algorithm. Suppose that there exist such that and , and . By (9), (10) and [4, Lemma 2.2], we have , i.e., , as . Meanwhile, . Hence, , which contradicts , resulting in that is a fixed point. Moreover, it follows from for any positive integer , that , as .

Note that orthonormal matrix and . We can obtain

 (11)

Meanwhile, is equivalent to

 min\emphxL(\emphx,[\emphb¯¯¯¯Γ(Φ\emphx−μ−1\emphy)])=∥∥\emphx∥∥1+μ2∥∥\emphA\emphx−\emphb−μ−1Γ(\emphy)∥∥22−12μ∥∥\emphy∥∥22. (12)

By [4, Proposition 3.10], is an optimal solution of the problem in (12) and equivalently, is an optimal solution of .

###### Remark 1
• Lemma 1 shows that to solve problem is equivalent to solve, iteratively, problems and .

• Reference [4, Proposition 3.10] only deals with the special case and it is, in fact, straightforward to extend the result to arbitrary A.

Following from the framework of the ALM method [19] and Lemma 1, the ALM based ONE-L1 algorithm is outlined in Algorithm 1, where is the optimal solution to in the th iteration and is the corresponding Lagrange multiplier.

Algorithm 1: Exact ONE-L1 Algorithm via ALM Method
Input: Expanded orthonormal matrix and observed sample data b.
1. ; ; ; ; .
2. while not converged do
3.  Lines 4-9 solve ;
4. , , ;
5.  while not converged do
6.   ;
7.   ;
8.   set ;
9.  end while
10. ;
11. choose ;
12. set
13. end while
Output: .

The convergence of Algorithm 1 is stated in the following theorem.

###### Theorem 1

Any accumulation point of sequence of Algorithm 1 is an optimal solution of and the convergence rate with respect to the outer iteration loop index is at least in the sense that

where .

{proof}

We first show that the sequence is bounded. By the optimality of we have

 0 ∈ ∂\emphxL(\emphx∗t+1,\emphp∗t+1,\emphy∗t,μt)=∂∥∥\emphx∗t+1∥∥1−Φ′\emphy∗t+1, 0 = ∂¯¯¯Γ(\emphp)L(\emphx∗t+1,\emphp∗t+1,\emphy∗t,μt)=¯¯¯¯Γ(\emphy∗t+1),

where denotes the partial differential operator with respect to x resulting in a set of subgradients. Hence, . It follows that and is bounded. By ,

 ∥∥\emphx∗t+1∥∥1=L(\emphx∗t+1,\emphp∗t+1,\emphy∗t,μt)−12μt(∥∥\emphy∗t+1∥∥22−∥∥\emphy∗t∥∥22)≤\emphx†−12μt(∥∥\emphy∗t+1∥∥22−∥∥\emphy∗t∥∥22).

By is bounded,

 ∥∥\emphx∗t+1∥∥1≤\emphx†+O(μ−1t). (13)

For any accumulation point of , without loss of generality, we have as . Hence, . In the mean time, and result in that is an optimal solution to .

Moreover, by and

 \emphx†=minΦ\emphx=\emphp,Γ(\emphp)=\emphb∥∥\emphx∥∥1=min\emphp,Γ(\emphp)=\emphb∥∥Φ′\emphp∥∥1≤∥∥Φ′\emphp∗t+1∥∥1,

we have , which establishes the theorem with (13).

Algorithm 1 contains, respectively, an inner and an outer iteration loops. Theorem 1 presents only the convergence rate of the outer loop. A natural way to speed up Algorithm 1 is to terminate the inner loop without convergence and use the obtained inner-loop solution as the initialization for the next iteration. This is similar to a continuation strategy and can be realized with reasonably set precision and step size [19]. When the continuation parameter increases very slowly, in a few iterations, the inner loop can produce a solution with high accuracy. In particular, for the purpose of fast and simple computing, we may update the variables in the inner loop only once before stepping into the outer loop operation. This results in a relaxed version of exact ONE-L1 algorithm (eONE-L1), namely relaxed ONE-L1 algorithm (rONE-L1) outlined in Algorithm 2.

Algorithm 2: Relaxed ONE-L1 Algorithm
Input: Expanded orthonormal matrix and observed sample data b.
1. ; ; ; ; .
2. while not converged do
3. ;
4. ;
5. ;
6. choose ;
7. set
8. end while
Output: .
###### Theorem 2

The iterative solution of Algorithm 2 converges to a feasible solution of if . It converges at least exponentially to if is an exponentially increasing sequence.

{proof}

We show first that sequences and are bounded, where . By the optimality of and we have

 0 ∈ ∂\emphxL(\emphxt+1,\emphpt,\emphyt,μt)=∂∥∥\emphxt+1∥∥1−Φ′^\emphyt+1, 0 = ∂¯¯¯Γ(\emphp)L(\emphxt+1,\emphpt+1,\emphyt,μt)=¯¯¯¯Γ(\emphyt+1).

Hence, and it follows that is bounded. Since , we obtain . This together with results in and the boundedness of . By , we have with being a constant. Then is a Cauchy sequence if , resulting in as . In the mean time, , . Hence, is a feasible solution of . Suppose that is an exponentially increasing sequence, i.e., with . By the boundedness of and we have

 ∥∥\emphpt−\emphpf∥∥2=∥∥ ∥∥+∞∑i=t(\emphpi−\emphpi+1)∥∥ ∥∥2≤+∞∑i=t∥∥\emphpi−\emphpi+1∥∥2≤Cμ−1t+∞∑i=0r−i=O(μ−1t).

Hence, converges at least exponentially to since exponentially converges to , and the same result holds for .

###### Remark 2

It is shown in Theorem 2 that faster growth of can result in faster convergence of . Intuitively, the reduced number of iterations for the inner loop problem may result in some error from the optimal solution of the inner loop. This will likely affect the accuracy of the final solution for . Therefore, the growth speed of provides a tradeoff between the convergence speed of the algorithm and the precision of the final solution, which will be illustrated in Section III through numerical simulations.

### Ii-D Relationship Between rONE-L1 and IST

The studies and applications of IST type algorithms have been very active in recent years because of their concise presentations. This subsection considers the relationship between rONE-L1 and IST. Note that in Algorithm 2 and . After some derivations, it can be shown that the rONE-L1 algorithm is equivalent to the following iteration (starting from , as , and , as ):

 \emphxt+1=Sλt(\emphxt+\emphA′\emphzt),\emphzt=\emphb−\emphA[(1+κt)\emphxt−κt\emphxt−1]+κt\emphzt−1, (14)

where and . Compared with the general form of IST in (2), one more term is added when computing the current residual in rONE-L1. Moreover, a weighted sum is used instead of the current solution . It will be shown later that these two changes essentially improve the sparsity-undersampling tradeoff.

###### Remark 3

Equations in (14) show that the expansion from the partially orthonormal matrix A to orthonormal is not at all involved in the actual implementation and computation of rONE-L1. The same claim also holds for eONE-L1 algorithm. Nevertheless, the orthonormal expansion is a key instrumentation in the derivation and analysis of Algorithms 1 and 2.

### Ii-E Implementation Details

As noted in Remark 3, the expansion from A to is not involved in the computing of ONE-L1 algorithms. In our implementations, we consider using exponentially increasing , i.e., fixing and . Let be an -quantile operator and , with applying to the vector variable elementwise, being the threshold in the first iteration and . In eONE-L1, a large can speed up the convergence of the outer loop iteration according to Theorem 1. However, simulations show that a larger can result in more iterations in the inner loop. We use as default. In rONE-L1, the value of provides a tradeoff between the convergence speed of the algorithm and the precision of the final solution. Our recommendation of to achieve the optimal sparsity-undersampling tradeoff is , which will be illustrated in Section III-A.

An iterative algorithm needs a termination criterion. The eONE-L1 algorithm is considered converged if with being a user-defined tolerance. The inner iteration is considered converged if . In our implementation, the default values are . The rONE-L1 algorithm is considered converged if , with as default.

## Iii Numerical Simulations

This subsection considers the sparsity-undersampling tradeoff of rONE-L1 in the case of strictly sparse signals and noise-free measurements. Phase transition is a measure of the sparsity-undersampling tradeoff in this case. Let the sampling ratio be and the sparsity ratio be , where is a measure of sparsity of x, and we call that x is -sparse if at most entries of x are nonzero. As with fixed and , the behavior of the phase transition of is controlled by [20, 21]. We denote this theoretical curve by , which is plotted in Fig 1.

We estimate the phase transition of rONE-L1 using a Monte Carlo method as in [22, 17]. Two matrix ensembles are considered, including Gaussian with and partial-DCT with . Here the finite- phase transition is defined as the value of at which the success probability to recover the original signal is . We consider equispaced values of in . For each , equispaced values of are generated in the interval . Then random problem instances are generated and solved with respect to each combination of , where , , and nonzero entries of sparse signals are generated from the standard Gaussian distribution. Success is declared if the relative root mean squared error (relative RMSE) , where is the recovered signal. The number of success among experiments is recorded. Finally, a generalized linear model is used to estimate the phase transition as in [22].

The experiment result is presented in Fig. 1. The observed phase transitions using the recommended value of strongly agree with the theoretical result of . It shows that the rONE-L1 algorithm has the optimal sparsity-undersampling tradeoff in the sense of minimization.

### Iii-B Comparison with IST

The rONE-L1 algorithm can be considered as a modified version of IST in (2). In this subsection we compare the sparsity-undersampling tradeoff and speed of these two algorithms. A similar method is adopted to estimate the phase transition of IST, which is implemented using the same parameter values as rONE-L1. Only nine values of in are considered with the partial-DCT matrix ensemble for time consideration. Another choice of is considered besides the recommended one. Correspondingly, the phase transition of rONE-L1 with is also estimated.

The observed phase transitions are shown in Fig. 1. As a modified version of IST, obviously, rONE-L1 makes a great improvement over IST in the sparsity-undersampling tradeoff. Meanwhile, comparison of the averaged number of iterations of the two algorithms shows that rONE-L1 is also faster than IST. For example, as and the recommended is used, rONE-L1 is about 6 times faster than IST.

### Iii-C Comparison with AMP, FPC-AS and NESTA in Noise-free Case

In this subsection, we report numerical simulation results comparing rONE-L1 with state-of-the-art algorithms, including AMP, FPC-AS and NESTA, in the case of sparse signals and noise-free measurements. Our experiments used FPC-AS v.1.21, NESTA v.1.1, and AMP codes provided by the author. We choose parameter values for FPC-AS and NESTA such that each method produces a solution with approximately the same precision as that produced by rONE-L1. In our experiments we consider the recovery of exactly sparse signals from partial-DCT measurements. We set and , and an ‘easy’ case where and a ‘hard’ case where are considered, respectively. 111Here ‘easy’ and ‘hard’ refer to the difficulty degree in recovering a sparse signal from a specific number of measurements. The setting is close to the phase transition of . Twenty random problems are created and solved for each algorithm with each combination of , and the minimum, maximum and averaged relative RMSE, number of calls of A and , and CPU time usages are recorded. All experiments are carried on Matlab v.7.7.0 on a PC with a Windows XP system and a 3GHz CPU. Default parameter values are used in eONE-L1 and rONE-L1.

AMP: terminating if .

FPC-AS: and , where is the termination criterion on the maximum norm of sub-gradient. FPC-AS solves the problem .

NESTA: , and the termination criterion . NESTA solves using the Nesterov algorithm [11], with continuation.

Our experiment results are presented in Table