Iterative $\ell_1$ Minimization for Nonconvex Compressed Sensing
Abstract
An algorithmic framework, based on the difference of convex functions algorithm (DCA), is proposed for minimizing a class of concave sparse metrics for compressed sensing problems. The resulting algorithm iterates a sequence of $\ell_1$ minimization problems. An exact sparse recovery theory is established to show that the proposed framework always improves on basis pursuit ($\ell_1$ minimization) and inherits robustness from it. Numerical examples on success rates of sparse solution recovery illustrate further that, unlike most existing nonconvex compressed sensing solvers in the literature, our method always outperforms basis pursuit, no matter how ill-conditioned the measurement matrix is. Moreover, the iterative $\ell_1$ (IL$_1$) algorithm leads, by a wide margin, the state-of-the-art algorithms on $\ell_p$ and logarithmic minimizations in the strongly coherent (highly ill-conditioned) regime, despite the same objective functions. Last but not least, in the application of magnetic resonance imaging (MRI), the IL$_1$ algorithm easily recovers the phantom image with just 7 line projections.
1 Introduction
Compressed sensing (CS) techniques [8, 5, 6, 17] enable efficient reconstruction of a sparse signal from far fewer linear measurements than its physical dimension. Mathematically, CS aims to recover an $n$-dimensional vector $\bar{x}$ with few nonzero components from an underdetermined linear system $Ax = b$ of just $m \ll n$ equations, where $A \in \mathbb{R}^{m \times n}$ is a known measurement matrix. The first CS technique is the convex $\ell_1$ minimization, or the so-called basis pursuit [15]:
(1) $\displaystyle \min_x \; \|x\|_1 \quad \text{subject to} \quad Ax = b.$
Breakthrough results [8] have established that when the matrix $A$ satisfies a certain restricted isometry property (RIP), the solution to (1) is exactly $\bar{x}$. It was shown that, with overwhelming probability, several random ensembles, such as random Gaussian, random Bernoulli, and random partial Fourier matrices, are of RIP type [8, 13, 32]. Note that (1) is just a minimization principle rather than an algorithm for retrieving $\bar{x}$. Algorithms for solving (1) and its associated $\ell_1$-regularized problem [36]:
(2) $\displaystyle \min_x \; \lambda \|x\|_1 + \frac{1}{2}\|Ax - b\|_2^2$
include Bregman methods [43, 24], alternating direction algorithms [3, 40, 18], iterative thresholding methods [14, 1] among others [25].
Inspired by the success of basis pursuit, researchers then began to investigate various nonconvex CS models and algorithms. More and more empirical studies have shown that nonconvex CS methods usually outperform basis pursuit when the matrix $A$ is RIP-like, in the sense that they require fewer linear measurements to reconstruct the signals of interest. Instead of minimizing the $\ell_1$ norm, it is natural to consider minimization of nonconvex (concave) sparse metrics, for instance, the $\ell_p$ (quasi-)norm ($0 < p < 1$) [11, 12, 27], capped $\ell_1$ [45, 30], and transformed $\ell_1$ [28, 44]. Another category of CS methods, $\ell_0$ minimization in spirit, relies on support detection of $\bar{x}$. To name a few, there are orthogonal matching pursuit (OMP) [37], iterative hard thresholding (IHT) [2], the (re)weighted $\ell_1$ scheme [7], iterative support detection (ISD) [38], and their variations [31, 46, 26].
On the other hand, it has been proved that even if $A$ is not RIP-like and contains highly correlated columns, basis pursuit still enables sparse recovery under certain conditions on $\bar{x}$ involving its support [4]. In this scenario, however, most of the existing nonconvex CS methods are not that robust to the conditioning of $A$, as suggested by [41]. Their success rates drop as the columns of $A$ become more and more correlated. In [41], based on the difference of convex functions algorithm (DCA) [34, 35], the authors propose DCA-$\ell_{1-2}$ for minimizing the difference of the $\ell_1$ and $\ell_2$ norms [19, 42]. Extensive numerical experiments [41, 29, 30] imply that the DCA-$\ell_{1-2}$ algorithm consistently outperforms $\ell_1$ minimization, irrespective of the conditioning of $A$.
Stimulated by the empirical evidence found in [41, 29, 30], we propose a general DCA-based CS framework for the minimization of a class of concave sparse metrics. More precisely, we consider the reconstruction of a sparse vector $\bar{x}$ by minimizing sparsity-promoting metrics:
(3) $\displaystyle \min_x \; J(x) := \sum_{i=1}^n p(|x_i|) \quad \text{subject to} \quad Ax = b.$
Throughout the paper, we assume that $J$ always takes the form $J(x) = \sum_{i=1}^n p(|x_i|)$ unless otherwise stated, where $p$ defined on $[0, \infty)$ satisfies:

- $p$ is concave and increasing with $p(0) = 0$;
- $p$ is continuous with finite right derivative at the origin, $0 < p'(0^+) < \infty$.
The first condition encourages zeros in $x$ rather than small entries, since $p$ changes rapidly around the origin; the second one is imposed for the good of the proposed algorithm, as will be seen later. A number of sparse metrics in the literature enjoy the above properties, including smoothly clipped absolute deviation (SCAD) [20], capped $\ell_1$, transformed $\ell_1$, and of course $\ell_1$ itself. Although the $\ell_p$ ($0 < p < 1$) and logarithm functionals do not meet the second condition, their smoothed versions $(t + \epsilon)^p$ and $\log(1 + t/\epsilon)$ are differentiable at zero. These properties will be essential in the algorithm design as well as in the proofs of the main results.
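To make the two conditions concrete, the following sketch evaluates several of the metrics just mentioned and numerically checks monotonicity, midpoint concavity, and finiteness of the right derivative at zero. The parameter values ($a$, $\epsilon$, $p$) are illustrative choices, not the paper's settings.

```python
import numpy as np

# Concrete examples of admissible metrics p(t) on t >= 0; the parameter
# values (a, eps, p) are illustrative choices, not the paper's settings.

def capped_l1(t, a=1.0):
    """Capped l1: p(t) = min(t, a); p'(0+) = 1."""
    return np.minimum(t, a)

def transformed_l1(t, a=1.0):
    """Transformed l1: p(t) = (a+1)t/(a+t); p'(0+) = (a+1)/a."""
    return (a + 1.0) * t / (a + t)

def smoothed_log(t, eps=0.1):
    """Smoothed log: p(t) = log(1 + t/eps); p'(0+) = 1/eps."""
    return np.log1p(t / eps)

def smoothed_lp(t, p=0.5, eps=0.1):
    """Smoothed lp: p(t) = (t + eps)^p - eps^p; p'(0+) = p*eps^(p-1)."""
    return (t + eps) ** p - eps ** p

# Numerically check: increasing, midpoint-concave, finite p'(0+) > 0.
t = np.linspace(0.0, 5.0, 1001)
for f in (capped_l1, transformed_l1, smoothed_log, smoothed_lp):
    vals = f(t)
    assert np.all(np.diff(vals) >= -1e-12)                      # increasing
    mid = f((t[:-1] + t[1:]) / 2.0)
    assert np.all(mid >= (vals[:-1] + vals[1:]) / 2.0 - 1e-12)  # concave
    d0 = (f(1e-8) - f(0.0)) / 1e-8
    assert np.isfinite(d0) and d0 > 0.0                         # finite p'(0+)
```

Note how each function satisfies $p(0) = 0$, so the difference quotient at the origin directly approximates $p'(0^+)$.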
Our proposed algorithm calls for solving a sequence of $\ell_1$ minimization subproblems. The objective of each subproblem is $\|x\|_1$ plus a linear term, which is convex and tractable. We further validate the robustness of this framework by showing, theoretically and numerically, that it performs at least as well as basis pursuit in terms of uniform sparse recovery, independent of the conditioning of $A$ and the choice of sparse metric.
The paper is organized as follows. In section 2, we review the RIP and coherence of sensing matrices, as well as the descent property of DCA. In section 3, we present the iterative $\ell_1$ framework for nonconvex minimization, with worked-out examples on representative sparse objectives including the total variation. In section 4, we prove the main exact recovery results based on the unique recovery property of $\ell_1$ minimization instead of the RIP, which forms a theoretical basis for the better performance over DCA. In section 5, we compare the iterative $\ell_1$ algorithms with two state-of-the-art nonconvex CS algorithms, IRLS-$\ell_p$ [27] and IRL1 [7], and with ADMM-lasso, on CS test problems with varying degrees of coherence. We find that iterative $\ell_1$ outperforms ADMM-lasso independent of the sensing matrix coherence, and leads IRLS-$\ell_p$ [27] and IRL1 [7] in the highly coherent regime. This is consistent with earlier findings on the DCA-$\ell_{1-2}$ algorithm [41, 29, 30], to which our theory also applies. We also evaluate these two nonconvex metrics on a two-dimensional example of reconstructing an MRI image from a small number of projections; our iterative $\ell_1$ algorithm succeeds with 7 projections for both metrics, whereas, using the same objective functions, the state-of-the-art algorithms need at least 10 projections. Concluding remarks are in section 6.
Notations. Let us fix some notations. For any $x, y \in \mathbb{R}^n$, $\langle x, y\rangle$ is their inner product. $\mathbf{0}$ is the vector of zeros, and similarly $\mathbf{1}$ is the vector of ones. $x \odot y$ is the Hadamard (entrywise) product, meaning that $(x \odot y)_i = x_i y_i$. $I_n$ is the identity matrix of dimension $n$. For any convex function $f$ on $\mathbb{R}^n$, $\partial f(x)$ denotes a subgradient of $f$ at $x$. $\mathrm{sgn}$ is the signum function on $\mathbb{R}$ defined as
$$\mathrm{sgn}(t) = \begin{cases} 1 & t > 0, \\ 0 & t = 0, \\ -1 & t < 0. \end{cases}$$
For any index set $T \subseteq \{1, \dots, n\}$, the restriction $x_T$ of $x$ to $T$ is given by
$$(x_T)_i = \begin{cases} x_i & i \in T, \\ 0 & i \notin T. \end{cases}$$
2 Preliminaries
The best-known CS concept of the past decade is the restricted isometry property (RIP) introduced by Candès and Tao [8], which is used to characterize matrices that are nearly orthonormal.
Definition 2.1.
For each integer $s \ge 1$, the restricted isometry constant $\delta_s$ of $A$ is the smallest $\delta > 0$ such that
$$(1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2$$
for all $x$ with sparsity at most $s$. The matrix $A$ is said to satisfy the RIP with constant $\delta_s$.
Mutual coherence [15] is another commonly used concept closely related to the success of the CS task.
Definition 2.2.
The coherence of a matrix $A = [a_1, \dots, a_n]$ is the maximum absolute value of the cross-correlations between the columns of $A$, namely
$$\mu(A) = \max_{i \ne j} \frac{|\langle a_i, a_j\rangle|}{\|a_i\|_2 \|a_j\|_2}.$$
When a matrix $A$ has small mutual coherence (is incoherent) or a small RIP constant, its columns tend to be more separated or distinguishable, which is intuitively favorable to identification of the support of the target signal. On the other hand, a highly coherent matrix with large coherence poses a challenge to the reconstruction.
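As a quick numerical reading of Definition 2.2, the sketch below computes the mutual coherence by normalizing columns and taking the largest off-diagonal entry of the Gram matrix in absolute value.

```python
import numpy as np

# Mutual coherence mu(A): the largest absolute cross-correlation between
# the (normalized) columns of A, per Definition 2.2.

def coherence(A):
    An = A / np.linalg.norm(A, axis=0, keepdims=True)  # normalize columns
    G = np.abs(An.T @ An)                              # Gram matrix
    np.fill_diagonal(G, 0.0)                           # ignore i == j
    return G.max()

# An orthonormal matrix has coherence 0; duplicated columns give coherence 1.
I = np.eye(4)
assert coherence(I) == 0.0
B = np.column_stack([I[:, 0], I[:, 0], I[:, 1]])
assert abs(coherence(B) - 1.0) < 1e-12
```

The two extreme cases at the end mirror the discussion above: incoherent columns are maximally distinguishable, while identical columns make support identification impossible.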
Next we give a brief review of the difference of convex functions algorithm (DCA). DCA has been widely applied to sparse optimization problems in several works [41, 19, 23, 28, 30]. For an objective function $f(x) = g(x) - h(x)$ on $\mathbb{R}^n$, where $g$ and $h$ are lower semicontinuous proper convex functions, we call $g - h$ a DC decomposition of $f$.
DCA takes the following form:
$$x^{k+1} \in \arg\min_x \; g(x) - \big(h(x^k) + \langle y^k, x - x^k\rangle\big), \qquad y^k \in \partial h(x^k).$$
Since $y^k \in \partial h(x^k)$, by the definition of the subgradient, we have
$$h(x) \ge h(x^k) + \langle y^k, x - x^k\rangle \quad \text{for all } x.$$
Consequently,
$$f(x^{k+1}) = g(x^{k+1}) - h(x^{k+1}) \le g(x^{k+1}) - h(x^k) - \langle y^k, x^{k+1} - x^k\rangle \le g(x^k) - h(x^k) = f(x^k).$$
The subgradient inequality was used in the first inequality above, whereas the fact that $x^{k+1}$ minimizes the linearized objective was used in the second. Therefore, DCA produces a decreasing sequence $\{f(x^k)\}$, leading to its convergence provided $f$ is bounded from below.
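The descent chain above can be observed on a toy one-dimensional DC function; the sketch below (our illustrative example, not from the paper) runs DCA on $f(x) = x^2 - |x|$, where the linearized subproblem has a closed-form minimizer.

```python
import numpy as np

# Toy illustration of DCA's descent property on the 1-D DC function
# f = g - h with g(x) = x^2 and h(x) = |x|. The DCA step minimizes
# g(x) - (h(x_k) + y_k (x - x_k)) with y_k a subgradient of h at x_k,
# which here has the closed form x_{k+1} = y_k / 2.

def dca_1d(x0, iters=20):
    f = lambda x: x * x - abs(x)
    traj = [x0]
    x = x0
    for _ in range(iters):
        y = np.sign(x)          # y in the subdifferential of |.| at x
        x = y / 2.0             # argmin_x  x^2 - y * x
        traj.append(x)
    return traj, [f(t) for t in traj]

traj, fvals = dca_1d(x0=2.0)
# The objective values are monotonically non-increasing, as derived above.
assert all(fvals[k + 1] <= fvals[k] + 1e-12 for k in range(len(fvals) - 1))
# Here DCA lands on the stationary point x = 1/2 of x^2 - x on x > 0.
assert abs(traj[-1] - 0.5) < 1e-12
```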
3 Iterative $\ell_1$ framework
Our proposed iterative $\ell_1$ framework for solving (3) is built on $\ell_1$ minimization and DCA. Note that (3) can be equivalently written as the unconstrained problem
$$\min_x \; \sum_{i=1}^n p(|x_i|) + \chi_{\{x:\, Ax = b\}}(x),$$
where $\chi_S$ denotes the indicator function of the convex set $S$ ($0$ on $S$ and $+\infty$ off it). We then rewrite the above objective in DC decomposition form:
$$\Big(p'(0^+)\|x\|_1 + \chi_{\{x:\, Ax = b\}}(x)\Big) - \Big(p'(0^+)\|x\|_1 - \sum_{i=1}^n p(|x_i|)\Big).$$
Clearly the first term on the right-hand side is convex in $x$. We show below that the second term is also a convex function.
Proposition 3.1.
$G(x) := p'(0^+)\|x\|_1 - \sum_{i=1}^n p(|x_i|)$ is convex in $x$.
Proof.
For notational convenience, define $q(t) := p'(0^+)|t| - p(|t|)$ on $\mathbb{R}$. Since $p$ is concave on $[0, \infty)$, we have that $q$ is convex on $[0, \infty)$. We only need to show that $q$ is convex on all of $\mathbb{R}$, or equivalently, that for all $u, v \in \mathbb{R}$ and $\alpha \in (0, 1)$,
$$q(\alpha u + (1 - \alpha)v) \le \alpha\, q(u) + (1 - \alpha)\, q(v).$$
Case 1. $u$ and $v$ have the same sign, or one of them is $0$. Since $q(t) = q(|t|)$ and $q$ is convex on $[0, \infty)$, the above inequality holds.
Case 2. $u$ and $v$ are of opposite signs. By the concavity of $p$ on $[0, \infty)$ and $p(0) = 0$, we have $p(t) \le p'(0^+)\, t$ for all $t \ge 0$; that is, $q(t) \ge 0 = q(0)$ for all $t$. Without loss of generality, we suppose $u < 0 < v$ and $w := \alpha u + (1 - \alpha)v \ge 0$. Then $0 \le w \le (1 - \alpha)v$, and
$$q(w) = q\Big(\tfrac{w}{v}\, v + \big(1 - \tfrac{w}{v}\big)\cdot 0\Big) \le \tfrac{w}{v}\, q(v) \le (1 - \alpha)\, q(v) \le \alpha\, q(u) + (1 - \alpha)\, q(v).$$
In the first inequality above, we used the convexity of $q$ on $[0, \infty)$ together with $q(0) = 0$, whereas in the last one, we used the fact that $q(u) \ge 0$. ∎
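Proposition 3.1 can also be sanity-checked numerically. The snippet below (with transformed $\ell_1$ as the example penalty and $a = 1$ as our parameter choice) verifies the midpoint convexity inequality for $q(t) = p'(0^+)|t| - p(|t|)$ on a grid, along with the property $q \ge 0 = q(0)$ used in Case 2.

```python
import numpy as np

# Numerical check of Proposition 3.1 for the transformed l1 penalty
# p(t) = (a+1)t/(a+t) with a = 1 (an illustrative parameter choice):
# q(t) = p'(0+)|t| - p(|t|) should satisfy the midpoint convexity
# inequality on all of R, and q(t) >= 0 = q(0).

a = 1.0
p = lambda t: (a + 1.0) * t / (a + t)
dp0 = (a + 1.0) / a                       # p'(0+)
q = lambda t: dp0 * np.abs(t) - p(np.abs(t))

t = np.linspace(-5.0, 5.0, 2001)
u, v = np.meshgrid(t[::40], t[::40])      # pairs (u, v) over the grid
mid = q((u + v) / 2.0)
assert np.all(mid <= (q(u) + q(v)) / 2.0 + 1e-12)   # midpoint convexity
assert np.all(q(t) >= -1e-12) and float(q(0.0)) == 0.0  # q >= 0 = q(0)
```

For this penalty, $q(t) = 2t^2/(1 + |t|)$ in closed form, which makes the convexity visible by direct differentiation as well.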
At the $n$-th iteration, DCA calls for linearizing the second convex term at the current guess $x^n$, and solving the resulting convex subproblem for $x^{n+1}$. After converting back to the linear constraint and removing the constant term and the factor of $p'(0^+)$, we iterate:
(4) $\displaystyle x^{n+1} = \arg\min_x \; \|x\|_1 - \langle u^n, x\rangle \quad \text{subject to} \quad Ax = b,$
where
$$u_i^n = \mathrm{sgn}(x_i^n)\, \frac{p'(0^+) - p'(|x_i^n|)}{p'(0^+)}, \qquad i = 1, \dots, n.$$
Be aware that $p'(|x_i^n|)$ denotes a subgradient of $p$ at $|x_i^n|$ rather than a subgradient of $J$ at $x^n$. In this way, the subproblem reduces to minimizing $\|x\|_1$ plus a linear term in $x$, which can be efficiently solved by a variety of state-of-the-art algorithms for basis pursuit (with minor modifications). In Table 1, we list some nonconvex metrics and the corresponding iterative $\ell_1$ algorithms.
Table 1. Examples of sparse metrics satisfying the proposed conditions ($a > 0$, $\epsilon > 0$, $0 < p < 1$).

| sparse metric | $p(t)$, $t \ge 0$ | $p'(0^+)$ |
| capped $\ell_1$ | $\min(t, a)$ | $1$ |
| transformed $\ell_1$ | $\dfrac{(a+1)t}{a+t}$ | $\dfrac{a+1}{a}$ |
| smoothed log | $\log(1 + t/\epsilon)$ | $1/\epsilon$ |
| smoothed $\ell_p$ | $(t + \epsilon)^p$ | $p\,\epsilon^{p-1}$ |
For initialization, we take $x^0 = \mathbf{0}$, so that $u^0 = \mathbf{0}$ and the first iteration is basically $\ell_1$ minimization. The proposed algorithm is thus summarized in Algorithm 1 below. Due to the descent property of DCA, Algorithm 1 produces a convergent sequence of objective values $\{J(x^n)\}$. Beyond that, we shall not prove any stronger convergence result on the iterates themselves in this paper. The reason is that the convergence analysis may vary individually with the choice of sparse metric. We refer the readers to [41] and [44], in which subsequential convergence of $\{x^n\}$ is established for DCA-$\ell_{1-2}$ and DCA-transformed $\ell_1$, respectively.
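A minimal sketch of iteration (4), assuming the transformed $\ell_1$ penalty and solving each weighted subproblem as a linear program via SciPy's `linprog` (an illustrative solver choice; the paper relies on dedicated basis-pursuit solvers instead):

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of iteration (4) with the transformed l1 penalty
# p(t) = (a+1)t/(a+t). Each subproblem
#     min ||x||_1 - <u^n, x>  subject to  Ax = b
# is a linear program in the split x = xp - xm with xp, xm >= 0. This is
# an illustrative implementation, not the authors' code.

def il1(A, b, a=1.0, outer=10):
    m, n = A.shape
    dp0 = (a + 1.0) / a                          # p'(0+)
    dp = lambda t: a * (a + 1.0) / (a + t) ** 2  # p'(t) for t >= 0
    x = np.zeros(n)                              # x^0 = 0: first pass is BP
    for _ in range(outer):
        u = np.sign(x) * (dp0 - dp(np.abs(x))) / dp0     # |u_i| < 1
        c = np.concatenate([1.0 - u, 1.0 + u])           # cost of (xp, xm)
        res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
                      bounds=(0, None), method="highs")
        x = res.x[:n] - res.x[n:]
    return x

# Tiny sanity check: the iterate stays feasible for the linear constraint.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))
xbar = np.zeros(8); xbar[3] = 2.0
x = il1(A, A @ xbar, outer=3)
assert np.linalg.norm(A @ x - A @ xbar) < 1e-6
```

Since $|u_i^n| < 1$, the costs $1 \mp u_i^n$ are strictly positive, so each LP is bounded below and the split recovers $|x_i| - u_i^n x_i$ at the optimum.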
Extensions. Two natural extensions of (3) are the regularized model:
(5) $\displaystyle \min_x \; \lambda \sum_{i=1}^n p(|x_i|) + \frac{1}{2}\|Ax - b\|_2^2,$
and the denoising model:
(6) $\displaystyle \min_x \; \sum_{i=1}^n p(|x_i|) \quad \text{subject to} \quad \|Ax - b\|_2 \le \sigma,$
where $b$ is the measurement, $A$ is a general matrix, and $\lambda, \sigma > 0$ are parameters. They find applications in magnetic resonance imaging [5], total variation denoising [33], and so on. We can show that a DC decomposition of $J$ is
(7) $\displaystyle J(x) = p'(0^+)\|x\|_1 - \Big(p'(0^+)\|x\|_1 - \sum_{i=1}^n p(|x_i|)\Big).$
The iterative frameworks are detailed in Algorithms 2 and 3 respectively.
4 Recovery results
Although in general a global minimum is not guaranteed in nonconvex minimization, we can show that the performance of our method is provably robust to the conditioning of the measurement matrix $A$, by proving that it always tends to sharpen the $\ell_1$ solution.
Let us take another look at the assumptions on $p$, which were crucial in the proof of Proposition 3.1. Since $p$ is concave and increasing on $[0, \infty)$, we have
$$0 \le p'(t) \le p'(0^+) \quad \text{for all } t \ge 0,$$
and thus $\|u^n\|_\infty \le 1$ in (4). Now we are ready to show the main results.
Theorem 4.1 (Supportwise uniform recovery).
Let $T$ be an arbitrary but fixed index set. If basis pursuit uniquely recovers all $\bar{x}$ supported on $T$, so does (4).
Proof.
It is known that basis pursuit uniquely recovers every $\bar{x}$ supported on $T$ if and only if the null space property holds on $T$: $\|v_T\|_1 < \|v_{T^c}\|_1$ for all $v \in \ker(A) \setminus \{\mathbf{0}\}$; see, e.g., [22]. We argue by induction that $x^n = \bar{x}$ for all $n \ge 1$. Since $x^0 = \mathbf{0}$ gives $u^0 = \mathbf{0}$, the first subproblem is basis pursuit and thus $x^1 = \bar{x}$. Suppose $x^n = \bar{x}$. Then $u^n$ is supported on $T$ with $u_i^n = \mathrm{sgn}(\bar{x}_i)\big(p'(0^+) - p'(|\bar{x}_i|)\big)/p'(0^+)$, so that $|\mathrm{sgn}(\bar{x}_i) - u_i^n| = p'(|\bar{x}_i|)/p'(0^+) \le 1$ for $i \in T$. Any other feasible point of (4) takes the form $\bar{x} + v$ with $v \in \ker(A) \setminus \{\mathbf{0}\}$, and
$$\|\bar{x} + v\|_1 - \langle u^n, \bar{x} + v\rangle - \big(\|\bar{x}\|_1 - \langle u^n, \bar{x}\rangle\big) \ge \|v_{T^c}\|_1 + \sum_{i \in T} \big(\mathrm{sgn}(\bar{x}_i) - u_i^n\big) v_i \ge \|v_{T^c}\|_1 - \|v_T\|_1 > 0,$$
where we used $\|\bar{x} + v\|_1 \ge \langle \mathrm{sgn}(\bar{x}), \bar{x} + v\rangle + \|v_{T^c}\|_1 = \|\bar{x}\|_1 + \langle \mathrm{sgn}(\bar{x}), v\rangle + \|v_{T^c}\|_1$. Hence $\bar{x}$ is the unique minimizer of the subproblem, and $x^{n+1} = \bar{x}$. ∎
If the nonzero entries of $\bar{x}$ have the same magnitude, a stronger result holds: (4) recovers the fixed signal $\bar{x}$ whenever basis pursuit does.
Theorem 4.2 (Recovery of equalheight signals).
Let $\bar{x}$ be a signal with equal-height peaks supported on $T$, i.e., $|\bar{x}_i| = c > 0$ for all $i \in T$. If basis pursuit uniquely recovers $\bar{x}$, so does (4).
Proof.
Since the nonzero entries of $\bar{x}$ share a common magnitude $c > 0$, once $x^n = \bar{x}$ we have $p'(|\bar{x}_i|) = p'(c)$ for all $i \in T$, and hence $u^n = \theta\, \mathrm{sgn}(\bar{x})$ with $\theta := \big(p'(0^+) - p'(c)\big)/p'(0^+) \in [0, 1]$. We again argue by induction; $x^1 = \bar{x}$ since the first subproblem is basis pursuit. If basis pursuit uniquely recovers $\bar{x}$, then for all $v \in \ker(A) \setminus \{\mathbf{0}\}$, $\|\bar{x} + v\|_1 > \|\bar{x}\|_1$. This implies that for all such $v$, $\langle \mathrm{sgn}(\bar{x}), v\rangle \le \|\bar{x} + v\|_1 - \|\bar{x}\|_1$, since $\|\bar{x} + v\|_1 \ge \langle \mathrm{sgn}(\bar{x}), \bar{x} + v\rangle$. So for any feasible point $\bar{x} + v$ of (4) with $v \ne \mathbf{0}$, we have
(8) $\displaystyle \|\bar{x} + v\|_1 - \langle u^n, \bar{x} + v\rangle - \big(\|\bar{x}\|_1 - \langle u^n, \bar{x}\rangle\big) = \|\bar{x} + v\|_1 - \|\bar{x}\|_1 - \theta\,\langle \mathrm{sgn}(\bar{x}), v\rangle \ge (1 - \theta)\big(\|\bar{x} + v\|_1 - \|\bar{x}\|_1\big) > 0,$
and also $x^{n+1} = \bar{x}$. ∎
Remark 4.1.
Although the conditions proposed in section 1 are not applicable to the $\ell_{1-2}$ metric since it is not separable, it is not hard to generalize these conditions and the iterative $\ell_1$ algorithm to accommodate this case. The resulting algorithm is exactly DCA-$\ell_{1-2}$ in [41]. We can also readily extend the recovery theory to DCA-$\ell_{1-2}$, with the DC decomposition $\|x\|_1 - \|x\|_2$ and $u^n = x^n/\|x^n\|_2$, which satisfies $\|u^n\|_\infty \le 1$.
Theorem 4.1 provides a theoretical explanation for the experimental observations made in [41, 29, 30] that DCA-$\ell_{1-2}$ performs consistently better than $\ell_1$ minimization.
5 Numerical experiments
5.1 Exact recovery of sparse vectors
We reconstruct a sparse vector $\bar{x}$ using the iterative $\ell_1$ algorithm (Algorithm 2) for minimizing the regularized model (5) with the smoothed $\ell_p$ norm (IL$_1$-$\ell_p$) and the smoothed logarithm functional (IL$_1$-log), and compare them with two state-of-the-art nonconvex CS algorithms, namely IRLS-$\ell_p$ [27] and IRL1 [7]. Note that IRLS-$\ell_p$ and IRL1 attempt to minimize the $\ell_p$ and logarithm penalties, respectively, and both involve a smoothing strategy in the minimization. So it is particularly interesting to compare IL$_1$-$\ell_p$ with IRLS-$\ell_p$, and IL$_1$-log with IRL1. The same value of $p$ is chosen for IRLS-$\ell_p$ and IL$_1$-$\ell_p$ in all experiments. We shall also include ADMM-lasso [3] for solving the $\ell_1$ regularization (LASSO) in the comparison.
Experiments are carried out as follows. We first sample a sensing matrix $A$, and generate a test signal $\bar{x}$ of sparsity $s$ supported on a random index set with i.i.d. Gaussian entries. We then compute the measurement $b = A\bar{x}$ and apply each solver to produce a reconstruction $x^*$ of $\bar{x}$. The reconstruction is called a success if the relative error $\|x^* - \bar{x}\|_2 / \|\bar{x}\|_2$ falls below a prescribed threshold. We run 100 independent realizations and record the corresponding success rates at different sparsity levels.
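The protocol just described can be sketched as a small harness; the threshold `tol = 1e-3` is our illustrative choice, since the exact criterion is not reproduced here. Any solver with signature `solver(A, b) -> x` plugs in.

```python
import numpy as np

# Sketch of the experimental protocol: sample A, draw an s-sparse Gaussian
# signal, measure b = A @ xbar, reconstruct, and score a success when the
# relative l2 error is below tol (tol = 1e-3 is an illustrative choice).

def success_rate(solver, sample_A, m, n, s, trials=100, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        A = sample_A(rng, m, n)
        support = rng.choice(n, size=s, replace=False)
        xbar = np.zeros(n)
        xbar[support] = rng.standard_normal(s)   # i.i.d. Gaussian entries
        x = solver(A, A @ xbar)
        wins += np.linalg.norm(x - xbar) <= tol * np.linalg.norm(xbar)
    return wins / trials

def gaussian(rng, m, n):
    """Random Gaussian sensing matrix."""
    return rng.standard_normal((m, n))

# Sanity check with a square system and an exact linear solve: rate is 1.
rate = success_rate(lambda A, b: np.linalg.solve(A, b), gaussian, 8, 8, 2,
                    trials=10)
assert rate == 1.0
```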
Matrices for test. We first test on random Gaussian matrices $A$, whose columns are i.i.d. samples from the multivariate normal distribution. Gaussian matrices are RIP-like and have nearly uncorrelated (incoherent) columns. The dimensions $m$ and $n$ are fixed for the Gaussian ensemble.
We also use more ill-conditioned sensing matrices of significantly higher coherence. Specifically, a randomly oversampled partial DCT matrix $A = [a_1, \dots, a_n]$ is defined by
$$a_j = \frac{1}{\sqrt{m}} \cos\Big(\frac{2\pi w j}{F}\Big), \qquad j = 1, \dots, n,$$
where $w \in \mathbb{R}^m$ has components uniformly and independently sampled from $[0, 1]$, and $F \in \mathbb{N}$ is the refinement factor. The coherence goes up as $F$ increases. In this setting, it is still possible to recover the sparse vector if its spikes are sufficiently separated. Specifically, we randomly select the support $T$ of $\bar{x}$ so that
$$\min_{i, j \in T,\, i \ne j} |i - j| \ge L,$$
where $L$ is called the minimum separation. It is necessary for $L$ to be at least 1 Rayleigh length (RL), which is unity in the frequency domain [21, 16]. In our case, the value of 1 RL equals $F$. The testing matrices are of the same dimensions $m \times n$ as before. We test at three coherence levels with $F = 5, 10, 15$; the coherence $\mu(A)$ is close to 1 at all three levels and grows with $F$. The minimum separation $L$ is fixed in all the experiments.
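The oversampled DCT ensemble and a support with prescribed minimum separation can be generated as follows. The sizes $m = 64$, $n = 512$ and the separation $L = 30$ in the sketch are illustrative, not the paper's settings.

```python
import numpy as np

# Sketch of the randomly oversampled partial DCT ensemble and of a support
# with prescribed minimum separation (sizes below are illustrative).

def oversampled_dct(rng, m, n, F):
    """Columns a_j = cos(2*pi*w*j/F)/sqrt(m), w uniform in [0,1]^m."""
    w = rng.uniform(0.0, 1.0, size=m)
    j = np.arange(1, n + 1)
    return np.cos(2.0 * np.pi * np.outer(w, j) / F) / np.sqrt(m)

def separated_support(rng, n, s, L):
    """Rejection-sample an index set with minimum pairwise gap >= L."""
    while True:
        T = np.sort(rng.choice(n, size=s, replace=False))
        if s == 1 or np.min(np.diff(T)) >= L:
            return T

def coherence(A):
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(0)
mus = [coherence(oversampled_dct(rng, 64, 512, F)) for F in (1, 5, 15)]
assert mus[-1] > mus[0]                  # coherence grows with F
T = separated_support(rng, 512, 10, 30)
assert np.min(np.diff(T)) >= 30          # spikes separated by at least L
```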
Algorithm implementation.
For ADMM-lasso, we follow the parameter settings of [3, 41], with the maximum number of iterations maxiter = 5000. For IRLS-$\ell_p$, maxiter = 1000 with a small tolerance tol. For reweighted $\ell_1$ (IRL1), the smoothing parameter is adaptively updated as introduced in [7], and the outer iteration is stopped if the relative error between two consecutive iterates falls below a tolerance. The weighted $\ell_1$ minimization subproblems are solved by the YALL1 solver (available at http://yall1.blogs.rice.edu/), with a small tolerance. All other settings of the algorithms are left at their default values.
For IL$_1$-$\ell_p$, the smoothing parameter $\epsilon$ is set based on $x^1$, the output of the first iteration, which is also the solution to LASSO; specifically, $\epsilon$ is proportional to the largest entry in magnitude of $x^1$. A similar setting is used for IL$_1$-log. The subproblems are solved by the alternating direction method of multipliers (ADMM), as detailed in [41]. The parameters for solving the subproblems are the same as those for ADMM-lasso.
Interpretation of results. The plots of success rates are shown in Figure 1. When $A$ is Gaussian, we see that all nonconvex CS solvers are comparable and much better than ADMM-lasso, with IRLS-$\ell_p$ being the best. For oversampled DCT matrices, we see that the success rates of IRLS-$\ell_p$ and IRL1 drop as $F$ increases, whereas the proposed IL$_1$-$\ell_p$ and IL$_1$-log are robust and consistently outperform ADMM-lasso.
Figure 1: Success rates on the Gaussian matrix and on oversampled DCT matrices with F = 5, 10, 15.
5.2 MRI reconstruction
We present an example of reconstructing the Shepp–Logan phantom image, to further demonstrate the effectiveness of the IL$_1$ algorithm. In this application, the sparsity of the gradient $\nabla u$ of the image $u$ is exploited, which leads to the following minimization problem:
$$\min_u \; J(\nabla u) \quad \text{subject to} \quad \mathcal{R}\mathcal{F} u = f,$$
where $\mathcal{R}$ denotes the sampling mask in the frequency domain, $\mathcal{F}$ is the Fourier transform, and $f$ is the acquired data. With $J$ being the $\ell_1$ norm, the above formulation reduces to the celebrated total variation (TV) minimization:
$$\min_u \; \|\nabla u\|_1 \quad \text{subject to} \quad \mathcal{R}\mathcal{F} u = f.$$
The above problem, together with its regularized version
(9) $\displaystyle \min_u \; \|\nabla u\|_1 + \frac{\lambda}{2}\|\mathcal{R}\mathcal{F} u - f\|_2^2,$
can be solved efficiently by the split Bregman method [24], known to be equivalent to ADMM [18]. For a general sparse metric $p$, (7) gives the DC decomposition
$$J(\nabla u) = p'(0^+)\|\nabla u\|_1 - \Big(p'(0^+)\|\nabla u\|_1 - \sum_i p\big(|(\nabla u)_i|\big)\Big),$$
and thus its IL$_1$ algorithm for solving the regularized model takes the following form:
$$u^{n+1} = \arg\min_u \; \|\nabla u\|_1 - \langle v^n, \nabla u\rangle + \frac{\lambda}{2}\|\mathcal{R}\mathcal{F} u - f\|_2^2, \qquad v_i^n = \mathrm{sgn}\big((\nabla u^n)_i\big)\, \frac{p'(0^+) - p'(|(\nabla u^n)_i|)}{p'(0^+)}.$$
Likewise, the subproblem for updating $u^{n+1}$ above can also be solved by split Bregman, as its objective only differs by a linear term from that of (9).
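For concreteness, the discrete gradient and the nonconvex TV-type objective $J(\nabla u)$ can be sketched as below. Forward differences with a zero boundary are one common convention, and the smoothed-log parameter $\epsilon = 0.1$ is an illustrative choice.

```python
import numpy as np

# Discrete gradient of an image and the (anisotropic) nonconvex TV-type
# objective J(grad u) = sum_i p(|(grad u)_i|), here with the smoothed log
# penalty p(t) = log(1 + t/eps). Forward differences with zero boundary
# are one common discretization convention.

def grad(u):
    gx = np.zeros_like(u); gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy = np.zeros_like(u); gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def J_tv(u, p):
    gx, gy = grad(u)
    return p(np.abs(gx)).sum() + p(np.abs(gy)).sum()

p_log = lambda t, eps=0.1: np.log1p(t / eps)   # smoothed log penalty
p_l1 = lambda t: t                             # l1 gives classical TV

# A piecewise-constant image has a sparse gradient; l1 (TV) charges each
# jump by its height, while the concave penalty charges it nearly a
# constant, favoring fewer but taller jumps.
u = np.zeros((8, 8)); u[2:6, 2:6] = 10.0       # 16 jump entries of height 10
assert J_tv(u, p_l1) == 160.0
assert J_tv(u, p_log) < J_tv(u, p_l1)
```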
Numerical results. In the experiment, we again choose $p$ to be the smoothed $\ell_p$ ($0 < p < 1$) and the smoothed log, respectively, for the IL$_1$ algorithm, and set the smoothing parameter $\epsilon$ and the regularization parameter $\lambda$ to the same values for both implementations. The reconstruction results are shown in Figure 2. We find that 7 sampled projections are sufficient for both penalties to recover the phantom image perfectly, whereas $\ell_1$ (TV) minimization needs 10 projections for perfect image reconstruction by split Bregman. To the best of our knowledge, the other existing nonconvex solvers for minimizing either the $\ell_p$ or log penalties did no better than 10 projections [7, 10, 9].
Figure 2: Reconstructions from 7 sampled projections by $\ell_1$ (TV) minimization (rel. error = 0.4832), IL$_1$-$\ell_p$, and IL$_1$-log.
6 Conclusions
We developed an iterative $\ell_1$ framework for a broad class of Lipschitz continuous, nonconvex, sparsity-promoting objectives, including those arising in statistics. The iterative $\ell_1$ algorithm is shown, via theory and computation, to improve on $\ell_1$ minimization for CS problems independent of the coherence of the sensing matrices.
Acknowledgments. The authors would like to thank Yifei Lou (University of Texas at Dallas) and Jong-Shi Pang (University of Southern California) for helpful discussions. The authors would also like to thank the anonymous reviewers for their helpful comments. The work was partially supported by NSF grants DMS-1222507 and DMS-1522383.
References
 [1] A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM J. Imaging Sci., Vol. 2, No. 1, pp. 183-202, 2009.
 [2] T. Blumensath and M. Davies, Iterative hard thresholding for compressed sensing, Applied and Computational Harmonic Analysis, 27(3), pp. 265-274, 2009.
 [3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Foundations and Trends in Machine Learning, 3(1), pp. 1-122, 2011.
 [4] E. Candès and C. Fernandez-Granda, Towards a mathematical theory of super-resolution, Communications on Pure and Applied Mathematics, 67(6), pp. 906-956, 2014.
 [5] E. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, 52(2), pp. 489-509, 2006.
 [6] E. Candès, J. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics, 59(8), pp. 1207-1223, 2006.
 [7] E. Candès, M. Wakin, and S. Boyd, Enhancing Sparsity by Reweighted $\ell_1$ Minimization, J. Fourier Anal. Appl., 14(5), pp. 877-905, 2008.
 [8] E. Candès and T. Tao, Decoding by Linear Programming, IEEE Trans. Inform. Theory, 51 (2005), pp. 4203-4215.
 [9] R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Processing Letters, 14(10), pp. 707-710, 2007.
 [10] R. Chartrand, Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data, IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2009.
 [11] R. Chartrand and V. Staneva, Restricted isometry properties and nonconvex compressive sensing, Inverse Problems, 24 (2008), pp. 1-14.
 [12] R. Chartrand and W. Yin, Iteratively Reweighted Algorithms for Compressive Sensing, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3869-3872, 2008.
 [13] A. Cohen, W. Dahmen, and R. DeVore, Compressed Sensing and Best $k$-Term Approximation, Journal of the American Mathematical Society, Vol. 22, No. 1, pp. 211-231, 2009.
 [14] I. Daubechies, M. Defrise, and C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math., 57 (2004), pp. 1413-1457.
 [15] D. Donoho and X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. Inform. Theory, 47, pp. 2845-2862, 2001.
 [16] D. Donoho, Superresolution via sparsity constraints, SIAM Journal on Mathematical Analysis, 23(5), pp. 1309-1331, 1992.
 [17] D. Donoho, Compressed sensing, IEEE Transactions on Information Theory, 52(4), pp. 1289-1306, 2006.
 [18] E. Esser, Applications of Lagrangian-Based Alternating Direction Methods and Connections to Split Bregman, CAM Report 09-31, UCLA, 2009.
 [19] E. Esser, Y. Lou, and J. Xin, A Method for Finding Structured Sparse Solutions to Nonnegative Least Squares Problems with Applications, SIAM J. Imaging Sciences, 6(4), pp. 2010-2046, 2013.
 [20] J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96(456), pp. 1348-1360, 2001.
 [21] A. Fannjiang and W. Liao, Coherence Pattern-Guided Compressive Sensing with Unresolved Grids, SIAM J. Imaging Sciences, Vol. 5, No. 1, pp. 179-202, 2012.
 [22] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, Springer, 2013.
 [23] G. Gasso, A. Rakotomamonjy, and S. Canu, Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming, IEEE Transactions on Signal Processing, Vol. 57, No. 12, pp. 4686-4698, 2009.
 [24] T. Goldstein and S. Osher, The split Bregman method for L1-regularized problems, SIAM Journal on Imaging Sciences, 2(1), pp. 323-343, 2009.
 [25] E. Hale, W. Yin, and Y. Zhang, Fixed-point continuation for $\ell_1$-minimization: Methodology and convergence, SIAM J. Optim., 19, pp. 1107-1130, 2008.
 [26] X. Huang, L. Shi, and M. Yan, Nonconvex Sorted $\ell_1$ Minimization for Sparse Approximation, Journal of the Operations Research Society of China, Volume 3, Issue 2, pp. 207-229, 2015.
 [27] M. Lai, Y. Xu, and W. Yin, Improved Iteratively Reweighted Least Squares for Unconstrained Smoothed $\ell_q$ Minimization, SIAM J. Numer. Anal., Vol. 51, Issue 2, pp. 927-957, 2013.
 [28] J. Lv and Y. Fan, A unified approach to model selection and sparse recovery using regularized least squares, Annals of Statistics, 37(6A), pp. 3498-3528, 2009.
 [29] Y. Lou, P. Yin, Q. He, and J. Xin, Computing Sparse Representation in a Highly Coherent Dictionary Based on Difference of $\ell_1$ and $\ell_2$, J. Sci. Comput., 64 (2015), pp. 178-196.
 [30] Y. Lou, P. Yin, and J. Xin, Point Source Super-resolution via Nonconvex $L_1$-Based Methods, J. Sci. Comput., to appear.
 [31] D. Needell and J. Tropp, CoSaMP: Iterative signal recovery from incomplete and inaccurate samples, Appl. Comput. Harmon. Anal., 26 (2009), pp. 301-321.
 [32] H. Rauhut, Compressive Sensing and Structured Random Matrices, Radon Series Comp. Appl. Math., Vol. 9, pp. 1-92, 2010.
 [33] L. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D, 60 (1992), pp. 259-268.
 [34] P. D. Tao and L. T. H. An, Convex analysis approach to d.c. programming: Theory, algorithms and applications, Acta Mathematica Vietnamica, Vol. 22, No. 1, pp. 289-355, 1997.
 [35] P. D. Tao and L. T. H. An, A DC optimization algorithm for solving the trust-region subproblem, SIAM Journal on Optimization, 8(2), pp. 476-505, 1998.
 [36] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Royal Statist. Soc. B, 58(1), pp. 267-288, 1996.
 [37] J. Tropp and A. Gilbert, Signal recovery from random measurements via orthogonal matching pursuit, IEEE Trans. Inform. Theory, 53(12), pp. 4655-4666, 2007.
 [38] Y. Wang and W. Yin, Sparse signal reconstruction via iterative support detection, SIAM Journal on Imaging Sciences, 3(3), pp. 462-491, 2010.
 [39] Z. Xu, X. Chang, F. Xu, and H. Zhang, $L_{1/2}$ regularization: A thresholding representation theory and a fast solver, IEEE Transactions on Neural Networks and Learning Systems, 23, pp. 1013-1027, 2012.
 [40] J. Yang and Y. Zhang, Alternating direction algorithms for $\ell_1$-problems in compressive sensing, SIAM J. Sci. Comput., 33(1), pp. 250-278, 2011.
 [41] P. Yin, Y. Lou, Q. He, and J. Xin, Minimization of $\ell_{1-2}$ for Compressed Sensing, SIAM J. Sci. Comput., 37 (2015), pp. A536-A563.
 [42] P. Yin, E. Esser, and J. Xin, Ratio and Difference of $L_1$ and $L_2$ Norms and Sparse Representation with Coherent Dictionaries, Commun. Information and Systems, 14(2), pp. 87-109, 2014.
 [43] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, Bregman iterative algorithms for $\ell_1$-minimization with applications to compressed sensing, SIAM J. Imaging Sci., 1 (2008), pp. 143-168.
 [44] S. Zhang and J. Xin, Minimization of Transformed $L_1$ Penalty: Theory, Difference of Convex Function Algorithm, and Robust Application in Compressed Sensing, preprint, arXiv:1411.5735.
 [45] T. Zhang, Multi-stage convex relaxation for learning with sparse regularization, NIPS Proceedings, 2008.
 [46] S. Zhou, N. Xiu, Y. Wang, L. Kong, and H. Qi, A null-space-based weighted $\ell_1$ minimization approach to compressed sensing, Information and Inference: A Journal of the IMA, 2016.