Iterative $\ell_1$ Minimization for Non-Convex Compressed Sensing
An algorithmic framework, based on the difference of convex functions algorithm (DCA), is proposed for minimizing a class of concave sparse metrics for compressed sensing problems. The resulting algorithm iterates a sequence of $\ell_1$ minimization problems. An exact sparse recovery theory is established to show that the proposed framework always improves on basis pursuit ($\ell_1$ minimization) and inherits robustness from it. Numerical examples on success rates of sparse solution recovery further illustrate that, unlike most existing non-convex compressed sensing solvers in the literature, our method always outperforms basis pursuit, no matter how ill-conditioned the measurement matrix is. Moreover, the iterative $\ell_1$ (IL) algorithm leads the state-of-the-art algorithms on $\ell_p$ and logarithmic minimizations by a wide margin in the strongly coherent (highly ill-conditioned) regime, despite the fact that they share the same objective functions. Last but not least, in the application of magnetic resonance imaging (MRI), the IL algorithm easily recovers the phantom image with just 7 line projections.
Compressed sensing (CS) techniques [8, 5, 6, 17] enable efficient reconstruction of a sparse signal from far fewer linear measurements than its physical dimension. Mathematically, CS aims to recover an $n$-dimensional vector $\bar{x}$ with few non-zero components from an under-determined linear system $Ax = b$ of just $m \ll n$ equations, where $A \in \mathbb{R}^{m \times n}$ is a known measurement matrix. The first CS technique is the convex $\ell_1$ minimization, or the so-called basis pursuit:
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to} \quad Ax = b. \tag{1}$$
Breakthrough results have established that when the matrix $A$ satisfies a certain restricted isometry property (RIP), the solution to (1) is exactly $\bar{x}$. It was shown that, with overwhelming probability, several random ensembles such as random Gaussian, random Bernoulli, and random partial Fourier matrices are of RIP type [8, 13, 32]. Note that (1) is just a minimization principle rather than an algorithm for retrieving $\bar{x}$. Algorithms for solving (1) and its associated $\ell_1$ regularization problem
$$\min_{x \in \mathbb{R}^n} \lambda \|x\|_1 + \frac{1}{2}\|Ax - b\|_2^2$$
have been studied extensively.
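As a concrete illustration (not the algorithm proposed in this paper), basis pursuit (1) can be recast as a linear program by splitting $x = x^+ - x^-$ with $x^\pm \ge 0$; the sketch below solves it with SciPy's LP solver on a small Gaussian instance:

```python
# Illustrative basis pursuit solver: min ||x||_1 s.t. Ax = b, via the
# standard split x = x_plus - x_minus and a generic LP solver.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(x+) + sum(x-)
    A_eq = np.hstack([A, -A])                # A x+ - A x- = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))           # RIP-like Gaussian matrix
x_true = np.zeros(100)
x_true[[3, 27, 64]] = [1.5, -2.0, 0.7]       # 3-sparse ground truth
b = A @ x_true
x_hat = basis_pursuit(A, b)
print(np.linalg.norm(x_hat - x_true))        # near zero for an incoherent A
```

With $m = 40$ Gaussian measurements of a 3-sparse vector in dimension 100, the LP recovers $\bar{x}$ exactly (up to solver tolerance), consistent with the RIP-based theory.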
Inspired by the success of basis pursuit, researchers began to investigate various non-convex CS models and algorithms. A growing body of empirical studies has shown that non-convex CS methods usually outperform basis pursuit when the matrix $A$ is RIP-like, in the sense that they require fewer linear measurements to reconstruct the signals of interest. Instead of minimizing the $\ell_1$ norm, it is natural to consider minimization of non-convex (concave) sparse metrics, for instance, the $\ell_p$ (quasi-)norm ($0 < p < 1$) [11, 12, 27], capped-$\ell_1$ [45, 30], and transformed-$\ell_1$ [28, 44]. Another category of CS methods relies in spirit on support detection of $\bar{x}$. To name a few, there are orthogonal matching pursuit (OMP), iterative hard thresholding (IHT), the (re)weighted-$\ell_1$ scheme, iterative support detection (ISD), and their variations [31, 46, 26].
On the other hand, it has been proved that even if $A$ is not RIP-like and contains highly correlated columns, basis pursuit still enables sparse recovery under certain conditions on $\bar{x}$ involving its support. In this scenario, however, most of the existing non-convex CS methods are not that robust to the conditioning of $A$: their success rates drop as the columns of $A$ become more and more correlated. Based on the difference of convex functions algorithm (DCA) [34, 35], DCA-$\ell_{1-2}$ has been proposed for minimizing the difference of the $\ell_1$ and $\ell_2$ norms [19, 42]. Extensive numerical experiments [41, 29, 30] imply that the DCA-$\ell_{1-2}$ algorithm consistently outperforms $\ell_1$ minimization, irrespective of the conditioning of $A$.
Stimulated by the empirical evidence found in [41, 29, 30], we propose a general DCA-based CS framework for the minimization of a class of concave sparse metrics. More precisely, we consider the reconstruction of a sparse vector $\bar{x}$ by minimizing a sparsity-promoting metric $P$:
$$\min_{x \in \mathbb{R}^n} P(x) \quad \text{subject to} \quad Ax = b.$$
Throughout the paper, we assume that $P$ always takes the separable form $P(x) = \sum_{i=1}^{n} p(|x_i|)$ unless otherwise stated, where $p$ defined on $[0, \infty)$ satisfies:
$p$ is concave and increasing.
$p$ is continuous with finite right derivative $0 < p'(0^+) < +\infty$ at the origin.
The first condition encourages zeros in $x$ rather than merely small entries, since $p$ changes rapidly around the origin; the second one is imposed for the good of the proposed algorithm, as will be seen later. A number of sparse metrics in the literature enjoy the above properties, including smoothly clipped absolute deviation (SCAD), capped-$\ell_1$, transformed-$\ell_1$, and of course $\ell_1$ itself. Although $\ell_p$ ($0 < p < 1$) and the logarithm functional do not meet the second condition, their smoothed versions are differentiable at zero. These properties will be essential in the algorithm design as well as in the proof of the main results.
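The two conditions can be checked numerically. The sketch below uses smoothed forms $p_\epsilon(t) = (t + \epsilon)^p - \epsilon^p$ and $p_\epsilon(t) = \log(1 + t/\epsilon)$ as stand-ins (the precise smoothings are an assumption here) and verifies that both are increasing, concave, and vanish at the origin:

```python
# Numeric sanity check of the two penalty conditions on smoothed l_p and
# smoothed log penalties (the exact smoothed forms are assumptions).
import numpy as np

eps, p = 0.1, 0.5
t = np.linspace(0.0, 5.0, 2001)

def smoothed_lp(t):
    return (t + eps) ** p - eps ** p

def smoothed_log(t):
    return np.log(1.0 + t / eps)

for f in (smoothed_lp, smoothed_log):
    v = f(t)
    d = np.diff(v)                       # positive first differences => increasing
    assert np.all(d > 0)
    assert np.all(np.diff(d) < 1e-12)    # decreasing slopes => concave
    assert f(np.array([0.0]))[0] == 0.0  # p(0) = 0
print("both smoothed penalties are increasing and concave with p(0) = 0")
```

Both functions also have a finite right derivative at zero ($p\,\epsilon^{p-1}$ and $1/\epsilon$ respectively), so they fit the framework above.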
Our proposed algorithm calls for solving a sequence of $\ell_1$ minimization subproblems. The objective of each subproblem is $\|x\|_1$ plus a linear term, which is convex and tractable. We further validate the robustness of this framework by showing, theoretically and numerically, that it performs at least as well as basis pursuit in terms of uniform sparse recovery, independent of the conditioning of $A$ and the sparsity metric.
The paper is organized as follows. In section 2, we review the RIP and coherence of sensing matrices, as well as the descent property of DCA. In section 3, we present the iterative $\ell_1$ framework for non-convex minimization, with worked-out examples on representative sparse objectives including the total variation. In section 4, we prove the main exact recovery results based on the unique recovery property of $\ell_1$ minimization instead of the RIP, which forms a theoretical basis for the better performance of DCA. In section 5, we compare the iterative $\ell_1$ algorithms with two state-of-the-art non-convex CS algorithms, IRLS-$\ell_p$ and IRL1, and with ADMM-lasso, on CS test problems with varying degrees of coherence. We find that iterative $\ell_1$ outperforms ADMM-lasso independent of the sensing matrix coherence, and leads IRLS-$\ell_p$ and IRL1 in the highly coherent regime. This is consistent with earlier findings on the DCA-$\ell_{1-2}$ algorithm [41, 29, 30], to which our theory also applies. We also evaluate these two non-convex metrics on a two-dimensional example of reconstructing an MRI image from a small number of projections; our iterative algorithm succeeds with 7 projections for both metrics. Using the same objective functions, the state-of-the-art algorithms need at least 10 projections. Concluding remarks are in section 6.
Notations. Let us fix some notation. For any $x, y \in \mathbb{R}^n$, $\langle x, y \rangle$ is their inner product. $\mathbf{0}$ is the vector of zeros, and similarly $\mathbf{1}$ is the vector of ones. $x \odot y$ is the Hadamard (entry-wise) product, meaning that $(x \odot y)_i = x_i y_i$. $I_n$ is the identity matrix of dimension $n$. For any convex function $f$ on $\mathbb{R}^n$, $\partial f(x)$ is a subgradient of $f$ at $x$. $\mathrm{sgn}$ is the signum function on $\mathbb{R}$ defined as
$$\mathrm{sgn}(t) = \begin{cases} 1 & t > 0, \\ 0 & t = 0, \\ -1 & t < 0. \end{cases}$$
For any set $T \subseteq \{1, \dots, n\}$, the restriction $x_T \in \mathbb{R}^n$ is given by $(x_T)_i = x_i$ if $i \in T$ and $(x_T)_i = 0$ otherwise.
The best-known CS concept of the past decade is the restricted isometry property (RIP) introduced by Candès et al., which characterizes matrices that are nearly orthonormal.
For each number $s$, the $s$-restricted isometry constant $\delta_s$ of $A$ is the smallest $\delta > 0$ such that
$$(1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2$$
for all $x$ with sparsity at most $s$. The matrix $A$ is said to satisfy the $s$-RIP with constant $\delta_s$.
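Computing $\delta_s$ exactly is combinatorial in general, but for a tiny matrix it can be brute-forced directly from the definition: $\delta_s$ is the largest deviation from 1 of the eigenvalues of $A_T^\top A_T$ over all supports $|T| = s$. An illustrative sketch:

```python
# Brute-force s-restricted isometry constant for a tiny matrix (exact
# computation is intractable for realistic sizes; this is illustrative).
import numpy as np
from itertools import combinations

def rip_constant(A, s):
    n = A.shape[1]
    delta = 0.0
    for T in combinations(range(n), s):
        eig = np.linalg.eigvalsh(A[:, list(T)].T @ A[:, list(T)])
        # smallest delta with 1 - delta <= eig <= 1 + delta is max |eig - 1|
        delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return delta

Q = np.linalg.qr(np.random.default_rng(4).standard_normal((8, 8)))[0]
print(rip_constant(Q, 2))   # orthonormal columns: delta_2 = 0 up to rounding
```

An orthonormal matrix has $\delta_s = 0$ for every $s$, while rescaling its columns by 2 gives $\delta_s = 3$, matching the definition.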
Mutual coherence is another commonly used concept closely related to the success of the CS task.
The coherence of a matrix $A = [a_1, \dots, a_n]$ is the maximum absolute value of the cross-correlations between the columns of $A$, namely
$$\mu(A) = \max_{i \ne j} \frac{|\langle a_i, a_j \rangle|}{\|a_i\|_2 \|a_j\|_2}.$$
When a matrix has small mutual coherence (is incoherent) or a small RIP constant, its columns tend to be well separated and distinguishable, which is intuitively favorable for identifying the support of the target signal. On the other hand, a highly coherent matrix with large coherence poses a challenge to the reconstruction.
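The coherence $\mu(A)$ is cheap to compute from the normalized Gram matrix; a minimal sketch, contrasting an incoherent Gaussian matrix with a small matrix containing a correlated column:

```python
# Mutual coherence: maximum absolute normalized cross-correlation between
# distinct columns, computed from the Gram matrix of normalized columns.
import numpy as np

def coherence(A):
    An = A / np.linalg.norm(A, axis=0)   # normalize columns
    G = np.abs(An.T @ An)                # cross-correlations
    np.fill_diagonal(G, 0.0)             # ignore self-correlation
    return G.max()

rng = np.random.default_rng(1)
G_mat = rng.standard_normal((64, 256))   # incoherent Gaussian matrix
I_mat = np.hstack([np.eye(4), np.ones((4, 1)) / 2.0])  # one correlated column
print(coherence(G_mat), coherence(I_mat))  # second value is exactly 0.5
```

The appended unit-norm column correlates with each standard basis vector at exactly $0.5$, while the $64 \times 256$ Gaussian matrix stays well below that in typical draws.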
Next we give a brief review of the difference of convex functions algorithm (DCA). DCA has been widely applied to sparse optimization problems in several works [41, 19, 23, 28, 30]. For an objective function $f = g - h$ on the space $\mathbb{R}^n$, where $g$ and $h$ are lower semicontinuous proper convex functions, we call $g - h$ a DC decomposition of $f$.
DCA takes the following form:
$$x^{k+1} \in \arg\min_{x} \; g(x) - \big(h(x^k) + \langle y^k, x - x^k \rangle\big), \qquad y^k \in \partial h(x^k).$$
Since $y^k \in \partial h(x^k)$, by the definition of subgradient we have $h(x) \ge h(x^k) + \langle y^k, x - x^k \rangle$ for all $x$, and hence
$$f(x^k) = g(x^k) - h(x^k) \ge g(x^{k+1}) - h(x^k) - \langle y^k, x^{k+1} - x^k \rangle \ge g(x^{k+1}) - h(x^{k+1}) = f(x^{k+1}).$$
The fact that $x^{k+1}$ minimizes the convex surrogate was used in the first inequality above, and the subgradient inequality in the second. Therefore, DCA produces a decreasing sequence $\{f(x^k)\}$, which converges provided $f$ is bounded from below.
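The descent property can be seen on a one-dimensional toy DC function (my own illustrative choice, not from the paper): $f(x) = x^2 - |x|$ with $g(x) = x^2$ and $h(x) = |x|$, where each surrogate minimization is solvable in closed form:

```python
# Minimal DCA sketch on f(x) = x^2 - |x| with DC parts g(x) = x^2, h(x) = |x|.
# Each step linearizes h at x_k and minimizes the convex surrogate exactly.
import numpy as np

def dca_step(x):
    y = np.sign(x)      # a subgradient of h(x) = |x| at x
    # argmin_x g(x) - (h(x_k) + y*(x - x_k))  =  argmin_x x^2 - y*x  =  y/2
    return y / 2.0

f = lambda x: x**2 - abs(x)
x = 0.9
vals = [f(x)]
for _ in range(5):
    x = dca_step(x)
    vals.append(f(x))
print(vals)   # monotonically non-increasing, settling at the minimum -0.25
```

Starting from $x = 0.9$, one DCA step jumps to the global minimizer $x = 1/2$ with $f = -1/4$, and the objective values never increase, as the descent argument above guarantees.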
3 Iterative $\ell_1$ framework
We rewrite the objective $P$ in DC decomposition form:
$$P(x) = p'(0^+)\|x\|_1 - \big(p'(0^+)\|x\|_1 - P(x)\big).$$
Clearly the first term on the right-hand side is convex in $x$. We show below that the second term is also a convex function.
Proposition 3.1. $p'(0^+)\|x\|_1 - P(x)$ is convex in $x$.
Proof. For notational convenience, define $q(t) := p'(0^+)\,t - p(t)$ on $[0, \infty)$. Since $p$ is concave on $[0, \infty)$, we have that $q$ is convex on $[0, \infty)$. We only need to show that $q(|t|)$ is convex on $\mathbb{R}$, or equivalently, for all $u, v \in \mathbb{R}$ and $\alpha \in [0, 1]$,
$$q(|\alpha u + (1 - \alpha)v|) \le \alpha\, q(|u|) + (1 - \alpha)\, q(|v|).$$
Case 1. $u$ and $v$ have the same sign, or one of them is $0$. Then $|\alpha u + (1 - \alpha)v| = \alpha|u| + (1 - \alpha)|v|$, and since $q$ is convex on $[0, \infty)$, the above inequality holds.
Case 2. $u$ and $v$ are of opposite signs. By the concavity of $p$ on $[0, \infty)$, for any $0 \le s \le t$ we have
$$p(t) - p(s) \le p'(0^+)(t - s),$$
that is, $q(s) \le q(t)$ for all $0 \le s \le t$, so $q$ is non-decreasing on $[0, \infty)$. Without loss of generality, we suppose $u > 0 > v$. Then
$$\alpha\, q(|u|) + (1 - \alpha)\, q(|v|) \ge q(\alpha|u| + (1 - \alpha)|v|) \ge q(|\alpha u + (1 - \alpha)v|).$$
In the first inequality above, we used the convexity of $q$ on $[0, \infty)$, whereas in the second one, we used the monotonicity of $q$ together with the triangle inequality $|\alpha u + (1 - \alpha)v| \le \alpha|u| + (1 - \alpha)|v|$. ∎
At the $k$-th iteration, DCA calls for linearization of the second convex term at the current iterate $x^k$, and solving of the resulting convex subproblem for $x^{k+1}$. After adding back the linear constraint and removing the constant and the factor of $p'(0^+)$, we iterate:
$$x^{k+1} \in \arg\min_{Ax = b} \; \|x\|_1 - \langle v^k, x \rangle, \qquad v^k \in \partial\big(\|\cdot\|_1 - P(\cdot)/p'(0^+)\big)(x^k).$$
Be aware that $v^k$ denotes a subgradient of the convex function $\|x\|_1 - P(x)/p'(0^+)$ at $x^k$, rather than a subgradient of $P$ at $x^k$. In this way, the subproblem reduces to minimizing $\|x\|_1$ plus a linear term of $x$, which can be efficiently solved by a variety of state-of-the-art algorithms for basis pursuit (with minor modifications). In Table 1, we list some non-convex metrics and the corresponding iterative $\ell_1$ updates.
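The outer loop can be sketched end to end for one concrete penalty. Below we take the smoothed $\ell_{1/2}$ penalty $p(t) = (t + \epsilon)^{1/2}$, for which $p'(|x_i|)/p'(0^+) = \sqrt{\epsilon/(|x_i| + \epsilon)}$, so a valid subgradient is $v_i = \mathrm{sgn}(x_i)\big(1 - \sqrt{\epsilon/(|x_i| + \epsilon)}\big)$; each subproblem is solved as an LP. Both the penalty choice and the solver are illustrative assumptions, not the paper's reference implementation:

```python
# Sketch of the iterative l1 (IL) outer loop for the smoothed l_1/2 penalty,
# with each weighted-l1 subproblem solved by a generic LP solver.
import numpy as np
from scipy.optimize import linprog

def l1_subproblem(A, b, v):
    """min ||x||_1 - <v, x>  s.t.  Ax = b   (split x = x+ - x-)."""
    m, n = A.shape
    c = np.concatenate([1.0 - v, 1.0 + v])   # nonnegative costs since |v| < 1
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

def iterative_l1(A, b, eps=0.1, iters=5):
    x = l1_subproblem(A, b, np.zeros(A.shape[1]))   # v^0 = 0: basis pursuit
    for _ in range(iters):
        # subgradient of ||x||_1 - P(x)/p'(0+) at the current iterate
        v = np.sign(x) * (1.0 - np.sqrt(eps / (np.abs(x) + eps)))
        x = l1_subproblem(A, b, v)
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 80))
x_true = np.zeros(80)
x_true[[5, 40, 70]] = [2.0, -1.0, 3.0]
x_hat = iterative_l1(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))
```

Note how the first pass with $v^0 = \mathbf{0}$ is exactly basis pursuit; the subsequent passes down-weight entries already identified as large, which is the mechanism behind the "improves on basis pursuit" guarantee of section 4.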
For initialization, we take $v^0 = \mathbf{0}$, so that the first subproblem is exactly $\ell_1$ minimization. The proposed algorithm is summarized in Algorithm 1 below. Due to the descent property of DCA, Algorithm 1 produces a convergent sequence of objective values $\{P(x^k)\}$. Beyond that, we shall not prove any stronger convergence result on the iterates themselves in this paper, since the convergence analysis varies with the individual choice of sparse metric. We refer the readers to the works establishing subsequential convergence of $\{x^k\}$ for DCA-$\ell_{1-2}$ and DCA-transformed-$\ell_1$, respectively.
Extensions. Two natural extensions of (3) are the regularized model
$$\min_{x} \; \lambda P(x) + \frac{1}{2}\|Ax - b\|_2^2,$$
and the denoising model
$$\min_{x} \; P(x) \quad \text{subject to} \quad \|Ax - b\|_2 \le \sigma,$$
where $b$ is the measurement, $A$ is a general matrix, and $\lambda, \sigma > 0$ are parameters. They find applications in magnetic resonance imaging, total variation denoising, and so on. We can show that a DC decomposition of the regularized objective is
$$\Big(\lambda p'(0^+)\|x\|_1 + \frac{1}{2}\|Ax - b\|_2^2\Big) - \lambda\big(p'(0^+)\|x\|_1 - P(x)\big).$$
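For the regularized model with $P = \ell_1$, the workhorse subproblem solver is classical. As a self-contained baseline (a standard proximal-gradient/ISTA sketch, not the solver used in this paper), soft thresholding handles the $\ell_1$ term:

```python
# Minimal proximal-gradient (ISTA) sketch for the l1-regularized model
# min_x lam*||x||_1 + 0.5*||Ax - b||^2 -- a standard baseline, shown only
# to make the regularized model concrete.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, iters=2000):
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[10, 50]] = [3.0, -2.0]
b = A @ x_true + 0.01 * rng.standard_normal(40)   # noisy measurements
x_hat = ista(A, b, lam=0.1)
print(np.linalg.norm(x_hat - x_true))   # small; the support is identified
```

Replacing the plain $\ell_1$ term with the linearized non-convex penalty, as in the iteration above, changes only the linear part of each subproblem.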
4 Recovery results
Although a global minimum is in general not guaranteed in non-convex minimization, we can show that the performance of the proposed framework is provably robust to the conditioning of the measurement matrix $A$, by proving that it always improves on the basis pursuit solution.
Let us take another look at the assumptions on $p$ that were crucial in the proof of Proposition 3.1. Since $p$ is concave and increasing on $[0, \infty)$ with $p(0) = 0$, we have
$$p(t) \le p'(0^+)\, t \quad \text{for all } t \ge 0,$$
and thus $P(x) \le p'(0^+)\|x\|_1$ for all $x$. Now we are ready to show the main results.
Theorem 4.1 (Support-wise uniform recovery).
Let $T$ be an arbitrary but fixed index set. If basis pursuit uniquely recovers all $\bar{x}$ supported on $T$, so does (4).
If the nonzero entries of $\bar{x}$ have the same magnitude, a stronger result holds: (4) recovers any such fixed signal whenever basis pursuit does.
Theorem 4.2 (Recovery of equal-height signals).
Let $\bar{x}$ be a signal with equal-height peaks supported on $T$, i.e. $|\bar{x}_i| = c > 0$ for all $i \in T$. If basis pursuit uniquely recovers $\bar{x}$, so does (4).
Proof sketch. If basis pursuit uniquely recovers $\bar{x}$, then $\|\bar{x} + v\|_1 > \|\bar{x}\|_1$ for all $v \in \ker(A) \setminus \{\mathbf{0}\}$. The argument passes this strict inequality from the $\ell_1$ norm to the metric $P$ via the bounds on $p$ established above, so that $\bar{x}$ also uniquely minimizes (4).
Although the conditions proposed in section 1 are not applicable to the $\ell_{1-2}$ metric $\|x\|_1 - \|x\|_2$ since it is not separable, it is not hard to generalize these conditions and the iterative $\ell_1$ algorithm to accommodate this case. The resulting algorithm is exactly DCA-$\ell_{1-2}$. We can also readily extend the recovery theory to DCA-$\ell_{1-2}$, with the DC decomposition given by $g(x) = \|x\|_1$ and $h(x) = \|x\|_2$.
5 Numerical experiments
5.1 Exact recovery of sparse vectors
We reconstruct sparse vectors using the iterative $\ell_1$ algorithm (Algorithm 2) for minimizing the regularized model (5) with the smoothed $\ell_p$ norm (IL-$\ell_p$) and the smoothed logarithm functional (IL-log), and compare them with two state-of-the-art non-convex CS algorithms, namely IRLS-$\ell_p$ and IRL1. Note that IRLS-$\ell_p$ and IRL1 attempt to minimize the $\ell_p$ and logarithm penalties, respectively, and both involve a smoothing strategy in minimization. So it is particularly interesting to compare IL-$\ell_p$ with IRLS-$\ell_p$, and IL-log with IRL1. The same value of $p$ is chosen for IRLS-$\ell_p$ and IL-$\ell_p$ in all experiments. We also include ADMM-lasso for solving the $\ell_1$ regularization (LASSO) in the comparison.
Experiments are carried out as follows. We first sample a sensing matrix $A$, and generate a test signal $\bar{x}$ of sparsity $k$ supported on a random index set, with i.i.d. Gaussian entries on the support. We then compute the measurement $b = A\bar{x}$ and apply each solver to produce a reconstruction $x^*$ of $\bar{x}$. The reconstruction is called a success if the relative error $\|x^* - \bar{x}\|_2 / \|\bar{x}\|_2$ falls below a prescribed tolerance.
We run 100 independent realizations and record the corresponding success rates at different sparsity levels.
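The protocol above can be sketched compactly; here basis pursuit (via an LP) stands in for the competing solvers, and the sizes and trial count are scaled down for illustration:

```python
# Sketch of the success-rate protocol: sample A and a k-sparse x, measure
# b = A x, reconstruct, and count a success when the relative error is below
# a small tolerance. Basis pursuit stands in for the solvers compared.
import numpy as np
from scipy.optimize import linprog

def recover(A, b):
    m, n = A.shape
    res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

def success_rate(m, n, k, trials=20, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((m, n))
        x = np.zeros(n)
        support = rng.choice(n, size=k, replace=False)
        x[support] = rng.standard_normal(k)
        x_hat = recover(A, A @ x)
        if np.linalg.norm(x_hat - x) <= tol * np.linalg.norm(x):
            hits += 1
    return hits / trials

print(success_rate(20, 50, 2), success_rate(20, 50, 12))
```

With $m = 20$ Gaussian measurements in dimension 50, recovery of 2-sparse signals essentially always succeeds while 12-sparse signals essentially always fail, which is the phase-transition behavior the success-rate curves in Figure 1 trace out.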
Matrix for test. We test on random Gaussian matrices whose columns are i.i.d. Gaussian samples. Gaussian matrices are RIP-like and have uncorrelated (incoherent) columns. For the Gaussian matrix, the dimensions $m$ and $n$ are fixed across all trials.
We also use more ill-conditioned sensing matrices of significantly higher coherence. Specifically, a randomly oversampled partial DCT matrix $A = [a_1, \dots, a_n] \in \mathbb{R}^{m \times n}$ is defined as
$$a_j = \frac{1}{\sqrt{m}} \cos\left(\frac{2\pi w j}{F}\right), \quad j = 1, \dots, n,$$
where $w \in \mathbb{R}^m$ is a random vector whose components are uniformly and independently sampled from $[0, 1]$, and $F \in \mathbb{N}$ is the refinement factor. The coherence goes up as $F$ increases. In this setting, it is still possible to recover the sparse vector if its spikes are sufficiently separated. Specifically, we randomly select the support $T$ of $\bar{x}$ so that
$$\min_{i, j \in T,\, i \ne j} |i - j| \ge L,$$
where $L$ is called the minimum separation. It is necessary for $L$ to be at least 1 Rayleigh length (RL), which is unity in the frequency domain [21, 16]. In our case, the value of 1 RL equals $F$. We test at three coherence levels with $F = 5, 10, 15$; the coherence $\mu(A)$ rises toward 1 as $F$ grows. The minimum separation is fixed at a few Rayleigh lengths in the experiments.
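The coherence growth with $F$ is easy to observe numerically; the sketch below builds the oversampled DCT matrix (following the form above, which is our reading of the construction) and measures $\mu(A)$:

```python
# Randomly oversampled partial DCT matrix a_j = cos(2*pi*w*j/F)/sqrt(m),
# w ~ Uniform[0,1]^m, and its coherence as the refinement factor F grows.
import numpy as np

def oversampled_dct(m, n, F, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 1.0, size=m)
    j = np.arange(n)
    return np.cos(2.0 * np.pi * np.outer(w, j) / F) / np.sqrt(m)

def coherence(A):
    An = A / np.linalg.norm(A, axis=0)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

mus = [coherence(oversampled_dct(64, 1024, F)) for F in (1, 5, 10, 15)]
print(mus)   # coherence climbs toward 1 as F grows
```

At $F = 10$ or $15$, adjacent columns are nearly parallel ($\mu(A) > 0.99$), which is exactly the strongly coherent regime in which IRLS-$\ell_p$ and IRL1 degrade.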
For ADMM-lasso, the penalty and relaxation parameters are set to standard values, with the maximum number of iterations maxiter = 5000 [3, 41]. For IRLS-$\ell_p$, maxiter = 1000 with a small convergence tolerance. For reweighted $\ell_1$ (IRL1), the smoothing parameter is adaptively updated as introduced in the original work, and the outer iteration is stopped when the relative error between two consecutive iterates falls below a small threshold. The weighted $\ell_1$ minimization subproblems are solved by the YALL1 solver (available at http://yall1.blogs.rice.edu/) with a tight tolerance. All other settings of the algorithms are left at their default values.
For IL-$\ell_p$, the smoothing parameter $\epsilon$ is set proportionally to $\|x^{(1)}\|_\infty$, where $x^{(1)}$ is the output of the first iteration, which is also the solution to LASSO, and $\|x^{(1)}\|_\infty$ denotes its largest entry in magnitude. IL-log is configured analogously. The subproblems are solved by the alternating direction method of multipliers (ADMM). The parameters for solving the subproblems are the same as those for ADMM-lasso.
Interpretation of results. The success rates are plotted in Figure 1. When $A$ is Gaussian, all non-convex CS solvers are comparable and much better than ADMM-lasso, with IRLS-$\ell_p$ being the best. For oversampled DCT matrices, the success rates of IRLS-$\ell_p$ and IRL1 drop as $F$ increases, whereas the proposed IL-$\ell_p$ and IL-log are robust and consistently outperform ADMM-lasso.
Figure 1: success rates for the Gaussian matrix and the oversampled DCT matrices with F = 5, 10, 15.
5.2 MRI reconstruction
We present an example of reconstructing the Shepp-Logan phantom image, to further demonstrate the effectiveness of the IL algorithm. In this application, the sparsity of the gradient $\nabla u$ of the image $u$ is exploited, which leads to the following minimization problem:
$$\min_{u} \; P(\nabla u) \quad \text{subject to} \quad RFu = f,$$
where $R$ denotes the sampling mask in the frequency domain, $F$ is the Fourier transform, and $f$ is the acquired data. With $P$ being the $\ell_1$ norm, the above formulation reduces to the celebrated total variation (TV) minimization:
$$\min_{u} \; \|\nabla u\|_1 \quad \text{subject to} \quad RFu = f.$$
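The gradient sparsity being exploited is easy to make concrete. A minimal sketch of the discrete (anisotropic) gradient and the resulting TV value on a piecewise-constant "phantom" (the forward-difference discretization is a standard choice, assumed here):

```python
# Discrete anisotropic total variation: forward differences in each
# direction (Neumann boundary), with P = l1 applied to the gradient.
import numpy as np

def grad2d(u):
    gx = np.diff(u, axis=0, append=u[-1:, :])   # forward difference, rows
    gy = np.diff(u, axis=1, append=u[:, -1:])   # forward difference, cols
    return gx, gy

def tv(u):
    gx, gy = grad2d(u)
    return np.abs(gx).sum() + np.abs(gy).sum()

u = np.zeros((8, 8))
u[2:6, 2:6] = 1.0     # a 4x4 constant square: sparse gradient
print(tv(u))          # 16.0: two unit jumps across each of 4 rows and 4 columns
```

The gradient is zero except at the square's edges, so $\|\nabla u\|_1$ counts only the jumps; this is why piecewise-constant images such as the phantom are ideal targets for the formulation above.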
The above constrained problem, together with its regularized version
$$\min_{u} \; \lambda P(\nabla u) + \frac{1}{2}\|RFu - f\|_2^2,$$
can be solved by the split Bregman method when $P$ is the $\ell_1$ norm; the IL algorithm for solving the regularized model thus takes the following form:
$$u^{k+1} \in \arg\min_{u} \; \lambda\big(\|\nabla u\|_1 - \langle v^k, \nabla u \rangle\big) + \frac{1}{2}\|RFu - f\|_2^2,$$
with $v^k$ the subgradient defined as before. Likewise, the subproblem for updating $u$ above can also be solved by split Bregman, as its objective only differs by a linear term from the TV-regularized problem.
Numerical results. In the experiment, we again choose $p$ to be the smoothed $\ell_p$ ($0 < p < 1$) and the smoothed log, respectively, for the IL algorithm, and use the same smoothing parameter $\epsilon$ and regularization parameter $\lambda$ for both implementations. The reconstruction results are shown in Figure 2. We find that 7 sampled projections are sufficient for both penalties to recover the phantom image perfectly, in comparison with $\ell_1$ (TV) minimization, which needs 10 projections for perfect image reconstruction by split Bregman. To the best of our knowledge, the other existing non-convex solvers for minimizing either the $\ell_p$ or the log penalty did no better than 10 projections [7, 10, 9].
Figure 2: reconstructions from 7 sampled projections. $\ell_1$ (TV): rel. error = 0.4832; IL-$\ell_p$ and IL-log both recover the image with negligible relative error.
6 Conclusions
We developed an iterative $\ell_1$ framework for a broad class of Lipschitz continuous non-convex sparsity-promoting objectives, including those arising in statistics. The iterative $\ell_1$ algorithm is shown, via theory and computation, to improve on $\ell_1$ minimization for CS problems independent of the coherence of the sensing matrices.
Acknowledgments. The authors would like to thank Yifei Lou (University of Texas at Dallas) and Jong-Shi Pang (University of Southern California) for helpful discussions. The authors would also like to thank the anonymous reviewers for their helpful comments. The work was partially supported by NSF grants DMS-1222507 and DMS-1522383.
-  A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM J. Imaging Sci., Vol. 2, No. 1, pp. 183-202, 2009.
-  T. Blumensath and M. Davies, Iterative hard thresholding for compressed sensing, Applied and Computational Harmonic Analysis, 27.3 (2009): 265-274.
-  S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
-  E. Candès and C. Fernandez-Granda, Towards a mathematical theory of super-resolution, Communication on Pure and Applied Mathematics, 67(6):906-956 (2014).
-  E. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete Fourier information, IEEE Transactions on Information Theory, 52(2):489-509 (2006).
-  E. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics, 59(8):1207-1223 (2006).
-  E. Candès, M. Wakin, and S. Boyd, Enhancing Sparsity by Reweighted $\ell_1$ Minimization, J. Fourier Anal. Appl., 14(5), pp. 877-905, 2008.
-  E. Candès and T. Tao, Decoding by Linear Programming, IEEE Trans. Info. Theory, 51 (2005), pp. 4203-4215.
-  R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Processing Letters, 14(10), pp. 707-710, 2007.
-  R. Chartrand, Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data, IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2009.
-  R. Chartrand and V. Staneva, Restricted isometry properties and nonconvex compressive sensing, Inverse Problems, 24 (2008), pp. 1-14.
-  R. Chartrand and W. Yin, Iteratively Reweighted Algorithms for Compressive Sensing, IEEE international conference on acoustics, speech, and signal processing, pp. 3869-3872, 2008.
-  A. Cohen, W. Dahmen, and R. DeVore, Compressed Sensing and Best $k$-Term Approximation, Journal of the American Mathematical Society, Vol. 22, No. 1, pp. 211-231, 2009.
-  I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Applied Math., 57(2004), pp. 1413-1457.
-  D. Donoho, X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. Inform. Theory, 47, pp. 2845-2862, 2001.
-  D. Donoho. Super-resolution via sparsity constraints, SIAM Journal on Mathematical Analysis, 23(5): 1309-1331 (1992).
-  D. Donoho. Compressed sensing, IEEE Transactions on Information Theory, 52(4), pp. 1289-1306, 2006.
-  E. Esser, Applications of Lagrangian-Based Alternating Direction Methods and Connections to Split Bregman, CAM-report 09-31, UCLA, 2009.
-  E. Esser, Y. Lou, and J. Xin, A Method for Finding Structured Sparse Solutions to Non-negative Least Squares Problems with Applications, SIAM J. Imaging Sciences, 6(4), pp. 2010-2046, 2013.
-  J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal American Statistical Association, 96(456), pp. 1348-1360, 2001.
-  A. Fannjiang and W. Liao, Coherence Pattern-Guided Compressive Sensing with Unresolved Grids, SIAM J. Imaging Sciences, Vol. 5, No. 1, pp. 179-202, 2012.
-  S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, Springer, 2013.
-  G. Gasso, A. Rakotomamonjy, and S. Canu, Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming, IEEE Transactions on Signal Processing, Vol. 57, No. 12, pp. 4686-4698, 2009.
-  T. Goldstein and S. Osher, The split Bregman method for L1 regularized problems, SIAM Journal on Imaging Sciences, 2(1), pp. 323-343, 2009.
-  E. Hale, W. Yin, and Y. Zhang, Fixed-point continuation for $\ell_1$-minimization: Methodology and convergence, SIAM J. Optim., 19, pp. 1107-1130, 2008.
-  X. Huang, L. Shi, and M. Yan, Nonconvex Sorted $\ell_1$ Minimization for Sparse Approximation, Journal of the Operations Research Society of China, Volume 3, Issue 2, pp. 207-229, 2015.
-  M. Lai, Y. Xu, and W. Yin, Improved iteratively reweighted least squares for unconstrained smoothed $\ell_q$ minimization, SIAM J. Numer. Anal., Vol. 51, Issue 2, pp. 927-957, 2013.
-  J. Lv, and Y. Fan, A unified approach to model selection and sparse recovery using regularized least squares, Annals of Statistics, 37(6A), pp. 3498-3528, 2009.
-  Y. Lou, P. Yin, Q. He, and J. Xin, Computing Sparse Representation in a Highly Coherent Dictionary Based on Difference of $\ell_1$ and $\ell_2$, J. Sci. Comput., 64 (2015), pp. 178-196.
-  Y. Lou, P. Yin, and J. Xin, Point Source Super-resolution via Non-convex $L_1$ Based Methods, J. Sci. Comput., to appear.
-  D. Needell and J. Tropp, CoSaMP: Iterative signal recovery from incomplete and inaccurate samples, Appl. Comput. Harmon. Anal., 26 (2009), pp. 301-321.
-  H. Rauhut, Compressive Sensing and Structured Random Matrices, Radon Series Comp. Appl. Math., Vol. 9, pp. 1-92, 2010.
-  L. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D, 60(1992), pp. 259-268.
-  P.D. Tao and L.T.H. An, Convex analysis approach to d.c. programming: Theory, algorithms and applications, Acta Mathematica Vietnamica, vol. 22, no. 1, pp. 289-355, 1997.
-  P.D. Tao and L.T.H. An, A DC optimization algorithm for solving the trust-region subproblem, SIAM Journal on Optimization, 8(2), pp. 476-505, 1998.
-  R. Tibshirani, Regression shrinkage and selection via the lasso, J. Royal. Statist. Soc., 58(1): 267-288, 1996.
-  J. Tropp and A. Gilbert, Signal recovery from partial information via orthogonal matching pursuit, IEEE Trans. Inform. Theory, 53(12):4655-4666, 2007.
-  Y. Wang and W. Yin, Sparse signal reconstruction via iterative support detection, SIAM Journal on Imaging Sciences, 3.3 (2010): 462-491.
-  Z. Xu, X. Chang, F. Xu, and H. Zhang, $L_{1/2}$ regularization: an iterative thresholding method, IEEE Transactions on Neural Networks and Learning Systems, 23, pp. 1013-1027, 2012.
-  J. Yang and Y. Zhang, Alternating direction algorithms for problems in compressive sensing, SIAM J. Sci. Comput., 33(1), pp. 250-278, 2011.
-  P. Yin, Y. Lou, Q. He, and J. Xin, Minimization of $\ell_{1-2}$ for Compressed Sensing, SIAM J. Sci. Comput., 37 (2015), pp. A536-A563.
-  P. Yin, E. Esser, J. Xin, Ratio and Difference of L1 and L2 Norms and Sparse Representation with Coherent Dictionaries, Commun. Information and Systems, 14(2), 2014, pp. 87-109.
-  W. Yin, S. Osher, D. Goldfarb, and J. Darbon, Bregman iterative algorithms for $\ell_1$-minimization with applications to compressed sensing, SIAM J. Imaging Sci., 1 (2008), pp. 143-168.
-  S. Zhang and J. Xin, Minimization of Transformed L1 Penalty: Theory, Difference of Convex Function Algorithm, and Robust Application in Compressed Sensing, preprint, arXiv:1411.5735.
-  T. Zhang, Multi-stage convex relaxation for learning with sparse regularization, NIPS proceedings, 2008.
-  S. Zhou, N. Xiu, Y. Wang, L. Kong, and H. Qi, A null-space-based weighted $\ell_1$ minimization approach to compressed sensing, Information and Inference, 2016: iaw002v1-iaw002.