Is the 1-norm the best convex sparse regularization?
Abstract
The 1-norm is a good convex regularization for the recovery of sparse vectors from underdetermined linear measurements. No other convex regularization seems to surpass its sparse recovery performance. How can this be explained? To answer this question, we define several notions of “best” (convex) regularization in the context of general low-dimensional recovery and show that indeed the 1-norm is an optimal convex sparse regularization within this framework.
1 Introduction
We consider the observation model in a Hilbert space $\mathcal{H}$ (with associated norm $\|\cdot\|_{\mathcal{H}}$):
(1) $y = M x_0$
where $M$ is an underdetermined linear operator, $y$ is an $m$-dimensional vector and $x_0$ is the unknown. We suppose that $x_0$ belongs to a low-dimensional model $\Sigma$ (a union of subspaces). We consider the following minimization program:
(2) $x^* \in \arg\min_{Mx = y} R(x)$
where $R$ is a regularization function. A huge body of work gives practical regularizations ensuring that $x^* = x_0$ for several low-dimensional models (in particular sparse and low-rank models, see [5] for a very complete review of these results) and convex regularizations. The operator $M$ is generally required to satisfy some property (e.g., the restricted isometry property (RIP)) to guarantee recovery. In this work, we aim at finding the “best” convex regularization for exact recovery of $x_0 \in \Sigma$.
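As a minimal illustration of the minimization program (2), the sketch below treats the simplest underdetermined case, where the kernel of the measurement operator is spanned by a single direction `h` and the regularizer is the 1-norm. The feasible set is then a line through any feasible point, and the objective is convex and piecewise linear along it, so a minimizer is found among the breakpoints. The function names and the toy vectors are illustrative assumptions, not taken from the paper.

```python
def l1(x):
    """1-norm of a list of floats."""
    return sum(abs(v) for v in x)


def l1_min_on_line(x_feasible, h):
    """Return a minimizer of the 1-norm over {x_feasible + t*h : t real}.

    t -> l1(x_feasible + t*h) is convex and piecewise linear, so a
    minimizer is attained at a breakpoint t = -x_feasible[i] / h[i].
    Assumes h is a nonzero kernel direction.
    """
    breakpoints = [-xi / hi for xi, hi in zip(x_feasible, h) if hi != 0.0]
    candidates = [
        [xi + t * hi for xi, hi in zip(x_feasible, h)] for t in breakpoints
    ]
    return min(candidates, key=l1)


x0 = [1.0, 0.0, 0.0]  # 1-sparse unknown (hypothetical example)

# Kernel direction spread evenly over all coordinates: x0 is recovered.
print(l1_min_on_line(x0, [1.0, 1.0, 1.0]))

# Kernel direction concentrated on the support of x0: a different,
# smaller-norm feasible point is returned, so recovery fails.
print(l1_min_on_line(x0, [1.0, 0.2, 0.2]))
```

The two calls differ only in the orientation of the kernel, which is the quantity the compliance measures discussed next are meant to account for.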
Best regularization with respect to a low dimensional model.
We describe the framework to define what is the “best” regularization in a set $\mathcal{C}$ of convex functions that was initiated in [8] (this work is a follow-up of that article; the full version of [8] with proofs is available at https://hal.inria.fr/hal01720871). If we do not have prior information on $M$, we want to build a compliance measure $A_\Sigma(R)$ that summarizes the notion of good regularization with respect to $\Sigma$ and maximize it:
(3) $R^* \in \arg\max_{R \in \mathcal{C}} A_\Sigma(R)$
In the sparse recovery example studied in this article, the existence of a maximum of $A_\Sigma(R)$ is verified. However, we could ask ourselves what conditions on $A_\Sigma(R)$ and $\mathcal{C}$ are necessary and sufficient for the existence of a maximum, which is out of the scope of this article.
Compliance measures.
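When studying recovery with a regularization function $R$, two types of guarantees are generally used: uniform and non-uniform. To describe these recovery guarantees, we use the following definition of descent vectors.
Definition 1.1 (Descent vectors).
For any $x \in \mathcal{H}$, the collection of descent vectors of $R$ at $x$ is
(4) $\mathcal{T}_R(x) := \{\gamma z : \gamma \in \mathbb{R},\ R(x + z) \leq R(x)\}$
We write $\mathcal{T}_R(\Sigma) := \bigcup_{x \in \Sigma} \mathcal{T}_R(x)$. When $R$ is convex these sets are cones. Recovery is characterized by descent vectors (recall that $x^*$ is the result of minimization (2)):

Uniform recovery: Let $M$ be a linear operator. Then “for all $x_0 \in \Sigma$, $x^* = x_0$” is equivalent to $\mathcal{T}_R(\Sigma) \cap \ker M = \{0\}$.

Non-uniform recovery: Let $M$ be a linear operator and $x_0 \in \Sigma$. Then $x^* = x_0$ is equivalent to $\mathcal{T}_R(x_0) \cap \ker M = \{0\}$.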
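The non-uniform characterization can be probed numerically when the kernel is one-dimensional. The sketch below is a heuristic grid check, under the assumption that the kernel is spanned by a single vector `h`: it looks for a nonzero step `t` along `h` that does not increase the regularizer at `x0`. The function name, the grid parameters and the test vectors are hypothetical, and a finite grid can of course miss an intersection.

```python
def l1(x):
    """1-norm of a list of floats."""
    return sum(abs(v) for v in x)


def line_meets_descent_set(R, x0, h, t_max=2.0, steps=200):
    """Grid check: is there t != 0 with R(x0 + t*h) <= R(x0)?

    A True answer means span(h) meets the descent vectors of R at x0
    non-trivially, so x0 is not the unique minimizer on x0 + span(h).
    """
    base = R(x0)
    for i in range(-steps, steps + 1):
        if i == 0:
            continue  # t = 0 is the trivial intersection
        t = t_max * i / steps
        shifted = [xi + t * hi for xi, hi in zip(x0, h)]
        if R(shifted) <= base:
            return True
    return False


x0 = [1.0, 0.0, 0.0]
print(line_meets_descent_set(l1, x0, [1.0, 1.0, 1.0]))  # False: x0 is recovered
print(line_meets_descent_set(l1, x0, [1.0, 0.2, 0.2]))  # True: recovery fails
```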
Hence, a regularization function $R$ is “good” if $\mathcal{T}_R(\Sigma)$ leaves a lot of space for $\ker M$ to not intersect it (trivially). In dimension $n$, if there is no orientation prior on the kernel of $M$, the amount of space left can be quantified by the “volume” of $\mathcal{T}_R(\Sigma) \cap S(1)$ where $S(1)$ is the unit sphere with respect to $\|\cdot\|_2$. Hence, in dimension $n$, we define a compliance measure for uniform recovery as:
(5) $A^{U}_\Sigma(R) := 1 - \frac{\mathrm{vol}(\mathcal{T}_R(\Sigma) \cap S(1))}{\mathrm{vol}(S(1))}$
More precisely, here, the volume $\mathrm{vol}(E)$ of a set $E$ is the measure of $E$ with respect to the uniform measure on the sphere $S(1)$ (i.e. the $(n-1)$-dimensional Hausdorff measure of $\mathcal{T}_R(\Sigma) \cap S(1)$). When looking at non-uniform recovery for random Gaussian measurements, the quantity $\frac{\mathrm{vol}(\mathcal{T}_R(x) \cap S(1))}{\mathrm{vol}(S(1))}$ represents the probability that a randomly oriented kernel of dimension 1 intersects (non-trivially) $\mathcal{T}_R(x)$. The highest probability of intersection with respect to $x \in \Sigma$ quantifies the lack of compliance of $R$, hence we can define:
(6) $A^{NU}_\Sigma(R) := 1 - \sup_{x \in \Sigma} \frac{\mathrm{vol}(\mathcal{T}_R(x) \cap S(1))}{\mathrm{vol}(S(1))}$
Note that this can be linked with the Gaussian width and statistical dimension theory of sparse recovery [3, 1]. In infinite dimension, the volume of the sphere vanishes, making the measures above uninteresting. However, [7] and [6] show that we can often come back to a low-dimensional recovery problem in an intermediate finite (potentially high-dimensional) subspace of $\mathcal{H}$. Adapting the definition of $S(1)$ to this subspace allows us to extend these compliance measures.
While it was shown that the $\ell^1$-norm is indeed the best atomic norm for $A^{U}_\Sigma$ and $A^{NU}_\Sigma$ in the minimal case of 1-sparse recovery for $n = 3$ in [8], extending these exact calculations to the case of $k$-sparse recovery in dimension $n$ seems out of reach.
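To make the volume-based uniform compliance measure concrete, here is a Monte Carlo sketch for the 1-norm and 1-sparse vectors in dimension 3. It relies on a short calculation that is not taken from the paper: a vector z is a descent vector of the 1-norm at some 1-sparse point exactly when one coordinate dominates the others, i.e. |z_i| >= sum over j != i of |z_j| for some i (choose the point on axis i with sign opposite to z_i and large enough magnitude). The sample size, seed and function names are arbitrary choices.

```python
import random


def in_descent_set_l1_1sparse(z):
    """True if z is a descent vector of the 1-norm at some 1-sparse point.

    Criterion (an assumption derived in the lead-in): some |z_i| is at
    least the sum of the other |z_j|, i.e. 2 * max|z_i| >= sum|z_i|.
    """
    abs_z = [abs(v) for v in z]
    return 2.0 * max(abs_z) >= sum(abs_z)


def estimate_uniform_compliance(n=3, samples=20000, seed=0):
    """Monte Carlo estimate of 1 - vol(T ∩ S(1)) / vol(S(1))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # A standard Gaussian vector has a uniformly distributed direction,
        # and the membership test is scale invariant, so the sample does
        # not need to be normalized onto the sphere.
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hits += in_descent_set_l1_1sparse(z)
    return 1.0 - hits / samples


print(estimate_uniform_compliance())
```

Replacing the membership test by the one of another regularizer would give a crude numerical comparison of compliance values, which is only practical in very small dimension.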
Compliance measures based on the RIP.
For uniform recovery, another possibility is to use recovery results based on the restricted isometry property. They have been shown to be adequate for multiple models [7], to be tight in some sense for sparse and low rank recovery [4], to be necessary in some sense [2] and to be well adapted to the study of random operators [6].
Definition 1.2 (RIP constant).
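Let $\Sigma$ be a union of subspaces and $M$ be a linear map. The RIP constant of $M$ is defined as
(7) $\delta(M) := \sup_{x \in (\Sigma - \Sigma) \setminus \{0\}} \left| \frac{\|Mx\|_2^2}{\|x\|_{\mathcal{H}}^2} - 1 \right|$
where $\Sigma - \Sigma$ (differences of elements of $\Sigma$) is called the secant set.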
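For a small matrix, a RIP constant of this kind can be computed exactly. The sketch below assumes that the RIP constant is the largest deviation of ‖Mx‖²/‖x‖² from 1 over the secant set, and takes the model to be the 1-sparse vectors, so that the secant set consists of 2-sparse vectors. On each pair of coordinates the ratio ranges between the two eigenvalues of a 2×2 Gram matrix, which have a closed form. The function name and the example matrices are illustrative assumptions.

```python
import itertools
import math


def rip_constant_2sparse(M):
    """RIP constant of M (a list of rows) restricted to 2-sparse vectors.

    For each pair of columns, ||Mx||^2 / ||x||^2 ranges over the interval
    between the eigenvalues of the 2x2 Gram matrix [[a, b], [b, c]].
    Assumes M has at least two columns.
    """
    n = len(M[0])
    cols = [[row[j] for row in M] for j in range(n)]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    delta = 0.0
    for i, j in itertools.combinations(range(n), 2):
        a, b, c = dot(cols[i], cols[i]), dot(cols[i], cols[j]), dot(cols[j], cols[j])
        mean = (a + c) / 2.0
        radius = math.hypot((a - c) / 2.0, b)
        # Eigenvalues are mean + radius and mean - radius.
        delta = max(delta, abs(mean + radius - 1.0), abs(mean - radius - 1.0))
    return delta


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(rip_constant_2sparse(identity))  # 0.0: a perfect isometry

# Dropping the last row puts a 1-sparse vector in the kernel.
print(rip_constant_2sparse(identity[:2]))  # 1.0
```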
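It has been shown that if $M$ has a RIP with constant $\delta(M) < \delta^{\mathrm{suff}}_\Sigma(R)$ on the secant set $\Sigma - \Sigma$, with $\delta^{\mathrm{suff}}_\Sigma(R)$ being fully determined by $\Sigma$ and $R$ [7], then uniform stable recovery is possible. The explicit constant $\delta^{\mathrm{suff}}_\Sigma(R)$ is only sufficient (and sharp in some sense for sparse and low-rank recovery). An ideal RIP-based compliance measure would be to use a sharp RIP constant $\delta^{\mathrm{sharp}}_\Sigma(R)$ (unfortunately, it is an open question to derive analytical expressions of this constant for sparsity and other low-dimensional models) defined as:
(8) $\delta^{\mathrm{sharp}}_\Sigma(R) := \inf_{M \,:\, \ker M \cap \mathcal{T}_R(\Sigma) \neq \{0\}} \delta(M)$
It is the best RIP constant of measurement operators where uniform recovery fails. When $\delta^{\mathrm{sharp}}_\Sigma(R)$ increases, $R$ permits recovery of $\Sigma$ for more measurement operators $M$ (less stringent RIP condition). Hence $\delta^{\mathrm{sharp}}_\Sigma(R)$ can be viewed as a compliance measure:
(9) $A^{\mathrm{RIP,sharp}}_\Sigma(R) := \delta^{\mathrm{sharp}}_\Sigma(R)$
The lack of practical analytic expressions for $\delta^{\mathrm{sharp}}_\Sigma(R)$ limits the possibilities of exact optimization with respect to $R$. We propose to look at two RIP-based compliance measures: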

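A measure based on necessary RIP conditions [4] which yields sharp recovery constants for particular operators, e.g.,
(10) $A^{\mathrm{RIP,nec}}_\Sigma(R) := \delta^{\mathrm{nec}}_\Sigma(R) := \inf_{z \in \mathcal{T}_R(\Sigma) \setminus \{0\}} \delta(I - \Pi_z)$
where $\Pi_z$ is the orthogonal projection onto the one-dimensional subspace $\mathrm{span}(z)$ (other intermediate necessary RIP constants can be defined). Another open question is to determine whether $\delta^{\mathrm{nec}}_\Sigma(R) = \delta^{\mathrm{sharp}}_\Sigma(R)$ generally or for some particular models.

A measure based on sufficient RIP constants for recovery, i.e. $A^{\mathrm{RIP,suff}}_\Sigma(R) := \delta^{\mathrm{suff}}_\Sigma(R)$ from [7].
Note that we have the relation
(11) $\delta^{\mathrm{suff}}_\Sigma(R) \leq \delta^{\mathrm{sharp}}_\Sigma(R) \leq \delta^{\mathrm{nec}}_\Sigma(R)$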
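The operators used in the necessary condition have an explicit RIP constant, which the following sketch illustrates. Writing P for the orthogonal projection onto span(z), one has ‖(I − P)x‖² = ‖x‖² − ⟨x, z⟩²/‖z‖², so the deviation of ‖(I − P)x‖²/‖x‖² from 1 equals ⟨x, z⟩²/(‖x‖²‖z‖²). Assuming the secant set consists of the s-sparse vectors (s = 2k for a k-sparse model), the Cauchy-Schwarz inequality on each support gives a supremum equal to the energy of the s largest entries of z divided by its total energy. The function names are hypothetical; the second function only cross-checks the first by enumerating supports.

```python
import itertools


def rip_of_projection_complement(z, s):
    """RIP constant of I - P_z on s-sparse vectors, in closed form.

    Equals ||z_T||^2 / ||z||^2 with T the support of the s largest |z_i|.
    Assumes z is nonzero.
    """
    squares = sorted((v * v for v in z), reverse=True)
    return sum(squares[:s]) / sum(squares)


def rip_of_projection_complement_by_supports(z, s):
    """Same quantity, maximizing explicitly over all supports of size s."""
    total = sum(v * v for v in z)
    return max(
        sum(z[i] * z[i] for i in support) / total
        for support in itertools.combinations(range(len(z)), s)
    )


z = [3.0, -1.0, 2.0, 0.5]
print(rip_of_projection_complement(z, 2))
print(rip_of_projection_complement_by_supports(z, 2))
```

Under these assumptions, a vector whose energy is spread evenly over many coordinates yields a small constant, which is why the infimum over descent vectors depends on how concentrated those vectors can be.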
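To summarize, instead of considering the most natural RIP-based compliance measure (based on $\delta^{\mathrm{sharp}}_\Sigma(R)$), we use the best known bounds of this measure. Moreover, in [7, Lemma 2.1], it has been shown that given a coercive convex regularization $R$, there is always an atomic norm $\|\cdot\|_{\mathcal{A}}$ (always convex) with atoms $\mathcal{A}$ included in the model (i.e. $\mathcal{A} \subset \Sigma$) such that $\mathcal{T}_{\|\cdot\|_{\mathcal{A}}}(\Sigma) \subset \mathcal{T}_R(\Sigma)$.
Definition 1.3.
The atomic “norm” induced by the set $\mathcal{A}$ is defined as:
(12) $\|x\|_{\mathcal{A}} := \inf\{t \in \mathbb{R}_+ : x \in t \cdot \overline{\mathrm{conv}}(\mathcal{A})\}$
where $\overline{\mathrm{conv}}(\mathcal{A})$ is the closure of the convex hull of $\mathcal{A}$.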
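As a worked instance of this definition, the standard example below recovers the $\ell^1$-norm as an atomic norm. The choice of atoms (signed canonical basis vectors $\pm e_i$ of $\mathbb{R}^n$) is an illustrative assumption rather than something stated at this point of the text; these atoms are 1-sparse, hence contained in any model of $k$-sparse vectors with $k \geq 1$.

```latex
\mathcal{A} = \{\pm e_i : 1 \le i \le n\}
\;\Longrightarrow\;
\overline{\mathrm{conv}}(\mathcal{A}) = \{x \in \mathbb{R}^n : \|x\|_1 \le 1\},
\qquad
\|x\|_{\mathcal{A}}
  = \inf\{t \ge 0 : x \in t \cdot \overline{\mathrm{conv}}(\mathcal{A})\}
  = \inf\{t \ge 0 : \|x\|_1 \le t\}
  = \|x\|_1 .
```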
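This implies that $A_\Sigma(\|\cdot\|_{\mathcal{A}}) \geq A_\Sigma(R)$ for the RIP-based compliance measures above. In consequence, we look for best regularizations in the set $\mathcal{C}_\Sigma := \{\|\cdot\|_{\mathcal{A}} : \mathcal{A} \subset \Sigma\}$.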
2 Optimality of the $\ell^1$-norm for RIP-based compliance measures
We set and with and . Hence . It is possible to show [8]:
(13) 
where and is a notation for the support of biggest coordinates in , i.e. for all , we have .
Similarly to the necessary case, we can show
(14) 
where and denotes the support of the biggest coordinates of . The norm is the atomic norm generated by the set of atoms . Remark the similarity between the fundamental quantity to optimize for the necessary case and the sufficient case, and , this leads us to think that our control of is rather tight. Optimizing and for gives the result:
Theorem 2.1.
Let , , and . We have
(15) 
Note that, contrary to [8], where multiples of the $\ell^1$-norm were the sole maximizers of these compliance measures among weighted $\ell^1$-norms, uniqueness among atomic norms has yet to be proven.
3 Discussion and future work
We have shown that, not surprisingly, the $\ell^1$-norm is an optimal convex regularization for sparse recovery within this framework. The important point is that we could explicitly quantify a notion of good regularization. This is promising for the search of optimal regularizations for more complicated low-dimensional models such as “sparse and low-rank” models or hierarchical sparse models. We also expect similar results for low-rank recovery and the nuclear norm as technical tools are very similar.
We used compliance measures based on (uniform) RIP recovery guarantees to give results for the general sparse recovery case. It would be interesting to carry out such an analysis using (non-uniform) recovery guarantees based on the statistical dimension or Gaussian width of the descent cones [3, 1].
Finally, while these compliance measures are designed to make sense with respect to known results in the area of sparse recovery, one might design other compliance measures tailored for particular needs (e.g. structured operators $M$) in this search for optimal regularizations.
Acknowledgements
This work was partly supported by the CNRS PEPS JC 2018 (project on efficient regularizations).
References
 [1] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp. Living on the edge: phase transitions in convex programs with random data. Information and Inference, 3(3):224–294, 2014.
 [2] A. Bourrier, M. Davies, T. Peleg, P. Perez, and R. Gribonval. Fundamental performance limits for ideal decoders in high-dimensional linear inverse problems. IEEE Transactions on Information Theory, 60(12):7928–7946, 2014.
 [3] V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805–849, 2012.
 [4] M. E. Davies and R. Gribonval. Restricted isometry constants where $\ell^p$ sparse recovery can fail for $0 < p \leq 1$. IEEE Transactions on Information Theory, 55(5):2203–2214, 2009.
 [5] S. Foucart and H. Rauhut. A mathematical introduction to compressive sensing. Springer, 2013.
 [6] G. Puy, M. E. Davies, and R. Gribonval. Recipes for stable linear embeddings from Hilbert spaces to $\mathbb{R}^m$. arXiv preprint arXiv:1509.06947, 2015.
 [7] Y. Traonmilin and R. Gribonval. Stable recovery of low-dimensional cones in Hilbert spaces: One RIP to rule them all. Applied and Computational Harmonic Analysis, In Press, 2016.
 [8] Y. Traonmilin and S. Vaiter. Optimality of 1-norm regularization among weighted 1-norms for sparse recovery: a case study on how to find optimal regularizations. 8th International Conference on New Computational Methods for Inverse Problems, 2018.