# Is the 1-norm the best convex sparse regularization?

Yann Traonmilin, Samuel Vaiter, Rémi Gribonval.
CNRS, Institut de Mathématiques de Bordeaux, Talence, France; CNRS, Institut de Mathématiques de Bourgogne, Dijon, France;
Univ Rennes, Inria, CNRS, IRISA.
###### Abstract

The 1-norm is a good convex regularization for the recovery of sparse vectors from under-determined linear measurements. No other convex regularization seems to surpass its sparse recovery performance. How can this be explained? To answer this question, we define several notions of “best” (convex) regularization in the context of general low-dimensional recovery and show that indeed the 1-norm is an optimal convex sparse regularization within this framework.

## 1 Introduction

We consider the following observation model in a Hilbert space $\mathcal{H}$ (with associated norm $\|\cdot\|_{\mathcal{H}}$):

$$y = M x_0 \qquad (1)$$

where $M$ is an under-determined linear operator, $y$ is an $m$-dimensional vector and $x_0$ is the unknown. We suppose that $x_0$ belongs to a low-dimensional model $\Sigma$ (a union of subspaces). We consider the following minimization program:

$$x^* \in \arg\min_{Mx = y} R(x) \qquad (2)$$

where $R$ is a regularization function. A large body of work gives practical convex regularizations ensuring that $x^* = x_0$ for several low-dimensional models (in particular sparse and low-rank models, for which comprehensive reviews of these results are available). The operator $M$ is generally required to satisfy some property (e.g., the restricted isometry property (RIP)) to guarantee recovery. In this work, we aim at finding the “best” convex regularization for exact recovery of $x_0 \in \Sigma$.
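To make program (2) concrete for $R = \|\cdot\|_1$ and $\mathcal{H} = \mathbb{R}^n$, here is a minimal brute-force sketch (our own illustration, not a method from this paper). With $R = \|\cdot\|_1$, (2) is a linear program whose minimum is attained at a basic solution supported on $m$ linearly independent columns of $M$, so for tiny $n$ one can simply enumerate those supports. The function name `l1_min_bruteforce` is hypothetical, and $M$ is assumed to have full row rank.

```python
import itertools

import numpy as np


def l1_min_bruteforce(M, y, tol=1e-9):
    """Return a minimizer of ||x||_1 subject to M x = y (program (2) with R = l1).

    Enumerates every set of m columns of M (shape m x n, full row rank assumed):
    a minimizer of this linear program is always among the resulting basic
    solutions. The cost grows like C(n, m), so this is for tiny examples only.
    """
    m, n = M.shape
    best_x, best_val = None, np.inf
    for cols in itertools.combinations(range(n), m):
        B = M[:, list(cols)]
        if np.linalg.matrix_rank(B) < m:  # skip singular column subsets
            continue
        x = np.zeros(n)
        x[list(cols)] = np.linalg.solve(B, y)  # basic solution with M x = y
        val = np.abs(x).sum()
        if val < best_val - tol:
            best_x, best_val = x, val
    return best_x


# Usage: a 1-sparse x0 observed through a random 4 x 8 Gaussian operator.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 8))
x0 = np.zeros(8)
x0[2] = 1.5
y = M @ x0
x_star = l1_min_bruteforce(M, y)
print("exact recovery:", np.allclose(x_star, x0))
```

Whether the last line reports exact recovery depends on the draw of $M$; the returned point is always feasible and has an $\ell^1$-norm no larger than that of $x_0$.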

### Best regularization with respect to a low dimensional model.

We describe the framework, initiated in [8], for defining the “best” regularization within a set $\mathcal{C}$ of convex functions (this work is a follow-up of [8], whose full version with proofs is available at https://hal.inria.fr/hal-01720871). If we have no prior information on $M$, we want to build a compliance measure $A_\Sigma(R)$ that summarizes the notion of good regularization with respect to $\Sigma$, and maximize it:

$$R^* \in \arg\max_{R \in \mathcal{C}} A_\Sigma(R). \qquad (3)$$

In the sparse recovery example studied in this article, the existence of a maximum of $A_\Sigma$ is verified. However, one may ask which conditions on $\mathcal{C}$ and $A_\Sigma$ are necessary and sufficient for the existence of a maximum; this question is beyond the scope of this article.

### Compliance measures.

When studying recovery with a regularization function $R$, two types of guarantees are generally used: uniform and non-uniform. To describe these recovery guarantees, we use the following definition of descent vectors.

###### Definition 1.1 (Descent vectors).

For any $x \in \mathcal{H}$, the collection of descent vectors of $R$ at $x$ is

$$T_R(x) := \{ z \in \mathcal{H} : R(x+z) \leq R(x) \}. \qquad (4)$$

We write $T_R(\Sigma) := \bigcup_{x \in \Sigma} T_R(x)$. When $R$ is convex, these sets are cones. Recovery is characterized by descent vectors (recall that $x^*$ is the result of minimization (2)):

• Uniform recovery: Let $M$ be a linear operator. Then “for all $x_0 \in \Sigma$, $x^* = x_0$” is equivalent to $\ker M \cap T_R(\Sigma) = \{0\}$.

• Non-uniform recovery: Let $M$ be a linear operator and $x_0 \in \Sigma$. Then $x^* = x_0$ is equivalent to $\ker M \cap T_R(x_0) = \{0\}$.
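As a small illustration of Definition 1.1 and of the non-uniform criterion, assume $\mathcal{H} = \mathbb{R}^n$, $R = \|\cdot\|_1$ and a kernel of dimension one, $\ker M = \mathrm{span}(u)$. Because $t \mapsto \|x_0 + t u\|_1$ is convex and piecewise linear, $\ker M \cap T_R(x_0) \neq \{0\}$ exactly when the one-sided directional derivative of $\|\cdot\|_1$ at $x_0$ is non-positive along $u$ or along $-u$. The sketch below encodes this; all helper names are hypothetical.

```python
import numpy as np


def is_descent_vector(R, x, z):
    """Definition (4): z is a descent vector of R at x iff R(x + z) <= R(x)."""
    return R(x + z) <= R(x)


def l1_directional_derivative(x, u, tol=1e-12):
    """One-sided derivative of ||.||_1 at x along u:
    sum over supp(x) of sign(x_i) u_i, plus sum of |u_i| off the support."""
    supp = np.abs(x) > tol
    return float(np.sign(x[supp]) @ u[supp] + np.abs(u[~supp]).sum())


def l1_recovers_x0_with_1d_kernel(x0, u):
    """True iff ker M = span(u) meets T_R(x0) only at 0, i.e. x* = x0 in (2)."""
    return (l1_directional_derivative(x0, u) > 0
            and l1_directional_derivative(x0, -u) > 0)


def l1(v):
    return np.abs(v).sum()


x0 = np.array([1.0, 0.0, 0.0])
print(is_descent_vector(l1, x0, np.array([-0.5, 0.2, 0.0])))           # True
print(l1_recovers_x0_with_1d_kernel(x0, np.array([-1.0, 1.0, 1.0])))   # True
print(l1_recovers_x0_with_1d_kernel(x0, np.array([-1.0, 0.5, 0.0])))   # False
```

In the last case, moving from $x_0$ along $0.5\,u$ gives $(0.5, 0.25, 0)$, whose $\ell^1$-norm $0.75$ is smaller than $\|x_0\|_1 = 1$, so $x_0$ is not the minimizer.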

Hence, a regularization function $R$ is “good” if $T_R(\Sigma)$ leaves a lot of space for $\ker M$ not to intersect it (non-trivially). In dimension $n$, if there is no prior on the orientation of the kernel of $M$, the amount of space left can be quantified by the “volume” of $T_R(\Sigma) \cap S(1)$, where $S(1)$ is the unit sphere with respect to $\|\cdot\|_{\mathcal{H}}$. Hence, in dimension $n$, we define a compliance measure for uniform recovery as:

$$A^{U}_\Sigma(R) := 1 - \frac{\mathrm{vol}(T_R(\Sigma) \cap S(1))}{\mathrm{vol}(S(1))}. \qquad (5)$$
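The volume ratio in (5) can be estimated by Monte Carlo sampling of uniform points on $S(1)$ (normalized Gaussian vectors). Below is a minimal sketch, assuming $\mathcal{H} = \mathbb{R}^n$, $\Sigma = \Sigma_K$ (the $K$-sparse vectors) and $R = \|\cdot\|_1$, with descent sets taken as cones. A standard computation then shows that $z \in T_R(\Sigma)$ iff $\|z_{T^c}\|_1 \leq \|z_T\|_1$, where $T$ indexes the $K$ largest entries of $|z|$. The function name and sample size are illustrative choices.

```python
import numpy as np


def estimate_A_U_l1(n, K, num_samples=100_000, seed=0):
    """Monte Carlo estimate of A^U_Sigma(||.||_1) in (5) for Sigma = K-sparse vectors in R^n."""
    rng = np.random.default_rng(seed)
    # Uniform samples on the unit sphere S(1): normalized standard Gaussian vectors.
    Z = rng.standard_normal((num_samples, n))
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    # Sort |z| in decreasing order: the first K entries form z_T, the rest z_{T^c}.
    A = -np.sort(-np.abs(Z), axis=1)
    in_cone = A[:, K:].sum(axis=1) <= A[:, :K].sum(axis=1)
    return 1.0 - float(in_cone.mean())


print(estimate_A_U_l1(2, 1))  # 0.0: in R^2 every z is a descent vector of some 1-sparse x
print(estimate_A_U_l1(4, 1))
```

The first value is exactly zero because, in $\mathbb{R}^2$, the smaller entry of $|z|$ never exceeds the larger one, so $T_R(\Sigma_1)$ covers the whole sphere.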

More precisely, the volume of a set $E \subset S(1)$ is its measure with respect to the uniform measure on the sphere $S(1)$ (i.e., the $(n-1)$-dimensional Hausdorff measure of $E$). When looking at non-uniform recovery with random Gaussian measurements, the quantity $\frac{\mathrm{vol}(T_R(x_0) \cap S(1))}{\mathrm{vol}(S(1))}$ represents the probability that a randomly oriented kernel of dimension 1 intersects $T_R(x_0)$ non-trivially. The highest probability of intersection with respect to $x_0 \in \Sigma$ quantifies the lack of compliance of $R$; hence we can define:

$$A^{NU}_\Sigma(R) := 1 - \sup_{x \in \Sigma} \frac{\mathrm{vol}(T_R(x) \cap S(1))}{\mathrm{vol}(S(1))}. \qquad (6)$$
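The ratio inside the supremum in (6) can likewise be estimated for a fixed $x$. For $R = \|\cdot\|_1$ on $\mathbb{R}^n$, the descent cone at $x$ is $\{z : \sum_{i \in \mathrm{supp}(x)} \mathrm{sign}(x_i) z_i + \sum_{i \notin \mathrm{supp}(x)} |z_i| \leq 0\}$, so it depends only on the support and signs of $x$. Signed permutations preserve both $\|\cdot\|_1$ and the uniform measure on $S(1)$. Hence, for $\Sigma = \Sigma_K$, the supremum reduces to a maximum over support sizes $k = 1, \dots, K$. The Monte Carlo sketch below relies on this reduction; the function names are hypothetical.

```python
import numpy as np


def descent_cone_fraction_l1(x, num_samples=200_000, seed=0):
    """Monte Carlo estimate of vol(T_R(x) cap S(1)) / vol(S(1)) for R = ||.||_1."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((num_samples, x.size))
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)  # uniform points on S(1)
    supp = x != 0
    # One-sided directional derivative of ||.||_1 at x along each sampled z.
    deriv = Z[:, supp] @ np.sign(x[supp]) + np.abs(Z[:, ~supp]).sum(axis=1)
    return float(np.mean(deriv <= 0))


def estimate_A_NU_l1(n, K, **kwargs):
    """A^NU_Sigma(||.||_1) in (6) for Sigma = K-sparse vectors in R^n."""
    # By symmetry, one representative x per support size k is enough.
    worst = max(descent_cone_fraction_l1(np.r_[np.ones(k), np.zeros(n - k)], **kwargs)
                for k in range(1, K + 1))
    return 1.0 - worst


print(descent_cone_fraction_l1(np.array([1.0, 0.0])))  # approximately 0.25
print(estimate_A_NU_l1(2, 1))                          # approximately 0.75
```

For $x = (1, 0)$ the cone $\{z : z_1 + |z_2| \leq 0\}$ is a quarter of the plane, which is why the first estimate should be close to $1/4$.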

Note that this can be linked with the Gaussian width and statistical dimension theory of sparse recovery [3, 1]. In infinite dimension, the volume of the sphere vanishes, which makes the measures above meaningless. However, it has been shown that one can often reduce to a low-dimensional recovery problem in an intermediate finite-dimensional (potentially high-dimensional) subspace of $\mathcal{H}$. Adapting the definitions above to this subspace allows these compliance measures to be extended.

While it has been shown that the $\ell^1$-norm is indeed the best atomic norm for $A^{U}_\Sigma$ and $A^{NU}_\Sigma$ in the minimal case of 1-sparse recovery in small dimension, extending these exact calculations to the case of $K$-sparse recovery in dimension $n$ seems out of reach.

### Compliance measures based on the RIP.

For uniform recovery, another possibility is to use recovery results based on the restricted isometry property. Such results have been shown to be adequate for multiple models, to be tight in some sense for sparse and low-rank recovery, to be necessary in some sense, and to be well adapted to the study of random operators.

###### Definition 1.2 (RIP constant).

Let $\Sigma$ be a union of subspaces and $M$ be a linear map. The RIP constant of $M$ is defined as

$$\delta(M) = \sup_{x \in \Sigma - \Sigma} \left| \frac{\|Mx\|_{\mathcal{H}}^2}{\|x\|_{\mathcal{H}}^2} - 1 \right|, \qquad (7)$$

where $\Sigma - \Sigma$ (the set of differences of elements of $\Sigma$) is called the secant set.
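For $\Sigma = \Sigma_K$ in $\mathbb{R}^n$ with Euclidean norms, $\Sigma - \Sigma = \Sigma_{2K}$. On each support $S$ of size $2K$, the ratio in (7) ranges between the extreme eigenvalues of the Gram matrix $M_S^\top M_S$. This gives the following brute-force sketch of $\delta(M)$. Its cost is exponential in $n$, so it is only meant for small sizes, and the function name is hypothetical.

```python
import itertools

import numpy as np


def rip_constant_sparse(M, K):
    """delta(M) from (7) for Sigma = K-sparse vectors in R^n, by enumerating supports."""
    n = M.shape[1]
    s = min(2 * K, n)  # Sigma - Sigma = 2K-sparse vectors (all of R^n if 2K >= n)
    delta = 0.0
    for cols in itertools.combinations(range(n), s):
        G = M[:, list(cols)].T @ M[:, list(cols)]  # Gram matrix on this support
        eig = np.linalg.eigvalsh(G)  # ascending: extreme Rayleigh quotients
        delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return delta


print(rip_constant_sparse(np.eye(4), 1))
print(rip_constant_sparse(np.diag([1.0, 1.0, 1.0, 2.0]), 1))
```

The identity gives $\delta = 0$ (up to rounding). The diagonal example gives $\delta = 3$, since $\|Me_4\|_2^2 / \|e_4\|_2^2 = 4$.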

It has been shown that if $M$ has the RIP with constant $\delta(M) < \delta^{\mathrm{suff}}_\Sigma(R)$ on the secant set $\Sigma - \Sigma$, with $\delta^{\mathrm{suff}}_\Sigma(R)$ fully determined by $\Sigma$ and $R$, then uniform stable recovery is possible. The explicit constant $\delta^{\mathrm{suff}}_\Sigma(R)$ is only sufficient (and sharp in some sense for sparse and low-rank recovery). An ideal RIP-based compliance measure would use a sharp RIP constant (unfortunately, deriving analytical expressions of this constant for sparsity and other low-dimensional models is an open question), defined as:

$$\delta^{\mathrm{sharp}}_\Sigma(R) := \inf_{M \,:\, \ker M \cap T_R(\Sigma) \neq \{0\}} \delta(M). \qquad (8)$$

It is the best RIP constant among measurement operators for which uniform recovery fails. When $\delta^{\mathrm{sharp}}_\Sigma(R)$ increases, $R$ permits recovery of $\Sigma$ for more measurement operators (less stringent RIP condition). Hence $\delta^{\mathrm{sharp}}_\Sigma(R)$ can be viewed as a compliance measure:

$$A^{\mathrm{RIP}}_\Sigma(R) = \delta^{\mathrm{sharp}}_\Sigma(R). \qquad (9)$$

The lack of practical analytic expressions for $\delta^{\mathrm{sharp}}_\Sigma(R)$ limits the possibilities of exact optimization with respect to $R$. We propose to look at two RIP-based compliance measures:

• A measure based on necessary RIP conditions, which yields sharp recovery constants for particular operators, e.g.,

$$A^{\mathrm{RIP,nec}}_\Sigma(R) = \delta^{\mathrm{nec}}_\Sigma(R) := \inf_{z \in T_R(\Sigma) \setminus \{0\}} \delta(I - \Pi_z), \qquad (10)$$

where $\Pi_z$ is the orthogonal projection onto the one-dimensional subspace $\mathrm{span}(z)$ (other intermediate necessary RIP constants can be defined). Another open question is to determine whether $\delta^{\mathrm{sharp}}_\Sigma(R) = \delta^{\mathrm{nec}}_\Sigma(R)$, in general or for some particular models.

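• A measure based on sufficient RIP constants for recovery, i.e., $A^{\mathrm{RIP,suff}}_\Sigma(R) = \delta^{\mathrm{suff}}_\Sigma(R)$, with $\delta^{\mathrm{suff}}_\Sigma(R)$ the sufficient constant discussed above.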
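For the necessary measure (10), the operator $I - \Pi_z$ has kernel $\mathrm{span}(z)$. Hence $\delta(I - \Pi_z) \geq \delta^{\mathrm{sharp}}_\Sigma(R)$ whenever $z \in T_R(\Sigma) \setminus \{0\}$. Assuming $\Sigma = \Sigma_K$ in $\mathbb{R}^n$ with Euclidean norms, a direct computation gives $\delta(I - \Pi_z) = \|z_T\|_2^2 / \|z\|_2^2$, with $T$ the support of the $2K$ largest entries of $|z|$. The sketch below checks this closed form against a brute-force evaluation of (7); both function names are hypothetical.

```python
import itertools

import numpy as np


def rip_constant_sparse(M, K):
    """Brute-force delta(M) from (7) for Sigma = K-sparse vectors in R^n."""
    n = M.shape[1]
    s = min(2 * K, n)
    delta = 0.0
    for cols in itertools.combinations(range(n), s):
        G = M[:, list(cols)].T @ M[:, list(cols)]
        eig = np.linalg.eigvalsh(G)
        delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return delta


def delta_I_minus_proj(z, K):
    """Closed form of delta(I - Pi_z): energy fraction of the 2K largest |z_i|."""
    a = -np.sort(-np.abs(z))  # |z| in decreasing order
    return float((a[:2 * K] ** 2).sum() / (a ** 2).sum())


z = np.array([3.0, 4.0, 12.0, 0.0])
P = np.outer(z, z) / (z @ z)  # orthogonal projection onto span(z)
print(delta_I_minus_proj(z, 1), rip_constant_sparse(np.eye(4) - P, 1))
```

Both numbers should agree with $160/169$ here: the two largest entries $12$ and $4$ carry energy $160$ out of $\|z\|_2^2 = 169$.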

Note that we have the relation

$$\delta^{\mathrm{suff}}_\Sigma(R) \leq \delta^{\mathrm{sharp}}_\Sigma(R) \leq \delta^{\mathrm{nec}}_\Sigma(R). \qquad (11)$$

To summarize, instead of considering the most natural RIP-based compliance measure (based on $\delta^{\mathrm{sharp}}_\Sigma(R)$), we use the best known bounds on this measure. Moreover, it has been shown in [7, Lemma 2.1] that, given a coercive convex regularization $R$, there is always an atomic norm $\|\cdot\|_{\mathcal{A}}$ (always convex) with atoms $\mathcal{A}$ included in the model $\Sigma$ such that $T_{\|\cdot\|_{\mathcal{A}}}(\Sigma) \subset T_R(\Sigma)$.

###### Definition 1.3.

The atomic “norm” induced by the set $\mathcal{A}$ is defined as:

$$\|x\|_{\mathcal{A}} := \inf\{ t \in \mathbb{R}_+ : x \in t \cdot \overline{\mathrm{conv}}(\mathcal{A}) \} \qquad (12)$$

where $\overline{\mathrm{conv}}(\mathcal{A})$ is the closure of the convex hull of $\mathcal{A}$.

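This implies that the atomic norm $\|\cdot\|_{\mathcal{A}}$ is at least as compliant as $R$. In consequence, we look for best regularizations in the set $\mathcal{C}_\Sigma$ of atomic norms whose atoms are included in $\Sigma$.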
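Consider a finite set of atoms $\mathcal{A} \subset \mathbb{R}^n$ that spans $\mathbb{R}^n$. The convex hull of finitely many points is already closed, so (12) reduces to the linear program $\|x\|_{\mathcal{A}} = \min\{\sum_{a} c_a : c \geq 0,\ \sum_{a} c_a a = x\}$, whose minimum is attained at a basic solution using $n$ linearly independent atoms. The brute-force sketch below enumerates such solutions. It is only practical for tiny examples, and the name `atomic_norm` is hypothetical. With the atoms $\{\pm e_i\}$, which are 1-sparse and hence included in any $\Sigma_K$, it recovers the $\ell^1$-norm.

```python
import itertools

import numpy as np


def atomic_norm(x, atoms, tol=1e-10):
    """Gauge (12) of x for a finite atom set (rows of `atoms`, assumed to span R^n).

    Solves min sum(c) s.t. c >= 0, atoms.T @ c = x by enumerating basic solutions.
    Returns inf when x is not a nonnegative combination of the atoms.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if np.linalg.norm(x) <= tol:
        return 0.0
    best = np.inf
    for idx in itertools.combinations(range(len(atoms)), n):
        B = atoms[list(idx)].T  # columns are the selected atoms
        if np.linalg.matrix_rank(B) < n:
            continue
        c = np.linalg.solve(B, x)
        if np.all(c >= -tol):  # feasible: nonnegative weights
            best = min(best, float(c.sum()))
    return best


signed_basis = np.vstack([np.eye(3), -np.eye(3)])  # atoms +/- e_i
print(atomic_norm([1.0, -2.0, 0.5], signed_basis))  # approx. 3.5 = ||x||_1
extra = np.vstack([signed_basis, [[1.0, 1.0, 0.0]]])  # add a 2-sparse atom
print(atomic_norm([1.0, 1.0, 0.0], extra))  # approx. 1.0, below ||x||_1 = 2.0
```

The second example shows how enlarging the atom set changes the gauge, and hence the descent sets $T_{\|\cdot\|_{\mathcal{A}}}(\Sigma)$ that the compliance measures evaluate.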

## 2 Optimality of the ℓ1-norm for RIP-based compliance measures

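We set $\Sigma = \Sigma_K$, the set of $K$-sparse vectors of $\mathcal{H} = \mathbb{R}^n$, and $\mathcal{C} = \mathcal{C}_\Sigma$. Hence $\Sigma - \Sigma = \Sigma_{2K}$. It is possible to show that: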

$$\arg\max_{R \in \mathcal{C}_\Sigma} A^{\mathrm{RIP,nec}}_\Sigma(R) = \arg\min_{R \in \mathcal{C}_\Sigma} B_\Sigma(R) \qquad (13)$$

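where $B_\Sigma(R) := \sup_{z \in T_R(\Sigma) \setminus \{0\}} \frac{\|z_{T^c}\|_2^2}{\|z_T\|_2^2}$ and $T$ is a notation for the support of the $2K$ biggest coordinates in $z$, i.e., for all $i \in T$ and $j \notin T$, we have $|z_i| \geq |z_j|$.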
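In the setting above ($\mathcal{H} = \mathbb{R}^n$ with the Euclidean norm, $\Sigma = \Sigma_K$, $T$ the support of the $2K$ largest entries of $|z|$), the following short computation indicates how (13) arises. It assumes $B_\Sigma(R)$ is the supremum of $\|z_{T^c}\|_2^2 / \|z_T\|_2^2$ over $z \in T_R(\Sigma) \setminus \{0\}$. The first line uses $\|(I - \Pi_z)x\|_2^2 = \|x\|_2^2 - \langle x, z\rangle^2 / \|z\|_2^2$ and Cauchy–Schwarz on each $2K$-sparse support. Since $r \mapsto 1/(1+r)$ is decreasing, maximizing $\delta^{\mathrm{nec}}_\Sigma(R)$ amounts to minimizing that supremum.

```latex
\begin{aligned}
\delta(I-\Pi_z)
  &= \sup_{x \in \Sigma_{2K}\setminus\{0\}} \frac{\langle x, z\rangle^2}{\|x\|_2^2\,\|z\|_2^2}
   = \frac{\|z_T\|_2^2}{\|z\|_2^2}
   = \frac{1}{1 + \|z_{T^c}\|_2^2 / \|z_T\|_2^2}, \\
\delta^{\mathrm{nec}}_\Sigma(R)
  &= \inf_{z \in T_R(\Sigma)\setminus\{0\}} \delta(I-\Pi_z)
   = \frac{1}{1 + \sup_{z \in T_R(\Sigma)\setminus\{0\}} \|z_{T^c}\|_2^2 / \|z_T\|_2^2}.
\end{aligned}
```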

Similarly to the necessary case, we can show

$$\arg\max_{R \in \mathcal{C}_\Sigma} A^{\mathrm{RIP,suff}}_\Sigma(R) = \arg\min_{R \in \mathcal{C}_\Sigma} D_\Sigma(R) \qquad (14)$$

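where $D_\Sigma(R)$ is defined in terms of the support $T$ of the biggest coordinates of $z$, and the norm $\|\cdot\|_\Sigma$ is the atomic norm generated by the set of atoms $\Sigma \cap S(1)$. Note the similarity between the fundamental quantities to optimize in the necessary and sufficient cases, $B_\Sigma(R)$ and $D_\Sigma(R)$; this suggests that our control of $\delta^{\mathrm{sharp}}_\Sigma(R)$ is rather tight. Optimizing $B_\Sigma(R)$ and $D_\Sigma(R)$ over $R \in \mathcal{C}_\Sigma$ gives the following result: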

###### Theorem 2.1.

Let $\Sigma = \Sigma_K$ and $\mathcal{C} = \mathcal{C}_\Sigma$ be as above. We have

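$$\|\cdot\|_1 \in \arg\max_{R \in \mathcal{C}_\Sigma} A^{\mathrm{RIP,nec}}_\Sigma(R) \quad \text{and} \quad \|\cdot\|_1 \in \arg\max_{R \in \mathcal{C}_\Sigma} A^{\mathrm{RIP,suff}}_\Sigma(R). \qquad (15)$$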

Note that, contrary to [8], where multiples of the $\ell^1$-norm were the sole maximizers of these compliance measures among weighted $\ell^1$-norms, uniqueness among atomic norms has yet to be proven.

## 3 Discussion and future work

We have shown that, not surprisingly, the $\ell^1$-norm is an optimal convex regularization for sparse recovery within this framework. The important point is that we could explicitly quantify a notion of good regularization. This is promising for the search for optimal regularizations for more complicated low-dimensional models, such as “sparse and low-rank” models or hierarchical sparse models. We also expect similar results for low-rank recovery and the nuclear norm, as the technical tools are very similar.

We used compliance measures based on (uniform) RIP recovery guarantees to give results for the general sparse recovery case. It would be interesting to carry out such an analysis using (non-uniform) recovery guarantees based on the statistical dimension or Gaussian width of the descent cones [3, 1].

Finally, while these compliance measures are designed to make sense with respect to known results in the area of sparse recovery, one might design other compliance measures tailored to particular needs (e.g., structured operators) in this search for optimal regularizations.

## Acknowledgements

This work was partly supported by the CNRS PEPS JC 2018 (project on efficient regularizations).

## References

• [1] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp. Living on the edge: phase transitions in convex programs with random data. Information and Inference, 3(3):224–294, 2014.
• [2] A. Bourrier, M. Davies, T. Peleg, P. Perez, and R. Gribonval. Fundamental performance limits for ideal decoders in high-dimensional linear inverse problems. IEEE Transactions on Information Theory, 60(12):7928–7946, 2014.
• [3] V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805–849, 2012.
• [4] M. E. Davies and R. Gribonval. Restricted isometry constants where $\ell^p$ sparse recovery can fail for $0 < p \leq 1$. IEEE Transactions on Information Theory, 55(5):2203–2214, 2009.
• [5] S. Foucart and H. Rauhut. A mathematical introduction to compressive sensing. Springer, 2013.
• [6] G. Puy, M. E. Davies, and R. Gribonval. Recipes for stable linear embeddings from Hilbert spaces to $\mathbb{R}^m$. arXiv preprint arXiv:1509.06947, 2015.
• [7] Y. Traonmilin and R. Gribonval. Stable recovery of low-dimensional cones in Hilbert spaces: One RIP to rule them all. Applied and Computational Harmonic Analysis, in press, 2016.
• [8] Y. Traonmilin and S. Vaiter. Optimality of 1-norm regularization among weighted 1-norms for sparse recovery: a case study on how to find optimal regularizations. 8th International Conference on New Computational Methods for Inverse Problems, 2018.