A Geometrical Stability Condition for Compressed Sensing*

*This article has been accepted for publication in Linear Algebra and its Applications. It can be found via its DOI: 10.1016/j.laa.2016.04.017


Axel Flinth
Institut für Mathematik, Technische Universität Berlin
E-mail: flinth@math.tu-berlin.de
Abstract

During the last decade, the paradigm of compressed sensing has gained significant importance in the signal processing community. While the original idea was to utilize sparsity assumptions to design powerful algorithms for the recovery of vectors $x \in \mathbb{R}^n$, the concept has been extended to cover many other types of problems. A notable example is low-rank matrix recovery. Many methods used for recovery rely on solving convex programs.

A particularly nice trait of compressed sensing is its geometrical intuition. In recent papers, a classical optimality condition has been used together with tools from convex geometry and probability theory to prove beautiful results concerning the recovery of signals from Gaussian measurements. In this paper, we aim to formulate a geometrical condition for stability and robustness, i.e. for the recovery of approximately structured signals from noisy measurements.

We will investigate the connection between the new condition and the notion of restricted singular values, classical stability and robustness conditions in compressed sensing, and important geometrical concepts from complexity theory. We will also prove the maybe somewhat surprising fact that for many convex programs, exact recovery of a signal $x_0$ immediately implies some stability and robustness when recovering signals close to $x_0$.

Keywords: Compressed Sensing, Convex Geometry, Grassmannian Condition Number, Sparse Recovery.

MSC(2010): Primary: 52A20, 90C25. Secondary: 94A12.

1 Introduction

Suppose that we are given linear measurements $b$ of a signal $x_0 \in \mathbb{R}^n$, i.e. $b = Ax_0$ for some matrix $A \in \mathbb{R}^{m,n}$, and are asked to recover the signal from them. If $m < n$, this will not be trivial, since the map $x \mapsto Ax$ in that case won't be injective. If, however, one assumes that $x_0$ in some sense is sparse, e.g., that many of $x_0$'s entries vanish, we can still recover the signal, e.g. with the help of $\ell_1$-minimization [8]:

$$\min \|x\|_1 \quad \text{subject to} \quad Ax = b. \qquad (\mathcal{P}_1)$$

This is the philosophy of compressed sensing, an area of mathematics which has attracted major attention over the last decade. It has become a standard technique to choose $A$ at random, and then to ask how large the number of measurements $m$ has to be in order for $\mathcal{P}_1$ to be successful with high probability. A popular assumption is that $A$ has the Gaussian distribution, i.e., that the entries $a_{ij}$ are i.i.d. standard normally distributed.
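
This baseline experiment can be carried out in a few lines. The following sketch (in Python, assuming NumPy and CVXPY are available; the dimensions and sparsity level are arbitrary illustration choices) recovers a sparse vector from Gaussian measurements via $\mathcal{P}_1$:

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n, m, s = 100, 40, 5                      # ambient dimension, measurements, sparsity

    x0 = np.zeros(n)                          # s-sparse ground truth
    x0[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
    A = rng.standard_normal((m, n))           # Gaussian measurement matrix
    b = A @ x0

    # (P_1): minimize ||x||_1 subject to Ax = b
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b]).solve()
    print("recovery error:", np.linalg.norm(x.value - x0))   # small if recovery succeeded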

A widely used criterion to ensure that $\mathcal{P}_1$ is successful is the RIP-property (Restricted Isometry Property). Put a bit informally, a matrix $A$ is said to possess the RIP if its RIP-constants $\delta_s$, i.e. the smallest constants $\delta \ge 0$ with

$$(1 - \delta) \|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta) \|x\|_2^2 \quad \text{for all } s\text{-sparse } x,$$

are small.

The idea of using convex programs like $\mathcal{P}_1$ to recover structured signals has come to be used in a much wider sense than the one above. Some examples of structure assumptions that have been considered in the literature are dictionary sparsity [10], block sparsity [20], sparsity with prior information [13, 16, 17], saturated vectors (i.e. vectors with $|x_i| = \|x\|_\infty$ for many $i$) [14] and low-rank assumptions for matrix completion [7]. Although these problems may seem very different at first sight, they can all be solved with the help of a convex program of the form

$$\min f(x) \quad \text{subject to} \quad Ax = b, \qquad (\mathcal{P}_f)$$

where $f$ is some convex function defined on an appropriate space. In all of the mentioned examples above, $f$ is chosen to be a norm, but this is not per se necessary.

The connection between the different convex program approaches was thoroughly investigated in [9], in which the very general case of $f$ being an atomic norm was treated. An atomic norm is thereby the gauge function of the convex hull of a compact set $\mathcal{A}$:

$$\|x\|_{\mathcal{A}} = \inf \{ t > 0 \, : \, x \in t \cdot \operatorname{conv}(\mathcal{A}) \}.$$

The authors derive bounds on how many Gaussian measurements are required for a convex optimization problem of the form $\mathcal{P}_f$ to be able to recover a signal $x_0$ with high probability. Their arguments have a geometrical flavor, since they utilize a well known optimality condition regarding the descent cone $\mathcal{D}(f, x_0)$ (see Definition 1 and Lemma 3 below) of $f$ at $x_0$. Other important theoretical tools are Gaussian widths and Gordon's escape through a mesh lemma [15], which will be discussed in Section 3.3 of this article. The authors of [3] make a similar analysis for even more general functions $f$, only assuming that they are convex. They use the so called statistical dimension of the descent cone for determining the threshold value of measurements.

The Problem of Stability and Robustness

In applications, the linear measurements are often contaminated with noise. This means that we are actually given data $b = Ax_0 + \eta$, where $\eta$ is a noise vector. A popular assumption is that $\eta$ is bounded in $\ell_2$-norm, i.e. $\|\eta\|_2 \le \epsilon$. Moreover, it is often not entirely realistic to assume that the signal is exactly sparse (or, more generally, exactly has the structure assumed by the model), but rather that the distance to the set $\mathcal{S}$ of sparse (structured) signals is small, i.e.

$$d_{\|\cdot\|}(x_0, \mathcal{S}) = \inf_{x \in \mathcal{S}} \|x_0 - x\|, \qquad (1)$$

where $\|\cdot\|$ is some norm. If the above quantity is small for a vector $x_0$, we will call it approximately structured.
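
In the classical case where $\mathcal{S}$ is the set of $s$-sparse vectors and the norm is $\|\cdot\|_1$, the quantity (1) is just the tail of the sorted magnitudes of $x_0$. A minimal sketch (Python/NumPy; the helper name is a hypothetical choice):

    import numpy as np

    def dist_to_sparse_l1(x0, s):
        """l1-distance from x0 to the set of s-sparse vectors: the best
        s-term approximation keeps the s largest magnitudes, so the
        distance is the sum of the remaining ones."""
        mags = np.sort(np.abs(x0))[::-1]
        return mags[s:].sum()

    x0 = np.array([3.0, -2.0, 0.05, -0.01, 0.0])
    print(dist_to_sparse_l1(x0, 2))   # 0.06: x0 is approximately 2-sparse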

There are several approaches to approximately recover approximately structured signals from noisy measurements – the one we will consider is the following regularized convex optimization problem:

$$\min f(x) \quad \text{subject to} \quad \|Ax - b\|_2 \le \epsilon. \qquad (\mathcal{P}_{f,\epsilon})$$

This approach was investigated already in the earliest works on compressed sensing, where $f = \|\cdot\|_1$ and $\mathcal{S}$ is the set of $s$-sparse vectors [8]. In short, it turns out that (somewhat stronger) assumptions on the RIP-constants suffice to prove that the solution $x^*$ of $\mathcal{P}_{1,\epsilon}$ in this case satisfies a bound of the form

$$\|x^* - x_0\|_2 \le C \epsilon + D \, d_{\|\cdot\|_1}(x_0, \mathcal{S}),$$

where $C$ and $D$ depend on the RIP-constants of $A$. In this paper, we will formulate and investigate a geometrical criterion for general convex programs $\mathcal{P}_{f,\epsilon}$ to satisfy such a bound, at least when $f$ is a norm. We will call such programs robust (with respect to the noise $\eta$) and stable (with respect to the distance of $x_0$ to $\mathcal{S}$). The criterion, which we will call the Angular Separation Criterion or ASC, will only depend on the relative positions of the kernel of $A$ and the descent cone $\mathcal{D}(f, x_0)$. More specifically, we will prove that if the mentioned sets have a positive angular separation, $\mathcal{P}_{f,\epsilon}$ will be robust and, in the case of $f$ being a norm, stable.
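
The following variation of the earlier sketch (again Python with NumPy/CVXPY, with arbitrarily chosen sizes) solves $\mathcal{P}_{1,\epsilon}$ for an approximately sparse signal and noisy data; the observed error is of the predicted form $C\epsilon + D\, d_{\|\cdot\|_1}(x_0, \mathcal{S})$, though the constants themselves are of course not visible in a single run:

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(1)
    n, m, s, eps = 100, 50, 5, 1e-2

    x0 = np.zeros(n)
    x0[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
    x0 += 1e-3 * rng.standard_normal(n)           # only approximately sparse

    A = rng.standard_normal((m, n))
    noise = rng.standard_normal(m)
    noise *= eps / np.linalg.norm(noise)          # ||eta||_2 <= eps
    b = A @ x0 + noise

    # (P_{1,eps}): minimize ||x||_1 subject to ||Ax - b||_2 <= eps
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm(A @ x - b, 2) <= eps]).solve()
    print("error:", np.linalg.norm(x.value - x0))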

We will furthermore relate the ASC to known criteria for stability and robustness from the literature. We will also prove the somewhat remarkable fact that for a very large class of norms, the ASC is in fact implied by the ability of $\mathcal{P}_f$ to recover signals exactly from noiseless measurements. The idea originates from a part of the Master's thesis [11] of the author.

Related Work and Contributions of This Paper

Some research towards a geometrical understanding of the stability of compressed sensing has already been conducted. Here we list a few of the approaches that have been considered. First, we would like to mention the so called RIP-NSP-condition from the recent paper [5]. This paper only deals with the classical compressed sensing setting, i.e., that the signal of interest is approximately sparse and $\ell_1$-minimization is used to recover it. A matrix $B$ is said to satisfy the RIP-NSP-condition if there exists another matrix $A$, which has the RIP-property, such that $\ker B = \ker A$. The authors of [5] prove that this is enough to secure stability and robustness of $\mathcal{P}_{1,\epsilon}$. This is intriguing, as it shows that stability and robustness can be secured by only considering the kernel of the measuring matrix.

Another line of research is the so called Robust Width Property, which was developed in [6]. The authors of said article define compressed sensing spaces, a general framework that covers both different types of sparsity as well as the case of low rank matrices. The main part of the definition of a compressed sensing space is a norm decomposability condition; if $\|\cdot\|$ is the norm induced by the inner product of the Hilbert space $\mathcal{H}$, and $\|\cdot\|_\sharp$ is another norm on $\mathcal{H}$, $(\mathcal{H}, \mathcal{A}, \|\cdot\|_\sharp)$ is said to be a compressed sensing space with bound $L$ if for every $a$ in the subset $\mathcal{A}$ and every $z \in \mathcal{H}$, there exists a decomposition $z = z_1 + z_2$ such that

$$\|a + z\|_\sharp \ge \|a\|_\sharp + \|z_1\|_\sharp - \|z_2\|_\sharp \quad \text{and} \quad \|z_2\| \le L \|z_1\|_\sharp.$$

A matrix $A$ is then said to satisfy the $(\rho, \alpha)$-robust width property if for every $x$ with $\|Ax\|_2 < \alpha \|x\|_2$, we have $\|x\|_2 \le \rho \|x\|_\sharp$. This robust width property is in fact equivalent to the stability and robustness of $\mathcal{P}_{f,\epsilon}$ with $f = \|\cdot\|_\sharp$. One large difference between this approach and ours is that we do not require any norm decomposability conditions. For instance, as was pointed out in [5], not every norm of practical interest is well suited to be included in this framework.

As for the problem of stability of recovering almost sparse vectors, we want to mention the paper [21]. The authors of that article carry out an asymptotic analysis of the threshold amount of Gaussian measurements needed for the classical technique of $\ell_1$-minimization for sparse recovery to be stable. The analysis heavily relies on the theory of so called Grassmannian angles of polytopes, which is a purely geometrical concept. This approach has connections to, but is still relatively far away from, the one in this work. In particular, it is by no means straightforward, if at all possible, to generalize it to other problems than $\ell_1$-minimization.

The condition presented in this paper is highly related to several other geometric stability measures: so called restricted singular values [2] of matrices on cones, as well as Renegar's condition number [4, 18] and the Grassmannian condition number [1] (see Section 2.1). In fact, already in the previously mentioned paper [9], it is proven that if the smallest singular value of $A$ restricted to the descent cone of the functional $f$ does not vanish, we will have robustness. During the final review of this paper, the author was made aware of the fact that also the connection between Renegar's condition number and robustness of compressed sensing has recently been investigated in [19].

This work provides a new, more elementary perspective on the above mentioned notions, and in particular establishes their relations to classical criteria for stability in compressed sensing. Another contribution of this paper is the observation that if $f$ is a norm, the criterion also implies stability, an observation which, to the best of the knowledge of the author, has not been made before.

Notation

Throughout the whole paper, $\mathcal{H}$ will denote a general, finite-dimensional Hilbert space. The corresponding inner product will be denoted by $\langle \cdot, \cdot \rangle$, and the induced norm by $\|\cdot\|$. For a subspace $V \subseteq \mathcal{H}$, we will write

$$\operatorname{dist}(x, V) = \inf_{v \in V} \|x - v\|.$$

Note that due to the finite-dimensionality of $\mathcal{H}$, this infimum is in fact attained.

In the final part of the paper, we will deal with Gaussian vectors and Gaussian matrices. A random vector, or linear map, is said to be Gaussian if its representation in an orthonormal basis, or in a pair of such, has i.i.d. standard normally distributed entries.

The entries of a vector $x$ in $\mathbb{R}^n$ will be denoted $x_i$. $\mathbb{R}^n$ is equipped with the standard scalar product, whose induced norm is the $\ell_2$-norm:

$$\|x\|_2 = \Big( \sum_{i=1}^n x_i^2 \Big)^{1/2}.$$

2 A Geometrical Robustness Condition

Let us begin by considering the most classical compressed sensing setting: that is, the problem of retrieving a signal $x_0$ with few non-zero entries from exact measurements $b = Ax_0$ using $\ell_1$-minimization (i.e. $\mathcal{P}_1$). One of the most well known criteria for $\mathcal{P}_1$ to be successful is the so called Null Space Property, NSP. A matrix $A$ satisfies the NSP with respect to the index set $S$ if for every $v \in \ker A \setminus \{0\}$, we have

$$\|v_S\|_1 < \|v_{S^c}\|_1. \qquad (2)$$

The NSP with respect to $S$ is in fact equivalent to $\mathcal{P}_1$ recovering every signal with support $S$ [12]. The NSP has a geometrical meaning. In order to explain it, let us first define descent cones of a function $f$.

Definition 1.

Let $\mathcal{H}$ be a finite-dimensional Hilbert space, and $f : \mathcal{H} \to \mathbb{R}$. The descent cone $\mathcal{D}(f, x_0)$ of $f$ at the point $x_0$ is the cone generated by the descent directions of $f$ at $x_0$, i.e.

$$\mathcal{D}(f, x_0) = \{ t v \, : \, t \ge 0, \ f(x_0 + v) \le f(x_0) \}.$$

Example 2.

The descent cone of the $\ell_1$-norm at a vector $x_0$ supported on the set $S$ is given by

$$\mathcal{D}(\|\cdot\|_1, x_0) = \{ v \in \mathbb{R}^n \, : \, \|v_{S^c}\|_1 \le -\langle \operatorname{sgn}(x_0), v_S \rangle \}.$$

Proof: Since the conditions for $v$ to belong to the left hand and the right hand set, respectively, are invariant under scaling, we may assume that $v$ has small norm. Then we have

$$\|x_0 + v\|_1 = \|x_0 + v_S\|_1 + \|v_{S^c}\|_1 = \|x_0\|_1 + \langle \operatorname{sgn}(x_0), v_S \rangle + \|v_{S^c}\|_1,$$

which implies $\|x_0 + v\|_1 - \|x_0\|_1 = \langle \operatorname{sgn}(x_0), v_S \rangle + \|v_{S^c}\|_1$. This is smaller than or equal to zero exactly when $\|v_{S^c}\|_1 \le -\langle \operatorname{sgn}(x_0), v_S \rangle$.
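
The criterion of Example 2 is easy to test numerically against the definition of a descent direction. The sketch below (Python/NumPy, with random data) compares the two for small steps $t$:

    import numpy as np

    rng = np.random.default_rng(2)
    n, S = 8, [0, 1, 2]
    x0 = np.zeros(n)
    x0[S] = rng.standard_normal(len(S))
    Sc = [i for i in range(n) if i not in S]

    agree = 0
    for _ in range(1000):
        v = rng.standard_normal(n)
        t = 1e-6                              # small enough that no sign of x0 flips
        direct = np.abs(x0 + t * v).sum() <= np.abs(x0).sum()
        criterion = np.abs(v)[Sc].sum() <= -np.sign(x0[S]) @ v[S]
        agree += (direct == criterion)
    print(agree, "/ 1000")                    # agreement up to floating point rounding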

With the last example in mind, it is not hard to convince oneself that Equation (2) exactly states that no vector $v \in \ker A \setminus \{0\}$ lies in the descent cone of any signal supported on the set $S$. I.e., the NSP actually reads

$$\mathcal{D}(\|\cdot\|_1, x_0) \cap \ker A = \{0\} \quad \text{for every } x_0 \text{ supported on } S.$$

This observation carries over to much more general situations, as the following well-known lemma shows.

Lemma 3.

(E.g. [9, Proposition 2.1].) Let $f$ be convex, $A$ be linear and consider the program $\mathcal{P}_f$, with $b = Ax_0$ for noiseless recovery of the signal $x_0$. The solution of $\mathcal{P}_f$ is equal to $x_0$ if and only if

$$\mathcal{D}(f, x_0) \cap \ker A = \{0\}. \qquad (3)$$
Figure 1: The impact of an angular separation of $\mathcal{D}(f, x_0)$ and $\ker A$. Note that locally, the sets $\{x : Ax = b\}$ and $\{x : \|Ax - b\|_2 \le \epsilon\}$ have the same structure.

Can we use the previous lemma to develop a geometrical intuition of what we have to assume in order to prove stability and robustness of the recovery using $\mathcal{P}_{f,\epsilon}$? The main difference between the program $\mathcal{P}_f$ and $\mathcal{P}_{f,\epsilon}$ is that the former is only allowed to search for a solution in the set $\{x : Ax = b\}$, while the latter can search in a tubular neighborhood of the same set. Figure 1 suggests that if the descent cone $\mathcal{D}(f, x_0)$ and $\ker A$ do not only intersect trivially, but also have an angular separation, it should be possible to prove that the intersection of the mentioned tubular neighborhood and the set $\{x : f(x) \le f(x_0)\}$ is not large. This should in turn imply robustness. (In fact, this intuition was used already in [8] when proving robustness in the original compressed sensing setting.) To provide a precise formulation of angular separation, we first define the $\alpha$-expansion of a cone.

Definition 4.

Let $\mathcal{H}$ be a finite-dimensional Hilbert space with norm $\|\cdot\|$ and scalar product $\langle \cdot, \cdot \rangle$. Let further $C \subseteq \mathcal{H}$ be a convex cone, i.e. a convex set with $t x \in C$ for every $x \in C$ and $t \ge 0$. Then for $\alpha \in [0, \pi/2)$, we define the $\alpha$-expansion $C_\alpha$ as the set

$$C_\alpha = \{ x \in \mathcal{H} \, : \, \operatorname{dist}(x, C) \le \sin(\alpha) \|x\| \}.$$

For an illustration of the relation between a cone $C$ and its $\alpha$-expansion $C_\alpha$, see Figure 2. Before moving on, let us make some remarks.

Figure 2: A convex cone $C$ and its $\alpha$-expansion $C_\alpha$.
Remark 5.
  1. $C \subseteq C_\alpha$ for every $\alpha \ge 0$.

  2. If $C$ is closed, $C_\alpha$ can alternatively be defined as the set

     $$\{ x \in \mathcal{H} \, : \, \exists v \in C \setminus \{0\} \text{ with } \angle(x, v) \le \alpha \} \cup \{0\},$$

     where $\angle(x, v)$ denotes the angle between $x$ and $v$.

  3. $C_\alpha$ is always a cone, but not always convex. As a concrete counterexample, consider the closed, convex cone

     $$C = \{ (x, y, 0) \in \mathbb{R}^3 \, : \, x, y \ge 0 \}.$$

     We can calculate $C_\alpha$ exactly. For $x, y \ge 0$ we have $\operatorname{dist}((x, y, z), C) = |z|$, and an elementary calculation shows that the part of $C_\alpha$ lying over the quadrant $x, y \ge 0$ is given by

     $$\{ (x, y, z) \in \mathbb{R}^3 \, : \, x, y \ge 0, \ |z| \le \tan(\alpha) \sqrt{x^2 + y^2} \}.$$

     This set is not convex; for instance, the points $(\cos\alpha, 0, \sin\alpha)$ and $(0, \cos\alpha, \sin\alpha)$ are both contained in the set, whereas their sum $(\cos\alpha, \cos\alpha, 2\sin\alpha)$ is not (a numerical check follows below). See also Figure 3.
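
The non-convexity can be checked numerically. The sketch below (Python/NumPy) uses the cone $C$ from the remark, the characterization $\operatorname{dist}(w, C) \le \sin(\alpha)\|w\|$ of $C_\alpha$, and the explicit projection onto $C$:

    import numpy as np

    alpha = 0.3                                    # any angle in (0, pi/2)

    def proj_C(w):
        # projection onto C = {(x, y, 0) : x, y >= 0}
        return np.array([max(w[0], 0.0), max(w[1], 0.0), 0.0])

    def in_expansion(w):
        # w lies in C_alpha iff dist(w, C) <= sin(alpha) * ||w||
        return np.linalg.norm(w - proj_C(w)) <= np.sin(alpha) * np.linalg.norm(w) + 1e-12

    u = np.array([np.cos(alpha), 0.0, np.sin(alpha)])   # angle exactly alpha to e_1
    v = np.array([0.0, np.cos(alpha), np.sin(alpha)])   # angle exactly alpha to e_2
    print(in_expansion(u), in_expansion(v), in_expansion(u + v))   # True True False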

Remark 6.

It is not hard to convince oneself that

$$d(x, y) = \angle(x, y) = \arccos\Big( \frac{\langle x, y \rangle}{\|x\| \|y\|} \Big)$$

defines a metric on the unit sphere of $\mathcal{H}$. In particular, the triangle inequality holds:

$$\angle(x, z) \le \angle(x, y) + \angle(y, z).$$

After these preparations, we may state our robustness condition.

Proposition 7.

Let $x_0 \in \mathcal{H}$ and $A : \mathcal{H} \to \mathbb{R}^m$ be linear. Consider the program $\mathcal{P}_{f,\epsilon}$, where $f$ is convex and $b = Ax_0 + \eta$ with $\|\eta\|_2 \le \epsilon$. If there exists an $\alpha > 0$ such that

$$\mathcal{D}(f, x_0)_\alpha \cap \ker A = \{0\}, \qquad (4)$$

then there exists a constant $C$ so that any solution $x^*$ of $\mathcal{P}_{f,\epsilon}$ obeys

$$\|x^* - x_0\| \le C \epsilon.$$

$C$ depends on $\alpha$ and the smallest non-zero singular value of $A$.

For simplicity, we will call (4) the $\alpha$-angular separation condition, or $\alpha$-ASC. We will in fact not prove this proposition directly. Instead, we first establish a connection to so called restricted singular values of the matrix $A$, which then implies Proposition 7 as a corollary.

Figure 3: The convex cone $C$ and its non-convex $\alpha$-expansion $C_\alpha$.

The concept of restricted singular values was extensively studied in the paper [2]. There, the singular value of a linear map $A$ restricted to the cones $C$ and $D$ was defined as

$$\sigma(A; C, D) = \min_{x \in C, \, \|x\| = 1} \| \Pi_D(Ax) \|_2,$$

where $\Pi_D$ denotes the Euclidean (metric, orthogonal) projection (or nearest point map) onto the convex set $D$:

$$\Pi_D(x) = \operatorname*{argmin}_{y \in D} \|x - y\|_2.$$

(See also, for instance, [3].) If $D = \mathbb{R}^m$, one also speaks of the minimal gain $\sigma_{\min}(A; C) := \sigma(A; C, \mathbb{R}^m) = \min_{x \in C, \|x\| = 1} \|Ax\|_2$ [9].

What is interesting for us is that if $\sigma_{\min}(A; \mathcal{D}(f, x_0)) > 0$, $\mathcal{P}_{f,\epsilon}$ will be robust.

Lemma 8.

(See [2], [9, Proposition 2.2].) Let $f$ be convex and $A$ be linear with $b = Ax_0 + \eta$, $\|\eta\|_2 \le \epsilon$. If $\sigma_{\min}(A; \mathcal{D}(f, x_0)) > 0$, any solution $x^*$ of $\mathcal{P}_{f,\epsilon}$ obeys

$$\|x^* - x_0\| \le \frac{2 \epsilon}{\sigma_{\min}(A; \mathcal{D}(f, x_0))}.$$

Remark 9.
  1. Note that in the references we cited, the lemma is proved under slightly less general conditions (e.g., the function $f$ is assumed to be a norm). The proof does however work, line for line, also in our case, and we therefore omit it for the sake of brevity.

  2. As is indicated by the formulation of Lemma 8, the solution vector by no means has to be unique, even for arbitrarily small $\epsilon$. We can construct an example of such a situation as follows: Consider a sparse vector $x_0$ supported on the set $S$ and a matrix $A$ such that $\mathcal{D}(\|\cdot\|_1, x_0) \cap \ker A = \{0\}$ – i.e. in particular that $x_0$ is recovered from its exact measurements by $\mathcal{P}_1$. Suppose that the solution $x^*$ of $\mathcal{P}_{1,\epsilon}$ is unique for some $\epsilon > 0$ and is supported on a set $S'$ which is larger than $S$ (this is arguably the most common situation). For any $i \in S' \setminus S$, consider the matrix $\tilde{A}$ formed by concatenating $A$ with a copy of the $i$:th column of $A$, and the vectors $\tilde{x}_0$ and $\tilde{x}^*$ formed by concatenating $x_0$ and $x^*$ with a zero, respectively. It is then clear that $\tilde{x}_0$ still is recovered from the exact measurements via the exact program $\mathcal{P}_1$, and hence (see Section 3.1) $\sigma_{\min}(\tilde{A}; \mathcal{D}(\|\cdot\|_1, \tilde{x}_0)) > 0$, but that any vector

     $$\tilde{x}_t = \tilde{x}^* + t \, x^*_i (e_{n+1} - e_i)$$

     for $t \in [0, 1]$ solves the relaxed problem with $\tilde{A}$ in place of $A$ (see the sketch after this remark). Hence, the solution of the relaxed problem for $\tilde{A}$ is not unique.
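
A numerical rendering of this construction (Python with NumPy/CVXPY; the concrete sizes are arbitrary) makes the non-uniqueness visible: after duplicating a column, mass can be shifted between the two copies without changing either the residual or the $\ell_1$-norm, so a whole segment of minimizers appears.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(3)
    m, n, eps = 20, 40, 1e-2
    A = rng.standard_normal((m, n))
    x0 = np.zeros(n)
    x0[:3] = [1.0, -2.0, 1.5]
    b = A @ x0 + 0.5 * eps * rng.standard_normal(m) / np.sqrt(m)

    i = 0                                          # duplicate the i:th column
    A_tilde = np.hstack([A, A[:, [i]]])

    x = cp.Variable(n + 1)
    cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm(A_tilde @ x - b, 2) <= eps]).solve()
    xs = x.value

    # shifting mass between coordinate i and its duplicate changes neither
    # A_tilde @ x nor, for t in [0, 1], the l1-norm
    mass = xs[i] + xs[-1]
    for t in [0.0, 0.5, 1.0]:
        xt = xs.copy()
        xt[i], xt[-1] = (1 - t) * mass, t * mass
        print(np.abs(xt).sum(), np.linalg.norm(A_tilde @ xt - b) <= eps + 1e-9)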

We now prove that the ASC to some extent is equivalent to the positivity of the minimal gain $\sigma_{\min}(A; C)$.

Lemma 10.

Let $C \subseteq \mathcal{H}$ be a non-empty convex cone and $A$ be a linear map. Then the following are equivalent:

  1. There exists an $\alpha > 0$ such that $C_\alpha \cap \ker A = \{0\}$.

  2. $\sigma_{\min}(A; C) > 0$.

In particular, if $\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ denote the smallest and largest non-vanishing singular values of $A$, respectively, we have for every $\alpha$ with $C_\alpha \cap \ker A = \{0\}$ that

$$\sin(\alpha) \, \sigma_{\min}(A) \le \sigma_{\min}(A; C) \le \sin(\alpha^*) \, \sigma_{\max}(A), \qquad (5)$$

where $\alpha^*$ denotes the supremum of all such $\alpha$.
Proof.

$1 \Rightarrow 2$. Suppose that $C_\alpha \cap \ker A = \{0\}$ and let $x \in C$ have unit norm. Then we have for every $v \in \ker A \setminus \{0\}$

$$\angle(x, v) > \alpha,$$

since due to $C_\alpha \cap \ker A = \{0\}$, there must be $v \notin C_\alpha$. Since $v$ was arbitrary, we obtain $\angle(x, \ker A) > \alpha$. This has the consequence

$$\operatorname{dist}(x, \ker A) = \sin(\angle(x, \ker A)) \ge \sin(\alpha).$$

It follows that $\sigma_{\min}(A; C) \ge \sin(\alpha) \, \sigma_{\min}(A) > 0$, since $\|Ax\|_2 \ge \sigma_{\min}(A) \operatorname{dist}(x, \ker A)$, which follows from $Ax = A(x - \Pi_{\ker A}(x))$, and $\sin(\alpha) > 0$.

$2 \Rightarrow 1$. Suppose that $\sigma_{\min}(A; C) > 0$ and let $x \in C$ and $v \in \ker A$ have unit norm. Define $\alpha_0$ through $\sin(\alpha_0) = \sigma_{\min}(A; C)/\sigma_{\max}(A)$. Our goal is to prove that $\angle(x, v)$ has to be larger than some number $\alpha > 0$.

Since $x \in C$, we have $\|Ax\|_2 \ge \sigma_{\min}(A; C)$. We also have for every $t \in \mathbb{R}$, due to $v \in \ker A$,

$$\|Ax\|_2 = \|A(x - t v)\|_2 \le \sigma_{\max}(A) \|x - t v\|.$$

Choosing $t = \langle x, v \rangle$ yields

$$\sigma_{\min}(A; C) \le \sigma_{\max}(A) \|x - \langle x, v \rangle v\| = \sigma_{\max}(A) \sin(\angle(x, v)),$$

where we in the last step used $\|x\| = \|v\| = 1$. Hence $\angle(x, v) \ge \alpha_0$. This proves the claim. ∎

Now Proposition 7 easily follows from combining Lemma 8 with Lemma 10. We will return to the connection between restricted singular values and the ASC in the next section.
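
For a toy cone where everything is computable in closed form, a single ray, both bounds in (5) can be verified directly. A small Python/NumPy sketch (with the ray $C = \{ t e_1 : t \ge 0 \}$ and an arbitrary Gaussian $A$, so that the largest admissible angle equals the angle between $e_1$ and $\ker A$):

    import numpy as np

    rng = np.random.default_rng(4)
    m, n = 3, 5
    A = rng.standard_normal((m, n))

    # toy cone: the ray C = {t e_1 : t >= 0}, so sigma_min(A; C) = ||A e_1||
    e1 = np.zeros(n)
    e1[0] = 1.0
    gain = np.linalg.norm(A @ e1)

    # sine of the angle between C and ker A = dist(e_1, ker A)
    _, s, Vt = np.linalg.svd(A)
    kernel_basis = Vt[m:].T                        # orthonormal basis of ker A
    proj = kernel_basis @ (kernel_basis.T @ e1)
    sin_alpha = np.linalg.norm(e1 - proj)

    smin, smax = s.min(), s.max()                  # non-vanishing singular values
    print(smin * sin_alpha <= gain <= smax * sin_alpha)   # True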

Remark 11.

The ASC is not necessary for robust recovery. Consider for example $\ell_2$-minimization in $\mathbb{R}^n$:

$$\min \|x\|_2 \quad \text{subject to} \quad Ax = b. \qquad (\mathcal{P}_2)$$

In order for $\ell_2$-minimization to exactly recover a signal $x_0$ (which of course is necessary for robust recovery, choose $\eta = 0$), we need to have $x_0 \perp \ker A$, since the solution of $\mathcal{P}_2$ is given by the minimum-norm solution $A^\dagger b$. Furthermore, $\mathcal{D}(\|\cdot\|_2, x_0)$ is given by $\{ v : \langle x_0, v \rangle < 0 \} \cup \{0\}$. We claim that this implies that for each $\alpha > 0$, $\mathcal{D}(\|\cdot\|_2, x_0)_\alpha \cap \ker A \neq \{0\}$.

To see why, let $\alpha > 0$ and $w$ be a nonzero element of $\ker A$. Such an element necessarily exists as soon as $m < n$. For any $t > 0$, $w - t x_0 \in \mathcal{D}(\|\cdot\|_2, x_0)$. This since we have $\langle x_0, w - t x_0 \rangle = -t \|x_0\|_2^2 < 0$ due to $x_0 \perp w$. Consequently, $\operatorname{dist}(w, \mathcal{D}(\|\cdot\|_2, x_0)) \le \|w - (w - t x_0)\|_2 = t \|x_0\|_2$, i.e. $w$ is arbitrarily close to the descent cone.

Now consider the quotient

$$\frac{\operatorname{dist}(w, \mathcal{D}(\|\cdot\|_2, x_0))}{\|w\|_2} \le \frac{t \|x_0\|_2}{\|w\|_2}.$$

By letting $t \to 0$, this quotient can be made arbitrarily close to $0$. Since $\sin(\alpha) > 0$, this means that $w \in \mathcal{D}(\|\cdot\|_2, x_0)_\alpha$ for every $\alpha > 0$. Hence, we have a non-trivial intersection between the $\alpha$-expansion of the descent cone and the kernel for every $\alpha > 0$.

It is, however, not hard to convince oneself that the solution $x^*$ of $\mathcal{P}_{2,\epsilon}$ necessarily lies in $(\ker A)^\perp$ (any part in $\ker A$ can be removed without affecting $Ax$, and at the same time making $\|x\|_2$ smaller). We already argued that $x_0$ also has this property. This has the immediate consequence that

$$\|x^* - x_0\|_2 \le \frac{\|A(x^* - x_0)\|_2}{\sigma_{\min}(A)} \le \frac{2\epsilon}{\sigma_{\min}(A)},$$

i.e. we have robustness.
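
This robustness of $\ell_2$-minimization is immediate to check numerically, since $\mathcal{P}_2$ is solved by the pseudo-inverse. A sketch (Python/NumPy, arbitrary sizes):

    import numpy as np

    rng = np.random.default_rng(5)
    m, n, eps = 10, 30, 1e-3
    A = rng.standard_normal((m, n))

    x0 = A.T @ rng.standard_normal(m)              # x0 orthogonal to ker A
    noise = rng.standard_normal(m)
    noise *= eps / np.linalg.norm(noise)
    b = A @ x0 + noise

    # A^+ b solves min ||x||_2 s.t. Ax = b; it is in particular feasible
    # for P_{2,eps} and lies, like x0, in (ker A)^perp
    x_star = np.linalg.pinv(A) @ b
    err = np.linalg.norm(x_star - x0)
    smin = np.linalg.svd(A, compute_uv=False).min()
    print(err <= 2 * eps / smin)                   # True: robust despite the failing ASC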

Now we prove that under the assumption that $f$ is a norm on $\mathcal{H}$, the ASC in fact also implies stability.

Theorem 12.

Let $f$ be a norm on $\mathcal{H}$ and consider the convex program

$$\min f(x) \quad \text{subject to} \quad \|Ax - b\|_2 \le \epsilon, \qquad (6)$$

where $b = Ax_0 + \eta$ with $\|\eta\|_2 \le \epsilon$. Suppose that there exists an $\alpha > 0$ so that $A$ fulfills the $\alpha$-ASC for every $x$ in some subset $\mathcal{S} \subseteq \mathcal{H}$. Then the following is true for (6): There exist constants $C$ and $D$ such that any solution $x^*$ fulfills

$$\|x^* - x_0\| \le C \epsilon + D \, d_f(x_0, \operatorname{cone}(\mathcal{S})). \qquad (7)$$

Here, $\operatorname{cone}(\mathcal{S})$ denotes the cone generated by $\mathcal{S}$, i.e., the set $\{ t x : t \ge 0, \, x \in \mathcal{S} \}$. The first constant depends on $\alpha$ and the smallest non-zero singular value of $A$. The second constant depends on $\alpha$, $f$ and the condition number of $A$.

Proof.

(See Fig. 4 for a graphical depiction of the proof.) Let $\hat{x}$ be a, not necessarily unique, vector in $\operatorname{cone}(\mathcal{S})$ with $f(x_0 - \hat{x}) = d_f(x_0, \operatorname{cone}(\mathcal{S})) =: d$. Due to the homogeneity of $f$, we have $\mathcal{D}(f, t\hat{x}) = \mathcal{D}(f, \hat{x})$ for every $t > 0$. Therefore we may without loss of generality scale the problem such that $f(\hat{x}) = 1$. Also, since all norms on the finite dimensional space $\mathcal{H}$ are equivalent, there exists a $\kappa > 0$ so that for each $x \in \mathcal{H}$, $\|x\| \le \kappa f(x)$. These two facts have the consequence that

$$\|A(\hat{x} - x_0)\|_2 \le \|A\| \|\hat{x} - x_0\| \le \kappa \|A\| d.$$

Since $\|Ax_0 - b\|_2 \le \epsilon$, this implies that

$$\|A\hat{x} - b\|_2 \le \epsilon + \kappa \|A\| d,$$

i.e. $\hat{x}$ nearly satisfies the measurements. Let $\tilde{x}$ be the vector in the segment $[0, x^*]$ given by $\tilde{x} = \frac{1}{1+d} x^*$.

Figure 4: The proof of Theorem 12.

Now since $x^*$ is a solution of (6), there must be $f(x^*) \le f(x_0) \le f(\hat{x}) + d = 1 + d$, i.e. $f(\tilde{x}) \le 1 = f(\hat{x})$, where the latter is due to the homogeneity of $f$. In particular, $\tilde{x} - \hat{x} \in \mathcal{D}(f, \hat{x})$.

Now we have

$$\|A(\tilde{x} - \hat{x})\|_2 \le \tfrac{1}{1+d} \|Ax^* - b\|_2 + \tfrac{1}{1+d} \|b - A\hat{x}\|_2 + \tfrac{d}{1+d} \|A\hat{x}\|_2 \le 2\epsilon + 2\kappa \|A\| d.$$

Due to $\tilde{x} - \hat{x} \in \mathcal{D}(f, \hat{x})$ and $\hat{x} \in \operatorname{cone}(\mathcal{S})$, (4) implies that $\sigma_{\min}(A; \mathcal{D}(f, \hat{x})) \ge \sin(\alpha) \sigma_{\min}(A)$ by Lemma 10. This has the consequence that

$$\|\tilde{x} - \hat{x}\| \le \frac{\|A(\tilde{x} - \hat{x})\|_2}{\sin(\alpha) \sigma_{\min}(A)} \le \frac{2\epsilon + 2\kappa \|A\| d}{\sin(\alpha) \sigma_{\min}(A)},$$

which implies that $\tilde{x}$ is close to $\hat{x}$. Since

$$\|x^* - \tilde{x}\| = \tfrac{d}{1+d} \|x^*\| \le \tfrac{d}{1+d} \kappa f(x^*) \le \kappa d,$$

we have $\|x^* - \hat{x}\| \le \|\tilde{x} - \hat{x}\| + \kappa d$. Finally, we estimate

$$\|x^* - x_0\| \le \|x^* - \hat{x}\| + \|\hat{x} - x_0\| \le \frac{2\epsilon + 2\kappa \|A\| d}{\sin(\alpha) \sigma_{\min}(A)} + \kappa d + \kappa d = C \epsilon + D d,$$

where $C = \frac{2}{\sin(\alpha) \sigma_{\min}(A)}$ and $D = 2\kappa \big( 1 + \frac{\|A\|}{\sin(\alpha) \sigma_{\min}(A)} \big)$, which is what we wanted to prove. ∎

2.1 The ASC and Two Other Geometrical Notions of Stability.

Let us end this section by briefly discussing the connection between the ASC and two other measures for stability of a linear embedding, namely the so-called Renegar's condition number [4, 18] and the Grassmannian condition number [1] of a matrix. They were originally introduced to study the stability of the homogeneous convex feasibility problem: Given a closed convex cone $K$ with non-empty interior not containing a subspace (a regular cone), for which matrices $A$ does there exist a $y$ with

$$A^T y \in \operatorname{int}(K)?$$

Let us call such matrices feasible. The connection to our problem follows from duality: given a cone $C$ whose polar $C^\circ$ is regular, it is well known that if the range of the transposed matrix $A^T$ intersects the interior of $C^\circ$, the kernel of $A$ can't intersect $C$ non-trivially (see for instance [4]).

Since the range of $A^T$ always is equal to the orthogonal complement of $\ker A$, it makes sense to define the following sets of subspaces:

$$\mathcal{P} = \{ W \in \operatorname{Gr}(n, n-m) \, : \, W \cap C \neq \{0\} \}, \qquad \mathcal{D} = \overline{\{ W \in \operatorname{Gr}(n, n-m) \, : \, W \cap C = \{0\} \}}.$$

$\operatorname{Gr}(n, n-m)$ denotes the Grassmannian manifold of $(n-m)$-dimensional subspaces of $\mathbb{R}^n$. It can be proven that both $\mathcal{P}$ and $\mathcal{D}$ are closed in $\operatorname{Gr}(n, n-m)$ and that they share a common boundary $\Sigma$. The Grassmannian condition number of a matrix $A$ with respect to a regular cone is defined as the inverse of the distance of $\ker A$ to $\Sigma$, i.e.

$$\mathscr{C}(A) = \frac{1}{d(\ker A, \Sigma)}.$$

The distance is thereby calculated with respect to the canonical metric on $\operatorname{Gr}(n, n-m)$: if $W_1$ and $W_2$ are $(n-m)$-dimensional subspaces of $\mathbb{R}^n$ and $\Pi_{W_1}$ and $\Pi_{W_2}$ are the orthogonal projections onto them, we define

$$d(W_1, W_2) = \| \Pi_{W_1} - \Pi_{W_2} \|.$$

Here, $\|\cdot\|$ denotes the operator norm.
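
Given orthonormal bases of two subspaces, this metric is a one-liner to evaluate; the following sketch (Python/NumPy, with a hypothetical helper name) computes it:

    import numpy as np

    def subspace_distance(U, V):
        # d(W1, W2) = ||Pi_1 - Pi_2|| in the operator norm, for subspaces
        # spanned by the (orthonormal) columns of U and V
        return np.linalg.norm(U @ U.T - V @ V.T, ord=2)

    rng = np.random.default_rng(6)
    U, _ = np.linalg.qr(rng.standard_normal((5, 2)))
    V, _ = np.linalg.qr(rng.standard_normal((5, 2)))
    print(subspace_distance(U, V))                 # a number in [0, 1]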

$\mathscr{C}(A)$ is very closely related to the ASC: if $A$ is feasible, the largest angle $\alpha^*$ so that $A$ satisfies the $\alpha^*$-ASC satisfies [1, Proposition 1.6]

$$\sin(\alpha^*) = \frac{1}{\mathscr{C}(A)}. \qquad (8)$$

The Grassmannian condition number is itself closely related to the so-called Renegar's condition number $\mathscr{R}(A)$. Given a feasible matrix $A$ (in the same sense as above), it is defined via the distance from $A$ to the set $\mathcal{I}$ of infeasible matrices, i.e.

$$\mathscr{R}(A) = \frac{\|A\|}{\operatorname{dist}(A, \mathcal{I})}.$$

In our setting, it can be proven that [19, Lemma 2.2]

$$\mathscr{R}(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A; C)}. \qquad (9)$$

The article [19] contains some more details and interesting results concerning the connection between $\mathscr{R}(A)$ and the robustness properties of compressed sensing problems. In particular, they prove the following version of Lemma 8 of this article: if we assume that the noise level $\epsilon$ is below a certain threshold depending on $\mathscr{R}(A)$ (which in particular is interesting when the measurement matrix only is known up to some error), every solution $x^*$ of the program $\mathcal{P}_{f,\epsilon}$ obeys a bound of the form

$$\|x^* - x_0\| \le \frac{2 \mathscr{R}(A)}{\sigma_{\max}(A)} \, \epsilon.$$

Let us end this section by noting that one can prove Lemma 10 by using the following inequality from [1, Theorem 1.4]:

$$\mathscr{C}(A) \le \mathscr{R}(A) \le \kappa(A) \, \mathscr{C}(A),$$

where $\kappa(A) = \sigma_{\max}(A)/\sigma_{\min}(A)$ denotes the condition number of $A$. Using (8) and (9), this can be rewritten as

$$\sin(\alpha^*) \, \sigma_{\min}(A) \le \sigma_{\min}(A; C) \le \sin(\alpha^*) \, \sigma_{\max}(A),$$

which is exactly the inequality (5).

3 When is the ASC satisfied?

Having established that the ASC implies stability and robustness for signal recovery using the convex program (6), it is of course interesting to ask for which matrices $A$ this condition is satisfied. In this section, we will first prove the maybe somewhat remarkable fact that, for many reasonable norms, the weak NSP-like condition (3) in fact implies that the ASC is satisfied for some $\alpha > 0$.

However, the above reasoning only yields the existence of an $\alpha$ with (4), and does not give any control of the size of $\alpha$. Therefore, we will also briefly discuss the relation between already known stability conditions for compressed sensing and the ASC, and show that the ASC can be secured with high probability using random Gaussian matrices. Using the concept of Gaussian widths, we will argue that if one needs $m_0$ measurements to secure that $\mathcal{P}_f$ recovers a signal with high probability from noiseless measurements, only slightly more measurements are needed to secure the ASC for a fixed $\alpha > 0$.

3.1 Exact Recovery Implies Some Stability and Some Robustness.

The first result of this subsection was essentially already proven in [2]. Let us state it, and for completeness also give a proof, and then discuss its implications and limitations.

Theorem 13.

Let $C \subseteq \mathcal{H}$ be a closed convex cone, and $A$ be linear. Then if $C \cap \ker A = \{0\}$, there exists an $\alpha > 0$ such that $C_\alpha \cap \ker A = \{0\}$, i.e., the $\alpha$-ASC holds.

Proof.

Under the assumption that $C$ and $D$ are closed, the restricted singular value $\sigma(A; C, D)$ vanishes if and only if either $C \cap \ker A \neq \{0\}$ or a corresponding degeneracy occurs for $D$ [2, Proposition 2.2]. Since in our case $D = \mathbb{R}^m$, the latter cannot occur, and we hence by contraposition have the equivalence

$$C \cap \ker A = \{0\} \iff \sigma_{\min}(A; C) > 0.$$

Since by Lemma 10, $\sigma_{\min}(A; C) > 0$ is equivalent to the existence of an $\alpha > 0$ such that $C_\alpha \cap \ker A = \{0\}$, the claim is proven. ∎

On a theoretical level, the last proposition implies that as soon as the recovery of some class $\mathcal{S}$ of signals from exact measurements with the help of a convex program $\mathcal{P}_f$ is guaranteed, we also have stability and robustness for the recovery of signals close to $\mathcal{S}$ from noisy measurements. As simple and beautiful as the result is, it has its flaws. In particular, we have no control whatsoever over the size of the parameter $\alpha$, which in turn implies that we have no control over the constants in (7).

In the case that the norm $f$ has a unit ball which is a polytope, we can do a bit better. Although we still cannot provide any general bound on the size of $\alpha$, we can prove that it will have the same size for all points lying in the same face of the unit ball.

Before stating and proving the result, let us note that the assumption that the unit ball of $f$ is a polytope is not far-fetched. In particular, it is true both for $\ell_1$-minimization (and its many variants, i.e., also for weighted norms etc.) and for $\ell_\infty$-minimization – or in general for any atomic norm generated by a finite set of atoms $\mathcal{A}$. Let us now formulate the main part of the argument in the following lemma.

Lemma 14.

Let $P \subseteq \mathcal{H}$ be a closed polytope and $F$ be a union of faces of $P$. Suppose that the linear subspace $V \subseteq \mathcal{H}$ has the property that for each $x \in F$, the affine subspace $x + V$ intersects $P$ only in $x$. Then for each $x \in F$, there exists an $\alpha > 0$ such that

$$\operatorname{cone}(P - x)_\alpha \cap V = \{0\}.$$

The size of $\alpha$ is only dependent on which face $x$ lies in.

Although the proof of this lemma is elementary, it is relatively long. Therefore, we postpone it to Appendix A. Instead, we use it to prove the aforementioned result about stability and robustness for recovery using convex programs involving norms with polytope unit balls.

Corollary 15.

Let $A$ be given. Suppose that $f$ is a norm whose unit ball is a polytope, and let $F$ be a union of faces of that polytope. If the program $\mathcal{P}_f$ recovers $x_0$ from the noiseless measurements $b = Ax_0$ for every $x_0 \in F$, all signals close to the cone generated by $F$ will be stably and robustly recovered by $\mathcal{P}_{f,\epsilon}$ in the sense of (7). The constants $C$ and $D$ will only depend on which face the normalized version of $x_0$ lies closest to.

Proof.

Since each $x$ in $F$ is recovered exactly by $\mathcal{P}_f$ from noiseless measurements, we will by Lemma 3 have $\mathcal{D}(f, x) \cap \ker A = \{0\}$ for each $x \in F$. Since the descent cone of $f$ at $x$ is generated by the vectors $z - x$, where $f(z) \le f(x)$, the conditions of Lemma 14 are satisfied (with $P$ the unit ball of $f$ and $V = \ker A$). Said lemma therefore implies that $\mathcal{D}(f, x)_\alpha \cap \ker A = \{0\}$ for every $x \in F$, where $\alpha$ only depends on which face $x$ lies in. This together with Theorem 12 implies the claim. ∎

3.2 The ASC Compared to Classical Stability and Robustness Conditions in Compressed Sensing.

In the following, we will relate the ASC to two well-known criteria for stability and robustness of $\ell_1$-minimization from the literature: the RIP and the NSP. We will begin by considering the RIP. It is a well-known fact that if the restricted isometry constant $\delta_{2s}$ is small, the program $\mathcal{P}_{1,\epsilon}$ will recover any $s$-sparse vector in a robust and stable manner. E.g. in [12, Theorem 6.12], it is proved that if $\delta_{2s} < 4/\sqrt{41}$, (7) will be satisfied for some constants $C$, $D$, only dependent on $\delta_{2s}$. Having this in mind, it is of course interesting to ask oneself if it is possible to directly prove that a small $\delta_{2s}$ will imply the ASC for some $\alpha$. The next proposition gives a positive answer to that question, and it furthermore provides the control of the size of $\alpha$ we lacked in the previous section.
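
For tiny instances, the RIP-constant $\delta_s$ can even be computed exactly by enumerating all $s$-column submatrices, via the characterization $\delta_s = \max_{|S| = s} \|A_S^T A_S - \operatorname{Id}\|$; for realistic sizes this is of course infeasible, and one has to resort to bounds or sampling. A sketch (Python/NumPy, arbitrary small sizes):

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(7)
    m, n, s = 30, 15, 3
    A = rng.standard_normal((m, n)) / np.sqrt(m)   # normalized Gaussian matrix

    # delta_s = max over all s-column submatrices A_S of ||A_S^T A_S - Id||
    delta = 0.0
    for S in combinations(range(n), s):
        idx = list(S)
        G = A[:, idx].T @ A[:, idx]
        delta = max(delta, np.abs(np.linalg.eigvalsh(G - np.eye(s))).max())
    print("delta_s =", delta)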

Proposition 16.