
# Recovery Analysis for Weighted ℓ1-Minimization Using a Null Space Property

Hassan Mansour and Rayan Saab. Hassan Mansour is with Mitsubishi Electric Research Laboratories, Cambridge, MA 02139 (mansour@merl.com). Rayan Saab is with the Department of Mathematics, The University of California, San Diego (rsaab@ucsd.edu).
###### Abstract

We study the recovery of sparse signals from underdetermined linear measurements when a potentially erroneous support estimate is available. Our results are twofold. First, we derive necessary and sufficient conditions for signal recovery from compressively sampled measurements using weighted ℓ1-norm minimization. These conditions, which depend on the choice of weights as well as the size and accuracy of the support estimate, are on the null space of the measurement matrix. They can guarantee recovery even when standard ℓ1 minimization fails. Second, we derive bounds on the number of Gaussian measurements for these conditions to be satisfied, i.e., for weighted ℓ1 minimization to successfully recover all sparse signals whose support has been estimated sufficiently accurately. Our bounds show that weighted ℓ1 minimization requires significantly fewer measurements than standard ℓ1 minimization when the support estimate is relatively accurate.

## 1 Introduction

The application of ℓ1 norm minimization for the recovery of sparse signals from incomplete measurements has become standard practice since the advent of compressed sensing [1, 2, 3]. Consider an arbitrary k-sparse signal x ∈ RN and its corresponding linear measurements y ∈ Rm with m < N, where y results from the underdetermined system

 y=Ax. (1)

It is possible to exactly recover all such sparse x from y by solving the ℓ1 minimization problem

 minz ∥z∥1 subject to y=Az (2)

if A satisfies certain conditions [1, 2, 3]. In particular, these conditions are satisfied with high probability by many classes of random matrices, including those whose entries are i.i.d. Gaussian random variables, when m ≳ k ln(N/k). (We write u ≳ v when u ≥ Cv for some constant C independent of u and v.)

One property of the measurement matrix A that characterizes sparse recovery from compressive measurements is the null space property (NSP), defined below.

###### Definition 1.

 A matrix A is said to have the null space property of order k and constant C if for any nonzero vector h in the null space of A, and for every index set T ⊂ {1,...,N} with |T| ≤ k, we have

 ∥hT∥1≤C∥hTc∥1.

In this case, we say that A satisfies NSP(k, C).

NSP(k, C) with constant C < 1 is a necessary and sufficient condition on the matrix A for the recovery of all k-sparse vectors x from their measurements y = Ax using (2). For example, it was shown in [5, Section 9.4], using an escape through the mesh argument (cf. [8, 9]), that Gaussian random matrices satisfy the null space property with probability greater than 1 − ϵ when m ≳ k ln(N/k). Here, the implicit constant depends on C and ϵ, but the dependence is mild enough that m ≳ k ln(N/k) is a reasonable approximation when N is large and k is small.

While the ℓ1 minimization problem in (2) is suitable for recovering signals with arbitrary support sets, it is often the case in practice that signals exhibit structured support sets, or that an estimate of the support can be identified. In such cases, one is interested in modifying (2) to weaken the exact recovery conditions. In this paper, we analyze a recovery method that incorporates support information by replacing (2) with weighted ℓ1 minimization. In particular, given a support estimate ˜T ⊂ {1,...,N}, we solve the optimization problem

 minzN∑i=1wi|zi| subject to y=Az, where wi={w∈[0,1],i∈˜T1,i∈˜Tc. (3)

The idea behind such a modification is to choose the weight vector such that the entries of z that are “expected” to be large, i.e., those on the support estimate ˜T, are penalized less.
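For concreteness, problem (3) can be cast as a linear program via the standard lift with auxiliary variables ui ≥ |zi|. The following sketch does so with scipy.optimize.linprog; the matrix, signal, and support estimate below are illustrative choices, not taken from the paper's experiments.

```python
# Sketch: weighted l1-minimization (3) as a linear program.
# All problem data below (A, x, T_est, w) are illustrative.
import numpy as np
from scipy.optimize import linprog

def weighted_l1_recover(A, y, T_est, w):
    """min sum_i w_i |z_i| s.t. A z = y, with w_i = w on the support
    estimate T_est and w_i = 1 elsewhere, via variables u_i >= |z_i|."""
    m, N = A.shape
    weights = np.ones(N)
    weights[list(T_est)] = w
    c = np.concatenate([np.zeros(N), weights])   # minimize weights . u
    A_eq = np.hstack([A, np.zeros((m, N))])      # A z = y
    I = np.eye(N)
    A_ub = np.vstack([np.hstack([I, -I]),        #  z - u <= 0
                      np.hstack([-I, -I])])      # -z - u <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * N), A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    return res.x[:N]

rng = np.random.default_rng(0)
N, m, k = 40, 16, 4
A = rng.standard_normal((m, N)) / np.sqrt(m)
x = np.zeros(N)
x[:k] = rng.standard_normal(k)
y = A @ x
# Support estimate of size k with one erroneous index (accuracy 3/4).
z = weighted_l1_recover(A, y, T_est=[0, 1, 2, 10], w=0.3)
print(np.allclose(A @ z, y, atol=1e-5))  # the minimizer is feasible
```

Whether z equals x depends on A satisfying the null space conditions studied below; the linear program itself only guarantees feasibility and weighted-ℓ1 optimality.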

### 1.1 Prior work

The recovery of compressively sampled signals using prior support information has been studied in several works, e.g., [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. Vaswani and Lu [11, 12, 13] proposed a modified compressed sensing approach that incorporates known support elements using a weighted ℓ1 minimization approach, with zero weights on the known support. Their work derives sufficient recovery conditions that are weaker than the analogous ℓ1 minimization conditions in the case where a large proportion of the support is known. This work was extended by Jacques to include compressible signals and account for noisy measurements.

Friedlander et al. studied the case where non-zero weights are applied to the support estimate, further generalizing and refining the results of Vaswani and Lu; they derive tighter sufficient recovery conditions that depend on the accuracy and size of the support estimate. Mansour et al. then extended these results to incorporate multiple support estimates with varying accuracies.

Khajehnejad et al. also derive sufficient recovery conditions for compressively sampled signals with prior support information using weighted ℓ1 minimization. They partition {1,...,N} into two sets such that the entries of x supported on each set have a fixed probability of being non-zero, albeit with probabilities that differ between the two sets. Thus, in this work the prior information is the knowledge of the partition and the probabilities. More recently, Oymak et al. adopt the same prior information setup and derive lower bounds on the minimum number of Gaussian measurements required for successful recovery when the optimal weights are chosen for each set. Their results are asymptotic in nature and pertain to the non-uniform model where one fixes a signal and draws the matrix at random. In this model, every new instance of the problem requires a new draw of the random measurement matrix. In addition to differing in our model for prior information, our results are uniform in nature, i.e., they pertain to the model where the matrix is drawn once and successful recovery is guaranteed (with high probability) for all sparse signals with sufficiently accurate support information.

Recently, Rauhut and Ward analyzed the effectiveness of weighted ℓ1 minimization for the interpolation of smooth signals that also admit a sparse representation in an appropriate transform domain. Using a weighted robust null space property, they derive error bounds associated with recovering functions whose coefficient sequences lie in weighted ℓp spaces. This differs from our work, which focuses on the effect of the support estimate accuracy on the recovery guarantees, and on the relationship between the weighted null space property and the standard null space property.

### 1.2 Notation and preliminaries

Throughout the paper, we adopt the following notation. As stated earlier, x is the k-sparse signal to be recovered and y denotes the vector of measurements, i.e., y = Ax. Thus, k, N, and m denote the number of non-zero entries of x, its ambient dimension, and the number of measurements, respectively. T = supp(x) is the support of x, and ˜T is the support estimate used in (3). The cardinality of ˜T is |˜T| = ρk for some ρ ≥ 0, and the accuracy of ˜T is α = |T ∩ ˜T|/|˜T|. For an index set V we define

 Γs(V) := {U ⊂ {1,...,N} : |(V∩Uc)∪(Vc∩U)| ≤ s}.

We use the notation hS to denote the restriction of the vector h to the set S (with entries off S set to zero). We introduce a weighted nonuniform null space property that, as we prove in Section 2, provides a necessary and sufficient condition for the recovery of sparse vectors supported on a fixed set using weighted ℓ1 minimization (with constant weights applied to a support estimate).
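As a small illustration of these definitions (the index sets below are hypothetical), the accuracy α of a support estimate and the size s of the symmetric difference that governs membership in Γs(T) can be computed as follows:

```python
# Sketch: accuracy alpha of a support estimate and the size s of its
# symmetric difference with the true support (so T_est lies in Gamma_s(T)).
def support_stats(T, T_est):
    T, T_est = set(T), set(T_est)
    s = len(T - T_est) + len(T_est - T)   # |(T ∩ T_est^c) ∪ (T^c ∩ T_est)|
    alpha = len(T & T_est) / len(T_est)   # fraction of the estimate that is correct
    return s, alpha

T = {0, 1, 2, 3}       # true support, k = 4
T_est = {0, 1, 2, 9}   # estimate of the same size with one wrong index
s, alpha = support_stats(T, T_est)
print(s, alpha)  # prints: 2 0.75
```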

###### Definition 2.

Let T, ˜T ⊂ {1,...,N} with |T| = k and |˜T| = ρk, and let w ∈ [0,1]. A matrix A is said to have the weighted nonuniform null space property with parameters T and ˜T, weight w, and constant C if for any nonzero vector h in the null space of A, we have

 w∥hT∥1+(1−w)∥hS∥1≤C∥hTc∥1,

where S = (T∩˜Tc)∪(Tc∩˜T). In this case, we say A satisfies w-NSP(T, ˜T, C).

Next we define a weighted uniform null space property that lends itself to necessary and sufficient conditions for the recovery of all k-sparse vectors from compressive measurements using weighted ℓ1 minimization.

###### Definition 3.

A matrix A is said to have the weighted null space property with parameters k and s, weight w ∈ [0,1], and constant C if for any nonzero vector h in the null space of A, for every index set T with |T| ≤ k, and for every index set ˜T ∈ Γs(T), we have

 w∥hT∥1+(1−w)∥hS∥1≤C∥hTc∥1.

where S = (T∩˜Tc)∪(Tc∩˜T). In this case, we say A satisfies w-NSP(k, s, C).

Thus, the standard null space property of order k, i.e., NSP(k, C), is equivalent to 1-NSP(k, s, C): when w = 1, the term involving S vanishes.
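The inequality appearing in Definitions 2 and 3 can be evaluated numerically for a given vector; the sketch below (with a hypothetical vector h, index sets, and constants) checks it for a single h:

```python
# Sketch: evaluate the weighted null-space inequality
#   w*||h_T||_1 + (1 - w)*||h_S||_1 <= C*||h_{T^c}||_1
# for one vector h, support T, and support estimate T_est.
def weighted_nsp_holds(h, T, T_est, w, C):
    T, T_est = set(T), set(T_est)
    S = (T - T_est) | (T_est - T)         # S = (T ∩ T_est^c) ∪ (T^c ∩ T_est)
    Tc = set(range(len(h))) - T
    l1 = lambda idx: sum(abs(h[i]) for i in idx)
    return w * l1(T) + (1 - w) * l1(S) <= C * l1(Tc)

h = [3.0, -2.0, 0.5, 0.1, -0.1, 0.05]
print(weighted_nsp_holds(h, T={0, 1}, T_est={0, 2}, w=0.5, C=1.0))  # False
```

A matrix satisfies the property only if the inequality holds for every nonzero h in its null space (and, for the uniform property, every admissible pair of sets), so a pointwise check like this is illustrative rather than a certificate.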

###### Remark 1.

There should be no confusion between the notation used for the weighted nonuniform and uniform null space properties, as one pertains to specific subsets T and ˜T while the other pertains to the sizes k and s of such subsets.

### 1.3 Main contributions

Necessary and sufficient conditions: Our first main result is Theorem 4, identifying necessary and sufficient conditions for weighted ℓ1 minimization to recover all k-sparse vectors when the error in the support estimate is of size s or less.

###### Theorem 4.

Given an m × N matrix A, every k-sparse vector x ∈ RN is the unique solution to all optimization problems (3) with ˜T ∈ Γs(supp(x)) if and only if A satisfies w-NSP(k, s, C) for some C < 1.

We prove this theorem in Section 2. There, we also compare ℓ1 minimization to weighted ℓ1 minimization. For example, we show that if the support estimate is fully accurate, then weighted ℓ1 minimization recovers x whenever ℓ1 minimization recovers x (Corollary 12). Moreover, when the support estimate is sufficiently accurate and the weights are sufficiently small, weighted ℓ1 minimization can successfully recover x even when standard ℓ1 minimization fails (Corollary 9).

Weaker conditions on the number of measurements: Our second main result deals with matrices whose entries are i.i.d. Gaussian random variables. We establish a condition on the number of measurements, m, that yields the weighted null space property, and hence guarantees exact sparse recovery using weighted ℓ1 minimization.

###### Theorem 5.

Let T and ˜T be two subsets of {1,...,N} with |T| ≤ k and |(T∩˜Tc)∪(Tc∩˜T)| ≤ s, and let A be an m × N random matrix with independent zero-mean unit-variance Gaussian entries. Then A satisfies w-NSP(T, ˜T, C) with probability exceeding 1 − ϵ provided

 m/√(m+1) ≥ √(k+s) + C−1√(2(w2k+s)ln(eN/k)) + (1/(2πe3))1/4 √(k/ln(eN/k)) + √(2 ln ϵ−1).

We observe that in the limiting case of large N with small k, and taking w = 0, the condition in Theorem 5 simplifies to

 m ≳ k + s ln(eN/k),

which can be significantly smaller than the analogous condition m ≳ k ln(eN/k) of standard ℓ1 minimization, especially when the support estimate is accurate, i.e., when s is small. In Section 3, we prove a more general version of Theorem 5, namely Theorem 13. Theorem 13 suggests that the choice w = 1 − α gives the weakest condition on the number of measurements m. On the other hand, Proposition 10 shows that when s ≤ k, and the weighted null space property holds for a weight w, it also holds for all smaller weights. Taken together, these results indicate that while the bound on the number of measurements is minimized for w = 1 − α, recovery by weighted ℓ1 minimization is also guaranteed for all weights w ∈ [0, 1 − α] (when s ≤ k). We note that Theorem 13 also indicates that when α ≥ 1/2, then using any weight w ∈ [0, 1] results in a weaker condition on the number of measurements than standard ℓ1 minimization. In Section 3, we also develop the corresponding bounds that guarantee uniform recovery for arbitrary sets T and ˜T. Finally, we present numerical simulations in Section 4 that illustrate our theoretical results.
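Ignoring absolute constants, the two simplified measurement conditions can be compared numerically; the parameter values below are illustrative only:

```python
# Sketch (absolute constants ignored): order-of-magnitude comparison of
# the simplified conditions m ≳ k + s*ln(eN/k) for weighted l1 versus
# m ≳ k*ln(eN/k) for standard l1.
import math

def weighted_l1_bound(N, k, s):
    return k + s * math.log(math.e * N / k)

def standard_l1_bound(N, k):
    return k * math.log(math.e * N / k)

N, k = 10**6, 1000
for s in (0, 50, 500):  # s = number of errors in the support estimate
    print(s, round(weighted_l1_bound(N, k, s)), round(standard_l1_bound(N, k)))
```

For an accurate estimate (small s) the weighted bound is dominated by k, while the standard bound always carries the full k ln(eN/k) factor.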

## 2 Weighted null space property

In what follows, we describe the relationship between the weighted and standard null space properties and their associated optimization problems. Specifically, Proposition 6 establishes w-NSP(k, s, C) with some C < 1 as a necessary condition for weighted ℓ1-minimization to recover all k-sparse vectors x from their measurements Ax, given a support estimate with at most s errors. Proposition 7 establishes that the same weighted null space property is also sufficient. Together, Propositions 6 and 7 yield Theorem 4.

Proposition 8 relates the weighted null space property to the standard null space properties of orders k, s, and k − s. As a consequence, Corollary 9 shows that weighted ℓ1 minimization can succeed when ℓ1 minimization fails, provided the support estimate is accurate enough and the weights are small enough. Proposition 10 shows that if s ≤ k, i.e., the support estimate is at least 50% accurate, then any matrix that satisfies w-NSP(k, s, C) also satisfies v-NSP(k, s, C) for all v ∈ [0, w]. Corollary 11 shows that the standard null space property guarantees that weighted ℓ1 minimization succeeds when the support estimate is at least 50% accurate, regardless of the weight. Corollary 12 establishes the equivalence of w-NSP(k, 0, wC) and NSP(k, C). This shows that weighted ℓ1 minimization succeeds in recovering all k-sparse signals from a support estimate that is fully accurate if and only if ℓ1 minimization recovers all k-sparse signals.

###### Proposition 6.

Let A be an m × N matrix that does not satisfy w-NSP(k, s, C) for any C < 1. Then, there exists a k-sparse vector x satisfying y = Ax and a set ˜T ∈ Γs(supp(x)) such that x is not the unique minimizer of the optimization problem (3).

###### Proof.

Since A does not satisfy w-NSP(k, s, C) for any C < 1, there exists a nonzero vector h in the null space of A and index sets T with |T| ≤ k and ˜T ∈ Γs(T) such that, with S = (T∩˜Tc)∪(Tc∩˜T),

 w∥hT∥1+(1−w)∥hS∥1≥∥hTc∥1.

Define x := hT, so that z := −hTc satisfies Az = Ax = y. Substituting for S, splitting the sets along ˜T, and simplifying, we obtain

 ∥hT∩˜Tc∥1+w∥hT∩˜T∥1≥w∥hTc∩˜T∥1+∥hTc∩˜Tc∥1.

In other words, the weighted ℓ1-norm of the vector x equals or exceeds that of z. So x is not the unique minimizer of (3). This establishes the necessity of the w-NSP(k, s, C) condition. ∎

###### Proposition 7.

Let A be an m × N matrix that satisfies w-NSP(k, s, C) for some C < 1. Then, every k-sparse vector x is the unique minimizer of the optimization problem (3) provided ˜T satisfies ˜T ∈ Γs(supp(x)).

###### Proof.

Let ẑ be a minimizer of (3) and define h := x − ẑ, which lies in the null space of A. Then, by the optimality of ẑ, and using the reverse triangle inequality,

 w∥hTc∥1+(1−w)∥h˜Tc∩Tc∥1≤w∥hT∥1+(1−w)∥h˜Tc∩T∥1.

Consequently

 ∥hTc∥1≤w∥hT∥1+(1−w)∥h˜T∩Tc∥1+(1−w)∥h˜Tc∩T∥1.

Setting S = (T∩˜Tc)∪(Tc∩˜T), we note that when ˜T ∈ Γs(T), the above inequality is in contradiction with w-NSP(k, s, C) for C < 1 unless h = 0. We thus conclude that ẑ = x. ∎

###### Proposition 8.

Let A be an m × N matrix that satisfies NSP(k, Ck) for some Ck, as well as NSP(s, Cs) and NSP(k−s, Ck−s) for some finite Cs and Ck−s. Then, A satisfies w-NSP(k, s, C) with C = ((1+w)CsCk−s + Cs + wCk−s)/(1 − CsCk−s).

###### Proof.

Let h be any fixed nonzero vector in the null space of A and let T and S be the supports of its k-largest and s-largest entries (in modulus), respectively. To check whether w∥hT′∥1+(1−w)∥hS′∥1 ≤ C∥hT′c∥1 holds for all admissible T′ and S′, it suffices to check whether w∥hT∥1+(1−w)∥hS∥1 ≤ C∥hTc∥1 holds. To see this, note that the left hand side is largest and the right hand side is smallest over all choices of T′ and S′ when T′ = T and S′ = S. Since A satisfies NSP(s, Cs) and NSP(k−s, Ck−s), h has no zero entries on S or T, respectively (otherwise we would have h = 0, contradicting the null space property). Thus, to prove w-NSP(k, s, C) we may now examine, for any nonzero vector h in the null space of A, only sets T and S with |T| = k, |S| = s, and S ⊂ T. Defining ˜T := T∖S (which implies |˜T| = k−s, S = T∩˜Tc, and ˜T ∈ Γs(T)) we have

 w∥hT∥1+(1−w)∥hS∥1 =w∥h˜T∥1+∥hT∩˜Tc∥1 ≤w Ck−s(∥hT∩˜Tc∥1+∥hTc∥1)+∥hT∩˜Tc∥1. (4)

Moreover, we have (T∩˜Tc)c = Tc∪˜T. Hence, applying NSP(s, Cs) to the set T∩˜Tc,

 ∥hT∩˜Tc∥1 ≤Cs(∥hTc∥1+∥h˜T∥1) ≤Cs((1+Ck−s)∥hTc∥1+Ck−s∥hT∩˜Tc∥1),

and consequently ∥hT∩˜Tc∥1 ≤ (Cs(1+Ck−s)/(1−CsCk−s))∥hTc∥1. Substituting in (4), we obtain

 w∥hT∥1+(1−w)∥hS∥1 ≤ ((1+w)CsCk−s+Cs+wCk−s)/(1−CsCk−s) ∥hTc∥1, (5)

which is the desired result. ∎

###### Corollary 9.

Let k be a positive integer and suppose that Ck is the smallest constant so that the matrix A satisfies NSP(k, Ck). Suppose there exists an integer s ≤ k such that A satisfies NSP(s, Cs) and NSP(k−s, Ck−s) with constants satisfying Cs(1+2Ck−s) < 1. Let x be any k-sparse vector in RN, with T = supp(x). If ˜T satisfies ˜T ∈ Γs(T) and w < (1−Cs−2CsCk−s)/((1+Cs)Ck−s), then x̂ = x, the minimizer of (3) with weight w.

###### Proof.

Proposition 8 implies that A satisfies w-NSP(k, s, C) with C = ((1+w)CsCk−s+Cs+wCk−s)/(1−CsCk−s). If ˜T ∈ Γs(T) and w < (1−Cs−2CsCk−s)/((1+Cs)Ck−s), then C < 1, so Proposition 7 guarantees that x̂ = x. ∎

###### Proposition 10.

Let A be an m × N matrix that satisfies w-NSP(k, s, C). If s ≤ k, then for every weight v ∈ [0, w], the matrix A satisfies v-NSP(k, s, C).

###### Proof.

Since A satisfies w-NSP(k, s, C), any nonzero vector h in the null space of A satisfies

 w∥hT∥1+(1−w)∥hS∥1≤C∥hTc∥1,

for all index sets T with |T| ≤ k and ˜T ∈ Γs(T), where S = (T∩˜Tc)∪(Tc∩˜T). In particular, consider the sets T∗ and S∗ indexing the largest k and s entries in magnitude of h, respectively. We have

 w∥hT∗∥1+(1−w)∥hS∗∥1≤C∥hT∗c∥1.

Let v ∈ [0, w] and write the above as

 v∥hT∗∥1+(w−v)∥hT∗∥1+(1−w)∥hS∗∥1≤C∥hT∗c∥1.

Since s ≤ k, the set S∗ indexes a subset of the k largest entries of h, which implies ∥hS∗∥1 ≤ ∥hT∗∥1 and hence

 v∥hT∗∥1+(w−v)∥hS∗∥1+(1−w)∥hS∗∥1≤C∥hT∗c∥1,

which is equivalent to

 v∥hT∗∥1+(1−v)∥hS∗∥1≤C∥hT∗c∥1.

Replacing T∗ by an arbitrary T of the same size and S∗ by an arbitrary admissible S of the same size decreases the left hand side. Replacing T∗c by Tc increases the right hand side. So,

 v∥hT∥1+(1−v)∥hS∥1≤C∥hTc∥1,

for all T with |T| ≤ k and all admissible S, i.e., S = (T∩˜Tc)∪(Tc∩˜T) with ˜T ∈ Γs(T). This is v-NSP(k, s, C), and it holds for every v ∈ [0, w] once it holds for w. ∎

###### Remark 2.

The condition s ≤ k is satisfied when a support estimate set ˜T with |˜T| = k has an accuracy α ≥ 1/2. Therefore, Proposition 10 states that if the support estimate is at least 50% accurate, any matrix that satisfies w-NSP(k, s, C) also satisfies v-NSP(k, s, C) for every weight v ∈ [0, w].

###### Corollary 11.

Let A be an m × N matrix that satisfies NSP(k, C) with C < 1. Then, for every k-sparse vector x supported on some set T, and for every support estimate ˜T ∈ Γs(T) with s ≤ k, it holds that x̂ = x, where x̂ is the minimizer of (3) with y = Ax and any weight w ∈ [0, 1].

###### Proof.

This follows from Proposition 10 by setting w = 1, noting that NSP(k, C) coincides with 1-NSP(k, s, C), and applying Proposition 7. ∎

###### Corollary 12.

The weighted null space property w-NSP(k, 0, wC) and the standard null space property NSP(k, C) are equivalent.

###### Proof.

Proposition 8 with s = 0, coupled with the observation that Cs may be taken to be 0 when s = 0, yields one direction of the equivalence. The other direction, i.e., that w-NSP(k, 0, wC) implies NSP(k, C), follows upon picking ˜T = T for any set T in the definition of the weighted null space property. ∎

###### Remark 3.

Corollary 12 in turn implies that weighted ℓ1-minimization recovers all k-sparse signals from their noise-free measurements given a fully accurate support estimate if and only if ℓ1 minimization recovers all k-sparse signals from their noise-free measurements.

## 3 Gaussian Matrices

It is known that Gaussian (and more generally, sub-Gaussian) matrices satisfy the standard null space property (with high probability) provided m ≳ k ln(N/k). It is also known (see, e.g., [5, Theorem 10.11]) that if a matrix A guarantees recovery of all k-sparse vectors via ℓ1 minimization (2), then m must exceed c1 k ln(N/(c2 k)) for some appropriate constants c1 and c2. The purpose of this section is to show that weighted ℓ1 minimization allows us to recover sparse vectors beyond this bound (i.e., with fewer measurements) given relatively accurate support estimates.

We begin with some simple observations to establish a rough lower bound on the number of measurements needed for weighted ℓ1 minimization. We first observe that w-NSP(k, s, C) implies NSP(s, C), i.e., the standard null space property of order s. This can be seen by restricting T to be of size at most s and setting ˜T = ∅ in the definition of the weighted null space property. Thus, w-NSP(k, s, C) with C < 1 guarantees recovery of all s-sparse signals via ℓ1 minimization. Consequently, it requires m ≳ s ln(N/s) [5, Theorem 10.11]. Since s in weighted ℓ1 minimization plays the role of the size of the error in the support estimate, one may hope (in analogy with standard compressed sensing results) that m ≳ s ln(N/s) suffices for recovery, given an accurate support estimate. However, even if one had a perfect support estimate, k measurements are needed to directly measure the entries on the support. Combining these observations, we seek a bound on the number of measurements that scales (up to constants) like k + s ln(N/s).

Indeed, this can be deduced from Corollary 14, presented later in this section, which follows from our main technical result, Theorem 13 (which is a more general version of Theorem 5 in the Introduction). Corollary 14 entails that in the case of large N and small k, for a fixed support estimate ˜T, all k-sparse vectors supported on any set T with ˜T ∈ Γs(T) can be recovered by solving (3) when m ≳ k + s ln(eN/k). We conclude the section with another corollary of Theorem 13, establishing a bound on the number of Gaussian measurements that guarantees w-NSP(k, s, C).

###### Theorem 13.

Let T and ˜T be two subsets of {1,...,N} with |T| = k and |˜T| = ρk, let α = |T∩˜T|/|˜T| and s = |(T∩˜Tc)∪(Tc∩˜T)|, and let A be an m × N random matrix with independent zero-mean unit-variance Gaussian entries. Then A satisfies w-NSP(T, ˜T, C) with probability exceeding 1 − ϵ provided

 m/√(m+1) ≥ √(s+αρk) + C−1√(2((w2−2w(1−α))ρk+s)ln(eN/k)) + (1/(2πe3))1/4 √(k/ln(eN/k)) + √(2 ln ϵ−1). (6)
###### Proof.

Our proof will be a modified version of the analogous proof of the standard null space property for Gaussian matrices (cf. [5, Section 9.4]). Define the set

 HT,˜T:={h∈RN:w∥hT∥1+(1−w)∥h(T∩˜Tc)∪(Tc∩˜T)∥1≥C∥hTc∥1}.

Our aim is to show that, with high probability, infh∈HT,˜T∩SN−1 ∥Ah∥2 > 0 for a random Gaussian matrix A with zero-mean and unit-variance entries. This will show that there are no vectors from HT,˜T in the null space of A, i.e., that the weighted null space property holds over T and ˜T. To this end, we will utilize Gordon’s escape through the mesh theorem, as in [5] (cf. [8, 9]). In short, the theorem states that for an m × N Gaussian matrix A with zero-mean and unit-variance entries and for an arbitrary subset V of the unit sphere SN−1,

 P(infv∈V ∥Av∥2 ≤ m/√(m+1) − ℓ(V) − a) ≤ e−a2/2, (7)

where ℓ(V) := E supv∈V ⟨g, v⟩ is the Gaussian width of V and g is a standard Gaussian random vector, cf. [5]. So we must estimate ℓ(HT,˜T∩SN−1). Note that HT,˜T∩SN−1 is compact so the supremum in the definition of ℓ can be replaced by a maximum. Moreover, note that

 maxh∈HT,˜T∩SN−1⟨g,h⟩=maxh∈HT,˜T∩SN−1∩{h: hi≥0} N∑i=1|gi|hi.

Define the vector ~g with entries ~gi = |gi|, i = 1,...,N, the convex cone ˜HT,˜T := HT,˜T ∩ {h : hi ≥ 0}, and its dual cone ˜H∗T,˜T. We may now use duality (see, e.g., [B.40]) to conclude that

 maxh∈˜HT,˜T⟨~g,h⟩≤minz∈˜H∗T,˜T∥~g+z∥2.

To bound the right hand side from above, we introduce, for t > 0 to be determined later, the set

 QtT,˜T:={z∈RN:⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩zi=wt for i∈T∩˜T,zi=t for i∈T∩˜Tc,zi=(1−w)t for i∈Tc∩˜T,zi≥−Ct for i∈Tc∩˜Tc}

and we observe that for any two vectors z ∈ QtT,˜T and h ∈ ˜HT,˜T,

 N∑i=1zihi =tw∑i∈T∩˜Thi+t∑i∈T∩˜Tchi+t(1−w)∑i∈Tc∩˜Thi+∑i∈Tc∩˜Tczihi (8) =tw∥hT∩˜T∥1+t(1−w)∥h(T∩˜Tc)∪(Tc∩˜T)∥1+tw∥hT∩˜Tc∥1+∑i∈Tc∩˜Tczihi (9) =t(w∥hT∥1+(1−w)∥h(T∩˜Tc)∪(Tc∩˜T)∥1)+∑i∈Tc∩˜Tczihi (10) ≥t(w∥hT∥1+(1−w)∥h(T∩˜Tc)∪(Tc∩˜T)∥1)−Ct∥hTc∩˜Tc∥1 (11) ≥t(w∥hT∥1+(1−w)∥h(T∩˜Tc)∪(Tc∩˜T)∥1−C∥hTc∩˜Tc∥1)≥0. (12)

Hence QtT,˜T ⊂ ˜H∗T,˜T, and so for any t > 0,

 maxh∈HT,˜T∩SN−1⟨g,h⟩≤minz∈˜H∗T,˜T∥~g+z∥2≤minz∈QtT,˜T∥~g+z∥2.

Taking expectations and defining the soft-thresholding operator Sλ with Sλ(u) := max{u − λ, 0} for u ≥ 0, we have

 ℓ(HT,˜T∩SN−1) ≤ E minz∈QtT,˜T ∥~g+z∥2 ≤ E∥~gT∪(Tc∩˜T)+zT∪(Tc∩˜T)∥2 + E minz∈QtT,˜T ∥~g(Tc∩˜Tc)+z(Tc∩˜Tc)∥2 ≤ E∥~gT∪(Tc∩˜T)∥2 + ∥zT∪(Tc∩˜T)∥2 + E minzi≥−Ct (∑i∈Tc∩˜Tc(~gi+zi)2)1/2 ≤ √((1+ρ−αρ)k) + √(w2αρk+(1−w)2(1−α)ρk+(1−αρ)k) t + E(∑i∈Tc∩˜Tc SCt(~gi)2)1/2 = √(s+αρk) + √((w2−2w(1−α))ρk+s) t + E(∑i∈Tc∩˜Tc SCt(~gi)2)1/2 ≤ √(s+αρk) + √((w2−2w(1−α))ρk+s) t + ((N−s−αρk)(2πe)1/2 e−(Ct)2/2/(Ct)2)1/2. (13)

Above, the third and fourth inequalities utilize the facts that zi = wt when i ∈ T∩˜T, where |T∩˜T| = αρk; zi = t when i ∈ T∩˜Tc, where |T∩˜Tc| = (1−αρ)k; and zi = (1−w)t when i ∈ Tc∩˜T, where |Tc∩˜T| = (1−α)ρk; together with the well known bound on standard Gaussian random vectors g ∈ Rn, namely E∥g∥2 ≤ √n. Similarly, the fifth inequality results from a direct computation of E(∑i SCt(~gi)2)1/2 where g is a standard Gaussian vector (see [5, Section 9.2] for the details).

Picking t = C−1√(2 ln(eN/k)), we have

 ℓ(HT,˜T∩SN−1) ≤ √(s+αρk) + C−1√(2((w2−2w(1−α))ρk+s)ln(eN/k)) + (1/(2πe3))1/4 √(k/ln(eN/k)).

We now apply Gordon’s escape through the mesh theorem (7) to deduce

 P(infh∈HT,˜T∩SN−1 ∥Ah∥2 ≤ m/√(m+1) − ℓ(HT,˜T∩SN−1) − a) ≤ e−a2/2.

Choosing a = √(2 ln ϵ−1), we obtain w-NSP(T, ˜T, C) with probability exceeding 1 − ϵ provided

 m/√(m+1) ≥ √(s+αρk) + C−1√(2((w2−2w(1−α))ρk+s)ln(eN/k)) + (1/(2πe3))1/4 √(k/ln(eN/k)) + √(2 ln ϵ−1).