
# Accuracy guarantees for ℓ1-recovery

Research of the second author was supported by the Office of Naval Research grant N000140811104 and the NSF grant DMS-0914785.

Anatoli Juditsky
LJK, Université J. Fourier, B.P. 53, 38041 Grenoble Cedex 9, France
Anatoli.Juditsky@imag.fr

Arkadi Nemirovski
Georgia Institute of Technology, Atlanta, Georgia 30332, USA
nemirovs@isye.gatech.edu

###### Abstract

We discuss two new methods of recovery of sparse signals from noisy observations based on $\ell_1$-minimization. While they are closely related to well-known techniques such as Lasso and Dantzig Selector, these estimators come with efficiently verifiable guarantees of performance. By optimizing these bounds with respect to the method parameters we are able to construct estimators which possess better statistical properties than the commonly used ones.

We link our performance estimations to the well known results of Compressive Sensing and justify our proposed approach with an oracle inequality which links the properties of the recovery algorithms and the best estimation performance when the signal support is known. We also show how the estimates can be computed using the Non-Euclidean Basis Pursuit algorithm.

##### Key words

sparse recovery, linear estimation, oracle inequalities, nonparametric estimation by convex optimization

AMS subject classification: 62G08, 90C25

## 1 Introduction

Recently, several estimation and selection methods based on $\ell_1$-minimization have received much attention in the statistical literature. For instance, the Lasso estimator, which is the $\ell_1$-penalized least-squares method, is probably the most studied (a theoretical analysis of the Lasso estimator is provided in, e.g., [2, 3, 4, 19, 20, 21, 17, 18]; see also the references cited therein). Another statistical estimator, closely related to the Lasso, is the Dantzig Selector [7, 2, 16, 17]. To be more precise, let us consider the following estimation problem. Assume that an observation

$$y=Ax+\sigma\xi\in\mathbf{R}^m \tag{1}$$

is available, where $x\in\mathbf{R}^n$ is an unknown signal and $A\in\mathbf{R}^{m\times n}$ is a known sensing matrix. We suppose that $\sigma\xi$ is a Gaussian disturbance with $\xi\sim N(0,I_m)$ (i.e., $\xi=(\xi_1,\ldots,\xi_m)^T$, where $\xi_i$ are independent normal r.v.'s with zero mean and unit variance), and $\sigma>0$ is a known deterministic noise level. Our focus is on the recovery of the unknown signal $x$.

The Dantzig Selector estimator $\hat{x}_{\rm DS}(y)$ of the signal $x$ is defined as follows [7]:

$$\hat{x}_{\rm DS}(y)\in\mathop{\rm Argmin}_{v\in\mathbf{R}^n}\left\{\|v\|_1\ \Big|\ \|A^T(Av-y)\|_\infty\le\rho\right\}$$

where $\rho$ is the algorithm's parameter. Since $\hat{x}_{\rm DS}$ is obtained as a solution of a linear program, it is very attractive because of its low computational cost. Accuracy bounds for this estimator are readily available. For instance, a well-known result about this estimator (cf. [7, Theorem 1.1]) is that if $\rho$ is chosen of order $\sigma\sqrt{\ln(n\epsilon^{-1})}$, then

$$\|\hat{x}_{\rm DS}(y)-x\|_2\le K\sigma\sqrt{s\log(n\epsilon^{-1})}$$

with probability $1-\epsilon$ if the signal $x$ is $s$-sparse, i.e., has at most $s$ non-vanishing components, and the sensing matrix $A$ with unit columns possesses the Restricted Isometry Property ${\rm RIP}(\delta,k)$ with suitable parameters $\delta$ and $k$ (see [7]). (Recall that ${\rm RIP}(\delta,k)$, also called the uniform uncertainty principle, means that for any $v\in\mathbf{R}^n$ with at most $k$ nonzero entries, $(1-\delta)\|v\|_2^2\le\|Av\|_2^2\le(1+\delta)\|v\|_2^2$. This property essentially requires that every set of columns of $A$ with cardinality less than $k$ approximately behaves like an orthonormal system.) Further, in this case the factor $K$ is bounded in terms of a moderate absolute constant. This result is quite impressive, in part due to the fact (see, e.g., [5, 6]) that there exist $m\times n$ random matrices, with $m\ll n$, which possess the RIP with probability close to $1$, $\delta$ close to zero and the value of $k$ as large as $O(m/\ln(n/m))$. Similar performance guarantees are known for Lasso recovery

$$\hat{x}_{\rm lasso}(y)\in\mathop{\rm Argmin}_{v\in\mathbf{R}^n}\left\{\|v\|_1+\varkappa\|Av-y\|_2^2\right\},$$

with a properly chosen penalty parameter $\varkappa$. The available accuracy bounds for Lasso and Dantzig Selector rely upon the Restricted Isometry Property or less restrictive assumptions about the sensing matrix, such as the Restricted Eigenvalue [2] or Compatibility [3] conditions (a complete overview of those and several other assumptions, with a description of how they relate to each other, is provided in [19]). However, these assumptions cannot be verified efficiently. This implies that there is currently no way to provide any guarantees (e.g., confidence sets) of the performance of the proposed procedures. A notable exception to this rule is the Mutual Incoherence assumption (see, e.g., [10, 11, 12] and [21] for the case of, respectively, deterministic and random observation noise), which can be used to compute accuracy bounds for recovery algorithms: a matrix $A$ with columns of unit $\ell_2$-norm and mutual incoherence $\mu(A)$ possesses ${\rm RIP}(\delta,k)$ with $\delta=(k-1)\mu(A)$. (The mutual incoherence of a sensing matrix $A=[A_1,\ldots,A_n]$ is computed according to $\mu(A)=\max_{i\neq j}|A_i^TA_j|/(A_i^TA_i)$.)

Obviously, the mutual incoherence can be easily computed even for large matrices. Unfortunately, the latter relation implies that $\mu(A)$ should be very small to certify the possibility of accurate $\ell_1$-recovery of non-trivial sparse signals, so that performance guarantees based on mutual incoherence are very conservative. This “theoretical observation” is supported by numerical experiments: the practical guarantees which may be obtained using the mutual incoherence are generally quite poor even for problems with nice theoretical properties (cf. [14, 15]).
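To illustrate why mutual incoherence is cheap to evaluate, here is a minimal sketch in plain Python. It assumes the normalized definition $\mu(A)=\max_{i\neq j}|A_i^TA_j|/(A_i^TA_i)$, which for unit-norm columns is just the largest absolute inner product between distinct columns. The function name and the toy matrix are illustrative and not taken from the paper.

```python
import math


def mutual_incoherence(A):
    """mu(A) = max over i != j of |A_i^T A_j| / (A_i^T A_i).

    A is a list of m rows, each a list of n floats; A_i denotes the i-th column.
    Cost is O(m * n^2) arithmetic operations.
    """
    m, n = len(A), len(A[0])
    cols = [[A[r][j] for r in range(m)] for j in range(n)]
    mu = 0.0
    for i in range(n):
        norm_sq = sum(a * a for a in cols[i])
        for j in range(n):
            if i != j:
                inner = sum(a * b for a, b in zip(cols[i], cols[j]))
                mu = max(mu, abs(inner) / norm_sq)
    return mu


# Toy 2 x 3 matrix with unit-norm columns (illustrative data only).
r = 1.0 / math.sqrt(2.0)
A = [[1.0, 0.0, r],
     [0.0, 1.0, r]]
mu = mutual_incoherence(A)  # the third column has inner product r with the first two
```

In this toy case $\mu(A)=1/\sqrt{2}$, so the relation $\delta=(k-1)\mu(A)$ already gives $\delta\approx 0.71$ for $k=2$, which shows how quickly incoherence-based certificates become uninformative.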

Recently the authors have proposed a new approach for efficient computing of upper and lower bounds on the “level of goodness” of a sensing matrix $A$, i.e., the maximal $s$ such that the $\ell_1$-recovery of all signals with no more than $s$ non-vanishing components is accurate in the case where the measurement noise vanishes (see [14]). In the present paper we aim to use the related verifiable sufficient conditions of “goodness” of a sensing matrix to provide efficiently computable bounds for the error of $\ell_1$ recovery procedures in the case when the observations are affected by random noise.

The main body of the paper is organized as follows:

1. We start with Section 2.1, where we formulate the sparse recovery problem and introduce our core assumption: a verifiable condition $\mathbf{H}_{s,\infty}(\kappa)$ linking the matrix $A$ and a contrast matrix $H$. In Sections 2.2, 2.3 we present two recovery routines with contrast matrices:

• regular recovery:

$$\hat{x}_{\rm reg}(y)\in\mathop{\rm Argmin}_{v\in\mathbf{R}^n}\left\{\|v\|_1:\ \|H^T(Av-y)\|_\infty\le\rho\right\},$$

• penalized recovery:

$$\hat{x}_{\rm pen}(y)\in\mathop{\rm Argmin}_{v\in\mathbf{R}^n}\left\{\|v\|_1+\theta s\|H^T(Av-y)\|_\infty\right\},$$

($s$ is our guess for the number of nonzero entries in the true signal, $\theta>0$ is the penalty parameter)

along with their performance guarantees under condition $\mathbf{H}_{s,\infty}(\kappa)$ with $\kappa<1/2$, that is, explicit upper bounds on the confidence levels of the recovery errors $\|\hat{x}-x\|_p$. The novelty here is that our bounds are of the form

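$${\rm Prob}\left\{\|\hat{x}-x\|_p\le O\left(s^{1/p}\sigma\sqrt{\ln(n/\epsilon)}\right)\ \text{for every $s$-sparse signal $x$ and all $1\le p\le\infty$}\right\}\ge 1-\epsilon \tag{2}$$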

(with hidden factors in $O(\cdot)$ independent of $\epsilon,\sigma$), and are valid in the entire range $1\le p\le\infty$ of values of $p$. Note that similar error bounds for Dantzig Selector and Lasso are only known for $1\le p\le 2$, whatever the assumptions on an “essentially nonsquare” matrix $A$.

2. Our interest in condition $\mathbf{H}_{s,\infty}(\kappa)$ stems from the fact that this condition, in contrast to the majority of the known sufficient conditions for the validity of $\ell_1$-based sparse recovery (e.g., Restricted Isometry/Eigenvalue/Compatibility), is efficiently verifiable. Moreover, it turns out that one can efficiently optimize, over the contrast matrix $H$, the error bounds of the regular/penalized recovery routines associated with this verifiable condition. The related issues are considered in Section 3. In Section 4 we provide some additional justification of the condition $\mathbf{H}_{s,\infty}(\kappa)$, in particular, by linking it with the Mutual Incoherence and Restricted Isometry properties. This implies that the condition $\mathbf{H}_{s,\infty}(\kappa)$ with a fixed $\kappa<1/2$, associated with randomly selected $m\times n$ matrices $A$, is feasible, with probability approaching 1 as $m,n$ grow, for $s$ as large as $O(\sqrt{m/\ln n})$. We also establish limits of performance of the condition; specifically, we show that unless $A$ is nearly square, $\mathbf{H}_{s,\infty}(\kappa)$ with $\kappa<1/2$ can be feasible only when $s\le O(1)\sqrt{m}$, meaning that the tractability of the condition has a heavy price: when designing and validating $\ell_1$ minimization based sparse recovery routines, this condition can be useful only in a severely restricted range of the sparsity parameter $s$.

3. In Section 5 we show that the condition $\mathbf{H}_{s,\infty}(\kappa)$ is the strongest (and seemingly the only verifiable one) in a natural family of conditions $\mathbf{H}_{s,q}(\kappa)$, $1\le q\le\infty$, linking a sensing and a contrast matrix; here $s$ is the number of nonzeros in the sparse signal to be recovered. We demonstrate that when a contrast matrix satisfies $\mathbf{H}_{s,q}(\kappa)$ with $\kappa<1/2$, the associated regular and penalized recoveries admit error bounds similar to (2), but now in the restricted range $1\le p\le q$ of values of $p$. We demonstrate also that feasibility of $\mathbf{H}_{s,q}(\kappa)$ with $\kappa<1/2$ implies instructive (although slightly worse than those in (2)) error bounds for the Dantzig Selector and Lasso recovering routines.

4. In Section 6, we present numerical results comparing regular/penalized recovery with the Dantzig Selector and Lasso algorithms. The conclusion suggested by these preliminary numerical results is that when the former procedures are applicable (i.e., when the techniques of Section 3 allow one to build a “not too large” contrast matrix satisfying the condition $\mathbf{H}_{s,\infty}(\kappa)$ with some $\kappa<1/2$), our procedures significantly outperform the Dantzig Selector and work exactly as well as the Lasso algorithm with the “ideal” (unrealistic in actual applications) choice of the regularization parameter. (With the “theoretically optimal,” rather than “ideal,” choice of the regularization parameter in Lasso, this algorithm is essentially worse than our algorithms utilizing the contrast matrix.)

5. In the concluding Section 7 we present a “Non-Euclidean Matching Pursuit algorithm” (similar to the one presented in [15]) with the same performance characteristics as those of regular/penalized $\ell_1$ recoveries; this algorithm, however, does not require optimization and can be considered as a computationally cheap alternative to $\ell_1$ recoveries, especially in the case when one needs to process a series of recovery problems with a common sensing matrix.

All proofs are placed in the Appendix.

## 2 Accuracy bounds for ℓ1-Recovery Routines

### 2.1 Problem statement

##### Notation.

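For a vector $x\in\mathbf{R}^n$ and $1\le s\le n$ we denote by $x_s$ the vector obtained from $x$ by setting to $0$ all but the $s$ largest in magnitude entries of $x$. Ties, if any, could be resolved arbitrarily; for the sake of definiteness assume that among entries of equal magnitudes, those with smaller indexes have priority (e.g., with $x=[2;2;3]$ one has $x_2=[2;0;3]$). $\|x\|_p$ stands for the usual $\ell_p$-norm of $x$ (so that $\|x\|_\infty=\max_i|x_i|$), and $\|x\|_{s,p}$ for the $\ell_p$-norm of $x_s$. We say that a vector $x$ is $s$-sparse if it has at most $s$ nonzero entries. Finally, for a set $I\subset\{1,\ldots,n\}$ we denote by $\bar{I}$ its complement $\{1,\ldots,n\}\setminus I$; given $x\in\mathbf{R}^n$, we denote by $x_I$ the vector obtained from $x$ by zeroing the entries with indices outside of $I$, so that $x=x_I+x_{\bar{I}}$.

Given a norm $\nu(\cdot)$ on $\mathbf{R}^m$ and a matrix $H=[h_1,\ldots,h_N]\in\mathbf{R}^{m\times N}$, we set $\nu(H)=\max_{i\le N}\nu(h_i)$.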
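The notation above can be made concrete with a few helper functions. This is a minimal sketch in plain Python; the function names are illustrative, and `norm_s_p` implements the convention that the $(s,p)$-norm of a vector is the $\ell_p$-norm of its $s$ largest in magnitude entries.

```python
def truncate(x, s):
    """Return x_s: keep the s largest-magnitude entries of x, zero out the rest.

    Ties are broken in favour of smaller indexes, as in the text.
    """
    order = sorted(range(len(x)), key=lambda i: (-abs(x[i]), i))
    keep = set(order[:s])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def norm_s_p(x, s, p):
    """l_p norm of x_s; p may be float('inf')."""
    xs = truncate(x, s)
    if p == float("inf"):
        return max(abs(v) for v in xs)
    return sum(abs(v) ** p for v in xs) ** (1.0 / p)


def is_sparse(x, s):
    """True if x has at most s nonzero entries."""
    return sum(1 for v in x if v != 0) <= s


x = [2.0, 2.0, 3.0]
x_2 = truncate(x, 2)             # the tie between the two 2's goes to index 0
tail = sum(abs(a - b) for a, b in zip(x, x_2))  # ||x - x_2||_1, the sparse-approximation error
```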

##### The problem.

We consider an observation

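$$y=Ax+u+\sigma\xi, \tag{3}$$

where $x\in\mathbf{R}^n$ is an unknown signal and $A\in\mathbf{R}^{m\times n}$ is the sensing matrix. We suppose that $\sigma\xi$ is a Gaussian disturbance, where $\xi\sim N(0,I_m)$ (i.e., $\xi=(\xi_1,\ldots,\xi_m)^T$ with independent normal random variables $\xi_i$ with zero mean and unit variance), $\sigma>0$ being known, and $u$ is a nuisance parameter known to belong to a given uncertainty set $\mathcal{U}\subset\mathbf{R}^m$ which we will suppose to be convex, compact and symmetric w.r.t. the origin. Our goal is to recover $x$ from $y$, provided that $x$ is “nearly $s$-sparse.” Specifically, we consider the sets

$$X(s,\upsilon)=\{x\in\mathbf{R}^n:\ \|x-x_s\|_1\le\upsilon\}$$

of signals which admit an $s$-sparse approximation of $\|\cdot\|_1$-accuracy $\upsilon$. Given $p$, $1\le p\le\infty$, and a confidence level $1-\epsilon$, $\epsilon\in(0,1)$, we quantify a recovery routine (a Borel function $\mathbf{R}^m\ni y\mapsto\hat{x}(y)\in\mathbf{R}^n$) by its worst-case, over $x\in X(s,\upsilon)$, confidence interval, taken w.r.t. the $\|\cdot\|_p$-norm of the error. Specifically, we define the risks of a recovery routine $\hat{x}(\cdot)$ as

$${\rm Risk}_p(\hat{x}(\cdot)|\epsilon,\sigma,s,\upsilon)=\inf\left\{\delta:\ {\rm Prob}\left\{\xi:\ \exists x\in X(s,\upsilon),\,u\in\mathcal{U}:\ \|\hat{x}(Ax+\sigma\xi+u)-x\|_p>\delta\right\}\le\epsilon\right\}.$$

Equivalently: ${\rm Risk}_p(\hat{x}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\delta$ if and only if there exists a set $\Xi$ of “good” realizations of $\xi$ with ${\rm Prob}\{\xi\in\Xi\}\ge1-\epsilon$ such that whenever $\xi\in\Xi$, one has $\|\hat{x}(Ax+\sigma\xi+u)-x\|_p\le\delta$ for all $x\in X(s,\upsilon)$ and all $u\in\mathcal{U}$.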

##### Norm ν(⋅).

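Given $\epsilon$ and $\sigma$, let us denote

$$\nu(v)=\nu_{\epsilon,\sigma,\mathcal{U}}(v)=\sup_{u\in\mathcal{U}}u^Tv+\sigma\sqrt{2\ln(n/\epsilon)}\,\|v\|_2. \tag{4}$$

Since $\mathcal{U}$ is convex, closed and symmetric with respect to the origin, $\nu(\cdot)$ is a norm. Let $\nu_*(\cdot)$ be the norm on $\mathbf{R}^m$ conjugate to $\nu(\cdot)$: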

$$\nu_*(u)=\max_v\{v^Tu:\ \nu(v)\le1\}.$$
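The norm $\nu$ of (4) is straightforward to evaluate once the support function of the uncertainty set is explicit. The sketch below assumes, purely for illustration, a box uncertainty set $\{u:\|u\|_\infty\le r\}$, for which $\sup_u u^Tv=r\|v\|_1$; the paper itself only requires the set to be convex, compact and symmetric, and the function name and the parameter `r` are hypothetical.

```python
import math


def nu(v, n, eps, sigma, r):
    """nu(v) = sup_{|u|_inf <= r} u^T v + sigma * sqrt(2 ln(n/eps)) * |v|_2.

    v     : vector in R^m (list of floats)
    n     : signal dimension (number of columns of the sensing matrix)
    eps   : tolerance in (0, 1)
    sigma : noise level
    r     : radius of the assumed box uncertainty set (illustrative choice)
    """
    support = r * sum(abs(t) for t in v)           # support function of the box
    euclid = math.sqrt(sum(t * t for t in v))      # |v|_2
    return support + sigma * math.sqrt(2.0 * math.log(n / eps)) * euclid


v = [3.0, 4.0]
only_nuisance = nu(v, n=100, eps=0.01, sigma=0.0, r=0.5)   # 0.5 * |v|_1
only_noise = nu(v, n=100, eps=0.01, sigma=1.0, r=0.0)      # sqrt(2 ln(n/eps)) * |v|_2
```

With $r=0$ the norm reduces to a multiple of the Euclidean norm, which is the case of purely Gaussian observation noise.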

##### Conditions $\mathbf{H}(\gamma)$ and $\mathbf{H}_{s,\infty}(\kappa)$.

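Let $\gamma=[\gamma_1;\ldots;\gamma_n]\in\mathbf{R}^n_+$. Given $A$, consider the following condition on a matrix $H=[h_1,\ldots,h_n]\in\mathbf{R}^{m\times n}$:

$\mathbf{H}(\gamma)$: for all $x\in\mathbf{R}^n$ and $1\le i\le n$ one has

$$|x_i|\le|h_i^TAx|+\gamma_i\|x\|_1. \tag{5}$$

Now let $s$ be a positive integer and $\kappa>0$. Given $A$, we say that a matrix $H\in\mathbf{R}^{m\times N}$ satisfies condition $\mathbf{H}_{s,\infty}(\kappa)$ (the reason for this notation, cumbersome at first glance, will become clear later, in Section 5), if

$$\forall x\in\mathbf{R}^n:\ \|x\|_\infty\le\|H^TAx\|_\infty+s^{-1}\kappa\|x\|_1. \tag{6}$$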

The conditions we have introduced are closely related to each other:

###### Lemma 1

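If $H\in\mathbf{R}^{m\times n}$ satisfies $\mathbf{H}(\gamma)$, then $H$ satisfies $\mathbf{H}_{s,\infty}(s\|\gamma\|_\infty)$, and “nearly vice versa:” given $H\in\mathbf{R}^{m\times N}$ satisfying $\mathbf{H}_{s,\infty}(\kappa)$, one can efficiently build a matrix $H'\in\mathbf{R}^{m\times n}$ satisfying $\mathbf{H}(\gamma)$ with $\gamma_i=\kappa/s$ for all $i$ (i.e., $\gamma=\frac{\kappa}{s}[1;\ldots;1]$) and such that the columns of $H'$ are convex combinations of the columns of $H$ and $-H$, so that $\nu(H')\le\nu(H)$ for every norm $\nu(\cdot)$ on $\mathbf{R}^m$.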

### 2.2 Regular ℓ1 Recovery

In this section we discuss the properties of the regular $\ell_1$-recovery given by:

$$\hat{x}_{\rm reg}=\hat{x}_{\rm reg}(y)\in\mathop{\rm Argmin}_{v\in\mathbf{R}^n}\left\{\|v\|_1:\ |h_i^T(Av-y)|\le\rho_i,\ i=1,\ldots,n\right\}, \tag{7}$$

where $y$ is as in (3), $h_i$, $i=1,\ldots,n$, are some vectors in $\mathbf{R}^m$ and $\rho_i\ge0$, $i=1,\ldots,n$. We refer to the matrix $H=[h_1,\ldots,h_n]$ as the contrast matrix underlying the recovering procedure.
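The recovery (7) is a Linear Programming problem. One standard way to see this, sketched here with auxiliary variables $t_j$ that are not part of the paper's notation, is to bound each $|v_j|$ from above by $t_j$:

```latex
\min_{v,\,t\in\mathbf{R}^n}\Big\{\sum_{j=1}^n t_j \;:\;
   -t_j \le v_j \le t_j,\ \ 1\le j\le n;\qquad
   -\rho_i \le h_i^T(Av-y) \le \rho_i,\ \ 1\le i\le n \Big\}.
```

At an optimal solution $t_j=|v_j|$, so the $v$-component solves (7). The program has $2n$ variables and $4n$ linear inequality constraints, which makes it accessible to any general-purpose LP solver.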

The starting point of our developments is the following

###### Proposition 1

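Given an $m\times n$ sensing matrix $A$, noise intensity $\sigma$, uncertainty set $\mathcal{U}$ and a tolerance $\epsilon\in(0,1)$, let the matrix $H=[h_1,\ldots,h_n]$ from (7) satisfy the condition $\mathbf{H}(\gamma)$ for some $\gamma\in\mathbf{R}^n_+$, and let $\rho_i$ in (7) satisfy the relation

$$\rho_i\ge\nu_i:=\nu(h_i),\quad i=1,\ldots,n, \tag{8}$$

where $\nu(\cdot)$ is given by (4). Then there exists a set $\Xi\subset\mathbf{R}^m$, ${\rm Prob}\{\xi\in\Xi\}\ge1-\epsilon$, of “good” realizations of $\xi$ such that

(i) Whenever $\xi\in\Xi$, for every $x\in\mathbf{R}^n$, every $u\in\mathcal{U}$ and every subset $I\subset\{1,\ldots,n\}$ such that

$$\gamma_I:=\sum_{i\in I}\gamma_i<\tfrac12, \tag{9}$$

the regular $\ell_1$-recovery given by (7) satisfies:

$$\begin{aligned}
(a)\quad \|\hat{x}_{\rm reg}(Ax+\sigma\xi+u)-x\|_1 &\le \frac{2\|x_J\|_1+2\rho_I+2\nu_I}{1-2\gamma_I};\\
(b)\quad |[\hat{x}_{\rm reg}(Ax+\sigma\xi+u)-x]_i| &\le \rho_i+\nu_i+\gamma_i\|\hat{x}_{\rm reg}(y)-x\|_1\\
&\le \rho_i+\nu_i+\gamma_i\,\frac{2\|x_J\|_1+2\rho_I+2\nu_I}{1-2\gamma_I},\quad i=1,\ldots,n,
\end{aligned} \tag{10}$$

where $J=\bar{I}$ is the complement of $I$, $\rho_I=\sum_{i\in I}\rho_i$ and $\nu_I=\sum_{i\in I}\nu_i$.

(ii) In particular, when setting

$$\begin{aligned}
&\hat\rho_s=\|[\rho_1;\ldots;\rho_n]\|_{s,1},\quad \hat\nu_s=\|[\nu(h_1);\ldots;\nu(h_n)]\|_{s,1},\quad \hat\gamma_s=\|[\gamma_1;\ldots;\gamma_n]\|_{s,1},\\
&\hat\rho=\hat\rho_1=\max_i\rho_i,\quad \nu(H)=\hat\nu_1=\max_i\nu(h_i),\quad \hat\gamma=\hat\gamma_1=\max_i\gamma_i,
\end{aligned} \tag{11}$$

and assuming $\hat\gamma_s<\tfrac12$, for every $\xi\in\Xi$, $x\in\mathbf{R}^n$ and $u\in\mathcal{U}$ it holds

$$\begin{aligned}
\|\hat{x}_{\rm reg}(Ax+\sigma\xi+u)-x\|_1 &\le 2\,\frac{\|x-x_s\|_1+\hat\rho_s+\hat\nu_s}{1-2\hat\gamma_s}\le\frac{2\|x-x_s\|_1}{1-2\hat\gamma_s}+2s\,\frac{\hat\rho+\nu(H)}{1-2\hat\gamma_s};\\
\|\hat{x}_{\rm reg}(Ax+\sigma\xi+u)-x\|_\infty &\le \frac{2\hat\gamma\|x-x_s\|_1}{1-2\hat\gamma_s}+\frac{[1+2s\hat\gamma-2\hat\gamma_s][\hat\rho+\nu(H)]}{1-2\hat\gamma_s}.
\end{aligned}$$

(iii) Finally, assuming $s\hat\gamma<\tfrac12$, for every $\xi\in\Xi$, $x\in\mathbf{R}^n$ and $u\in\mathcal{U}$ one has

$$\begin{aligned}
\|\hat{x}_{\rm reg}(Ax+\sigma\xi+u)-x\|_1 &\le \frac{2\|x-x_s\|_1}{1-2s\hat\gamma}+2s\,\frac{\hat\rho+\nu(H)}{1-2s\hat\gamma};\\
\|\hat{x}_{\rm reg}(Ax+\sigma\xi+u)-x\|_\infty &\le \frac{s^{-1}\|x-x_s\|_1}{1-2s\hat\gamma}+\frac{\hat\rho+\nu(H)}{1-2s\hat\gamma}.
\end{aligned} \tag{12}$$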

The following result is an immediate corollary of Proposition 1:

###### Lemma 2

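Under the premise of Proposition 1, assume that $\hat\gamma_s<\tfrac12$. Then for all $1\le p\le\infty$ and $\upsilon\ge0$:

$${\rm Risk}_p(\hat{x}_{\rm reg}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\frac{2}{1-2\hat\gamma_s}\left[\upsilon+\hat\rho_s+\hat\nu_s\right]^{\frac1p}\left[\hat\gamma\upsilon+\left[\tfrac12-\hat\gamma_s\right]\left[\hat\rho+\nu(H)\right]+\hat\gamma\left[\hat\nu_s+\hat\rho_s\right]\right]^{\frac{p-1}{p}} \tag{13}$$

(for notation, see (11)). Further, if $s\hat\gamma<\tfrac12$, we have also

$$1\le p\le\infty\ \Rightarrow\ {\rm Risk}_p(\hat{x}_{\rm reg}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\frac{(2s)^{\frac1p}}{1-2s\hat\gamma}\left(s^{-1}\upsilon+\hat\rho+\nu(H)\right). \tag{14}$$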
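The passage from $\ell_1$ and $\ell_\infty$ error bounds to an $\ell_p$ risk bound rests on a standard interpolation inequality. The following is a short sketch of that step for (14), with $z$ and $B$ introduced here only as shorthand (the full proofs are in the Appendix):

```latex
% Shorthand: z = \hat{x}_{\rm reg}(Ax+\sigma\xi+u) - x,
%            B = \frac{s^{-1}\upsilon + \hat\rho + \nu(H)}{1-2s\hat\gamma}.
\|z\|_p^p = \sum_i |z_i|^p \le \|z\|_\infty^{p-1}\sum_i |z_i|
\quad\Longrightarrow\quad
\|z\|_p \le \|z\|_1^{\frac1p}\,\|z\|_\infty^{\frac{p-1}{p}} .
% For x \in X(s,\upsilon) the bounds (12) give \|z\|_1 \le 2sB and \|z\|_\infty \le B, hence
\|z\|_p \le (2sB)^{\frac1p} B^{\frac{p-1}{p}} = (2s)^{\frac1p} B ,
```

which is exactly the right hand side of (14). The same interpolation applied to the bounds in item (ii) of Proposition 1 yields (13).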

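The next statement is similar to the cases of $s\hat\gamma<\tfrac12$ in Proposition 1 and Lemma 2; the difference is that now we assume that $H$ satisfies $\mathbf{H}_{s,\infty}(\kappa)$, which, by Lemma 1, is a weaker requirement on $H$ than to satisfy $\mathbf{H}(\gamma)$ with $s\hat\gamma\le\kappa$.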

###### Proposition 2

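Given an $m\times n$ sensing matrix $A$, noise intensity $\sigma$, uncertainty set $\mathcal{U}$ and a tolerance $\epsilon\in(0,1)$, let the matrix $H=[h_1,\ldots,h_n]$ from (7) satisfy the condition $\mathbf{H}_{s,\infty}(\kappa)$ for some $\kappa<\tfrac12$, and let $\rho_i$ in (7) satisfy the relation (8). Then there exists a set $\Xi\subset\mathbf{R}^m$, ${\rm Prob}\{\xi\in\Xi\}\ge1-\epsilon$, of “good” realizations of $\xi$ such that whenever $\xi\in\Xi$, for every $x\in\mathbf{R}^n$ and every $u\in\mathcal{U}$ one has

$$\begin{aligned}
\|\hat{x}_{\rm reg}(Ax+\sigma\xi+u)-x\|_1 &\le \frac{2\|x-x_s\|_1}{1-2\kappa}+2s\,\frac{\hat\rho+\nu(H)}{1-2\kappa};\\
\|\hat{x}_{\rm reg}(Ax+\sigma\xi+u)-x\|_\infty &\le \frac{s^{-1}\|x-x_s\|_1}{1-2\kappa}+\frac{\hat\rho+\nu(H)}{1-2\kappa}.
\end{aligned} \tag{15}$$

In particular,

$$1\le p\le\infty\ \Rightarrow\ {\rm Risk}_p(\hat{x}_{\rm reg}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\frac{(2s)^{\frac1p}}{1-2\kappa}\left(s^{-1}\upsilon+\hat\rho+\nu(H)\right). \tag{16}$$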

### 2.3 Penalized ℓ1 Recovery

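Now consider the penalized $\ell_1$-recovery as follows:

$$\hat{x}_{\rm pen}(y)\in\mathop{\rm Argmin}_{v\in\mathbf{R}^n}\left\{\|v\|_1+\theta s\|H^T(Av-y)\|_\infty\right\}, \tag{17}$$

where $y$ is as in (3), and an integer $s\le n$, a positive $\theta$, and a matrix $H=[h_1,\ldots,h_n]$ are parameters of the construction.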
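Like the regular recovery, the penalized recovery (17) reduces to Linear Programming. A possible reformulation, sketched here with auxiliary variables $t_j$ and $\tau$ that are not part of the paper's notation, reads:

```latex
\min_{v,\,t\in\mathbf{R}^n,\ \tau\in\mathbf{R}}\Big\{\sum_{j=1}^n t_j + \theta s\,\tau \;:\;
   -t_j \le v_j \le t_j,\ \ 1\le j\le n;\qquad
   -\tau \le h_i^T(Av-y) \le \tau,\ \ 1\le i\le n \Big\}.
```

At an optimal solution $t_j=|v_j|$ and $\tau=\|H^T(Av-y)\|_\infty$, so the $v$-component solves (17). Unlike (7), this program is always feasible, and the thresholds $\rho_i$ do not need to be specified in advance.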

###### Proposition 3

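Given an $m\times n$ sensing matrix $A$, an integer $s\le n$, a matrix $H=[h_1,\ldots,h_n]$ and positive reals $\gamma_i$, $1\le i\le n$, satisfying the condition $\mathbf{H}(\gamma)$, and a $\theta>0$, assume that

$$\hat\gamma_s:=\|\gamma\|_{s,1}<\tfrac12 \tag{18}$$

and

$$(1-\hat\gamma_s)^{-1}<\theta<(\hat\gamma_s)^{-1}. \tag{19}$$

Further, let $\sigma\ge0$, $\epsilon\in(0,1)$, and let

$$\nu_i=\nu_{\epsilon,\sigma,\mathcal{U}}(h_i),\quad i=1,\ldots,n,\qquad \nu(H)=\max_i\nu_i. \tag{20}$$

Consider the penalized recovery associated with $H$, $s$, $\theta$. There exists a set $\Xi\subset\mathbf{R}^m$, ${\rm Prob}\{\xi\in\Xi\}\ge1-\epsilon$, of “good” realizations of $\xi$ such that

(i) Whenever $\xi\in\Xi$, for every signal $x\in\mathbf{R}^n$ and every $u\in\mathcal{U}$ one has

$$\begin{aligned}
(a)\quad \|\hat{x}_{\rm pen}(Ax+\sigma\xi+u)-x\|_1 &\le \frac{2\|x-x_s\|_1+2s\theta\nu(H)}{\min[\theta(1-\hat\gamma_s)-1,\ 1-\theta\hat\gamma_s]};\\
(b)\quad \|\hat{x}_{\rm pen}(Ax+\sigma\xi+u)-x\|_\infty &\le \left(\frac{1}{s\theta}+\hat\gamma\right)\|\hat{x}_{\rm pen}(Ax+\sigma\xi+u)-x\|_1+2\nu(H)\\
&\le \frac{2\left(\frac{1}{s\theta}+\hat\gamma\right)\|x-x_s\|_1}{\min[\theta(1-\hat\gamma_s)-1,\ 1-\theta\hat\gamma_s]}+2\nu(H)\left[\frac{1+s\theta\hat\gamma}{\min[\theta(1-\hat\gamma_s)-1,\ 1-\theta\hat\gamma_s]}+1\right],
\end{aligned} \tag{21}$$

where, as in Lemma 2, $\hat\gamma=\max_i\gamma_i$.

(ii) When $s\hat\gamma<\tfrac12$ and $\theta=2$, one has for every $\xi\in\Xi$, $x\in\mathbf{R}^n$ and $u\in\mathcal{U}$:

$$\begin{aligned}
(a)\quad \|\hat{x}_{\rm pen}(Ax+\sigma\xi+u)-x\|_1 &\le \frac{2\|x-x_s\|_1}{1-2s\hat\gamma}+\frac{4s\nu(H)}{1-2s\hat\gamma};\\
(b)\quad \|\hat{x}_{\rm pen}(Ax+\sigma\xi+u)-x\|_\infty &\le \frac{2s^{-1}\|x-x_s\|_1}{1-2s\hat\gamma}+\frac{4\nu(H)}{1-2s\hat\gamma},
\end{aligned} \tag{22}$$

whence for every $1\le p\le\infty$ and $\upsilon\ge0$:

$${\rm Risk}_p(\hat{x}_{\rm pen}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\frac{2s^{\frac1p}}{1-2s\hat\gamma}\left(s^{-1}\upsilon+2\nu(H)\right). \tag{23}$$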

The next statement is in the same relation to Proposition 3 as Proposition 2 is to Proposition 1 and Lemma 2.

###### Proposition 4

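Given an $m\times n$ sensing matrix $A$, noise intensity $\sigma$, uncertainty set $\mathcal{U}$ and a tolerance $\epsilon\in(0,1)$, let the matrix $H$ from (17) satisfy the condition $\mathbf{H}_{s,\infty}(\kappa)$ for some $\kappa<\tfrac12$, and let $\theta=2$. Then there exists a set $\Xi\subset\mathbf{R}^m$, ${\rm Prob}\{\xi\in\Xi\}\ge1-\epsilon$, of “good” realizations of $\xi$ such that whenever $\xi\in\Xi$, for every $x\in\mathbf{R}^n$ and every $u\in\mathcal{U}$ one has

$$\begin{aligned}
\|\hat{x}_{\rm pen}(Ax+\sigma\xi+u)-x\|_1 &\le \frac{2\|x-x_s\|_1}{1-2\kappa}+\frac{4s\nu(H)}{1-2\kappa};\\
\|\hat{x}_{\rm pen}(Ax+\sigma\xi+u)-x\|_\infty &\le \frac{2s^{-1}\|x-x_s\|_1}{1-2\kappa}+\frac{4\nu(H)}{1-2\kappa}.
\end{aligned} \tag{24}$$

In particular,

$$1\le p\le\infty\ \Rightarrow\ {\rm Risk}_p(\hat{x}_{\rm pen}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\frac{2s^{\frac1p}}{1-2\kappa}\left(s^{-1}\upsilon+2\nu(H)\right). \tag{25}$$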

Note that under the premise of Proposition 2, the smallest possible values of $\rho_i$ are the quantities $\nu(h_i)$, which results in $\hat\rho=\nu(H)$; with this choice of $\rho_i$, the risk bound for the regular recovery, as given by the right hand side in (16), coincides within factor 2 with the risk bound for the penalized recovery with $\theta=2$ as given by (25); both bounds assume that $H$ satisfies $\mathbf{H}_{s,\infty}(\kappa)$ with $\kappa<\tfrac12$ and imply that

$$1\le p\le\infty\ \Rightarrow\ {\rm Risk}_p(\hat{x}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\frac{2s^{\frac1p}}{1-2\kappa}\left(s^{-1}\upsilon+2\nu(H)\right). \tag{26}$$

When $\upsilon=0$, the latter bound admits a quite transparent interpretation: everything is as if we were observing the sum of an unknown $s$-dimensional signal and an observation error of the uniform norm $O(1)\nu(H)$.
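The common risk bound (26) is a closed-form expression in $p$, $s$, $\kappa$, $\upsilon$ and $\nu(H)$, so it can be tabulated directly. A minimal sketch follows; the function name and the numerical values are illustrative and not taken from the paper.

```python
def risk_bound(p, s, kappa, upsilon, nu_H):
    """Right hand side of (26): 2 s^(1/p) / (1 - 2 kappa) * (upsilon / s + 2 nu(H)).

    p may be float('inf'); kappa must satisfy 0 <= kappa < 1/2.
    """
    if not 0.0 <= kappa < 0.5:
        raise ValueError("the bound requires 0 <= kappa < 1/2")
    root = 1.0 if p == float("inf") else s ** (1.0 / p)
    return 2.0 * root / (1.0 - 2.0 * kappa) * (upsilon / s + 2.0 * nu_H)


# Exactly s-sparse signals (upsilon = 0), s = 4, kappa = 1/4, nu(H) = 0.1.
bound_l1 = risk_bound(1, 4, 0.25, 0.0, 0.1)
bound_l2 = risk_bound(2, 4, 0.25, 0.0, 0.1)
bound_linf = risk_bound(float("inf"), 4, 0.25, 0.0, 0.1)
```

For $\upsilon=0$ the three values differ only by the factor $s^{1/p}$, in line with the interpretation of an $s$-dimensional signal observed with an error of uniform norm proportional to $\nu(H)$.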

## 3 Efficient construction of the contrast matrix H

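In what follows, we fix $A$, the “environment parameters” $\epsilon,\sigma,\mathcal{U}$ and the “level of sparsity” $s$ of signals $x$ we intend to recover, and are interested in building the contrast matrix $H$ resulting in as small as possible error bound (26). All we need to this end is to answer the following question (where we should specify the norm $\varphi(\cdot)$ as $\nu_{\epsilon,\sigma,\mathcal{U}}(\cdot)$):

(?) Let $\varphi(\cdot)$ be a norm on $\mathbf{R}^m$, and $s$ be a positive integer. What is the domain $G_s$ of pairs $(\omega,\kappa)$ such that $\kappa>0$ and there exists a matrix $H\in\mathbf{R}^{m\times n}$ satisfying the condition $\mathbf{H}_{s,\infty}(\kappa)$ and the relation $\varphi(H)\le\omega$? How to find such an $H$, provided it exists?

Invoking Lemma 1, we can reformulate this question as follows:

(??) Let $\varphi(\cdot)$ and $s$ be as in (?). Given $(\omega,\kappa)$, how to find vectors $h_i\in\mathbf{R}^m$, $1\le i\le n$, satisfying

$$(a):\ \varphi(h_i)\le\omega;\qquad (b):\ |x_i|\le|h_i^TAx|+s^{-1}\kappa\|x\|_1\ \ \forall x\in\mathbf{R}^n$$

for every $i$, or to detect correctly that no such collection of vectors exists?

Indeed, by Lemma 1, if $H$ satisfies $\mathbf{H}_{s,\infty}(\kappa)$ and $\varphi(H)\le\omega$, then there exists $H'=[h_1,\ldots,h_n]$ such that $h_i$ satisfy (b) for all $i$ and $\varphi(H')\le\varphi(H)\le\omega$, so that $h_i$ satisfy (a) for all $i$ as well. Vice versa, if $h_i$ satisfy (a), (b), $1\le i\le n$, then the matrix $H=[h_1,\ldots,h_n]$ clearly satisfies $\mathbf{H}_{s,\infty}(\kappa)$, and $\varphi(H)\le\omega$.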

The answer to (??) is given by the following

###### Lemma 3

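Given $\omega>0$, $\kappa>0$, and a positive integer $s$, let $\gamma=\kappa/s$. For every $i\le n$, the following three properties are equivalent to each other:

• There exists $h=h_i$ satisfying (a), (b);

• The optimal value in the optimization problem

$${\rm Opt}_i(\gamma)=\min_h\left\{\varphi(h):\ \|A^Th-e_i\|_\infty\le\gamma\right\}, \tag{$P_i^\gamma$}$$

where $e_i$ is the $i$-th standard basic orth in $\mathbf{R}^n$, is $\le\omega$;

• One has

$$\forall x\in\mathbf{R}^n:\ |x_i|\le\omega\varphi_*(Ax)+\gamma\|x\|_1, \tag{27}$$

where $\varphi_*(\cdot)$ is the norm on $\mathbf{R}^m$ conjugate to $\varphi(\cdot)$.

Whenever one (and then all) of these properties takes place, problem $(P_i^\gamma)$ is solvable, and its optimal solution $h_i$ satisfies (a), (b).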
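The constraint $\|A^Th-e_i\|_\infty\le\gamma$ also gives a cheap way to certify a given contrast matrix: since $x_i=h_i^TAx-(A^Th_i-e_i)^Tx$, one has $|x_i|\le|h_i^TAx|+\|A^Th_i-e_i\|_\infty\|x\|_1$, so $H$ satisfies $\mathbf{H}(\gamma)$ with $\gamma_i=\|A^Th_i-e_i\|_\infty$. The sketch below evaluates these $\gamma_i$ in plain Python; the function name and the toy matrices (three unit columns at 120 degree angles, with the contrast chosen as a multiple of $A$) are illustrative assumptions.

```python
import math


def certified_gamma(A, H):
    """gamma_i = ||A^T h_i - e_i||_inf for the columns h_i of H.

    A and H are m x n matrices given as lists of rows.
    """
    m, n = len(A), len(A[0])
    gammas = []
    for i in range(n):
        h = [H[r][i] for r in range(m)]
        worst = 0.0
        for j in range(n):
            entry = sum(A[r][j] * h[r] for r in range(m)) - (1.0 if i == j else 0.0)
            worst = max(worst, abs(entry))
        gammas.append(worst)
    return gammas


# Toy 2 x 3 sensing matrix: unit columns at angles 0, 120 and 240 degrees.
c = math.sqrt(3.0) / 2.0
A = [[1.0, -0.5, -0.5],
     [0.0, c, -c]]
H = [[2.0 / 3.0 * a for a in row] for row in A]   # contrast matrix H = (2/3) A

gammas = certified_gamma(A, H)
s = 1
kappa = s * max(gammas)   # by Lemma 1, H then satisfies H_{s,inf}(kappa)
```

Here every $\gamma_i$ equals $1/3$, so for $s=1$ the certificate gives $\kappa=1/3<1/2$ and the risk bounds of Section 2 apply. For $s=2$ the same computation gives $\kappa=2/3$, and this particular contrast certifies nothing.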

### 3.1 Optimal contrasts for regular and penalized recoveries

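As an immediate consequence of Lemma 3, we get the following description of the domain $G_s$ associated with the norm $\varphi(\cdot)=\nu_{\epsilon,\sigma,\mathcal{U}}(\cdot)$:

$$\begin{aligned}
(a)\quad & G_s=\left\{(\omega,\kappa):\ \kappa\ge s\gamma_*,\ \omega\ge\omega_*(\kappa/s)\right\},\qquad \omega_*(\gamma)=\max_{1\le i\le n}{\rm Opt}_i(\gamma),\\
(b)\quad & \gamma_*=\max_{1\le i\le n}\min_{h\in\mathbf{R}^m}\|A^Th-e_i\|_\infty=\max_{1\le i\le n}\max_x\left\{x_i:\ \|x\|_1\le1,\ Ax=0\right\},
\end{aligned} \tag{28}$$

where $\varphi$ in the problems $(P_i^\gamma)$ defining ${\rm Opt}_i(\gamma)$ is specified as $\nu_{\epsilon,\sigma,\mathcal{U}}$. Note that the second equality in (28.b) is given by Linear Programming duality. Indeed, by (28.b), $\gamma_*$ is the smallest $\gamma$ for which all problems $(P_i^\gamma)$, $i=1,\ldots,n$, are feasible, and thus, by Lemma 3, $(\omega,\kappa)\in G_s$ if and only if $\kappa/s\ge\gamma_*$ and $\omega\ge\omega_*(\kappa/s)$.

Note that the quantity $\gamma_*$ depends solely on $A$, while $\omega_*(\cdot)$ depends on $\epsilon,\sigma,\mathcal{U}$ as on parameters, but is independent of $s$.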
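The duality step behind the two expressions for the smallest feasible $\gamma$ can be sketched as follows. This is a standard minimax argument written out for convenience, not a quotation of the paper's proof:

```latex
\min_{h\in\mathbf{R}^m}\|A^Th-e_i\|_\infty
  = \min_{h}\ \max_{\|x\|_1\le 1} x^T\!\left(e_i-A^Th\right)
  = \max_{\|x\|_1\le 1}\ \min_{h}\ \left[x_i-(Ax)^Th\right]
  = \max_x\left\{x_i:\ \|x\|_1\le 1,\ Ax=0\right\}.
% First equality: the l_infinity norm is the support function of the unit l_1 ball.
% Second equality: exchange of min and max for a bilinear function over a polytope (LP duality).
% Third equality: the inner minimum over h equals -infinity unless Ax = 0.
```

In particular, the smallest feasible $\gamma$ is positive whenever $A$ has a nontrivial kernel, which is always the case when $m<n$.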

The outlined results suggest the following scheme of building the contrast matrix :

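• we compute $\gamma_*$ by solving $n$ Linear Programming problems in (28); if $\gamma_*\ge\frac{1}{2s}$, then the domain $G_s$ of feasible pairs $(\omega,\kappa)$ does not contain points with $\kappa<1/2$, so that our recovery routines are not applicable (or, at least, we cannot justify them theoretically);

• when $\gamma_*<\frac{1}{2s}$, the set $\{(\omega,\kappa)\in G_s:\ \kappa<1/2\}$ is nonempty, and its Pareto frontier (the set of pairs $(\omega,\kappa)\in G_s$ such that $(\omega',\kappa')\in G_s$ with $\omega'\le\omega$, $\kappa'\le\kappa$ is possible if and only if $\omega'=\omega$, $\kappa'=\kappa$) is the curve $(\omega_*(\gamma),s\gamma)$, $\gamma_*\le\gamma<\frac{1}{2s}$. We choose a “working point” on this curve, that is, a point $\bar\gamma\in[\gamma_*,\frac{1}{2s})$, and compute $\omega_*(\bar\gamma)$ by solving the convex optimization programs $(P_i^{\bar\gamma})$, $i=1,\ldots,n$, with $\varphi(\cdot)$ specified as $\nu_{\epsilon,\sigma,\mathcal{U}}(\cdot)$. $\omega_*(\bar\gamma)$ is nothing but the maximum, over $i$, of the optimal values of these problems, and the optimal solutions $h_i$ to the problems induce the matrix $H=H(\bar\gamma)=[h_1,\ldots,h_n]$ which satisfies $\mathbf{H}_{s,\infty}(s\bar\gamma)$ and has $\nu(H)=\omega_*(\bar\gamma)$. By the reasoning which led us to (??),

$$\nu(H(\bar\gamma))=\omega_*(\bar\gamma)=\min_{H'}\left\{\nu(H'):\ H'\ \text{satisfies}\ \mathbf{H}_{s,\infty}(s\bar\gamma)\right\},$$

that is, $H(\bar\gamma)$ is the best for our purposes among the contrast matrices satisfying $\mathbf{H}_{s,\infty}(s\bar\gamma)$. With this contrast matrix, the error bound (26) for regular/penalized recoveries (in the former, $\rho_i=\nu(h_i)$, in the latter, $\theta=2$) reads

$$1\le p\le\infty\ \Rightarrow\ {\rm Risk}_p(\hat{x}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\frac{2s^{\frac1p}}{1-2s\bar\gamma}\left(s^{-1}\upsilon+2\omega_*(\bar\gamma)\right). \tag{29}$$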

The outlined strategy does not explain how to choose $\bar\gamma$. This issue could be resolved, e.g., as follows. We choose an upper bound on the sensitivity of the risk (29) to $\upsilon$, i.e., to the $\|\cdot\|_1$-deviation of a signal to be recovered from the set of $s$-sparse signals. This sensitivity is proportional to $(1-2s\bar\gamma)^{-1}$, so that an upper bound on the sensitivity translates into an upper bound $\gamma_+<\frac{1}{2s}$ on $\bar\gamma$. We can now choose $\bar\gamma$ by minimizing the remaining term in the risk bound over $\gamma\in[\gamma_*,\gamma_+]$, which amounts to solving the optimization problem

$$\max\left\{\tau:\ \tau\omega_*(\gamma)\le1-2s\gamma,\ \gamma_*\le\gamma\le\gamma_+\right\}.$$

Observing that $\omega_*(\cdot)$ is, by its origin, a convex function, we can solve the resulting problem efficiently by bisection in $\tau$. A step of this bisection requires solving a univariate convex feasibility problem with an efficiently computable constraint and thus is easy, at least for moderate values of $n$.
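The bisection in $\tau$ can be sketched as follows. For a fixed $\tau$ the function $\gamma\mapsto\tau\omega_*(\gamma)+2s\gamma-1$ is convex, so the feasibility test reduces to checking whether its minimum over the admissible interval is nonpositive; here the minimum is located by ternary search. In practice each evaluation of $\omega_*$ requires solving $n$ convex programs; the closed-form surrogate $\omega_*(\gamma)=1/\gamma$ used below is purely hypothetical, chosen only because the answer can be checked by hand, and all function names are illustrative.

```python
def min_convex(f, lo, hi, iters=200):
    """Ternary search for a minimizer of a convex function f on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) <= f(b):
            hi = b
        else:
            lo = a
    g = 0.5 * (lo + hi)
    return g, f(g)


def choose_gamma(omega_star, s, gamma_lo, gamma_hi, tol=1e-10):
    """Bisection in tau for max{tau : tau * omega_star(g) <= 1 - 2 s g, gamma_lo <= g <= gamma_hi}.

    Assumes omega_star is convex, positive and nonincreasing on the interval,
    and gamma_hi < 1 / (2 s). Returns (tau, gamma).
    """
    tau_lo = 0.0                                                    # tau = 0 is always feasible
    tau_hi = (1.0 - 2.0 * s * gamma_lo) / omega_star(gamma_hi)      # no larger tau can be feasible
    best_gamma = gamma_lo
    while tau_hi - tau_lo > tol:
        tau = 0.5 * (tau_lo + tau_hi)
        g, val = min_convex(lambda t: tau * omega_star(t) + 2.0 * s * t - 1.0,
                            gamma_lo, gamma_hi)
        if val <= 0.0:          # some gamma satisfies the constraint: tau is feasible
            tau_lo, best_gamma = tau, g
        else:
            tau_hi = tau
    return tau_lo, best_gamma


# Hypothetical surrogate omega_star(g) = 1/g with s = 2: the constraint reads tau <= g (1 - 4 g).
tau, gamma_bar = choose_gamma(lambda g: 1.0 / g, s=2, gamma_lo=0.01, gamma_hi=0.2)
```

For this surrogate the maximum of $\gamma(1-4\gamma)$ is attained at $\gamma=1/8$ with value $1/16$, which the routine reproduces up to the bisection tolerance.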

## 4 Range of feasibility of condition $\mathbf{H}_{s,\infty}(\kappa)$

We address the crucial question of what can be said about the magnitude of the quantity , see (28) and the risk bound (