Accuracy guarantees for ℓ1-recovery
Abstract
We discuss two new methods of recovery of sparse signals from noisy observations based on ℓ1 minimization. While they are closely related to well-known techniques such as Lasso and Dantzig Selector, these estimators come with efficiently verifiable guarantees of performance. By optimizing these bounds with respect to the method parameters, we are able to construct estimators which possess better statistical properties than the commonly used ones.
We link our performance estimates to the well-known results of Compressive Sensing and justify our proposed approach with an oracle inequality which links the properties of the recovery algorithms and the best estimation performance when the signal support is known. We also show how the estimates can be computed using the Non-Euclidean Basis Pursuit algorithm.
Key words: sparse recovery, linear estimation, oracle inequalities, nonparametric estimation by convex optimization
AMS Subject Classification: 62G08, 90C25
1 Introduction
Recently, several methods of estimation and selection based on ℓ1 minimization have received much attention in the statistical literature. For instance, the Lasso estimator, an ℓ1-penalized least-squares method, is probably the most studied (a theoretical analysis of the Lasso estimator is provided in, e.g., [2, 3, 4, 19, 20, 21, 17, 18], see also the references cited therein). Another statistical estimator, closely related to the Lasso, is the Dantzig Selector [7, 2, 16, 17]. To be more precise, let us consider the following estimation problem. Assume that an observation
(1) 
is available, where is an unknown signal and is a known sensing matrix. We suppose that is a Gaussian disturbance with (i.e., , where are independent normal r.v.’s with zero mean and unit variance), and is a known deterministic noise level. Our focus is on the recovery of the unknown signal. The Dantzig Selector estimator of the signal is defined as follows [7]:
where is the algorithm’s parameter. Since is obtained as a solution of a linear program, it is very attractive due to its low computational cost. Accuracy bounds for this estimator are readily available. For instance, a well-known result about this estimator (cf. [7, Theorem 1.1]) is that if then
with probability if the signal is sparse, i.e. has at most nonvanishing components, and the sensing matrix with unit columns possesses the Restricted Isometry Property with parameters and .
A similar bound holds for the Lasso estimator with a properly chosen penalty parameter.
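Since the displayed formula for the Dantzig Selector was lost in extraction, the following sketch assumes its standard form, min ‖x‖₁ subject to ‖Aᵀ(Ax − y)‖∞ ≤ λ, and makes the reduction to a linear program (the source of its low computational cost) explicit. The function name and the use of scipy are our own, not the paper's.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(A, y, lam):
    """Dantzig Selector as an LP (assumed standard form):

        min ||x||_1   s.t.   ||A^T (A x - y)||_inf <= lam

    Variables are z = (x, u) with -u <= x <= u; the objective sums u."""
    n = A.shape[1]
    G = A.T @ A
    b = A.T @ y
    c = np.concatenate([np.zeros(n), np.ones(n)])
    I = np.eye(n)
    Z = np.zeros((n, n))
    A_ub = np.vstack([
        np.hstack([ I, -I]),   #  x - u <= 0
        np.hstack([-I, -I]),   # -x - u <= 0
        np.hstack([ G,  Z]),   #  A^T A x <= lam + A^T y
        np.hstack([-G,  Z]),   # -A^T A x <= lam - A^T y
    ])
    b_ub = np.concatenate([np.zeros(2 * n), lam + b, lam - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (2 * n))
    return res.x[:n]
```

With an orthonormal sensing matrix the constraint decouples coordinatewise and the solution is soft-thresholding of Aᵀy at level λ, a quick sanity check on the formulation.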
The available accuracy bounds for Lasso and Dantzig Selector rely upon the Restricted Isometry Property or less restrictive assumptions about the sensing matrix, such as the Restricted Eigenvalue [2] or Compatibility [3] conditions (a complete overview of those and several other assumptions, with a description of how they relate to each other, is provided in [19]). However, these assumptions cannot be verified efficiently.
The latter implies that there is currently no way to provide any guarantees (e.g., confidence sets) for the performance of the proposed procedures. A notable exception to this rule is the Mutual Incoherence assumption (see, e.g., [10, 11, 12] and [21] for the case of, respectively, deterministic and random observation noise), which can be used to compute the accuracy bounds for recovery algorithms: a matrix with columns of unit norm and mutual incoherence possesses with .
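To illustrate why Mutual Incoherence is the verifiable exception: it is computable from a single Gram matrix, as in this small sketch (the helper name is ours).

```python
import numpy as np

def mutual_incoherence(A):
    """Mutual incoherence mu(A) = max_{i != j} |a_i^T a_j| / (||a_i|| ||a_j||),
    i.e. the largest absolute off-diagonal entry of the normalized Gram matrix."""
    An = A / np.linalg.norm(A, axis=0)  # normalize columns to unit norm
    G = np.abs(An.T @ An)               # absolute Gram matrix
    np.fill_diagonal(G, 0.0)            # ignore the unit diagonal
    return G.max()
```

An orthonormal matrix has incoherence 0; adding a column equal to the normalized sum of two basis vectors raises it to 1/√2.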

We start with Section 2.1, where we formulate the sparse recovery problem and introduce our core assumption – a verifiable condition linking the sensing matrix and a contrast matrix . In Sections 2.2 and 2.3 we present two recovery routines with contrast matrices:

regular recovery:

penalized recovery:
( is our guess for the number of nonzero entries in the true signal, is the penalty parameter)
along with their performance guarantees under condition with , that is, explicit upper bounds on the confidence levels of the recovery errors . The novelty here is that our bounds are of the form
(2) (with hidden factors in independent of ), and are valid in the entire range of values of . Note that similar error bounds for Dantzig Selector and Lasso are only known for , whatever the assumptions on the “essentially nonsquare” matrix .


Our interest in condition stems from the fact that this condition, in contrast to the majority of the known sufficient conditions for the validity of -based sparse recovery (e.g., Restricted Isometry/Eigenvalue/Compatibility), is efficiently verifiable. Moreover, it turns out that one can efficiently optimize, over the contrast matrix , the error bounds of the regular/penalized recovery routines associated with this verifiable condition. The related issues are considered in Section 3. In Section 4 we provide some additional justification of the condition , in particular, by linking it with the Mutual Incoherence and Restricted Isometry properties. This, in particular, implies that the condition with, say, associated with randomly selected matrices is feasible, with probability approaching 1 as the problem sizes grow, for as large as . We also establish limits of performance of the condition; specifically, we show that unless is nearly square, with can be feasible only when , meaning that the tractability of the condition comes at a heavy price: when designing and validating minimization-based sparse recovery routines, this condition can be useful only in a severely restricted range of the sparsity parameter .

In Section 5 we show that the condition is the strongest (and seemingly the only verifiable one) in a natural family of conditions linking a sensing and a contrast matrix; here is the number of nonzeros in the sparse signal to be recovered. We demonstrate that when a contrast matrix satisfies with , the associated regular and penalized recoveries admit error bounds similar to (2), but now in a restricted range of values of . We demonstrate also that feasibility of with implies instructive (although slightly worse than those in (2)) error bounds for the Dantzig Selector and Lasso recovery routines.

In Section 6, we present numerical results comparing regular/penalized recovery with the Dantzig Selector and Lasso algorithms. The conclusion suggested by these preliminary numerical results is that when the former procedures are applicable (i.e., when the techniques of Section 3 allow us to build a “not too large” contrast matrix satisfying the condition with, say, ), our procedures significantly outperform the Dantzig Selector and work exactly as well as the Lasso algorithm with the “ideal” (unrealistic in actual applications) choice of the regularization parameter.
In the concluding Section 7 we present a “Non-Euclidean Matching Pursuit algorithm” (similar to the one presented in [15]) with the same performance characteristics as those of regular/penalized recoveries; this algorithm, however, does not require optimization and can be considered as a computationally cheap alternative to recoveries, especially in the case when one needs to process a series of recovery problems with a common sensing matrix.
All proofs are placed in the Appendix.
2 Accuracy bounds for Recovery Routines
2.1 Problem statement
Notation. For a vector and we denote by the vector obtained from by setting to all but the largest in magnitude entries of . Ties, if any, can be resolved arbitrarily; for the sake of definiteness, assume that among entries of equal magnitudes, those with smaller indices have priority (e.g., with one has ). stands for the usual norm of (so that ). We say that a vector is sparse if it has at most nonzero entries. Finally, for a set we denote by its complement ; given , we denote by the vector obtained from by zeroing the entries with indices outside of , so that . Given a norm on and a matrix , we set .
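The truncation to the s largest-in-magnitude entries, with the smaller-index-wins tie-breaking convention just stated, can be sketched as follows (the function name is ours; a stable sort on −|x| implements the convention):

```python
import numpy as np

def best_s_term(x, s):
    """Return the best s-term approximation of x: all but the s largest
    in magnitude entries are set to 0. A stable sort on -|x| keeps
    smaller indices first among ties, matching the text's convention."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-np.abs(x), kind="stable")  # indices by decreasing |x_i|
    out = np.zeros_like(x)
    out[order[:s]] = x[order[:s]]                  # keep the s largest entries
    return out
```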
The problem. We consider an observation
(3) 
where is an unknown signal and is the sensing matrix. We suppose that is a Gaussian disturbance, where (i.e., with independent normal random variables with zero mean and unit variance), being known, and is a nuisance parameter known to belong to a given uncertainty set which we will suppose to be convex, compact and symmetric w.r.t. the origin. Our goal is to recover from , provided that is “nearly sparse.” Specifically, we consider the sets
of signals which admit sparse approximation of accuracy . Given , , and a confidence level , , we quantify a recovery routine — a Borel function — by its worst-case, over , confidence interval, taken w.r.t. the norm of the error. Specifically, we define the risks of a recovery routine as
Equivalently: if and only if there exists a set of “good” realizations of with such that whenever , one has for all and all .
Norm . Given and let us denote
(4) 
Since is convex, closed and symmetric with respect to the origin, is a norm. Let be the norm on conjugate to :
Conditions and . Let . Given , consider the following condition on a matrix :
: for all and one has
(5)
Now let be a positive integer and . Given , we say that a matrix satisfies condition
(6) 
The conditions we have introduced are closely related to each other:
Lemma 1
If satisfies , then satisfies , and “nearly vice versa:” given satisfying , one can build efficiently a matrix satisfying with (i.e., ) and such that the columns of are convex combinations of the columns of and , so that for every norm on .
2.2 Regular Recovery
In this section we discuss the properties of the regular recovery given by:
(7) 
where is as in (3), , , are some vectors in and , . We refer to the matrix as the contrast matrix underlying the recovery procedure. The starting point of our developments is the following
Proposition 1
Given an sensing matrix , noise intensity , uncertainty set and a tolerance , let the matrix from (7) satisfy the condition for some , and let in (7) satisfy the relation
(8) 
where is given by (4). Then there exists a set , , of “good” realizations of such that (i) Whenever , for every , every and every subset such that
(9) 
the regular recovery given by (7) satisfies:
(10)  
where and (ii) In particular, when setting
(11) 
and assuming , for every , and it holds
(iii) Finally, assuming , for every , and one has
(12) 
The following result is an immediate corollary of Proposition 1:
Lemma 2
The next statement is similar to the cases of in Proposition 1 and Lemma 2; the difference is that now we assume that satisfies , which, by Lemma 1, is a weaker requirement on than to satisfy with .
Proposition 2
Given an sensing matrix , noise intensity , uncertainty set and a tolerance , let the matrix from (7) satisfy the condition for some , and let in (7) satisfy the relation (8). Then there exists a set , , of “good” realizations of such that whenever , for every and every one has
(15) 
In particular,
(16) 
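Since display (7) was lost in extraction, the following sketch assumes the regular recovery has the standard constrained form min_v ‖v‖₁ subject to |h_iᵀ(Av − y)| ≤ ρ_i, where the h_i are the columns of the contrast matrix H; under that assumption it is again a linear program. Names and the scipy reduction are ours.

```python
import numpy as np
from scipy.optimize import linprog

def regular_recovery(A, y, H, rho):
    """Sketch of the regular recovery (assumed form):

        min ||v||_1   s.t.   |h_i^T (A v - y)| <= rho_i  for all columns h_i of H

    Cast as an LP in z = (v, u) with -u <= v <= u."""
    n = A.shape[1]
    HA = H.T @ A                 # N x n constraint matrix
    hy = H.T @ y                 # N right-hand sides
    N = HA.shape[0]
    c = np.concatenate([np.zeros(n), np.ones(n)])
    I = np.eye(n)
    Z = np.zeros((N, n))
    A_ub = np.vstack([
        np.hstack([ I, -I]),     #  v - u <= 0
        np.hstack([-I, -I]),     # -v - u <= 0
        np.hstack([ HA, Z]),     #  H^T A v <= rho + H^T y
        np.hstack([-HA, Z]),     # -H^T A v <= rho - H^T y
    ])
    b_ub = np.concatenate([np.zeros(2 * n), rho + hy, rho - hy])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (2 * n))
    return res.x[:n]
```

Taking H = A with a constant threshold recovers the Dantzig Selector of the Introduction as a special case.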
2.3 Penalized Recovery
Now consider the penalized recovery as follows:
(17) 
where is as in (3), and an integer , a positive , and a matrix are parameters of the construction.
Proposition 3
Given an sensing matrix , an integer , a matrix and positive reals , , satisfying the condition , and a , assume that
(18) 
and
(19) 
Further, let , , and let
(20) 
Consider the penalized recovery associated with . There exists a set , , of “good” realizations of such that (i) Whenever , for every signal and every one has
(21) 
where, as in Lemma 2, . (ii) When and , one has for every , and :
(22) 
whence for every and :
(23) 
The next statement is in the same relation to Proposition 3 as Proposition 2 is to Proposition 1 and Lemma 2.
Proposition 4
Given an sensing matrix , noise intensity , uncertainty set and a tolerance , let the matrix from (17) satisfy the condition for some , and let . Then there exists a set , , of “good” realizations of such that whenever , for every and every one has
(24) 
In particular,
(25) 
Note that under the premise of Proposition 2, the smallest possible values of are the quantities , which results in ; with this choice of , the risk bound for the regular recovery, as given by the right hand side of (16), coincides within a factor of 2 with the risk bound for the penalized recovery with as given by (25); both bounds assume that satisfies with and imply that
(26) 
When , the latter bound admits a quite transparent interpretation: everything is as if we were observing the sum of an unknown dimensional signal and an observation error of the uniform norm .
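Display (17) was likewise lost; assuming the penalized recovery has the form min_v ‖v‖₁ + θ‖Hᵀ(Av − y)‖∞, with θ a penalty parameter (tied in the text to the sparsity guess s), it too reduces to a linear program. This is our own hedged reconstruction, not the paper's display.

```python
import numpy as np
from scipy.optimize import linprog

def penalized_recovery(A, y, H, theta):
    """Sketch of the penalized recovery (assumed form):

        min_v  ||v||_1 + theta * ||H^T (A v - y)||_inf

    Cast as an LP in z = (v, u, t): minimize sum(u) + theta * t."""
    n = A.shape[1]
    HA = H.T @ A
    hy = H.T @ y
    N = HA.shape[0]
    c = np.concatenate([np.zeros(n), np.ones(n), [theta]])
    I = np.eye(n)
    zn = np.zeros((n, 1))
    one = np.ones((N, 1))
    A_ub = np.vstack([
        np.hstack([ I, -I, zn]),                    #  v - u <= 0
        np.hstack([-I, -I, zn]),                    # -v - u <= 0
        np.hstack([ HA, np.zeros((N, n)), -one]),   #  H^T A v - t <= H^T y
        np.hstack([-HA, np.zeros((N, n)), -one]),   # -H^T A v - t <= -H^T y
    ])
    b_ub = np.concatenate([np.zeros(2 * n), hy, -hy])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (2 * n) + [(0, None)])
    return res.x[:n]
```

For an orthonormal A with H = A, a large θ forces exact fitting (v = y) while a small θ shrinks the estimate to zero, the expected trade-off between the two terms.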
3 Efficient construction of the contrast matrix
In what follows, we fix , the “environment parameters” , and the “level of sparsity” of signals we intend to recover, and are interested in building a contrast matrix resulting in an error bound (26) as small as possible. All we need to this end is to answer the following question (where we should specify the norm as ):
(?) Let be a norm on , and let be a positive integer. What is the domain of pairs such that and there exists a matrix satisfying the condition and the relation ? How can we find such an , provided it exists?
Invoking Lemma 1, we can reformulate this question as follows:
(??) Let and be as in (?). Given , how to find vectors , , satisfying
P_{i} for every , or to detect correctly that no such collection of vectors exists?
Indeed, by Lemma 1, if satisfies and , then there exists such that satisfy for all and , so that satisfy for all as well. Vice versa, if satisfy , , then the matrix clearly satisfies , and . The answer to (??) is given by the following
Lemma 3
Given , , and a positive integer , let . For every , the following three properties are equivalent to each other:

There exists satisfying ;

The optimal value in the optimization problem
P_{i}^{\gamma} where is the th standard basic orth in , is ;

One has
(27) where is the norm on conjugate to .
Whenever one (and then all) of these properties takes place, problem is solvable, and its optimal solution satisfies .
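The verification machinery of Lemma 3 can be sketched as follows. The problems P_i in the text carry an additional norm constraint on h whose exact form was lost in extraction; the sketch below keeps only their unconstrained core, max_i min_h ‖e_i − Aᵀh‖∞, each inner problem being a small LP.

```python
import numpy as np
from scipy.optimize import linprog

def gamma_star(A):
    """For each coordinate i, solve the LP  min_h ||e_i - A^T h||_inf
    and return the worst optimal value over i (unconstrained core of
    the verification problems P_i; the text's extra norm constraint
    on h is omitted here)."""
    m, n = A.shape
    vals = []
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        # variables z = (h in R^m, t); minimize t
        c = np.concatenate([np.zeros(m), [1.0]])
        one = np.ones((n, 1))
        A_ub = np.vstack([
            np.hstack([-A.T, -one]),   # e_i - A^T h <= t
            np.hstack([ A.T, -one]),   # A^T h - e_i <= t
        ])
        b_ub = np.concatenate([-e, e])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * m + [(0, None)])
        vals.append(res.fun)
    return max(vals)
```

For a square invertible A the optimum is 0 (take h as a column of A⁻ᵀ); for the 1×2 matrix [1 1], the best attainable residual is 1/2.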
3.1 Optimal contrasts for regular and penalized recoveries
As an immediate consequence of Lemma 3, we get the following description of the domain associated with the norm :
(28) 
where in is specified as . Note that the second equality in is given by Linear Programming duality. Indeed, by , is the smallest for which all problems , , are feasible, and thus, by Lemma 3, if and only if and . Note that the quantity depends solely on , while depends on , as parameters, but is independent of . The outlined results suggest the following scheme for building the contrast matrix :

we compute by solving the Linear Programming problems in (28); if , then does not contain points with , so that our recovery routines are not applicable (or, at least, we cannot justify them theoretically);

when , the set is nonempty, and its Pareto frontier (the set of pairs such that is possible if and only if ) is the curve , . We choose a “working point” on this curve, that is, a point , and compute by solving the convex optimization programs , , with specified as . is nothing but the maximum, over , of the optimal values of these problems, and the optimal solutions to the problems induce the matrix which satisfies and has . By the reasoning which led us to (??),
that is, is the best, for our purposes, contrast matrix satisfying . With this contrast matrix, the error bound (26) for regular/penalized recoveries (in the former, , in the latter, ) reads
(29)
The outlined strategy does not explain how to choose . This issue can be resolved, e.g., as follows. We choose an upper bound on the sensitivity of the risk (29) to , i.e., to the deviation of a signal to be recovered from the set of sparse signals. This sensitivity is proportional to , so that an upper bound on the sensitivity translates into an upper bound on . We can now choose by minimizing the remaining term in the risk bound over , which amounts to solving the optimization problem
Observing that is, by its origin, a convex function, we can solve the resulting problem efficiently by bisection in . A step of this bisection requires solving a univariate convex feasibility problem with an efficiently computable constraint and is thus easy, at least for moderate values of .
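A minimal stand-in for the bisection just described (here we bisect on the sign of a finite-difference slope of a convex univariate objective rather than on feasibility of level sets; the function is hypothetical and only illustrates the one-dimensional search):

```python
def minimize_convex_1d(f, lo, hi, tol=1e-9):
    """Minimize a convex function f on [lo, hi] by bisection on the sign
    of a finite-difference slope at the midpoint: a positive slope means
    the minimizer lies to the left, a negative one to the right."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        h = tol
        if f(mid + h) - f(mid - h) > 0:   # slope positive: go left
            hi = mid
        else:                              # slope nonpositive: go right
            lo = mid
    return 0.5 * (lo + hi)
```

Each step halves the interval, so the cost is logarithmic in the target accuracy, mirroring the efficiency claim in the text.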