Relaxed Recovery Conditions for OMP/OLS by Exploiting both Coherence and Decay
We propose extended coherence-based conditions for exact sparse support recovery using orthogonal matching pursuit (OMP) and orthogonal least squares (OLS). Unlike standard uniform guarantees, we embed some information about the decay of the sparse vector coefficients in our conditions. As a result, the standard condition $\mu < 1/(2k-1)$ (where $\mu$ denotes the mutual coherence and $k$ the sparsity level) can be weakened as soon as the nonzero coefficients obey some decay, both in the noiseless and the bounded-noise scenarios. Furthermore, the resulting condition approaches $\mu < 1/k$ for strongly decaying sparse signals. Finally, in the noiseless setting, we prove that the proposed conditions, in particular the bound $\mu < 1/k$, are the tightest achievable guarantees based on mutual coherence.
Orthogonal matching pursuit; orthogonal least squares; mutual coherence; exact recovery; sparse decaying representations.
In this paper, we focus on two popular instances of greedy algorithms for sparse signal approximation from linear measurements, namely orthogonal matching pursuit (OMP) and orthogonal least squares (OLS). In the sequel, Oxx refers to either algorithm in statements that apply to both OMP and OLS.
In recent years, many researchers have studied conditions under which Oxx succeeds in recovering the true sparse vector. A popular approach to address this question relies on the derivation of uniform guarantees; the latter ensure the success of Oxx for a given sparsity level (or a given support) irrespective of the magnitude of the nonzero coefficients. This type of analysis was carried out for OMP in [8, 9] and also adapted to several extensions of OMP in [10, 11, 4]. Although OLS has been known in the literature for a few decades (under different names), uniform exact recovery analyses of OLS have only appeared very recently, see [4, 13, 14].
On the one hand, uniform conditions are usually quite pessimistic since they cannot be satisfied as soon as Oxx fails for one particular sparse vector. As a matter of fact, it is now acknowledged that uniform conditions typically fail to properly characterize the average behavior of the considered algorithm [15, 16]. In particular, it has been emphasized that the empirical behavior of OMP depends strongly on the distribution defining the amplitudes of the nonzero coefficients. On the other hand, probabilistic analyses
In this paper, we adopt a deterministic analysis technique for sparse vectors whose nonzero coefficients obey some decay. Our analysis is therefore no longer uniform since it restricts the success of Oxx to a certain class of sparse signals. To some extent, it may also provide insights into the success of Oxx for random input vectors as long as one can characterize the decay of “typical” realizations of the latter. From another point of view, let us mention that a number of authors empirically observed (and then conjectured) that the worst-case scenario for Oxx corresponds to the situation where all the nonzero coefficients have the same amplitude, see e.g., [21, 22, 23, 17]. The analysis of Oxx with decaying vectors is thus also expected to bring some answer to this question since vectors with equal nonzero coefficients correspond to the particular case of “no decay”.
Although sparse vectors with decaying nonzero coefficients can be observed in many applications (see  and  for examples in the field of image and audio processing, respectively), we are only aware of a few works analyzing the success of Oxx in such a setup [26, 9, 27, 28]. In  the authors adopted an information-theoretical point of view: they derived “rates” (i.e., dictionary dimensions and sparsity levels) under which a “successive interference canceller” (which can be understood as an idealized version of Oxx) can asymptotically succeed. In particular, they showed that the achievable rates depend on the amplitudes of the nonzero coefficients (which, in their theoretical framework, must be known to the receiver), and thus on the decay. However, their results only apply to randomly-generated dictionaries. In , the authors provided an analysis of OMP in terms of restricted isometry constants (RIC) and showed that the success of OMP can be ensured by considering sufficiently decaying vectors. In , Ding et al. extended this type of result to the case of observation models corrupted by noise. Finally, Ehler et al. carried out the same kind of RIC-based analysis for some non-linear generalization of OLS in .
In the sequel, we propose novel conditions of success in $k$ steps for both OMP and OLS in terms of the mutual coherence of the dictionary ($k$ denotes the number of nonzero coefficients in the sparse vector). We note that, as long as the success of OMP and OLS in $k$ steps is concerned
We will consider a unified definition of Oxx based on the orthogonal projection of the dictionary atoms onto the orthogonal complement of the subspace spanned by the selected atoms, see e.g., . This definition allows us to define both algorithms in a unifying framework and to carry out a parallel analysis of both OMP and OLS. Our derivations are then based on a fine analysis of the correlation between the projected atoms involved in the iterations of Oxx. Unlike previous works, we highlight that the decay conditions can be relaxed as the iterations of Oxx progress. Moreover, our guarantees are tight: these are the best achievable coherence-based guarantees exploiting the decay between successive ordered coefficients in the noiseless setup.
2 Context and Main Results
Let obey the following model:
where is a known dictionary, is an unknown vector and is some additive noise with . The columns of the dictionary are supposed to be normalized: . We investigate conditions ensuring that Oxx selects a subset of dictionary atoms, where matches the support of the largest elements of . We have summarized the main recursions of Oxx in Algorithm 1. denotes the vector inner product and is the projection of onto the space orthogonal to the columns of indexed by . We refer the reader to section 3.1 for a more detailed description of Oxx.
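To make the normalization assumption and the role of the mutual coherence concrete, the following minimal sketch computes the mutual coherence of a small dictionary, taken here as the largest absolute inner product between two distinct unit-norm atoms, and tests the standard uniform condition $\mu < 1/(2k-1)$. The helper names (`normalize`, `mutual_coherence`, `satisfies_uniform_bound`) and the toy dictionary are illustrative assumptions and do not come from the paper.

```python
import math
from itertools import combinations

def dot(u, v):
    """Inner product of two vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def normalize(atom):
    """Scale an atom (dictionary column) to unit Euclidean norm."""
    norm = math.sqrt(dot(atom, atom))
    return [a / norm for a in atom]

def mutual_coherence(atoms):
    """Largest absolute inner product between two distinct unit-norm atoms."""
    unit = [normalize(a) for a in atoms]
    return max(abs(dot(a, b)) for a, b in combinations(unit, 2))

def satisfies_uniform_bound(mu, k):
    """Standard uniform coherence condition mu < 1/(2k - 1)."""
    return mu < 1.0 / (2 * k - 1)

# Toy dictionary (hypothetical): three atoms in R^2, stored column by column.
atoms = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
mu = mutual_coherence(atoms)
print(mu, satisfies_uniform_bound(mu, k=1), satisfies_uniform_bound(mu, k=2))
```

For this highly coherent toy dictionary the uniform condition only holds for $k = 1$, which illustrates how restrictive the bound becomes as the sparsity level grows.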
Our derivations are based on the so-called “$k$-step” analysis of Oxx: Oxx is assumed to fail as soon as one wrong atom is included in the estimated support [8, 9, 13]. Conversely, Oxx succeeds if and only if the atoms of the target support are selected during the first $k$ iterations. Alternative definitions of exact sparse recovery may be considered. In [4, 31], the authors focused on “delayed recovery”, where Oxx is assumed to succeed if the selected atoms contain the correct support, with possible false atom selections. This approach will not be pursued hereafter.
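The two notions of success can be contrasted on a list of selected indices. The sketch below is only an illustration: the function names and the example supports are hypothetical, and the selections are given in the order in which a greedy algorithm would have produced them.

```python
def k_step_success(selected, support):
    """True iff the first k selections are exactly the k atoms of the support."""
    k = len(support)
    return len(selected) >= k and set(selected[:k]) == set(support)

def delayed_recovery(selected, support):
    """True iff the selected atoms contain the support (wrong picks allowed)."""
    return set(support) <= set(selected)

support = [2, 5, 7]      # hypothetical true support, k = 3
run_a = [5, 2, 7]        # three correct picks, in any order
run_b = [5, 4, 2, 7]     # one wrong pick at the second iteration

print(k_step_success(run_a, support), k_step_success(run_b, support))
print(delayed_recovery(run_b, support))
```

The second run fails in the $k$-step sense because of its single wrong selection, although it would count as a delayed recovery.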
Several scenarios are considered. In sections 2.1 and 2.2, we address the case where the observation model is noiseless and the unknown vector is $k$-sparse. In section 2.1, we focus on conditions ensuring the recovery of the support from the first iteration, i.e., with the initial empty support, whereas a finer analysis at intermediate iterations is carried out in section 2.2. This analysis allows us to provide weaker guarantees of good atom selection when (i) fewer than $k$ iterations are performed, and when (ii) Oxx is known to have selected good atoms in the early iterations. In section 2.3, we address the noisy scenario, and the case where the unknown vector is compressible but possibly non-sparse. In this case, the support to be recovered can be thought of as the “head” of the signal, obtained by gathering the indices of its $k$ largest amplitudes.
Some of the results presented below share connections. For example, the direct part of Theorem 2 (section 2.1), dealing with -step recovery and noiseless observations, can be seen as a particular case of the results presented in Theorems 3 and 5 (sections 2.2 and 2.3). However, we chose to follow this editorial line to keep the discussion of the results and the relation to the current state of the art as simple as possible.
2.1 $k$-step Analysis in the Noiseless Setup
The first thoughtful “-step” analysis of OMP is due to Tropp in [8, Th. 3.1 and Th. 3.10]. He provided a sufficient and worst-case necessary condition for the exact recovery of any sparse vector with a given support . Moreover, he showed that the condition
Condition (2) is uniform, that is Oxx can recover any
-sparse vector irrespective of the amplitude of the nonzero
coefficients when (2) is satisfied. On the
other hand, it was shown in [32, Th. 3.1] that
(2) is tight: there exist a -sparse
vector and a dictionary
with such that Oxx selects a wrong atom at
the first iteration
If is a -sparse vector whose nonzero amplitudes are not all equal, there exists some such that Oxx recovers in steps for any dictionary with .
Interestingly, as mentioned in the introduction, it has been stated in many pieces of research (and accepted as “folk knowledge”) that sparse vectors with nonzero coefficients of equal magnitude correspond to the most difficult case for many reconstruction algorithms, see e.g., [21, 22, 23]. The result in Theorem 1 supports this observation by stating that, as far as mutual coherence conditions for exact recovery are concerned, “flat” vectors correspond to the worst possible case for Oxx. In particular, a condition of success more favorable than the standard condition (2) always exists as soon as the coefficients of the sparse vector exhibit some decay.
Unfortunately, the proof of Theorem 1 does not provide an optimal value for (as a function of the rate of decay). In fact, a precise characterization of for general decay patterns may be a quite difficult task. In the next theorem, we provide “horizon-1” decay conditions (i.e., conditions between consecutive elements of the ordered nonzero coefficients) ensuring that Oxx succeeds in steps. In our statement, we assume without loss of generality that
then Oxx recovers in steps.
There exists an instance of dictionary with such that for all -sparse vectors supported by , Oxx selects a wrong atom during the first iterations.
For all , there exists a vector and a dictionary of mutual coherence , for which the inequalities (7) hold for , and become an equality for , and such that Oxx with as input selects a wrong atom at the -th iteration.
as soon as
Thus, by virtue of our convention (5), (8) implies that condition (7) trivially holds for any as soon as (2) is satisfied. We also note that is a decreasing function of for . Hence, the rate of decay in (7) becomes lower as increases.
Condition (7) can be equivalently expressed as
. The conditions of success stated in Theorem 2 can thus also be rephrased as:
It can be seen that possible values for range in the interval and depend on the decay of the nonzero coefficients of . On the one hand, the smallest value for occurs when , in which case . Hence, we recover the standard condition (2) when . On the other hand, (and therefore ) as soon as . This leads to the following corollary:
If and , then Oxx recovers in steps.
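The gap between the two extreme coherence conditions can be tabulated numerically. The sketch below assumes that the uniform threshold is $1/(2k-1)$ and that the limiting threshold for strongly decaying vectors is $1/k$; the function names and regime labels are illustrative only, and the intermediate regime does not evaluate the decay condition (7) itself.

```python
def uniform_threshold(k):
    """Coherence threshold of the standard uniform condition, 1/(2k - 1)."""
    return 1.0 / (2 * k - 1)

def decay_limit_threshold(k):
    """Limiting coherence threshold for strongly decaying vectors, 1/k."""
    return 1.0 / k

def coherence_regime(mu, k):
    """Classify a coherence value with respect to the two thresholds."""
    if mu < uniform_threshold(k):
        return "uniform guarantee"
    if mu < decay_limit_threshold(k):
        return "guarantee depends on decay"
    return "no coherence-based guarantee"

for k in (2, 5, 10):
    print(k, round(uniform_threshold(k), 4), round(decay_limit_threshold(k), 4))
print(coherence_regime(0.25, 3))
```

For large $k$ the admissible coherence roughly doubles between the two thresholds, which is the range in which the decay of the nonzero coefficients matters.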
A graphical representation of these considerations is provided in Fig. 1 for : the decay factor appearing in (7) is plotted as a function of for different values of . For a given , the region above the related curve characterizes the set of vectors satisfying the recovery conditions of Theorem 2. We notice that the size of the region of success increases as the mutual coherence decreases. In particular, when , the curve lies below the dashed line and (7) is satisfied for any -sparse representation since, by convention, the nonzero entries have been sorted according to (5). On the other hand, the region of success is restricted to vectors satisfying when is close to . We note moreover that the decay constraints become less stringent as increases.
It is also insightful to see how often nonzero coefficients drawn
from different distributions can satisfy (7). In
Fig. 2, we have represented the
empirical probability that coefficients drawn from Bernoulli,
Uniform, Normal, Laplacian and LogLogistic distributions verify the
decay conditions of Theorem 2. We consider again
the case where and the results are averaged over 2000
realizations. In accordance with Theorems 1
and 2, the Bernoulli distribution (which always
generates “flat” vectors) leads to the worst results. In particular,
conditions (7) cannot be verified as soon as
. In contrast, the vectors drawn
from the other distributions satisfy (7) with some
nonzero probability for any (and are therefore ensured to yield a success of Oxx). Interestingly, our conclusions regarding the comparison of distributions are the same as those observed in the empirical study of the average performance of OMP in .
It is worth noting that not all standard sparse-representation
algorithms enjoy a relaxation of their recovery conditions when dealing with decaying
vectors. For example, the standard condition
cannot be improved for Basis Pursuit . Indeed, it
has been shown in [32, Th. 3.1] that there exists a
dictionary with and a flat
-sparse vector such that Basis Pursuit leads to a wrong
The converse part of Theorem 2 emphasizes that the proposed recovery conditions (6)-(7) are worst-case necessary in some sense. The nature of the sharpness of (6) and (7) is however slightly different. The tightness of (7) is restricted to the set of “horizon-1” conditions, that is, conditions exploiting the decay between pairs of consecutive elements in the ordered sparse vector. The tightness of (6) is of a more fundamental nature since Theorem 2 states that there exists a dictionary such that Oxx will fail during the first iterations irrespective of the values of the nonzero coefficients in . Hence, any mutual coherence condition ensuring -step recovery and valid for general deterministic dictionaries (and in particular for the specific dictionary considered in the proof of Theorem 2, see section 3.3) must be of the form with . In other words, the bound cannot be improved whatever the hypotheses made on the sparse vector.
2.2 Partial Recovery and Successful Termination
In many applications, it is desirable to have some guarantees on the partial success of Oxx. Two main situations may be of interest:
Successful Termination: Oxx is assumed to have selected atoms in , with cardinality , during the first iterations, and one is interested in conditions guaranteeing the selection of atoms in during the next iterations.
Partial Support Recovery: the focus is on conditions ensuring the selection of elements of during the first iterations.
Before we state our results, let us make a few remarks. First, the question of partial support recovery has a trivial answer in the standard “uniform” setup. Indeed, as mentioned previously, the authors of  provided an instance of problem in which and Oxx selects a wrong atom at the first iteration. This shows that weaker coherence guarantees cannot be obtained for non-decaying vectors, even by restricting the success of Oxx to partial support recovery. On the contrary, we will emphasize that the paradigm of partial support recovery can be nicely addressed when accounting for the decay of the sparse vector.
Secondly, the question of the successful termination of Oxx has already been addressed in the uniform setting. In , the authors extended Tropp’s exact recovery condition (ERC) to this particular setup, both for OMP and OLS. The same type of conditions were expressed in terms of mutual coherence in [14, Th. 3]: if is reached during the first iterations, then Oxx selects atoms in during the next iterations provided that
Similar to the standard -step analysis , (9) was shown to be tight: there exist a -sparse vector with support , a subset with and a dictionary with such that Oxx selects atoms in during the first steps and then makes a wrong decision. We show hereafter that this coherence bound can be relaxed when dealing with decaying sparse vectors.
In the statement of Theorem 3, following convention (5), we proceed to a re-ordering of the atoms by decreasing values of their magnitudes . Here, this convention is applied to the unselected atoms, which are therefore indexed by:
Theorem 3 jointly addresses both questions of successful termination and partial support recovery.
Assume that Oxx has selected with during the first iterations and let .
and the largest magnitudes of the unselected atoms after iteration satisfy
then Oxx is guaranteed to select atoms in until all the elements in have been selected.
and the largest magnitudes of the unselected atoms after iteration satisfy
then Oxx is ensured to select atoms in until all the elements in have been selected or iterations have been carried out.
Let us discuss the implications of Theorem 3 on both problems of “successful termination” and “partial support recovery”. We specifically elaborate on the corresponding choices of , and .
The paradigm of successful termination corresponds to the case . In this setup, we note that conditions (14)-(15) are irrelevant since , thus (14) cannot be satisfied. On the other hand, the conditions (12)-(13) can be rewritten in terms of constraints on the mutual coherence involving the decay of the nonzero elements:
Depending on the decay of the nonzero coefficients, it can thus be seen that . The strongest condition corresponds to (9) and is obtained for ; it ensures the uniform recovery of any -sparse vector when good atoms have been selected during the first iterations. The weakest condition, , is obtained as soon as . We thus recover a result similar to Corollary 1. Here, the decay constraints only apply to the elements in since the elements in have already been selected by assumption.
Let us now discuss the particularization of Theorem 3 to the problem of partial support recovery (here, is set to 0). We focus on the case where . We note that both (12)-(13) and (14)-(15) ensure the selection of elements of during the first iterations of Oxx provided that the largest nonzero coefficients obey some “sufficient” decay (which is specified by either (13) or (15)). This leads to the following corollary for partial support recovery:
We note that Corollary 2 (which makes use of the mild assumption ) does not guarantee that the selected atoms correspond to the largest coefficients of . Such a guarantee can be obtained from (12)-(13) by imposing the stronger assumption :
Let . If and the largest magnitudes in exhibit a sufficient decay (13), then Oxx selects atoms in until the largest components of have been selected.
Note that Corollary 3 does not state that Oxx will select the largest components of during the first iterations. Their selection is however guaranteed during the first iterations.
2.3 Compressible and Noisy Signals
In many situations, the sought vector is not exactly -sparse but rather compressible and possibly non-sparse. Furthermore, the observations are corrupted by some additive noise (). We review hereafter some contributions of the literature dealing with the success of Oxx in this particular setup and provide new, stronger results in Theorems 4 and 5. We will assume that the noise has a bounded -norm, that is . In the following, a signal will be referred to as “-compressible” as soon as the sum of the absolute values of entries of is large with respect to the remaining entries. Denoting by the complementary subset, the -compressible assumption reads for some subset of cardinality . and shall be thought of as the head and tail of the signal , respectively.
Let us first consider the -sparse setup () with noisy observations (). Although many researchers have emphasized that the noiseless conditions can be generalized to the case where the noise level is low in comparison to the smallest nonzero coefficient of [36, 37, 38, 39, 40], no tight condition of success for Oxx has been proposed so far. Among the noticeable coherence-based guarantees, we can nevertheless mention the work by Donoho et al. [36, Th. 5.1] (also rediscovered in [39, Th. 1]), stating that OMP succeeds if
To the best of our knowledge, the extension of these results to the success of OLS has never been made in the literature. Nor are we aware of any contribution dealing with coherence-based conditions ensuring the recovery of a particular support for -compressible vectors and noisy observations. We address these questions in the next theorems. The result stated in Theorem 4 implies, as a corollary, that (16)-(17) are sufficient conditions for both OMP and OLS. Theorem 5 is an extension of Theorem 2 to the noisy -compressible setting. As in subsection 2.1, we assume that the elements of satisfy (4)-(5).
then Oxx selects atoms in during the first iterations.
The conditions of Theorem 4 take the same form as those in (16)-(17) but depend on the -norm of . Moreover, condition (19) depends on the position of the ordered coefficients: the larger , the weaker the constraint on their amplitude. As a result, when , Theorem 4 leads to weaker conditions than those previously proposed in ,  as soon as is not a flat vector. For flat vectors, (19) obviously reduces to (17). In such a case, Theorem 4 leads to the standard conditions by Donoho et al.
Let us mention that the conditions in Theorem 4 do not enforce any constraint on the decay of the coefficients in (but only between the elements in and each component of ). In the next theorem, we state “horizon-1” conditions of the same flavor as those presented in Theorems 2 and 3. Let us first define the following quantity:
Our result then writes as follows:
then Oxx selects atoms in from noisy data during the first iterations.
Theorem 3 could be extended in a similar way to the framework of compressible and noisy signals but we do not detail this extension for conciseness.
3 Technical Details
In this section, we provide the proofs of the theorems stated in section 2. We first recall the main principles governing OMP and OLS in section 3.1. We then introduce some technical lemmas in section 3.2. Finally, the proofs of the main results are presented in section 3.3.
3.1 OMP and OLS
In order to precisely describe the update rules characterizing Oxx, let us first introduce some notations: given a set of indices , represents the submatrix of specified by the columns indexed in ; the projector onto the orthogonal complement of the column span of is defined as , where is the pseudo-inverse of ; in particular, is the residual error when projecting onto the span of . Finally, represents the vector inner product and is the null vector of size .
Oxx can be understood as an iterative procedure generating an estimate of by sequentially adding one new element to the current support estimate, say . As detailed in Algorithm 1, OMP and OLS differ in the way this new element is selected. At each iteration, OLS selects the atom yielding the minimum residual error :
and least-square problems have to be solved to compute for all . On the contrary, OMP adopts the simpler rule
to select the new atom , and then solves only one least-square problem to update the new residual .
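The two selection rules described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: atoms are stored as lists of floats with unit norm, the least-squares residual is obtained through a Gram-Schmidt basis, and all function names and the toy data are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(v, basis):
    """Remove from v its components along an orthonormal basis."""
    r = list(v)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r

def orthonormal_basis(columns, tol=1e-12):
    """Gram-Schmidt basis of the span of the given columns."""
    basis = []
    for col in columns:
        r = project_out(col, basis)
        norm = math.sqrt(dot(r, r))
        if norm > tol:
            basis.append([ri / norm for ri in r])
    return basis

def residual(y, atoms, support):
    """Residual of the least-squares fit of y on the atoms indexed by support."""
    return project_out(y, orthonormal_basis([atoms[i] for i in support]))

def greedy_support(atoms, y, k, rule="omp"):
    """Run k iterations of OMP or OLS; return the selected indices in order."""
    support = []
    for _ in range(k):
        r = residual(y, atoms, support)
        best, best_score = None, -math.inf
        for i, atom in enumerate(atoms):
            if i in support:
                continue
            if rule == "omp":
                # OMP: largest correlation with the current residual.
                score = abs(dot(r, atom))
            else:
                # OLS: smallest residual norm after adding atom i
                # (one least-squares problem per candidate atom).
                r_i = residual(y, atoms, support + [i])
                score = -math.sqrt(dot(r_i, r_i))
            if score > best_score:
                best, best_score = i, score
        support.append(best)
    return support

# Toy example (hypothetical): y is a combination of the first two atoms.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.6, 0.8]]
y = [3.0, 1.0, 0.0]
print(greedy_support(atoms, y, 2, "omp"), greedy_support(atoms, y, 2, "ols"))
```

The OLS branch makes the extra cost visible: it recomputes a residual for every candidate atom, whereas OMP only needs one inner product per candidate.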
The selection rules described above can also be expressed in terms of the (normalized) projected atoms of the dictionary . This formulation will turn out to be convenient in our proofs below. More specifically, let
With these notations, the selection rule of Oxx can be re-expressed as (see e.g., )
For simplicity, the dependence of , and on does not appear in our notations. The reader should however keep this dependence in mind in our subsequent derivations.
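The unified formulation can be illustrated as follows, assuming the usual convention that OMP correlates the residual with the projected atoms while OLS correlates it with the normalized projected atoms (set to the zero vector when the projection vanishes). The function `select_next`, the helper names and the toy data are assumptions made for this sketch; the symbols used in the paper for the projected atoms are not reproduced here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(v, basis):
    """Remove from v its components along an orthonormal basis."""
    r = list(v)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r

def orthonormal_basis(columns, tol=1e-12):
    """Gram-Schmidt basis of the span of the given columns."""
    basis = []
    for col in columns:
        r = project_out(col, basis)
        norm = math.sqrt(dot(r, r))
        if norm > tol:
            basis.append([ri / norm for ri in r])
    return basis

def select_next(atoms, y, support, rule="omp", tol=1e-12):
    """One Oxx iteration expressed with projected atoms."""
    basis = orthonormal_basis([atoms[i] for i in support])
    r = project_out(y, basis)                 # current residual
    best, best_score = None, -1.0
    for i, atom in enumerate(atoms):
        if i in support:
            continue
        proj = project_out(atom, basis)       # projected atom
        if rule == "ols":
            # OLS: normalize the projected atom (zero vector if it vanishes).
            norm = math.sqrt(dot(proj, proj))
            proj = [p / norm for p in proj] if norm > tol else [0.0] * len(proj)
        score = abs(dot(r, proj))
        if score > best_score:
            best, best_score = i, score
    return best

# Toy example (hypothetical): atom 1 is strongly correlated with atom 0.
atoms = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
y = [2.0, 0.5, 0.45]
print(select_next(atoms, y, [0], "omp"), select_next(atoms, y, [0], "ols"))
```

In this example the two rules disagree once atom 0 has been selected: the projection of atom 1 has a small norm, which penalizes it under the OMP rule but not under the OLS rule. With an empty support both rules coincide, since the atoms are unit-norm.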
3.2 Some Useful Lemmas
We first state three useful lemmas, connecting different functions of the projected atoms to the mutual coherence of the dictionary.
Let . If , then
The result is a direct consequence of Lemmas 4 and 10 in .
Let . If , we have
If , then (26) simplifies to:
Proof: implies that and then
where follows from ,
First, using the definitions of the residual and the projected atoms , we have
Noticing that and , a sufficient condition for (30) is then as follows:
where the last inequality follows from the fact that . Combining these two bounds, we easily obtain that
3.3 Proofs of the Main Results
In this section, we provide the proofs of the main theorems of the paper. We skip the proofs of the corollaries, which are straightforward. Theorems 1, 3, 4, 5 and the direct part of Theorem 2 are proved in section 3.3.1. The converse part of Theorem 2 (that is, the tightness of the proposed conditions) is proved in section
3.3.1 Proofs of the Sufficient Conditions
All the proofs of this part use Lemma 4 as a key building block.
Proof of Theorem 1: We want to show that Oxx selects atoms in during the first iterations for all dictionaries obeying , for some , as long as is not a flat vector.
Let us first derive a condition on the mutual coherence ensuring that Oxx makes a correct decision at the first iteration. Particularizing the sufficient conditions of Lemma 4 to the case (with and ), we have that Oxx selects an element of provided that:
with . Now, since is not flat, we have that , and therefore .
On the other hand, if Oxx has selected any atoms in during the first iterations, it was proved in [14, Th. 3] that Oxx makes good decisions during the remaining iterations provided that . A sufficient condition for Oxx to select correct atoms during the first steps thus simply writes
Clearly, by definition.
Proof of Theorem 5: Assume that Oxx has selected atoms in when iterations have been completed; we apply Lemma 4 to show that, under the hypotheses of Theorem 5, the next atom selected by Oxx belongs to .
The first condition of Lemma 4, , is always verified since by hypothesis and . Let be the lowest index such that:
Clearly, (5) implies that .
Because the nonzero coefficients have been sorted in the decreasing order, see (5), we have:
It follows from (35) that
Then, (34) can be rewritten as
We finally obtain condition (22) by noticing that:
and the function is decreasing on for ;
This proof applies recursively to the iterations of Oxx for increasing
values of .
Proof of Theorem 3 (First Part): By hypothesis, we assume that Oxx has selected atoms in during the first iterations. We recursively show that, if (12) and (13) are satisfied, then Oxx keeps on picking atoms in as long as the largest elements of have not been selected.
Assume that after iteration , , has been completed, Oxx has selected with and (that is, some atoms in have not yet been selected by Oxx). We apply Lemma 4 (with subset , and with , ) to prove that the next atom selected by Oxx belongs to . The first condition in Lemma 4, , is verified since and .
Let be the lowest index such that
Proof of Theorem 3 (Second Part): We show that (14) and (15) ensure that atoms in are selected, provided that the largest elements of have not been selected and fewer than iterations have been carried out. The proof follows the same lines as the proof of the first part with some modifications that we describe hereafter.
Assume that Oxx has selected with after iteration has been completed, with , and . We apply Lemma 4 (with subset , and