# Relaxed Recovery Conditions for OMP/OLS by Exploiting both Coherence and Decay

## Abstract

We propose extended coherence-based conditions for exact sparse support recovery using orthogonal matching pursuit (OMP) and orthogonal least squares (OLS). Unlike standard uniform guarantees, our conditions embed information about the decay of the sparse vector coefficients. As a result, the standard condition $\mu < \frac{1}{2k-1}$ (where $\mu$ denotes the mutual coherence and $k$ the sparsity level) can be weakened as soon as the nonzero coefficients obey some decay, both in the noiseless and the bounded-noise scenarios. Furthermore, the resulting condition approaches $\mu < \frac{1}{k}$ for strongly decaying sparse signals. Finally, in the noiseless setting, we prove that the proposed conditions, in particular the bound $\mu < \frac{1}{k}$, are the tightest achievable guarantees based on mutual coherence.

Orthogonal matching pursuit; orthogonal least-squares; mutual coherence; exact recovery; sparse decaying representations.

## 1 Introduction

In this paper, we focus on two popular instances of greedy algorithms for sparse signal approximation from linear measurements, namely orthogonal matching pursuit (OMP) [1] and orthogonal least squares (OLS) [6, 7]. These two iterative procedures gradually build an estimate of the support of a sparse representation by adding one new element to it at each iteration, and update the sparse approximation by computing the orthogonal projection of the data vector onto the subspace spanned by the selected atoms. OMP and OLS differ only in the way the new support element is selected: OMP picks the atom leading to the maximum (absolute) correlation with the current residual, while OLS selects the atom minimizing the $\ell_2$-norm of the new residual. In the rest of the paper, we will use the generic acronym Oxx to refer to both OMP and OLS in all the statements that are valid for the two procedures.

In recent years, many researchers have studied conditions under which Oxx succeeds in recovering the true sparse vector. A popular approach to address this question relies on the derivation of uniform guarantees; the latter ensure the success of Oxx for a given sparsity level (or a given support) irrespective of the magnitude of the nonzero coefficients. This type of analysis was carried out for OMP in [8, 9] and also adapted to several extensions of OMP in [10, 11, 4]. Although OLS has been known in the literature for a few decades (under different names [12]), uniform exact recovery analyses of OLS have only appeared very recently, see [4, 13, 14].

On the one hand, uniform conditions are usually quite pessimistic since they cannot be satisfied as soon as Oxx fails for one particular sparse vector. As a matter of fact, it is now acknowledged that uniform conditions typically fail to properly characterize the average behavior of the considered algorithm [15, 16]. In particular, in [17] the author emphasized that the empirical behavior of OMP depends strongly on the distribution defining the amplitudes of the nonzero coefficients. On the other hand, probabilistic analyses are usually quite involved to carry out for deterministic dictionaries because of the intricate nature of the recursions defining Oxx. It is noticeable that a probabilistic analysis of OMP has been proposed within the multiple measurement setup (i.e., when several data vectors having a common sparsity profile are to be simultaneously decomposed in the same dictionary) [20]. In this context, the uniform recovery guarantees can be significantly weakened within a probabilistic framework. Nevertheless, this result was shown to be relevant only when the number of measurement vectors is of the same order as the sparsity level, and therefore does not apply to the single measurement case.

In this paper, we adopt a deterministic analysis technique for sparse vectors whose nonzero coefficients obey some decay. Our analysis is therefore no longer uniform since it restricts the success of Oxx to a certain class of sparse signals. To some extent, it may also provide insights into the success of Oxx for random input vectors as long as one can characterize the decay of “typical” realizations of the latter. From another point of view, let us mention that a number of authors empirically observed (and then conjectured) that the worst-case scenario for Oxx corresponds to the situation where all the nonzero coefficients have the same amplitude, see e.g., [21, 22, 23, 17]. The analysis of Oxx with decaying vectors is thus also expected to bring some answer to this question since vectors with equal nonzero coefficients correspond to the particular case of “no decay”.

Although sparse vectors with decaying nonzero coefficients can be observed in many applications (see [24] and [25] for examples in the field of image and audio processing, respectively), we are only aware of a few works analyzing the success of Oxx in such a setup [26, 9, 27, 28]. In [26] the authors adopted an information-theoretical point of view: they derived “rates” (i.e., dictionary dimensions and sparsity levels) under which a “successive interference canceller” (which can be understood as an idealized version of Oxx) can asymptotically succeed. In particular, they showed that the achievable rates depend on the amplitudes of the nonzero coefficients (which, in their theoretical framework, must be known to the receiver), and thus on the decay. However, their results only apply to randomly-generated dictionaries. In [9], the authors provided an analysis of OMP in terms of restricted isometry constants (RIC) and showed that the success of OMP can be ensured by considering sufficiently decaying vectors. In [27], Ding et al. extended this type of result to the case of observation models corrupted by noise. Finally, Ehler et al. carried out the same kind of RIC-based analysis for some non-linear generalization of OLS in [28].

In the sequel, we propose novel conditions of success in $k$ steps for both OMP and OLS in terms of mutual coherence of the dictionary ($k$ denotes the number of nonzero coefficients in the sparse vector). We note that, as far as the success of OMP and OLS in $k$ steps is concerned, mutual coherence and RICs are dictionary features which offer different perspectives on the success of Oxx: as shown in [14, Examples 2 and 3], there are instances of dictionaries for which the uniform mutual coherence condition is satisfied but the best-known uniform RIC conditions [29, 30] are not, and vice versa. The conditions derived in this paper relax several conditions previously proposed in the literature, and encompass them as particular cases.

We will consider a unified definition of Oxx based on the orthogonal projection of the dictionary atoms onto the orthogonal complement of the subspace spanned by the selected atoms, see e.g., [13]. This definition allows us to define both algorithms in a unifying framework and to carry out a parallel analysis of both OMP and OLS. Our derivations are then based on a fine analysis of the correlation between the projected atoms involved in the iterations of Oxx. Unlike previous works, we highlight that the decay conditions can be relaxed as the iterations of Oxx progress. Moreover, our guarantees are tight: these are the best achievable coherence-based guarantees exploiting the decay between successive ordered coefficients in the noiseless setup.

The rest of the paper is organized as follows. Our main results are stated in section 2 together with some relevant connections with the state of the art. The technical proofs of the results are reported in section 3.

## 2 Context and Main Results

Let $y \in \mathbb{R}^m$ obey the following model:

$$y = Ax + w, \tag{1}$$

where $A \in \mathbb{R}^{m \times n}$ is a known dictionary, $x \in \mathbb{R}^n$ is an unknown vector and $w \in \mathbb{R}^m$ is some additive noise with $\|w\|_2 \le \epsilon$. The columns of the dictionary are supposed to be normalized: $\|a_i\|_2 = 1$ for all $i$. We investigate conditions ensuring that Oxx selects a subset $Q^\star$ of $k$ dictionary atoms, where $Q^\star$ matches the support of the $k$ largest elements of $x$. We have summarized the main recursions of Oxx in Algorithm 1. $\langle \cdot, \cdot \rangle$ denotes the vector inner product and $\tilde{a}_i = P_Q^\perp a_i$ is the projection of $a_i$ onto the space orthogonal to the columns of $A$ indexed by $Q$. We refer the reader to section 3.1 for a more detailed description of Oxx.

Our derivations are based on the so-called “$k$-step” analysis of Oxx: Oxx will be assumed to fail as soon as one wrong atom is included in the estimated support [8, 9, 13]. On the contrary, Oxx succeeds if and only if the atoms in $Q^\star$ are selected during the first $k$ iterations. Alternative definitions of exact sparse recovery may be considered. In [4, 31], the authors focused on “delayed recovery”, where Oxx is assumed to succeed if the selected atoms contain the correct support, with possible false atom selections. This approach will not be pursued hereafter.

Several scenarios are considered. In sections 2.1 and 2.2, we address the case where the observation model is noiseless ($w = 0$) and $x$ is $k$-sparse with support $Q^\star$ ($x_{\bar{Q}^\star} = 0$, where $\bar{Q}^\star$ denotes the complement of $Q^\star$). In section 2.1, we focus on conditions ensuring the recovery of $Q^\star$ from the first iteration, i.e., with the initial empty support, whereas a finer analysis at intermediate iterations is carried out in section 2.2. This analysis allows us to provide weaker guarantees of good atom selection when (i) fewer than $k$ iterations are performed, and when (ii) Oxx is known to have selected good atoms in the early iterations. In section 2.3, we address the noisy scenario ($w \neq 0$), and the case where $x$ is compressible but possibly non-sparse. In this case, $Q^\star$ can be thought of as the support of the “head” of the signal $x$, obtained by gathering the indices of the $k$ largest amplitudes in $x$.

Some of the results presented below share connections. For example, the direct part of Theorem 2 (section 2.1), dealing with $k$-step recovery and noiseless observations, can be seen as a particular case of the results presented in Theorems 3 and 5 (sections 2.2 and 2.3). However, we chose to follow this editorial line to keep the discussion of the results and the relation to the current state of the art as simple as possible.

### 2.1 k-step Analysis in the Noiseless Setup

The first thorough “$k$-step” analysis of OMP is due to Tropp in [8, Th. 3.1 and Th. 3.10]. He provided a sufficient and worst-case necessary condition for the exact recovery of any sparse vector with a given support $Q^\star$. Moreover, he showed that the condition

$$\mu < \frac{1}{2k-1}, \tag{2}$$

where $\mu = \max_{i \neq j} |\langle a_i, a_j \rangle|$ is the mutual coherence of $A$, ensures the success of OMP. The derivation of similar conditions for OLS is more recent and is due to Soussen et al. in [13, 14].
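As a minimal illustration of condition (2), the sketch below computes the mutual coherence of a dictionary and tests the uniform bound. The function names `mutual_coherence` and `satisfies_uniform_condition` are hypothetical helpers introduced here for illustration only; the code assumes a real-valued dictionary stored as a NumPy array and re-normalizes its columns to match the unit-norm convention.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between two distinct unit-norm columns."""
    A = A / np.linalg.norm(A, axis=0)   # enforce the unit-norm convention
    G = np.abs(A.T @ A)                 # absolute Gram matrix
    np.fill_diagonal(G, 0.0)            # discard the self-correlations
    return float(G.max())

def satisfies_uniform_condition(A, k):
    """Check the uniform coherence condition (2): mu < 1 / (2k - 1)."""
    return mutual_coherence(A) < 1.0 / (2 * k - 1)

# Two atoms at a 45-degree angle: mu = 1/sqrt(2), so (2) fails for k = 2.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(mutual_coherence(A), satisfies_uniform_condition(A, 2))
```

Forming the full Gram matrix costs $O(mn^2)$ operations, which is acceptable for a sketch but may be too expensive for very large dictionaries.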

Condition (2) is uniform, that is, Oxx can recover any $k$-sparse vector irrespective of the amplitude of the nonzero coefficients when (2) is satisfied. On the other hand, it was shown in [32, Th. 3.1] that (2) is tight: there exist a $k$-sparse vector and a dictionary with $\mu = \frac{1}{2k-1}$ such that Oxx selects a wrong atom at the first iteration. This shows that one cannot expect to weaken (2) for the recovery of arbitrary $k$-sparse vectors. Nevertheless, it is noticeable that the specific sparse vector involved in the example of [32] is “flat”, that is, such that

$$x_i = \text{constant} \quad \forall i \in Q^\star. \tag{3}$$

This is not a coincidence. In Theorem 1 below, we show that weaker sufficient conditions than (2) can be obtained as soon as the nonzero coefficients of $x$ obey some decay.

###### Theorem 1

If $x$ is a $k$-sparse vector whose nonzero amplitudes are not all equal, there exists some $\mu^\star > \frac{1}{2k-1}$ such that Oxx recovers $Q^\star$ in $k$ steps for any dictionary with $\mu < \mu^\star$.

Interestingly, as mentioned in the introduction, it has been stated in many pieces of research (and accepted as “folk knowledge” [17]) that sparse vectors with nonzero coefficients of equal magnitude correspond to the most difficult case for many reconstruction algorithms, see e.g., [21, 22, 23]. The result in Theorem 1 supports this observation by stating that, as far as the satisfaction of mutual coherence conditions for exact recovery is concerned, “flat” vectors correspond to the worst possible case for Oxx. In particular, a condition of success more favorable than $\mu < \frac{1}{2k-1}$ always exists as soon as the coefficients of $x$ exhibit some decay.

Unfortunately, the proof of Theorem 1 does not provide an optimal value for $\mu^\star$ (as a function of the rate of decay). In fact, a precise characterization of $\mu^\star$ for general decay patterns may be quite a difficult task. In the next theorem, we provide “horizon-1” decay conditions (i.e., conditions between consecutive elements of the ordered nonzero coefficients) ensuring that Oxx succeeds in $k$ steps. In our statement, we assume without loss of generality that

$$Q^\star = \{1, 2, \dots, k\}, \tag{4}$$

and

$$|x_1| \ge |x_2| \ge \dots \ge |x_k| > 0. \tag{5}$$

###### Theorem 2

If

$$\mu < \frac{1}{k}, \tag{6}$$

and

$$|x_i| > \frac{2\mu(k-i)}{1-i\mu}\,|x_{i+1}| \quad \forall i \in \{1, \dots, k-1\}, \tag{7}$$

then Oxx recovers $Q^\star$ in $k$ steps.

Conversely, both conditions (6) and (7) are tight in the following sense:

• There exists an instance of dictionary with $\mu = \frac{1}{k}$ such that for all $k$-sparse vectors supported by $Q^\star$, Oxx selects a wrong atom during the first $k$ iterations.

• For all , there exists a vector and a dictionary of mutual coherence , for which the inequalities (7) hold for , and become an equality for , and such that Oxx with as input selects a wrong atom at the -th iteration.

Theorem 2 encompasses the standard condition (2) as a particular case. Indeed, the decay factor appearing in the right-hand side of condition (7) is such that

$$\frac{2\mu(k-i)}{1-i\mu} < 1, \tag{8}$$

as soon as

$$\mu < \frac{1}{2k-i}.$$

Thus, by virtue of our convention (5), (8) implies that condition (7) trivially holds for any $i$ as soon as (2) is satisfied. We also note that $\frac{2\mu(k-i)}{1-i\mu}$ is a decreasing function of $i$ for $\mu < \frac{1}{k}$. Hence, the rate of decay required in (7) becomes lower as $i$ increases.

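Condition (7) can be equivalently expressed as

$$\mu < \mu^\star_i \quad \text{with} \quad \mu^\star_i = \frac{\frac{|x_i|}{|x_{i+1}|}}{2(k-i) + i\,\frac{|x_i|}{|x_{i+1}|}},$$

for $i \in \{1, \dots, k-1\}$. The conditions of success stated in Theorem 2 can thus also be rephrased as:

$$\mu < \mu^\star = \min\left(\frac{1}{k}, \mu^\star_1, \dots, \mu^\star_{k-1}\right).$$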
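The two equivalent forms of the recovery conditions of Theorem 2 can be evaluated numerically. The sketch below is only illustrative: `mu_star` and `decay_conditions_hold` are hypothetical helper names, and the input is assumed to contain the $k$ nonzero coefficients of the sparse vector (in any order, all strictly nonzero).

```python
import numpy as np

def mu_star(x_nonzero):
    """Coherence threshold min(1/k, mu*_1, ..., mu*_{k-1}) of Theorem 2."""
    a = np.sort(np.abs(np.asarray(x_nonzero, dtype=float)))[::-1]  # convention (5)
    k = a.size
    i = np.arange(1, k)
    ratio = a[:-1] / a[1:]                     # |x_i| / |x_{i+1}|
    mu_i = ratio / (2 * (k - i) + i * ratio)   # per-index thresholds mu*_i
    return float(np.min(np.append(mu_i, 1.0 / k)))

def decay_conditions_hold(x_nonzero, mu):
    """Direct check of conditions (6) and (7)."""
    a = np.sort(np.abs(np.asarray(x_nonzero, dtype=float)))[::-1]
    k = a.size
    if mu >= 1.0 / k:                          # condition (6)
        return False
    i = np.arange(1, k)
    factor = 2 * mu * (k - i) / (1 - i * mu)   # decay factor in (7)
    return bool(np.all(a[:-1] > factor * a[1:]))

print(mu_star([1, 1, 1, 1]))    # flat vector: threshold 1/(2k-1)
print(mu_star([8, -4, 2, 1]))   # ratio-2 decay: threshold 1/k
```

For a flat vector the threshold coincides with the uniform bound (2), whereas a vector whose consecutive magnitudes differ by a factor of 2 attains the bound $\frac{1}{k}$.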

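It can be seen that possible values for $\mu^\star$ range in the interval $\left[\frac{1}{2k-1}, \frac{1}{k}\right]$ and depend on the decay of the nonzero coefficients of $x$. On the one hand, the smallest value for $\mu^\star$ occurs when $|x_1| = |x_2|$, in which case $\mu^\star = \mu^\star_1 = \frac{1}{2k-1}$. Hence, we recover the standard condition (2) in this case. On the other hand, $\mu^\star_i \ge \frac{1}{k}$ for all $i$ (and therefore $\mu^\star = \frac{1}{k}$) as soon as $|x_i| \ge 2|x_{i+1}|$ for all $i \in \{1, \dots, k-1\}$. This leads to the following corollary: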

###### Corollary 1

If $\mu < \frac{1}{k}$ and $|x_i| \ge 2|x_{i+1}|$ for all $i \in \{1, \dots, k-1\}$, then Oxx recovers $Q^\star$ in $k$ steps.

A graphical representation of these considerations is provided in Fig. 1 for a fixed sparsity level $k$: the decay factor appearing in (7) is plotted as a function of $i$ for different values of $\mu$. For a given $\mu$, the region above the related curve characterizes the set of vectors satisfying the recovery conditions of Theorem 2. We notice that the size of the region of success increases as the mutual coherence decreases. In particular, when $\mu < \frac{1}{2k-1}$, the curve lies below the dashed line and (7) is satisfied for any $k$-sparse representation since, by convention, the nonzero entries have been sorted according to (5). On the other hand, the region of success is restricted to vectors satisfying $|x_i| > 2|x_{i+1}|$ when $\mu$ is close to $\frac{1}{k}$. We note moreover that the decay constraints become less stringent as $i$ increases.

It is also insightful to see how often nonzero coefficients drawn from different distributions can satisfy (7). In Fig. 2, we have represented the empirical probability that coefficients drawn from Bernoulli, Uniform, Normal, Laplacian and LogLogistic distributions verify the decay conditions of Theorem 2. We consider again the same sparsity level and the results are averaged over 2000 realizations. In accordance with Theorems 1 and 2, the Bernoulli distribution (which always generates “flat” vectors) leads to the worst results. In particular, conditions (7) cannot be verified as soon as $\mu \ge \frac{1}{2k-1}$. In contrast, the vectors drawn from the other distributions satisfy (7) with some nonzero probability for any $\mu < \frac{1}{k}$ (and are therefore ensured to yield a success of Oxx). Interestingly, our conclusions regarding the comparison of distributions are the same as those drawn from the empirical study of the average performance of OMP in [17].
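An experiment of this kind can be sketched with a simple Monte Carlo loop. The code below is not the authors' experimental setup: the sparsity level `k = 5`, the coherence values, the two samplers and the helper names (`decay_ok`, `empirical_probability`) are all assumptions chosen for illustration.

```python
import numpy as np

def decay_ok(x, mu):
    """True if mu < 1/k and the sorted magnitudes of x satisfy condition (7)."""
    a = np.sort(np.abs(x))[::-1]
    k = a.size
    if mu >= 1.0 / k:
        return False
    i = np.arange(1, k)
    factor = 2 * mu * (k - i) / (1 - i * mu)
    return bool(np.all(a[:-1] > factor * a[1:]))

def empirical_probability(sampler, k, mu, n_trials=2000, seed=0):
    """Fraction of random draws of k coefficients satisfying (6)-(7)."""
    rng = np.random.default_rng(seed)
    hits = sum(decay_ok(sampler(rng, k), mu) for _ in range(n_trials))
    return hits / n_trials

def bernoulli(rng, k):
    return rng.choice([-1.0, 1.0], size=k)   # always a flat vector

def gaussian(rng, k):
    return rng.standard_normal(k)

k = 5                                        # hypothetical sparsity level
for mu in (0.10, 0.12, 0.19):                # 1/(2k-1) is about 0.111, 1/k = 0.2
    print(mu, empirical_probability(bernoulli, k, mu),
          empirical_probability(gaussian, k, mu))
```

With flat (Bernoulli) coefficients the estimate is either 1 or 0 depending on whether $\mu$ lies below or above $\frac{1}{2k-1}$, which matches the discussion above; the other samplers are expected to give intermediate values.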

It is worth noting that not all standard sparse-representation algorithms enjoy a relaxation of their recovery conditions when dealing with decaying vectors. For example, the standard condition $\mu < \frac{1}{2k-1}$ cannot be improved for Basis Pursuit (BP) [33]. Indeed, it has been shown in [32, Th. 3.1] that there exists a dictionary with $\mu = \frac{1}{2k-1}$ and a flat $k$-sparse vector such that Basis Pursuit leads to a wrong support detection. Now, it is well-known that tight conditions of success for Basis Pursuit only depend on the signed support of the sought sparse vector, see [34, 35]. The existence of a vector for which BP fails thus shows that BP will fail for all other sparse vectors with the same signed support, irrespective of the decay of the coefficients.

The converse part of Theorem 2 emphasizes that the proposed recovery conditions (6)-(7) are worst-case necessary in some sense. The nature of the sharpness of (6) and (7) is however slightly different. The tightness of (7) is restricted to the set of “horizon-1” conditions, that is, conditions exploiting the decay between pairs of consecutive elements in the ordered sparse vector. The tightness of (6) is of a more fundamental nature since Theorem 2 states that there exists a dictionary such that Oxx will fail during the first $k$ iterations irrespective of the values of the nonzero coefficients in $x$. Hence, any mutual coherence condition ensuring $k$-step recovery and valid for general deterministic dictionaries (and in particular for the specific dictionary considered in the proof of Theorem 2, see section 3.3) must be of the form $\mu < \mu^\star$ with $\mu^\star \le \frac{1}{k}$. In other words, the bound $\frac{1}{k}$ cannot be improved whatever the hypotheses made on the sparse vector.

### 2.2 Partial Recovery and Successful Termination

In many applications, it is desirable to have some guarantees on the partial success of Oxx. Two main situations may be of interest:

• Successful Termination: Oxx is assumed to have selected atoms in $Q \subsetneq Q^\star$, with cardinality $g$, during the first $g$ iterations, and one is interested in conditions guaranteeing the selection of atoms in $Q^\star \setminus Q$ during the next $k-g$ iterations.

• Partial Support Recovery: the focus is on conditions ensuring the selection of elements of during the first iterations.

Before we state our results, let us make a few remarks. First, the question of partial support recovery has a trivial answer in the standard “uniform” setup. Indeed, as mentioned previously, the authors of [32] provided a problem instance in which $\mu = \frac{1}{2k-1}$ and Oxx selects a wrong atom at the first iteration. This shows that weaker coherence guarantees cannot be obtained for non-decaying vectors, even by restricting the success of Oxx to partial support recovery. On the contrary, we will emphasize that the paradigm of partial support recovery can be nicely addressed when accounting for the decay of the sparse vector.

Secondly, the question of the successful termination of Oxx has already been addressed in the uniform setting. In [13], the authors extended Tropp’s exact recovery condition (ERC) to this particular setup, both for OMP and OLS. The same type of conditions was expressed in terms of mutual coherence in [14, Th. 3]: if $Q \subsetneq Q^\star$ is reached during the first $g$ iterations, then Oxx selects atoms in $Q^\star \setminus Q$ during the next $k-g$ iterations provided that

$$\mu < \frac{1}{2k-g-1}. \tag{9}$$

Similar to the standard $k$-step analysis [32], (9) was shown to be tight: there exist a $k$-sparse vector with support $Q^\star$, a subset $Q \subsetneq Q^\star$ with $|Q| = g$ and a dictionary with $\mu = \frac{1}{2k-g-1}$ such that Oxx selects atoms in $Q$ during the first $g$ steps and then makes a wrong decision. We show hereafter that this coherence bound can be relaxed when dealing with decaying sparse vectors.

In the statement of Theorem 3, following convention (5), we re-order the atoms by decreasing values of their magnitudes $|x_i|$. Here, this convention is applied to the unselected atoms, which are therefore indexed by:

$$Q^\star \setminus Q = \{1, 2, \dots, k-g\}, \tag{10}$$

with

$$|x_1| \ge |x_2| \ge \dots \ge |x_{k-g}| > 0. \tag{11}$$

Theorem 3 jointly addresses both questions of successful termination and partial support recovery.

###### Theorem 3

Assume that Oxx has selected with during the first iterations and let .

• If

$$\mu < \frac{1}{k}, \tag{12}$$

and the largest magnitudes of the unselected atoms after iteration satisfy

$$|x_i| > \frac{2\mu(k-g-i)}{1-(g+i)\mu}\,|x_{i+1}| \quad \forall i \in \{1, \dots, p\}, \tag{13}$$

then Oxx is guaranteed to select atoms in until all the elements in have been selected.

• If

$$\frac{1}{k} \le \mu < \frac{1}{g+r}, \tag{14}$$

and the largest magnitudes of the unselected atoms after iteration satisfy

$$|x_i| > \frac{2\mu(k-g-r)}{1-(g+r)\mu}\,|x_{i+1}| \quad \forall i \in \{1, \dots, p\}, \tag{15}$$

then Oxx is ensured to select atoms in until all the elements in have been selected or iterations have been carried out.

Let us discuss the implications of Theorem 3 on both problems of “successful termination” and “partial support recovery”. We specifically elaborate on the corresponding choices of , and .

The paradigm of successful termination corresponds to the case . In this setup, we note that conditions (14)-(15) are irrelevant since , thus (14) cannot be satisfied. On the other hand, the conditions (12)-(13) can be rewritten in terms of constraints on the mutual coherence involving the decay of the nonzero elements:

$$\mu < \mu^\star = \min\left(\frac{1}{k}, \mu^\star_1, \dots, \mu^\star_p\right),$$

with

$$\mu^\star_i = \frac{\frac{|x_i|}{|x_{i+1}|}}{2(k-g-i) + (g+i)\,\frac{|x_i|}{|x_{i+1}|}}.$$

Depending on the decay of the nonzero coefficients, it can thus be seen that $\frac{1}{2k-g-1} \le \mu^\star \le \frac{1}{k}$. The strongest condition corresponds to (9) and is obtained for $|x_1| = |x_2|$; it ensures the uniform recovery of any $k$-sparse vector when $g$ good atoms have been selected during the first $g$ iterations. The weakest condition, $\mu < \frac{1}{k}$, is obtained as soon as $|x_i| \ge 2|x_{i+1}|$ for all $i$. We thus recover a result similar to Corollary 1. Here, the decay constraints only apply to the elements in $Q^\star \setminus Q$ since the elements in $Q$ have already been selected by assumption.

Let us now discuss the particularization of Theorem 3 to the problem of partial support recovery (here, is set to 0). We focus on the case where . We note that both (12)-(13) and (14)-(15) ensure the selection of elements of during the first iterations of Oxx provided that the largest nonzero coefficients obey some “sufficient” decay (which is specified by either (13) or (15)). This leads to the following corollary for partial support recovery:

###### Corollary 2

Let . If and the largest coefficients of exhibit a sufficient decay (specified by (13) or (15)), then Oxx selects atoms in during the first iterations.

We note that Corollary 2 (which makes use of the mild assumpation ) does not guarantee that the selected atoms correspond to the largest coefficients of . Such guarantee can be obtained from (12)-(13) by imposing the stronger assumption :

###### Corollary 3

Let . If and the largest magnitudes in exhibit a sufficient decay (13), then Oxx selects atoms in until the largest components of have been selected.

Note that Corollary 3 does not state that Oxx will select the largest components of during the first iterations. Their selection is however guaranteed during the first iterations.

### 2.3 Compressible and Noisy Signals

In many situations, the sought vector is not exactly $k$-sparse but rather compressible and possibly non-sparse. Furthermore, the observations are corrupted by some additive noise ($w \neq 0$). We review hereafter some contributions of the literature dealing with the success of Oxx in this particular setup and provide new, stronger results in Theorems 4 and 5. We will assume that the noise has a bounded $\ell_2$-norm, that is $\|w\|_2 \le \epsilon$. In the following, a signal $x$ will be referred to as “$k$-compressible” as soon as the sum of the absolute values of $k$ entries of $x$ is large with respect to the remaining entries. Denoting by $\bar{Q}^\star$ the complement of $Q^\star$, the $k$-compressible assumption reads $\|x_{\bar{Q}^\star}\|_1 \ll \|x_{Q^\star}\|_1$ for some subset $Q^\star$ of cardinality $k$. $x_{Q^\star}$ and $x_{\bar{Q}^\star}$ shall be thought of as the head and tail of the signal $x$, respectively.

Let us first consider the $k$-sparse setup ($x_{\bar{Q}^\star} = 0$) with noisy observations ($w \neq 0$). Although many researchers have emphasized that the noiseless conditions can be generalized to the case where the noise level is low in comparison to the smallest nonzero coefficient of $x$ [36, 37, 38, 39, 40], no tight condition of success for Oxx has been proposed so far. Among the noticeable coherence-based guarantees, we can nevertheless mention the work by Donoho et al. [36, Th. 5.1] (also rediscovered in [39, Th. 1]), stating that OMP succeeds if

$$\mu < \frac{1}{2k-1}, \tag{16}$$

$$|x_i| > \frac{2\epsilon}{1-(2k-1)\mu} \quad \forall i \in \{1, \dots, k\}. \tag{17}$$

To the best of our knowledge, the extension of these results to the success of OLS has never been made in the literature. Nor are we aware of any contribution dealing with coherence-based conditions ensuring the recovery of a particular support for $k$-compressible vectors and noisy observations. We address these questions in the next theorems. The result stated in Theorem 4 implies, as a corollary, that (16)-(17) are sufficient conditions for both OMP and OLS. Theorem 5 is an extension of Theorem 2 to the noisy $k$-compressible setting. As in subsection 2.1, we assume that the elements of $x_{Q^\star}$ satisfy (4)-(5).

###### Theorem 4

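If

$$\mu < \frac{1}{2k-1}, \tag{18}$$

and

$$|x_i| > \frac{2\left(\|x_{\bar{Q}^\star}\|_1 + \epsilon\right)}{1-(2k-i)\mu} \quad \forall i \in \{1, \dots, k\}, \tag{19}$$

then Oxx selects atoms in $Q^\star$ during the first $k$ iterations.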

The conditions of Theorem 4 take the same form as those in (16)-(17) but depend on the $\ell_1$-norm of $x_{\bar{Q}^\star}$. Moreover, condition (19) depends on the position $i$ of the ordered coefficients: the larger $i$, the weaker the constraint on their amplitude. As a result, when $x_{\bar{Q}^\star} = 0$, Theorem 4 leads to weaker conditions than those previously proposed in [36], [39] as soon as $x$ is not a flat vector. For flat vectors, (19) obviously reduces to (17). In such a case, Theorem 4 leads to the standard conditions by Donoho et al.
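The position-dependent thresholds in (19) can be compared with the uniform threshold (17) on a small example. The sketch below is illustrative only; `theorem4_holds` and `donoho_holds` are hypothetical helper names, `head` is assumed to contain the $k$ entries of the head of the signal and `tail_l1` the $\ell_1$-norm of its tail.

```python
import numpy as np

def theorem4_holds(head, tail_l1, eps, mu):
    """Check conditions (18)-(19) on the k head coefficients."""
    a = np.sort(np.abs(np.asarray(head, dtype=float)))[::-1]   # convention (5)
    k = a.size
    if mu >= 1.0 / (2 * k - 1):                                # condition (18)
        return False
    i = np.arange(1, k + 1)
    thresholds = 2 * (tail_l1 + eps) / (1 - (2 * k - i) * mu)  # right side of (19)
    return bool(np.all(a > thresholds))

def donoho_holds(head, eps, mu):
    """Check the uniform conditions (16)-(17) in the exactly sparse case."""
    a = np.abs(np.asarray(head, dtype=float))
    k = a.size
    if mu >= 1.0 / (2 * k - 1):
        return False
    return bool(np.all(a > 2 * eps / (1 - (2 * k - 1) * mu)))

# Non-flat 2-sparse vector: (19) is satisfied while (17) is not.
print(theorem4_holds([1.3, 0.9], 0.0, 0.25, 0.2), donoho_holds([1.3, 0.9], 0.25, 0.2))
```

In this example the smaller coefficient only has to exceed the threshold associated with $i = 2$, which is why the position-dependent condition is easier to satisfy.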

Let us mention that the conditions in Theorem 4 do not enforce any constraint on the decay of the coefficients in (but only between the elements in and each component of ). In the next theorem, we state “horizon-1” conditions of the same flavor as those presented in Theorems 2 and 3. Let us first define the following quantity:

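$$\gamma_k \triangleq \begin{cases} \dfrac{1-(k-2)\mu}{(\mu+1)(1-k\mu)} & \text{for OMP,} \\[2ex] \sqrt{\dfrac{1-(k-2)\mu}{\mu+1}}\;\dfrac{\sqrt{1-(k-1)\mu}}{1-k\mu} & \text{for OLS.} \end{cases} \tag{20}$$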

Our result then writes as follows:

###### Theorem 5

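If

$$\mu < \frac{1}{k}, \tag{21}$$

and, for all $i \in \{1, \dots, k\}$,

$$|x_i| > \frac{2\mu(k-i)}{1-i\mu}\,|x_{i+1}| + 2\gamma_k\left(\epsilon + \|x_{\bar{Q}^\star}\|_1\right), \tag{22}$$

then Oxx selects atoms in $Q^\star$ from noisy data during the first $k$ iterations.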
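The quantity $\gamma_k$ in (20) and the conditions (21)-(22) can be evaluated as follows. This is a sketch under stated assumptions: `gamma_k` and `theorem5_holds` are hypothetical helper names, and condition (22) is assumed to be checked for every $i \in \{1, \dots, k\}$, the first term vanishing for $i = k$ since its factor $(k-i)$ is zero.

```python
import numpy as np

def gamma_k(mu, k, variant="omp"):
    """Noise amplification factor defined in (20)."""
    if variant == "omp":
        return (1 - (k - 2) * mu) / ((mu + 1) * (1 - k * mu))
    return (np.sqrt((1 - (k - 2) * mu) / (mu + 1))
            * np.sqrt(1 - (k - 1) * mu) / (1 - k * mu))

def theorem5_holds(head, tail_l1, eps, mu, variant="omp"):
    """Check conditions (21)-(22) on the k head coefficients."""
    a = np.sort(np.abs(np.asarray(head, dtype=float)))[::-1]   # convention (5)
    k = a.size
    if mu >= 1.0 / k:                                          # condition (21)
        return False
    i = np.arange(1, k + 1)
    nxt = np.append(a[1:], 0.0)       # assumption: no decay term for i = k
    rhs = (2 * mu * (k - i) / (1 - i * mu) * nxt
           + 2 * gamma_k(mu, k, variant) * (eps + tail_l1))
    return bool(np.all(a > rhs))

print(gamma_k(0.1, 2, "omp"), gamma_k(0.1, 2, "ols"))
print(theorem5_holds([8, 4, 2, 1], 0.0, 0.0, 0.24))
```

In the noiseless, exactly sparse case (`eps = 0`, `tail_l1 = 0`) the check reduces to the conditions of Theorem 2.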

Theorem 3 could be extended in a similar way to the framework of compressible and noisy signals but we do not detail this extension for conciseness.

## 3 Technical Details

In this section, we provide a proof of the theorems stated in section 2. We first recall the main principles ruling OMP and OLS in section 3.1. We then introduce some technical lemmas in section 3.2. Finally the proof of the main results is exposed in section 3.3.

### 3.1 OMP and OLS

In order to precisely describe the update rules characterizing Oxx, let us first introduce some notations: given a set of indices $Q$, $A_Q$ represents the submatrix of $A$ specified by the columns indexed in $Q$; the projector onto the orthogonal complement of the column span of $A_Q$ is defined as $P_Q^\perp = I - A_Q A_Q^\dagger$, where $A_Q^\dagger$ is the pseudo-inverse of $A_Q$; in particular, $r_Q = P_Q^\perp y$ is the residual error when projecting $y$ onto the span of $A_Q$. Finally, $\langle \cdot, \cdot \rangle$ represents the vector inner product and $0_m$ is the null vector of size $m$.

Oxx can be understood as an iterative procedure generating an estimate of $Q^\star$ by sequentially adding one new element to the current support estimate, say $Q$. As detailed in Algorithm 1, OMP and OLS differ in the way this new element is selected. At each iteration, OLS selects the atom yielding the minimum residual error $\|r_{Q \cup \{i\}}\|_2$:

$$j \in \arg\min_{i \notin Q} \|r_{Q \cup \{i\}}\|_2,$$

and least-squares problems have to be solved to compute $r_{Q \cup \{i\}}$ for all $i \notin Q$ [6]. On the contrary, OMP adopts the simpler rule

$$j \in \arg\max_{i \notin Q} |\langle a_i, r_Q \rangle|,$$

to select the new atom $j$, and then solves only one least-squares problem to update the new residual $r_{Q \cup \{j\}}$.

The selection rules described above can also be expressed in terms of the (normalized) projected atoms of the dictionary [13]. This formulation will turn out to be convenient in our proofs below. More specifically, let

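$$\tilde{a}_i \triangleq P_Q^\perp a_i, \qquad \tilde{b}_i \triangleq \begin{cases} \tilde{a}_i / \|\tilde{a}_i\|_2 & \text{if } \tilde{a}_i \neq 0_m, \\ 0_m & \text{otherwise.} \end{cases}$$

With these notations, the selection rule of Oxx can be re-expressed as (see e.g., [5])

$$j \in \arg\max_{i \notin Q} |\langle \tilde{c}_i, r_Q \rangle|, \tag{23}$$

where

$$\tilde{c}_i \triangleq \begin{cases} \tilde{a}_i & \text{for OMP,} \\ \tilde{b}_i & \text{for OLS.} \end{cases}$$

For simplicity, the dependence of $\tilde{a}_i$, $\tilde{b}_i$ and $\tilde{c}_i$ on $Q$ does not appear in our notations. The reader should however keep this dependence in mind in our subsequent derivations.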
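The unified selection rule (23) translates almost directly into code. The sketch below is a naive implementation written for readability, not for speed: the function name `oxx`, its arguments and the tolerance `1e-12` are assumptions made for illustration, and the projector is recomputed from scratch at each iteration.

```python
import numpy as np

def oxx(A, y, n_iter, variant="omp"):
    """Run n_iter iterations of OMP or OLS using rule (23); return the support."""
    m, _ = A.shape
    support = []
    for _ in range(n_iter):
        if support:
            AQ = A[:, support]
            P = np.eye(m) - AQ @ np.linalg.pinv(AQ)   # projector P_Q^perp
        else:
            P = np.eye(m)
        r = P @ y                                     # residual r_Q
        A_t = P @ A                                   # projected atoms a~_i
        if variant == "ols":
            norms = np.linalg.norm(A_t, axis=0)
            C = np.zeros_like(A_t)
            keep = norms > 1e-12                      # b~_i = 0 when a~_i = 0
            C[:, keep] = A_t[:, keep] / norms[keep]
        else:
            C = A_t                                   # OMP uses c~_i = a~_i
        scores = np.abs(C.T @ r)
        if support:
            scores[support] = -np.inf                 # restrict to i not in Q
        support.append(int(np.argmax(scores)))
    return support

A = np.eye(4)
y = A @ np.array([0.0, 3.0, 0.0, -1.0])
print(oxx(A, y, 2, "omp"), oxx(A, y, 2, "ols"))
```

For OMP, using the projected atoms is equivalent to correlating the original atoms with the residual, since $r_Q = P_Q^\perp r_Q$ and $P_Q^\perp$ is symmetric. Practical implementations typically update a QR or Cholesky factorization instead of recomputing the pseudo-inverse.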

### 3.2 Some Useful Lemmas

We first state three useful lemmas, connecting different functions of the projected atoms to the mutual coherence of the dictionary.

###### Lemma 1

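Let $|Q| = g$. If $\mu < \frac{1}{g}$, then

$$\|\tilde{a}_i\|_2^2 \ge \frac{(\mu+1)(1-g\mu)}{1-(g-1)\mu} \quad \forall i \notin Q, \qquad |\langle \tilde{a}_i, \tilde{a}_j \rangle| \le \frac{\mu(\mu+1)}{1-(g-1)\mu} \quad \forall j \neq i. \tag{24}$$

Proof: The result is a direct consequence of Lemmas 4 and 10 in [14].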

###### Lemma 2

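Let $|Q| = g$. If $\mu < \frac{1}{g}$, we have

$$\langle \tilde{c}_i, \tilde{a}_i \rangle \ge \alpha_g > 0 \quad \forall i \notin Q, \qquad |\langle \tilde{c}_i, \tilde{a}_j \rangle| \le \mu_g \quad \forall j \neq i,$$

where

$$\alpha_g = \begin{cases} \dfrac{(\mu+1)(1-g\mu)}{1-(g-1)\mu} & \text{for OMP,} \\[2ex] \sqrt{\dfrac{(\mu+1)(1-g\mu)}{1-(g-1)\mu}} & \text{for OLS,} \end{cases} \tag{25}$$

$$\mu_g = \min\left\{1, \frac{\mu}{1-g\mu}\,\alpha_g\right\}. \tag{26}$$

Proof: The result immediately follows from Lemma 1 and from $\|\tilde{a}_i\|_2 \le 1$ and $\|\tilde{b}_i\|_2 \le 1$. Note that $\mu < \frac{1}{g}$ implies that $\tilde{a}_i \neq 0_m$ (see (24)). Thus, $\tilde{b}_i$ reads $\tilde{a}_i / \|\tilde{a}_i\|_2$.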

###### Lemma 3

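If $\mu < \frac{1}{g+1}$, then (26) simplifies to:

$$\mu_g = \frac{\mu}{1-g\mu}\,\alpha_g. \tag{27}$$

Proof: $\mu < \frac{1}{g+1}$ implies that $\frac{\mu}{1-g\mu} < 1$ and then

$$\frac{\mu}{1-g\mu}\,\alpha_g \le \alpha_g \le 1,$$

where $\alpha_g \le 1$ follows from (25), since $(\mu+1)(1-g\mu) = 1-(g-1)\mu - g\mu^2 \le 1-(g-1)\mu$.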
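The constants of Lemmas 2 and 3 are straightforward to evaluate, which makes it easy to see when the clipping in (26) is active. The helper names `alpha_g` and `mu_g` below are hypothetical, and the numerical values are arbitrary examples.

```python
import numpy as np

def alpha_g(mu, g, variant="omp"):
    """Lower bound (25) on <c~_i, a~_i> after g correct selections."""
    base = (mu + 1) * (1 - g * mu) / (1 - (g - 1) * mu)
    return base if variant == "omp" else float(np.sqrt(base))

def mu_g(mu, g, variant="omp"):
    """Upper bound (26) on |<c~_i, a~_j>| for j != i."""
    return min(1.0, mu / (1 - g * mu) * alpha_g(mu, g, variant))

# mu < 1/(g+1): no clipping, (26) coincides with (27).
print(alpha_g(0.1, 2), mu_g(0.1, 2))
# 1/(g+1) <= mu < 1/g: the minimum in (26) is attained by 1.
print(alpha_g(0.45, 2), mu_g(0.45, 2))
```

Both variants return $\alpha_0 = 1$ and $\mu_0 = \mu$ for $g = 0$, i.e., before any atom has been selected.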

Lemmas 2 and 3 are the building blocks of the next lemma, which provides sufficient conditions for Oxx to select a good atom at a given iteration:

###### Lemma 4

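Consider a (possibly non-sparse) signal $x$ and a subset $Q^\star$ of cardinality $k$. Assume that Oxx, with $y$ defined as in (1) as input, has selected a subset $Q \subsetneq Q^\star$ of atoms during the first $g$ iterations, with $|Q| = g < k$. Let $\alpha_g$, $\mu_g$ be defined as in Lemma 2. If

$$\mu < \frac{1}{g}, \tag{28}$$

$$(\alpha_g + \mu_g)\,\|x_{Q^\star \setminus Q}\|_\infty - 2\mu_g\,\|x_{Q^\star \setminus Q}\|_1 > 2\left(\epsilon + \|x_{\bar{Q}^\star}\|_1\right), \tag{29}$$

then Oxx selects an atom in $Q^\star \setminus Q$ at the next iteration.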

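Proof: We want to show that (28)-(29) implies

$$\max_{i \in Q^\star \setminus Q} |\langle \tilde{c}_i, r_Q \rangle| > |\langle \tilde{c}_l, r_Q \rangle| \quad \forall l \notin Q^\star. \tag{30}$$

First, using the definitions of the residual $r_Q$ and the projected atoms $\tilde{a}_i$, we have

$$r_Q = s_Q + P_Q^\perp w,$$

where

$$s_Q = \sum_{i \notin Q} \tilde{a}_i x_i.$$

Noticing that $\|\tilde{c}_i\|_2 \le 1$ and $\|P_Q^\perp w\|_2 \le \|w\|_2 \le \epsilon$, a sufficient condition for (30) is then as follows:

$$\max_{i \in Q^\star \setminus Q} |\langle \tilde{c}_i, s_Q \rangle| - |\langle \tilde{c}_l, s_Q \rangle| > 2\epsilon \quad \forall l \notin Q^\star. \tag{31}$$

Let $j \in \arg\max_{i \in Q^\star \setminus Q} |x_i|$. Since $\mu < \frac{1}{g}$, we can apply Lemma 2 and bound the terms in the left-hand side of (31) as follows:

$$\begin{aligned} \max_{i \in Q^\star \setminus Q} |\langle \tilde{c}_i, s_Q \rangle| &\ge |\langle \tilde{c}_j, s_Q \rangle| \\ &\ge |\langle \tilde{c}_j, \tilde{a}_j \rangle|\,|x_j| - \sum_{i \notin Q \cup \{j\}} |\langle \tilde{c}_j, \tilde{a}_i \rangle|\,|x_i| \\ &\ge \alpha_g |x_j| - \mu_g\left(\|x_{Q^\star \setminus (Q \cup \{j\})}\|_1 + \|x_{\bar{Q}^\star}\|_1\right), \end{aligned}$$

and, for all $l \notin Q^\star$,

$$\begin{aligned} |\langle \tilde{c}_l, s_Q \rangle| &\le |\langle \tilde{c}_l, \tilde{a}_l \rangle|\,|x_l| + \sum_{i \notin Q \cup \{l\}} |\langle \tilde{c}_l, \tilde{a}_i \rangle|\,|x_i| \\ &\le |x_l| + \mu_g\left(\|x_{Q^\star \setminus Q}\|_1 + \|x_{\bar{Q}^\star \setminus \{l\}}\|_1\right), \end{aligned}$$

where the last inequality follows from the fact that $|\langle \tilde{c}_l, \tilde{a}_l \rangle| \le 1$. Combining these two bounds, we easily obtain that

$$(\alpha_g + \mu_g)|x_j| - 2\mu_g \|x_{Q^\star \setminus Q}\|_1 > 2\epsilon + (1-\mu_g)|x_l| + 2\mu_g \|x_{\bar{Q}^\star}\|_1$$

is a sufficient condition for (31) and then (30). Finally, noticing that $\mu_g \le 1$ (Lemma 2) and $|x_l| \le \|x_{\bar{Q}^\star}\|_1$, we obtain (29).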

### 3.3 Proofs of the Main Results

In this section, we provide a proof of the main theorems of the paper. We skip the proofs of the corollaries, which are straightforward. Theorems 1, 3, 4, 5 and the direct part of Theorem 2 are proved in section 3.3.1. The converse part of Theorem 2 (that is the tightness of the proposed conditions) is proved in section 3.3.2.

#### Proofs of the Sufficient Conditions

All the proofs of this part use Lemma 4 as a key building block.

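Proof of Theorem 1: We want to show that Oxx selects atoms in $Q^\star$ during the first $k$ iterations for all dictionaries obeying $\mu < \mu^\star$, for some $\mu^\star > \frac{1}{2k-1}$, as long as $x$ is not a flat vector.

Let us first derive a condition on the mutual coherence ensuring that Oxx makes a correct decision at the first iteration. Particularizing the sufficient conditions of Lemma 4 to the case $g = 0$ in the noiseless $k$-sparse setting (where $\alpha_0 = 1$ and $\mu_0 = \mu$), we have that Oxx selects an element of $Q^\star$ provided that:

$$\mu < \frac{\rho}{2-\rho},$$

with $\rho = \|x\|_\infty / \|x\|_1$. Now, since $x$ is not flat, we have that $\rho > \frac{1}{k}$, and therefore $\frac{\rho}{2-\rho} > \frac{1}{2k-1}$.

On the other hand, if Oxx has selected any $g$ atoms in $Q^\star$ during the first $g$ iterations, it was proved in [14, Th. 3] that Oxx makes good decisions during the remaining $k-g$ iterations provided that $\mu < \frac{1}{2k-g-1}$ (see (9)); for $g = 1$, this reads $\mu < \frac{1}{2k-2}$. A sufficient condition for Oxx to select correct atoms during the first $k$ steps thus simply writes

$$\mu < \mu^\star \quad \text{with} \quad \mu^\star = \min\left(\frac{\rho}{2-\rho}, \frac{1}{2k-2}\right).$$

Clearly, $\mu^\star > \frac{1}{2k-1}$ by definition.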
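The threshold obtained in this proof is easy to evaluate for a given coefficient vector. In the sketch below, `theorem1_mu_star` is a hypothetical helper name and the input is assumed to hold the $k \ge 2$ nonzero coefficients of $x$.

```python
import numpy as np

def theorem1_mu_star(x_nonzero):
    """Threshold min(rho / (2 - rho), 1 / (2k - 2)) from the proof of Theorem 1."""
    a = np.abs(np.asarray(x_nonzero, dtype=float))
    k = a.size                      # assumed to be at least 2
    rho = a.max() / a.sum()         # rho = ||x||_inf / ||x||_1, at least 1/k
    return float(min(rho / (2 - rho), 1.0 / (2 * k - 2)))

k = 3
print(1 / (2 * k - 1))                  # uniform bound (2)
print(theorem1_mu_star([1, 1, 1]))      # flat vector: no improvement
print(theorem1_mu_star([1.2, 1, 1]))    # mild decay: strictly larger threshold
```

As noted in the text, this threshold is generally not optimal; it only certifies that some improvement over (2) exists whenever the vector is not flat.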

The direct part of Theorem 2 can be seen as a special case of Theorem 5 when $\epsilon = 0$ and $x_{\bar{Q}^\star} = 0$, and of Theorem 3 with $g = 0$. Hence, we focus on the latter proofs hereafter.

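Proof of Theorem 5: Assume that Oxx has selected a subset $Q$ of $g < k$ atoms in $Q^\star$ when $g$ iterations have been completed; we apply Lemma 4 to show that, under the hypotheses of Theorem 5, the next atom selected by Oxx belongs to $Q^\star$.

The first condition of Lemma 4, $\mu < \frac{1}{g}$, is always verified since $\mu < \frac{1}{k}$ by hypothesis and $g < k$. Let $j$ be the lowest index such that:

$$j \in \arg\max_{i \in Q^\star \setminus Q} |x_i|. \tag{32}$$

Clearly, (5) implies that $j \le g+1$.

Because the nonzero coefficients have been sorted in decreasing order, see (5), we have:

$$\|x_{Q^\star \setminus Q}\|_1 \le |x_j| + (k-g-1)|x_{j+1}|. \tag{33}$$

Hence,

$$(\alpha_g - \mu_g)|x_j| - 2\mu_g(k-g-1)|x_{j+1}| > 2\left(\epsilon + \|x_{\bar{Q}^\star}\|_1\right) \tag{34}$$

is a sufficient condition for (29). Since $\mu < \frac{1}{k}$ and $g < k$ by assumption, we have $\mu < \frac{1}{g+1}$ and we can exploit the expression (27) of $\mu_g$ in Lemma 3 to rewrite:

$$\alpha_g - \mu_g = \alpha_g\left(1 - \frac{\mu}{1-g\mu}\right) > 0. \tag{35}$$

It follows from (35) that

$$\frac{\mu_g}{\alpha_g - \mu_g} = \frac{\alpha_g}{\alpha_g - \mu_g} - 1 = \frac{1-g\mu}{1-(g+1)\mu} - 1 = \frac{\mu}{1-(g+1)\mu}.$$

Then, (34) can be rewritten as

$$|x_j| > \frac{2\mu(k-g-1)}{1-(g+1)\mu}\,|x_{j+1}| + \frac{2\left(\epsilon + \|x_{\bar{Q}^\star}\|_1\right)}{\alpha_g - \mu_g}. \tag{36}$$

We finally obtain condition (22) by noticing that:

• $j \le g+1$ and the function $i \mapsto \frac{2\mu(k-i)}{1-i\mu}$ is decreasing on $\{1, \dots, k\}$ for $\mu < \frac{1}{k}$;

• $\alpha_g$ (see (25)) and $1 - \frac{\mu}{1-g\mu}$ are both decreasing with $g$ and non-negative, hence $\frac{1}{\alpha_g - \mu_g}$ (see (35)) is increasing with $g$. It is upper bounded by $\frac{1}{\alpha_{k-1} - \mu_{k-1}}$, which is equal to $\gamma_k$ defined in (20).

We can thus conclude that (21)-(22) are sufficient conditions for (28)-(29) and the next atom selected by Oxx belongs to $Q^\star$ by virtue of Lemma 4.

This proof applies recursively to the iterations of Oxx for increasing values of $g$.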

The proofs of Theorems 3 and 4 follow a reasoning in the same vein as Theorem 5 but with some variations that we describe below.

Proof of Theorem 3 (First Part): By hypothesis, we assume that Oxx has selected atoms in $Q$ during the first $g$ iterations. We recursively show that, if (12) and (13) are satisfied, then Oxx keeps on picking atoms in $Q^\star$ as long as the $p$ largest elements of $Q^\star \setminus Q$ have not been selected.

Assume that after iteration , , has been completed, Oxx has selected with and (that is, some atoms in have not yet been selected by Oxx). We apply Lemma 4 (with subset , and with , ) to prove that the next atom selected by Oxx belongs to . The first condition in Lemma 4, , is verified since and .

Let be the lowest index such that

$$j \in \arg\max_{i \in Q^\star \setminus Q'} |x_i|. \tag{37}$$

Because of our assumption we necessarily have that , and then from convention (11), we must have . By the same arguments as those exposed in the proof of Theorem 5, we have that

$$|x_j| > \frac{2\mu(k-g-t-1)}{1-(g+t+1)\mu}\,|x_{j+1}| \tag{38}$$

is a sufficient condition for (29). Next, we have from the convention (11) that , and the function is decreasing on , hence on for . So,

$$|x_j| > \frac{2\mu(k-g-j)}{1-(g+j)\mu}\,|x_{j+1}| \tag{39}$$

is sufficient for (38) and then (29). Since , (39) holds by virtue of (13) and the next atom selected by Oxx belongs to .

Proof of Theorem 3 (Second Part): We show that (14) and (15) ensure that atoms in are selected, provided that the largest elements of have not been selected and less than iterations have been carried out. The proof follows the same lines as the proof of the first part with some modifications that we describe hereafter.

Assume that Oxx has selected with after iteration has been completed, with , and . We apply Lemma 4 (with subset , and