# On Probability of Support Recovery for Orthogonal Matching Pursuit Using Mutual Coherence

## Abstract

In this paper we present a new coherence-based performance guarantee for the Orthogonal Matching Pursuit (OMP) algorithm. A lower bound for the probability of correctly identifying the support of a sparse signal with additive white Gaussian noise is derived. Compared to previous work, the new bound takes into account the signal parameters such as dynamic range, noise variance, and sparsity. Numerical simulations show significant improvements over previous work and a closer match to empirically obtained results of the OMP algorithm.

Index Terms: Compressed Sensing (CS), Sparse Recovery, Orthogonal Matching Pursuit (OMP), Mutual Coherence

## I Introduction

Let $s \in \mathbb{R}^N$ be an unknown variable that we would like to estimate from the measurements

$$y = As + w, \tag{1}$$

where $A \in \mathbb{R}^{M \times N}$ is a deterministic matrix and $w \in \mathbb{R}^M$ is a noise vector, often assumed to be white Gaussian noise with mean zero and covariance $\sigma^2 I$, where $I$ is the identity matrix. The matrix $A$ is called a dictionary. We consider the case when $A$ is overcomplete, i.e. $N > M$, hence uniqueness of the solution of (1) cannot be guaranteed. However, if most elements of $s$ are zero, we can limit the space of possible solutions, or even obtain a unique one, by solving

$$\hat{s} = \arg\min_{x} \|x\|_0 \quad \text{s.t.} \quad \|y - Ax\|_2^2 \le \epsilon, \tag{2}$$

where $\epsilon$ is a constant related to $w$. The location of nonzero entries in $s$ is known as the support set, which we denote by $\Lambda$. In some applications, e.g. estimating the direction of arrival in antenna arrays [1], correctly identifying the support is more important than the accuracy of the values in $\hat{s}$. When the correct support is known, the solution of the least squares problem $\min_{x} \|y - A_\Lambda x\|_2^2$ gives $\hat{s}_\Lambda$, where $A_\Lambda$ is formed using the columns of $A$ indexed by $\Lambda$, see [2, 3].
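The least squares step on a known support can be sketched as follows. This is only an illustration: the function name `oracle_estimate`, the random toy dictionary, and the chosen dimensions are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def oracle_estimate(A, y, support):
    """Least squares estimate of s restricted to a known support."""
    support = np.asarray(support)
    # Solve min_x ||y - A_support x||_2 using only the selected columns.
    coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    s_hat = np.zeros(A.shape[1])
    s_hat[support] = coeffs
    return s_hat

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16))
A /= np.linalg.norm(A, axis=0)       # unit-norm columns (toy dictionary)
s = np.zeros(16)
s[[2, 9]] = [1.5, -0.7]
y = A @ s                            # noiseless measurements for the demo
s_hat = oracle_estimate(A, y, [2, 9])
```

In the noiseless demo the estimate coincides with `s`; with noise it is the oracle estimate whose MSE is bounded in [2, 3].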

Solving (2) is an NP-hard problem and several greedy algorithms have been proposed to compute an approximate solution of (2); a few examples include Matching Pursuit (MP) [4], Orthogonal Matching Pursuit (OMP) [5], Regularized-OMP (ROMP) [6], and Compressive Sampling Matching Pursuit (CoSaMP) [7]. In contrast to greedy methods, convex relaxation algorithms [8, 9, 10, 11] replace the $\ell_0$ pseudo-norm in (2) with an $\ell_1$ norm, leading to a convex optimization problem known as the Basis Pursuit (BP) problem [12]. While convex relaxation methods require weaker conditions for exact recovery [13, 2], they are computationally more expensive than greedy methods, especially when  [14, 15, 7].
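For readers unfamiliar with OMP, a minimal sketch of the textbook algorithm is given here. It is a generic version run for a fixed number of iterations, not necessarily the exact variant or stopping rule used in the paper; the function name `omp` and the toy $[I, H]$ dictionary with $M = 16$ are assumptions for the example.

```python
import numpy as np

def omp(A, y, n_iter):
    """Greedy OMP sketch: returns (estimate, sorted support) after n_iter steps."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(n_iter):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0      # never reselect an atom
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected coefficients by least squares, then update the residual.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    s_hat = np.zeros(A.shape[1])
    s_hat[support] = coeffs
    return s_hat, sorted(support)

# Toy dictionary [I, H] with a normalized 16 x 16 Hadamard block (coherence 1/4).
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H4 = np.kron(H2, H2)
A = np.hstack([np.eye(16), np.kron(H4, H4) / 4.0])
s = np.zeros(32)
s[[3, 20]] = [1.0, -2.0]
s_hat, support = omp(A, A @ s, n_iter=2)     # noiseless, two nonzeros
```

The least squares re-fit over all selected atoms is what distinguishes OMP from plain Matching Pursuit, which only updates the coefficient of the newly selected atom.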

The most important aspect of a sparse recovery algorithm is the uniqueness of the obtained solution. Mutual Coherence (MC) [16], cumulative coherence [13], the spark [17], Exact Recovery Coefficient (ERC) [18], and Restricted Isometry Constant (RIC) [19] are metrics proposed to evaluate the suitability of a dictionary for exact recovery. Among these metrics, RIC, spark, and ERC achieve better performance guarantees; however, computing RIC and the spark is in general NP-hard and calculating ERC is a combinatorial problem. In contrast, MC can be efficiently computed and has shown to provide acceptable performance guarantees [20, 21, 22, 2, 3].

In this paper, we derive a new lower bound for the probability of correctly identifying the support of a sparse signal using the OMP algorithm. Our main motivation is that previous methods do not directly take into account signal parameters such as dynamic range, sparsity, and the noise characteristics in the computed probability. We will elaborate on this in section II, where we discuss the most recent theoretical analysis for OMP based on MC. The main result of the paper will be presented in section III, followed by numerical evaluation of the new performance guarantee in section IV.

## II Motivation

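The mutual coherence of a dictionary $A$, denoted $\mu_{\max}(A)$, is the maximum absolute cross correlation of its columns [16]:

$$\mu_{i,j}(A) = \langle A_i, A_j \rangle, \tag{3}$$

$$\mu_{\max}(A) = \max_{1 \le i \ne j \le N} |\mu_{i,j}(A)|, \tag{4}$$

where we have assumed, as with the rest of the paper, that $\|A_i\|_2 = 1$, $\forall i \in \{1, \dots, N\}$. Apart from MC and sparsity,

$$s_{\min} = \min(|s_i|), \quad \text{and} \quad s_{\max} = \max(|s_i|), \quad \forall i \in \Lambda, \tag{5}$$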

which define the dynamic range of the signal, also affect the performance of OMP. The following theorem establishes an important coherence-based performance guarantee for OMP.

###### Theorem 1 (Ben-Haim et al. [3]).

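Let $y = As + w$, where $A \in \mathbb{R}^{M \times N}$, $\tau = \|s\|_0$, and $w \sim \mathcal{N}(0, \sigma^2 I)$. If

$$s_{\min} - (2\tau - 1)\mu_{\max} s_{\min} \ge 2\beta, \tag{6}$$

where $\beta = \sigma\sqrt{2(1+\alpha)\log N}$ is defined for some constant $\alpha > 0$, then with probability at least

$$1 - \frac{1}{N^{\alpha}\sqrt{\pi(1+\alpha)\log N}}, \tag{7}$$

OMP identifies the true support, denoted $\Lambda$.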

The proof involves analyzing the probability of the event $\{|\langle A_j, w\rangle| \le \beta\}$, for some constant $\beta$ and for all $j \in \{1, \dots, N\}$ (see [3] for details). Ben-Haim et al. show that, with a probability lower bounded by (7), the inequality $|\langle A_j, w\rangle| \le \beta$ holds. It is then shown that if $|\langle A_j, w\rangle| \le \beta$ and (6) hold, then OMP identifies the correct support in each iteration. Moreover, it is assumed that the elements of the sparse vector $s$ are deterministic variables. Hence a strong condition such as (6) is required to determine if the support of $s$ can be recovered.
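Theorem 1 can be evaluated numerically with a few lines of code. The sketch below assumes the definition $\beta = \sigma\sqrt{2(1+\alpha)\log N}$ used in [3] and, following the convention adopted later in section IV, returns zero when condition (6) fails. The function name `theorem1_bound` and the parameter values are illustrative assumptions.

```python
import math

def theorem1_bound(N, tau, mu_max, s_min, sigma, alpha):
    """Return the bound (7) if condition (6) holds, else 0.0."""
    # Assumed definition of beta, taken from [3].
    beta = sigma * math.sqrt(2.0 * (1.0 + alpha) * math.log(N))
    # Condition (6): s_min - (2 tau - 1) mu_max s_min >= 2 beta.
    if s_min - (2 * tau - 1) * mu_max * s_min < 2.0 * beta:
        return 0.0
    return 1.0 - 1.0 / (N**alpha * math.sqrt(math.pi * (1.0 + alpha) * math.log(N)))

mu = 1.0 / math.sqrt(512)            # hypothetical coherence value for the demo
p_ok = theorem1_bound(N=1024, tau=3, mu_max=mu, s_min=1.0, sigma=0.01, alpha=1.0)
p_fail = theorem1_bound(N=1024, tau=20, mu_max=mu, s_min=1.0, sigma=0.01, alpha=1.0)
```

With these toy values the bound is close to one for `tau=3`, while for `tau=20` the factor $(2\tau - 1)\mu_{\max}$ exceeds one, so (6) cannot hold and the function returns zero regardless of the noise level. This is the step-function behaviour discussed in section IV.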

Our analysis removes the condition stated in (6) and introduces a probabilistic bound that depends on $N$, $\tau$, $\mu_{\max}$, $s_{\min}$, $s_{\max}$, and the signal noise. Hence we derive a probability bound that directly takes into account signal parameters and MC. Moreover, unlike [3], we assume that the nonzero elements of $s$ are centered independent random variables with arbitrary distributions. This enables the derivation of a more accurate bound for the probability of exact support recovery.

## III OMP Convergence Analysis

In this section we present and prove the main result of the paper. Numerical results will be presented in section IV.

###### Theorem 2.

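Let $y = As + w$, where $A \in \mathbb{R}^{M \times N}$, $\tau = \|s\|_0$, and $w \sim \mathcal{N}(0, \sigma^2 I)$. Moreover, assume that the nonzero elements of $s$ are independent centered random variables with arbitrary distributions. Let $\lambda = \Pr\{|\langle A_j, w\rangle| \le \beta\}$, for some constant $\beta \ge 0$ and $\forall j \in \{1, \dots, N\}$. If $s_{\min}/2 \ge \beta$, then OMP identifies the true support with lower bound probability

$$\lambda\left(1 - 2N\exp\left(\frac{-N(s_{\min}/2 - \beta)^2}{2\tau^2\gamma^2 + 2N\gamma(s_{\min}/2 - \beta)/3}\right)\right), \tag{8}$$

where $\gamma = \mu_{\max}s_{\max}$. Moreover, $\lambda$ is lower bounded by

$$1 - N\sqrt{\frac{2}{\pi}}\,\frac{\sigma}{\beta}\,e^{-\beta^2/2\sigma^2}. \tag{9}$$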

Before presenting the proof, let us compare Theorems 1 and 2 analytically. It is important to note that (9) is indeed equivalent to (7). The apparent difference is only due to the use of $\alpha$ or $\beta$ from the definition $\beta = \sigma\sqrt{2(1+\alpha)\log N}$. For instance, substituting this definition of $\beta$ into (9) leads to (7). As a result, the second term of (8) can be interpreted as a probabilistic representation of the condition imposed by (6) in Theorem 1. Moreover, because (9) is equal to (7) and the second term of (8) is in the range $[0, 1]$, (8) is always smaller than or equal to (7). However, as will be seen in section IV, since the condition of Theorem 1 in (6) is not satisfied in many scenarios, our results match the empirical results more closely. Evidently, the condition in Theorem 2 is more relaxed compared to (6). Our numerical results in section IV also verify this fact.
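The equivalence of (9) and (7), and the evaluation of (8), can be checked numerically. The sketch below assumes $\gamma = \mu_{\max}s_{\max}$, the definition $\beta = \sigma\sqrt{2(1+\alpha)\log N}$, and that the bound is set to zero when $s_{\min}/2 < \beta$. Clamping each factor at zero is a choice made for this example only; the function names and parameter values are likewise illustrative.

```python
import math

def lambda_lower_bound(N, sigma, beta):
    """Lower bound (9) on lambda."""
    return 1.0 - N * math.sqrt(2.0 / math.pi) * (sigma / beta) * math.exp(-beta**2 / (2.0 * sigma**2))

def theorem2_bound(N, tau, mu_max, s_min, s_max, sigma, beta):
    """Bound (8) with lambda replaced by its lower bound (9)."""
    rho = s_min / 2.0 - beta
    if rho < 0.0:                       # assumed condition of Theorem 2 violated
        return 0.0
    gamma = mu_max * s_max              # assumed definition of gamma
    tail = 2.0 * N * math.exp(-N * rho**2 / (2.0 * tau**2 * gamma**2 + 2.0 * N * gamma * rho / 3.0))
    # Clamp each factor at zero so the product stays a valid probability bound.
    return max(0.0, lambda_lower_bound(N, sigma, beta)) * max(0.0, 1.0 - tail)

N, sigma, alpha = 1024, 0.01, 1.0
beta = sigma * math.sqrt(2.0 * (1.0 + alpha) * math.log(N))
lam = lambda_lower_bound(N, sigma, beta)
thm1 = 1.0 - 1.0 / (N**alpha * math.sqrt(math.pi * (1.0 + alpha) * math.log(N)))
p2 = theorem2_bound(N, tau=3, mu_max=1.0 / math.sqrt(512), s_min=1.0, s_max=1.0, sigma=sigma, beta=beta)
```

Here `lam` and `thm1` agree to floating-point precision, and `p2` does not exceed `lam`, in line with the comparison above.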

The following lemma will provide us with the necessary tool for the proof of Theorem 2. The proof of the lemma is postponed to the Appendix.

###### Lemma 1.

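Define $\Gamma_j = |\langle A_j, As + w\rangle|$, for any $j \in \{1, \dots, N\}$, where $A$, $s$, and $w$ are as in Theorem 2. Then for some constant $\xi$, and assuming $|\langle A_j, w\rangle| \le \beta$, we have

$$\Pr\{\Gamma_j \ge \xi\} \le 2\exp\left(\frac{-(\xi - \beta)^2}{2\left(N\nu + c(\xi - \beta)/3\right)}\right), \tag{10}$$

where

$$|\mu_{j,n}s_n| \le c, \quad E\{\mu_{j,n}^2 s_n^2\} \le \nu, \quad \forall n \in \{1, \dots, N\}. \tag{11}$$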

We can now state the proof of Theorem 2.

###### Proof of Theorem 2.

It was shown in [3] that OMP identifies the true support if

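$$\min_{j \in \Lambda} |\langle A_j, A_\Lambda s_\Lambda + w\rangle| \ge \max_{k \notin \Lambda} |\langle A_k, A_\Lambda s_\Lambda + w\rangle|. \tag{12}$$

The term on the left-hand side of (12) can be rewritten as

$$\min_{j \in \Lambda} |\langle A_j, A_\Lambda s_\Lambda + w\rangle| = \min_{j \in \Lambda} \left| s_j + \langle A_j, A_{\Lambda \setminus \{j\}} s_{\Lambda \setminus \{j\}} + w\rangle \right| \tag{13}$$

$$\ge \min_{j \in \Lambda} |s_j| - \max_{j \in \Lambda} \left| \langle A_j, A_{\Lambda \setminus \{j\}} s_{\Lambda \setminus \{j\}} + w\rangle \right|. \tag{14}$$

From (12) and (14), we can see that the OMP algorithm identifies the true support if

$$\begin{cases} \max_{k \notin \Lambda} \{\Gamma_k\} < \dfrac{s_{\min}}{2}, \\[2mm] \max_{j \in \Lambda} \left| \langle A_j, A_{\Lambda \setminus \{j\}} s_{\Lambda \setminus \{j\}} + w\rangle \right| < \dfrac{s_{\min}}{2}. \end{cases} \tag{15}$$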

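Using (15), we can upper bound the probability of error as

$$\Pr\{\text{error}\} \le \Pr\left\{ \max_{j \in \Lambda} \left| \langle A_j, A_{\Lambda \setminus \{j\}} s_{\Lambda \setminus \{j\}} + w\rangle \right| \ge \frac{s_{\min}}{2} \right\} + \Pr\left\{ \max_{k \notin \Lambda} \{\Gamma_k\} \ge \frac{s_{\min}}{2} \right\} \tag{16}$$

$$\le \sum_{j \in \Lambda} \Pr\left\{ \left| \langle A_j, A_{\Lambda \setminus \{j\}} s_{\Lambda \setminus \{j\}} + w\rangle \right| \ge \frac{s_{\min}}{2} \right\} + \sum_{k \notin \Lambda} \Pr\left\{ \Gamma_k \ge \frac{s_{\min}}{2} \right\}. \tag{17}$$

For the first term on the right-hand side of (17), excluding the summation over the indices in $\Lambda$, from Lemma 1 we have

$$\Pr_{j \in \Lambda}\left\{ \left| \langle A_j, A_{\Lambda \setminus \{j\}} s_{\Lambda \setminus \{j\}} + w\rangle \right| \ge \frac{s_{\min}}{2} \right\} \le \underbrace{2\exp\left(\frac{-\rho^2}{2\left((\tau - 1)\nu + c\rho/3\right)}\right)}_{P_1}, \tag{18}$$

where $\rho = s_{\min}/2 - \beta$ is defined for notational brevity. Note that the dictionary in (18) is supported on $\Lambda \setminus \{j\}$, i.e. all the indices in the true support excluding $j$. Therefore the term $\tau - 1$, instead of $N$, appears in the denominator of (18). Similarly, for the second term of (17) we have

$$\Pr_{k \notin \Lambda}\left\{ \Gamma_k \ge \frac{s_{\min}}{2} \right\} \le \underbrace{2\exp\left(\frac{-\rho^2}{2\left(\tau\nu + c\rho/3\right)}\right)}_{P_2}. \tag{19}$$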

Substituting (18) and (19) into (17) yields

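$$\Pr\{\text{error}\} \le \tau P_1 + (N - \tau)P_2 \le N P_2, \tag{20}$$

where the last inequality follows since $P_1 \le P_2$.

Moreover, for the upper bounds $c$ and $\nu$ in (11) we have

$$|\mu_{j,n}s_n| \le \mu_{\max}s_{\max}, \tag{21}$$

$$E\{\mu_{j,n}^2 s_n^2\} \le \frac{1}{N}\sum_{n=1}^{N} \mu_{\max}^2 E\{s_n^2\} \le \frac{\tau}{N}s_{\max}^2\mu_{\max}^2. \tag{22}$$

Combining (21) and (22) with (20), the following is obtained

$$\Pr\{\text{error}\} \le 2N\exp\left(\frac{-N\rho^2}{2\tau^2\gamma^2 + 2N\gamma\rho/3}\right), \tag{23}$$

where we have defined $\gamma = \mu_{\max}s_{\max}$ for notational brevity.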
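To make the substitution behind (23) explicit, here is a worked version of the step. It assumes $\rho = s_{\min}/2 - \beta$ and $\gamma = \mu_{\max}s_{\max}$, which is what a comparison of (8) with (23) suggests. Taking $c = \gamma$ from (21) and $\nu = \tau\gamma^2/N$ from (22), the bound $NP_2$ in (20) becomes

$$
N P_2 = 2N\exp\left(\frac{-\rho^2}{2\left(\tau \cdot \dfrac{\tau\gamma^2}{N} + \dfrac{\gamma\rho}{3}\right)}\right)
= 2N\exp\left(\frac{-N\rho^2}{2\tau^2\gamma^2 + 2N\gamma\rho/3}\right),
$$

where the second equality follows from multiplying the numerator and denominator of the exponent by $N$.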

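So far we have assumed that $|\langle A_j, w\rangle| \le \beta$, $\forall j \in \{1, \dots, N\}$. Therefore, the probability of success is the joint probability of $|\langle A_j, w\rangle| \le \beta$ and the complement of the error event in (23). For the former, a lower bound was formulated in [3] as follows

$$\Pr\{|\langle A_j, w\rangle| \le \beta\} \ge 1 - \underbrace{\sqrt{\frac{2}{\pi}}\,\frac{\sigma}{\beta}\,e^{-\beta^2/2\sigma^2}}_{P_3}. \tag{24}$$

Since $|\langle A_j, w\rangle| \le \beta$ should hold $\forall j \in \{1, \dots, N\}$, we have

$$\Pr_{j = 1, \dots, N}\{|\langle A_j, w\rangle| \le \beta\} \ge (1 - P_3)^N \ge 1 - N P_3. \tag{25}$$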

Inverting the probability event in (23) and multiplying by the lower bound in (25) yields (8), which completes our proof. ∎

## IV Numerical Results

In this section we compare numerical results of Theorem 1 (Ben-Haim et al. [3]), and Theorem 2 (proposed herein) with the empirical results of OMP. Indeed we only consider probability of successful recovery of the support. An upper bound for the MSE of the oracle estimator has been previously established, see e.g. Theorem 5.1 in [2] or Lemma 4 in [3]. The oracle estimator knows the support of the signal, a priori.

All the empirical results are obtained by repeatedly running the OMP algorithm, using a random sparse signal with additive white Gaussian noise in each trial. The probability of success is computed as the ratio of successful trials to the total number of trials; note that a trial is successful if $\hat{\Lambda} = \Lambda$, where $\hat{\Lambda}$ is the support of $\hat{s}$ obtained from OMP by solving (2). Moreover, the number of trials was empirically set such that the probability of success for the OMP algorithm was stable across different parameters. For comparison, we use the dictionary of [3] defined as $A = [I, H]$, where $I$ is an identity matrix and $H$ is a Hadamard matrix, hence we have $N = 2M$.
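An identity-plus-Hadamard dictionary of this kind, and its mutual coherence, can be built as in the sketch below. It assumes that the dictionary is the concatenation $[I, H]$ with the Hadamard block scaled to unit-norm columns and that $M$ is a power of two. The helper names and the choice $M = 64$ are illustrative, not values from the paper.

```python
import numpy as np

def hadamard(M):
    """Sylvester construction of an M x M Hadamard matrix (M a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < M:
        H = np.block([[H, H], [H, -H]])
    return H

def mutual_coherence(A):
    """Largest absolute inner product between distinct unit-norm columns, as in (4)."""
    G = np.abs(A.T @ A)
    np.fill_diagonal(G, 0.0)           # ignore the i = j terms
    return G.max()

M = 64
A = np.hstack([np.eye(M), hadamard(M) / np.sqrt(M)])   # N = 2M columns, all unit norm
mu_max = mutual_coherence(A)
```

For this construction the coherence equals $1/\sqrt{M}$, since columns within each block are orthogonal and every inner product across the two blocks has magnitude $1/\sqrt{M}$.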

The sparse signal in each trial, denoted $s$ in (1), is constructed as follows: The support of the sparse signal, $\Lambda$, is constructed by uniform random permutation of the set $\{1, \dots, N\}$ and taking the first $\tau$ indices. The nonzero elements located at $\Lambda$ are drawn randomly from a uniform distribution on the interval $[s_{\min}, s_{\max}]$, multiplied randomly by $+1$ or $-1$. Once the sparse signal is constructed, the input of the OMP algorithm, $y$, is obtained by evaluating (1).
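The signal construction described above can be sketched as follows. The function name `random_sparse_signal`, the Gaussian stand-in dictionary, and all numeric values are assumptions for the example; the paper's experiments use the dictionary of [3] instead.

```python
import numpy as np

def random_sparse_signal(N, tau, s_min, s_max, rng):
    """Draw a tau-sparse vector with magnitudes in [s_min, s_max] and random signs."""
    support = rng.permutation(N)[:tau]                 # first tau indices of a random permutation
    magnitudes = rng.uniform(s_min, s_max, size=tau)
    signs = rng.choice([-1.0, 1.0], size=tau)
    s = np.zeros(N)
    s[support] = signs * magnitudes
    return s, np.sort(support)

rng = np.random.default_rng(1)
M, N, tau, sigma = 32, 64, 4, 0.05                     # hypothetical experiment sizes
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)                         # stand-in dictionary with unit-norm columns
s, support = random_sparse_signal(N, tau, s_min=0.5, s_max=1.0, rng=rng)
y = A @ s + sigma * rng.standard_normal(M)             # measurements as in (1)
```

A trial would then run OMP on `y` and compare the recovered support with `support`.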

In order to facilitate the comparison of Theorems 1 and 2, we need to fix the value of $\beta$. To do this, we empirically calculate $\beta$ as $\max_{j} |\langle A_j, w\rangle|$, where the maximum is computed over a set of noise vectors $w \sim \mathcal{N}(0, \sigma^2 I)$, as assumed by both theorems. Given $\beta$, we can calculate $\alpha$ for Theorem 1 from the definition $\beta = \sigma\sqrt{2(1+\alpha)\log N}$. Indeed, a lower value of $\beta$ leads to better results for both theorems, see (8) and (6). As a result, here we consider the worst-case scenario. When (6) is not satisfied for Theorem 1, we set the probability of success to zero. We use the same procedure for the condition of Theorem 2; i.e. the probability of success is set to zero when $s_{\min}/2 < \beta$.
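The empirical choice of $\beta$ and the matching $\alpha$ could be computed along these lines. The sketch assumes that $\beta$ is the largest value of $|\langle A_j, w\rangle|$ observed over a batch of Gaussian noise draws and that $\alpha$ is obtained by inverting $\beta = \sigma\sqrt{2(1+\alpha)\log N}$. The number of draws (`n_draws`), the stand-in dictionary, and the function names are hypothetical.

```python
import numpy as np

def empirical_beta(A, sigma, n_draws, rng):
    """Largest |<A_j, w>| observed over n_draws noise vectors w ~ N(0, sigma^2 I)."""
    W = sigma * rng.standard_normal((A.shape[0], n_draws))
    return np.abs(A.T @ W).max()       # maximum over all columns j and all draws

def alpha_from_beta(beta, sigma, N):
    """Invert beta = sigma * sqrt(2 (1 + alpha) log N) for alpha."""
    return beta**2 / (2.0 * sigma**2 * np.log(N)) - 1.0

rng = np.random.default_rng(0)
M, N, sigma = 32, 64, 0.1
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)         # stand-in dictionary with unit-norm columns
beta = empirical_beta(A, sigma, n_draws=200, rng=rng)
alpha = alpha_from_beta(beta, sigma, N)
```

Taking the maximum over many draws yields a large $\beta$, which is the worst case for both bounds.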

Numerical results are summarized in Figure 1. We analyze the effect of sparsity on the probability of successful support recovery in plots 1(a), 1(b), and 1(c). Three signal dimensionalities and three noise variances: , , and , are considered. For all these cases we set and . In Fig. 1(a) we see that Theorem 1 achieves a higher probability for and small values of $\tau$, while Theorem 2 leads to more accurate results for larger values of $\tau$. Additionally, for and , Theorem 2 is much closer to empirical results. Most importantly, the shape of the probability curves for Theorem 2 matches the empirical curves. In contrast, Theorem 1 produces a step function because condition (6) is not satisfied for a large range of values of $\tau$, even though the success probability in (7) is close to one for different values of . The condition of Theorem 2 is satisfied across all the parameters for figures 1(a)-1(c).

We discussed in section III that (8) is always smaller than or equal to (7) due to the second term of (8). We expect this term to become more accurate as the signal dimensionality grows since it is exponential in $N$; moreover, and become smaller as $N$ grows. This is confirmed in figures 1(b) and 1(c). As we increase $N$, the gap between Theorems 1 and 2 increases, confirming that the second term of (8) is becoming more accurate compared to (6). The empirical probability is close to one for all the values of $\tau$ plotted in figures 1(b) and 1(c).

The effect of on the probability of success is demonstrated in figures 1(d), 1(e), and 1(f). For each plot, we consider , , and , while setting . The empirical results show a probability of success close to one across the parameters considered. In Fig. 1(d) we see a significant difference between Theorems 1 and 2. The condition of Theorem 1 is not satisfied for any value of and . In contrast, Theorem 2 shows high probabilities for all three values of . The dynamic range (DR) of the signal can be defined as $s_{\max}/s_{\min}$. As we increase the signal dimensionality ($N$), Theorem 2 reports larger probability for larger values of DR and all three values of . On the other hand, the condition of Theorem 1 fails for and , even when we have . For , Theorem 1 can produce valid results for a slightly higher DR.

Lastly, in plots 1(g), 1(h), and 1(i), we analyze the effect of noise variance on the probability of success for , , and . In Fig. 1(g), where , both theorems fail to produce valid results for . However, Theorem 2 reports acceptable results for and , while the condition of Theorem 1 is not satisfied. As the signal dimensionality grows, see Fig. 1(h) and 1(i), Theorem 2 becomes more tolerant of higher noise variances. The results for Theorem 1 also improve with increasing signal dimensionality, however only for . This shows the robustness of Theorem 2 to larger values of sparsity.

## V Conclusions

We presented a new bound for the probability of correctly identifying the support of a noisy sparse signal using the OMP algorithm. Compared to the analysis of Ben-Haim et al. [3], our analysis replaces a sharp condition with a probabilistic bound. Comparisons with empirical results obtained by OMP show a much closer agreement than previous work.

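###### Proof of Lemma 1.

Expanding $\Gamma_j$, we can show that

$$\Gamma_j = \left| \sum_{m=1}^{M} A_{m,j}\left( \sum_{n=1}^{N} A_{m,n}s_n + w_m \right) \right| \tag{26}$$

$$= \left| \sum_{n=1}^{N} s_n \sum_{m=1}^{M} A_{m,j}A_{m,n} + \sum_{m=1}^{M} A_{m,j}w_m \right| \tag{27}$$

$$= \left| \sum_{n=1}^{N} \left\{ \mu_{j,n}s_n + \frac{1}{N}\langle A_j, w\rangle \right\} \right|. \tag{28}$$

We are interested in tail bounds for the sum of random variables $\mu_{j,n}s_n$, for $n \in \{1, \dots, N\}$. Let us define $x_n = \mu_{j,n}s_n$. Using the assumption $|\langle A_j, w\rangle| \le \beta$ we have

$$\Pr\{\Gamma_j \ge \xi\} \le \Pr\left\{ \left| \sum_{n=1}^{N} x_n \right| + \left| \frac{1}{N}\sum_{n=1}^{N} \langle A_j, w\rangle \right| \ge \xi \right\} \le \Pr\left\{ \left| \sum_{n=1}^{N} x_n \right| \ge \xi - \beta \right\}. \tag{29}$$

Since $s_n$, and hence $x_n$, are centered independent real random variables, according to Bernstein's inequality [23], if $|x_n| \le c$ and $E\{x_n^2\} \le \nu$, then for a positive constant $\delta$ we have

$$\Pr\left\{ \left| \sum_{n=1}^{N} x_n \right| \ge \delta \right\} \le 2\exp\left(\frac{-\delta^2}{2\left(\sum_{n=1}^{N} E\{x_n^2\} + c\delta/3\right)}\right) \le 2\exp\left(\frac{-\delta^2}{2\left(N\nu + c\delta/3\right)}\right). \tag{30}$$

Setting $\delta = \xi - \beta$ in (30) completes the proof. ∎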
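As a sanity check of the Bernstein-type tail bound used in the proof, the following Monte Carlo sketch compares an empirical tail probability with the bound for bounded, centered, independent variables. The uniform distribution, the constants, and the function name `bernstein_bound` are assumptions chosen for the illustration; the lemma itself allows arbitrary centered distributions.

```python
import numpy as np

def bernstein_bound(delta, second_moments, c):
    """Two-sided Bernstein bound on Pr{|sum x_n| >= delta} for centered |x_n| <= c."""
    return 2.0 * np.exp(-delta**2 / (2.0 * (np.sum(second_moments) + c * delta / 3.0)))

rng = np.random.default_rng(0)
N, c, delta, trials = 50, 0.1, 0.8, 20000
x = rng.uniform(-c, c, size=(trials, N))        # centered, bounded by c, variance c^2 / 3
empirical = np.mean(np.abs(x.sum(axis=1)) >= delta)
bound = bernstein_bound(delta, np.full(N, c**2 / 3.0), c)
```

The empirical frequency stays below the bound, which is loose here because Bernstein's inequality holds for every bounded centered distribution with the given second moments.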

### References

1. D. Malioutov, M. Cetin, and A. Willsky, “A sparse signal reconstruction perspective for source localization with sensor arrays,” Signal Processing, IEEE Transactions on, vol. 53, no. 8, pp. 3010–3022, Aug 2005.
2. D. Donoho, M. Elad, and V. Temlyakov, “Stable recovery of sparse overcomplete representations in the presence of noise,” Information Theory, IEEE Transactions on, vol. 52, no. 1, pp. 6–18, Jan 2006.
3. Z. Ben-Haim, Y. Eldar, and M. Elad, “Coherence-based performance guarantees for estimating a sparse vector under random noise,” Signal Processing, IEEE Transactions on, vol. 58, no. 10, pp. 5030–5043, Oct 2010.
4. S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” Signal Processing, IEEE Transactions on, vol. 41, no. 12, pp. 3397–3415, Dec 1993.
5. Y. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in Conference Record of The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, Nov 1993, pp. 40–44 vol.1.
6. D. Needell and R. Vershynin, “Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit,” Selected Topics in Signal Processing, IEEE Journal of, vol. 4, no. 2, pp. 310–316, April 2010.
7. D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.
8. S. Wright, R. Nowak, and M. Figueiredo, “Sparse reconstruction by separable approximation,” Signal Processing, IEEE Transactions on, vol. 57, no. 7, pp. 2479–2493, July 2009.
9. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression,” Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.
10. M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586–597, Dec 2007.
11. E. van den Berg and M. P. Friedlander, “Probing the pareto frontier for basis pursuit solutions,” SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2009.
12. S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, pp. 33–61, 1998.
13. J. Tropp, “Just relax: convex programming methods for identifying sparse signals in noise,” Information Theory, IEEE Transactions on, vol. 52, no. 3, pp. 1030–1051, March 2006.
14. S. H. Hsieh, C. S. Lu, and S. C. Pei, “Fast omp: Reformulating omp via iteratively refining l2-norm solutions,” in 2012 IEEE Statistical Signal Processing Workshop (SSP), Aug 2012, pp. 189–192.
15. F. Marvasti, A. Amini, F. Haddadi, M. Soltanolkotabi, B. H. Khalaj, A. Aldroubi, S. Sanei, and J. Chambers, “A unified approach to sparse signal processing,” EURASIP Journal on Advances in Signal Processing, vol. 2012, p. 44, 2012.
16. D. Donoho and X. Huo, “Uncertainty principles and ideal atomic decomposition,” Information Theory, IEEE Transactions on, vol. 47, no. 7, pp. 2845–2862, Nov 2001.
17. D. L. Donoho and M. Elad, “Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization,” Proceedings of the National Academy of Sciences, vol. 100, no. 5, pp. 2197–2202, 2003.
18. J. Tropp, “Greed is good: algorithmic results for sparse approximation,” Information Theory, IEEE Transactions on, vol. 50, no. 10, pp. 2231–2242, Oct 2004.
19. E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
20. P. L. Dragotti and Y. M. Lu, “On sparse representation in fourier and local bases,” IEEE Transactions on Information Theory, vol. 60, no. 12, pp. 7888–7899, Dec 2014.
21. C. Herzet, C. Soussen, J. Idier, and R. Gribonval, “Exact recovery conditions for sparse representations with partial support information,” IEEE Transactions on Information Theory, vol. 59, no. 11, pp. 7509–7524, Nov 2013.
22. T. Cai, L. Wang, and G. Xu, “Stable recovery of sparse signals and an oracle inequality,” Information Theory, IEEE Transactions on, vol. 56, no. 7, pp. 3516–3522, July 2010.
23. G. Bennett, “Probability inequalities for the sum of independent random variables,” Journal of the American Statistical Association, vol. 57, no. 297, pp. 33–45, 1962.