# Generalized Residual Ratio Thresholding

## Abstract

Simultaneous orthogonal matching pursuit (SOMP) and block OMP (BOMP) are two widely used techniques for sparse support recovery in multiple measurement vector (MMV) and block sparse (BS) models, respectively. For optimal performance, both SOMP and BOMP require a priori knowledge of signal sparsity or noise variance. However, sparsity and noise variance are unavailable in most practical applications. This letter presents a novel technique called generalized residual ratio thresholding (GRRT) for operating SOMP and BOMP without a priori knowledge of signal sparsity and noise variance, and derives finite sample and finite signal to noise ratio (SNR) guarantees for exact support recovery. Numerical simulations indicate that GRRT performs similarly to BOMP and SOMP with a priori knowledge of signal and noise statistics.

Index Terms: Compressive sensing, sparse recovery, orthogonal matching pursuit, block sparsity.

## I Introduction

This letter considers the recovery¹ of high dimensional structured sparse signals from low dimensional linear measurements, a problem relevant in many signal processing and machine learning applications [1, 2, 3, 4]. Two structured sparse recovery scenarios are considered: a) the multiple measurement vector (MMV) model and b) the block sparse (BS) model. In MMV, we consider a linear model given by

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{W}, \tag{1}$$

where $\mathbf{Y}$ is a matrix of noisy observations, $\mathbf{X}$ is a fully known over-complete/under-determined design matrix in which the number of measurements is far smaller than the number of covariates/features (i.e., ), and $\mathbf{W}$ represents an error/noise matrix whose entries are assumed to be independent and identically distributed (i.i.d.) as . In BS, we consider a regression model with . However, the entries of are divided into non-overlapping blocks of equal size such that the entries in each block are zero or nonzero simultaneously. The block contains the entries in indexed by . We consider the case of sparse which means that the support of in the MMV scenario given by satisfies and the block support in the BS scenario given by satisfies . The actual support in the BS model is given by . One can consider the MMV model as a BS model with , and .

A number of algorithms including versions of orthogonal matching pursuit (OMP) like simultaneous OMP (SOMP) [5, 6, 7, 8, 9] and block OMP (BOMP) [10, 11, 12, 13], minimization [14, 15, 16], sparse Bayesian learning (SBL) [17, 18] and sparse iterative covariance estimation (SPICE) [19, 20] have been proposed to solve MMV and BS problems. Algorithms based on OMP have received special attention in the literature mainly because of their computational simplicity and analytical tractability. However, a recurring problem with BOMP, SOMP etc. (similar to many other signal processing problems like [21, 22]) is the requirement of a priori knowledge of signal sparsity ( or ) or ambient noise variance . Both sparsity and noise variance are rarely known a priori in practical applications. Recently, a technique called residual ratio thresholding (RRT) was shown to operate OMP for unstructured sparse recovery in single measurement vector (SMV) models (i.e., and ) with finite sample performance guarantees [23, 24, 25]. This concept was also applied recently to solve sparse robust regression problems [26]. Nevertheless, RRT in its current form is not directly applicable to structured sparse recovery problems like BS or MMV.

This article proposes a generalized version of RRT called GRRT which can operate SOMP and BOMP in a signal and noise statistics agnostic fashion with finite sample support recovery guarantees. Existing RRT based formulations [23, 25, 26] can be expressed as special cases of GRRT. Both numerical simulations and analytical results indicate that operating SOMP and BOMP using GRRT requires only slightly higher signal to noise ratio (SNR) compared to SOMP and BOMP with a priori knowledge of , or . To the best of our knowledge, these are the first schemes for the signal and noise statistics oblivious operation of SOMP and BOMP with finite sample and finite SNR performance guarantees.

## II SOMP and BOMP algorithms

Both SOMP and BOMP generate a support estimate sequence indexed by satisfying the following properties P1)-P2).
P1). for all and .
P2). Set difference satisfies . ( for SOMP).

The set difference which represents the new indices added to the previous support estimate is generated as follows. For each , let denote the projection matrix onto the subspace spanned by and let denote the residual corresponding to the support . For MMV, is given by . For BS, , where . For both MMV and BS, the choice of norm is the most popular. The final support estimate is given by , where is determined by a user specified stopping condition.
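The iteration described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the usual correlation-maximization rule with the ℓ2 (Frobenius) norm, equal-size contiguous blocks, and a fixed number of iterations. The function name `greedy_block_path` and its arguments are hypothetical. Setting `block_len=1` with a matrix `Y` gives a SOMP-style run, while a single-column `Y` with `block_len > 1` gives a BOMP-style run.

```python
import numpy as np

def greedy_block_path(X, Y, block_len, k_max):
    """Run k_max greedy block selections.

    Returns the list of block supports (one per iteration) and the residual
    Frobenius norms for k = 0, ..., k_max.
    """
    n, p = X.shape
    Y = Y.reshape(n, -1)                  # a vector becomes a one-column matrix
    n_blocks = p // block_len             # assumes p is a multiple of block_len
    chosen = []                           # selected block indices, in order
    supports, res_norms = [], [np.linalg.norm(Y)]
    R = Y.copy()                          # residual for the empty support
    for _ in range(k_max):
        # Correlate every block with the current residual (assumed l2 rule).
        C = (X.T @ R).reshape(n_blocks, block_len, -1)
        scores = np.linalg.norm(C, axis=(1, 2))
        if chosen:
            scores[chosen] = -np.inf      # never reselect a block
        chosen.append(int(np.argmax(scores)))
        cols = np.concatenate(
            [np.arange(j * block_len, (j + 1) * block_len) for j in chosen]
        )
        # Least squares fit on the selected columns, then update the residual.
        B_hat, *_ = np.linalg.lstsq(X[:, cols], Y, rcond=None)
        R = Y - X[:, cols] @ B_hat
        supports.append(sorted(chosen))
        res_norms.append(np.linalg.norm(R))
    return supports, np.array(res_norms)
```

Because each support contains the previous one, the returned residual norms are non-increasing, which is the property the residual ratio statistic relies on.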

The choice of stopping condition is very important in SOMP and BOMP. When is known a priori, one can choose (for MMV, ). When is known a priori, one can choose . Since ( for BS) when , it is common to choose in Gaussian noise [27]. A number of support recovery guarantees (i.e., conditions under which ) for BOMP [10, 11, 12, 13] and SOMP [5, 6, 7, 8, 9] have been derived in the literature. The restricted isometry constant (RIC) [28] of order denoted by is defined as the smallest such that

$$(1-\delta)\|\mathbf{b}\|_2^2 \le \|\mathbf{X}\mathbf{b}\|_2^2 \le (1+\delta)\|\mathbf{b}\|_2^2 \tag{2}$$

for all sparse . Similarly, the block RIC (BRIC) of order denoted by is defined as the smallest such that for all block-sparse with a block length [11]. Under the RIC and BRIC constraints discussed in TABLE I, it is known that once in BS [11] and once in MMV [9]. These results also imply that the stopping rules and satisfy in any noise once , whereas and ensure in Gaussian noise once .

Note that for BOMP and SOMP to work with these impressive support recovery guarantees, it is essential to know signal statistics (, ) or noise statistics (, ) a priori. Unfortunately, these quantities are unavailable in most practical applications and are extremely difficult to estimate with low complexity and finite sample guarantees. This limits the application of SOMP and BOMP in many practical problems. In the next section, we develop the GRRT algorithm which can estimate from the sequence without the a priori knowledge of , , or .

## III Generalized residual ratio thresholding

In this section, we explain the proposed GRRT technique for operating the BOMP algorithm. This can be easily extended to SOMP by noting that MMV is a special case of BS with and (which means ). GRRT runs BOMP for iterations and tries to identify the true support from the sequence . Note that is present in the sequence once . This choice of is motivated by the fact that the maximum sparsity that can be recovered by any sparse recovery algorithm is limited to [28]. Unlike the residual norm based stopping rules, which stop BOMP and SOMP iterations once or , the proposed GRRT technique is based on the behaviour of the residual ratio statistic given by . Since the support sequence , residual norms satisfy . Consequently, for each .
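As a small illustration of the residual ratio statistic, the sketch below computes the ratio of consecutive residual Frobenius norms, which is the form that appears in Equations (3) and (4). The helper name `residual_ratios` is hypothetical, and treating a 0/0 ratio as 1 (which can occur only after an exact noiseless fit) is an assumption made here for numerical convenience rather than something stated in the letter.

```python
import numpy as np

def residual_ratios(res_norms):
    """Return RR(k) = ||R^k||_F / ||R^(k-1)||_F for k = 1, ..., k_max.

    `res_norms` holds the residual norms for k = 0, ..., k_max, where k = 0
    corresponds to the empty support (the norm of the observations).
    """
    res_norms = np.asarray(res_norms, dtype=float)
    prev, curr = res_norms[:-1], res_norms[1:]
    # Assumption: where the previous residual is exactly zero, set the ratio to 1.
    return np.divide(curr, prev, out=np.ones_like(curr), where=prev > 0)
```

For a nested support sequence the residual norms are non-increasing, so every ratio returned here lies between 0 and 1.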

Next we define the minimal superset associated with support sequence as , where , i.e., the smallest support estimate in the support estimate sequence that covers the true support .
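In a simulation where the true support is known, the index of the minimal superset can be found by a direct scan of the support sequence. The sketch below is only a diagnostic aid: the name `minimal_superset_index` is hypothetical, and returning `None` when no support estimate covers the true support is a convention chosen here, not one taken from the letter.

```python
def minimal_superset_index(supports, true_support):
    """Return the smallest k (1-based) whose support estimate covers the true support.

    `supports[k - 1]` is the support estimate after k iterations.
    Returns None if no estimate in the sequence covers the true support.
    """
    true_set = set(true_support)
    for k, estimate in enumerate(supports, start=1):
        if true_set.issubset(estimate):
            return k
    return None
```

In practice this quantity cannot be computed because the true support is unknown, which is why GRRT works with the observable residual ratios instead.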

###### Lemma 1.

Minimal superset satisfies the following properties. 1). and are both unobservable R.V.
2). for all and for all .
3). Since the support sequence has cardinality and has cardinality , .
4). and once and . Since as , .

###### Proof.

2) and 3) follow from the definition of minimal superset and properties P1)-P2). 4) follows from the support recovery guarantees for SOMP and BOMP. ∎

The behaviour of as a function of significantly depends on whether , or as discussed next. The signal component in the measurement is given by . Since for , , the signal component in the residual is non zero. Since for , the signal component in the residual vanishes, i.e., . Consequently, the residual at satisfies

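$$RR(k_{\min}) = \frac{\|(\mathbf{I}_n - \mathbf{P}_{k_{\min}})\mathbf{W}\|_F}{\|SC_k + (\mathbf{I}_n - \mathbf{P}_{k_{\min}-1})\mathbf{W}\|_F} \tag{3}$$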

Clearly, as , , (Lemma 1), and . A mathematically rigorous derivation of this result can be obtained using similar results in [23] for the unstructured (, ) scenario. Since in for , given by

$$RR(k) = \frac{\|(\mathbf{I}_n - \mathbf{P}_k)\mathbf{W}\|_F}{\|(\mathbf{I}_n - \mathbf{P}_{k-1})\mathbf{W}\|_F} \tag{4}$$

is bounded away from zero even when . One can derive a more explicit lower bound on for when the noise is Gaussian distributed.

###### Theorem 1.

Consider any sequence of support estimates satisfying properties P1)-P2). Also assume that at step , there exist possibilities for the new entries . Define for . Then for all ,

$$P\left(RR(k) > \Gamma^{\alpha}_{GRRT}(k),\ \forall k_{\min} \ldots\right.$$

###### Proof.

Please see Appendix A. ∎

Theorem 1 implies that for is not just bounded away from zero, but also lower bounded by a positive sequence with a probability . Further, Theorem 1 is valid at all .

###### Remark 1.

The parameter is problem specific. For the model order selection (MOS) problem in [25], is of the form for all . Hence, is the only possibility for and consequently . OMP and SOMP can add any from the set to , i.e., . BOMP can select any new block from to , i.e., .

###### Remark 2.

RRT results for OMP in [23] can be obtained by setting , and . RRT for MOS[25] can be obtained from Theorem 1 by setting , and . Hence, Theorem 1 is a generalization of existing results on residual ratios. The bounds of BOMP and SOMP can be obtained by setting (, ) and respectively.

Next we use the derived properties of to develop the proposed GRRT technique to estimate from the sequence produced by SOMP and BOMP.

### III-A GRRT and exact support recovery guarantees

From Theorem 1, we have seen that for is lower bounded by with a high probability (for small values of ). At the same time, converges to zero and converges to as . Hence, at high SNR, and for with a high probability . Consequently, the support estimate , where

$$k_{GRRT} = \max\{k : RR(k) < \Gamma^{\alpha}_{GRRT}(k)\} \tag{6}$$

will be equal to the true support with probability at high SNR. This is the GRRT algorithm proposed in this letter. Please note that this idea is exactly similar to that of RRT in [23, 25, 24, 26] except that the scope of RRT is now extended to include BS and MMV scenarios through the generalized lower bound on the residual ratios in Theorem 1.
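The selection rule in Equation (6) reduces to finding the last iteration whose residual ratio falls below its threshold. The sketch below assumes the threshold sequence has already been computed from Theorem 1 and is passed in as an array. The name `grrt_select` is hypothetical, and returning `None` when no ratio falls below its threshold is a convention adopted here, since that case is not covered by Equation (6).

```python
import numpy as np

def grrt_select(rr, thresholds):
    """Return k_GRRT = max{k : RR(k) < threshold(k)} using 1-based k.

    `rr[k - 1]` and `thresholds[k - 1]` hold RR(k) and its threshold for
    k = 1, ..., k_max. Returns None if no ratio is below its threshold.
    """
    rr = np.asarray(rr, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    below = np.flatnonzero(rr < thresholds)   # 0-based positions of crossings
    return int(below[-1]) + 1 if below.size else None
```

The support estimate is then the element of the support sequence indexed by the returned value. Taking the last crossing rather than the first reflects the fact that ratios before the minimal superset can also be small, whereas ratios after it stay above the threshold with high probability.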

Note that GRRT involves a hyperparameter . The choice of this hyperparameter is explained next using the support recovery guarantees for operating SOMP and BOMP using GRRT derived in this letter.

###### Theorem 2.

1.) GRRT can exactly recover the support of any sparse matrix in MMV scenario with probability if and .
2.) GRRT can exactly recover the support of any block sparse vector in BS scenario with probability if and . The values of and are given in TABLE I.
3.) For both SOMP and BOMP, .

###### Proof.

Please see Appendix B. ∎

By Theorem 2, GRRT can recover the correct support once the noise power is slightly lower than that required for SOMP and BOMP with a priori knowledge of , or . Using the monotonicity properties of the Beta CDF [23], one can see that this difference in the tolerable noise power can be reduced by increasing which will increase and decrease . However, this will decrease the probability of support recovery given by . Extensive numerical simulations in Section IV indicate that a choice of or delivers good performance in terms of estimating , whereas is more appropriate for support recovery applications. This choice is universal in the sense that the user is not required to choose a value of in each problem using cross validation or subjective tuning. Since the hyperparameter in GRRT is also an upper bound on the high SNR support recovery error, a lower value of for support recovery problems is justified.

## IV Numerical Simulations

In this section, we numerically compare the performance of GRRT with respect to SOMP/BOMP with a priori knowledge of , and . In Fig. 1 and Fig. 2, and represent the performance of BOMP/SOMP with and aware stopping rules discussed in Section II, whereas represents the performance of GRRT with hyperparameter . The matrix we consider is the widely studied concatenation of and a Hadamard matrix with columns normalized to have unit length [29]. We set and . For BS, we set . For MMV, is sampled randomly from the set . For BS, is sampled randomly from the set . In both cases, the nonzero entries in are randomly assigned . We evaluate the algorithms in terms of mean squared error and support recovery error . Both MSE and PE are estimated after iterations. in this model is given by in BS and in MMV scenarios.
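A design matrix of the kind described above can be built as in the following sketch. It assumes that the first block of the concatenation is the identity matrix and that the Hadamard block comes from the Sylvester construction, so the number of rows must be a power of two. The function name `identity_hadamard_design` is hypothetical.

```python
import numpy as np

def identity_hadamard_design(n):
    """Return the n x 2n matrix [I_n, H_n / sqrt(n)] with unit-norm columns.

    H_n is the Sylvester Hadamard matrix, so n must be a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        # Sylvester doubling step: H_2m = [[H_m, H_m], [H_m, -H_m]].
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    # Each Hadamard column has norm sqrt(n), so scaling normalizes it to 1.
    return np.hstack([np.eye(n), H / np.sqrt(n)])
```

Both halves of this matrix are orthonormal bases, which gives the concatenation a small mutual coherence and makes it a common test case for greedy sparse recovery.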

From Fig.1-2, it is clear that the MSE of GRRT closely matches the MSE of BOMP/SOMP with a priori knowledge of at all SNR, whereas, the PE of GRRT closely matches BOMP/SOMP with a priori knowledge of in the low to medium SNR regime. However, at high SNR the PE of GRRT floors such that GRRT with has a higher PE compared to GRRT with . In contrast, PE of GRRT with a priori knowledge of does not exhibit flooring. Note that in both cases, PE at high SNR satisfies as stated in Theorem 2. Similar performance results were also obtained with a random design matrix and different values of , , etc. Further, these results are similar to the results in [23, 24] where RRT was shown to achieve a performance similar to OMP with a priori knowledge of or when and .

## V Conclusions

In this letter, we presented a generalized version of the RRT principle applicable to MMV and BS scenarios and used it to operate SOMP and BOMP without a priori knowledge of signal and noise statistics. The proposed algorithm is also shown analytically and numerically to deliver a performance close to that of SOMP/BOMP with a priori knowledge of signal and noise statistics.

## Appendix A: Proof of Theorem 1

###### Proof.

The proof of Theorem 1 closely mirrors the proof of Theorem 2 in [23] and Theorem 2 in [26]. When the noise is Gaussian and the support estimate is deterministic, is a projection matrix of rank and is a projection matrix of rank . This implies that and . Note that is a central chi square random variable with degrees of freedom [25]. Substituting this result in Equations 6)-9) in the proof of Theorem 2 of [23] along with the fact that there exist possibilities for given gives Theorem 1. ∎

## Appendix B: Proof of Theorem 2

###### Proof.

will be equal to if three events , and occur simultaneously. ensures that true support is present in the sequence and it is indexed by . ensures that , whereas, ensures that . Hence, ensures that and . Consequently, .

We first prove the case of SOMP. Note that for MMV. is true once . Next we consider a regime where . Since and , . Following the proof of Theorem 1 in [9], we have for . Hence,

$$RR(k_{\min}) \le \frac{\|\mathbf{W}\|_F}{\sqrt{1-\delta_{k_0+1}}\, B^{SOMP}_{\min} - \|\mathbf{W}\|_F}, \tag{7}$$

once . Hence, , i.e., is satisfied once

$$\frac{\|\mathbf{W}\|_F}{\sqrt{1-\delta_{k_0+1}}\, B^{SOMP}_{\min} - \|\mathbf{W}\|_F} < \Gamma^{\alpha}_{GRRT}(k_0). \tag{8}$$

This is true once . This means that is true once . Since , it follows that , once . Since for all by Theorem 1, it follows that once .

The proof of BOMP is similar to that of SOMP except that and using the lower bound for from the proof of Theorem 1 in [10]. Statement 3 follows from the fact that for . ∎

### Footnotes

1. The following notations are used. denotes the entry of a matrix . and denote the columns and rows of matrix indexed by . , and represent the transpose, inverse and Moore-Penrose pseudo inverse of respectively. is the identity matrix, is the dimensional zero vector. is the Frobenius norm of . denotes the norm of vector . denotes the probability of an event . denotes expectation. represents a Gaussian random variable (R.V) with mean and variance . means that is a Beta R.V with parameters and . for is the cumulative distribution function (CDF) of a Beta R.V and is the inverse CDF. For any , denotes the set . For any , . denotes convergence in probability. denotes cardinality of a set.

### References

1. I. Fedorov, R. Giri, B. D. Rao, and T. Q. Nguyen, “Robust Bayesian method for simultaneous block sparse signal recovery with applications to face recognition,” in Proc. ICIP.   IEEE, 2016, pp. 3872–3876.
2. N. Rajamohan, A. Joshi, and A. P. Kannu, “Joint block sparse signal recovery problem and applications in LTE cell search,” IEEE Trans. Veh. Technol., vol. 66, no. 2, pp. 1130–1143, 2016.
3. J. Yang, A. Bouzerdoum, F. H. C. Tivive, and M. G. Amin, “Multiple-measurement vector model and its application to through-the-wall radar imaging,” in Proc. ICASSP.   IEEE, 2011, pp. 2672–2675.
4. G. Tzagkarakis, D. Milioris, and P. Tsakalides, “Multiple-measurement Bayesian compressed sensing using GSM priors for DOA estimation,” in Proc. ICASSP.   IEEE, 2010, pp. 2610–2613.
5. J.-F. Determe, J. Louveaux, L. Jacques, and F. Horlin, “On the exact recovery condition of simultaneous orthogonal matching pursuit,” IEEE Signal Process. Lett., vol. 23, no. 1, pp. 164–168, 2015.
6. ——, “On the noise robustness of simultaneous orthogonal matching pursuit,” IEEE Trans. Signal Process., vol. 65, no. 4, pp. 864–875, 2016.
7. ——, “Improving the correlation lower bound for simultaneous orthogonal matching pursuit,” IEEE Signal Process. Lett., vol. 23, no. 11, pp. 1642–1646, 2016.
8. J. A. Tropp, A. C. Gilbert, and M. J. Strauss, “Simultaneous sparse approximation via greedy pursuit,” in Proc. ICASSP., vol. 5.   IEEE, 2005, pp. v–721.
9. H. Li, L. Wang, X. Zhan, and D. K. Jain, “On the fundamental limit of orthogonal matching pursuit for multiple measurement vector,” IEEE Access, vol. 7, pp. 48 860–48 866, 2019.
10. J. Wen, H. Chen, and Z. Zhou, “An optimal condition for the block orthogonal matching pursuit algorithm,” IEEE Access, vol. 6, pp. 38 179–38 185, 2018.
11. H. Li and J. Wen, “A new analysis for support recovery with block orthogonal matching pursuit,” IEEE Signal Process. Lett., vol. 26, no. 2, pp. 247–251, 2018.
12. Y. Shi, L. Wang, and R. Luo, “Sparse recovery with block multiple measurement vectors algorithm,” IEEE Access, vol. 7, pp. 9470–9475, 2019.
13. Y. C. Eldar, P. Kuppinger, and H. Bolcskei, “Block-sparse signals: Uncertainty relations and efficient recovery,” IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3042–3054, 2010.
14. P. Pal and P. Vaidyanathan, “Pushing the limits of sparse support recovery using correlation information,” IEEE Trans. Signal Process., vol. 63, no. 3, pp. 711–726, 2014.
15. X. Lv, G. Bi, and C. Wan, “The group LASSO for stable recovery of block-sparse signal representations,” IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1371–1382, 2011.
16. F. Bunea, J. Lederer, and Y. She, “The group square-root LASSO: Theoretical properties and fast algorithms,” IEEE Trans. Inf. Theory, vol. 60, no. 2, pp. 1313–1325, 2013.
17. Z. Zhang and B. D. Rao, “Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 5, pp. 912–926, 2011.
18. ——, “Extension of SBL algorithms for the recovery of block sparse signals with intra-block correlation,” IEEE Trans. Signal Process., vol. 61, no. 8, pp. 2009–2015, 2013.
19. J. Swärd, S. I. Adalbjörnsson, and A. Jakobsson, “Generalized sparse covariance-based estimation,” Signal Processing, vol. 143, pp. 311–319, 2018.
20. P. Stoica, P. Babu, and J. Li, “SPICE: A sparse covariance-based estimation method for array processing,” IEEE Trans. Signal Process., vol. 59, no. 2, pp. 629–638, Feb 2011.
21. V. Menon and S. Kalyani, “Structured and unstructured outlier identification for robust PCA: A fast parameter free algorithm,” IEEE Trans. Signal Process., vol. 67, no. 9, pp. 2439–2452, May 2019.
22. V. Menon, S. Kalyani et al., “Subspace clustering without knowing the number of clusters: A parameter free approach,” arXiv preprint arXiv:1909.04406, 2019.
23. S. Kallummil and S. Kalyani, “Signal and noise statistics oblivious orthogonal matching pursuit,” in Proc. ICML, 2018, pp. 2434–2443.
24. ——, “High SNR consistent compressive sensing without signal and noise statistics,” Signal Processing, p. 107335, 2019.
25. ——, “Residual ratio thresholding for linear model order selection,” IEEE Trans. Signal Process., vol. 67, no. 4, pp. 838–853, 2018.
26. ——, “Noise statistics oblivious GARD for robust regression with sparse outliers,” IEEE Trans. Signal Process., vol. 67, no. 2, pp. 383–398, 2018.
27. T. Cai and L. Wang, “Orthogonal matching pursuit for sparse signal recovery with noise,” IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4680–4688, July 2011.
28. Y. C. Eldar and G. Kutyniok, Compressed sensing: Theory and applications.   Cambridge University Press, 2012.
29. M. Elad, Sparse and redundant representations: From theory to applications in signal and image processing.   Springer Science & Business Media, 2010.