# Tuning Free Orthogonal Matching Pursuit

## Abstract

Orthogonal matching pursuit (OMP) is a widely used compressive sensing (CS) algorithm for recovering sparse signals in noisy linear regression models. The performance of OMP depends on its stopping criteria (SC). SC for OMP discussed in the literature typically assume knowledge of either the sparsity of the signal to be estimated, $k_0$, or the noise variance $\sigma^2$, both of which are unavailable in many practical applications. In this article we develop a modified version of OMP called tuning free OMP or TF-OMP which does not require a SC. TF-OMP is proved to accomplish successful sparse recovery under the usual assumptions on restricted isometry constants (RIC) and mutual coherence of the design matrix. TF-OMP is numerically shown to deliver a highly competitive performance in comparison with OMP having *a priori* knowledge of $k_0$ or $\sigma^2$. Greedy algorithm for robust de-noising (GARD) is an OMP like algorithm proposed for efficient estimation in classical overdetermined linear regression models corrupted by sparse outliers. However, GARD requires the knowledge of the inlier noise variance, which is difficult to estimate. We also produce a tuning free algorithm (TF-GARD) for efficient estimation in the presence of sparse outliers by extending the operating principle of TF-OMP to GARD. TF-GARD is numerically shown to achieve a performance comparable to that of the existing implementation of GARD.

## 1 Introduction

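Consider the linear regression model $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{w}$, where $\mathbf{X}\in\mathbb{R}^{n\times p}$ is a known design matrix, $\mathbf{w}\in\mathbb{R}^n$ is the noise vector and $\mathbf{y}\in\mathbb{R}^n$ is the observation vector. The design matrix is rank deficient in the sense that $\text{rank}(\mathbf{X})<p$. Further, the columns of $\mathbf{X}$ are normalised to have unit Euclidean norm. The vector $\boldsymbol{\beta}\in\mathbb{R}^p$ is sparse, i.e., the support of $\boldsymbol{\beta}$ given by $\mathcal{S}=\{j:\beta_j\neq 0\}$ has cardinality $k_0=|\mathcal{S}|$. The noise vector $\mathbf{w}$ is assumed to have a Gaussian distribution with mean $\mathbf{0}_n$ and covariance $\sigma^2\mathbf{I}_n$, i.e., $\mathbf{w}\sim\mathcal{N}(\mathbf{0}_n,\sigma^2\mathbf{I}_n)$. The signal to noise ratio in this regression model is defined as

$$\text{SNR}=\frac{\|\mathbf{X}\boldsymbol{\beta}\|_2^2}{\mathbb{E}(\|\mathbf{w}\|_2^2)}=\frac{\|\mathbf{X}\boldsymbol{\beta}\|_2^2}{n\sigma^2}.$$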
Throughout this paper, $\mathbb{E}(\cdot)$ represents the expectation operator and $\|\mathbf{x}\|_q$ represents the $l_q$ norm of $\mathbf{x}$. In this article we consider the following two problems, which are of broad interest, in the context of recovering sparse vectors in underdetermined linear regression models.

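P1). Estimate $\boldsymbol{\beta}$ with the objective of minimizing the mean squared error (MSE) $\text{MSE}(\hat{\boldsymbol{\beta}})=\mathbb{E}(\|\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\|_2^2)$.

P2). Estimate the support $\mathcal{S}$ of $\boldsymbol{\beta}$ with the objective of minimizing the probability of support recovery error $\text{PE}=\mathbb{P}(\hat{\mathcal{S}}\neq\mathcal{S})$, where $\hat{\mathcal{S}}$ is the estimated support.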

These problems are common in signal processing applications like sparse channel estimation [1], direction of arrival estimation [2], multi user detection [3] etc. Typical machine learning applications include sparse subspace clustering [4], sparse representation classification [5] etc. In the signal processing community, these problems are discussed under the compressive sensing (CS) paradigm [6]. A number of algorithms have been proposed to solve the above mentioned problems, including:

- least absolute shrinkage and selection operator (LASSO) [7];
- Dantzig selector (DS) [9];
- subspace pursuit (SP) [10];
- compressive sampling matching pursuit (CoSaMP) [11];
- sparse Bayesian learning (SBL) [12];
- orthogonal matching pursuit (OMP) [13].

However, for optimal performance of these algorithms, a number of tuning parameters (also called hyper parameters) need to be fixed. For example, the value of $\lambda$ in the LASSO estimate

$$\hat{\boldsymbol{\beta}}=\arg\min_{\mathbf{b}\in\mathbb{R}^p}\frac{1}{2}\|\mathbf{y}-\mathbf{X}\mathbf{b}\|_2^2+\lambda\|\mathbf{b}\|_1\tag{1}$$

has to be tuned appropriately. Indeed, when the noise is Gaussian, a value $\lambda=2\sigma\sqrt{2\log p}$ is known to be optimal in terms of MSE performance [8]. Likewise, under some regularity conditions, $\text{PE}\to 0$ as $\sigma^2\to 0$ for any value of $\lambda$ satisfying $\lambda\to 0$ and $\lambda/\sigma\to\infty$ as $\sigma^2\to 0$ [20]. Similarly, for the optimal performance of DS, one needs knowledge of $\sigma^2$ [9].

In overdetermined linear regression models, one can readily estimate $\sigma^2$ using the maximum likelihood (ML) estimator. In underdetermined models, however, estimating $\sigma^2$ is extremely difficult [21]. This means that the optimal performance of LASSO and DS is not achievable in many practical applications involving Gaussian noise. Algorithms like SP and CoSaMP, on the other hand, require *a priori* knowledge of the sparsity level $k_0$, which is rarely available. OMP, which is the focus of this article, requires either the knowledge of $k_0$ or the knowledge of $\sigma^2$ for optimal performance.

Hence, in many practical applications, the statistician is forced to choose ad hoc values of tuning parameters for which no performance guarantees are available. A popular alternative is based on techniques like cross validation. Cross validation can deliver reasonably good performance, but at the expense of significantly higher computational complexity [22]. Further, cross validation is also known to be ineffective for support recovery problems [23].

### 1.1 Tuning parameter free sparse recovery.

The literature on tuning parameter free sparse recovery procedures is recent in comparison with the literature on sparse recovery algorithms like OMP, LASSO, DS etc. A seminal contribution in this field is the square root LASSO [24] algorithm, which estimates $\boldsymbol{\beta}$ by

$$\hat{\boldsymbol{\beta}}=\arg\min_{\mathbf{b}\in\mathbb{R}^p}\|\mathbf{y}-\mathbf{X}\mathbf{b}\|_2+\lambda\|\mathbf{b}\|_1.$$

For optimal MSE performance, $\lambda$ can be set independent of $\sigma^2$, thereby overcoming a major drawback of LASSO. However, the choice of $\lambda$ is still subjective, with few guidelines. The high SNR behaviour of PE for square root LASSO is not reported in the literature.

Another interesting development in this area is sparse iterative covariance-based estimation, popularly called SPICE [25]. SPICE is a convex optimization based algorithm that is completely devoid of any hyper parameters. The relationship between SPICE and techniques like LAD-LASSO, square root LASSO and LASSO is derived in [26]. Another tuning parameter free algorithm called LIKES, which is closely related to SPICE, is proposed in [28]. Another interesting contribution in this area is the derivation of analytical properties of the non negative least squares (NNLS) estimator

$$\hat{\boldsymbol{\beta}}=\arg\min_{\mathbf{b}\geq\mathbf{0}}\|\mathbf{y}-\mathbf{X}\mathbf{b}\|_2^2$$

in [29] which points to the superior performance of NNLS in terms of MSE. However, the NNLS estimate is applicable only to the cases where the sign pattern of is known *a priori*. Existing literature on tuning free sparse recovery has many disadvantages. In particular, all these techniques are computationally complex in comparison with simple algorithms like OMP, CoSaMP etc. Notwithstanding the connections established between algorithms like SPICE and LASSO, the performance guarantees of SPICE are not well established.

### 1.2 Robust regression in the presence of sparse outliers.

In addition to the recovery of sparse signals in underdetermined linear regression models (which is the main focus of this article), we also consider a regression model widely popular in robust statistics called the sparse outlier model. Here we consider the regression model

$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{w}+\mathbf{g}_{out},$$

where:

- $\mathbf{X}\in\mathbb{R}^{n\times p}$ is a full rank design matrix with $n>p$ or $n\gg p$;
- the regression vector $\boldsymbol{\beta}$ may or may not be sparse;
- the inlier noise is $\mathbf{w}\sim\mathcal{N}(\mathbf{0}_n,\sigma^2\mathbf{I}_n)$.

The outlier noise $\mathbf{g}_{out}$ represents the large errors in the regression equation that are not modelled by the inlier noise distribution. In many cases of practical interest, $\mathbf{g}_{out}$ is modelled as sparse, i.e., $k_0=|\text{supp}(\mathbf{g}_{out})|\ll n$. However, the non zero entries in $\mathbf{g}_{out}$ can take large values, i.e., $\|\mathbf{g}_{out}\|_2^2$ can be potentially high.

Algorithms from robust statistics like Huber's M-estimator [30] were used to solve this problem. Recently, a number of algorithms that utilize the sparse nature of $\mathbf{g}_{out}$ were shown to estimate $\boldsymbol{\beta}$ more efficiently than the robust statistics based techniques. These include the convex optimization based approach [31], the SBL based approach [33] and the OMP based greedy algorithm for robust de-noising (GARD) [34]. Just like the case of sparse regression, algorithms proposed for robust estimation in the presence of sparse outliers also require tuning parameters. These parameters are subjective and depend on the inlier noise variance $\sigma^2$, which is difficult to estimate.

### 1.3 Contribution of this article.

This article makes the following contributions to the CS literature.

We propose a novel way of using the popular OMP, called tuning free OMP (TF-OMP). TF-OMP does not require *a priori* knowledge of the sparsity level $k_0$ or the noise variance $\sigma^2$, and is completely devoid of any tuning parameters.

We analytically establish that TF-OMP can recover the true support in bounded noise ($\|\mathbf{w}\|_2\leq\epsilon_2$) under two conditions. First, the matrix $\mathbf{X}$ must satisfy either the exact recovery condition (ERC) [13], the mutual incoherence condition (MIC) [14] or the restricted isometry condition in [35]. Second, the minimum non zero value $\beta_{min}=\min_{j\in\mathcal{S}}|\beta_j|$ must be large enough. It is important to note that these conditions on the design matrix are no more stringent than the results available in the literature [14] for OMP with *a priori* knowledge of $k_0$ or noise variance $\sigma^2$.

Under the same set of conditions on the matrix $\mathbf{X}$, TF-OMP is shown to achieve high SNR consistency [36] in Gaussian noise, i.e., $\text{PE}\to 0$ as $\sigma^2\to 0$. This is the first time a tuning free CS algorithm is shown to achieve high SNR consistency.

As mentioned before, GARD for estimation in the presence of sparse outliers is closely related to OMP. We extend the operating principle behind TF-OMP to GARD and develop a modified version of GARD called TF-GARD. TF-GARD is devoid of tuning parameters and does not require the knowledge of the inlier noise variance $\sigma^2$.

Both proposed algorithms, *viz*. TF-OMP and TF-GARD, are numerically shown to achieve highly competitive performance in comparison with a broad class of existing algorithms over a number of experiments.

### 1.4 Notations used.

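- $\text{span}(\mathbf{X})$ is the column space of $\mathbf{X}$.
- $\mathbf{X}^T$ is the transpose of $\mathbf{X}$.
- $\mathbf{X}^{\dagger}=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is the Moore-Penrose pseudo inverse of $\mathbf{X}$ (if $\mathbf{X}$ has full column rank).
- $\mathbf{P}_{\mathbf{X}}=\mathbf{X}\mathbf{X}^{\dagger}$ is the projection matrix onto $\text{span}(\mathbf{X})$.
- $\mathbf{X}_{\mathcal{J}}$ denotes the sub-matrix of $\mathbf{X}$ formed using the columns indexed by $\mathcal{J}$.
- $\mathbf{X}_{i,j}$ is the $(i,j)$th entry of $\mathbf{X}$.
- If $\mathbf{X}$ is clear from the context, we use the shorthand $\mathbf{P}_{\mathcal{J}}$ for $\mathbf{P}_{\mathbf{X}_{\mathcal{J}}}$.
- Both $\mathbf{a}_{\mathcal{J}}$ and $\mathbf{a}(\mathcal{J})$ denote the entries of $\mathbf{a}$ indexed by $\mathcal{J}$.
- $\chi^2_j$ is a central chi square distribution with $j$ degrees of freedom (d.o.f).
- $\mathcal{CN}(\mathbf{u},\mathbf{C})$ is a complex Gaussian R.V. with mean $\mathbf{u}$ and covariance matrix $\mathbf{C}$.
- $a\overset{d}{=}b$ implies that $a$ and $b$ are identically distributed.
- $\|\mathbf{A}\|_m=\max_{\|\mathbf{x}\|_m=1}\|\mathbf{A}\mathbf{x}\|_m$ is the $l_m$ induced matrix norm.
- $[p]$ denotes the set $\{1,2,\dots,p\}$.
- $\lfloor x\rfloor$ denotes the floor function.
- $\phi$ represents the null set.
- For any two index sets $\mathcal{J}_1$ and $\mathcal{J}_2$, the set difference is $\mathcal{J}_1/\mathcal{J}_2=\{j:j\in\mathcal{J}_1\text{ and }j\notin\mathcal{J}_2\}$.
- For any index set $\mathcal{J}\subseteq[p]$, $\mathcal{J}^C$ denotes the complement of $\mathcal{J}$ with respect to $[p]$.
- $f(n)=O(g(n))$ iff $\limsup_{n\to\infty}|f(n)/g(n)|<\infty$.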

### 1.5 Organization of this article:-

Section 2 discusses existing literature on OMP. Section 3 presents TF-OMP. Section 4 presents the performance guarantees for TF-OMP. Section 5 discusses the TF-GARD algorithm. Section 6 presents the numerical simulations.

## 2 OMP: Prior art

The proposed tuning parameter free sparse recovery algorithm is based on OMP. OMP is a greedy procedure to perform sparsity constrained least squares minimization. OMP starts with a null model. In each iteration, it adds to the current support the column that is most correlated with the current residual. An algorithmic description of OMP is given in Table 1. The performance of OMP is determined by the properties of the measurement matrix $\mathbf{X}$, the ambient SNR, the sparsity of $\boldsymbol{\beta}$ ($k_0$) and the stopping condition (SC). We first describe the properties of $\mathbf{X}$ that are conducive for sparse recovery using OMP.

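| **OMP** |
|---|
| Step 1:- Initialize the residual $\mathbf{r}^0=\mathbf{y}$, $\hat{\boldsymbol{\beta}}=\mathbf{0}_p$, support estimate $\mathcal{S}^0=\phi$, iteration counter $k=1$. |
| Step 2:- Find the column most correlated with the current residual $\mathbf{r}^{k-1}$, i.e., $j^k=\arg\max_{j\in[p]}\lvert\mathbf{X}_j^T\mathbf{r}^{k-1}\rvert$. |
| Step 3:- Update support estimate: $\mathcal{S}^k=\mathcal{S}^{k-1}\cup j^k$. |
| Step 4:- Estimate $\boldsymbol{\beta}$ using current support: $\hat{\boldsymbol{\beta}}(\mathcal{S}^k)=\mathbf{X}_{\mathcal{S}^k}^{\dagger}\mathbf{y}$. |
| Step 5:- Update residual: $\mathbf{r}^k=\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}=(\mathbf{I}_n-\mathbf{P}_k)\mathbf{y}$, where $\mathbf{P}_k=\mathbf{P}_{\mathbf{X}_{\mathcal{S}^k}}$. |
| Step 6:- Increment $k$: $k\leftarrow k+1$. |
| Step 7:- Repeat Steps 2-6 until the stopping condition (SC) is met. |
| Output:- $\hat{\boldsymbol{\beta}}$ and the support estimate $\hat{\mathcal{S}}$ at termination. |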
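A minimal NumPy sketch of the steps in Table 1 may help make the iteration concrete. The function name `omp` and its signature are illustrative choices, not an implementation from this article. The LS step simply calls `numpy.linalg.lstsq` on the current support rather than updating a QR factorization incrementally, which is simpler but costlier. With `n_iter` set to $k_0$ it corresponds to OMP($k_0$).

```python
import numpy as np


def omp(X: np.ndarray, y: np.ndarray, n_iter: int):
    """OMP following the steps of Table 1, stopped after n_iter iterations.

    With n_iter = k0 this is OMP(k0). X is assumed to have unit-norm columns.
    """
    p = X.shape[1]
    residual = y.astype(float)           # Step 1: r^0 = y
    support = []                         # Step 1: S^0 = empty set
    coef = np.zeros(0)
    for _ in range(n_iter):
        # Step 2: index of the column most correlated with r^{k-1}
        corr = np.abs(X.T @ residual)
        if support:
            corr[support] = -np.inf      # safeguard against re-selecting a column
        # Step 3: S^k = S^{k-1} U {j^k}
        support.append(int(np.argmax(corr)))
        # Step 4: least squares fit on S^k, i.e. X_{S^k}^dagger y
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        # Step 5: r^k = y - X_{S^k} beta_hat(S^k) = (I_n - P_k) y
        residual = y - X[:, support] @ coef
    beta_hat = np.zeros(p)
    if support:
        beta_hat[support] = coef
    return beta_hat, sorted(support)
```

For example, take $\mathbf{X}=[\mathbf{I}_n,\mathbf{H}_n/\sqrt{n}]$ (identity plus normalized Hadamard) with $n=16$, so that $\mu_{\mathbf{X}}=1/4<1/3$. For a noiseless 2-sparse $\boldsymbol{\beta}$, `omp(X, X @ beta, 2)` should then return the true support.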

### 2.1 Qualifiers for design matrix $\mathbf{X}$.

When $n<p$, the linear equation $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}$ has infinitely many possible solutions. Hence the support recovery problem is ill-posed even in the noiseless case. To uniquely recover the $k_0$-sparse vector $\boldsymbol{\beta}$, the measurement matrix $\mathbf{X}$ has to satisfy certain well known regularity conditions. A plethora of sufficient conditions are discussed in the literature, including the restricted isometry property (RIP) [6], the mutual incoherence condition (MIC) [7] and the exact recovery condition (ERC) [13]. We first describe the ERC.

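**Definition 1:-** A matrix $\mathbf{X}$ and a vector $\boldsymbol{\beta}$ with support $\mathcal{S}$ satisfy ERC if the exact recovery coefficient $\text{erc}(\mathbf{X},\mathcal{S})=\max_{j\notin\mathcal{S}}\|\mathbf{X}_{\mathcal{S}}^{\dagger}\mathbf{X}_j\|_1$ satisfies $\text{erc}(\mathbf{X},\mathcal{S})<1$.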

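It is known that ERC is a sufficient and worst case necessary condition for accurately recovering $\boldsymbol{\beta}$ from $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}$ using OMP [13]. The same condition, together with a sufficiently large minimum non zero magnitude $\beta_{min}=\min_{j\in\mathcal{S}}|\beta_j|$, is sufficient for recovery in regression models with noise [14]. Since ERC involves the unknown support $\mathcal{S}$, it is impossible to check ERC in practice.

Another important metric used for qualifying $\mathbf{X}$ is the restricted isometry constant (RIC). The RIC of order $k$, denoted by $\delta_k$, is defined as the smallest value of $\delta$ that satisfies

$$(1-\delta)\|\mathbf{b}\|_2^2\leq\|\mathbf{X}\mathbf{b}\|_2^2\leq(1+\delta)\|\mathbf{b}\|_2^2$$

for all $k$-sparse $\mathbf{b}\in\mathbb{R}^p$. OMP can recover a $k_0$-sparse signal in the first $k_0$ iterations if $\delta_{k_0+1}$ is below a threshold that decreases with $k_0$ [15]. In the absence of noise, OMP run for more than $k_0$ iterations can also recover a sparse $\boldsymbol{\beta}$ under conditions on RICs of higher order [18]. It is well known that the computation of RIC is NP-hard.

Hence the mutual coherence, a quantity that can be computed easily, is widely popular. For a matrix $\mathbf{X}$ with unit norm columns, the mutual coherence is defined as the maximum pairwise column correlation, i.e.,

$$\mu_{\mathbf{X}}=\max_{j\neq k}|\mathbf{X}_j^T\mathbf{X}_k|.$$

If $\mu_{\mathbf{X}}<\frac{1}{2k_0-1}$, then for every $k_0$-sparse vector $\boldsymbol{\beta}$, $\text{erc}(\mathbf{X},\mathcal{S})$ can be bounded as $\text{erc}(\mathbf{X},\mathcal{S})\leq\frac{k_0\mu_{\mathbf{X}}}{1-(k_0-1)\mu_{\mathbf{X}}}<1$ [13]. Hence, $\mu_{\mathbf{X}}<\frac{1}{2k_0-1}$ is a sufficient condition for both noiseless and noisy sparse recovery using OMP. It is also shown that $\mu_{\mathbf{X}}<\frac{1}{2k_0-1}$ is a worst case necessary condition for sparse recovery.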
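The quantities above can be computed directly for a given matrix. The sketch below evaluates $\mu_{\mathbf{X}}$, the exact recovery coefficient for a known support, and the coherence based bound on the latter. It assumes NumPy, and the helper names `mutual_coherence`, `erc` and `tropp_bound` are hypothetical. Computing `erc` requires the true support, so, as noted above, it is only usable in simulations.

```python
import numpy as np


def mutual_coherence(X: np.ndarray) -> float:
    """mu_X = max_{j != k} |X_j^T X_k|, assuming unit-norm columns."""
    G = np.abs(X.T @ X)
    np.fill_diagonal(G, 0.0)             # ignore the unit diagonal
    return float(G.max())


def erc(X: np.ndarray, S) -> float:
    """Exact recovery coefficient erc(X, S) = max_{j not in S} ||X_S^dagger X_j||_1."""
    S = list(S)
    outside = [j for j in range(X.shape[1]) if j not in S]
    # Column m of `coeffs` holds X_S^dagger X_j for the m-th index j outside S
    coeffs = np.linalg.pinv(X[:, S]) @ X[:, outside]
    return float(np.abs(coeffs).sum(axis=0).max())


def tropp_bound(mu: float, k0: int) -> float:
    """Upper bound k0*mu / (1 - (k0-1)*mu) on erc, meaningful when mu < 1/(2k0-1)."""
    return k0 * mu / (1.0 - (k0 - 1) * mu)
```

On $\mathbf{X}=[\mathbf{I}_{16},\mathbf{H}_{16}/4]$, `mutual_coherence` should give $0.25$, and for any support of size two `erc` stays below `tropp_bound(0.25, 2)` $=2/3$. This is consistent with the MIC argument above.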

### 2.2 Stopping conditions for OMP

Most of the theoretical properties of OMP are derived assuming either the absence of noise [13] or the knowledge of $k_0$ [15]. In this case OMP iterations are terminated once $\|\mathbf{r}^k\|_2=0$ or $k=k_0$. When $k_0$ is not available, which is typically the case, one has to rely on stopping conditions based on the properties of the residual $\mathbf{r}^k$. For example, OMP can be stopped if $\|\mathbf{r}^k\|_2\leq\sigma\sqrt{n+2\sqrt{n\log n}}$ [14] or if $\|\mathbf{X}^T\mathbf{r}^k\|_\infty\leq\sigma\sqrt{2(1+\eta)\log p}$ for some $\eta>0$ [14]. Likewise, [39] suggested a SC based on the difference between residuals in successive iterations. The necessary and sufficient conditions for high SNR consistency of OMP with residual based SC are derived in [20]. A generalized likelihood ratio based stopping rule is developed in [40].

In addition to the subjectivity involved in the choice of SC, all the above mentioned SC require the knowledge of $\sigma^2$. As explained before, estimating $\sigma^2$ in underdetermined regression models is extremely difficult. In the following, we use the shorthand OMP($k_0$) for OMP with *a priori* knowledge of $k_0$, and OMP($\sigma^2$) for OMP with SC based on *a priori* knowledge of $\sigma^2$. In the next section, we develop TF-OMP, an OMP based procedure which does not require the knowledge of either $k_0$ or $\sigma^2$ for good performance.
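As an illustration of a residual based SC, the sketch below runs OMP until $\|\mathbf{r}^k\|_2\leq\sigma\sqrt{n+2\sqrt{n\log n}}$, so it needs $\sigma$ as an input. The function `omp_sigma` and its `max_iter` safeguard are assumptions made for this example, not part of the cited rule.

```python
import numpy as np


def omp_sigma(X: np.ndarray, y: np.ndarray, sigma: float, max_iter=None):
    """OMP(sigma^2): stop once ||r^k||_2 <= sigma * sqrt(n + 2 sqrt(n log n))."""
    n, p = X.shape
    eps = sigma * np.sqrt(n + 2.0 * np.sqrt(n * np.log(n)))   # residual threshold
    max_iter = n if max_iter is None else max_iter             # safeguard (assumption)
    residual = y.astype(float)
    support = []
    coef = np.zeros(0)
    # Residual based stopping condition: requires sigma to be known a priori
    while np.linalg.norm(residual) > eps and len(support) < max_iter:
        corr = np.abs(X.T @ residual)
        if support:
            corr[support] = -np.inf
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef
    beta_hat = np.zeros(p)
    if support:
        beta_hat[support] = coef
    return beta_hat, sorted(support)
```

The function takes `sigma` rather than the threshold itself. This keeps the dependence on $\sigma^2$ explicit, which is exactly the requirement TF-OMP removes.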

## 3 Tuning Free Orthogonal Matching Pursuit.

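In this section, we present the proposed TF-OMP algorithm. This algorithm is based on the statistic $t_k=\frac{\|\mathbf{r}^k\|_2}{\|\mathbf{r}^{k-1}\|_2}$, where $\mathbf{r}^k$ is the residual in the $k$th iteration of OMP. Using the property of projection matrices [36], we have $(\mathbf{I}_n-\mathbf{P}_k)\mathbf{X}_{\mathcal{S}^k}=\mathbf{O}_{n,k}$, where $\mathbf{O}_{n,k}$ is the $n\times k$ zero matrix and $\mathbf{P}_k=\mathbf{P}_{\mathbf{X}_{\mathcal{S}^k}}$. This implies that $\mathbf{r}^k=(\mathbf{I}_n-\mathbf{P}_k)\mathbf{y}=(\mathbf{I}_n-\mathbf{P}_k)(\mathbf{X}_{\mathcal{S}/\mathcal{S}^k}\boldsymbol{\beta}_{\mathcal{S}/\mathcal{S}^k}+\mathbf{w})$. Hence, $t_k$ can be rewritten as

$$t_k=\frac{\|(\mathbf{I}_n-\mathbf{P}_k)(\mathbf{X}_{\mathcal{S}/\mathcal{S}^k}\boldsymbol{\beta}_{\mathcal{S}/\mathcal{S}^k}+\mathbf{w})\|_2}{\|(\mathbf{I}_n-\mathbf{P}_{k-1})(\mathbf{X}_{\mathcal{S}/\mathcal{S}^{k-1}}\boldsymbol{\beta}_{\mathcal{S}/\mathcal{S}^{k-1}}+\mathbf{w})\|_2}.$$

Since the residual norms are non increasing, i.e., $\|\mathbf{r}^k\|_2\leq\|\mathbf{r}^{k-1}\|_2$, we always have $t_k\leq 1$. This statistic exhibits an interesting behaviour, which is the core of our proposed technique, TF-OMP. Consider running OMP for $k_{max}$ iterations such that the matrices $\mathbf{X}_{\mathcal{S}^k}$ are not rank deficient and the residuals $\mathbf{r}^k$ are not zero. Then $t_k$ varies in the following manner for $1\leq k\leq k_{max}$.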

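**Case 1:-** **When $\mathcal{S}\not\subseteq\mathcal{S}^k$:-** Then both $\mathbf{r}^k$ and $\mathbf{r}^{k-1}$ contain contributions from the signal $\mathbf{X}\boldsymbol{\beta}$ and the noise $\mathbf{w}$. Since both numerator and denominator contain noise and signal terms, it is unlikely that $t_k$ takes very low values.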

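**Case 2:-** **When $\mathcal{S}\subseteq\mathcal{S}^k$ for the first time:-** In this case $\mathcal{S}\subseteq\mathcal{S}^k$ and $\mathcal{S}\not\subseteq\mathcal{S}^{k-1}$. Hence, the numerator $\|\mathbf{r}^k\|_2=\|(\mathbf{I}_n-\mathbf{P}_k)\mathbf{w}\|_2$ has contribution only from the noise $\mathbf{w}$. The denominator $\|\mathbf{r}^{k-1}\|_2$, in contrast, has contributions from both the noise $\mathbf{w}$ and the signal $\mathbf{X}\boldsymbol{\beta}$. Hence, if the signal strength is sufficiently high or the noise level is low, $t_k$ will take very low values.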

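**Case 3:-** **When $\mathcal{S}\subseteq\mathcal{S}^{k-1}$:-** In this case $\mathbf{r}^k=(\mathbf{I}_n-\mathbf{P}_k)\mathbf{w}$ and $\mathbf{r}^{k-1}=(\mathbf{I}_n-\mathbf{P}_{k-1})\mathbf{w}$. This means that both numerator and denominator consist only of noise terms. Hence the ratio will not take very small values even if the noise variance is very low.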

To summarize, as SNR improves, the minimal value of $t_k$ for $1\leq k\leq k_{max}$ will, with very high probability, correspond to the value of $k$ at which $\mathcal{S}\subseteq\mathcal{S}^k$ for the first time. This point is illustrated in Fig. 1, where a typical realization of the quantity $t_k$ is plotted for a matrix signal pair satisfying ERC. The signal has $k_0$ non zero values. At both SNR=10 dB and SNR=30 dB, the minimum value is attained at $k=k_0$, which is also the first time $\mathcal{S}\subseteq\mathcal{S}^k$. Further, the dip in the value of $t_k$ at $k=k_0$ becomes more and more pronounced as SNR increases. This motivates the TF-OMP algorithm given in Table 2, which tries to estimate $k_0$ by utilizing the sudden dip in $t_k$.
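The behaviour of $t_k$ described in Cases 1-3 can be reproduced numerically. The sketch below runs OMP for a fixed number of iterations and returns $t_k=\|\mathbf{r}^k\|_2/\|\mathbf{r}^{k-1}\|_2$. It assumes NumPy, and the helper `residual_ratios` is hypothetical. It does not reproduce the setup of Fig. 1. The identity-plus-Hadamard matrix, the 3-sparse signal with entries $\pm 3$ and the two noise levels mentioned below are illustrative choices.

```python
import numpy as np


def residual_ratios(X: np.ndarray, y: np.ndarray, k_max: int) -> np.ndarray:
    """Run OMP for k_max iterations and return t_k = ||r^k||_2 / ||r^{k-1}||_2, k = 1..k_max."""
    residual = y.astype(float)
    support = []
    norms = [np.linalg.norm(residual)]
    for _ in range(k_max):
        corr = np.abs(X.T @ residual)            # correlations with r^{k-1}
        if support:
            corr[support] = -np.inf              # do not re-select a column
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef      # r^k = (I_n - P_k) y
        norms.append(np.linalg.norm(residual))
    norms = np.asarray(norms)
    return norms[1:] / norms[:-1]
```

Consider $n=64$, $\mathbf{X}=[\mathbf{I}_n,\mathbf{H}_n/\sqrt{n}]$, a 3-sparse $\boldsymbol{\beta}$, and one fixed noise direction scaled to $\|\mathbf{w}\|_2=0.4$ and then $0.04$. In both cases the minimum of $t_k$ should occur at $k=3$, and it becomes smaller at the lower noise level, mirroring the dip described above.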

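| **TF-OMP** |
|---|
| Input:- Observation $\mathbf{y}$, design matrix $\mathbf{X}$. |
| Step 1:- Run OMP for $k_{max}=\lfloor\frac{n+1}{2}\rfloor$ iterations. |
| Step 2:- Estimate $k_0$ as $\hat{k}_0=\arg\min_{1\leq k\leq k_{max}}t_k$, where $t_k=\frac{\lVert\mathbf{r}^k\rVert_2}{\lVert\mathbf{r}^{k-1}\rVert_2}$. |
| Step 3:- Estimate support as $\hat{\mathcal{S}}=\mathcal{S}^{\hat{k}_0}$. Estimate $\boldsymbol{\beta}$ as $\hat{\boldsymbol{\beta}}(\hat{\mathcal{S}})=\mathbf{X}_{\hat{\mathcal{S}}}^{\dagger}\mathbf{y}$ and $\hat{\boldsymbol{\beta}}(\hat{\mathcal{S}}^C)=\mathbf{0}$. |
| Output:- Support estimate $\hat{\mathcal{S}}$ and signal estimate $\hat{\boldsymbol{\beta}}$. |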
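A compact sketch of the TF-OMP steps follows, again assuming NumPy and unit-norm columns. The name `tf_omp` is hypothetical. The early exit on an exactly zero residual is a safeguard added here, since the analysis assumes the residuals stay non zero; it is not part of the table.

```python
import numpy as np


def tf_omp(X: np.ndarray, y: np.ndarray):
    """TF-OMP sketch: run OMP for k_max = floor((n+1)/2) iterations and stop at argmin_k t_k."""
    n, p = X.shape
    k_max = (n + 1) // 2
    residual = y.astype(float)
    support = []
    norms = [np.linalg.norm(residual)]
    # Step 1: run OMP for k_max iterations, recording ||r^k||_2
    for _ in range(k_max):
        corr = np.abs(X.T @ residual)
        if support:
            corr[support] = -np.inf
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef
        norms.append(np.linalg.norm(residual))
        if norms[-1] == 0.0:                 # exact fit: later ratios would be 0/0
            break
    norms = np.asarray(norms)
    t = norms[1:] / norms[:-1]               # t_k for the iterations actually run
    # Step 2: k0_hat = argmin_k t_k (t[0] corresponds to k = 1)
    k0_hat = int(np.argmin(t)) + 1
    # Step 3: S_hat = S^{k0_hat}; LS estimate on S_hat, zeros elsewhere
    S_hat = sorted(support[:k0_hat])
    coef, *_ = np.linalg.lstsq(X[:, S_hat], y, rcond=None)
    beta_hat = np.zeros(p)
    beta_hat[S_hat] = coef
    return beta_hat, S_hat
```

No argument carries $k_0$ or $\sigma$. The only data-dependent decision is the location of the minimum of $t_k$.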

We now make the following observations about TF-OMP.

### 3.1 Computational complexity of TF-OMP

The computational complexity of TF-OMP with $k_{max}=\lfloor\frac{n+1}{2}\rfloor$ is $O(n^2p)$, which is higher than the $O(k_0np)$ complexity of OMP($k_0$). This is the cost one has to pay for not knowing $k_0$ or $\sigma^2$ *a priori*. However, TF-OMP is computationally much more efficient than either the second order cone programming (SOCP) or the cyclic algorithm based implementation of the popular tuning free SPICE algorithm [28]. The cyclic algorithm based implementation of SPICE is claimed to be computationally efficient (in comparison with SOCP) in small and medium sized problems. Even so, it involves multiple iterations, and each iteration requires the inversion of an $n\times n$ matrix ($O(n^3)$ complexity) and a matrix matrix multiplication of complexity $O(n^2p)$.

It is possible to reduce the complexity of TF-OMP by producing an upper bound $k_{up}$ on $k_0$ that is lower than the $k_{max}=\lfloor\frac{n+1}{2}\rfloor$ used in TF-OMP. Assuming *a priori* knowledge of an upper bound is a significantly weaker assumption than having exact *a priori* knowledge of $k_0$. If one can produce an upper bound $k_{up}$ satisfying $k_0\leq k_{up}=O(k_0)$, then setting $k_{max}=k_{up}$ in TF-OMP gives the OMP($k_0$) complexity of $O(k_0np)$.
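The $O(n^2p)$ figure follows from summing per-iteration costs. The derivation below is a sketch that assumes the LS step in Table 1 is implemented with a QR factorization updated by one column per iteration. Other LS implementations change the lower order terms but not the leading $O(k_{max}np)$ term.

```latex
% Sketch: OMP cost for X in R^{n x p}, assuming the LS step reuses a QR
% factorization of X_{S^{k-1}} that is updated by one column per iteration.
\begin{aligned}
\text{iteration } k:&\quad \underbrace{O(np)}_{\mathbf{X}^T\mathbf{r}^{k-1}}
  \;+\; \underbrace{O(nk)}_{\text{QR update and LS solve}},\\
\text{cost}(k_{max}) &= \sum_{k=1}^{k_{max}}\big(O(np)+O(nk)\big)
  = O(k_{max}\,np)+O(k_{max}^2\,n) = O(k_{max}\,np)
  \quad\text{since } k_{max}\le n<p,\\
k_{max}=\Big\lfloor\tfrac{n+1}{2}\Big\rfloor &\;\Rightarrow\; O(n^2p)\ \ (\text{TF-OMP}),
  \qquad k_{max}=k_0 \;\Rightarrow\; O(k_0\,np)\ \ (\text{OMP}(k_0)).
\end{aligned}
```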

For situations where the statistician is completely oblivious to $k_0$, we propose two low complexity versions of TF-OMP, *viz.*, QTF-OMP1 (quasi tuning free OMP) and QTF-OMP2. Both use a value of $k_{max}$ lower than the $\lfloor\frac{n+1}{2}\rfloor$ used in TF-OMP.

QTF-OMP1 is motivated by coherence based guarantees. The best coherence based guarantee for OMP extends up to $k_0<\frac{1}{2}\left(1+\frac{1}{\mu_{\mathbf{X}}}\right)$, and $\mu_{\mathbf{X}}$ for any $n\times p$ matrix satisfies $\mu_{\mathbf{X}}\geq\sqrt{\frac{p-n}{n(p-1)}}$ [41]. Hence, QTF-OMP1 uses a value of $k_{max}$ which is two times higher than the maximum value of $k_0$ that can be covered by the coherence based guarantees available for OMP.

QTF-OMP2 is motivated by asymptotic guarantees. The best known asymptotic guarantee for OMP [19] specifies, in terms of an arbitrary constant $\delta>0$, how large $k_0$ can be for OMP to recover any $k_0$-sparse signal as the problem dimensions grow. Letting $\delta\to 0$ gives the highest value of $k_0$ one can reliably detect using OMP asymptotically.

In both cases, $k_{max}$ is set to twice the corresponding maximum detectable value of $k_0$ to add sufficient robustness. The resulting complexities of QTF-OMP1 and QTF-OMP2 are significantly lower than the complexity of TF-OMP. Unlike TF-OMP, which is completely tuning free, QTF-OMP1 and QTF-OMP2 involve a subjective choice of $k_{max}$ (though motivated by theoretical properties). The rest of this article considers TF-OMP only. In Section 6 we demonstrate that the performance of TF-OMP, QTF-OMP1 and QTF-OMP2 is similar across multiple experiments.

## 4 Analysis of TF-OMP

In this section we mathematically analyse various factors that influence the performance of TF-OMP. In particular, we discuss the conditions for successful recovery of a $k_0$-sparse vector $\boldsymbol{\beta}$ in bounded noise $\|\mathbf{w}\|_2\leq\epsilon_2$. Note that the Gaussian vector $\mathbf{w}\sim\mathcal{N}(\mathbf{0}_n,\sigma^2\mathbf{I}_n)$ is essentially bounded in the sense that $\mathbb{P}\left(\|\mathbf{w}\|_2\leq\sigma\sqrt{n+2\sqrt{n\log n}}\right)\geq 1-\frac{1}{n}$. Hence, with $\epsilon_2=\sigma\sqrt{n+2\sqrt{n\log n}}$, this analysis is applicable to Gaussian noise too. For bounded noise, we define the SNR as $\text{SNR}=\frac{\|\mathbf{X}\boldsymbol{\beta}\|_2^2}{\epsilon_2^2}$.

We next state and prove a theorem regarding the successful support recovery by TF-OMP in bounded noise. Note that accurate support recovery automatically translates to an MSE performance equivalent to that of an oracle with *a priori* knowledge of the support $\mathcal{S}$. Throughout this section, we use to denote the ratio instead of .

Proof:-

The analysis of TF-OMP is based on the fundamental results developed in [14] and [35], stated next.

### 4.1 A brief review of relevant results from [14] and [35].

Let $\lambda_{min}$ and $\lambda_{max}$ denote the minimum and maximum eigenvalues of $\mathbf{X}_{\mathcal{S}}^T\mathbf{X}_{\mathcal{S}}$, respectively.

A1) shows how to bound the residual norms used in $t_k$. A2) implies that the first $k_0$ iterations of OMP will be correct if ERC is satisfied and $\beta_{min}$ is sufficiently high relative to the noise level. We now state conditions similar to A1)-A2) in terms of MIC and RIC.

The analysis based on $\text{erc}(\mathbf{X},\mathcal{S})$ and $\lambda_{min}$ is more general than the one based on MIC or RIC, so we explain TF-OMP using these two quantities. However, as outlined in B1)-B2) and C1)-C2), this analysis can easily be restated in terms of $\mu_{\mathbf{X}}$ and the RIC.

### 4.2 Sufficient conditions for sparse recovery using TF-OMP

The successful recovery of support of using TF-OMP requires the simultaneous occurrence of the events E1)-E3) given below.

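E1). The first $k_0$ iterations are correct, i.e., $\mathcal{S}^{k_0}=\mathcal{S}$.

E2). $t_{k_0}<\min_{1\leq k<k_0}t_k$.

E3). $t_{k_0}<\min_{k_0<k\leq k_{max}}t_k$.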

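E1) implies that OMP with *a priori* knowledge of $k_0$, i.e., OMP($k_0$), can perform exact sparse recovery. E2) and E3) imply that TF-OMP will be free from missed and false discoveries, respectively. Note that condition A2) implies that the event E1) occurs as long as ERC holds and $\epsilon_2$ is below a particular level $\epsilon_{omp}$, which depends on $\beta_{min}$, $\lambda_{min}$ and $\text{erc}(\mathbf{X},\mathcal{S})$.

Next we consider the events E2) and E3), assuming that the noise satisfies $\epsilon_2<\epsilon_{omp}$, i.e., E1) is true.

We first consider the event E2). To establish $t_{k_0}<t_k$ for $k<k_0$, we produce an upper bound on $t_{k_0}$ and lower bounds on $t_k$ for $k<k_0$. We then show that, at high SNR, the upper bound on $t_{k_0}$ is lower than the lower bound on $t_k$ for $k<k_0$.

Since all entries in $\mathcal{S}$ are selected in the first $k_0$ iterations, $\mathcal{S}\subseteq\mathcal{S}^{k_0}$, and hence $\mathbf{r}^{k_0}=(\mathbf{I}_n-\mathbf{P}_{k_0})\mathbf{w}$. Likewise, only one entry in $\mathcal{S}$ is left out after $k_0-1$ iterations. Hence, $|\mathcal{S}/\mathcal{S}^{k_0-1}|=1$ and $\mathcal{S}/\mathcal{S}^{k_0}=\phi$. Substituting these values in A1) of Lemma 1 yields an upper bound on $\|\mathbf{r}^{k_0}\|_2$ and a lower bound on $\|\mathbf{r}^{k_0-1}\|_2$. Hence, $t_{k_0}$ is bounded by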

Next we lower bound $t_k$ for $k<k_0$. Note that after appending enough zeros in appropriate locations. Further, , where . Applying the triangle inequality to gives the bound

Applying (Equation 3) in gives

for and . The R.H.S of (Equation 4) can be rewritten as

From (Equation 5) it is clear that the R.H.S of (Equation 4) decreases with decreasing . Note that the minimum value of is itself. This leads to an even smaller lower bound on for given by

For E2) to happen, it is sufficient that the lower bound on $t_k$ for $k<k_0$ is larger than the upper bound on $t_{k_0}$, i.e.,

This will happen if , where

In words, whenever , TF-OMP will not have any missed discoveries.

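Next we consider the event E3) and assume again that E1) holds. Since the first $k_0$ iterations are correct, $\mathbf{r}^k=(\mathbf{I}_n-\mathbf{P}_k)\mathbf{w}$ for $k\geq k_0$. Note that, for $k>k_0$, the quantity $t_k=\frac{\|(\mathbf{I}_n-\mathbf{P}_k)\mathbf{w}\|_2}{\|(\mathbf{I}_n-\mathbf{P}_{k-1})\mathbf{w}\|_2}$ is independent of any scaling of $\mathbf{w}$. Hence, define the quantity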

where is an ordered set representing the indices selected by OMP. By the definition of , . is a random variable depending on the indices, which in turn depend on the noise vector $\mathbf{w}$. However, influences only through . Since depends on , it is difficult to characterize . TF-OMP stops before deterministically, and hence it is true that for each of the possible realizations of or, equivalently, each possible realization . Further, the set of all possible denoted by is large, but finite. This implies that . This implies that with probability one for all and is independent of . At the same time, the bound on $t_{k_0}$ decreases to zero with decreasing $\epsilon_2$. Hence, given by

such that for all whenever . In words, TF-OMP will not make false discoveries whenever . Combining all the required conditions, we can see that TF-OMP will recover the correct support whenever . In words, for any support $\mathcal{S}$ satisfying ERC, there exists $\epsilon_{tf}>0$ such that TF-OMP will recover $\mathcal{S}$ whenever $\epsilon_2<\epsilon_{tf}$. Hence proved.

We now make some remarks about the performance of TF-OMP.

### 4.3 High SNR consistency of TF-OMP in Gaussian noise.

The high SNR consistency of variable selection techniques in Gaussian noise has recently received considerable attention in the signal processing community [42]. High SNR consistency is formally defined as follows.

**Definition:-** A support recovery technique is high SNR consistent iff the probability of support recovery error (PE) satisfies $\lim_{\sigma^2\to 0}\text{PE}=0$.

The following lemma, stated and proved in [20], establishes the necessary and sufficient conditions for the high SNR consistency of OMP and LASSO.

This lemma implies that LASSO and OMP with residual based SC are high SNR consistent iff their tuning parameters are adapted according to $\sigma^2$. In particular, it implies that some widely used parameters are inconsistent at high SNR. Examples are $\lambda=2\sigma\sqrt{2\log p}$ for LASSO [8] and the OMP SC $\|\mathbf{r}^k\|_2\leq\sigma\sqrt{n+2\sqrt{n\log n}}$. In the following theorem, we state and prove the high SNR consistency of TF-OMP. To the best of our knowledge, no CS algorithm has previously been shown to achieve high SNR consistency in the absence of knowledge of $\sigma^2$.

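From the analysis of Section 4.2, we know that TF-OMP recovers the correct support whenever $\|\mathbf{w}\|_2<\epsilon_{tf}$, where $\epsilon_{tf}>0$ is a function of $\mathbf{X}$ and the support $\mathcal{S}$. Hence, PE satisfies $\text{PE}\leq\mathbb{P}(\|\mathbf{w}\|_2\geq\epsilon_{tf})$. Note that $\mathbf{w}\overset{d}{=}\sigma\mathbf{z}$, where $\mathbf{z}\sim\mathcal{N}(\mathbf{0}_n,\mathbf{I}_n)$. Also, the distribution of $\mathbf{z}$ is independent of $\sigma^2$. Further, $\|\mathbf{z}\|_2$ is bounded in probability in the sense that $\lim_{j\to\infty}\mathbb{P}(\|\mathbf{z}\|_2>M_j)=0$ for any sequence $M_j\to\infty$. All these imply that

$$\lim_{\sigma^2\to 0}\text{PE}\leq\lim_{\sigma^2\to 0}\mathbb{P}\left(\|\mathbf{z}\|_2\geq\frac{\epsilon_{tf}}{\sigma}\right)=0.$$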
Hence proved.

## 5 Tuning Free Robust Linear Regression in the presence of sparse outliers

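Throughout this article we have considered a linear regression model $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{w}$, where $\boldsymbol{\beta}$ is a sparse vector and $\mathbf{w}$ is the noise. In this section we consider a different regression model

$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{w}+\mathbf{g}_{out},$$

where:

- $\mathbf{X}\in\mathbb{R}^{n\times p}$ is a full rank design matrix with $n>p$ or $n\gg p$;
- the regression vector $\boldsymbol{\beta}$ may or may not be sparse;
- the inlier noise is $\mathbf{w}\sim\mathcal{N}(\mathbf{0}_n,\sigma^2\mathbf{I}_n)$.

The outlier noise $\mathbf{g}_{out}$ represents the large errors in the regression equation that are not modelled by the inlier noise distribution. In addition to SNR, this regression model also requires the signal to interference ratio (SIR) given by

$$\text{SIR}=\frac{\|\mathbf{X}\boldsymbol{\beta}\|_2^2}{\|\mathbf{g}_{out}\|_2^2}$$

to quantify the impact of outliers. In many cases of practical interest, $\mathbf{g}_{out}$ is modelled as sparse, i.e., $k_0=|\text{supp}(\mathbf{g}_{out})|\ll n$. However, $\mathbf{g}_{out}$ can have very large power, i.e., SIR can be very low [31]. A classic example of this is channel estimation in OFDM systems in the presence of narrow band interference [43]. In spite of the full rank of $\mathbf{X}$, the traditional least squares (LS) estimate of $\boldsymbol{\beta}$ given by $\hat{\boldsymbol{\beta}}_{LS}=\mathbf{X}^{\dagger}\mathbf{y}$ is highly inefficient in terms of MSE. For such scenarios, [34] proposed an algorithm called greedy algorithm for robust de-noising (GARD), which is very closely related to the OMP algorithm discussed in this paper. An algorithmic description of GARD is given in Table 3.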

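| **GARD** |
|---|
| Input:- Observed vector $\mathbf{y}$, design matrix $\mathbf{X}$ and SC. |
| Initialization:- $\mathbf{r}^0=(\mathbf{I}_n-\mathbf{P}_{\mathbf{X}})\mathbf{y}$, $\mathcal{S}^0=\phi$, $k=1$. |
| Repeat Steps 1-4 until SC is met. |
| Step 1:- Identify the strongest residual in $\mathbf{r}^{k-1}$, i.e., $i^k=\arg\max_{j\in[n]}\lvert\mathbf{r}^{k-1}_j\rvert$, and set $\mathcal{S}^k=\mathcal{S}^{k-1}\cup i^k$. |
| Step 2:- Update the matrix $\mathbf{A}^k=[\mathbf{X},\mathbf{I}_{\mathcal{S}^k}]$, where $\mathbf{I}_{\mathcal{S}^k}$ denotes the columns of $\mathbf{I}_n$ indexed by $\mathcal{S}^k$. |
| Step 3:- Jointly estimate $\boldsymbol{\beta}$ and $\mathbf{g}_{out}(\mathcal{S}^k)$ as $[\hat{\boldsymbol{\beta}}^T,\hat{\mathbf{g}}_{out}(\mathcal{S}^k)^T]^T=(\mathbf{A}^k)^{\dagger}\mathbf{y}$. |
| Step 4:- Update the residual $\mathbf{r}^k=(\mathbf{I}_n-\mathbf{P}_{\mathbf{A}^k})\mathbf{y}$ and set $k\leftarrow k+1$. |
| Output:- Signal estimate $\hat{\boldsymbol{\beta}}$. |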
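A minimal sketch of the GARD iteration above follows, using the SC "stop after a given number of iterations", i.e., GARD($k_0$). It assumes NumPy, and the function name `gard` is hypothetical. For clarity, each step re-solves the full LS problem with `lstsq` instead of using the efficient updates one would use in practice.

```python
import numpy as np


def gard(X: np.ndarray, y: np.ndarray, n_iter: int):
    """GARD with the SC 'stop after n_iter iterations', i.e. GARD(k0)."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # LS fit: r^0 = (I_n - P_X) y
    residual = y - X @ coef
    outliers = []
    for _ in range(n_iter):
        # Step 1: position of the strongest residual entry
        mag = np.abs(residual)
        if outliers:
            mag[outliers] = -np.inf                  # flagged entries are already fitted exactly
        outliers.append(int(np.argmax(mag)))
        # Step 2: A^k = [X, columns of I_n indexed by the flagged positions]
        A = np.hstack([X, np.eye(n)[:, outliers]])
        # Step 3: joint LS estimate of beta and the flagged outlier values
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        # Step 4: residual update r^k = (I_n - P_{A^k}) y
        residual = y - A @ coef
    return coef[:p], sorted(outliers)
```

Appending a column of $\mathbf{I}_n$ lets the LS fit absorb the corresponding observation entirely. This is why GARD behaves like OMP run on the outlier vector $\mathbf{g}_{out}$.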

GARD can be considered as applying OMP to identify the significant entries in $\mathbf{g}_{out}$ after nullifying the signal component $\mathbf{X}\boldsymbol{\beta}$. This is done by projecting $\mathbf{y}$ onto the subspace orthogonal to the column span of $\mathbf{X}$. Just like in OMP, the key component in GARD is the SC. One can stop GARD once $k_0$ iterations have been performed, but $k_0$ is unknown *a priori*. Alternatively, one can stop when the residual norm $\|\mathbf{r}^k\|_2$ falls below a predefined threshold, but setting the threshold requires the knowledge of $\sigma^2$. We use the shorthand GARD($k_0$) and GARD($\sigma^2$) to represent these schemes. However, producing a high quality estimate of $\sigma^2$ in the presence of outliers is also a difficult task. Further, there exists a level of subjectivity in the choice of this threshold even if $\sigma^2$ is known. A better strategy would be to produce a version of GARD free of any tuning parameters.

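The principle developed for TF-OMP can be used in GARD also. To explain this, consider the statistic $t_k=\frac{\|\mathbf{r}^k\|_2}{\|\mathbf{r}^{k-1}\|_2}$ and let $k_{min}$ be the first iteration at which $\text{supp}(\mathbf{g}_{out})\subseteq\mathcal{S}^k$. For all $k<k_{min}$, $\mathbf{r}^k$ contains contributions from the outlier $\mathbf{g}_{out}$, whereas for all $k\geq k_{min}$, $\mathbf{r}^k$ has contributions from noise only. Hence, if the entries in $\mathbf{g}_{out}$ are sufficiently large in comparison with the noise level $\sigma$, then $t_k$ experiences a sudden dip at $k=k_{min}$, just like in the case of OMP. The algorithm given in Table 4 (TF-GARD) identifies this dip and delivers a high quality estimate of $\boldsymbol{\beta}$ without any tuning parameter.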

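| **TF-GARD** |
|---|
| Input:- Observed vector $\mathbf{y}$, design matrix $\mathbf{X}$. |
| Step 1:- Run GARD for $k_{max}$ iterations. |
| Step 2:- Identify $k_{min}$ as $\hat{k}_{min}=\arg\min_{1\leq k\leq k_{max}}t_k$, where $t_k=\frac{\lVert\mathbf{r}^k\rVert_2}{\lVert\mathbf{r}^{k-1}\rVert_2}$. |
| Step 3:- Jointly estimate $\boldsymbol{\beta}$ and $\mathbf{g}_{out}(\mathcal{S}^{\hat{k}_{min}})$ as $[\hat{\boldsymbol{\beta}}^T,\hat{\mathbf{g}}_{out}(\mathcal{S}^{\hat{k}_{min}})^T]^T=(\mathbf{A}^{\hat{k}_{min}})^{\dagger}\mathbf{y}$. |
| Output:- Signal estimate $\hat{\boldsymbol{\beta}}$. |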
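The TF-GARD steps can be sketched in the same way, assuming NumPy; the name `tf_gard` is hypothetical. The number of GARD iterations $k_{max}$ is left as an explicit argument because this section does not fix it. The check $0<k_{max}<n-p$ is an assumption of this sketch: it keeps $\mathbf{A}^k$ strictly tall so that the residual norms stay non zero.

```python
import numpy as np


def tf_gard(X: np.ndarray, y: np.ndarray, k_max: int):
    """TF-GARD sketch: run GARD for k_max iterations, stop at argmin_k t_k, re-fit jointly."""
    n, p = X.shape
    if not 0 < k_max < n - p:
        raise ValueError("k_max must satisfy 0 < k_max < n - p so that residuals stay non-zero")
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef                          # r^0 = (I_n - P_X) y
    norms = [np.linalg.norm(residual)]
    outliers = []
    # Step 1: run GARD for k_max iterations, recording ||r^k||_2
    for _ in range(k_max):
        mag = np.abs(residual)
        if outliers:
            mag[outliers] = -np.inf
        outliers.append(int(np.argmax(mag)))
        A = np.hstack([X, np.eye(n)[:, outliers]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        norms.append(np.linalg.norm(residual))
    norms = np.asarray(norms)
    t = norms[1:] / norms[:-1]                       # t_k, k = 1..k_max
    # Step 2: k_min_hat = argmin_k t_k
    k_hat = int(np.argmin(t)) + 1
    # Step 3: joint LS estimate with the first k_hat flagged positions
    flagged = outliers[:k_hat]
    A_hat = np.hstack([X, np.eye(n)[:, flagged]])
    coef, *_ = np.linalg.lstsq(A_hat, y, rcond=None)
    return coef[:p], sorted(flagged)
```

Compared with GARD($k_0$) or GARD($\sigma^2$), the only input beyond the data is the iteration budget `k_max`. It bounds the number of outliers that can be detected but does not act as a threshold.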

A detailed analysis of TF-GARD is not given in this article. However, given the similarities between OMP and GARD, we conjecture that TF-GARD recovers the support of $\mathbf{g}_{out}$ under the same set of conditions used in [34], albeit at a higher SNR than GARD itself. Numerical simulations indicate that the performance of TF-GARD is highly competitive with that of GARD($k_0$) and GARD($\sigma^2$) over a wide range of SNR and SIR.

## 6 Numerical Simulations

In this section, we numerically evaluate the performance of the techniques proposed in this paper, *viz*., TF-OMP and TF-GARD, and provide insights into their strengths and shortcomings. First we consider the case of TF-OMP. We compare the performance of TF-OMP with that of OMP($k_0$), OMP($\sigma^2$), LASSO [8] and SPICE [25]. The baselines are configured as follows:

- LASSO and OMP($\sigma^2$) are provided with the noise variance $\sigma^2$.
- OMP($\sigma^2$) stops iterations when $\|\mathbf{r}^k\|_2\leq\sigma\sqrt{n+2\sqrt{n\log n}}$ [14].
- LASSO in (1) uses the value $\lambda=2\sigma\sqrt{2\log p}$ proposed in [8]. To remove the bias in the LASSO estimate, we re-estimate its non zero entries using LS.
- SPICE is, as mentioned before, a tuning free algorithm. We implement it using the cyclic algorithm proposed in [28]. The iterations of the cyclic algorithm are terminated once the change between successive iterations drops below a preset tolerance.
- As observed in [27], SPICE results in a biased estimate. To de-bias it, we collect the coefficients in the SPICE estimate that comprise a fixed percentage of the energy and re-estimate these entries using LS. This de-biased estimate exhibits highly competitive performance.

All results except the PE vs SNR plot in Figure 2 and the symbol error rate (SER) vs SNR plot in Figure 7 are averaged over a fixed number of Monte Carlo iterations; these two plots were produced using a different number of iterations. Unless explicitly stated, the non zero entries of $\boldsymbol{\beta}$ have a fixed magnitude and random signs, and the locations of these non zero entries are randomly permuted in each iteration.
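The de-biasing step described above can be sketched as follows: keep the largest entries that hold a given fraction of the estimate's energy, then re-fit them by LS. The function name and the `energy_frac` argument are hypothetical. The fraction used in the reported experiments is not stated here, so the value below is purely illustrative. With `energy_frac=1.0` the function essentially re-fits all non zero entries by LS, which is the de-biasing applied to LASSO above.

```python
import numpy as np


def debias_by_energy(X: np.ndarray, y: np.ndarray, beta_hat: np.ndarray, energy_frac: float):
    """Keep the fewest largest-magnitude entries of beta_hat holding at least
    `energy_frac` of ||beta_hat||_2^2, then re-fit those entries by least squares."""
    order = np.argsort(np.abs(beta_hat))[::-1]           # largest magnitude first
    cum = np.cumsum(beta_hat[order] ** 2)                # non-decreasing energy profile
    # smallest number of entries whose cumulative energy reaches the target
    n_keep = int(np.searchsorted(cum, energy_frac * cum[-1])) + 1
    S = np.sort(order[:min(n_keep, len(order))])
    coef, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
    beta_db = np.zeros(X.shape[1])
    beta_db[S] = coef
    return beta_db, S.tolist()
```

For instance, with $\hat{\boldsymbol{\beta}}=[0.01,3,-0.02,4]^T$ and `energy_frac=0.99`, only the entries at positions 1 and 3 are kept and re-estimated.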

### 6.1 Small sample performance of TF-OMP.

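In this section, we evaluate the performance of algorithms when the problem dimensions are small. For this, we consider a matrix of the form $\mathbf{X}=[\mathbf{I}_n,\mathbf{H}_n]$, where $\mathbf{H}_n$ is the $n\times n$ Hadamard matrix scaled to have unit norm columns. It is well known that the mutual coherence of this matrix is given by $\mu_{\mathbf{X}}=\frac{1}{\sqrt{n}}$ [41]. Hence, $\mathbf{X}$ satisfies the mutual incoherence property whenever $k_0<\frac{1}{2}(1+\sqrt{n})$. In our experiments we fix $n$ and $k_0$. Note that $p=2n$ by construction. For this particular $\mathbf{X}$, MIC and ERC are satisfied.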
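The test matrix of this subsection can be generated as below. This sketch assumes $n$ is a power of two so that the Sylvester construction of $\mathbf{H}_n$ applies; which Hadamard construction the experiments used is not stated. The helper names are hypothetical. The check confirms $\mu_{\mathbf{X}}=1/\sqrt{n}$ and returns the largest $k_0$ with $k_0<\frac{1}{2}(1+\sqrt{n})$.

```python
import numpy as np


def identity_hadamard(n: int) -> np.ndarray:
    """X = [I_n, H_n / sqrt(n)] using a Sylvester Hadamard matrix (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("the Sylvester construction needs n to be a power of 2")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return np.hstack([np.eye(n), H / np.sqrt(n)])


def mutual_coherence(X: np.ndarray) -> float:
    """Maximum absolute inner product between distinct unit-norm columns."""
    G = np.abs(X.T @ X)
    np.fill_diagonal(G, 0.0)
    return float(G.max())


def max_mic_sparsity(mu: float) -> int:
    """Largest k0 with mu < 1/(2 k0 - 1), i.e. k0 < (1 + 1/mu) / 2 (strict)."""
    return int(np.ceil((1.0 + 1.0 / mu) / 2.0)) - 1
```

For $n=16$, $32$ and $64$, this gives $\mu_{\mathbf{X}}=1/4$, $1/\sqrt{32}$ and $1/8$, and MIC-compatible sparsity levels of at most $2$, $3$ and $4$, respectively.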

From Figure 2, it can be observed that, in terms of MSE, all algorithms under consideration perform equivalently at high SNR. At low SNR, OMP($k_0$) has the best performance. The performance of TF-OMP is slightly inferior to OMP($k_0$) at low SNR, whereas it matches OMP($\sigma^2$) and LASSO across the entire SNR range. TF-OMP performs better than both versions of SPICE. This is important considering the fact that SPICE is also a tuning free algorithm.

In terms of support recovery error, OMP($k_0$) has the best performance, followed closely by TF-OMP. Both LASSO and OMP($\sigma^2$) are inconsistent at high SNR, as proved in [20], whereas TF-OMP is high SNR consistent. This validates Theorems 1-2 in Section 4.

Note that the SPICE estimate contains a number of very small entries, which is an artefact of the termination criteria. Identifying significant entries from this estimate in the absence of knowledge of $k_0$ and $\sigma^2$ is difficult and subjective in nature. We have used an energy criterion to perform this task. However, unlike the MSE performance, the PE of the de-biased SPICE estimate was observed to depend crucially on the chosen energy percentage. We chose this percentage mainly because it gave a very good MSE performance. As one can observe from Figure 2, SPICE is also high SNR consistent. However, a different choice of this percentage could possibly improve the performance. OMP based algorithms, being stepwise in nature, do not have this problem.

### 6.2 Large sample performance of TF-OMP.

In this section, we evaluate the performance of algorithms

a). When both $k_0$ and $p$ are fixed and $n$ is increasing, and

b). When $p$ is fixed and both $n$ and $k_0$ are increasing.

The matrix $\mathbf{X}$ for this purpose is generated by sampling its entries *i.i.d.* from a Gaussian distribution. The columns of $\mathbf{X}$ are then normalised to unit norm. For the fixed sparsity and increasing $n$ case, all algorithms under consideration except SPICE achieve similar performance. As the number of samples $n$ increases, the MSE improves for all algorithms. In the second case, the sparsity $k_0$ is increased linearly with $n$. From the R.H.S of Figure 3, one can observe that the performance of OMP($k_0$), LASSO and TF-OMP matches across the range of ratios under consideration. In particular, TF-OMP outperforms both the de-biased SPICE and OMP($\sigma^2$).

### 6.3 Performance of TF-OMP in signals with high dynamic range.

The analysis in Section 4 pointed to a deteriorated performance of TF-OMP when the dynamic range of the non zero entries of $\boldsymbol{\beta}$ is large. In this section we evaluate this performance degradation numerically. The matrix under consideration is the same as the matrix used in Section 6.1. The sparsity $k_0$ is fixed. However, the magnitudes of the non zero entries of $\boldsymbol{\beta}$ now vary, and the signs are random as before. As the value of decreases the variation increases. and