A Fast Non-Gaussian Bayesian Matching Pursuit Method for Sparse Reconstruction
Abstract
A fast matching pursuit method using a Bayesian approach is introduced for sparse signal recovery. This method, referred to as nGpFBMP, computes Bayesian estimates of sparse signals even when the signal prior is non-Gaussian or unknown. It is agnostic to signal statistics and utilizes a priori statistics of the additive noise and the sparsity rate of the signal, which are shown to be easily estimated from data if not available. nGpFBMP utilizes a greedy approach and order-recursive updates of its metrics to find the most dominant sparse supports and determine the approximate minimum mean square error (MMSE) estimate of the sparse signal. Simulation results demonstrate the power and robustness of our proposed estimator.
I Introduction
Sparsity is a feature that is abundant in both natural and man-made signals. Some examples of sparse signals include those from speech, images, videos, sensor arrays (e.g., temperature and light sensors), seismic activity, galactic activity, biometric activity, radiology, and frequency hopping. Sparsity is an attractive property because its exploitation may enable the development of simple signal processing systems. Some examples of systems in which a priori knowledge of signal sparsity is utilized include motion estimation [1], magnetic resonance imaging (MRI) [2], impulse noise estimation and cancellation in DSL [3], network tomography [4], and peak-to-average-power ratio reduction in OFDM [5]. All of these systems are based on sparsity-aware estimators such as Lasso [6], basis pursuit [7], the structure-based estimator [8], fast Bayesian matching pursuit [9], and estimators related to the relatively new area of compressed sensing [10, 11, 12].
Compressed sensing (CS), otherwise known as compressive sampling, has found many applications in the fields of communications, image processing, medical imaging, and networking. CS algorithms have been shown to recover sparse signals from underdetermined systems of equations that take the form
$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{x} + \mathbf{n}, \qquad (1)$$
where $\mathbf{x} \in \mathbb{C}^{N}$ and $\mathbf{y} \in \mathbb{C}^{M}$ are the unknown sparse signal and the observed signal, respectively. Furthermore, $\boldsymbol{\Phi} \in \mathbb{C}^{M \times N}$ is the measurement matrix and $\mathbf{n}$ is the additive Gaussian noise vector. Here, the number of unknown elements, $N$, is much larger than the number of observations, $M$. CS uses linear projections of sparse signals that preserve the structure of the signals; these projections are then used to reconstruct the sparse signal, with high probability, using $\ell_1$ optimization
$$\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{subject to} \quad \|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|_2 \leq \epsilon. \qquad (2)$$
$\ell_1$ optimization is a convex optimization problem that conveniently reduces to a linear program known as basis pursuit, whose computational complexity grows rapidly with the problem dimension $N$. Since $N$ is usually large, such an approach quickly becomes impractical. To tackle this problem, efficient alternatives such as orthogonal matching pursuit (OMP) [13] and the algorithm proposed by Haupt et al. [14] have been developed. These algorithms fall into the category of greedy algorithms and are considerably faster than basis pursuit. However, an inherent limitation of these methods is that the only a priori information they utilize is the sparsity information.
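To make the greedy alternative concrete, the following is a minimal sketch of standard textbook OMP (not the algorithm proposed in this paper; the dimensions and amplitudes are hypothetical, and the example is noiseless for illustration):

```python
import numpy as np

def omp(Phi, y, k):
    """Standard orthogonal matching pursuit: at each step, pick the column
    most correlated with the current residual, then re-fit the selected
    columns to y by least squares."""
    support = []
    residual = y.copy()
    for _ in range(k):
        i = int(np.argmax(np.abs(Phi.conj().T @ residual)))
        support.append(i)
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coeffs
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coeffs
    return x_hat

# Hypothetical underdetermined setup: 60 observations, 100 unknowns, 4 nonzeros.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((60, 100)) / np.sqrt(60)
x = np.zeros(100)
idx = rng.choice(100, size=4, replace=False)
x[idx] = rng.uniform(1.0, 2.0, size=4) * rng.choice([-1.0, 1.0], size=4)
y = Phi @ x  # noiseless, for illustration only
x_hat = omp(Phi, y, 4)
```

Note that this sketch uses only the sparsity level as prior information, which is exactly the limitation discussed above.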
Another category of methods, based on the Bayesian approach, considers complete a priori statistical information of sparse signals. A method called fast Bayesian matching pursuit (FBMP) [9] adopts such an approach and assumes a Gaussian prior on the unknown sparse vector, $\mathbf{x}$. This method performs sparse signal estimation via model selection and model averaging. The sparse vector is described as a mixture of several components, the selection of which is based on successive model expansion. FBMP obtains an approximate MMSE estimate of the unknown signal with high accuracy and low complexity. It was shown to outperform several sparse recovery algorithms, including OMP [13], StOMP [15], GPSR-Basic [16], Sparse Bayes [17], BCS [18], and a variational-Bayes implementation of BCS [19]. However, there are several drawbacks associated with this method. The assumption of a Gaussian signal prior is not realistic because, in most real-world scenarios, the signal distribution is non-Gaussian or unknown. In addition, its performance depends on knowledge of the signal statistics, which are usually difficult to compute. Although a signal statistics estimation process is proposed, it depends on knowledge of initial estimates of these signal parameters. The estimation process, in turn, has a negative impact on the complexity of the method.
Another popular Bayesian method, proposed by Larsson and Selén [20], computes the MMSE estimate of the unknown vector, $\mathbf{x}$. Its approach is similar to that of FBMP in the sense that the sparse vector is described as a mixture of several components that are selected based on successive model reduction. It also requires knowledge of the noise and signal statistics. However, it was found that the MMSE estimate is insensitive to the a priori parameters, and therefore an empirical-Bayesian variant that does not require any a priori knowledge of the data was devised.
The Bayesian approaches mentioned above work successfully only for Gaussian priors. It is reasonable to assume that any additive noise, generated at the sensing end, is Gaussian. However, assuming the signal statistics to be Gaussian is inadequate because it does not capture the actual situation. Moreover, even in situations where the assumption of a Gaussian prior is valid, the parameters of that prior (mean and covariance) need to be estimated, which is challenging, especially when the signal statistics are not i.i.d. In that respect, one can appreciate convex relaxation approaches that are agnostic with regard to signal statistics.
In this paper, we pursue a Bayesian approach for sparse signal reconstruction that combines the advantages of the two approaches. On one hand, the approach is Bayesian, acknowledging the noise statistics and the signal sparsity rate, while on the other hand, it is agnostic to the signal amplitude statistics. The approach can bootstrap itself and estimate the required statistics (sparsity rate and noise variance) when they are unknown. The algorithm is implemented in a greedy manner and pursues an order-recursive approach, which keeps its complexity low. Specifically, the advantages of our approach are as follows:

The Bayesian estimate of the sparse signal is performed even when the signal prior is non-Gaussian or unknown.

It deals with signals whose statistics are unknown. In fact, contrary to other methods, these statistics need not be estimated, which is particularly useful when the signal statistics are not i.i.d. The approach is therefore agnostic with regard to variations in signal statistics.

The a priori statistics of the additive noise and the sparsity rate of the signal, which can be easily estimated from the data if not available, are utilized.

The greedy nature of the approach and the order-recursive update of its metrics make it simple and fast.
The remainder of this paper is organized as follows. In Section II, we formulate the problem and present the MMSE setup in the non-Gaussian/unknown statistics case. In Section III, we describe our greedy algorithm that is able to obtain the approximate MMSE estimate of the sparse vector. Section IV demonstrates how the greedy algorithm can be made faster by calculating various metrics in a recursive manner. This is followed by Section V, which describes our hyperparameter estimation process. In Section VI, we present our simulation results, and in Section VII, we conclude the paper.
I-A Notation
We denote scalars with lowercase letters (e.g., $x$), vectors with lowercase boldface letters (e.g., $\mathbf{x}$), matrices with uppercase boldface letters (e.g., $\boldsymbol{\Phi}$), and we reserve calligraphic notation (e.g., $\mathcal{S}$) for sets. We use $\boldsymbol{\phi}_i$ to denote the $i$th column of matrix $\boldsymbol{\Phi}$, $x(i)$ to denote the $i$th entry of vector $\mathbf{x}$, and $\mathcal{S} \subset \Omega$ to denote a subset of a set $\Omega$. We also use $\boldsymbol{\Phi}_{\mathcal{S}}$ to denote the submatrix formed by the columns of $\boldsymbol{\Phi}$ indexed by the set $\mathcal{S}$. Finally, we use $\hat{\mathbf{x}}$, $\mathbf{x}^{*}$, $\mathbf{x}^{\mathsf{T}}$, and $\mathbf{x}^{\mathsf{H}}$ to respectively denote the estimate, conjugate, transpose, and conjugate transpose of the vector $\mathbf{x}$.
II Problem Formulation and MMSE Setup
II-A The Signal Model
The analysis in this paper considers obtaining an $N$-dimensional sparse vector, $\mathbf{x}$, from an $M$-dimensional observation vector, $\mathbf{y}$. These observations obey the linear regression model
$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{x} + \mathbf{n}, \qquad (3)$$
where $\boldsymbol{\Phi} \in \mathbb{C}^{M \times N}$ is a known regression matrix and $\mathbf{n}$ is the additive Gaussian noise vector.
We shall assume that $\mathbf{x}$ has a sparse structure and is modeled as $\mathbf{x} = \mathbf{x}_A \circ \mathbf{x}_B$, where $\circ$ indicates Hadamard (element-by-element) multiplication. The vector $\mathbf{x}_A$ consists of elements that are drawn from some unknown distribution, and the entries of $\mathbf{x}_B$ are drawn i.i.d. from a Bernoulli distribution with success probability $p$. We observe that the sparsity of the vector $\mathbf{x}$ is controlled by $p$ and, therefore, we call it the sparsity parameter/rate. Typically, in Bayesian estimation, the signal entries are assumed to be drawn from a Gaussian distribution, but here we would like to emphasize that the distribution and color of the entries of $\mathbf{x}_A$ do not matter.^1
^1 The distribution may be unknown, known with unknown parameters, or even Gaussian. Our developments are agnostic with regard to signal statistics.
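The Bernoulli-Hadamard signal model above can be sketched as follows (the length, sparsity rate, and the choice of a uniform amplitude distribution are hypothetical; any amplitude distribution would do, which is precisely the point):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 1000, 0.05  # hypothetical signal length and sparsity rate

# Amplitudes x_A from an arbitrary (here uniform, deliberately non-Gaussian)
# distribution; activations x_B drawn i.i.d. Bernoulli(p).
x_A = rng.uniform(-1.0, 1.0, size=N)
x_B = rng.binomial(1, p, size=N)
x = x_A * x_B  # Hadamard (element-by-element) product

# The number of nonzeros concentrates around N*p.
print(np.count_nonzero(x))
```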
II-B MMSE Estimation of $\mathbf{x}$
To determine $\mathbf{x}$, we compute the MMSE estimate of $\mathbf{x}$ given the observation $\mathbf{y}$. This estimate is formally defined by
$$\hat{\mathbf{x}}_{\mathrm{MMSE}} = \sum_{\mathcal{S}} p(\mathcal{S}|\mathbf{y})\, \mathbb{E}[\mathbf{x}|\mathbf{y},\mathcal{S}], \qquad (4)$$
where the sum is executed over all possible support sets $\mathcal{S}$ of $\mathbf{x}$. In the following, we explain how the expectation $\mathbb{E}[\mathbf{x}|\mathbf{y},\mathcal{S}]$, the posterior $p(\mathcal{S}|\mathbf{y})$, and the sum in (4) can be evaluated.
Given the support $\mathcal{S}$, (3) becomes
$$\mathbf{y} = \boldsymbol{\Phi}_{\mathcal{S}}\mathbf{x}_{\mathcal{S}} + \mathbf{n}, \qquad (5)$$
where $\boldsymbol{\Phi}_{\mathcal{S}}$ is a matrix formed by selecting the columns of $\boldsymbol{\Phi}$ indexed by the support $\mathcal{S}$. Similarly, $\mathbf{x}_{\mathcal{S}}$ is formed by selecting the entries of $\mathbf{x}$ indexed by $\mathcal{S}$. Since the distribution of $\mathbf{x}_{\mathcal{S}}$ is unknown, the best we can do is to use the best linear unbiased estimate (BLUE) as an estimate of $\mathbb{E}[\mathbf{x}|\mathbf{y},\mathcal{S}]$, i.e.,
$$\mathbb{E}[\mathbf{x}_{\mathcal{S}}|\mathbf{y},\mathcal{S}] = \left(\boldsymbol{\Phi}_{\mathcal{S}}^{\mathsf{H}}\boldsymbol{\Phi}_{\mathcal{S}}\right)^{-1}\boldsymbol{\Phi}_{\mathcal{S}}^{\mathsf{H}}\mathbf{y}. \qquad (6)$$
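For the white-noise case, the BLUE on a given support is simply the least-squares solution restricted to the support columns. A minimal sketch (the helper name and the worked numbers are hypothetical):

```python
import numpy as np

def blue_estimate(Phi, y, support):
    """BLUE of x_S for white noise: the least-squares solution
    (Phi_S^H Phi_S)^{-1} Phi_S^H y on the columns indexed by the support."""
    Phi_S = Phi[:, list(support)]
    G = Phi_S.conj().T @ Phi_S          # Gram matrix Phi_S^H Phi_S
    return np.linalg.solve(G, Phi_S.conj().T @ y)

# Tiny worked example with a known answer.
Phi = np.array([[1.0, 0.0, 3.0],
                [0.0, 2.0, 1.0],
                [0.0, 0.0, 2.0]])
y = np.array([1.0, 4.0, 0.0])
print(blue_estimate(Phi, y, [0, 1]))  # -> [1. 2.]
```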
The posterior in (4) can be written using the Bayes rule as
$$p(\mathcal{S}|\mathbf{y}) = \frac{p(\mathbf{y}|\mathcal{S})\, p(\mathcal{S})}{p(\mathbf{y})}. \qquad (7)$$
The probability, $p(\mathbf{y})$, is a factor common to all posterior probabilities that appear in (4) and hence can be ignored. Since the elements of $\mathbf{x}$ are activated according to the Bernoulli distribution with success probability $p$, we have
$$p(\mathcal{S}) = p^{|\mathcal{S}|}(1-p)^{N - |\mathcal{S}|}. \qquad (8)$$
It remains to evaluate the likelihood $p(\mathbf{y}|\mathcal{S})$. If $\mathbf{x}_{\mathcal{S}}$ is Gaussian, $\mathbf{y}$ would also be Gaussian, and the likelihood would be easy to evaluate. On the other hand, when the distribution of $\mathbf{x}_{\mathcal{S}}$ is unknown, or even when it is known but non-Gaussian, determining $p(\mathbf{y}|\mathcal{S})$ is in general very difficult. To get around this, we note that $\mathbf{y}$ is formed by a vector in the subspace spanned by the columns of $\boldsymbol{\Phi}_{\mathcal{S}}$ plus a Gaussian noise vector, $\mathbf{n}$. This motivates us to eliminate the non-Gaussian component by projecting $\mathbf{y}$ onto the orthogonal complement space of $\boldsymbol{\Phi}_{\mathcal{S}}$. This is done by multiplying $\mathbf{y}$ by the projection matrix $\mathbf{P}_{\mathcal{S}}^{\perp} = \mathbf{I} - \boldsymbol{\Phi}_{\mathcal{S}}\left(\boldsymbol{\Phi}_{\mathcal{S}}^{\mathsf{H}}\boldsymbol{\Phi}_{\mathcal{S}}\right)^{-1}\boldsymbol{\Phi}_{\mathcal{S}}^{\mathsf{H}}$. This leaves us with $\mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{y} = \mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{n}$, which is Gaussian with a zero mean and covariance
$$\boldsymbol{\Sigma}_{\mathcal{S}} = \mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{K}_{\mathbf{n}}\mathbf{P}_{\mathcal{S}}^{\perp\mathsf{H}}, \qquad (9)$$
where $\mathbf{K}_{\mathbf{n}}$ is the noise covariance matrix. Thus,
$$p(\mathbf{y}|\mathcal{S}) \simeq \frac{1}{\sqrt{\det\left(2\pi\boldsymbol{\Sigma}_{\mathcal{S}}\right)}} \exp\!\left(-\tfrac{1}{2}\left(\mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{y}\right)^{\mathsf{H}}\boldsymbol{\Sigma}_{\mathcal{S}}^{-1}\mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{y}\right). \qquad (10)$$
Dropping the pre-exponential factor yields
$$p(\mathbf{y}|\mathcal{S}) \propto \exp\!\left(-\tfrac{1}{2}\left(\mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{y}\right)^{\mathsf{H}}\boldsymbol{\Sigma}_{\mathcal{S}}^{-1}\mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{y}\right). \qquad (11)$$
In the white noise case, $\mathbf{K}_{\mathbf{n}} = \sigma_n^2 \mathbf{I}$, which we will focus on for the remainder of the paper, we have
$$p(\mathbf{y}|\mathcal{S}) \propto \exp\!\left(-\frac{1}{2\sigma_n^2}\left\|\mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{y}\right\|^2\right). \qquad (12)$$
Substituting (8) and (12) into (7) finally yields an expression for the posterior probability. In this way, we have all the ingredients to compute the sum in (4). Computing this sum is a challenging task when $N$ is large because the number of support sets can be extremely large and the computational complexity can become unrealistic. To have a computationally feasible solution, this sum can be computed over a few support sets corresponding to significant posteriors. Let $\mathcal{S}_d$ be the set of supports for which the posteriors are significant. Hence, we arrive at the following approximation to the MMSE estimate:
$$\hat{\mathbf{x}}_{\mathrm{AMMSE}} = \sum_{\mathcal{S} \in \mathcal{S}_d} p(\mathcal{S}|\mathbf{y})\, \mathbb{E}[\mathbf{x}|\mathbf{y},\mathcal{S}]. \qquad (13)$$
In the next section, we propose a greedy algorithm to find $\mathcal{S}_d$. Before proceeding, for ease of representation and convenience, we represent the posteriors in the log domain. In this regard, we define a dominant support selection metric, $\nu(\mathcal{S})$, to be used by the greedy algorithm as follows:
$$\nu(\mathcal{S}) \triangleq \log p(\mathcal{S}|\mathbf{y}) = -\frac{1}{2\sigma_n^2}\left\|\mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{y}\right\|^2 + |\mathcal{S}|\log p + (N - |\mathcal{S}|)\log(1-p) + \mathrm{const}. \qquad (14)$$
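In the white-noise case, the dominant support selection metric can be computed from a least-squares residual. A minimal sketch (the helper name, dimensions, and planted support are hypothetical; the additive constant independent of the support is dropped):

```python
import numpy as np

def nu(Phi, y, support, p, sigma2):
    """log p(S|y) up to an S-independent constant, white-noise case:
    -||P_S_perp y||^2 / (2*sigma2) + |S| log p + (N - |S|) log(1 - p)."""
    N = Phi.shape[1]
    Phi_S = Phi[:, list(support)]
    coeffs, *_ = np.linalg.lstsq(Phi_S, y, rcond=None)
    r = y - Phi_S @ coeffs              # the residual P_S_perp y
    k = len(support)
    return (-np.vdot(r, r).real / (2.0 * sigma2)
            + k * np.log(p) + (N - k) * np.log(1.0 - p))

# A planted support should score higher than a wrong one of the same size.
rng = np.random.default_rng(2)
Phi = rng.standard_normal((20, 40))
x = np.zeros(40)
x[[2, 7, 11]] = [1.5, -2.0, 1.0]
y = Phi @ x + 0.01 * rng.standard_normal(20)
print(nu(Phi, y, [2, 7, 11], 0.1, 1e-4) > nu(Phi, y, [0, 1, 3], 0.1, 1e-4))
```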
III A Greedy Algorithm
We now present a greedy algorithm to determine the set of dominant supports, $\mathcal{S}_d$, required to evaluate (13). We search for the optimal support in a greedy manner. We first start by finding the best support of size 1, which involves evaluating $\nu(\{i\})$ for $i = 1, \dots, N$, i.e., a total of $N$ search points. Let $\mathcal{S}^{(1)}$ be the optimal support. Now, we look for the optimal support of size 2, which involves a search of size $\binom{N}{2}$. To reduce the search space, we pursue a greedy approach and look for the point $i^{\star}$ such that $\mathcal{S}^{(2)} = \mathcal{S}^{(1)} \cup \{i^{\star}\}$ maximizes $\nu(\mathcal{S})$. This involves $N-1$ search points (as opposed to the optimal search over $\binom{N}{2}$ points). We continue in this manner by forming $\mathcal{S}^{(3)}$ and searching among the remaining $N-2$ points, and so on, until we reach $\mathcal{S}^{(P)}$. The value of $P$ is selected to be slightly larger than the expected number of nonzero elements in the constructed signal such that $\Pr(|\mathcal{S}| > P)$ is sufficiently small.^2 A formal algorithmic description is presented in Table I, and an example run of this algorithm is presented in Fig. 1.
^2 $|\mathcal{S}|$, i.e., the support size of the constructed signal, follows the binomial distribution $\mathrm{B}(N, p)$, which can be approximated by a Gaussian distribution with mean $Np$ when $Np$ is sufficiently large; $P$ can then be set a few standard deviations above $Np$.
One point to note here is that in our greedy move from $\mathcal{S}^{(k)}$ to $\mathcal{S}^{(k+1)}$, we need to evaluate $\nu(\mathcal{S})$ around $N$ times, which can be done in an order-recursive manner starting from the metric of $\mathcal{S}^{(k)}$. Specifically, we note that every expansion, $\mathcal{S}^{(k)} \cup \{i\}$, of $\mathcal{S}^{(k)}$ requires a calculation of $\|\mathbf{P}^{\perp}\mathbf{y}\|^2$ in (14). This translates to appending a column, $\boldsymbol{\phi}_i$, to $\boldsymbol{\Phi}_{\mathcal{S}^{(k)}}$ in the calculations of (14), which can be done in an order-recursive manner. We summarize these calculations in Section IV. This order-recursive approach substantially reduces the cost of each search step compared with the direct approach, and it determines the overall complexity we incur in our greedy search for the best support.
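The greedy expansion described above can be sketched as follows. This is a deliberately naive version that recomputes each residual from scratch (the order-recursive updates of Section IV avoid exactly this recomputation); for supports of equal size, maximizing the metric reduces to minimizing the residual energy, so sparsity rate and noise variance do not appear. All dimensions and the planted support are hypothetical:

```python
import numpy as np

def residual_energy(Phi, y, support):
    """||P_S_perp y||^2 via a least-squares fit on the support columns."""
    coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    r = y - Phi[:, support] @ coeffs
    return float(r @ r)

def greedy_supports(Phi, y, P):
    """Grow the support one index at a time, expanding with the index that
    minimizes ||P_S_perp y||^2.  Returns the supports of size 1..P."""
    N = Phi.shape[1]
    S, visited = [], []
    for _ in range(P):
        candidates = [i for i in range(N) if i not in S]
        best = min(candidates, key=lambda i: residual_energy(Phi, y, S + [i]))
        S = S + [best]
        visited.append(list(S))
    return visited

rng = np.random.default_rng(4)
Phi = rng.standard_normal((30, 40))
x = np.zeros(40)
x[[3, 12, 25]] = [2.0, -1.5, 1.8]
y = Phi @ x  # noiseless, for illustration
supports = greedy_supports(Phi, y, 5)
```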
III-A A Repeated Greedy Search
It is obvious that to achieve the best signal estimate, $\mathcal{S}_d$ should contain all possible supports. However, our greedy algorithm results in an approximation of the signal estimate because it selects only a handful of supports. The accuracy of the reconstructed signal depends on the number of support vectors in $\mathcal{S}_d$ and may be increased by repeating the greedy algorithm a number of times, say $D$. This results in $\mathcal{S}_d$ containing $D$ support vectors at each of the $P$ sparsity levels (i.e., a total of $DP$ supports). The selection of supports in subsequent repetitions of the algorithm is performed by making sure not to select an element at a particular sparsity level that has been selected at the same sparsity level in any of the previous repetitions. We note that a repeated greedy search in this manner incurs a proportionally higher complexity. For a detailed description of the steps followed by the method, the pseudocode is provided in the Appendix, and the nGpFBMP code is provided on the author's website.^3
^3 The MATLAB code of the nGpFBMP algorithm and the results from various experiments discussed in this paper are provided at http://faculty.kfupm.edu.sa/ee/naffouri/publications.html.
Remark
Let $\mathcal{S}_1$ and $\mathcal{S}_2$ be two different support vectors at the $k$th sparsity level. Note from (14) that, since both supports have the same cardinality, whether $\nu(\mathcal{S}_1) \geq \nu(\mathcal{S}_2)$ is determined solely by the residual energies $\|\mathbf{P}_{\mathcal{S}_1}^{\perp}\mathbf{y}\|^2$ and $\|\mathbf{P}_{\mathcal{S}_2}^{\perp}\mathbf{y}\|^2$; the inequality remains valid regardless of how $p$ and $\sigma_n^2$ change. In other words, the selection of dominant supports at each sparsity level is independent of these quantities. This observation helps the algorithm to bootstrap itself and to estimate the unknown sparsity rate and noise variance, as demonstrated ahead.
IV Efficient Computation of the Dominant Support Selection Metric
As explained in Section III, determining the dominant supports requires extensive computation of $\nu(\mathcal{S})$. The computational complexity of the proposed algorithm is therefore largely dependent upon the way $\nu(\mathcal{S})$ is computed. In this section, a computationally efficient procedure to calculate this quantity is presented.
We note that the efficient computation of $\nu(\mathcal{S})$ depends mainly on the efficient computation of the term $\|\mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{y}\|^2$. Our focus is therefore on computing this term efficiently.
Consider the general support $\mathcal{S}^{(k)}$ with $|\mathcal{S}^{(k)}| = k$, and let $\mathcal{S}^{(k-1)}$ and $\mathcal{S}^{(k+1)}$ denote the following subset and superset, respectively:
$$\mathcal{S}^{(k-1)} \subset \mathcal{S}^{(k)}, \qquad \mathcal{S}^{(k+1)} = \mathcal{S}^{(k)} \cup \{i\}, \qquad (15)$$
where $i \notin \mathcal{S}^{(k)}$. In the following, we demonstrate how to update the quantities computed for $\mathcal{S}^{(k)}$ to obtain those for $\mathcal{S}^{(k+1)}$.^4 Here, we use the subscript $k$ to designate the supports, and thus $\boldsymbol{\Phi}_{k}$ refers to $\boldsymbol{\Phi}_{\mathcal{S}^{(k)}}$. We note that
$$\boldsymbol{\Phi}_{k+1}^{\mathsf{H}}\boldsymbol{\Phi}_{k+1} = \begin{bmatrix} \boldsymbol{\Phi}_{k}^{\mathsf{H}}\boldsymbol{\Phi}_{k} & \boldsymbol{\Phi}_{k}^{\mathsf{H}}\boldsymbol{\phi}_{i} \\ \boldsymbol{\phi}_{i}^{\mathsf{H}}\boldsymbol{\Phi}_{k} & \boldsymbol{\phi}_{i}^{\mathsf{H}}\boldsymbol{\phi}_{i} \end{bmatrix}. \qquad (16)$$
^4 We explicitly indicate the size of the support in this notation as it elucidates the recursive nature of the developed algorithms.
By using the block inversion formula to express the inverse of the above and simplifying, we get, with $\boldsymbol{\Omega}_{k} \triangleq (\boldsymbol{\Phi}_{k}^{\mathsf{H}}\boldsymbol{\Phi}_{k})^{-1}$, $\mathbf{a}_{k} \triangleq \boldsymbol{\Omega}_{k}\boldsymbol{\Phi}_{k}^{\mathsf{H}}\boldsymbol{\phi}_{i}$, and $\beta_{k} \triangleq (\boldsymbol{\phi}_{i}^{\mathsf{H}}\mathbf{P}_{k}^{\perp}\boldsymbol{\phi}_{i})^{-1}$,
$$\boldsymbol{\Omega}_{k+1} = \begin{bmatrix} \boldsymbol{\Omega}_{k} + \beta_{k}\mathbf{a}_{k}\mathbf{a}_{k}^{\mathsf{H}} & -\beta_{k}\mathbf{a}_{k} \\ -\beta_{k}\mathbf{a}_{k}^{\mathsf{H}} & \beta_{k} \end{bmatrix}. \qquad (17)$$
This recursion is initialized by $\boldsymbol{\Omega}_{1} = \|\boldsymbol{\phi}_{i_1}\|^{-2}$ for $k = 1$. The recursion also depends on $\boldsymbol{\Phi}_{k}^{\mathsf{H}}\mathbf{y}$ and $\mathbf{P}_{k}^{\perp}\mathbf{y}$, whose recursions may be determined in a similar fashion as
$$\boldsymbol{\Phi}_{k+1}^{\mathsf{H}}\mathbf{y} = \begin{bmatrix} \boldsymbol{\Phi}_{k}^{\mathsf{H}}\mathbf{y} \\ \boldsymbol{\phi}_{i}^{\mathsf{H}}\mathbf{y} \end{bmatrix} \qquad (18)$$
and
$$\mathbf{P}_{k+1}^{\perp}\mathbf{y} = \mathbf{P}_{k}^{\perp}\mathbf{y} - \frac{\boldsymbol{\phi}_{i}^{\mathsf{H}}\mathbf{P}_{k}^{\perp}\mathbf{y}}{\|\mathbf{P}_{k}^{\perp}\boldsymbol{\phi}_{i}\|^{2}}\,\mathbf{P}_{k}^{\perp}\boldsymbol{\phi}_{i}. \qquad (19)$$
The two recursions (18) and (19) start at $k = 1$ and are thus initialized by $\boldsymbol{\Phi}_{1}^{\mathsf{H}}\mathbf{y} = \boldsymbol{\phi}_{i_1}^{\mathsf{H}}\mathbf{y}$ and $\mathbf{P}_{1}^{\perp}\mathbf{y} = \mathbf{y} - \boldsymbol{\phi}_{i_1}\frac{\boldsymbol{\phi}_{i_1}^{\mathsf{H}}\mathbf{y}}{\|\boldsymbol{\phi}_{i_1}\|^{2}}$. This completes the recursion, which we utilize for the recursive evaluation of $\|\mathbf{P}_{\mathcal{S}}^{\perp}\mathbf{y}\|^{2}$ as
$$\|\mathbf{P}_{k+1}^{\perp}\mathbf{y}\|^{2} = \|\mathbf{P}_{k}^{\perp}\mathbf{y}\|^{2} - \frac{|\boldsymbol{\phi}_{i}^{\mathsf{H}}\mathbf{P}_{k}^{\perp}\mathbf{y}|^{2}}{\|\mathbf{P}_{k}^{\perp}\boldsymbol{\phi}_{i}\|^{2}}. \qquad (20)$$
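The key identity behind the order-recursive evaluation, namely that appending a column to the support reduces the residual energy by a single rank-one correction, can be verified numerically. A real-valued sketch with hypothetical dimensions:

```python
import numpy as np

def perp(A, v):
    """P_A_perp v: the component of v orthogonal to the column span of A."""
    coeffs, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coeffs

rng = np.random.default_rng(3)
M, k = 30, 4
Phi = rng.standard_normal((M, k + 1))
y = rng.standard_normal(M)

# Direct: residual energy using all k+1 columns at once.
direct = np.linalg.norm(perp(Phi, y)) ** 2

# Recursive: quantities for the first k columns, then append column phi_i.
A, phi_i = Phi[:, :k], Phi[:, k]
r_y, r_phi = perp(A, y), perp(A, phi_i)
recursive = np.linalg.norm(r_y) ** 2 - (r_phi @ r_y) ** 2 / np.linalg.norm(r_phi) ** 2

print(np.isclose(direct, recursive))  # the two agree
```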
V Estimation of the Hyperparameters $p$ and $\sigma_n^2$
One of the advantages of the proposed nGpFBMP is that it is agnostic with regard to signal statistics; the only parameters required are the noise variance, $\sigma_n^2$, and the sparsity rate, $p$. In Section III, we pointed out that the dominant support selection process is independent of the parameters $p$ and $\sigma_n^2$. We are therefore able to determine the supports irrespective of the initial estimates of these parameters. The independence of the dominant support selection process from $p$ and $\sigma_n^2$ allows accurate and rapid estimation of these parameters. We note that although $p$ and $\sigma_n^2$ are not needed for support calculations, they are required for computing the posteriors used in the calculation of $\hat{\mathbf{x}}_{\mathrm{AMMSE}}$ in (13). To determine these estimates, we might opt for finding the maximum-likelihood (ML) or maximum a posteriori (MAP) estimates using the expectation maximization (EM) algorithm. However, this would add to the computational complexity and is unnecessary, as a fairly accurate estimation can be performed in a very simple manner as follows.
The maximum a posteriori (MAP) estimate of the support is given by $\mathcal{S}_{\mathrm{MAP}} = \arg\max_{\mathcal{S}} \nu(\mathcal{S})$. We use this to get the MAP estimate of $\mathbf{x}$. This is in turn used to estimate $p$, iteratively, as follows:
$$\hat{p}^{(t+1)} = \frac{|\mathcal{S}_{\mathrm{MAP}}^{(t)}|}{N}. \qquad (21)$$
Here, the superscripts refer to a particular iteration. The estimate is computed iteratively: in the first iteration of nGpFBMP, $\hat{p}^{(0)}$ is initialized by the given initial estimate and used to compute $\mathcal{S}_{\mathrm{MAP}}^{(0)}$. This is used to find the new estimate, $\hat{p}^{(1)}$, using (21), which is then supplied to nGpFBMP in the next iteration, and so on. This process is repeated until the estimate of $p$ changes negligibly between iterations or until a predetermined maximum number of iterations has been performed. Simulation results show that, in most cases, $\hat{p}$ converges rapidly. At this stage, the estimate of the noise variance can be computed from the residual on the final MAP support as
$$\hat{\sigma}_n^2 = \frac{\left\|\mathbf{P}_{\mathcal{S}_{\mathrm{MAP}}}^{\perp}\mathbf{y}\right\|^2}{M - |\mathcal{S}_{\mathrm{MAP}}|}. \qquad (22)$$
We note that we do not need any iteration for estimating the noise variance. We use the $\mathcal{S}_{\mathrm{MAP}}$ corresponding to the final estimate, $\hat{p}$, to find $\hat{\sigma}_n^2$.
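The bootstrap described in this section can be sketched schematically as follows. Here `support_search` is a hypothetical stand-in for one run of the support-selection stage (returning a MAP support and its residual), and the residual-based noise-variance normalization is an assumption of this sketch:

```python
import numpy as np

def estimate_hyperparameters(support_search, M, N, p_init, max_iter=20, tol=1e-3):
    """Iterate: run the support search with the current sparsity estimate,
    then re-estimate p = |S_MAP| / N, until p stabilizes.  Finally estimate
    the noise variance from the residual on the last MAP support."""
    p = p_init
    for _ in range(max_iter):
        S_map, residual = support_search(p)   # hypothetical callable
        p_new = len(S_map) / N
        if abs(p_new - p) < tol:
            p = p_new
            break
        p = p_new
    sigma2 = np.linalg.norm(residual) ** 2 / (M - len(S_map))
    return p, sigma2

# Mock search that always returns the same 3-element support and residual,
# to illustrate the fixed point: p converges to |S_MAP| / N.
mock = lambda p: ([4, 9, 17], np.ones(10))
p_hat, sigma2_hat = estimate_hyperparameters(mock, M=20, N=100, p_init=0.1)
print(p_hat, sigma2_hat)  # -> 0.03 and 10/17
```

The mock makes the convergence behavior visible without running the full algorithm; in practice `support_search` would be the greedy search of Section III.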
VI Results
To demonstrate the performance of the proposed nGpFBMP, we compare it here with fast Bayesian matching pursuit (FBMP) [9] and the convex relaxation-based ($\ell_1$) approach. FBMP was selected because it was shown to outperform a number of contemporary sparse signal recovery algorithms, including OMP [13], StOMP [15], GPSR-Basic [16], and BCS [18]. Comparison with FBMP shows that nGpFBMP performs well where FBMP fails for various signal settings, which are discussed in detail in the following.
Signal Setup
Experiments were conducted for signals drawn from Gaussian as well as non-Gaussian distributions. The following signal configurations were used for the experiments:

Gaussian (i.i.d.).

Non-Gaussian (uniform, non-i.i.d.).

Unknown distribution (for this example, different images with unknown statistics were used).
Entries of the sensing/measurement matrix $\boldsymbol{\Phi}$ were i.i.d. zero-mean complex Gaussian, with the columns normalized to unit norm. The noise had a zero mean and was white and Gaussian, with $\sigma_n^2$ determined according to the desired signal-to-noise ratio (SNR). Initial estimates of the hyperparameters were supplied to both algorithms; estimates of the signal mean and variance were additionally needed for FBMP.
In all of the experiments, parameter refinement was performed for both nGpFBMP and FBMP. For FBMP, the surrogate EM method proposed by its authors was used to refine the hyperparameters. The refinements were allowed to run for a fixed maximum number of iterations. For fairness, support and amplitude refinement [3] procedures were performed on the results of the $\ell_1$ CS algorithm.^6 Finally, the normalized mean-squared error (NMSE) between the original signal, $\mathbf{x}$, and its MMSE estimate, $\hat{\mathbf{x}}$, was used as the performance measure:
^6 Actual parameter values were provided to the CS algorithm instead of estimates; furthermore, support and amplitude refinement was also performed to demonstrate that, despite these measures, its performance was inferior to that of nGpFBMP.
$$\mathrm{NMSE} = \frac{1}{T}\sum_{t=1}^{T} \frac{\left\|\mathbf{x}^{(t)} - \hat{\mathbf{x}}^{(t)}\right\|^2}{\left\|\mathbf{x}^{(t)}\right\|^2}, \qquad (23)$$
where $T$ was the number of trials performed to compute the NMSE between $\mathbf{x}$ and its estimate.
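The performance measure can be sketched as follows, using the per-trial-normalized convention (one common definition of NMSE; the helper name and toy trials are hypothetical):

```python
import numpy as np

def nmse(trials):
    """Average over trials of the normalized squared error:
    mean over t of ||x_t - xhat_t||^2 / ||x_t||^2."""
    errs = [np.linalg.norm(x - xh) ** 2 / np.linalg.norm(x) ** 2
            for x, xh in trials]
    return float(np.mean(errs))

# Two toy trials: a perfect estimate and a completely zeroed one.
t1 = (np.array([3.0, 4.0]), np.array([3.0, 4.0]))   # error 0
t2 = (np.array([1.0, 0.0]), np.array([0.0, 0.0]))   # error 1
print(nmse([t1, t2]))  # -> 0.5
```

In practice this quantity is usually reported in dB, i.e., $10\log_{10}(\mathrm{NMSE})$.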
Experiment 1 (Signal estimation performance comparison for varying SNR)
In the first set of experiments, NMSEs were measured for values of SNR between 0 dB and 30 dB and plotted to compare the performance of nGpFBMP with FBMP and the $\ell_1$ CS algorithm. The signal sparsity rate was fixed for these experiments.
Figs. 2(a) and 2(b) show that the proposed method has better NMSE performance than both FBMP and CS for all considered signals. Only at very high values of SNR does the NMSE performance of FBMP approach that of nGpFBMP.
Experiment 2 (Signal estimation performance comparison for varying sparsity parameter $p$)
In a similar set of experiments, the NMSE and mean runtime were measured for different values of the sparsity parameter $p$. The SNR was fixed for these experiments.
Figs. 3 and 4 demonstrate the superiority of nGpFBMP over FBMP and CS. The runtime graphs of Figs. 3 and 4 show that the runtime of nGpFBMP increases for higher values of $p$. This occurs because, as the sparsity rate of the constructed signal moved further from its initial estimate, more iterations were required to estimate the value of $p$. Even at higher values of $p$, the difference in runtime is insignificant given the excellent NMSE performance of our method. We also observe that the performance of nGpFBMP is relatively insensitive to changes in $p$, as the corresponding changes in NMSE are very small, thus demonstrating the strength of the proposed algorithm.
Experiment 3 (Comparison of signal estimates when the initial statistics of the signal and noise are very close to the actual values)
Table II compares the average NMSEs of FBMP and nGpFBMP for different types of signals when the initial estimates of the signal and noise statistics were chosen to be very near their actual values. Since nGpFBMP is independent of these initial estimates, its performance did not change. On the other hand, the performance of FBMP improved, although it did not outperform nGpFBMP. The results show the robustness of our algorithm, as it does not depend on these initial estimates. We note that each value in Table II has been computed by averaging the results of 500 independent experiments.
Signal type  FBMP  nGpFBMP 

Gaussian  
Uniform (i.i.d.)  
Uniform (non-i.i.d.) 
Experiment 4 (Comparison of multiscale image recovery performance)
In another experiment, we carried out multiscale recovery of different images. These images are shown in the first row of Fig. 5. One-level Haar wavelet decomposition of these images was performed, resulting in one 'approximation' (low-frequency) and three 'detail' (high-frequency) subimages. Unlike the approximation component, the detail components are compressible in nature. We therefore generated their sparse versions by applying a suitable threshold; the noisy random measurements were acquired later from the thresholded versions. The detail components were reconstructed from these measurements through nGpFBMP. Finally, the inverse wavelet transform was applied to reconstruct the images from the recovered details and the original approximations. Reconstruction errors and times were recorded and, for comparison, recoveries were obtained using FBMP. The resulting reconstructed images are shown in Fig. 5. Numerical details of the results for these experiments are given in Table III, from which it is obvious that images reconstructed using nGpFBMP have lower NMSEs and require significantly shorter reconstruction times than FBMP.
Image  FBMP  nGpFBMP  

NMSE (dB)  Time (s)  NMSE (dB)  Time (s)  
Mondrian  
Fingerprint  
Facets 
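The one-level Haar decomposition and detail-thresholding step used in this experiment can be sketched as follows. This is a minimal self-contained sketch using an unnormalized averaging variant of the Haar transform (the threshold value and image are hypothetical; a wavelet library would be used in practice):

```python
import numpy as np

def haar2(img):
    """One-level 2-D Haar decomposition into an approximation subimage (LL)
    and three detail subimages (LH, HL, HH).  Assumes even dimensions."""
    a = (img[0::2] + img[1::2]) / 2.0   # average of adjacent rows
    d = (img[0::2] - img[1::2]) / 2.0   # difference of adjacent rows
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, (LH, HL, HH)

def ihaar2(LL, details):
    """Inverse of haar2."""
    LH, HL, HH = details
    a = np.empty((LL.shape[0], 2 * LL.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    img = np.empty((2 * a.shape[0], a.shape[1]))
    img[0::2], img[1::2] = a + d, a - d
    return img

# Sparsify the detail subimages by thresholding, as in the experiment.
rng = np.random.default_rng(5)
img = rng.standard_normal((8, 8))
LL, (LH, HL, HH) = haar2(img)
thr = 0.2  # hypothetical threshold
details_sparse = tuple(c * (np.abs(c) > thr) for c in (LH, HL, HH))
approx_img = ihaar2(LL, details_sparse)  # reconstruction from sparsified details
```

Without thresholding, the forward and inverse transforms round-trip exactly, which is what makes the approximation subimage reusable as-is during reconstruction.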
VII Conclusion
In this paper, we presented a robust Bayesian matching pursuit algorithm based on a fast recursive method. In contrast to other robust algorithms, our algorithm does not require the signals to be drawn from a known distribution, which is useful when the parameters of the signal distribution cannot be estimated. Application of the proposed method to several different signal types demonstrated its superiority and robustness.
References
 [1] K. Jia, X. Wang, and X. Tang, “Optical flow estimation using learned sparse model,” in IEEE Int. Conf. on Comp. Vision (ICCV), Nov. 2011, pp. 2391–2398.
 [2] M. Lustig, D. Donoho, J. Santos, and J. Pauly, “Compressed sensing MRI,” IEEE Signal Processing Mag., vol. 25, no. 2, pp. 72–82, Mar. 2008.
 [3] T. Al-Naffouri, A. Quadeer, F. Al-Shaalan, and H. Hmida, “Impulsive noise estimation and cancellation in DSL using compressive sampling,” in IEEE Int. Symp. on Circuits and Syst. (ISCAS), May 2011, pp. 2133–2136.
 [4] M. Firooz and S. Roy, “Network tomography via compressed sensing,” in IEEE Global Telecommun. Conf. (GLOBECOM 2010), Dec. 2010, pp. 1–5.
 [5] T. Y. Al-Naffouri, E. B. Al-Safadi, and M. E. Eltayeb, “OFDM peak-to-average power ratio reduction method,” U.S. Patent 2011/0 122 930, May, 2011.
 [6] R. Tibshirani, “Regression shrinkage and selection via the Lasso,” J. of the Roy. Statistical Soc. Series BMethodological, vol. 58, no. 1, pp. 267–288, 1996.
 [7] S. Chen, D. Donoho, and M. Saunders, “Atomic decomposition by basis pursuit,” SIAM J. on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.
 [8] A. Quadeer, S. Ahmed, and T. Al-Naffouri, “Structure based Bayesian sparse reconstruction using non-Gaussian prior,” in 49th Annu. Allerton Conf. on Commun., Control, and Computing (Allerton), Sep. 2011, pp. 277–283.
 [9] P. Schniter, L. C. Potter, and J. Ziniel, “Fast Bayesian Matching Pursuit: Model Uncertainty and Parameter Estimation for Sparse Linear Models,” IEEE Trans. Signal Process., to be published. [Online]. Available: http://www.ece.osu.edu/~zinielj/FBMP_TransSP.pdf
 [10] E. Candes and J. Romberg, “Sparsity and incoherence in compressive sampling,” Inverse Problems, vol. 23, no. 3, pp. 969–985, Jun. 2007.
 [11] E. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, pp. 489–509, Feb. 2006.
 [12] D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, pp. 1289–1306, Apr. 2006.
 [13] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, pp. 4655–4666, Dec. 2007.
 [14] J. Haupt and R. Nowak, “Signal reconstruction from noisy random projections,” IEEE Trans. Inf. Theory, vol. 52, pp. 4036–4048, Sep. 2006.
 [15] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, “Sparse Solution of Underdetermined Systems of Linear Equations by Stagewise Orthogonal Matching Pursuit,” IEEE Trans. Inf. Theory, vol. 58, pp. 1094–1121, Feb. 2012.
 [16] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems,” IEEE J. Sel. Topics Signal Process., vol. 1, pp. 586–597, Dec. 2007.
 [17] M. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. of Mach. Learning Research, vol. 1, no. 3, pp. 211–244, 2001.
 [18] S. Ji and L. Carin, “Bayesian compressive sensing and projection optimization,” in Proc. of the 24th Int. Conf. on Mach. Learning, 2007, pp. 377–384.
 [19] C. M. Bishop and M. E. Tipping, “Variational relevance vector machines,” in Proc. of the 16th Conf. on Uncertainty in Artificial Intelligence, 2000, pp. 46–53.
 [20] E. G. Larsson and Y. Selén, “Linear regression with a sparse parameter vector,” IEEE Trans. Signal Process., vol. 55, pp. 451–460, Feb. 2007.