Detection-Directed Sparse Estimation using Bayesian Hypothesis Test and Belief Propagation
Abstract
In this paper, we propose a sparse recovery algorithm called detection-directed (DD) sparse estimation using Bayesian hypothesis test (BHT) and belief propagation (BP). In this framework, we consider the use of sparse-binary sensing matrices which have the tree-like property, and the sampled-message approach for the implementation of BP.
The key idea behind the proposed algorithm is that the recovery takes a DD-estimation structure consisting of two parts: support detection and signal value estimation. BP and BHT perform the support detection, and an MMSE estimator finds the signal values using the detected support set. The proposed algorithm provides robustness against measurement noise beyond the conventional MAP approach, as well as a solution that removes the quantization effect of the sampled-message based BP independently of the memory size used for the message sampling.
We explain how the proposed algorithm attains the aforementioned characteristics via exemplary discussion. In addition, our experiments validate the superiority of the proposed algorithm over recent algorithms under a noisy setup. Interestingly, the experimental results show that the performance of the proposed algorithm approaches that of the oracle estimator as SNR becomes higher.
Noisy sparse recovery, sparse support detection, belief propagation, detection-directed estimation
I Introduction
Sparse signal recovery in the presence of noise has been intensively investigated in much recent literature because any real-world device is subject to at least a small amount of noise. We refer to such problems as noisy sparse signal recovery (NSR) problems. Let x denote a sparse signal vector whose elements are sparsely nonzero. Then, the NSR decoder observes a measurement vector y = Φx + n, where Φ is a fat sensing matrix (more columns than rows); and we limit our discussion to zero-mean independently identically distributed (i.i.d.) Gaussian noise denoted by n.
The NSR problem can then be defined as an ℓ0-norm minimization problem, as similarly done in [1],[3],[7],[8]:

(1)  x̂ = arg min_x ‖x‖_0  subject to  ‖y − Φx‖_2 ≤ ε
where ε is an error tolerance parameter. In general, the minimization task in (1) is NP-hard; therefore, ℓ1-norm approaches have been developed and discussed as alternatives [1]-[6]. Among the solvers, the ℓ1-Dantzig selector (L1-DS), proposed by Candès and Tao [5], and the ℓ1-penalized least-squares algorithm usually called LASSO [6] have been devised for the Gaussian noise setup.
Bayesian approaches to NSR have also received attention, where the minimization task in (1) is recast as a maximum a posteriori (MAP) problem, and the sparse solution is then sought by imposing a sparsifying prior density on x as follows [8]-[12]:

(2)  x̂_MAP = arg max_x p(x | y) = arg max_x p(y | x) p(x)

where p(x) is a probability density function (PDF).
The task in (2) is likewise computationally demanding; hence, many relaxed-Bayesian solvers have been devised according to various prior types and applied techniques, such as sparse Bayesian learning (SBL) [9]-[13],[20], approximate minimum-mean-squared-error (AMMSE) estimation [14]-[17], and belief propagation with sparse sensing matrices (BP-SM) [18]-[21]. A summary of the relaxed-Bayesian solvers is given in Table I.
Algorithm type | Sensing matrix type | Prior type | Utilized techniques
BP-SM | Sparse-ternary [18],[19] | Gaussian-mixture [18],[19] | Belief propagation (BP) [18],[19]
BP-SM | Sparse-binary [20],[21] | Hierarchical-Gaussian [20],[21] | Expectation-maximization (EM), BP [20],[21]
AMMSE | Gaussian-random [14] | Spike-and-slab [14] | Fast Bayesian Matching Pursuit [14]
AMMSE | Training-based [15] | Gaussian-mixture [15],[16] | Random-OMP [15]
AMMSE | Random-unitary [16] | Gaussian-mixture [15],[16] | Closed-form MMSE [16]
SBL | Gaussian-kernel [9] | Hierarchical-Gaussian [9]-[11] | Relevance vector machine, EM [9]-[12]
SBL | Gaussian-random [10]-[13] | Spike-and-slab [13] | Markov chain Monte Carlo [13]
SBL | Uniform-random [12] | Hierarchical-Laplace [12] |
In this paper, we are mainly interested in the BP-SM framework, which has been investigated as a low-computational approach to linear estimation problems such as y = Φx + n. In this framework, the matrix Φ is assumed to be a sparse matrix which has the tree-like property, and the statistical connection of x and y is described with the bipartite graph model of Φ. The tree-like property ensures that the corresponding graph is asymptotically cycle-free [23],[24],[33]. In addition, it is known that the vector finding problem can be decomposed into a sequence of scalar finding problems in the BP-SM framework, where the marginal posterior for each scalar estimation is obtained by an iterative message-passing process. This decomposition has been explained by the decoupling principle [22],[24].
For the implementation of BP, two approaches have been mainly discussed according to the message representation: 1) the sampled-message based BP [27],[28], where the message is directly sampled from the corresponding PDF with a certain step size such that the message is treated as a vector, and 2) the parametric-message based BP (also called relaxed BP) [23],[29],[30], where the message is described as a function with a certain set of parameters such as mean and variance. If the sampled-message approach is chosen, quantization error is induced according to the step size. If the parametric-message approach is used, some model approximation must be applied for stable iteration at the expense of approximation error.
As applications of the BP-SM approach, low-density parity-check coding [31]-[33] and CDMA multiuser detection [22],[25],[26] are well known, in addition to the NSR works [18]-[21].
Based on the rigorous theoretical support for the BP-SM framework by Guo et al. [22]-[25] and Tanaka et al. [26], and motivated by recent NSR works by Baron et al. [18],[19], Tan et al. [20], and Akcakaya et al. [21], in this paper we aim to develop a BP-SM type algorithm as an alternative for solving (2). We refer to the proposed algorithm as detection-directed sparse estimation via Bayesian hypothesis test and belief propagation (BHT-BP). Unlike the works [18]-[21], which solve the MAP problem in (2), the proposed algorithm takes the structure of detection-directed (DD) estimation, which consists of a signal support detector and a signal value estimator. The support detector is designed using a combination of the sampled-message based BP and a novel Bayesian hypothesis test (BHT), and the signal value estimator behaves in the minimum mean-square-error (MMSE) sense. The DD structure follows the common procedure of first using the measurements at hand to detect the signal support set. This detected support is then used in the model of the sparse signal, and an MMSE estimator finds the signal values as if the detected support set were in fact the correct set.
This DD methodology was originally investigated by Middleton et al. for the estimation of noisy signals [36], and has wide application areas such as communication systems [37],[38] and speech processing [39]. For NSR, a line of previous works in the AMMSE group by Schniter et al., Elad et al., and Lee has independently studied similar structures in [14]-[17],[40].
Then, the proposed algorithm achieves the following properties:

- robust signal support detection against the measurement noise;

- removal of the quantization effect caused by the use of the sampled-message based BP.

Here, the “oracle estimator” means the estimator which has perfect support knowledge.
The combination of BHT and BP enables robust detection of the signal support against measurement noise, which was partially introduced in our previous work [43]. In the support detection, the sampled-message based BP provides the marginal posterior for each scalar problem according to the decoupling principle, and the BHT process then detects the supportive state of each scalar element by measuring the inner products between the marginal posterior and reference functions composed of the prior density. This BHT-detector utilizes the signal posterior information more efficiently than the conventional MAP approaches [18]-[21]. When the measurements are noisy, density spread occurs in the marginal posteriors, leading to difficulty in making correct decisions via the MAP approach. In contrast, the BHT-based support detector compensates for this weak point by scanning the entire range of the posterior. Such hypothesis-based support detectors have been discussed in wavelet-domain denoising problems [41],[42]; however, they used thresholding techniques to sort out significant wavelet coefficients, different from our work, which uses the inner product.
In addition, we emphasize that the proposed algorithm effectively removes the quantization effect coming from the use of the sampled-message based BP. The quantization effect limits the performance of the MAP approach in both the signal value estimation and the support detection. In the proposed algorithm, the use of the DD structure makes the performance independent of the message sampling in the BP. In addition, we eliminate the quantization error in the support detection by applying the minimum value of the signal on its support, denoted by x_min, to the reference functions. The importance of x_min in NSR problems was theoretically highlighted by Wainwright et al. in [34],[35], where they showed that perfect support recovery is very difficult, even with arbitrarily large SNR, if x_min is very small. Hence, we regulate x_min in our signal model, reflecting the knowledge of x_min in the BHT detection. To the best of our knowledge, the reflection of x_min has not appeared in practical algorithms in the sparse recovery literature, and surprisingly, this reflection enables the performance of the proposed algorithm to approach that of the oracle estimator as SNR increases.
The computational complexity of the proposed algorithm (with the cardinality of the support fixed) includes an additional cost, owing to the BHT process, on top of the cost of BP. Nevertheless, the cost of the proposed algorithm is lower in practice since the proposed algorithm can catch the signal support with a smaller memory size than NSR algorithms using only the sampled-message based BP.
The remainder of the paper is organized as follows. In Section II, we first briefly review a line of BP-SM algorithms and then remark on the relation between the BP-SM solvers and approximate message passing (AMP)-type algorithms. In Section III, we define our problem formulation. Section IV provides a precise description of the proposed algorithm, and Section V provides an exemplary discussion to explain and support the strengths of the proposed algorithm. In Section VI, we provide experimental validation showing the advantage of the proposed algorithm over the recent algorithms. Finally, we conclude the paper in Section VII.
II Related Works
In this section, we provide a brief introduction to the previous works on BP-SM algorithms [18]-[21]. The common feature of these algorithms is the use of BP in conjunction with sparse sensing matrices to approximate the signal posterior, where a sparsifying prior is imposed according to the signal model. In addition, we make a remark to distinguish the BP-SM works from AMP-type algorithms.
II-A BP-SM Algorithms
Baron et al. first proposed the use of BP for the sparse recovery problem with sparse sensing matrices [18],[19]; the algorithm is called CS-BP. The signal model of CS-BP is a compressible signal which has a small number of large elements and a large number of near-zero elements. The authors associated this signal model with a two-state Gaussian-mixture prior, given as

(3)  p(x_i) = q N(x_i; 0, σ1²) + (1 − q) N(x_i; 0, σ0²)

where q denotes the probability that an element takes a large value, and σ1 ≫ σ0. Therefore, the prior is fully parameterized by q, σ1, and σ0. CS-BP performs MAP or MMSE estimation using the signal posterior obtained from BP, where the authors applied both the sampled-message and the parametric-message approaches for the BP implementation. The recovery performance degrades when measurement noise is severe since CS-BP was basically designed under a noiseless setup.
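The two-state Gaussian-mixture prior of (3) can be illustrated with a small sampler (a sketch; `q`, `sigma_large`, and `sigma_small` are hypothetical names standing in for the mixture weight and the two standard deviations, whose symbols were lost in this copy):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture_prior(n, q=0.1, sigma_large=1.0, sigma_small=0.01, rng=rng):
    """Draw n i.i.d. samples from q*N(0, sigma_large^2) + (1-q)*N(0, sigma_small^2)."""
    state = rng.random(n) < q                      # True -> large-variance component
    x = np.where(state,
                 rng.normal(0.0, sigma_large, n),  # few significant coefficients
                 rng.normal(0.0, sigma_small, n))  # many near-zero coefficients
    return x, state

x, state = sample_mixture_prior(10_000)
# Roughly a fraction q of the samples come from the large-variance component.
```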
Tan et al. proposed an algorithm under the BP-SM setup called BP-SBL [20]. This work is based on the SBL framework [9]-[12], which uses a two-layer hierarchical-Gaussian prior model given as

(4)  p(x_i | γ_i) = N(x_i; 0, γ_i⁻¹)

where the hyperprior on γ_i follows a Gamma distribution with its parameters. At each iteration, the parameters of the prior are estimated using expectation-maximization (EM); therefore, the posterior for the signal estimation is iteratively approximated from the prior. The authors applied the BP-SM setup to reduce the computational cost of EM. BP-SBL uses parametric-message based BP, where every message is approximated by a Gaussian PDF which can be fully described by its mean and variance. In addition, BP-SBL is input-parameter-free, meaning the algorithm adapts to any signal model and noise level since EM estimates the associated parameters during the iteration. However, such parameter estimation is not accurate when noise is severe; therefore, the denoising ability of BP-SBL is limited when measurements are highly corrupted.
Most recently, Akcakaya et al. proposed SuPrEM under a framework similar to BP-SBL, using a combination of EM and the parametric-message based BP [21]. The main difference from BP-SBL is the use of a specific type of hyperprior called Jeffreys’ prior, p(γ_i) ∝ 1/γ_i, whose use reduces the number of input parameters while sparsifying the signal. Therefore, the prior is given as

(5)  p(x_i | γ_i) = N(x_i; 0, γ_i),  p(γ_i) ∝ 1/γ_i

The sensing matrix used in SuPrEM is restricted to a sparse-binary matrix which has fixed column and row weights, called a low-density frame, reminiscent of regular LDPC codes [31]. In addition, the signal model is confined to strictly sparse signals since SuPrEM includes a sparsifying step which retains the largest elements at the end of each iteration. The noise statistic is an optional input to the algorithm; naturally, if the noise information is available, SuPrEM provides improved recovery performance.
II-B Relation to AMPs
It is interesting to relate the algorithms in the BP-SM group to the class of AMP algorithms, originally invented by Donoho et al. in [44],[45] and analyzed and refined by Bayati et al. [46] and Rangan [47], because both types of algorithms utilize BP. Performance of the AMPs coincides with that of the ℓ1 solvers under the large-system limit in the noiseless case [44],[46]. Furthermore, a recent result by Donoho et al. showed that the performance of the AMPs is also equivalent to that of LASSO [6] under the noisy setting with appropriate parameter calibration [48].
The AMPs work well when sufficient density of the sensing matrix is maintained, since the AMPs were developed on the basis of an approximation of BP messages by the central limit theorem [45]. Therefore, if a sparse matrix is applied to the AMPs, the message approximation does not hold and the decoder fails in the recovery. We validate this claim with our own experimental result shown in Fig. 1, where we simulated the recovery success rate of the standard AMP [44] without additive noise as a function of the matrix sparsity, defined as the percentage of nonzero entries in the sensing matrix. The recovery of the AMP is not successful when the sensing matrix is sparse (typically less than 10% matrix sparsity) regardless of the number of measurements, as shown in the example of Fig. 1. Namely, the AMPs recover sparse signals well with dense matrices at the expense of low computation, but they do not enjoy the benefits of sparse matrices.
III Problem Formulation
The goal of the proposed algorithm is to recover the sparse signal x given the knowledge of Φ and the noisy measurements y, following

(6)  y = Φx + n

where we consider the use of a fat sparse-binary sensing matrix Φ which has very low matrix sparsity (typically less than 10% nonzero entries) and the tree-like property. We regulate the matrix sparsity using a fixed column weight L since this regulation enables the matrix to span the measurement space with basis vectors having equal energy. For the noise distribution, we assume an i.i.d. zero-mean Gaussian density such that the noise vector is drawn from n ~ N(0, σ_n² I).
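A sparse-binary matrix with a fixed column weight, of the kind described above, can be generated as follows (a minimal sketch; `col_weight` plays the role of the fixed column weight L, and the tree-like property is not enforced here):

```python
import numpy as np

rng = np.random.default_rng(1)

def sparse_binary_matrix(m, n, col_weight, rng=rng):
    """Build an m x n binary matrix whose every column has exactly `col_weight`
    ones, with row positions chosen uniformly at random per column."""
    Phi = np.zeros((m, n), dtype=np.int8)
    for j in range(n):
        rows = rng.choice(m, size=col_weight, replace=False)
        Phi[rows, j] = 1
    return Phi

Phi = sparse_binary_matrix(m=128, n=512, col_weight=4)
sparsity_pct = 100.0 * Phi.mean()   # matrix sparsity as % of nonzero entries
```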
In the remainder of this section, we introduce some necessary concepts for our framework, such as the signal model and the graph representation of Φ, and then discuss our approach to solving this NSR problem.
III-A Graph Representation of Φ
Bipartite graphs effectively represent linear systems with sparse matrices such as the matrix Φ. Let V denote the set of indices corresponding to the elements of the signal vector, and C the set of indices corresponding to the elements of the measurement vector. In addition, we define the set of edges connecting V and C as E = {(j, i) : φ_{ji} ≠ 0}, where φ_{ji} is the (j, i)-th element of Φ. Then, a bipartite graph fully describes the neighboring relation in the linear system. For convenience, we define the neighbor sets of a variable node v_i and a measurement node c_j as N(v_i) and N(c_j), respectively. Note that |N(v_i)| = L for all i under our setup on Φ.
III-B Signal Model
Let x denote a sparse vector which is a deterministic realization of a random vector X. Here, we assume that the elements of X are i.i.d., and each belongs to the support set, denoted by supp(X), with rate q. To indicate the supportive state of x_i, we define a state vector s where each s_i is defined as

(7)  s_i = 1 if x_i ≠ 0, and s_i = 0 otherwise.

Therefore, the signal sparsity is given as K = ‖s‖_0. For the signal values on the support, we will deal with two cases, which are

- Gaussian signals: x_i | (s_i = 1) ~ N(0, σ_x²),

- Signed signals: x_i | (s_i = 1) ~ ½ δ(x_i − x0) + ½ δ(x_i + x0),

where δ(x − x0) denotes the delta function peaked at x0. Namely, in the first case, the values on the support are distributed according to an i.i.d. zero-mean Gaussian density with variance σ_x², and in the second case the magnitude of the values is fixed to x0 and the sign follows a Bernoulli distribution with probability 1/2. In addition, for the Gaussian signal, we regulate the minimum value on the support, x_min, as a key parameter of the signal model, such that

(8)  min_{i : s_i = 1} |x_i| ≥ x_min.
Indeed, Wainwright et al. emphasized the importance of x_min in NSR problems [34],[35]. They stated that if x_min is very small, success of the noisy sparse recovery is very difficult even with arbitrarily large SNR. Namely, if the signal contains even one such very small element belonging to the support, the recovery can fail regardless of the SNR level since the small element can be buried by even negligible noise.
In the Bayesian framework, the prior density plays the role of pursuing the sparsest solution among the infinitely many solutions of the underdetermined system. Therefore, the proper choice of the sparsifying prior according to the signal model is highly important. According to our signal model, we consider spike-and-slab densities as our prior. The prior for an element x_i is given as
(9)  p(x_i) = (1 − q) δ(x_i) + q N(x_i; 0, σ_x²)
This type of prior has been widely used in modeling strictly sparse signals [13],[14],[53] since it simply characterizes the signals with q and σ_x, and easily associates the signal model with other models such as hidden Markov chains and wavelet models. In addition, we note that the spike-and-slab prior is a particular case of the two-state Gaussian mixture in (3) as σ0 → 0.
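The spike-and-slab signal model above, together with the minimum-on-support regulation of (8), can be sketched as follows (parameter names `q`, `sigma_x`, and `x_min` are placeholders for the paper's symbols; the rejection step is one simple way to enforce the minimum-value constraint):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_spike_slab(n, q=0.05, sigma_x=1.0, x_min=0.2, rng=rng):
    """Draw n i.i.d. samples from (1-q)*delta(x) + q*N(0, sigma_x^2),
    redrawing slab values with |x| < x_min so the support obeys (8)."""
    s = rng.random(n) < q               # supportive states
    x = np.zeros(n)
    vals = rng.normal(0.0, sigma_x, int(s.sum()))
    # Rejection step: redraw any slab value smaller in magnitude than x_min.
    while (small := np.abs(vals) < x_min).any():
        vals[small] = rng.normal(0.0, sigma_x, int(small.sum()))
    x[s] = vals
    return x, s

x, s = sample_spike_slab(1000)
```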
III-C Solution Approach
The DD-estimation approach is motivated by the naive MMSE sparse estimation, which was also reported in [14]-[16], as follows:

(10)  x̂ = Σ_s Pr(s | y) E[X | y, s]

We note that (10) is a weighted sum of the signal value estimation over all possible supports given the noisy measurements y. Therefore, we separate the signal value estimation and the support search in (10) as the first step of its relaxation. Then, the signal value estimation can be represented as
(11)  x̂ = E[X | y, ŝ]
by assuming an ideal support detector providing ŝ. The calculation of (11) is simple because it is the well-known conventional linear MMSE estimation, expressed as
(12)  x̂_ŝ = (Φ_ŝᵀ Φ_ŝ + (σ_n²/σ_x²) I)⁻¹ Φ_ŝᵀ y
where Φ_ŝ is a submatrix of Φ that contains only the columns corresponding to the detected support ŝ, and x̂_ŝ is a vector that includes only the nonzero elements of x̂. In addition, we know that this MMSE estimation is optimal when the signal values on the detected support are Gaussian distributed [49].
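The value-estimation step above can be sketched as a regularized least-squares solve restricted to the detected columns (one common form of the linear MMSE estimator; the exact expression of (12) was garbled in this copy, so the regularization shown is an assumption):

```python
import numpy as np

def mmse_on_support(Phi, y, support, sigma_n, sigma_x):
    """Linear MMSE estimate of the signal values on a detected support.
    Phi_s keeps only the columns of Phi indexed by `support`; with a zero-mean
    Gaussian prior N(0, sigma_x^2) on the support values this reduces to a
    ridge-regularized least-squares solve."""
    Phi_s = Phi[:, support]
    k = Phi_s.shape[1]
    A = Phi_s.T @ Phi_s + (sigma_n ** 2 / sigma_x ** 2) * np.eye(k)
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = np.linalg.solve(A, Phi_s.T @ y)
    return x_hat

# Sanity check: with the correct support and (almost) no noise, the estimate
# should match the true signal.
rng = np.random.default_rng(3)
Phi = rng.normal(size=(20, 40))
support = [3, 17, 25]
x_true = np.zeros(40)
x_true[support] = [1.0, -2.0, 0.5]
y = Phi @ x_true                      # noiseless measurements
x_hat = mmse_on_support(Phi, y, support, sigma_n=1e-6, sigma_x=1.0)
```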
For the part of the support search, we start from an exhaustive detection, described as

(13)  ŝ = arg max_s Pr(s | y)
To solve (13) efficiently, we decompose the state vector detection into scalar state detections based on the decoupling principle [22],[24], and apply binary scalar MAP detection for each scalar state. Then, the binary scalar detection can be optimally achieved by a hypothesis test [50], given as
(14)  decide H1 if Pr(s_i = 1 | y) ≥ Pr(s_i = 0 | y), and H0 otherwise
where H0 (s_i = 0) and H1 (s_i = 1) are the two possible hypotheses. This support detection approach simplifies the exhaustive search in (13) to elementwise hypothesis tests. Here, we note that the support detection in (14) uses a decision rule independent of the value estimation in (12). Therefore, this DD approach to (10) is reasonable since the Bayesian risks of the support detection and the signal value estimation can be minimized independently [36]. The overall flow of the DD estimation is depicted in Fig. 2.
In order to justify our solution approach, we will provide experimental validation against the recent NSR algorithms, as a function of SNR defined as

(15)  SNR := E‖Φx‖₂² / E‖n‖₂²
Performance of the support detection is evaluated by the pairwise state error rate (SER), defined as

(16)  SER := Pr{ŝ_i ≠ s_i}
and the overall performance of the signal recovery is measured in terms of normalized mean square error (MSE), defined as

(17)  MSE := E‖x − x̂‖₂² / E‖x‖₂²
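The two evaluation metrics can be written directly from their definitions (a sketch; here the SER is computed empirically over one state vector rather than as an expectation):

```python
import numpy as np

def normalized_mse(x_true, x_hat):
    """Normalized MSE of (17): ||x - x_hat||^2 / ||x||^2."""
    return np.sum((x_true - x_hat) ** 2) / np.sum(x_true ** 2)

def state_error_rate(s_true, s_hat):
    """State error rate of (16): fraction of mismatched supportive states."""
    return np.mean(np.asarray(s_true) != np.asarray(s_hat))
```

For example, missing one of two nonzeros of [1, 0, 2] gives a normalized MSE of 4/5 and an SER of 1/3.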
IV Proposed Algorithm
The proposed algorithm straightforwardly provides the estimate via the MMSE estimation in (12) once given the detected support set from (13). In addition, the detected support set can be obtained element-wise from (14), on the basis of the decoupling principle. Therefore, efficient design of the scalar state detector in (14) is very important in this work. In this section, we explain the implementation of the scalar state detector based on the hypothesis test given in (14), using the combination of the sampled-message based BP and BHT.
IV-A Sampled-message Based Belief Propagation
BP provides the marginal posterior for the hypothesis test in (14) for each element. Since the signal is real-valued, each BP message takes the form of a PDF, and the BP iteration consists of density-message passing. We provide a brief summary of density-message passing in Appendix I. To implement the BP passing density messages, we consider the sampled-message approach which has been discussed by Sudderth et al. [27] and Noorshams et al. [28]; for sparse recovery, Baron et al. applied the approach in [18],[19]. The main strength of the sampled-message approach is adaptivity to various signal models. In addition, it shows faster convergence than the parametric-message approach [23],[29],[30] if the sampling step size is sufficiently fine, because the sampled-message approach does not use any model approximation or reduction for the message representation during the iteration. Based on (33) and the measurement-density expression from Appendix I, we provide a practical message update rule of the sampled-message based BP for the proposed algorithm.
In our implementation, we set the sampling step Δ on the basis of the three-sigma rule [52], given as

(18)  Δ = 2 × 3σ_x / N_d
where N_d is the number of samples for a density message. We then define the sampling process of the PDFs as

(19) 

where S{·} denotes the sampling process. Thus, the density messages are treated as vectors of size N_d in this approach.
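Discretizing a density message on a three-sigma grid, as described above, can be sketched as follows (assuming a Gaussian-shaped message; `Nd` stands for the number of samples per message):

```python
import numpy as np

def sample_pdf(pdf, x_range, Nd):
    """Discretize a continuous PDF into an Nd-point vector message.
    The grid spans [-x_range, x_range]; the three-sigma rule of (18) would
    set x_range ~ 3*sigma so that nearly all probability mass is covered."""
    grid = np.linspace(-x_range, x_range, Nd)
    msg = pdf(grid)
    msg /= msg.sum()            # normalize so the vector sums to one
    return grid, msg

sigma = 1.0
Nd = 256                        # a power of two, convenient for the FFT step
gauss = lambda t: np.exp(-t ** 2 / (2 * sigma ** 2))
grid, msg = sample_pdf(gauss, 3 * sigma, Nd)
```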
Let m_{i→j} denote a sampled density message from variable node v_i to measurement node c_j, called the signal message, and let m_{j→i} be the message from c_j to v_i, called the measurement message. The signal message includes information on the marginal posterior of x_i, being obtained from (33) by simply replacing the measurement densities with the measurement messages of the previous iteration, except the intrinsic information. That is,

(20)  m_{i→j}(x_i) = η[ p(x_i) ∏_{j′ ∈ N(v_i)\j} m_{j′→i}(x_i) ]

where η[·] is the normalization function making the message sum to one. Similarly, the measurement message includes information on the measurement density, being obtained from the measurement-density expression of Appendix I by replacing the associated marginal posteriors with the signal messages, that is,

(21)  m_{j→i}(x_i) ∝ ( f_{n_j} ⊛ ( ⊛_{i′ ∈ N(c_j)\i} m_{i′→j} ) )(y_j − x_i)

where ⊛ is the operator for the linear convolution of PDFs. In addition, under the Gaussian noise assumption, the noise density f_{n_j} in (21) is N(0, σ_n²). Here, we note that the measurement message calculation utilizes the noise statistics, distinct from the standard CS-BP [see (7) in [19]]. This improvement remarkably enhances the recovery performance in the low SNR regime, even though the noise statistic loses its effect at sufficiently high SNR.
The convolution operations in (21) can be efficiently computed using the fast Fourier transform (FFT). Accordingly, we express the measurement message calculation as

(22)  m_{j→i} ∝ F⁻¹{ F{f_{n_j}} × ∏_{i′ ∈ N(c_j)\i} F{m_{i′→j}} }

where F{·} denotes the FFT operation. Therefore, for efficient use of the FFT, the sampling step should be chosen such that N_d is a power of two. In fact, the use of the FFT introduces a small calculation gap since the FFT-based calculation performs a circular convolution, which produces output having a heavy tail; the heaviness increases as the column weight of Φ increases. However, the difference can be ignored, especially when the messages are bell-shaped densities.
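The FFT-based convolution of sampled PDFs can be sketched as follows. Zero-padding to the full linear-convolution length avoids the circular wrap-around mentioned above (the in-algorithm version works at a fixed message length, so this padded version is a simplification):

```python
import numpy as np

def fft_convolve(p, q):
    """Linear convolution of two sampled PDFs via FFT with zero-padding,
    renormalized to sum to one. Without the padding, the FFT computes a
    circular convolution whose wrap-around is the 'heavy tail' gap."""
    n = len(p) + len(q) - 1
    r = np.fft.irfft(np.fft.rfft(p, n) * np.fft.rfft(q, n), n)
    r = np.clip(r, 0.0, None)      # numerical noise can dip slightly below 0
    return r / r.sum()

# Convolving a discretized Gaussian with itself: the result's variance should
# be the sum of the two input variances (a basic property of convolution).
t = np.linspace(-3, 3, 128)
p = np.exp(-t ** 2 / 2)
p /= p.sum()
r = fft_convolve(p, p)
```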
IV-B Bayesian Hypothesis Test for Support Detection
In order to perform the hypothesis test in (14), the decoder needs to calculate the probability ratio Pr(s_i = 1 | y)/Pr(s_i = 0 | y). By factorizing over x_i, the hypothesis test becomes

(24)  Pr(s_i = 1 | y)/Pr(s_i = 0 | y) = ∫ Pr(s_i = 1 | x_i, y) p(x_i | y) dx_i / ∫ Pr(s_i = 0 | x_i, y) p(x_i | y) dx_i ≷ 1

In practice, we replace the marginal posterior p(x_i | y) with the sampled marginal posterior from (23) for discrete signal processing, which is provided in Algorithm 1; however, we use continuous-domain notation in the description of this hypothesis process. Given the marginal posterior from BP, the measurements do not provide any additional information on s_i; hence, Pr(s_i | x_i, y) = Pr(s_i | x_i) holds true. Using this fact and Bayes’ rule, the test in (24) is rewritten as

(25)  ⟨p(x_i | y), f_1(x)⟩ / ⟨p(x_i | y), f_0(x)⟩ ≷ 1
where f_0(x) and f_1(x) are reference functions consisting of the prior knowledge, defined as

(26)  f_0(x) := Pr(s_i = 0 | x) = (1 − q) δ(x)/p(x),  f_1(x) := Pr(s_i = 1 | x) = q N(x; 0, σ_x²)/p(x)
The process to calculate the probability ratio in (25) is described as a block diagram in Fig. 4. This process has a structure similar to matched filtering in communication systems, where the detector determines the supportive state of each element by measuring inner products between the marginal posterior and the reference functions. In addition, we emphasize that this BHT-based detector is compatible only with the sampled-message based BP, because the BHT process requires full information on the signal posterior, which cannot be provided through the parametric-message based BP.
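The inner-product test of (25)-(26) can be sketched as follows (an illustrative reimplementation, not the exact decoder: reference functions are built from a spike-and-slab prior with hypothetical parameters `q` and `sigma_x`, plus the optional `x_min` calibration discussed in Section V):

```python
import numpy as np

def bht_detect(posterior, grid, q, sigma_x, x_min=0.0):
    """Decide the supportive state of one element from its sampled marginal
    posterior by inner products with two reference functions: a zero-spike
    reference for H0 and a slab reference for H1 (zeroed where |x| < x_min)."""
    dx = grid[1] - grid[0]
    slab = np.exp(-grid ** 2 / (2 * sigma_x ** 2)) / np.sqrt(2 * np.pi * sigma_x ** 2)
    slab[np.abs(grid) < x_min] = 0.0               # no support values below x_min
    spike = (np.abs(grid) < dx / 2).astype(float)  # discrete delta at zero
    score1 = q * (posterior * slab).sum()
    score0 = (1 - q) * (posterior * spike).sum()
    return int(score1 > score0)                    # 1: in support, 0: not

# Two toy posteriors: one concentrated at zero, one concentrated at 1.0.
grid = np.linspace(-3, 3, 257)
post_zero = np.exp(-grid ** 2 / (2 * 0.05 ** 2)); post_zero /= post_zero.sum()
post_one = np.exp(-(grid - 1.0) ** 2 / (2 * 0.05 ** 2)); post_one /= post_one.sum()
s0 = bht_detect(post_zero, grid, q=0.05, sigma_x=1.0)
s1 = bht_detect(post_one, grid, q=0.05, sigma_x=1.0)
```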
IV-C Computational Complexity
In the sampled-message approach, the density messages in BP are vectors of size N_d, so the per-iteration cost of the BP process scales with N_d and with the column weight L of the sparse sensing matrix Φ; in particular, the FFT-based convolution costs O(N_d log N_d) per measurement message. For the hypothesis test, the cost of generating the probability ratio is much smaller than that of BP and is therefore ignored. For the MMSE estimator finding the signal values, the cost can be reduced by applying the QR decomposition [51]. In addition, it is known that the recovery convergence via BP is achieved within a modest number of iterations [19],[32]. We note that the complexity of BHT-BP is much reduced from that of the naive MMSE sparse estimator in (10), whose support search is combinatorial.
V Exemplary Discussion of the Proposed Algorithm
One can argue that the DD structure of the proposed algorithm is superfluous since the marginal posteriors from the BP already provide full information on the signal; namely, the posterior in (23) contains the complete knowledge for detection and estimation of each element. This is true, but our claim is that the MAP-based algorithms [18],[19], which solve the problem in (2), do not utilize the full knowledge of the marginal posterior.
In this section, we show, through examples, two weak points of the MAP approach, which finds each scalar estimate using only the peak location of the marginal posterior. We then discuss how the proposed algorithm remedies such problematic behavior of the MAP and utilizes the posterior knowledge more efficiently. We claim that the proposed algorithm has strength in two aspects, given below:

- Robust support detection against additive noise: the BHT-detector in the proposed algorithm correctly catches the supportive state even under severe additive noise.

- Removal of the quantization effect caused by the sampled-message based BP: the quantization effect degrades the performance of the MAP-based algorithms in both the signal value estimation and the support detection. In the proposed algorithm, the DD structure removes the quantization effect from the signal value estimation, and the BHT-detector reduces the chance of misdetection of the supportive state by applying the knowledge of x_min to the reference functions.
V-A Robust Support Detection against Additive Noise
The additive noise spreads probability mass in the marginal posterior, leading to difficulty in the supportive state detection via the MAP approach when the element is on the support. For example, Fig. 6 shows a marginal posterior obtained from severely noisy measurements (SNR = 10 dB) via BP iteration, where the signal value is originally nonzero. We note that the posterior is approximately composed of a zero-spike and a slab density with certain probability weights. In Fig. 6, it is shown that the center of the slab density moves toward the true value over the iterations, but after 5 iterations the mass at the true value is still smaller than that of the zero-spike. The reason is that the noise effect spreads the slab density over nearby values, blunting the peak at the true value. We call this spreading effect density spread.
Fig. 6 more clearly describes the density spread in the marginal posterior according to different SNR levels. When the SNR level is sufficiently high (see the cases above SNR = 20 dB in Fig. 6), the MAP approach can successfully detect the supportive state since probability mass is concentrated at the center of the slab density. However, when SNR is low (see the SNR = 10 dB line in Fig. 6), the MAP approach fails in the detection because the zero-spike becomes the highest peak in the marginal posterior due to the density spread. The density spread does not cause errors for off-support elements since in that case the center of the slab density stays at zero during the BP iteration, regardless of the noise level.
In contrast, the BHT-detector in the proposed algorithm decides the supportive state by accounting for the density spread effect. In (25), the inner products between the posterior and the reference functions measure the portions of the marginal posterior corresponding to H0 and H1, respectively. Since the inner products involve the entire range of the axis rather than a specific point mass, the BHT-detector can decide the supportive state by incorporating all of the mass spread by noise. Therefore, the BHT-detector can succeed in the detection even in the SNR = 10 dB case of Fig. 6. This example supports the claim that the BHT-detector detects the signal support more robustly against noise than the MAP approach.
V-B Removing the Quantization Effect Caused by the Sampled-message Based BP
V-B1 In Signal Value Estimation
Under the use of the sampled-message based BP, quantization error is inevitable in the signal value estimation. When we apply rounding for the quantization, the error is uniformly distributed with zero mean and variance determined by the sampling step size Δ, given as

(27)  E[(x − Q(x))²] = Δ²/12

where Q(·) denotes the quantization function with step Δ. Therefore, the MSE performance of the MAP-based algorithms with the sampling cannot exceed the limit given by (27) even if the level of additive noise is extremely low. To relax this limit, we can increase the memory size N_d or confine the range of signal values by decreasing σ_x. However, such methods are impractical and restrict the flexibility of signal models in the recovery. The DD structure, described in (13) and (11), removes this weak point since the signal values are evaluated using the MMSE estimator in (12) independently of Δ once the detected support is given. Furthermore, the probability ratio for the BHT detection can be generated from sampled marginal densities with small N_d.
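The quantization-noise floor of (27) is easy to verify numerically (a sketch; `delta` plays the role of the sampling step Δ):

```python
import numpy as np

# Rounding a value to a grid of step `delta` induces an error that is
# (approximately) uniform on [-delta/2, delta/2], whose variance is
# delta**2 / 12 -- the MSE floor of MAP estimation under sampled messages.
rng = np.random.default_rng(4)
delta = 0.1
x = rng.uniform(-5, 5, 200_000)
err = x - np.round(x / delta) * delta
emp_var = err.var()
theory = delta ** 2 / 12
```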
V-B2 In support detection
The quantization effect can also cause detection errors of the supportive state for nonzero elements, limiting the performance of the MAP approach in the high SNR regime. To explain such cases, we provide an example of the measurement message-passing shown in Fig. 7. In this example, we try to find the value of one signal element given the values of its neighboring elements, under a noiseless setup. Note that in practical operation of the algorithm, the value of each element corresponds to the peak location on the axis of each density message. In the sampled-message based BP, the message sampling quantizes the values of the neighboring elements with the step size. In this case, we can simply infer that the measurement message passes a corrupted value which does not match the true one. If most of the measurement messages to an element suffer such corruption, the marginal posterior of the element will have its peak at an erroneous location, leading to a detection error under the MAP approach. The same event can occur for zero-valued elements; however, its effect is not significant because a small perturbation around zero does not bring about a support detection error. In addition, this type of error is prominent particularly in the high SNR regime, since such corruption is covered by additive noise when the SNR is low. Accordingly, the support detection performance of the MAP approach shows an error floor in the high SNR regime.
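A toy noiseless check-sum reproduces this corruption mechanism; the neighbor values, the target value `x_n`, and the step size `Ts` below are made-up numbers for illustration, not the quantities of Fig. 7.

```python
import numpy as np

Ts = 0.1                                   # message sampling step
q = lambda v: np.round(v / Ts) * Ts        # quantizer applied by the sampling

neigh = np.array([0.234, 0.847, -0.563, 1.949])  # other neighbors of the measurement
x_n = 0.7312                               # value the message should recover
y = x_n + neigh.sum()                      # noiseless binary-row measurement

# The measurement message subtracts the *quantized* neighbor peaks,
# so the value it passes toward x_n is corrupted by the summed
# quantization errors of all neighbors.
x_from_msg = y - q(neigh).sum()
print(x_from_msg - x_n)                    # corruption exceeds Ts itself here
```

Because the per-neighbor errors accumulate, the passed peak can land more than one grid step away from the true value, which in a noiseless (high SNR) setting cannot be blamed on, or masked by, additive noise.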
Here, we aim to show how the proposed BHT detector utilizes prior knowledge of the minimum nonzero magnitude $x_{\min}$ to remove the error floor of the MAP approach. As mentioned in Section III-A, we consider a signal model which has $x_{\min}$ as a key parameter, according to the result of Wainwright et al. [34],[35], which revealed that the regulation of $x_{\min}$ is imperative for perfect support recovery under a noisy setup. This means that we have additional prior knowledge available for the sparse recovery. From (8), the knowledge of $x_{\min}$ tells us that there exist no nonzero elements whose magnitude lies below $x_{\min}$. The BHT detector reflects $x_{\min}$ to calibrate the reference functions. Rather than the functions given in (26), we use their calibrated version, given as
(28) 
to improve the performance for the Gaussian signal model, where
with a calibration constant. For signed signals, we simply use the analogous calibrated reference.
This calibration depresses the weight near zero and puts more weight on the region beyond the minimum nonzero magnitude $x_{\min}$, as shown in Fig. 8. This implies that the detector excludes elements with magnitude below $x_{\min}$ from the signal support set. Therefore, we can eliminate misdetection cases such as the example in Fig. 7, effectively removing the error floor in the high SNR regime.
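A minimal sketch of the calibration idea follows; the value of the minimum magnitude, the posterior shape, and the indicator-style reference are illustrative assumptions, not the exact calibrated functions of (28).

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def on_support_mass(x_grid, posterior, lo):
    """Posterior mass matched to the 'nonzero' hypothesis when the H1
    reference is zeroed out on (-lo, lo)."""
    dx = x_grid[1] - x_grid[0]
    ref = (np.abs(x_grid) >= lo).astype(float)
    return np.dot(posterior, ref) * dx

# A spurious quantization-induced peak at 0.05, far below a minimum
# nonzero magnitude of 0.3.
x = np.linspace(-2, 2, 4001)
post = gauss(x, 0.05, 0.02)

uncalibrated = on_support_mass(x, post, lo=1e-6)   # counts the spurious peak
calibrated = on_support_mass(x, post, lo=0.3)      # excludes sub-minimum mass

print(uncalibrated, calibrated)
```

The uncalibrated reference assigns essentially all of the spurious peak's mass to the "nonzero" hypothesis, while the calibrated reference assigns it none, so the element is correctly kept out of the support set.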
TABLE II
Algorithm       | Complexity | Sensing matrix for recovery | Use of noise statistic
BHT-BP          |            | sparse-binary               | Yes
Standard CS-BP  |            | sparse-binary               | No
CS-BP-NS        |            | sparse-binary               | Yes
SuPrEM          |            | sparse-binary               | Yes
BCS             |            | dense-Gaussian              | No
L1-DS           |            | dense-Gaussian              | No
VI Experimental Validation
We support our investigation of the proposed algorithm via experimental validation. We performed two types of experiments, as given below:

SER performance over SNR: To support our claims on the support detection based on BHT, we simulate the SER performance, described in (16), as a function of SNR, and compare it to that of the MAP approach used in the CS-BPs.

MSE performance over SNR: To evaluate the overall recovery performance, we compare the MSE of the proposed algorithm over SNR to that of the recent algorithms listed in Table II and the oracle estimator.
For SuPrEM, BCS, and L1-DS, we obtained the source code from each author's webpage. We implemented CS-BP using the sampled-message based approach and the spike-and-slab prior, in two different versions: 1) the standard CS-BP, which is the original algorithm of the corresponding papers [18],[19], and 2) CS-BP-NS, which additionally utilizes the noise statistic in the BP iteration. In the implementation of the sampled-message based BP for the CS-BPs and BHT-BP, we chose the memory size such that the sampling step for the density messages follows (18). For the choice of the sensing matrix, BHT-BP and the CS-BPs used sparse-binary matrices, with the matrix sparsity as discussed in Section III-A. L1-DS and BCS were performed with standard Gaussian matrices having the same column energy as the sparse-binary matrices, for fairness. In addition, SuPrEM worked with a sensing matrix generated from a low-density frame [21].
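The two sensing-matrix families can be sketched as follows; the dimensions and the column weight `L` are placeholders, and the constructions (exactly `L` ones per column, Gaussian columns rescaled to equal column energy) reflect our reading of the setup rather than the authors' exact generators.

```python
import numpy as np

def sparse_binary_matrix(m, n, L, rng):
    """Sparse-binary sensing matrix: each column has exactly L ones
    placed in L distinct random rows."""
    Phi = np.zeros((m, n))
    for j in range(n):
        Phi[rng.choice(m, size=L, replace=False), j] = 1.0
    return Phi

def dense_gaussian_matrix(m, n, L, rng):
    """Dense Gaussian matrix with each column rescaled so its energy
    matches the sparse-binary columns: ||phi_j||^2 = L."""
    G = rng.normal(size=(m, n))
    return G * (np.sqrt(L) / np.linalg.norm(G, axis=0))

rng = np.random.default_rng(0)
Phi = sparse_binary_matrix(64, 128, 4, rng)
Psi = dense_gaussian_matrix(64, 128, 4, rng)
print(Phi.sum(axis=0)[:4], (Psi**2).sum(axis=0)[:4])
```

Matching the column energies keeps the per-measurement signal power comparable across the two families, which is the stated fairness condition for the comparison.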
In these experiments, we examined the two types of signal models defined in Section III-B, Gaussian signals and signed signals. For the case of Gaussian signals, we restricted the magnitude level of the nonzero elements. In addition, we fixed the undersampling ratio because the focus of this paper is to investigate the behavior of NSR over SNR. Also, we used the Monte Carlo method with 200 trials to evaluate the performance in an average sense.
VI-A SER Performance over SNR
Fig. 9 compares the SER performance of the proposed algorithm and the MAP approach used in the CS-BPs, where the BP process embedded in both algorithms utilizes the noise statistic. We simulated the SER performance as a function of SNR for both signal models.
VI-A1 Result at low SNR
In the low SNR regime, the noise-robust property of the proposed BHT detector provides a 2 dB gain over the MAP approach for the Gaussian model, as discussed in Section V-A. In addition, we note that the case of signed signals shows better performance than the case of Gaussian signals, with a 4 dB gap. The reason is that in the signed case the nonzero magnitudes are fixed; therefore, the sparse pattern of the signal is rarely corrupted by noise and can be detected with less difficulty than in the Gaussian case, whose nonzero elements can take a range of values.
VI-A2 Result at high SNR
As SNR increases, the SER performance of the proposed algorithm shows a waterfall behavior, whereas that of the MAP approach shows an error floor for both signal models. In the MAP cases, more SNR does not help once the curve manifests such an irreducible SER, since the level is saturated by the quantization effect caused by the sampled-message based BP, as discussed in Section V-B. In contrast, the SER curve of the proposed algorithm declines with increasing SNR by overcoming the quantization effect, removing the error floor of the MAP approach for both signal models. Therefore, this result indicates that the proposed BHT detector can provide perfect support knowledge to the signal value estimator if a sufficient SNR level is present.
VI-B MSE Comparison over SNR
This section provides an MSE comparison over SNR among the algorithms in Table II and the oracle estimator, whose performance is given as
(29) 
Fig. 10 and Fig. 11 display the results for the signed signal case and the Gaussian signal case, respectively.
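For reference, the oracle estimator can be sketched as a support-aware LMMSE solve; the Gaussian signal prior with variance `sigma_x**2` and the ridge-regularized form below are our assumptions of the standard oracle construction, not code from the paper.

```python
import numpy as np

def oracle_estimate(Phi, y, support, sigma_n, sigma_x):
    """LMMSE estimate given perfect support knowledge: entries off the
    support are zero; on the support, a ridge-regularized LS solve with
    regularization sigma_n^2 / sigma_x^2 (zero-mean Gaussian prior)."""
    Phi_s = Phi[:, support]
    A = Phi_s.T @ Phi_s + (sigma_n**2 / sigma_x**2) * np.eye(len(support))
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = np.linalg.solve(A, Phi_s.T @ y)
    return x_hat

rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 40))
x_true = np.zeros(40)
x_true[[3, 17, 25]] = [1.2, -0.7, 0.4]
y = Phi @ x_true                          # noiseless measurements

x_hat = oracle_estimate(Phi, y, [3, 17, 25], sigma_n=1e-6, sigma_x=1.0)
print(np.max(np.abs(x_hat - x_true)))     # essentially exact recovery
```

This is the benchmark the BHT-BP curves are compared against: once the detected support equals the true support, the value-estimation stage reduces to exactly this solve.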
VI-B1 Result at low SNR
When the measurements are heavily contaminated by noise (below SNR = 20 dB), the performance of all recovery algorithms is degraded. Under such degradation, BHT-BP and CS-BP-NS outperform the others because they fully employ the noise statistic during the recovery process; the difference between BHT-BP and CS-BP-NS (a 2 dB SNR gap) comes from the types of support detectors, as validated in Section V-A.
The use of the noise statistic remarkably affects the performance of the BP-based algorithms, as discussed in Section IV-A. Supporting this, the standard CS-BP shows an SNR loss from CS-BP-NS for both signal models. Even though SuPrEM also takes the noise variance as an input parameter, it underperforms since it was devised mainly for sparse signals whose support set has fixed cardinality.
As SNR increases, BHT-BP approaches the oracle MSE performance, where the signed signal case approaches faster than the Gaussian signal case, with an approximately 4 dB gap. We note that this gap between the signal models originates from the gap in the SER performance.
VI-B2 Result at high SNR
In the high SNR regime, the performance of the algorithms generally improves, except for SuPrEM, for both signal models. Among the algorithms, BHT-BP approaches the oracle MSE most closely. This is because the BHT detector provides perfect support knowledge beyond a certain SNR level. Although BCS shows competitive performance within a certain range of SNR (SNR = 28 to 40 dB for the signed case, SNR = 30 to 34 dB for the Gaussian case), its performance saturates to a certain level as SNR becomes higher.
For the CS-BPs, the use of the noise statistic in the BP process is no longer effective beyond a certain SNR level. Indeed, both figures show that the performance of the standard CS-BP converges to that of CS-BP-NS beyond approximately SNR = 45 dB. In addition, after the convergence the performance saturates to a certain level even with higher SNR. The cause of this saturation is the quantization effect, as discussed in Section V-B. Using (27), we can calculate the normalized MSE degradation caused by the quantization under this experimental setup, given as
(30) 
The result in (30) closely lower-bounds the MSE of the CS-BPs at SNR = 50 dB for both the signed and Gaussian cases. These results explain the performance loss of the CS-BPs due to the quantization effect, and highlight the ability of BHT-BP to remove it using the DD structure.
VII Conclusion
The theoretical and empirical research in this paper demonstrated that BHT-BP is a powerful algorithm for NSR from a denoising standpoint. In BHT-BP, we employed the DD structure, which consists of support detection and signal value estimation. The support detector combines the sampled-message based BP with a Bayesian hypothesis test (BHT), and the signal value estimation is performed in the MMSE sense.
We have shown that BHT-BP utilizes the posterior knowledge more efficiently than the MAP-based algorithms over the entire range of SNR. In the low SNR regime, the BHT-based support detector provides noise-robust detection against measurement noise. In the high SNR regime, the DD structure eliminates the quantization error due to the sampled-message BP from the signal value estimation. In addition, we applied the knowledge of the minimum nonzero magnitude $x_{\min}$ to the proposed algorithm based on the result of Wainwright et al. [34],[35], and showed that its use enables BHT-BP to remove the error floor of the MAP-based algorithms, driving the performance toward that of the oracle estimator as SNR increases.
We supported these advantages of BHT-BP via experimental validation. Our experiments showed that BHT-BP outperforms the other recent NSR algorithms over the entire range of SNR, approaching the recovery performance of the oracle estimator as SNR increases.
Appendix I
Fundamentals of Density-message Passing
The goal of BP is to iteratively approximate the marginal posterior densities via a message update rule. The message update rule can have various forms according to applications and frameworks. In NSR problems, the BP messages are represented as PDFs since the signal is real-valued. We refer to such message passing as density-message passing. In this appendix, we provide the fundamentals of density-message passing under the BP-SM framework.
In the discussion here, we consider a random linear model corresponding to (6), given as
(31) 
where the random vectors correspond to the signal, the measurements, and the Gaussian noise vector, respectively. In addition, we assume that the sensing matrix is sufficiently sparse such that the corresponding bipartite graph is tree-like.
Given the measurements, we can represent the marginal posterior density of each signal element using the Bayesian rule, as
(32)  
(33) 
where, to move from (32) to (33), we postulate that the measurements associated with a signal element are statistically independent given that element [22]-[24], using the tree-like property of the sensing matrix.
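In the sampled-message setting, the product form of (33) reduces to a pointwise multiplication of sampled densities followed by renormalization; the Gaussian-shaped prior and messages below are purely illustrative.

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def marginal_posterior(prior, incoming, dx):
    """Eq. (33): the prior times the product of the measurement densities
    from all neighboring measurements, renormalized to a valid PDF."""
    post = prior.copy()
    for msg in incoming:
        post = post * msg
    return post / (post.sum() * dx)

x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]
prior = gauss(x, 0.0, 1.0)
msgs = [gauss(x, 1.0, 0.5), gauss(x, 0.8, 0.5)]

post = marginal_posterior(prior, msgs, dx)
print(x[np.argmax(post)])   # peak near the precision-weighted mean, ~0.8
```

For Gaussian factors the product posterior peaks at the precision-weighted mean of the prior and the incoming messages, which gives a quick sanity check on a sampled implementation.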
We now examine each factor of the likelihood density, called the measurement density, which is associated with the marginal posteriors of the signal elements. From (31), a measurement is represented by
(34) 
where we define the corresponding partial sum of the other elements. Then, by factorizing over the neighboring elements, the expression of the measurement density becomes