Detection-Directed Sparse Estimation using Bayesian Hypothesis Test and Belief Propagation

# Detection-Directed Sparse Estimation using Bayesian Hypothesis Test and Belief Propagation

Jaewook Kang,  Heung-No Lee,  and Kiseon Kim,

School of Information and Communication,
Department of Nanobio Materials and Electronics,
Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
###### Abstract

In this paper, we propose a sparse recovery algorithm called detection-directed (DD) sparse estimation using Bayesian hypothesis test (BHT) and belief propagation (BP). In this framework, we consider the use of sparse-binary sensing matrices which has the tree-like property and the sampled-message approach for the implementation of BP.

The key idea behind the proposed algorithm is that the recovery takes DD-estimation structure consisting of two parts: support detection and signal value estimation. BP and BHT perform the support detection, and an MMSE estimator finds the signal values using the detected support set. The proposed algorithm provides noise-robustness against measurement noise beyond the conventional MAP approach, as well as a solution to remove quantization effect by the sampled-message based BP independently of memory size for the message sampling.

We explain how the proposed algorithm can have the aforementioned characteristics via exemplary discussion. In addition, our experiments validate such superiority of the proposed algorithm, compared to recent algorithms under noisy setup. Interestingly the experimental results show that performance of the proposed algorithm approaches that of the oracle estimator as SNR becomes higher.

{keywords}

Noisy sparse recovery, sparse support detection, belief propagation, detection-directed estimation

## I Introduction

Sparse signal recovery in the presence of noise has been intensively investigated in many recent literature because any real-world devices are subject to at least a small amount of noise. We refer to such problems as noisy sparse signal recovery (NSR) problems. Let denote a sparse signal vector whose elements are sparsely non-zeros. Then, the NSR decoder observes a measurement vector , where is a fat sensing matrix ; and we limit our discussion to zero-mean independently identically distributed (i.i.d.) Gaussian noise denoted by .

The NSR problem can then be defined as an -norm minimization problem as similarly done in [1],[3],[7],[8]:

 (1)

where is an error tolerance parameter. In general, the minimization task in (1) is NP hard; therefore, -norm approaches have been developed and discussed as alternatives [1]-[6]. Among the -solvers, the Dantzig selector (L1-DS), proposed by Candes and Tao [5], and the -penalized least-square algorithm usually called LASSO [6] has been devised for the Gaussian noise setup.

Bayesian approaches to NSR have also received attention, where the minimization task in (1) is described as the maximum a posteriori (MAP) problem, and the sparse solution then is sought by imposing sparsifying prior density on as follows [8]-[12]:

 (2)

where is a probability density function (PDF).

The task in (2) is likewise computationally demanding; hence, many relaxed-Bayesian solvers have been devised according to various prior types and applied techniques, such as sparse Bayesian-learning (SBL) [9]-[13],[20], Approximate minimum-mean-squared-error (AMMSE) estimation [14]-[17], and belief propagation (BP) with sparse sensing matrices (BP-SM) [18]-[21]. A summary of the relaxed-Bayesian solvers is in Table I.

We are mainly interested in the BP-SM framework, in this paper, which has been investigated as a low-computational approach to solve linear estimation problems such as . In this framework, the matrix is assumed to be a sparse matrix which has the tree-like property, and the statistical connection of and is described with the bipartite graph model of . The tree-like property ensures that the corresponding graph is asymptotically cycle-free [23],[24],[33]. In addition, it is known that the vector finding problem can be decomposed to a sequence of scalar finding problems in the BP-SM framework, where marginal posterior for each scalar estimation is obtained by an iterative message-passing process. This decomposition have been explained by decoupling principle [22],[24].

For implementation of BP, two approaches have been mainly discussed according to the message representation: 1) the sampled-message based BP [27],[28] where the message is directly sampled from the corresponding PDF with a certain step size such that the message is treated as a vector, and 2) the parametric-message based BP (also called relaxed BP) [23],[29],[30] where the message is described as a function with a certain set of parameters such as mean and variance. If the sampled-message approach is chosen, quantization error will be induced according to the step size. If the parametric-message approach is used, some model approximation must be applied for stable iteration at the expense of approximation error.

As applications of the BP-SM approach, the low-density parity-check coding [31]-[33] and the CDMA multiuser detection problems [22],[25],[26] are well known in addition to the NSR works [18]-[21].

Based on the rigorous theoretical support for the BP-SM framework by Guo et al. [22]-[25] and Tanaka et al. [26], and motivated by recent NSR works by Baron et al. [18],[19], Tan et al. [20], and Akcakaya et al. [21], in this paper, we aim to develop a BP-SM type algorithm as an alternative for solving (2). We refer to the proposed algorithm as Detection-directed sparse estimation via Bayesian hypothesis test and belief propagation (BHT-BP). Differently from the works [18]-[21] solving the MAP problem in (2), the proposed algorithm takes a structure of detection-directed (DD) estimation which consists of a signal support detector and a signal value estimator. The support detector is designed using a combination of the sampled-message based BP and a novel Bayesian hypothesis test (BHT), and the signal value estimator behaves in minimum mean-square error (MMSE) sense. The DD-structure considers the common procedure of first using the measurements at hand to detect the signal support set. This detected support is then used in the model of the sparse signal, and an MMSE estimator finds the signal values as if the detected support set is in fact the correct set.

This DD-methodology was originally investigated by Middleton et al. for estimation of noisy signals [36], and includes wide application areas, such as communication systems [37],[38] and speech processing [39]. For NSR, a line of previous works in the AMMSE group has independently studied similar structures by Schnitter et al., Elad et al., and Lee in [14]-[17],[40].

Then, the proposed algorithm achieves the following properties:

1. Providing robust signal support detection against the measurement noise,

2. Removing quantization effect caused by the use of the sampled-message based BP.

Here, the “oracle estimator” implies the estimator which has the perfect support knowledge.

The combination of BHT and BP enables robust detection of the signal support against measurement noise, which was partially introduced in our previous work [43]. In the support detection, the sampled-message based BP provides marginal posterior for each scalar problem according to the decoupling principle, and the BHT-process then detects the supportive state of each scalar element by measuring the inner products between the marginal posterior and reference functions composed of the prior density. This BHT-detector utilizes the signal posterior information more efficiently than the conventional MAP-approaches [18]-[21]. When the measurements are noisy, density spread occurs in marginal posteriors, leading difficulty in making correct decisions via the MAP-approach. In contrast, the BHT-based support detector compensates such a weakpoint by scanning the entire range of the posterior. Such hypothesis-based support detectors have been discussed in wavelet domain denoising problems [41],[42]; however, they were using thresholding techniques to sort out significant wavelet coefficients, different from our work which uses the inner product.

In addition, we emphasize that the proposed algorithm effectively removes quantization effect coming from the use of the sampled-message based BP. The quantization effect limits performance of the MAP-approach in both the signal value estimation and the support detection. In the proposed algorithm, the use of the DD-structure makes the performance independent of the message sampling in the BP. In addition, we eliminate the quantization error in the support detection by applying the minimum value of the signal on its support, denoted by , to the reference functions. The importance of in the NSR problems was theoretically highlighted by Wainwright et al. in [34],[35] where they showed that the perfect support recovery is very difficult even with arbitrarily large SNR if is very small. Hence, we regulate in our signal model, reflecting the knowledge of in the BHT-detection. To the best of our knowledge, we have not seen the reflection of in practical algorithms in the sparse recovery literature, and surprisingly, this reflection enables performance of the proposed algorithm to approach that of the oracle estimator as SNR increases.

The computational complexity of the proposed algorithm is (if the cardinality of the support is fixed to ), which includes an additional cost owing to the BHT-process to the cost of BP . Nevertheless, the cost of the proposed algorithm is lower in practice since the proposed algorithm can catch the signal support with smaller memory size than NSR algorithms only using the sampled-message based BP.

The remainder of the paper is organized as follows. We first briefly review a line of the BP-SM algorithms, and then make a remark for the relation between the BP-SM solvers and approximate message passing (AMP)-type algorithms in Section II. In Section III, we define our problem formulation. Section IV provides precise description for the proposed algorithm and Section V provides exemplary discussion to explain and support strengths of the proposed algorithm. In Section VI, we provide experimental validation to show the advantage of the proposed algorithm and compare to the recent algorithms. Finally, we conclude the paper in Section VII.

## Ii Related Works

In this section, we provide a brief introduction to the previous works in the BP-SM algorithms [18]-[21]. The common feature in these algorithms is the use of BP in conjunction with sparse sensing matrices, to approximate the signal posterior, where a sparsifying prior is imposed according to the signal model. In addition, we make a remark to distinguish the BP-SM works from AMP-type algorithms.

### Ii-a BP-SM Algorithms

Baron et al. for the first time proposed the use of BP to the sparse recovery problem with sparse sensing matrices [18],[19]. The algorithm is called CS-BP. Signal model of CS-BP is a compressible signal which has a small number of large elements and a large number of near-zero elements. The author associated this signal model with two-state mixture Gaussian prior, given as

 fX(x)=N∏i=1qN(xi;0,σ2X1)+(1−q)N(xi;0,σ2X0), (3)

where denotes the probability that an element has the large value, and . Therefore, the prior is fully parameterized with , and . CS-BP performs MAP or MMSE estimation using the signal posterior obtained from BP, where the authors applied both the sampled-message and the parametric-message approaches for the BP-implementation. The recovery performance is not very good when measurement noise is severe since the CS-BP was basically designed under noiseless setup.

Tan et al. proposed an algorithm under the BP-SM setup called BP-SBL [20]. This work is based on the SBL-framework [9]-[12] which uses two-layer hierarchical-Gaussian prior models given as

 fX(x|a,b)=N∏i=1∫∞0N(xi;0,τ−1i)fΓ(τi|ai,bi)dτi, (4)

where is the hyper-prior following Gamma distribution with its parameters . At each iteration, the parameters of the prior are estimated using expectation maximization (EM). Therefore, the posterior for the signal estimation is iteratively approximated from the prior. The authors applied the BP-SM setup to reduce the computational cost of EM. BP-SBL is an algorithm using parametric-message based BP where every message is approximated to be a Gaussian PDF which can be fully described by its mean and variance. In addition, BP-SBL is input parameter-free, which means this algorithm is adaptive to any signal models and noise level since EM estimates the parameters associated with any given models during the iteration. However, such parameter estimation will not be accurate when noise is severe; therefore, denoising ability of BP-SBL is limited when measurements are highly corrupted.

Most recently, Akcakaya et al. proposed SuPrEM under a framework similar to BP-SBL which uses a combination of EM and the parametric-message based BP [21]. The main difference from BP-SBL is the use of a specific type of hyper-prior called Jeffreys’ prior . The use of Jeffreys’ prior reduces the number of input parameters while sparsifying the signal. Therefore, the prior is given as

 fX(x)=N∏i=1∫∞0N(xi;0,τi)fJ(τi)dτi. (5)

The sensing matrix used in SuPrEM is restricted to a sparse-binary matrix which has fixed column and row weights, called low-density frames. They are reminiscent of the regular LDPC codes [31]. In addition, the signal model is confined to -sparse signals consisting of nonzeros and zeros since SuPrEM includes a spasifying step which choose the largest elements at each end of iteration. The noise statistic is an optional input to the algorithm. Naturally, if the noise information is available, SuPrEM will provide an improved recovery performance.

### Ii-B Relation to AMPs

It is interesting to associate the algorithms in the BP-SM group to the class of AMP algorithms, which was originally invented by Donoho et al. in [44],[45], analyzed and refined by Bayati et al. [46] and Rangan [47], because both types of the algorithms utilize BP. Performance of the AMPs coincides with that of the -solvers under the large limit assumption () in the noiseless case [44],[46]. Furthermore, in a recent result by Donoho et al. showed that performance of the AMPs is also equivalent to LASSO [6] under the noisy setting with appropriate parameter calibration [48].

The AMPs work well when sufficient density of the sensing matrix is maintained as since the AMPs were developed on the basis of an approximation of BP-messages by the central limit theorem [45]. Therefore, if a sparse matrix is applied to the AMPs, the message approximation does not work, then the decoder will fail in the recovery. We validate this claim by our own experimental result shown in Fig.1, where we simulated the recovery success rate of the standard AMP [44] without additive noise as a function of the matrix sparsity defined as percentage of nonzero entries in the sensing matrix. The recovery of the AMP are not successful when the sensing matrix is sparse ( typically less than 10 matrix sparsity) regardless of the number of measurements , as shown in an example of Fig.1. Namely, the AMPs recover sparse signals well with dense matrices at the expense of low computation , but it does not enjoy the benefits from the use of sparse matrices.

## Iii Problem Formulation

The goal of the proposed algorithm is to recover the object signal given the knowledge of and noisy measurements as following

 z=Φx0+n, (6)

where we consider the use of a fat sparse-binary sensing matrix which has very low matrix sparsity (typically less than 10 matrix sparsity) and the tree-like property. We regulate the matrix sparsity using the fixed column weight since this regulation enables the matrix to span the measurement space with basis having equal energy. For the noise distribution, we assume i.i.d. zero-mean Gaussian density such that the noise vector is drawn from .

In the remainder of this section, we introduce some necessary concepts, such as signal model and graph representation of , for our framework, and then discuss our approach to solve this NSR problem.

### Iii-a Graph Representation of Φ

Bipartite graphs effectively represent linear systems with sparse matrices such as the matrix . Let denote the set of indices corresponding to the elements of the signal vector, , and denote the set of indices corresponding to the elements of the measurement vector, . In addition, we define the set of edges connecting and as where is the -th element of . Then, a bipartite graph fully describes the neighboring relation in the linear system. For convenience, we define the neighbor set of and as and , respectively. Note that under our setup on .

### Iii-B Signal Model

Let denote a sparse vector which is a deterministic realization of a random vector . Here, we assume that the elements of are i.i.d., and each belongs to the support set, denoted by , with a rate . To indicate the supportive state of , we define a state vector where each is defined as

 Si=S(Xi)={1,ifi∈supp(X)0,else for all i∈V. (7)

Therfore, the signal sparsity is given as . For the signal values on the support, we will deal with two cases, which are

1. Gaussian signals:,

2. Signed signals :,

where denote the delta function peaked at . Namely, in the first case, the values on the support are distributed according to an i.i.d. zero-mean Gaussian density with variance , and in the second case the magnitude of the values is fixed to and the sign follows Bernoulli distribution with probability . In addition, for the Gaussian signal, we regulate the minimum value on the support as a key parameter of the signal model, such that

 xmin≤|x0,i| for all i∈supp(x0), (8)

Indeed, Wainwright et al. emphasized the importance of in the NSR problems [34],[35]. They stated that if is very small, success of the noisy sparse recovery is very difficult even with arbitrarily large SNR. Namely, if the signal contains at least such a very small element belonging to the support, the recovery can be failed regardless of SNR level since the small element can be buried even by negligible noise.

In the Bayesian framework, the prior density takes a role of pursuing the sparsest solution from infinitely many solutions of the underdetermined system. Therefore, the proper choice of the sparsifying prior according to the signal model is highly important. According to our signal model, we consider spike-and-slab densities as our prior. The prior for an element is given as

 fX(x)=qfX(x|S=1)+(1−q)fX(x|S=0)=qN(x;0,σ2X1)+(1−q)δ0. (9)

This type of the priors has been widely used in modeling of strictly sparse signals [13],[14],[53] since the prior can simply characterize the signals with and , as well as easily associate the signal model with the other models such as hidden Markov chain and wavelet model. In addition, we note that the spike-and-slab prior is a particular case of two-state Gaussian mixture in (3) as .

### Iii-C Solution Approach

The approach of the DD-estimation is motivated by the naive MMSE sparse estimation which was also reported in [14]-[16], as follows:

 ˆx0,MMSE=∑s∈{0,1}NE[X|S,Z=z]⋅Pr{S|Z=z}, (10)

We note that (10) is a weighted sum of the signal value estimation over possible supports given noisy measurements . Therefore, we separate the signal value estimation and the support searching in (10) as the first step for its relaxation. Then, the signal value estimation can be represented as

 ˆx0,BHT-BP=E[X|S=ˆs,Z=z], (11)

by assuming an ideal support detector for . The calculation of (11) is simple because it is well known as the convectional linear MMSE estimation, expressed as

 ˆx0,ˆs=(1σ2X1I+1σ2NΦTˆsΦˆs)−11σ2NΦTˆsz, (12)

where as a submatrix of that contains only the columns corresponding to the detected support , and as a vector that includes only the nonzero elements from . In addition, we know that this MMSE estimation is optimal when the signal values on the detected support are Gaussian distributed [49].

For the part of the support search, we start from an exhaustive detection, described as

 ˆs=argmaxs∈{0,1}NPr{S=s|Z=z}. (13)

To solve (13) efficiently, we decompose the state vector detection to scalar state detections based on the decoupling principle [22],[24], and apply binary scalar MAP-detection for each scalar state. Then, the binary scalar detection can be optimally achieved by a hypothesis test [50], given as

 Pr{Si=1|Z=z}Pr{Si=0|Z=z}H1≷H01 for all i∈V, (14)

where and are two possible hypotheses. This support detection approach simplifies the exhaustive search in (13) to hypothesis tests. Here, we note that the support detection in (14) performs with an independent decision rule from the value estimation in (12). Therefore, this DD-approach to (10) is reasonable since the Bayesian risks of the support detection and the signal values estimation can be independently minimized [36]. The overall flow of the DD-estimation is depicted in Fig.2.

In order to justify our solution approach, we will provide experimental validation compared to the recent NSR algorithms, as a function of SNR defined as

 SNR :=10log10E∥Φx0∥22Mσ2N dB . (15)

Performance of the support detection is evaluated pairwise state error rate (SER), defined as

 SER:=Pr{s(x0,i)≠ˆsi|i∈V}, (16)

and the overall performance of the signal recovery is measured in terms of normalized mean square error (MSE), defined as

 MSE:=∥ˆx0−x0∥22∥x0,s∥22. (17)

## Iv Proposed Algorithm

The proposed algorithm straightforwardly provides the estimate via the MMSE estimation in (12) once given the detected support set from (13). In addition, we state that the detected support set can be elementwisely obtained from (14), on the basis of the decoupling principle. Therefore, efficient design of the scalar state detector in (14) is very important in this work. In this section, we explain the implementation of the scalar state detector based on the hypotesis test given in (14), using the combination of the sampled-message based BP and BHT.

### Iv-a Sampled-message based Belief Propagation

BP provides the marginal posterior for the hypothesis test in (14) for each element. Since the signal is real-valued, each BP-message takes the form of a PDF, and the BP-iteration consists of density-message passing. We provide a brief summary of density-message passing in Appendix I. To implement the BP passing density-messages, we consider the sampled-message approach which has been discussed by Sudderth et al. [27], and Noorshams et al. [28]. For the sparse recovery, Baron et al. applied the approach in [18],[19]. The main strength of the sampled-message approach is adaptivity to various signal models. In addition, it shows faster convergence than the parametric-message approach [23],[29],[30] if the sampling step size is sufficiently fine. The reason is that the sampled-message approach does not use any model approximation or reduction for the message representation during the iteration. Based on (33) and (Appendix I Fundamentals of Density-message Passing) from Appendix I, we will provide a practical message update rule of the sampled-message based BP for the proposed algorithm.

In our implementation, we set the sampling step on the basis of the three sigma-rule [52], given as

 Ts=2⋅3σX1Nd (18)

where is the number of samples for a density-message. Hence, we define the sampling process of the PDFs as

 Samp[f(x)]:=f(mTs−3σX1)=f[m]form=0∼Nd−1, (19)

where denotes the sampling process. Hence, the density-messages are treated as vectors with size in this approach.

Let denote a sampled density-message from to , called the signal message;, and is the message from to , called the measurement message. The signal message includes information on the marginal posterior , being obtained from (33) by simply replacing the measurement densities, , with the measurement messages of the previous iteration except the intrinsic information. That is,

 ali→j[m]=η⎡⎣fX[m]×∏k∈NV(i)∖{j}bl−1k→i[m]⎤⎦, (20)

, where is the normalization function to make . Similarly, the measurement message includes information on the measurement density , being obtained from the expression of (Appendix I Fundamentals of Density-message Passing) by replacing the associated marginal posteriors, , with the signal messages, that is,

 blj→i[m]=fN[m;zj,σ2N]=Samp[N(n;zj,σ2N)]⊗(⨂k∈NC(j)∖{i}alk→j[−m]), (21)

where is the operator for the linear convolution of PDFs. In addition, under the Gaussian noise assumption, we know in (21) based on (Appendix I Fundamentals of Density-message Passing). Here, we note that the measurement message calculation utilizes the noise statistics distinctively from that of the standard CS-BP [see (7) in [19]]. This improvement remarkably enhances the recovery performance in the low SNR regime even though the noise statistic loses its effect with sufficiently high SNR.

The convolution operations in (21) can be efficiently computed by using the Fast fourier transform (FFT). Accordingly, we express for the measurement message calculation as

 blj→i[m]=F−1[F[fN[m;zj,σ2N]](∏kF[alk→j[−m]])] (22)

where denotes the FFT operation. Therefore, for efficient use of FFT, the sampling step should be appropriately chosen such that is two’s power. In fact, the use of the FFT brings a small calculation gap since the FFT-based calculation performs a circular convolution that produces output having a heavy tail. The heaviness increases as the column weights of increases. However, the difference is can be ignored, especially when the messages are bell-shaped densities.

Finally, an sampled approximation of the marginal posterior is obtained after a certain number of iteration via the update rule in (20),(21), as follows:

 Samp[fXi(x|Z=z)]≈fXi[m|Z=z]=η[fX[m]×∏j∈NV(i)bl∗j→i[m]]. (23)

### Iv-B Bayesian Hypothesis Test for Support Detection

In order to perform the hypothesis test in (14), the decoder needs to calculate the probability ratio . By factorizing over , the hypothesis test becomes

 Pr{Si=1|Z=z}Pr{Si=0|Z=z} =∫Pr{Si=1|Z=z,Xi}fXi(x|Z=z)dx∫Pr{Si=0|Z=z,Xi}fXi(x|Z=z)dxH1≷H01. (24)

In practice, we replace the marginal posterior with the sampled marginal posterior from (23) for discrete signal processing, which is provided in Algorithm 1. However, we use notations of the continuous domain in the description of this hypothesis process. Given from BP, the given measurements do not provide any additional information on ; hence, holds true. Using such a fact and the Bayesian rule, the test in (IV-B) is rewritten as

 ∫r1(x)fXi(x|Z=z)dx∫r0(x)fXi(x|Z=z)dxH1≷H0Pr{S=0}Pr{S=1}, (25)

where , and are reference functions consisting of the prior knowledge defined as

 r0(x):=fX(x|S=0)fX(x),r1(x):=fX(x|S=1)fX(x). (26)

The process to calculate the probability ratio in (25) is described as a block diagram in Fig.4. This process has a similar structure to matched filtering in communication systems, where the detector determines supportive state of by measuring inner products between the marginal posterior and reference functions. In addition, we emphasize that this BHT-based detector is only compatible with the sampled-message based BP because the BHT-process requires full information on the signal posterior which cannot be provided through the parametric-message based BP.

### Iv-C Computational Complexity

In the sampled-message approach, the density-messages in BP are vectors with size . Therefore, the decoder requires flops to calculate a signal message and flops for a measurement message per iteration, where denotes the column weight of the sparse sensing matrix . In addition, the cost of the FFT-based convolution is if we assume the row weight is using average sense. Hence, the per-iteration cost of the BP-process is flops. For the hypothesis test, the decoder requires flops to generate the probability ratio. The cost for the hypothesis test is much smaller than that of BP; therefore, it is ignored. For the MMSE estimator to find signal values, the cost can be reduced upto flops by applying QR decomposition [51]. Thus, the total complexity of the proposed algorithm is flops and it is further simplified to since and are fixed. In addition, it is known that the recovery convergence via BP is achieved with iterations [19],[32]. Therefore, we finally obtain for the complexity of BHT-BP. We note here that the complexity of BHT-BP is much relaxed from that of the naive MMSE sparse estimator , given in (10).

## V Exemplary Discussion for Proposed Algorithm

One can argue the DD-structure of the proposed algorithm is an abuse since the marginal posteriors from the BP already provides full information on the signal. Namely, this means that in (23) contains the perfect knowledge for detection and estimation of . Yes it is true, but our claim is that the MAP-based algorithms [18],[19] which solve the problem in (2) are not utilizing the full knowledge of the marginal posterior.

In this section, we show two weakpoints of the MAP-approach which finds each scalar estimate only using the peak location of the marginal posterior, through examples. We then discuss how the proposed algorithm can remedy such problematic behavior of the MAP, and utilize the posterior knowledge more efficiently. We claim that the proposed algorithm has strength in two aspects as given below:

1. Robust support detection against additive noise: The BHT-detector in the proposed algorithm correctly catches the supportive state given even under severe additive noise.

2. Removing quantization effect caused by the sampled-message based BP: The quantization effect degrades performance of the MAP-based algorithms in both the signal value estimation and the support detection. In the proposed algorithm, the DD-structure removes the quantization effect from the signal value estimation, and the BHT-detector reduces the chance of misdetection of the supportive state given by applying the knowledge of to the reference functions.

### V-a Robust Support Detection against Additive Noise

The additive noise spreads probability mass in the marginal posterior , leading to difficulty in the supportive state detection via the MAP-approach given . For example, Fig.6 shows marginal posterior obtained from severe noisy measurements (SNR=10 dB) via BP-iteration, where the signal value is originally , hence . We note that the posterior is approximately composed of a zero-spike and a slab-density with certain probability weights. In Fig.6, it is shown that the center of the slab-density is moving to the true value over the iterations, but after 5 iterations the mass at , i.e., , is still smaller than that of the zero-spike. The reason is that the noise effect spreads the slab-density over near values, making the peak at to be blunt. We call this spreading effect as density spread.

Fig.6 more clearly describes the density spread in the marginal posterior according different SNR levels. When SNR level is sufficiently high (see cases more than SNR=20 dB in Fig.6), the MAP-approach can successfully detect the supportive state since probability mass is concentrated on the center of the slab-density such that . However, when SNR is low (see the line of SNR=10 dB in Fig.6), the MAP-approach will fail in the detection because the zero-spike becomes the highest peak in the marginal posterior such that due to the density spread. The density spread does not cause errors given since in this case the center of the slab-density stays at during the BP-iteration, regardless of the noise level.

In contrast, the BHT-detector in the proposed algorithm, decides the supportive state by considering the density spread effect. In (25), the inner products between and measure portions of the marginal posterior corresponding to and respectively. Since the inner products are associated with the entire range of the -axis rather than specific point-mass, the BHT-detector can decide the supportive state by incorporating all spread mass due to noise. Therefore, the BHT-detector can success in the detection even in the SNR=10 dB case of Fig.6. This example supports that BHT-detector has ability to detect the signal support more robustly against noise than the MAP-approach.

### V-B Removing Quantization Effect caused by the Sampled-message Based BP

#### V-B1 In signal value estimation

Under the use of the sampled-message based BP, quantization error is inevitable in the signal value estimation. When we apply rounding for the quantization, the MSE error is uniformly distributed with zero-mean and variance according to the sampling step size , given as

 E∥∥QTS[Xi∈supp(X)]−Xi∈supp(X)∥∥22=T2s12 =(2⋅3σX1Nd)2/12, (27)

where denotes the quantization function with . Therefore, MSE performance of the MAP-based algorithms with the sampling cannot exceed the limit given by (V-B1) even if the level of additive noise is extremely low. To relax this limit, we can increase memory size or confine the range of signal value by decreasing . However, such methods are impractical and restrict flexibility of signal models in the recovery. The DD-structure, described in (13) and (11), removes this weakpoint since the signal values are evaluated using an MMSE estimator in (12) independently of once the detected support is given. Furthermore, the probability ratio for the BHT-detection can be generated from sampled marginal densities with small .

#### V-B2 In support detection

The quantization effect also can cause detection error of the supportive state given , limiting performance of the MAP-approach in the high SNR regime. In order to explain such cases, we provide an example of the measurement message-passing with respect to as shown in Fig.7. In this example, we try to find the value of given , , , , and , under noiseless setup. Note that in practical operation of the algorithm, the value of each element corresponds to the peak location in the -axis of each density-message. In the sampled-message based BP, the message sampling causes quantization of the values such that we have , , , with the step size where denotes the quantization function. In this case, we can simply infer that this measurement message passes a corrupted value which is not matched with . If most of the measurement messages to has such corruption, the marginal posterior of will have the peak at an erroneous location, leading to detection error based on the MAP-approach. The same event can occur given . However, the effect is not significant because the case of does not bring about the support detection error. In addition, this type of errors is remarkable particularly in the high SNR regime since such corruption is covered by additive noise when SNR is low. Accordingly, the support detection performance of the MAP-approach shows an error-floor in the high SNR regime.

Here, we aim to show how the proposed BHT-detector utilizes the prior knowledge of to remove the error-floor of the MAP approach. As we mentioned in Section III-A, we consider the signal model which has as a key parameter, according to the result of Wainright et al. [34],[35]. The Wainright’ result revealed that the regulation of is imperative for perfect support recovery under noisy setup. This means that we can have additional prior knowledge to the sparse recovery. From (8), the knowledge of tells us that there exist no nonzero elements which have value within . The BHT-detector reflects to calibrate the reference functions. Rather than the functions given in (26), we use its calibrated version given as

 r0(x,xmin) := fX(x|S=0;xmin)fX(x;xmin), r1(x,xmin) := fX(x|S=1)fX(x;xmin), (28)

to improve the performance for the Gaussian signal model, where

 fX(x|S=0;xmin)=N(x;0,cxmin),fX(x;xmin)=qfX(x|S=1)+(1−q)fX(x|S=0;xmin),

with a constant . For signed signals, we simply use .

This calibration depresses the weight of and puts more weight to over , as shown in Fig.8. This implies that the detector excludes elements within from the signal support set. Therefore, we can eliminate the misdetection cases given such as the example in Fig.7, removing the error-floor in the high SNR regime effectively.

## Vi Experimental Validation

We support our investigation for the proposed algorithm via experimental validation. We performed two types of the experiments for this validation as given below:

1. SER performance over SNR: To support our claims for support detection based on BHT, we simulate the SER performance, described in (16), as a function of SNR, compared to that of the MAP-approach used in CS-BPs,

2. MSE comparison over SNR: To compare the recovery performance of the recent NSR algorithms listed in Table.II and the oracle estimator, in the presence of noise, we examine MSE performance, described in (17), as a function of SNR.

Here, for SuPrEM, BCS, and L1-DS, we obtained the source codes from each author’s webpage. For CS-BP, we implemented it using the sampled-message based approach and the spike-and-slab prior, making the algorithm into two different versions: 1) the standard CS-BP which is the original by the corresponding papers [18],[19], and 2) CS-BP-NS which utilizes the noise statistic in the BP-iteration. In implementation of the sampled-message based BP for CS-BPs and BHT-BP, we used such that the sampling step for the density-messages is with (18). For the choice of the sensing matrix, BHT-BP and CS-BPs were sparse-binary matrices with such that the matrix sparsity is , as discussed in Section III-A, . L1-DS and BCS are performed with the standard Gaussian matrix having the equal column energy as the sparse-binary matrix, for fairness, i.e., . In addition, SuPrEM worked with a sensing matrix generated from a low-density frame [21].

In these experiments, we examined the two types of signal models defined in Section III-B, Gaussian signals and signed signals, with , , and . For the case of Gaussian signals, we restricted the magnitude level of the signal elements to where . In addition, we fixed the undersampling ratio to because the focus of this paper is to investigate the behavior of NSR over SNR. Also, we used Monte Carlo method with 200 trials to evaluate the performance in average sense.

### Vi-a SER Performance over SNR

Fig.9 compares a SER performance between the proposed algorithm and the MAP-approach used in CS-BPs, where the BP-process embedded in the both algorithms utilizes the noise statistic. We simulated the SER performance as a function of SNR for the both signal model.

#### Vi-A1 Result at low SNR

In the low SNR regime, the noise-robust property of the proposed BHT-detector provides 2 dB gain at SER = from the the MAP-approach for the Gaussian model, as we discussed in Section V-A. In addition, we note that the case of signed signals shows better performance than the case of Gaussian signals with 4 dB gap at SER = . The reason is that in the signed case the nonzero values is fixed to ; therefore, the sparse patten of the signal are rarely corrupted by noise and can be detected with less difficulty than the Gaussian case which can have nonzero elements .

#### Vi-A2 Result at high SNR

As SNR increases, the SER performance of the proposed algorithm shows a waterfall behavior whereas that of the MAP-approach shows an error-floor limited to SER of , for both signal models. In the MAP cases, more SNR cannot help when the curve manifests such an irreducible SER since the level is saturated by quantization effect caused by the sampled-message based BP, as discussed in Section V-B. In contrast, the SER curve of the proposed algorithm declines with more SNR by overcoming quantization effect, removing the error-floor of the MAP-approach for both signal models. Therefore, this result indicates that the proposed BHT-detector can provide the perfect support knowledge to the signal value estimator if sufficient SNR level is present.

### Vi-B MSE Comparison over SNR

This section provides a MSE comparison among the algorithms in Table.II and the oracle estimator, over SNR, where MSE denotes the performance of the oracle estimator, given as

 MSE∗:=Tr⎡⎣(1σ2X1I+1σ2NΦTsΦs)−1⎤⎦∥x0,s∥22. (29)

Fig.11 and Fig.11 display the results for the signed signal and the Gaussian signal case respectively.

#### Vi-B1 Result at low SNR

When the measurements are heavily contaminated by noise (below SNR =20 dB), performance of all recovery algorithms is basically degraded. Under such degradation, BHT-BP and CS-BP-NS outperform the others because they are fully employing noise statistic during the recovery process, where difference of BHT-BP and CS-BP-NS (2 dB SNR gap at MSE = ) is from types of the support detectors as we validated in Section V-A.

The use of noise statistic remarkably affects the performance of BP-based algorithms as discussed in Section IV-A. As a support, the standard CS-BP shows dB SNR loss from CS-BP-NS for the both signal models. For SuPrEM, even if it also includes noise variance as a input parameter, the performance is underperformed since SuPrEM was devised mainly for the -sparse signals which have the support set with fixed cardinality.

As SNR increases, BHT-BP approaches the oracle performance MSE where the case of signed signal in Fig.11 shows faster approaching than the case of Gaussian signals in Fig.11 with approximately 4 dB gap. we note that this gap according to the signal models is originated from the gap in the SER performance.

#### Vi-B2 Result at high SNR

In the high SNR regime, performance of the algorithms are generally improved except SuPrEM, for the both signal models. Among the algorithms, BHT-BP shows the most closely approaching performance to MSE. This is because the BHT-detector provides the perfect support knowledge beyond a certain SNR level. Although BCS shows a competitive performance within a certain range of SNR (SNR = 28 40 dB for the signed case, SNR = 30 34 dB for the Gaussian case), its performance is saturated to a certain level as SNR becomes higher.

For CS-BPs, the use of the noise statistic in the BP-process is no longer be effective beyond a certain SNR level. Indeed, the MSE in Fig.11 and Fig.11 commonly shows that the performance of the standard CS-BP converges to that of CS-BP-NS beyond approximately SNR = 45 dB. In addition, after the convergence the performances are saturated to a certain level even with higher SNR. The cause of this saturation is the quantization effect, as discussed in Section V-B. Using (V-B1), we can calculate the normalized MSE degradation by the quantization under this experimental setup, given as

 T2s/12E∥∥Xi∈supp(X)∥∥22=4.5786×10−5, (30)

The result in (30) closely lower bounds the MSE of CS-BPs at SNR = 50 dB where the signed and Gaussian cases show MSE = and MSE = respectively. These results exactly explain the performance loss of CS-BPs by quantization effect, standing out BHT-BP to remove the quantization effect using the DD-structure.

## Vii Conclusion

The theoretical and empirical research in this paper demonstrated that BHT-BP is a powerful algorithm for NSR from a denoising standpoint. In BHT-BP, we employed the DD-structure, which consists of support detection and signal value estimation. The support detector is designed by a combination of the sampled-message based BP and Bayesian hypothesis test (BHT), and the signal value estimation performs in MMSE sense.

We have shown that BHT-BP utilizes the posterior knowledge more efficiently than the MAP-based algorithms, over the entire range of SNR. In the low SNR regime, the BHT-based support detector provides noisy-robust detection against measurement noise. In the high SNR regime, the DD-structure eliminates quantization error due to the sampled-message BP from the signal value estimation. In addition, we applied the knowledge of to the proposed algorithm based on the result of Wainright et al. [34],[35]. Then, we showed that the use of enables BHT-BP to remove the error-floor of the MAP-based algorithms, inducing the performance to approach that of the oracle estimator as SNR increases.

We supported such advantages of BHT-BP via experimental validation. Our experiments showed that BHT-BP outperforms the other recent NSR algorithms over entire range of SNR, approaching the recovery performance of the oracle estimator as SNR increases.

## Appendix I Fundamentals of Density-message Passing

The goal of BP is to iteratively approximate the marginal posterior densities via a message update rule. The message update rule can have various forms according to applications and frameworks. In the NSR problems, the BP-messages are represented as PDFs since the signal is a real-valued. We refer to such messages passing as density-messages passing. In this appendix, we provide the fundamentals of the density-message passing under the BP-SM framework.

In the discussion here, we consider a random linear model corresponding to (6), given as

 Z=ΦX+N, (31)

where and are random vectors for and respectively, and is a random vector for the Gaussian noise vector . In addition, we assume that the sensing matrix sufficiently sparse such that the corresponding bipartite graph is tree-like.

Given , we can represent the marginal posterior density of in the form of using the Bayesian rule, given as

 fXi(x|Z=z) =fX(x)×fZ(z|Xi=xi)fZ(z) (32) ∝fX(x)×∏j∈NV(i)fZj(z|Xi=xi), (33)

where we postulate that the measurements associated with , i.e., , are statistically independent given [22]-[24] using the tree-like property of , to move to (33) from (32).

We note each decomposition of the likelihood density, i.e., called measurement density, which is associated with the marginal posteriors of elements in . From (31), a measurement is represented by

 Zj=Xi+∑k∈NC(j)∖{i}Xk+Nj=Xi+Yj, (34)

where we define . Then, by factorizing over , the expression of becomes

 fZj(z|Xi=xi) =fZj(x+y|Xi=xi) =∫YjfZj(x+y|Xi=xi,Yj)fYj(y|Xi=xi)=fYj(y)dy