# Bayesian signal reconstruction for 1-bit compressed sensing

Yingying Xu, Yoshiyuki Kabashima and Lenka Zdeborová

Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama 226-8502, Japan

Institut de Physique Théorique, IPhT, CEA Saclay, and URA 2306, CNRS, F-91191 Gif-sur-Yvette, France
###### Abstract

The 1-bit compressed sensing framework enables the recovery of a sparse vector from the sign information of each entry of its linear transformation. Discarding the amplitude information can significantly reduce the amount of data, which is highly beneficial in practical applications. In this paper, we present a Bayesian approach to signal reconstruction for 1-bit compressed sensing, and analyze its typical performance using statistical mechanics. As a basic setup, we consider the case in which the measurement matrix has i.i.d. entries and the measurements are noiseless. Utilizing the replica method, we show that the Bayesian approach enables better reconstruction than the $\ell_1$-norm minimization approach, asymptotically saturating the performance obtained when the positions of the non-zero entries of the signal are known, for signals whose non-zero entries follow zero-mean Gaussian distributions. We also test a message passing algorithm for signal reconstruction on the basis of belief propagation. The results of numerical experiments are consistent with those of the theoretical analysis.

## 1 Introduction

Compressed (or compressive) sensing (CS) is currently one of the most popular topics in information science, and has been used for applications in various engineering fields, such as audio and visual electronics, medical imaging devices, and astronomical observations [1]. Typically, smooth signals, such as natural images and communication signals, can be represented by a sparsity-inducing basis, such as a Fourier or wavelet basis [2, 3]. The goal of CS is to reconstruct a high-dimensional signal from its lower-dimensional linear transformation data, utilizing the prior knowledge of the sparsity of the signal [4]. This results in time, cost, and precision advantages.

Mathematically, the CS problem can be expressed as follows: an $N$-dimensional vector $\bm{x}^0$ is linearly transformed into an $M$-dimensional vector $\bm{y}$ by an $M \times N$ measurement matrix $\Phi$, as $\bm{y} = \Phi\bm{x}^0$ [4]. The observer is free to choose the measurement protocol. Given $\bm{y}$ and $\Phi$, the central problem is how to reconstruct $\bm{x}^0$. When $M < N$, due to the loss of information, the inverse problem has infinitely many solutions. However, when it is guaranteed that $\bm{x}^0$ has only a small number of nonzero entries in some convenient basis (i.e., when the signal is sparse enough) and the measurement matrix is incoherent with that basis, there is a high probability that the inverse problem has a unique and exact solution. Considerable efforts have been made to clarify the conditions for the uniqueness and correctness of the solution, and to develop practically feasible signal reconstruction algorithms [5, 6, 7, 8, 9].

Recently, a scheme called 1-bit compressed sensing (1-bit CS) was proposed. In 1-bit CS, the signal is recovered from only the sign data of the linear measurements, $\bm{y} = \mathrm{sign}(\Phi\bm{x}^0)$, where $\mathrm{sign}(x) = x/|x|$ for $x \neq 0$ operates component-wise when the argument is a vector [10]. Discarding the amplitude information can significantly reduce the amount of data to be stored and/or transmitted. This is highly advantageous for most real-world applications, particularly those in which the measurement is accompanied by the transmission of digital information [11]. In 1-bit CS, the amplitude information is lost during the measurement stage, making perfect recovery of the original signal impossible. Thus, we generally need more measurements to compensate for the loss of information. The scheme is considered to have practical relevance in situations where perfect recovery is not required, and measurements are inexpensive but precise quantization is expensive. These features are very different from those of general CS.

The most widely used signal reconstruction scheme in CS is $\ell_1$-norm minimization, which searches for the vector with the smallest $\ell_1$-norm under the constraint $\bm{y} = \Phi\bm{x}$. This is based on the work of Candès et al. [4][6], who also suggested the use of a random measurement matrix with independent and identically distributed entries. Because the optimization problem is convex and can be solved using efficient linear programming techniques, these ideas have led to various fast and efficient algorithms. The $\ell_1$-reconstruction is now widely used, and is responsible for the surge of interest in CS over the past few years. Against this background, $\ell_1$-reconstruction was the first technique attempted in the development of the 1-bit CS problem. In [10], an approximate signal recovery algorithm was proposed based on the minimization of the $\ell_1$-norm under the constraint $\mathrm{sign}(\Phi\bm{x}) = \bm{y}$, and its utility was demonstrated by numerical experiments. In [12], the capabilities of this method were analyzed, and a new algorithm based on the cavity method was presented. However, the significance of the $\ell_1$-based scheme may be rather weak for 1-bit CS, because the loss of convexity prevents the development of mathematically guaranteed and practically feasible algorithms.

Therefore, we propose another approach based on Bayesian inference for 1-bit CS, focused on the case in which each entry of $\Phi$ is independently generated from a standard Gaussian distribution, and the output is noiseless. Although the Bayesian approach is guaranteed to achieve the optimal performance when the actual signal distribution is given, quantifying the performance gain is a nontrivial task. We accomplish this task utilizing the replica method, which shows that when the non-zero entries of the signal follow zero-mean Gaussian distributions, the Bayesian optimal inference asymptotically saturates the mean squared error (MSE) performance obtained when the positions of the non-zero signal entries are known, as the measurement ratio $\alpha = M/N$ grows. This means that, in such cases, at least in terms of MSEs, the correct prior knowledge of the sparsity asymptotically becomes as informative as the knowledge of the exact positions of the non-zero entries. Unfortunately, performing the exact Bayesian inference is computationally difficult. This difficulty is resolved by employing the generalized approximate message passing technique, which is regarded as a variation of belief propagation or the cavity method [13, 14].

This paper is organized as follows. The next section sets up the 1-bit CS problem. In section 3, we examine the signal recovery performance achieved by the Bayesian scheme utilizing the replica method. In section 4, an approximate signal recovery algorithm based on belief propagation is developed. The utility of this algorithm is tested and its asymptotic performance is analyzed in section 5. The final section summarizes our work.

## 2 Problem setup and Bayesian optimality

Let us suppose that each entry $x_i^0$ of an $N$-dimensional signal (vector) $\bm{x}^0$ is independently generated from an identical sparse distribution:

$$P(x) = (1-\rho)\,\delta(x) + \rho\,\tilde{P}(x), \qquad (1)$$

where $\rho \in (0,1]$ represents the density of nonzero entries in the signal, and $\tilde{P}(x)$ is a distribution function of $x$ that has a finite second moment and does not have finite mass at $x = 0$. In 1-bit CS, the measurement is performed as

$$\bm{y} = \mathrm{sign}\left(\Phi\bm{x}^0\right), \qquad (2)$$

where $\mathrm{sign}(\cdot)$ operates in a component-wise manner, and for simplicity we assume that each entry of the $M \times N$ measurement matrix $\Phi$ is provided as a sample of a Gaussian distribution of zero mean and variance $N^{-1}$.
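As a concrete illustration, the measurement process of (1) and (2) can be simulated as follows. This is a sketch assuming a zero-mean, unit-variance Gaussian for $\tilde{P}(x)$ (the case analyzed later); the function name and parameter choices are ours.

```python
import numpy as np

def generate_instance(N, alpha, rho, seed=0):
    """Draw a Bernoulli-Gaussian signal and its noiseless 1-bit measurements."""
    rng = np.random.default_rng(seed)
    M = int(alpha * N)
    # Sparse signal: each entry is zero w.p. 1-rho, else standard Gaussian.
    support = rng.random(N) < rho
    x0 = np.where(support, rng.standard_normal(N), 0.0)
    # Measurement matrix with i.i.d. N(0, 1/N) entries.
    Phi = rng.standard_normal((M, N)) / np.sqrt(N)
    # Only the signs of the linear measurements are retained.
    y = np.sign(Phi @ x0)
    return x0, Phi, y
```

Here `alpha` is the measurement ratio $M/N$ and `rho` the density of nonzero entries.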

We shall adopt the Bayesian approach to reconstruct the signal $\bm{x}^0$ from the 1-bit measurement $\bm{y}$, assuming that the prior (1) is correctly known in the recovery stage. Let us denote an arbitrary recovery scheme for the measurement $\bm{y}$ as $\hat{\bm{x}}(\bm{y})$, where we impose a normalization constraint $|\hat{\bm{x}}(\bm{y})| = \sqrt{N\rho}$ to compensate for the loss of amplitude information by the 1-bit measurement. Equations (1) and (2) indicate that, for a given $\Phi$, the joint distribution of the sparse vector $\bm{x}$ and its 1-bit measurement $\bm{y}$ is

$$P(\bm{x},\bm{y}|\Phi) = \prod_{\mu=1}^{M}\Theta\!\left(y_\mu(\Phi\bm{x})_\mu\right) \times \prod_{i=1}^{N}\left((1-\rho)\,\delta(x_i)+\rho\,\tilde{P}(x_i)\right), \qquad (3)$$

where $\Theta(z) = 1$ for $z > 0$, and vanishes otherwise. This provides the mean square error of $\hat{\bm{x}}(\cdot)$, which is hereafter handled as the performance measure for the signal reconstruction (errors of other types can also be chosen as the performance measure; the argument shown in this section holds similarly even when such measures are used), as follows:

$$\mathrm{MSE}(\hat{\bm{x}}(\cdot)) = \sum_{\bm{y}}\int\mathrm{d}\bm{x}\,P(\bm{x},\bm{y}|\Phi)\left|\frac{\hat{\bm{x}}(\bm{y})}{|\hat{\bm{x}}(\bm{y})|}-\frac{\bm{x}}{|\bm{x}|}\right|^2. \qquad (4)$$
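For a single pair of vectors, the summand of (4) depends only on the angle $\theta$ between them, since $|\hat{\bm{u}}-\bm{u}|^2 = 2(1-\cos\theta)$ for unit vectors. A minimal numerical check (the function name is ours):

```python
import numpy as np

def normalized_mse(x_hat, x):
    """Squared distance between the directions of x_hat and x, as in (4)."""
    u = x_hat / np.linalg.norm(x_hat)
    v = x / np.linalg.norm(x)
    return np.sum((u - v) ** 2)
```

The value equals $2(1-\cos\theta)$, so it vanishes whenever `x_hat` is a positive multiple of `x` and attains at most 4 (anti-parallel vectors).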

The following theorem forms the basis of our Bayesian approach.

###### Theorem 1.

$\mathrm{MSE}(\hat{\bm{x}}(\cdot))$ is lower bounded as

$$\mathrm{MSE}(\hat{\bm{x}}(\cdot)) \geq 2\sum_{\bm{y}}P(\bm{y}|\Phi)\left(1-\left\|\left\langle\frac{\bm{x}}{|\bm{x}|}\right\rangle_{\bm{y},\Phi}\right\|\right), \qquad (5)$$

where

$$P(\bm{y}|\Phi) = \int\mathrm{d}\bm{x}\,P(\bm{x},\bm{y}|\Phi) \qquad (6)$$

$$\phantom{P(\bm{y}|\Phi)} = \int\mathrm{d}\bm{x}\prod_{\mu=1}^{M}\Theta\!\left(y_\mu(\Phi\bm{x})_\mu\right)\times\prod_{i=1}^{N}\left((1-\rho)\,\delta(x_i)+\rho\,\tilde{P}(x_i)\right) \qquad (7)$$

is the marginal distribution of the 1-bit measurement, and $\langle g(\bm{x})\rangle_{\bm{y},\Phi}$ generally denotes the posterior mean of an arbitrary function $g(\bm{x})$ of $\bm{x}$, given $\bm{y}$ and $\Phi$. The equality holds for the Bayesian optimal signal reconstruction

$$\hat{\bm{x}}^{\mathrm{Bayes}}(\bm{y}) = \sqrt{N\rho}\left\|\left\langle\frac{\bm{x}}{|\bm{x}|}\right\rangle_{\bm{y},\Phi}\right\|^{-1}\left\langle\frac{\bm{x}}{|\bm{x}|}\right\rangle_{\bm{y},\Phi}. \qquad (8)$$
###### Proof.

Employing the Bayes formula in (4) yields the expression

$$\mathrm{MSE}(\hat{\bm{x}}(\cdot)) = \sum_{\bm{y}}\int\mathrm{d}\bm{x}\,P(\bm{x},\bm{y}|\Phi)\left|\frac{\hat{\bm{x}}(\bm{y})}{|\hat{\bm{x}}(\bm{y})|}-\frac{\bm{x}}{|\bm{x}|}\right|^2 \qquad (9)$$

$$= \sum_{\bm{y}}\int\mathrm{d}\bm{x}\,P(\bm{x}|\bm{y},\Phi)P(\bm{y}|\Phi)\left(\left|\frac{\hat{\bm{x}}(\bm{y})}{|\hat{\bm{x}}(\bm{y})|}\right|^2+\left|\frac{\bm{x}}{|\bm{x}|}\right|^2-\frac{2\,\hat{\bm{x}}(\bm{y})\cdot\bm{x}}{|\hat{\bm{x}}(\bm{y})|\,|\bm{x}|}\right) \qquad (10)$$

$$= 2\sum_{\bm{y}}P(\bm{y}|\Phi)\left(1-\frac{\hat{\bm{x}}(\bm{y})}{|\hat{\bm{x}}(\bm{y})|}\cdot\left\langle\frac{\bm{x}}{|\bm{x}|}\right\rangle_{\bm{y},\Phi}\right). \qquad (11)$$

Inserting the Cauchy–Schwarz inequality

$$\frac{\hat{\bm{x}}(\bm{y})}{|\hat{\bm{x}}(\bm{y})|}\cdot\left\langle\frac{\bm{x}}{|\bm{x}|}\right\rangle_{\bm{y},\Phi} \leq \left\|\left\langle\frac{\bm{x}}{|\bm{x}|}\right\rangle_{\bm{y},\Phi}\right\| \qquad (12)$$

into the right-hand side of (11) yields the lower bound of (5), where the equality holds when $\hat{\bm{x}}(\bm{y})$ is parallel to $\langle\bm{x}/|\bm{x}|\rangle_{\bm{y},\Phi}$. This, in conjunction with the normalization constraint on $\hat{\bm{x}}(\bm{y})$, leads to (8). ∎

The above theorem guarantees that the Bayesian approach achieves the best possible performance in terms of MSE. Therefore, we hereafter focus on the reconstruction scheme of (8), quantitatively evaluate its performance, and develop a computationally feasible approximate algorithm.
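When posterior samples of $\bm{x}$ are available (e.g., from a hypothetical Monte Carlo sampler), the estimator (8) can be approximated directly by averaging the sample directions; a sketch with names of our choosing:

```python
import numpy as np

def bayes_estimator_from_samples(samples, rho):
    """Monte Carlo version of (8): 'samples' is an (S, N) array of posterior draws."""
    # Average of x/|x| over the posterior samples, approximating <x/|x|>.
    dirs = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    g = dirs.mean(axis=0)
    # Rescale to the imposed norm sqrt(N * rho).
    N = samples.shape[1]
    return np.sqrt(N * rho) * g / np.linalg.norm(g)
```

By construction the output always satisfies the normalization constraint of (8), regardless of the quality of the samples.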

## 3 Performance assessment by the replica method

In statistical mechanics, the macroscopic behavior of a system is generally analyzed by evaluating the partition function or its negative logarithm, the free energy. In our signal reconstruction problem, the marginal likelihood $P(\bm{y}|\Phi)$ of (7) plays the role of the partition function. However, this still depends on the quenched random variables $\bm{y}$ and $\Phi$. Therefore, to evaluate the typical performance, we must further average the free energy as $\bar{f} = -N^{-1}\left[\log P(\bm{y}|\Phi)\right]_{\bm{y},\Phi}$, where $[\cdots]_{\bm{y},\Phi}$ denotes the configurational average concerning $\bm{y}$ and $\Phi$.

Unfortunately, directly averaging the logarithm of random variables is, in general, technically difficult. Thus, we resort to the replica method to practically resolve this difficulty [15]. For this, we first evaluate the $n$-th moment of the marginal likelihood for $n = 1, 2, \ldots$ using the formula

$$P^n(\bm{y}|\Phi) = \int\prod_{a=1}^{n}\left(\mathrm{d}\bm{x}^a\,P(\bm{x}^a)\right)\prod_{a=1}^{n}\prod_{\mu=1}^{M}\Theta\!\left(y_\mu(\Phi\bm{x}^a)_\mu\right), \qquad (13)$$

which holds only for $n \in \mathbb{N}$. Here, $\bm{x}^a$ ($a = 1, \ldots, n$) denotes the $a$-th replicated signal. Averaging (13) with respect to $\bm{y}$ and $\Phi$ results in the saddle-point evaluation concerning the macroscopic variables $Q_{ab} = N^{-1}\bm{x}^a\cdot\bm{x}^b$ and $m_a = N^{-1}\bm{x}^0\cdot\bm{x}^a$ ($a, b = 1, \ldots, n$).

Although (13) holds only for $n \in \mathbb{N}$, the expression obtained by the saddle-point evaluation under a certain assumption concerning the permutation symmetry with respect to the replica indices is obtained as an analytic function of $n$, which is likely to also hold for $n \in \mathbb{R}$. Therefore, we next utilize the analytic function to evaluate the average of the logarithm of the partition function as

$$\bar{f} = -\lim_{n\to 0}\frac{\partial}{\partial n}N^{-1}\log\left[P^n(\bm{y}|\Phi)\right]_{\bm{y},\Phi}. \qquad (14)$$

In particular, under the replica symmetric ansatz, the dominant saddle-point is assumed to be of the form

$$\frac{1}{N}\bm{x}^a\cdot\bm{x}^a = Q, \qquad \frac{1}{N}\bm{x}^a\cdot\bm{x}^b = q \;\;(a\neq b), \qquad \frac{1}{N}\bm{x}^0\cdot\bm{x}^a = m. \qquad (15)$$

The above procedure expresses the average free energy density as

$$\bar{f} = -\mathop{\mathrm{extr}}_{\Omega}\left\{\int\mathrm{d}x^0\,P(x^0)\int Dz\,\phi\!\left(\sqrt{\hat{q}}\,z+\hat{m}x^0;\hat{Q}\right)+\frac{1}{2}Q\hat{Q}+\frac{1}{2}q\hat{q}-m\hat{m}+2\alpha\int Dt\,H\!\left(\frac{m}{\sqrt{\rho q-m^2}}\,t\right)\log H\!\left(\sqrt{\frac{q}{Q-q}}\,t\right)\right\}. \qquad (16)$$

Here, $\alpha \equiv M/N$, $H(x) \equiv \int_x^\infty Dz$, $Dz \equiv \mathrm{d}z\,\mathrm{e}^{-z^2/2}/\sqrt{2\pi}$ is a Gaussian measure, $\mathrm{extr}_{\Omega}\{\cdots\}$ denotes the extremization of a function with respect to $\Omega = \{Q, q, m, \hat{Q}, \hat{q}, \hat{m}\}$, and

$$\phi\!\left(\sqrt{\hat{q}}\,z+\hat{m}x^0;\hat{Q}\right) = \log\left\{\int\mathrm{d}x\,P(x)\exp\left(-\frac{\hat{Q}+\hat{q}}{2}x^2+\left(\sqrt{\hat{q}}\,z+\hat{m}x^0\right)x\right)\right\}. \qquad (17)$$

The derivation of (16) is provided in Appendix A.

In evaluating the right-hand side of (14), $P(\bm{y}|\Phi)$ not only gives the marginal likelihood (the partition function), but is also the conditional density of $\bm{y}$ for taking the configurational average. This accordance between the partition function and the distribution of the quenched random variables is generally known as the Nishimori condition in spin glass theory [16], for which the replica symmetric ansatz (15) is supported by schemes other than the replica method [17, 18]. This indicates that the true signal, $\bm{x}^0$, can be handled on an equal footing with the other replicated signals in the replica computation. As $n \to 0$, this higher replica symmetry among the replicated variables allows us to further simplify the replica symmetric ansatz (15) by imposing four extra constraints: $Q = \rho$ (taking the second moment of $\tilde{P}$ to be unity), $q = m$, $\hat{Q} = 0$, and $\hat{q} = \hat{m}$. As a consequence, the extremization condition of (16) is summarized by the non-linear equations

$$m = \int Dt\,\frac{\left(\int\mathrm{d}x\,x\,\mathrm{e}^{-\frac{\hat{m}}{2}x^2+\sqrt{\hat{m}}\,tx}\,P(x)\right)^2}{\int\mathrm{d}x\,\mathrm{e}^{-\frac{\hat{m}}{2}x^2+\sqrt{\hat{m}}\,tx}\,P(x)} \qquad (18)$$

$$\hat{m} = \frac{\alpha}{\pi\sqrt{2\pi}\,(\rho-m)}\int\mathrm{d}t\,\frac{\exp\left\{-\frac{\rho+m}{2(\rho-m)}\,t^2\right\}}{H\!\left(\sqrt{\frac{m}{\rho-m}}\,t\right)}. \qquad (19)$$
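The coupled equations (18) and (19) can be solved by damped fixed-point iteration. The sketch below assumes $\tilde{P}$ is a standard Gaussian (so the $x$-integrals of (18) are available in closed form for the Bernoulli–Gaussian prior) and uses simple quadrature for the $t$-integrals; the function names, grid, and damping are our choices.

```python
import numpy as np
from math import erfc, sqrt, pi

def H(x):
    # Gaussian tail: H(x) = \int_x^infty Dz = 0.5 * erfc(x / sqrt(2))
    return 0.5 * erfc(x / sqrt(2.0))

def state_evolution(alpha, rho, iters=200, damp=0.5):
    """Iterate the saddle-point equations (18)-(19) for a Bernoulli-Gaussian signal."""
    t = np.linspace(-10, 10, 4001)
    dt = t[1] - t[0]
    Dt = np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
    m = 1e-3
    for _ in range(iters):
        # (19): hat_m as a function of m.
        a = np.sqrt(m / (rho - m))
        Ht = np.array([H(a * ti) for ti in t])
        integrand = np.zeros_like(t)
        ok = Ht > 1e-300          # the excluded region contributes negligibly
        integrand[ok] = np.exp(-(rho + m) / (2 * (rho - m)) * t[ok]**2) / Ht[ok]
        hat_m = alpha / (pi * np.sqrt(2 * np.pi) * (rho - m)) * np.sum(integrand) * dt
        # (18): m as a function of hat_m; the x-integrals are done in closed form.
        b = np.sqrt(hat_m)
        s = 1.0 + hat_m
        slab = rho * np.exp(b**2 * t**2 / (2 * s)) / np.sqrt(s)
        den = (1.0 - rho) + slab
        num = slab * b * t / s
        m_new = np.sum(Dt * num**2 / den) * dt
        m = (1 - damp) * m + damp * min(m_new, rho * (1 - 1e-9))
    return m
```

The asymptotic MSE then follows from the overlap via $\mathrm{MSE} = 2(1-\sqrt{m/\rho})$, which is equation (20) below in the source.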

In physical terms, the value of $m$ determined by these equations is the typical overlap $N^{-1}\bm{x}^0\cdot\langle\bm{x}\rangle_{\bm{y},\Phi}$ between the original signal $\bm{x}^0$ and the posterior mean $\langle\bm{x}\rangle_{\bm{y},\Phi}$. The law of large numbers and the self-averaging property guarantee that $N^{-1}|\bm{x}^0|^2$ and $N^{-1}|\langle\bm{x}\rangle_{\bm{y},\Phi}|^2$ converge to $\rho$ and $m$, respectively, with a probability of unity for typical samples. This indicates that the typical value of the direction cosine between $\bm{x}^0$ and $\langle\bm{x}\rangle_{\bm{y},\Phi}$ can be evaluated as $\sqrt{m/\rho}$. Therefore, the MSE in (4) can be expressed using $m$ and $\rho$ as

$$\mathrm{MSE} = 2\left(1-\sqrt{\frac{m}{\rho}}\right). \qquad (20)$$

The symmetry between $\bm{x}^0$ and the other replicated variables provides $\bar{f}$ with further information-theoretic meaning. Inserting the extremum values into the definition of $\bar{f}$ gives $\bar{f} = -N^{-1}[\log P(\bm{y}|\Phi)]_{\bm{y},\Phi}$, which indicates that $\bar{f}$ accords with the entropy density of $\bm{y}$ for typical measurement matrices $\Phi$. The expression $\bm{y} = \mathrm{sign}(\Phi\bm{x}^0)$ guarantees that the conditional entropy of $\bm{y}$ given $\bm{x}^0$ and $\Phi$ always vanishes. These indicate that $\bar{f}$ also implies the mutual information density between $\bm{x}^0$ and $\bm{y}$. This physically quantifies the optimal information gain (per entry) of $\bm{x}^0$ that can be extracted from the 1-bit measurement for typical $\Phi$.

## 4 Bayesian optimal signal reconstruction by GAMP

Equation (20) represents the potential performance of the Bayesian optimal signal reconstruction of 1-bit CS. However, in practice, exploiting this performance is a non-trivial task, because performing the exact Bayesian reconstruction (8) is computationally difficult. To resolve this difficulty, we now develop an approximate reconstruction algorithm following the framework of belief propagation (BP). Actually, BP has been successfully employed for standard CS problems with linear measurements, showing excellent performance in terms of both reconstruction accuracy and computational efficiency [19]. To incorporate the non-linearity of the 1-bit measurement, we employ a variant of BP known as generalized approximate message passing (GAMP) [13], which can also be regarded as an approximate Bayesian inference algorithm for perceptron-type networks [14].

In general, the canonical BP equations for the probability measure (3) are expressed in terms of messages, $m_{i\to\mu}(x_i)$ and $m_{\mu\to i}(x_i)$, which represent probability distribution functions that carry posterior information and output measurement information, respectively. They can be written as

$$m_{\mu\to i}(x_i) = \frac{1}{Z_{\mu\to i}}\int\prod_{j\neq i}\mathrm{d}x_j\,P(y_\mu|u_\mu)\prod_{j\neq i}m_{j\to\mu}(x_j) \qquad (21)$$

$$m_{i\to\mu}(x_i) = \frac{1}{Z_{i\to\mu}}P(x_i)\prod_{\gamma\neq\mu}m_{\gamma\to i}(x_i). \qquad (22)$$

Here, $Z_{\mu\to i}$ and $Z_{i\to\mu}$ are normalization factors ensuring that $\int\mathrm{d}x_i\,m_{\mu\to i}(x_i) = \int\mathrm{d}x_i\,m_{i\to\mu}(x_i) = 1$, and we also define $u_\mu \equiv \sum_{i=1}^N\Phi_{\mu i}x_i$. Using (21), the approximations of the marginal distributions, which are often termed beliefs, are evaluated as

$$m_i(x_i) = \frac{1}{Z_i}P(x_i)\prod_{\mu=1}^{M}m_{\mu\to i}(x_i), \qquad (23)$$

where $Z_i$ is a normalization factor for $m_i(x_i)$. To simplify the notation, we hereafter convert all measurement results to $y_\mu = 1$ by multiplying each row of the measurement matrix by $y_\mu$, giving $y_\mu\Phi_{\mu i}$, and denote the resultant matrix again as $\Phi$. In the new notation, $P(y_\mu|u_\mu) = \Theta(u_\mu)$.

Next, we introduce the means and variances of $x_i$ under the posterior information message distributions as

$$a_{i\to\mu} \equiv \int\mathrm{d}x_i\,x_i\,m_{i\to\mu}(x_i) \qquad (24)$$

$$\nu_{i\to\mu} \equiv \int\mathrm{d}x_i\,x_i^2\,m_{i\to\mu}(x_i)-a_{i\to\mu}^2. \qquad (25)$$

We also define $\omega_\mu \equiv \sum_i\Phi_{\mu i}a_{i\to\mu}$ and $V_\mu \equiv \sum_i\Phi_{\mu i}^2\nu_{i\to\mu}$ for notational convenience. Similarly, the means and variances of the beliefs, $a_i$ and $\nu_i$, are introduced as $a_i \equiv \int\mathrm{d}x_i\,x_i\,m_i(x_i)$ and $\nu_i \equiv \int\mathrm{d}x_i\,x_i^2\,m_i(x_i)-a_i^2$. Note that $\bm{a} = (a_i)$ represents the approximation of the posterior mean $\langle\bm{x}\rangle_{\bm{y},\Phi}$. This, in conjunction with a consequence of the law of large numbers, indicates that the Bayesian optimal reconstruction is approximately performed as $\hat{\bm{x}}^{\mathrm{Bayes}}(\bm{y}) \simeq \sqrt{N\rho}\,\bm{a}/|\bm{a}|$.

To enhance the computational tractability, let us rewrite the functional equations of (21) and (22) into algebraic equations using the sets of $\{a_{i\to\mu}\}$ and $\{\nu_{i\to\mu}\}$. To do this, we insert the identity

$$1 = \int\mathrm{d}u_\mu\,\delta\!\left(u_\mu-\sum_{i=1}^{N}\Phi_{\mu i}x_i\right) = \int\mathrm{d}u_\mu\,\frac{1}{2\pi}\int\mathrm{d}\hat{u}_\mu\exp\left\{-\mathrm{i}\hat{u}_\mu\left(u_\mu-\sum_{i=1}^{N}\Phi_{\mu i}x_i\right)\right\} \qquad (26)$$

into (21), which yields

$$m_{\mu\to i}(x_i) = \frac{1}{2\pi Z_{\mu\to i}}\int\mathrm{d}u_\mu\,P(y_\mu|u_\mu)\int\mathrm{d}\hat{u}_\mu\exp\left\{-\mathrm{i}\hat{u}_\mu(u_\mu-\Phi_{\mu i}x_i)\right\}\times\prod_{j\neq i}\left\{\int\mathrm{d}x_j\,m_{j\to\mu}(x_j)\exp\left\{\mathrm{i}\hat{u}_\mu\Phi_{\mu j}x_j\right\}\right\}. \qquad (27)$$

The smallness of $\Phi_{\mu j} = O(N^{-1/2})$ allows us to truncate the Taylor series of the last exponential in equation (27) up to the second order of $\hat{u}_\mu\Phi_{\mu j}$. Integrating over $x_j$ for $j \neq i$, we obtain the expression

$$m_{\mu\to i}(x_i) = \frac{1}{2\pi Z_{\mu\to i}}\int\mathrm{d}u_\mu\,P(y_\mu|u_\mu)\int\mathrm{d}\hat{u}_\mu\exp\left\{-\mathrm{i}\hat{u}_\mu(u_\mu-\Phi_{\mu i}x_i)\right\}\times\exp\left\{\mathrm{i}\hat{u}_\mu\left(\omega_\mu-\Phi_{\mu i}a_{i\to\mu}\right)-\frac{\hat{u}_\mu^2}{2}\left(V_\mu-\Phi_{\mu i}^2\nu_{i\to\mu}\right)\right\}, \qquad (28)$$

and carrying out the resulting Gaussian integral over $\hat{u}_\mu$, we obtain

$$m_{\mu\to i}(x_i) = \frac{1}{Z_{\mu\to i}\sqrt{2\pi\left(V_\mu-\Phi_{\mu i}^2\nu_{i\to\mu}\right)}}\int\mathrm{d}u_\mu\,P(y_\mu|u_\mu)\exp\left\{-\frac{\left(u_\mu-\omega_\mu-\Phi_{\mu i}(x_i-a_{i\to\mu})\right)^2}{2\left(V_\mu-\Phi_{\mu i}^2\nu_{i\to\mu}\right)}\right\}. \qquad (29)$$

Since $\Phi_{\mu i}^2\nu_{i\to\mu} = O(N^{-1})$ vanishes as $N \to \infty$ while $V_\mu = O(1)$, we can omit $\Phi_{\mu i}^2\nu_{i\to\mu}$ in (29). In addition, we replace $\nu_{i\to\mu}$ in $V_\mu$ with its expectation $\nu_i$, utilizing the law of large numbers. This removes the dependence on the index $\mu$, making all $V_\mu$ equal to their average

$$V \equiv \frac{1}{N}\sum_{i=1}^{N}\nu_i. \qquad (30)$$

The smallness of $\Phi_{\mu i}(x_i-a_{i\to\mu})$ again allows us to truncate the Taylor series of the exponential in (29) up to the second order. Thus, we have a parameterized expression of $m_{\mu\to i}(x_i)$:

$$m_{\mu\to i}(x_i) \propto \exp\left\{-\frac{A_{\mu\to i}}{2}x_i^2+B_{\mu\to i}x_i\right\}, \qquad (31)$$

where the parameters $A_{\mu\to i}$ and $B_{\mu\to i}$ are evaluated as

$$A_{\mu\to i} = (g'_{\mathrm{out}})_\mu\Phi_{\mu i}^2 \qquad (32)$$

$$B_{\mu\to i} = (g_{\mathrm{out}})_\mu\Phi_{\mu i}+(g'_{\mathrm{out}})_\mu\Phi_{\mu i}^2a_{i\to\mu} \qquad (33)$$

using

$$(g_{\mathrm{out}})_\mu \equiv \frac{\partial}{\partial\omega_\mu}\log\left(\int\mathrm{d}u_\mu\,P(y_\mu|u_\mu)\exp\left(-\frac{(u_\mu-\omega_\mu)^2}{2V}\right)\right) \qquad (34)$$

$$(g'_{\mathrm{out}})_\mu \equiv -\frac{\partial^2}{\partial\omega_\mu^2}\log\left(\int\mathrm{d}u_\mu\,P(y_\mu|u_\mu)\exp\left(-\frac{(u_\mu-\omega_\mu)^2}{2V}\right)\right). \qquad (35)$$

The derivation of these is given in Appendix B. Equations (32) and (33) act as the algebraic expression of (21). For the sign output channel, inserting $P(y_\mu|u_\mu) = \Theta(u_\mu)$ into (34) and (35) gives $(g_{\mathrm{out}})_\mu$ and $(g'_{\mathrm{out}})_\mu$ for 1-bit CS as

$$(g_{\mathrm{out}})_\mu = \frac{\exp\left(-\frac{\omega_\mu^2}{2V}\right)}{\sqrt{2\pi V}\,H\!\left(-\frac{\omega_\mu}{\sqrt{V}}\right)} \qquad (36)$$

$$(g'_{\mathrm{out}})_\mu = (g_{\mathrm{out}})_\mu^2+\frac{\omega_\mu}{V}(g_{\mathrm{out}})_\mu. \qquad (37)$$
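Equations (36) and (37) can be transcribed directly, with $H(x) = \tfrac{1}{2}\mathrm{erfc}(x/\sqrt{2})$. The sketch below (function name ours) uses the scaled complementary error function to avoid under/overflow when $|\omega_\mu|/\sqrt{V}$ is large:

```python
import numpy as np
from scipy.special import erfcx  # scaled complementary error function

def gout_sign(omega, V):
    """(g_out) and (g'_out) of (36)-(37) for the noiseless sign channel.

    Uses erfcx for stability: H(z) = 0.5*erfc(z/sqrt(2))
    = 0.5*erfcx(z/sqrt(2))*exp(-z^2/2), so exp(-z^2/2)/H(z) = 2/erfcx(z/sqrt(2)).
    """
    z = -omega / np.sqrt(V)  # argument of H in (36); note omega^2/(2V) = z^2/2
    g = (2.0 / erfcx(z / np.sqrt(2.0))) / np.sqrt(2.0 * np.pi * V)
    gprime = g**2 + (omega / V) * g
    return g, gprime
```

Since the integral in (34) is $H(-\omega_\mu/\sqrt{V})$ up to a constant and $H$ is log-concave, $(g'_{\mathrm{out}})_\mu$ stays positive, as a variance-like quantity should.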

To obtain a similar expression for (22), we substitute the expression of (31) into (22), which leads to

$$m_{i\to\mu}(x_i) = \frac{1}{\tilde{Z}_{i\to\mu}}\left[(1-\rho)\,\delta(x_i)+\rho\tilde{P}(x_i)\right]\mathrm{e}^{-\left(x_i^2/2\right)\sum_{\gamma\neq\mu}A_{\gamma\to i}+x_i\sum_{\gamma\neq\mu}B_{\gamma\to i}}. \qquad (38)$$

This indicates that the product $\prod_{\gamma\neq\mu}m_{\gamma\to i}(x_i)$ in (22) can be expressed, up to a normalization, as a Gaussian distribution with mean $\sum_{\gamma\neq\mu}B_{\gamma\to i}/\sum_{\gamma\neq\mu}A_{\gamma\to i}$ and variance $\left(\sum_{\gamma\neq\mu}A_{\gamma\to i}\right)^{-1}$. Inserting these into (24) and (25) provides the algebraic expression of (22) as

$$a_{i\to\mu} = f_a\!\left(\frac{1}{\sum_{\gamma\neq\mu}A_{\gamma\to i}},\;\frac{\sum_{\gamma\neq\mu}B_{\gamma\to i}}{\sum_{\gamma\neq\mu}A_{\gamma\to i}}\right), \qquad (39)$$

$$\nu_{i\to\mu} = f_c\!\left(\frac{1}{\sum_{\gamma\neq\mu}A_{\gamma\to i}},\;\frac{\sum_{\gamma\neq\mu}B_{\gamma\to i}}{\sum_{\gamma\neq\mu}A_{\gamma\to i}}\right), \qquad (40)$$

where $f_a(\Sigma^2,R)$ and $f_c(\Sigma^2,R)$ stand for the mean and variance, respectively, of the auxiliary distribution

$$\mathcal{M}(x|\Sigma^2,R) = \frac{1}{Z(\Sigma^2,R)}\left[(1-\rho)\,\delta(x)+\rho\tilde{P}(x)\right]\frac{1}{\sqrt{2\pi\Sigma^2}}\,\mathrm{e}^{-\frac{(x-R)^2}{2\Sigma^2}}, \qquad (41)$$

where $Z(\Sigma^2,R)$ is a normalization constant. For instance, when $\tilde{P}(x)$ is a Gaussian distribution of mean $\bar{x}$ and variance $\sigma^2$, we have

$$f_a(\Sigma^2,R) = \frac{\bar{x}\Sigma^2+R\sigma^2}{\dfrac{(1-\rho)(\sigma^2+\Sigma^2)^{3/2}}{\rho\Sigma}\exp\left\{-\dfrac{R^2}{2\Sigma^2}+\dfrac{(R-\bar{x})^2}{2(\sigma^2+\Sigma^2)}\right\}+(\sigma^2+\Sigma^2)}, \qquad (42)$$

$$f_c(\Sigma^2,R) = \left[\frac{\rho(1-\rho)\Sigma}{(\sigma^2+\Sigma^2)^{5/2}}\left[\sigma^2\Sigma^2(\sigma^2+\Sigma^2)+(\bar{x}\Sigma^2+R\sigma^2)^2\right]\exp\left\{-\frac{R^2}{2\Sigma^2}-\frac{(R-\bar{x})^2}{2(\sigma^2+\Sigma^2)}\right\}+\frac{\rho^2\sigma^2\Sigma^4}{(\sigma^2+\Sigma^2)^2}\exp\left\{-\frac{(R-\bar{x})^2}{\sigma^2+\Sigma^2}\right\}\right]\times\left[(1-\rho)\exp\left\{-\frac{R^2}{2\Sigma^2}\right\}+\frac{\rho\Sigma}{\sqrt{\sigma^2+\Sigma^2}}\exp\left\{-\frac{(R-\bar{x})^2}{2(\sigma^2+\Sigma^2)}\right\}\right]^{-2}. \qquad (43)$$
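An equivalent and numerically convenient way to evaluate (42) and (43) is through the two mixture components of (41): the spike at $x=0$ and the Gaussian slab. The sketch below (function name ours) organizes the computation this way:

```python
import numpy as np

def fa_fc(Sigma2, R, rho, xbar, sigma2):
    """Mean f_a and variance f_c of the distribution (41) for a
    Bernoulli-Gaussian prior, written via the two mixture components."""
    s = sigma2 + Sigma2
    # Evidences of the spike (x = 0) and of the Gaussian slab, up to a
    # common factor 1/sqrt(2*pi*Sigma2) that cancels in the ratio.
    w_spike = (1 - rho) * np.exp(-R**2 / (2 * Sigma2))
    w_slab = rho * np.sqrt(Sigma2 / s) * np.exp(-(R - xbar)**2 / (2 * s))
    p = w_slab / (w_spike + w_slab)            # posterior weight of the slab
    m_slab = (xbar * Sigma2 + R * sigma2) / s  # slab posterior mean
    v_slab = sigma2 * Sigma2 / s               # slab posterior variance
    fa = p * m_slab
    fc = p * (v_slab + m_slab**2) - fa**2
    return fa, fc
```

Multiplying numerator and denominator out recovers the closed forms (42)-(43); the mixture form avoids some cancellation issues when $\rho$ is small.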

For the signal reconstruction, we need to evaluate the moments of the beliefs $m_i(x_i)$. This can be performed by simply adding back the $\mu$-dependent part to (39) and (40) as

$$a_i = f_a(\Sigma_i^2, R_i), \qquad (44)$$

$$\nu_i = f_c(\Sigma_i^2, R_i), \qquad (45)$$

where $\Sigma_i^2 = \left(\sum_\mu A_{\mu\to i}\right)^{-1}$ and $R_i = \Sigma_i^2\sum_\mu B_{\mu\to i}$. For large $N$, $\Sigma_i^2$ typically converges to a constant, independent of the index, as $\Phi_{\mu i}^2 \simeq N^{-1}$. This, in conjunction with (32) and (33), yields

$$\Sigma^2 = \left(\frac{1}{N}\sum_\mu(g'_{\mathrm{out}})_\mu\right)^{-1}, \qquad (46)$$

$$R_i = \left(\sum_\mu(g_{\mathrm{out}})_\mu\Phi_{\mu i}\right)\Sigma^2+a_i. \qquad (47)$$

BP updates the messages using (32), (33), (39), and (40) ($i = 1,\ldots,N$; $\mu = 1,\ldots,M$) in each iteration. This requires manipulating $O(MN)$ messages per iteration, which may limit the practical utility of BP to systems of relatively small size. To enhance the practical utility, let us rewrite the BP equations into those of $O(M+N)$ quantities for large $N$, which results in a significant reduction of memory and computational cost per iteration. To do this, we express $a_{i\to\mu}$ by applying a Taylor expansion to (39) around $(\Sigma^2, R_i)$ as

$$a_{i\to\mu} = f_a\!\left(\frac{1}{\sum_\gamma A_{\gamma\to i}-A_{\mu\to i}},\;\frac{\sum_\gamma B_{\gamma\to i}-B_{\mu\to i}}{\sum_\gamma A_{\gamma\to i}-A_{\mu\to i}}\right) \simeq a_i+\frac{\partial f_a(\Sigma^2,R_i)}{\partial R_i}\left(-B_{\mu\to i}\Sigma^2\right)+O(N^{-1}), \qquad (48)$$

where $\left(\sum_\gamma A_{\gamma\to i}-A_{\mu\to i}\right)^{-1}$ is approximated as $\Sigma^2$, because of the smallness of $A_{\mu\to i}$. Multiplying this by $\Phi_{\mu i}$ and summing the resultant expressions over $i$ yields

$$\omega_\mu = \sum_i\Phi_{\mu i}a_i-(g_{\mathrm{out}})_\mu V, \qquad (49)$$

where we have used the identity $\partial f_a(\Sigma^2,R)/\partial R = f_c(\Sigma^2,R)/\Sigma^2$, which can be confirmed from (42) and (43).

Let us assume that $\{a_i\}$, $\{\nu_i\}$, and $\{(g_{\mathrm{out}})_\mu\}$ are initially set to certain values. Inserting these into (30) and (49) gives $V$ and $\{\omega_\mu\}$. Substituting these into equations (36) and (37) yields a set of $\{(g_{\mathrm{out}})_\mu, (g'_{\mathrm{out}})_\mu\}$, which, in conjunction with $\{a_i\}$, offers $\Sigma^2$ and $\{R_i\}$ through (46) and (47). Inserting these into (44) and (45) offers a new set of $\{a_i, \nu_i\}$. In this way, the iteration of (30), (49) $\to$ (36), (37) $\to$ (46), (47) $\to$ (44), (45) $\to$ (30), (49) $\to \cdots$ constitutes a closed set of equations to update the sets of $\{a_i\}$ and $\{\nu_i\}$. This is the generic GAMP algorithm given a likelihood function and a prior distribution [13].

We term the entire procedure the Approximate Message Passing for 1-bit Compressed Sensing (1bitAMP) algorithm. The pseudocode of this algorithm is summarized in Figure 1. Three issues are noteworthy. First, for relatively large systems, the iterative procedure converges easily in most cases. Nevertheless, since it relies on the law of large numbers, some divergent behavior appears as $N$ becomes smaller. Even for such cases, however, employing an appropriate damping factor in conjunction with a normalization of $\bm{a}$ at each update considerably improves the convergence property. Second, the most time-consuming parts of this iteration are the matrix-vector multiplications $\sum_\mu(g_{\mathrm{out}})_\mu\Phi_{\mu i}$ in (47) and $\sum_i\Phi_{\mu i}a_i$ in (49). This indicates that the computational complexity is $O(MN)$ per iteration. Finally, the terms $a_i$ in equation (47) and $-(g_{\mathrm{out}})_\mu V$ in equation (49) correspond to what is known as the Onsager reaction term in the spin glass literature [20, 21]. These terms stabilize the convergence of 1bitAMP, effectively canceling the self-feedback effects.
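Putting the pieces together, the iteration (30), (49) $\to$ (36), (37) $\to$ (46), (47) $\to$ (44), (45) can be sketched compactly for the Bernoulli–Gaussian prior. This is an illustrative transcription, not the paper's reference implementation: the helper names, initialization, and simple constant damping are our choices.

```python
import numpy as np
from scipy.special import erfcx  # scaled complementary error function

def _fa_fc(Sigma2, R, rho, xbar, sigma2):
    """Mean/variance of the measure (41) for a Bernoulli-Gaussian prior, cf. (42)-(43)."""
    s = sigma2 + Sigma2
    w_spike = (1 - rho) * np.exp(-R**2 / (2 * Sigma2))
    w_slab = rho * np.sqrt(Sigma2 / s) * np.exp(-(R - xbar)**2 / (2 * s))
    p = w_slab / (w_spike + w_slab)
    m_slab = (xbar * Sigma2 + R * sigma2) / s
    v_slab = sigma2 * Sigma2 / s
    fa = p * m_slab
    return fa, p * (v_slab + m_slab**2) - fa**2

def one_bit_amp(Phi, y, rho, xbar=0.0, sigma2=1.0, iters=50, damp=0.8):
    """Sketch of the 1bitAMP loop: (30),(49) -> (36),(37) -> (46),(47) -> (44),(45)."""
    M, N = Phi.shape
    Phi = y[:, None] * Phi                   # absorb the signs: all y become +1
    a = np.zeros(N)
    nu = rho * (sigma2 + xbar**2) * np.ones(N)   # prior variance as initial nu
    gout = np.zeros(M)
    for _ in range(iters):
        V = np.mean(nu)                                       # (30)
        omega = Phi @ a - gout * V                            # (49), Onsager term
        z = -omega / np.sqrt(V)
        gout = (2.0 / erfcx(z / np.sqrt(2.0))) / np.sqrt(2.0 * np.pi * V)  # (36)
        gout_p = gout**2 + (omega / V) * gout                 # (37)
        Sigma2 = N / np.sum(gout_p)                           # (46)
        R = Sigma2 * (Phi.T @ gout) + a                       # (47)
        a_new, nu = _fa_fc(Sigma2, R, rho, xbar, sigma2)      # (44)-(45)
        a = damp * a_new + (1 - damp) * a
    return np.sqrt(N * rho) * a / np.linalg.norm(a)           # normalization of (8)
```

Since the amplitude of $\bm{x}^0$ is unobservable, only the direction of the returned estimate is meaningful; its overlap with $\bm{x}^0/|\bm{x}^0|$ is the quantity the replica analysis predicts.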