A Generalized LDPC Framework for Robust and Sublinear Compressive Sensing
Abstract
Compressive sensing aims to recover a high-dimensional sparse signal from a relatively small number of measurements. In this paper, a novel design of the measurement matrix is proposed. The design is inspired by the construction of generalized low-density parity-check codes, where the capacity-achieving point-to-point codes serve as subcodes to robustly estimate the signal support. In the case that each entry of the dimensional sparse signal lies in a known discrete alphabet, the proposed scheme requires only measurements and arithmetic operations. In the case of arbitrary, possibly continuous alphabet, an error propagation graph is proposed to characterize the residual estimation error. With measurements and computational complexity, the reconstruction error can be made arbitrarily small with high probability.
I Introduction
Compressive sensing aims to recover a high-dimensional sparse signal from a relatively small number of measurements [1, 2]. There are two different designs of the measurement matrices: random construction and deterministic construction. Convex optimization approaches were first proposed to recover the noiseless signal with random measurements [3]. Greedy algorithms that involve lower complexity have since been proposed [4, 5, 6]. However, most of the algorithms that are based on random measurement matrix design inevitably involve a complexity of .
Inspired by error-control code designs, deterministic structured measurement matrices have been proposed to reduce the computational complexity to (near) linear time [7, 8]. In practice, when the signal dimension is many thousands or millions, even linear time complexity often becomes prohibitive. In response, sublinear compressive sensing based on second-order Reed-Muller codes has been proposed, but the reconstruction error was not characterized [9, 10].
Recently, compressive sensing schemes with a novel design of measurement matrix and sublinear recovery algorithms have been developed, requiring measurements and arithmetic operations under the noiseless setting [11, 12]. In those schemes, the measurements are split into multiple groups, and each group is a subvector whose entries are linear combinations of the same set of signal components. Treating the measurement groups as bins, the design matrix essentially hashes the signal components to different measurement bins, which is similar to the bipartite graph induced by the low-density parity-check (LDPC) code structure. In [11] and [12], the measurement vector in each bin is designed to carry the signal support information by leveraging the discrete Fourier transform (DFT) matrix. The design has been extended to the noisy case, involving measurements and computational complexity, with the limitation that the signal entries must lie in a known discrete alphabet.
In this paper, we propose a generalized-LDPC-code-inspired compressive sensing scheme to further reduce the number of measurements required and the computational complexity. The scheme adopts the sublinear recovery algorithm framework in [11]. For the measurement matrix design, the scheme also adopts the LDPC structure to disperse the signal into measurement bins. The main difference is that each measurement bin is a subcode, where some recently developed capacity-achieving codes are utilized to encode the signal support.
Specifically, this paper makes the following contributions. First, our scheme is the first to achieve nearly order-optimal noisy measurements and computational complexity for the case of known discrete alphabet. Second, the previous design based on the DFT matrix is susceptible to quantization errors, while the proposed measurement matrix consists of only entries, which are easier to implement and more robust in practice. Third, we propose an error propagation graph with error message passing rules to capture the error propagation for the case of arbitrary signals with unknown alphabet. Analysis shows that with measurements and complexity the signal estimation error can be made arbitrarily small as increases. The proposed design and error propagation graph have potential applications in sparse Fourier transform [13] and Walsh-Hadamard transform with arbitrary signal alphabet [14].
II System Model
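Suppose $\mathbf{x} \in \mathbb{R}^N$ is a $K$-sparse vector. The problem is to recover $\mathbf{x}$ from the $M$-dimensional ($M < N$) measurement vector
(1) $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}$
where $\mathbf{A}$ is the $M \times N$ measurement matrix and $\mathbf{w}$ is the noise vector whose entries are independent and identically distributed (i.i.d.) Gaussian variables with zero mean and variance $\sigma^2$.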
Throughout the paper, we use bold capital letter and bold normal letter to denote a matrix and a vector, respectively. Given a matrix , denotes the entry located at the th row and th column, and denotes the th column. Given , is the bit binary representation of with and mapped to and , respectively. For example, , . Let if and otherwise.
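The sign-mapped binary representation described above can be sketched in a few lines of Python. This is only an illustration: the function names, the most-significant-bit-first ordering, and the mapping of bit 0 to +1 and bit 1 to -1 are assumptions, since the convention is not recoverable from the text here.

```python
def signed_bits(index, num_bits):
    """Binary expansion of `index` (MSB first) as a list of +1/-1 entries.

    Assumed convention: bit 0 -> +1, bit 1 -> -1.
    """
    if not 0 <= index < 2 ** num_bits:
        raise ValueError("index does not fit in num_bits bits")
    bits = [(index >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]
    return [1 - 2 * b for b in bits]


def index_from_signs(signs):
    """Inverse map: read the index back from the signs of the entries."""
    index = 0
    for s in signs:
        index = (index << 1) | (0 if s > 0 else 1)
    return index
```

Under this convention, `signed_bits(5, 3)` gives `[-1, 1, -1]`, and `index_from_signs` recovers 5 from it, which is the property the bin design relies on.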
III Measurement Matrix Design
The LDPC-inspired design of the measurement matrix was proposed in [11, 12]. In particular, the measurement matrix is constructed as
(2) 
where , and the operator is defined as
(3) 
The number of measurements is thus . For example,
(4) 
In fact, is inspired by the parity-check matrix of LDPC codes. The relationship between the signal entries and the measurements can be represented by a bipartite graph. In the bipartite graph, there are left nodes with corresponding to the th left node, and right nodes, which are also referred to as bins. The th left node is connected with the th bin if . The measurement vector is thus grouped into subvectors as , where is the th bin value given by
(5) 
Fig. 1 illustrates the bipartite representation between signals and measurements. In this paper, we construct from the ensemble of left-regular bipartite graphs , where every signal is connected to measurement bins uniformly at random.
The recovery algorithm adopts the framework proposed in [15]. The recovery algorithm calls for a robust bin detection, which can 1) identify if a measurement bin is connected to no nonzero signal component (zeroton), to a single nonzero component (singleton) or to multiple nonzero components (multiton); 2) robustly estimate the signal index and value from singleton bins. It can be proved that by some proper , the recovery algorithm can correctly estimate with high probability if we have a robust bin detection. The key challenge is how to design to achieve robust bin detection.
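The left-regular hashing of signal components into bins can be sketched as follows. This is a minimal illustration, not the construction used in the paper: the function names are hypothetical, and drawing the bins of each signal without replacement is an assumption about what "uniformly at random" means here.

```python
import random


def left_regular_graph(num_signals, num_bins, degree, seed=0):
    """bins_of[i] lists the `degree` distinct bins that signal i is hashed to.

    Assumption: each signal picks its bins uniformly without replacement.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_bins), degree))
            for _ in range(num_signals)]


def signals_in_bins(bins_of, num_bins):
    """Invert the adjacency: members[b] lists the signals hashed to bin b."""
    members = [[] for _ in range(num_bins)]
    for i, bins in enumerate(bins_of):
        for b in bins:
            members[b].append(i)
    return members


def classify_bins(members, support):
    """Label each bin by how many nonzero signal components it contains."""
    labels = []
    for m in members:
        hits = sum(1 for i in m if i in support)
        labels.append("zeroton" if hits == 0
                      else "singleton" if hits == 1 else "multiton")
    return labels
```

`classify_bins` uses the true support, so it is an oracle for checking a decoder, not something the recovery algorithm has access to.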
In previous works [12, 11, 15], is constructed based on the DFT matrix. The signal index information is embedded in the phase difference between the entries of . In this paper, we propose a new design of , which only consists of entries and is more robust to noise and quantization errors.
We motivate the design using a simplified setting. Assume 1) a measurement bin is known to be a singleton, 2) there is no noise, and 3) the sign of the signal that is hashed to bin is known. The question is how we can design to detect the signal index and its value. Let be the bit binary representation of . If , then the signal index can be easily recovered based on the signs of each entry in . A robust design of must overcome the challenges posed by the three assumptions. First, we let be an all-one vector such that the signs of can be estimated. Second, is designed to be coded bits of for robust estimation of under the noisy setting. The subvector length is , where is the code rate of the applied low-complexity error-control code [16]. Third, we let be a binary vector with each entry generated according to i.i.d. Rademacher distribution for singleton verification.
In all, the th column of consists of three subvectors:
(6) 
where , , and . Accordingly, the measurement vector for bin can be split into three subvectors:
(7) 
In our design, we choose , and for signals with known discrete alphabet and arbitrary alphabet, respectively.
In the bipartite graph, each measurement bin can be regarded as a super check node where a subcode is further used to encode the index information of the signals. The structure is similar to that of generalized LDPC codes [17]. The well-established low-complexity capacity-approaching point-to-point codes can serve as subcodes to enhance the robust design.
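The three-part column described in this section can be sketched as below. The function name, the parameters `sign_len`, `repeat` and `verify_len`, and the bit-to-sign mapping are hypothetical. In particular, a simple repetition code stands in for the low-complexity capacity-approaching subcode of [16], purely to keep the sketch short.

```python
import random


def bin_signature(index, num_bits, repeat, verify_len, rng, sign_len=1):
    """Return the three subvectors of one signature column, all with +1/-1 entries.

    sign_part   : all-one vector used to estimate the sign of the signal
    index_part  : coded bits of `index` (repetition code as a stand-in subcode)
    verify_part : i.i.d. Rademacher entries used for singleton verification
    """
    sign_part = [1] * sign_len
    bits = [(index >> s) & 1 for s in range(num_bits - 1, -1, -1)]
    # Assumed mapping: bit 0 -> +1, bit 1 -> -1; each bit repeated `repeat` times.
    index_part = [1 - 2 * b for b in bits for _ in range(repeat)]
    verify_part = [rng.choice((-1, 1)) for _ in range(verify_len)]
    return sign_part, index_part, verify_part
```

A column would be generated once per signal index and then kept fixed, since the decoder needs to regenerate the same verification entries when it checks a candidate singleton.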
IV Recovery Algorithm Design
We adopt the recovery algorithm framework proposed in [11]. The algorithm is implemented in an iterative “peeling” process. In every iteration, a singleton bin is identified. The index and value of the signal that is hashed to the singleton bin are estimated. The contribution of the estimated signal to the other connected bins is cancelled out (peeled off).
The main difference of our work lies in the signal support estimation from a singleton bin, referred to as the singleton test, which is described in Algorithm 1. In particular, for some that is hashed to a singleton bin , encodes the support information . Suppose the signal sign estimation is correct, i.e., . Without noise, and thus is exactly . Under the noisy setting, some of the signs are flipped due to noise, which can be regarded as transmission over the binary symmetric channel (BSC). With low-complexity codes used as subcodes, can be recovered by inputting to the corresponding decoder with complexity [16].
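A singleton test in the spirit of the description above might look like the following sketch. It is not Algorithm 1 itself, whose details are not reproduced here. The signature layout, the repetition code with majority-vote decoding (a stand-in for the subcode of [16]), the least-squares value estimate, and the single energy `threshold` are all assumptions made for illustration.

```python
import random


def signature(index, num_bits, repeat, verify_len, seed=0):
    """Assumed column layout: [1] + repetition-coded index signs + Rademacher bits."""
    rng = random.Random(seed * 1_000_003 + index)  # fixed verification bits per index
    bits = [(index >> s) & 1 for s in range(num_bits - 1, -1, -1)]
    coded = [1 - 2 * b for b in bits for _ in range(repeat)]
    verify = [rng.choice((-1, 1)) for _ in range(verify_len)]
    return [1] + coded + verify


def singleton_test(y, num_bits, repeat, verify_len, threshold, seed=0):
    """Return (index, value) if the bin vector y looks like a singleton, else None."""
    # Step 1: estimate the sign of the signal from the all-one entry.
    sign = 1 if y[0] >= 0 else -1
    # Step 2: compensate the sign and decode the index bits by majority vote.
    index = 0
    for b in range(num_bits):
        block = y[1 + b * repeat: 1 + (b + 1) * repeat]
        votes = sum(1 if sign * v >= 0 else -1 for v in block)
        index = (index << 1) | (0 if votes >= 0 else 1)
    # Step 3: estimate the value by correlating with the candidate signature.
    s = signature(index, num_bits, repeat, verify_len, seed)
    value = sum(si * yi for si, yi in zip(s, y)) / len(s)
    # Step 4: verify; reject low-energy bins (zeroton) and poor fits (multiton).
    residual = sum((yi - value * si) ** 2 for si, yi in zip(s, y)) / len(s)
    if value ** 2 <= threshold or residual > threshold:
        return None
    return index, value
```

For instance, scaling `signature(11, 4, 3, 8)` by 2.5 and passing it with `threshold=0.1` returns index 11 with value 2.5, while an all-zero bin returns `None`. How the threshold should scale with the noise variance is left open in this sketch.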
The overall recovery algorithm is described as follows.
First, run the singleton test on every bin using Algorithm 1. Let denote the set of estimated signal indices. Remove the declared singleton bins.
Then, repeat the following until :
1) Select arbitrary and remove from set .
2) For every remaining bin with , perform the following:
a) Subtract the signal node value from bin : .
b) Run the singleton test on the bin using Algorithm 1. If it is a singleton, add the output index to and remove bin .
Algorithm 1 has a computational complexity of , where . Performing the singleton test on all bins takes complexity . In each subsequent iteration, we perform Algorithm 1 only on every (remaining) connected bin of a recovered signal component. Since the left-node degree is constant, each iteration involves computational complexity of . It will be proved that the algorithm terminates after iterations with high probability. The computational complexity of all the iterations involved is thus . With the choice of and , the total complexity is and for signals with discrete alphabet and arbitrary alphabet, respectively.
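The peeling procedure above can be sketched end to end for the noiseless case. Everything specific in this sketch is an assumption: the uncoded signature layout, the exact-match singleton test standing in for Algorithm 1, and the helper names `measure` and `peel`. The check that a decoded index is actually connected to the bin is an extra safeguard, not something stated in the text.

```python
import random
from collections import deque


def signature(i, nbits, vlen, seed=0):
    """Assumed column: [1, signed index bits, Rademacher verification bits]."""
    rng = random.Random(seed * 1_000_003 + i)
    bits = [1 - 2 * ((i >> s) & 1) for s in range(nbits - 1, -1, -1)]
    return [1] + bits + [rng.choice((-1, 1)) for _ in range(vlen)]


def singleton_test(y, nbits, vlen, tol=1e-9):
    """Noiseless stand-in for Algorithm 1: return (index, value) or None."""
    if all(abs(v) <= tol for v in y):
        return None                              # zeroton
    sign = 1 if y[0] >= 0 else -1
    i = 0
    for v in y[1:1 + nbits]:
        i = (i << 1) | (0 if sign * v >= 0 else 1)
    s, value = signature(i, nbits, vlen), y[0]
    if all(abs(yi - value * si) <= tol for yi, si in zip(y, s)):
        return i, value
    return None                                  # multiton


def measure(x, bins_of, num_bins, nbits, vlen):
    """Bin values: each bin sums x_i times the signature of every connected signal i."""
    ys = [[0.0] * (1 + nbits + vlen) for _ in range(num_bins)]
    for i, xi in x.items():
        s = signature(i, nbits, vlen)
        for b in bins_of[i]:
            ys[b] = [y + xi * si for y, si in zip(ys[b], s)]
    return ys


def peel(bin_values, bins_of, nbits, vlen):
    ys = [list(y) for y in bin_values]           # work on a copy
    active = set(range(len(ys)))                 # bins not yet declared singletons
    estimate, queue = {}, deque()

    def scan(b):
        found = singleton_test(ys[b], nbits, vlen)
        if found is None:
            return
        i, value = found
        if b in bins_of[i] and i not in estimate:
            estimate[i] = value
            queue.append(i)
            active.discard(b)                    # remove the declared singleton bin

    for b in sorted(active):                     # initial pass over every bin
        scan(b)
    while queue:                                 # peeling iterations
        i = queue.popleft()
        s = signature(i, nbits, vlen)
        for b in bins_of[i]:
            if b in active:
                ys[b] = [y - estimate[i] * si for y, si in zip(ys[b], s)]
                scan(b)
    return estimate
```

As a small usage example with a hand-picked graph of eight signals and four bins, `bins_of = [[0,1],[0,1],[1,2],[2,3],[0,3],[1,3],[1,2],[0,2]]` and `x = {1: 2.0, 3: 0.5, 6: -1.0}` make bins 0 and 3 singletons. Peeling signal 1 then turns bin 1 into a singleton that reveals signal 6.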
V Main Results and Proof
Theorem 1
Given any , there exists such that for every and every dimensional sparse signal whose entries take their values in a known discrete alphabet, the proposed scheme achieves . The number of measurements required is . The computational complexity is arithmetic operations.
Theorem 2
Given any , there exists such that for every and every dimensional sparse signal with for every , the proposed scheme achieves and for every . The number of measurements required is . The computational complexity is arithmetic operations.
We focus on the proof of Theorem 2 due to space limitations. Theorem 1 follows as a special case. Unlike signals from discrete alphabet, the signal estimates have residual errors, which propagate to later iterations due to the peeling process. In this paper, we propose an error propagation graph to keep track of the accumulated errors.
An error propagation graph for is a subgraph induced by the recovery algorithm, which contains the signal nodes that are estimated in the previous iterations and have paths to . Fig. 2 illustrates the error propagation graph for .
Define the estimation error of as
(8) 
Let be the measurement bin used to recover the signal index . Define the point error of as
(9) 
Then is a Gaussian variable with zero mean and variance . We will keep track of using the error propagation graph.
Let denote the signal indices that are recovered in the th iteration. Consider the estimation of , . The measurement vector of bin and the residual estimation error are given by
(10)  
(11) 
Consider the estimation for , . With the peeling of , , the updated measurement vector of and the estimation error become
(12)  
(13) 
where for every realization of .
The estimation error can be calculated recursively according to some message passing rules over the graph. In particular, let be the estimation error propagated from signal node and be the error vector propagated from the measurement bin . The errors can be calculated according to the following rules:
(14)  
(15) 
where denotes the indices of the measurement bins (signal nodes) incoming to signal node (measurement bin) .
By induction and the error message passing rules (14) and (15), the error propagation effect is characterized by the following lemma.
Lemma 1
The estimation error of , , is calculated as
(16) 
where is the connected subgraph of the bipartite graph containing , is the number of paths from to in , and is some coefficient depending on both and the path satisfying .
Fig. 2 illustrates an example. The number of paths from to is , with the corresponding coefficients being and . The number of paths from to is , with the coefficients being .
We further bound the errors by leveraging results on random hypergraphs. The bipartite graph induced by corresponds to a hypergraph where the left nodes and right nodes represent hyperedges and vertices, respectively. The hyperedge is incident on vertex if . Then the random bipartite graph induces a uniform random hypergraph.
Lemma 2
[18] Suppose is some constant large enough, then with probability , contains only trees or unicyclic components, and the largest component contains signal nodes.
Let denote the event that the bipartite graph satisfies the condition as described in Lemma 2 with . We first bound the detection error probability conditioned on . Suppose holds, then , otherwise the component is not unicyclic. Moreover the largest component contains signal nodes. Therefore, conditioned on , is Gaussian distributed with zero mean and the variance of can be upper bounded as
(17) 
Lemma 3
Conditioned on that holds, given any and , , the recovery algorithm can correctly identify the signal support with probability with some .

The support detection may be subject to zeroton, multiton and singleton detection errors. The error probability of detecting zerotons and multitons can be upper bounded by following similar steps in [15] and is omitted due to space limitations. We focus on the singleton detection.
Suppose the measurement vector of a singleton is given by
(18) 
where the entries in are i.i.d. Gaussian variables with zero mean and variance . Then the error probability of sign estimation is calculated as
(19) 
(20) 
where is the function for the standard normal distribution.
Suppose the signs of are correctly detected and we want to detect by recovering the signs of . By compensating the signs of as , the random transformation
(21) 
is equivalent to transmission over a BSC with crossover probability less than [14].
From the recovery process, is the noise plus residual estimate errors given by . According to (17), for some and a large enough , the variance of is dominated by that of . The entries of have a variance bounded by some constant. Therefore, given that for some , the worstcase SNR for every singleton estimation is lower bounded by some constant. The error probability of sign estimation (19) is with some . Applying an error control code of length with a low enough code rate , can be decoded correctly with probability .
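To make the BSC view concrete, the sketch below computes the sign-flip probability of a single entry and the resulting BSC capacity. It assumes an entry of the form amplitude times a +1/-1 sign plus zero-mean Gaussian noise, for which the flip probability is the standard normal tail function evaluated at the amplitude-to-noise ratio. The function names and this single-entry model are assumptions, and the expressions in (19) and (20) may differ in detail.

```python
import math


def q_function(t):
    """Tail probability of the standard normal distribution, P(Z > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def sign_flip_probability(amplitude, noise_std):
    """Probability that Gaussian noise flips the sign of an entry of size |amplitude|."""
    return q_function(abs(amplitude) / noise_std)


def bsc_capacity(p):
    """Capacity 1 - H2(p) of a binary symmetric channel with crossover probability p."""
    if p in (0.0, 1.0):
        return 1.0
    h2 = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return 1.0 - h2
```

With an amplitude-to-noise ratio bounded below by a constant, the crossover probability is bounded away from one half, so the BSC capacity is a positive constant. This is why a subcode of sufficiently low but constant rate can carry the index bits reliably.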
Note that if holds and the singleton, multiton and zeroton bins are correctly estimated, the peeling decoder terminates by recovering every nonzero signal entry [19]. Since there are at most iterations and every iteration involves at most singletons, the error probability can be upper bounded by using the union bound. Moreover, conditioned on that is correctly estimated, the probability that the singleton verification is not passed is equivalent to a zeroton detection error, which can be upper bounded by . The lemma is hence established.
Support recovery fails only if either does not hold, or a bin detection error occurs conditioned on that holds. By Lemma 2 and Lemma 3, the overall error probability of support recovery is , vanishing as increases.
The error probability can be upper bounded as
(22)  
(23) 
where (23) follows because is a Gaussian variable with zero mean and variance upper bounded by (17) conditioned on and every realization of . By Lemma 2, the error probability (23) is smaller than any with a large enough and some . Hence, Theorem 2 is established.
VI Simulation
Throughout the simulation, we assume that the nonzero signal amplitude is taken uniformly at random from and define SNR = , which is the worst-case SNR. The signal dimension is . The number of measurement bins is chosen to be . We adopt a regular random LDPC code with rate as subcode to encode the signal support information, and thus . We let and . Fig. 3 and Fig. 4 plot the error probability of support recovery and relative mean square error, respectively. The relative mean square error is only calculated and averaged over the signals with their support correctly estimated. We run 200 simulations for each SNR. In the simulation, for every sparsity level , the error-control code and nonzero signal entries are generated once and fixed.
Although analysis shows that is sufficient to guarantee vanishing error probability, choosing also gives a good performance. The error probability of support recovery and the relative mean square error decrease as SNR increases. In order to achieve more reliable signal recovery, we can adopt a more sophisticated error-control code or a code with lower code rate.
References
 [1] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
 [2] E. Candes and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies?” IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406–5425, 2006.
 [3] E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
 [4] J. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
 [5] D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.
 [6] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, “Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit,” IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1094–1121, 2012.
 [7] W. Xu and B. Hassibi, “Efficient compressive sensing with deterministic guarantees using expander graphs,” in IEEE Information Theory Workshop, 2007, pp. 414–419.
 [8] P. Indyk and M. Ružić, “Near-optimal sparse recovery in the L1 norm,” in Annual IEEE Symposium on Foundations of Computer Science, 2008, pp. 199–207.
 [9] L. Applebaum, S. D. Howard, S. Searle, and R. Calderbank, “Chirp sensing codes: Deterministic compressed sensing measurements for fast recovery,” Applied and Computational Harmonic Analysis, vol. 26, no. 2, pp. 283–290, 2009.
 [10] R. Calderbank, S. Howard, and S. Jafarpour, “Construction of a large class of deterministic sensing matrices that satisfy a statistical isometry property,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 358–374, 2010.
 [11] S. Pawar and K. Ramchandran, “A hybrid DFT-LDPC framework for fast, efficient and robust compressive sensing,” in Proc. Annual Allerton Conference on Commun., Control, and Computing, Monticello, IL, 2012, pp. 1943–1950.
 [12] M. Bakshi, S. Jaggi, S. Cai, and M. Chen, “SHO-FA: Robust compressive sensing with order-optimal complexity, measurements, and bits,” arXiv preprint arXiv:1207.2335, 2012.
 [13] S. Pawar and K. Ramchandran, “Computing a k-sparse n-length discrete Fourier transform using at most 4k samples and O(k log k) complexity,” in Proc. IEEE Int. Symp. Information Theory, Istanbul, 2013, pp. 464–468.
 [14] X. Chen and D. Guo, “Robust sublinear complexity Walsh-Hadamard transform with arbitrary sparse support,” in Proc. IEEE Int. Symp. Information Theory, Hong Kong, June 2015.
 [15] X. Li, S. Pawar, and K. Ramchandran, “Sub-linear time support recovery for compressed sensing using sparse-graph codes,” arXiv preprint arXiv:1412.7646, 2014.
 [16] A. Barg and G. Zémor, “Error exponents of expander codes under linear-complexity decoding,” SIAM Journal on Discrete Mathematics, vol. 17, no. 3, pp. 426–445, 2004.
 [17] N. Miladinovic and M. P. Fossorier, “Generalized LDPC codes and generalized stopping sets,” IEEE Transactions on Communications, vol. 56, no. 2, pp. 201–212, 2008.
 [18] M. Karoński and T. Łuczak, “The phase transition in a random hypergraph,” Journal of Computational and Applied Mathematics, vol. 142, no. 1, pp. 125–135, 2002.
 [19] E. Price, “Efficient sketches for the set query problem,” in Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, 2011, pp. 41–56.