Summary Based Structures with Improved Sublinear Recovery for Compressed Sensing
Abstract
We introduce a new class of measurement matrices for compressed sensing, using low-order summaries over binary sequences of a given length. We prove recovery guarantees for three reconstruction algorithms using the proposed measurements, including ℓ1 minimization and two combinatorial methods. In particular, one of the algorithms recovers k-sparse vectors of length N in time sublinear in N, and requires a number of measurements that we bound explicitly. The empirical oversampling constant of the algorithm is significantly better than that of existing sublinear recovery algorithms such as Chaining Pursuit and Sudocodes. In particular, for the signal dimensions and sparsity levels we simulate, the oversampling factor is between 3 and 8. We provide preliminary insight into how the proposed constructions and the fast recovery scheme can be used in a number of practical applications, such as market basket analysis and real-time compressed sensing implementation.
1 Introduction
Despite significant advances in the field of Compressed Sensing (CS), certain aspects of CS remain relatively immature. Thus far, CS has been viewed primarily as a data acquisition technique [1]. As a result, the applicability of CS to other computational applications has not received commensurate investigation. In addition, to the best of the authors' knowledge, no unified CS system has been implemented for practical real-time applications. A few recent works have addressed the former by applying sparse reconstruction ideas to certain inference problems, including learning and adaptive computational schemes (e.g., [2, 3, 4]). Several other works have addressed the latter by designing hardware that exploits the fact that CS enables the monitoring of a given bandwidth at a much lower sampling rate than traditional Nyquist-based methods (see, e.g., [6]). The motivation behind these works is that, for a given maximum sampling rate achievable by digitizing hardware (limited by the poor scaling of power consumption with sampling rate), it is possible to acquire signals either over a much greater bandwidth, or with much less power for a given bandwidth. Recent work inspired by this line of thought has led to the development of hardware CS encoders (see, e.g., [7, 8, 9, 10]). However, none of the previous works address the problem of real-time signal decoding, which is a critical requirement in many applications.
Although they vary with the nature of the problem and the physical constraints, two fundamental issues in the practical implementation of CS are perhaps the following: 1) the construction of measurement matrices that are provably good, certifiable, and inexpensive to implement (either as real-time sketches or as prebuilt constructions), and 2) time-efficient and robust recovery algorithms. Our aim is to introduce and analyze a sparse reconstruction system that addresses these problems, and to point toward extensions of CS in less explored directions.
We introduce a new class of measurement matrices for sparse recovery that are deterministic, structured, and highly scalable. The constructions are based on labeling the ambient state space with binary sequences of length $n$, and summing up the entries of the signal that share the same pattern (up to a fixed length) at various locations in their labeling sequences. The corresponding matrices do not satisfy the RIP, but they are compatible with Basis Pursuit, a standard technique for sparse reconstruction [11]. In addition, we provide two efficient combinatorial algorithms, along with theoretical guarantees, for the proposed measurement structures. The proposed algorithms are sublinear in the ambient dimension of the signal. In particular, we propose a summarized support index inference (SSII) algorithm whose running time is sublinear in the signal dimension, which recovers k-sparse vectors from a number of measurements that we bound explicitly, and whose empirically required oversampling factor is significantly better than that of existing sublinear methods. Due to the particular structure of the measurements and decoding algorithms, we believe that the proposed compression/decompression framework is amenable to real-time CS implementation, and can significantly simplify the design of existing CS encoders/decoders. Furthermore, observations collected based on the proposed constructions appear as low-order statistics, or "summaries", in a number of practical situations in which a similar intrinsic labeling of the state space exists. These include certain inference and discrete optimization problems such as market basket (commodity bundle) analysis, advertising, online recommendation systems, genomic feature selection, and social networks.
It should be acknowledged that there are various results on sublinear sparse recovery in the literature, including [12, 13, 14, 15, 16]. Unlike most previous works, the constructions of this paper have sublinear storage requirements and are compatible with the practical scenarios that we consider. The recovery time of the SSII algorithm is sublinear in the signal dimension, and its empirical recovery bounds are significantly better than those of existing sublinear algorithms, such as Chaining Pursuit and Sudocodes, especially for small and moderate sparsity levels and very large signal dimensions.
2 Proposed Measurement Structures
We define a class of structured binary measurement matrices based on the following definition.
Definition 1.
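Let $n$, $d$, and $m$ be integers with $1 \le d \le n$ and $1 \le m \le \binom{n}{d}$. A summary is a pair $(S, b)$, where $S$ is a subset of $\{1, \dots, n\}$ of size $d$, and $b$ is a binary sequence of length $d$. A summary codebook $\mathcal{C}$ is a collection of $m 2^d$ summaries $\{(S_i, b_j) : 1 \le i \le m,\ 0 \le j \le 2^d - 1\}$, where the $S_i$'s are distinct subsets, and $b_j$ is the length-$d$ binary representation of the integer $j$. If $m = \binom{n}{d}$, $\mathcal{C}$ is called the complete summary codebook.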
To a given summary codebook $\mathcal{C}$, we associate a binary matrix $A = A_{\mathcal{C}}$ of size $M \times N$, where $N = 2^n$ and $M = m 2^d$, in the following way. For every $(S, b) \in \mathcal{C}$, there is a row of $A$ that satisfies:
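$$A_{(S,b),\,j} = \begin{cases} 1, & (\beta_j)_S = b, \\ 0, & \text{otherwise}, \end{cases} \qquad j = 1, \dots, N, \tag{1}$$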
where $\beta_j$ is the $n$-bit binary representation of $j - 1$, and $(\beta_j)_S$ is the subsequence of the binary sequence $\beta_j$ indexed by the entries of the set $S$. (These structured matrices can be defined over any finite alphabet, not only the binary field.) In other words, the row associated with $(S, b)$ has a 1 in those columns whose binary labels conform to $(S, b)$. Every column of $A$ has exactly $m$ ones, and each row has exactly $2^{n-d}$ ones. To clarify this definition, consider the example illustrated in Figure 1, in which $n = 4$, so that $N = 16$. Suppose that a summary $(S, b)$ is given. All binary sequences of length 4 that match $(S, b)$ are listed in Figure 1. To find the corresponding column indices, we convert these labels to decimal values and add 1. The row of a measurement matrix that includes this summary is then a vector of length 16 that has a 1 in exactly those indices, as displayed.
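The construction in (1) is straightforward to reproduce programmatically. The following Python sketch builds the row of $A$ for one summary and stacks the rows of a codebook. It assumes 1-based bit positions with bit 1 taken as the most significant bit of each label (a convention the text does not fix), and the summary $S = (1, 3)$, $b = (0, 1)$ is a hypothetical choice for illustration, not the one shown in Figure 1. The helper names are likewise illustrative.

```python
from itertools import combinations, product


def column_label(j, n):
    """n-bit label of column j (1-indexed): the binary representation of j - 1.

    Assumption: bit position 1 is the most significant bit.
    """
    return tuple(((j - 1) >> (n - 1 - t)) & 1 for t in range(n))


def summary_row(S, b, n):
    """Row of A for the summary (S, b): a 1 in every column whose label agrees with b on S.

    S holds 1-based bit positions and b the required bit values, in the same order.
    """
    return [
        1 if all(column_label(j, n)[s - 1] == bit for s, bit in zip(S, b)) else 0
        for j in range(1, 2 ** n + 1)
    ]


def summary_matrix(subsets, n):
    """Stack the rows (S_i, b_j) for every subset S_i and all 2^d patterns b_j."""
    rows = []
    for S in subsets:
        for b in product((0, 1), repeat=len(S)):  # b_j for j = 0, ..., 2^d - 1
            rows.append(summary_row(S, b, n))
    return rows


# Hypothetical example with n = 4: the summary S = (1, 3), b = (0, 1).
row = summary_row((1, 3), (0, 1), 4)
print([j + 1 for j, v in enumerate(row) if v])  # prints [3, 4, 7, 8]

# Complete codebook for n = 4, d = 2: m = C(4, 2) = 6 subsets, M = 6 * 2^2 = 24 rows.
A = summary_matrix(list(combinations(range(1, 5), 2)), 4)
```

The matching labels here are 0010, 0011, 0110, and 0111, which map to columns 3, 4, 7, and 8. For the complete codebook, every column of `A` has $m = 6$ ones and every row has $2^{n-d} = 4$ ones, in line with the counts stated above.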
These matrices are motivated by several practical problems. In general, such constructions are useful whenever the signal space has an intrinsic structured labeling similar to the one described above. In particular, we consider the following two motivating examples.
Resource Optimization. Assume that a set of $n$ features (or parameters) is available, and that certain combinations of features form "lucrative" profiles (structures). In particular, a lucrative profile is a subset of features, representable by a binary sequence $(b_1, \dots, b_n)$, where $b_i$ indicates the presence of the $i$'th feature. A practical assumption is that lucrative profiles are few and weighted, meaning that their profitabilities vary. The vector $x$ formed by the respective profits of all $2^n$ feature collections is thus approximately sparse. Furthermore, the available information about the profitability of profiles is often derived from a pool of observations or real-world implementations, and is mostly given in the form of summaries. More formally, what can be learned is the average profitability of a certain configuration of only $d$ features. For example, it can be assessed that when features $i$ and $j$ are present and feature $l$ is absent, regardless of all other features, the average profit is some value $c$. The collection of summaries forms an observation vector $y$ that is related to $x$ through a set of linear equations $y = Ax$, where $A$ has a form similar to those obtained from summary codebooks. This setting arises in many practical applications such as market basket (commodity bundle) analysis, where the objective is to configure the structure of a market so that it best complies with the needs and behaviors of the customers. To that end, it is essential to understand which market configurations are winning, which packages of features (e.g., commodities, pricing options, interest rates) should be offered to customers, and in what proportions. Furthermore, the customers' behavioral information is often given in terms of high-level summaries, e.g., along the lines of the statement "people who buy A and B are likely to buy C".
Compressed Sensing Hardware. Several factors severely limit the scalability of existing CS hardware designs to larger problem dimensions. One of these factors is the generation of the measurement matrix $A$. In the simplest existing designs, $A$ is typically a pseudorandom matrix generated with a linear feedback shift register (LFSR) [8, 10]. The timing synchronization of a large number of measurements, as well as the planar nature of physical implementations, is very limiting. Using a more structured matrix may considerably simplify and reduce the required hardware, easing some of these limitations. The measurement structure defined in this work is potentially well suited to practical CS hardware, for two reasons. 1) There exist simple sublinear recovery algorithms for the proposed matrices, other than linear programming; this is elaborated in the following sections. 2) Due to the highly structured design, the integration matrix can be implemented using a single LFSR seed and a number of asynchronous digital circuits. Due to lack of space, and since it is outside the scope of this paper, we defer a detailed description of the latter to future work.
3 Proposed Recovery Algorithms
For the measurement matrices described in the previous section, we propose three reconstruction algorithms and provide success guarantees. These include the Basis Pursuit algorithm (a.k.a. ℓ1 minimization), as well as two fast algorithms that can recover k-sparse vectors from a sublinear number of measurements in a sublinear amount of time. The detailed specifications are given in the sequel. For the theoretical arguments in the remainder of this section, we need the following notions:
Definition 2.
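Let $n$ and $d$ be integers with $d \le n$, let $0 < \alpha \le 1$, and let $0 \le p \le 1$. We define $k_1(n, d)$, $k_2(n, d, \alpha, p)$, and $k_3(n, d, \alpha, p)$ to be the largest integer $k$ such that, when $k$ binary sequences of length $n$ are selected at random, the following happens, respectively:
(i) For $k_1$: with probability 1, there exists a summary of order $d$ that appears in exactly one of the $k$ sequences.
(ii) For $k_2$: with probability at least $1 - p$, for each of the $k$ binary sequences, at least an $\alpha$ fraction of its summaries are unique (i.e., they appear in no other sequence).
(iii) For $k_3$: with probability at least $1 - p$, for each of the $k$ binary sequences, at least an $\alpha$ fraction of its summaries that include the first bit are unique.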
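The quantities in Definition 2 are defined through probabilities over random labels, so they can be explored empirically. The sketch below computes, for a given set of labels, the fraction of each label's order-$d$ summaries that no other label conforms to, and estimates by Monte Carlo the probability of the event behind $k_2$ (or $k_3$ when only subsets containing the first bit are counted). It assumes the labels are drawn uniformly at random without repetition, uses 0-based bit positions, and the function names are hypothetical. Increasing $k$ until the estimate falls below $1 - p$ gives a rough empirical proxy for $k_2$ or $k_3$.

```python
import random
from itertools import combinations


def unique_fractions(labels, d, first_bit_only=False):
    """For each label, the fraction of its order-d summaries (S, label restricted to S)
    that no other label conforms to. Bit positions are 0-based (bit 0 = first bit)."""
    n = len(labels[0])
    subsets = [S for S in combinations(range(n), d) if not first_bit_only or 0 in S]
    fractions = []
    for i, li in enumerate(labels):
        unique = 0
        for S in subsets:
            pattern = tuple(li[s] for s in S)
            if all(tuple(lj[s] for s in S) != pattern
                   for j, lj in enumerate(labels) if j != i):
                unique += 1
        fractions.append(unique / len(subsets))
    return fractions


def estimate_event(k, n, d, alpha, first_bit_only=False, trials=500, seed=0):
    """Monte Carlo estimate of P(each of k random distinct labels has at least an alpha
    fraction of unique summaries), i.e. the event behind k_2 (k_3 if first_bit_only)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        values = rng.sample(range(2 ** n), k)  # k distinct labels, drawn uniformly
        labels = [tuple((v >> (n - 1 - t)) & 1 for t in range(n)) for v in values]
        if min(unique_fractions(labels, d, first_bit_only)) >= alpha:
            hits += 1
    return hits / trials


# Two hypothetical labels that differ only in the last bit: only the subsets
# containing that bit give unique summaries.
print(unique_fractions([(0, 0, 0, 0), (0, 0, 0, 1)], 2))  # [0.5, 0.5]
```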
It is important to note that the recovery guarantees of the combinatorial algorithms presented here are only valid for a class of vectors in which no two disjoint subsets of nonzero coefficients have exactly the same sum. For simplicity, we refer to these vectors as "distinguishable" signals. This restriction does not apply to Basis Pursuit.
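For small sparsity levels, distinguishability can be checked by brute force. The following sketch, with a hypothetical function name, assigns each nonzero value to one of two disjoint subsets or to neither, and reports whether any two nonempty disjoint subsets have equal sums up to a numerical tolerance. It enumerates $3^k$ assignments, so it is only meant for illustration with small $k$.

```python
from itertools import product


def is_distinguishable(values, tol=1e-9):
    """True if no two disjoint, nonempty subsets of the nonzero values have equal sums."""
    for assign in product((0, 1, 2), repeat=len(values)):  # 0: unused, 1: first, 2: second
        if 1 in assign and 2 in assign:
            s1 = sum(v for v, a in zip(values, assign) if a == 1)
            s2 = sum(v for v, a in zip(values, assign) if a == 2)
            if abs(s1 - s2) <= tol:
                return False
    return True


print(is_distinguishable([1.0, 2.0, 4.0]))  # True: distinct powers of two
print(is_distinguishable([1.0, 2.0, 3.0]))  # False: 1 + 2 = 3
```

Generic real-valued coefficients (for example, drawn from a continuous distribution) are distinguishable with probability one, which is why this assumption is mild in practice.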
3.1 Basis Pursuit
The success of the Basis Pursuit algorithm in recovering sparse signals is certified by several conditions. Two major classes of conditions are the Restricted Isometry Property (RIP) and the null space property [11, 5]. It can be proved that the measurement structures defined in this paper do not satisfy the RIP, due to the existence of columns with fairly large coherence. This, however, does not rule out the suitability of these constructions for ℓ1 minimization, since the RIP is known to be only a sufficient condition (see, e.g., [17]). Instead, we prove that certain null space conditions hold for the considered class of matrices, and thereby provide a sparse signal recovery bound for ℓ1 minimization. We restrict our attention to nonnegative vectors in this case. The reconstruction method is the following ℓ1 program with an additional nonnegativity constraint:
$$\min_{z \in \mathbb{R}^N} \ \|z\|_1 \quad \text{subject to} \quad Az = y, \quad z \ge 0. \tag{2}$$
The performance of the above program was studied for 0-1 matrices in [18]. In particular, it was shown that, for matrices with constant column sum, a nonnegative vector can be recovered from (2) if and only if it is the unique nonnegative solution of the corresponding linear system of equations, as stated formally in the following lemma.
Lemma 3.1 (from [18]).
Suppose $A$ is a matrix with constant column sum, and $x$ is a nonnegative vector. Then $x$ is the unique solution to (2) with $y = Ax$ if and only if $x$ is the unique nonnegative solution to $Az = Ax$.
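The constant column sum is what makes the lemma work: if every column of $A$ has $m$ ones, then for any nonnegative $z$ with $Az = y$ we have $m\|z\|_1 = \mathbf{1}^T A z = \mathbf{1}^T y$, so the objective of (2) takes the same value on the whole feasible set and recovery reduces to uniqueness. The short sketch below checks this identity numerically for a complete codebook with $n = 4$ and $d = 2$. The signal and the helper names are hypothetical, and bit positions are 0-based.

```python
from itertools import combinations, product


def complete_rows(n, d):
    """Summaries (S, b) of the complete order-d codebook; bit positions are 0-based."""
    return [(S, b) for S in combinations(range(n), d) for b in product((0, 1), repeat=d)]


def measure(x, rows):
    """y = A x, where x maps n-bit labels (tuples) to values and row (S, b) sums the
    entries whose labels agree with b on S."""
    return [sum(v for lab, v in x.items() if tuple(lab[s] for s in S) == b) for S, b in rows]


n, d = 4, 2
rows = complete_rows(n, d)
m = len(rows) // 2 ** d  # number of subsets; every column of A has exactly m ones
x = {(0, 0, 0, 0): 1.0, (1, 0, 1, 1): 2.5}  # hypothetical nonnegative 2-sparse signal
y = measure(x, rows)
# 1^T y = m * ||x||_1, and the same holds for every nonnegative z with Az = y.
print(sum(y), m * sum(x.values()))  # 21.0 21.0
```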
Using the above lemma, we can evaluate the performance of the Basis Pursuit algorithm when used with the presented measurement matrices. The following theorem is fundamental to this analysis.
Theorem 3.1 (Strong Recovery for Basis Pursuit).
Let $k \le k_1(n, d-1)$ be an integer, and let $A$ correspond to a complete summary codebook of order $d$. Then every $k$-sparse nonnegative vector $x$ is perfectly recovered by (2) from $y = Ax$.
Proof.
Let $N = 2^n$ and let $x$ be a nonnegative $k$-sparse vector. Also, let the $n$-bit binary labels associated with the support set of $x$ be $\ell_1, \dots, \ell_k$. We show that if $A$ corresponds to a complete summary codebook, then $x$ is the unique nonnegative solution to $Az = Ax$; by Lemma 3.1, it then follows that $x$ can be recovered via (2). We prove this by contradiction. Suppose that there is another nonnegative vector $x' \neq x$ with $Ax' = Ax$. Due to the nonnegativity assumption, we may assume that the support sets of $x$ and $x'$ do not overlap: subtracting the entrywise minimum of $x$ and $x'$ from both vectors preserves nonnegativity, sparsity of $x$, and the equality $Ax' = Ax$. Let the $n$-bit labels of the support set of $x'$ be the binary sequences $\ell'_1, \ell'_2, \dots$. From the definition of $k_1(n, d-1)$, there is a summary of order $d-1$ that appears in exactly one of the sequences $\ell_1, \dots, \ell_k$. Let us assume, without loss of generality, that the first $d-1$ bits of $\ell_1$ are unique, and that $\ell_1$ is the all-zero binary sequence. Therefore, there are at least $n - d + 1$ measurements in $y = Ax$ that are equal to the entry of $x$ corresponding to the label $\ell_1$. These measurements are those that correspond to the summaries
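$$\big(\{1, \dots, d-1, j\},\ \mathbf{0}_d\big), \qquad j = d, d+1, \dots, n, \tag{3}$$
where $\mathbf{0}_d$ denotes the all-zero sequence of length $d$.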
Since $Ax' = Ax$, there must be nonzero entries in $x'$ whose labels satisfy the above summaries. In particular, assume without loss of generality that the first $d-1$ bits of $\ell'_1$ are all zero. However, since the support sets of $x$ and $x'$ do not overlap, $\ell'_1$ differs from $\ell_1$ in at least one bit, i.e., $\ell'_1$ has a 1 in position $j$ for some $j \ge d$. Now consider the summary $(\{1, \dots, d-1, j\}, (0, \dots, 0, 1))$, which represents the set of all binary sequences that are zero on the first $d-1$ bits and one on the $j$'th bit. Because $A$ corresponds to a complete codebook, there is a row of $A$ that is based on this summary, and moreover the corresponding entry of $Ax'$ is nonzero, because $\ell'_1$ conforms to it. On the other hand, this cannot be true for $Ax$, because it would require one of the labels $\ell_1, \dots, \ell_k$ to conform to this summary. Any such label would have its first $d-1$ bits equal to zero, so it would have to be $\ell_1$; but $\ell_1$ is the all-zero codeword, whereas the summary includes a 1. This contradicts the assumption that $\ell_1$ is the only label whose first $d-1$ bits are all zero.
The complexity of Basis Pursuit is generally polynomial in the ambient dimension $N$ of the signal: (2) can be solved in a number of operations polynomial in $N$, without exploiting any of the structural information available in the measurement matrix. Although Basis Pursuit has some advantages, such as robustness to noise, its complexity is impractical for problems where $N = 2^n$ grows exponentially in $n$. In these situations, sublinear-time algorithms are preferred.
3.2 Summarized Support Index Inference
The first sublinear algorithm, discussed in this subsection, is called summarized support index inference (SSII). The algorithm iteratively infers the nonzero entries of the signal from the distinct values of $y$ and their various occurrences. The method is described below.
At the beginning of the algorithm, the distinct nonzero values of the observations $y$ are identified and separated from the zero values. Due to the distinguishability assumption on $x$, each distinct nonzero value of $y$ is the sum of a unique subset of the nonzero entries of $x$, and can thus be used to infer the position of at least one nonzero entry. The index of a nonzero entry of $x$ is determined by its unique label, which is a binary sequence of length $n$. Therefore, the algorithm attempts to infer all relevant binary sequences. Suppose that a nonzero value $v$ of $y$ is chosen that has $r$ occurrences, say, without loss of generality, $y_1 = y_2 = \dots = y_r = v$. Also, let the summary that corresponds to the $i$'th row of $A$ be denoted by $(S_i, b_i)$ (see equation (1)). The algorithm explores the possibility that $y_1, \dots, y_r$ are all equal to a single nonzero entry of $x$, by trying to build a binary sequence $\ell$ that conforms to the summaries $(S_1, b_1), \dots, (S_r, b_r)$, i.e., by setting:
$$(\ell)_{S_i} = b_i, \qquad i = 1, \dots, r. \tag{4}$$
If there is a conflict in the set of equations (4), then the value $v$ is discarded in the current iteration, and the search continues with other values. Otherwise, two cases may occur. If (4) uniquely identifies $\ell$, then the position and value of one nonzero entry of $x$ have been determined; its contribution is subtracted, the measurements are updated, and the algorithm continues. However, (4) may determine only some of the bits of $\ell$. In this case, we use the zero values of $y$ to infer the remaining bits as follows. Let the sets of known and unknown bit positions of $\ell$ be denoted by $K$ and $U$, respectively. We consider the summaries that contribute to $y$, and among them all distinct subsets $S$ with $S \cap U \neq \emptyset$. If there is a subset $S$ such that, among all the measurements corresponding to summaries $(S, b)$ for which $b$ does not conflict with the known bits $(\ell)_K$, exactly one is nonzero, say the one with $b = b^\ast$, then the bits of $\ell$ over $S \cap U$ can be uniquely determined by setting them equal to the corresponding bits of $b^\ast$. This procedure is repeated until either $\ell$ is completely identified or all possibilities are exhausted. A high-level description of the method is given in Alg. 1, for which we can assert the following strong and weak recovery guarantees.
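The peeling idea behind SSII can be illustrated with a heavily simplified sketch. It assumes a complete order-$d$ codebook, 0-based bit positions, and labels stored as tuples; it only attempts full-label inference through (4) and omits both the completion step that uses zero-valued measurements and any verification of inferred labels. It is therefore not the authors' Alg. 1, and all names are hypothetical.

```python
from itertools import combinations, product


def complete_rows(n, d):
    """Summaries (S, b) of the complete order-d codebook; bit positions are 0-based."""
    return [(S, b) for S in combinations(range(n), d) for b in product((0, 1), repeat=d)]


def conforms(label, S, b):
    return tuple(label[s] for s in S) == b


def measure(x, rows):
    """y = A x for a signal stored as {label tuple: value}."""
    return [sum(v for lab, v in x.items() if conforms(lab, S, b)) for S, b in rows]


def ssii_sketch(y, rows, n, tol=1e-9):
    """Simplified SSII peeling loop: only full-label inference via (4) is attempted."""
    y = list(y)
    recovered = {}
    progress = True
    while progress and any(abs(v) > tol for v in y):
        progress = False
        for v in sorted({val for val in y if abs(val) > tol}):
            bits, conflict = {}, False
            for (S, b), val in zip(rows, y):
                if abs(val - v) > tol:
                    continue
                for s, bit in zip(S, b):  # equation (4): label restricted to S must equal b
                    if bits.setdefault(s, bit) != bit:
                        conflict = True
            if conflict or len(bits) < n:
                continue  # v is discarded in this iteration
            label = tuple(bits[s] for s in range(n))
            recovered[label] = recovered.get(label, 0.0) + v
            for i, (S, b) in enumerate(rows):  # subtract the identified entry from y
                if conforms(label, S, b):
                    y[i] -= v
            progress = True
            break  # rescan the updated measurements
    return recovered


rows = complete_rows(4, 2)
x = {(0, 0, 0, 0): 1.0, (0, 0, 0, 1): 2.0}  # hypothetical distinguishable 2-sparse signal
print(ssii_sketch(measure(x, rows), rows, 4))  # {(0, 0, 0, 0): 1.0, (0, 0, 0, 1): 2.0}
```

In this example the value 3.0 (the sum of both entries) only fixes three bits and is skipped, while the value 1.0 fixes all four bits of the label 0000; after subtraction, the remaining entries all equal 2.0 and identify the label 0001.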
Theorem 3.2 (Strong Recovery for SSII).
Let $k \le k_1(n, d-1)$ be an integer, and let $A$ correspond to a complete summary codebook of order $d$. Then every $k$-sparse distinguishable vector is perfectly recovered by Alg. 1.
Proof.
Let $N = 2^n$ and let $x$ be a $k$-sparse vector. Also, let the $n$-bit binary labels associated with the support set of $x$ be $\ell_1, \dots, \ell_k$. We show that at least one of these labels can be inferred from one of the nonzero values of the vector $y$ by solving (4). By the definition of $k_1(n, d-1)$, there is a summary of order $d-1$ that appears in exactly one of the labels $\ell_1, \dots, \ell_k$. Without loss of generality, assume that the first $d-1$ bits of $\ell_1$ are unique, and that $\ell_1$ is the all-zero binary sequence. Also, let the nonzero value of $x$ in the position given by $\ell_1$ be $v$. Now consider all summaries for which the corresponding entry of $y$ is equal to $v$. Let these summaries be denoted by $(S_1, b_1), \dots, (S_r, b_r)$, where $r$ is the number of occurrences of $v$ in $y$. We show that there is a unique binary sequence that conforms to all of these summaries; in other words, we prove that equation (4) has a unique solution, which is equal to $\ell_1$.
Since $\ell_1$ is the only label whose first $d-1$ bits are all zero, the set $\{(S_1, b_1), \dots, (S_r, b_r)\}$ includes the following summaries:
$$\big(\{1, \dots, d-1, j\},\ \mathbf{0}_d\big), \qquad j = d, d+1, \dots, n, \tag{5}$$
where $\mathbf{0}_d$ denotes the all-zero sequence of length $d$. Clearly, the only length-$n$ binary sequence that conforms to all of the above summaries is the all-zero sequence, namely $\ell_1$. Thus, we only need to show that $(\ell_1)_{S_i} = b_i$ for all other summaries $(S_i, b_i)$ in the list. This also follows immediately from the distinguishability assumption on $x$: every occurrence of $v$ in $y$ can only be the result of the nonzero entry of $x$ labeled by $\ell_1$, since otherwise $v$ would be the sum of another subset of the entries of $x$, disjoint from the entry at $\ell_1$.
Theorem 3.3 (Weak Recovery for SSII).
Let $k \le k_3(n, d, \alpha, p)$ be an integer, and let $A$ correspond to a random summary codebook with $m$ subsets. Then, a random $k$-sparse distinguishable vector is recovered by Alg. 1 with probability at least one minus the failure bound given in (7).
Proof.
We define an event that is stronger than the success event of Alg. 1, namely a sufficient condition for the success of SSII. Let the $n$-bit binary labels associated with the support set of $x$ be $\ell_1, \dots, \ell_k$, and let $\mathcal{C}$ be the summary codebook on which $A$ is based. The sufficient condition for the success of SSII is that for every $i \in \{1, \dots, k\}$ and every bit $t \in \{1, \dots, n\}$, there exists a summary $(S, b)$ in $\mathcal{C}$ such that $t \in S$, $(\ell_i)_S = b$, and $(\ell_j)_S \neq b$ for all $j \neq i$. In other words, for each of the labels corresponding to the support of $x$ and each of the bits, there is a summary in the codebook that includes the considered bit and conforms only to that particular label.
We find an upper bound on the probability of the complementary (failure) event by using union bounds. Note that there are $m$ distinct subsets in the summaries of the codebook $\mathcal{C}$, which are chosen randomly. For the analysis, we assume that the subsets are chosen independently at random, allowing repetition; in case of repetition, the repeated subset is excluded, which can only make things worse. Consider a label $\ell_1$ and the first bit. The probability that a randomly chosen subset of $\{1, \dots, n\}$ of size $d$ includes the first bit is $d/n$. Furthermore, suppose that at least an $\alpha$ fraction of the summaries that conform to $\ell_1$ and include the first bit do not conform to the remaining $\ell_j$'s (i.e., they only appear in $\ell_1$). Then, when a random subset $S$ is chosen, with probability at least $\alpha d / n$ the following happens:
$$1 \in S \quad \text{and} \quad (\ell_j)_S \neq (\ell_1)_S \ \text{ for all } j \neq 1. \tag{6}$$
Therefore, the probability that the above event does not happen for any of the $m$ randomly chosen subsets is at most $(1 - \alpha d/n)^m$. From the definition of $k_3$ and the fact that $k \le k_3(n, d, \alpha, p)$, we know that with probability at least $1 - p$ the above fraction is at least $\alpha$, and therefore the probability that (6) does not happen for any subset in the codebook is at most $(1 - \alpha d/n)^m + p$. Taking a union bound of this event over all $k$ labels and all $n$ bits, we conclude that the probability of the undesirable event is bounded by:
(7) 
which concludes the proof of the theorem.
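The sufficient condition used in this proof can be checked directly for concrete labels and codebooks, which gives an empirical lower estimate of the success probability of SSII. The sketch below uses 0-based bit positions and hypothetical function names. Unlike the proof, which assumes subsets drawn independently with repetition, it samples $m$ distinct subsets, as in Definition 1, and draws $k$ distinct labels uniformly at random.

```python
import random
from itertools import combinations


def ssii_sufficient(labels, subsets):
    """Sufficient condition from the proof (0-based bits): for every support label and
    every bit t, some codebook subset S contains t and the summary (S, label restricted
    to S) matches no other support label."""
    n = len(labels[0])
    for i, li in enumerate(labels):
        others = [lj for j, lj in enumerate(labels) if j != i]
        for t in range(n):
            if not any(
                t in S and all(tuple(lj[s] for s in S) != tuple(li[s] for s in S)
                               for lj in others)
                for S in subsets
            ):
                return False
    return True


def estimate_success(n, d, m, k, trials=200, seed=0):
    """Monte Carlo estimate of the probability that the sufficient condition holds for
    k random distinct labels and a random codebook of m distinct order-d subsets."""
    rng = random.Random(seed)
    all_subsets = list(combinations(range(n), d))
    hits = 0
    for _ in range(trials):
        subsets = rng.sample(all_subsets, m)
        values = rng.sample(range(2 ** n), k)
        labels = [tuple((v >> (n - 1 - t)) & 1 for t in range(n)) for v in values]
        hits += ssii_sufficient(labels, subsets)
    return hits / trials


full = list(combinations(range(4), 2))
print(ssii_sufficient([(0, 0, 0, 0), (1, 0, 1, 1)], full))    # True
print(ssii_sufficient([(0, 0, 0, 0), (1, 0, 1, 1)], [(0, 1)]))  # False: bits 2, 3 uncovered
```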
The explicit recovery bounds given by the above theorem are calculated in Section 4. Alg. 1 can be implemented very efficiently, with a number of operations that is sublinear in the dimension $N$ of the problem. The computational advantage is owed mostly to the structure of the measurement matrices, which facilitates a sublinear search over the column space of the matrix. In addition, decoding does not require exponential memory, since the information about $A$ and the currently inferred indices of the unknown vector at each stage can be retained by storing only the corresponding binary labels.
3.3 Mix and Match Algorithm
We describe a third recovery method, along the lines of the algorithm proposed in [2] with slight modifications. The algorithm consists of two subroutines: a value identification phase, in which the nonzero values of the unknown signal $x$ are determined, and a second phase for identifying the support set of $x$. The method is based on measurements $y = [y_1; y_2] = [A_1; A_2]\,x$, where only $y_1$ is used in the first phase, and $y_1$ and $y_2$ are used in the second phase. For details of this method, please refer to [2]. We analyze this algorithm for the measurement structures proposed in this paper; this analysis differs from that of [2].
Theorem 3.4 (Weak Recovery for M&M).
Let $k \le k_2(n, d, \alpha, p)$ be an integer, and let $A = [A_1; A_2]$, where $A_1$ and $A_2$ correspond to a random summary codebook with $m$ subsets and a complete summary codebook, respectively. Then, a random nonnegative $k$-sparse distinguishable vector is recovered by Alg. 2 with probability at least the bound given in (10).
Proof.
Let the $n$-bit binary labels associated with the support set of $x$ be $\ell_1, \dots, \ell_k$, and let $\mathcal{C}$ be the summary codebook on which $A_1$ is based. It can be shown that the value identification subroutine of Alg. 2 correctly identifies all nonzero values of the nonnegative vector $x$ if every nonzero value of $x$ appears at least once in the observation vector $y_1$. We find the probability that this condition holds when the subsets of the random codebook are chosen at random. For every $i \in \{1, \dots, k\}$, we define the following set of subsets of $\{1, \dots, n\}$:
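$$T_i = \Big\{ S \subseteq \{1, \dots, n\} : |S| = d,\ (\ell_j)_S \neq (\ell_i)_S \ \text{ for all } j \neq i \Big\}. \tag{8}$$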
If a subset in the codebook belongs to $T_i$, then the nonzero entry of $x$ that corresponds to the label $\ell_i$ appears in the observation vector $y_1$. Therefore, we are interested in the probability that the set of subsets of $\mathcal{C}$ has a nonempty overlap with every $T_i$. Let us assume that, for some $\alpha$, the following holds:
$$|T_i| \ge \alpha \binom{n}{d}, \qquad i = 1, \dots, k. \tag{9}$$
When a subset is chosen at random, the probability that it belongs to $T_i$ is at least $\alpha$. Therefore, the probability that $T_i$ does not overlap with the set of all subsets appearing in $\mathcal{C}$ is at most $(1 - \alpha)^m$. Using a union bound over all $i$, we conclude that the probability that this undesirable event happens for at least one of the sets $T_1, \dots, T_k$ is at most $k(1 - \alpha)^m$, which means that the probability of success is at least $1 - k(1 - \alpha)^m$. However, (9) itself holds only with high probability: from the definition of $k_2$ and the fact that $k \le k_2(n, d, \alpha, p)$, it holds with probability at least $1 - p$. Therefore, the overall probability of success is at least:
(10) 
The complexity of Alg. 2 grows very rapidly with the sparsity level $k$, so the algorithm becomes impractical as $k$ grows.
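The condition analyzed in the proof of Theorem 3.4 is easy to test on small instances: a codebook subset that lies in $T_i$ from (8) makes the value $x_{\ell_i}$ appear verbatim in $y_1$. The sketch below computes $T_i$ restricted to a given list of subsets and checks whether every nonzero value of $x$ occurs in $y_1$. It assumes 0-based bit positions and a hypothetical distinguishable signal, and the function names are illustrative only.

```python
from itertools import combinations, product


def isolating_subsets(labels, i, subsets):
    """T_i of (8), restricted to `subsets`: subsets S on which label i differs from
    every other support label (0-based bit positions)."""
    li = labels[i]
    return [S for S in subsets
            if all(tuple(lj[s] for s in S) != tuple(li[s] for s in S)
                   for j, lj in enumerate(labels) if j != i)]


def values_appear(x, subsets, tol=1e-9):
    """Value-identification condition: every nonzero value of x occurs among the
    entries of y_1 = A_1 x, where A_1 is built from `subsets` and all 2^d patterns."""
    y1 = [sum(v for lab, v in x.items() if tuple(lab[s] for s in S) == b)
          for S in subsets for b in product((0, 1), repeat=len(S))]
    return all(any(abs(v - e) <= tol for e in y1) for v in x.values())


x = {(0, 0, 0, 0): 1.0, (0, 0, 0, 1): 2.0}  # hypothetical distinguishable signal
labels = list(x)
full = list(combinations(range(4), 2))
print(isolating_subsets(labels, 0, full))  # [(0, 3), (1, 3), (2, 3)]
print(values_appear(x, [(0, 1)]), values_appear(x, [(0, 3)]))  # False True
```

With the subset $\{0, 1\}$ the two nonzero entries always collide and only their sum 3.0 is observed, whereas any subset containing the last bit separates them.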
4 Recovery Bounds
We derive recovery bounds for (2) and Algs. 1 and 2 by obtaining explicit bounds on the quantities $k_1$, $k_2$, and $k_3$ of Definition 2 and substituting them into the recovery guarantees of Section 3, namely Theorems 3.1 to 3.4. The proof of the following lemma is based on combinatorial techniques and Chernoff concentration bounds.
Lemma 4.1.
Let and be integers and . Also, let , and . Then,

.

.

.
By substituting the expressions of the above lemma into Theorems 3.1 to 3.4, we obtain the following bounds for the different methods:
Basis Pursuit. If a complete summary codebook is used to build $A$, then the number of measurements is $M = \binom{n}{d} 2^d$, and every sparsity level $k \le k_1(n, d-1)$ is guaranteed to be recovered. Putting these together (recall that $N = 2^n$), an upper bound on the required number of measurements for a recoverable sparsity $k$ is given by:
(11) 
In particular, for small values of $k$, the above bound is comparable with the corresponding bound for ℓ1 minimization with random Gaussian matrices [19].
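To get a feel for the measurement count of a complete codebook, note that $M = m 2^d$ with $m = \binom{n}{d}$, which for a fixed order $d$ grows only polynomially in $n = \log_2 N$. The small arithmetic sketch below evaluates this count for a few illustrative (hypothetical) choices of $n$ and $d$ and prints it next to $N = 2^n$. It does not evaluate (11) itself, which additionally depends on the bounds of Lemma 4.1.

```python
from math import comb


def complete_codebook_measurements(n, d):
    """M = m * 2^d with m = C(n, d) subsets, for the complete order-d codebook."""
    return comb(n, d) * 2 ** d


for n, d in [(20, 2), (20, 3), (30, 3)]:
    print(n, d, complete_codebook_measurements(n, d), 2 ** n)
# 20 2 760 1048576
# 20 3 9120 1048576
# 30 3 32480 1073741824
```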
SSII Algorithm. We focus on the weak bound, namely the one obtained from Theorem 3.3. The general strategy is to take the values of $\alpha$ and $p$ according to Lemma 4.1, and to choose the codebook parameters so that, first, $\alpha$ is bounded away from zero, and second, the probability of recovery failure approaches zero as $n \to \infty$. With an appropriate choice of the number of subsets $m$, a few basic algebraic steps lead to the following:
(12) 
It follows that the above expression approaches zero under the corresponding condition on the parameters, and the slack in this condition can be chosen arbitrarily close to zero. Therefore, an upper bound on the required number of measurements for successful recovery with high probability is given by:
(13) 
M&M Algorithm. Taking the parameters according to Lemma 4.1, it follows that:
(14) 
which vanishes asymptotically under the corresponding condition on the parameters. Recall that the number of measurements in this case is determined by the matrix $A$ described in Theorem 3.4. Therefore, an upper bound on the required number of measurements for successful recovery with high probability is given by:
(15) 
In particular, for small values of $k$, both the required number of measurements and the running time of the algorithm (see Section 3) are almost optimal.
5 Simulations
Since Alg. 2 is only efficient for very small values of $k$, we present the empirical performance of Alg. 1. Due to the efficiency of the method, it is possible to perform simulations for very large values of $N$. In Figure 2, the empirically required oversampling rate for Alg. 1 with the proposed constructions is plotted versus the signal dimension $N$, for various sparsity levels $k$. The success criterion here is that the probability of successful recovery exceeds 90%. Note that when $N$ is increased by orders of magnitude, the required number of measurements increases only by a factor of 3, which indicates a logarithmic dependence of $M$ on $N$. Furthermore, as the signal becomes less sparse (i.e., as $k$ increases), the required oversampling factor decreases. Across the simulated settings, this ratio lies between about 3 and about 8, which is significantly better than existing sublinear recovery algorithms. Note that the best construction parameters of the measurement matrices for every $N$ are found empirically.
In Figure 3, the probability of successful recovery is plotted against the sparsity level $k$ for a fixed signal dimension and two different numbers of measurements. We can see that although the number of measurements increases only modestly, the recoverable sparsity (at a fixed probability of success) improves in some cases by a considerably larger factor. These curves are comparable with the displayed performance of ℓ1 minimization over dense matrices, which indicates the strong performance of the proposed scheme.
References
 [1] Compressed Sensing Online Resources, http://dsp.rice.edu/cs.
 [2] S. Jagabathula and D. Shah, Inferring popular rankings under constrained sensing, NIPS, 2008.
 [3] X. Jiang, Y. Yao and L. Guibas, Stable Identification of Cliques with Restricted Sensing, NIPS 2009.
 [4] V. Cevher, Learning with Compressible Priors, NIPS 2009.
 [5] M. Stojnic, W. Xu, and B. Hassibi, Compressed Sensing: Probabilistic Analysis of a Null-Space Characterization, ICASSP 2008.
 [6] Y. Eldar, Compressed Sensing of Analog Signals in Shift-Invariant Spaces, IEEE Trans. on Signal Processing, 57(8), 2986–2997.
 [7] J. Laska, S. Kirolos, M. Duarte, T. Ragheb, R. Baraniuk, and Y. Massoud, Theory and Implementation of an Analog-to-Information Converter Using Random Demodulation, ISCAS 2007.
 [8] M. Mishali and Y. C. Eldar, From Theory to Practice: Sub-Nyquist Sampling of Sparse Wideband Analog Signals, IEEE J. of Selected Topics in Signal Processing, 4(2), 375–391, 2010.
 [9] J. Luo, Y. Lu, and B. Prabhakar, Prototyping Counter Braids on NetFPGA, Tech. Rep., 2008.
 [10] J. Yoo, S. Becker, E. Candès, and A. Emami, A Random Modulation Pre-Integration Receiver for Sub-Nyquist Rate Signal Acquisition, Preprint, 2010.
 [11] E. J. Candès and T. Tao, Decoding by Linear Programming, IEEE Trans. on Information Theory, 51, 4203–4215.
 [12] A. Gilbert, M. Strauss, J. Tropp, and R. Vershynin, One Sketch for All: Fast Algorithms for Compressed Sensing, STOC 2007.
 [13] S. Sarvotham, D. Baron, and R. Baraniuk, Sudocodes: Fast Measurement and Reconstruction of Sparse Signals, ISIT 2006.
 [14] W. Dai, O. Milenkovic, and H. Pham, Structured Sublinear Compressive Sensing via Dense Belief Propagation, Preprint, 2011.
 [15] G. Cormode and S. Muthukrishnan, Combinatorial Algorithms for Compressed Sensing, CISS 2006.
 [16] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss, Combining Geometry and Combinatorics: A Unified Approach to Sparse Signal Recovery, Allerton 2008.
 [17] J. Blanchard, C. Cartis, and J. Tanner, Compressed Sensing: How Sharp Is the Restricted Isometry Property?, SIAM Review, 53, 105–125, 2010.
 [18] A. Khajehnejad, W. Xu, A. G. Dimakis, and B. Hassibi, Sparse Recovery of Positive Signals with Minimal Expansion, IEEE Trans. on Signal Processing, 59(1), 196–208, 2010.
 [19] V. Chandrasekaran, B. Recht, P. Parrilo and A. Willsky, The Convex Geometry of Linear Inverse Problems, available at arXiv:1012.0621v1.