SHO-FA: Robust compressive sensing with order-optimal complexity, measurements, and bits
Suppose is any exactly -sparse vector in . We present a class of “sparse” matrices , and a corresponding algorithm that we call SHO-FA (for Short and Fast) [Footnote 1: Also, SHO-FA sho good! In fact, it’s all !] that, with high probability over , can reconstruct from . The SHO-FA algorithm is related to the Invertible Bloom Lookup Tables (IBLTs) recently introduced by Goodrich et al., with two important distinctions – SHO-FA relies on linear measurements, and is robust to noise. The SHO-FA algorithm is the first to simultaneously have the following properties: (a) it requires only measurements, (b) the bit-precision of each measurement and each arithmetic operation is (here corresponds to the desired relative error in the reconstruction of ), (c) the computational complexity of decoding is arithmetic operations and that of encoding is arithmetic operations, and (d) if the reconstruction goal is simply to recover a single component of instead of all of , with significant probability over this can be done in constant time. All constants above are independent of all problem parameters other than the desired probability of success. For a wide range of parameters these properties are information-theoretically order-optimal. In addition, our SHO-FA algorithm works over fairly general ensembles of “sparse random matrices”, and is robust to random noise and to (random) approximate sparsity for a large range of . In particular, suppose the measured vector equals , where and correspond respectively to the source tail and measurement noise. Under reasonable statistical assumptions on and our decoding algorithm reconstructs with an estimation error of . The SHO-FA algorithm works with high probability over , , and , and still requires only steps and measurements over -bit numbers. This is in contrast to most existing algorithms which focus on the “worst-case” model, where it is known that measurements over -bit numbers are necessary.
Our algorithm has good empirical performance, as validated by simulations. [Footnote 2: A preliminary version of this work was presented in [BakJCC:12]. In parallel and independently of this work, an algorithm with very similar design and performance was proposed and presented at the same venue in [PawR:12].]
In recent years, spurred by the seminal work on compressive sensing of [CanRT:06, Don:06], much attention has focused on the problem of reconstructing a length- “compressible” vector over with fewer than linear measurements. In particular, it is known (e.g. [Can:08, BarDDW:08]) that with linear measurements one can computationally efficiently [Footnote 3: The caveat is that the reconstruction techniques require one to solve an LP. Though polynomial-time algorithms to solve LPs are known, they are generally considered to be impractical for large problem instances.] obtain a vector such that the reconstruction error is , [Footnote 4: In fact this is the so-called guarantee. One can also prove stronger reconstruction guarantees for algorithms with similar computational performance, and it is known that a reconstruction guarantee is not possible if the algorithm is required to be zero-error [CohDD:09], but is possible if some (small) probability of error is allowed [GilLPS:10, PriW:11].] where is the best possible -sparse approximation to (specifically, the non-zero terms of correspond to the largest components of in magnitude, hence corresponds to the “tail” of ). A number of different classes of algorithms are able to give such performance, such as those based on -optimization (e.g. [CanRT:06, Don:06]), and those based on iterative “matching pursuit” (e.g. [TroG:07, DonTDS:12]). Similar results, with an additional additive term in the reconstruction error, hold even if the linear measurements themselves also have noise added to them (e.g. [Can:08, BarDDW:08]). The fastest of these algorithms use ideas from the theory of expander graphs, and have running time [BerIR:08, BerI:09, GilI:10].
The class of results summarized above is indeed very strong – the results hold for all vectors, including those with “worst-case tails”, i.e. even vectors where the components of smaller than the largest coefficients (which can be thought of as “source tail”) are chosen in a maximally worst-case manner. In fact [BaIPW:10] proves that to obtain a reconstruction error that scales linearly with the -norm of the , the tail of , requires linear measurements.
Number of measurements: However, depending on the application, such a lower bound based on “worst-case ” may be unduly pessimistic. For instance, it is known that if is exactly -sparse (has exactly non-zero components, and hence ), then based on Reed-Solomon codes [ReeS:60] one can efficiently reconstruct with noiseless measurements (e.g. [ParH:08]) via algorithms with decoding time-complexity , or via codes such as in [KudP:10, MitG:12] with noiseless measurements with decoding time-complexity . [Footnote 5: In general the linear systems produced by Reed-Solomon codes are ill-conditioned, which causes problems for large .] In the regime where [JafWHC:09] use the “sparse-matrix” techniques of [BerIR:08, BerI:09, GilI:10] to demonstrate that measurements suffice to reconstruct .
Noise: Even if the source is not exactly -sparse, a spate of recent work has taken a more information-theoretic view than the coding-theoretic/worst-case point-of-view espoused by much of the compressive sensing work thus far. Specifically, suppose the length- source vector is the sum of any exactly -sparse vector and a “random” source noise vector (and possibly the linear measurement vector also has a “random” measurement noise vector added to it). Then as long as the noise variances are not “too much larger” than the signal power, the work of [AkcT:10] demonstrates that measurements suffice (though the proofs in [AkcT:10] are information-theoretic and existential – the corresponding “typical-set decoding” algorithms require time exponential in ). Indeed, even the work of [BaIPW:10], whose primary focus was to prove that linear measurements are necessary to reconstruct in the worst case, also notes as an aside that if corresponds to an exactly sparse vector plus random noise, then in fact measurements suffice. The work in [WuV:10, WuV:11] examines this phenomenon information-theoretically by drawing a nice connection with the Rényi information dimension of the signal/noise. The heuristic algorithms in [KrzMSSZ:12] indicate that approximate message passing algorithms achieve this performance computationally efficiently (in time ), and [DonJM:11] prove this rigorously. Corresponding lower bounds showing samples are required in the higher noise regime are provided in [FleRG:09, Wai:09].
Number of measurement bits: However, most of the works above focus on minimizing the number of linear measurements in , rather than the more information-theoretic view of trying to minimize the number of bits in over all measurements. Some recent work attempts to fill this gap – notably “Counting Braids” [LuMPDK:08, YiMP:08] (this work uses “multi-layered non-linear measurements”), and “one-bit compressive sensing” [PlaV:11, JacLBB:11] (the corresponding decoding complexity is somewhat high (though still polynomial-time) since it involves solving an LP).
Decoding time-complexity: The emphasis of the discussion thus far has been on the number of linear measurements/bits required to reconstruct . The decoding algorithms in most of the works above have decoding time-complexities [Footnote 6: For ease of presentation, in accordance with common practice in the literature, in this discussion we assume that the time-complexity of performing a single arithmetic operation is constant. Explicitly taking the complexity of performing finite-precision arithmetic into account adds a multiplicative factor (corresponding to the precision with which arithmetic operations are performed) in the time-complexity of most of the works, including ours.] that scale at least linearly with . In regimes where is significantly smaller than , it is natural to wonder whether one can do better. Indeed, algorithms based on iterative techniques answer this in the affirmative. These include Chaining Pursuit [GilSTV06], group-testing based algorithms [CorM:06], and Sudocodes [SarBB:06] – each of these has decoding time-complexity that can be sub-linear in (but at least ), but each requires at least linear measurements.
Database query: Finally, we consider a database query property that is not often of primary concern in the compressive sensing literature. That is, suppose one is given a compressive sensing algorithm that is capable of reconstructing with the desired reconstruction guarantee. Now suppose that one instead wishes to reconstruct, with reasonably high probability, just “a few” (constant number) specific components of , rather than all of it. Is it possible to do so even faster (say in constant time) – for instance, if the measurements are in a database, and one wishes to query it in a computationally efficient manner? If the matrix is “dense” (most of its entries are non-zero) then one can directly see that this is impossible. However, several compressive sensing algorithms (for instance [JafWHC:09]) are based on “sparse” matrices , and it can be shown that in fact these algorithms do indeed have this property “for free” (as indeed does our algorithm), even though the authors do not analyze this. As can be inferred from the name, this database query property is more often considered in the database community, for instance in the work on IBLTs [GooM:11].
I-A Our contributions
Conceptually, the “iterative decoding” technique we use is not new. Similar ideas have been used in various settings in, for instance, [Spi:95, Pri:11, GooM:11, KudP:10]. However, to the best of our knowledge, no prior work matches the performance of our work, namely an information-theoretically order-optimal number of measurements, number of bits in those measurements, and time-complexity, for the problem of reconstructing a sparse signal (or a sparse signal with a noisy tail and noisy measurements) via linear measurements (along with the database query property). The key to this performance is our novel design of “sparse random” linear measurements, as described in the section on the noiseless setting.
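The flavor of such iterative ("peeling") decoding can be illustrated with a toy sketch in Python. This is not the SHO-FA construction itself: the three integer measurements per node (plain sum, index-weighted sum, and a check sum with weights `w`), the neighbor lists `nbrs`, and the function names `encode` and `peel_decode` are all assumptions made for illustration, in the spirit of IBLT-style decoders. A measurement node that contains exactly one non-zero component reveals both its index and its value; that component is then subtracted from its other nodes, which may expose new single-component nodes.

```python
def encode(x, nbrs, m, w):
    """Per node, store three linear measurements of x:
    plain sum, index-weighted sum, and a check sum with weights w."""
    y = [[0, 0, 0] for _ in range(m)]
    for j, xj in enumerate(x):
        if xj == 0:
            continue
        for i in nbrs[j]:  # nbrs[j]: measurement nodes touched by column j
            y[i][0] += xj
            y[i][1] += j * xj
            y[i][2] += w[j] * xj
    return y


def peel_decode(y, nbrs, n, w):
    """Repeatedly find nodes holding a single non-zero component,
    read it off, and subtract it from all of its nodes."""
    y = [row[:] for row in y]  # work on a copy
    x_hat = [0] * n
    progress = True
    while progress:
        progress = False
        for i, (s, t, v) in enumerate(y):
            if s == 0 or t % s != 0:
                continue  # empty node, or clearly not a single component
            j = t // s  # candidate index if the node is a singleton
            if not (0 <= j < n) or i not in nbrs[j] or v != w[j] * s:
                continue  # candidate fails the consistency checks
            x_hat[j] += s
            for i2 in nbrs[j]:  # peel x_j off every node it touches
                y[i2][0] -= s
                y[i2][1] -= j * s
                y[i2][2] -= w[j] * s
            progress = True
    return x_hat


# Hand-built example: column 1 shares both of its nodes with other
# non-zero columns, so it is only recoverable after peeling.
nbrs = [[0, 1], [1, 2], [2, 3], [0, 3]]
w = [3, 11, 5, 8]  # arbitrary check weights (hypothetical)
x = [4, -2, 7, 0]
print(peel_decode(encode(x, nbrs, 4, w), nbrs, 4, w))  # [4, -2, 7, 0]
```

Integer arithmetic keeps the toy exact; handling real-valued signals, finite precision, and noise is precisely what the measurement design in the paper addresses, and this sketch does not attempt it.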
To summarize, the desirable properties of SHO-FA are that, with high probability [Footnote 7: For most of the properties, we show that this probability is at least , though we explicitly prove only .]:
Number of measurements: For every -sparse , with high probability over , linear measurements suffice to reconstruct . This is information-theoretically order-optimal.
Number of measurement bits: The total number of bits in required to reconstruct to a relative error of is . This is information-theoretically order-optimal for any (for any ).
Decoding time-complexity: The total number of arithmetic operations required is . This is information-theoretically order-optimal.
“Database-type queries”: With constant probability any single “database-type query” can be answered in constant time. That is, the value of a single component of can be reconstructed in constant time with constant probability. [Footnote 8: The constant can be made arbitrarily close to zero, at the cost of a multiplicative factor in the number of measurements required. In fact, if we allow the number of measurements to scale as , we can support any number of database queries, each in constant time, with every one being answered correctly with probability at least .]
Encoding/update complexity: The computational complexity of generating from and is , and if changes to some in locations, the computational complexity of updating to is . Both of these are information-theoretically order-optimal.
Noise: Suppose and have i.i.d. components [Footnote 9: Even if the components of and are not i.i.d. Gaussian, statements with a similar flavor can be made. For instance, pertaining to the effect of the distribution of , it turns out that our analysis is sensitive only to the distribution of the sum of components of , rather than to the components themselves. Hence, for example, if the components of are i.i.d. non-Gaussian, it turns out that via the Berry-Esseen theorem [BerE] one can derive similar results to the ones derived in this work. In another direction, if the components of are not i.i.d. but do satisfy some “regularity constraints”, then using Bernstein’s inequality [Bernstein] one can again derive analogous results. However, these arguments are more sensitive and outside the scope of this paper, where the focus is on simpler models.] drawn respectively from and . For every and for for any , a modified version of SHO-FA (SHO-FA-NO) with high probability reconstructs with an estimation error of . [Footnote 10: As noted in Footnote 4, this reconstruction guarantee implies the weaker reconstruction guarantee .]
Practicality: As validated by simulations (shown in the appendix on simulations), most of the constant factors involved above are not large.
Different bases: As is common in the compressive sensing literature, our techniques generalize directly to the setting wherein is sparse in an alternative basis (say, for example, in a wavelet basis).
Universality: While we present a specific ensemble of matrices over which SHO-FA operates, we argue that in fact similar algorithms work over fairly general ensembles of “sparse random matrices” (discussed in a later section), and further that such matrices can occur in applications, for instance in wireless MIMO systems [Guo:10] (a base-station example is given in an accompanying figure) and Network Tomography [XuMT:11].
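The encoding/update property listed above follows from the sparsity of the measurement matrix, which a short Python sketch makes concrete. The column-list representation `cols`, the function names `measure` and `update`, and the small integer entries are assumptions chosen for illustration, not the ensemble used in the paper. If each column has only a few non-zero entries, a change in one component of the signal touches only those few measurements.

```python
def measure(x, cols, m):
    """Compute the measurement vector for a column-sparse matrix.
    cols[j] lists the (row, value) pairs of the non-zeros in column j."""
    y = [0] * m
    for j, xj in enumerate(x):
        for i, a in cols[j]:
            y[i] += a * xj
    return y


def update(y, cols, j, delta):
    """Update y in place after component j of the signal changes by delta.
    Only the rows where column j is non-zero are touched."""
    for i, a in cols[j]:
        y[i] += a * delta


# Hypothetical 3 x 3 matrix with two non-zeros per column.
cols = [[(0, 1), (2, 3)], [(1, 2), (2, 1)], [(0, 4), (1, 1)]]
x = [1, 2, 3]
y = measure(x, cols, 3)

# Change the second component by +5 and patch y without re-encoding.
update(y, cols, 1, 5)
print(y == measure([1, 7, 3], cols, 3))  # True
```

The cost of `measure` is proportional to the total number of non-zeros in the matrix, and the cost of `update` to the number of non-zeros in a single column, which is a constant whenever the column weight is bounded.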
|Reference||Reconstruction||# Measurements||# Decoding steps||Precision|
|Wu-Verdú ’10 [WuV:10]||R||R||R||Exact||–|
|Donoho et al. [DonJM:11]||R||R||R||Exact||o(1)||–|
|Cohen et al. [CohDD:09]||D||D||D||–||–|
|Ba et al. [BaIPW:10]||D/R||D||D||–|
|Ba et al. [BaIPW:10]||R||D||R|
|Baraniuk et al. [BarDDW:08]|
|Indyk et al. [IndR:08]||D||D||D||D||–|
|Akçakaya et al. [AkcT:08]||R||D||R||–|
|Sup. Rec.||Cond. on|
|Wu-Verdú ’11 [WuV:11]||R||R||R||R||–|
|Wainwright [Wai:09]||D||R||Sup. Rec.||–||–|
|Fletcher et al. [FleRG:09]||D||R||Sup. Rec.||–||–|
|Aeron et al. [AerSM:10]||D||R||Sup. Rec.||–|
|Jacques et al. [JacLBB:11]||R||D||sgn||1|
|Sarvotham et al. [SarBB:06]||R||D||Exact||–|
|Gilbert et al. [GilSTV06]||R||P.L.||P.L.||0||–|
|This work/Pawar et al. [PawR:12]||R||D||Exact|