SHO-FA: Robust compressive sensing with order-optimal complexity, measurements, and bits
Abstract
Suppose $\mathbf{x}$ is any exactly $k$-sparse vector in $\mathbb{R}^n$. We present a class of “sparse” measurement matrices $A$, and a corresponding algorithm that we call SHO-FA (for SHOrt and FAst^{1}^{1}1Also, SHO-FA sho’ good! In fact, it’s all $O(k)$!) that, with high probability over $A$, can reconstruct $\mathbf{x}$ from $A\mathbf{x}$. The SHO-FA algorithm is related to the Invertible Bloom Lookup Tables (IBLTs) recently introduced by Goodrich et al., with two important distinctions: SHO-FA relies on linear measurements, and is robust to noise. The SHO-FA algorithm is the first to simultaneously have the following properties: (a) it requires only $O(k)$ measurements, (b) the bit-precision of each measurement and each arithmetic operation is $O(\log n + P)$ (here $2^{-P}$ corresponds to the desired relative error in the reconstruction of $\mathbf{x}$), (c) the computational complexity of decoding is $O(k)$ arithmetic operations and that of encoding is $O(n)$ arithmetic operations, and (d) if the reconstruction goal is simply to recover a single component of $\mathbf{x}$ instead of all of it, with significant probability over $A$ this can be done in constant time. All constants above are independent of all problem parameters other than the desired probability of success. For a wide range of parameters these properties are information-theoretically order-optimal. In addition, our SHO-FA algorithm works over fairly general ensembles of “sparse random matrices”, and is robust to random noise and to (random) approximate sparsity for a large range of $k$. In particular, suppose the measured vector equals $A(\mathbf{x}+\mathbf{z})+\mathbf{e}$, where $\mathbf{z}$ and $\mathbf{e}$ correspond respectively to the source tail and measurement noise. Under reasonable statistical assumptions on $\mathbf{z}$ and $\mathbf{e}$ our decoding algorithm reconstructs $\mathbf{x}$ with an estimation error of $O(\|\mathbf{z}\|_2 + \|\mathbf{e}\|_2)$. The SHO-FA algorithm works with high probability over $A$, $\mathbf{z}$, and $\mathbf{e}$, and still requires only $O(k)$ steps and $O(k)$ measurements over $O(\log n)$-bit numbers. This is in contrast to most existing algorithms, which focus on the “worst-case” $\mathbf{z}$ model, in which it is known that $\Omega(k\log(n/k))$ measurements over high-precision numbers are necessary.
Our algorithm has good empirical performance, as validated by simulations.^{2}^{2}2A preliminary version of this work was presented in [BakJCC:12]. In parallel and independently of this work, an algorithm with very similar design and performance was proposed and presented at the same venue in [PawR:12].
I Introduction
In recent years, spurred by the seminal work on compressive sensing of [CanRT:06, Don:06], much attention has focused on the problem of reconstructing a length-$n$ “compressible” vector $\mathbf{x}$ over $\mathbb{R}$ with fewer than $n$ linear measurements. In particular, it is known (e.g. [Can:08, BarDDW:08]) that with $O(k\log(n/k))$ linear measurements one can computationally efficiently^{3}^{3}3The caveat is that the reconstruction techniques require one to solve an LP. Though polynomial-time algorithms to solve LPs are known, they are generally considered to be impractical for large problem instances. obtain a vector $\hat{\mathbf{x}}$ such that the reconstruction error $\|\mathbf{x}-\hat{\mathbf{x}}\|_1$ is $O(\|\mathbf{x}-\mathbf{x}_k\|_1)$,^{4}^{4}4In fact this is the so-called $\ell_1/\ell_1$ guarantee. One can also prove stronger reconstruction guarantees for algorithms with similar computational performance, and it is known that an $\ell_2/\ell_2$ reconstruction guarantee is not possible if the algorithm is required to be zero-error [CohDD:09], but is possible if some (small) probability of error is allowed [GilLPS:10, PriW:11]. where $\mathbf{x}_k$ is the best possible $k$-sparse approximation to $\mathbf{x}$ (specifically, the nonzero terms of $\mathbf{x}_k$ correspond to the $k$ largest components of $\mathbf{x}$ in magnitude, hence $\mathbf{x}-\mathbf{x}_k$ corresponds to the “tail” of $\mathbf{x}$). A number of different classes of algorithms are able to give such performance, such as those based on $\ell_1$ optimization (e.g. [CanRT:06, Don:06]), and those based on iterative “matching pursuit” (e.g. [TroG:07, DonTDS:12]). Similar results, with an additional additive term in the reconstruction error, hold even if the linear measurements themselves have noise added to them (e.g. [Can:08, BarDDW:08]). The fastest of these algorithms use ideas from the theory of expander graphs, and have running time $O(n\log(n/k))$ [BerIR:08, BerI:09, GilI:10].
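As a concrete illustration of the notation, the best $k$-sparse approximation of a vector and the $\ell_1$ norm of its “tail” can be computed as follows (a minimal NumPy sketch; the example vector and the choice $k=2$ are arbitrary):

```python
import numpy as np

def best_k_sparse(x, k):
    """Zero out all but the k largest-magnitude entries of x."""
    xk = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]   # indices of the k largest |x_i|
    xk[idx] = x[idx]
    return xk

x = np.array([5.0, -3.0, 0.1, 0.05, -0.02, 0.01])   # nearly 2-sparse
xk = best_k_sparse(x, 2)              # -> [5., -3., 0., 0., 0., 0.]
tail = np.linalg.norm(x - xk, 1)      # l1 norm of the tail, here 0.18
```

A worst-case tail makes `tail` as large as possible for a given energy budget, which is exactly the regime the lower bounds discussed below address.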
The class of results summarized above is indeed very strong: the guarantees hold for all vectors $\mathbf{x}$, including those with “worst-case tails”, i.e., even vectors in which the components of $\mathbf{x}$ smaller than the $k$ largest coefficients (which can be thought of as the “source tail”) are chosen in a maximally adversarial manner. In fact, [BaIPW:10] proves that obtaining a reconstruction error that scales linearly with the $\ell_1$ norm of $\mathbf{z}$, the tail of $\mathbf{x}$, requires $\Omega(k\log(n/k))$ linear measurements.
Number of measurements: However, depending on the application, such a lower bound based on a “worst-case” $\mathbf{z}$ may be unduly pessimistic. For instance, it is known that if $\mathbf{x}$ is exactly $k$-sparse (has exactly $k$ nonzero components, and hence $\mathbf{z}=\mathbf{0}$), then based on Reed-Solomon codes [ReeS:60] one can efficiently reconstruct $\mathbf{x}$ with $2k$ noiseless measurements (e.g. [ParH:08]) via algorithms with polynomial decoding time-complexity,^{5}^{5}5In general the linear systems produced by Reed-Solomon codes are ill-conditioned, which causes problems for large $n$. or via codes such as in [KudP:10, MitG:12] with $O(k)$ noiseless measurements and lower decoding time-complexity. In the regime where $k = \Theta(n)$, [JafWHC:09] use the “sparse-matrix” techniques of [BerIR:08, BerI:09, GilI:10] to demonstrate that $O(k)$ measurements suffice to reconstruct $\mathbf{x}$.
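The fact that $2k$ generic noiseless measurements identify an exactly $k$-sparse vector can be sanity-checked by brute force on tiny instances. The sketch below is only an illustration of this counting argument, not a Reed-Solomon decoder (which achieves the same identification in polynomial time); the sizes and seed are arbitrary:

```python
import itertools
import numpy as np

def brute_force_recover(A, y, k):
    """Recover a k-sparse x from y = A x by testing every size-k support.
    Exponential time -- only illustrates that 2k generic rows identify x."""
    n = A.shape[1]
    for support in itertools.combinations(range(n), k):
        sub = A[:, list(support)]
        sol = np.linalg.lstsq(sub, y, rcond=None)[0]
        if np.linalg.norm(sub @ sol - y) < 1e-9:   # exact fit found
            x_hat = np.zeros(n)
            x_hat[list(support)] = sol
            return x_hat
    return None

rng = np.random.default_rng(0)
n, k = 12, 2
A = rng.standard_normal((2 * k, n))    # 2k generic measurements
x = np.zeros(n)
x[3], x[7] = 1.5, -2.0                 # an exactly 2-sparse signal
x_hat = brute_force_recover(A, A @ x, k)
```

With generic (here Gaussian) rows, no incorrect size-$k$ support fits the $2k$ equations exactly, so the first exact fit is the true signal; the computational challenge addressed by coding-theoretic constructions is doing this search efficiently.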
Noise: Even if the source is not exactly sparse, a spate of recent work has taken a more information-theoretic view than the coding-theoretic/worst-case point of view espoused by much of the compressive sensing work thus far. Specifically, suppose the length-$n$ source vector is the sum of an exactly $k$-sparse vector $\mathbf{x}$ and a “random” source noise vector $\mathbf{z}$ (and possibly the linear measurement vector also has a “random” measurement noise vector $\mathbf{e}$ added to it). Then as long as the noise variances are not “too much larger” than the signal power, the work of [AkcT:10] demonstrates that $O(k)$ measurements suffice (though the proofs in [AkcT:10] are information-theoretic and existential; the corresponding “typical-set decoding” algorithms require time exponential in $n$). Indeed, even the work of [BaIPW:10], whose primary focus was to prove that $\Omega(k\log(n/k))$ linear measurements are necessary to reconstruct $\mathbf{x}$ in the worst case, also notes as an aside that if $\mathbf{x}$ corresponds to an exactly sparse vector plus random noise, then in fact $O(k)$ measurements suffice. The work in [WuV:10, WuV:11] examines this phenomenon information-theoretically by drawing a nice connection with the Rényi information dimension of the signal/noise. The heuristic algorithms in [KrzMSSZ:12] indicate that approximate message passing algorithms achieve this performance computationally efficiently, and [DonJM:11] prove this rigorously. Corresponding lower bounds, showing that more measurements are required in the higher-noise regime, are provided in [FleRG:09, Wai:09].
Number of measurement bits: However, most of the works above focus on minimizing the number of linear measurements in $A\mathbf{x}$, rather than the more information-theoretic goal of minimizing the total number of bits over all measurements. Some recent work attempts to fill this gap, notably “Counting Braids” [LuMPDK:08, YiMP:08] (this work uses “multi-layered non-linear measurements”) and “one-bit compressive sensing” [PlaV:11, JacLBB:11] (the corresponding decoding complexity is somewhat high (though still polynomial-time) since it involves solving an LP).
Decoding time-complexity: The emphasis of the discussion thus far has been on the number of linear measurements/bits required to reconstruct $\mathbf{x}$. The decoding algorithms in most of the works above have decoding time-complexities^{6}^{6}6For ease of presentation, in accordance with common practice in the literature, in this discussion we assume that the time-complexity of performing a single arithmetic operation is constant. Explicitly taking the complexity of performing finite-precision arithmetic into account adds a multiplicative factor (corresponding to the precision with which arithmetic operations are performed) to the time-complexity of most of the works, including ours. that scale at least linearly with $n$. In regimes where $k$ is significantly smaller than $n$, it is natural to wonder whether one can do better. Indeed, algorithms based on iterative techniques answer this in the affirmative. These include Chaining Pursuit [GilSTV06], group-testing based algorithms [CorM:06], and Sudocodes [SarBB:06]; each of these has decoding time-complexity that can be sublinear in $n$ (but is super-linear in $k$), and each requires a number of measurements that is super-linear in $k$.
Database query: Finally, we consider a database query property that is not often of primary concern in the compressive sensing literature. Suppose one is given a compressive sensing algorithm that is capable of reconstructing $\mathbf{x}$ with the desired reconstruction guarantee. Now suppose that one instead wishes to reconstruct, with reasonably high probability, just “a few” (a constant number of) specific components of $\mathbf{x}$, rather than all of it. Is it possible to do so even faster (say, in constant time), for instance if the measurements are stored in a database and one wishes to query it in a computationally efficient manner? If the matrix $A$ is “dense” (most of its entries are nonzero), then one can directly see that this is impossible. However, several compressive sensing algorithms (for instance [JafWHC:09]) are based on “sparse” matrices $A$, and it can be shown that these algorithms do indeed have this property “for free” (as indeed does our algorithm), even though the authors do not analyze this. As can be inferred from the name, this database query property is more often considered in the database community, for instance in the work on IBLTs [GooM:11].
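To make the constant-time query property concrete, here is a toy IBLT-flavored sketch. The hash function, the choice of $d=3$ nodes per index, and the integer-weighted second measurement are illustrative assumptions, not the construction analyzed in this paper: each signal index touches $d$ measurement nodes, and a point query inspects only those $d$ nodes, succeeding whenever one of them is a “singleton” (touched by no other nonzero entry).

```python
import numpy as np

def neighbors(i, m):
    """Toy hash: d = 3 distinct measurement nodes for signal index i."""
    return [(7 * i) % m, (7 * i + 3) % m, (7 * i + 6) % m]

def encode(x, m):
    """Per node: a plain sum s and an index-weighted sum w of its neighbors."""
    s, w = np.zeros(m), np.zeros(m)
    for i, xi in enumerate(x):
        if xi != 0:
            for node in neighbors(i, m):
                s[node] += xi
                w[node] += (i + 1) * xi
    return s, w

def query(j, s, w, m):
    """Constant-time point query: inspect only the d nodes touching x_j."""
    for node in neighbors(j, m):
        if abs(s[node]) > 1e-9 and abs(w[node] / s[node] - (j + 1)) < 1e-9:
            return s[node]       # node is a singleton holding exactly x_j
    return None                  # all d nodes collide -> query fails

n, m = 12, 10
x = np.zeros(n)
x[0], x[5], x[9] = 2.0, -1.5, 3.0
s, w = encode(x, m)
v = query(5, s, w, m)            # reads 3 of the 10 measurements -> -1.5
```

A query touches $d = O(1)$ measurements regardless of $n$, which is exactly what a dense $A$ cannot provide; the constant failure probability corresponds to all $d$ nodes colliding with other nonzero entries.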
I-A Our contributions
Conceptually, the “iterative decoding” technique we use is not new. Similar ideas have been used in various settings in, for instance, [Spi:95, Pri:11, GooM:11, KudP:10]. However, to the best of our knowledge, no prior work achieves the same performance as ours, namely an information-theoretically order-optimal number of measurements, number of bits in those measurements, and time-complexity, for the problem of reconstructing a sparse signal (or a sparse signal with a noisy tail and noisy measurements) via linear measurements, along with the database query property. The key to this performance is our novel design of “sparse random” linear measurements, as described in Section LABEL:sec:noiseless.
To summarize, the desirable properties of SHO-FA are that with high probability^{7}^{7}7For most of the properties, this probability is in fact higher than the bound we explicitly prove.:

Number of measurements: For every $k$-sparse $\mathbf{x}$, with high probability over $A$, $O(k)$ linear measurements suffice to reconstruct $\mathbf{x}$. This is information-theoretically order-optimal.

Number of measurement bits: The total number of bits in $A\mathbf{x}$ required to reconstruct $\mathbf{x}$ to a relative error of $2^{-P}$ is $O(k(\log n + P))$. This is information-theoretically order-optimal for a wide range of parameters.

Decoding time-complexity: The total number of arithmetic operations required is $O(k)$. This is information-theoretically order-optimal.

“Database-type queries”: With constant probability, any single “database-type query” can be answered in constant time. That is, the value of a single component of $\mathbf{x}$ can be reconstructed in constant time with constant probability.^{8}^{8}8The constant failure probability can be made arbitrarily close to zero, at the cost of a multiplicative factor in the number of measurements required. In fact, if we allow the number of measurements to scale super-linearly in $k$, we can support any number of database queries, each in constant time, with every one of them being answered correctly with high probability.

Encoding/update complexity: The computational complexity of generating $A\mathbf{x}$ from $\mathbf{x}$ and $A$ is $O(n)$, and if $\mathbf{x}$ changes to some $\mathbf{x}'$ in $O(1)$ locations, the computational complexity of updating $A\mathbf{x}$ to $A\mathbf{x}'$ is $O(1)$. Both of these are information-theoretically order-optimal.

Noise: Suppose $\mathbf{z}$ and $\mathbf{e}$ have i.i.d. Gaussian components.^{9}^{9}9Even if the statistical distributions of the components of $\mathbf{z}$ and $\mathbf{e}$ are not i.i.d. Gaussian, statements with a similar flavor can be made. For instance, pertaining to the effect of the distribution of $\mathbf{z}$, it turns out that our analysis is sensitive only to the distribution of sums of components of $\mathbf{z}$, rather than to the components themselves. Hence, for example, if the components of $\mathbf{z}$ are i.i.d. non-Gaussian, then via the Berry-Esseen theorem [BerE] one can derive results similar to the ones derived in this work. In another direction, if the components of $\mathbf{z}$ are not i.i.d. but do satisfy some “regularity constraints”, then using Bernstein’s inequality [Bernstein] one can again derive analogous results. However, these arguments are more sensitive and outside the scope of this paper, where the focus is on simpler models. Then a modified version of SHO-FA (SHO-FA-NO) with high probability reconstructs $\mathbf{x}$ with an estimation error of $O(\|\mathbf{z}\|_2 + \|\mathbf{e}\|_2)$.^{10}^{10}10As noted in Footnote 4, this reconstruction guarantee implies a weaker $\ell_1$ reconstruction guarantee.

Practicality: As validated by simulations (shown in Appendix LABEL:app:sim), most of the constant factors involved above are not large.

Different bases: As is common in the compressive sensing literature, our techniques generalize directly to the setting wherein $\mathbf{x}$ is sparse in an alternative basis (say, for example, a wavelet basis).

Universality: While we present a specific ensemble of matrices over which SHO-FA operates, we argue that in fact similar algorithms work over fairly general ensembles of “sparse random matrices” (see Section LABEL:sec:shofaint), and further that such matrices can occur in applications, for instance in wireless MIMO systems [Guo:10] (Figure LABEL:fig:base_station gives such an example) and network tomography [XuMT:11].
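The iterative decoding that underlies these guarantees can be sketched in a toy IBLT-style form: find a measurement node touched by exactly one unrecovered nonzero entry (a “singleton”, detectable here via the ratio of an index-weighted sum to a plain sum), read off that value, subtract it from the node’s neighbors, and repeat. The hash function, the choice $d = 3$, and the real-valued integer weighting below are illustrative assumptions; SHO-FA’s actual construction and identification scheme differ:

```python
import numpy as np

def neighbors(i, m):
    """Toy hash: d = 3 distinct measurement nodes for signal index i."""
    return [(7 * i) % m, (7 * i + 3) % m, (7 * i + 6) % m]

def encode(x, m):
    """2m linear measurements: per node, a plain sum and an index-weighted sum."""
    s, w = np.zeros(m), np.zeros(m)
    for i, xi in enumerate(x):
        if xi != 0:
            for node in neighbors(i, m):
                s[node] += xi
                w[node] += (i + 1) * xi
    return s, w

def peel_decode(s, w, n):
    """Repeatedly find singleton nodes, read off their values, and peel."""
    s, w, m = s.copy(), w.copy(), len(s)
    x_hat = np.zeros(n)
    progress = True
    while progress:
        progress = False
        for node in range(m):
            if abs(s[node]) < 1e-9:
                continue
            r = w[node] / s[node]          # for a singleton, r = j + 1 exactly
            j = int(round(r)) - 1
            if abs(r - round(r)) < 1e-9 and 0 <= j < n and node in neighbors(j, m):
                val = s[node]
                x_hat[j] += val            # recover x_j ...
                for nb in neighbors(j, m): # ... then subtract it off its d nodes
                    s[nb] -= val
                    w[nb] -= (j + 1) * val
                progress = True
    return x_hat

n, m = 12, 10                      # m = O(k) nodes for a k-sparse signal
x = np.zeros(n)
x[0], x[5], x[9] = 2.0, -1.5, 3.0
x_hat = peel_decode(*encode(x, m), n)
```

Each recovered entry is processed in $O(1)$ work, which is the intuition behind the $O(k)$ decoding complexity; the analysis in the paper shows the peeling process completes with high probability over the random measurement design.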
Reference | Model ($\mathbf{x}$ / $A$ / noise) | Reconstruction goal | # Measurements | # Decoding steps | Precision
--- | --- | --- | --- | --- | ---
Reed-Solomon [ReeS:60] | D, D | Exact | – | [Ale:05] | –
Singleton [Sin:64] | D/R, D | Exact | – | – | –
Mitzenmacher-Varghese [MitG:12] | R, D | Exact | – | – | –
Kudekar-Pfister [KudP:10] | R, D | Exact | – | – | –
Tropp-Gilbert [TroG:07] | G, D | Exact | – | – | –
Wu-Verdú ’10 [WuV:10] | R, R, R | Exact | – | – | –
Donoho et al. [DonJM:11] | R, R, R | Exact | – | – | o(1)
Cormode-Muthukrishnan [CorM:06] | R, D | – | – | – | –
Cohen et al. [CohDD:09] | D, D, D | – | – | – | –
Price-Woodruff [PriW:11] | D, D, D | – | – | – | –
Ba et al. [BaIPW:10] | D/R, D, D | – | – | – | –
Ba et al. [BaIPW:10] | R, D, R | – | – | – | –
Candès [Can:08], Baraniuk et al. [BarDDW:08] | R, D, D, D | – | – | LP | –
Indyk et al. [IndR:08] | D, D, D, D | – | – | – | –
Akçakaya et al. [AkcT:08] | R, D, R | Sup. Rec. (cond.) | – | – | –
Wu-Verdú ’11 [WuV:11] | R, R, R, R | – | – | – | –
Wainwright [Wai:09] | D, R | Sup. Rec. | – | – | –
Fletcher et al. [FleRG:09] | D, R | Sup. Rec. | – | – | –
Aeron et al. [AerSM:10] | D, R | Sup. Rec. | – | – | –
Plan-Vershynin [PlaV:11] | R, D | sgn | – | LP | 1
Jacques et al. [JacLBB:11] | R, D | sgn | – | – | 1
Sarvotham et al. [SarBB:06] | R, D | Exact | – | – | –
Gilbert et al. [GilSTV06] | R | P.L. | – | P.L. | 0
This work/Pawar et al. [PawR:12] | R, D | Exact | $O(k)$ | $O(k)$ | $O(\log n)$
This work (noisy setting) | R, D, R, R | Exact | $O(k)$ | $O(k)$ | $O(\log n)$

(D = deterministic, R = random, G = Gaussian; Sup. Rec. = support recovery.)