Distributed Compressive Sensing

Dror Baron, Marco F. Duarte, Michael B. Wakin,
Shriram Sarvotham, and Richard G. Baraniuk¹

¹ This work was supported by the grants NSF CCF-0431150 and CCF-0728867, DARPA HR0011-08-1-0078, DARPA/ONR N66001-08-1-2065, ONR N00014-07-1-0936 and N00014-08-1-1112, AFOSR FA9550-07-1-0301, ARO MURI W311NF-07-1-0185, and the Texas Instruments Leadership University Program. Preliminary versions of this work have appeared at the Allerton Conference on Communication, Control, and Computing [1], the Asilomar Conference on Signals, Systems and Computers [2], the Conference on Neural Information Processing Systems [3], and the Workshop on Sensor, Signal and Information Processing [4].
E-mail: drorb@ee.technion.ac.il, {duarte, shri, richb}@rice.edu, mwakin@mines.edu; Web: dsp.rice.edu/cs

Department of Electrical Engineering, Technion – Israel Institute of Technology, Haifa, Israel
Department of Electrical and Computer Engineering, Rice University, Houston, TX
Division of Engineering, Colorado School of Mines, Golden, CO
Halliburton, Houston, TX
Abstract

Compressive sensing is a signal acquisition framework based on the revelation that a small collection of linear projections of a sparse signal contains enough information for stable recovery. In this paper we introduce a new theory for distributed compressive sensing (DCS) that enables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and inter-signal correlation structures. The DCS theory rests on a new concept that we term the joint sparsity of a signal ensemble. Our theoretical contribution is to characterize the fundamental performance limits of DCS recovery for jointly sparse signal ensembles in the noiseless measurement setting; our result connects single-signal, joint, and distributed (multi-encoder) compressive sensing. To demonstrate the efficacy of our framework and to show that additional challenges such as computational tractability can be addressed, we study in detail three example models for jointly sparse signals. For these models, we develop practical algorithms for joint recovery of multiple signals from incoherent projections. In two of our three models, the results are asymptotically best-possible, meaning that both the upper and lower bounds match the performance of our practical algorithms. Moreover, simulations indicate that the asymptotics take effect with just a moderate number of signals. DCS is immediately applicable to a range of problems in sensor arrays and networks.

This paper is dedicated to the memory of Hyeokho Choi, our colleague, mentor, and friend.

Keywords: Compressive sensing, distributed source coding, sparsity, random projection, random matrix, linear programming, array processing, sensor networks.

1 Introduction

A core tenet of signal processing and information theory is that signals, images, and other data often contain some type of structure that enables intelligent representation and processing. The notion of structure has been characterized and exploited in a variety of ways for a variety of purposes. In this paper, we focus on exploiting signal correlations for the purpose of compression.

Current state-of-the-art compression algorithms employ a decorrelating transform such as an exact or approximate Karhunen-Loève transform (KLT) to compact a correlated signal’s energy into just a few essential coefficients [5, 6, 7]. Such transform coders exploit the fact that many signals have a sparse representation in terms of some basis, meaning that a small number of adaptively chosen transform coefficients can be transmitted or stored rather than signal samples. For example, smooth signals are sparse in the Fourier basis, and piecewise smooth signals are sparse in a wavelet basis [8]; the commercial coding standards MP3 [9], JPEG [10], and JPEG2000 [11] directly exploit this sparsity.

1.1 Distributed source coding

While the theory and practice of compression have been well developed for individual signals, distributed sensing applications involve multiple signals, for which there has been less progress. Such settings are motivated by the proliferation of complex, multi-signal acquisition architectures, such as acoustic and RF sensor arrays, as well as sensor networks. These architectures sometimes involve battery-powered devices, which restrict the communication energy, and high aggregate data rates, limiting bandwidth availability; both factors make the reduction of communication critical.

Fortunately, since the sensors presumably observe related phenomena, the ensemble of signals they acquire can be expected to possess some joint structure, or inter-signal correlation, in addition to the intra-signal correlation within each individual sensor’s measurements. In such settings, distributed source coding that exploits both intra- and inter-signal correlations might allow the network to save on the communication costs involved in exporting the ensemble of signals to the collection point [12, 13, 14, 15]. A number of distributed coding algorithms have been developed that involve collaboration amongst the sensors [16, 17, 18, 19]. Note, however, that any collaboration involves some amount of inter-sensor communication overhead.

In the Slepian-Wolf framework for lossless distributed coding [12, 13, 14, 15], the availability of correlated side information at the decoder (collection point) enables each sensor node to communicate losslessly at its conditional entropy rate rather than at its individual entropy rate, as long as the sum rate exceeds the joint entropy rate. Slepian-Wolf coding has the distinct advantage that the sensors need not collaborate while encoding their measurements, which saves valuable communication overhead. Unfortunately, however, most existing coding algorithms [14, 15] exploit only inter-signal correlations and not intra-signal correlations. To date there has been only limited progress on distributed coding of so-called “sources with memory.” The direct implementation for sources with memory would require huge lookup tables [12]. Furthermore, approaches combining pre- or post-processing of the data to remove intra-signal correlations combined with Slepian-Wolf coding for the inter-signal correlations appear to have limited applicability, because such processing would alter the data in a way that is unknown to other nodes. Finally, although recent papers [20, 21, 22] provide compression of spatially correlated sources with memory, the solution is specific to lossless distributed compression and cannot be readily extended to lossy compression settings. We conclude that the design of constructive techniques for distributed coding of sources with both intra- and inter-signal correlation is a challenging problem with many potential applications.

1.2 Compressive sensing (CS)

A new framework for single-signal sensing and compression has developed under the rubric of compressive sensing (CS). CS builds on the work of Candès, Romberg, and Tao [23] and Donoho [24], who showed that if a signal has a sparse representation in one basis then it can be recovered from a small number of projections onto a second basis that is incoherent with the first.² Roughly speaking, incoherence means that no element of one basis has a sparse representation in terms of the other basis. This notion has a variety of formalizations in the CS literature [25, 24, 26]. CS relies on tractable recovery procedures that can provide exact recovery of a signal of length $N$ and sparsity $K$, i.e., a signal that can be written as a sum of $K$ basis functions from some known basis, where $K$ can be orders of magnitude less than $N$.

The implications of CS are promising for many applications, especially for sensing signals that have a sparse representation in some basis. Instead of sampling a $K$-sparse signal $N$ times, only $M$ incoherent measurements suffice, where $M$ can be orders of magnitude less than $N$. Moreover, the measurements need not be manipulated in any way before being transmitted, except possibly for some quantization. Finally, independent and identically distributed (i.i.d.) Gaussian or Bernoulli/Rademacher (random $\pm 1$) vectors provide a useful universal basis that is incoherent with all others. Hence, when using a random basis, CS is universal in the sense that the sensor can apply the same measurement mechanism no matter what basis sparsifies the signal [27].

While powerful, the CS theory at present is designed mainly to exploit intra-signal structures at a single sensor. In a multi-sensor setting, one can naively obtain separate measurements from each signal and recover them separately. However, it is possible to obtain measurements that each depend on all signals in the ensemble by having sensors collaborate with each other in order to combine all of their measurements; we term this process a joint measurement setting. In fact, initial work in CS for multi-sensor settings used standard CS with joint measurement and recovery schemes that exploit inter-signal correlations [28, 29, 30, 31, 32]. However, by recovering sequential time instances of the sensed data individually, these schemes ignore intra-signal correlations.

1.3 Distributed compressive sensing (DCS)

In this paper we introduce a new theory for distributed compressive sensing (DCS) to enable new distributed coding algorithms that exploit both intra- and inter-signal correlation structures. In a typical DCS scenario, a number of sensors measure signals that are each individually sparse in some basis and also correlated from sensor to sensor. Each sensor separately encodes its signal by projecting it onto another, incoherent basis (such as a random one) and then transmits just a few of the resulting coefficients to a single collection point. Unlike the joint measurement setting described in Section 1.2, DCS requires no collaboration between the sensors during signal acquisition. Nevertheless, we are able to exploit the inter-signal correlation by using all of the obtained measurements to recover all the signals simultaneously. Under the right conditions, a decoder at the collection point can recover each of the signals precisely.

The DCS theory rests on a concept that we term the joint sparsity — the sparsity of the entire signal ensemble. The joint sparsity is often smaller than the aggregate over individual signal sparsities. Therefore, DCS offers a reduction in the number of measurements, in a manner analogous to the rate reduction offered by the Slepian-Wolf framework [13]. Unlike the single-signal definition of sparsity, however, there are numerous plausible ways in which joint sparsity could be defined. In this paper, we first provide a general framework for joint sparsity using algebraic formulations based on a graphical model. Using this framework, we derive bounds for the number of measurements necessary for recovery under a given signal ensemble model. Similar to Slepian-Wolf coding [13], the number of measurements required for each sensor must account for the minimal features unique to that sensor, while at the same time features that appear among multiple sensors must be amortized over the group. Our bounds are dependent on the dimensionality of the subspaces in which each group of signals reside; they afford a reduction in the number of measurements that we quantify through the notions of joint and conditional sparsity, which are conceptually related to joint and conditional entropies. The common thread is that dimensionality and entropy both quantify the volume that the measurement and coding rates must cover. Our results are also applicable to cases where the signal ensembles are measured jointly, as well as to the single-signal case.

While our general framework does not by design provide insights for computationally efficient recovery, we also provide interesting models for joint sparsity where our results carry through from the general framework to realistic settings with low-complexity algorithms. In the first model, each signal is itself sparse, and so we could use CS to separately encode and decode each signal. However, there also exists a framework wherein a joint sparsity model for the ensemble uses fewer total coefficients. In the second model, all signals share the locations of the nonzero coefficients. In the third model, no signal is itself sparse, yet there still exists a joint sparsity among the signals that allows recovery from significantly fewer than $N$ measurements per sensor. For each model we propose tractable algorithms for joint signal recovery, followed by theoretical and empirical characterizations of the number of measurements per sensor required for accurate recovery. We show that, under these models, joint signal recovery can recover signal ensembles from significantly fewer measurements than would be required to recover each signal individually. In fact, for two of our three models we obtain best-possible performance that could not be bettered by an oracle that knew the indices of the nonzero entries of the signals.

This paper focuses primarily on the basic task of reducing the number of measurements for recovery of a signal ensemble in order to reduce the communication cost of source coding that ensemble. Our emphasis is on noiseless measurements of strictly sparse signals, where the optimal recovery relies on $\ell_0$-norm optimization (the $\ell_0$ “norm” $\|\theta\|_0$ merely counts the number of nonzero entries in the vector $\theta$), which is computationally intractable. In practical settings, additional criteria may be relevant for measuring performance. For example, the measurements will typically be real numbers that must be quantized, which gradually degrades the recovery quality as the quantization becomes coarser [33, 34]. Characterizing DCS in light of practical considerations such as rate-distortion tradeoffs, power consumption in sensor networks, etc., are topics of future research [31, 32].

1.4 Paper organization

Section 2 overviews the single-signal CS theories and provides a new result on CS recovery. While some readers may be familiar with this material, we include it to make the paper self-contained. Section 3 introduces our general framework for joint sparsity models and proposes three example models for joint sparsity. We provide our detailed analysis for the general framework in Section 4; we then address the three models in Section 5. We close the paper with a discussion and conclusions in Section 6. Several appendices contain the proofs.

2 Compressive Sensing Background

2.1 Transform coding

Consider a real-valued signal $x \in \mathbb{R}^N$, indexed as $x(n)$, $n \in \{1, 2, \dots, N\}$. (Without loss of generality, we focus on one-dimensional signals for notational simplicity; the extension to multi-dimensional signals, e.g., images, is straightforward.) Suppose that the basis $\Psi = [\psi_1, \dots, \psi_N]$ provides a $K$-sparse representation of $x$; that is,

$$x = \sum_{n=1}^{N} \theta(n)\,\psi_n = \sum_{\ell=1}^{K} \theta(n_\ell)\,\psi_{n_\ell}, \qquad (1)$$

where $x$ is a linear combination of $K$ vectors chosen from $\Psi$, $\{n_\ell\}$ are the indices of those vectors, and $\{\theta(n)\}$ are the coefficients; the concept is extendable to tight frames [8]. Alternatively, we can write in matrix notation $x = \Psi\theta$, where $x$ is an $N \times 1$ column vector, the sparse basis matrix $\Psi$ is $N \times N$ with the basis vectors $\psi_n$ as columns, and $\theta$ is an $N \times 1$ column vector with $K$ nonzero elements. Using $\|\cdot\|_0$ to denote the $\ell_0$ norm, we can write $\|\theta\|_0 = K$; we can also write the set of nonzero indices as $\Omega \subseteq \{1, \dots, N\}$, with $|\Omega| = K$. Various expansions, including wavelets [8], Gabor bases [8], curvelets [35], etc., are widely used for representation and compression of natural signals, images, and other data.

The standard procedure for compressing sparse and nearly-sparse signals, known as transform coding, is to (i) acquire the full $N$-sample signal $x$; (ii) compute the complete set of transform coefficients $\{\theta(n)\}$; (iii) locate the $K$ largest, significant coefficients and discard the (many) small coefficients; and (iv) encode the values and locations of the largest coefficients. This procedure has three inherent inefficiencies: First, for a high-dimensional signal, we must start with a large number of samples $N$. Second, the encoder must compute all of the $N$ transform coefficients $\{\theta(n)\}$, even though it will discard all but $K$ of them. Third, the encoder must encode the locations of the large coefficients, which requires increasing the coding rate since the locations change with each signal.
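For concreteness, the following sketch walks through steps (i)-(iv) numerically. The signal length, the DCT sparsifying basis, and the value of $K$ are illustrative assumptions, not choices made in this paper.

```python
# Sketch of conventional transform coding (assumed parameters: N = 256 samples,
# a DCT sparsifying basis, K = 10 retained coefficients).
import numpy as np
from scipy.fft import dct, idct

N, K = 256, 10
rng = np.random.default_rng(0)

# Build a signal that is exactly K-sparse in the DCT basis.
theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
x = idct(theta, norm="ortho")                # step (i): acquire all N samples

theta_full = dct(x, norm="ortho")            # step (ii): compute all N coefficients
keep = np.argsort(np.abs(theta_full))[-K:]   # step (iii): locate the K largest
theta_coded = np.zeros(N)
theta_coded[keep] = theta_full[keep]         # step (iv): encode K values + locations

print(np.allclose(theta_coded, theta_full))  # lossless here because x is exactly K-sparse
```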

We will focus our theoretical development on exactly -sparse signals and defer discussion of the more general situation of compressible signals where the coefficients decay rapidly with a power law but not to zero. Section 6 contains additional discussion on real-world compressible signals, and [36] presents simulation results.

2.2 Incoherent projections

These inefficiencies raise a simple question: For a given signal, is it possible to directly estimate the set of large $\theta(n)$'s that will not be discarded? While this seems improbable, Candès, Romberg, and Tao [23, 25] and Donoho [24] have shown that a reduced set of projections can contain enough information to recover sparse signals. A framework to acquire sparse signals, often referred to as compressive sensing (CS) [37], has emerged that builds on this principle.

In CS, we do not measure or encode the $K$ significant $\theta(n)$ directly. Rather, we measure and encode $M < N$ projections $y(m) = \langle x, \phi_m \rangle$ of the signal onto a second set of functions $\{\phi_m\}$, $m = 1, 2, \dots, M$, where $\phi_m^T$ denotes the transpose of $\phi_m$ and $\langle \cdot, \cdot \rangle$ denotes the inner product. In matrix notation, we measure $y = \Phi x$, where $y$ is an $M \times 1$ column vector and the measurement matrix $\Phi$ is $M \times N$ with each row a measurement vector $\phi_m^T$. Since $M < N$, recovery of the signal $x$ from the measurements $y$ is ill-posed in general; however, the additional assumption of signal sparsity makes recovery possible and practical.

The CS theory tells us that when certain conditions hold, namely that the functions $\{\phi_m\}$ cannot sparsely represent the elements of the basis $\{\psi_n\}$ (a condition known as incoherence [25, 24, 26]) and the number of measurements $M$ is large enough (proportional to $K$), then it is indeed possible to recover the set of large $\{\theta(n)\}$ (and thus the signal $x$) from the set of measurements $\{y(m)\}$ [25, 24]. This incoherence property holds for many pairs of bases, including for example, delta spikes and the sine waves of a Fourier basis, or the Fourier basis and wavelets. Signals that are sparsely represented in frames or unions of bases can be recovered from incoherent measurements in the same fashion. Significantly, this incoherence also holds with high probability between any arbitrary fixed basis or frame and a randomly generated one. In the sequel, we will focus our analysis on such random measurement procedures.
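A quick way to see incoherence in action is to compute the largest absolute inner product between elements of two orthonormal bases. The sketch below, with an assumed signal length of 128, compares the spike basis, the Fourier basis, and a randomly generated orthonormal basis.

```python
# Minimal coherence check (assumption: coherence is measured here as the largest
# absolute inner product between elements of two orthonormal bases).
import numpy as np

N = 128
I = np.eye(N)                                  # spike (canonical) basis
F = np.fft.fft(np.eye(N)) / np.sqrt(N)         # orthonormal DFT basis

# Random orthonormal basis via QR of a Gaussian matrix.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))

mu = lambda A, B: np.max(np.abs(A.conj().T @ B))
print("spikes vs Fourier :", mu(I, F))         # = 1/sqrt(N), maximally incoherent
print("random vs Fourier :", mu(Q, F))         # small with high probability
print("spikes vs spikes  :", mu(I, I))         # = 1, fully coherent
```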

2.3 Signal recovery via $\ell_0$-norm minimization

The recovery of the sparse set of significant coefficients $\{\theta(n)\}$ can be achieved using optimization by searching for the signal with the sparsest coefficient vector $\theta$ that agrees with the $M$ observed measurements in $y$ (recall that $M < N$). Recovery relies on the key observation that, under mild conditions on $\Phi$ and $\Psi$, the coefficient vector $\theta$ is the unique solution to the $\ell_0$-norm minimization

$$\widehat{\theta} = \arg\min \|\theta\|_0 \quad \text{subject to} \quad y = \Phi\Psi\theta \qquad (2)$$

with overwhelming probability. (Thanks to the incoherence between the two bases, if the original signal is sparse in the $\theta$ coefficients, then no other set of sparse signal coefficients $\theta'$ can yield the same projections $y$.)

In principle, remarkably few incoherent measurements are required to recover a $K$-sparse signal via $\ell_0$-norm minimization. More than $K$ measurements must be taken to avoid ambiguity; the following theorem, proven in Appendix A, establishes that $K+1$ random measurements will suffice. Similar results were established by Venkataramani and Bresler [38].

Theorem 1

Let $\Psi$ be an orthonormal basis for $\mathbb{R}^N$, and let $1 \le K < N$. Then:

  1. Let $\Phi$ be an $M \times N$ measurement matrix with i.i.d. Gaussian entries with $M \ge 2K$. Then all signals $x = \Psi\theta$ having expansion coefficients $\theta \in \mathbb{R}^N$ that satisfy $\|\theta\|_0 = K$ can be recovered uniquely from the $M$-dimensional measurement vector $y = \Phi x$ via the $\ell_0$-norm minimization (2) with probability one over $\Phi$.

  2. Let $x = \Psi\theta$ such that $\|\theta\|_0 = K$. Let $\Phi$ be an $M \times N$ measurement matrix with i.i.d. Gaussian entries (notably, independent of $x$) with $M \ge K + 1$. Then $x$ can be recovered uniquely from the $M$-dimensional measurement vector $y = \Phi x$ via the $\ell_0$-norm minimization (2) with probability one over $\Phi$.

  3. Let $\Phi$ be an $M \times N$ measurement matrix, where $M \le K$. Then, aside from pathological cases (specified in the proof), no signal $x = \Psi\theta$ with $\|\theta\|_0 = K$ can be uniquely recovered from the $M$-dimensional measurement vector $y = \Phi x$.

Remark 1

The second statement of the theorem differs from the first in the following respect: when $K + 1 \le M < 2K$, there will necessarily exist $K$-sparse signals $x$ that cannot be uniquely recovered from the $M$-dimensional measurement vector $y = \Phi x$. However, these signals form a set of measure zero within the set of all $K$-sparse signals and can safely be avoided with high probability if $\Phi$ is randomly generated independently of $x$.

Comparing the second and third statements of Theorem 1, we see that one measurement separates the achievable region, where perfect recovery is possible with probability one, from the converse region, where with overwhelming probability recovery is impossible. Moreover, Theorem 1 provides a strong converse measurement region in a manner analogous to the strong channel coding converse theorems of information theory [12].

Unfortunately, solving the $\ell_0$-norm minimization problem is prohibitively complex, requiring a combinatorial enumeration of the $\binom{N}{K}$ possible sparse subspaces. In fact, the $\ell_0$-norm minimization problem in general is known to be NP-hard [39]. Yet another challenge is robustness; in the setting of Theorem 1, the recovery may be very poorly conditioned. In fact, both of these considerations (computational complexity and robustness) can be addressed, but at the expense of slightly more measurements.
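To make the combinatorial nature of $\ell_0$ recovery concrete, the sketch below enumerates all candidate supports of size $K$ for a toy problem. The dimensions are illustrative assumptions, and the $M = K + 1$ setting mirrors the second statement of Theorem 1.

```python
# Brute-force l0 recovery by enumerating all supports of size K (assumptions:
# tiny N and K so the search over C(N, K) supports is feasible; Psi = identity).
import numpy as np
from itertools import combinations

N, K = 20, 3
M = K + 1                                # cf. Theorem 1, statement 2
rng = np.random.default_rng(2)

theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N))        # generated independently of theta
y = Phi @ theta

for S in combinations(range(N), K):      # C(N, K) candidate subspaces
    coef, *_ = np.linalg.lstsq(Phi[:, S], y, rcond=None)
    if np.linalg.norm(Phi[:, S] @ coef - y) < 1e-9:   # consistent with y?
        recovered = np.zeros(N)
        recovered[list(S)] = coef
        break

# Exact recovery with probability one over Phi when Phi is independent of theta.
print(np.allclose(recovered, theta))
```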

2.4 Signal recovery via $\ell_1$-norm minimization

The practical revelation that supports the new CS theory is that it is not necessary to solve the $\ell_0$-norm minimization to recover the set of significant $\{\theta(n)\}$. In fact, a much easier problem yields an equivalent solution (thanks again to the incoherence of the bases); we need only solve for the smallest $\ell_1$-norm coefficient vector $\theta$ that agrees with the measurements $y$ [25, 24]:

$$\widehat{\theta} = \arg\min \|\theta\|_1 \quad \text{subject to} \quad y = \Phi\Psi\theta. \qquad (3)$$

This optimization problem, also known as Basis Pursuit, is significantly more approachable and can be solved with traditional linear programming techniques whose computational complexities are polynomial in $N$.
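The sketch below casts Basis Pursuit (3) as a linear program using the standard variable split $\theta = u - v$ with $u, v \ge 0$. The problem sizes and the use of SciPy's generic LP solver are illustrative assumptions, not the solvers used in our experiments.

```python
# Basis Pursuit (3) as a linear program (assumes Psi = identity, so x = theta).
import numpy as np
from scipy.optimize import linprog

N, K = 128, 5
M = 40                                   # roughly cK measurements, c ~ log2(1 + N/K)
rng = np.random.default_rng(3)

theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = Phi @ theta

c = np.ones(2 * N)                       # minimize sum(u) + sum(v) = ||theta||_1
A_eq = np.hstack([Phi, -Phi])            # enforce Phi @ (u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
theta_hat = res.x[:N] - res.x[N:]

print(np.allclose(theta_hat, theta, atol=1e-5))
```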

There is no free lunch, however; according to the theory, more than $K + 1$ measurements are required in order to recover sparse signals via Basis Pursuit. Instead, one typically requires $M = cK$ measurements, where $c > 1$ is an overmeasuring factor. As an example, we quote a result asymptotic in $N$. For simplicity, we assume that the sparsity scales linearly with $N$; that is, $K = SN$, where we call $S$ the sparsity rate.

Theorem 2

[39, 40, 41] Set $K = SN$ with $0 < S \ll 1$. Then there exists an overmeasuring factor $c(S) = O(\log(1/S))$, $c(S) > 1$, such that, for a $K$-sparse signal $x$ in the basis $\Psi$, the following statements hold:

  1. The probability of recovering $x$ via $\ell_1$-norm minimization from $(c(S) + \epsilon)K$ random projections, $\epsilon > 0$, converges to one as $N \to \infty$.

  2. The probability of recovering $x$ via $\ell_1$-norm minimization from $(c(S) - \epsilon)K$ random projections, $\epsilon > 0$, converges to zero as $N \to \infty$.

In an illuminating series of papers, Donoho and Tanner [40, 41, 42] have characterized the overmeasuring factor $c(S)$ precisely. In our work, we have noticed that the overmeasuring factor is quite similar to $c(S) \approx \log_2(1 + S^{-1})$. We find this expression a useful rule of thumb to approximate the precise overmeasuring ratio. Additional overmeasuring is proven to provide robustness to measurement noise and quantization error [25].

Throughout this paper we use the abbreviated notation $c$ to describe the overmeasuring factor required in various settings, even though $c$ depends on the sparsity $K$ and signal length $N$.

2.5 Signal recovery via greedy pursuit

Iterative greedy algorithms have also been developed to recover the signal $x$ from the measurements $y$. The Orthogonal Matching Pursuit (OMP) algorithm, for example, iteratively selects the vectors from the matrix $\Phi\Psi$ that contain most of the energy of the measurement vector $y$. The selection at each iteration is made based on inner products between the columns of $\Phi\Psi$ and a residual; the residual reflects the component of $y$ that is orthogonal to the previously selected columns. The algorithm has been proven to successfully recover the acquired signal from incoherent measurements with high probability, at the expense of slightly more measurements [26, 43]. Algorithms inspired by OMP, such as regularized orthogonal matching pursuit [44], CoSaMP [45], and Subspace Pursuit [46], have been shown to attain similar guarantees to those of their optimization-based counterparts. In the following, we will exploit both Basis Pursuit and greedy algorithms for recovering jointly sparse signals from incoherent measurements.
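The following minimal OMP sketch illustrates the greedy selection and residual update described above. It assumes the canonical sparsifying basis and that the decoder knows $K$; it is a simplified stand-in, not the exact implementation used in our experiments.

```python
# A minimal Orthogonal Matching Pursuit sketch (assumes Psi = identity, K known).
import numpy as np

def omp(Phi, y, K):
    """Greedily select K columns of Phi that best explain y."""
    M, N = Phi.shape
    support, residual = [], y.copy()
    for _ in range(K):
        # Pick the column most correlated with the current residual.
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        # Project y onto the span of the selected columns.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    theta_hat = np.zeros(N)
    theta_hat[support] = coef
    return theta_hat

rng = np.random.default_rng(4)
N, K, M = 256, 5, 64                      # illustrative sizes; recovery succeeds w.h.p.
theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
print(np.allclose(omp(Phi, Phi @ theta, K), theta, atol=1e-6))
```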

2.6 Properties of random measurements

In addition to offering substantially reduced measurement rates, CS has many attractive and intriguing properties, particularly when we employ random projections at the sensors. Random measurements are universal in the sense that any sparse basis can be used, allowing the same encoding strategy to be applied in different sensing environments. Random measurements are also future-proof: if a better sparsity-inducing basis is found for the signals, then the same measurements can be used to recover a more accurate view of the environment. Random coding is also robust: the measurements coming from each sensor have equal priority, unlike Fourier or wavelet coefficients in current coders. Finally, random measurements allow a progressively better recovery of the data as more measurements are obtained; one or more measurements can also be lost without corrupting the entire recovery.

2.7 Related work

Several researchers have formulated joint measurement settings for CS in sensor networks that exploit inter-signal correlations [28, 29, 30, 31, 32]. In their approaches, each sensor $n \in \{1, 2, \dots, N\}$ simultaneously records a single reading $x(n)$ of some spatial field (temperature at a certain time, for example); note that in Section 2.7 only, $N$ refers to the number of sensors, since each sensor acquires a single signal sample. Each of the $N$ sensors generates a pseudorandom sequence $r_n(m)$, $m = 1, 2, \dots, M$, and modulates the reading as $x(n)\,r_n(m)$. Each sensor $n$ then transmits its $M$ numbers in sequence to the collection point where the measurements are aggregated, obtaining the $M$ measurements $y(m) = \sum_{n=1}^{N} x(n)\,r_n(m)$. Thus, defining $x = [x(1), \dots, x(N)]^T$ and $\phi_m = [r_1(m), \dots, r_N(m)]$, the collection point automatically receives the measurement vector $y = [y(1), \dots, y(M)]^T = \Phi x$ after $M$ transmission steps. The samples $x(n)$ of the spatial field can then be recovered using CS provided that $x$ has a sparse representation in a known basis. These methods have a major limitation: since they operate at a single time instant, they exploit only inter-signal and not intra-signal correlations; that is, they essentially assume that the sensor field is i.i.d. from time instant to time instant. In contrast, we will develop signal models and algorithms that are agnostic to the spatial sampling structure and that exploit both inter- and intra-signal correlations.
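The sketch below mimics this joint measurement scheme: each sensor holds one sample of the field, modulates it by its own pseudorandom sequence, and the collection point receives the superposition of all transmissions. The field model and dimensions are illustrative assumptions.

```python
# Sketch of the joint measurement scheme described above (assumptions: N sensors
# each hold one sample x[n] of a field that is sparse in the DCT basis).
import numpy as np
from scipy.fft import idct

N, M = 100, 30
rng = np.random.default_rng(5)

# Spatial field, sparse in a known basis (here DCT, for illustration).
alpha = np.zeros(N)
alpha[rng.choice(N, 4, replace=False)] = rng.standard_normal(4)
x = idct(alpha, norm="ortho")

# Sensor n draws its own pseudorandom sequence Phi[:, n] (seeded, so the collection
# point can regenerate it) and modulates its single reading x[n].
Phi = rng.standard_normal((M, N))
contributions = Phi * x              # column n = sensor n's M transmitted numbers

# Aggregation (e.g., additive superposition) yields exactly y = Phi @ x.
y = contributions.sum(axis=1)
print(np.allclose(y, Phi @ x))
```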

Recent work has adapted DCS to the finite rate of innovation signal acquisition framework [47] and to the continuous-time setting [48]. Since the original submission of this paper, additional work has focused on the analysis and proposal of recovery algorithms for jointly sparse signals [49, 50].

3 Joint Sparsity Signal Models

In this section, we generalize the notion of a signal being sparse in some basis to the notion of an ensemble of signals being jointly sparse.

3.1 Notation

We will use the following notation for signal ensembles and our measurement model. Let $\Lambda := \{1, 2, \dots, J\}$ denote the set of indices for the $J$ signals in the ensemble. Denote the signals in the ensemble by $x_j$, with $j \in \Lambda$, and assume that each signal $x_j \in \mathbb{R}^N$. We use $x_j(n)$ to denote sample $n$ in signal $j$, and assume for the sake of illustration — but without loss of generality — that these signals are sparse in the canonical basis, i.e., $\Psi = I_N$. The entries of each signal can take arbitrary real values.

We denote by $\Phi_j$ the measurement matrix for signal $j$; $\Phi_j$ is $M_j \times N$ and, in general, the entries of $\Phi_j$ are different for each $j$. Thus, $y_j = \Phi_j x_j$ consists of $M_j$ random measurements of $x_j$. We will emphasize random i.i.d. Gaussian matrices $\Phi_j$ in the following, but other schemes are possible, including random $\pm 1$ Bernoulli/Rademacher matrices, and so on.

To compactly represent the signal and measurement ensembles, we denote $M = \sum_{j \in \Lambda} M_j$ and define $X \in \mathbb{R}^{JN}$, $Y \in \mathbb{R}^{M}$, and $\Phi \in \mathbb{R}^{M \times JN}$ as

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_J \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{bmatrix}, \quad \Phi = \begin{bmatrix} \Phi_1 & 0 & \cdots & 0 \\ 0 & \Phi_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Phi_J \end{bmatrix}, \qquad (4)$$

with 0 denoting a matrix of appropriate size with all entries equal to 0. We then have $Y = \Phi X$. Equation (4) shows that separate measurement matrices have a characteristic block-diagonal structure when the entries of the sparse vector are grouped by signal.
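The block-diagonal structure in (4) can be verified directly: stacking the per-sensor measurements is equivalent to applying one global block-diagonal matrix to the concatenated ensemble. The sketch below uses arbitrary illustrative sizes.

```python
# Block-diagonal structure of (4): per-sensor measurements y_j = Phi_j x_j stacked
# together equal one global matrix acting on the concatenated ensemble.
# (A sketch with J = 3 sensors and illustrative sizes.)
import numpy as np
from scipy.linalg import block_diag

J, N = 3, 50
M_j = [12, 15, 10]                      # the measurement budget can differ per sensor
rng = np.random.default_rng(6)

X = [rng.standard_normal(N) for _ in range(J)]
Phis = [rng.standard_normal((M, N)) for M in M_j]

Y_stacked = np.concatenate([Phi @ x for Phi, x in zip(Phis, X)])
Phi_global = block_diag(*Phis)          # the block-diagonal matrix in (4)
print(np.allclose(Y_stacked, Phi_global @ np.concatenate(X)))
```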

Below we propose a general framework for joint sparsity models (JSMs) and three example JSMs that apply in different situations.

3.2 General framework for joint sparsity

We now propose a general framework to quantify the sparsity of an ensemble of correlated signals $x_1, x_2, \dots, x_J$, which allows us to compare the complexities of different signal ensembles and to quantify their measurement requirements. The framework is based on a factored representation of the signal ensemble that decouples its location and value information.

To motivate this factored representation, we begin by examining the structure of a single sparse signal $x \in \mathbb{R}^N$ with $K < N$ nonzero entries. As an alternative to the notation used in (1), we can decouple the location and value information in $x$ by writing $x = P\theta$, where $\theta \in \mathbb{R}^K$ contains only the nonzero entries of $x$, and $P$ is an identity submatrix, i.e., $P$ contains $K$ columns of the $N \times N$ identity matrix $I_N$. Any $K$-sparse signal can be written in similar fashion. To model the set of all possible sparse signals, we can then let $\mathcal{P}$ be the set of all identity submatrices of all possible sizes $N \times K'$, with $1 \le K' \le N$. We refer to $\mathcal{P}$ as a sparsity model. Whether a signal is sufficiently sparse is defined in the context of this model: given a signal $x$, one can consider all possible factorizations $x = P\theta$ with $P \in \mathcal{P}$. Among these factorizations, the unique representation with smallest dimensionality of $\theta$ equals the sparsity level of the signal $x$ under the model $\mathcal{P}$.

In the signal ensemble case, we consider factorizations of the form $X = P\Theta$, where $X \in \mathbb{R}^{JN}$ as above, $P$ is a $JN \times D$ matrix, and $\Theta \in \mathbb{R}^{D}$ for various integers $D$. We refer to $P$ and $\Theta$ as the location matrix and value vector, respectively. A joint sparsity model (JSM) is defined in terms of a set $\mathcal{P}$ of admissible location matrices $P$ with varying numbers of columns; we specify below additional conditions that the matrices $P$ must satisfy for each model. For a given ensemble $X$, we let $\mathcal{P}_F(X) \subseteq \mathcal{P}$ denote the set of feasible location matrices for which a factorization $X = P\Theta$ exists. We define the joint sparsity level of the signal ensemble as follows.

Definition 1

The joint sparsity level $D$ of the signal ensemble $X$ is the number of columns of the smallest matrix $P \in \mathcal{P}_F(X)$.

In contrast to the single-signal case, there are several natural choices for what matrices $P$ should be members of a joint sparsity model $\mathcal{P}$. We restrict our attention in the sequel to what we call common/innovation component JSMs. In these models each signal $x_j$ is generated as a combination of two components: (i) a common component $z_C$, which is present in all signals, and (ii) an innovation component $z_j$, which is unique to each signal. These combine additively, giving

$$x_j = z_C + z_j, \quad j \in \Lambda.$$

Note, however, that the individual components might be zero-valued in specific scenarios. We can express the component signals as

$$z_C = P_C\,\theta_C, \qquad z_j = P_j\,\theta_j, \quad j \in \Lambda,$$

where $\theta_C \in \mathbb{R}^{K_C}$ and each $\theta_j \in \mathbb{R}^{K_j}$ have nonzero entries. Each matrix $P \in \mathcal{P}$ that can express such signals $\{x_j\}$ has the form

$$P = \begin{bmatrix} P_C & P_1 & 0 & \cdots & 0 \\ P_C & 0 & P_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ P_C & 0 & 0 & \cdots & P_J \end{bmatrix}, \qquad (5)$$

where $P_C$ and $\{P_j\}_{j \in \Lambda}$ are identity submatrices. We define the value vector as $\Theta = [\theta_C^T\ \theta_1^T\ \theta_2^T\ \cdots\ \theta_J^T]^T$, where $\theta_C \in \mathbb{R}^{K_C}$ and each $\theta_j \in \mathbb{R}^{K_j}$, to obtain $X = P\Theta$. Although the values of $K_C$ and $K_j$ are dependent on the matrix $P$, we omit this dependency in the sequel for brevity, except when necessary for clarity.

If a signal ensemble $X = P\Theta$ were to be generated by a selection of $P_C$ and $P_1, \dots, P_J$ in which all identity submatrices share a common column vector, then $P$ would not be full rank. In other cases, we may observe a value vector $\Theta$ that has zero-valued entries; i.e., we may have $\theta_j(k) = 0$ for some $j \in \Lambda$ and some $1 \le k \le K_j$, or $\theta_C(k) = 0$ for some $1 \le k \le K_C$. In both of these cases, by removing one instance of this column from any of the identity submatrices, one can obtain a location matrix $P'$ with fewer columns for which there exists $\Theta'$ that gives $X = P'\Theta'$. If $P' \in \mathcal{P}$, then we term this phenomenon sparsity reduction. Sparsity reduction, when present, reduces the effective joint sparsity of a signal ensemble. As an example of sparsity reduction, consider $J$ signals of length $N$. Consider the first coefficient $z_C(1)$ of the common component and the corresponding innovation coefficients $z_1(1), \dots, z_J(1)$, all nonzero. Suppose that all other coefficients are zero. The location matrix that arises is

$$P = \begin{bmatrix} e_1 & e_1 & 0 & \cdots & 0 \\ e_1 & 0 & e_1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ e_1 & 0 & 0 & \cdots & e_1 \end{bmatrix},$$

written in block form with one block row of $N$ rows per signal, where $e_1 \in \mathbb{R}^N$ denotes the first canonical basis vector. The span of this location matrix (i.e., the set of signal ensembles that it can generate) remains unchanged if we remove any one of its $J + 1$ columns, i.e., if we drop any entry of the value vector $\Theta$. This provides us with a lower-dimensional representation of the same signal ensemble under the JSM $\mathcal{P}$; the joint sparsity of $X$ is $J$ rather than $J + 1$.

3.3 Example joint sparsity models

Since different real-world scenarios lead to different forms of correlation within an ensemble of sparse signals, we consider several possible designs for a JSM . The distinctions among our three JSMs concern the differing sparsity assumptions regarding the common and innovation components.

3.3.1 JSM-1: Sparse common component + innovations

In this model, we suppose that each signal contains a common component that is sparse plus an innovation component that is also sparse. Thus, this joint sparsity model (JSM-1) is represented by the set of all matrices of the form (5) with $K_C$ and all $K_j$, $j \in \Lambda$, smaller than $N$. Assuming that sparsity reduction is not possible, the joint sparsity is $D = K_C + \sum_{j \in \Lambda} K_j$.

A practical situation well-modeled by this framework is a group of sensors measuring temperatures at a number of outdoor locations throughout the day. The temperature readings have both temporal (intra-signal) and spatial (inter-signal) correlations. Global factors, such as the sun and prevailing winds, could have an effect that is both common to all sensors and structured enough to permit sparse representation. More local factors, such as shade, water, or animals, could contribute localized innovations that are also structured (and hence sparse). A similar scenario could be imagined for a network of sensors recording light intensities, air pressure, or other phenomena. All of these scenarios correspond to measuring properties of physical processes that change smoothly in time and in space and thus are highly correlated [51, 52].

3.3.2 JSM-2: Common sparse supports

In this model, the common component is equal to zero, each innovation component is sparse, and the innovations share the same sparse support but have different nonzero coefficients. To formalize this setting in a joint sparsity model (JSM-2), we let $\mathcal{P}$ represent the set of all matrices of the form (5), where $K_C = 0$ and $P_j = \overline{P}$ for all $j \in \Lambda$. Here $\overline{P}$ denotes an arbitrary identity submatrix of size $N \times K$, with $K < N$. For a given $P$, we may again partition the value vector as $\Theta = [\theta_1^T\ \cdots\ \theta_J^T]^T$, where each $\theta_j \in \mathbb{R}^{K}$. It is easy to see that the matrices $P$ from JSM-2 are full rank. Therefore, when sparsity reduction is not possible, the joint sparsity is $D = JK$.

The JSM-2 model is immediately applicable to acoustic and RF sensor arrays, where each sensor acquires a replica of the same Fourier-sparse signal but with phase shifts and attenuations caused by signal propagation. In this case, it is critical to recover each one of the sensed signals. Another useful application for this framework is MIMO communication [53].

Similar signal models have been considered in the area of simultaneous sparse approximation [53, 54, 55]. In this setting, a collection of sparse signals share the same expansion vectors from a redundant dictionary. The sparse approximation can be recovered via greedy algorithms such as Simultaneous Orthogonal Matching Pursuit (SOMP) [53, 54] or MMV Order Recursive Matching Pursuit (M-ORMP) [55]. We use the SOMP algorithm in our setting (Section 5.2) to recover from incoherent measurements an ensemble of signals sharing a common sparse structure.
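The sketch below gives a minimal SOMP-style recovery loop adapted to the DCS setting, where each sensor has its own measurement matrix but all signals share one support. The dimensions are illustrative assumptions, and the code is a simplified stand-in for the algorithm used in Section 5.2.

```python
# A minimal SOMP-style sketch for JSM-2 (common support, different coefficients).
# Assumptions: Psi = identity, K known, each sensor uses its own Gaussian Phi_j.
import numpy as np

def somp(Phis, ys, K):
    """Jointly select one common support of size K for all J signals."""
    J = len(Phis)
    M, N = Phis[0].shape
    support = []
    residuals = [y.copy() for y in ys]
    for _ in range(K):
        # Score each index by its total correlation across all sensors.
        scores = sum(np.abs(Phi.T @ r) for Phi, r in zip(Phis, residuals))
        support.append(int(np.argmax(scores)))
        for j in range(J):
            coef, *_ = np.linalg.lstsq(Phis[j][:, support], ys[j], rcond=None)
            residuals[j] = ys[j] - Phis[j][:, support] @ coef
    # Final coefficient estimates on the recovered common support.
    X_hat = np.zeros((J, N))
    for j in range(J):
        coef, *_ = np.linalg.lstsq(Phis[j][:, support], ys[j], rcond=None)
        X_hat[j, support] = coef
    return X_hat

rng = np.random.default_rng(7)
J, N, K, M = 16, 128, 10, 30
common_support = rng.choice(N, K, replace=False)
X = np.zeros((J, N))
X[:, common_support] = rng.standard_normal((J, K))
Phis = [rng.standard_normal((M, N)) / np.sqrt(M) for _ in range(J)]
ys = [Phis[j] @ X[j] for j in range(J)]
print(np.allclose(somp(Phis, ys, K), X, atol=1e-6))
```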

3.3.3 JSM-3: Nonsparse common component + sparse innovations

In this model, we suppose that each signal contains an arbitrary common component $z_C$ and a sparse innovation component $z_j$; this model extends JSM-1 by relaxing the assumption that the common component has a sparse representation. To formalize this setting in the JSM-3 model, we let $\mathcal{P}$ represent the set of all matrices (5) in which $P_C = I_N$, the $N \times N$ identity matrix. This implies that each $K_j$ is smaller than $N$ while $K_C = N$; thus, we obtain $\theta_C \in \mathbb{R}^N$ and $z_C = \theta_C$. Assuming that sparsity reduction is not possible, the joint sparsity is $D = N + \sum_{j \in \Lambda} K_j$. We also consider the specific case where the supports of the innovations are shared by all signals, which extends JSM-2; in this case we will have $P_j = \overline{P}$ for all $j \in \Lambda$, with $\overline{P}$ an identity submatrix of size $N \times K$. It is easy to see that in this case sparsity reduction is possible, and so the joint sparsity can drop to $N + (J-1)K$. Note that separate CS recovery is impossible in JSM-3 with any fewer than $N$ measurements per sensor, since the common component is not sparse. However, we will demonstrate that joint CS recovery can indeed exploit the common structure.

A practical situation well-modeled by this framework is where several sources are recorded by different sensors together with a background signal that is not sparse in any basis. Consider, for example, a verification system in a component production plant, where cameras acquire snapshots of each component to check for manufacturing defects. While each image could be extremely complicated, and hence nonsparse, the ensemble of images will be highly correlated, since each camera is observing the same device with minor (sparse) variations.

JSM-3 can also be applied in non-distributed scenarios. For example, it motivates the compression of data such as video, where the innovations or differences between video frames may be sparse, even though a single frame may not be very sparse. In this case, JSM-3 suggests that we encode each video frame separately using CS and then decode all frames of the video sequence jointly. This has the advantage of moving the bulk of the computational complexity to the video decoder. The PRISM system proposes a similar scheme based on Wyner-Ziv distributed encoding [56].

There are many possible joint sparsity models beyond those introduced above, as well as beyond the common and innovation component signal model. Further work will yield new JSMs suitable for other application scenarios; an example application consists of multiple cameras taking digital photos of a common scene from various angles [57]. Extensions are discussed in Section 6.

4 Theoretical Bounds on Measurement Rates

In this section, we seek conditions on $(M_1, M_2, \dots, M_J)$, the tuple of the numbers of measurements from each sensor, such that we can guarantee perfect recovery of $X$ given $Y$. To this end, we provide a graphical model for the general framework provided in Section 3.2. This graphical model is fundamental in the derivation of the number of measurements needed for each sensor, as well as in the formulation of a combinatorial recovery procedure. Thus, we generalize Theorem 1 to the distributed setting to obtain fundamental limits on the number of measurements that enable recovery of sparse signal ensembles.

Based on the models presented in Section 3, recovering $X$ requires determining a value vector $\Theta$ and location matrix $P$ such that $X = P\Theta$. Two challenges immediately present themselves. First, a given measurement depends only on some of the components of $\Theta$, and the measurement budget should be adjusted among the sensors according to the information that can be gathered about the components of $\Theta$. For example, if a component of $\Theta$ does not affect any signal coefficient of sensor $j$, then the corresponding measurements $y_j$ provide no information about that component. Second, the decoder must identify a location matrix $P \in \mathcal{P}_F(X)$ from the set $\mathcal{P}$ and the measurements $Y$.

4.1 Modeling dependencies using bipartite graphs

We introduce a graphical representation that captures the dependencies between the measurements in $Y$ and the value vector $\Theta$, represented by $\Phi$ and $P$. Consider a feasible decomposition of $X$ into a full-rank matrix $P \in \mathcal{P}_F(X)$ and the corresponding $\Theta$; the matrix $P$ defines the sparsities of the common and innovation components, $K_C$ and $K_j$, $j \in \Lambda$, as well as the joint sparsity $D = K_C + \sum_{j \in \Lambda} K_j$. Define the following sets of vertices: (i) the set of value vertices has elements with indices $k \in \{1, \dots, D\}$ representing the entries $\Theta(k)$ of the value vector, and (ii) the set of measurement vertices has elements with indices $(j, m)$ representing the measurements $y_j(m)$, with $j \in \Lambda$ and $m \in \{1, \dots, M_j\}$. The cardinalities of these sets are $D$ and $M = \sum_{j \in \Lambda} M_j$, respectively.

We now introduce a bipartite graph $G$ that represents the relationships between the entries of the value vector and the measurements (see [4] for details). The set of edges of $G$ is defined as follows:

  • For every $k \in \{1, \dots, K_C\}$ and every $j \in \Lambda$ such that column $k$ of $P_C$ does not also appear as a column of $P_j$, we have an edge connecting the value vertex for $\theta_C(k)$ to each measurement vertex $(j, m)$ for $m \in \{1, \dots, M_j\}$.

  • For every $k \in \{K_C + 1, \dots, D\}$, we consider the sensor $j$ associated with column $k$ of $P$ (i.e., the signal whose innovation submatrix contributes that column), and we have an edge connecting the value vertex for $\Theta(k)$ to each measurement vertex $(j, m)$ for $m \in \{1, \dots, M_j\}$.

In words, we say that $y_j(m)$, the $m$-th measurement of sensor $j$, measures $\Theta(k)$ if the value vertex for $\Theta(k)$ is linked to the measurement vertex $(j, m)$ in the graph $G$. An example graph for a distributed sensing setting is shown in Figure 1.

Figure 1: Bipartite graph for distributed compressive sensing (DCS). The bipartite graph indicates the relationship between the value vector coefficients and the measurements.

4.2 Quantifying redundancies

In order to obtain sharp bounds on the number of measurements needed, our analysis of the measurement process must account for redundancies between the locations of the nonzero coefficients in the common and innovation components. To that end, we consider the overlaps between common and innovation components in each signal. When we have $z_C(n) \neq 0$ and $z_j(n) \neq 0$ for a certain signal $j$ and some index $n$, we cannot recover the values of both coefficients from the measurements of this signal alone; therefore, we will need to recover $z_C(n)$ using measurements of other signals that do not feature the same overlap. We thus quantify the size of the overlap for all subsets of signals $\Gamma \subseteq \Lambda$ under a feasible representation given by $P$ and $\Theta$, as described in Section 3.2.

Definition 2

The overlap size for the set of signals $\Gamma \subseteq \Lambda$, denoted $K_C(\Gamma)$, is the number of indices in which there is overlap between the common and the innovation component supports at all signals $j \notin \Gamma$:

$$K_C(\Gamma) = \left|\left\{ n \in \{1, \dots, N\} : z_C(n) \neq 0 \text{ and } z_j(n) \neq 0 \text{ for all } j \notin \Gamma \right\}\right|. \qquad (6)$$

We also define $K_C(\Lambda) = K_C$ and $K_C(\emptyset) = 0$.

For $\Gamma \subsetneq \Lambda$, $K_C(\Gamma)$ provides a penalty term due to the need for recovery of common component coefficients that are overlapped by innovations in all other signals $j \notin \Gamma$. Intuitively, for each entry counted in $K_C(\Gamma)$, some sensor in $\Gamma$ must take one measurement to account for that entry of the common component; it is impossible to recover such entries from measurements made by sensors outside of $\Gamma$. When all signals are considered ($\Gamma = \Lambda$), it is clear that all of the common component coefficients must be recovered from the obtained measurements.

4.3 Measurement bounds

Converse and achievable bounds for the number of measurements necessary for DCS recovery are given below. Our bounds consider each subset of sensors $\Gamma \subseteq \Lambda$, since the cost of sensing the common component can be amortized across sensors: it may be possible to reduce the rate at one sensor (up to a point), as long as other sensors in $\Gamma$ offset the rate reduction. We quantify the reduction possible through the following definition.

Definition 3

The conditional sparsity of the set of signals $\Gamma$ is the number of entries of the vector $\Theta$ that must be recovered by measurements $y_j$, $j \in \Gamma$:

$$D(\Gamma \mid \Gamma^c) = \sum_{j \in \Gamma} K_j + K_C(\Gamma), \quad \text{where } \Gamma^c = \Lambda \setminus \Gamma.$$

The joint sparsity gives the number of degrees of freedom for the signals in $\Lambda$, while the conditional sparsity gives the number of degrees of freedom for the signals in $\Gamma$ when the signals in $\Gamma^c$ are available as side information. Note also that Definition 1 for joint sparsity can be extended to a subset of signals $\Gamma$ by considering the number of entries of $\Theta$ that affect these signals:

$$D(\Gamma) = \sum_{j \in \Gamma} K_j + K_C - K_C(\Gamma^c).$$

Note that $D(\Gamma) \ge D(\Gamma \mid \Gamma^c)$.

The bipartite graph $G$ introduced in Section 4.1 is the cornerstone of Theorems 3, 4, and 5, which consider whether a perfect matching can be found in the graph; see the proofs in Appendices B, D, and E, respectively, for details.

Theorem 3

(Achievable, known $P$) Assume that a signal ensemble $X$ is obtained from a common/innovation component JSM $\mathcal{P}$. Let $(M_1, M_2, \dots, M_J)$ be a measurement tuple, let $\Phi_j$ be random matrices having $M_j$ rows of i.i.d. Gaussian entries for each $j \in \Lambda$, and write $Y = \Phi X$. Suppose there exists a full rank location matrix $P \in \mathcal{P}_F(X)$ such that

$$\sum_{j \in \Gamma} M_j \ge D(\Gamma \mid \Gamma^c) \qquad (7)$$

for all $\Gamma \subseteq \Lambda$. Then with probability one over $\Phi$, there exists a unique solution $\widehat{\Theta}$ to the system of equations $Y = \Phi P \widehat{\Theta}$; hence, the signal ensemble can be uniquely recovered as $X = P\widehat{\Theta}$.

Theorem 4

(Achievable, unknown $P$) Assume that a signal ensemble $X$ and measurement matrices $\Phi_j$ follow the assumptions of Theorem 3. Suppose there exists a full rank location matrix $P \in \mathcal{P}_F(X)$ such that

$$\sum_{j \in \Gamma} M_j \ge D(\Gamma \mid \Gamma^c) + |\Gamma| \qquad (8)$$

for all $\Gamma \subseteq \Lambda$. Then $X$ can be uniquely recovered from $Y$ with probability one over $\Phi$.

Theorem 5

(Converse) Assume that a signal ensemble $X$ and measurement matrices $\Phi_j$ follow the assumptions of Theorem 3. Suppose there exists a full rank location matrix $P \in \mathcal{P}_F(X)$ such that

$$\sum_{j \in \Gamma} M_j < D(\Gamma \mid \Gamma^c) \qquad (9)$$

for some $\Gamma \subseteq \Lambda$. Then there exists a solution $\widehat{\Theta}$ such that $Y = \Phi P \widehat{\Theta}$ but $P\widehat{\Theta} \neq X$.

The identification of a feasible location matrix $P$ causes the one-measurement-per-sensor gap that prevents (8) and (9) from being a tight converse and achievable bound pair. We note in passing that the signal recovery procedure used in Theorem 4 is akin to $\ell_0$-norm minimization on $X$; see Appendix D for details.

4.4 Discussion

The bounds in Theorems 3-5 are dependent on the dimensionality of the subspaces in which the signals reside. The number of noiseless measurements required for ensemble recovery is determined by the dimensionality of the subspace in the relevant signal model, because dimensionality and sparsity play a volumetric role akin to the entropy used to characterize rates in source coding. Whereas in source coding each bit resolves between two options, and typical inputs are described using a number of bits on the order of the entropy [12], in CS the number of measurements needed scales with the dimensionality of the subspace in which the signal ensemble lives. Similar to Slepian-Wolf coding [13], the number of measurements required for each sensor must account for the minimal features unique to that sensor, while at the same time features that appear among multiple sensors must be amortized over the group.

Theorems 3-5 can also be applied to the single-sensor and joint measurement settings. In the single-signal setting (Theorem 1), we have $J = 1$ and a single signal with sparsity $K$, so that the conditional sparsity equals $K$; Theorem 4 then provides the requirement $M \ge K + 1$. It is easy to show that the joint measurement setting is equivalent to the single-signal setting: we stack all the individual signals into a single signal vector, and in both cases all measurements depend on all the entries of the signal vector. However, the distribution of the measurements among the available sensors is irrelevant in a joint measurement setting. Therefore, we only obtain a necessary condition on the total number of measurements required.

5 Practical Recovery Algorithms and Experiments

Although we have provided a unifying theoretical treatment for the three JSM models, the nuances warrant further study. In particular, while Theorem 4 highlights the basic tradeoffs that must be made in partitioning the measurement budget among sensors, the result does not by design provide insight into tractable algorithms for signal recovery. We believe there is additional insight to be gained by considering each model in turn, and while the presentation may be less unified, we attribute this to the fundamental diversity of problems that can arise under the umbrella of jointly sparse signal representations. In this section, we focus on tractable recovery algorithms for each model and, when possible, analyze the corresponding measurement requirements.

5.1 Recovery strategies for sparse common + innovations (JSM-1)

We first characterize the sparse common signal and innovations model JSM-1 from Section 3.3.1. For simplicity, we limit our description to $J = 2$ signals, but describe extensions to a larger number of signals as needed.

5.1.1 Measurement bounds for joint recovery

Under the JSM-1 model, separate recovery of each signal $x_j$ via $\ell_0$-norm minimization would require one measurement more than the individual sparsity of $x_j$, which equals $K_C + K_j$ minus a term that accounts for sparsity reduction due to overlap between $z_C$ and $z_j$. We apply Theorem 4 to the JSM-1 model to obtain the corollary below. To address the possibility of sparsity reduction, we count the number of indices in which the common component $z_C$ and all innovation components $z_j$, $j \in \Lambda$, overlap; this overlap results in sparsity reduction for the common component.

Corollary 1

Assume the measurement matrices contain i.i.d. Gaussian entries. Then the signal ensemble can be recovered with probability one if the following conditions hold:

Our joint recovery scheme provides a significant savings in measurements, because the common component can be measured as part of any of the signals.

5.1.2 Stochastic signal model for JSM-1

To give ourselves a firm footing for analysis, in the remainder of Section 5.1 we use a stochastic process for JSM-1 signal generation. This framework provides an information theoretic setting where we can scale the size of the problem and investigate which measurement rates enable recovery. We generate the common and innovation components as follows. For each index $n \in \{1, \dots, N\}$, the decision whether $z_C(n)$ or $z_j(n)$ is zero or not is an i.i.d. Bernoulli process, where the probability of a nonzero value is given by parameters denoted $S_C$ and $S_j$, respectively. The values of the nonzero coefficients are then generated from an i.i.d. Gaussian distribution. The outcome of this process is that $z_C$ and $z_j$ have sparsities $K_C$ and $K_j$. The parameters $S_C$ and $S_j$ are sparsity rates controlling the random generation of each signal. Our model resembles the Gaussian spike process [58], which is a limiting case of a Gaussian mixture model.
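A minimal generator for this stochastic model is sketched below; the parameter values are illustrative assumptions, and the helper name jsm1_ensemble is ours.

```python
# Sketch of the JSM-1 Bernoulli-Gaussian generation model of this subsection
# (assumed illustrative parameters: N = 1000, S_C = 0.1, S_j = 0.05 for all j).
import numpy as np

def jsm1_ensemble(J, N, S_C, S_j, rng):
    """Generate J signals x_j = z_C + z_j with Bernoulli supports, Gaussian values."""
    z_C = rng.standard_normal(N) * (rng.random(N) < S_C)        # common component
    X = np.empty((J, N))
    for j in range(J):
        z_j = rng.standard_normal(N) * (rng.random(N) < S_j)    # innovation component
        X[j] = z_C + z_j
    return X, z_C

rng = np.random.default_rng(8)
X, z_C = jsm1_ensemble(J=2, N=1000, S_C=0.1, S_j=0.05, rng=rng)
print("common sparsity K_C ~", np.count_nonzero(z_C),
      "; per-signal sparsities ~", [np.count_nonzero(x) for x in X])
```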

Likelihood of sparsity reduction and overlap: This stochastic model can yield signal ensembles for which the corresponding generating matrices $P$ allow for sparsity reduction; specifically, there might be overlap between the supports of the common component $z_C$ and all the innovation components $z_j$, $j \in \Lambda$. For $J = 2$, the probability that a given index is present in all supports is $S_C S_1 S_2$. Therefore, the distribution of the cardinality of this overlap is $\mathrm{Binomial}(N, S_C S_1 S_2)$. We must account for the reduction obtained from the removal of the corresponding number of columns from the location matrix $P$ when the total number of measurements is considered. In the same way, we can show that the distributions for the numbers of indices in the overlaps required by Corollary 1 are also binomial in $N$, with parameters determined by the sparsity rates.

Measurement rate region: To characterize DCS recovery performance, we introduce a measurement rate region. We define the measurement rate for sensor $j$ in an asymptotic manner as

$$R_j := \lim_{N \to \infty} \frac{M_j}{N}, \quad j \in \{1, 2\}.$$

Additionally, we note that the sparsities scale with $N$ through the sparsity rates, so that asymptotically $K_C \approx S_C N$ and $K_j \approx S_j N$, $j \in \{1, 2\}$. For a measurement rate pair $(R_1, R_2)$ and sources $x_1$ and $x_2$, we evaluate whether we can recover the signals with vanishing probability of error as $N$ increases. In this case, we say that the measurement rate pair $(R_1, R_2)$ is achievable.

For jointly sparse signals under JSM-1, separate recovery via $\ell_0$-norm minimization would require a measurement rate equal to each signal's individual sparsity rate. Separate recovery via $\ell_1$-norm minimization would require an overmeasuring factor $c(S)$, where $S$ is the individual sparsity rate, and thus the measurement rate would become $S \cdot c(S)$ per sensor. To improve upon these figures, we adapt the standard machinery of CS to the joint recovery problem.

5.1.3 Joint recovery via $\ell_1$-norm minimization

As discussed in Section 2.3, solving an $\ell_0$-norm minimization is NP-hard, and so in practice we must relax our criterion in order to make the solution tractable. We now study what penalty must be paid for $\ell_1$-norm recovery of jointly sparse signals. Using the vector $Z = [z_C^T\ z_1^T\ z_2^T]^T$ and frame

$$\widetilde{\Psi} = \begin{bmatrix} I_N & I_N & 0 \\ I_N & 0 & I_N \end{bmatrix}, \qquad (11)$$

we can represent the concatenated measurement vector $Y = [y_1^T\ y_2^T]^T$ sparsely using the concatenated coefficient vector $Z$, which contains $K_C + K_1 + K_2$ nonzero coefficients, to obtain $Y = \Phi\widetilde{\Psi} Z$. With sufficient overmeasuring, we have seen experimentally that it is possible to recover a vector $\widehat{Z}$, which yields $x_j = \widehat{z}_C + \widehat{z}_j$, $j = 1, 2$, by solving the weighted $\ell_1$-norm minimization

$$\widehat{Z} = \arg\min\ \gamma_C \|z_C\|_1 + \gamma_1 \|z_1\|_1 + \gamma_2 \|z_2\|_1 \quad \text{subject to} \quad y_1 = \Phi_1(z_C + z_1),\ \ y_2 = \Phi_2(z_C + z_2), \qquad (12)$$

where $\gamma_C, \gamma_1, \gamma_2 \ge 0$. We call this the $\gamma$-weighted $\ell_1$-norm formulation; our numerical results (Section 5.1.6 and our technical report [59]) indicate a reduction in the requisite number of measurements via this enhancement. If $K_1 = K_2$ and $M_1 = M_2$, then without loss of generality we set $\gamma_1 = \gamma_2 = 1$ and numerically search for the best parameter $\gamma_C$. We discuss the asymmetric case with $K_1 = K_2$ and $M_1 \neq M_2$ in the technical report [59].
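The sketch below poses the $\gamma$-weighted $\ell_1$-norm formulation (12) for $J = 2$ as a linear program, again via the split into nonnegative variables. The weight $\gamma_C = 1.3$, the problem sizes, and the generic LP solver are illustrative assumptions rather than the numerically optimized choices discussed below.

```python
# Sketch of the gamma-weighted l1 formulation (12) for J = 2 via linear programming.
# Assumptions: gamma_C = 1.3, gamma_1 = gamma_2 = 1, and all sizes are illustrative.
import numpy as np
from scipy.optimize import linprog

N, K_C, K_I, M = 100, 8, 3, 40
rng = np.random.default_rng(9)

def sparse_vec(K):
    v = np.zeros(N)
    v[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
    return v

zC, z1, z2 = sparse_vec(K_C), sparse_vec(K_I), sparse_vec(K_I)
x1, x2 = zC + z1, zC + z2

Phi1 = rng.standard_normal((M, N))
Phi2 = rng.standard_normal((M, N))
y = np.concatenate([Phi1 @ x1, Phi2 @ x2])

# Measurement operator acting on Z = [z_C; z_1; z_2], i.e. block-diagonal Phi
# times the frame in (11).
O = np.zeros((M, N))
A = np.block([[Phi1, Phi1, O], [Phi2, O, Phi2]])

gammas = np.concatenate([1.3 * np.ones(N), np.ones(N), np.ones(N)])
c = np.concatenate([gammas, gammas])              # weights for u and v (Z = u - v)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
Z_hat = res.x[:3 * N] - res.x[3 * N:]
x1_hat = Z_hat[:N] + Z_hat[N:2 * N]
x2_hat = Z_hat[:N] + Z_hat[2 * N:]

err = max(np.max(np.abs(x1_hat - x1)), np.max(np.abs(x2_hat - x2)))
print("max reconstruction error:", err)          # near zero when M is large enough
```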

5.1.4 Converse bound on performance of $\gamma$-weighted $\ell_1$-norm minimization

We now provide a converse bound that describes what measurement rate pairs cannot be achieved via $\gamma$-weighted $\ell_1$-norm minimization. Our notion of a converse focuses on the setting where each signal $x_j$ is measured via multiplication by an $M_j$ by $N$ matrix $\Phi_j$ and joint recovery is performed via our $\gamma$-weighted $\ell_1$-norm formulation (12). Within this setting, a converse region is a set of measurement rate pairs for which the recovery fails with overwhelming probability as $N$ increases.

We assume that the two sources have innovation sparsity rates that satisfy $S_1 = S_2 = S_I$. Our first result, proved in Appendix F, provides deterministic necessary conditions to recover the components $z_C$, $z_1$, and $z_2$ using the $\gamma$-weighted $\ell_1$-norm formulation (12). We note that the lemma holds for all such combinations of components that generate the same signals $x_1 = z_C + z_1$ and $x_2 = z_C + z_2$.

Lemma 1

Consider any $\gamma_C$, $\gamma_1$, and $\gamma_2$ in the $\gamma$-weighted $\ell_1$-norm formulation (12). The components $z_C$, $z_1$, and $z_2$ can be recovered using measurement matrices $\Phi_1$ and $\Phi_2$ only if (i) $z_1$ can be recovered via $\ell_1$-norm minimization (3) using $\Phi_1$ and measurements $\Phi_1 z_1$; (ii) $z_2$ can be recovered via $\ell_1$-norm minimization using $\Phi_2$ and measurements $\Phi_2 z_2$; and (iii) $z_C$ can be recovered via $\ell_1$-norm minimization using the joint matrix $[\Phi_1^T\ \Phi_2^T]^T$ and measurements $[\Phi_1^T\ \Phi_2^T]^T z_C$.

Lemma 1 can be interpreted as follows. If $M_1$ and $M_2$ are not large enough individually, then the innovation components $z_1$ and $z_2$ cannot be recovered. This implies a converse bound on the individual measurement rates $R_1$ and $R_2$. Similarly, combining Lemma 1 with the converse bound of Theorem 2 for single-source $\ell_1$-norm minimization of the common component implies a lower bound on the sum measurement rate $R_1 + R_2$.

Anticipated converse: As shown in Corollary 1, for indices such that and differ and are nonzero, each sensor must take measurements to account for one of the two coefficients. In the case where , the joint sparsity rate is . We define the measurement function based on Donoho and Tanner’s oversampling factor (Theorem 2). It can be shown that the function is concave; in order to minimize the sum rate bound, we “explain” as many of the sparse coefficients in one of the signals and as few as possible in the other. From Corollary 1, we have . Consequently, one of the signals must “explain” this sparsity rate, whereas the other signal must explain the rest:

Unfortunately, the derivation of the overmeasuring factor $c(S)$ relies on Gaussianity of the measurement matrix, whereas in our case the joint matrix has a block structure. Therefore, the following conjecture remains to be proved rigorously.

Conjecture 1

Let $J = 2$ and fix the sparsity rate of the common component $S_C$ and the innovation sparsity rates $S_1 = S_2 = S_I$. Then the following conditions on the measurement rates $(R_1, R_2)$ are necessary to enable recovery with probability one:

5.1.5 Achievable bound on performance of $\ell_1$-norm minimization

Figure 2: Recovering a signal ensemble with sparse common + innovations (JSM-1). We chose a common component sparsity rate $S_C$ and innovation sparsity rates $S_1 = S_2 = S_I$. Our simulation results use the $\gamma$-weighted $\ell_1$-norm formulation (12) on signals of length $N$; the measurement rate pairs that achieved perfect recovery over 100 simulations are denoted by circles.

We have not yet characterized the performance of the $\gamma$-weighted $\ell_1$-norm formulation (12) analytically. Instead, Theorem 6 below uses an alternative $\ell_1$-norm based recovery technique. The proof describes a constructive recovery algorithm. We construct measurement matrices $\Phi_1$ and $\Phi_2$, each consisting of two parts. The first parts of the matrices are identical and recover the signal difference $x_1 - x_2$. The second parts of the matrices are different and enable the recovery of one of the signals. Once these two components have been recovered, the computation of the remaining signal is straightforward. The measurement rate can be computed by considering both the identical and the different parts of the measurement matrices.

Theorem 6

Let $J = 2$, and fix the sparsity rate of the common component $S_C$ and the innovation sparsity rates $S_1 = S_2 = S_I$. If the measurement rates satisfy the following conditions:

(13a)
(13b)

then we can design measurement matrices $\Phi_1$ and $\Phi_2$ with random Gaussian entries and an $\ell_1$-norm minimization recovery algorithm that succeeds with probability approaching one as $N$ increases. Furthermore, as $S_I \to 0$, the sum measurement rate $R_1 + R_2$ approaches the rate required to recover the common component alone.

The theorem is proved in Appendix G. The recovery algorithm of Theorem 6 is based on linear programming. It can be extended from $J = 2$ to an arbitrary number of signals by recovering all signal differences of the form $x_1 - x_j$, $j \in \{2, \dots, J\}$, in the first stage of the algorithm and then recovering $x_1$ in the second stage. In contrast, our $\gamma$-weighted $\ell_1$-norm formulation (12) jointly recovers the full length-$3N$ vector of common and innovation components. Our simulation experiments (Section 5.1.6) indicate that the $\gamma$-weighted formulation can recover using fewer measurements than the approach of Theorem 6.

The achievable measurement rate region of Theorem 6 is loose with respect to the region of the anticipated converse Conjecture 1 (see Figure 2). We leave for future work the characterization of a tight measurement rate region for computationally tractable (polynomial time) recovery techniques.

5.1.6 Simulations for JSM-1

We now present simulation results for several different JSM-1 settings. The $\gamma$-weighted $\ell_1$-norm formulation (12) was used throughout, where the optimal choice of $\gamma_C$, $\gamma_1$, and $\gamma_2$ depends on the relative sparsities $K_C$, $K_1$, and $K_2$. The optimal values have not been determined analytically. Instead, we rely on a numerical optimization, which is computationally intense. A detailed discussion of our intuition behind the choice of $\gamma_C$ appears in the technical report [59].

Recovering two signals with symmetric measurement rates: Our simulation setting is as follows. The signal components $z_C$, $z_1$, and $z_2$ are assumed (without loss of generality) to be sparse in the canonical basis with sparsities $K_C$, $K_1$, and $K_2$, respectively. We assign random Gaussian values to the nonzero coefficients. We restrict our attention to the symmetric setting in which $K_1 = K_2$ and $M_1 = M_2$, and consider signals of length $N$.

Figure 3: Comparison of joint decoding and separate decoding for JSM-1. The advantage of joint over separate decoding depends on the common component sparsity.

In our joint decoding simulations, we consider values of $K_C$ and $K_1 = K_2$ in the range between 10 and 40. We find the optimal $\gamma_C$ in the $\gamma$-weighted $\ell_1$-norm formulation (12) using a line search optimization, where simulation indicates the “goodness” of specific $\gamma_C$ values in terms of the likelihood of recovery. With the optimal $\gamma_C$, for each set of values we run several thousand trials to determine the empirical probability of success in decoding $x_1$ and $x_2$. The results of the simulation are summarized in Figure 3. The savings in the number of measurements can be substantial, especially when the common component is large (Figure 3), in which case the number of measurements per sensor required for joint decoding is markedly reduced relative to separate decoding. For smaller $K_C$, joint decoding barely outperforms separate decoding, since most of the measurements are expended on innovation components. Additional results appear in [59].

Recovering two signals with asymmetric measurement rates: In Figure 2, we compare separate CS recovery with the anticipated converse bound of Conjecture 1, the achievable bound of Theorem 6, and numerical results.

We use $J = 2$ signals and choose a common component sparsity rate $S_C$ and innovation sparsity rates $S_1 = S_2 = S_I$. We consider several different asymmetric measurement rates. In each such setting, we constrain the ratio between the two measurement rates to take a fixed value. The results plotted indicate the smallest measurement rate pairs $(R_1, R_2)$ for which we always succeeded recovering the signal ensemble over 100 simulation runs. In some areas of the measurement rate region, our $\gamma$-weighted $\ell_1$-norm formulation (12) requires fewer measurements than the achievable approach of Theorem 6.

Recovering multiple signals with symmetric measurement rates: The $\gamma$-weighted $\ell_1$-norm recovery technique of this section is especially promising when many sensors are used. These savings may be valuable in applications such as sensor networks, where data may contain strong spatial (inter-source) correlations.

We use