Support Recovery of Sparse Signals


Yuzhe Jin, Young-Han Kim, and Bhaskar D. Rao. The material in this paper was presented in part at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, Nevada, USA, March 2008, and the IEEE International Symposium on Information Theory (ISIT), Toronto, Ontario, Canada, July 2008. A short version of the paper was submitted to the IEEE International Symposium on Information Theory (ISIT), Austin, Texas, USA, 2010.
Abstract

We consider the problem of exact support recovery of sparse signals via noisy measurements. The main focus is on sufficient and necessary conditions on the number of measurements for support recovery to be reliable. By drawing an analogy between the problem of support recovery and the problem of channel coding over the Gaussian multiple access channel, and by exploiting mathematical tools developed for the latter problem, we obtain an information theoretic framework for analyzing the performance limits of support recovery. Sharp sufficient and necessary conditions on the number of measurements are derived in terms of the signal sparsity level and the measurement noise level. Specifically, when the number of nonzero entries is held fixed, the exact asymptotics of the number of measurements for support recovery are developed. When the number of nonzero entries increases in certain manners, we obtain sufficient conditions tighter than existing results. In addition, we show that the proposed methodology can deal with a variety of models of sparse signal recovery, thus demonstrating its potential as an effective analytical tool.

I Introduction

Consider the estimation of a high-dimensional sparse signal x ∈ ℝ^N via linear measurements y = Ax + z, where A ∈ ℝ^{n×N} is referred to as the measurement matrix and z ∈ ℝ^n is the measurement noise. A sparse signal is informally described as a signal whose representation in certain coordinates contains a large proportion of zero coefficients. In this paper, we mainly consider signals that are sparse with respect to the canonical basis of the Euclidean space. The goal is to estimate the sparse signal from as few measurements as possible. This problem has received much attention from many research disciplines, motivated by a wide spectrum of applications such as compressed sensing [1, 2], biomagnetic inverse problems [3], [4], image processing [5], [6], bandlimited extrapolation and spectral estimation [7], robust regression and outlier detection [8], speech processing [9], channel estimation [10], [11], echo cancellation [12], [13], and wireless communication [10], [14].

Computationally efficient algorithms for sparse signal recovery have been proposed to find or approximate the sparse solution in various settings. A partial list includes matching pursuit [15], orthogonal matching pursuit [16], lasso [17], basis pursuit [18], FOCUSS [3], sparse Bayesian learning [19], finite rate of innovation [20], CoSaMP [21], and subspace pursuit [22]. At the same time, many exciting mathematical tools have been developed to analyze the performance of these algorithms. In particular, Donoho [1], Donoho, Elad, and Temlyakov [23], Candès and Tao [24], and Candès, Romberg, and Tao [25] presented sufficient conditions for ℓ1-norm minimization algorithms, including basis pursuit, to successfully recover the sparse signals with respect to certain performance metrics. Tropp [26], Tropp and Gilbert [27], and Donoho, Tsaig, Drori, and Starck [28] studied greedy sequential selection methods such as matching pursuit and its variants. In these papers, the structural properties of the measurement matrix A, including coherence metrics [15], [23], [26], [29] and spectral properties [1], [24], are used as the major ingredients of the performance analysis. By using random measurement matrices, these results translate to relatively simple tradeoffs among the dimension N of the signal, the number k of nonzero entries in x, and the number n of measurements that ensure asymptotically successful reconstruction of the sparse signal. In the absence of measurement noise, i.e., z = 0, the performance metric employed is the ability to recover the exact sparse signal [24]. When measurement noise is present, the Euclidean distance between the recovered signal and the true signal has often been employed as the performance metric [23], [25].

In many applications, however, finding the exact support of the signal is important even in the noisy setting. For example, in applications of medical imaging, magnetoencephalography (MEG) and electroencephalography (EEG) are common approaches for collecting noninvasive measurements of external electromagnetic signals [30]. A relatively fine spatial resolution is required to localize the neural electrical activities from a huge number of potential locations [31]. In the domain of cognitive radio, spectrum sensing plays an important role in identifying available spectrum for communication, where estimating the number of active subbands and their locations becomes a nontrivial task [32]. In multiple-user communication systems such as a code-division multiple access (CDMA) system, the problem of neighbor discovery requires identification of active nodes from all potential nodes in a network based on a linear superposition of the signature waveforms of the active nodes [14]. In all these problems, finding the support of the sparse signal is more important than approximating the signal vector in the Euclidean distance. Hence, it is important to understand performance issues in the exact support recovery of sparse signals with noisy measurements. Information theoretic tools have proven successful in this direction. Wainwright [33], [34] considered the problem of exact support recovery using the optimal maximum likelihood decoder. Necessary and sufficient conditions are established for different scalings between the sparsity level and signal dimension. Using the same decoder, Fletcher, Rangan, and Goyal [35], [36] recently improved the necessary condition. Wang, Wainwright, and Ramchandran [37] also presented a set of necessary conditions for exact support recovery. Akçakaya and Tarokh [38] analyzed the performance of a joint typicality decoder and applied it to find a set of necessary and sufficient conditions under different performance metrics including the one for exact support recovery. In addition, a series of papers have leveraged many information theoretic tools, including rate-distortion theory [39], [40], expander graphs [41], belief propagation and list decoding [42], and low-density parity-check codes [43], to design novel algorithms for sparse signal recovery and to analyze their performances.

In this paper, we develop sharper asymptotic tradeoffs among the signal dimension N, the number of nonzero entries k, and the number of measurements n for reliable support recovery in the noisy setting. When k is held fixed, we show that n = (log N)/c(β) measurements are sufficient and necessary, and we give a complete characterization of the constant c(β), which depends on the values of the nonzero entries of x. When k increases in certain manners as specified later, we obtain sufficient and necessary conditions for exact support recovery that improve upon existing results. Our main results are inspired by the analogy to communication over the additive white Gaussian noise multiple access channel (AWGN-MAC) [44, 45]. In this connection, the columns of the measurement matrix form a common codebook for all senders. Codewords from the senders are individually multiplied by unknown channel gains, which correspond to the nonzero entries of x, and a noise-corrupted linear combination of these codewords is observed. Thus, support recovery can be interpreted as decoding messages from multiple senders. With appropriate modifications, the techniques for deriving multiple-user channel capacity can be leveraged to characterize performance tradeoffs for support recovery.

The analogy between the problems of sparse signal recovery and channel coding has been observed from various perspectives in parallel work [39], [46, Sec. IV-D], [37, Sec. II-A], [38, Sec. III-A], [28, Sec. 11.2]. However, our approach differs from the existing literature in several aspects. First, we explicitly connect the problem of exact support recovery to that of multiple access communication by interpreting the sparse signal measurement model as a multiple access channel model. In spite of their similarity, however, there are important differences between the two problems that make a straightforward translation of known results nontrivial. We customize tools from multiple-user information theory (e.g., signal value estimation, distance decoding, Fano’s inequality) to tackle the support recovery problem. Second, equipped with this analytical framework, we obtain performance tradeoffs sharper than existing results. Moreover, the analytical framework can be extended to different models of sparse signal recovery, such as non-Gaussian measurement noise, sources with random activity levels, and multiple measurement vectors (MMV).

The rest of the paper is organized as follows. We formally state the support recovery problem in Section II. To motivate the main results of the paper and their proof techniques, we discuss in Section III the similarities and differences between the support recovery problem and the multiple access communication problem. Our main results are presented in Section IV, together with comparisons to existing results in the literature. The proofs of the main theorems are presented in Appendices A, B, C, and D, respectively. Section V further extends the results to different signal models and measurement procedures.

Throughout this paper, a set is a collection of unique objects. Let ℝ^n denote the n-dimensional real Euclidean space. Let ℕ denote the set of natural numbers. Let [n] denote the set {1, 2, …, n}. The notation |S| denotes the cardinality of a set S, ‖x‖ denotes the ℓ2-norm of a vector x, and ‖A‖_F denotes the Frobenius norm of a matrix A. The expression f(n) = O(g(n)) denotes |f(n)| ≤ c g(n) as n → ∞ for some constant c, f(n) = Ω(g(n)) denotes g(n) = O(f(n)), f(n) = Θ(g(n)) denotes f(n) = O(g(n)) and f(n) = Ω(g(n)), f(n) = o(g(n)) denotes lim_{n→∞} f(n)/g(n) = 0, and f(n) = ω(g(n)) denotes g(n) = o(f(n)).

II Problem Formulation

Let β = (β_1, β_2, …, β_k) ∈ ℝ^k, where β_i ≠ 0 for all i ∈ [k]. Let S = {s_1, s_2, …, s_k} be such that s_1, …, s_k are chosen uniformly at random from [N] without replacement. In particular, S is uniformly distributed over all size-k subsets of [N]. Then, the signal of interest x ∈ ℝ^N is generated as

x_{s_i} = β_i for i ∈ [k],  and  x_j = 0 for j ∉ S.   (1)

Thus, the support of x is supp(x) = S. According to the signal model (1), |supp(x)| = k. Throughout this paper, we assume k is known. The signal x is said to be sparse when k ≪ N.

We measure x through the linear operation

y = Ax + z,   (2)

where A ∈ ℝ^{n×N} is the measurement matrix, z ∈ ℝ^n is the measurement noise, and y ∈ ℝ^n is the noisy measurement. We further assume that the noise entries z_1, …, z_n are independent and identically distributed (i.i.d.) according to the Gaussian distribution N(0, σ²).
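To make the setup concrete, the following sketch (a minimal illustration, not code from the paper) draws one realization of the signal model (1) and the measurement model (2). The dimensions, signal values, and the Gaussian measurement ensemble are our own illustrative choices; the Gaussian ensemble reappears later in Theorem 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem dimensions (not values from the paper).
N, k, n = 1000, 3, 200             # signal dimension, sparsity, measurements
sigma2 = 1.0                       # noise variance
beta = np.array([2.0, -1.5, 1.0])  # nonzero values, fixed but unknown to the decoder

# Signal model (1): support S drawn uniformly at random without replacement.
S = rng.choice(N, size=k, replace=False)
x = np.zeros(N)
x[S] = beta

# Measurement model (2): y = Ax + z with i.i.d. Gaussian noise.
A = rng.standard_normal((n, N))
z = rng.normal(scale=np.sqrt(sigma2), size=n)
y = A @ x + z
```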

Upon observing the noisy measurement y, the goal is to recover the support S of the sparse signal x. A support recovery map is defined as

d: ℝ^n → {T : T ⊆ [N], |T| = k}.   (3)

Given the signal model (1), the measurement model (2), and the support recovery map (3), we define the average probability of error by

P_err = P{d(y) ≠ S}   (4)

for each (unknown) signal value vector β, where the probability is taken over the random support S and the measurement noise z.

III An Information Theoretic Perspective on Sparse Signal Recovery

In this section, we introduce an interpretation of the problem of sparse signal recovery as a communication problem over the Gaussian multiple access channel. The similarities and differences between the two problems are elucidated, progressively building the intuition and the technical groundwork for the main results and their proof techniques.

III-A Brief Review of the AWGN-MAC

We start by reviewing the background on the k-sender multiple access channel (MAC). Suppose k senders wish to transmit information to a common receiver. Sender i has access to a codebook C_i = {x_i(1), …, x_i(M_i)}, where each x_i(w) ∈ ℝ^n is a codeword and M_i is the number of codewords in C_i. The rate of the ith sender is R_i = (log M_i)/n. To transmit information, each sender chooses a codeword from its codebook, and all senders transmit their codewords simultaneously over an AWGN-MAC [47]:

Y_t = β_1 X_{1,t} + β_2 X_{2,t} + ⋯ + β_k X_{k,t} + Z_t,   t = 1, 2, …, n,   (5)

where X_{i,t} denotes the input symbol from the ith sender to the channel at the tth use of the channel, β_i denotes the channel gain associated with the ith sender, Z_t is the additive noise, i.i.d. N(0, σ²), and Y_t is the channel output.

Upon receiving Y^n = (Y_1, …, Y_n), the receiver needs to determine the codewords transmitted by each sender. Since the senders interfere with each other, there is an inherent tradeoff among their operating rates. The notion of capacity region is introduced to capture this tradeoff by characterizing all possible rate tuples at which reliable communication can be achieved with diminishing probability of decoding error. Assuming each sender obeys the power constraint (1/n) Σ_{t=1}^n x_{i,t}(w)² ≤ P for all i ∈ [k] and all codewords w, the capacity region of an AWGN-MAC with known channel gains [47] is

{ (R_1, …, R_k) : Σ_{i∈T} R_i < (1/2) log(1 + Σ_{i∈T} β_i² P / σ²) for every nonempty T ⊆ [k] }.   (6)
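As a quick numerical companion (our own illustration, not part of the original paper), the following sketch checks whether a rate tuple satisfies all 2^k − 1 subset constraints in (6); base-2 logarithms are used, so rates are in bits per channel use.

```python
from itertools import combinations
import numpy as np

def in_awgn_mac_capacity_region(R, beta, P, sigma2):
    """Check the subset constraints of the AWGN-MAC capacity region (6):
    sum_{i in T} R_i < (1/2) log2(1 + sum_{i in T} beta_i^2 * P / sigma2)."""
    k = len(R)
    for size in range(1, k + 1):
        for T in combinations(range(k), size):
            rate_sum = sum(R[i] for i in T)
            cap = 0.5 * np.log2(1 + sum(beta[i] ** 2 for i in T) * P / sigma2)
            if rate_sum >= cap:
                return False
    return True

# Example: two senders with gains 1.0 and 0.8, unit power, unit noise variance.
print(in_awgn_mac_capacity_region([0.3, 0.3], [1.0, 0.8], P=1.0, sigma2=1.0))
```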

III-B Connecting Sparse Signal Recovery to the AWGN-MAC

In the measurement model (2), one can remove the columns of A that are nulled out by the zero entries of x and obtain the following effective form of the measurement procedure:

y = β_1 a_{s_1} + β_2 a_{s_2} + ⋯ + β_k a_{s_k} + z,   (7)

where a_j denotes the jth column of A. By contrasting (7) with the AWGN-MAC (5), we can draw the following key connections that relate the two problems [44].

  1. A nonzero entry as a sender: We can view each nonzero entry position s_i as a sender that accesses the MAC.

  2. A column a_j as a codeword: We treat the measurement matrix A as a codebook with each column a_j, j ∈ [N], as a codeword. Each element of a_{s_i} is fed one by one to the channel (5) as the input symbol X_{i,t}, resulting in n uses of the channel. The measurement noise z and the measurement y can be related to the channel noise and the channel output in the same fashion.

  3. A nonzero entry β_i as a channel gain: The nonzero entry β_i in (7) plays the role of the channel gain β_i in (5). Essentially, we can interpret the vector representation (7) as n consecutive uses of the k-sender AWGN-MAC (5) with appropriate stacking of the inputs and outputs into vectors.

  4. Similarity between objectives: In the problem of sparse signal recovery, the goal is to find the support of the signal, i.e., {s_1, …, s_k}. In the problem of MAC communication, the receiver’s goal is to determine the indices of the codewords transmitted by the senders.

In light of these connections, the two problems share significant similarities, which enables leveraging information theoretic methods for the performance analysis of support recovery of sparse signals. However, as we shall see next, there are domain-specific differences between the support recovery problem and the channel coding problem that must be addressed in order to rigorously apply the information theoretic approaches.

III-C Key Differences

  1. Common codebook: In MAC communication, each sender uses its own codebook. However, in sparse signal recovery, the “codebook” A is shared by all “senders”: all senders choose their codewords from the same codebook and hence operate at the same rate (log N)/n. Moreover, distinct senders never choose the same codeword; otherwise they would collapse into a single sender.

  2. Unknown channel gains: In MAC communication, the capacity region (6) is valid under the assumption that the receiver knows the channel gains β_1, …, β_k [48]. In contrast, in the sparse signal recovery problem, β is unknown and needs to be estimated. Although coding techniques and capacity results are available for communication with channel uncertainty, a closer examination indicates that those results are not directly applicable to our problem. For instance, channel training with pilot symbols is a common practice to combat channel uncertainty [49]. However, it is not obvious how to incorporate a training procedure into the measurement model (2), and hence the related results do not directly apply.

Once these differences are properly accounted for, the connection between the problems of sparse signal recovery and channel coding makes available a variety of information theoretic tools for handling performance issues pertaining to the support recovery problem. Based on techniques that are rooted in channel capacity results, but suitably modified to deal with the differences, we will present the main results of this paper in the next section.

IV Main Results and Their Implications

IV-A Fixed Number of Nonzero Entries

To discover the precise impact of the values of the nonzero entries on support recovery, we consider the support recovery of a sequence of sparse signals generated with the same signal value vector β = (β_1, …, β_k). In particular, we assume that k is fixed. Define the auxiliary quantity

c(β) = min_{T ⊆ [k], T ≠ ∅} (1/(2|T|)) log(1 + Σ_{i∈T} β_i²/σ²).   (8)

For example, when k = 2,

c(β) = min{ (1/2) log(1 + β_1²/σ²), (1/2) log(1 + β_2²/σ²), (1/4) log(1 + (β_1² + β_2²)/σ²) }.

We can see from Section III that this quantity is closely related to the k-sender multiple access channel capacity under an equal-rate constraint.
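The quantity c(β) in (8) is straightforward to evaluate by enumerating nonempty subsets. The sketch below is our own illustration (natural logarithms assumed, to pair with log N later):

```python
from itertools import combinations
import numpy as np

def c_beta(beta, sigma2=1.0):
    """c(beta) = min over nonempty T of (1/(2|T|)) * log(1 + sum_{i in T} beta_i^2 / sigma2),
    cf. (8), computed by brute-force enumeration of the 2^k - 1 subsets."""
    k = len(beta)
    return min(
        np.log1p(sum(beta[i] ** 2 for i in T) / sigma2) / (2 * size)
        for size in range(1, k + 1)
        for T in combinations(range(k), size)
    )

print(c_beta([2.0, -1.5, 1.0]))  # the weakest entries typically drive the minimum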

In the following two theorems, we summarize our main results under the assumption that β is fixed. The subscript N in n_N denotes the possible dependence of the number of measurements on the signal dimension N. The proofs of the theorems are presented in Appendices A and B, respectively.

Theorem 1

If

lim sup_{N→∞} (log N)/n_N < c(β),   (9)

then there exist a sequence of measurement matrices A^{(N)} ∈ ℝ^{n_N×N}, N ∈ ℕ, and a sequence of support recovery maps d_N, N ∈ ℕ, such that

(1/(n_N N)) ‖A^{(N)}‖_F² ≤ 1   (10)

and

lim_{N→∞} P{d_N(y) ≠ S} = 0.   (11)
Theorem 2

If there exist a sequence of measurement matrices A^{(N)} ∈ ℝ^{n_N×N}, N ∈ ℕ, and a sequence of support recovery maps d_N, N ∈ ℕ, such that

(1/(n_N N)) ‖A^{(N)}‖_F² ≤ 1   (12)

and

lim_{N→∞} P{d_N(y) ≠ S} = 0,   (13)

then

lim sup_{N→∞} (log N)/n_N ≤ c(β).   (14)

Theorems 1 and 2 together indicate that n_N ≈ (log N)/c(β) measurements are sufficient and necessary for exact support recovery. The constant c(β) is explicitly characterized by (8), capturing the role of the signal strength in support recovery.
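As a rough numerical reading of Theorems 1 and 2 (our own illustration, reusing c_beta from the previous sketch), the critical number of measurements for fixed k scales as (log N)/c(β):

```python
import numpy as np

# Reusing c_beta from the previous sketch: for fixed k, Theorems 1 and 2 place
# the critical number of measurements at roughly n ~ log(N) / c(beta).
N = 10**6
beta = [2.0, -1.5, 1.0]
print(np.log(N) / c_beta(beta))  # approximate measurement requirement
```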

IV-B Growing Number of Nonzero Entries

Next, we consider support recovery for the case where k, the number of nonzero entries, grows with N, the dimension of the signal. We assume that the magnitudes of the nonzero entries are bounded both below and above. Meanwhile, we consider random measurement matrices drawn from the Gaussian distribution, which makes comparisons with existing results in the literature more convenient. Note that corresponding results on the existence of deterministic measurement matrices, as in Theorems 1 and 2, can easily be established.

First, we present a sufficient condition for exact support recovery. The proof can be found in Appendix C.

Theorem 3

Let β^{(N)} ∈ ℝ^k be a sequence of signal value vectors satisfying 0 < β_min ≤ |β_i^{(N)}| ≤ β_max < ∞ for all i ∈ [k]. Let A^{(N)} be generated with i.i.d. N(0, 1) entries. If

(15)

then there exists a sequence of support recovery maps d_N, N ∈ ℕ, such that lim_{N→∞} P{d_N(y) ≠ S} = 0.

To better understand Theorem 3, we present the following implication of (15), which shows the tradeoff between the order of n and the orders of N and k.

Corollary 1

Under the assumptions of Theorem 3, lim_{N→∞} P{d_N(y) ≠ S} = 0

provided that

In particular, we have the following:

  1. When , for example , the sufficient number of measurements is .

  2. When , for example , the sufficient number of measurements is . In this case, , and hence . Thus, is a better sufficient condition than .

  3. When , for example , the sufficient number of measurements is .

  4. When , the sufficient number of measurements is .

The following table summarizes the sufficient orders of n paired with the different relations between k and N in Corollary 1.

Relation between k and N | Sufficient n

In the existing literature, Wainwright [34] and Akçakaya and Tarokh [38] both derived sufficient conditions for exact support recovery. Under the same assumptions as in Theorem 3, the sufficient conditions presented in these papers are summarized in the following table:

Relation between k and N | Wainwright [34] | Akçakaya et al. [38]

To compare the results, we first examine the case of sublinear sparsity, i.e., k = o(N). Note that in the regime where , our sufficient condition on n exhibits a lower order of growth, and is hence better, than existing results. In the regime where , there exists a certain scenario, e.g., , in which our sufficient condition is of the same order as that in [38] but of higher order than that in [34]. In the case of linear sparsity, i.e., k = Θ(N), we see that our sufficient condition is stricter, implying its inferiority to existing results in this regime.

Next, we present a necessary condition, the proof of which can be found in Appendix D.

Theorem 4

Let β^{(N)} ∈ ℝ^k be a sequence of signal value vectors satisfying 0 < β_min ≤ |β_i^{(N)}| ≤ β_max < ∞ for all i ∈ [k]. Let A^{(N)} be generated with i.i.d. N(0, 1) entries. If

(16)

then for any sequence of support recovery maps d_N, N ∈ ℕ, the probability of error P{d_N(y) ≠ S} does not vanish as N → ∞.

To compare with existing results under the same assumptions111The necessary conditions derived in [34], [37], and [38] were originally derived under slightly different assumptions. Here we adapt them to compare the asymptotic orders of n. as in Theorem 4, we first note that when k = Θ(N), the necessary condition is n = Ω(N), which follows simply from the elementary constraint that the number of measurements must be no smaller than the number of nonzero entries for support recovery to be possible. Contrasted with the sufficient conditions derived in [34] and [38], n = Θ(N) is the necessary and sufficient order for linear sparsity. When k = o(N), we summarize the necessary conditions developed in previous papers in the following table:

Footnote: This result is implied in [38], by identifying the corresponding quantities in Thm. 1.6 therein and clarifying the order of . The proof of Thm. 1.6 states (below its (25)) that asymptotically reliable support recovery is not possible if . Note that . Hence, we consider the appropriate necessary condition resulting from the proof in [38].
Relation between k and N | Necessary n
Wainwright [34]
Wang et al. [37]
Akçakaya et al. [38]
Theorem 4

In this case, the condition in Theorem 4 is the best known necessary condition.

IV-C Further Observations

Note that for sublinear sparsity with , both and are of the same order, and hence our sufficient and necessary conditions both indicate . This provides a sharp performance tradeoff for support recovery in this specific regime, which to our knowledge has not been observed in previous work (see, for example, the remarks in [34, Sec. III-A] and [36, Sec. III, Remark 2]). For the regime where , the orders of n in any pair of sufficient and necessary conditions differ nontrivially, leaving open the question of further narrowing the gap in this remaining regime of sublinear sparsity.

In addition, it is worthwhile to note that our analytical framework could also be adapted to the case where the magnitudes of the nonzero entries vanish as N grows, a scenario extensively discussed in [34], [36], [37]. We will not pursue this direction in detail.

V Extensions

The connection between the problems of support recovery and channel coding can be further explored to provide the performance tradeoff for different models of sparse signal recovery. Next, we discuss its potential to address several important variants.

V-A Non-Gaussian Noise

Note that the rules for support recovery, mainly reflected in (22) and (28) in the proof of Theorem 1 in Appendix A, are similar to the method of nearest neighbor decoding in information theory. Following the argument in [50], one can show that if the Gaussian assumption on the measurement noise in (2) is replaced by the assumption that the noise entries are i.i.d. with zero mean and variance σ², the results in the previous theorems continue to hold.

V-B Random Signal Activities

In Theorem 1, β is assumed to be a fixed vector of nonzero entries. We now relax this condition to allow a random vector B = (B_1, …, B_k), which leads to sparse signals whose nonzero entries are randomly generated and located. For simplicity of exposition, assume that k is fixed. Interestingly, the model (2) with this new assumption can now be contrasted with a MAC with random channel gains

Y_t = B_1 X_{1,t} + B_2 X_{2,t} + ⋯ + B_k X_{k,t} + Z_t.   (17)

The difference between (17) and (5) is that the channel gains B_1, …, B_k are random variables in this case. Specifically, to parallel the problem of support recovery of sparse signals, B should be considered as being realized once and then kept fixed during the entire channel use [44]. This channel model is usually termed a slow fading channel [48].

The following theorem states the performance of support recovery of sparse signals under random signal activities.

Theorem 5

Suppose the random vector B = (B_1, …, B_k) has bounded support and its entries are nonzero with probability one. Let A^{(N)} be generated with i.i.d. N(0, 1) entries. Then, there exists a sequence of support recovery maps d_N, N ∈ ℕ, such that

lim sup_{N→∞} P{d_N(y) ≠ S} ≤ P{ c(B) ≤ lim sup_{N→∞} (log N)/n_N },

where c(·) is defined in (8).

Proof:

Note that

lim sup_{N→∞} P{d_N(y) ≠ S} = lim sup_{N→∞} E_B[ P{d_N(y) ≠ S | B} ]
    ≤ E_B[ lim sup_{N→∞} P{d_N(y) ≠ S | B} ]   (18)
    ≤ P{ c(B) ≤ lim sup_{N→∞} (log N)/n_N },   (19)

where (18) follows from Fatou’s lemma [51] and (19) follows by applying the proof of Theorem 1 to the integrand. ∎

Theorem 5 implies that, in general, rather than achieving a diminishing error probability, we must tolerate a certain error probability, upper bounded by P{c(B) ≤ lim sup_{N→∞} (log N)/n_N}, when the nonzero values are randomly generated. Conversely, in order to design a system with probability of success at least 1 − δ, one can choose n_N to satisfy P{c(B) ≤ lim sup_{N→∞} (log N)/n_N} ≤ δ.
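A simple Monte Carlo experiment can estimate this error floor numerically. The sketch below is our own illustration (the distribution of B is hypothetical, and the function reuses c_beta from Section IV-A); it estimates P{c(B) ≤ (log N)/n} for given n and N.

```python
import numpy as np

def outage_probability(n, N, draw_beta, sigma2=1.0, trials=10_000, seed=0):
    """Monte Carlo estimate of P{c(B) <= log(N)/n}, the error floor suggested
    by Theorem 5; purely a numerical illustration."""
    rng = np.random.default_rng(seed)
    rate = np.log(N) / n
    return np.mean([c_beta(draw_beta(rng), sigma2) <= rate for _ in range(trials)])

# Hypothetical gain distribution: k = 2 entries, magnitudes uniform on [0.5, 2],
# random signs (chosen only for illustration).
def draw_beta(rng):
    return rng.uniform(0.5, 2.0, size=2) * rng.choice([-1.0, 1.0], size=2)

print(outage_probability(n=60, N=10**4, draw_beta=draw_beta))
```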

V-C Multiple Measurement Vectors

Recently, increasing research effort has been focused on sparse signal recovery with multiple measurement vectors (MMV) [52, 53, 54, 55, 56]. In this problem, we wish to measure multiple sparse signals x^{(1)}, x^{(2)}, …, x^{(L)} that possess a common sparsity profile, that is, the locations of the nonzero entries are the same in each x^{(l)}. We use the same measurement matrix A to perform

y^{(l)} = A x^{(l)} + z^{(l)},   l = 1, 2, …, L,   (20)

where A ∈ ℝ^{n×N}, z^{(l)} ∈ ℝ^n is the measurement noise, and y^{(l)} ∈ ℝ^n is the noisy measurement.

Note that the model (2) can be viewed as a special case of the MMV model (20) with L = 1. The methodology developed in this paper has the potential to be extended to the performance issues of the MMV model by noting the following connections to channel coding [44]. First, the same set of columns of A is scaled by the entries of the different x^{(l)}, forming the outputs as elements of the different y^{(l)}. The nonzero entries of the x^{(l)} can then be viewed as the coefficients that connect different pairs of inputs and outputs of a channel. Second, each measurement vector y^{(l)} can be viewed as the received symbols at the lth receiver, and hence the MMV model indeed corresponds to a multiple-input multiple-output (MIMO) channel model. Third, the aim is to recover the locations of the nonzero rows of the matrix [x^{(1)}, …, x^{(L)}] upon receiving y^{(1)}, …, y^{(L)}. This implies that, in the language of MIMO channel communication, the receivers fully collaborate to decode the information sent by all senders. With proper accommodation of the method developed in this paper, the capacity results for MIMO channels can be leveraged to shed light on the performance tradeoffs of sparse signal recovery with MMV.
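For concreteness, the following sketch (our own illustration, with hypothetical dimensions) generates an instance of the MMV model (20), with the L signals sharing a common support:

```python
import numpy as np

rng = np.random.default_rng(1)
N, k, n, L = 500, 4, 100, 8   # L measurement vectors sharing one support
sigma2 = 1.0

S = rng.choice(N, size=k, replace=False)
X = np.zeros((N, L))
X[S, :] = rng.standard_normal((k, L))   # common sparsity profile, different values

A = rng.standard_normal((n, N))
Z = rng.normal(scale=np.sqrt(sigma2), size=(n, L))
Y = A @ X + Z                            # MMV measurements (20), one column per vector
```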

VI Concluding Remarks

In this paper, we developed techniques rooted in multiple-user information theory to address the performance issues in the exact support recovery of sparse signals, and discovered necessary and sufficient conditions on the number of measurements. It is worthwhile to note that the interpretation of sparse signal recovery as MAC communication opens new avenues to different theoretic and algorithmic problems in sparse signal recovery. We conclude this paper by briefly discussing several interesting potential directions made possible by this interpretation:

  1. Among the large collection of algorithms for sparse signal recovery, the sequential selection methods, including matching pursuit [15] and orthogonal matching pursuit (OMP) [16], determine one nonzero entry at a time, remove its contribution from the residual signal, and repeat this procedure until a certain stopping criterion is satisfied. In contrast, the class of convex relaxation methods, including basis pursuit [18] and lasso [17], jointly estimate the nonzero entries.

    Potentially, the sequential selection methods can be viewed as successive interference cancellation (SIC) decoding [48] for multiple access channels, whereas the convex relaxation methods can be viewed as joint decoding. It would be interesting to ask whether one can make these analogies more precise and use them to address performance issues. Similarities at an intuitive level between OMP and SIC have been discussed in [45], with performance results supported by empirical evidence; a small illustration of this analogy is sketched after this list. More insights are yet to be explored.

  2. The design of channel codes and the development of decoding methods have been extensively studied in the contexts of information theory and wireless communication. Some of these ideas have been transformed into design principles for sparse signal recovery [41], [42], [43], as mentioned in the introduction. Thus far, however, the efforts in utilizing codebook designs and decoding methods have mainly focused on the point-to-point channel model, which implies that the recovery methods iterate between first recovering one nonzero entry or a group of nonzero entries by treating the rest of them as noise and then removing the recovered nonzero entries from the residual signal. In this paper, we established the analogy between sparse signal recovery and multiple access communication, which motivates us to envision opportunities beyond a point-to-point channel model. As one important question, for example, can we develop practical codes for joint decoding and reconstruction techniques to simultaneously recover all the nonzero entries?

  3. Last but not least, we return to one remaining open question from this paper. Recall that for sublinear sparsity, there exists a certain regime in which the tight bound on the number of measurements is not known yet. Can we further improve the result in this regime, thereby closing the gap between sufficient and necessary conditions on the number of measurements for arbitrary scalings among the model parameters?
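As promised in the first item above, here is a minimal orthogonal matching pursuit sketch (our own illustration, not code from [16] or [45]); the least-squares re-projection after each greedy selection is the step that parallels interference cancellation in SIC decoding:

```python
import numpy as np

def omp(y, A, k):
    """Orthogonal matching pursuit: greedily select the column most correlated
    with the residual, then re-project y onto all selected columns; the
    re-projection plays the role of interference cancellation in the SIC analogy."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return sorted(support)

rng = np.random.default_rng(2)
A = rng.standard_normal((80, 400))
x = np.zeros(400)
x[[5, 77, 300]] = [2.0, -1.5, 1.0]
y = A @ x + 0.1 * rng.standard_normal(80)
print(omp(y, A, 3))  # typically recovers [5, 77, 300] at this noise level
```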

VII Acknowledgements

This research was supported by NSF Grants CCF-0830612 and CCF-0747111. Y. J. thanks Liwen Yu for insightful discussions on perspectives of MAC communication.

Appendix A Proof of Theorem 1

The proof of Theorem 1 employs the distance decoding technique [50]. We will first randomly generate the measurement matrix and show that the probability of error averaged over this ensemble tends to zero as N → ∞. This naturally implies the existence of a sequence of deterministic matrices achieving a diminishing probability of error in support reconstruction. We generate the measurement matrix A with entries drawn independently according to N(0, 1). Let a_j denote the jth column of A.

For simplicity of exposition, we describe the support recovery procedure in two distinct cases according to the number of nonzero entries: k = 1 and k ≥ 2.

Case 1: k = 1. In this case, the signal of interest x has a single nonzero entry x_{s_1} = β_1. Consider the following support recovery procedure. Fix ε > 0. First form an estimate of the magnitude of β_1 as

β̂ = sqrt( max{ (1/n)‖y‖² − σ², 0 } ).   (21)

Declare that ĵ is the estimated location of the nonzero entry, i.e., d(y) = {ĵ}, if it is the unique index such that

(1/n) ‖y − b a_ĵ‖² ≤ σ² + ε   (22)

for either b = β̂ or b = −β̂. If there is none or more than one, pick an arbitrary index.
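For illustration, this Case 1 decoder can be sketched as follows. The magnitude estimate and the threshold test below follow our reading of (21) and (22) above, so the details should be treated as assumptions rather than the paper's verbatim procedure:

```python
import numpy as np

def decode_k1(y, A, sigma2, eps):
    """Distance decoding for a single nonzero entry (Case 1): estimate the
    magnitude of beta_1 from the received power as in (21), then search for the
    unique column whose scaled version leaves a noise-level residual as in (22)."""
    beta_hat = np.sqrt(max(np.mean(y ** 2) - sigma2, 0.0))
    hits = [
        j for j in range(A.shape[1])
        if min(np.mean((y - b * A[:, j]) ** 2) for b in (beta_hat, -beta_hat))
        <= sigma2 + eps   # the sign of beta_1 is unknown, so try both signs
    ]
    return hits[0] if len(hits) == 1 else 0  # arbitrary index if not unique
```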

We now analyze the average probability of error

E_A[ P{d(y) ≠ S} ],

where the expectation is taken with respect to the random measurement matrix A. Due to the symmetry in the problem and in the measurement matrix generation, we assume without loss of generality that S = {1}, that is,

y = β_1 a_1 + z

for some β_1 ≠ 0. In the following analysis, we drop superscripts and subscripts for notational simplicity when no ambiguity arises. Define the events

Then

(23)

Let

Then, by the union of events bound and the fact that ,

(24)

We bound each term in (24). First, by the weak law of large numbers (LLN), the first term vanishes as n → ∞. Next, we consider the second term. If ,

(25)

For any , as , by the LLN,

Hence, we have for the first term in (25)

Following a similar reasoning, for the second term in (25),

and for the third term,

Therefore, for any ,

which implies that

Similarly, if ,

Hence,

For the third term in (24), we need the following lemma, whose proof is presented at the end of this appendix:

Lemma 1

Let . Let be a real sequence satisfying

Let be an i.i.d. random sequence where . Then, for any ,

Continuing the proof of Theorem 1, we consider for . Then

Since is independent of and , it follows from the definition of and Lemma 1 (with and ) that

for , if is sufficiently small. Thus,

and therefore

which tends to zero as , if

(26)

Therefore, by (24), the probability of error averaged over the random matrix A tends to zero as N → ∞ if (26) is satisfied. This in turn implies that there exists a sequence of nonrandom measurement matrices satisfying the power constraint (10) and achieving a vanishing probability of error whenever (26) holds. Finally, since ε is chosen arbitrarily, we have the desired proof of Theorem 1.

Case 2: k ≥ 2. In this case, the signal of interest is x as generated in (1), with support S = {s_1, …, s_k} and signal values β_1, …, β_k. Consider the following support recovery procedure. Fix ε > 0. First, form an estimate of ‖β‖ as

r̂ = sqrt( max{ (1/n)‖y‖² − σ², 0 } ).   (27)

For δ > 0, let C(r, δ) be a minimal set of points in ℝ^k satisfying the following properties:

  1. C(r, δ) ⊆ B(r), where B(r) is the k-dimensional hypersphere of radius r.

  2. For any u ∈ B(r), there exists v ∈ C(r, δ) such that ‖u − v‖ ≤ δ.

The following properties can be easily proved:

Lemma 2

1) |C(r, δ)| is finite for any fixed r > 0 and δ > 0.

2) |C(r, δ)| is monotonically nondecreasing in r for fixed δ.

Given y and the estimate r̂ from (27), fix δ > 0. Declare that Ŝ = {ĵ_1, …, ĵ_k} is the recovered support of the signal if it is the unique set of indices such that

(1/n) ‖y − Σ_{i=1}^k b_i a_{ĵ_i}‖² ≤ σ² + ε   (28)

for some (b_1, …, b_k) ∈ C(r̂, δ). If there is none or more than one such set, pick an arbitrary set of indices.

Next, we analyze the average probability of error