An Estimation Theoretic Approach for Sparsity Pattern Recovery in the Noisy Setting


Ali Hormati, Amin Karbasi, Soheil Mohajer, and Martin Vetterli
The authors are with the School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland (e-mails: {ali.hormati, amin.karbasi, soheil.mohajer, martin.vetterli}@epfl.ch). Martin Vetterli is also with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA. This work was supported by the Swiss National Science Foundation under grants NCCR-MICS-51NF40-111400 and 200020-103729. The material in this work was presented in part at the IEEE International Symposium on Information Theory, Seoul, Korea, June 2009.
Abstract

Compressed sensing deals with the reconstruction of sparse signals using a small number of linear measurements. One of the main challenges in compressed sensing is to find the support of a sparse signal. In the literature, several bounds on the scaling law of the number of measurements for successful support recovery have been derived where the main focus is on random Gaussian measurement matrices.

In this paper, we investigate the noisy support recovery problem from an estimation theoretic point of view, where no specific assumption is made on the underlying measurement matrix. The linear measurements are perturbed by additive white Gaussian noise. We define the output of a support estimator to be a set of position values in increasing order. We measure the error between the true and estimated supports by the $\ell_2$-norm of their difference. On the one hand, this choice allows us to use the machinery behind the $\ell_2$-norm error metric; on the other hand, it converts the support recovery problem into a more intuitive and geometrical one. First, by using the Hammersley-Chapman-Robbins (HCR) bound, we derive a fundamental lower bound on the performance of any unbiased estimator of the support set. This lower bound provides us with necessary conditions on the number of measurements for reliable $\ell_2$-norm support recovery, which we specifically evaluate for standard Gaussian measurement matrices. Then, we analyze the maximum likelihood estimator and derive conditions under which the HCR bound is achievable. This leads to a sufficient number of measurements for reliable $\ell_2$-norm support recovery with the optimum decoder, and shows that the performance of the optimum decoder has only a 9 dB gap compared to the HCR lower bound. Using this framework, we specifically evaluate sufficient conditions on the number of measurements for standard Gaussian measurement matrices.

Compressed sensing, compressive sampling, support recovery, Hammersley-Chapman-Robbins bound, Cramer-Rao bound, unbiased estimator, maximum-likelihood estimator

I Introduction

Linear sampling of sparse signals, with the number of measurements close to their sparsity level, has recently received a lot of attention under the names of compressed sensing (CS), compressive sampling or sparse sampling [2, 3, 4, 5]. A $k$-sparse signal is defined as a signal with $k$ nonzero expansion coefficients in some orthonormal basis or frame. The goal of compressed sensing is to design measurement matrices, together with reconstruction algorithms, that allow robust recovery of sparse signals using the least number of measurements and low computational complexity; see for example [6, 7, 8, 9, 10, 11].

Support recovery refers to the problem of estimating the positions of the non-zero entries of the sparse signal, based on a set of observations. In the noiseless setting, the optimal algorithm obtains the true support set from the minimum possible number of samples, at the expense of high computational complexity [12], while a larger number of measurements is needed for the reconstruction algorithms based on linear programming [13]. In the same context, it is shown that a number of samples on the order of the sparsity is sufficient for shift-invariant measurement matrices using recovery algorithms based on annihilating filters [14].

In practice, however, all the measurements are noisy due to physical restrictions, quantization precision, etc. A large body of recent work has established bounds on the number of measurements required for successful support set recovery in the noisy setting. The authors in [15, 16] derived the scaling law on the number of measurements, as a function of the minimum non-zero coefficient of the sparse vector, for the $\ell_1$-constrained quadratic programming (also referred to as the Lasso) to recover the sparsity pattern. In the context of the optimal decoding algorithm, the results in [17, 18] provide necessary and sufficient conditions for perfect support recovery under the Gaussian measurement ensemble. Considering a fractional support recovery, the study in [19] provides a set of necessary and sufficient conditions on the required number of measurements as a function of the fraction of the support that can be reliably recovered.

In this paper, we look at the support recovery problem from an estimation theoretic point of view, where the error metric between the true and the estimated support is the $\ell_2$-norm of their difference. In some applications, e.g. [20], it is important that the recovered sparsity pattern be as close as possible to the true support set. In these cases, the $\ell_2$-norm error metric is an appropriate option, since the assigned penalty is quadratically proportional to the distance. Moreover, this choice allows us to use the machinery behind the $\ell_2$-norm error metric, which makes the theorems and the proofs geometrical and more intuitive. While no specific assumption is made on the underlying measurement matrix, we assume that the linear measurements are perturbed by additive white Gaussian noise. Since the positions of the nonzero entries of the signal form a set of discrete values (integers between one and the signal dimension), the support recovery problem can be regarded as estimating restricted parameters. This leads us to use the Hammersley-Chapman-Robbins (HCR) bound, which provides a lower bound on the variance of any unbiased estimator of a set of restricted parameters [21, 22]. The HCR bound is a generalization of the Cramer-Rao (CR) bound [23] and holds under much weaker regularity conditions, while giving substantially tighter bounds in general. Using the HCR bound, we derive in a straightforward manner the necessary conditions on the required number of measurements for the standard Gaussian ensemble.

Of equal interest are the conditions under which the HCR bound is achievable (tight). To this end, we study the performance of the maximum likelihood estimator (MLE) and derive conditions under which it becomes unbiased and achieves the HCR bound. In particular, this leads us to sufficient conditions on the number of measurements for reliable $\ell_2$-norm support recovery using the standard Gaussian measurement ensemble. Note that when the error of the $\ell_2$-norm support recovery vanishes, so does that of a regular support recovery problem with the 0-1 error metric. Therefore, the derived sufficient condition also applies to the 0-1 error metric support recovery.

The organization of the paper is as follows. In Section III, we provide a more precise formulation of the problem. We derive the HCR bound for the support recovery problem in Section IV, which is followed by necessary conditions on the number of measurements for the standard Gaussian measurement ensemble. By studying the performance of the MLE in Section V, we derive conditions under which the HCR bound becomes achievable. Finally, under the standard Gaussian measurement ensemble, we identify the sufficient number of measurements for reliable $\ell_2$-norm support recovery.

II Previous Work

The problem of sparsity recovery has received considerable attention in the literature in both the noiseless and noisy settings; see, e.g., [7, 15, 17, 18, 19, 24, 25]. The results focus on the asymptotic scaling of the number of measurements for almost-sure success of the reconstruction of sparse inputs. In this section, we give an overview of the previous work most closely related to the results of this paper.

The work in [17] provides necessary and sufficient conditions on the number of measurements in the high-dimensional and noisy setting for reliable sparsity recovery using an optimal decoder. In that setup, the measurements are contaminated by i.i.d. Gaussian noise and the analysis is high dimensional, meaning that the sparsity level, the signal dimension and the number of measurements tend to infinity simultaneously. Under a suitable scaling condition, the author derives the following sufficient condition for asymptotically reliable recovery with the optimal decoder

(1)

where the leading constant is fixed. Moreover, it is also shown in [17] that

is a necessary condition for some fixed constant. By simplifying the sufficient condition (1) in the sublinear sparsity regime, it is shown that the number of measurements required by the $\ell_1$-constrained quadratic programming (Lasso) in [15] achieves the information-theoretic necessary bound.

In [18], the authors derive the necessary scaling

(2)

for the uniform i.i.d. Gaussian measurement ensemble, which holds at any finite signal dimension and for all algorithms. One of the quantities appearing in (2) is the minimum-to-average ratio of the input sparse signal. Moreover, they show that, when these quantities are held fixed, the simple maximum correlation estimator (MCE) achieves the same scaling as in (2). The MCE selects the indices of the columns of the measurement matrix having the highest correlation with the measurement vector. More precisely, the results indicate that the MCE needs

(3)

measurements to succeed with high probability. Therefore, the simple MCE also achieves the same scaling law as Lasso.
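To make this decision rule concrete, the following Python sketch shows one way a maximum correlation estimator of this kind could be implemented; the notation (matrix A, observations y, sparsity k) and the function name are ours, and the snippet only illustrates the idea rather than reproducing the exact estimator analyzed in [18].

import numpy as np

def max_correlation_estimate(A, y, k):
    # MCE sketch: return the k column indices of A most correlated with y,
    # reported in increasing order (hypothetical names, for illustration only).
    scores = np.abs(A.T @ y) / np.linalg.norm(A, axis=0)
    return np.sort(np.argsort(scores)[-k:])

# Toy usage with an i.i.d. Gaussian measurement matrix.
rng = np.random.default_rng(0)
n, k, m, sigma = 100, 3, 40, 0.1
support = np.sort(rng.choice(n, size=k, replace=False))
x = np.zeros(n)
x[support] = 1.0
A = rng.standard_normal((m, n))
y = A @ x + sigma * rng.standard_normal(m)
print(support, max_correlation_estimate(A, y, k))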

In a more general setting, the support recovery with some distortion measure has been considered in [9, 19, 26]. The results in [19] show that if the signal-to-noise ratio does not increase with the signal dimension, exact support recovery is not possible. Moreover, they show that partial support recovery is possible with a bounded number of measurements per sample, which indicates that a finite rate per sample is sufficient. In this regard, our work can be viewed as the support recovery problem with the $\ell_2$-norm distortion measure. In the following, we explain our setup for the estimation theoretic approach to support recovery.

III Problem Statement

In this paper, we consider a deterministic signal model, in which the signal is a fixed but unknown vector with a fixed number of non-zero entries. We refer to this number as the signal sparsity, to the length of the vector as the signal dimension, and define the support vector as the vector of positions of the non-zero elements of the signal. More precisely,

where we assume that the positions are listed in increasing order. The corresponding non-zero entries of the signal form a coefficient vector

Suppose we are given a vector of noisy observations of the form

where the first term is the product of the measurement matrix with the sparse signal and the second term is additive i.i.d. Gaussian noise. Throughout this paper, we assume that the noise variance is fixed, since any scaling of the noise can be accounted for in the scaling of the signal coefficients. For a given support, let the associated submatrix be the matrix composed of the columns of the measurement matrix at the positions indexed by that support, and let the associated subspace be its column span. Since there are finitely many such subspaces, each of dimension equal to the sparsity, a number from one to their total count can be assigned to them and, w.l.o.g., we assume that the true signal belongs to the first subspace. From now on, for simplicity, we refer to this first subspace as the true subspace. Moreover, we need to assume that any collection of columns of the measurement matrix of size equal to the sparsity is linearly independent. Under this assumption, there is a one-to-one correspondence between sparse vectors and their noiseless images under the measurement matrix.
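For concreteness, the following Python sketch sets up the measurement model just described; the symbol choices (x for the sparse signal, A for the measurement matrix, sigma for the noise level, k and n for the sparsity and dimension) are ours and serve only as an illustration of the setup.

import numpy as np
from math import comb

rng = np.random.default_rng(1)
n, k, m, sigma = 12, 2, 8, 0.1        # dimension, sparsity, measurements, noise level

support = np.array([2, 7])             # support vector: positions in increasing order
x = np.zeros(n)
x[support] = [1.5, -0.8]               # the k non-zero coefficients

A = rng.standard_normal((m, n))        # measurement matrix (here an i.i.d. Gaussian one)
y = A @ x + sigma * rng.standard_normal(m)   # noisy linear observations

# Each candidate support indexes a k-dimensional column span of A; with every
# k-column submatrix full rank, sparse vectors map one-to-one to their images A @ x.
print("number of candidate subspaces:", comb(n, k))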

Due to the presence of noise, the signal cannot be recovered exactly. However, a sparse-recovery algorithm outputs an estimate of it. In the support recovery problem, we are only interested in estimating the support. To that end, we can consider different performance metrics for the quality of estimation. In [15], the measure of error between the estimated and the true support is a 0-1 valued loss function,

where the indicator function equals one whenever the estimated support differs from the true support, and zero otherwise. This metric is appropriate for exact support recovery. In this work, we are interested in an approximate support recovery where the goal is to recover a sparsity pattern as close as possible to the true support set. For this purpose, we consider the following $\ell_2$-norm error metric

where, throughout this paper, the norm refers to the Euclidean norm. Note that the error is zero if and only if the estimated support coincides with the true one.
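As a small numerical illustration (with hypothetical support vectors of our own choosing), the sketch below contrasts the $\ell_2$-norm error metric with the 0-1 loss of [15]: the former grows with how far the estimated positions are from the true ones, while the latter only records whether the two supports differ.

import numpy as np

s_true = np.array([3, 7, 12])    # true support, positions in increasing order
s_hat1 = np.array([3, 8, 12])    # one position off by one
s_hat2 = np.array([3, 7, 40])    # one position badly wrong

for s_hat in (s_hat1, s_hat2):
    l2_error = np.sum((s_true - s_hat) ** 2)            # squared ell_2 error metric
    zero_one = int(not np.array_equal(s_true, s_hat))   # 0-1 loss of [15]
    print(l2_error, zero_one)    # prints 1 1, then 784 1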

As is mentioned in [17], the signal-to-noise ratio alone is not a suitable quantity for the support recovery problem. It is possible to generate a set of problem instances for which the support recovery becomes arbitrarily unreliable, in particular, by letting the smallest coefficient go to zero (assuming a sparsity of at least two) at an arbitrary rate, even though the signal-to-noise ratio becomes arbitrarily large by increasing the remaining coefficients. As he also observed, the magnitude of the smallest nonzero entry of the signal is prominent in the phrasing of results. Hence, we define

In particular, our results apply to any unbiased estimator that operates over the signal class

Our analysis is high dimensional in nature, in the sense that the signal dimension goes to infinity. More precisely, we say the $\ell_2$-norm support recovery is reliable if

(4)

for any signal in the class, under some scaling of the number of measurements as a function of the signal dimension and sparsity, where the estimate denotes the estimated support of the signal. For unbiased estimators, (4) is equivalent to

where

and the trace operator denotes the matrix trace. Since the support estimate is computed from the noisy observations, with a slight abuse of notation, we also denote the estimator as a function of the observations.

With this setup, our first goal is to find necessary conditions on the problem parameters which should be satisfied by any unbiased estimator for reliable $\ell_2$-norm support recovery. The results are applicable to any measurement matrix and we specifically evaluate them for standard Gaussian measurement matrices. Our second goal is to find sufficient conditions for successful support recovery using the optimum decoder. We show that, under appropriate conditions, the performance of the optimum decoder is close to the theoretical lower bound on the performance of unbiased support estimators. Again, as a special case, we evaluate the sufficient conditions for standard Gaussian measurement matrices.

IV Hammersley-Chapman-Robbins Bound

The Cramer-Rao (CR) bound is a well-known tool in statistics which provides a lower bound on the variance of the error of any unbiased estimator of an unknown deterministic parameter from a set of measurements  [23]. More specifically, in a single parameter scenario, the estimated value satisfies

(5)

where the likelihood function is the pdf of the measurements, which depends on the parameter to be estimated. As (5) suggests, the CR bound is derived for estimating a continuous parameter.

In many cases, there is a priori information on the estimated parameter which restricts it to take values from a predetermined set. An example is the estimation of the mean of a normal distribution when one knows that the true mean is an integer (see the example below). In such scenarios, the Hammersley-Chapman-Robbins (HCR) bound provides a stronger lower bound on the variance of any unbiased estimator [21, 22]. More precisely, let us assume that the set of observations is drawn according to a probability distribution whose density function is completely characterized by a parameter sequence belonging to some parameter set (e.g., the set of integer numbers). In addition, the parameter sequence is partitioned into two subsequences, where we are only interested in estimating the parameters included in the first subsequence. Let us consider an unbiased estimator of the parameters of interest. Given the above definitions, we recall the following result.

Theorem 1 ([21, 22])

The trace of the covariance matrix of any unbiased estimator of the parameters of interest is bounded below by

(6)

in which the set of admissible perturbations is chosen so that the perturbed parameters take values according to the a priori information.

Example 1

For clarity, let us consider the performance of any unbiased estimator of (only) the mean of a normal distribution, based on a number of independent samples with known variance. In this case,

Let us consider an unbiased estimator of the mean, which is the parameter we want to estimate. When there is no prior information on the mean, it follows from the CR bound that

(7)

Once the mean is restricted to be an integer, we may perturb it by a non-zero integer shift. Then, upon integration in (6), we get

(8)
(9)

where the maximum is attained for a unit shift. A point worth mentioning is the role of the prior information. While (7) drops linearly, (9) decreases exponentially with respect to the number of observations. It is also interesting to note that (8) applies as well to the case in which the parameter is not restricted. We then have to deal with the maximization in (8) over shifts that may take any non-zero value (not necessarily integer). Since the right hand side of (8) is a decreasing function of the shift magnitude, we let the shift tend to zero and recover (7).
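As a numerical illustration of this example, the Python sketch below compares the two bounds using the standard closed forms for this textbook case, namely a CR bound of sigma^2/n for the mean of n i.i.d. Gaussian samples and an integer-restricted HCR bound of 1/(exp(n/sigma^2) - 1), attained at a unit shift; these expressions are our own rendering of (7) and (9) and are given only for intuition.

import numpy as np

def cr_bound(n, sigma2):
    # Cramer-Rao bound for an unbiased estimator of the mean of N(mu, sigma2)
    # from n i.i.d. samples: variance >= sigma2 / n.
    return sigma2 / n

def hcr_bound_integer_mean(n, sigma2, max_shift=5):
    # HCR bound when the mean is known to be an integer: maximize
    # h^2 / (exp(n * h^2 / sigma2) - 1) over non-zero integer shifts h
    # (the maximum is attained at |h| = 1).
    shifts = np.arange(1, max_shift + 1, dtype=float)
    return np.max(shifts**2 / np.expm1(n * shifts**2 / sigma2))

sigma2 = 4.0
for n in (1, 5, 10, 20):
    print(n, cr_bound(n, sigma2), hcr_bound_integer_mean(n, sigma2))
# The CR bound decays like 1/n, while the integer-restricted HCR bound decays
# exponentially fast in the number of observations.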

IV-A Performance Lower Bound

In the support recovery problem, we know a priori that each entry of the support vector takes values from a restricted set of integers between one and the signal dimension. Hence, the HCR bound can provide us with a lower bound on the performance of any unbiased estimator of the support set.

Theorem 2

Assume we are given an unbiased estimator of the support. The HCR lower bound on its variance is given by

(10)

in which the relevant term denotes the projection of the noiseless measurement vector onto the subspace associated with the competing support.

Proof:

Since our observations are noisy linear measurements of the sparse signal, the set of unknown parameters consists of the support vector and the corresponding non-zero coefficients. We are only interested in estimating the support; hence, the support entries are the parameters of interest and the coefficients are nuisance parameters. Then

where the perturbation corresponds to a competing support and coefficient pair. Upon integration, we get

Using the HCR bound (6), we derive

(11)

If the competing signal lives in the same subspace as the true signal, the right hand side of (11) will be zero. Therefore, in order to find the supremum, we can restrict our attention to all the signals which do not live in the same subspace as the true signal does:

(12)

For each competing support, the numerator of (12) is fixed (it is the distance between the supports and does not depend on the coefficients), while the denominator is minimized by choosing the competing coefficients such that the corresponding noiseless measurement equals the projection of the true noiseless measurement onto the competing subspace. This leads to (10). \qed

Corollary 1

For any support vector different from the true one, we have

In the following, we see how Theorem 2 helps us to find the lower bound on the number of measurements for reliable $\ell_2$-norm support recovery.
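To give a feeling for how such a lower bound can be evaluated on a concrete instance, the Python sketch below follows the recipe of the proof of Theorem 2 for Gaussian noise: for each competing support, the numerator is the squared $\ell_2$ distance between the support vectors, and the denominator is exp(d^2/sigma^2) - 1, where d is the distance from the noiseless measurement vector to the competing subspace. The notation and the exact closed form are our own reconstruction and are offered only as an illustrative sketch.

import numpy as np
from itertools import combinations

def hcr_style_lower_bound(A, support, coeffs, sigma):
    # Sketch of a Theorem-2-style lower bound for one problem instance
    # (our notation): maximize ||s' - s||^2 / (exp(d^2 / sigma^2) - 1) over
    # competing supports s', where d is the distance from the noiseless
    # measurement A_S x_S to the subspace spanned by the columns A_{S'}.
    n, k = A.shape[1], len(support)
    y0 = A[:, support] @ coeffs
    best = 0.0
    for s_alt in combinations(range(n), k):
        if list(s_alt) == list(support):
            continue
        B = A[:, list(s_alt)]
        fit, *_ = np.linalg.lstsq(B, y0, rcond=None)
        d2 = np.sum((y0 - B @ fit) ** 2)          # squared distance to span(A_{S'})
        num = np.sum((np.array(s_alt) - np.array(support)) ** 2)
        best = max(best, num / np.expm1(d2 / sigma**2))
    return best

rng = np.random.default_rng(2)
n, k, m, sigma = 10, 2, 6, 1.0
A = rng.standard_normal((m, n))
print(hcr_style_lower_bound(A, [1, 6], np.array([1.0, -1.0]), sigma))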

IV-B Necessary Conditions

Using the HCR bound, Theorem 2 provides a lower bound on the performance of any unbiased estimator for the $\ell_2$-norm support recovery problem. In words, the $\ell_2$-norm support recovery is unreliable if the right hand side of (10) is bounded away from zero, which yields a lower bound on the minimum number of measurements. However, finding the maximum in (10) requires a search over an exponential number of subspaces. Instead, as Corollary 1 suggests, any subspace different from the true one provides us with a lower bound. In the following, we show how this result leads to necessary conditions for random Gaussian measurement matrices.

Theorem 3

Let the measurement matrix be drawn with i.i.d. elements from a standard Gaussian distribution. The $\ell_2$-norm support recovery over the signal class is unreliable if

Proof:

The $\ell_2$-norm support recovery is reliable if (4) holds for any signal in the class. Consider a signal in the class whose last non-zero entry equals the minimum coefficient value. From Corollary 1, we have

(13)

for any competing signal. In particular, let the competing signal have a support that agrees with the true one in all but the last position, with coefficients equal to those of the true signal in the first positions. We show that if the number of measurements does not satisfy the condition of the theorem, then the RHS of (13) will be bounded away from zero for this specific choice, and therefore the estimation is unreliable.

Note that , and . This implies that

where has a chi-square distribution with degrees of freedom. It is known that a central chi-square random variable with degrees of freedom satisfies

(14)

for all  [27]. Assume that

(15)

for some constant , and evaluate (14) for . This leads to

(16)

Note that the RHS of (16) converges to zero as the signal dimension grows. Therefore,

which shows that the RHS of (13) is bounded away from zero with high probability, and therefore, the estimation error does not vanish asymptotically.

\qed
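For intuition about the chi-square concentration step used around (14), the snippet below Monte-Carlo-checks one standard tail bound of this type, the Laurent-Massart form P(Z >= d + 2*sqrt(d*t) + 2*t) <= exp(-t) for a central chi-square variable Z with d degrees of freedom; this particular form is an assumption on our part and is included only as a representative example.

import numpy as np

rng = np.random.default_rng(4)

def check_upper_tail(d, t, trials=200_000):
    # Laurent-Massart style bound for Z ~ chi-square with d degrees of freedom:
    #   P(Z >= d + 2*sqrt(d*t) + 2*t) <= exp(-t)
    # (assumed form, included for illustration only).
    z = rng.chisquare(d, size=trials)
    threshold = d + 2.0 * np.sqrt(d * t) + 2.0 * t
    return np.mean(z >= threshold), np.exp(-t)

for d, t in [(10, 1.0), (50, 2.0), (200, 3.0)]:
    empirical, bound = check_upper_tail(d, t)
    print(d, t, empirical, bound)   # the empirical tail should stay below the bound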

Table I shows the necessary conditions for different scalings of the sparsity and the minimum coefficient value as functions of the signal dimension.

Up to this point, we have discussed the HCR bound and its application in finding necessary conditions on the number of measurements for reliable $\ell_2$-norm support recovery with Gaussian measurement matrices. In the following, we find conditions under which the HCR bound is achievable and, as a result, obtain the sufficient number of measurements for reliable $\ell_2$-norm support recovery.

V Achievability of the HCR Bound

We now analyze the performance of the maximum likelihood estimator (MLE) for the $\ell_2$-norm support recovery and find conditions under which it becomes unbiased and, in addition, its performance approaches that of the HCR bound. We then apply this result to derive a sufficient number of measurements for standard Gaussian measurement matrices.

V-A MLE Performance

Provided that any collection of columns of the measurement matrix of size equal to the sparsity is linearly independent, the noiseless measurement vector belongs to one and only one of the possible subspaces. Since the noise is i.i.d. Gaussian, the MLE selects the subspace closest to the observed vector. More precisely,

Now consider another subspace of the same dimension, associated with a different support. Clearly, an error happens when the MLE selects this support in place of the true one. Let the corresponding error probability denote the probability that the MLE outputs this support vector instead of the true one, among all possible support vectors.
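The following Python sketch makes this exhaustive-search rule explicit (in our own notation): for every candidate support, it projects the observed vector onto the corresponding column span and returns the support whose subspace is closest. It is meant only to illustrate the decision rule; the search is over a combinatorial number of subspaces and is not a practical algorithm.

import numpy as np
from itertools import combinations

def ml_support_estimate(A, y, k):
    # Exhaustive ML decoder sketch: pick the size-k support whose column span
    # is closest to the observed vector y (hypothetical names, for illustration).
    n = A.shape[1]
    best_support, best_dist = None, np.inf
    for s in combinations(range(n), k):
        B = A[:, list(s)]
        fit, *_ = np.linalg.lstsq(B, y, rcond=None)      # least-squares fit in span(A_S)
        dist = np.linalg.norm(y - B @ fit)               # distance from y to the subspace
        if dist < best_dist:
            best_support, best_dist = np.array(s), dist
    return best_support

rng = np.random.default_rng(3)
n, k, m, sigma = 12, 2, 8, 0.2
support = np.array([4, 9])
x = np.zeros(n)
x[support] = [1.0, -1.0]
A = rng.standard_normal((m, n))
y = A @ x + sigma * rng.standard_normal(m)
print(support, ml_support_estimate(A, y, k))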

Lemma 1

Let , where , and be a support set different from . Then

Proof:

See Appendix A-A. \qed

Let the minimum distance between and its projections onto other subspaces be

and the distinguishability factor be defined as

Lemma 2

Let , where and . Moreover, assume that the number of measurements is an even integer, and . Then, the probability that MLE makes an error in choosing is upper bounded by

where and as grows.

Proof:

See Appendix A-B. \qed

Based on Lemma 2, the probability of error of the MLE is related to the minimum distance between the noiseless measurement vector and its projections onto the other subspaces. In the following theorem, we provide a bound on the performance of the MLE.

Theorem 4

Let and for some fixed . Then, MLE is asymptotically unbiased as , namely,

Moreover, its performance is bounded by

(17)

in which and as grows.

Proof:

Let be the ML estimate for the true support set . Then

Since and , we have

(18)

in which the error probability denotes the probability that the MLE makes an error. Combining (18) and Lemma 2, we get

where in , we used . Obviously, as . Hence . For the second part, we need to compute the asymptotic behavior of as . By definition

Now, as we can write

where in we used the fact that is bounded by and for we used Lemma 2. \qed

By Theorem 4, the MLE is asymptotically unbiased and, therefore, its estimation error is lower bounded by the HCR bound. Moreover, the MLE performance upper bound in (17) has only a 9 dB gap in the denominator compared to the HCR lower bound in (10). Therefore, this asymptotic behavior of the MLE shows the achievability of the HCR bound under the stated conditions.

As we observe, our results do not depend on any specific measurement matrix. In the following, we see how these results lead us to the sufficient number of measurements for reliable $\ell_2$-norm support recovery when the Gaussian measurement ensemble is used.

V-B Sufficient Conditions

Theorem 4 provides us with a bound on the performance of the MLE. For reliable $\ell_2$-norm support recovery, the right hand side of (17) should go to zero as the signal dimension grows. To that end, as required by Theorem 4, one should make sure, first, that a quantity which is a property of the underlying measurement matrix is bounded away from one and, second, that the number of measurements is at least of the required order. Note that these conditions also imply that the MLE is asymptotically unbiased and, therefore, its performance is bounded by the HCR bound.

In the following, we study the above two conditions for random Gaussian measurement matrices, which will provide us with the sufficient number of measurements for reliable $\ell_2$-norm support recovery.

Theorem 5

Let the measurement matrix be drawn with i.i.d. elements from the standard Gaussian distribution. If the minimum coefficient value of the signal satisfies the required scaling condition for a constant, then the corresponding number of measurements suffices to ensure reliable $\ell_2$-norm support recovery.

Proof:

To this end, we need to find the scaling for which

(19)

where the index goes through all support vectors different from the true one (i.e., over all the remaining subspaces). We have,

Since the projection operator cancels out any vector which lives in the true subspace, we can write

where the remaining term involves the entries of the competing signal that do not belong to the true support. Now since

and the range of the orthogonal projector is of dimension , we get

(20)

Let denote the event . Then,

where the last step uses the union bound. In order to satisfy (19), we seek conditions under which the sum tends to zero. Each individual term in this sum can be written as

(21)

Since the relevant quantity is a central chi-square random variable (see (20)), we can apply the following large deviation bound for central chi-square distributions [27]

(22)

which is valid for all . Now, define

(23)

and assume . Hence, due to the fact that , we have

(24)

Therefore, by evaluating (22) for the choice in (23) and using (24), we have

Let us denote the number of indices in the competing support that are not present in the true support. Then

(25)

Let the symbols and be defined as

Then, and (25) implies

and therefore,

(26)

Combining (21) and (26) and taking summation over all possible error events, we get

As we mentioned earlier, the sum should tend to zero as the dimension grows. This will hold if

(27)

Without loss of generality, we assume that . Let us define

Applying (24), it is easy to show that

Therefore, using Stirling's approximation, (27) is satisfied asymptotically if

(28)

To find the maximum in (28), we consider the linear and sub-linear sparsity regimes separately.

  1. Linear sparsity regime:
    We have

    for some constants greater than zero. Since the dominant term determines the asymptotic behavior, we should have

  2. Sub-linear sparsity regime:
    In this regime we have

    and

    Therefore, the result of the linear regime covers the sub-linear regime.

Thus, we showed that this number of measurements is sufficient for perfect $\ell_2$-norm support recovery under the standard Gaussian measurement ensemble. \qed

Based on Theorem 5, the sufficient number of measurements under the different scalings is given by

The necessary and sufficient conditions in different regimes for the standard Gaussian measurement ensemble are shown in Table I.

Remark: The first row in Table I shows that one needs to take more measurements than the dimension of the signal in order to estimate the exact support set. This seems to be in contradiction with the concept of compressed sensing. One might think that this is an artifact of using this particular way of sampling. To show that this is not the case, let us assume that we have direct access to a noisy version of the input signal. This means that we use a square diagonal matrix instead of a Gaussian one to sample the signal. In order to make the two scenarios comparable, we should make sure that the signal powers after the measurement are equal. To this end, we need to put an appropriate gain on the main diagonal.

Now consider two signals which consist of nonzero entries of the same amplitude and differ in only one position. The probability of error of the MLE is given by

where $Q(\cdot)$ denotes the tail probability of a standard Gaussian random variable. In the regime considered in the first row of Table I, we obtain

(29)

Therefore, even if we use direct measurements, there is no hope of recovering the exact support in this regime. In [17], Wainwright showed that such a number of measurements is indeed sufficient.

TABLE I: Necessary and sufficient conditions on the number of measurements required for reliable $\ell_2$-norm support recovery under the standard Gaussian measurement ensemble (columns: necessary condition, sufficient condition, for the different scaling regimes).

VI Conclusions

We considered the problem of recovering the support of a sparse vector from a set of noisy linear measurements from an estimation theoretic point of view. We set the error metric between the true and the estimated support sets as the $\ell_2$-norm of their difference. Then, we investigated the fundamental performance limit of any unbiased estimator of the support set using the Hammersley-Chapman-Robbins bound, where no specific assumption was made on the measurement matrix. This general bound led us to necessary conditions on the number of measurements for successful support recovery, which we specifically evaluated for standard random Gaussian measurement ensembles. Then, we analyzed the performance of the maximum likelihood estimator and derived conditions under which it becomes unbiased and achieves the Hammersley-Chapman-Robbins bound. Applying these conditions provided us with the sufficient number of measurements for random Gaussian measurement ensembles.

Acknowledgment

The authors would like to thank Prof. T. Blu, Prof. M. J. Wainwright and Prof. R. Urbanke for their help and useful comments.

Appendix A

A-A Proof of Lemma 1

The MLE chooses the competing support over the true one if and only if

Let us assume that

(A.1)

For any , we have

where the marked step uses (A.1). This implies that, under the above assumption, the MLE will not choose the competing support over the true one. Since the probability that the MLE selects the competing support among all possible support vectors is less than the probability that it chooses the competing support over the true one, we get

A-B Proof of Lemma 2

From Lemma 1, we know that if the noise perturbation is small enough, the MLE makes the correct choice. Therefore,