An Estimation Theoretic Approach for Sparsity Pattern Recovery in the Noisy Setting
Compressed sensing deals with the reconstruction of sparse signals using a small number of linear measurements. One of the main challenges in compressed sensing is to find the support of a sparse signal. In the literature, several bounds on the scaling law of the number of measurements for successful support recovery have been derived where the main focus is on random Gaussian measurement matrices.
In this paper, we investigate the noisy support recovery problem from an estimation theoretic point of view, where no specific assumption is made on the underlying measurement matrix. The linear measurements are perturbed by additive white Gaussian noise. We define the output of a support estimator to be a set of position values in increasing order, and we set the error between the true and estimated supports to be the ℓ₂-norm of their difference. On the one hand, this choice allows us to use the machinery behind the ℓ₂-norm error metric; on the other hand, it converts support recovery into a more intuitive and geometrical problem. First, by using the Hammersley-Chapman-Robbins (HCR) bound, we derive a fundamental lower bound on the performance of any unbiased estimator of the support set. This lower bound provides us with necessary conditions on the number of measurements for reliable ℓ₂-norm support recovery, which we specifically evaluate for uniform Gaussian measurement matrices. Then, we analyze the maximum likelihood estimator and derive conditions under which the HCR bound is achievable. This leads us to the number of measurements for the optimum decoder that is sufficient for reliable ℓ₂-norm support recovery, and it shows that the performance of the optimum decoder has only a 9 dB gap compared to the HCR lower bound. Using this framework, we specifically evaluate sufficient conditions on the number of measurements for uniform Gaussian measurement matrices.
Linear sampling of sparse signals, with the number of measurements close to their sparsity level, has recently received a lot of attention under the names of compressed sensing (CS), compressive sampling or sparse sampling [2, 3, 4, 5]. A k-sparse signal is defined as a signal with k nonzero expansion coefficients in some orthonormal basis or frame. The goal of compressed sensing is to design measurement matrices, together with reconstruction algorithms, which allow robust recovery of sparse signals using the least number of measurements and low computational complexity; see for example [6, 7, 8, 9, 10, 11].
Support recovery refers to the problem of estimating the positions of the non-zero entries of the signal, based on a set of observations. In the noiseless setting, the optimal algorithm obtains the true support set with the fewest samples at the expense of high computational complexity, while more measurements are needed for the reconstruction algorithms based on linear programming. In the same context, it is shown that a small number of samples is sufficient for shift-invariant measurement matrices using recovery algorithms based on annihilating filters.
In practice, however, all the measurements are noisy due to physical restrictions, quantization precision, etc. A large body of recent work has established bounds on the number of measurements required for successful support set recovery in the noisy setting. Denoting the minimum non-zero coefficient of the sparse vector, the authors in [15, 16] derived the scaling law on the number of measurements as a function of this quantity for the ℓ₁-constrained quadratic programming, also referred to as Lasso, to recover the sparsity pattern. In the context of the optimal decoding algorithm, the results in [17, 18] provide necessary and sufficient conditions for perfect support recovery under the Gaussian measurement ensemble. Considering fractional support recovery, a related study provides a set of necessary and sufficient conditions on the required number of measurements as a function of the fraction of the support that can be reliably recovered.
In this paper, we look at the support recovery problem from an estimation theoretic point of view, where the error metric between the true and the estimated support is the ℓ₂-norm of their difference. In some applications, it is important that the recovered sparsity pattern be as close as possible to the true support set. In these cases, the ℓ₂-norm error metric is an appropriate option, since the assigned penalty is quadratically proportional to the distance. Moreover, this choice allows us to use the machinery behind the ℓ₂-norm error metric, which makes the theorems and proofs geometrical and more intuitive. While no specific assumption is made on the underlying measurement matrix, we assume that the linear measurements are perturbed by additive white Gaussian noise. Since the positions of the nonzero entries of the signal form a set of discrete values (e.g., integers between 1 and the signal dimension), the support recovery problem can be regarded as estimating restricted parameters. This leads us to use the Hammersley-Chapman-Robbins (HCR) bound, which provides a lower bound on the variance of any unbiased estimator of a set of restricted parameters [21, 22]. The HCR bound is a generalization of the Cramer-Rao (CR) bound and holds under much weaker regularity conditions, while giving substantially tighter bounds in general. Using the HCR bound, we derive in a straightforward manner the necessary conditions on the required number of measurements for the standard Gaussian ensemble.
Of equal interest are the conditions under which the HCR bound is achievable (tight). To this end, we study the performance of the maximum likelihood estimator (MLE) and derive conditions under which it becomes unbiased and achieves the HCR bound. In particular, this leads us to sufficient conditions on the number of measurements for reliable ℓ₂-norm support recovery using the standard Gaussian measurement ensemble. Note that when the error of the ℓ₂-norm support recovery vanishes, so does that of a regular support recovery problem with the 0-1 error metric. Therefore, the derived sufficient condition also applies to support recovery under the 0-1 error metric.
The organization of the paper is as follows. In Section III, we provide a more precise formulation of the problem. We derive the HCR bound for the support recovery problem in Section IV, followed by necessary conditions on the number of measurements for the standard Gaussian measurement ensemble. By studying the performance of the MLE in Section V, we derive conditions under which the HCR bound becomes achievable. Finally, under the standard Gaussian measurement ensemble, we identify the sufficient number of measurements for reliable ℓ₂-norm support recovery.
II Previous Work
The problem of sparsity recovery has received considerable attention in the literature in both the noiseless and noisy settings, see e.g., [7, 15, 17, 18, 19, 24, 25]. The results focus on the asymptotic scaling of the number of measurements for almost-sure success of the reconstruction of sparse inputs. In this section, we give an overview of the previous work which is more related to the results of this paper.
The work in  provides necessary and sufficient conditions on the number of measurements in the high-dimensional and noisy setting for reliable sparsity recovery using an optimal decoder. In that setup, the measurements are contaminated by i.i.d. Gaussian noise and the analysis is high dimensional, meaning that the sparsity level k, the signal dimension n and the number of measurements m tend to infinity simultaneously. Under the stated condition, the author derives the following sufficient condition for asymptotically reliable recovery with the optimal decoder
where is a fixed constant. Moreover, it is also shown in  that
is a necessary condition for some fixed constant. By simplifying the sufficient condition (1) in the sublinear sparsity regime, it is shown that the number of measurements required by the ℓ₁-constrained quadratic programming (Lasso) achieves the information-theoretic necessary bound.
In , the authors derive the necessary scaling
for the uniform i.i.d. Gaussian measurement ensemble, which holds at any finite problem size and for all algorithms. The scaling involves the minimum-to-average ratio of the input sparse signal. Moreover, they show that, for fixed problem parameters, the simple maximum correlation estimator (MCE) achieves the same scaling as in (2). The MCE selects the indices of the columns of the measurement matrix having the highest correlation with the measurement vector. More precisely, the results indicate that MCE needs
measurements to succeed with high probability. Therefore, the simple MCE also achieves the same scaling law as Lasso.
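As an illustration of the rule just described, the following sketch implements MCE in NumPy; the dimensions, amplitudes, noise level, and seed are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def mce_support(A, y, k):
    """Maximum correlation estimator: indices of the k columns of A
    with the highest absolute correlation with the observation y."""
    scores = np.abs(A.T @ y) / np.linalg.norm(A, axis=0)
    return np.sort(np.argsort(scores)[-k:])

# Toy instance (our own parameters): n = 20, k = 2, m = 200 Gaussian
# measurements, amplitudes well above the noise level.
rng = np.random.default_rng(0)
n, k, m = 20, 2, 200
A = rng.standard_normal((m, n))
x = np.zeros(n)
x[[3, 11]] = [3.0, -3.0]
y = A @ x + 0.1 * rng.standard_normal(m)
print(mce_support(A, y, k))
```

With this many measurements and well-separated amplitudes, the two correct indices dominate the correlation scores and the estimator returns the true support.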
In a more general setting, support recovery with some distortion measure has been considered in [9, 19, 26]. The results there show that if the signal-to-noise ratio (SNR) does not increase with the signal dimension, exact support recovery is not possible. Moreover, they show that partial support recovery is possible with a bounded SNR per sample, which indicates that a finite rate per sample is sufficient. In this regard, our work can be viewed as the support recovery problem with the ℓ₂-norm distortion measure. In the following, we explain our setup for the estimation theoretic approach to support recovery.
III Problem Statement
In this paper, we consider a deterministic signal model, in which the signal x is a fixed but unknown vector with exactly k non-zero entries. We refer to k as the signal sparsity, n as the signal dimension, and define the support vector s as the vector of positions of the non-zero elements of x. More precisely,
where we assume that the positions are listed in increasing order. The corresponding non-zero entries of x form a vector
Suppose we are given a vector y of noisy observations of the form
where A is the m × n measurement matrix and w is additive i.i.d. Gaussian noise. Throughout this paper, we assume that the noise variance is fixed, since any scaling of the noise can be accounted for in the scaling of x. Let A_s denote the matrix composed of the columns of A at positions indexed by s, and let V_s denote the column span of A_s. Since there are finitely many such subspaces of dimension k, a number from 1 to the total number of subspaces can be assigned to them and, w.l.o.g., we assume that the noiseless measurement Ax belongs to the first subspace. From now on, for simplicity, we refer to the first subspace as V₁. Moreover, we need to assume that any 2k columns of the measurement matrix are linearly independent. Under this assumption, there is a one-to-one correspondence between k-sparse vectors x and their images Ax.
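The measurement model above can be sketched numerically as follows; all dimensions, coefficient values, and the noise level are illustrative assumptions. The check at the end confirms that the noiseless measurement lies in the column span of the support columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m, sigma = 12, 3, 8, 0.1

support = np.array([2, 5, 9])          # support positions in increasing order
x = np.zeros(n)
x[support] = [1.0, -0.7, 0.5]          # the k non-zero coefficients

A = rng.standard_normal((m, n))        # i.i.d. Gaussian measurement matrix
w = sigma * rng.standard_normal(m)     # additive white Gaussian noise
y = A @ x + w                          # noisy linear observations

# Columns of A indexed by the support span a k-dimensional subspace,
# and the noiseless measurement A @ x belongs to it.
A_s = A[:, support]
P = A_s @ np.linalg.pinv(A_s)          # orthogonal projector onto span(A_s)
residual = np.linalg.norm(A @ x - P @ (A @ x))
print(residual)
```

The residual is zero up to floating-point error, reflecting the one-to-one correspondence between sparse vectors and their images when the support columns are linearly independent.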
Due to the presence of noise, the signal cannot be recovered exactly. However, a sparse-recovery algorithm outputs an estimate of it. In the support recovery problem, we are only interested in estimating the support. To that end, we can consider different performance metrics for the quality of the estimation. In , the measure of error between the estimate and the true signal is a 0-1 valued loss function,
where 𝟙{·} is the indicator function. This metric is appropriate for exact support recovery. In this work, we are interested in an approximate support recovery, where the goal is to recover a sparsity pattern as close as possible to the true support set. For this purpose, we consider the following ℓ₂-norm error metric
where, throughout this paper, ‖·‖ refers to the Euclidean norm.
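This error metric is simply the Euclidean distance between the two sorted position vectors; a minimal sketch:

```python
import numpy as np

def support_error(s_true, s_hat):
    """l2-norm distance between two support vectors, each given as the
    sorted positions of the k non-zero entries."""
    return np.linalg.norm(np.sort(s_true) - np.sort(s_hat))

print(support_error(np.array([2, 5, 9]), np.array([2, 5, 9])))   # exact match
print(support_error(np.array([2, 5, 9]), np.array([2, 6, 9])))   # one position off by one
```

An exact match gives zero error, and the penalty grows quadratically with how far each estimated position is from the true one.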
As is mentioned in , the signal-to-noise ratio alone is not a suitable quantity for the support recovery problem. It is possible to generate a set of problem instances for which support recovery becomes arbitrarily unreliable, in particular, by letting the smallest coefficient go to zero (assuming more than one non-zero entry) at an arbitrary rate, even though the SNR becomes arbitrarily large by increasing the rest. As he also observed, the magnitude of the smallest nonzero entry of the signal is prominent in the phrasing of results. Hence, we define
In particular, our results apply to any unbiased estimator that operates over the signal class
Our analysis is high dimensional in nature, in the sense that the signal dimension goes to infinity. More precisely, we say the ℓ₂-norm support recovery is reliable if
for any under some scaling of as a function of , where is the estimated support of . For unbiased estimators, (4) is equivalent to
and is the matrix trace operation. Since the support estimation is based on , with a slight abuse of notation, we also denote it by .
With this setup, our first goal is to find necessary conditions on the problem parameters which must be satisfied by any unbiased estimator for reliable ℓ₂-norm support recovery. The results are applicable to any measurement matrix, and we specifically evaluate them for standard Gaussian measurement matrices. Our second goal is to find sufficient conditions for successful support recovery using the optimum decoder. We show that, under appropriate conditions, the performance of the optimum decoder is close to the theoretical lower bound on the performance of unbiased support estimators. Again, as a special case, we evaluate the sufficient conditions for standard Gaussian measurement matrices.
IV Hammersley-Chapman-Robbins Bound
The Cramer-Rao (CR) bound is a well-known tool in statistics which provides a lower bound on the variance of the error of any unbiased estimator of an unknown deterministic parameter from a set of measurements . More specifically, in a single parameter scenario, the estimated value satisfies
where is the pdf of the measurements which depends on the parameter . As (5) suggests, the CR bound is derived for estimating a continuous parameter.
In many cases, there is a priori information on the estimated parameter which restricts it to take values from a predetermined set. An example is the estimation of the mean of a normal distribution when one knows that the true mean is an integer (see the example below). In such scenarios, the Hammersley-Chapman-Robbins (HCR) bound provides a stronger lower bound on the variance of any unbiased estimator [21, 22]. More precisely, let us assume that the set of observations are drawn according to a probability distribution with density function where is a parameter belonging to some parameter set (e.g., the set of integer numbers) and completely characterizes the pdf. In addition, the sequence is partitioned into two subsequences , where we are only interested in estimating the parameters included in the subsequence . Let denote an unbiased estimator of . Given the above definitions, we recall the following result.
The trace of the covariance matrix of any unbiased estimator of is bounded below by
in which . The set is chosen so that takes values according to the a priori information.
For clarity, let us consider the performance of any unbiased estimator of (only) the mean of a normal distribution based on independent samples of size , i.e., . In this case, , , and
Let denote an unbiased estimator of which is the parameter we want to estimate. When there is no prior information on , it follows from the CR bound that
Once the mean is restricted to be an integer, we may write and , where is a non-zero integer. Then, upon integration in (6) we get
where the maximum is attained for . A point worth mentioning is the role of the prior information. While (7) drops linearly, (9) decreases exponentially with respect to the number of observations. It is also interesting to note that (8) applies as well to the case in which the parameter is not restricted. We then have to deal with the maximization in (8) for variations in , where may take any value (not necessarily integer) except . Since the right hand side of (8) is a decreasing function of , we let and get (7).
IV-A Performance Lower Bound
In the support recovery problem, we know a priori that each entry of the support vector takes values from the restricted set . Hence, the HCR bound can provide us with a lower bound on the performance of any unbiased estimator of the support set.
Assume to be an unbiased estimator of the support . The HCR lower bound on the variance of is given by
in which denotes the projection of onto .
Since our observations are of the form , the set of unknown parameters consists of the support vector and the corresponding coefficients . We are only interested in estimating the support, hence, and . Then
where . Upon integration we get
Using the HCR bound (6), we derive
If and live in the same subspace, i.e., , the right hand side of (11) will be zero. Therefore, in order to find the supremum, we can restrict our attention to all the signals which do not live in the same subspace as does:
For each sequence , the numerator of (12) is fixed (it is the distance between the supports and does not depend on the coefficients) while the denominator is minimized by setting . This leads to (10). \qed
For any support vector , we have
In the following, we see how Theorem 2 helps us find a lower bound on the number of measurements for reliable ℓ₂-norm support recovery.
IV-B Necessary Conditions
Using the HCR bound, Theorem 2 provides a lower bound on the performance of any unbiased estimator for the ℓ₂-norm support recovery problem. In words, the ℓ₂-norm support recovery is unreliable if the right hand side of (10) is bounded away from zero, which yields a lower bound on the minimum number of measurements. However, finding the maximum in (10) requires a search through an exponential number of subspaces. Instead, as Corollary 1 suggests, any subspace different from the true one provides a lower bound. In the following, we show how this result leads to necessary conditions for random Gaussian measurement matrices.
Let the measurement matrix be drawn with i.i.d. elements from a standard Gaussian distribution. The ℓ₂-norm support recovery over the signal class is unreliable if
for any . In particular, let have the support with coefficients equal to those of in the first positions and . We show that if does not satisfy the condition of the theorem, then the RHS of (13) will be bounded away from zero for this specific , and therefore the estimation is unreliable.
Note that , and . This implies that
where has a chi-square distribution with degrees of freedom. It is known that a central chi-square random variable with degrees of freedom satisfies
for all . Assume that
for some constant , and evaluate (14) for . This leads to
Note that the RHS of (16) converges to zero, as grows. Therefore,
which shows that the RHS of (13) is bounded away from zero with high probability, and therefore, the estimation error does not vanish asymptotically.
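The chi-square concentration step in this proof can be sanity-checked by simulation. Since the exact constants of inequality (14) are not reproduced here, we check one standard form of the lower-tail bound (a consequence of the Laurent-Massart inequality), P(Z ≤ m(1 − t)) ≤ exp(−mt²/4) for Z chi-square with m degrees of freedom, as an assumption:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def lower_tail_mc(m, t, trials=200_000):
    """Monte Carlo estimate of P(Z <= m*(1 - t)) for Z ~ chi-square(m)."""
    z = rng.chisquare(m, size=trials)
    return float(np.mean(z <= m * (1.0 - t)))

def lower_tail_bound(m, t):
    # Assumed exponential bound on the lower tail, 0 < t < 1.
    return math.exp(-m * t * t / 4.0)

for m in (10, 50, 200):
    t = 0.5
    print(m, lower_tail_mc(m, t), lower_tail_bound(m, t))
```

The empirical tail probability falls well below the exponential bound, and both shrink rapidly as the number of degrees of freedom grows, which is the behavior the proof exploits.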
Table I shows the necessary conditions for different scalings of and as a function of .
Up to this point, we have discussed the HCR bound and its application in finding necessary conditions on the number of measurements for reliable ℓ₂-norm support recovery with Gaussian measurement matrices. In the following, we find conditions under which the HCR bound is achievable and, as a result, find the sufficient number of measurements for reliable ℓ₂-norm support recovery.
V Achievability of the HCR Bound
We now analyze the performance of the maximum likelihood estimator (MLE) for the ℓ₂-norm support recovery and find conditions under which it becomes unbiased and, in addition, its performance approaches that of the HCR bound. We then apply this result to derive a sufficient number of measurements for standard Gaussian measurement matrices.
V-A MLE Performance
Provided that any 2k columns of the measurement matrix are linearly independent, the noiseless measurement vector belongs to one and only one of the possible subspaces. Since the noise is i.i.d. Gaussian, the MLE selects the subspace closest to the observed vector. More precisely,
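For tiny instances, the closest-subspace rule can be implemented literally as an exhaustive search over all candidate supports, projecting the observation onto each candidate column span; the dimensions and values below are our own illustrative choices.

```python
import numpy as np
from itertools import combinations

def mle_support(A, y, k):
    """ML estimate: the support whose column span is closest to y."""
    best, best_res = None, np.inf
    for S in combinations(range(A.shape[1]), k):
        A_S = A[:, list(S)]
        coef, *_ = np.linalg.lstsq(A_S, y, rcond=None)
        res = np.linalg.norm(y - A_S @ coef)   # distance from y to span(A_S)
        if res < best_res:
            best, best_res = np.array(S), res
    return best

rng = np.random.default_rng(3)
n, k, m, sigma = 10, 2, 30, 0.05
A = rng.standard_normal((m, n))
x = np.zeros(n)
x[[1, 7]] = [1.0, -1.0]
y = A @ x + sigma * rng.standard_normal(m)
print(mle_support(A, y, k))
```

The search visits all candidate supports, which is why the MLE is the optimum decoder but computationally prohibitive beyond toy sizes.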
Now consider another subspace of dimension where . Clearly an error happens when MLE selects the support in place of the true support . Let denote the probability that MLE outputs the support vector instead of , among all possible support vectors.
Let , where , and be a support set different from . Then
See Appendix A-A. \qed
Let the minimum distance between and its projections onto other subspaces be
and the distinguishability factor be defined as
Let , where and . Moreover, assume that the number of measurements is an even integer, and . Then, the probability that MLE makes an error in choosing is upper bounded by
where and as grows.
See Appendix A-B. \qed
Based on Lemma 2, the probability of error of MLE is related to the minimum distance between and its projections onto the other subspaces. In the following theorem, we provide a bound on the performance of MLE.
Let and for some fixed . Then, MLE is asymptotically unbiased as , namely,
Moreover, its performance is bounded by
in which and as grows.
Let be the ML estimate for the true support set . Then
Since and , we have
where in , we used . Obviously, as . Hence . For the second part, we need to compute the asymptotic behavior of as . By definition
Now, as we can write
where in we used the fact that is bounded by and for we used Lemma 2. \qed
By Theorem 4, MLE is asymptotically unbiased and therefore, its estimation error is lower bounded by the HCR bound. Moreover, the MLE performance upper bound in (17) has only a 9 dB gap in the denominator compared to the HCR lower bound in (10). Therefore, such asymptotic behavior of MLE shows the achievability of the HCR bound, under the mentioned conditions.
As we observe, our results do not depend on any specific measurement matrix. In the following, we see how these results lead us to the sufficient number of measurements for reliable ℓ₂-norm support recovery when the Gaussian measurement ensemble is used.
V-B Sufficient Conditions
Theorem 4 provides us with a bound on the performance of the MLE. For reliable ℓ₂-norm support recovery, the right hand side of (17) should go to zero as the dimension grows. To that end, as required by Theorem 4, one should make sure, first, that the distinguishability factor is bounded away from one, which is a property of the underlying measurement matrix, and second, that the number of measurements is at least of the required order. Note that these conditions also imply that the MLE is asymptotically unbiased and, therefore, its performance is bounded by the HCR bound.
In the following, we study the above two conditions for random Gaussian measurement matrices, which will provide us with the sufficient number of measurements for reliable ℓ₂-norm support recovery.
Let the measurement matrix be drawn with i.i.d. elements from the standard Gaussian distribution. If the minimum coefficient value of the signal satisfies the stated condition for a constant, then measurements suffice to ensure reliable ℓ₂-norm support recovery.
To ensure that , we need to find the scaling for which
where and goes through all support vectors different from (i.e., from to . We have,
Since the projection operator cancels out any vector which lives in the subspace , we can write
where denotes the elements of which do not belong to . Now since
and the range of the orthogonal projector is of dimension , we get
Let denote the event . Then,
where in we used the union bound. In order to satisfy (19), we seek conditions under which the sum tends to zero. Each individual term in this sum can be written as
which is valid for all . Now, define
and assume . Hence, due to the fact that , we have
Let be the number of indices in not present in . Then
Let the symbols and be defined as
Then, and (25) implies
As we mentioned earlier, the sum should tend to zero as the dimension grows. This will hold if
Without loss of generality, we assume that . Let us define
Applying (24), it is easy to show that
Therefore using Stirling’s approximation, (27) is satisfied asymptotically if
To find the maximum in (28), we consider separately the linear and sub-linear regimes.
for some constants greater than zero. Since dominates the other terms asymptotically, we should have
In this regime we have
Therefore, the result of the linear regime covers the sub-linear regime.
Thus, we have shown that this number of measurements is sufficient for perfect ℓ₂-norm support recovery under the standard Gaussian measurement ensemble. \qed
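A quick simulation of the geometric fact underlying the proof: projecting an m-dimensional standard Gaussian vector onto the orthogonal complement of a k-dimensional column span leaves a squared norm distributed as chi-square with m − k degrees of freedom, so its empirical mean should be close to m − k. All parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
m, k, trials = 20, 3, 50_000

A_S = rng.standard_normal((m, k))
P = A_S @ np.linalg.pinv(A_S)        # projector onto span(A_S), rank k
P_perp = np.eye(m) - P               # projector onto the complement, rank m - k

g = rng.standard_normal((trials, m))
sq_norms = np.sum((g @ P_perp) ** 2, axis=1)
print(sq_norms.mean())               # empirical mean, close to m - k = 17
```

This is the reason the proof reduces the analysis of projected Gaussian noise to chi-square tail bounds with m − k degrees of freedom.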
Based on Theorem 5, the sufficient number of measurements under different scalings for is given by
The necessary and sufficient conditions in different regimes for the standard Gaussian measurement ensemble are shown in Table I.
Remark: The first row in Table I shows that one needs to take more measurements than the dimension of the signal in order to estimate the exact support set. This seems to be in contradiction with the concept of compressed sensing. One might think that this is an artifact of using this particular way of sampling. To show that this is not the case, let us assume that we have direct access to a noisy version of the input signal. This means that we use a square diagonal matrix instead of a Gaussian one to sample the signal. In order to make the two scenarios comparable, we should make sure that the signal powers after measurement are equal. To this end, we need to put an appropriate gain on the main diagonal.
Now consider two signals and which consist of nonzero entries with amplitudes and differ in only one position. The probability of error of MLE is given by
where Q(·) denotes the tail probability of a standard Gaussian random variable. In the regime considered in the first row of Table I, we obtain
Therefore, even if we use direct measurements, there is no hope of recovering the exact support in this regime. In , Wainwright showed that this number of measurements is indeed sufficient.
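The remark rests on the behavior of the Gaussian tail function Q: if two candidate signals keep a fixed Euclidean separation d while the noise level σ stays fixed, the pairwise ML error probability Q(d/(2σ)) is a constant bounded away from zero. A small sketch (σ = 1 is our own choice):

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

sigma = 1.0
for d in (0.5, 1.0, 2.0):
    # Probability of confusing two signals at Euclidean distance d.
    print(d, Q(d / (2.0 * sigma)))
```

Since Q is strictly positive for any finite argument, the pairwise error does not vanish unless the separation (or the number of measurements carrying it) grows.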
We considered the problem of recovering the support of a sparse vector from a set of noisy linear measurements, from an estimation theoretic point of view. We set the error metric between the true and the estimated support sets as the ℓ₂-norm of their difference. Then, we investigated the fundamental performance limit of any unbiased estimator of the support set using the Hammersley-Chapman-Robbins bound, where no specific assumption was made on the measurement matrix. This general bound led us to necessary conditions on the number of measurements for successful support recovery, which we specifically evaluated for standard random Gaussian measurement ensembles. Then, we analyzed the performance of the maximum likelihood estimator and derived conditions under which it becomes unbiased and achieves the Hammersley-Chapman-Robbins bound. Applying these conditions provided us with the sufficient number of measurements for random Gaussian measurement ensembles.
The authors would like to thank Prof. T. Blu, Prof. M. J. Wainwright and Prof. R. Ürbanke for their help and useful comments.
A-A Proof of Lemma 1
MLE chooses over if and only if
Let us assume that
For any , we have
where in we used (A.1). This implies that if , MLE will not choose over . Since the probability that MLE selects among all possible support vectors is less than the probability that MLE chooses over , we get
A-B Proof of Lemma 2
From Lemma 1 we know that if , MLE makes the correct choice. Therefore,