Fundamental limit of sample generalized eigenvalue based detection of signals in noise using relatively few signalbearing and noiseonly samples
Abstract
The detection problem in statistical signal processing can be succinctly formulated: Given (possibly) signal bearing, dimensional signalplusnoise snapshot vectors (samples) and statistically independent dimensional noiseonly snapshot vectors, can one reliably infer the presence of a signal? This problem arises in the context of applications as diverse as radar, sonar, wireless communications, bioinformatics, and machine learning and is the critical first step in the subsequent signal parameter estimation phase.
The signal detection problem can be naturally posed in terms of the sample generalized eigenvalues. The sample generalized eigenvalues correspond to the eigenvalues of the matrix formed by “whitening” the signalplusnoise sample covariance matrix with the noiseonly sample covariance matrix. In this article we prove a fundamental asymptotic limit of sample generalized eigenvalue based detection of signals in arbitrarily colored noise when there are relatively few signal bearing and noiseonly samples.
Specifically, we show why when the (eigen) signaltonoise ratio (SNR) is below a critical value, that is a simple function of , and , then reliable signal detection, in an asymptotic sense, is not possible. If, however, the eigenSNR is above this critical value then a simple, new random matrix theory based algorithm, which we present here, will reliably detect the signal even at SNR’s close to the critical value. Numerical simulations highlight the accuracy of our analytical prediction and permit us to extend our heuristic definition of the effective number of identifiable signals in colored noise. We discuss implications of our result for the detection of weak and/or closely spaced signals in sensor array processing, abrupt change detection in sensor networks, and clustering methodologies in machine learning.
signal detection, random matrices, sample covariance matrix, Wishart distribution, multivariate F distribution
EDICS Category: SSPDETC Detection; SAMSDET Source detection
I Introduction
The observation vector, in many signal processing applications, can be modelled as a superposition of a finite number of signals embedded in additive noise. The model order selection problem of inferring the number of signals present is the critical first step in the subsequent signal parameter estimation problem. We consider the class of estimators that determine the model order, \ie, the number of signals, in colored noise from the sample generalized eigenvalues of the signalplusnoise sample covariance matrix and the noiseonly sample covariance matrix pair. The sample generalized eigenvalues [1] precisely correspond to the eigenvalues of the matrix formed by “whitening” the signalplusnoise sample covariance matrix with the noiseonly sample covariance matrix (assuming that the number of noiseonly samples is greater than the dimensionality of the system so that the noiseonly sample covariance matrix is invertible).
Such estimators are used in settings where it is possible to find a portion of the data that contains only noise fields and does not contain any signal information. This is a realistic assumption for many practical applications such as evoked neuromagnetic experiments [2, 3, 4], geophysical experiments that employ a “thumper” or in underwater experiments with a wideband acoustic signal transducer where such a portion can be found in a data portion taken before a stimulus is applied. In applications such as radar or sonar where the signals of interest are narrowband and located in a known frequency band, snapshot vectors collected at a frequency just outside this band can be justified as having the same noise covariance characteristics assuming that we are in the stationaryprocesslongobservationtime (SPLOT) regime [5].
Our main objective in this paper is to shed new light on this age old problem of detecting signal in noise from finite samples using the sample eigenvalues alone [6, 7]. We bring into sharp focus a fundamental statistical limit that explains precisely when and why, in highdimensional, sample size limited settings underestimation of the model order is unavoidable. This is in contrast to works in the literature that use simulations, as in [8], to highlight the chronically reported symptom of model order estimators underestimating the number of signals without providing insight into whether a fundamental limit of detection is being encountered.
In recent work [9], we examined this problem in the white noise scenario. The main contribution of this paper is the extension of the underlying idea to the arbitrary (or colored) noise scenario. Analogous to the definition in [9], we define the effective number of identifiable signals in colored noise as the number of the generalized eigenvalues of the population (true) signalplusnoise covariance matrix and noiseonly covariance matrix pair that are greater than a (deterministic) threshold that is a simple function of the number of signalplusnoise samples, noiseonly samples and the dimensionality of the system. Analogous to the white noise case, increasing the dimensionality of the system, by say adding more sensors, raises the detectability threshold so that the effective number of identifiable signals might actually decrease.
An additional contribution of this paper is the development of a simple, new, algorithm for estimating the number of signals based on the recent work of Johnstone [10]. Numerical results are used to illustrate the performance of the estimator around the detectability threshold alluded to earlier. Specifically, we observe that if the eigenSNR of a signal is above a critical value then reliable detection using the new algorithm is possible. Conversely, if the eigenSNR is below the critical value then the algorithm, correctly for the reason described earlier, is unable to distinguish the signal from noise.
The paper is organized as follows. We formulate the problem in Section II and state the main result in Section III. The effective number of signals is defined in Section IIIA along with a discussion on its implications for applications such as array processing, sensor networks and machine learning. A new algorithm for detecting the number of signals is presented in Section IV. Concluding remarks are offered in Section V. The mathematical proofs of the main result are provided in Section VI.
Ii Problem formulation
We observe samples (“snapshots”) of possibly signal bearing dimensional snapshot vectors where for each , the snapshot vector has a (real or complex) multivariate normal distribution, \ie, and the ’s are mutually independent. The snapshot vectors are modelled as
(1) 
where , denotes an dimensional (real or complex) Gaussian noise vector where the noise covariance may be known or unknown, denotes a dimensional (real or complex) Gaussian signal vector with covariance , and is a unknown nonrandom matrix. Since the signal and noise vectors are independent of each other, the covariance matrix of can hence be decomposed as
(2) 
where
(3) 
with denoting the complex conjugate or real transpose. Assuming that the matrix is of full column rank, \ie, the columns of are linearly independent, and that the covariance matrix of the signals is nonsingular, it follows that the rank of is . Equivalently, the smallest eigenvalues of are equal to zero.
If the noise covariance matrix were known apriori and was nonsingular, a “noise whitening” transformation may be applied to the snapshot vector to obtain the vector
(4) 
which will also be normally distributed with covariance
(5) 
Denote the eigenvalues of by . Recalling the formulation of the generalized eigenvalue problem [1][Section 8.7], we note that the eigenvalues of are exactly the generalized eigenvalues of the regular matrix pair . Then, assuming that the rank of is also , it follows that the smallest eigenvalues of or, equivalently, the generalized eigenvalues of the matrix pair ), are all equal to so that
(6) 
while the remaining eigenvalues of will be strictly greater than one.
Thus, if the true signalplusnoise covariance matrix and the noiseonly covariance matrix were known apriori, the number of signals could be trivially determined from the multiplicity of the eigenvalues of equalling one.
The problem in practice is that the signalplusnoise and the noise covariance matrices are unknown so that such a straightforward algorithm cannot be used. Instead we have an estimate the signalpluscovariance matrix obtained as
(7) 
and an estimate of the noiseonly sample covariance matrix obtained as
(8) 
where for are (possibly) signalbearing snapshots and for are independent noiseonly snapshots. We assume here that the number of noiseonly snapshots exceeds the dimensionality of the system, \ie, , so that the noiseonly sample covariance matrix , which has the Wishart distribution [11], is nonsingular and hence invertible with probability 1 [12, Chapter 3, pp. 97],[13, Chapter 7.7, pp. 272276]. Following (5), we then form the matrix
(9) 
and compute its eigendecomposition to obtain the eigenvalues of , which we denote by . We note, once again, that the eigenvalues of are simply the generalized eigenvalues of the regular matrix pair . Note that whenever , the signalplusnoise sample covariance matrix will be singular so that the generalized eigenvalues will equal zero, \ie, . Figure 1 illustrates why the blurring of the sample eigenvalues relative to the population eigenvalues makes the problem more challenging.
In this paper, we are interested in the class of algorithms that infer the number of signals buried in arbitrary noise from the eigenvalues of or alone. Such algorithms are widely used in practice and arise naturally from classical multivariate statistical theory [10] where the matrix is referred to as the multivariate F matrix [12, 14]. The information theoretical approach to model order estimation, first introduced by Wax and Kailath [6], was extended to the colored noise setting by Zhao et al in [15] who prove consistency of their estimator in the large sample size regime; their analysis does not yield any insight into the finite sample setting.
Consequently, research has focussed on developing sophisticated techniques for improving performance of eigenvalue based methods in the finite sample setting. Zhu et al [16] improve the performance of their eigenvalue estimator by assuming a model for the noise covariance matrix. Stoica and Cedervall [17] improve the performance of their estimator in two reasonable settings: one, where it is reasonable to assume that the noise covariance matrix is block diagonal or banded and two, where the temporal correlation of the noise has a shorter length than the signals. Other techniques in the literature exploit other characteristics of the signal or noise to effectively reduce the dimensionality of the signal subspace and improve model order estimation given finite samples. See for example [18, 19] and the references in [9].
Informally speaking, it is evident that performance of such model order estimation algorithms is coupled to the “quality” of the estimated signalplusnoise and noiseonly covariance matrices which in turn are dependent on the number of snapshots used to estimate them, respectively. Researchers applying these techniques have noted the absence of a mathematically rigorous, general purpose formula in the literature for predicting the minimum number of samples needed to obtain “good enough” detection accuracy (see, for example [3][pp. 846]. A larger, more fundamental question that has remained unanswered, till now, is whether there is a statistical limit being encountered.
We tackle this problem head on in this paper by employing sophisticated techniques from random matrix theory in [20]. We show that in an asymptotic sense, to be made precise later, that only the “signal” eigenvalues of that are above a deterministic threshold can be reliably distinguished from the “noise” eigenvalues. The threshold is a simple, deterministic function of the the dimensionality of the system, the number of noiseonly and signalplusnoise snapshots, and the noise and signalplus noise covariance, and described explicitly next. Note the applicability of the results to the situation when the signalplusnoise covariance matrix is singular.
Iii Main result
For a Hermitian matrix with real eigenvalues (counted with multiplicity), the empirical distribution function (e.d.f.) is defined as
(10) 
Of particular interest is the convergence of the e.d.f. of in the signalfree case, which is described next.
Let denote the matrix in (9) formed from (complex Gaussian) noiseonly snapshots and independent noiseonly (complex Gaussian) snapshots. Then the e.d.f. almost surely for every , as , and and where
(11) 
where
(12) 
when and zero otherwise, and is the Dirac delta function. {proof} This result was proved in [14]. When we recover the famous MarčenkoPastur density [21].
The following result exposes when the “signal” eigenvalues are asymptotically distinguishable from the “noise” eigenvalues.
Let denote the matrix in (9) formed from (real or complex Gaussian) signalplusnoise snapshots and independent (real or complex Gaussian) noiseonly snapshots. Denote the eigenvalues of by . Let denote the th largest eigenvalue of . Then as , and and we have
for and the convergence is almost surely and the threshold is given by
(13) 
where
(14) 
and . {proof} The result follows from Theorem VIE. The threshold is obtained by solving the inequality
where for , , from [22, 23, 24, 9], is given by
and is given by (30).
Note that when , so that we recover the results of Baik and Silverstein [23].
Iiia Effective number of identifiable signals
Theorem III brings into sharp focus the reason why, in the largesystemrelativelylargesamplesize limit, model order underestimation is sometimes unavoidable. This motivates our heuristic definition of the effective number of identifiable signals below:
(15) 
If we denote the eigenvalues of by then we define the eigenSNR of the th signal as then (15) essentially states that signals with eigenSNR’s smaller than will be asymptotically undetectable.
Figure 2 shows the eigenSNR threshold needed for reliable detection for different values as a function of for different values of . Such an analytical prediction was not possible before the results presented in this paper. Note the fundamental limit of detection in the situation when the noiseonly covariance matrix is known apriori (solid line) and increase in the threshold eigenSNR needed as the number of snapshots available to estimate the noiseonly covariance matrix decreases.
IiiB Implications for array processing
Suppose there are two uncorrelated (hence, independent) signals so that . In (1) let . In a sensor array processing application, we think of and as encoding the array manifold vectors for a source and an interferer with powers and , located at and , respectively. The signalplusnoise covariance matrix is given by
(16) 
where is the noiseonly covariance matrix. The matrix defined in (5) can be decomposed as
so we that we can readily note that has the smallest eigenvalues and the two largest eigenvalues
(17a)  
(17b) 
respectively, where and . Applying the result in Theorem III allows us to express the effective number of signals as
(18) 
Equation (18) captures the tradeoff between the identifiability of two closely spaced signals, the dimensionality of the system, the number of available snapshots and the cosine of the angle between the vectors and . Note that since the effective number of signals depends on the structure of the theoretical signal and noise covariance matrices (via the eigenvalues of ), different assumed noise covariance structures (AR(1) versus white noise, for example) will impact the signal level SNR needed for reliable detection in different ways.
IiiC Other applications
There is interest in detecting abrupt change in a system based on stochastic observations of the system using a network of sensors. When the observations made at various sensors can be modeled as GaussMarkov random field (GMRF), as in [25, 26], then the conditional independence property of GMRF’s [27] is a useful assumption. The assumption states that conditioned on a particular hypothesis, the observations at sensors are independent. This assumption results in the precision matrix, \ie, the inverse of the covariance matrix, having a sparse structure with many entries identically equal to zero.
Our results might be used to provide insight into the types of systemic changes, reflected in the structure of the signalplusnoise covariance matrix, that are undetectable using sample generalized eigenvalue based estimators. Specifically, the fact that the inverse of the noiseonly covariance matrix will have a sparse structure means that one can experiment with different (assumed) conditional independence structures and determine how “abrupt” the system change would have to be in order to be reliably detected using finite samples.
Spectral methods are popular in machine learning applications such as unsupervised learning, image segmentation, and information retrieval [28]. Generalized eigenvalue based techniques for clustering have been investigated in [29, 30]. Our results might provide insight when spectral clustering algorithms are likely to fail. In particular, we note that the results of Theorem III hold even in situations where the data is not Gaussian (see Theorem VIE) as is commonly assumed in machine learning applications.
Iv An algorithm for reliable detection of signals in noise
In [10], Johnstone proves that in the signalfree case, the distribution of the largest eigenvalue of , on appropriate centering and scaling, can be approximated to order by the TracyWidom law [31, 32, 33]. In the setting where there are signals present, we expect, after appropriate centering and scaling, the distribution of the signal eigenvalues of above the detectability threshold will obey a Gaussian law whereas those below the detectability threshold will obey the TracyWidom law as in the signalfree case. An analogous results for the signal bearing eigenvalues of was proved by Baik et al [22] and El Karoui [34]. Numerical investigations for (see Figure 3) corroborate the accuracy of our asymptotic predictions and form the basis of Algorithm 1 presented below for estimating the number of signals at (asymptotic) significance level . Theoretical support for this observation remains incomplete.
Algorithm 1 
Input: Eigenvalues for of 
1. Initialization: Set significance level 
2. Compute from Table II 
3. Set k = 0 
4. Compute and from Table I(a) 
5. Is ? 
6. If yes, then go to step 9 
7. Otherwise, increment . 
8. If , go to step 3. Else go to step 9. 
9. Return 
Figure 4 illustrates the accuracy of the predicted statistical limit and the ability of the proposed algorithm to reliably detect the presence of the signal at this limit.
0.990000  0.010000  3.89543267306429  3.72444594640057 
0.950000  0.050000  3.18037997693774  3.19416673215810 
0.900000  0.100000  2.78242790569530  2.90135093847591 
0.700000  0.300000  1.91037974619926  2.26618203984916 
0.500000  0.500000  1.26857461658107  1.80491240893658 
0.300000  0.700000  0.59228719101613  1.32485955606020 
0.100000  0.900000  0.45014328905825  0.59685129711735 
0.050000  0.950000  0.97931605346955  0.23247446976400 
0.010000  0.990000  2.02344928138015  0.47763604739084 
0.001000  0.999000  3.27219605900193  1.31441948008634 
0.000100  0.999900  4.35942034391365  2.03469175457082 
0.000010  0.999990  5.34429594047426  2.68220732168978 
0.000001  0.999999  6.25635442969338  3.27858828203370 
V Conclusion
Figure 4 captures the fundamental statistical limit encountered when attempting to discriminate signal from noise using finite samples. Simply put, a signal whose eigenSNR is below the detectability threshold cannot be reliably detected while a signal above the threshold can be. In settings such as wireless communications and biomedical signal processing where the signal power is controllable, our results provide a prescription for how strong it needs to be so that it can be detected. If the signal level is barely above the threshold, simply adding more sensors might actually degrade the performance because of the increased dimensionality of the system. If, however, either due to clever signal design or physics based modeling, we are able to reduce (or identify) the dimensionality of the subspace spanned by signal, then according to Figure 4 the detectability threshold will also be lowered. With VLSI advances making sensors easier and cheaper to deploy, our results demonstrate exactly why the resulting gains in systemic performance will more than offset the effort we will have to invest in developing increasingly more sophisticated dimensionality reduction techniques. Understanding the fundamental statistical limits of techniques for signal detection in the setting where the noiseonly sample covariance matrix is singular remains an important open problem.
Acknowledgements
Raj Rao was supported by an Office of Naval Research PostDoctoral Fellowship Award under grant N000140710269. Jack Silverstein was supported by the U.S. Army Research Office under Grant W911NF0510244. R. R. thanks Arthur Baggeroer for encouragement and invaluable feedback. This material was based upon work supported by the National Science Foundation under Agreement No. DMS0112069. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
We thank Alan Edelman for his encouragement and support, Interactive Supercomputing, Inc. for providing access to the StarP parallel computing software and Sudarshan Raghunathan of Interactive Supercomputing, Inc. for his patience and support in answering our multitude of StarP programming queries. We remain grateful to Al Davis and Chris Hill of MIT for granting us access to the Darwin Project computing cluster. Thanks to their involvement we were able to program, debug and complete the computation needed to produce Figure 4 in 4 days! Without their gracious help, the computation would have taken 3 months on the latest single processor laptop. We thank Folkmar Bornemann for providing the \matlabcode for computing the percentiles in Table II.
Vi Appendix
Via Mathematical preliminaries
Let for , be a collection of complex valued i.i.d. random variables with and . For positive integers and let , , . Assume for each is an Hermitian nonnegative definite matrix. The matrix
where is any Hermitian square root of , can be viewed as a sample covariance matrix, formed from samples of the random vector with denoting the first column of , which has for its population covariance matrix. When and are both large and on the same order of magnitude, will not be near , due to an insufficient number of samples required for such a large dimensional random vector. However, there exist results on the eigenvalues of . They are limit theorems as with and , which provide information on the eigenvalues of . One result [36] is on the empirical distribution function (e.d.f.), , of the eigenvalues of , which throughout the paper, is defined for any Hermitian matrix as
The limit theorem is expressed in terms of the Stieltjes transform of the limiting e.d.f. of the ’s, where for any distribution function (d.f.) its Stieltjes transform, , is defined to be
There exists a onetoone correspondence between the distribution functions (d.f.’s) and their Stieltjes transforms, due to the inversion formula
for continuity points of .
The limit theorem allows the to be random, only assuming as , the convergence of to a nonrandom proper probability distribution function , \ie, . The theorem states that with probability one, as , , where is nonrandom, with Stieltjes transform , satisfying the equation
(19) 
which is unique in the set .
It is more convenient to work with the eigenvalues of the matrix , whose eigenvalues differ from those of by zero eigenvalues. Indeed, with denoting the indicator function on the set we have the exact relationship
almost surely, implying
(20) 
Upon substituting into (19) we find that for solves the equation
(21) 
and is unique in . Thus we have an explicit inverse for .
Qualitative properties of have been obtained in [37], most notably the fact that on has a continuous derivative. The paper [37] also shows how intervals outside the support of can be determined from the graph of (21) for .
Let denote the support of the d.f. , its complement, and define to be (21) with . Intuitively, on is well defined and increasing. Therefore it is invertible on each interval in , its inverse, namely , is also increasing. The details are stated in the following.
[Theorems 4.1, 4.2 of [37]] If , then satisfies (1) , (2) , and (3) . Conversely, if satisfies (1)–(3), then .
In simple terms is comprised of the range of values where is increasing.
Another result which will be needed later is the following.
[Theorem 4.3 of [37]] Suppose each contained in the interval satisfies (1) and (2) of Lemma VIA, and for . Then for all .
Limiting eigenvalue mass at zero is also derived in [37]. It is shown that
(22) 
ViB Support of eigenvalues
Since the convergence in distribution of only addresses how proportions of eigenvalues behave, understanding the possible appearance or nonappearance of eigenvalues in requires further work.
The question of the behavior of the largest and smallest eigenvalues when has been answered by Yin, Bai, and Krishnaiah in [38], and Bai and Yin in [39], respectively, under the additional assumption : the largest eigenvalue and largest eigenvalue of converge a.s. to and respectively, matching the support, of on . More on when will be given later.
For general , restricted to being bounded in spectral norm, the nonappearance of eigenvalues in has been proven by Bai and Silverstein in [40]. Moreover, the separation of eigenvalues across intervals in , mirrors exactly the separation of eigenvalues over corresponding intervals in [41]. The results are summarized below.
Assume additionally and the are nonrandom and are bounded in spectral norm for all .
Let denote the “limiting” e.d.f. associated with , in other words, is the d.f. having Stieltjes transform with inverse (21), where are replace by .
Assume the following condition:

(*) Interval with lies in an open interval outside the support of for all large .
Then .
For Hermitian nonnegative definite matrix , let denote the largest eigenvalue of . For notational convenience, define and .
(i) If , then , the smallest value in the support of , is positive, and with probability 1, as .
(ii) If , or but is not contained in , then , and for all large there is an index for which
(23) 
The behavior of the extreme eigenvalues of leads to the following corollary of Theorem VIB.
If converges to the largest number in the support of , then converges a.s to the largest number in the support of . If converges to the smallest number in the support of , then () implies () converges a.s. to the smallest number in the support of ().
In Theorem VIB, Case (i) applies when , whereby the rank of would be at most , the conclusion asserting, that with probability 1, for all large, the rank is equal to . From Lemma VIA, Case (ii) of Theorem VIB covers all intervals in on resulting from intervals on where is increasing. For all large is increasing on , which, from inspecting the vertical asymptotes of and Lemma VIA, must be due to the existence of , satisfying (23).
Theorem VIB easily extends to random , independent of with the aid of Tonelli’s Theorem [42, pp. 234], provided the condition (*) on is strengthened to:

(**) With probability 1 for all large (nonrandom) lies in an open interval outside the support of .
Indeed, let denote the probability space generating , the probability space generating . Let their respective measures be denoted by ,, the product measure on by . Consider, for example in case (ii), we define
Let be an element of the event defined in (**). Then by Theorem VIB for all contained in a subset of having probability 1. Therefore, by Tonelli’s theorem
Consider now case (ii) of Theorem VIB in terms of the corresponding interval outside the support of and the ’s. By Lemma VIA and condition (*), we have the existence of an such that , and for all large
(24) 
Let , . Then by Lemma VIA we have the existence of an for which and for all large. Moreover, by (24) we have for all large
(25) 
Necessarily, and .
Notice the steps can be completely reversed, that is, beginning with an interval , with , lying in an open interval in for all large and satisfying (25) for some , will yield , with , , satisfying condition (*). Case (ii) applies, since is within the range of for . If , then we would have .
ViC Behavior of spiked eigenvalues
Suppose now the ’s are altered, where a finite number of eigenvalues are interspersed between the previously adjacent eigenvalues and . It is clear that the limiting will remain unchanged. However, the graph of on will now contain vertical asymptotes. If the graph remains increasing on two intervals for all large, each one between successive asymptotes, then because of Theorem VIB, with probability one, eigenvalues of the new will appear in for all large.
Theorem VIC below shows this will happen when a “sprinkled”, or “spiked” eigenvalue lies in . Theorem VIC provides a converse, in the sense that any isolated eigenvalue of must be due to a spiked eigenvalue, the absence of which corresponds to case (ii) of Theorem VIB.
Theorem VIC, below, allows the number of spiked eigenvalues to grow with , provided it remains .
Assume in additon to the assumptions in Theorem VIB on the and :
(a) There are positive eigenvalues of all converging uniformly to , a positive number. Denote by the e..d.f. of the other eigenvalues of . (b) There exists positive contained in an interval with which is outside the support of for all large , such that for these
for . (c) .
Suppose are the eigenvalues stated in (a). Then, with probability one
(26) 
For , we have
By considering continuity points of in we see that is constant on this interval, and consequently, this interval is also contained in .
By Lemma VIA we therefore have for all . Thus we can find and , such that and for all large for all .
It follows that for any positive sufficiently small, there exist positive with , such that, for all large, both , and