Fundamental limit of sample generalized eigenvalue based detection of signals in noise using relatively few signal-bearing and noise-only samples

Abstract

The detection problem in statistical signal processing can be succinctly formulated: Given (possibly) signal bearing, -dimensional signal-plus-noise snapshot vectors (samples) and statistically independent -dimensional noise-only snapshot vectors, can one reliably infer the presence of a signal? This problem arises in the context of applications as diverse as radar, sonar, wireless communications, bioinformatics, and machine learning and is the critical first step in the subsequent signal parameter estimation phase.

The signal detection problem can be naturally posed in terms of the sample generalized eigenvalues. The sample generalized eigenvalues correspond to the eigenvalues of the matrix formed by “whitening” the signal-plus-noise sample covariance matrix with the noise-only sample covariance matrix. In this article we prove a fundamental asymptotic limit of sample generalized eigenvalue based detection of signals in arbitrarily colored noise when there are relatively few signal bearing and noise-only samples.

Specifically, we show why when the (eigen) signal-to-noise ratio (SNR) is below a critical value, that is a simple function of , and , then reliable signal detection, in an asymptotic sense, is not possible. If, however, the eigen-SNR is above this critical value then a simple, new random matrix theory based algorithm, which we present here, will reliably detect the signal even at SNRs close to the critical value. Numerical simulations highlight the accuracy of our analytical prediction and permit us to extend our heuristic definition of the effective number of identifiable signals in colored noise. We discuss implications of our result for the detection of weak and/or closely spaced signals in sensor array processing, abrupt change detection in sensor networks, and clustering methodologies in machine learning.

Keywords

signal detection, random matrices, sample covariance matrix, Wishart distribution, multivariate F distribution

EDICS Category: SSP-DETC Detection; SAM-SDET Source detection

I Introduction

The observation vector, in many signal processing applications, can be modelled as a superposition of a finite number of signals embedded in additive noise. The model order selection problem of inferring the number of signals present is the critical first step in the subsequent signal parameter estimation problem. We consider the class of estimators that determine the model order, i.e., the number of signals, in colored noise from the sample generalized eigenvalues of the signal-plus-noise sample covariance matrix and the noise-only sample covariance matrix pair. The sample generalized eigenvalues [1] precisely correspond to the eigenvalues of the matrix formed by “whitening” the signal-plus-noise sample covariance matrix with the noise-only sample covariance matrix (assuming that the number of noise-only samples is greater than the dimensionality of the system so that the noise-only sample covariance matrix is invertible).

Such estimators are used in settings where it is possible to find a portion of the data that contains only noise fields and does not contain any signal information. This is a realistic assumption for many practical applications such as evoked neuromagnetic experiments [2, 3, 4], geophysical experiments that employ a “thumper” or in underwater experiments with a wideband acoustic signal transducer where such a portion can be found in a data portion taken before a stimulus is applied. In applications such as radar or sonar where the signals of interest are narrowband and located in a known frequency band, snapshot vectors collected at a frequency just outside this band can be justified as having the same noise covariance characteristics assuming that we are in the stationary-process-long-observation-time (SPLOT) regime [5].

Our main objective in this paper is to shed new light on this age-old problem of detecting signals in noise from finite samples using the sample eigenvalues alone [6, 7]. We bring into sharp focus a fundamental statistical limit that explains precisely when and why, in high-dimensional, sample-size-limited settings, underestimation of the model order is unavoidable. This is in contrast to works in the literature that use simulations, as in [8], to highlight the chronically reported symptom of model order estimators underestimating the number of signals without providing insight into whether a fundamental limit of detection is being encountered.

In recent work [9], we examined this problem in the white noise scenario. The main contribution of this paper is the extension of the underlying idea to the arbitrary (or colored) noise scenario. Analogous to the definition in [9], we define the effective number of identifiable signals in colored noise as the number of the generalized eigenvalues of the population (true) signal-plus-noise covariance matrix and noise-only covariance matrix pair that are greater than a (deterministic) threshold that is a simple function of the number of signal-plus-noise samples, noise-only samples and the dimensionality of the system. Analogous to the white noise case, increasing the dimensionality of the system, by say adding more sensors, raises the detectability threshold so that the effective number of identifiable signals might actually decrease.

An additional contribution of this paper is the development of a simple, new algorithm for estimating the number of signals based on the recent work of Johnstone [10]. Numerical results are used to illustrate the performance of the estimator around the detectability threshold alluded to earlier. Specifically, we observe that if the eigen-SNR of a signal is above a critical value then reliable detection using the new algorithm is possible. Conversely, if the eigen-SNR is below the critical value then the algorithm, correctly for the reason described earlier, is unable to distinguish the signal from noise.

The paper is organized as follows. We formulate the problem in Section II and state the main result in Section III. The effective number of signals is defined in Section III-A along with a discussion on its implications for applications such as array processing, sensor networks and machine learning. A new algorithm for detecting the number of signals is presented in Section IV. Concluding remarks are offered in Section V. The mathematical proofs of the main result are provided in Section VI.

II Problem formulation

We observe samples (“snapshots”) of possibly signal-bearing -dimensional snapshot vectors where for each , the snapshot vector has a (real or complex) multivariate normal distribution, i.e., and the ’s are mutually independent. The snapshot vectors are modelled as

(1)

where , denotes an -dimensional (real or complex) Gaussian noise vector where the noise covariance may be known or unknown, denotes a -dimensional (real or complex) Gaussian signal vector with covariance , and is an unknown non-random matrix. Since the signal and noise vectors are independent of each other, the covariance matrix of can hence be decomposed as

(2)

where

(3)

with denoting the complex conjugate or real transpose. Assuming that the matrix is of full column rank, i.e., the columns of are linearly independent, and that the covariance matrix of the signals is nonsingular, it follows that the rank of is . Equivalently, the smallest eigenvalues of are equal to zero.
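As a concrete illustration, one common way of writing such a model is sketched below; the symbols x_i, A, s_i, z_i, R_s, R_z and the dimensions are labels introduced here for illustration and need not match the notation used elsewhere in the paper.

\[
\mathbf{x}_i = \mathbf{A}\,\mathbf{s}_i + \mathbf{z}_i, \qquad i = 1, \ldots, n,
\]
\[
\mathbf{R} = \mathbb{E}\!\left[\mathbf{x}_i \mathbf{x}_i^{H}\right] = \mathbf{A}\,\mathbf{R}_s\,\mathbf{A}^{H} + \mathbf{R}_z,
\]

where R_s is the (nonsingular) signal covariance matrix and R_z is the noise covariance matrix. If A has full column rank k, then A R_s A^H has rank k, so all but k of its eigenvalues are zero.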

If the noise covariance matrix were known a priori and were non-singular, a “noise whitening” transformation may be applied to the snapshot vector to obtain the vector

(4)

which will also be normally distributed with covariance

(5)

Denote the eigenvalues of by . Recalling the formulation of the generalized eigenvalue problem [1, Section 8.7], we note that the eigenvalues of are exactly the generalized eigenvalues of the regular matrix pair . Then, assuming that the rank of is also , it follows that the smallest eigenvalues of or, equivalently, the generalized eigenvalues of the matrix pair , are all equal to one, so that

(6)

while the remaining eigenvalues of will be strictly greater than one.

Thus, if the true signal-plus-noise covariance matrix and the noise-only covariance matrix were known a priori, the number of signals could be trivially determined from the multiplicity of the eigenvalues of equalling one.

The problem in practice is that the signal-plus-noise and the noise covariance matrices are unknown so that such a straightforward algorithm cannot be used. Instead, we have an estimate of the signal-plus-noise sample covariance matrix obtained as

(7)

and an estimate of the noise-only sample covariance matrix obtained as

(8)

where for are (possibly) signal-bearing snapshots and for are independent noise-only snapshots. We assume here that the number of noise-only snapshots exceeds the dimensionality of the system, i.e., , so that the noise-only sample covariance matrix , which has the Wishart distribution [11], is non-singular and hence invertible with probability 1 [12, Chapter 3, pp. 97], [13, Chapter 7.7, pp. 272-276]. Following (5), we then form the matrix

(9)

and compute its eigen-decomposition to obtain the eigenvalues of , which we denote by . We note, once again, that the eigenvalues of are simply the generalized eigenvalues of the regular matrix pair . Note that whenever , the signal-plus-noise sample covariance matrix will be singular so that the generalized eigenvalues will equal zero, i.e., . Figure 1 illustrates why the blurring of the sample eigenvalues relative to the population eigenvalues makes the problem more challenging.
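As a minimal numerical sketch of this construction (in Python with NumPy/SciPy), consider the following; the array names, example dimensions, and the division by the number of snapshots in the sample covariances are illustrative assumptions rather than the paper's exact definitions.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
p, n, m = 10, 40, 80               # dimension, signal-plus-noise snapshots, noise-only snapshots (m > p)

X = rng.standard_normal((p, n))    # columns: (possibly) signal-bearing snapshots
Z = rng.standard_normal((p, m))    # columns: independent noise-only snapshots

R_hat = (X @ X.conj().T) / n       # signal-plus-noise sample covariance matrix, cf. (7)
Rz_hat = (Z @ Z.conj().T) / m      # noise-only sample covariance matrix, cf. (8); invertible w.p. 1 since m > p

# Sample generalized eigenvalues of the pair (R_hat, Rz_hat): these are the
# eigenvalues of the "whitened" matrix inv(Rz_hat) @ R_hat, cf. (9).
gen_eigs = eigh(R_hat, Rz_hat, eigvals_only=True)[::-1]   # sorted in decreasing order
print(gen_eigs)

A detection algorithm of the class considered here would then operate on gen_eigs alone.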

In this paper, we are interested in the class of algorithms that infer the number of signals buried in arbitrary noise from the eigenvalues of or alone. Such algorithms are widely used in practice and arise naturally from classical multivariate statistical theory [10], where the matrix is referred to as the multivariate F matrix [12, 14]. The information-theoretic approach to model order estimation, first introduced by Wax and Kailath [6], was extended to the colored noise setting by Zhao et al. in [15], who prove consistency of their estimator in the large-sample-size regime; their analysis does not yield any insight into the finite sample setting.

Consequently, research has focused on developing sophisticated techniques for improving performance of eigenvalue based methods in the finite sample setting. Zhu et al. [16] improve the performance of their eigenvalue estimator by assuming a model for the noise covariance matrix. Stoica and Cedervall [17] improve the performance of their estimator in two reasonable settings: one, where it is reasonable to assume that the noise covariance matrix is block diagonal or banded and two, where the temporal correlation of the noise has a shorter length than the signals. Other techniques in the literature exploit other characteristics of the signal or noise to effectively reduce the dimensionality of the signal subspace and improve model order estimation given finite samples. See, for example, [18, 19] and the references in [9].

Informally speaking, it is evident that performance of such model order estimation algorithms is coupled to the “quality” of the estimated signal-plus-noise and noise-only covariance matrices, which in turn are dependent on the number of snapshots used to estimate them, respectively. Researchers applying these techniques have noted the absence of a mathematically rigorous, general-purpose formula in the literature for predicting the minimum number of samples needed to obtain “good enough” detection accuracy (see, for example, [3, pp. 846]). A larger, more fundamental question that has remained unanswered, until now, is whether there is a statistical limit being encountered.

We tackle this problem head-on in this paper by employing sophisticated techniques from random matrix theory in [20]. We show, in an asymptotic sense to be made precise later, that only the “signal” eigenvalues of that are above a deterministic threshold can be reliably distinguished from the “noise” eigenvalues. The threshold is a simple, deterministic function of the dimensionality of the system, the number of noise-only and signal-plus-noise snapshots, and the noise and signal-plus-noise covariances, and is described explicitly next. Note the applicability of the results to the situation when the signal-plus-noise covariance matrix is singular.

Fig. 1: The dimension of the “noise” subspace is equal to the multiplicity of the population eigenvalue equal to one. When the population eigenvalues are known, then detecting the number of signals becomes trivial. However, estimating the number of signals from the sample generalized eigen-spectrum is considerably more challenging because of the finite sample effects. Specifically, the finite number of noise-only and signal-plus-noise samples induces a blurring in the sample eigenspectrum relative to the population eigenspectrum that makes discrimination of the “signal” from the “noise” challenging. The figure shows one random instance generated for a dimensional system with noise-only samples and signal-plus-noise bearing samples.

III Main result

For a Hermitian matrix with real eigenvalues (counted with multiplicity), the empirical distribution function (e.d.f.) is defined as

(10)
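In standard notation, with A a p × p Hermitian matrix and λ_1(A), …, λ_p(A) its eigenvalues (labels introduced here for illustration), the e.d.f. takes the familiar form

\[
F^{A}(x) = \frac{1}{p}\,\#\left\{ j : \lambda_j(A) \le x \right\} = \frac{1}{p}\sum_{j=1}^{p} \mathbb{1}\left\{ \lambda_j(A) \le x \right\}.
\]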

Of particular interest is the convergence of the e.d.f. of in the signal-free case, which is described next.

{theorem}

Let denote the matrix in (9) formed from (complex Gaussian) noise-only snapshots and independent noise-only (complex Gaussian) snapshots. Then the e.d.f. almost surely for every , as , and and where

(11)

where

(12)

when and zero otherwise, and is the Dirac delta function. {proof} This result was proved in [14]. When we recover the famous Marčenko-Pastur density [21].

The following result exposes when the “signal” eigenvalues are asymptotically distinguishable from the “noise” eigenvalues.

{theorem}

Let denote the matrix in (9) formed from (real or complex Gaussian) signal-plus-noise snapshots and independent (real or complex Gaussian) noise-only snapshots. Denote the eigenvalues of by . Let denote the -th largest eigenvalue of . Then as , and and we have

for and the convergence is almost surely and the threshold is given by

(13)

where

(14)

and . {proof} The result follows from Theorem VI-E. The threshold is obtained by solving the inequality

where for , , from [22, 23, 24, 9], is given by

and is given by (30).

Note that when , so that we recover the results of Baik and Silverstein [23].

III-A Effective number of identifiable signals

Theorem III brings into sharp focus the reason why, in the large-system-relatively-large-sample-size limit, model order underestimation is sometimes unavoidable. This motivates our heuristic definition of the effective number of identifiable signals below:

(15)

If we denote the eigenvalues of by , then we define the eigen-SNR of the -th signal as ; (15) then essentially states that signals with eigen-SNRs smaller than will be asymptotically undetectable.

Figure 2 shows the eigen-SNR threshold needed for reliable detection for different values as a function of for different values of . Such an analytical prediction was not possible before the results presented in this paper. Note the fundamental limit of detection in the situation when the noise-only covariance matrix is known a priori (solid line) and the increase in the threshold eigen-SNR needed as the number of snapshots available to estimate the noise-only covariance matrix decreases.

Fig. 2: Plot of the minimum (generalized) Eigen-SNR required (equal to where is given by (13)) to be able to asymptotically discriminate between the “signal” and “noise” eigenvalue of the matrix constructed as in (9) as a function of the ratio of the number of sensors to snapshots for different values of where Number of sensors/Number of noise-only snapshots. The gap between the upper two lines and the bottom most line represents the SNR loss due to noise covariance matrix estimation.

III-B Implications for array processing

Suppose there are two uncorrelated (hence, independent) signals so that . In (1) let . In a sensor array processing application, we think of and as encoding the array manifold vectors for a source and an interferer with powers and , located at and , respectively. The signal-plus-noise covariance matrix is given by

(16)

where is the noise-only covariance matrix. The matrix defined in (5) can be decomposed as

so that we can readily note that has the smallest eigenvalues and the two largest eigenvalues

(17a)
(17b)

respectively, where and . Applying the result in Theorem III allows us to express the effective number of signals as

(18)

Equation (18) captures the tradeoff between the identifiability of two closely spaced signals, the dimensionality of the system, the number of available snapshots and the cosine of the angle between the vectors and . Note that since the effective number of signals depends on the structure of the theoretical signal and noise covariance matrices (via the eigenvalues of ), different assumed noise covariance structures (AR(1) versus white noise, for example) will impact the signal level SNR needed for reliable detection in different ways.
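A small numerical sketch of this two-source structure follows (Python/NumPy); the steering vectors, source powers, and AR(1) noise covariance below are arbitrary example values, not values taken from the paper.

import numpy as np

p = 8
v1 = np.exp(1j * np.arange(p) * 0.00)      # example array manifold vector (source)
v2 = np.exp(1j * np.arange(p) * 0.12)      # example array manifold vector (interferer)
P1, P2 = 2.0, 1.0                          # example source/interferer powers

rho = 0.5                                  # example AR(1) noise covariance; white noise would be np.eye(p)
Rz = (rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))).astype(complex)

R = Rz + P1 * np.outer(v1, v1.conj()) + P2 * np.outer(v2, v2.conj())   # cf. (16)

# Eigenvalues of the whitened population matrix: p - 2 of them equal one,
# and the two "signal" eigenvalues lie above one.
eigs = np.sort(np.linalg.eigvals(np.linalg.solve(Rz, R)).real)[::-1]
print(eigs)

Whether the two signal eigenvalues exceed the detectability threshold of Theorem III then determines the effective number of identifiable signals for a given dimensionality and given numbers of snapshots.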

III-C Other applications

There is interest in detecting abrupt change in a system based on stochastic observations of the system using a network of sensors. When the observations made at various sensors can be modeled as a Gauss-Markov random field (GMRF), as in [25, 26], then the conditional independence property of GMRF’s [27] is a useful assumption. The assumption states that conditioned on a particular hypothesis, the observations at sensors are independent. This assumption results in the precision matrix, i.e., the inverse of the covariance matrix, having a sparse structure with many entries identically equal to zero.

Our results might be used to provide insight into the types of systemic changes, reflected in the structure of the signal-plus-noise covariance matrix, that are undetectable using sample generalized eigenvalue based estimators. Specifically, the fact that the inverse of the noise-only covariance matrix will have a sparse structure means that one can experiment with different (assumed) conditional independence structures and determine how “abrupt” the system change would have to be in order to be reliably detected using finite samples.

Spectral methods are popular in machine learning applications such as unsupervised learning, image segmentation, and information retrieval [28]. Generalized eigenvalue based techniques for clustering have been investigated in [29, 30]. Our results might provide insight into when spectral clustering algorithms are likely to fail. In particular, we note that the results of Theorem III hold even in situations where the data is not Gaussian (see Theorem VI-E) as is commonly assumed in machine learning applications.

IV An algorithm for reliable detection of signals in noise

In [10], Johnstone proves that in the signal-free case, the distribution of the largest eigenvalue of , on appropriate centering and scaling, can be approximated to order by the Tracy-Widom law [31, 32, 33]. In the setting where there are signals present, we expect that, after appropriate centering and scaling, the distribution of the signal eigenvalues of above the detectability threshold will obey a Gaussian law, whereas those below the detectability threshold will obey the Tracy-Widom law, as in the signal-free case. An analogous result for the signal-bearing eigenvalues of was proved by Baik et al. [22] and El Karoui [34]. Numerical investigations for (see Figure 3) corroborate the accuracy of our asymptotic predictions and form the basis of Algorithm 1 presented below for estimating the number of signals at (asymptotic) significance level . Theoretical support for this observation remains incomplete.

Algorithm 1
Input: Eigenvalues for of
1. Initialization: Set significance level
2. Compute from Table II
3. Set k = 0
4. Compute and from Table I(a)
5. Is ?
6. If yes, then go to step 9
7. Otherwise, increment .
8. If , go to step 4. Else go to step 9.
9. Return
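A minimal sketch of this sequential test in Python is given below. The centering and scaling functions mu(k) and sigma(k) stand in for the Table I(a) entries, and tw_percentile stands in for the Table II percentile at the chosen significance level; all three are assumptions to be supplied by the user, and the loop structure reflects one plausible reading of steps 3-9.

import numpy as np

def estimate_num_signals(eigs, tw_percentile, mu, sigma, k_max):
    # eigs: sample generalized eigenvalues; mu(k), sigma(k): user-supplied
    # centering/scaling (placeholders for Table I); tw_percentile: Tracy-Widom
    # percentile from Table II at the chosen significance level.
    eigs = np.sort(np.asarray(eigs))[::-1]   # largest first
    k = 0
    while k < k_max:
        # Center and scale the (k+1)-th largest eigenvalue and compare it with
        # the Tracy-Widom percentile; stop at the first non-significant eigenvalue.
        if (eigs[k] - mu(k)) / sigma(k) <= tw_percentile:
            break
        k += 1
    return k   # estimated number of signals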

Figure 4 illustrates the accuracy of the predicted statistical limit and the ability of the proposed algorithm to reliably detect the presence of the signal at this limit.

(a) Here , so that .
(b) Here , so that
Fig. 3: In (a), for the setting described in Theorem III we set , , , , and w.l.o.g. , and compare the empirical cdf of the largest eigenvalue of with the largest eigenvalue of with , i.e., in the noise-only case, over Monte-Carlo trials. In (b), we plot the empirical cdf but now with .
(a) Algorithm 1
(b) Algorithm 2
TABLE I: Parameters for signal detection algorithms.
1 − fraction   Fraction   TW real percentile   TW complex percentile
0.990000 0.010000 -3.89543267306429 -3.72444594640057
0.950000 0.050000 -3.18037997693774 -3.19416673215810
0.900000 0.100000 -2.78242790569530 -2.90135093847591
0.700000 0.300000 -1.91037974619926 -2.26618203984916
0.500000 0.500000 -1.26857461658107 -1.80491240893658
0.300000 0.700000 -0.59228719101613 -1.32485955606020
0.100000 0.900000 0.45014328905825 -0.59685129711735
0.050000 0.950000 0.97931605346955 -0.23247446976400
0.010000 0.990000 2.02344928138015 0.47763604739084
0.001000 0.999000 3.27219605900193 1.31441948008634
0.000100 0.999900 4.35942034391365 2.03469175457082
0.000010 0.999990 5.34429594047426 2.68220732168978
0.000001 0.999999 6.25635442969338 3.27858828203370
TABLE II: The third and fourth columns show the percentiles of the Tracy-Widom real and complex distribution respectively corresponding to fractions in the second column. The percentiles were computed in MATLAB using software provided by Folkmar Bornemann for the efficient evaluation of the real and complex Tracy-Widom distribution functions. The percentiles are computed using the fzero command in MATLAB. The accuracy of the computed percentiles is about in absolute error terms.

In the special setting where the noise covariance matrix is known a priori, the results of Baik et al. [22], El Karoui [34] and Ma [35] form the basis of Algorithm 2 presented below for estimating the number of signals at (asymptotic) significance level .

Algorithm 2
Input: Eigenvalues for of
1. Initialization: Set significance level
2. Compute from Table II
3. Set k = 0
4. Compute and from Table I(b)
5. Is ?
6. If yes, then go to step 9
7. Otherwise, increment .
8. If , go to step 4. Else go to step 9.
9. Return
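The sketch given after Algorithm 1 applies here essentially unchanged: the input eigenvalues are those obtained by whitening with the known noise covariance matrix, and the centering and scaling placeholders mu(k) and sigma(k) are supplied from Table I(b) instead of Table I(a).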
Fig. 4: A heat map of the log probability of signal detection using Algorithm 1 in Section IV, with the significance level set at , in (eigen) SNR versus number of sensors to number of signal-plus-noise snapshots phase space. In this example, for the setting described in Theorem III we set , and w.l.o.g. , and evaluated Prob() over Monte-Carlo trials and a grid of equally spaced points in the -5 dB to 15 dB (eigen) SNR range and equally spaced points in the space by setting . The values of the colormap at each of the faces were interpolated across each line segment and face to obtain the above plot. In the dark zone (upper half of the plot) a signal can be reliably detected whereas in the lighter zone (lower half of the plot) the signal is statistically indistinguishable from noise as evidenced from the probability of detection being close to the significance level. The superimposed solid black line demarcates the theoretically predicted threshold while the superimposed solid red line is the theoretically predicted threshold in the setting where the noise covariance matrix is perfectly known. The gap between the two lines thus represents the SNR loss due to noise covariance matrix estimation.

V Conclusion

Figure 4 captures the fundamental statistical limit encountered when attempting to discriminate signal from noise using finite samples. Simply put, a signal whose eigen-SNR is below the detectability threshold cannot be reliably detected while a signal above the threshold can be. In settings such as wireless communications and biomedical signal processing where the signal power is controllable, our results provide a prescription for how strong it needs to be so that it can be detected. If the signal level is barely above the threshold, simply adding more sensors might actually degrade the performance because of the increased dimensionality of the system. If, however, either due to clever signal design or physics-based modeling, we are able to reduce (or identify) the dimensionality of the subspace spanned by the signal, then according to Figure 4 the detectability threshold will also be lowered. With VLSI advances making sensors easier and cheaper to deploy, our results demonstrate exactly why the resulting gains in systemic performance will more than offset the effort we will have to invest in developing increasingly more sophisticated dimensionality reduction techniques. Understanding the fundamental statistical limits of techniques for signal detection in the setting where the noise-only sample covariance matrix is singular remains an important open problem.

Acknowledgements

Raj Rao was supported by an Office of Naval Research Post-Doctoral Fellowship Award under grant N00014-07-1-0269. Jack Silverstein was supported by the U.S. Army Research Office under Grant W911NF-05-1-0244. R. R. thanks Arthur Baggeroer for encouragement and invaluable feedback. This material was based upon work supported by the National Science Foundation under Agreement No. DMS-0112069. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

We thank Alan Edelman for his encouragement and support, Interactive Supercomputing, Inc. for providing access to the Star-P parallel computing software and Sudarshan Raghunathan of Interactive Supercomputing, Inc. for his patience and support in answering our multitude of Star-P programming queries. We remain grateful to Al Davis and Chris Hill of MIT for granting us access to the Darwin Project computing cluster. Thanks to their involvement we were able to program, debug and complete the computation needed to produce Figure 4 in 4 days! Without their gracious help, the computation would have taken 3 months on the latest single processor laptop. We thank Folkmar Bornemann for providing the MATLAB code for computing the percentiles in Table II.

VI Appendix

VI-A Mathematical preliminaries

Let for , be a collection of complex valued i.i.d. random variables with and . For positive integers and let , , . Assume for each is an Hermitian nonnegative definite matrix. The matrix

where is any Hermitian square root of , can be viewed as a sample covariance matrix, formed from samples of the random vector with denoting the first column of , which has for its population covariance matrix. When and are both large and on the same order of magnitude, will not be near , due to an insufficient number of samples required for such a large dimensional random vector. However, there exist results on the eigenvalues of . They are limit theorems as with and , which provide information on the eigenvalues of . One result [36] is on the empirical distribution function (e.d.f.), , of the eigenvalues of , which throughout the paper, is defined for any Hermitian matrix as

The limit theorem is expressed in terms of the Stieltjes transform of the limiting e.d.f. of the ’s, where for any distribution function (d.f.) its Stieltjes transform, , is defined to be

There exists a one-to-one correspondence between the distribution functions (d.f.’s) and their Stieltjes transforms, due to the inversion formula

for continuity points of .
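In standard notation, writing m_G(z) for the Stieltjes transform of a d.f. G (a label introduced here for illustration), these definitions read

\[
m_G(z) = \int \frac{1}{\lambda - z}\, dG(\lambda), \qquad \operatorname{Im} z > 0,
\]

with the inversion formula, for continuity points a < b of G,

\[
G\{[a, b]\} = \frac{1}{\pi} \lim_{\eta \to 0^{+}} \int_{a}^{b} \operatorname{Im}\, m_G(\xi + i\eta)\, d\xi.
\]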

The limit theorem allows the to be random, only assuming as , the convergence of to a nonrandom proper probability distribution function , i.e., . The theorem states that with probability one, as , , where is nonrandom, with Stieltjes transform , satisfying the equation

(19)

which is unique in the set .

It is more convenient to work with the eigenvalues of the matrix , whose eigenvalues differ from those of by zero eigenvalues. Indeed, with denoting the indicator function on the set we have the exact relationship

almost surely, implying

(20)

Upon substituting into (19) we find that for solves the equation

(21)

and is unique in . Thus we have an explicit inverse for .
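In one standard parametrization (with c the limiting dimension-to-sample-size ratio and H the limiting population spectral d.f., labels introduced here for illustration), the explicit inverse referred to in (21) typically takes the form

\[
z(m) = -\frac{1}{m} + c \int \frac{t}{1 + t\, m}\, dH(t),
\]

valid for m in the appropriate domain.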

Qualitative properties of have been obtained in [37], most notably the fact that on has a continuous derivative. The paper [37] also shows how intervals outside the support of can be determined from the graph of (21) for .

Let denote the support of the d.f. , its complement, and define to be (21) with . Intuitively, on is well defined and increasing. Therefore it is invertible on each interval in ; its inverse, namely , is also increasing. The details are stated in the following.

{lemma}

[Theorems 4.1, 4.2 of [37]] If , then satisfies (1) , (2) , and (3) . Conversely, if satisfies (1)–(3), then .

In simple terms is comprised of the range of values where is increasing.

Another result which will be needed later is the following.

{lemma}

[Theorem 4.3 of [37]] Suppose each contained in the interval satisfies (1) and (2) of Lemma VI-A, and for . Then for all .

Limiting eigenvalue mass at zero is also derived in [37]. It is shown that

(22)

VI-B Support of eigenvalues

Since the convergence in distribution of only addresses how proportions of eigenvalues behave, understanding the possible appearance or non-appearance of eigenvalues in requires further work.

The question of the behavior of the largest and smallest eigenvalues when has been answered by Yin, Bai, and Krishnaiah in [38], and Bai and Yin in [39], respectively, under the additional assumption : the largest eigenvalue and largest eigenvalue of converge a.s. to and respectively, matching the support of on . More on when will be given later.

For general , restricted to being bounded in spectral norm, the non-appearance of eigenvalues in has been proven by Bai and Silverstein in [40]. Moreover, the separation of eigenvalues across intervals in , mirrors exactly the separation of eigenvalues over corresponding intervals in [41]. The results are summarized below.

{theorem}

Assume additionally and the are nonrandom and are bounded in spectral norm for all .

Let denote the “limiting” e.d.f. associated with , in other words, is the d.f. having Stieltjes transform with inverse (21), where are replaced by .

Assume the following condition:

  • (*) Interval with lies in an open interval outside the support of for all large .

Then .

For Hermitian non-negative definite matrix , let denote the largest eigenvalue of . For notational convenience, define and .

(i) If , then , the smallest value in the support of , is positive, and with probability 1, as .

(ii) If , or but is not contained in , then , and for all large there is an index for which

(23)

Then . {proof} See the proofs of Theorem 1.1 in [40, 41].

The behavior of the extreme eigenvalues of leads to the following corollary of Theorem VI-B.

{corollary}

If converges to the largest number in the support of , then converges a.s. to the largest number in the support of . If converges to the smallest number in the support of , then () implies () converges a.s. to the smallest number in the support of ().

In Theorem VI-B, Case (i) applies when , whereby the rank of would be at most , the conclusion asserting that, with probability 1, for all large, the rank is equal to . From Lemma VI-A, Case (ii) of Theorem VI-B covers all intervals in on resulting from intervals on where is increasing. For all large is increasing on , which, from inspecting the vertical asymptotes of and Lemma VI-A, must be due to the existence of , satisfying (23).

Theorem VI-B easily extends to random , independent of with the aid of Tonelli’s Theorem [42, pp. 234], provided the condition (*) on is strengthened to:

  • (**) With probability 1 for all large (nonrandom) lies in an open interval outside the support of .

Indeed, let denote the probability space generating , the probability space generating . Let their respective measures be denoted by , , and the product measure on by . Considering, for example, case (ii), we define

Let be an element of the event defined in (**). Then by Theorem VI-B for all contained in a subset of having probability 1. Therefore, by Tonelli’s theorem

Consider now case (ii) of Theorem VI-B in terms of the corresponding interval outside the support of and the ’s. By Lemma VI-A and condition (*), we have the existence of an such that , and for all large

(24)

Let , . Then by Lemma VI-A we have the existence of an for which and for all large. Moreover, by (24) we have for all large

(25)

Necessarily, and .

Notice the steps can be completely reversed, that is, beginning with an interval , with , lying in an open interval in for all large and satisfying (25) for some , will yield , with , , satisfying condition (*). Case (ii) applies, since is within the range of for . If , then we would have .

VI-C Behavior of spiked eigenvalues

Suppose now the ’s are altered, where a finite number of eigenvalues are interspersed between the previously adjacent eigenvalues and . It is clear that the limiting will remain unchanged. However, the graph of on will now contain vertical asymptotes. If the graph remains increasing on two intervals for all large, each one between successive asymptotes, then because of Theorem VI-B, with probability one, eigenvalues of the new will appear in for all large.

Theorem VI-C below shows this will happen when a “sprinkled”, or “spiked” eigenvalue lies in . Theorem VI-C provides a converse, in the sense that any isolated eigenvalue of must be due to a spiked eigenvalue, the absence of which corresponds to case (ii) of Theorem VI-B.

Theorem VI-C, below, allows the number of spiked eigenvalues to grow with , provided it remains .

{theorem}

Assume, in addition to the assumptions in Theorem VI-B on the and :

(a) There are positive eigenvalues of all converging uniformly to , a positive number. Denote by the e.d.f. of the other eigenvalues of . (b) There exists positive contained in an interval with which is outside the support of for all large , such that for these

for . (c) .

Suppose are the eigenvalues stated in (a). Then, with probability one

(26)
{proof}

For , we have

By considering continuity points of in we see that is constant on this interval, and consequently, this interval is also contained in .

Because of (b) we have for (recall (24),(25)).

By Lemma VI-A we therefore have for all . Thus we can find and , such that and for all large for all .

It follows that for any positive sufficiently small, there exist positive with , such that, for all large, both , and