Large Deviations of the Maximum Eigenvalue for Wishart and Gaussian Random Matrices

# Large Deviations of the Maximum Eigenvalue for Wishart and Gaussian Random Matrices

Satya N. Majumdar and Massimo Vergassola Laboratoire de Physique Théorique et Modèles Statistiques (UMR 8626 du CNRS), Université Paris-Sud, Bâtiment 100, 91405 Orsay Cedex (France)
Institut Pasteur, CNRS URA 2171, F-75724 Paris 15, France.
July 12, 2019
###### Abstract

We present a simple Coulomb gas method to calculate analytically the probability of rare events where the maximum eigenvalue of a random matrix is much larger than its typical value. The large deviation function that characterizes this probability is computed explicitly for Wishart and Gaussian ensembles. The method is quite general and applies to other related problems, e.g. the joint large deviation function for large fluctuations of top eigenvalues. Our results are relevant to widely employed data compression techniques, namely the principal components analysis. Analytical predictions are verified by extensive numerical simulations.

###### pacs:
02.50.-r, 02.50.Sk, 02.10.Yn, 24.60.-k, 21.10.Ft

Rare events where one of the eigenvalues of a random matrix is much larger than the others play an important role in data compression techniques such as the “Principal Components Analysis” (PCA). PCA is a very useful method to detect hidden patterns or correlations in complex, high-dimensional datasets. A non-exhaustive list of applications includes image processing Wilks (); Fukunaga (); Smith (), biological microarrays arrays1 (); arrays2 (), population genetics Cavalli (); genetics (), finance BP (); Burda (), meteorology and oceanography Preisendorfer (). The main idea behind PCA is very simple. Consider a rectangular matrix whose entries represent some data. For instance, might represent examination marks of the -th student () in the -th subject (physics, mathematics, chemistry, etc., with ). The product symmetric matrix represents the covariance matrix of the data and it contains information about correlations. In PCA, one first identifies eigenvalues and eigenvectors of . The data are maximally scattered and correlated along the eigenvector (“principal component”) associated with the largest eigenvalue . The scatter progressively reduces as lower and lower eigenvalues are considered. The subsequent step is the reduction of data dimensionality, achieved by setting to zero those components corresponding to low eigenvalues. The rationale is that retaining the largest components will preserve the major patterns in the data and only minor variations are filtered out.

The above description of PCA makes it clear that the efficiency of the method crucially depends upon the gap between the top eigenvalues and the “sea” of intermediate and small eigenvalues. In particular, the further is the maximum eigenvalue spaced from all the others, the more effective the projection and the compression procedure will be. A question then naturally arises: how good is PCA for random data? This issue has a twofold interest. First, in many situations, the data are high-dimensional and have random components. Second, random ensembles provide null models needed to gauge the statistical significance of results obtained for non-random datasets. To address the question just formulated, one needs to compute the probability of rare events where the largest eigenvalue has atypically large fluctuations. The purpose of this Letter is to provide a simple physical method, based on the Coloumb gas method in statistical physics, that allows us to compute analytically the probability of these rare events for a general class of random matrices.

Let us start by considering Wishart matrices Wishart (), which are directly related to PCA and multivariate statistics  Johnstone (). Wishart matrices are defined via the product of a random matrix having its elements drawn independently from a Gaussian distribution, . The Dyson indices correspond respectively to real and complex  Dyson (). Without any loss of generality, we will assume hereafter that . In addition to the aforementioned PCA applications, Wishart matrices appear in antenna selection in communication technology Mimo (), nuclear physics Fyo1 (), quantum chromodynamics QCD (), statistical physics of directed polymers in random media Johansson () and nonintersecting Brownian motions SMCR ().

The spectral properties of are well-known. For example, eigenvalues ’s of are nonnegative random variables with a joint probability density function (PDF) James ()

 P[{λi}]∝e−β2∑Ni=1λiN∏i=1λαβ/2i∏j

where . This can be written as , with the energy

 E[{λi}]=N∑i=1(λi−αlogλi)−∑j≠kln|λj−λk|, (2)

coinciding with that of a -d Coulomb gas of charges with coordinates . Charges are confined to the positive half-line in the presence of an external linear+logarithmic potential. The external potential tends to push the charges towards the origin, whilst the Coulomb repulsion tends to spread them apart. A glance at (2) indicates that these two competing mechanisms balance for values of scaling as . Indeed, from the joint PDF (1), one can calculate the average density of eigenvalues, , with the Marcenko-Pastur (MP) MP () scaling function :

 f(x)=12πx√(b−x)(x−a). (3)

Here, (with ) and the upper and lower edges are and . For all , the average density vanishes at both edges of the MP sea. For the special case , we have , and the average density for , diverges as at the lower edge (see Fig. 1).

The MP expression shows that the maximum eigenvalue has the average value for large . Typical fluctuations of around its mean are known to be of  Johnstone (); Johansson (). More specifically, , where has an -independent limiting PDF, , the well-known Tracy-Widom (TW) density TW1 (). The TW distribution for has asymmetric tails TW1 ()

 gβ(χ) ∼ exp[−β24|χ|3]asχ→−∞, (4) ∼ exp[−2β3χ3/2]asχ→∞. (5)

In contrast, the probability of atypically large, e.g. , fluctuations of from its mean are not captured by the TW distribution. Note that these configurations are precisely those relevant here, as they are ideally suited for the PCA to work accurately.

How does the PDF look like for where the TW form is no longer valid? Using general large deviation principles, Johansson Johansson () proved that for large fluctuations from its mean, the PDF has the form (for large ) :

 P(t,N) ∼ exp[−βN2Φ−(bN−tN)]t≪bN; (6) ∼ exp[−βNΦ+(t−bNN)]t≫bN;

where are the right (left) rate functions for the large positive (negative) fluctuations of . The challenge is to explicitly compute their functional forms. The approach developed for Gaussian matrices DM () allows to compute the left function  VMB () but it does not apply to the right tail. The problem of computing the right function is solved hereafter. This is followed by the application of the new method to Gaussian matrices and further generalizations.

The starting point of our method to compute is the energy expression (2). The Coulomb gas physics suggests that when the rightmost charge is moved to the right, , the MP sea is a priori not subject to forces capable of macroscopic rearrangements. Following this physical picture, the right rate function is determined by the energy cost in pulling the rightmost charge in the external potential of the Coulomb gas and the interaction of the charge with the unperturbed MP sea. This energy cost for can be estimated for large using Eq. (2)

 ΔE(t)=t−αln(t)−2N∫ln|t−λ|ρN(λ)dλ, (7)

where is the MP average density of charges and the integral describes the Coulomb interaction of the rightmost charge with the MP sea. This energy cost expression is valid up to an additive constant, which is chosen such that since our reference configuration is the one where . For large , we scale , use the MP expression (3) and the energy cost finally takes the form

 ΔE(z)N=z−1−ccln(z)−2∫baln(z−z′)f(z′)dz′, (8)

valid for and up to an additive constant. The probability of such a configuration is . Making a shift of variable , it follows that for large and for agrees with the large deviation behavior in Eq. (6). Progress is that we also have derived the explicit expression of the right rate function

 Φ+(x)=x2 − 1−c2cln(x+bb) (9) − ∫baln(x+b−x′b−x′)f(x′)dx′,

where and the additive constant was chosen to have . The integral can be performed exactly and expressed as a hypergeometric function. For ( and ), we obtain

 Φ+(x)=x+22−ln(x+4)+1x+4G(44+x), (10)

where is a hypergeometric function (with a lengthy but explicit expression in terms of elementary functions). For the sake of comparison, we also report the simpler expression of the left rate function VMB (): for .

The asymptotics of can be easily worked out from Eq. (9). For large , independently of , while the function has a nonanalytic behavior for small  :

 Φ+(x)≈√b−a3bx3/2asx→0. (11)

This shows that, as from the right side, the PDF of in Eq. (6) behaves as . Expressing the exponent in terms of the TW variable , we recover exactly the right tail behavior of the TW density in Eq. (5). Thus, the large deviation function matches, for small arguments , the behavior of the TW density at large arguments. This is quite consistent with the fact that the scales of the fluctuations for TW and are and , respectively. In fact, our method provides, as a bonus, a physical derivation of the right tail behavior of the TW density (originally derived through the Painlevé differential equation satisfied by the TW distribution TW1 ()).

We confirmed theoretical predictions by extensive numerical simulations. About realizations of real () Wishart matrices of sizes , , and with different values of were efficiently generated using the tridiagonal results in Edel (). We find very good agreement with our analytical predictions for the right large deviations. For example, in Fig. (2) we present the results for and . MonteCarlo numerical results are compared to the TW density (obtained by numerically integrating the Painlevé equation satisfied by the TW distribution TW1 ()) and in Eq. (10), multiplied by . For comparison, we also show the corresponding left rate function VMB () multiplied by . It is clear that, while the numerical data are well described by the TW density near the peak of the distribution, they deviate considerably from TW as one moves into the tails, where our large deviation predictions work perfectly.

Our Coulomb gas method is quite general and it can be applied to other related problems. For example, we can compute the right large deviation function of for Gaussian random matrices. For the latter, the eigenvalues can be positive or negative with joint PDF Wigner () :

 P[{λi}]∝e−β2∑Ni=1λ2i∏j

where the Dyson indices , and correspond to the orthogonal, unitary and symplectic ensembles. The quadratic nature of the potential in (12), in contrast to the linear term appearing in (1), makes that the amplitude of a typical eigenvalue scales as . The average density of states for large has the scaling form, , where the famous Wigner semi-circular law has compact support over . Thus, and typical fluctuations of around its mean are known TW1 () to be TW distributed over a scale of . Specifically, , with , and is a random variable with the TW distribution . Again, the TW form describes the PDF only in the vicinity of over a small scale of , while deviations from the TW form appear in the tails.

Fluctuations of over a scale are described by large deviation functions, analogous to the Wishart case in Eq. (6) but with a different scaling variable

 P(t,N) ∼ exp[−βN2Ψ−(√2N−t√N)]t≪√2N; ∼ exp[−βNΨ+(t−√2N√N)]t≫√2N.

As previously mentioned, the left rate function was recently computed exactly in Ref. DM (), but the right rate function was yet unknown. Our Coulomb gas approach allows to solve this problem as well and gives for  :

 Ψ+(x)=z2−12−ln(z√2)+14z2G(2z2). (13)

Here, , the hypergeometric function was defined earlier and the additive constant was chosen to have . The asymptotics of can be easily derived: for large , , while the non-analytic behavior holds for small . Using the TW scaling variable , with defined after (12), it is easy to check that one recovers the correct TW right tails for all , and . This provides again a physical derivation of the TW right tail.

We have realized simulations for Gaussian matrices with sizes , and and for and . In Fig. (3) we present the data for the PDF of (with , ) and compare with the TW form and the exact left function DM () and right rate function derived in Eq. (13). As in the Wishart case, the TW form works well near the peak , but it fails as we move into the tails, where the large deviation predictions are quite accurate.

Our Coulomb gas method lends to further generalizations that we only briefly mention here. For instance, we can compute the joint probability distribution for large fluctuations of top eigenvalues in Wishart and Gaussian random matrices. If , the energy will be given by their mutual charge interactions, the external potentials and their interaction with the unperturbed MP sea. Integrals are the same as those computed previously and yield the explicit expression for the joint PDF. It is also possible to use our method to compute the large deviation function for fluctuations of the smallest eigenvalue for Wishart matrices with . Note that the MP sea remains unperturbed (and our method applies) for small fluctuations of while the method in DM () applies for large fluctuations of , which compress the MP sea.

In conclusion, we have presented a new Coulomb gas method to compute large deviation probabilities of top eigenvalues for a general class of random matrices. The physical picture that emerges is quite transparent: when the top eigenvalues are pulled to the right (towards large values) the Marcenko-Pastur (or Wigner) sea is simply pinched and the top eigenvalues do not drag all the other eigenvalues. In other words, no macroscopic rearrangement of the sea occurs and the top eigenvalues move in the effective potential defined by the external potential of the Coulomb gas and by the electrostatic potential generated by the charges in the sea. Our predictions are formally valid for large yet our simulations indicate that they work for moderate as well. This further adds to the relevance of the large deviation rate functions derived here to data compression methods and their applications.

Acknowledgments We are grateful to E. Aurell for the invitations to KTH, where this work was initiated.

## References

• (1) S.S. Wilks, Mathematical Statistics (John Wiley & Sons, New York, 1962).
• (2) K. Fukunaga, Introduction to Statistical Pattern Recognition (Elsevier, New York, 1990).
• (3) L.I. Smith, “A tutorial on Principal Components Analysis” (2002).
• (4) N. Holter et al. Proc Natl Acad Sci USA 97, 8409 (2000).
• (5) O. Alter et al. Proc Natl Acad Sci USA 97, 10101 (2000).
• (6) L.-L. Cavalli-Sforza, P. Menozzi and A. Piazza, “The History and Geography of Human Genes”, Princeton Univ. Press, 1994.
• (7) J. Novembre and M. Stephens Nature Genetics 40, 646 (2008).
• (8) J.-P. Bouchaud and M. Potters, Theory of Financial Risks (Cambridge University Press, Cambridge, 2001).
• (9) Z. Burda and J. Jurkiewicz, Physica A 344, 67 (2004); Z. Burda, J. Jurkiewicz and B. Waclaw, Acta Physica Polonica B 36, 2641 (2005) and references therein.
• (10) R.W. Preisendorfer, Principal Component Analysis in Meteorology and Oceanography (Elsevier, New York, 1988).
• (11) J. Wishart, Biometrica 20, 32 (1928).
• (12) I.M. Johnstone, Ann. Statist. 29, 295 (2001).
• (13) F.J. Dyson, J. Math. Phys. 3 140 (1962).
• (14) M. Sadek, A. Tarighat and A.H. Sayed, IEEE Trans. Signal Processing, 55, 1498 (2007).
• (15) Y.V. Fyodorov and H.-J. Sommers, J. Math. Phys. 38, 1918 (1997); Y.V. Fyodorov and B.A. Khoruzhenko, Phys. Rev. Lett. 83, 66 (1999).
• (16) J.J.M. Verbaarschot, Phys. Rev. Lett. 72, 2531 (1994).
• (17) K. Johansson, Comm. Math. Phys. 209, 437 (2000).
• (18) G. Schehr, S.N. Majumdar, A. Comtet and J. Randon-Furling, Phys. Rev. Lett. 101, 150601 (2008).
• (19) A.T. James, Ann. Math. Statistics 35, 475 (1964).
• (20) V.A. Marcenko and L.A. Pastur, Math. USSR-Sb, 1, 457 (1967).
• (21) C. Tracy and H. Widom, Comm. Math. Phys. 159, 151 (1994); 177, 727 (1996).
• (22) D.S. Dean and S.N. Majumdar, Phys. Rev. Lett. 97, 160201 (2006); Phys. Rev. E 77, 041108 (2008).
• (23) P. Vivo, S.N. Majumdar, and O. Bohigas, J. Phys. A. Math-Gen. 40, 4317 (2007).
• (24) I. Dumitriu and A. Edelman, J. Math. Phys., 43, 5830, (2002).
• (25) E.P. Wigner, Proc. Cambridge Philos. Soc. 47, 790 (1951).
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters