# The Index Distribution of Gaussian Random Matrices

## Abstract

We compute analytically, for large , the probability distribution of the number of positive eigenvalues (the index ) of a random matrix belonging to Gaussian orthogonal , unitary or symplectic ensembles. The distribution of the fraction of positive eigenvalues scales, for large , as where the rate function , symmetric around and universal (independent of ), is calculated exactly. The distribution has non-Gaussian tails, but even near its peak at it is not strictly Gaussian due to an unusual logarithmic singularity in the rate function.

###### pacs:

02.50.-r; 02.10.Yn; 24.60.-kRandom matrix theory (RMT) has played a central role in various branches of physics since its inception [[1]]. Through the years, different, seemingly unrelated problems in physics and mathematics have been linked via RMT. It is not surprising then that the distributions of observables associated with random matrices play a very important role in a variety of physical contexts. Still, after more than half a century, certain natural questions about eigenvalue distributions have eluded a thorough treatment, in spite of their relevance to a broad range of subjects.

As an example, classical disordered systems offer the ideal environment where RMT ideas and tools may be applied. Physical systems such as liquids and spin glasses are known to exhibit a rich energy or free energy landscape characterized by many extrema (minima, maxima and saddles) and rather complex stability patterns [[2]] which play an important role both in statics and dynamics of such systems. The stability of a stationary point of an -dimensional potential landscape is decided by the real eigenvalues of the Hessian matrix which is evidently symmetric. If all eigenvalues are positive (negative), the stationary point is a local minimum (local maximum). If some, but not all, are positive it is a saddle. The number of positive eigenvalues , called the index, is a key object of interest as it determines the number of directions in which a stationary point is stable.

In many situations, important insights about the system can be gained by simply assuming that the Hessian is a real symmetric random matrix drawn from a Gaussian ensemble: Gaussian orthogonal ensemble characterized by the Dyson index . This random Hessian model (RHM) has been studied extensively in the context of disordered systems [[3]], landscape based string theory [[4], [5]] and quantum cosmology [[6]]. In RHM, the fraction of positive eigenvalues is a random variable whose distribution is our main object of interest in this Letter. Although in RHM , it is also of interest to study the index distribution for other Gaussian ensembles namely the unitary () and the symplectic (). In this Letter, we study the index distribution for general .

Due to the Gaussian symmetry of the ensemble, it is clear that on average half of the eigenvalues are positive (or negative), implying . Thus, the distribution must be symmetric around with a peak at . Cavagna et al. studied , for , using the replica method and some additional approximations [[3]]. They argued that for large the distribution has a Gaussian peak around its mean [[3]]

(1) |

indicating that the variance , or equivalently for large . In the opposite limit, near the tail (or equivalently ) , the probability that all eigenvalues are positive, was recently computed exactly for large and for all [[7]],

(2) |

It is then natural to ask how does the distribution behave in between, i.e., for . In particular, the presence of term in (1) presents a challenging puzzle: how does one smoothly interpolate the distribution between the two limits, the peak and the tails?

In this Letter we resolve this outstanding puzzle by computing the distribution exactly for large in the full range and for all . Let us summarize our main results. We show that to leading order for large ,

(3) |

where indicates . The exact rate function , symmetric around and universal (independent of ), is given in (18) for and is plotted in Fig. (3). The fact that the logarithm of the probability for fixed is quite natural, as it represents the free energy of an associated Coloumb fluid of charges (eigenvalues). The Coulomb energy of charges clearly scales as . In the limit , we get in agreement with (2). The distribution is thus highly non-Gaussian near its tails. In the opposite limit , we find a marginally quadratic behavior, modulated by an unusual logarithmic singularity

(4) |

The variance computed from our exact formula, for large perfectly agrees, for , with Ref. [[3]]. However, the distribution of near its peak is not a Gaussian with variance as claimed in [[3]], rather our exact result shows that the origin of the term is due to the logarithmic singularity associated with the rate function itself near . In addition to obtaining the full distribution thus solving this challenging puzzle, our Coloumb gas approach also provides a new method of finding solutions to singular integral equation with two disjoint supports. This method is rather general and can be fruitfully applied to other related problems in RMT.

Our starting point is the joint distribution of eigenvalues of Gaussian random matrices parametrized by the Dyson index [[1]]

(5) |

where the normalization constant can be computed using the celebrated Selberg’s integral. For large , it is known that [[1]], with .

The distribution of the index, i.e., the number of positive eigenvalues can be expressed in terms of the joint distribution

(6) |

where the integrals are restricted over configurations with only positive eigenvalues and the binomial counts the different relabellings of the eigenvalues. The function can be interpreted as the energy of a configuration of charged particles located at on the real line and is the partition function of this fluid at inverse temperature . The distribution of the fraction is then simply .

In this Coulomb gas picture, of the total charges are confined to the positive real semiaxis. The charges repel each other via the 2-d Coulomb interaction (logarithmic) and are also subject to an external confining potential (parabolic). As a result of these two competing energy scales, it is easy to see that typically for large [[7]]. The evaluation of such partition function in the large limit is carried out in two steps [[7]]: i) a coarse-graining protocol, where one sums over all microscopic arrangements of compatible with a fixed and normalized (to unity) charge density function , and ii) a functional integral over all possible normalized charge density functions, upon using the scaling where the scaling function satisfies .

The resulting functional integral over is then evaluated in the large limit via a saddle point method. In physical terms, this amounts to finding the equilibrium density of the fluid (minimizing its free energy) under the competing interactions (Coulomb repulsion and quadratic confinement) and the external constraint ( particles kept always on the positive semiaxis). This constrained Coulomb gas approach has proven useful in a number of different contexts such as Gaussian and Wishart extreme eigenvalues [[7], [8], [9]], nonintersecting Brownian interfaces [[10]], quantum transport in chaotic cavities [[11]], statistics of critical points in Gaussian landscapes [[12], [13]], bipartite entanglement [[14]] and also in information and communication systems [[15]].

Using the above approach one gets, to leading order in large ,

(7) |

with the action given by:

(8) |

where and are Lagrange multipliers enforcing the fraction of positive eigenvalues and the normalization of , and is the Heaviside step function.

The equilibrium fluid density , which minimizes the action or the free energy, is obtained by the saddle point equation , resulting in the integral equation

(9) |

By taking one derivative with respect of , we obtain, for ,

(10) |

where stands for the Cauchy’s principal part. It turns out that there exists a closed formula, due to Tricomi [[16]], for the solution of such integral equations provided the solution has a single support. This is indeed the case in the two limiting situations and . For , the solution is given by the celebrated Wigner’s semicircle, with . In the opposite limit , all the eigenvalues are on the positive side and one again obtains a single support solution [[7]], for .

In contrast, for , the solution generally consists of two disjoint supports: a blob of negative eigenvalues and a blob of positive eigenvalues (see Fig. 1). Finding this two-support solution thus poses the principal technical challenge for . We have succeeded in finding this two-support solution exactly by iterating the Tricomi formula for single-support solution twice, the details of which will be reported elsewhere [[17]]. Our main result is that for all ,

(11) |

where parametrize the support of the solution where . They are implicitly given as functions of by the equations:

(12) |

and

(13) |

It is easy to check that one recovers the correct single support solutions in the limiting situations and . The density for different is given in Fig. 1. We have also verified this analytical prediction numerically by two different methods (see Fig. 2): direct diagonalization of small matrices and also by Monte Carlo simulation of the Coloumb gas [[17]].

Using this saddle point solution in (7), we get , where the rate function , coming from the normalization . To evaluate the saddle point action we next need to evaluate the single and double integral over in (8). The double integral can be written in terms of a simple integral after multiplying (9) by and then integrating over . This gives

(14) |

where the average is done with the measure .

The Lagrange multipliers and can be obtained from (9) upon setting and

(15) | |||||

(16) |

It turns out that the averages on the right hand side can be simplified by first introducing a pair of functions

(17) |

defined respectively for and . One can show [[17]] that they are essentially the moment generating functions of . The averages in (15) and (16) can then be expressed as simple integrals over . Skipping details [[17]], we obtain the following explicit expression for the rate function for ,

(18) |

Eq. (18) is the principal result of this Letter. It is again easy to check that in the two limits and , one recovers correctly and . For arbitrary , the integrals have to be evaluated numerically. The rate function is plotted for in the top panel of Figure 3. It is symmetric around with a minimum at and grows monotonically from to . To see how behaves near its minimum , we make a perturbation expansion of (18) setting with small. Since for , , we first expand (12) setting . To leading order we get . Inserting this result in (18) followed by straightforward expansion gives (4). Using this expression of in (3), one can then easily compute the variance of

(19) |

which, for , agrees with the asymptotic result in [[3]]. However, the logarithmic growth of the variance is evidently due to the logarithmic singularity in the rate function itself in (4) and the index distribution is strictly not Gaussian near .

We also remark that for it is possible to find an exact formula for the
variance at *finite* [[17]] based on Andrejeff formula and/or
orthogonal polynomials with discontinuous weights
[[18]]. Using this formula we have evaluated
the variance for all finite and found that the leading
growth is precisely in agreement
with the asymptotic result in (19), they
differ only
for subleading terms in (see Fig. 3 (bottom)).

The work presented here can be generalized in several directions. Our method to find explicitly two-support solutions to singular integral equation is quite general and can be applied to other problems. For example, one can compute the distribution of the number of eigenvalues bigger than a fixed value . This is particularly relevant for Wishart matrices that play an important role in multivariate data analysis [[9]]. Our method can also be applied to more exotic ensembles of random matrices that arise in connection with 2-d quantum gravity [[19]]. It would also be interesting to investigate if the logarithmic growth of the variance of the index is universal and holds even for non-Gaussian ensembles.

### References

- M.L. Mehta, Random Matrices (Academic Press, Boston, 1991).
- D.J. Wales, Energy Landscapes: Applications to Clusters, Biomolecules and Glasses (Cambridge University Press, 2004).
- A. Cavagna, J.P. Garrahan and I. Giardina, Phys. Rev. B 61, 3960 (2000).
- M.R. Douglas, JHEP 05, 046 (2003).
- A. Aazami and R. Easther, JCAP03 p. 013 (2006).
- L. Mersini-Houghton, Class. Quant. Grav. 22, 3481 (2005).
- D.S. Dean and S.N. Majumdar, Phys. Rev. Lett. 97, 160201 (2006); Phys. Rev. E 77, 41108 (2008).
- P. Vivo, S.N. Majumdar and O. Bohigas, J. Phys. A: Math. Theor. 40, 4317 (2007).
- S.N. Majumdar and M. Vergassola, Phys. Rev. Lett. 102, 060601 (2009).
- C. Nadal and S.N. Majumdar, Phys. Rev. E 79, 061117 (2009).
- P. Vivo, S.N. Majumdar and O. Bohigas, Phys. Rev. Lett. 101, 216809 (2008).
- A.J. Bray and D.S. Dean, Phys. Rev. Lett. 98, 150201 (2007).
- Y.V. Fyodorov and I. Williams, J. Stat. Phys. 129, 1081 (2007).
- P. Facchi, U. Marzolino, G. Parisi, S. Pascazio and A. Scardicchio, Phys. Rev. Lett. 101, 050502 (2008).
- P. Kazakopoulos, P. Mertikopoulos, A.L. Moustakas and G. Caire, [arXiv:0907.5024] (2009).
- F.G. Tricomi, Integral Equations (Pure Appl. Math V, Interscience, London, 1957).
- details will be published elsewhere.
- Y. Chen and G. Pruessner, J. Phys. A: Math. Gen. 38, L191 (2005).
- P. Di Francesco, P. Ginsparg and J. Zinn-Justin, Phys. Rep. 254, 133 (1995).