A simple derivation of the Tracy-Widom distribution of the maximal eigenvalue of a Gaussian unitary random matrix
In this paper, we first briefly review some recent results on the distribution of the maximal eigenvalue of a random matrix drawn from Gaussian ensembles. Next we focus on the Gaussian Unitary Ensemble (GUE) and by suitably adapting a method of orthogonal polynomials developed by Gross and Matytsin in the context of Yang-Mills theory in two dimensions, we provide a rather simple derivation of the Tracy-Widom law for GUE. Our derivation is based on the elementary asymptotic scaling analysis of a pair of coupled nonlinear recursion relations. As an added bonus, this method also allows us to compute the precise subleading terms describing the right large deviation tail of the maximal eigenvalue distribution. In the Yang-Mills language, these subleading terms correspond to non-perturbative (in expansion) corrections to the two-dimensional partition function in the so called ‘weak’ coupling regime.
Quite a long time ago, Wigner  introduced random matrices in the context of nuclear physics. He suggested that the highly-excited energy levels of complex nuclei can locally be well represented by the eigenvalues of a large random matrix. A big nucleus is a rather complex system composed of many strongly interacting quantum particles and it is practically impossible to describe its spectral properties via first principle calculations. The idea of Wigner was to model the spectral properties of the complex Hamiltonian of such a big nucleus by those of a large random matrix preserving the same symmetry. This was a very successful approach in nuclear physics. Since then, the random matrix theory (RMT) has gone beyond nuclear physics and has found a wide number of applications in various fields of physics and mathematics including quantum chaos, disordered systems, string theory and even number theory . A case of special interest is the one of Gaussian random matrices (originally introduced by Wigner himself) where the entries of the matrix are Gaussian random variables.
Depending on the symmetry of the problem, Dyson distinguished three
for the matrix :
the Gaussian Orthogonal Ensemble (GOE) : is real symmetric.
the Gaussian Unitary Ensemble (GUE) : is complex Hermitian.
the Gaussian Symplectic Ensemble (GSE) : is quaternionic Hermitian.
Let us write the adjoint of , i.e. the transpose of for the GOE, the complex conjugate transpose for the GUE and the quaternionic conjugate transpose for the GSE. A Gaussian random matrix is a self-adjoint matrix , i.e. distributed according to the law
where, for convenience, we have chosen the prefactor of the
to be for the GOE, for the GUE and for the GSE.
For instance, for the GUE we have
This means that is a complex Hermitian matrix with entries
and for that are independent (real) random variables
distributed according to the same centered Gaussian law with variance
and the are (real) independent Gaussian variables with mean and
variance . In case of GSE, there are eigenvalues, each
of them two-fold degenerate and Tr
in (1) for is defined
so that only one of the two fold degenerate eigenvalues in is counted.
where is a normalization constant such that (it depends on ) and the power of the Vandermonde term is called the Dyson index depending on the ensemble (resp. GOE, GUE or GSE). Note that we have chosen the prefactor of term in (1) to be the same as the Dyson index just for convenience. This prefactor is not very important as it can be absorbed by rescaling the matrix entries by a constant factor. In contrast, the value of the Dyson index , or , characterizing the power of the Vandermonde term, plays a crucial role. The normalization constant can be computed using Selberg’s integral : .
Because of the presence of the Vandermonde determinant in Eq. (2), the eigenvalues are strongly correlated random variables, they repel each other. In this paper, our focus is on the statistical properties of the extreme (maximal) eigenvalue . Had the Vandermonde term been not there in the joint distribution (2), the joint distribution would factorize and the eigenvalues would thus be completely independent random variables, each with a Gaussian distribution. For such independent and identically distributed random variables , the extreme value statistics is well understood  and the distribution of the maximum, properly shifted and scaled, belongs to one of the three universality classes Gumbel, Frechet or Weibull (for large ) depending on the tail of the distribution of individual ’s. However, in the case of random matrix theory, the eigenvalues ’s are strongly correlated variables. For strongly correlated random variables there is no general theory for the distribution of the maximum. In case of Gaussian random matrices, where the joint distribution (2) is explicitly known, much progress has been made in understanding the distribution of following the seminal work by Tracy and Widom [6, 7]. This then provides a very useful solvable model for the extreme value distribution in a strongly correlated system and hence is of special interest.
Let us first summarize some known properties of the random variable . Its average value can be easily obtained from the right edge of the well known Wigner semi-circle describing the average density of eigenvalues. For a Gaussian random matrix of large size , the average density of eigenvalues (normalized to unity) has a semi-circular shape on a finite support called the Wigner semi-circle :
The quantity represents the average fraction of eigenvalues that lie within the small interval . Therefore, Eq. (3) means that the eigenvalues of a Gaussian random matrix lie on average within the finite interval . Note also that one can rewrite, using the joint distribution in (2)
Hence the average density of states
can also be interpreted as the marginal
distribution of one of the eigenvalues (say the first one).
Thus, the marginal distribution also
has the shape of a semi-circle.
Figure 1 shows the average density
It then follows that the average value of the maximal eigenvalue is given for large by the upper bound of the density support:
However, fluctuates around this average value from one realization to another and has a distribution around its mean value (see Fig. 1 with ). What is the full probability distribution of ? From the joint distribution of eigenvalues in Eq. (2), it is easy to write down formally the cumulative distribution function (cdf) of as a multiple integral
which can be interpreted as a partition function of a Coulomb gas in presence of a hard wall at the location (see the discussion in Section 2). The question is how does behave for large ? It turns out that the fluctuations of around its mean have two scales for large . While typical fluctuations scale as , large fluctuations scale as and their probability distributions are described by different functional forms (see Fig. 1 with ).
Typical fluctuations: From an asymptotic analysis of the mutilple integral in Eq. (6), Forrester , followed Tracy and Widom [6, 7] deduced that for large , small and typical fluctuations of the maximal eigenvalue around its mean value are of order and can be written as
where (for GOE and GUE) and (GSE) and is a random variable characterizing the typical fluctuations. Tracy and Widom [6, 7] proved that for large , the distribution of is independent of : . The function depends explicitly on and is called the Tracy-Widom distribution. For example, for [6, 7],
where satisfies the special case of of the Painlevé II equation
For , the solution only requires the right tail boundary condition for its unique specification: as , where is the Airy function that satisfies the differential equation and vanishes as, as . This solution of the special case of the Painlevé-II equation is called the Hastings-McLeod solution . For and , one has [6, 7]
Note that is the cumulative probability of the scaled random variable and hence it approaches to as and vanishes to as . The corresponding probability density function (pdf) vanishes as in an asymmetric fashion
Over the last decade or so, the Tracy-Widom distribution has appeared in a wide variety of problems ranging from statistical physics and probability theory all the way to growth models and biological sequence matching problems (for reviews see [10, 11, 12, 13]). These include the longest increasing subsequence or the Ulam problem [14, 15, 10], a wide variety of (1+1)-dimensional growth models [16, 17, 18, 19, 20], directed polymer in random medium  and the continuum Kardar-Parisi-Zhang equation [22, 23, 24, 25], Bernoulli matching problem between two random sequences , nonintersecting Brownian motions (see e.g. [27, 28] and references therein). This distribution has also been measured in a variety of recent experiments, e.g., in the height distribution of fronts generated in paper burning experiment , in turbulent liquid crystals  and more recently in coupled fiber laser systems .
Large deviations: Tracy-Widom distribution describes the probability of typical fluctuations of around its mean (on a scale of ), but not the atypical large fluctuations, i.e., fluctuations of order around the mean value . Questions regarding such large/rare fluctuations do arise in various contexts [32, 33, 34] and have recently been computed [32, 33, 34, 35] to dominant order for large . As a summary, the probability density of , , is given for large by:
where is the Tracy-Widom distribution and where and are respectively the left and right large deviation functions describing the tails of the distribution of . The rate function was explicitly computed in [32, 33], while was computed in , both by simple physical methods exploiting the Coulomb gas analogy. A more complicated, albeit mathematically rigorous, derivation of in the context of spin glass models can be found in . These rate functions read
Note that in ref. , the function was expressed in terms of a complicated hypergeometric function, which however can be reduced to a simple algebraic function as presented above in Eq. (1). Note also that while depends explicitly on , the rate functions and are independent of . These rate functions only give the dominant order for large in the exponential. In other words, the precise meaning of is that for large : for and for . When approaches (from below or above) it is easy to see that the rate functions vanish respectively as
Note that the physics of the left tail [32, 33] is very different from the physics of the right tail . In the former case, the semi-circular charge density of the Coulomb gas is pushed by the hard wall () leading to a reorganization of all the charges that gives rise to an energy difference of [32, 33]. In contrast, for the right tail , the dominant fluctuations are caused by pulling a single charge away (to the right) from the Wigner sea leading to an energy difference of .
The different behaviour of the probability distribution for and leads to a ‘phase transition’ strictly in the limit at the critical point in the following sense. Indeed, if one scales by and takes the limit, one obtains
Note that since can be interpreted as a partition function of a Coulomb gas (see Eq. (6)), its logarithm has the interpretation of a free energy. Since as from below, the -rd derivative of the free energy is discontinuous at the critical point . Hence, this can be interpreted as a third order phase transition.
However, for finite but large , it follows from (14) that the behavior to the left of smoothly crosses over to the behaviour to the right as one varies through its critical point and the Tracy-Widom distribution in (14) around the critical point is precisely this crossover function. Indeed, if one zooms in close to the mean value by setting (for ) in the rate functions and in (14), one expects to recover, by taking large limit, respectively the left and the right tail of the Tracy-Widom distribution. With this scaling, and using (17), one finds and thus , which indeed matches the dominant order in the far right tail of the Tracy-Widom distribution for in (13). Similarly for the left tail (), using (16), one finds , thus which matches the left tail of the Tracy-Widom distribution in (12).
More recently, higher order corrections for large have been computed for the left tail of the distribution  using methods developed in the context of matrix models. Note that in  a different notation for was used: (GOE), (GUE) and (GSE). To avoid confusion, we present below the results in terms of the standard Dyson index .
and (see Eq. (4-35) in Ref. )
where is a complicated function of . For integer, it reduces to 
For instance, for the GUE (), we find . For and , the expression in Eq. (20) matches the left asymptotics of the Tracy-Widom distribution, i.e. the asymptotic behaviour of for , see ref. . However, for the right tail of the distribution of , the corrections to dominant order for large are, to our knowledge, not known until now. In fact, one of the results of this paper is to compute these right tail corrections for the GUE (). Both left and right large deviations are plotted in Fig. 2 for the GUE. The left tail is described by in Eq. (20), the right tail is described by our result given in Eq. (26).
Another result of this paper concerns a simpler and pedestrian derivation of the Tracy-Widom distribution for the GUE case. The original derivation of the Tracy-Widom law for the distribution of typical fluctuations of [6, 7] is somewhat complex as it requires a rather sophisticated and nontrivial asymptotic analysis of the Fredholm determinant involving Airy Kernel [6, 7]. Since this distribution appears in so many different contexts, it is quite natural to ask if there is any other simpler (more elementary) derivation of the Tracy-Widom distribution. In this paper, we provide such a derivation for the GUE case. Our method is based on a suitable modification of a technique of orthogonal polynomials developed by Gross and Matytsin  in the context of two-dimensional Yang-Mills theory. In fact, the partition function of the continuum two-dimensional pure Yang-Mills theory on a sphere (with gauge group ) can be written (up to a prefactor) as a discrete multiple sum over integers [40, 41]
where is the area of the sphere. In the limit, the free energy , as a function of , undergoes a 3rd order phase transition known as the Douglas-Kazakov transition  at the critical value . For , the system is in the ‘strong’ coupling phase while for , it is in the ‘weak’ coupling phase. For finite but large , there is a crossover between the two phases as one passes through the vicinity of the critical point. In the so called double scaling limit (where , but keeping the product fixed), the singular part of the free energy satisfies a Painlevé II equation . Gross and Matytsin (see also ) used a method based on orthogonal polynomials to analyse the partition sum in the double scaling limit, as well as in the weak coupling regime () where they were able to compute non-perturbative (in expansion) corrections to the free energy. Actually, a similar -rd order phase transition from a weak to strong coupling phase in the limit was originally noticed in the lattice formulation (with Wilson action) of the two dimensional gauge theory [44, 45, 14] and in the vicinity of the transition point the singular part of the free energy was shown to satisfy a Painlevé II equation . Note that similar calculations involving the asymptotic analysis of partition functions using orthogonal polynomials were used extensively in the early 90’s to study the double scaling limit of the so called one-matrix model (for a recent review and developments, see e.g. Ref. ).
In our case, for the distribution of , we need to analyse the asymptotic large behaviour of a multiple indefinite integral in Eq. (6), as opposed to the discrete sum in Eq. (25). However, we show that one can suitably modify the orthogonal polynomial method of Gross and Matytsin to analyse the multiple integral in Eq. (6) in the limit of large . In fact, we find a similar third order phase transition (in the limit) in the largest eigenvalue distribution as a function of at the critical point . The regime of left large deviation of () is similar to the ‘strong’ coupling regime of the Yang-Mills theory, while the right large deviation tail of () is similar to the ‘weak’ coupling regime of the Yang-Mills theory. For finite but large , the crossover function across the critical point that connects the left and right large deviation tails is precisely the Tracy-Widom distribution. Thus the Tracy-Widom distribution corresponds precisely to the double scaling limit of the Yang-Mills theory and one finds the same Painlevé II equation. A similar 3rd order phase transition was also found recently in a model of non-intersecting Brownian motions by establishing an exact correspondence between the reunion probability in the Brownian motion model and the partition function in the -d Yang-Mills theory on a sphere [27, 48].
The advantage of this orthogonal polynomial method to analyse the maximum eigenvalue distribution is twofold: (i) one gets the Tracy-Widom distribution in a simple elementary way (basically one carries out a scaling analysis of a pair of nonlinear recursion relations near the critical point and shows that the scaling function satisfies a Painlevé II differential equation) and (ii) as an added bonus, we also obtain precise subleading corrections to the leading right large deviation tail . The subleading corrections, in the Yang-Mills language, correspond to the non-perturbative corrections in the weak coupling regime as derived by Gross and Matytsin . More precisely we show that
where is given in Eq. (1). Note that only the leading behaviour was known before , but the subleading corrections are, to our knowledge, new results. We also verify that our expression matches the precise right asymptotics of the Tracy-Widom distribution. Figure 2 shows the distribution of for the GUE: close to the mean value it is described by the Tracy-Widom distribution, whereas the tails are described by the large deviations. The right tail (right large deviation) is given by our result in Eq. (26). Together with the subleading terms in the left tail in Eq. (20), our new result in Eq. (26) then provides a rather complete picture of the tail behaviors of the distribution of on both sides of the mean .
The rest of the paper is organized as follows. In Section 2, we start with some general notations and scaling remarks for the GUE. In Section 3, we explain the method of orthogonal polynomials on a semi-infinite interval and derive some basic recursion relations. In Section 4, we compute the right tail of the distribution of (dominant order and corrections for the GUE): it describes atypical large fluctuations of to the right of its mean value. In Section 5, using results of the previous sections and basic scaling remarks, we derive the Tracy-Widom law (with for the GUE) that describes small typical fluctuations close to the mean value.
2 Notations and scaling
In the rest of the paper we focus only on Gaussian random matrices drawn from the GUE (). They are Hermitian random matrices such that where we have introduced an additional parameter for the purpose of certain mathematical manipulations that will be clear later. Setting at the end of the calculation, we will recover the usual GUE. With the additional parameter , the joint distribution of the eigenvalues of is given by (see Eq. (2)):
the normalization constant.
The Vandermonde determinant appears with a power as we consider the GUE ().
This determinant indeed comes from a Jacobian for the change of variables from the entries of the matrix to its eigenvalues.
The power is related to the number of real degrees of freedom of an element of the matrix, which is for
complex entries, i.e., for GUE
(it is for real entries GOE and for quaternion entries GSE). As we will see later,
this power is crucial
for the method of orthogonal polynomials to work.
The technique of Gross and Matytsin  that we adapt here also works only for the
Given the joint distribution of eigenvalues in Eq. (27), it is easy to write down the cumulative distribution of the maximal eigenvalue
where the partition function is given by the multiple indefinite integral
Thus and always appear in a single scaling combination .
We will henceforth focus on the large limit. For fixed , one can easily figure out from the joint pdf in Eq. (27) how a typical eigenvalue scales with for large . Let us rewrite the joint distribution of eigenvalues in Eq. (27) as
which can then be interpreted as a Boltzmann weight , with effective energy . The eigenvalues can thus be seen as the positions of charges of a 2D Coulomb gas (but restricted to be on the real line) which repel each other via a logarithmic Coulomb potential (coming from the Vandermonde determinant in Eq. (27)) . In addition, the charges are subjected to an external confining parabolic potential. For large , the first term of the energy is of order , whereas the second is of order because of the double sum. Balancing the two terms gives the scaling of a typical eigenvalue for large : . For large , the eigenvalues are close to each other and they can be described by a continuous charge density (normalized to unity) . The average density of states for large can be obtained by evaluating the full partition function (the denominator in Eq. (28)) via a saddle point method. The saddle point density is the density that minimizes the effective energy (see the book of Mehta ) (in its continuous version). This gives the well-known semi-circle law (which is exactly the same as in Eq. (3) for ):
The density is plotted in figure 1.
The average value of is again given for large by the upper bound of the density support (see Fig. 1):
For , this evidently reduces to the usual expression for . The typical scaling for large is thus . Hence, we will use , where is of order one.
3 Orthogonal polynomials
In this section, we introduce the method of orthogonal polynomials to evaluate the partition function in Eq. (29). As mentioned in the introduction, to evaluate this multiple indefinite integral we will adapt the method developed by Gross and Matytsin  to enumerate the partition sum (25) in the two-dimensional Yang-Mills theory. Orthogonal polynomials are very useful to handle the square Vandermonde determinant in the distribution of the eigenvalues in Eq. (27). A Vandermonde determinant can indeed be written as where is any polynomial of degree with leading coefficient one. The idea is to choose well these polynomials in order to simplify the computation of the integral.
We define an operation on pairs of polynomials as follows:
We consider a family of orthogonal polynomials with respect to the operation defined above, i.e. with weight on the interval . Without any loss of generalization, we define the polynomial of degree such that the coefficient of term is always fixed to be , i.e., . These polynomials satisfy the orthogonality property: for all . We also write . Thus,
Note that and are implicitly functions of and , i.e. and . In particular, we can easily compute by hand the first few ’s for fixed and , but the expressions become rather complex as increases and it is hard to find a closed form expression for for every (except for the limiting case ):
Thus we get for instance
and , etc. In the limit , we recover the Hermite polynomials: , , and , , .
3.1 Partition function
The partition function in Eq. (29) can be written as a function of the ’s. By combination of rows, the Vandermonde determinant in the joint distribution of the eigenvalues can indeed be written
Then, the partition function can be expressed as
where in going from the second to the third line we have used the Cauchy-Binet formula . Note that this step works only for . Therefore the partition function reduces to:
Thus the integral is now expressed as a product of the coefficients . The goal of next subsection is to find recursion relations for the ’s in order to compute them and subsequently analyse their product in Eq. (36) in the large limit.
3.2 Recursion relations
In general, for orthogonal polynomials (with any reasonable weight function), one can write a recursion relation of the form:
where and are real coefficients.
This relation comes from the fact that and that
for any polynomial
of degree strictly smaller than .
The coefficients and are functions of and , i.e.
Let us first demonstrate that the coefficients and can be expressed in terms of ’s.
From Eq. (37), we get:
On the other hand, we have
Therefore , thus .
From Eq. (37) again, we also get: . By definition we have