
# Dyson Brownian Motion for General β and Potential at the Edge

Arka Adhikari, Jiaoyang Huang
###### Abstract

In this paper, we compare the solutions of Dyson Brownian motion with general β and potential V and the associated McKean-Vlasov equation near the edge. Under suitable conditions on the initial data and the potential V, we obtain optimal rigidity estimates of the particle locations near the edge for short time. Our argument uses the method of characteristics along with a careful estimate involving an equation for the edge. With the rigidity estimates as an input, we prove a central limit theorem for mesoscopic statistics near the edge which, as far as we know, is done here for the first time. Additionally, combining with [30], our rigidity estimates are used to give a proof of the local ergodicity of Dyson Brownian motion for general β and potential V at the edge, i.e. the distribution of extreme particles converges to the Tracy-Widom distribution in short time.

Harvard University

Harvard University

E-mail: jiaoyang@math.harvard.edu

## 1 Introduction

Random matrix models were originally suggested by Wigner [41, 40] to model the nuclei of heavy atoms. The models he originally studied, the Gaussian orthogonal/unitary ensembles, were successful in describing the spacing distribution between energy levels. Wigner conjectured that general random matrices would have the same spacing distribution as long as they are in the same symmetry class.

Later, in 1962 [10], Dyson interpreted the Gaussian orthogonal/unitary ensembles as the dynamical limit of the matrix valued Brownian motion, which is given by

 \mathrm{d}H(t)=\mathrm{d}B(t)-\frac{1}{2}H(t)\,\mathrm{d}t, \qquad (1.1)

where B(t) is a Brownian motion on real symmetric/complex Hermitian matrices. It turns out that the eigenvalues of the above matrix valued Brownian motion satisfy a system of stochastic differential equations. These equations were later generalized to the following stochastic differential equations, called the β-Dyson Brownian motion with potential V,

 \mathrm{d}\lambda_i(t)=\sqrt{\frac{2}{\beta N}}\,\mathrm{d}B_i(t)+\frac{1}{N}\sum_{j\neq i}\frac{\mathrm{d}t}{\lambda_i(t)-\lambda_j(t)}-\frac{1}{2}V'(\lambda_i(t))\,\mathrm{d}t,\quad 1\leqslant i\leqslant N, \qquad (1.2)

where the initial data lies in the closure of the Weyl chamber

 \triangle_N:=\{(x_1,x_2,\cdots,x_N): x_1<x_2<\cdots<x_N\}.

The real symmetric and complex Hermitian matrix valued Brownian motions correspond to (1.2) with β = 1 and β = 2 respectively, and quadratic potential V(x) = x²/2.
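As a concrete illustration, the dynamics (1.2) can be simulated directly. The following sketch (our illustration, not part of the analysis in this paper) runs an Euler-Maruyama discretization of the β-Dyson Brownian motion with quadratic potential V(x) = x²/2; for β = 2 the empirical spectral density should approach the semicircle law on [−2, 2] at times of order one.

```python
import numpy as np

def dyson_bm(N=200, beta=2.0, T=1.0, steps=4000, seed=0):
    """Euler-Maruyama sketch of the beta-Dyson Brownian motion (1.2)
    with quadratic potential V(x) = x^2/2, i.e. V'(x) = x."""
    rng = np.random.default_rng(seed)
    lam = np.linspace(-1.0, 1.0, N)        # ordered initial data in the Weyl chamber
    dt = T / steps
    for _ in range(steps):
        diff = lam[:, None] - lam[None, :]
        np.fill_diagonal(diff, np.inf)      # drop the i = j term from the sum
        interaction = (1.0 / diff).sum(axis=1) / N
        noise = rng.normal(size=N) * np.sqrt(2.0 * dt / (beta * N))
        lam = lam + noise + (interaction - lam / 2.0) * dt
        lam.sort()                          # guard the ordering under discretization
    return lam

lam = dyson_bm()
```

The sorting step is a numerical guard: the continuous dynamics almost surely preserves the ordering, but a finite time step can let neighboring particles cross.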

Dyson suggested that, by evolving a random system stochastically towards one of the standard Gaussian matrix models of the appropriate symmetry class, the microscopic statistics would reach equilibrium already on times of order 1/N. In fact, one has a dichotomy of three time scales:

1. For time t of order 1, one should get the global equilibrium, e.g., the global spectral density should approach that of the corresponding Gaussian ensembles. For Dyson Brownian motion with general β and potential V, this was studied in [33].

2. On scales of order N^{−a}, one should reach equilibrium after running Dyson Brownian motion for time t ≫ N^{−a}. Namely, mesoscopic linear statistics for appropriate test functions on this scale should be universal.

3. For the microscopic scale, i.e. the scale of order 1/N, and β = 2, the microscopic eigenvalue distribution should be the same as that of the determinantal point process with the Sine kernel, provided one runs Dyson Brownian motion for time t ≫ 1/N.

The understanding of the local ergodicity of Dyson Brownian motion, i.e. the fact that the local statistics of Dyson Brownian motion reach an equilibrium in short time, plays an important role in the proof of Wigner's original universality conjecture by Erdős, Schlein and Yau [21]. Their method to prove universality for matrix models first involves proving a rigidity estimate for the eigenvalues, i.e. that the eigenvalues are close to their classical locations, up to an optimal scale. This provides the initial data for a Dyson Brownian motion which interpolates between the initial model and the Gaussian orthogonal/unitary ensembles. The second step is to show that Dyson Brownian motion reaches an equilibrium in a short time for local statistics. Since Dyson Brownian motion needs only to be run for a short time, the initial and final models can be compared. The last step compares the original random matrices with ones with a small Gaussian component. For a good review of the general framework regarding this type of analysis, one can read the book by Erdős and Yau [12].

Of the three steps described above, the least robust is the proof of the rigidity estimates. This part is very model specific and, depending on the model in question, requires significant effort to prove optimal estimates. Even in the most basic case of Wigner matrices, the concentration of the trace of the resolvent requires very precise cancellations in the form of what is known as the fluctuation averaging lemma [24]. The proof of this type of cancellation uses very delicate combinatorial expansions involving iterated applications of the resolvent identity. For models more complicated than Wigner matrices, such lemmas require an intricate effort. A more general method that does not involve delving into the particulars of a model would be desirable; then we would be able to treat a general class of models uniformly.

A dynamical approach to proving rigidity using Dyson Brownian motion allows us to avoid technical issues relating to the particulars of a matrix model. This allows us to avoid complicated combinatorial analysis and, in addition, to treat models that do not come with an associated matrix structure, such as the β-ensembles. In an earlier paper by B. Landon and the second author [27], the rigidity estimates for the bulk eigenvalues of Dyson Brownian motion were proven. As a result, the optimal rigidity estimates are purely a consequence of the dynamics. The proof of rigidity is based on a comparison between the empirical eigenvalue process of Dyson Brownian motion and the deterministic measure valued process obtained as the solution of the associated McKean-Vlasov equation, by using the method of characteristics. The difference between the corresponding Stieltjes transforms can be analyzed by estimates of Gronwall type.

There are substantial difficulties involved in comparing the solutions of Dyson Brownian motion and the associated McKean-Vlasov equation near the edge. In the bulk, one can derive sufficiently strong estimates by looking at the distance from the characteristics to the real line, thanks to the fact that we have strong bounds on the imaginary part of the Stieltjes transform in the bulk. Near the spectral edge, these bounds degenerate and become too weak to prove optimal rigidity. In our case, we have to establish an equation determining the relative movement of our characteristics with respect to the edge. The estimates of the Stieltjes transform of the empirical particle density near the edge depend heavily on this relative movement. The equation for the edge allows us to understand explicitly how the eigenvalues move from their initial positions to the optimal region.

In addition to the rigidity estimates, another main innovation in this paper is the determination of the correlation kernel for the Stieltjes transform of the empirical particle density of Dyson Brownian motion at mesoscopic scales near the edge. It allows us to prove a mesoscopic central limit theorem near the edge. The mesoscopic central limit theorem in the bulk for Wigner matrices was proven in [6, 7, 35, 25], for β-ensembles in [35], and for Dyson Brownian motion in [27, 9, 28]. As far as we know, the mesoscopic central limit theorem near the edge is new even for Wigner matrices and β-ensembles. The dynamical method provides a unified approach to see how it emerges naturally, and allows us to see the universality of this correlation kernel.

Combining with [30], our rigidity estimates are used to give a proof of the local ergodicity of Dyson Brownian motion for general β and potential V at the edge, i.e. the distribution of extreme particles converges to the Tracy-Widom distribution in short time. Our proof uses only the dynamics and is independent of the matrix models. This is in alignment with Dyson's original vision of the universality of local eigenvalue statistics. A consequence of our edge universality result is a purely dynamical proof of the edge universality of β-ensembles with general potential.

### 1.1 Related Results in the Literature

Results for the McKean-Vlasov equation were first established by Chan [8] and Rogers-Shi [36], who showed the existence of a solution for quadratic potentials. The McKean-Vlasov equation for general potentials was studied in detail in the works of Li, Li and Xie. In the works [33] and [34], it was shown that under very weak conditions on V, the solution of the McKean-Vlasov equation converges, at times of order 1, to an equilibrium distribution that depends on the parameters β and V. The authors were able to interpret the time evolution under the McKean-Vlasov equation as a form of gradient descent on the space of measures. This gives a complete description of Dyson Brownian motion at the macroscopic scale.

For the microscopic scale, Dyson Brownian motion was studied in detail by Erdős, Yau and various coauthors across a multitude of papers [14, 15, 16, 17, 18, 19, 20, 22, 4, 24]. Specifically, from these works, it is known that for the classical ensembles with quadratic potential, with initial data given by the eigenvalues of a Wigner matrix, after a short time the local statistics of the particles are the same as those of the corresponding classical Gaussian ensembles. After this, the two works [13, 29] established gap universality for the classical Dyson Brownian motion with general initial data, by using estimates from a discrete De Giorgi-Nash-Moser theorem in [22]. Fixed energy universality required a sophisticated homogenization argument that allowed the comparison between the discrete equation and a continuous version; these results were established in the recent papers [3, 28]. An extension of this interpolation to the edge was shown in [30]. These results were a key step in the proof of edge and bulk universality in various models. An alternative approach to universality was developed, independently, in the works of Tao and Vu [38].

In the three-step strategy for proving universality, as developed by Erdős, Yau and their collaborators, the first step is to derive a local law for the eigenvalue density. This is a very technical and highly model dependent procedure. In the case of Wigner matrices, the proofs were established in [18, 19, 23, 24, 39]. Local laws can be established for other matrix models in the bulk, such as the cases of sparse random matrices [14] and deformed Wigner matrices [31]. Establishing local laws near the edge is generally more involved; the case of correlated matrices was treated in [1, 2, 11]. Local laws for β-ensembles near the edge were considered in [5], with the discrete analogue in [26]; Wigner matrices were considered in [32].

## 2 Background

In this section, we provide the basic definitions and assumptions in our study of the β-Dyson Brownian motion and the associated McKean-Vlasov equation. This section culminates in the analysis of solutions of the McKean-Vlasov equation via the method of characteristics and the proof of various important inequalities on the growth of the solution in time and the behavior of its characteristics. These bounds provide the basis for our later estimates on the rigidity of the β-Dyson Brownian motion near the edge. To keep the argument clean, we make the following assumption on the potential V. We believe the main results in this paper hold for potentials of lower regularity, as in [27].

###### Assumption 2.1.

We assume that the potential V is an analytic function.

We consider the space of probability measures on ℝ, equipped with the weak topology, and fix a sufficiently small time T; the empirical particle density of (1.2) is then a continuous measure valued process on [0, T]. It follows from [33] that for all β ⩾ 1 and initial data λ(0) ∈ △_N, there exists a strong solution to the stochastic differential equation (1.2).

We recall the following estimate on the locations of the extreme particles of the β-Dyson Brownian motion from [27, Proposition 2.5].

###### Proposition 2.2.

Suppose V satisfies Assumption 2.1 and let β ⩾ 1. Let a be a constant such that the initial data satisfies λ(0) ∈ [−a, a]^N. Then for a sufficiently small time T, there exists a finite constant b = b(a, T) such that for any 0 ⩽ t ⩽ T, the unique strong solution of (1.2) satisfies:

 \mathbb{P}\left(\max\{|\lambda_1(t)|,|\lambda_N(t)|\}\geqslant b\right)\leqslant e^{-N}. \qquad (2.1)

Given a probability measure ρ̂_0, we define the measure valued process ρ̂_t through its Stieltjes transform m̂_t(z) = ∫ dρ̂_t(x)/(x − z), which solves the following equation

 \partial_t \hat m_t(z)=\partial_z \hat m_t(z)\left(\hat m_t(z)+\frac{V'(z)}{2}\right)+\hat m_t(z)\frac{V''(z)}{2}+\int_{\mathbb{R}} g(z,x)\,\mathrm{d}\hat\rho_t(x), \qquad (2.2)

where

 g(z,x):=\frac{V'(x)-V'(z)-(x-z)V''(z)}{2(x-z)^2},\qquad g(x,x):=\frac{V'''(x)}{4}. \qquad (2.3)

It is easy to see that for any fixed x ∈ ℝ, g(z, x) is analytic as a function of z; and for any fixed z, g(z, x) is analytic as a function of x.

We analyze (2.2) by the method of characteristics. Let

 \partial_t z_t(u)=-\hat m_t(z_t(u))-\frac{V'(z_t(u))}{2},\qquad z_0(u)=u\in\mathbb{C}_+. \qquad (2.4)

If the context is clear, we omit the parameter u, i.e., we simply write z_t instead of z_t(u). Plugging (2.4) into (2.2) and applying the chain rule, we obtain

 \partial_t \hat m_t(z_t(u))=\hat m_t(z_t(u))\frac{V''(z_t(u))}{2}+\int_{\mathbb{R}} g(z_t(u),x)\,\mathrm{d}\hat\rho_t(x). \qquad (2.5)

The behaviors of m̂_t and z_t are governed by the system of equations (2.4) and (2.5).

As a consequence of Proposition 2.2, if the probability measure ρ̂_0 is supported on [−a, a], then there exists a finite constant b such that the measures ρ̂_t are supported on [−b, b] for 0 ⩽ t ⩽ T. We fix a large constant r. Then for 0 ⩽ t ⩽ T,

 |\partial_t z_t|\leqslant 1+\frac{1}{2}\sup_{z\in B_{2r}(0)}|V'(z)|. \qquad (2.6)

Therefore, for any u ∈ B_r(0), we have z_t(u) ∈ B_{2r}(0) for any 0 ⩽ t ⩽ T, provided T is small enough.
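To make the characteristic flow (2.4) concrete, consider the quadratic potential V(x) = x²/2, for which the semicircle law is the stationary solution of (2.2) (indeed g ≡ 0 for quadratic V), so that m̂_t = m_sc for all t. The following sketch (our illustration; the function names are ours) integrates (2.4) numerically: the edge E = 2 is a fixed point of the flow, since −m_sc(2) − V'(2)/2 = 1 − 1 = 0, while characteristics started in the bulk drift toward the real axis.

```python
import numpy as np

def m_sc(z):
    # Stieltjes transform of the semicircle law; this branch has Im m_sc > 0 on C_+
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

def characteristic(u, t=0.5, steps=5000):
    """Forward Euler for the characteristic ODE (2.4) with V'(z) = z,
    along which m_hat_t = m_sc is constant in time."""
    z, dt = complex(u), t / steps
    for _ in range(steps):
        z = z + (-m_sc(z) - z / 2) * dt
    return z

z_edge = characteristic(2 + 0j)     # the spectral edge is a fixed point of the flow
z_bulk = characteristic(1 + 1j)     # a bulk characteristic moves toward the real line
```

Writing the square root as √(z−2)·√(z+2) with the principal branch of each factor selects the branch with m_sc(z) → 0 as |z| → ∞ and Im m_sc > 0 on the upper half plane.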

We also frequently use the following estimates on the imaginary parts of the characteristics. They were proven in [27, Proposition 2.7].

###### Proposition 2.3.

Suppose V satisfies Assumption 2.1 and let β ⩾ 1. Fix a large constant r. Then for a sufficiently small time T, there exists a constant C depending on the potential V and r, such that the following holds. Fix any u with |u| ⩽ r and any 0 ⩽ s ⩽ t ⩽ T. Then

 e^{-C(t-s)}\operatorname{Im}[z_t]\leqslant \operatorname{Im}[z_s],
 e^{-C(t-s)}\operatorname{Im}[\hat m_t(z_t)]\leqslant \operatorname{Im}[\hat m_s(z_s)]\leqslant e^{C(t-s)}\operatorname{Im}[\hat m_t(z_t)],
 e^{-C(t-s)}\left(\operatorname{Im}[z_t]+(t-s)\operatorname{Im}[\hat m_t(z_t)]\right)\leqslant \operatorname{Im}[z_s]\leqslant e^{C(t-s)}\left(\operatorname{Im}[z_t]+(t-s)\operatorname{Im}[\hat m_t(z_t)]\right),
 e^{-C(t-s)}\left(\operatorname{Im}[z_t]\operatorname{Im}[\hat m_t(z_t)]+(t-s)\operatorname{Im}[\hat m_t(z_t)]^2\right)\leqslant \operatorname{Im}[z_s]\operatorname{Im}[\hat m_s(z_s)]. \qquad (2.7)

## 3 Square Root Behavior Measures

In the earlier work [27], the bulk rigidity of the β-Dyson Brownian motion was proved via a comparison of the empirical density with the solution of the associated McKean-Vlasov equation with the empirical measure ρ_0 as initial data. This is not a good choice for studying the spectral edge. In most applications, we take ρ_0 to be the empirical eigenvalue density of a random matrix, which is itself random. As a consequence, the solution of the associated McKean-Vlasov equation with initial data ρ_0 is again a random measure. Even if we have good control on the difference between the empirical density and this solution, it does not tell us the locations of the extreme eigenvalues, unless we have very precise control of the solution itself. Unfortunately, edge universality concerns exactly the locations of the extreme eigenvalues.

In order to circumvent this problem, we compare the empirical density with ρ̂_t, the solution of the associated McKean-Vlasov equation with a deterministic initial data ρ̂_0 close to ρ_0. In most applications, we take ρ̂_0 to be either the semi-circle distribution,

 \rho_{\mathrm{sc}}(x)=\frac{\sqrt{[4-x^2]_+}}{2\pi}, \qquad (3.1)

or the Kesten-McKay distribution,

 \rho_d(x)=\left(1+\frac{1}{d-1}-\frac{x^2}{d}\right)^{-1}\frac{\sqrt{[4-x^2]_+}}{2\pi}. \qquad (3.2)

As one can see from the expressions of the semi-circle distribution (3.1) and the Kesten-McKay distribution (3.2), they both have square root behavior at the spectral edge. It is believed that square root behavior is necessary for edge universality. For the remainder of the paper, we assume that the initial measure ρ̂_0 has square root behavior in the following sense.

###### Definition 3.1.

We say a probability measure ρ̂_0 has square root behavior at the right edge E_0 if the measure is supported on (−∞, E_0] and, in addition, there is some neighborhood of E_0 on which its Stieltjes transform satisfies

 \hat m_0(z)=A_0(z)+\sqrt{B_0(z)}, \qquad (3.3)

with A_0(z) and B_0(z) analytic in this neighborhood and with E_0 a simple root of B_0(z).

###### Remark 3.2.

If ρ̂_0 has square root behavior at the right edge E_0, then for any z = E_0 + κ + iη in a small neighborhood of E_0, with η > 0, it is easy to check that

 \operatorname{Im}[\hat m_0(z)]\asymp \operatorname{Im}[\sqrt{z-E_0}]\asymp\begin{cases}\sqrt{|\kappa|+\eta}, & \kappa\leqslant 0,\\ \eta/\sqrt{|\kappa|+\eta}, & \kappa\geqslant 0.\end{cases} \qquad (3.4)

The Stieltjes transforms of the semi-circle distribution and the Kesten-McKay distribution are given by

 m_{\mathrm{sc}}(z)=\int_{\mathbb{R}}\frac{\rho_{\mathrm{sc}}(x)\,\mathrm{d}x}{x-z}=-\frac{z}{2}+\frac{\sqrt{z^2-4}}{2},\qquad m_d(z)=\int_{\mathbb{R}}\frac{\rho_d(x)\,\mathrm{d}x}{x-z}=\left(1+\frac{1}{d-1}-\frac{z^2}{d}\right)^{-1}\left(-\frac{(d-2)z}{2d}+\frac{\sqrt{z^2-4}}{2}\right). \qquad (3.5)

They both have square root behavior in the sense of Definition 3.1. More generally, we have the following proposition.
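These formulas are easy to sanity-check numerically. The sketch below (ours) verifies that m_sc solves the self-consistent equation m² + zm + 1 = 0, and exhibits the two regimes of the square root asymptotics (3.4) at the right edge E_0 = 2.

```python
import numpy as np

def m_sc(z):
    # branch of (-z + sqrt(z^2 - 4))/2 with Im m_sc > 0 on the upper half plane
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

# self-consistent equation of the semicircle law: m^2 + z*m + 1 = 0
z = 1.3 + 0.7j
residual = m_sc(z) ** 2 + z * m_sc(z) + 1

# square root behavior (3.4) at the right edge E_0 = 2, with eta << kappa:
eta, kappa = 1e-6, 1e-2
im_inside = m_sc(2 - kappa + 1j * eta).imag    # kappa <= 0: Im m ~ sqrt(|kappa|)
im_outside = m_sc(2 + kappa + 1j * eta).imag   # kappa >= 0: Im m ~ eta / sqrt(kappa)
```

With these parameters, im_inside is of order √κ = 0.1 while im_outside is of order η/√κ = 5·10⁻⁶, matching the two cases of (3.4).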

###### Proposition 3.3.

If ρ̂_0 has an analytic density in a small neighborhood of E_0, given by

 \hat\rho_0(x)=S(x)\sqrt{[E_0-x]_+},\qquad E_0-\varepsilon\leqslant x\leqslant E_0+\varepsilon, \qquad (3.6)

where S(x) is analytic and positive on [E_0 − ε, E_0 + ε], then ρ̂_0 has square root behavior at E_0 in the sense of Definition 3.1.

One important consequence of our definition of square root behavior is the following proposition, which shows that square root behavior is a property that propagates in time when solving the McKean-Vlasov equation. We postpone its proof to Appendix A.

###### Proposition 3.4.

Let ρ̂_0 be a probability measure which has square root behavior at the right edge E_0 in the sense of Definition 3.1. Fix a sufficiently small time T, and let ρ̂_t be the solution of the McKean-Vlasov equation (2.2) with initial data ρ̂_0. Then the measures ρ̂_t have square root behavior at the right edge E_t, for any 0 ⩽ t ⩽ T. The edge E_t satisfies

 \partial_t E_t=-\hat m_t(E_t)-\frac{V'(E_t)}{2}, \qquad (3.7)

and it is Lipschitz in time for 0 ⩽ t ⩽ T. As a consequence, ρ̂_t has a density in a neighborhood of E_t, given by

 \hat\rho_t(x)=(1+\mathrm{o}(1))\,C_t\sqrt{[E_t-x]_+},\qquad E_t-\varepsilon\leqslant x\leqslant E_t+\varepsilon, \qquad (3.8)

where the o(1) is as x → E_t. The constants C_t ≍ 1 are Lipschitz in time for 0 ⩽ t ⩽ T.

The following proposition studies the growth of the distance κ_t from the real part of the characteristics to the edge E_t. This is the main proposition we use to give strong bounds on the Stieltjes transform close to the edge, and it serves as one of our fundamental inequalities in the next section. The square root behavior of the measures ρ̂_t is used essentially to derive an equation for the growth of κ_t and to provide estimates for the Stieltjes transform.

###### Proposition 3.5.

Let ρ̂_0 be a probability measure having square root behavior in the sense of Definition 3.1. Fix a sufficiently small time T, and let ρ̂_t be the solution of the McKean-Vlasov equation (2.2) with initial data ρ̂_0. If at some time 0 ⩽ t ⩽ T the characteristic z_t(u) satisfies κ_t := Re[z_t(u)] − E_t ⩾ 0, with κ_t and η_t := Im[z_t(u)] sufficiently small, then there exists a universal constant C such that for any 0 ⩽ s ⩽ t,

 \sqrt{\kappa_s}\geqslant \sqrt{\kappa_t}+C(t-s). \qquad (3.9)
###### Proof.

We write κ_s = Re[z_s(u)] − E_s and η_s = Im[z_s(u)] for 0 ⩽ s ⩽ t. Thanks to (2.4) and Proposition 3.4, there exists some universal constant C_0 such that |∂_s κ_s| ⩽ C_0; if we take T sufficiently small, κ_s and η_s remain small for any 0 ⩽ s ⩽ t. In the following we prove that if κ_s ⩾ 0, then ∂_s κ_s ⩽ −C√κ_s for some universal constant C. In particular κ_s is decreasing in s, so κ_s ⩾ κ_t ⩾ 0 for all 0 ⩽ s ⩽ t, and the claim (3.9) follows by integrating ∂_s √κ_s ⩽ −C/2 from s to t.

We recall the differential equation (3.7) for the edge

 \partial_s E_s=-\hat m_s(E_s)-\frac{V'(E_s)}{2}. \qquad (3.10)

We take the real part of (2.4) and subtract (3.10):

 \partial_s\kappa_s=-\left(\operatorname{Re}[\hat m_s(z_s(u))]-\hat m_s(E_s)\right)-\frac{\operatorname{Re}[V'(z_s(u))]-V'(E_s)}{2}. \qquad (3.11)

For the first term in (3.11)

 \operatorname{Re}[\hat m_s(z_s(u))]-\hat m_s(E_s)=\left(\operatorname{Re}[\hat m_s(z_s(u))]-\hat m_s(\operatorname{Re}[z_s(u)])\right)+\left(\hat m_s(\operatorname{Re}[z_s(u)])-\hat m_s(E_s)\right)
 =\eta_s^2\int\frac{\mathrm{d}\hat\rho_s(x)}{(\operatorname{Re}[z_s(u)]-x)\left((\operatorname{Re}[z_s(u)]-x)^2+\eta_s^2\right)}+\kappa_s\int\frac{\mathrm{d}\hat\rho_s(x)}{(\operatorname{Re}[z_s(u)]-x)(E_s-x)}. \qquad (3.12)

The purpose of the above decomposition is to write out the expressions for the Stieltjes transform in a way that makes the corresponding integral expressions easy to compare. From the integral expressions, we can compute the leading order behavior in terms of κ_s and η_s in order to get an equation. Thanks to Proposition 3.4, ρ̂_s has square root behavior. From Remark 3.2, we have Im[m̂_s(z)] ≍ Im[√(z − E_s)] in a neighborhood of E_s, and we can estimate (3.12):

 \operatorname{Re}[\hat m_s(z_s(u))]-\hat m_s(E_s)\geqslant C\left(\frac{\eta_s^2}{(\kappa_s+\eta_s)^{3/2}}+\sqrt{\kappa_s}\right), \qquad (3.13)

where C is some universal constant. For the second term in (3.11),

 \operatorname{Re}[V'(z_s(u))]-V'(E_s)=\left(\operatorname{Re}[V'(z_s(u))]-V'(\operatorname{Re}[z_s(u)])\right)+\left(V'(\operatorname{Re}[z_s(u)])-V'(E_s)\right)\geqslant -C(\eta_s^2+\kappa_s). \qquad (3.14)

Uniformly for 0 ⩽ s ⩽ t, we have η_s² + κ_s = O(√κ_s), since κ_s and η_s are small. By taking T sufficiently small, it follows by combining (3.13) and (3.14) that there exists some constant C such that

 \partial_s\kappa_s\leqslant -C\sqrt{\kappa_s}, \qquad (3.15)

and the claim (3.9) follows.
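The mechanism behind (3.9) can be checked on the model equation alone. The sketch below (ours, with arbitrary illustrative values) integrates ∂_s κ_s = −C√κ_s forward by Euler's method; for this model equation √κ is exactly linear in s, so the conclusion √κ_s ⩾ √κ_t + (C/2)(t − s) is saturated up to discretization error.

```python
import numpy as np

# Model ODE from the proof: d(kappa)/ds = -C*sqrt(kappa_s), integrated forward.
C, kappa0, t, steps = 1.0, 0.04, 0.2, 20000
dt = t / steps
kappa = np.empty(steps + 1)
kappa[0] = kappa0
for n in range(steps):
    kappa[n + 1] = kappa[n] - C * np.sqrt(kappa[n]) * dt
s = np.linspace(0.0, t, steps + 1)

# Integrating d(sqrt(kappa))/ds <= -C/2 from s to t gives (3.9) with constant C/2:
lhs = np.sqrt(kappa) - np.sqrt(kappa[-1])   # sqrt(kappa_s) - sqrt(kappa_t)
rhs = (C / 2) * (t - s)
```

The parameters are chosen so that √κ stays positive on [0, t]; once κ reaches zero the model equation no longer applies.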

## 4 Rigidity Estimates

We prove our edge rigidity estimates in this section. Roughly speaking, if the initial data is regular on the scale η*, then the optimal rigidity holds for times t ≫ √η*, provided N is large enough. We fix a scale η* and a control parameter M.

###### Assumption 4.1.

Let M ⩾ N^𝔠 for some constant 𝔠 > 0, and let η* be a scale with M²N^{−2/3} ⩽ η* ⩽ 1. We assume that the initial empirical density

 \rho_0=\frac{1}{N}\sum_{i=1}^N\delta_{\lambda_i(0)} \qquad (4.1)

satisfies

1. The initial data is compactly supported: supp ρ_0 ⊂ [−a, a] for some constant a.

2. There exists a measure ρ̂_0 with square root behavior at E_0 as in Definition 3.1, such that we have the estimate

 |m_0(z)-\hat m_0(z)|\leqslant \frac{M}{N\eta},\qquad z\in\mathcal{D}_0^{\mathrm{in}}, \qquad (4.2)

and

 |m_0(z)-\hat m_0(z)|\leqslant \frac{1}{M}\frac{1}{N\eta},\qquad z\in\mathcal{D}_0^{\mathrm{out}}, \qquad (4.3)

and

 |m_0(z)-\hat m_0(z)|\leqslant \frac{M}{N},\qquad z\in\mathcal{D}_0^{\mathrm{far}}, \qquad (4.4)

where η = Im[z], m_0 and m̂_0 are the Stieltjes transforms of ρ_0 and ρ̂_0 respectively, and the domains D_0^in, D_0^out and D_0^far are given by

 \mathcal{D}_0^{\mathrm{in}}:=\{z\in\mathbb{C}_+\cap B_{E_0}(r):\operatorname{Im}[z]\operatorname{Im}[\hat m_0(z)]\geqslant (\eta^*)^{3/2}\},\quad
 \mathcal{D}_0^{\mathrm{out}}:=\{z\in\mathbb{C}_+\cap B_{E_0}(r):\operatorname{Re}[z]\geqslant E_0+\eta^*\},\quad
 \mathcal{D}_0^{\mathrm{far}}:=\{z\in\mathbb{C}_+:\ r-1\leqslant \operatorname{dist}(z,\operatorname{supp}\hat\rho_0)\leqslant r+1\}, \qquad (4.5)

where r is the large constant from (2.6), and B_{E_0}(r) is the radius r disk centered at E_0.

###### Remark 4.2.

We remark here that it is essential to control the difference of m_0 and m̂_0 far away from the support of ρ̂_0, i.e. on D_0^far. The effect of the potential is to cause a long range interaction that will make two solutions diverge if we have no control in this region. To see this effect, one should notice that if we compare the linear statistics of two measures over time, the difference changes by no more than a constant factor.

We define the following function

 f(t)=\left(\max\left\{\sqrt{\eta^*}-ct,\; MN^{-1/3}\right\}\right)^2, \qquad (4.6)

where the small constant c will be chosen later. It holds that f(0) = η*, and f has similar behavior to the real part of the characteristics as in (3.9), i.e. it satisfies √f(s) ⩽ √f(t) + c(t − s) for any 0 ⩽ s ⩽ t. We use this function to interpolate from the weak eigenvalue rigidity at the edge at time 0 to the improved eigenvalue rigidity at later times.
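The claimed property of f is elementary: √f(s) = max{√η* − cs, MN^{−1/3}}, and a maximum of functions each growing at most linearly backward in time obeys the same linear bound. A quick numerical check (our sketch, with arbitrary illustrative parameters):

```python
import numpy as np

N, M, eta_star, c = 10**6, 5.0, 1e-2, 1.0   # illustrative values only
f = lambda t: max(np.sqrt(eta_star) - c * t, M * N ** (-1 / 3)) ** 2

ts = np.linspace(0.0, 0.2, 101)
gaps = [np.sqrt(f(s)) - np.sqrt(f(t)) - c * (t - s)
        for i, s in enumerate(ts) for t in ts[i:]]   # should all be <= 0
```

Note that f(0) = η* requires √η* ⩾ MN^{−1/3}, which the chosen parameters satisfy.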

###### Theorem 4.3.

Suppose V satisfies Assumption 2.1 and the initial data satisfies Assumption 4.1. For times t with √η* ⩽ ct ⩽ cT, with high probability under the Dyson Brownian motion (1.2), the optimal rigidity holds: the particles near the edge are within distance of order MN^{−2/3} of their classical locations with respect to ρ̂_t.

We define the spectral domains D_t. Roughly speaking, the information on the Stieltjes transform on D_t reflects the regularity of the empirical particle density on the scale f(t).

###### Definition 4.4.

For any 0 ⩽ t ⩽ T, we define the region D_t := D_t^in ∪ D_t^out ∪ D_t^far, where

 \mathcal{D}_t^{\mathrm{in}}:=\{z\in\mathbb{C}_+\cap B_{E_t}(r-t/c):\operatorname{Im}[z]\operatorname{Im}[\hat m_t(z)]\geqslant f(t)^{3/2}\},\quad
 \mathcal{D}_t^{\mathrm{out}}:=\{z\in\mathbb{C}_+\cap B_{E_t}(r-t/c):\operatorname{Re}[z]\geqslant E_t+f(t)\},\quad
 \mathcal{D}_t^{\mathrm{far}}:=\{z\in\mathbb{C}_+:\ r-1+t/c\leqslant \operatorname{dist}(z,\operatorname{supp}\hat\rho_t)\leqslant r+1-t/c\}. \qquad (4.7)

For any 0 ⩽ s ⩽ t ⩽ T, the image of the spectral domain D_t under the characteristic flow is a subset of the domain D_s.

###### Proposition 4.5.

Suppose V satisfies Assumption 2.1 and the initial data satisfies Assumption 4.1. For any 0 ⩽ s ⩽ t ⩽ T, we have

 z_s\circ z_t^{-1}(\mathcal{D}_t)\subset \mathcal{D}_s, \qquad (4.8)

provided N is large enough.

###### Proof.

By integrating (2.6), we get that |z_t(u) − z_s(u)| ⩽ (t − s)/c for any 0 ⩽ s ⩽ t ⩽ T, provided that c is small enough. It follows that z_s ∘ z_t^{−1}(D_t^far) ⊂ D_s^far, and z_s ∘ z_t^{−1}(B_{E_t}(r − t/c)) ⊂ B_{E_s}(r − s/c).

For any z_t ∈ D_t^out, let κ_s = Re[z_s] − E_s for 0 ⩽ s ⩽ t, so that κ_t ⩾ f(t). By the definition of f in (4.6), we have

 \sqrt{f(s)}\leqslant c(t-s)+\sqrt{f(t)}\leqslant c(t-s)+\sqrt{\kappa_t}\leqslant \sqrt{\kappa_s}-c(t-s), \qquad (4.9)

provided that 2c ⩽ C, where C is the constant in (3.9). We can rearrange (4.9) to get

 \kappa_s\geqslant f(s)+c\sqrt{\kappa_s}(t-s). \qquad (4.10)

As a consequence, we have κ_s ⩾ f(s) and z_s ∘ z_t^{−1}(D_t^out) ⊂ D_s^out.

Thanks to Proposition 2.3, if z_t ∈ D_t^in and we set z_0 = z_0 ∘ z_t^{−1}(z_t), we have

 \operatorname{Im}[z_0]\operatorname{Im}[\hat m_0(z_0)]\geqslant e^{-tC}\left(\operatorname{Im}[z_t]\operatorname{Im}[\hat m_t(z_t)]+t\operatorname{Im}[\hat m_t(z_t)]^2\right)\geqslant e^{-tC}\left(f(t)^{3/2}+t\operatorname{Im}[\hat m_t(z_t)]^2\right)\geqslant (\eta^*)^{3/2}, \qquad (4.11)

provided either f(t)^{3/2} or t Im[m̂_t(z_t)]² dominates e^{tC}(η*)^{3/2}. In the remaining case, in fact we still have Im[z_0]Im[m̂_0(z_0)] ⩾ (η*)^{3/2}. We prove it by contradiction. Say there exists some z_t ∈ D_t^in such that Im[z_0]Im[m̂_0(z_0)] < (η*)^{3/2}. By our assumption that ρ̂_0 has square root behavior, we have Im[m̂_0(z_0)] ⩾ C Im[z_0]/√(η* + Im[z_0]). Thanks to Proposition 2.3, we have Im[z_0] ⩾ e^{−tC} t Im[m̂_t(z_t)], and thus

 \operatorname{Im}[\hat m_t(z_t)]\geqslant e^{-Ct}\operatorname{Im}[\hat m_0(z_0)]\geqslant e^{-Ct}\frac{C\operatorname{Im}[z_0]}{\sqrt{\eta^*+\operatorname{Im}[z_0]}}\geqslant \frac{e^{-2Ct}\,Ct\operatorname{Im}[\hat m_t(z_t)]}{\sqrt{\eta^*+e^{-tC}t\operatorname{Im}[\hat m_t(z_t)]}}, \qquad (4.12)

which is impossible if Ct ⩾ 2√η* and T is sufficiently small. This finishes the proof of Proposition 4.5.

The following proposition gives the optimal estimate of m_t(z) − m̂_t(z) on the spectral domain D_t.

###### Proposition 4.6.

Suppose V satisfies Assumption 2.1 and the initial data satisfies Assumption 4.1. There exists a set that holds with overwhelming probability, on which the following estimates hold uniformly for any 0 ⩽ t ⩽ T: if w ∈ D_t^in ∪ D_t^out,

 |m_t(w)-\hat m_t(w)|\leqslant \frac{M}{N\operatorname{Im}[w]}, \qquad (4.13)

and if w ∈ D_t^far,

 |m_t(w)-\hat m_t(w)|\leqslant \frac{M}{N}. \qquad (4.14)
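In the simplest Gaussian case, an estimate of the form (4.13) can be observed directly: for the GUE (β = 2, quadratic potential), the comparison measure is the semicircle law, and the empirical Stieltjes transform matches m_sc up to order 1/(N Im[w]) at mesoscopic spectral parameters. The sketch below (ours; an empirical illustration, not the proof's mechanism) samples one GUE matrix and measures this error.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
H = (A + A.conj().T) / (2 * np.sqrt(N))      # GUE normalization, spectrum ~ [-2, 2]
eigs = np.linalg.eigvalsh(H)

def m_sc(z):
    # Stieltjes transform of the semicircle law, branch with Im m_sc > 0 on C_+
    return (-z + np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

w = 1.5 + 0.05j                               # mesoscopic scale: 1/N << Im[w] << 1
m_emp = np.mean(1.0 / (eigs - w))             # empirical Stieltjes transform m_t(w)
err = abs(m_emp - m_sc(w))                    # expected to be of order 1/(N*Im[w])
```

With these parameters 1/(N Im[w]) = 0.02, so the observed error should be a few percent at most.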

The proof of Proposition 4.6 follows the same argument as [27, Theorem 3.1], with two modifications. Firstly, when we use the Gronwall inequality, we need to take care of the error from the initial data, i.e. m_0(z_0) − m̂_0(z_0). This is where our Assumption 4.1 comes into play. Secondly, we estimate the error term involving the potential using a contour integral.

###### Proof of Proposition 4.6.

By Itô's formula, the Stieltjes transform m_s(z) of the empirical particle density satisfies the stochastic differential equation

 \mathrm{d}m_s(z)=-\sqrt{\frac{2}{\beta N^3}}\sum_{i=1}^N\frac{\mathrm{d}B_i(s)}{(\lambda_i(s)-z)^2}+m_s(z)\partial_z m_s(z)\,\mathrm{d}s+\frac{1}{2N}\sum_{i=1}^N\frac{V'(\lambda_i(s))}{(\lambda_i(s)-z)^2}\,\mathrm{d}s+\frac{2-\beta}{\beta N^2}\sum_{i=1}^N\frac{\mathrm{d}s}{(\lambda_i(s)-z)^3}. \qquad (4.15)

We can rewrite (4.15) as

 \mathrm{d}m_s(z)=-\sqrt{\frac{2}{\beta N^3}}\sum_{i=1}^N\frac{\mathrm{d}B_i(s)}{(\lambda_i(s)-z)^2}+\partial_z m_s(z)\left(m_s(z)+\frac{V'(z)}{2}\right)\mathrm{d}s+m_s(z)\frac{\partial_z V'(z)}{2}\,\mathrm{d}s+\int_{\mathbb{R}}g(z,x)\,\mathrm{d}\rho_s(x)\,\mathrm{d}s+\frac{2-\beta}{\beta N^2}\sum_{i=1}^N\frac{\mathrm{d}s}{(\lambda_i(s)-z)^3}, \qquad (4.16)

where g is defined in (2.3). Plugging (2.4) into (4.16), and by the chain rule, we have

 \mathrm{d}m_s(z_s)=-\sqrt{\frac{2}{\beta N^3}}\sum_{i=1}^N\frac{\mathrm{d}B_i(s)}{(\lambda_i(s)-z_s)^2}+\partial_z m_s(z_s)\left(m_s(z_s)-\hat m_s(z_s)\right)\mathrm{d}s+m_s(z_s)\frac{V''(z_s)}{2}\,\mathrm{d}s+\int_{\mathbb{R}}g(z_s,x)\,\mathrm{d}\rho_s(x)\,\mathrm{d}s+\frac{2-\beta}{\beta N^2}\sum_{i=1}^N\frac{\mathrm{d}s}{(\lambda_i(s)-z_s)^3}. \qquad (4.17)

It follows by taking the difference of (2.5) and (4.17) that,

 \mathrm{d}\left(m_s(z_s)-\hat m_s(z_s)\right)=-\sqrt{\frac{2}{\beta N^3}}\sum_{i=1}^N\frac{\mathrm{d}B_i(s)}{(\lambda_i(s)-z_s)^2}+\left(m_s(z_s)-\hat m_s(z_s)\right)\partial_z\left(m_s(z_s)+\frac{V'(z_s)}{2}\right)\mathrm{d}s+\int_{\mathbb{R}}g(z_s,x)\left(\mathrm{d}\rho_s(x)-\mathrm{d}\hat\rho_s(x)\right)\mathrm{d}s+\frac{2-\beta}{\beta N^2}\sum_{i=1}^N\frac{\mathrm{d}s}{(\lambda_i(s)-z_s)^3}. \qquad (4.18)

We can integrate both sides of (4.18) from 0 to t and obtain

 m_t(z_t)-\hat m_t(z_t)=\int_0^t\left(\mathcal{E}_1(s)\,\mathrm{d}s+\mathrm{d}\mathcal{E}_2(s)\right)+\left(m_0(z_0)-\hat m_0(z_0)\right), \qquad (4.19)

where the error terms are

 E1(s)= (ms(zs)−^ms(zs))∂z(ms(zs)+V′(zs)2)+∫Rg(zs,x)(dρ