Convergence rates of the empirical spectral measure of unitary Brownian motion

Abstract.

Let $(U_t^N)_{t \ge 0}$ be a standard Brownian motion on $\mathbb{U}(N)$. For fixed $N \in \mathbb{N}$ and $t > 0$, we give explicit bounds on the $L_1$-Wasserstein distance of the empirical spectral measure of $U_t^N$ to both the ensemble-averaged spectral measure and the large-$N$ limiting measure identified by Biane. The proofs use tools developed by the first author to study convergence rates of the classical random matrix ensembles, as well as recent estimates for the convergence of the moments of the ensemble-averaged spectral distribution.

Key words and phrases:
Unitary Brownian motion, empirical spectral measure, heat kernel measure, log Sobolev inequality
2010 Mathematics Subject Classification:
60B20, 58J65
1 Supported in part by NSF DMS 1612589.
2 Supported in part by NSF DMS 1255574.

1. Introduction

This paper studies the convergence of the empirical spectral measure of Brownian motion on the unitary group to its large-$N$ limit. Brownian motion on large unitary groups has generated significant interest in recent years, due in part to its relationships with two-dimensional Yang-Mills theory and with the object from free probability theory called free unitary Brownian motion. As is natural in the context of random matrices, there has been particular focus on the asymptotic behavior (as $N$ tends to infinity) of the spectral measure of unitary Brownian motions; see for example [14, 17, 2, 3, 9, 10, 5, 7, 4] and the references therein.

Of course, many tools have been developed to study the spectral distributions of random matrices in high dimension in a variety of contexts. Among them is an approach developed by the first author with M. Meckes (see [13] for a survey) which allows for quantitative estimates on rates of convergence of the empirical spectral measure in a wide assortment of random matrix ensembles. This approach is based on concentration of measure and bounds for suprema of stochastic processes, in combination with more classical tools from matrix analysis, approximation theory, and Fourier analysis. In the present paper, we combine some of these techniques with recent estimates on the rates of convergence of the moments for the empirical spectral distribution of unitary Brownian motion [4] to prove asymptotically almost sure rates of convergence.

Statement of results. Let $\mathbb{U}(N)$ denote the unitary group and $\mathfrak{u}(N)$ its Lie algebra of skew-Hermitian matrices, equipped with the scaled (real) inner product $\langle X, Y\rangle_N := N\,\mathrm{Tr}(XY^*)$. This is the unique scaling that gives meaningful limiting behavior as $N \to \infty$; see for example Remark 3.4 of [5]. The inner product on $\mathfrak{u}(N)$ induces a left-invariant Riemannian metric on $\mathbb{U}(N)$, and we may define Brownian motion $(U_t^N)_{t \ge 0}$ on $\mathbb{U}(N)$ as the Markov diffusion issued from the identity with generator $\frac{1}{2}\Delta_{\mathbb{U}(N)}$, that is, one half the left-invariant Laplacian on $\mathbb{U}(N)$ with respect to this metric. One may equivalently describe $U_t^N$ as the solution to the Itô stochastic differential equation

\[ dU_t^N = U_t^N\,dW_t^N - \frac{1}{2}\,U_t^N\,dt \]

with $U_0^N = I_N$, where $W_t^N$ is a standard Brownian motion on $\mathfrak{u}(N)$ (for example, take $\beta$ an orthonormal basis of $\mathfrak{u}(N)$ with respect to the given inner product and $W_t^N = \sum_{\xi \in \beta} b_t^{\xi}\,\xi$, where the $b^{\xi}$ are independent standard Brownian motions on $\mathbb{R}$). This realization of unitary Brownian motion is computationally more useful and is mainly what will be used in the sequel. It should be noted that another standard description of the unitary Brownian motion is via a stochastic differential equation with respect to a Hermitian Brownian motion, which results in a difference of a factor of $i$ in the diffusion coefficient. For $t > 0$, let $\rho_t^N$ denote the end point distribution of Brownian motion; $\rho_t^N$ is called the heat kernel measure on $\mathbb{U}(N)$.
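As an illustration of the Itô equation above, the following is a minimal sketch of an Euler-Maruyama discretization in the simplest case N = 1, where the matrix equation reduces to the scalar equation dU = U (i db) - (1/2) U dt with exact solution U_t = exp(i b_t). The restriction to N = 1, the function name `simulate_u1`, the step count, and the seed are illustrative assumptions and are not taken from the paper.

```python
import cmath
import math
import random


def simulate_u1(t=1.0, steps=10000, seed=0):
    """Euler-Maruyama for dU = U (i db) - (1/2) U dt with U_0 = 1 (the N = 1 case)."""
    rng = random.Random(seed)
    dt = t / steps
    u = 1.0 + 0.0j   # Euler-Maruyama approximation of U_t
    b = 0.0          # driving real Brownian motion b_t
    for _ in range(steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        # Ito form: the -1/2 drift compensates the quadratic variation of i*b.
        u = u + u * (1j * db) - 0.5 * u * dt
        b += db
    # Return the numerical solution and the exact solution exp(i b_t).
    return u, cmath.exp(1j * b)


approx, exact = simulate_u1()
print(abs(approx - exact))   # discretization error of the scheme
print(abs(approx))           # stays near 1, i.e. near the unit circle
```

Dropping the drift term in the update would make the modulus of the numerical solution grow with t, which is one way to see why the Itô correction appears in the equation.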

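Our primary object of interest is the empirical spectral measure of unitary Brownian motion. A matrix $U \in \mathbb{U}(N)$ has $N$ complex eigenvalues of modulus one, which we denote by $e^{i\theta_1}, \ldots, e^{i\theta_N}$ (repeated according to multiplicity), and the spectral measure of $U$ is defined to be the probability measure on the unit circle $S^1$ given by

\[ \mu_U := \frac{1}{N}\sum_{j=1}^{N} \delta_{e^{i\theta_j}}. \]

In particular, for $f \in C(S^1)$,

\[ \int_{S^1} f\,d\mu_U = \frac{1}{N}\sum_{j=1}^{N} f(e^{i\theta_j}). \]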
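The integral against the spectral measure is just an average over the eigenvalues. The sketch below makes this concrete for a hypothetical list of eigenangles (the N-th roots of unity, which are the eigenvalues of a cyclic shift matrix); the function name and the choice of example are assumptions made for illustration.

```python
import cmath
import math


def integrate_against_spectral_measure(f, thetas):
    """Return (1/N) * sum_j f(e^{i theta_j}), the integral of f against mu_U."""
    return sum(f(cmath.exp(1j * th)) for th in thetas) / len(thetas)


# Hypothetical eigenangles: the N-th roots of unity (a cyclic shift matrix).
N = 8
thetas = [2 * math.pi * j / N for j in range(N)]

# Taking f(z) = z^k gives the k-th moment, i.e. the normalized trace of U^k.
first_moment = integrate_against_spectral_measure(lambda z: z, thetas)
nth_moment = integrate_against_spectral_measure(lambda z: z ** N, thetas)
print(abs(first_moment), abs(nth_moment - 1))   # both vanish up to rounding
```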

For each fixed $t > 0$, $U_t^N$ is a random unitary matrix, and we denote its empirical spectral measure by $\mu_t^N$. In [2], Biane showed that the random probability measure $\mu_t^N$ converges weakly almost surely to a deterministic probability measure, which we denote by $\nu_t$: that is, for all $f \in C(S^1)$,

\[ \lim_{N \to \infty} \int_{S^1} f\,d\mu_t^N = \int_{S^1} f\,d\nu_t \quad \text{a.s.} \]

The measure $\nu_t$ represents in some sense the spectral distribution of a “free unitary Brownian motion”. For $t > 0$, $\nu_t$ possesses a continuous density that is symmetric about $1 \in S^1$. When $0 < t < 4$, $\nu_t$ is supported on an arc strictly contained in the circle; for $t \ge 4$, $\mathrm{supp}(\nu_t) = S^1$. The paper [4] presents a brief summary of these and other properties of $\nu_t$ and the construction of free unitary Brownian motion.

In the present paper, we give estimates on the $L_1$-Wasserstein distance between the empirical spectral distribution $\mu_t^N$ and its limiting spectral measure $\nu_t$, where for probability measures $\mu$ and $\nu$ on $\mathbb{C}$, the $L_1$-Wasserstein distance is defined by

\[ W_1(\mu, \nu) := \inf\left\{ \int |x - y|\,d\pi(x, y) : \pi \text{ is a coupling of } \mu \text{ and } \nu \right\}. \]

We will also make use of the equivalent dual representation of $W_1$ due to Kantorovich and Rubinstein:

\[ W_1(\mu, \nu) = \sup\left\{ \int f\,d\mu - \int f\,d\nu : |f|_L \le 1 \right\}, \]

where $|f|_L$ denotes the Lipschitz constant of $f$.
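The two representations bracket the Wasserstein distance from both sides: any single coupling gives an upper bound, and any single 1-Lipschitz test function gives a lower bound. The sketch below illustrates this for two hypothetical empirical measures with the same number of atoms on the circle; the helper names, the sample angles, and the choice of test function (the real part) are assumptions made for illustration.

```python
import cmath
import math


def atoms(thetas):
    """Atoms e^{i theta} of an empirical measure on the unit circle."""
    return [cmath.exp(1j * th) for th in thetas]


def w1_upper_by_coupling(xs, ys):
    """Cost of the coupling that pairs xs[j] with ys[j]; an upper bound on W_1."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def w1_lower_by_test_function(xs, ys, f):
    """For a 1-Lipschitz f, |int f dmu - int f dnu| is a lower bound on W_1."""
    return abs(sum(f(x) for x in xs) - sum(f(y) for y in ys)) / len(xs)


n = 6
mu = atoms([0.1 * j for j in range(n)])
nu = atoms([0.1 * j + 0.05 for j in range(n)])   # the same angles rotated by 0.05

lower = w1_lower_by_test_function(mu, nu, lambda z: z.real)  # Re is 1-Lipschitz
upper = w1_upper_by_coupling(mu, nu)
print(lower, upper)   # lower <= W_1(mu, nu) <= upper
```

Neither bound is claimed to be sharp here; the point is only that the infimum over couplings and the supremum over test functions squeeze the same quantity.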

The main results of this paper are the following.

Theorem 1.

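Let $(U_t^N)_{t \ge 0}$ be a Brownian motion on $\mathbb{U}(N)$. For $t > 0$, let $\mu_t^N$ denote the empirical spectral measure as above, and let $\bar{\mu}_t^N$ denote the ensemble-averaged spectral measure of $U_t^N$ defined by

\[ \int_{S^1} f\,d\bar{\mu}_t^N := \mathbb{E}\int_{S^1} f\,d\mu_t^N. \]

Then there is a constant $C \in (0, \infty)$ such that with probability one, for all sufficiently large $N \in \mathbb{N}$ and $t > 0$,

\[ W_1(\mu_t^N, \bar{\mu}_t^N) \le C\left(\frac{t}{N^2}\right)^{1/3}. \]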

Moreover, given $\alpha > 0$ such that $t \ge \alpha N$, there is a constant $C_\alpha$ depending only on $\alpha$ such that for all sufficiently large $N$,

\[ W_1(\mu_t^N, \bar{\mu}_t^N) \le \frac{C_\alpha}{N^{2/3}}. \]

Theorem 2.

Let be the limiting spectral measure for unitary Brownian motion described above. There is a constant such that for all and

 W1(¯¯¯μNt,νt)≤Cmin{t2/5logNN2/5,ecN/tt+1N2}.

One may infer from these bounds direct (a.s.) estimates on the rate of convergence of the empirical spectral distribution to its limiting distribution for all sufficiently large $N$. To the authors’ knowledge, these results constitute the first known rates of convergence for $\mu_t^N$ itself; previously the only known convergence rates were for moments of the ensemble-averaged spectral measure [4].

As a technical tool, we also determine rates for the convergence in time of Biane’s measure $\nu_t$ to the uniform distribution on $S^1$.

Proposition 3.

Let denote the limiting spectral measure and the uniform measure on . Then there is a constant so that for all

\[ W_1(\nu_t, \nu) \le C\,t^{3/2} e^{-t/4}. \]

The organization of the paper is as follows. In Section 2, we establish improved concentration estimates for heat kernel measure on via a coupling of Brownian motions on and . These estimates are then used in Section 3 to prove Theorem 1. In Section 4 we use Fourier and classical approximation methods, as well as the previously mentioned coupling argument, to give bounds on the rate of convergence of the ensemble-averaged spectral measure to the limiting measure as in Theorem 2. In this section, we also give the proof of Proposition 3 using similar methods.

2. A concentration inequality for heat kernel measure

In this section, we will consider concentration of measure results for Lipschitz functions of the following form. Let be a metric space equipped with Borel probability measure . Then, under some conditions, there exists such that, for all and Lipschitz with Lipschitz constant and ,

\[ \rho\big(|F - \mathbb{E}F| \ge r\big) \le 2e^{-r^2/L^2 C}. \tag{1} \]

Concentration estimates of this type are standard for heat kernel measure on a Riemannian manifold with curvature bounded below. We recall here the necessary results. Let be a complete Riemannian manifold, and let denote the Laplace-Beltrami operator acting on . We write to denote the heat semigroup; that is, for and any sufficiently nice function ,

\[ P_t f(x) = \mathbb{E}\big[f(\xi_t^x)\big] = \int_M f\,d\rho_t^x \]

where is the Markov diffusion on started at with generator (that is, is a Brownian motion on ) and is the heat kernel measure. If denotes the Ricci curvature tensor on , then for implies that for all the estimate (1) holds for with coefficient , where when , we interpret this to be . (A typical proof is via log Sobolev estimates.) See for example Corollary 2.6 and Lemma 6.3 of [8] (stated in the case that , which is the only relevant case here).

For small the general machinery described above leads to a sharp concentration estimate for heat kernel measure on . For large , the estimates are no longer sharp, but we can improve them using a coupling approach inspired by one in [12]. The following lemma gives the key idea.

Lemma 4.

Let $b_t^0$ be a real-valued Brownian motion and $z_t = e^{i b_t^0/N}$, and let $V_t$ be a Brownian motion on $\mathbb{SU}(N)$ issued from the identity. Then $z_t V_t$ is a Brownian motion on $\mathbb{U}(N)$.

Proof.

Set , and note that and satisfy the stochastic differential equations

\[ dz_t = z_t\,\frac{i\,db_t^0}{N} - \frac{1}{2N^2}\,z_t\,dt \quad\text{and}\quad dZ_t = Z_t\,db_t - \frac{1}{2N^2}\,Z_t\,dt \]

where with . Let be an orthonormal basis of , and let be independent real-valued Brownian motions. Then is a Brownian motion on , and satisfies the stochastic differential equation

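\[ dV_t = V_t \circ d\tilde{W}_t = V_t\,d\tilde{W}_t + \frac{1}{2}\,V_t \sum_{\xi \in \beta} \xi^2\,dt = V_t\,d\tilde{W}_t - \left(\frac{N^2 - 1}{2N^2}\right) V_t\,dt. \]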

(Here denotes a Stratonovich integral, which is then expressed as an Itô integral via the usual calculus.)

Now, is an orthonormal basis of , and satisfies

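\[ \begin{aligned} d(Z_t V_t) &= \left(Z_t\,db_t - \frac{1}{2N^2}\,Z_t\,dt\right) V_t + Z_t\left(V_t\,d\tilde{W}_t - \left(\frac{N^2 - 1}{2N^2}\right) V_t\,dt\right) \\ &= Z_t V_t\,(db_t + d\tilde{W}_t) - \frac{1}{2}\,Z_t V_t\,dt. \end{aligned} \]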

Since is a Brownian motion on , this implies that is a Brownian motion on . ∎
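The last step of the computation rests on the two drift coefficients adding up to exactly one half for every N. A minimal sketch of that arithmetic, using exact rational numbers (the function name is an illustrative assumption):

```python
from fractions import Fraction


def drift_coefficients(N):
    """Absolute values of the Ito drift coefficients of z_t, V_t and z_t V_t."""
    scalar_part = Fraction(1, 2 * N**2)        # from z_t = exp(i b_t^0 / N)
    su_part = Fraction(N**2 - 1, 2 * N**2)     # from (1/2) * sum of xi^2 on su(N)
    return scalar_part, su_part, scalar_part + su_part


for N in (1, 2, 5, 100):
    print(N, drift_coefficients(N))   # the total is 1/2 in every case
```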

We use this realization of the Brownian motion on along with concentration properties of the laws of and to obtain sub-Gaussian concentration independent of on for large .

Proposition 5.

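Let $U_t$ be distributed according to heat kernel measure on $\mathbb{U}(N)$, and let $F : \mathbb{U}(N) \to \mathbb{R}$ be $L$-Lipschitz. For any $t, r > 0$,

\[ \mathbb{P}\big(|F(U_t) - \mathbb{E}F(U_t)| > r\big) \le 2e^{-\frac{r^2}{tL^2}}. \]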

If $t \ge \alpha N$ for some $\alpha > 0$, then there are constants $c, C_\alpha$, with $c$ universal and $C_\alpha$ depending only on $\alpha$, such that for all $r > 0$

\[ \mathbb{P}\big(|F(U_t) - \mathbb{E}F(U_t)| > r\big) \le C_\alpha e^{-\frac{c r^2}{L^2}}. \]

Proof.

To prove the first statement, observe that since the Ricci curvature on is nonnegative, the comments preceding Lemma 4 imply that the desired concentration estimate holds for with coefficient . That is, if is -Lipschitz with with , then

\[ \mathbb{P}\big(|F(U_t) - \mathbb{E}F(U_t)| > r\big) \le 2e^{-\frac{r^2}{tL^2}}. \]

To prove the second statement, observe that the representation of in Lemma 4 implies that

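\[ \mathbb{P}\big(|F(z_t V_t) - \mathbb{E}F(z_t V_t)| > r\big) \le \mathbb{P}\Big(\big|F(z_t V_t) - \mathbb{E}[F(z_t V_t) \mid z_t]\big| > \tfrac{r}{2}\Big) + \mathbb{P}\Big(\big|\mathbb{E}[F(z_t V_t) \mid z_t] - \mathbb{E}F(z_t V_t)\big| > \tfrac{r}{2}\Big). \tag{2} \]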

Now for the first term, measure concentration for follows again from curvature considerations: following for example Proposition E.15 and Lemma F.27 of [1], one may compute the Ricci curvature on with respect to the given inner product as

\[ \mathrm{Ric}(X, X) = \frac{1}{2}\,\langle X, X\rangle_N. \]

Thus, by the discussion preceding Lemma 4, on satisfies the following concentration estimate: if is -Lipschitz, then

\[ \mathbb{P}\big(|G(V_t) - \mathbb{E}G(V_t)| > r\big) \le 2e^{-\frac{c r^2}{L^2}}, \]

where . For fixed, is an -Lipschitz function on , and so the first term of (2) is bounded by .

For the second term of (2), let be the random variable taking values in such that, on , . Conditioning on , we have

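\[ \begin{aligned} \mathbb{P}\Big(\big|\mathbb{E}[F(z_t V_t) \mid z_t] - \mathbb{E}F(z_t V_t)\big| > \tfrac{r}{2}\Big) &= \mathbb{E}\Big(\mathbb{P}\Big[\big|\mathbb{E}[F(z_t V_t) \mid z_t] - \mathbb{E}F(z_t V_t)\big| > \tfrac{r}{2} \,\Big|\, K\Big]\Big) \\ &\le \mathbb{E}\Big(\mathbb{P}\Big[\big|\mathbb{E}[F(z_t V_t) \mid z_t] - \mathbb{E}[F(z_t V_t) \mid K]\big| > \tfrac{r}{4} \,\Big|\, K\Big]\Big) \\ &\qquad + \mathbb{P}\Big(\big|\mathbb{E}[F(z_t V_t) \mid K] - \mathbb{E}F(z_t V_t)\big| > \tfrac{r}{4}\Big). \end{aligned} \tag{3} \]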

To deal with the first term in (3), let denote integration over only, integration over only, and let denote integration over conditional on . Observe that by independence of and

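\[ \begin{aligned} \big|\mathbb{E}[F(z_t V_t) \mid z_t] - \mathbb{E}[F(z_t V_t) \mid K = k]\big| &= \big|\mathbb{E}_{V_t}[F(z_t V_t)] - \mathbb{E}_{V_t}\mathbb{E}_{z_t \mid K = k}[F(z_t V_t)]\big| \\ &\le \mathbb{E}_{V_t}\big|F(z_t V_t) - \mathbb{E}_{z_t \mid K = k}[F(z_t V_t)]\big| \\ &= \int_{\mathbb{SU}(N)} \big|F(z_t V) - \mathbb{E}_{z_t \mid K = k}[F(z_t V)]\big|\,dV_t(V). \end{aligned} \]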

Now, for fixed, is an -Lipschitz function on , and so, conditional on , can only fluctuate by as much as . Thus if , the first term is zero. For , we may just use the trivial bound of 1 and choose in the statement of the proposition so that , where is as above.

It remains to show that the second term in (3) satisfies a bound of the desired form. First, we observe that in the regime , is essentially uniform in . Indeed, a sharp estimate of the time to equilibrium of was proved in Theorem 1.2 of [16], from which it follows (see the discussion preceding the theorem in [16], and note that the normalization here differs by a factor of 2 from the one used there) that if is the density of with respect to Haar measure on , then

\[ \big\| h_t^{\mathbb{SU}(N)} - 1 \big\|_1 \le e^{-\frac{t(1 + o(1))}{8 \log N}}. \]

We thus have that if is distributed according to Haar measure on ,

\[ \big|\mathbb{E}_{V_t}[F(z_t V_t)] - \mathbb{E}_V[F(z_t V)]\big| \le N e^{-\frac{t(1 + o(1))}{8 \log N}} \le N e^{-\frac{\alpha N(1 + o(1))}{8 \log N}}, \]

where we have also used that the diameter of with respect to our scaling of the inner product is .

We may thus replace by in order to bound the second term of (3). Let be a standard normal random variable, fix and , and let with . Let be a random variable with law given by the conditional distribution of given , where is the random variable so that, on , (and thus ). We restrict for now to the case that ; once we arrive at a bound which is uniform in , it will hold for negative as well by symmetry.

With notation as above, observe that

\[ \mathbb{E}[F(z_t V)] - \mathbb{E}[F(z_t V) \mid K = k] = \mathbb{E}_V\Big[\mathbb{E}_{z_t}[F(z_t V)] - \mathbb{E}_{z_t \mid K = k}[F(z_t V)]\Big]. \]

Recall that for fixed , is an -Lipschitz function of , and that is a -Lipschitz function of . We therefore consider an -Lipschitz function of .

Now, the conditional distribution of given and is exactly as constructed above. The density of on is given by

\[ g_{t,k}(x) = \frac{c_{t,k}}{\sqrt{2\pi t}} \sum_{m \ge 0} e^{-\frac{(x + 2\pi N m)^2}{2t}}, \]

where

\[ c_{t,k} = \frac{1}{\mathbb{P}\big(\sqrt{t}\,Z \in \bigcup_{m \ge 0} I_{m,k}\big)}. \]

Let be a uniform random variable on , and consider

\[ \mathbb{E}F(Z_{t,k}) - \mathbb{E}F(U_k) = \int_{2\pi k}^{2\pi(k+1)} F(x)\left(g_{t,k}(x) - \frac{1}{2\pi}\right) dx. \]

Since , the average value of on is , and so is bounded by the maximum fluctuation of over this interval. Without loss of generality, we may replace by , in which case it follows that

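\[ \big|\mathbb{E}F(Z_{t,k}) - \mathbb{E}F(U_k)\big| \le (2\pi)^2 L \sup_{x, y \in [2\pi k,\, 2\pi(k+1))} |g_{t,k}(x) - g_{t,k}(y)| \le (2\pi)^2 L\,\big(g_{t,k}(2\pi k) - g_{t,k}(2\pi(k+1))\big). \]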

Now,

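\[ \begin{aligned} g_{t,k}(2\pi k) - g_{t,k}(2\pi(k+1)) &= \frac{c_{t,k}}{\sqrt{2\pi t}} \sum_{m \ge 0} \left( e^{-\frac{(2\pi(Nm + k))^2}{2t}} - e^{-\frac{(2\pi(Nm + k + 1))^2}{2t}} \right) \\ &= \frac{c_{t,k}}{\sqrt{2\pi t}} \sum_{m \ge 0} e^{-\frac{(2\pi(Nm + k))^2}{2t}} \left( 1 - e^{-\frac{4\pi^2(2Nm + 2k + 1)}{2t}} \right) \\ &\le \frac{\sum_{m \ge 0} e^{-\frac{(2\pi(Nm + k))^2}{2t}} \left( \frac{4\pi^2(2Nm + 2k + 1)}{2t} \right)}{2\pi \sum_{m \ge 0} e^{-\frac{(2\pi(Nm + k + 1))^2}{2t}}}. \end{aligned} \]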
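Two elementary facts are used when factoring the summands in the display above: the difference of the two squared exponents is 4π²(2Nm + 2k + 1), and 1 − e^{−x} ≤ x for x ≥ 0. The following sketch checks both numerically for a few hypothetical parameter values; the function names and the sample values are assumptions made for illustration.

```python
import math


def exponent_gap(N, m, k):
    """(2 pi (N m + k + 1))^2 - (2 pi (N m + k))^2, computed directly."""
    return (2 * math.pi * (N * m + k + 1)) ** 2 - (2 * math.pi * (N * m + k)) ** 2


def closed_form(N, m, k):
    """4 pi^2 (2 N m + 2 k + 1), the value appearing in the display."""
    return 4 * math.pi**2 * (2 * N * m + 2 * k + 1)


def one_minus_exp_bound_holds(x):
    """Check the inequality 1 - exp(-x) <= x, valid for x >= 0."""
    return 1 - math.exp(-x) <= x


for N, m, k in [(1, 0, 0), (3, 2, 1), (10, 5, 3)]:
    print(math.isclose(exponent_gap(N, m, k), closed_form(N, m, k)))
print(all(one_minus_exp_bound_holds(x) for x in (0.0, 0.1, 1.0, 25.0)))
```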

Suppose first that , from which it follows that for , is decreasing as a function of . Letting denote the th term of the sum in the numerator,

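\[ \begin{aligned} \sum_{m \ge 0} e^{-\frac{(2\pi(Nm + k))^2}{2t}} \left( \frac{4\pi^2(2Nm + 2k + 1)}{2t} \right) &\le a_0 + a_1 + \int_1^\infty e^{-\frac{(2\pi(Nx + k))^2}{2t}} \left( \frac{4\pi^2(2Nx + 2k + 1)}{2t} \right) dx \\ &= a_0 + a_1 + \frac{\sqrt{t}}{2\pi N} \int_{\frac{2\pi(N + k)}{\sqrt{t}}}^\infty e^{-\frac{u^2}{2}} \left[ \frac{2\pi u}{\sqrt{t}} + \frac{2\pi^2}{t} \right] du \\ &= a_0 + a_1 + \frac{1}{N}\,e^{-\frac{(2\pi(N + k))^2}{2t}} + \frac{\pi}{N\sqrt{t}} \int_{\frac{2\pi(N + k)}{\sqrt{t}}}^\infty e^{-\frac{u^2}{2}}\,du \\ &\le a_0 + a_1 + \left( \frac{1}{N} + \frac{1}{2N(N + k)} \right) e^{-\frac{(2\pi(N + k))^2}{2t}}. \end{aligned} \]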
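The final inequality in the chain above uses the standard Gaussian tail bound: the integral of e^{−u²/2} over [u₀, ∞) is at most e^{−u₀²/2}/u₀ for u₀ > 0. With u₀ = 2π(N + k)/√t, the prefactor π/(N√t) then turns the bound into 1/(2N(N + k)) times the exponential. The sketch below checks both steps numerically; the function names and the sample values of N, k, t are illustrative assumptions.

```python
import math


def gaussian_tail(u0):
    """Integral of exp(-u^2/2) over [u0, infinity), via the complementary error function."""
    return math.sqrt(math.pi / 2) * math.erfc(u0 / math.sqrt(2))


def tail_bound(u0):
    """The standard upper bound exp(-u0^2/2) / u0, valid for u0 > 0."""
    return math.exp(-u0**2 / 2) / u0


# Hypothetical parameters, chosen only so that the exponentials do not underflow.
N, k, t = 3, 1, 50.0
u0 = 2 * math.pi * (N + k) / math.sqrt(t)

lhs = math.pi / (N * math.sqrt(t)) * gaussian_tail(u0)
rhs = math.exp(-u0**2 / 2) / (2 * N * (N + k))
print(lhs, rhs)   # lhs <= rhs
```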

Combining this with the trivial lower bound in the denominator of

\[ \sum_{m \ge 0} e^{-\frac{(2\pi(Nm + k + 1))^2}{2t}} \ge e^{-\frac{(2\pi(k + 1))^2}{2t}} \]

now gives that

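\[ g_{t,k}(2\pi k) - g_{t,k}(2\pi(k+1)) \le \frac{1}{2\pi} \left[ \left( \frac{2\pi^2(2k + 1)}{t} \right) e^{\frac{2\pi^2(2k + 1)}{t}} + \left( \frac{1}{N} + \frac{1}{2N(N + k)} + \frac{2\pi^2(2N + 2k + 1)}{t} \right) e^{-\frac{2\pi^2(N - 1)(2k + N + 1)}{t}} \right]. \]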

In particular, under the assumption that , we have that this quantity is bounded uniformly in by a constant depending only on . So for each , lies within an interval centered on with length bounded by . Writing , observe that

\[ e^{\frac{iU_k}{N}} V = e^{\frac{iU_0}{N}} e^{\frac{2\pi i k}{N}} V \overset{d}{=} e^{\frac{iU_0}{N}} V, \]

by the translation invariance of Haar measure on (since ). We thus have

\[ \mathbb{E}_{U_k}\Big[F\big(e^{\frac{iU_k}{N}} V\big)\Big] = \mathbb{E}_{U_0}\Big[F\big(e^{\frac{iU_0}{N}} V\big)\Big], \]

and for each , (and thus itself) is an interval centered on with length bounded by . As before, absorbing the requirement that into the constant in the statement of the proposition completes the proof in this case.

Finally, if , then the function

\[ h(x) = e^{-\frac{(2\pi(Nx + k))^2}{2t}} \left( \frac{4\pi^2(2Nx + 2k + 1)}{2t} \right) \]

is increasing until , and then decreasing. We thus have in this regime that

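\[ \sum_{m \ge 0} e^{-\frac{(2\pi(Nm + k))^2}{2t}} \left( \frac{4\pi^2(2Nm + 2k + 1)}{2t} \right) \le a_0 + e^{-\frac{(2\pi(Nx_0 + k))^2}{2t}} \left( \frac{4\pi^2(2Nx_0 + 2k + 1)}{2t} \right) + \int_1^\infty e^{-\frac{(2\pi(Nx + k))^2}{2t}} \left( \frac{4\pi^2(2Nx + 2k + 1)}{2t} \right) dx. \tag{4} \]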

By definition of ,

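\[ e^{-\frac{(2\pi(Nx_0 + k))^2}{2t}} \left( \frac{4\pi^2(2Nx_0 + 2k + 1)}{2t} \right) = e^{-\frac{2t}{\pi^2(2N + 2k + 1)^2}} \left( \frac{4}{2N + 2k + 1} + \frac{2\pi^2}{t} \right) \le e^{-\frac{(2\pi(k + 1))^2}{2t}}\, e^{-\frac{2t}{\pi^2(2N + 2k + 1)^2} + \frac{(2\pi(k + 1))^2}{2t}} \left( \frac{2}{N} + \frac{2\pi^2}{t} \right). \]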

Since and ,

\[ e^{-\frac{2t}{\pi^2(2N + 2k + 1)^2} + \frac{(2\pi(k + 1))^2}{2t}} \le e^{-\frac{2N(2N + 1)}{(4N + 1)^2} + \frac{2(N + 1)^2}{\pi N(2N + 1)}} \le 2. \]

The first and third terms of (4) can be bounded as before, and so we have that

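\[ g_{t,k}(2\pi k) - g_{t,k}(2\pi(k+1)) \le \frac{1}{2\pi} \left[ \left( \frac{2\pi^2(2k + 1)}{t} \right) e^{\frac{2\pi^2(2k + 1)}{t}} + \left( \frac{5}{N} + \frac{1}{2N(N + k)} + \frac{4\pi^2}{t} \right) e^{-\frac{2\pi^2(N - 1)(2k + N + 1)}{t}} \right]. \]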

The proof is now completed as above. ∎

3. Concentration of $\mu_t^N$

Armed with the concentration inequality for heat kernel measure, the proof of Theorem 1 is an application of the program laid out in [13] for estimating the Wasserstein distance between the empirical spectral measure of a random matrix and the ensemble average, in the presence of measure concentration. Since it is relatively brief, we include the detailed argument here for completeness.

The first step is to bound the “average distance to average” as follows.

Proposition 6.

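There is a constant $c \in (0, \infty)$ such that for all $N \in \mathbb{N}$ and $t > 0$,

\[ \mathbb{E}W_1(\mu_t^N, \bar{\mu}_t^N) \le c\left(\frac{t}{N^2}\right)^{1/3}. \]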

If $t \ge \alpha N$ for some $\alpha > 0$, then there is a constant $c_\alpha$ depending only on $\alpha$, such that

\[ \mathbb{E}W_1(\mu_t^N, \bar{\mu}_t^N) \le \frac{c_\alpha}{N^{2/3}}. \]

Proof.

We will give the proof of the first statement only, which applies the first half of Proposition 5; the proof of the second statement is identical, using instead the second half of Proposition 5.

Recall that

\[ W_1(\mu_t^N, \bar{\mu}_t^N) = \sup_{|f|_L \le 1} \left( \int f\,d\mu_t^N - \int f\,d\bar{\mu}_t^N \right), \]

where . That is, our task is to estimate the expected supremum of the centered stochastic process , with

\[ X_f := \int f\,d\mu_t^N - \int f\,d\bar{\mu}_t^N = \int f\,d\mu_t^N - \mathbb{E}\int f\,d\mu_t^N. \]

Note that without loss we may choose the indexing set to be 1-Lipschitz functions on the circle with ; write for the set of all such functions. Now, if is a fixed Lipschitz function and denotes the spectral measure of , then

\[ U \longmapsto \left( \int f\,d\mu_U - \int f\,d\bar{\mu}_t^N \right) \]

is $\frac{|f|_L}{N}$-Lipschitz (see Lemma 2.3 of [11], and note the different normalization of the metric on matrices), and so by Proposition 5,

\[ \mathbb{P}\big(|X_f - X_g| > x\big) = \mathbb{P}\big(|X_{f - g}| > x\big) \le 2e^{-\frac{N^2 x^2}{t\,|f - g|_L^2}}. \]

That is, the stochastic process satisfies a sub-Gaussian increment condition.

Now, if is a centered stochastic process indexed by the unit ball of a finite-dimensional normed space , and satisfies the increment condition

\[ \mathbb{P}\big(|X_u - X_v| > x\big) \le a\,e^{-\frac{x^2}{K^2 \|u - v\|^2}} \]

for each , then it is a consequence of Dudley’s entropy bound (see [13] for a detailed proof) that

 (5) E(sup∥v∥