# Products of independent elliptic random matrices

Sean O’Rourke, Department of Mathematics, University of Colorado at Boulder, Boulder, CO 80309

David Renfrew, Department of Mathematics, UCLA

Alexander Soshnikov, Department of Mathematics, University of California, Davis, One Shields Avenue, Davis, CA 95616-8633

Van Vu, Department of Mathematics, Yale University, New Haven, CT 06520, USA

###### Abstract.

For fixed $m > 1$, we study the product of $m$ independent $N \times N$ elliptic random matrices as $N$ tends to infinity. Our main result shows that the empirical spectral distribution of the product converges, with probability $1$, to the $m$-th power of the circular law, regardless of the joint distribution of the mirror entries in each matrix. This leads to a new kind of universality phenomenon: the limit law for the product of independent random matrices is independent of the limit laws for the individual matrices themselves.

Our result also generalizes earlier results of Götze–Tikhomirov  and O’Rourke–Soshnikov  concerning the product of independent iid random matrices.

S. O’Rourke has been supported by grant AFOSAR-FA-9550-12-1-0083
D. Renfrew is partly supported by NSF grant DMS-0838680
A. Soshnikov has been supported in part by NSF grant DMS-1007558
V. Vu is supported by research grants DMS-0901216, DMS-1307797, and AFOSAR-FA-9550-12-1-0083.

## 1. Introduction

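We begin by recalling that the eigenvalues of an $N \times N$ matrix $M$ are the roots in $\mathbb{C}$ of the characteristic polynomial $\det(M - zI)$, where $I$ is the identity matrix. We let $\lambda_1(M), \ldots, \lambda_N(M)$ denote the eigenvalues of $M$. In this case, the empirical spectral measure $\mu_M$ is given by

$$\mu_M := \frac{1}{N} \sum_{i=1}^N \delta_{\lambda_i(M)}.$$

The corresponding empirical spectral distribution (ESD) is given by

$$F^M(x,y) := \frac{1}{N} \# \left\{ 1 \leq i \leq N : \operatorname{Re}(\lambda_i(M)) \leq x, \ \operatorname{Im}(\lambda_i(M)) \leq y \right\}.$$

Here $\#E$ denotes the cardinality of the set $E$.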
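As a minimal sketch of the definition above, the following evaluates the ESD of a small matrix by counting eigenvalues. The helper name `esd` and the example matrix are illustrative assumptions, not part of the paper.

```python
import numpy as np

def esd(matrix, x, y):
    """Empirical spectral distribution F^M(x, y) of a square matrix."""
    eigenvalues = np.linalg.eigvals(matrix)
    # Count eigenvalues with Re <= x and Im <= y, then normalize by N.
    inside = (eigenvalues.real <= x) & (eigenvalues.imag <= y)
    return inside.mean()

# Illustrative example: a diagonal matrix with eigenvalues 1, 2, 3.
M = np.diag([1.0, 2.0, 3.0])
value = esd(M, 2.5, 0.0)  # two of the three eigenvalues satisfy both constraints
```

For a Hermitian matrix all eigenvalues are real, so the second argument plays no role once $y \geq 0$.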

If the matrix $M$ is Hermitian, then the eigenvalues $\lambda_1(M), \ldots, \lambda_N(M)$ are real. In this case the ESD is given by

$$F^M(x) := \frac{1}{N} \# \left\{ 1 \leq i \leq N : \lambda_i(M) \leq x \right\}.$$

One of the simplest random matrix ensembles is the class of random matrices with independent and identically distributed (iid) entries.

###### Definition 1.1 (iid random matrix).

Let $\xi$ be a complex random variable. We say $Y_N$ is an $N \times N$ iid random matrix with atom variable $\xi$ if the entries of $Y_N$ are iid copies of $\xi$.

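When $\xi$ is a standard complex Gaussian random variable, $Y_N$ can be viewed as a random matrix drawn from the probability distribution

$$\mathbb{P}(dM) = \frac{1}{\pi^{N^2}} e^{-\operatorname{tr}(M M^\ast)} \, dM$$

on the set of complex $N \times N$ matrices. Here $dM$ denotes the Lebesgue measure on the $2N^2$ real entries

$$\{ \operatorname{Re}(m_{ij}) : 1 \leq i,j \leq N \} \cup \{ \operatorname{Im}(m_{ij}) : 1 \leq i,j \leq N \}$$

of $M = (m_{ij})_{i,j=1}^N$. The measure $\mathbb{P}$ is known as the complex Ginibre ensemble. The real Ginibre ensemble is defined analogously. Following Ginibre , one may compute the joint density of the eigenvalues of a random $N \times N$ matrix $Y_N$ drawn from the complex Ginibre ensemble. Indeed, $(\lambda_1(Y_N), \ldots, \lambda_N(Y_N))$ has density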

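$$p_N(z_1, \ldots, z_N) := \frac{1}{\pi^N \prod_{k=1}^N k!} \exp\left( - \sum_{k=1}^N |z_k|^2 \right) \prod_{1 \leq i < j \leq N} |z_i - z_j|^2. \tag{1.1}$$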

Mehta [35, 36] used the joint density function (1.1) to compute the limiting spectral measure of the complex Ginibre ensemble. In particular, he showed that if $Y_N$ is drawn from the complex Ginibre ensemble, then the ESD of $\frac{1}{\sqrt{N}} Y_N$ converges to the circular law $F_{\mathrm{circ}}$ as $N \to \infty$, where

$$F_{\mathrm{circ}}(x,y) := \mu_{\mathrm{circ}} \left( \{ z \in \mathbb{C} : \operatorname{Re}(z) \leq x, \ \operatorname{Im}(z) \leq y \} \right)$$

and $\mu_{\mathrm{circ}}$ is the uniform probability measure on the unit disk in the complex plane. Edelman  verified the same limiting distribution for the real Ginibre ensemble.

For the general (non-Gaussian) case, there is no formula for the joint distribution of the eigenvalues and the problem appears much more difficult. The universality phenomenon in random matrix theory asserts that the spectral behavior of an $N \times N$ iid random matrix does not depend on the distribution of the atom variable $\xi$ in the limit $N \to \infty$. In other words, one expects that the circular law describes the limiting ESD of a large class of random matrices (not just Gaussian matrices).

An important result was obtained by Girko [21, 22], who related the empirical spectral measure of a non-Hermitian matrix to that of a family of Hermitian matrices. Using this Hermitization technique, Bai [8, 9] gave the first rigorous proof of the circular law for general (non-Gaussian) distributions. He proved the result under a number of moment and smoothness assumptions on the atom variable $\xi$, and a series of recent improvements were obtained by Götze and Tikhomirov , Pan and Zhou  and Tao and Vu [46, 48]. In particular, Tao and Vu [47, 48] established the law with the minimum assumption that $\xi$ has finite variance.

###### Theorem 1.2 (Tao-Vu, ).

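Let $\xi$ be a complex random variable with mean zero and unit variance. For each $N \geq 1$, let $Y_N$ be an $N \times N$ iid random matrix with atom variable $\xi$. Then the ESD of $\frac{1}{\sqrt{N}} Y_N$ converges almost surely to the circular law $F_{\mathrm{circ}}$ as $N \to \infty$.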

More recently, Götze and Tikhomirov  consider the ESD of the product of $m$ independent iid random matrices. They show that, as the sizes of the matrices tend to infinity, the limiting distribution is given by $F_m$, where $F_m$ is supported on the unit disk in the complex plane and has density given by

$$f_m(z) := \begin{cases} \dfrac{1}{m \pi} |z|^{\frac{2}{m} - 2}, & \text{for } |z| \leq 1, \\ 0, & \text{for } |z| > 1 \end{cases} \tag{1.2}$$

in the complex plane. It can be verified directly that if $\psi$ is a random variable distributed uniformly on the unit disk in the complex plane, then $\psi^m$ has distribution $F_m$.
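The last claim can be checked numerically. Integrating the density (1.2) over $\{|z| \leq r\}$ in polar coordinates gives $r^{2/m}$ for $0 \leq r \leq 1$, so the modulus of the $m$-th power of a uniform point on the disk should have this radial distribution function. The sketch below is a Monte Carlo sanity check under illustrative choices (seed, sample size, $m = 3$, $r = 0.3$), not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3
n_samples = 200_000

# Uniform point on the unit disk: radius sqrt(U), angle uniform on [0, 2*pi).
radius = np.sqrt(rng.uniform(size=n_samples))
angle = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)
psi = radius * np.exp(1j * angle)

powered = psi ** m

def radial_cdf_from_density(r, m):
    # Integral of f_m over {|z| <= r}, computed in polar coordinates: r^(2/m).
    return r ** (2.0 / m)

r = 0.3
empirical = np.mean(np.abs(powered) <= r)
predicted = radial_cdf_from_density(r, m)
```

The two numbers `empirical` and `predicted` should agree up to Monte Carlo error of order $n^{-1/2}$ in the sample size $n$.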

###### Theorem 1.3 (Götze-Tikhomirov, ).

Let $m \geq 1$ be an integer, and assume $\xi_1, \ldots, \xi_m$ are complex random variables with mean zero and unit variance. For each $N \geq 1$ and $1 \leq k \leq m$, let $Y_{N,k}$ be an $N \times N$ iid random matrix with atom variable $\xi_k$, and assume $Y_{N,1}, \ldots, Y_{N,m}$ are independent. Define the product

$$P_N := N^{-m/2} \, Y_{N,1} \cdots Y_{N,m}.$$

Then $\mathbb{E} F^{P_N}$ converges to $F_m$ as $N \to \infty$.

The convergence of $\mathbb{E} F^{P_N}$ to $F_m$ in Theorem 1.3 was strengthened to almost sure convergence in [10, 44]. The Gaussian case was originally considered by Burda, Janik, and Waclaw ; see also . We refer the reader to [1, 2, 3, 4, 5, 6, 14, 18, 19] and references therein for many other interesting results concerning products of Gaussian random matrices.

## 2. New results

In this paper, we generalize Theorem 1.3 by considering products of independent real elliptic random matrices. Elliptic random matrices were originally introduced by Girko [23, 24] in the 1980s.

###### Definition 2.1 (Real elliptic random matrix).

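Let $(\xi_1, \xi_2)$ be a random vector in $\mathbb{R}^2$, and let $\zeta$ be a real random variable. We say $Y_N = (y_{ij})_{i,j=1}^N$ is an $N \times N$ real elliptic random matrix with atom variables $(\xi_1, \xi_2), \zeta$ if the following conditions hold.

• (independence) $\{ y_{ii} : 1 \leq i \leq N \} \cup \{ (y_{ij}, y_{ji}) : 1 \leq i < j \leq N \}$ is a collection of independent random elements.

• (off-diagonal entries) $\{ (y_{ij}, y_{ji}) : 1 \leq i < j \leq N \}$ is a collection of iid copies of $(\xi_1, \xi_2)$.

• (diagonal entries) $\{ y_{ii} : 1 \leq i \leq N \}$ is a collection of iid copies of $\zeta$.

Real elliptic random matrices generalize iid random matrices. Indeed, if $\xi_1, \xi_2, \zeta$ are iid, then $Y_N$ is just an iid random matrix. On the other hand, if $\xi_1 = \xi_2$ almost surely, then $Y_N$ is a real symmetric matrix. In this case, the eigenvalues of $Y_N$ are real and $Y_N$ is known as a real symmetric Wigner matrix .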

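Suppose $\xi_1, \xi_2$ have mean zero and unit variance. Set $\rho := \mathbb{E}[\xi_1 \xi_2]$. When $|\rho| < 1$ and $\zeta$ has mean zero and finite variance, it was shown in  that the ESD of $\frac{1}{\sqrt{N}} Y_N$ converges almost surely to the elliptic law $F_\rho$ as $N \to \infty$, where

$$F_\rho(x,y) = \mu_\rho \left( \{ z \in \mathbb{C} : \operatorname{Re}(z) \leq x, \ \operatorname{Im}(z) \leq y \} \right)$$

and $\mu_\rho$ is the uniform probability measure on the ellipsoid

$$\mathcal{E}_\rho = \left\{ z \in \mathbb{C} : \frac{\operatorname{Re}(z)^2}{(1+\rho)^2} + \frac{\operatorname{Im}(z)^2}{(1-\rho)^2} < 1 \right\}.$$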

This is a natural generalization of the circular law (Theorem 1.2). Figure 1 displays a numerical simulation of the eigenvalues of a real elliptic random matrix.
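A minimal sketch of Definition 2.1 and the elliptic law follows, assuming jointly Gaussian atom variables with correlation $\rho$ and a standard normal diagonal. The sampler `sample_gaussian_elliptic`, the seed, and the 10% slack factor on the ellipse are illustrative assumptions; the paper's setting allows far more general atom variables.

```python
import numpy as np

def sample_gaussian_elliptic(n, rho, rng):
    """Real elliptic matrix with Gaussian atoms: unit variances, E[xi1 * xi2] = rho."""
    g1 = rng.standard_normal((n, n))
    g2 = rng.standard_normal((n, n))
    upper = np.triu(g1, k=1)  # entries y_ij with i < j
    # Mirror entries: y_ji = rho * y_ij + sqrt(1 - rho^2) * independent noise.
    lower = np.triu(rho * g1 + np.sqrt(1.0 - rho**2) * g2, k=1).T
    diagonal = np.diag(rng.standard_normal(n))  # zeta taken standard normal here
    return upper + lower + diagonal

rng = np.random.default_rng(1)
n, rho = 400, 0.5
Y = sample_gaussian_elliptic(n, rho, rng)

# Empirical correlation of the mirror pairs (y_ij, y_ji), i < j.
iu = np.triu_indices(n, k=1)
mirror_corr = np.mean(Y[iu] * Y.T[iu])

# Fraction of eigenvalues of Y / sqrt(n) inside a slightly enlarged ellipse.
lam = np.linalg.eigvals(Y / np.sqrt(n))
slack = 1.1
inside = (lam.real / (slack * (1 + rho)))**2 + (lam.imag / (slack * (1 - rho)))**2 < 1
fraction_inside = inside.mean()
```

The slack factor only absorbs finite-size fluctuations at the edge of the spectrum; it is not part of the limit theorem.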

In this note, we consider the product of independent real elliptic random matrices. In particular, we assume each real elliptic random matrix has atom variables $(\xi_1, \xi_2), \zeta$ which satisfy the following conditions.

###### Assumption 2.2.

There exists $\tau > 0$ such that the following conditions hold.

1. $\xi_1, \xi_2$ both have mean zero and unit variance.

2. $\mathbb{E}|\xi_1|^{2+\tau} + \mathbb{E}|\xi_2|^{2+\tau} < \infty$.

3. $\rho := \mathbb{E}[\xi_1 \xi_2]$ satisfies $|\rho| < 1$.

4. $\zeta$ has mean zero and finite variance.

In our main result below, we show that the limiting distribution $F_m$ (with density given by (1.2)) from Theorem 1.3 for the product of $m$ independent iid random matrices is also the limiting distribution for the product of $m$ independent elliptic random matrices. In other words, the limit law for the product of independent random matrices is independent of the limit laws for the individual matrices themselves. This type of universality was first considered by Burda, Janik, and Waclaw in  for matrices with Gaussian entries; see also . Figure 2 displays several numerical simulations which illustrate this phenomenon.

###### Theorem 2.3.

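Let $m > 1$ be an integer. For each $1 \leq k \leq m$, let $(\xi_{k,1}, \xi_{k,2}), \zeta_k$ be real random elements that satisfy Assumption 2.2. For each $N \geq 1$ and $1 \leq k \leq m$, let $Y_{N,k}$ be an $N \times N$ real elliptic random matrix with atom variables $(\xi_{k,1}, \xi_{k,2}), \zeta_k$, and assume $Y_{N,1}, \ldots, Y_{N,m}$ are independent. Then the ESD of the product

$$P_N := N^{-m/2} \, Y_{N,1} \cdots Y_{N,m}$$

converges almost surely to $F_m$ (with density given by (1.2)) as $N \to \infty$.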
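The statement of Theorem 2.3 can be explored numerically. The sketch below multiplies two Gaussian elliptic matrices with different correlations and compares the fraction of eigenvalues of the product in $\{|z| \leq r\}$ with $r^{2/m}$, the value obtained by integrating (1.2). The sampler, the seed, the matrix size, and the correlations $0.5$ and $-0.3$ are illustrative assumptions, and agreement at finite $N$ is only approximate.

```python
import numpy as np

def sample_gaussian_elliptic(n, rho, rng):
    """Hypothetical helper: Gaussian elliptic matrix with mirror correlation rho."""
    g1 = rng.standard_normal((n, n))
    g2 = rng.standard_normal((n, n))
    upper = np.triu(g1, k=1)
    lower = np.triu(rho * g1 + np.sqrt(1.0 - rho**2) * g2, k=1).T
    return upper + lower + np.diag(rng.standard_normal(n))

rng = np.random.default_rng(2)
n, rhos = 400, [0.5, -0.3]
m = len(rhos)

# P_N = N^{-m/2} Y_{N,1} ... Y_{N,m}
product = np.eye(n)
for rho in rhos:
    product = product @ sample_gaussian_elliptic(n, rho, rng)
product *= n ** (-m / 2)

moduli = np.abs(np.linalg.eigvals(product))
r = 0.5
empirical = np.mean(moduli <= r)   # fraction of eigenvalues with |lambda| <= r
predicted = r ** (2.0 / m)         # integral of f_m over {|z| <= r}
```

Changing the entries of `rhos` (within $|\rho| < 1$) should leave `predicted` unchanged, which is the universality phenomenon described above.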

More generally, we establish a version of Theorem 2.3 where each elliptic random matrix is perturbed by a deterministic, low rank matrix with small Hilbert-Schmidt norm. In fact, Theorem 2.3 will follow from Theorem 2.4 below. We recall that, for any matrix $M$, the Hilbert-Schmidt norm $\|M\|_2$ is given by the formula

$$\|M\|_2 := \sqrt{\operatorname{tr}(M M^\ast)} = \sqrt{\operatorname{tr}(M^\ast M)}. \tag{2.1}$$

###### Theorem 2.4.

Let $m > 1$ be an integer. For each $1 \leq k \leq m$, let $(\xi_{k,1}, \xi_{k,2}), \zeta_k$ be real random elements that satisfy Assumption 2.2. For each $N \geq 1$ and $1 \leq k \leq m$, let $Y_{N,k}$ be an $N \times N$ real elliptic random matrix with atom variables $(\xi_{k,1}, \xi_{k,2}), \zeta_k$, and assume $Y_{N,1}, \ldots, Y_{N,m}$ are independent. For each $1 \leq k \leq m$, let $A_{N,k}$ be a deterministic $N \times N$ matrix, and assume

 (2.2)

for some . Then the ESD of the product

$$P_N := N^{-m/2} \prod_{k=1}^m \left( Y_{N,k} + A_{N,k} \right) \tag{2.3}$$

converges almost surely to $F_m$ (with density given by (1.2)) as $N \to \infty$.

###### Remark 2.5.

We conjecture that items (2) and (3) from Assumption 2.2 are not required for Theorem 2.4 to hold. Indeed, in view of Theorem 1.2 and , it is natural to conjecture that need only have two finite moments. Also, our proof of Theorem 2.4 can almost be completed under the assumption that . We only require that in Section 5 in order to control the least singular value of matrices of the form , where is a deterministic matrix whose entries are bounded by , for some . See Remark 5.3 and Theorem 2.8 below for further details.

###### Remark 2.6.

Among other things, the perturbation by $A_{N,k}$ in Theorem 2.4 allows one to consider elliptic random matrices with nonzero mean. Indeed, let $\mu$ be a real number, and assume each entry of $A_{N,k}$ takes the value $\mu$. Then $Y_{N,k} + A_{N,k}$ is an elliptic random matrix whose atom variables have mean $\mu$.

###### Remark 2.7.

In , a result similar to Theorem 2.3 is proved under a different set of assumptions.

As noted above, when $\xi_1 = \xi_2$ almost surely (so that $\rho = 1$), the matrix $Y_N$ is known as a real symmetric Wigner matrix. Theorem 2.4 requires that $|\rho_k| < 1$, but in the special case when $m = 2$, we are able to extend our proof to show that the same result holds for the product of two independent real symmetric Wigner matrices.

###### Theorem 2.8.

Let $\xi_{1,1}, \xi_{2,1}$ be real random variables with mean zero and unit variance, and which satisfy

$$\mathbb{E}|\xi_{1,1}|^{2+\tau} + \mathbb{E}|\xi_{2,1}|^{2+\tau} < \infty$$

for some $\tau > 0$. For each $N \geq 1$ and $k = 1, 2$, let $Y_{N,k}$ be an $N \times N$ real symmetric matrix whose diagonal entries and upper diagonal entries are iid copies of $\xi_{k,1}$, and assume $Y_{N,1}$ and $Y_{N,2}$ are independent. Then the ESD of the product

$$P_N := N^{-1} \, Y_{N,1} Y_{N,2}$$

converges almost surely to $F_2$ (with density given by (1.2) when $m = 2$) as $N \to \infty$.

Figure 2. The plots show the eigenvalues of the product of two independent $1000 \times 1000$ elliptic random matrices. The plot in the upper-left corner shows the eigenvalues of the product of two identically distributed elliptic random matrices with Gaussian entries when $\rho_1 = \rho_2 = 1/2$. The upper-right plot depicts the eigenvalues of the product of two independent Wigner matrices. The bottom-left plot shows the eigenvalues of the product of two iid random matrices. The plot in the bottom-right corner contains the eigenvalues of the product of a Wigner matrix and an independent iid random matrix.

### 2.1. Overview and outline

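We begin by outlining the proof of Theorem 2.4. Instead of directly considering $P_N$, we introduce a linearized random matrix, $\frac{1}{\sqrt{N}}(Y_N + A_N)$, where $Y_N$ and $A_N$ are $mN \times mN$ block matrices of the form

$$Y_N := \begin{bmatrix} 0 & Y_{N,1} & & & 0 \\ 0 & 0 & Y_{N,2} & & 0 \\ & & \ddots & \ddots & \\ 0 & & & 0 & Y_{N,m-1} \\ Y_{N,m} & & & & 0 \end{bmatrix} \tag{2.4}$$

and

$$A_N := \begin{bmatrix} 0 & A_{N,1} & & & 0 \\ 0 & 0 & A_{N,2} & & 0 \\ & & \ddots & \ddots & \\ 0 & & & 0 & A_{N,m-1} \\ A_{N,m} & & & & 0 \end{bmatrix}. \tag{2.5}$$

The following theorem gives the limiting distribution of $\frac{1}{\sqrt{N}}(Y_N + A_N)$, from which we will deduce our main theorem as a corollary.

###### Theorem 2.9.

Under the assumptions of Theorem 2.4, the ESD of $\frac{1}{\sqrt{N}}(Y_N + A_N)$ converges almost surely to the circular law $F_{\mathrm{circ}}$ as $N \to \infty$.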

In Section 3, we show that Theorem 2.4 is a short corollary of Theorem 2.9. This same linearization trick was used in  to study products of non-Hermitian matrices with iid entries. Similar techniques were also used in [7, 30] to study general self-adjoint polynomials of self-adjoint random matrices.
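The algebra behind the linearization can be illustrated on small matrices. In the sketch below (unscaled, with arbitrary Gaussian blocks standing in for $Y_{N,k} + A_{N,k}$; the sizes and seed are illustrative assumptions), the $m$-th power of the cyclic block matrix is block diagonal and its first diagonal block is the product of the factors.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
factors = [rng.standard_normal((n, n)) for _ in range(m)]

# Block matrix with factor k in block position (k, k+1) and the last factor in (m, 1).
Z = np.zeros((m * n, m * n))
for k, block in enumerate(factors):
    row, col = k, (k + 1) % m
    Z[row * n:(row + 1) * n, col * n:(col + 1) * n] = block

Zm = np.linalg.matrix_power(Z, m)
product = np.linalg.multi_dot(factors)  # Y_1 Y_2 ... Y_m

# Z^m is block diagonal; its first diagonal block is the product itself.
first_block = Zm[:n, :n]
block_diagonal_mask = np.kron(np.eye(m), np.ones((n, n)))
off_diagonal_mass = np.abs(Zm - block_diagonal_mask * Zm).max()
```

The remaining diagonal blocks are cyclic rearrangements of the same product, so they share its eigenvalues; in particular the trace of `Zm` is `m` times the trace of `product`.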

Sections 4, 5, and 6 are dedicated to proving Theorem 2.9. Following the ideas of Girko [21, 22], we compute the limiting spectral measure of a non-Hermitian random matrix by employing the method of Hermitization. Given an $N \times N$ matrix $M$, we recall that the empirical spectral measure of $M$ is given by

$$\mu_M := \frac{1}{N} \sum_{i=1}^N \delta_{\lambda_i(M)},$$

where $\lambda_1(M), \ldots, \lambda_N(M)$ are the eigenvalues of $M$. We let $\nu_M$ denote the symmetric empirical measure built from the singular values of $M$. That is,

$$\nu_M := \frac{1}{2N} \sum_{i=1}^N \left( \delta_{\sigma_i(M)} + \delta_{-\sigma_i(M)} \right),$$

where $\sigma_1(M) \geq \cdots \geq \sigma_N(M) \geq 0$ are the singular values of $M$. In particular,

$$\sigma_1(M) := \sup_{\|x\| = 1} \|Mx\|$$

is the largest singular value of $M$ and

$$\sigma_N(M) := \inf_{\|x\| = 1} \|Mx\|$$

is the smallest singular value, both of which will play a key role in our analysis below.

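The key observation of Girko [21, 22] relates the empirical spectral measure of a non-Hermitian matrix to that of a Hermitian matrix. To illustrate the connection, consider the Cauchy–Stieltjes transform $s_N$ of the measure $\mu_M$, where $M$ is an $N \times N$ matrix, given by

$$s_N(z) := \frac{1}{N} \sum_{i=1}^N \frac{1}{\lambda_i(M) - z} = \int_{\mathbb{C}} \frac{1}{x - z} \, \mu_M(dx),$$

for $z \in \mathbb{C}$ not an eigenvalue of $M$. Since $s_N$ is analytic everywhere except at the poles (which are exactly the eigenvalues of $M$), the real part of $s_N$ determines the eigenvalues. Let $\sqrt{-1}$ denote the imaginary unit, and set $z = s + \sqrt{-1}\, t$. Then we can write the real part of $s_N(z)$ as

$$\begin{aligned} \operatorname{Re}(s_N(z)) &= \frac{1}{N} \sum_{i=1}^N \frac{\operatorname{Re}(\lambda_i(M)) - s}{|\lambda_i(M) - z|^2} \\ &= -\frac{1}{2N} \sum_{i=1}^N \frac{\partial}{\partial s} \log |\lambda_i(M) - z|^2 \\ &= -\frac{1}{2N} \frac{\partial}{\partial s} \log \det (M - zI)(M - zI)^\ast \\ &= -\frac{\partial}{\partial s} \int_0^\infty \log x^2 \, \nu_{M - zI}(dx), \end{aligned}$$

where $I$ denotes the identity matrix. In other words, the task of studying $\mu_M$ reduces to studying the measures $\{\nu_{M - zI}\}_{z \in \mathbb{C}}$. The difficulty now is that the function $\log(\cdot)$ has two poles, one at infinity and one at zero. The largest singular value can easily be bounded by a polynomial in $N$. The main difficulty is controlling the least singular value.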
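The chain of equalities above can be sanity-checked numerically. The sketch below compares the logarithmic potential computed from eigenvalues with the one computed from singular values of $M - zI$, and then compares $\operatorname{Re}(s_N(z))$ with a finite-difference derivative in $s$. The test matrix, the point $z$, the seed, and the step size `h` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
M = rng.standard_normal((n, n)) / np.sqrt(n)
z = 0.3 + 0.2j

eigenvalues = np.linalg.eigvals(M)
singular_values = np.linalg.svd(M - z * np.eye(n), compute_uv=False)

# (1/N) sum_i log|lambda_i - z|  versus  (1/N) sum_i log sigma_i(M - zI).
log_potential_eigs = np.mean(np.log(np.abs(eigenvalues - z)))
log_potential_svals = np.mean(np.log(singular_values))

def re_stieltjes(s, t):
    """Real part of s_N at the point s + sqrt(-1) * t."""
    return np.mean(1.0 / (eigenvalues - (s + 1j * t))).real

def log_potential(s, t):
    """(1/N) sum_i log sigma_i(M - (s + sqrt(-1) * t) I)."""
    sv = np.linalg.svd(M - (s + 1j * t) * np.eye(n), compute_uv=False)
    return np.mean(np.log(sv))

# Central finite difference in s = Re(z) of the singular-value log potential.
h = 1e-5
derivative = (log_potential(z.real + h, z.imag) - log_potential(z.real - h, z.imag)) / (2 * h)
```

Up to discretization error, `re_stieltjes(z.real, z.imag)` equals `-derivative`, which is the content of the displayed identity.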

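In order to study $\nu_M$ it is useful to note that it is also the empirical spectral measure for the Hermitization of $M$. The Hermitization of $M$ is defined to be

$$H := \begin{bmatrix} 0 & M \\ M^\ast & 0 \end{bmatrix}.$$

For an $N \times N$ matrix $M$, the Stieltjes transform of $\nu_{M - zI}$ is also the trace of the Hermitized resolvent. That is, for $\eta \in \mathbb{C}^+ := \{ w \in \mathbb{C} : \operatorname{Im}(w) > 0 \}$, we have

$$\int_{\mathbb{R}} \frac{1}{x - \eta} \, \nu_{M - zI}(dx) = \frac{1}{2N} \operatorname{tr}(R(q)),$$

where

$$R(q) := (H - q \otimes I_N)^{-1}, \qquad q := \begin{bmatrix} \eta & z \\ \bar{z} & \eta \end{bmatrix}.$$

Here $q \otimes I_N$ denotes the Kronecker product of the matrix $q$ and the identity matrix $I_N$.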
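The trace identity can be verified directly on a small example. The following sketch builds $H$, $q$, and $R(q)$ as defined above and compares both sides; the test matrix, the seed, and the particular values of $z$ and $\eta$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
M = rng.standard_normal((n, n)) / np.sqrt(n)
z, eta = 0.4 - 0.1j, 0.2 + 0.5j

singular_values = np.linalg.svd(M - z * np.eye(n), compute_uv=False)

# Hermitization of M and the 2x2 spectral parameter q.
H = np.block([[np.zeros((n, n)), M], [M.conj().T, np.zeros((n, n))]])
q = np.array([[eta, z], [np.conj(z), eta]])
R = np.linalg.inv(H - np.kron(q, np.eye(n)))

# Left side: Stieltjes transform of the symmetrized singular value measure.
symmetrized = np.concatenate([singular_values, -singular_values])
stieltjes = np.mean(1.0 / (symmetrized - eta))

# Right side: normalized trace of the Hermitized resolvent.
normalized_trace = np.trace(R) / (2 * n)
```

The identity holds because $H - q \otimes I_N$ is the Hermitization of $M - zI$ shifted by $-\eta$, whose eigenvalues are $\pm \sigma_i(M - zI) - \eta$.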

Typically, in order to estimate the measures $\nu_{M - zI}$, one shows that the Stieltjes transform approximately satisfies a fixed point equation. Then one can show that this Stieltjes transform is close to the Stieltjes transform that exactly solves the fixed point equation. Because of the dependencies between entries in the matrix $Y_N$, directly computing the trace of the resolvent of the Hermitization of $Y_N$ is troublesome. To circumvent this issue, in Section 4 we take the partial trace rather than the full trace of the resolvent and consider a matrix-valued Stieltjes transform. Then we show this partial trace approximately satisfies a matrix-valued fixed point equation.

In Section 5, we deduce a bound for the least singular value of the matrix from the known bounds on the least singular values of the individual matrices . We finally complete the proof of Theorem 2.9 in Section 6.

The proof of Theorem 2.8 is very similar to the proof of Theorem 2.4. In fact, there are only a few places in the proof of Theorem 2.4 where the condition $|\rho_k| < 1$ is required. We prove Theorem 2.8 in Section 7.

### 2.2. A remark from free probability

The fact that the limiting distribution of the product is isotropic when the limiting distributions of the individual matrices are not might be surprising at first. Free probability, which offers a natural way to study limits of random matrices by considering joint distributions of elements from a non-commutative probability space, can shed some light on this. In free probability, the natural distribution of non-normal elements is known as the Brown measure. For an introduction to free probability, we refer the reader to ; see  for further details about $R$-diagonal pairs as well as [12, 29] for computations of Brown measures. The distribution in Theorem 2.4 has also appeared in .

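A non-commutative probability space is a unital algebra $\mathcal{A}$ with a tracial state $\tau$. We say a collection of elements $a_1, \ldots, a_n \in \mathcal{A}$ are free if

$$\tau\left( p_1(a_{i(1)}) \cdots p_k(a_{i(k)}) \right) = 0$$

whenever $p_1, \ldots, p_k$ are polynomials such that $\tau(p_j(a_{i(j)})) = 0$ for all $1 \leq j \leq k$, and $i(1) \neq i(2) \neq \cdots \neq i(k)$.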

In free probability, there is a distinguished set of elements known as $R$-diagonal elements. We refer the reader to [32, Section 4.4] for complete details. These operators enjoy several nice properties. When they are non-singular, one such property is that their polar decomposition is $uh$, where $u$ is a Haar unitary operator, $h$ is a positive operator, and $u, h$ are free. As a result of this decomposition, their Brown measure is isotropic. Additionally, the set of $R$-diagonal operators is closed under addition and multiplication of free elements.

In many cases, the Brown measure can be computed using the techniques of [12, 29]; however, for the purposes of this note (and due to discontinuities of the Brown measure), we will instead focus on a purely random matrix approach when computing the limiting distribution.

We conclude this subsection by showing that the product of two elliptical elements is $R$-diagonal. We consider two elements for simplicity; however, the argument easily generalizes to the product of $m$ elliptical elements.

First, we decompose an elliptical operator into the sum of a semicircular and a circular element that are free from each other: . Since the sum of free $R$-diagonal elements is again $R$-diagonal, it suffices to consider each term in the sum individually and then observe that the terms are free from one another. Each term is of the form $x_1 x_2$, where each $x_i$ is either semicircular or circular, with polar decomposition $x_i = v_i h_i$, $i = 1, 2$, where $h_i$ is a quarter circular element and $v_i$ either has distribution 1/2 at -1 and 1/2 at 1 and commutes with $h_i$ (the semicircular case), or is Haar unitary and free from $h_i$ (the circular case). Then we consider the product:

 x1x2=v1h1v2h2.

We begin by introducing a new free Haar unitary $u$. Indeed, $x_1 x_2$ has the same distribution as

$$u v_1 h_1 u^\ast v_2 h_2.$$

Then $u v_1$ and $u^\ast v_2$ are Haar unitaries, and one can check they are free from each other and from $h_1$ and $h_2$. Since the product of $R$-diagonal elements remains $R$-diagonal, $x_1 x_2$ is $R$-diagonal. Repeating this process for each term leads to the sum of free $R$-diagonal operators.

### 2.3. Notation

We use asymptotic notation (such as $O, o, \Omega$) under the assumption that $N \to \infty$. We use $X \ll Y$, $Y \gg X$, $Y = \Omega(X)$, or $X = O(Y)$ to denote the bound $X \leq CY$ for all sufficiently large $N$ and for some constant $C$. Notations such as $X \ll_k Y$ and $X = O_k(Y)$ mean that the hidden constant $C$ depends on another constant $k$. We always allow the implicit constants in our asymptotic notation to depend on the integer $m$ from Theorem 2.4; we will not denote this dependence with a subscript. $X = o(Y)$ or $Y = \omega(X)$ means that $X/Y \to 0$ as $N \to \infty$.

$\|M\|$ is the spectral norm of the matrix $M$. $\|M\|_2$ denotes the Hilbert-Schmidt norm of $M$ (defined in (2.1)). We let $I_N$ denote the $N \times N$ identity matrix. Often we will just write $I$ for the identity matrix when the size can be deduced from the context.

We write a.s., a.a., and a.e. for almost surely, Lebesgue almost all, and Lebesgue almost everywhere respectively. We use $\sqrt{-1}$ to denote the imaginary unit and reserve $i$ as an index. We let $\mathbf{1}_E$ denote the indicator function of the event $E$.

We let $C$ and $K$ denote constants that are non-random and may take on different values from one appearance to the next. The notation $K_p$ means that the constant $K$ depends on another parameter $p$. We always allow the constants $C$ and $K$ to depend on the integer $m$ from Theorem 2.4; we will not denote this dependence with a subscript.

In view of Theorem 2.4 and Assumption 2.2, we define the correlations $\rho_k := \mathbb{E}[\xi_{k,1} \xi_{k,2}]$ for the atom variables $(\xi_{k,1}, \xi_{k,2})$, $1 \leq k \leq m$. In addition, we let $\tau > 0$ be such that

$$\sum_{k=1}^m \left( \mathbb{E}|\xi_{k,1}|^{2+\tau} + \mathbb{E}|\xi_{k,2}|^{2+\tau} \right) < \infty.$$

## 3. Proof of Theorem 2.4

We begin by proving Theorem 2.4 assuming Theorem 2.9. The majority of the paper will then be devoted to proving Theorem 2.9.

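We remind the reader that the matrices $P_N$, $Y_N$, and $A_N$ are defined in (2.3), (2.4), and (2.5), respectively.

###### Proof of Theorem 2.4.

Let $Z_N := \frac{1}{\sqrt{N}}(Y_N + A_N)$. Then $Z_N^m$ is a block diagonal matrix of the form

$$\begin{bmatrix} Z_{N,1} & & 0 \\ & \ddots & \\ 0 & & Z_{N,m} \end{bmatrix},$$

where $Z_{N,1} = P_N$ and $Z_{N,k}$ is the matrix

$$N^{-m/2} (Y_{N,k} + A_{N,k}) \cdots (Y_{N,m} + A_{N,m}) (Y_{N,1} + A_{N,1}) \cdots (Y_{N,k-1} + A_{N,k-1})$$

for $1 < k \leq m$.

Let $f : \mathbb{C} \to \mathbb{C}$ be a bounded and continuous function. Since each $Z_{N,k}$ has the same eigenvalues as $P_N$, we have

$$\int_{\mathbb{C}} f(z) \, d\mu_{P_N}(z) = \frac{1}{N} \sum_{i=1}^N f(\lambda_i(P_N)) = \frac{1}{mN} \sum_{i=1}^{mN} f(\lambda_i(Z_N^m)) = \int_{\mathbb{C}} f(z^m) \, d\mu_{Z_N}(z).$$

By Theorem 2.9, we have almost surely

$$\int_{\mathbb{C}} f(z^m) \, d\mu_{Z_N}(z) \longrightarrow \frac{1}{\pi} \int_{\mathbb{D}} f(z^m) \, d^2 z$$

as $N \to \infty$, where $\mathbb{D}$ is the unit disk in the complex plane centered at the origin and $d^2 z = d\operatorname{Re}(z) \, d\operatorname{Im}(z)$. Thus, by the transformation $z \mapsto z^m$, we obtain

$$\frac{1}{\pi} \int_{\mathbb{D}} f(z^m) \, d^2 z = \frac{m}{\pi} \int_{\mathbb{D}} f(z) \frac{1}{m^2} |z|^{\frac{2}{m} - 2} \, d^2 z,$$

where the factor of $m$ out front of the integral corresponds to the fact that the transformation maps the complex plane $m$ times onto itself.

Combining the computations above, we conclude that almost surely

$$\int_{\mathbb{C}} f(z) \, d\mu_{P_N}(z) \longrightarrow \frac{1}{\pi m} \int_{\mathbb{D}} f(z) |z|^{\frac{2}{m} - 2} \, d^2 z$$

as $N \to \infty$. Since $f$ was an arbitrary bounded and continuous function, the proof of Theorem 2.4 is complete. ∎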

## 4. A matrix-valued Stieltjes transform

In this section, we define a matrix-valued Stieltjes transform and introduce the relevant notation and limiting objects. Then we show that this Stieltjes transform concentrates around its expectation and estimate the error between its expectation and the limiting transform.

Here and in the sequel, we will take advantage of the following form for the inverse of a partitioned matrix (see, for instance, [33, Section 0.7.3]):

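$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} (A - B D^{-1} C)^{-1} & -A^{-1} B (D - C A^{-1} B)^{-1} \\ -(D - C A^{-1} B)^{-1} C A^{-1} & (D - C A^{-1} B)^{-1} \end{bmatrix}, \tag{4.1}$$

where $A$ and $D$ are square matrices (and all of the indicated inverses exist).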
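Formula (4.1) is easy to confirm numerically. The sketch below assembles the right-hand side from the two Schur complements and compares it with a direct inverse; the block sizes, the seed, and the diagonal shifts (added only so that the test blocks are comfortably invertible) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
p, r = 3, 2
A = rng.standard_normal((p, p)) + 3.0 * np.eye(p)
B = rng.standard_normal((p, r))
C = rng.standard_normal((r, p))
D = rng.standard_normal((r, r)) + 3.0 * np.eye(r)

inv = np.linalg.inv
schur_of_D = A - B @ inv(D) @ C  # Schur complement of the block D
schur_of_A = D - C @ inv(A) @ B  # Schur complement of the block A

# Right-hand side of (4.1), assembled block by block.
blockwise = np.block([
    [inv(schur_of_D), -inv(A) @ B @ inv(schur_of_A)],
    [-inv(schur_of_A) @ C @ inv(A), inv(schur_of_A)],
])
direct = inv(np.block([[A, B], [C, D]]))
```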

Set , and let . Let be the Hermitization of . Define the resolvent

$$R_N(q) := (H_N - q \otimes I_N)^{-1},$$

where

$$q := \begin{bmatrix} \eta I_m & z I_m \\ \bar{z} I_m & \eta I_m \end{bmatrix} \tag{4.2}$$

for .

By the Stieltjes inversion formula, can be recovered from . Because of the dependencies between matrix entries, each entry of the resolvent cannot be computed directly by the Schur complement. One possible way to compute resolvent entries is to follow the approach in [44, Section 4.3] and use a decoupling formula to compute matrix entries. See also [37, 40] for computations in the elliptical case. The dependencies introduce more terms to these computations, leading to a system of equations involving diagonal entries of each block of the resolvent. These equations do not seem to admit an obvious solution. Instead we offer a matrix-valued interpretation of these equations as well as a more direct derivation of the equations.

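In order to study the resolvent we will retain the block structure of $H_N$ and view $2mN \times 2mN$ matrices as elements of $2m \times 2m$ matrices tensored with $N \times N$ matrices. Taking this view, $R_N$ is a $2m$ by $2m$ matrix with $N$ by $N$ blocks. When we wish to refer to one of these blocks (or more generally any element of a $2m \times 2m$ matrix) we will use a superscript $ab$ for the $(a,b)$ entry. Instead of considering the full trace of $R_N$, we take the partial trace over the $N \times N$ matrix part of the tensor product and define $\Gamma_N(q) := (\mathrm{id}_{2m} \otimes \frac{1}{N} \operatorname{tr}) R_N(q)$. That is, $\Gamma_N$ is a $2m \times 2m$ matrix whose $(a,b)$ entry is the normalized trace of the $(a,b)$ block of $R_N$. In other words, $\Gamma_N^{ab} = \frac{1}{N} \operatorname{tr} R_N^{ab}$. To compute this partial trace we consider $R_{N;kk}$, the $2m \times 2m$ matrix whose $(a,b)$ entry is the $(k,k)$ entry of the block $R_N^{ab}$. Finally, we define the scalar

$$a_N(q) := \frac{1}{2m} \operatorname{tr} \Gamma_N(q).$$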

For each , let denote the matrix with the -th rows and -th columns of replaced by zeroes. Let be the Hermitization of . Define the resolvent

$$R_N^{(k)} := (H_N^{(k)} - q \otimes I_N)^{-1}, \tag{4.3}$$

and set .

Let be the matrix whose entry is the column of the block of , with the entry of each vector set to . Note that we use a semi-colon when we refer to matrix entries or columns, in contrast to the comma, which referred to a matrix.

Later in this section we will show that $\Gamma_N$ approximately satisfies the fixed point equation

$$\Gamma = -(q + \Sigma(\Gamma))^{-1} \tag{4.4}$$

with $\Sigma$ being a linear operator on $2m \times 2m$ matrices defined by:

$$\Sigma(A)_{ab} = \sum_{c,d=1}^{2m} \sigma(a,c;d,b) A_{cd}$$

where and is the matrix that is the block of the matrix , of course, the choice was arbitrary. More concretely, we define for any to be the column index of nonzero block in the row of . So

$$\Sigma(A)_{ab} = A_{a'a'} \delta_{ab} + \rho_a A_{a'a} \delta_{a'b},$$

where for we define . It is important that leaves diagonal entries of on the diagonal and that .

To describe the limiting matrix-valued Stieltjes transform, $\Gamma$, we first define $a(q)$, the Stieltjes transform corresponding to the circular law. That is, for each $z \in \mathbb{C}$, $a(q)$ is the unique Stieltjes transform that solves the equation

$$a(q) = \frac{a(q) + \eta}{|z|^2 - (a(q) + \eta)^2} \tag{4.5}$$

for all $\eta \in \mathbb{C}^+$; see [27, Section 3].

Let

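$$\Gamma(q) := \begin{bmatrix} -(a(q) + \eta) I_m & -z I_m \\ -\bar{z} I_m & -(a(q) + \eta) I_m \end{bmatrix}^{-1};$$

$$\Gamma(q) = \begin{bmatrix} a(q) I_m & \dfrac{z}{(a(q) + \eta)^2 - |z|^2} I_m \\ \dfrac{\bar{z}}{(a(q) + \eta)^2 - |z|^2} I_m & a(q) I_m \end{bmatrix}.$$

We recall that for a square matrix $A$, the imaginary part of $A$ is given by $\operatorname{Im} A := \frac{1}{2\sqrt{-1}}(A - A^\ast)$. We say $A$ has positive imaginary part if $\operatorname{Im} A$ is positive definite. It was shown in  that (4.4) has one solution with positive imaginary part and is therefore a matrix-valued Stieltjes transform. Furthermore, the last two equalities show that $\Gamma(q)$ is a solution to (4.4).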
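The agreement between the two displayed forms of $\Gamma(q)$ relies on (4.5). As a sketch, (4.5) can be rearranged into the cubic $a^3 + 2\eta a^2 + (\eta^2 + 1 - |z|^2) a + \eta = 0$, and for any root $a$ of this cubic the inverse form and the closed form should coincide. The code below checks this with scalar blocks ($m = 1$); the values of $z$ and $\eta$ are illustrative assumptions, and the code does not attempt to single out the Stieltjes-transform root.

```python
import numpy as np

z, eta = 0.5 + 0.3j, 0.1 + 0.7j

# (4.5) rearranged: a^3 + 2*eta*a^2 + (eta^2 + 1 - |z|^2)*a + eta = 0.
roots = np.roots([1.0, 2.0 * eta, eta**2 + 1.0 - abs(z) ** 2, eta])

def gamma_inverse_form(a):
    """First display: inverse of the 2x2 matrix built from a, eta, z."""
    return np.linalg.inv(np.array([[-(a + eta), -z], [-np.conj(z), -(a + eta)]]))

def gamma_closed_form(a):
    """Second display: explicit entries, valid when a solves (4.5)."""
    off = 1.0 / ((a + eta) ** 2 - abs(z) ** 2)
    return np.array([[a, z * off], [np.conj(z) * off, a]])

max_gap = max(np.abs(gamma_inverse_form(a) - gamma_closed_form(a)).max() for a in roots)
has_upper_half_plane_root = any(a.imag > 0 for a in roots)
```

The diagonal entries match precisely because $-(a + \eta) / ((a + \eta)^2 - |z|^2) = a$ is a restatement of (4.5).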

A good way to see that the solution to (4.4) is of the given form is to note that for large $\eta$, $\Gamma$ is approximately $-q^{-1}$. Then by analytic continuation, the entries of $q^{-1}$ that are non-zero must also be non-zero entries of $\Gamma$. Finally, this ansatz for the form of the solution is applied to (4.4) and iterated until the non-zero entries of $\Gamma$ are preserved by (4.4). Through this process one observes that the value of each $\rho_k$ does not affect the solution.

### 4.1. Concentration

In this section we show that $\Gamma_N$ concentrates around its expectation.

We introduce $\varepsilon$-nets as a convenient way to discretize a compact set. Let $\varepsilon > 0$. A set $X$ is an $\varepsilon$-net of a set $Y$ if for any $y \in Y$, there exists $x \in X$ such that $\|x - y\| \leq \varepsilon$. The following estimate for the maximum size of an $\varepsilon$-net is well-known and follows from a standard volume argument (see, for example, [43, Lemma 3.11]).

###### Lemma 4.1 (Lemma 3.11 from ).

Let $D$ be a compact subset of $\{ z \in \mathbb{C} : |z| \leq M \}$. Then $D$ admits an $\varepsilon$-net of size at most

$$\left( 1 + \frac{2M}{\varepsilon} \right)^2.$$
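The volume argument behind Lemma 4.1 can be illustrated with a greedy construction: keep a point only if it is farther than $\varepsilon$ from every point kept so far. The kept points are pairwise more than $\varepsilon$ apart, so disks of radius $\varepsilon/2$ around them are disjoint and fit inside a disk of radius $M + \varepsilon/2$, which gives the stated bound. The sketch below runs this on a finite random sample of the disk, which stands in for the compact set; the helper `greedy_net`, the seed, and the parameters are illustrative assumptions.

```python
import numpy as np

def greedy_net(points, eps):
    """Greedy eps-net: keep a point only if it is farther than eps from all kept points."""
    net = []
    for p in points:
        if all(abs(p - c) > eps for c in net):
            net.append(p)
    return net

rng = np.random.default_rng(7)
M, eps = 1.0, 0.25
# Finite sample of the disk {|z| <= M}, standing in for the compact set D.
samples = M * np.sqrt(rng.uniform(size=2000)) * np.exp(2j * np.pi * rng.uniform(size=2000))

net = greedy_net(samples, eps)
worst_distance = max(min(abs(p - c) for c in net) for p in samples)
bound = (1 + 2 * M / eps) ** 2
```

Every sample point lies within `eps` of the net by construction, and `len(net)` stays below `bound`.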
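
###### Lemma 4.2.

Let $M > 0$. Under the assumptions of Theorem 2.4, a.s.

$$\sup_{|z| \leq M} \ \sup_{|\eta| \leq M, \ \operatorname{Im}(\eta) \geq N^{-1/8}} \left\| \Gamma_N(q) - \mathbb{E} \Gamma_N(q) \right\| = O_M(N^{-1/8}).$$

###### Proof.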

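By the Borel-Cantelli lemma, it suffices to show that

$$\mathbb{P} \left( \sup_{|z| \leq M} \ \sup_{|\eta| \leq M, \ \operatorname{Im}(\eta) \geq N^{-1/8}} \left\| \Gamma_N(q) - \mathbb{E} \Gamma_N(q) \right\| \geq C N^{-1/8} \right) \leq \frac{1}{N^2},$$

for some constant $C > 0$.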

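Let $\mathcal{N}_1$ and $\mathcal{N}_2$ be $N^{-1}$-nets of $\{ z \in \mathbb{C} : |z| \leq M \}$ and $\{ \eta \in \mathbb{C} : |\eta| \leq M, \ \operatorname{Im}(\eta) \geq N^{-1/8} \}$ respectively. By Lemma 4.1,

$$|\mathcal{N}_1| + |\mathcal{N}_2| = O_M(N^2).$$

Let $\mathcal{N}$ be the set of all $q$ (defined by (4.2)) such that $z \in \mathcal{N}_1$ and $\eta \in \mathcal{N}_2$. Hence $|\mathcal{N}| = O_M(N^4)$. By the resolvent identity,

$$\left\| \Gamma_N(q) - \Gamma_N(q') \right\| \leq N^{1/4} \| q - q' \|.$$

Thus, by a standard $\varepsilon$-net argument, it suffices to show that

$$\mathbb{P} \left( \sup_{q \in \mathcal{N}} \left\| \Gamma_N(q) - \mathbb{E} \Gamma_N(q) \right\| \geq N^{-1/8} \right) \leq \frac{1}{N^2}.$$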

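By the union bound and Markov’s inequality, we have, for any $p > 0$,

$$\begin{aligned} \mathbb{P} \left( \sup_{q \in \mathcal{N}} \left\| \Gamma_N(q) - \mathbb{E} \Gamma_N(q) \right\| \geq N^{-1/8} \right) &\leq \sum_{q \in \mathcal{N}} \mathbb{P} \left( \left\| \Gamma_N(q) - \mathbb{E} \Gamma_N(q) \right\| \geq N^{-1/8} \right) \\ &\leq \sum_{q \in \mathcal{N}} N^{p/8} \, \mathbb{E} \left\| \Gamma_N(q) - \mathbb{E} \Gamma_N(q) \right\|^p. \end{aligned}$$

Therefore, it will suffice to show that for some $p$ sufficiently large, there exists a constant $K_p$ (depending only on $p$), such that

$$\mathbb{E} \left\| \Gamma_N(q) - \mathbb{E} \Gamma_N(q) \right\|^p \leq \frac{K_p}{N^{3p/8}}$$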

for any .

In fact, since is a matrix, we will show that, for every ,

$$\mathbb{E} \left| \Gamma_N^{ab}(q) - \mathbb{E} \Gamma_N^{ab}(q) \right|^p \leq \frac{K_p}{N^{3p/8}} \tag{4.6}$$

for any and .

Fix . Let denote the conditional expectation with respect to the first rows and columns of each matrix .

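We now rewrite $\Gamma_N^{ab}(q) - \mathbb{E} \Gamma_N^{ab}(q)$ as a martingale difference sequence. Indeed,

$$\begin{aligned} \Gamma_N^{ab}(q) - \mathbb{E} \Gamma_N^{ab}(q) &= \sum_{k=1}^N (\mathbb{E}_k - \mathbb{E}_{k-1}) \Gamma_N^{ab}(q) \\ &= \sum_{k=1}^N (\mathbb{E}_k - \mathbb{E}_{k-1}) \left( \Gamma_N^{ab}(q) - \Gamma_N^{(k)ab}(q) \right). \end{aligned}$$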

Since is at most rank , the resolvent identity implies that is at most rank and . This then gives the bound

 (4.7)

for any . Thus, by the Burkholder inequality  (see for example [9, Lemma 2.12] for a complex-valued version of the Burkholder inequality), for any ,

 E∣∣ΓabN(q)−EΓabN(q)∣∣p ≤KpE(N∑k=1∣∣Γab