Spectrum of large random reversible Markov chains: Heavy-tailed weights on the complete graph

Charles Bordenave (IMT UMR5219 CNRS, Université de Toulouse III, France; charles.bordenave(at)math.univ-toulouse.fr, http://www.math.univ-toulouse.fr/~bordenave/)
Pietro Caputo (Dipartimento di Matematica, Università Roma Tre, Italy; caputo(at)mat.uniroma3.it, http://www.mat.uniroma3.it/users/caputo/)
Djalil Chafaï (LAMA UMR8050 CNRS, Université Paris-Est Marne-la-Vallée, France; djalil(at)chafai.net, http://djalil.chafai.net/)

Received April 2009; revised May 2010.
Abstract

We consider the random reversible Markov kernel $K$ obtained by assigning i.i.d. nonnegative weights to the edges of the complete graph over $n$ vertices and normalizing by the corresponding row sum. The weights are assumed to be in the domain of attraction of an $\alpha$-stable law, $\alpha \in (0,2)$. When $1 \le \alpha < 2$, we show that for a suitable regularly varying sequence $\kappa_n$ of index $1 - 1/\alpha$, the limiting spectral distribution $\mu_\alpha$ of $\kappa_n K$ coincides with the one of the random symmetric matrix of the un-normalized weights (Lévy matrix with i.i.d. entries). In contrast, when $0 < \alpha < 1$, we show that the empirical spectral distribution of $K$ converges without rescaling to a nontrivial law $\tilde{\mu}_\alpha$ supported on $[-1,1]$, whose moments are the return probabilities of the random walk on the Poisson weighted infinite tree (PWIT) introduced by Aldous. The limiting spectral distributions are given by the expected value of the random spectral measure at the root of suitable self-adjoint operators defined on the PWIT. This characterization is used together with recursive relations on the tree to derive some properties of $\mu_\alpha$ and $\tilde{\mu}_\alpha$. We also study the limiting behavior of the invariant probability measure of $K$.

Ann. Probab. 39 (2011), no. 4, 1544–1590. doi:10.1214/10-AOP587

Supported in part by NSF Grant DMS-03-01795 and by the European Research Council through the “Advanced Grant” PTRELSS 228032.

AMS subject classifications: 47A10, 15A52, 60K37, 05C80.

Keywords: spectral theory, objective method, operator convergence, stochastic matrices, random matrices, reversible Markov chains, random walks, random graphs, probability on trees, random media, heavy-tailed distributions, $\alpha$-stable laws, Poisson–Dirichlet laws, point processes, eigenvectors.

1 Introduction

Let $K_n$ denote the complete graph with vertex set $V_n = \{1, \ldots, n\}$ and edge set $E_n = \{\{i,j\} : i, j \in V_n\}$, including the loops $\{i,i\}$, $i \in V_n$. Assign a nonnegative random weight (or conductance) $U_{i,j} = U_{j,i}$ to each edge $\{i,j\} \in E_n$, and assume that the weights $(U_{i,j})_{1 \le i \le j}$ are i.i.d. with common law $L$ independent of $n$. This defines a random network, or weighted graph. Next, consider the random walk on $V_n$ defined by the transition probabilities

$$K(i,j) := \frac{U_{i,j}}{\rho_i}, \qquad \rho_i := \sum_{k=1}^n U_{i,k}. \tag{1}$$

The Markov kernel $K$ is reversible with respect to the measure $(\rho_i)_{1 \le i \le n}$ in that

$$\rho_i K(i,j) = \rho_j K(j,i)$$

for all $i, j \in V_n$. Note that we have not assumed that $L$ has no atom at $0$. If $\rho_i = 0$ for some $i$, then for that index we set $K(i,i) := 1$ and $K(i,j) := 0$ for $j \ne i$. However, as soon as $L$ is not concentrated at $0$, then almost surely, for all $n$ sufficiently large, $\rho_i > 0$ for all $i \in V_n$, $K$ is irreducible and aperiodic, and $(\rho_i)_{1 \le i \le n}$ is its unique invariant measure, up to normalization (see, e.g., bordenave-caputo-chafai ()).
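As a concrete illustration (ours, not from the paper), the following sketch builds a finite realization of the kernel (1) with the Pareto-type weights $U = V^{-1/\alpha}$, $V$ uniform on $[0,1]$, used in the examples below, and checks the reversibility identity $\rho_i K(i,j) = \rho_j K(j,i)$; all variable names are ours.

```python
import numpy as np

def random_kernel(n, alpha, rng):
    """Markov kernel K(i,j) = U_ij / rho_i from i.i.d. weights U_ij = V^(-1/alpha)."""
    V = 1.0 - rng.random((n, n))            # uniform on (0, 1]
    W = V ** (-1.0 / alpha)                 # heavy-tailed weights, tail index alpha
    U = np.triu(W) + np.triu(W, 1).T        # symmetrize: U_ij = U_ji, loops kept
    rho = U.sum(axis=1)                     # row sums rho_i
    return U / rho[:, None], rho

rng = np.random.default_rng(0)
K, rho = random_kernel(500, alpha=0.5, rng=rng)
# reversibility: rho_i K(i,j) = U_ij = rho_j K(j,i)
assert np.allclose(rho[:, None] * K, (rho[:, None] * K).T)
```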

For any $n \times n$ matrix $A$ with real eigenvalues $\lambda_1(A), \ldots, \lambda_n(A)$, the Empirical Spectral Distribution (ESD) is the discrete probability measure with at most $n$ atoms defined by

$$\mu_A := \frac{1}{n} \sum_{k=1}^n \delta_{\lambda_k(A)}.$$

All matrices to be considered in this work have real spectrum, and the eigenvalues will be labeled in such a way that $\lambda_n(A) \le \cdots \le \lambda_1(A)$.

Note that (1) defines a square random Markov matrix $K = (K(i,j))_{1 \le i,j \le n}$ whose entries are not independent, due to the normalizing sums $\rho_i$. By reversibility, $K$ is self-adjoint in $\ell^2(\rho)$ and its spectrum is real. Moreover, $\lambda_1(K) = 1$, and $\lambda_k(K) \in [-1,1]$ for all $k$. Since $K$ is Markov, its ESD carries further probabilistic content. Namely, for any $k \in \mathbb{N}$, if $p_k(i,i)$ denotes the probability that the random walk on $K_n$ started at $i$ returns to $i$ after $k$ steps, then the $k$th moment of $\mu_K$ satisfies

$$\int_{\mathbb{R}} x^k \, \mu_K(dx) = \frac{1}{n} \operatorname{Tr}(K^k) = \frac{1}{n} \sum_{i=1}^n p_k(i,i). \tag{2}$$
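Continuing the snippet above, a minimal numerical check of (2): the $k$th moment of the ESD of $K$ equals the averaged $k$-step return probability $\frac{1}{n}\operatorname{Tr}(K^k)$.

```python
k = 4
eigs = np.linalg.eigvals(K).real                          # spectrum of K is real by reversibility
moment = np.mean(eigs ** k)                               # k-th moment of the ESD
avg_return = np.trace(np.linalg.matrix_power(K, k)) / len(K)
assert np.isclose(moment, avg_return)
```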

Convergence of the ESD

The asymptotic behavior of $\mu_K$ as $n \to \infty$ depends strongly on the tail of $L$ at infinity. When $L$ has finite mean, we set $\mathbb{E}[U_{1,2}] = 1$. This is no loss of generality, since $K$ is invariant under the dilation $U \to \lambda U$, $\lambda > 0$. If $L$ has a finite second moment, we write $\sigma^2$ for the variance.

The following result, from bordenave-caputo-chafai (), states that if $\sigma^2 < \infty$, then the bulk of the spectrum of $K$ behaves, when $n \to \infty$, as if we had truly i.i.d. entries (Wigner matrix). Without loss of generality, we assume that the weights $(U_{i,j})_{1 \le i \le j \le n}$ come from the truncation of a unique infinite table $(U_{i,j})_{1 \le i \le j}$ of i.i.d. random variables of law $L$. This gives a meaning to the almost sure (a.s.) convergence of $\mu_K$. The symbol $\rightsquigarrow$ denotes weak convergence of measures with respect to continuous bounded functions.

Theorem 1.1 (Wigner-like behavior)

If $L$ has mean $1$ and variance $\sigma^2 \in (0, \infty)$, then a.s.

$$\mu_{\sqrt{n}\, K} \rightsquigarrow \mathcal{W}_{2\sigma}, \tag{3}$$

where $\mathcal{W}_{2\sigma}$ is the Wigner semi-circle law on $[-2\sigma, 2\sigma]$. Moreover, if $L$ has finite fourth moment, then $\sqrt{n}\, \lambda_2(K)$ and $\sqrt{n}\, \lambda_n(K)$ converge a.s. to the edges $2\sigma$ and $-2\sigma$ of the limiting support.

This Wigner-like scenario can be dramatically altered if we allow $L$ to have a heavy tail at infinity. For any $\alpha \in (0, 2)$, we say that $L$ belongs to the class $\mathcal{H}_\alpha$ if $L$ is supported in $[0, \infty)$ and has a regularly varying tail of index $-\alpha$, that is, for all $t > 0$,

$$L((t, \infty)) = \frac{\ell(t)}{t^\alpha}, \tag{4}$$

where $\ell$ is a function with slow variation at $\infty$; that is, for any $x > 0$,

$$\lim_{t \to \infty} \frac{\ell(xt)}{\ell(t)} = 1.$$

Set $a_n := \inf\{a > 0 : n L((a, \infty)) \le 1\}$. Then $a_n \to \infty$ as $n \to \infty$, and

$$\lim_{n \to \infty} n L((a_n, \infty)) = 1. \tag{5}$$

It is well known that $a_n$ has regular variation at $\infty$ with index $1/\alpha$, that is,

$$a_n = n^{1/\alpha}\, \ell_1(n)$$

for some function $\ell_1$ with slow variation at $\infty$ (see, e.g., Resnick resnick (), Section 2.2.1). As an example, if $V$ is uniformly distributed on the interval $[0,1]$, then for every $\alpha \in (0,2)$, the law of $V^{-1/\alpha}$, supported in $[1, \infty)$, belongs to $\mathcal{H}_\alpha$. In this case, $\ell(t) = 1$ for $t \ge 1$, and $a_n = n^{1/\alpha}$.
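To make the normalization concrete, here is a small sanity check (ours), with the Pareto example just given, where $\ell \equiv 1$ and $a_n = n^{1/\alpha}$: the empirical analogue of $n L((a_n, \infty))$ is close to $1$.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n = 0.5, 10_000
U = (1.0 - rng.random(200 * n)) ** (-1.0 / alpha)   # many i.i.d. samples of U = V^(-1/alpha)
a_n = n ** (1.0 / alpha)                            # a_n = n^(1/alpha) since ell == 1
print(n * np.mean(U > a_n))                         # ~ n * L((a_n, infinity)) -> 1
```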

To understand the limiting behavior of the spectrum of $K$ in the heavy-tailed case, it is important to consider first the symmetric i.i.d. matrix corresponding to the un-normalized weights. More generally, we introduce the random symmetric matrix $A$ defined by

$$A(i,j) := a_n^{-1}\, x_{i,j}, \qquad 1 \le i \le j \le n, \tag{6}$$

where $(x_{i,j})_{1 \le i \le j}$ are i.i.d. such that $|x_{1,1}|$ has law in $\mathcal{H}_\alpha$ with $\alpha \in (0,2)$, and

$$\lim_{t \to \infty} \frac{\mathbb{P}(x_{1,1} \ge t)}{\mathbb{P}(|x_{1,1}| \ge t)} = \theta \in [0,1]. \tag{7}$$

It is well known that, for $\alpha \in (0,2)$, a random variable $x$ is in the domain of attraction of an $\alpha$-stable law iff the law of $|x|$ is in $\mathcal{H}_\alpha$ and the limit (7) exists (cf. Feller Feller (), Theorem IX.8.1a). It will be useful to view the entries $x_{i,j} = a_n A(i,j)$ in (6) as the marks across the edge $\{i,j\}$ of a random network, just as the marks $U_{i,j}$ defined the network introduced above.

Remarkable works have been devoted recently to the asymptotic behavior of the ESD of matrices defined by (6), sometimes called Lévy matrices. The analysis of the Limiting Spectral Distribution (LSD) for $\alpha \in (0,2)$ is considerably harder than in the finite second moment case (Wigner matrices), and the LSD is nonexplicit. Theorem 1.2 below has been investigated by the physicists Bouchaud and Cizeau BouchaudCizeau () and rigorously proved by Ben Arous and Guionnet benarous-guionnet (), and Belinschi, Dembo and Guionnet belinschi () (see also Zakharevich zakharevich () for related results).

Theorem 1.2 (Symmetric i.i.d. matrix, $0 < \alpha < 2$)

For every $\alpha \in (0,2)$, there exists a symmetric probability distribution $\mu_\alpha$ on $\mathbb{R}$, depending only on $\alpha$, such that [with the notation of (5) and (6)] a.s. $\mu_A \rightsquigarrow \mu_\alpha$.

In Section 3.2, we give a new independent proof of Theorem 1.2. The key idea of our proof is to exhibit a limiting self-adjoint operator for the sequence of matrices $A$, defined on a suitable Hilbert space, and then use known spectral convergence theorems for operators. The limiting operator will be defined as the “adjacency matrix” of an infinite rooted tree with random edge weights, the so-called Poisson weighted infinite tree (PWIT) introduced by Aldous aldous92 () (see also aldoussteele ()). In other words, the PWIT will be shown to be the local weak limit of the random network when the edge marks are properly rescaled. In this setting, the LSD arises as the expected value of the (random) spectral measure of the operator at the root of the tree. The PWIT and the limiting operator are defined in Section 2. Our method of proof can be seen as a variant of the resolvent method, based on local convergence of operators. It is also well suited to investigate properties of the LSD (cf. Theorem 1.6 below).

Let us now come back to our random reversible Markov kernel $K$ defined by (1) from weights with law $L \in \mathcal{H}_\alpha$. We obtain different limiting behavior in the two regimes $1 < \alpha < 2$ and $0 < \alpha < 1$. The case $\alpha > 2$ corresponds to a Wigner-type behavior (special case of Theorem 1.1). We set

$$\kappa_n := \frac{n}{a_n},$$

a regularly varying sequence of index $1 - 1/\alpha$.

Theorem 1.3 (Reversible Markov matrix, $1 < \alpha < 2$)

Let $\mu_\alpha$ be the probability distribution which appears as the LSD in the symmetric i.i.d. case (Theorem 1.2). If $L \in \mathcal{H}_\alpha$ with $1 < \alpha < 2$, then a.s. $\mu_{\kappa_n K} \rightsquigarrow \mu_\alpha$.

Theorem 1.4 (Reversible Markov matrix, $0 < \alpha < 1$)

For every $\alpha \in (0,1)$, there exists a symmetric probability distribution $\tilde{\mu}_\alpha$ supported on $[-1,1]$, depending only on $\alpha$, such that a.s. $\mu_K \rightsquigarrow \tilde{\mu}_\alpha$.

The proofs of Theorems 1.3 and 1.4 are given in Sections 3.3 and 3.1, respectively. As in the proof of Theorem 1.2, the main idea is to exploit convergence of our matrices to suitable operators defined on the PWIT. To understand the scaling in Theorem 1.3, we recall that if $\alpha > 1$, then by the strong law of large numbers we have a.s. $\rho_i \sim n$ for every row sum $\rho_i$, and this is shown to remove, in the limit $n \to \infty$, all dependencies in the matrix $\kappa_n K$, so that we obtain the same behavior as for the i.i.d. matrix of Theorem 1.2. On the other hand, when $0 < \alpha < 1$, both the sum of a row and the maximum of its elements are on the scale $a_n$. The proof of Theorem 1.4 shows that the matrix $K$ converges (without rescaling) to a random stochastic self-adjoint operator defined on the PWIT. The operator can be described as the transition matrix of the random walk on the PWIT and is naturally linked to Poisson–Dirichlet random variables. This is based on the observation that the sequence of the order statistics of any given row of the matrix $K$ converges weakly to the Poisson–Dirichlet law $\mathrm{PD}(\alpha, 0)$ (see Lemma 2.3 below for the details, and the numerical sketch below for a quick illustration). In fact, the operator provides an interesting generalization of the Poisson–Dirichlet law.
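As an informal illustration of the Poisson–Dirichlet heuristic (our own experiment, not from the paper): for $\alpha < 1$, the largest entry of a row of $K$ should be close in law to the first coordinate of a $\mathrm{PD}(\alpha, 0)$ vector, sampled here via ranked powers of cumulative exponentials.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, n, trials = 0.5, 2000, 200
top = []
for _ in range(trials):
    row = (1.0 - rng.random(n)) ** (-1.0 / alpha)   # one row of weights U_{1,j}
    top.append(row.max() / row.sum())               # largest entry of row 1 of K
# first coordinate of PD(alpha, 0): Gamma_1^(-1/alpha) / sum_k Gamma_k^(-1/alpha)
Gam = rng.exponential(size=(trials, 5000)).cumsum(axis=1)
W = Gam ** (-1.0 / alpha)
pd_top = W[:, 0] / W.sum(axis=1)
print(np.mean(top), np.mean(pd_top))                # comparable for large n
```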

Since $\mu_K$ is supported in $[-1,1]$, (2) and Theorem 1.4 imply that for all $k \in \mathbb{N}$, a.s.

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n p_k(i,i) = \int_{\mathbb{R}} x^k \, \tilde{\mu}_\alpha(dx). \tag{8}$$

The LSD $\tilde{\mu}_\alpha$ will be obtained as the expectation of the (random) spectral measure of the limiting operator at the root of the PWIT. It will follow that the $k$th moment of $\tilde{\mu}_\alpha$ is the expected value of the (random) probability that the random walk on the PWIT returns to the root in $k$ steps. In particular, the symmetry of $\tilde{\mu}_\alpha$ follows from the bipartite nature of the PWIT.

It was proved by Ben Arous and Guionnet benarous-guionnet (), Remark 1.5, that $\alpha \mapsto \mu_\alpha$ is continuous with respect to weak convergence of probability measures, and by Belinschi, Dembo and Guionnet belinschi (), Remark 1.2 and Lemma 5.2, that $\mu_\alpha$ tends to the Wigner semi-circle law as $\alpha \to 2$. We believe that Theorem 1.3 should remain valid for $\alpha = 2$ with LSD given by the Wigner semi-circle law. Further properties of the measures $\mu_\alpha$ and $\tilde{\mu}_\alpha$ are discussed below.

The case $\alpha = 1$ is qualitatively similar to the case $1 < \alpha < 2$, with the difference that the sequence $\kappa_n$ in Theorem 1.3 has to be replaced by $\kappa_n := n\, m_n / a_n$, where

$$m_n := \mathbb{E}\bigl[U_{1,2}\, \mathbf{1}_{\{U_{1,2} \le a_n\}}\bigr]. \tag{9}$$

Indeed, here the mean of $L$ may be infinite, and the closest one gets to a law of large numbers is the statement that $\max_{1 \le i \le n} |\rho_i/(n\, m_n) - 1| \to 0$ in probability (see Section 3.4). The sequence $m_n$ (and therefore $\kappa_n$) is known to be slowly varying at $\infty$ for $\alpha = 1$ (see, e.g., Feller Feller (), VIII.8). The following mild regularity condition on $L$ will be assumed: there exists $\varepsilon > 0$ such that condition (10) holds; see Section 3.4, where this condition is used. For example, if $V$ is uniform on $[0,1]$ and $U = 1/V$, then $a_n = n$ and $m_n \sim \log n$. In the next theorem, $\mu_1$ stands for the LSD $\mu_\alpha$ of Theorem 1.2, at $\alpha = 1$.

Theorem 1.5 (Reversible Markov matrix, $\alpha = 1$)

Suppose that $L \in \mathcal{H}_\alpha$ with $\alpha = 1$ and assume (10). If $\mu_{\kappa_n K}$ is the ESD of $\kappa_n K$, with $\kappa_n = n\, m_n / a_n$, then, as $n \to \infty$, a.s. $\mu_{\kappa_n K} \rightsquigarrow \mu_1$.

Properties of the LSD

In Section 4 we prove some properties of the LSDs $\mu_\alpha$ and $\tilde{\mu}_\alpha$.

Theorem 1.6 (Properties of $\mu_\alpha$)

Let $\mu_\alpha$ be the symmetric LSD in Theorems 1.2 and 1.3.

(i) $\mu_\alpha$ is absolutely continuous on $\mathbb{R}$.

(ii) The density of $\mu_\alpha$ at $0$ is equal to

$$\frac{1}{\pi}\, \Gamma\biggl(1 + \frac{2}{\alpha}\biggr) \biggl(\frac{\Gamma(1 - \frac{\alpha}{2})}{\Gamma(1 + \frac{\alpha}{2})}\biggr)^{1/\alpha}.$$

(iii) $\mu_\alpha$ is heavy tailed, and as $t$ goes to $+\infty$,

$$\mu_\alpha((t, \infty)) \sim \frac{1}{2}\, t^{-\alpha}.$$

Statements (i) and (ii) answer some questions raised in benarous-guionnet (), belinschi (). Statement (iii) is already contained in belinschi (), Theorem 1.7, but we provide a new proof based on a Tauberian theorem for the Cauchy–Stieltjes transform that may be of independent interest.
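As a rough numerical illustration of the heavy tail in (iii) (our own sketch, under the Pareto entry distribution used throughout): simulate a Lévy matrix as in (6) and compare the empirical mass of $\{|\lambda| > t\}$ with $t^{-\alpha}$.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, n = 1.5, 1500
signs = rng.choice([-1.0, 1.0], size=(n, n))
X = signs * (1.0 - rng.random((n, n))) ** (-1.0 / alpha)   # symmetric heavy-tailed entries
X = np.triu(X) + np.triu(X, 1).T                           # symmetrize
A = X / n ** (1.0 / alpha)                                 # a_n = n^(1/alpha) since ell == 1
lam = np.linalg.eigvalsh(A)
for t in (2.0, 4.0, 8.0):
    # mu_alpha((t, inf)) ~ t^(-alpha)/2 on each side, so this ratio should be near 1
    print(t, np.mean(np.abs(lam) > t) * t ** alpha)
```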

Theorem 1.7 (Properties of $\tilde{\mu}_\alpha$)

Let $\tilde{\mu}_\alpha$ be the symmetric LSD in Theorem 1.4, with moments as in (8). Then the following statements hold true.

(i) For $k \in \mathbb{N}$, there exists $c(\alpha, k) > 0$ such that the even moments satisfy $\int_{\mathbb{R}} x^{2k}\, \tilde{\mu}_\alpha(dx) \ge c(\alpha, k)$. Moreover, we have $\tilde{\mu}_\alpha \ne \delta_0$.

(ii) For the topology of the weak convergence, the map $\alpha \mapsto \tilde{\mu}_\alpha$ is continuous in $(0,1)$.

(iii) For the topology of the weak convergence,

$$\tilde{\mu}_\alpha \rightsquigarrow \tfrac{1}{2}(\delta_{-1} + \delta_{+1}) \quad \text{as } \alpha \to 0, \qquad \tilde{\mu}_\alpha \rightsquigarrow \delta_0 \quad \text{as } \alpha \to 1.$$

It is delicate to provide reliable numerical simulations of the ESDs.

Figure 1: Histograms of scaled ESDs illustrating the convergence stated by Theorems 1.3 and 1.4, for several values of $\alpha$ and a large fixed $n$. Here $L$ is the law of $V^{-1/\alpha}$, where $V$ is a uniform random variable on $[0,1]$. The first three plots are histograms of the spectrum of a single realization of $\kappa_n K$. The fourth plot corresponds to $\alpha = 1$ and is a histogram of the spectrum of a single realization of $\kappa_n K$ with $\kappa_n = n\, m_n / a_n$. The four last plots are histograms of the spectrum of a single realization of $K$. In order to avoid scaling problems, an asymptotically negligible portion of the spectrum edge was discarded.

Nevertheless, Figure 1 provides histograms for various values of $\alpha$ and a large value of $n$, illustrating Theorems 1.3–1.7.
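A minimal script in the spirit of Figure 1 (our reconstruction, not the authors' code): sample $K$ with $L$ the law of $V^{-1/\alpha}$ and plot the histogram of its spectrum, rescaled by $\kappa_n = n/a_n$ when $1 < \alpha < 2$ and unrescaled when $0 < \alpha < 1$.

```python
import numpy as np
import matplotlib.pyplot as plt

def spectrum_K(n, alpha, rng):
    W = (1.0 - rng.random((n, n))) ** (-1.0 / alpha)
    U = np.triu(W) + np.triu(W, 1).T                 # symmetric weights, loops included
    rho = U.sum(axis=1)
    S = U / np.sqrt(np.outer(rho, rho))              # same spectrum as K (see (17) below)
    return np.linalg.eigvalsh(S)

rng = np.random.default_rng(4)
n = 1000
for alpha in (0.5, 1.5):
    lam = spectrum_K(n, alpha, rng)
    scale = n / n ** (1.0 / alpha) if alpha > 1 else 1.0   # kappa_n = n / a_n, a_n = n^(1/alpha)
    plt.hist(scale * lam, bins=100, density=True, alpha=0.5, label=f"alpha = {alpha}")
plt.legend()
plt.show()
```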

Invariant measure and edge behavior

Finally, we turn to the analysis of the invariant probability distribution $\pi = (\pi_1, \ldots, \pi_n)$ for the random walk on $K_n$. This is obtained by normalizing the vector of row sums:

$$\pi_i := \frac{\rho_i}{\rho_1 + \cdots + \rho_n}.$$

Following bordenave-caputo-chafai (), Lemma 2.2, if $L$ has finite second moment, then $\max_{1 \le i \le n} |n \pi_i - 1| \to 0$ as $n \to \infty$ a.s. This uniform strong law of large numbers does not hold in the heavy-tailed case $L \in \mathcal{H}_\alpha$, $\alpha \in (0,2)$: the large $n$ behavior of $\pi$ is then dictated by the largest weights in the system.

Below we use the notation $\pi_{(1)} \ge \pi_{(2)} \ge \cdots \ge \pi_{(n)}$ for the ranked values of $\pi_1, \ldots, \pi_n$, so that $\pi_{(1)} = \max_{1 \le i \le n} \pi_i$ and their sum is $1$. The symbol $\overset{d}{\to}$ denotes convergence in distribution. We refer to Section 2.4 for more details on weak convergence in the space of ranked sequences and for the definition of the Poisson–Dirichlet law $\mathrm{PD}(\alpha, 0)$.

Theorem 1.8 (Invariant probability measure)

Suppose that $L \in \mathcal{H}_\alpha$.

(i) If $0 < \alpha < 1$, then

$$(\pi_{(1)}, \pi_{(2)}, \pi_{(3)}, \ldots) \overset{d}{\to} \tfrac{1}{2}(V_1, V_1, V_2, V_2, \ldots), \tag{11}$$

where $(V_1, V_2, \ldots)$ stands for a Poisson–Dirichlet $\mathrm{PD}(\alpha, 0)$ random vector.

(ii) If $1 < \alpha < 2$, then

$$\frac{n}{a_{n^2}}\bigl(n \pi_{(1)} - 1,\; n \pi_{(2)} - 1,\; \ldots\bigr) \overset{d}{\to} (x_1, x_1, x_2, x_2, \ldots), \tag{12}$$

where $x_1 \ge x_2 \ge \cdots$ denote the ranked points of the Poisson point process on $(0, \infty)$ with intensity measure $\frac{\alpha}{2}\, x^{-\alpha - 1}\, dx$. Moreover, the same convergence holds for $\alpha = 1$ provided the sequence $n$ is replaced by $n\, m_n$, with $m_n$ as in (9).

Theorem 1.8 is proved in Section 5. These results will be derived from the statistics of the ranked values of the weights $U_{i,j}$, $1 \le i \le j \le n$, on the scale $a_{n^2}$ (diagonal weights are easily seen to give negligible contributions). The duplication in the sequences in (12) and (11) then comes from the fact that each of the largest weights belongs to two distinct rows and determines alone the limiting value of the associated row sum; the sketch below makes this visible numerically.
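The duplication is easy to observe in simulation (our own check): for $\alpha < 1$, the ranked values of $\pi$ come in nearly equal pairs, each pair sitting close to half of a normalized top edge weight.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, n = 0.5, 3000
W = (1.0 - rng.random((n, n))) ** (-1.0 / alpha)
U = np.triu(W) + np.triu(W, 1).T
pi = U.sum(axis=1) / U.sum()
print(np.sort(pi)[::-1][:6])        # ~ (V1/2, V1/2, V2/2, V2/2, V3/2, V3/2)
```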

Theorem 1.8 is another indication that the random walk with transition matrix $K$ shares the features of a trap model. Loosely speaking, instead of being trapped at a vertex, as in the usual mean-field trap models (see Bouchaud (), BenArousCerny (), MR2152251 (), MR2435851 ()), here the walker is trapped at an edge.

Large edge weights are responsible for the large eigenvalues of the matrix $A$. This phenomenon is well understood in the case of symmetric random matrices with i.i.d. entries, where it is known that, when the entries have infinite fourth moment, the edge of the spectrum gives rise to Poisson statistics (see MR2081462 (), auffinger-benarous-peche ()). The behavior of the extremal eigenvalues of $K$ when $L$ has finite fourth moment has been studied in bordenave-caputo-chafai (). In particular, it is shown there that the spectral gap $1 - \lambda_2(K)$ is $1 - O(n^{-1/2})$. In the present case of heavy-tailed weights, in contrast, by localization on the largest edge weight it is possible to prove that, a.s. and up to corrections with slow variation at $\infty$, for $1 \le \alpha < 2$,

$$1 - \lambda_2(K) \le n^{1 - 2/\alpha}. \tag{13}$$

Similarly, for $0 < \alpha < 1$, one has that $\lambda_2(K)$ is bounded below by $1 - n^{-1/\alpha}$. Understanding the statistics of the extremal eigenvalues remains an interesting open problem.
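A quick check of the localization heuristic (our sketch, Pareto weights): for $\alpha < 1$, the top of the spectrum of $K$ is governed by the largest edge weight, and $1 - \lambda_2(K)$ should be of the rough order $n^{-1/\alpha}$.

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, n = 0.5, 2000
W = (1.0 - rng.random((n, n))) ** (-1.0 / alpha)
U = np.triu(W) + np.triu(W, 1).T
rho = U.sum(axis=1)
S = U / np.sqrt(np.outer(rho, rho))          # same spectrum as K
lam = np.linalg.eigvalsh(S)
print(1.0 - lam[-2], n ** (-1.0 / alpha))    # spectral gap vs predicted order n^(-1/alpha)
```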

2 Convergence to the Poisson weighted infinite tree

The aim of this section is to prove that the matrices $A$ and $K$ appearing in Theorems 1.2, 1.3 and 1.4, when properly rescaled, converge “locally” to a limiting operator defined on the Poisson weighted infinite tree (PWIT). The concept of local convergence of operators is defined below. We first recall the standard construction of the PWIT.

2.1 The PWIT

Given a Radon measure $\nu$ on $\mathbb{R}$, $\mathrm{PWIT}(\nu)$ is the random rooted tree defined as follows. The vertex set of the tree is identified with $\mathbb{N}^f := \bigcup_{k \ge 0} \mathbb{N}^k$ by indexing the root as $\mathbb{N}^0 = \{\varnothing\}$, the offsprings of the root as $\mathbb{N}^1 = \mathbb{N}$ and, more generally, the offsprings of some $v \in \mathbb{N}^k$ as $(v1), (v2), \ldots \in \mathbb{N}^{k+1}$ [for short notation, we write $v1$ in place of $(v,1)$]. In this way the set $\mathbb{N}^k$ identifies the $k$th generation.

We now assign marks to the edges of the tree according to a collection $(\Xi_v)_{v \in \mathbb{N}^f}$ of independent realizations of the Poisson point process with intensity measure $\nu$ on $\mathbb{R}$. Namely, starting from the root, let $\Xi_\varnothing = \{y_1, y_2, \ldots\}$ be ordered in such a way that $|y_1| \le |y_2| \le \cdots$, and assign the mark $y_k$ to the offspring of the root labeled $k$. Now, recursively, at each vertex $v$ of generation $k$, assign the mark $y_{vj}$ to the offspring labeled $vj$, where the points of $\Xi_v = \{y_{v1}, y_{v2}, \ldots\}$ satisfy $|y_{v1}| \le |y_{v2}| \le \cdots$.
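For intuition, here is a small sketch (ours) of the first generations of $\mathrm{PWIT}(\nu)$ when $\nu$ is the Lebesgue measure on $[0, \infty)$: the ordered points of a rate-one Poisson process are cumulative sums of i.i.d. exponentials, one independent process per vertex.

```python
import numpy as np

rng = np.random.default_rng(7)

def poisson_marks(m, rng):
    """First m ordered points of a rate-1 Poisson process on [0, infinity)."""
    return rng.exponential(size=m).cumsum()

marks = {(): poisson_marks(10, rng)}        # marks y_1 <= y_2 <= ... at the root
for k in range(10):                         # independent processes for the first generation
    marks[(k,)] = poisson_marks(10, rng)
print(marks[()][:3], marks[(0,)][:3])
```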

2.2 Local operator convergence

We give a general formulation and later specialize to our setting. Let $B$ be a countable set, and let $\ell^2(B)$ denote the Hilbert space defined by the scalar product

$$\langle \phi, \psi \rangle := \sum_{u \in B} \bar{\phi}_u \psi_u,$$

where $\phi = \sum_{u \in B} \phi_u \delta_u$ and $\delta_u$ denotes the unit vector with support $\{u\}$. Let $\mathcal{D}$ denote the dense subset of $\ell^2(B)$ of vectors with finite support.

Definition 2.2 (Local convergence). Suppose $(T_n)$ is a sequence of bounded operators on $\ell^2(B)$, and $T$ is a closed linear operator on $\ell^2(B)$ with dense domain $D(T) \supset \mathcal{D}$. Suppose further that $\mathcal{D}$ is a core for $T$ (i.e., the closure of $T$ restricted to $\mathcal{D}$ equals $T$). For any $u, v \in B$, we say that $(T_n, u)$ converges locally to $(T, v)$, and write

$$(T_n, u) \to (T, v),$$

if there exists a sequence of bijections $\sigma_n : B \to B$ such that $\sigma_n(v) = u$ and, for all $\phi \in \mathcal{D}$,

$$\sigma_n^{-1} T_n \sigma_n \phi \to T \phi$$

in $\ell^2(B)$, as $n \to \infty$.

In other words, this is the standard strong convergence of operators up to a re-indexing of $B$ which preserves a distinguished element. With a slight abuse of notation, we have used the same symbol $\sigma_n$ for the linear isometry $\sigma_n : \ell^2(B) \to \ell^2(B)$ induced in the obvious way, that is, such that $\sigma_n \delta_u = \delta_{\sigma_n(u)}$ for all $u \in B$. The point of introducing Definition 2.2 lies in the following theorem on strong resolvent convergence. Recall that if $T$ is a self-adjoint operator, its spectrum is real, and for all $z \in \mathbb{C}_+ := \{z \in \mathbb{C} : \Im z > 0\}$, the operator $T - z$ is invertible with bounded inverse. The operator-valued function $z \mapsto (T - z)^{-1}$ is the resolvent of $T$.

Theorem 2.1 (From local convergence to resolvents)

If $(T_n)$ and $T$ are self-adjoint operators that satisfy the conditions of Definition 2.2, and $(T_n, u) \to (T, v)$ for some $u, v \in B$, then, for all $z \in \mathbb{C}_+$,

$$\langle \delta_u, (T_n - z)^{-1} \delta_u \rangle \to \langle \delta_v, (T - z)^{-1} \delta_v \rangle. \tag{14}$$

Proof. It is a special case of reedsimon (), Theorem VIII.25(a). Indeed, if we define $\hat{T}_n := \sigma_n^{-1} T_n \sigma_n$, then $\hat{T}_n \phi \to T \phi$ for all $\phi$ in a common core of the self-adjoint operators $\hat{T}_n, T$. This implies the strong resolvent convergence, that is, for any $z \in \mathbb{C}_+$, $(\hat{T}_n - z)^{-1} \phi \to (T - z)^{-1} \phi$. The conclusion follows by taking the scalar product with $\phi = \delta_v$ and using $\sigma_n \delta_v = \delta_u$.

We shall apply the above theorem in cases where the operators $T_n$ and $T$ are random operators on $\ell^2(B)$ which satisfy, with probability one, the conditions of Definition 2.2. In this case we say that $(T_n, u) \to (T, v)$ in distribution if there exists a random bijection $\sigma_n$ as in Definition 2.2 such that $\sigma_n^{-1} T_n \sigma_n \phi$ converges in distribution to $T \phi$, for all $\phi \in \mathcal{D}$ [where a random vector $\psi_n$ converges in distribution to $\psi$ if

$$\mathbb{E}[f(\psi_n)] \to \mathbb{E}[f(\psi)]$$

for all bounded continuous functions $f : \ell^2(B) \to \mathbb{R}$]. Under these assumptions, (14) becomes convergence in distribution of (bounded) complex random variables. In our setting, the Hilbert space will be $\ell^2(\mathbb{N}^f)$, with $\mathbb{N}^f$ the vertex set of the PWIT, and the operator $T_n$ will be a rescaled version of the matrix $A$ defined by (6) or the matrix $K$ defined by (1). The operator $T$ will be the corresponding limiting operator defined below.

2.3 Limiting operators

Let $\theta \in [0,1]$ be as in (7), and let $\ell_\theta$ be the positive Borel measure on the real line defined by $\ell_\theta(dx) := \theta\, \mathbf{1}_{\{x > 0\}}\, dx + (1 - \theta)\, \mathbf{1}_{\{x < 0\}}\, dx$. Consider a realization of $\mathrm{PWIT}(\ell_\theta)$. As before, the mark from vertex $v$ to $vk$ is denoted by $y_{vk}$. We note that almost surely

$$\sum_{k=1}^\infty |y_{vk}|^{-2/\alpha} < \infty, \tag{15}$$

since a.s. $|y_{vk}| \sim k$ as $k \to \infty$, and $\sum_k k^{-2/\alpha}$ converges for $\alpha < 2$. Recall that, for $B = \mathbb{N}^f$, $\mathcal{D}$ is the dense set of vectors of $\ell^2(\mathbb{N}^f)$ with finite support. We may a.s. define a linear operator $T : \mathcal{D} \to \ell^2(\mathbb{N}^f)$ by letting, for $v \in \mathbb{N}^f$,

$$T \delta_v := \sum_{k=1}^\infty \operatorname{sgn}(y_{vk})\, |y_{vk}|^{-1/\alpha}\, \delta_{vk} + \operatorname{sgn}(y_v)\, |y_v|^{-1/\alpha}\, \delta_{p(v)}, \tag{16}$$

where $p(v)$ denotes the parent of $v$, and the last term is omitted when $v = \varnothing$ is the root.

Note that if every edge in the tree with mark $y$ is given the “weight” $\operatorname{sgn}(y)\, |y|^{-1/\alpha}$, then we may look at the operator $T$ as the “adjacency matrix” of the weighted tree. Clearly, $T$ is symmetric, and therefore it has a closed extension with domain $D(T) \subset \ell^2(\mathbb{N}^f)$ (see, e.g., reedsimon (), Chapter VIII, Section 2). We will prove in Proposition A.2 below that $T$ is essentially self-adjoint, that is, the closure of $T$ is self-adjoint. With a slight abuse of notation, we identify $T$ with its closed extension. As stated below, $T$ is the weak local limit of the sequence of i.i.d. matrices defined by (6). To this end, we view the matrix $A$ as an operator on $\ell^2(\mathbb{N}^f)$ by setting $A \delta_k := \sum_{l=1}^n A(k,l)\, \delta_l$ for the labels $k = 1, \ldots, n$ of the offsprings of the root (the first generation), with the convention that $A(k,l) := 0$ when either $k > n$ or $l > n$, and by setting $A \delta_v := 0$ when $v = \varnothing$ or $v$ does not belong to the first generation.
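To make the “adjacency matrix of the weighted tree” concrete, here is a sketch (ours, with $\theta = 1$ and an arbitrary depth/degree truncation) that builds the operator with edge weights $y^{-1/\alpha}$ on a finite piece of the PWIT and reads off the spectral measure weights at the root.

```python
import numpy as np

rng = np.random.default_rng(8)
alpha, depth, degree = 1.5, 4, 4       # truncation: 4 children per vertex, 4 generations

gens = [[()]]                          # vertices, generation by generation
for _ in range(depth):
    gens.append([v + (k,) for v in gens[-1] for k in range(degree)])
nodes = [v for g in gens for v in g]
index = {v: i for i, v in enumerate(nodes)}

# marks y_{v1} <= y_{v2} <= ... are cumulative exponentials; edge weight is y^(-1/alpha)
T = np.zeros((len(nodes), len(nodes)))
for g in gens[:-1]:
    for v in g:
        y = rng.exponential(size=degree).cumsum()
        for k in range(degree):
            i, j = index[v], index[v + (k,)]
            T[i, j] = T[j, i] = y[k] ** (-1.0 / alpha)

lam, vec = np.linalg.eigh(T)
root = index[()]
weights = vec[root, :] ** 2            # spectral measure of T at the root
print(lam[np.argmax(weights)], weights.sum())   # root weights sum to 1
```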

Similarly, taking now $\theta = 1$, in the case of the Markov matrices $K$ defined by (1), for $1 \le \alpha < 2$, $T$ is the local limit operator of $\kappa_n K$. To work directly with symmetric operators, we introduce the symmetric matrix

$$S(i,j) := \frac{U_{i,j}}{\sqrt{\rho_i\, \rho_j}}, \tag{17}$$

which is easily seen to have the same spectrum as $K$ (see, e.g., bordenave-caputo-chafai (), Lemma 2.1). Again, the matrix $S$ can be embedded in the infinite tree as described above for $A$.

In the case $0 < \alpha < 1$, the Markov matrix $K$ has a different limiting object, which is defined as follows. Consider a realization of $\mathrm{PWIT}(\lambda)$, where $\lambda$ is the Lebesgue measure on $[0, \infty)$. We define an operator corresponding to the random walk on this tree with conductance across each edge equal to the mark of the edge to the power $-1/\alpha$. More precisely, for $v \in \mathbb{N}^f$, let

$$\rho_v := \sum_{k=1}^\infty y_{vk}^{-1/\alpha} + y_v^{-1/\alpha},$$

with the convention that $y_\varnothing^{-1/\alpha} := 0$. Since a.s. $y_{vk} \sim k$, $\rho_v$ is almost surely finite for $0 < \alpha < 1$. We define the linear operator $\hat{K}$ on $\mathcal{D}$ by letting, for $v \in \mathbb{N}^f$,

$$\hat{K} \delta_v := \sum_{k=1}^\infty \frac{y_{vk}^{-1/\alpha}}{\rho_{vk}}\, \delta_{vk} + \frac{y_v^{-1/\alpha}}{\rho_{p(v)}}\, \delta_{p(v)}, \tag{18}$$

where, as before, $p(v)$ is the parent of $v$ and the last term is omitted at the root. Note that $\hat{K}$ is not symmetric, but it becomes symmetric in the weighted Hilbert space $\ell^2(\mathbb{N}^f, \rho)$ defined by the scalar product

$$\langle \phi, \psi \rangle_\rho := \sum_{v \in \mathbb{N}^f} \rho_v\, \bar{\phi}_v \psi_v.$$

Moreover, on $\ell^2(\mathbb{N}^f, \rho)$, $\hat{K}$ is a bounded self-adjoint operator, since Schwarz's inequality implies

$$|\langle \phi, \hat{K} \psi \rangle_\rho| \le \langle \phi, \phi \rangle_\rho^{1/2}\, \langle \psi, \psi \rangle_\rho^{1/2},$$

so that the operator norm of $\hat{K}$ is less than or equal to $1$. To work with self-adjoint operators in the unweighted Hilbert space $\ell^2(\mathbb{N}^f)$, we shall actually consider the operator $\hat{S}$ defined by

$$\hat{S} \delta_v := \sum_{k=1}^\infty \frac{y_{vk}^{-1/\alpha}}{\sqrt{\rho_v\, \rho_{vk}}}\, \delta_{vk} + \frac{y_v^{-1/\alpha}}{\sqrt{\rho_v\, \rho_{p(v)}}}\, \delta_{p(v)}. \tag{19}$$

This defines a bounded self-adjoint operator in $\ell^2(\mathbb{N}^f)$. Indeed, the map $\phi \mapsto \sqrt{\rho}\, \phi$ induces a linear isometry $J : \ell^2(\mathbb{N}^f, \rho) \to \ell^2(\mathbb{N}^f)$ such that

$$\hat{S} = J \hat{K} J^{-1}, \tag{20}$$

so that $\hat{S}$ and $\hat{K}$ are unitarily equivalent. In this way, when $0 < \alpha < 1$, $\hat{S}$ will be the limiting operator associated with the matrix $S$ defined in (17). Note that no rescaling is needed here. The main result of this section is the following.

Theorem 2.2 (Limiting operators)

As $n$ goes to infinity, in distribution:

(i) if $0 < \alpha < 2$ and $A$ is the matrix defined by (6), then $(A, 1) \to (T, \varnothing)$;

(ii) if $1 \le \alpha < 2$ and $S$ is the matrix defined by (17), then $(\kappa_n S, 1) \to (T, \varnothing)$ with $\theta = 1$ (and $\kappa_n = n\, m_n / a_n$ when $\alpha = 1$);

(iii) if $0 < \alpha < 1$, then $(S, 1) \to (\hat{S}, \varnothing)$.

From the remark after Theorem 2.1 we see that Theorem 2.2 implies convergence in distribution of the resolvent at the root. As we shall see in Section 3, this in turn gives convergence of the expected values of the Cauchy–Stieltjes transform of the ESD of our matrices. The rest of this section is devoted to the proof of Theorem 2.2.

2.4 Weak convergence of a single row

In this paragraph, we recall some facts about the order statistics of the first row of the matrices $K$ and $A$, that is, of the weights

$$(U_{1,1}, U_{1,2}, \ldots, U_{1,n}),$$

where $U_{1,j}$ has law $L \in \mathcal{H}_\alpha$. Let us denote by $U_{(1)} \ge U_{(2)} \ge \cdots \ge U_{(n)}$ the order statistics of the variables $U_{1,1}, \ldots, U_{1,n}$. Recall that $\rho_1 = U_{1,1} + \cdots + U_{1,n}$. Let us define $V_{(k)} := U_{(k)}/\rho_1$ for $k \le n$ and $V_{(k)} := 0$ for $k > n$. Call $\mathcal{S}^\downarrow$ the set of sequences $x_1 \ge x_2 \ge \cdots \ge 0$ such that $\lim_{k \to \infty} x_k = 0$, and let $\Delta \subset \mathcal{S}^\downarrow$ be the subset of sequences satisfying $\sum_k x_k \le 1$. We shall view

$$a_n^{-1}(U_{(1)}, \ldots, U_{(n)}) \quad \text{and} \quad (V_{(1)}, \ldots, V_{(n)})$$

as elements of $\mathcal{S}^\downarrow$ and $\Delta$, respectively, simply by adding zeros to the right. Equipped with the standard product metric, $\mathcal{S}^\downarrow$ and $\Delta$ are complete separable metric spaces ($\Delta$ is compact), and convergence in distribution for $\Delta$-valued random variables is equivalent to finite-dimensional convergence (cf., e.g., Bertoin bertoin06 ()).

Let $(E_k)_{k \ge 1}$ denote i.i.d. exponential variables with mean $1$, and write $\Gamma_k := E_1 + \cdots + E_k$. We define the random variable in $\mathcal{S}^\downarrow$

$$\xi := (\Gamma_1^{-1/\alpha}, \Gamma_2^{-1/\alpha}, \Gamma_3^{-1/\alpha}, \ldots).$$

The law of $\xi$ is the law of the ordered points of a Poisson process on $(0, \infty)$ with intensity measure $\alpha x^{-\alpha - 1}\, dx$. For $0 < \alpha < 1$, we define the variable in $\Delta$

$$\zeta := \biggl(\frac{\Gamma_k^{-1/\alpha}}{\sum_{j=1}^\infty \Gamma_j^{-1/\alpha}}\biggr)_{k \ge 1}.$$

For $0 < \alpha < 1$, the sum $\sum_j \Gamma_j^{-1/\alpha}$ is a.s. finite. The law of $\zeta$ in $\Delta$ is called the Poisson–Dirichlet law $\mathrm{PD}(\alpha, 0)$ (see Pitman and Yor MR1434129 (), Proposition 10). The next result is rather standard, but we give a simple proof for convenience.
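A direct sampler for these objects (our sketch): the ranked Poisson points are $\Gamma_k^{-1/\alpha}$, and normalizing gives a $\mathrm{PD}(\alpha, 0)$ sample; the first coordinate of $\xi$ is Fréchet distributed, which gives a quick check.

```python
import numpy as np

rng = np.random.default_rng(9)
alpha, m, trials = 0.5, 2000, 2000
Gam = rng.exponential(size=(trials, m)).cumsum(axis=1)   # Gamma_k = E_1 + ... + E_k
xi = Gam ** (-1.0 / alpha)                # ranked points, intensity alpha x^(-alpha-1) dx
zeta = xi / xi.sum(axis=1, keepdims=True) # PD(alpha, 0), truncated at m coordinates
x = 2.0
print(np.mean(xi[:, 0] <= x), np.exp(-x ** (-alpha)))   # Frechet check: P(xi_1 <= x) = exp(-x^-alpha)
```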

Lemma 2.3 (Poisson–Dirichlet laws and Poisson point processes)

(i) For all $\alpha \in (0,2)$, $a_n^{-1}(U_{(1)}, \ldots, U_{(n)})$ converges in distribution to $\xi$. Moreover, $a_n^{-1}(U_{(1)}, \ldots, U_{(n)})$ is a.s. uniformly square integrable, that is, $\lim_{k \to \infty} \limsup_{n \to \infty} \sum_{j \ge k} (a_n^{-1} U_{(j)})^2 = 0$ a.s.

(ii) If $0 < \alpha < 1$, $(V_{(1)}, \ldots, V_{(n)})$ converges in distribution to $\zeta$. Moreover, $(V_{(1)}, \ldots, V_{(n)})$ is a.s. uniformly integrable, that is, $\lim_{k \to \infty} \limsup_{n \to \infty} \sum_{j \ge k} V_{(j)} = 0$ a.s.

(iii) If $I \subset \mathbb{N}$ is a finite set and $U_{(1)}^I \ge U_{(2)}^I \ge \cdots$ denote the order statistics of $(U_{1,j})_{j \le n,\, j \notin I}$, then (i) and (ii) hold with $U_{(k)}$ replaced by $U_{(k)}^I$ and $V_{(k)}$ replaced by $V_{(k)}^I := U_{(k)}^I / \sum_{j \le n,\, j \notin I} U_{1,j}$.

As an example, from (i), we retrieve the well-known fact that for any $\alpha \in (0,2)$, the random variable $a_n^{-1} \max_{1 \le j \le n} U_{1,j}$ converges weakly, as $n \to \infty$, to the law of $\Gamma_1^{-1/\alpha}$. This law, known as a Fréchet law, has density $\alpha x^{-\alpha - 1} e^{-x^{-\alpha}}$ on $(0, \infty)$.

Proof of Lemma 2.3. As in LePage, Woodroofe and Zinn zinn81 (), we take advantage of the following well-known representation for the order statistics of i.i.d. random variables. Let $\ell$ be the function in (4) and write

$$G(u) := \inf\{t \ge 0 : L((t, \infty)) \le u\}, \qquad u \in (0,1).$$

We have that $(U_{(1)}, \ldots, U_{(n)})$ equals in distribution the vector

$$\bigl(G(\Gamma_1 / \Gamma_{n+1}),\; G(\Gamma_2 / \Gamma_{n+1}),\; \ldots,\; G(\Gamma_n / \Gamma_{n+1})\bigr), \tag{21}$$

where $(\Gamma_k)$ has been defined above.

where has been defined above. To prove (i) we start from the distributional identity

which follows from (21). It suffices to prove that for every , almost surely the first terms above converge to the first terms in . Thanks to (5), almost surely, for every ,

(22)

and the convergence in distribution of to follows. Moreover, from (5), for any we can find such that

for , . Since , a.s. we see that the expression above is a.s. bounded by , for sufficiently large, and the second part of (i) follows from a.s. summability of