Spectrum of large random reversible Markov chains: Heavy-tailed weights on the complete graph
Abstract
We consider the random reversible Markov kernel $K$ obtained by assigning i.i.d. nonnegative weights to the edges of the complete graph over $n$ vertices and normalizing by the corresponding row sum. The weights are assumed to be in the domain of attraction of an $\alpha$-stable law, $0<\alpha<2$. When $1\le\alpha<2$, we show that for a suitable regularly varying sequence $\kappa_n$ of index $1-1/\alpha$, the limiting spectral distribution $\mu_\alpha$ of $\kappa_n K$ coincides with the one of the random symmetric matrix of the unnormalized weights (Lévy matrix with i.i.d. entries). In contrast, when $0<\alpha<1$, we show that the empirical spectral distribution of $K$ converges without rescaling to a nontrivial law $\nu_\alpha$ supported on $[-1,1]$, whose moments are the return probabilities of the random walk on the Poisson weighted infinite tree (PWIT) introduced by Aldous. The limiting spectral distributions are given by the expected value of the random spectral measure at the root of suitable self-adjoint operators defined on the PWIT. This characterization is used, together with recursive relations on the tree, to derive some properties of $\mu_\alpha$ and $\nu_\alpha$. We also study the limiting behavior of the invariant probability measure of $K$.
DOI: 10.1214/10-AOP587. The Annals of Probability, Volume 39, Issue 4 (2011), pages 1544–1590.
Charles Bordenave (charles.bordenave(at)math.univ-toulouse.fr, http://www.math.univ-toulouse.fr/~bordenave/), Pietro Caputo (caputo(at)mat.uniroma3.it, http://www.mat.uniroma3.it/users/caputo/) and Djalil Chafaï (djalil(at)chafai.net, http://djalil.chafai.net/).
Supported in part by NSF Grant DMS-0301795 and by the European Research Council through the “Advanced Grant” PTRELSS 228032.
AMS subject classifications: 47A10, 15A52, 60K37, 05C80. Keywords: spectral theory, objective method, operator convergence, stochastic matrices, random matrices, reversible Markov chains, random walks, random graphs, probability on trees, random media, heavy-tailed distributions, stable laws, Poisson–Dirichlet laws, point processes, eigenvectors.
1 Introduction
Let $K_n$ denote the complete graph with vertex set $V_n=\{1,\ldots,n\}$ and edge set $E_n=\{\{i,j\}: i,j\in V_n\}$, including loops $\{i,i\}$, $i\in V_n$. Assign a nonnegative random weight (or conductance) $U_{i,j}$ to each edge $\{i,j\}$, and assume that the symmetric weights $(U_{i,j})_{1\le i\le j}$ are i.i.d. with common law $L$ independent of $n$. This defines a random network, or weighted graph. Next, consider the random walk on $V_n$ defined by the transition probabilities
(1) $K(i,j) = \frac{U_{i,j}}{\rho_i}$ where $\rho_i = \sum_{k=1}^n U_{i,k}$.
The Markov kernel $K$ is reversible with respect to the measure $\rho: i\mapsto\rho_i$ in that
$\rho_i K(i,j) = \rho_j K(j,i) = U_{i,j}$
for all $i,j\in V_n$. Note that we have not assumed that $L$ has no atom at $0$. If $\rho_i = 0$ for some $i$, then for that index we set $K(i,j)=\delta_{i,j}$, $j\in V_n$. However, as soon as $L$ is not concentrated at $0$, then almost surely, for all sufficiently large $n$, $\rho_i>0$ for all $i$, $K$ is irreducible and aperiodic, and $\rho$ is its unique invariant measure, up to normalization (see, e.g., bordenavecaputochafai ()).
For any $n\times n$ square matrix $A$ with eigenvalues $\lambda_1,\ldots,\lambda_n$, the Empirical Spectral Distribution (ESD) $\mu_A$ is the discrete probability measure with at most $n$ atoms defined by
$\mu_A = \frac{1}{n}\sum_{k=1}^n \delta_{\lambda_k}.$
All matrices to be considered in this work have real spectrum, and the eigenvalues will be labeled in such a way that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$.
Note that $K = (K(i,j))_{1\le i,j\le n}$ defines a square random Markov matrix whose entries are not independent due to the normalizing sums $\rho_i$. By reversibility, $K$ is self-adjoint in $\ell^2(\rho)$ and its spectrum is real. Moreover, the spectrum is included in $[-1,1]$, and $\lambda_1 = 1$. Since $K$ is Markov, its ESD carries further probabilistic content. Namely, for any $k\ge1$, if $p^{(k)}_{ii} = (K^k)(i,i)$ denotes the probability that the random walk on $V_n$ started at $i$ returns to $i$ after $k$ steps, then the $k$th moment of $\mu_K$ satisfies
(2) $\int x^k\,\mu_K(dx) = \frac{1}{n}\operatorname{tr}(K^k) = \frac{1}{n}\sum_{i=1}^n p^{(k)}_{ii}.$
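As a concrete numerical illustration of this construction and of identity (2), the following sketch builds $K$ from symmetric weights, symmetrizes it, and checks that the $k$th moment of its ESD equals the average $k$-step return probability. The Pareto-type weights, matrix size and tail index here are our own arbitrary choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Symmetric i.i.d. nonnegative weights U_{i,j} (illustrative heavy-tailed choice).
alpha = 0.5
U = rng.uniform(size=(n, n)) ** (-1.0 / alpha)
U = np.triu(U) + np.triu(U, 1).T            # enforce U_{i,j} = U_{j,i}

rho = U.sum(axis=1)                          # row sums rho_i
K = U / rho[:, None]                         # Markov kernel K(i,j) = U_{i,j}/rho_i

# K is similar to the symmetric matrix with entries U_{i,j}/sqrt(rho_i rho_j),
# so its spectrum is real and contained in [-1, 1].
S = np.sqrt(rho)[:, None] * K / np.sqrt(rho)[None, :]
eigs = np.linalg.eigvalsh(S)

# Identity (2): k-th moment of the ESD = average k-step return probability.
k = 3
moment = np.mean(eigs ** k)
avg_return = np.trace(np.linalg.matrix_power(K, k)) / n
assert abs(moment - avg_return) < 1e-10
assert eigs.max() <= 1 + 1e-10 and eigs.min() >= -1 - 1e-10
```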
Convergence of the ESD
The asymptotic behavior of $\mu_K$ as $n\to\infty$ depends strongly on the tail of $L$ at infinity. When $L$ has finite mean we set $\mathbb{E}[U_{1,2}]=1$. This is no loss of generality since $K$ is invariant under the dilation $U_{i,j}\to\lambda U_{i,j}$, $\lambda>0$. If $L$ has a finite second moment we write $\sigma^2$ for the variance.
The following result, from bordenavecaputochafai (), states that if $\sigma^2<\infty$, then the bulk of the spectrum of $K$ behaves, when $n\to\infty$, as if we had truly i.i.d. entries (Wigner matrix). Without loss of generality, we assume that the weights $(U_{i,j})_{1\le i\le j\le n}$ come from the truncation of a unique infinite table $(U_{i,j})_{1\le i\le j}$ of i.i.d. random variables of law $L$. This gives a meaning to the almost sure (a.s.) convergence of $\mu_K$. The symbol $\rightsquigarrow$ denotes weak convergence of measures with respect to continuous bounded functions.
Theorem 1.1 ((Wigner-like behavior))
If $L$ has variance $\sigma^2<\infty$, then a.s.
(3) $\mu_{\sqrt{n}K} \rightsquigarrow \mathcal{W}_{2\sigma},$
where $\mathcal{W}_{2\sigma}$ is the Wigner semicircle law on $[-2\sigma,2\sigma]$. Moreover, if $L$ has finite fourth moment, then $\sqrt{n}\lambda_2$ and $\sqrt{n}\lambda_n$ converge a.s. to the edge of the limiting support, $2\sigma$ and $-2\sigma$ respectively.
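A quick numerical sanity check of this Wigner-like behavior is the following sketch, which uses exponential weights (so that the mean is $1$ and $\sigma^2 = 1$; the size $n$ is an arbitrary choice) and compares the bulk moments of the rescaled spectrum with the semicircle moments $\sigma^2$ and $2\sigma^4$:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# Finite-variance weights: Exp(1), so mean 1 and sigma^2 = 1.
U = rng.exponential(size=(n, n))
U = np.triu(U) + np.triu(U, 1).T
rho = U.sum(axis=1)
K = U / rho[:, None]

# Symmetrization with the same spectrum as K, rescaled by sqrt(n).
S = np.sqrt(rho)[:, None] * K / np.sqrt(rho)[None, :]
eigs = np.sort(np.linalg.eigvalsh(S)) * np.sqrt(n)

# Drop the Perron eigenvalue (equal to sqrt(n) after rescaling); the bulk
# should have semicircle moments: 2nd moment sigma^2 = 1, 4th moment 2*sigma^4 = 2.
bulk = eigs[:-1]
assert abs(np.mean(bulk ** 2) - 1.0) < 0.15
assert abs(np.mean(bulk ** 4) - 2.0) < 0.5
```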
This Wigner-like scenario can be dramatically altered if we allow $L$ to have a heavy tail at infinity. For any $\alpha\in(0,2)$, we say that $L$ belongs to the class $\mathcal{L}_\alpha$ if $L$ is supported in $[0,\infty)$ and has a regularly varying tail of index $\alpha$, that is, for all $t>0$,
(4) $L([t,\infty)) = \frac{\ell(t)}{t^\alpha},$
where $\ell$ is a function with slow variation at $\infty$; that is, for any $x>0$,
$\lim_{t\to\infty}\frac{\ell(xt)}{\ell(t)} = 1.$
Set $a_n = \inf\{t\ge0: nL([t,\infty))\le1\}$. Then $a_n\to\infty$ as $n\to\infty$, and
(5) $nL([a_n,\infty)) \to 1.$
It is well known that $n\mapsto a_n$ has regular variation at $\infty$ with index $1/\alpha$, that is,
$a_n = n^{1/\alpha}\ell_1(n)$
for some function $\ell_1$ with slow variation at $\infty$ (see, e.g., Resnick resnick (), Section 2.2.1). As an example, if $V$ is uniformly distributed on the interval $(0,1)$, then for every $\alpha\in(0,2)$, the law of $V^{-1/\alpha}$, supported in $[1,\infty)$, belongs to $\mathcal{L}_\alpha$. In this case, $L([t,\infty)) = t^{-\alpha}$ for $t\ge1$, $\ell\equiv1$, and $a_n = n^{1/\alpha}$.
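The Pareto example just described is convenient for simulation. The following sketch (with an arbitrary sample size and tail index of our own choosing) checks its tail and the order of magnitude of $a_n = n^{1/\alpha}$:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n = 0.5, 200_000

# If V ~ Uniform(0,1), then U = V^(-1/alpha) has tail L([t, inf)) = t^(-alpha)
# for t >= 1, i.e. slowly varying part l = 1 and a_n = n^(1/alpha) exactly.
U = rng.uniform(size=n) ** (-1.0 / alpha)
assert U.min() >= 1.0

# Empirical check of the tail P(U > t) = t^(-alpha).
for t in (10.0, 100.0):
    assert abs(np.mean(U > t) - t ** (-alpha)) < 0.01

# The maximum of the sample is of order a_n (up to Frechet fluctuations).
a_n = n ** (1.0 / alpha)
assert U.max() > 1e-4 * a_n
```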
To understand the limiting behavior of the spectrum of $K$ in the heavy-tailed case it is important to consider first the symmetric i.i.d. matrix corresponding to the unnormalized weights $U_{i,j}$. More generally, we introduce the random symmetric matrix $A$ defined by
(6) $A = (x_{i,j})_{1\le i,j\le n},$
where $(x_{i,j})_{1\le i\le j}$ are i.i.d. such that $|x_{1,1}|$ has law in $\mathcal{L}_\alpha$ with $\alpha\in(0,2)$, and the limit
(7) $\lim_{t\to\infty}\frac{\mathbb{P}(x_{1,1}\ge t)}{\mathbb{P}(|x_{1,1}|\ge t)}$
exists. It is well known that, for $\alpha\in(0,2)$, a random variable $x$ is in the domain of attraction of an $\alpha$-stable law iff the law of $|x|$ is in $\mathcal{L}_\alpha$ and the limit (7) exists (cf. Feller Feller (), Theorem IX.8.1a). It will be useful to view the entries $x_{i,j}$ in (6) as the marks across edge $\{i,j\}$ of a random network, just as the weights $U_{i,j}$ defined the network introduced above.
Remarkable works have recently been devoted to the asymptotic behavior of the ESD of matrices defined by (6), sometimes called Lévy matrices. The analysis of the Limiting Spectral Distribution (LSD) for $\alpha\in(0,2)$ is considerably harder than in the finite second moment case (Wigner matrices), and the LSD is nonexplicit. Theorem 1.2 below has been investigated by the physicists Bouchaud and Cizeau BouchaudCizeau () and rigorously proved by Ben Arous and Guionnet benarousguionnet (), and Belinschi, Dembo and Guionnet belinschi () (see also Zakharevich zakharevich () for related results).
Theorem 1.2 ((Symmetric i.i.d. matrix, $0<\alpha<2$))
There exists a symmetric probability distribution $\mu_\alpha$ on $\mathbb{R}$, depending only on $\alpha$ and on the limit in (7), such that a.s. $\mu_{a_n^{-1}A} \rightsquigarrow \mu_\alpha$ as $n\to\infty$.
In Section 3.2, we give a new independent proof of Theorem 1.2. The key idea of our proof is to exhibit a limiting self-adjoint operator for the sequence of matrices $a_n^{-1}A$, defined on a suitable Hilbert space, and then use known spectral convergence theorems for operators. The limiting operator will be defined as the “adjacency matrix” of an infinite rooted tree with random edge weights, the so-called Poisson weighted infinite tree (PWIT) introduced by Aldous aldous92 () (see also aldoussteele ()). In other words, the PWIT will be shown to be the local weak limit of the random network when the edge marks are rescaled by $a_n$. In this setting the LSD $\mu_\alpha$ arises as the expected value of the (random) spectral measure of the operator at the root of the tree. The PWIT and the limiting operator are defined in Section 2. Our method of proof can be seen as a variant of the resolvent method, based on local convergence of operators. It is also well suited to investigate properties of the LSD (cf. Theorem 1.6 below).
Let us now come back to our random reversible Markov kernel $K$ defined by (1) from weights with law $L\in\mathcal{L}_\alpha$. We obtain different limiting behavior in the two regimes $1<\alpha<2$ and $0<\alpha<1$. The case $\alpha>2$ corresponds to a Wigner-type behavior (special case of Theorem 1.1). We set $\kappa_n = n/a_n$.
Theorem 1.3 ((Reversible Markov matrix, $1<\alpha<2$))
Let $\mu_\alpha$ be the probability distribution which appears as the LSD in the symmetric i.i.d. case (Theorem 1.2). If $L\in\mathcal{L}_\alpha$ with $1<\alpha<2$, then a.s. $\mu_{\kappa_n K} \rightsquigarrow \mu_\alpha$ as $n\to\infty$.
Theorem 1.4 ((Reversible Markov matrix, $0<\alpha<1$))
For every $0<\alpha<1$, there exists a symmetric probability distribution $\nu_\alpha$ supported on $[-1,1]$ depending only on $\alpha$ such that a.s. $\mu_K \rightsquigarrow \nu_\alpha$ as $n\to\infty$.
The proofs of Theorems 1.3 and 1.4 are given in Sections 3.3 and 3.1, respectively. As in the proof of Theorem 1.2, the main idea is to exploit convergence of our matrices to suitable operators defined on the PWIT. To understand the scaling in Theorem 1.3, we recall that if $1<\alpha<2$, then by the strong law of large numbers we have $\rho_i/n \to 1$ a.s. for every row sum, and this is shown to remove, in the limit $n\to\infty$, all dependencies in the matrix $\kappa_n K$, so that we obtain the same behavior as for the i.i.d. matrix of Theorem 1.2. On the other hand, when $0<\alpha<1$, both the sum $\rho_i$ and the maximum of its elements are on scale $a_n$. The proof of Theorem 1.4 shows that the matrix $K$ converges (without rescaling) to a random stochastic self-adjoint operator defined on the PWIT. The operator can be described as the transition matrix of the simple random walk on the PWIT and is naturally linked to Poisson–Dirichlet random variables. This is based on the observation that the order statistics of any given row of the matrix $K$ converges weakly to the Poisson–Dirichlet law $\mathrm{PD}(\alpha,0)$ (see Lemma 2.3 below for the details). In fact, the operator provides an interesting generalization of the Poisson–Dirichlet law.
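The Poisson–Dirichlet observation can be illustrated numerically. The sketch below (assuming the Pareto example of this section, with an arbitrary $\alpha<1$ and arbitrary sample sizes) compares the mean of the largest normalized entry of a heavy-tailed row with the same quantity for the standard construction of $\mathrm{PD}(\alpha,0)$ from the arrival times of a rate-one Poisson process:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, n, trials = 0.5, 5000, 500

# Mean of the largest normalized entry of a heavy-tailed row (alpha < 1).
tops = np.empty(trials)
for t in range(trials):
    row = rng.uniform(size=n) ** (-1.0 / alpha)   # Pareto row weights
    tops[t] = row.max() / row.sum()

# Poisson-Dirichlet comparison: with xi_k the arrival times of a rate-1
# Poisson process, (xi_k^(-1/alpha))_k normalized by its sum is PD(alpha, 0).
kmax = 200                                        # truncation of the a.s. finite sum
pd_tops = np.empty(trials)
for t in range(trials):
    w = np.cumsum(rng.exponential(size=kmax)) ** (-1.0 / alpha)
    pd_tops[t] = w[0] / w.sum()

assert abs(tops.mean() - pd_tops.mean()) < 0.08
```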
Since $\nu_\alpha$ is supported in $[-1,1]$, (2) and Theorem 1.4 imply that for all $k\ge1$, a.s.
(8) $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n p^{(k)}_{ii} = \int x^k\,\nu_\alpha(dx).$
The LSD $\nu_\alpha$ will be obtained as the expectation of the (random) spectral measure of the limiting operator at the root of the PWIT. It will follow that the $k$th moment of $\nu_\alpha$ is the expected value of the (random) probability that the random walk on the PWIT returns to the root in $k$ steps. In particular, the symmetry of $\nu_\alpha$ follows from the bipartite nature of the PWIT.
It was proved by Ben Arous and Guionnet benarousguionnet (), Remark 1.5, that $\alpha\mapsto\mu_\alpha$ is continuous with respect to weak convergence of probability measures, and by Belinschi, Dembo and Guionnet belinschi (), Remark 1.2 and Lemma 5.2, that $\mu_\alpha$ tends to the Wigner semicircle law as $\alpha\to2$. We believe that Theorem 1.3 should remain valid for $\alpha=2$ with LSD given by the Wigner semicircle law. Further properties of the measures $\mu_\alpha$ and $\nu_\alpha$ are discussed below.
The case $\alpha=1$ is qualitatively similar to the case $1<\alpha<2$, with the difference that the sequence $\kappa_n = n/a_n$ in Theorem 1.3 has to be replaced by $\kappa_n = nm_n/a_n$, where
(9) $m_n = \mathbb{E}\bigl[U_{1,2}\mathbf{1}_{\{U_{1,2}\le a_n\}}\bigr].$
Indeed, here the mean of $L$ may be infinite and the closest one gets to a law of large numbers is the statement that $\rho_i/(nm_n) \to 1$ in probability (see Section 3.4). The sequence $m_n$ (and therefore $\kappa_n$) is known to be slowly varying at $\infty$ for $L\in\mathcal{L}_1$ (see, e.g., Feller Feller (), VIII.8). The following mild condition will be assumed: there exists $\varepsilon>0$ such that
(10)
For example, if $U_{1,2} = V^{-1}$ with $V$ uniform on $(0,1)$, then $a_n = n$ and $m_n \sim \log n$. In the next theorem $\mu_1$ stands for the LSD from Theorem 1.2, at $\alpha=1$.
Theorem 1.5 ((Reversible Markov matrix, $\alpha=1$))
Suppose that $L\in\mathcal{L}_1$ and assume (10). If $\mu_{\kappa_n K}$ is the ESD of $\kappa_n K$, with $\kappa_n = nm_n/a_n$, then, as $n\to\infty$, a.s. $\mu_{\kappa_n K} \rightsquigarrow \mu_1$.
Properties of the LSD
In Section 4 we prove some properties of the LSDs $\mu_\alpha$ and $\nu_\alpha$.
Theorem 1.6 ((Properties of $\mu_\alpha$))
Statements (i) and (ii) answer some questions raised in benarousguionnet (), belinschi (). Statement (iii) is already contained in belinschi (), Theorem 1.7, but we provide a new proof based on a Tauberian theorem for the Cauchy–Stieltjes transform that may be of independent interest.
Theorem 1.7 ((Properties of $\nu_\alpha$))
It is delicate to provide reliable numerical simulations of the ESDs.
Invariant measure and edge behavior
Finally, we turn to the analysis of the invariant probability distribution $\hat\rho$ for the random walk on $V_n$. This is obtained by normalizing the vector of row sums:
$\hat\rho_i = \frac{\rho_i}{\sum_{j=1}^n \rho_j}, \qquad 1\le i\le n.$
Following bordenavecaputochafai (), Lemma 2.2, if $L$ has finite second moment, then $\max_{1\le i\le n}|n\hat\rho_i - 1| \to 0$ as $n\to\infty$ a.s. This uniform strong law of large numbers does not hold in the heavy-tailed case $L\in\mathcal{L}_\alpha$, $0<\alpha<2$: the large $n$ behavior of $\hat\rho$ is then dictated by the largest weights in the system.
Below we use the notation $\hat\rho_{(1)} \ge \hat\rho_{(2)} \ge \cdots \ge \hat\rho_{(n)}$ for the ranked values of $(\hat\rho_i)_{1\le i\le n}$, so that $\hat\rho_{(1)} \ge \cdots \ge \hat\rho_{(n)} \ge 0$ and their sum is $1$. The symbol $\stackrel{d}{\to}$ denotes convergence in distribution. We refer to Section 2.4 for more details on weak convergence in the space of ranked sequences and for the definition of the Poisson–Dirichlet law $\mathrm{PD}(\alpha,0)$.
Theorem 1.8 ((Invariant probability measure))
Suppose that $L\in\mathcal{L}_\alpha$.
(i) If $0<\alpha<1$, then
(11) $\bigl(\hat\rho_{(1)}, \hat\rho_{(2)}, \hat\rho_{(3)}, \ldots\bigr) \stackrel{d}{\to} \tfrac12(V_1, V_1, V_2, V_2, \ldots),$
where $(V_1, V_2, \ldots)$ stands for a Poisson–Dirichlet $\mathrm{PD}(\alpha,0)$ random vector.
(ii) If $1<\alpha<2$, then
(12) $\frac{n}{a_n}\bigl(n\hat\rho_{(1)} - 1, n\hat\rho_{(2)} - 1, \ldots\bigr) \stackrel{d}{\to} (x_1, x_1, x_2, x_2, \ldots),$
where $x_1 \ge x_2 \ge \cdots$ denote the ranked points of the Poisson point process on $(0,\infty)$ with intensity measure $\alpha x^{-\alpha-1}\,dx$. Moreover, the same convergence holds for $\alpha=1$ provided the sequence $n/a_n$ is replaced by $nm_n/a_n$, with $m_n$ as in (9).
Theorem 1.8 is proved in Section 5. These results will be derived from the statistics of the ranked values of the weights $U_{i,j}$, $1\le i\le j\le n$, on the scale $a_n$ (diagonal weights are easily seen to give negligible contributions). The duplication in the sequences in (11) and (12) then comes from the fact that each of the largest weights belongs to two distinct rows and determines alone the limiting value of the associated row sum.
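The duplication phenomenon is easy to see in simulation: with heavy-tailed weights and $\alpha<1$, the top ranked row sums come in nearly equal pairs, one for each endpoint of a huge edge weight. A sketch with arbitrary Pareto weights of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, n = 0.5, 2000

# Symmetric heavy-tailed weights, alpha < 1.
U = rng.uniform(size=(n, n)) ** (-1.0 / alpha)
U = np.triu(U) + np.triu(U, 1).T
rho = U.sum(axis=1)
rho_hat = np.sort(rho / rho.sum())[::-1]    # ranked invariant probabilities

# Duplication: each huge weight U_{i,j} dominates both row i and row j,
# so the top ranked values come in nearly equal pairs.
assert abs(rho_hat[0] - rho_hat[1]) / rho_hat[0] < 0.2
assert abs(rho_hat[2] - rho_hat[3]) / rho_hat[2] < 0.2
```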
Theorem 1.8 is another indication that the random walk with transition matrix shares the features of a trap model. Loosely speaking, instead of being trapped at a vertex, as in the usual mean field trap models (see Bouchaud (), BenArousCerny (), MR2152251 (), MR2435851 ()) here the walker is trapped at an edge.
Large edge weights are responsible for the large eigenvalues of $K$. This phenomenon is well understood in the case of symmetric random matrices with i.i.d. entries, where it is known that, for $\alpha\in(0,4)$, the edge of the spectrum gives rise to Poisson statistics (see MR2081462 (), auffingerbenarouspeche ()). The behavior of the extremal eigenvalues of $K$ when $L$ has finite fourth moment has been studied in bordenavecaputochafai (). In particular, it is shown there that $\lambda_2 = O(n^{-1/2})$ a.s., so that the spectral gap tends to $1$. In the present case of heavy-tailed weights, in contrast, by localization on the largest edge weight it is possible to prove that, a.s. and up to corrections with slow variation at $\infty$,
(13)
A similar bound holds for $\lambda_n$ at the bottom of the spectrum. Understanding the statistics of the extremal eigenvalues remains an interesting open problem.
2 Convergence to the Poisson weighted infinite tree
The aim of this section is to prove that the matrices and appearing in Theorems 1.2, 1.3 and 1.4, when properly rescaled, converge “locally” to a limiting operator defined on the Poisson weighted infinite tree (PWIT). The concept of local convergence of operators is defined below. We first recall the standard construction of the PWIT.
2.1 The PWIT
Given a Radon measure $\theta$ on $\mathbb{R}$, $\mathrm{PWIT}(\theta)$ is the random rooted tree defined as follows. The vertex set of the tree is identified with $\mathbb{N}^f := \bigcup_{k\ge0}\mathbb{N}^k$ by indexing the root as $\mathbb{N}^0 = \{\varnothing\}$, the offspring of the root as $\mathbb{N}$ and, more generally, the offspring of some $v\in\mathbb{N}^k$ as $(v1), (v2), \ldots \in \mathbb{N}^{k+1}$ [for short notation, we write $vk$ in place of $(v,k)$]. In this way the set $\mathbb{N}^k$ identifies the $k$th generation.
We now assign marks to the edges of the tree according to a collection $\{\Xi_v\}_{v\in\mathbb{N}^f}$ of independent realizations of the Poisson point process with intensity measure $\theta$ on $\mathbb{R}$. Namely, starting from the root $\varnothing$, let $\Xi_\varnothing = \{y_1, y_2, \ldots\}$ be ordered in such a way that $|y_1| \le |y_2| \le \cdots$, and assign the mark $y_i$ to the offspring of the root labeled $i$. Now, recursively, at each vertex $v$ of generation $k$, assign the mark $y_{vi}$ to the offspring labeled $vi$, where $\Xi_v = \{y_{v1}, y_{v2}, \ldots\}$ satisfy $|y_{v1}| \le |y_{v2}| \le \cdots$.
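For the special case where $\theta$ is the Lebesgue measure on $[0,\infty)$ (the case used for the Markov limit below), the ordered marks of each vertex's offspring are simply the arrival times of a rate-one Poisson process, so one generation of the PWIT can be sampled as follows (a minimal sketch; the function name and the truncation to finitely many offspring are our own choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def pwit_children_marks(k, rng):
    """Marks of the first k offspring of one vertex of PWIT(Lebesgue on [0, inf)):
    a rate-1 Poisson process, so the ordered marks are the cumulative sums
    of i.i.d. Exp(1) variables."""
    return np.cumsum(rng.exponential(size=k))

marks = pwit_children_marks(100_000, rng)
assert np.all(np.diff(marks) > 0)              # marks are increasing
assert abs(marks[-1] / 100_000 - 1.0) < 0.02   # k-th mark ~ k (law of large numbers)
```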
2.2 Local operator convergence
We give a general formulation and later specialize to our setting. Let $V$ be a countable set, and let $\ell^2(V)$ denote the Hilbert space defined by the scalar product
$\langle \phi, \psi \rangle = \sum_{v\in V}\overline{\phi(v)}\psi(v),$
where $\phi, \psi \in \ell^2(V)$, and let $\delta_v$ denote the unit vector with support $\{v\}$. Let $\mathcal{D}$ denote the dense subset of $\ell^2(V)$ of vectors with finite support. {defi}[(Local convergence)] Suppose $(A_n)$ is a sequence of bounded operators on $\ell^2(V)$, and $A$ is a closed linear operator on $\ell^2(V)$ with dense domain $D(A)$. Suppose further that $\mathcal{D}$ is a core for $A$ (i.e., the closure of $A$ restricted to $\mathcal{D}$ equals $A$). For any $u\in V$ we say that $A_n$ converges locally to $A$ at $u$ and write $(A_n, u) \to (A, u)$
if there exists a sequence of bijections $\sigma_n: V\to V$ such that $\sigma_n(u) = u$ and, for all $\phi\in\mathcal{D}$,
$\sigma_n^{-1}A_n\sigma_n\phi \to A\phi$
in $\ell^2(V)$, as $n\to\infty$.
In other words, this is the standard strong convergence of operators up to a re-indexing of $V$ which preserves the distinguished element $u$. With a slight abuse of notation we have used the same symbol $\sigma_n$ for the linear isometry induced in the obvious way, that is, such that $\sigma_n\delta_v = \delta_{\sigma_n(v)}$ for all $v\in V$. The point of introducing Definition 2.2 lies in the following theorem on strong resolvent convergence. Recall that if $A$ is a self-adjoint operator its spectrum is real, and for all $z\in\mathbb{C}\setminus\mathbb{R}$, the operator $A - z$ is invertible with bounded inverse. The operator-valued function $z\mapsto(A - z)^{-1}$ is the resolvent of $A$.
Theorem 2.1 ((From local convergence to resolvents))
If $(A_n)$ and $A$ are self-adjoint operators that satisfy the conditions of Definition 2.2 and $(A_n, u) \to (A, u)$ for some $u\in V$, then, for all $z\in\mathbb{C}\setminus\mathbb{R}$,
(14) $\langle \delta_u, (A_n - z)^{-1}\delta_u \rangle \to \langle \delta_u, (A - z)^{-1}\delta_u \rangle.$
It is a special case of reedsimon (), Theorem VIII.25(a). Indeed, if we define $B_n := \sigma_n^{-1}A_n\sigma_n$, then $B_n\phi \to A\phi$ for all $\phi$ in a common core $\mathcal{D}$ of the self-adjoint operators $B_n$ and $A$. This implies the strong resolvent convergence, that is, for any $\phi\in\ell^2(V)$, $(B_n - z)^{-1}\phi \to (A - z)^{-1}\phi$. The conclusion follows by taking the scalar product with $\delta_u$.
We shall apply the above theorem in cases where the operators $A_n$ and $A$ are random operators on $\ell^2(V)$, which satisfy with probability one the conditions of Definition 2.2. In this case we say that $(A_n, u) \to (A, u)$ in distribution if there exists a random bijection $\sigma_n$ as in Definition 2.2 such that $\sigma_n^{-1}A_n\sigma_n\phi$ converges in distribution to $A\phi$, for all $\phi\in\mathcal{D}$ [where a random vector $\psi_n$ converges in distribution to $\psi$ if $\mathbb{E}[f(\psi_n)] \to \mathbb{E}[f(\psi)]$
for all bounded continuous functions $f$]. Under these assumptions, (14) becomes convergence in distribution of (bounded) complex random variables. In our setting the Hilbert space will be $\ell^2(\mathbb{N}^f)$, with $\mathbb{N}^f$ the vertex set of the PWIT; the operator $A_n$ will be a rescaled version of the matrix defined by (6) or of the matrix defined by (1). The operator $A$ will be the corresponding limiting operator defined below.
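The quantity in (14) is the diagonal resolvent entry at the distinguished vertex. For a finite self-adjoint matrix, the normalized trace of the resolvent is the Cauchy–Stieltjes transform of its ESD, which is the link exploited in Section 3. A minimal finite-dimensional sketch (using an arbitrary Gaussian test matrix, unrelated to the models of the text):

```python
import numpy as np

rng = np.random.default_rng(5)
n, z = 400, 1j   # spectral parameter z = i, off the real axis

# A small symmetric test matrix standing in for a self-adjoint operator.
A = rng.normal(size=(n, n)) / np.sqrt(n)
A = (A + A.T) / np.sqrt(2)

R = np.linalg.inv(A - z * np.eye(n))        # resolvent (A - z)^{-1}

# The normalized trace of the resolvent equals the Cauchy-Stieltjes transform
# of the ESD at z; check this against the eigenvalues directly.
eigs = np.linalg.eigvalsh(A)
stieltjes = np.mean(1.0 / (eigs - z))
assert abs(np.trace(R) / n - stieltjes) < 1e-8
assert R[0, 0].imag > 0                      # Im <delta_u,(A-z)^{-1}delta_u> > 0 for Im z > 0
```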
2.3 Limiting operators
Let $\theta$ be the measure governing the edge marks, and consider a realization of $\mathrm{PWIT}(\theta)$. As before, the mark on the edge from vertex $v$ to its offspring $vk$ is denoted by $y_{vk}$. We note that almost surely
(15) $\sum_{k=1}^\infty |y_{vk}|^{-2/\alpha} < \infty$
since a.s. $|y_{vk}|/k \to 1$ and $\sum_k k^{-2/\alpha}$ converges for $\alpha<2$. Recall that $\mathcal{D}$ is the dense set of vectors of $\ell^2(\mathbb{N}^f)$ with finite support. We may a.s. define a linear operator $T$ by letting, for $v\in\mathbb{N}^f$,
$T\delta_v = \sum_{k=1}^\infty \operatorname{sign}(y_{vk})|y_{vk}|^{-1/\alpha}\delta_{vk} + \operatorname{sign}(y_v)|y_v|^{-1/\alpha}\delta_{p(v)},$
where $p(v)$ denotes the parent of $v$ and the last term is absent at the root $v=\varnothing$.
Note that if every edge in the tree with mark $y$ is given the “weight” $\operatorname{sign}(y)|y|^{-1/\alpha}$, then we may look at the operator $T$ as the “adjacency matrix” of the weighted tree. Clearly, $T$ is symmetric, and therefore it has a closed extension with domain $D(T)\subset\ell^2(\mathbb{N}^f)$ (see, e.g., reedsimon (), Chapter VIII, Section 2). We will prove in Proposition A.2 below that $T$ is essentially self-adjoint, that is, the closure of $T$ is self-adjoint. With a slight abuse of notation, we identify $T$ with its closed extension. As stated below, $T$ is the weak local limit of the sequence of rescaled i.i.d. matrices $a_n^{-1}A$, where $A$ is defined by (6). To this end we view the matrix $A$ as an operator in $\ell^2(\mathbb{N}^f)$ by setting $A\delta_k = \sum_{j=1}^n A_{j,k}\delta_j$, where $1,\ldots,n$ denote the labels of the offspring of the root (the first generation), with the convention that $A_{j,k} = 0$ when either $j>n$ or $k>n$, and by setting $A\delta_v = 0$ when $v$ does not belong to the first generation.
Similarly, taking now $\theta$ supported on $[0,\infty)$, in the case of the Markov matrices $K$ defined by (1), for $1<\alpha<2$, $T$ is the local limit operator of $\kappa_n K$. To work directly with symmetric operators we introduce the symmetric matrix $S$ defined by
(17) $S(i,j) = \frac{U_{i,j}}{\sqrt{\rho_i\rho_j}},$
which is easily seen to have the same spectrum as $K$ (see, e.g., bordenavecaputochafai (), Lemma 2.1). Again the matrix $S$ can be embedded in the infinite tree as described above for $A$.
In the case $0<\alpha<1$ the Markov matrix $K$ has a different limiting object, which is defined as follows. Consider a realization of $\mathrm{PWIT}(\ell)$, where $\ell$ is the Lebesgue measure on $[0,\infty)$. We define an operator corresponding to the random walk on this tree with conductance across an edge with mark $y$ equal to $y^{-1/\alpha}$ (the mark to the power $-1/\alpha$). More precisely, for $v\in\mathbb{N}^f$, let
$\rho(v) = \sum_{k=1}^\infty y_{vk}^{-1/\alpha} + y_v^{-1/\alpha},$
with the convention that the last term is absent at the root $v=\varnothing$. Since a.s. $y_{vk}/k \to 1$, $\rho(v)$ is almost surely finite for $0<\alpha<1$. We define the linear operator $\hat K$ on $\mathcal{D}$, by letting, for $v\in\mathbb{N}^f$,
(18) $(\hat K\phi)(v) = \frac{1}{\rho(v)}\Biggl(\sum_{k=1}^\infty y_{vk}^{-1/\alpha}\phi(vk) + y_v^{-1/\alpha}\phi(p(v))\Biggr),$
where $p(v)$ denotes the parent of $v$ and the last term is absent at the root.
Note that $\hat K$ is not symmetric, but it becomes symmetric in the weighted Hilbert space $\ell^2(\rho)$ defined by the scalar product
$\langle \phi, \psi \rangle_\rho = \sum_{v\in\mathbb{N}^f}\rho(v)\overline{\phi(v)}\psi(v).$
Moreover, on $\ell^2(\rho)$, $\hat K$ is a bounded self-adjoint operator since Schwarz’s inequality implies
$|\langle \phi, \hat K\psi \rangle_\rho| \le \|\phi\|_\rho\|\psi\|_\rho,$
so that the operator norm of $\hat K$ is less than or equal to $1$. To work with self-adjoint operators in the unweighted Hilbert space $\ell^2(\mathbb{N}^f)$ we shall actually consider the operator $\hat S$ defined by
(19) $\hat S = D^{1/2}\hat K D^{-1/2},$
where $D$ denotes the operator of multiplication by $\rho(\cdot)$. This defines a bounded self-adjoint operator in $\ell^2(\mathbb{N}^f)$. Indeed, the map $\phi\mapsto D^{1/2}\phi$ induces a linear isometry from $\ell^2(\rho)$ onto $\ell^2(\mathbb{N}^f)$ such that
(20) $\langle D^{1/2}\phi, \hat S D^{1/2}\psi \rangle = \langle \phi, \hat K\psi \rangle_\rho$
for all $\phi, \psi\in\ell^2(\rho)$. In this way, when $0<\alpha<1$, $\hat S$ will be the limiting operator associated with the matrix $S$ defined in (17). Note that no rescaling is needed here. The main result of this section is the following.
Theorem 2.2 ((Limiting operators))
As $n$ goes to infinity, in distribution:
(i) if $0<\alpha<2$ and $A$ is the i.i.d. matrix (6), then $(a_n^{-1}A, \varnothing) \to (T, \varnothing)$;
(ii) if $1<\alpha<2$ and $L\in\mathcal{L}_\alpha$, then $(\kappa_n S, \varnothing) \to (T, \varnothing)$;
(iii) if $0<\alpha<1$ and $L\in\mathcal{L}_\alpha$, then $(S, \varnothing) \to (\hat S, \varnothing)$.
From the remark after Theorem 2.1 we see that Theorem 2.2 implies convergence in distribution of the resolvent at the root. As we shall see in Section 3, this in turn gives convergence of the expected values of the Cauchy–Stieltjes transform of the ESD of our matrices. The rest of this section is devoted to the proof of Theorem 2.2.
2.4 Weak convergence of a single row
In this paragraph, we recall some facts about the order statistics of the first row of the matrices $U = (U_{i,j})$ and $K$, that is, of
$(U_{1,1}, \ldots, U_{1,n}) \quad\mbox{and}\quad (K(1,1), \ldots, K(1,n)),$
where $U_{1,1}$ has law $L\in\mathcal{L}_\alpha$. Let us denote by $U_{(1)} \ge \cdots \ge U_{(n)}$ the order statistics of the variables $U_{1,1}, \ldots, U_{1,n}$. Recall that $\rho_1 = \sum_{k=1}^n U_{1,k}$. Let us define $\hat U_{(k)} = U_{(k)}/\rho_1$ for $1\le k\le n$. Call $\mathcal{S}$ the set of nonincreasing sequences $x_1 \ge x_2 \ge \cdots \ge 0$, and let $\mathcal{S}_1$ be the subset of sequences satisfying $\sum_k x_k = 1$. We shall view
$(U_{(1)}, \ldots, U_{(n)}) \quad\mbox{and}\quad (\hat U_{(1)}, \ldots, \hat U_{(n)})$
as elements of $\mathcal{S}$ and $\mathcal{S}_1$, respectively, simply by adding zeros to the right of $U_{(n)}$ and $\hat U_{(n)}$. Equipped with the standard product metric, these are separable metric spaces (the subset of sequences with $\sum_k x_k \le 1$ is compact), and convergence in distribution for such sequence-valued random variables is equivalent to finite-dimensional convergence (cf., e.g., Bertoin bertoin06 ()).
Let $(E_k)_{k\ge1}$ denote i.i.d. exponential variables with mean $1$ and write $\xi_k = E_1 + \cdots + E_k$. We define the random variable in $\mathcal{S}$
$(\xi_1^{-1/\alpha}, \xi_2^{-1/\alpha}, \ldots).$
The law of this sequence is the law of the ordered points of a Poisson process on $(0,\infty)$ with intensity measure $\alpha x^{-\alpha-1}\,dx$. For $0<\alpha<1$ we define the variable in $\mathcal{S}_1$
$\biggl(\frac{\xi_1^{-1/\alpha}}{\sum_k \xi_k^{-1/\alpha}}, \frac{\xi_2^{-1/\alpha}}{\sum_k \xi_k^{-1/\alpha}}, \ldots\biggr).$
For $0<\alpha<1$ the sum $\sum_k \xi_k^{-1/\alpha}$ is a.s. finite. The law of this normalized sequence in $\mathcal{S}_1$ is called the Poisson–Dirichlet law $\mathrm{PD}(\alpha,0)$ (see Pitman and Yor MR1434129 (), Proposition 10). The next result is rather standard but we give a simple proof for convenience.
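Both limit objects are straightforward to sample from the representation above: $(\xi_k^{-1/\alpha})_k$ gives the Poisson point process, and its normalization gives $\mathrm{PD}(\alpha,0)$. As a check, in the Pareto case $\ell\equiv1$ (so that $a_n = n^{1/\alpha}$ exactly), the rescaled maximum $a_n^{-1}U_{(1)}$ should match the law of $\xi_1^{-1/\alpha}$, whose distribution function is $e^{-x^{-\alpha}}$ (a sketch with arbitrary sample sizes of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, n, trials = 0.5, 2000, 4000

# Pareto case l = 1: a_n = n^(1/alpha) exactly, and the rescaled row maximum
# U_(1)/a_n is asymptotically distributed as xi_1^(-1/alpha), with
# P(xi_1^(-1/alpha) <= x) = exp(-x^(-alpha)).
a_n = n ** (1.0 / alpha)
maxima = np.array([
    (rng.uniform(size=n) ** (-1.0 / alpha)).max() / a_n
    for _ in range(trials)
])

for x in (0.5, 1.0, 2.0):
    assert abs(np.mean(maxima <= x) - np.exp(-x ** (-alpha))) < 0.03
```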
Lemma 2.3 ((Poisson–Dirichlet laws and Poisson point processes))
(i) For all $0<\alpha<2$, $a_n^{-1}(U_{(1)}, \ldots, U_{(n)})$ converges in distribution in $\mathcal{S}$ to $(\xi_1^{-1/\alpha}, \xi_2^{-1/\alpha}, \ldots)$. Moreover, for $0<\alpha<2$, the sequence is a.s. uniformly square integrable, that is, $\lim_{j\to\infty}\sup_n \sum_{k\ge j}(a_n^{-1}U_{(k)})^2 = 0$ a.s.
(ii) If $0<\alpha<1$, $(\hat U_{(1)}, \ldots, \hat U_{(n)})$ converges in distribution in $\mathcal{S}_1$ to the Poisson–Dirichlet law $\mathrm{PD}(\alpha,0)$. Moreover, the sequence is a.s. uniformly integrable, that is, $\lim_{j\to\infty}\sup_n \sum_{k\ge j}\hat U_{(k)} = 0$ a.s.
(iii) If $I\subset\{1,\ldots,n\}$ is a finite set and $U'_{(1)} \ge U'_{(2)} \ge \cdots$ denote the order statistics of $(U_{1,k})_{k\notin I}$, then (i) and (ii) hold with $U_{(k)}$ and $\hat U_{(k)}$ replaced by $U'_{(k)}$ and $\hat U'_{(k)}$.
As an example, from (i), we retrieve the well-known fact that for any $0<\alpha<2$, the random variable $a_n^{-1}U_{(1)} = a_n^{-1}\max_{1\le k\le n}U_{1,k}$ converges weakly as $n\to\infty$ to the law of $\xi_1^{-1/\alpha}$. This law, known as a Fréchet law, has density $\alpha x^{-\alpha-1}e^{-x^{-\alpha}}$ on $(0,\infty)$. {pf*}Proof of Lemma 2.3 As in LePage, Woodroofe and Zinn zinn81 (), we take advantage of the following well-known representation for the order statistics of i.i.d. random variables. Let $G(u) = \inf\{t\ge0: L([t,\infty))\le u\}$ be the generalized inverse of the tail function in (4). We have that $(U_{(1)}, \ldots, U_{(n)})$ equals in distribution the vector
(21) $\bigl(G(\xi_1/\xi_{n+1}), G(\xi_2/\xi_{n+1}), \ldots, G(\xi_n/\xi_{n+1})\bigr),$
where $(\xi_k)_{k\ge1}$ has been defined above. To prove (i) we start from the distributional identity
$a_n^{-1}(U_{(1)}, \ldots, U_{(n)}) \stackrel{d}{=} a_n^{-1}\bigl(G(\xi_1/\xi_{n+1}), \ldots, G(\xi_n/\xi_{n+1})\bigr),$
which follows from (21). It suffices to prove that for every fixed $j\ge1$, almost surely the first $j$ terms above converge to the first $j$ terms in $(\xi_1^{-1/\alpha}, \xi_2^{-1/\alpha}, \ldots)$. Thanks to (5), almost surely, for every $k\ge1$,
(22) $a_n^{-1}G(\xi_k/\xi_{n+1}) \to \xi_k^{-1/\alpha},$
and the convergence in distribution of $a_n^{-1}(U_{(1)}, \ldots, U_{(n)})$ to $(\xi_1^{-1/\alpha}, \xi_2^{-1/\alpha}, \ldots)$ follows. Moreover, from (5), for any $\varepsilon>0$ we can find $C>0$ such that
$a_n^{-1}G(\xi_k/\xi_{n+1}) \le C\xi_k^{-1/\alpha+\varepsilon}$
for $n$ large enough and $1\le k\le n$. Since $\xi_k/k \to 1$ a.s., we see that the expression above is a.s. bounded by $C'k^{-1/\alpha+\varepsilon}$, for $n$ sufficiently large, and the second part of (i) follows from the a.s. summability of $\sum_k k^{-2(1/\alpha-\varepsilon)}$, valid for $\varepsilon$ small enough since $\alpha<2$.