
# The eigenvalues of stochastic blockmodel graphs

Minh Tang (minh@jhu.edu)
Department of Applied Mathematics and Statistics, Johns Hopkins University, 3400 N. Charles St, Baltimore, MD 21218, USA.
###### Abstract

We derive the joint limiting distribution for the largest eigenvalues of the adjacency matrix for stochastic blockmodel graphs when the number of vertices tends to infinity. We show that, in the limit, these eigenvalues are jointly multivariate normal with bounded covariances. Our result extends the classical result of Füredi and Komlós on the fluctuation of the largest eigenvalue for Erdős-Rényi graphs.


## 1 Introduction

The systematic study of eigenvalues of random matrices dates back to the seminal work of Wigner on the semicircle law for Wigner ensembles of symmetric or Hermitian matrices. A random symmetric matrix $M = (m_{ij})$ is said to be a Wigner matrix if, for $i \le j$, the entries $m_{ij}$ are independent mean zero random variables with variance $\sigma^2$ for $i < j$ and variance $\tilde\sigma^2$ for $i = j$. Many important and beautiful results are known for the spectral properties of these matrices, such as universality of the semicircle law for bulk eigenvalues (Erdos2; Tao2), universality of the Tracy-Widom distribution for the largest eigenvalue (Soshnikov), universality properties of the eigenvectors (tao2012random; knowles), and eigenvalue and eigenvector delocalization (Erdos3).

In contrast, much less is known about the spectral properties of random symmetric matrices whose entries are independent but not necessarily mean zero random variables with possibly heterogeneous variances. Such random matrices arise naturally in many settings, with perhaps the most popular example being the adjacency matrices of (inhomogeneous) independent edge random graphs. In the case when $A$ is the adjacency matrix of an Erdős-Rényi graph whose edges are i.i.d. Bernoulli($p$) random variables, Arnold and ding10 show that the empirical distribution of the eigenvalues of $n^{-1/2}A$ also converges to a semicircle law. Meanwhile, the following result of furedi1981eigenvalues shows that the largest eigenvalue of $A$ is asymptotically normally distributed.

###### Theorem 1 (furedi1981eigenvalues).

Let $A = (a_{ij})$ be an $n \times n$ symmetric matrix where the $a_{ij}$, $i \le j$, are independent (not necessarily identically distributed) random variables uniformly bounded in magnitude by a constant $K$. Assume that for $i \ne j$, the $a_{ij}$ have a common expectation $\mu > 0$ and variance $\sigma^2$. Furthermore, assume that $\mathbb{E}[a_{ii}] = \nu$ for all $i$. Then the distribution of $\lambda_1(A)$, the largest eigenvalue of $A$, can be approximated in order $n^{-1/2}$ by a normal distribution with mean $(n-1)\mu + \nu + \sigma^2/\mu$ and variance $2\sigma^2$, i.e.,

$$\lambda_1(A) - (n-1)\mu - \nu \;\xrightarrow{d}\; \mathcal{N}\bigl(\sigma^2/\mu,\; 2\sigma^2\bigr) \tag{1.1}$$

as $n \to \infty$. Furthermore, with probability tending to $1$,

$$\max_{i \ge 2} |\lambda_i(A)| < 2\sigma\sqrt{n} + O(n^{1/3}\log n). \tag{1.2}$$

In the case when $A$ is the adjacency matrix of an Erdős-Rényi graph with edge probability $p$ (so that $\mu = \nu = p$ and $\sigma^2 = p(1-p)$), Theorem 1 yields

$$\lambda_1(A) - np \;\xrightarrow{d}\; \mathcal{N}\bigl(1 - p,\; 2p(1-p)\bigr)$$

as $n \to \infty$.
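The Gaussian fluctuation above is easy to probe numerically. The following is a minimal Monte Carlo sketch (assuming numpy; the function names are ours, and self-loops are sampled so that the diagonal entries also have mean $p$, matching $\nu = p$ above):

```python
import numpy as np

def er_adjacency(n, p, rng):
    """Sample a symmetric Erdos-Renyi adjacency matrix with Bernoulli(p)
    entries; the diagonal is also Bernoulli(p), matching nu = p."""
    upper = rng.random((n, n)) < p
    A = np.triu(upper).astype(float)       # keep entries with i <= j
    return A + A.T - np.diag(np.diag(A))   # symmetrize without doubling the diagonal

def largest_eig_fluctuations(n, p, reps, seed=0):
    """Return lambda_1(A) - n*p over independent replicates."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(reps):
        A = er_adjacency(n, p, rng)
        vals.append(np.linalg.eigvalsh(A)[-1] - n * p)
    return np.array(vals)
```

For moderate $n$, the sample mean and variance of the returned fluctuations should already be close to $1 - p$ and $2p(1-p)$.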

A natural generalization of Erdős-Rényi random graphs is the notion of stochastic blockmodel graphs (holland) where, given an integer $K \ge 1$, the $a_{ij}$ for $i \le j$ are independent Bernoulli random variables with $\mathbb{E}[a_{ij}] \in \mathcal{B}$ for some set $\mathcal{B}$ of cardinality at most $K(K+1)/2$. More specifically, we have the following definition.

###### Definition 1.

Let $K \ge 1$ be a positive integer and let $\pi = (\pi_1, \dots, \pi_K)$ be a non-negative vector in $\mathbb{R}^K$ with $\sum_{k} \pi_k = 1$. Let $B \in [0,1]^{K \times K}$ be symmetric. We say that $(A, \tau) \sim \mathrm{SBM}(B, \pi)$ if the following holds. First, $\tau = (\tau_1, \dots, \tau_n)$ and the $\tau_i$ are i.i.d. random variables with $\mathbb{P}[\tau_i = k] = \pi_k$. Then $A$ is a symmetric matrix such that, conditioned on $\tau$, for all $i \le j$ the $a_{ij}$ are independent Bernoulli random variables with $\mathbb{E}[a_{ij}] = B_{\tau_i \tau_j}$.
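Definition 1 translates directly into a sampler. A minimal numpy sketch (the function name and defaults are ours):

```python
import numpy as np

def sample_sbm(n, B, pi, rng=None):
    """Sample (A, tau) from a stochastic blockmodel: block labels tau_i are
    i.i.d. with P[tau_i = k] = pi_k, and, conditional on tau, the entries
    a_ij for i <= j are independent Bernoulli(B[tau_i, tau_j])."""
    rng = np.random.default_rng() if rng is None else rng
    B = np.asarray(B, dtype=float)
    tau = rng.choice(len(pi), size=n, p=pi)    # i.i.d. block assignments
    probs = B[np.ix_(tau, tau)]                # n x n matrix of edge probabilities
    upper = rng.random((n, n)) < probs         # Bernoulli draws
    A = np.triu(upper).astype(float)           # keep i <= j, then symmetrize
    A = A + A.T - np.diag(np.diag(A))
    return A, tau
```

Averaging the entries of `A` within a block pair recovers the corresponding entry of `B` up to sampling error.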

The stochastic blockmodel is among the most popular generative models for random graphs with community structure; the nodes of such graphs are partitioned into blocks or communities, and the probability of connection between any two nodes is a function of their block assignment. The adjacency matrix $A$ of a stochastic blockmodel graph can be viewed as $A = P + (A - P)$ where $P = \mathbb{E}[A \mid \tau]$ is a low-rank deterministic matrix and $A - P$ is a generalized Wigner matrix whose elements are independent mean zero random variables with heterogeneous variances. We emphasize that our assumptions on $P$ distinguish our setting from existing results in the literature. For example, peche; knowles_yin; bordenave; pizzo consider finite rank additive perturbations of a random matrix under the assumption that the noise matrix is either a Wigner matrix or is sampled from the Gaussian unitary ensemble. Meanwhile, in benaych-georges, the authors assume that the perturbation or the noise matrix is orthogonally invariant; a symmetric random matrix $M$ is orthogonally invariant if its distribution is invariant under the similarity transformation $M \mapsto Q M Q^\top$ whenever $Q$ is an orthogonal matrix. Finally, in O_Rourke, the entries of the noise matrix are assumed to be from an elliptical family of distributions, i.e., the collection $\{m_{ij}\}_{i \le j}$ are i.i.d. according to a common random variable.

The characterization of the empirical distribution of eigenvalues for stochastic blockmodel graphs is of significant interest, but only a few results are available. In particular, Zhang and avrachenkov derived the Stieltjes transform for the limiting empirical distribution of the bulk eigenvalues for stochastic blockmodel graphs, thereby showing that the empirical distribution of the eigenvalues need not converge to a semicircle law. Zhang and avrachenkov also considered the edge eigenvalues, but their characterization depends upon inverting the Stieltjes transform and thus currently does not yield the limiting distribution for these largest eigenvalues. lei2014 derived the limiting distribution for the largest eigenvalue of a centered and scaled version of $A$. More specifically, lei2014 showed that there is a consistent estimate $\hat{P} = (\hat{p}_{ij})$ of $P$ such that the matrix $\tilde{A}$ with entries $\tilde{a}_{ij} = (a_{ij} - \hat{p}_{ij})/\sqrt{(n-1)\hat{p}_{ij}(1 - \hat{p}_{ij})}$ has a limiting Tracy-Widom distribution, i.e., $n^{2/3}(\lambda_1(\tilde{A}) - 2)$ converges to Tracy-Widom.

This paper addresses the open question of determining the limiting distribution of the edge eigenvalues of adjacency matrices for stochastic blockmodel graphs. In particular, we extend the result of Füredi and Komlós and show that, in the limit, these eigenvalues are jointly multivariate normal with bounded covariances.

## 2 Main results

We present our result in the more general framework of generalized random dot product graphs, where $P$ is only assumed to be low rank, i.e., we do not require that the entries of $P$ take on a finite number of distinct values. We first define the notion of a (generalized) random dot product graph (young2007random; rubin_delanchy_grdpg).

###### Definition 2 (Generalized random dot product graph).

Let $d \ge 1$ be a positive integer and let $p$ and $q$ be non-negative integers such that $p + q = d$. Let $I_{p,q}$ denote the diagonal matrix whose first $p$ diagonal elements equal $1$ and whose remaining $q$ diagonal entries equal $-1$. Let $\mathcal{X}$ be a subset of $\mathbb{R}^d$ such that $x^\top I_{p,q} y \in [0, 1]$ for all $x, y \in \mathcal{X}$. Let $F$ be a distribution taking values in $\mathcal{X}$. We say $A \sim \mathrm{GRDPG}_{p,q}(F)$ with signature $(p, q)$ if the following holds. First let $X_1, \dots, X_n \overset{\mathrm{i.i.d.}}{\sim} F$ and set $X = [X_1 \mid \cdots \mid X_n]^\top$. Then $A$ is a symmetric matrix such that the entries $a_{ij}$ are independent and

$$a_{ij} \sim \mathrm{Bernoulli}\bigl(X_i^\top I_{p,q} X_j\bigr). \tag{2.1}$$

We therefore have

$$\mathbb{P}[A \mid X] = \prod_{i \le j} \bigl(X_i^\top I_{p,q} X_j\bigr)^{a_{ij}} \bigl(1 - X_i^\top I_{p,q} X_j\bigr)^{1 - a_{ij}}. \tag{2.2}$$

When $q = 0$, we say that $A \sim \mathrm{RDPG}(F)$, i.e., $A$ is a random dot product graph.

###### Remark.

Any stochastic blockmodel graph can be represented as a (generalized) random dot product graph where $F$ is a mixture of point masses. Indeed, suppose $B$ is a $K \times K$ matrix and let $B = U \Lambda U^\top$ be the eigendecomposition of $B$. Then, denoting by $\nu_1, \dots, \nu_K$ the rows of $U |\Lambda|^{1/2}$, we can define $F = \sum_{k=1}^{K} \pi_k \delta_{\nu_k}$ where $\delta_{\nu_k}$ is the Dirac delta function at $\nu_k$. The signature $(p, q)$ is given by the number of positive and negative eigenvalues of $B$, respectively. Similar constructions show that degree-corrected stochastic blockmodel graphs (karrer2011stochastic) and mixed-membership stochastic blockmodel graphs (Airoldi2008) are also special cases of generalized random dot product graphs.
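The construction in this remark can be checked mechanically: from $B = U \Lambda U^\top$ one takes the rows of $U|\Lambda|^{1/2}$ and verifies $\nu_k^\top I_{p,q} \nu_l = B_{kl}$. A small numpy sketch (the function name is ours; it assumes $B$ has no zero eigenvalues):

```python
import numpy as np

def sbm_latent_positions(B):
    """Recover point-mass latent positions nu_k from a symmetric block
    probability matrix B, together with the signature (p, q) given by the
    numbers of positive and negative eigenvalues of B."""
    evals, U = np.linalg.eigh(np.asarray(B, dtype=float))
    order = np.argsort(evals)[::-1]           # positive eigenvalues first
    evals, U = evals[order], U[:, order]
    nu = U * np.sqrt(np.abs(evals))           # rows are nu_1, ..., nu_K
    p, q = int((evals > 0).sum()), int((evals < 0).sum())
    return nu, p, q
```

For an indefinite $B$ (e.g. a disassortative two-block model), the signature is $(1, 1)$ and the recovered positions satisfy $\nu_k^\top I_{1,1} \nu_l = B_{kl}$.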

###### Remark.

We note that non-identifiability is an intrinsic property of generalized random dot product graphs. More specifically, if $A \sim \mathrm{GRDPG}_{p,q}(F)$ where $F$ is a distribution on $\mathbb{R}^d$ with signature $(p, q)$, then for any matrix $Q$ such that $Q^\top I_{p,q} Q = I_{p,q}$, we have that $\mathrm{GRDPG}_{p,q}(F)$ is identically distributed to $\mathrm{GRDPG}_{p,q}(F \circ Q)$, where $F \circ Q$ denotes the distribution of $Q X$ for $X \sim F$. A matrix $Q$ satisfying $Q^\top I_{p,q} Q = I_{p,q}$ is said to be an indefinite orthogonal matrix. For the special case of random dot product graphs where $q = 0$, the condition on $Q$ reduces to that of an orthogonal matrix.
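For intuition, here is a small numerical sketch (assuming numpy) with signature $(1, 1)$: a hyperbolic rotation $Q$ satisfies $Q^\top I_{1,1} Q = I_{1,1}$, so transforming the latent positions by $Q$ changes them while leaving every edge probability unchanged. The positions `X` and the parameter `t` below are arbitrary illustrations.

```python
import numpy as np

def hyperbolic_rotation(t):
    """An indefinite orthogonal matrix for signature (1, 1):
    Q^T I_{1,1} Q = I_{1,1} since cosh(t)^2 - sinh(t)^2 = 1."""
    return np.array([[np.cosh(t), np.sinh(t)],
                     [np.sinh(t), np.cosh(t)]])

Ipq = np.diag([1.0, -1.0])
Q = hyperbolic_rotation(0.3)
X = np.array([[0.9, 0.2], [0.8, 0.1], [0.7, 0.3]])   # example latent positions
P_before = X @ Ipq @ X.T                              # edge probabilities from X
P_after = (X @ Q.T) @ Ipq @ (X @ Q.T).T               # edge probabilities from Q X_i
```

The two probability matrices agree exactly even though the latent positions differ, which is precisely the non-identifiability described above.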

With the above notations in place, we now state our generalization of furedi1981eigenvalues for the generalized random dot product graph setting.

###### Theorem 2.

Let $A \sim \mathrm{GRDPG}_{p,q}(F)$ be a $d$-dimensional generalized random dot product graph with signature $(p, q)$. Let $\Delta = \mathbb{E}[X_1 X_1^\top]$ where $X_1 \sim F$ and suppose that $\Delta I_{p,q}$ has $d$ simple eigenvalues. Let $P = X I_{p,q} X^\top$ and for $i \in \{1, \dots, d\}$, let $\hat\lambda_i$ and $\lambda_i$ be the $i$-th largest eigenvalues of $A$ and $P$ (in modulus), respectively. Let $\lambda_i(\Delta I_{p,q})$ and $\xi_i$ denote the $i$-th largest eigenvalue and associated (unit-norm) eigenvector pair for the matrix $\Delta^{1/2} I_{p,q} \Delta^{1/2}$. Let $\mu = \mathbb{E}[X_1]$, let $X \sim F$, and denote by $\eta$ the vector whose elements are

$$\eta_i = \frac{1}{\lambda_i(\Delta I_{p,q})} \mathbb{E}\Bigl[\xi_i^\top \Delta^{-1/2} X X^\top \Delta^{-1/2} \xi_i \bigl(X^\top I_{p,q}\mu - X^\top I_{p,q} \Delta I_{p,q} X\bigr)\Bigr] \tag{2.3}$$

Also let $\Gamma$ be the $d \times d$ matrix whose elements are

 (2.4)

We then have

$$(\hat\lambda_1 - \lambda_1, \hat\lambda_2 - \lambda_2, \dots, \hat\lambda_d - \lambda_d) \xrightarrow{d} \mathrm{MVN}(\eta, \Gamma)$$

as $n \to \infty$.

When $A$ is a $d$-dimensional random dot product graph, Theorem 2 simplifies to the following result.

###### Corollary 1.

Let $A \sim \mathrm{RDPG}(F)$ be a $d$-dimensional random dot product graph and suppose that $\Delta = \mathbb{E}[X_1 X_1^\top]$ has $d$ simple eigenvalues. Let $\mu = \mathbb{E}[X_1]$ and let $\lambda_i(\Delta)$ and $\xi_i$ denote the $i$-th largest eigenvalue and associated (unit-norm) eigenvector of $\Delta$. Let $X \sim F$ and denote by $\eta$ the vector with elements

$$\eta_i = \frac{\mathbb{E}\bigl[\xi_i^\top X X^\top \xi_i \,(X^\top \mu - X^\top \Delta X)\bigr]}{\lambda_i(\Delta)^2}. \tag{2.5}$$

and by $\Gamma$ the $d \times d$ matrix whose elements are

$$\Gamma_{ij} = \frac{2}{\lambda_i(\Delta)\lambda_j(\Delta)}\,\mathbb{E}\bigl[\xi_i^\top X X^\top \xi_j X\bigr]^\top \mathbb{E}\bigl[\xi_i^\top X X^\top \xi_j X\bigr] - \frac{2}{\lambda_i(\Delta)\lambda_j(\Delta)}\operatorname{tr}\Bigl(\mathbb{E}\bigl[\xi_i^\top X X^\top \xi_j X X^\top\bigr]\,\mathbb{E}\bigl[\xi_i^\top X X^\top \xi_j X X^\top\bigr]\Bigr) \tag{2.6}$$

We then have

$$(\hat\lambda_1 - \lambda_1, \hat\lambda_2 - \lambda_2, \dots, \hat\lambda_d - \lambda_d) \xrightarrow{d} \mathrm{MVN}(\eta, \Gamma)$$

as $n \to \infty$.

To illustrate Corollary 1, let $A$ be an Erdős-Rényi graph with edge probability $p$; then $F$ is the Dirac delta measure at $\sqrt{p}$ and hence $\Delta = p$, $\eta = 1 - p$, and $\Gamma = 2p(1-p)$. We thus recover the earlier result of Füredi and Komlós that

$$\hat\lambda_1 - np \xrightarrow{d} \mathcal{N}\bigl(1 - p,\; 2p(1-p)\bigr).$$

When the eigenvalues of $\Delta I_{p,q}$ are not all simple, Theorem 2 can be adapted to yield the following result.

###### Theorem 3.

Let $A \sim \mathrm{GRDPG}_{p,q}(F)$ be a $d$-dimensional generalized random dot product graph on $n$ vertices with signature $(p, q)$. Let $P = X I_{p,q} X^\top$ and for $i \in \{1, \dots, d\}$, let $\hat\lambda_i$ and $\lambda_i$ denote the $i$-th largest eigenvalues of $A$ and $P$ (in modulus), respectively. Also let $v_i$ be the unit-norm eigenvector satisfying $(X^\top X)^{1/2} I_{p,q} (X^\top X)^{1/2} v_i = \lambda_i v_i$ for $i \in \{1, \dots, d\}$. Denote by $\tilde\eta$ the vector with elements

$$\tilde\eta_i = \frac{n}{\lambda_i}\Biggl(\frac{1}{n}\sum_{s=1}^{n} v_i^\top \Bigl(\frac{X^\top X}{n}\Bigr)^{-1/2} X_s X_s^\top \Bigl(\frac{X^\top X}{n}\Bigr)^{-1/2} v_i \; X_s^\top I_{p,q}\Bigl(\frac{1}{n}\sum_{t=1}^{n}\bigl(X_t - X_t X_t^\top I_{p,q} X_s\bigr)\Bigr)\Biggr) \tag{2.7}$$

and by $\sigma^2 = (\sigma_1^2, \dots, \sigma_d^2)$ the vector whose elements are

$$\sigma_i^2 = 2\Bigl(\sum_{k}\bigl(X_k^\top (X^\top X)^{-1/2} v_i\bigr)^2 X_k^\top\Bigr) I_{p,q} \Bigl(\sum_{l}\bigl(X_l^\top (X^\top X)^{-1/2} v_i\bigr)^2 X_l\Bigr) - 2\operatorname{tr}\Bigl(\Bigl(\sum_{k}\bigl(X_k^\top (X^\top X)^{-1/2} v_i\bigr)^2 X_k X_k^\top\Bigr) I_{p,q} \Bigl(\sum_{l}\bigl(X_l^\top (X^\top X)^{-1/2} v_i\bigr)^2 X_l X_l^\top\Bigr) I_{p,q}\Bigr) \tag{2.8}$$

We then have, for $i \in \{1, \dots, d\}$,

$$\frac{1}{\sigma_i}\bigl(\hat\lambda_i - \lambda_i - \tilde\eta_i\bigr) \xrightarrow{d} \mathcal{N}(0, 1)$$

as $n \to \infty$.

The main differences between Theorem 3 and Theorem 2 are that (1) we do not claim that the quantities $\tilde\eta_i$ and $\sigma_i^2$ in Theorem 3 (which, for each $n$, are functions of the underlying latent positions $X_1, \dots, X_n$) converge as $n \to \infty$, and (2) we do not claim that the collection $\{\sigma_i^{-1}(\hat\lambda_i - \lambda_i - \tilde\eta_i)\}$ in Theorem 3 converges jointly to a multivariate normal. These differences stem mainly from the fact that when the eigenvalues of $\Delta I_{p,q}$ are not simple, $\hat\lambda_i/\lambda_i \to 1$ as $n \to \infty$ but $v_i$ does not necessarily converge to $\xi_i$, the corresponding eigenvector of $\Delta^{1/2} I_{p,q} \Delta^{1/2}$, as $n \to \infty$.

## 3 Proof of Theorem 2 and Theorem 3

Let $u_1, \dots, u_d$ be the eigenvectors of $P$ corresponding to its non-zero eigenvalues $\lambda_1, \dots, \lambda_d$. Similarly, let $\hat u_1, \dots, \hat u_d$ be the eigenvectors of $A$ corresponding to the eigenvalues $\hat\lambda_1, \dots, \hat\lambda_d$.

A sketch of the proof of Theorem 2 and Theorem 3 is as follows. First we derive the following approximation of $\hat\lambda_i - \lambda_i$ by a sum of two quadratic forms in $A - P$, namely

$$\hat\lambda_i - \lambda_i = \frac{\lambda_i}{\hat\lambda_i} u_i^\top (A - P) u_i + \frac{\lambda_i}{\hat\lambda_i^2} u_i^\top (A - P)^2 u_i + O_{\mathbb{P}}(n^{-1/2}). \tag{3.1}$$

Now, the term $u_i^\top (A - P)^2 u_i$ is a function of the independent random variables $\{a_{ij}\}_{i \le j}$ and hence is concentrated around its expectation, i.e.,

$$\frac{\lambda_i}{\hat\lambda_i^2} u_i^\top (A - P)^2 u_i = \mathbb{E}\bigl[\lambda_i^{-1} u_i^\top (A - P)^2 u_i\bigr] + O_{\mathbb{P}}(n^{-1/2}) \tag{3.2}$$

where the expectation is taken with respect to $A$ conditional on $X$. Letting $\tilde\eta_i = \mathbb{E}[\lambda_i^{-1} u_i^\top (A - P)^2 u_i]$, we obtain, after some straightforward algebraic manipulations, the expression for $\tilde\eta_i$ in Eq. (2.7). When the eigenvalues of $\Delta I_{p,q}$ are distinct, we derive the limit $\tilde\eta_i \to \eta_i$, where $\eta_i$ is defined in Eq. (2.3). Next, with $u_{ir}$ being the $r$-th element of $u_i$,

$$u_i^\top (A - P) u_i = \sum_{r < s} 2 u_{ir} u_{is} (a_{rs} - p_{rs}) + \sum_{r} u_{ir}^2 (a_{rr} - p_{rr})$$

is, conditional on $X$, a sum of independent mean zero random variables, and the Lindeberg-Feller central limit theorem yields

$$\frac{\lambda_i}{\hat\lambda_i \sigma_i} u_i^\top (A - P) u_i \xrightarrow{d} \mathcal{N}(0, 1) \tag{3.3}$$

as $n \to \infty$, where $\sigma_i$ is as defined in Eq. (2.8). Thus $\sigma_i^{-1}(\hat\lambda_i - \lambda_i - \tilde\eta_i) \xrightarrow{d} \mathcal{N}(0, 1)$ for each $i$, as $n \to \infty$. When the eigenvalues of $\Delta I_{p,q}$ are distinct, then $\sigma_i^2$ converges to $\Gamma_{ii}$ as defined in Eq. (2.4). The joint distribution of $(\hat\lambda_1 - \lambda_1, \dots, \hat\lambda_d - \lambda_d)$ in Theorem 2 then follows from the Cramér-Wold device.

We now provide detailed derivations of Eq. (3.1) through Eq. (3.3).

Proof of Eq. (3.1). For a given $i$, we have

$$(\hat\lambda_i I - (A - P))\hat u_i = A\hat u_i - (A - P)\hat u_i = P\hat u_i = \Bigl(\sum_{j=1}^{d} \lambda_j u_j u_j^\top\Bigr)\hat u_i$$

Now suppose that $\hat\lambda_i I - (A - P)$ is invertible; this holds with high probability for $n$ sufficiently large. Then multiplying both sides of the above display on the left by $u_i^\top (\hat\lambda_i I - (A - P))^{-1}$ and using the von Neumann series identity $(I - M)^{-1} = \sum_{k \ge 0} M^k$ for $\|M\| < 1$, we have

$$\begin{aligned} u_i^\top \hat u_i &= \sum_{j=1}^{d} \lambda_j u_i^\top (\hat\lambda_i I - (A - P))^{-1} u_j u_j^\top \hat u_i = \sum_{j=1}^{d} \frac{\lambda_j}{\hat\lambda_i} u_i^\top \bigl(I - \hat\lambda_i^{-1}(A - P)\bigr)^{-1} u_j u_j^\top \hat u_i \\ &= \sum_{j=1}^{d} \frac{\lambda_j}{\hat\lambda_i} u_i^\top \Bigl(I + \sum_{k=1}^{\infty} \hat\lambda_i^{-k} (A - P)^k\Bigr) u_j u_j^\top \hat u_i \\ &= \frac{\lambda_i}{\hat\lambda_i} u_i^\top \Bigl(I + \sum_{k=1}^{\infty} \hat\lambda_i^{-k} (A - P)^k\Bigr) u_i u_i^\top \hat u_i + \sum_{j \ne i} \frac{\lambda_j}{\hat\lambda_i} u_i^\top \Bigl(\sum_{k=1}^{\infty} \hat\lambda_i^{-k} (A - P)^k\Bigr) u_j u_j^\top \hat u_i \end{aligned} \tag{3.4}$$

We first assume that all of the eigenvalues of $\Delta I_{p,q}$ are simple. The eigenvalues of $P$ are then well-separated, i.e., $\min_{j \ne i} |\lambda_i - \lambda_j| = \Omega(n)$ with high probability. The Davis-Kahan theorem (davis70; samworth) therefore implies, for some constant $C > 0$,

$$1 - u_i^\top \hat u_i = \frac{1}{2}\|u_i - \hat u_i\|^2 \le \frac{C^2 \|A - P\|^2}{\min\{|\lambda_i - \lambda_{i+1}|^2,\; |\lambda_{i-1} - \lambda_i|^2\}} = O_{\mathbb{P}}(n^{-1}) \tag{3.5}$$
$$|u_j^\top \hat u_i| \le \frac{C\|A - P\|}{\min\{|\lambda_i - \lambda_{i+1}|,\; |\lambda_{i-1} - \lambda_i|\}} = O_{\mathbb{P}}(n^{-1/2}). \tag{3.6}$$

We can thus divide both sides of Eq. (3.4) by $u_i^\top \hat u_i$ to obtain

$$1 = \frac{\lambda_i}{\hat\lambda_i} + \frac{\lambda_i}{\hat\lambda_i} u_i^\top \Bigl(\sum_{k=1}^{\infty} \hat\lambda_i^{-k}(A - P)^k\Bigr) u_i + \sum_{j \ne i} \frac{\lambda_j}{\hat\lambda_i} \frac{u_i^\top \bigl(\sum_{k=1}^{\infty} \hat\lambda_i^{-k}(A - P)^k\bigr) u_j u_j^\top \hat u_i}{u_i^\top \hat u_i}.$$

Equivalently,

$$\hat\lambda_i - \lambda_i = \lambda_i u_i^\top \Bigl(\sum_{k=1}^{\infty} \hat\lambda_i^{-k}(A - P)^k\Bigr) u_i + \sum_{j \ne i} \frac{\lambda_j}{\hat\lambda_i} \frac{u_i^\top (A - P) u_j u_j^\top \hat u_i}{u_i^\top \hat u_i} + \sum_{j \ne i} \lambda_j \frac{u_i^\top \bigl(\sum_{k=2}^{\infty} \hat\lambda_i^{-k}(A - P)^k\bigr) u_j u_j^\top \hat u_i}{u_i^\top \hat u_i}. \tag{3.7}$$

Now $u_i^\top \hat u_i = 1 - O_{\mathbb{P}}(n^{-1})$ and $|u_j^\top \hat u_i| = O_{\mathbb{P}}(n^{-1/2})$ by Eq. (3.5) and Eq. (3.6), and by Hoeffding's inequality, $u_i^\top (A - P) u_j = O_{\mathbb{P}}(1)$. Since $\lambda_j/\hat\lambda_i = O_{\mathbb{P}}(1)$, we have

$$\sum_{j \ne i} \frac{\lambda_j}{\hat\lambda_i} \frac{u_i^\top (A - P) u_j u_j^\top \hat u_i}{u_i^\top \hat u_i} = O_{\mathbb{P}}(n^{-1/2}).$$

Next we note that $\bigl\|\sum_{k=2}^{\infty} \hat\lambda_i^{-k}(A - P)^k\bigr\|$ can be bounded as

$$\Bigl\|\sum_{k=2}^{\infty} \hat\lambda_i^{-k}(A - P)^k\Bigr\| \le \frac{\|\hat\lambda_i^{-2}(A - P)^2\|}{1 - \|\hat\lambda_i^{-1}(A - P)\|} = O_{\mathbb{P}}(\hat\lambda_i^{-1}). \tag{3.8}$$

We thus have

$$\sum_{j \ne i} \lambda_j \frac{u_i^\top \bigl(\sum_{k=2}^{\infty} \hat\lambda_i^{-k}(A - P)^k\bigr) u_j u_j^\top \hat u_i}{u_i^\top \hat u_i} = O_{\mathbb{P}}(n^{-1/2})$$

The above bounds then imply

$$\hat\lambda_i - \lambda_i = \lambda_i u_i^\top \Bigl(\sum_{k=1}^{\infty} \hat\lambda_i^{-k}(A - P)^k\Bigr) u_i + O_{\mathbb{P}}(n^{-1/2}). \tag{3.9}$$

Similarly to the derivation of Eq. (3.8), we can also show that

$$\Bigl\|\sum_{k=3}^{\infty} \hat\lambda_i^{-k}(A - P)^k\Bigr\| \le C\|\hat\lambda_i^{-3}(A - P)^3\| \le C\hat\lambda_i^{-3/2} \tag{3.10}$$

with high probability and thus Eq. (3.9) and Eq. (3.10) imply

$$\hat\lambda_i - \lambda_i = \frac{\lambda_i}{\hat\lambda_i} u_i^\top (A - P) u_i + \frac{\lambda_i}{\hat\lambda_i^2} u_i^\top (A - P)^2 u_i + O_{\mathbb{P}}(n^{-1/2}), \tag{3.11}$$

and Eq. (3.1) is established.

We now consider the case where the $i$-th eigenvalue of $\Delta I_{p,q}$ has multiplicity $r_i \ge 2$. Let $S_i$ be the indices of the $r_i$ eigenvalues of $P$ that are closest to $n\lambda_i(\Delta I_{p,q})$, i.e.,

$$\max_{j \in S_i} |\lambda_j - n\lambda_i(\Delta I_{p,q})| \le \min_{k \notin S_i} |\lambda_k - n\lambda_i(\Delta I_{p,q})|; \qquad |S_i| = r_i.$$

Denote by $U_{S_i}$ the matrix whose columns are the eigenvectors of $P$ corresponding to the $\{\lambda_j\}_{j \in S_i}$. We note that $\hat\lambda_j \ne \lambda_i$ for all $j \in S_i$ with high probability for $n$ sufficiently large. Furthermore, $|\lambda_j - \lambda_k| = \Omega(n)$ for all $j \in S_i$ and $k \notin S_i$. Therefore, by the Davis-Kahan theorem, $\|(I - U_{S_i} U_{S_i}^\top)\hat u_j\| = O_{\mathbb{P}}(n^{-1/2})$ for all $j \in S_i$. We now consider $u_i^\top \hat u_j$ for $j \in S_i$. We note that

$$u_i^\top \hat u_j = \frac{u_i^\top A \hat u_j - u_i^\top P \hat u_j}{\hat\lambda_j - \lambda_i} = \frac{u_i^\top (A - P) U_{S_i} U_{S_i}^\top \hat u_j}{\hat\lambda_j - \lambda_i} + \frac{u_i^\top (A - P)(I - U_{S_i} U_{S_i}^\top)\hat u_j}{\hat\lambda_j - \lambda_i} = \frac{n^{-1/2} u_i^\top (A - P) U_{S_i} U_{S_i}^\top \hat u_j}{n^{-1/2}(\hat\lambda_j - \lambda_j) + n^{-1/2}(\lambda_j - \lambda_i)} + \frac{n^{-1/2} u_i^\top (A - P)(I - U_{S_i} U_{S_i}^\top)\hat u_j}{n^{-1/2}(\hat\lambda_j - \lambda_j) + n^{-1/2}(\lambda_j - \lambda_i)}. \tag{3.12}$$

By Hoeffding's inequality, $u_i^\top (A - P) U_{S_i} = O_{\mathbb{P}}(1)$ with high probability. Since $\|(I - U_{S_i} U_{S_i}^\top)\hat u_j\| = O_{\mathbb{P}}(n^{-1/2})$ by the Davis-Kahan theorem, we have $u_i^\top (A - P)(I - U_{S_i} U_{S_i}^\top)\hat u_j = O_{\mathbb{P}}(1)$. We then bound $\hat\lambda_j - \lambda_j$ using the following result of (cape_16_conc, Theorem 3.7) (see also orourke13:_random).

###### Theorem 4.

Let $P$ be an $n \times n$ symmetric matrix of rank $d$ and let $E$ be a symmetric random matrix whose entries $\{e_{ij}\}_{i \le j}$ are independent with $\mathbb{E}[e_{ij}] = 0$. Denote the $d$ largest singular values of $P$ by $\sigma_1 \ge \cdots \ge \sigma_d$, and denote the $d$ largest singular values of $P + E$ by $\hat\sigma_1 \ge \cdots \ge \hat\sigma_d$. Suppose that $|e_{ij}| \le B$ and $\sigma_d \ge c_0 n^{1/2}\log n$ for some absolute constants $B, c_0 > 0$. Then for each $k \in \{1, \dots, d\}$, there exists some positive constant $c_{k,d}$ such that as $n \to \infty$, with probability at least $1 - o(1)$, we have

$$|\hat\sigma_k - \sigma_k| \le c_{k,d} \log n.$$

We thus have

$$|u_i^\top \hat u_j| = \frac{n^{-1/2}\, O_{\mathbb{P}}(1)}{n^{-1/2}\, O_{\mathbb{P}}(\log n) + n^{-1/2}(\lambda_j - \lambda_i)}.$$

We now analyze $\tilde\eta_i$. We can view $P = X I_{p,q} X^\top$ as a kernel matrix with symmetric kernel $h(x, y) = x^\top I_{p,q} y$ where $x, y \in \mathcal{X}$. As $h$ is finite-rank, let $\{(\lambda_i(I_{p,q}\Delta), \phi_i)\}$ denote the eigenvalues and associated eigenfunctions of the integral operator $K_h$, i.e.,

$$K_h \phi_i(x) \coloneqq \int h(x, y)\, \phi_i(y)\, dF(y) = \lambda_i(I_{p,q}\Delta)\, \phi_i(x).$$

Then, following koltchinskii00:_random, let denote the random symmetric matrix whose half-vectorization