
# An Impossibility Result for Reconstruction in a Degree-Corrected Planted-Partition Model

Gulikers, L., Lelarge, M., Massoulié, L.
###### Abstract

We consider a Degree-Corrected Planted-Partition model: a random graph on $n$ nodes with two equal-sized clusters. The model parameters are two constants $a > b > 0$ and an i.i.d. sequence of weights $(\phi_u)_{u=1}^n$ with finite second moment $\Phi^{(2)}$. Vertices $u$ and $v$ are joined by an edge with probability $\frac{\phi_u\phi_v}{n}a$ when they are in the same class and with probability $\frac{\phi_u\phi_v}{n}b$ otherwise.

We prove that it is information-theoretically impossible to estimate the spins in a way positively correlated with the true community structure when $(a-b)^2\Phi^{(2)} \le 2(a+b)$.

A by-product of our proof is a precise coupling result for local neighbourhoods in Degree-Corrected Planted-Partition models, which could be of independent interest.

## 1 Introduction

It is well-known that many networks exhibit a community structure. Think about groups of friends, web pages discussing related topics, or people speaking the same language (for instance, the Belgian population could be roughly divided into people speaking either Flemish or French). Finding those communities helps us understand and exploit general networks.

Instead of looking directly at real networks, we experiment first with models for networks with communities. One of the most elementary models is the Planted-Partition Model (PPM) [14]: a random graph on $n$ vertices partitioned into two equal-sized clusters, such that vertices within the same cluster are connected with probability $p$ and vertices in different clusters with probability $q < p$. Note that the PPM is a special case of the Stochastic Block Model (SBM). The question is now: given an instance of the PPM, can we retrieve the community membership of its vertices?

Most real networks are sparse, and a thorough analysis of the sparse regime of the PPM, i.e., $p = a/n$ and $q = b/n$ for some constants $a > b > 0$, will therefore lead to a better understanding of networks.

When the difference between $p$ and $q$ is small, the graph might not even contain enough information to distinguish between the two clusters. In [9] it was first conjectured that a detectability phase transition exists in the PPM: detection would be possible if and only if $(a-b)^2 > 2(a+b)$. The negative side of this conjecture has been confirmed in [25]. The positive side has recently been confirmed in [20] and [24] using sophisticated (but still polynomial-time) algorithms designed for this particular problem.

In this paper we study an extension of the PPM: the Degree-Corrected Planted-Partition Model (DC-PPM), a special case of the Degree-Corrected Stochastic Block Model (DC-SBM) of [16]. We do so because, although the PPM is a useful model due to its analytical tractability, it fails to accurately describe networks with a wide variety in their degree sequences (nodes in the same cluster are stochastically indistinguishable). Indeed, real degree distributions often, but not always, follow a power law [1]. Compare this to fitting a straight line to intrinsically curved data, which is doomed to miss important information.

The DC-PPM is defined as follows. It is a random graph on $n$ vertices, partitioned into two asymptotically equal-sized clusters by giving each vertex $u$ a spin $\sigma_u$ drawn uniformly from $\{+,-\}$. The vertices have i.i.d. weights $(\phi_u)_{u=1}^n$ governed by some law $\nu$ with support in $[\phi_{\min}, \phi_{\max}]$, where $0 < \phi_{\min} \le \phi_{\max} < \infty$ are constants. We denote the first and second moments of the weights by $\Phi^{(1)}$ and $\Phi^{(2)}$. An edge is drawn between nodes $u$ and $v$ with probability $\frac{\phi_u\phi_v}{n}a$ when $u$ and $v$ have the same spin and with probability $\frac{\phi_u\phi_v}{n}b$ otherwise. The model parameters $a > b > 0$ are constant.
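As a concrete illustration, the sampling procedure just described fits in a few lines of Python (a sketch; the helper `sample_dcppm` and the `weight_law` callable are our own names, not the paper's):

```python
import random

def sample_dcppm(n, a, b, weight_law, seed=None):
    """Sample one DC-PPM instance: spins, weights and an edge list.

    `weight_law` is any callable rng -> weight in [phi_min, phi_max];
    it plays the role of the law nu in the text.
    """
    rng = random.Random(seed)
    spins = [rng.choice([+1, -1]) for _ in range(n)]
    weights = [weight_law(rng) for _ in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # Same-spin pairs connect at rate a, opposite-spin pairs at rate b.
            rate = a if spins[u] == spins[v] else b
            if rng.random() < min(1.0, weights[u] * weights[v] * rate / n):
                edges.append((u, v))
    return spins, weights, edges

# A sparse instance with weights uniform on [1, 2]:
spins, weights, edges = sample_dcppm(
    300, a=5.0, b=1.0, weight_law=lambda r: r.uniform(1.0, 2.0), seed=0)
```

Note that the expected degree of a vertex stays bounded as $n$ grows, so the instances produced this way are sparse, matching the regime studied in the paper.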

In the present paper we extend the results of [25] to the degree-corrected setting. More specifically, we prove that when $(a-b)^2\Phi^{(2)} \le 2(a+b)$, it is information-theoretically impossible to estimate the spins in a way positively correlated with the true community structure.

In a follow-up paper [13], we show that above the threshold (i.e., when $(a-b)^2\Phi^{(2)} > 2(a+b)$), reconstruction is possible based on the second eigenvector of the so-called non-backtracking matrix. This is an extension of the results in [3] for the ordinary Stochastic Block Model.

We remark that there is an interpretation of the threshold in terms of eigenvalues of the conditional expectation of the adjacency matrix. Indeed, if $A$ denotes the adjacency matrix and $\psi_1$ and $\psi_2$ are the vectors defined for $u \in \{1,\dots,n\}$ by $(\psi_1)_u = \phi_u$ and $(\psi_2)_u = \sigma_u\phi_u$, then

$$\mathbb{E}[A \mid \phi_1,\dots,\phi_n] = \frac{a+b}{2n}\,\psi_1\psi_1^* + \frac{a-b}{2n}\,\psi_2\psi_2^* - \frac{a}{n}\,\mathrm{diag}\{\phi_u^2\}.$$

So that, in probability,

$$\mathbb{E}[A \mid \phi_1,\dots,\phi_n]\,\psi_1 = \left(\frac{a+b}{2}\,\frac{1}{n}\sum_{u=1}^{n}\phi_u^2\right)\psi_1 + \left(\frac{a-b}{2}\,\frac{1}{n}\sum_{u=1}^{n}\sigma_u\phi_u^2\right)\psi_2 + O(1) \to \frac{a+b}{2}\,\Phi^{(2)}\,\psi_1,$$

and

$$\mathbb{E}[A \mid \phi_1,\dots,\phi_n]\,\psi_2 = \left(\frac{a-b}{2}\,\frac{1}{n}\sum_{u=1}^{n}\phi_u^2\right)\psi_2 + \left(\frac{a+b}{2}\,\frac{1}{n}\sum_{u=1}^{n}\sigma_u\phi_u^2\right)\psi_1 + O(1) \to \frac{a-b}{2}\,\Phi^{(2)}\,\psi_2,$$

by the law of large numbers (note that $\frac{1}{n}\sum_{u}\sigma_u\phi_u^2 \to 0$, since the spins are centred and independent of the weights). Now, the condition $(a-b)^2\Phi^{(2)} \le 2(a+b)$ is equivalent to $\lambda_2^2 \le \lambda_1$, where $\lambda_1 = \frac{a+b}{2}\Phi^{(2)}$ and $\lambda_2 = \frac{a-b}{2}\Phi^{(2)}$ are the two limiting eigenvalues above.
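The limiting eigenvalues above are easy to check numerically. The following pure-Python sketch (helper names ours) builds $\mathbb{E}[A\mid\phi]$ for the unweighted, balanced case $\phi \equiv 1$ and recovers the top eigenvalue by power iteration; for this case the diagonal-removal term shifts it from $\frac{a+b}{2}$ by exactly $a/n$:

```python
def expected_adjacency(spins, weights, a, b):
    # E[A | phi]: entry (u,v) = (phi_u phi_v / n)((a+b)/2 + (a-b)/2 s_u s_v)
    # off the diagonal, 0 on the diagonal (no self-loops).
    n = len(spins)
    return [[0.0 if u == v else
             weights[u] * weights[v] / n
             * ((a + b) / 2 + (a - b) / 2 * spins[u] * spins[v])
             for v in range(n)] for u in range(n)]

def top_eigenvalue(M, iters=300):
    # Plain power iteration on a symmetric matrix, sup-norm normalisation;
    # returns the eigenvalue of largest magnitude.
    n = len(M)
    v = [1.0 + 0.1 * ((u % 3) - 1) for u in range(n)]
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[u][j] * v[j] for j in range(n)) for u in range(n)]
        lam = max(abs(x) for x in w)
        v = [x / lam for x in w]
    return lam

# phi ≡ 1 with balanced spins reduces to the ordinary PPM; removing the
# diagonal shifts the top eigenvalue from (a+b)/2 to (a+b)/2 - a/n.
n, a, b = 60, 5.0, 1.0
M = expected_adjacency([(-1) ** u for u in range(n)], [1.0] * n, a, b)
lam = top_eigenvalue(M)  # close to (a+b)/2 = 3 for large n
```

Here the all-ones vector is an exact eigenvector, so the power iteration converges immediately to $\frac{a+b}{2} - \frac{a}{n}$.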

### 1.1 Our results

It is a well-known fact that sparse graphs contain isolated vertices, whose type thus cannot be recovered by any algorithm. Therefore, the best we can ask for is a reconstruction that is positively correlated with the true partition:

###### Definition 1.1.

Let $G$ be an observation of the degree-corrected planted partition-model, with true communities $\sigma = (\sigma_u)_{u=1}^n$. Further, let $\hat\sigma = (\hat\sigma_u)_{u=1}^n$ be a reconstruction of the communities, based on the observation $G$, such that $\hat\sigma_u \in \{+,-\}$ for every $u$. Then, we say that $\hat\sigma$ is positively correlated with the true partition if there exists $\delta > 0$ such that

$$\frac{1}{n}\sum_{u=1}^{n}\mathbb{1}\{\sigma_u = \hat\sigma_u\} \ge \frac{1}{2} + \delta,$$

with high probability.
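In code, the success criterion of Definition 1.1 reads as follows (a sketch with our own helper name; we also take the maximum over the global sign flip, since the two communities can only ever be identified up to relabelling):

```python
def overlap(sigma, sigma_hat):
    """Fraction of correctly labelled vertices; the max over the global
    sign flip accounts for the two communities being interchangeable."""
    n = len(sigma)
    agree = sum(1 for s, t in zip(sigma, sigma_hat) if s == t) / n
    return max(agree, 1.0 - agree)

# Positive correlation means overlap >= 1/2 + delta with high probability;
# the trivial coin-flip estimator achieves overlap about 1/2.
```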

Our main result is as follows:

###### Theorem 1.2.

Let $G$ be an observation of the degree-corrected planted partition-model with $(a-b)^2\Phi^{(2)} \le 2(a+b)$. Then, no reconstruction based on $G$ is positively correlated with the true communities $\sigma$.

As mentioned above, this theorem is an extension of the result in [25]. Their strategy, which we shall follow, invokes a connection with the tree reconstruction problem (see for instance [23]): deducing the sign of the root based on all the spins at some distance from the root. Indeed, we shall see that the $R$-neighbourhood of a vertex looks like the following random labelled tree, which we denote by $T$:

We begin with a single particle, the root $\rho$, having spin $\tau_\rho \in \{+,-\}$ and weight $\psi_\rho$ (which we often take random). The root is replaced in generation $1$ by $\mathrm{Poi}\!\left(\frac{a}{2}\psi_\rho\Phi^{(1)}\right)$ particles of spin $\tau_\rho$ and $\mathrm{Poi}\!\left(\frac{b}{2}\psi_\rho\Phi^{(1)}\right)$ particles of spin $-\tau_\rho$. Further, the weights of those particles are i.i.d. distributed following law $\nu^*$, the size-biased version of $\nu$, defined for $x \in [\phi_{\min},\phi_{\max}]$ by

$$\nu^*(x) = \frac{1}{\Phi^{(1)}}\int_{\phi_{\min}}^{x} y\,d\nu(y). \tag{1.1}$$

For generations $m \ge 1$, a particle with spin $\tau$ and weight $\psi$ is replaced in the next generation by $\mathrm{Poi}\!\left(\frac{a}{2}\psi\Phi^{(1)}\right)$ particles with the same spin and $\mathrm{Poi}\!\left(\frac{b}{2}\psi\Phi^{(1)}\right)$ particles of the opposite sign. Again, the weights of the particles in generation $m+1$ follow in an i.i.d. fashion the law $\nu^*$. The offspring-size of an individual is thus a Poisson mixture.
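The tree $T$ can be sampled generation by generation. The sketch below (helper names ours) takes a discrete weight law $\nu$ as value/probability lists and uses the same-spin and opposite-spin offspring rates $\frac{a}{2}\psi\Phi^{(1)}$ and $\frac{b}{2}\psi\Phi^{(1)}$, which can be read off from the kernel description in Section 3.1:

```python
import math
import random

def poisson(rng, lam):
    # Knuth's product-of-uniforms sampler; fine for moderate means.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def size_biased(values, probs):
    # nu*: probability of weight y is proportional to y * nu(y).
    w = [y * p for y, p in zip(values, probs)]
    tot = sum(w)
    return values, [x / tot for x in w]

def grow_tree(a, b, values, probs, depth, rng):
    """Sample T level by level as a list of generations; each particle is a
    (spin, weight) pair.  Root: uniform spin, weight ~ nu; non-root weights
    are i.i.d. nu*."""
    phi1 = sum(y * p for y, p in zip(values, probs))  # first moment of nu
    sb_vals, sb_probs = size_biased(values, probs)
    level = [(rng.choice([+1, -1]), rng.choices(values, probs)[0])]
    tree = [level]
    for _ in range(depth):
        nxt = []
        for tau, psi in level:
            for _ in range(poisson(rng, a * psi * phi1 / 2)):
                nxt.append((tau, rng.choices(sb_vals, sb_probs)[0]))
            for _ in range(poisson(rng, b * psi * phi1 / 2)):
                nxt.append((-tau, rng.choices(sb_vals, sb_probs)[0]))
        level = nxt
        tree.append(level)
    return tree
```

With constant weights ($\nu$ a point mass), this reduces to the two-type Poisson tree of the ordinary PPM.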

### 1.2 General proof idea

We first note that reconstruction is senseless when $\frac{a+b}{2}\Phi^{(2)} < 1$, because in this regime there is no giant component. (Indeed, the main result in [2] concerns the existence, size and uniqueness of the giant component; in particular, in the setting considered here, a giant component emerges if and only if $\frac{a+b}{2}\Phi^{(2)} > 1$. We shall henceforth assume a giant component to emerge.) Note further that $(a-b)^2\Phi^{(2)} > 2(a+b)$ already implies $\frac{a+b}{2}\Phi^{(2)} > 1$.

To prove that detection is not possible when $(a-b)^2\Phi^{(2)} \le 2(a+b)$ and $\frac{a+b}{2}\Phi^{(2)} > 1$, we show in Theorem 4.2 that for uniformly chosen vertices $u$ and $v$,

$$\mathbb{P}(\sigma_u = + \mid \sigma_v, G) \xrightarrow{P} \frac{1}{2}, \tag{1.2}$$

as $n \to \infty$. I.e., it is already impossible to decide the sign of two random vertices, which is an easier problem than reconstructing the group-membership of all vertices (made precise in Lemma 4.3). To establish (1.2), we condition on the boundary spins of an $R(n)$-neighbourhood around $u$, where $R(n)$ tends to infinity: this should make reconstruction easier. But, as we shall see, long-range correlations in this model are weak (Lemma 4.1). Hence, we can leave out the conditioning on the spin of $v$, so that we are precisely in the setting of a tree-reconstruction problem, see Section 2. In fact, we shall prove (Theorem 2.4) that reconstruction of the sign of the root in a tree based on the spins at depth $R(n)$ (where $R(n) \to \infty$) is impossible when $(a-b)^2\Phi^{(2)} \le 2(a+b)$.

### 1.3 Outline and differences with ordinary Planted-Partition model

Due to the presence of weights, the offspring in the branching process is governed by a Poisson mixture. Section 2 deals with this type of branching process. The main theorem there (Theorem 2.4) deals with a reconstruction problem on sequences of trees rather than on a single random tree as in the corresponding theorem in [25].

In Section 3 we establish a coupling between the local neighbourhood of a vertex in the graph and the branching process $T$. This result is different from the coupling in [2], because we need the weights in the graph and their counterparts in the branching process to be exactly the same.

Finally, in Section 4 we show that long-range interactions are weak. The proof of Lemma 4.1 is based on an idea in the proof of the corresponding lemma in [25]. Note however that (besides the presence of weights) the statement of our Lemma 4.1 is slightly stronger than its analogue in [25]; see below for details.

### 1.4 Background

Without the degree correction (i.e., $\phi_u \equiv 1$), the authors of [9] were the first to conjecture a phase transition for the ordinary planted partition model, based on ideas from statistical physics: clustering positively correlated with the true spins is possible if $(a-b)^2 > 2(a+b)$ and impossible if $(a-b)^2 < 2(a+b)$. They conjectured further that the so-called belief propagation algorithm would establish the positive part. In [17] the 'spectral redemption' conjecture was made: detection using the second eigenvector of the so-called non-backtracking matrix would also establish the positive part.

The work [6] showed that positively correlated reconstruction is possible in the sparse case, though under a condition that does not extend all the way down to the threshold. The remainder of the positive part was established in [20] by using a matrix counting the number of self-avoiding paths in the graph. The work [24] establishes, independently of [20], the positive part of the conjecture in [9]. Further, the same authors show impossibility in [25]. In fact they show a bit more: for $(a-b)^2 \le 2(a+b)$, reconstructions are never positively correlated. We shall here extend their results to the DC-PPM by relying on similar techniques.

Recovering the planted partition (without degree corrections) often coincides with finding the minimum bisection of the same graph, that is, a partition of the vertices into two equal halves such that the number of edges between the two parts (the bisection width) is minimal. This problem is NP-hard [10].

Graph bisection on random graphs has been studied intensively. For instance, [4] studies a collection of labelled simple random graphs with a given number of nodes, a lower bound on the node degrees, and a prescribed bisection width. For these graphs the minimum bisection is much smaller than the average bisection. The main result is a polynomial-time algorithm, based on the maxflow-mincut theorem, that finds exactly the minimum bisection for almost all such graphs.

Another example is given in [10]. There, the authors consider the uniform distribution on the set of all graphs that have a small cut, containing at most a fixed fraction of the total number of edges. Those authors show that, if the edge probabilities in the planted partition model are fixed (i.e., do not decay with $n$), then the underlying community structure coincides with the minimum bisection and it can be retrieved in polynomial time. This result is improved in [15].

In [21] the case of non-constant edge probabilities is analysed. A spectral algorithm is presented that recovers the communities with high probability, provided the difference between the intra- and inter-cluster edge probabilities exceeds a threshold involving a sufficiently large constant.

Positive results for spectral clustering in the DC-SBM have been obtained by various authors. The work [8] introduces a reconstruction algorithm based on the matrix obtained by dividing each element of the adjacency matrix by the geometric mean of its row and column degrees.

The extended planted-partition model is studied in [7]. In that model, an edge is present between $u$ and $v$ with a probability depending on their weights and on the average weight. The main result is a polynomial-time algorithm that outputs a partitioning differing from the planted clusters on only a small number of nodes. This recovery succeeds only under certain conditions: the minimum weight should be a constant fraction of the average weight, and the degree of each vertex should be sufficiently large.

The article [18] gives an algorithm based on the adjacency matrix of a graph, together with performance guarantees; the average degree is required to be sufficiently large. However, since the leading part of the spectrum of the adjacency matrix is dominated by the vertices of highest degree [5], the algorithm does a poor job when the degree sequence is very irregular.

The authors of the present paper propose in [12] an algorithm that consistently recovers the block-membership of all but a vanishing fraction of nodes. It outperforms algorithms based on the adjacency matrix in the case of heterogeneous degree sequences.

## 2 Broadcasting on the branching process

Here we repeat without changes the definition of a Markov broadcasting process on trees given in [25]. Let $T$ be an infinite tree with root $\rho$. Given a number $\epsilon \in [0,1]$, define a random labelling $\tau \in \{+,-\}^{V(T)}$. First, draw $\tau_\rho$ uniformly in $\{+,-\}$. Then, conditionally independently given $\tau_\rho$, take every child $u$ of $\rho$ and set $\tau_u = \tau_\rho$ with probability $1-\epsilon$ and $\tau_u = -\tau_\rho$ otherwise. Continue this construction recursively to obtain a labelling for which every vertex, independently, has probability $1-\epsilon$ of having the same label as its parent.
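The broadcast construction is a one-pass traversal; a minimal sketch (our helper name) for a tree given as a child-list dictionary:

```python
import random

def broadcast(children, root, eps, rng):
    """Run the Markov broadcast process on a rooted tree.

    `children[u]` lists the children of node u; each node copies its
    parent's label with probability 1 - eps and flips it otherwise.
    """
    tau = {root: rng.choice([+1, -1])}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            flip = rng.random() < eps
            tau[v] = -tau[u] if flip else tau[u]
            stack.append(v)
    return tau
```

At $\epsilon = 0$ all labels agree with the root, and at $\epsilon = 1$ labels alternate deterministically along every root-to-leaf path; the interesting regime is in between.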

Suppose that the labels at depth $m$ in the tree are known (here, $\partial T_m$ denotes the set of vertices at distance $m$ from the root). The paper [11] gives precise conditions as to when reconstruction of the root label is feasible using the optimal reconstruction strategy (maximum likelihood), i.e., deciding according to the sign of $\mathbb{P}(\tau_\rho = + \mid \tau_{\partial T_m}) - \frac12$. Interestingly, this is completely decided by the branching number of $T$.

###### Definition 2.1.

The branching number of a tree $T$, denoted by $\mathrm{br}(T)$, is defined as follows:

• If $T$ is finite, then $\mathrm{br}(T) = 1$;

• If $T$ is infinite, then $\mathrm{br}(T) = \inf\left\{\lambda > 0 : \inf_{C}\sum_{v \in C}\lambda^{-|v|} = 0\right\}$, where the infimum is taken over all cutsets $C$ and $|v|$ denotes the depth of $v$.

The relevant theorem in [11], tailored to our needs, reads:

###### Theorem 2.2.

(Theorem from [11].) Consider the problem of reconstructing $\tau_\rho$ from the spins at the $m$th level of $T$. Define $\Delta_m$ as the difference between the probabilities of correct and incorrect reconstruction given the information at level $m$:

$$\Delta_m := \left|\,\mathbb{P}\left(\tau_\rho = + \mid \tau_{\partial T_m}\right) - \mathbb{P}\left(\tau_\rho = - \mid \tau_{\partial T_m}\right)\right|.$$

If $\mathrm{br}(T)(1-2\epsilon)^2 \le 1$, then $\Delta_m \to 0$ as $m \to \infty$.
If, however, $\mathrm{br}(T)(1-2\epsilon)^2 > 1$, then $\limsup_{m\to\infty}\Delta_m > 0$.

Note that in this theorem the tree is fixed, whereas in this paper the multi-type branching process defined in Section 1.1 is considered. But it is easily seen that the spins on a fixed instance of $T$ are distributed according to the above broadcasting process with error probability $\epsilon = \frac{b}{a+b}$. (Indeed, instances of the tree when ignoring spins are generated according to a Galton-Watson process in which the number of offspring of a particle with weight $\psi$ is an independent copy of $\mathrm{Poi}\big(\frac{a+b}{2}\psi\Phi^{(1)}\big)$, with $\psi$ governed by $\nu^*$. We obtain the spins by giving each particle, independently, the same spin as its parent with probability $\frac{a}{a+b}$ and the opposite sign with probability $\frac{b}{a+b}$. Thus, a particle with weight $\psi$ gives birth to $\mathrm{Poi}\big(\frac{a}{2}\psi\Phi^{(1)}\big)$ particles of the same sign and $\mathrm{Poi}\big(\frac{b}{2}\psi\Phi^{(1)}\big)$ particles of the opposite sign; those numbers are seen to be independent.) We thus need to calculate the branching number of a typical instance of $T$:

###### Proposition 2.3.

Assume that $\frac{a+b}{2}\Phi^{(2)} > 1$. Consider the multi-type branching process $T$, where the root has spin drawn uniformly from $\{+,-\}$ and weight governed by $\nu$. Then, given the event that the branching process does not go extinct, $\mathrm{br}(T) = \frac{a+b}{2}\Phi^{(2)}$ almost surely.

###### Proof.

Denote the multi-type branching process by $T$. Assume w.l.o.g. that the root has children, denoted by $v_1, \dots, v_k$. Denote by $T(v_i)$ the subtree consisting of all particles with common ancestor $v_i$. We observe that $\mathrm{br}(T) \le \lambda$ if and only if $\mathrm{br}(T(v_i)) \le \lambda$ for all $i$.

Now, conditional on the spin of the root, $T(v_1), \dots, T(v_k)$ are i.i.d. copies of $T$ with root weight governed by the biased law $\nu^*$. Ignoring types, the latter is a Galton-Watson process with offspring mean $\frac{a+b}{2}\Phi^{(2)}$. Hence the corresponding proposition in [19] entails that $\mathrm{br}(T(v_i)) = \frac{a+b}{2}\Phi^{(2)}$ a.s. given non-extinction. ∎

Note that it can in fact be easily proved that $\mathrm{br}(T) \le \frac{a+b}{2}\Phi^{(2)}$ holds almost surely.

We conclude with the main theorem of this section. Note that we assume that $\frac{a+b}{2}\Phi^{(2)} > 1$, so that the branching process survives with non-zero probability. Remark further that the theorem is a bit more precise than the corresponding theorem in [25] (which deals with unweighted Poisson trees), in the sense that we need to re-sample the tree for each $n$. Indeed, in the coupling result Theorem 3.1 below we re-sample for each $n$.

###### Theorem 2.4.

Assume that $\frac{a+b}{2}\Phi^{(2)} > 1$. Let $(T^n)_{n \ge 1}$ be a collection of i.i.d. copies of $T$. Denote for each tree $T^n$ its spins by $\tau^n$. Further, let $R: \mathbb{N} \to \mathbb{N}$ be an unbounded non-decreasing function. Assume that $(a-b)^2\Phi^{(2)} \le 2(a+b)$; then

$$\mathbb{P}\left(\tau^n_\rho = + \,\middle|\, T^n_{R(n)}, \tau^n_{\partial T^n_{R(n)}}\right) \xrightarrow{P} \frac{1}{2},$$

as $n \to \infty$.

###### Proof.

We begin by describing the above broadcasting process on random trees more precisely. By the triple $(\Omega', \mathcal{F}', \mathbb{P}_T)$ we denote the underlying probability space of the following stochastic process: Let $T$ be the branching process with root $\rho$ where we ignore all types on it (we denote the collection of its realizations by $\Omega'$, and we let $\mathcal{F}'$ be a sigma-algebra on it). We define a new random labelling $\tau$ on every instance of $T$ by running the Markov broadcast process, i.e., $\tau_\rho$ is uniformly drawn from $\{+,-\}$ and each child has the same spin as its parent with probability $\frac{a}{a+b}$.

Let, for each $n$, $T^n$ be an independent copy of $T$. Formally, the random variable $T^n$ is thus a mapping from the underlying sample space to $\Omega'$: it therefore makes sense to define the pull-back measure $\mathbb{P}_T$ for $\mathcal{T} \in \mathcal{F}'$ by $\mathbb{P}_T(\mathcal{T}) = \mathbb{P}\left((T^n)^{-1}(\mathcal{T})\right)$.

With this notation,

$$\begin{aligned}
\mathbb{E}\left[\left|\mathbb{P}\left(\tau^n_\rho = + \,\middle|\, T^n_{R(n)}, \tau^n_{\partial T^n_{R(n)}}\right) - \tfrac12\right|\right]
&= \int_{\Omega'} \mathbb{E}\left[\left|\mathbb{P}\left(\tau^n_\rho = + \,\middle|\, T^n_{R(n)}, \tau^n_{\partial T^n_{R(n)}}\right) - \tfrac12\right| \,\middle|\, T^n = T\right] d\mathbb{P}_T(T)\\
&= \int_{\Omega'} \mathbb{E}\left[\left|\mathbb{P}\left(\tau_\rho = + \,\middle|\, T = T, \tau_{\partial T_{R(n)}}\right) - \tfrac12\right|\right] d\mathbb{P}_T(T).
\end{aligned}$$

Since $\mathrm{br}(T)\left(\frac{a-b}{a+b}\right)^2 \le 1$ almost surely, $\Delta_{R(n)} \to 0$ almost surely. Consequently,

$$f_n(T) := \mathbb{E}\left[\left|\mathbb{P}\left(\tau_\rho = + \,\middle|\, T = T, \tau_{\partial T_{R(n)}}\right) - \tfrac12\right|\right] \to 0$$

for almost every realization $T$ of the branching process. Because $0 \le f_n \le \frac12$, it is an immediate consequence of Lebesgue's dominated convergence theorem that

$$\mathbb{P}\left(\tau^n_\rho = + \,\middle|\, T^n_{R(n)}, \tau^n_{\partial T^n_{R(n)}}\right) \to \frac12,$$

in $L^1$ as $n \to \infty$.

Finally, it is a well-known fact that convergence in $L^1$ implies convergence in probability. ∎

## 3 Coupling of local neighbourhood

This section has as its objective to establish a coupling between the local neighbourhood of an arbitrary fixed vertex in the DC-PPM and the branching process $T$. The main result is the following theorem, where we let $T_R$, $\tau_{T_R}$ and $\psi_{T_R}$ be random instances of $T$ up to depth $R$, its spins and its weights, respectively.

###### Theorem 3.1.

Let $R(n) \le C\log n$, with $C$ such that $3C\log(2\kappa_{\max}) < 1$. Let $\rho$ be a uniformly picked vertex in $G^n$, where for each $n$, $G^n$ is an instance of the DC-PPM. Let, for each $n$, $T^n$ be an independent copy of $T$; then

$$\mathbb{P}\left(\left(G_{R(n)}(\rho),\, \sigma_{G_{R(n)}},\, \phi_{G_{R(n)}}\right) = \left(T^n_{R(n)},\, \tau^n_{T_{R(n)}},\, \psi^n_{T_{R(n)}}\right)\right) \ge 1 - n^{-\frac{1}{2}\log(4/e)}.$$

We defer its proof to the end of this section. It uses an alternative description of the branching process in Section 1.1.

### 3.1 Alternative description of branching process

We obtain an alternative description of the graph by considering a particle with spin $\sigma$ and weight $\phi$ to be of type $x = \sigma\phi$, taking values in $S := [-\phi_{\max},-\phi_{\min}] \cup [\phi_{\min},\phi_{\max}]$. We denote the law of the type by $\mu$. Two distinct vertices $u$ and $v$ are then joined by an edge with probability $\frac{\kappa(x_u,x_v)}{n}$, where $\kappa$ is defined for $x, y \in S$ by

$$\kappa(x,y) = |xy|\left(\mathbb{1}_{\{xy>0\}}\,a + \mathbb{1}_{\{xy<0\}}\,b\right). \tag{3.1}$$

Analogously, we obtain the following equivalent description of the branching process: We begin with a single particle of type governed by $\mu$, giving birth to $\mathrm{Poi}(\lambda_x(S))$ children, where for $x \in S$ and measurable $A \subseteq S$,

$$\lambda_x(A) = \int_A \kappa(x,y)\,d\mu(y). \tag{3.2}$$

Conditional on their number, the children have i.i.d. types governed by $\mu^*_x$. (Note that if $Y$ has law $\mu^*_x$, then $\mathrm{sign}(Y)$ and $|Y|$ are independent; hence, we can identify $\mathrm{sign}(Y)$ with the particle's spin and $|Y|$ with its independent weight.) Here, for $x \in S$ and measurable $A \subseteq S$,

$$\mu^*_x(A) = \frac{\lambda_x(A)}{\lambda_x(S)} = \int_A \left(\frac{a}{a+b}\mathbb{1}_{\{xy>0\}} + \frac{b}{a+b}\mathbb{1}_{\{xy<0\}}\right)\frac{|y|\,d\nu(|y|)}{\Phi^{(1)}}. \tag{3.3}$$

For generations $m \ge 1$, all particles give birth independently in the following way: a particle of type $x$ is replaced in the next generation by $\mathrm{Poi}(\lambda_x(S))$ children, again with i.i.d. types governed by $\mu^*_x$.
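For a discrete type law $\mu$, the kernel $\kappa$, the mean offspring number $\lambda_x(S)$ and sampling from $\mu^*_x$ are direct to code (a sketch; helper names ours):

```python
import random

def kappa(x, y, a, b):
    # Connection kernel (3.1): rate a for same-sign types, b for opposite,
    # scaled by the product of the weights |x||y|.
    return abs(x * y) * (a if x * y > 0 else b)

def lambda_x_total(x, a, b, type_values, type_probs):
    # lambda_x(S): the mean number of children of a type-x particle,
    # here for a discrete type law mu given as values/probabilities.
    return sum(kappa(x, y, a, b) * p for y, p in zip(type_values, type_probs))

def sample_child_type(rng, x, a, b, type_values, type_probs):
    # mu*_x of (3.3): child types are biased by the kernel kappa(x, .).
    w = [kappa(x, y, a, b) * p for y, p in zip(type_values, type_probs)]
    return rng.choices(type_values, w)[0]
```

For the symmetric two-point law $\mu$ on $\{+1,-1\}$ one gets $\lambda_x(S) = \frac{a+b}{2}$, matching the mean degree of the unweighted PPM.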

In [2] it is shown that local neighbourhoods of the graph are described by the above branching process, if we ignore the types. (To be precise: the equivalent description used in [2] is that a particle of type $x$ gives birth to $\mathrm{Poi}(\lambda_x(A))$ children with type in $A$, for any measurable $A \subseteq S$; those numbers are independent for disjoint sets and for different particles.)

The coupling technique in [2] uses a discretization of the type space as an intermediate step, thereby losing some information: types in the tree deviate slightly from their counterparts in the graph. We shall therefore use another coupling method, presented below, so that the types in the graph and in the branching process are exactly the same.

### 3.2 Coupling

We use the following exploration process: At time $m = 0$, choose a vertex $v_1$ uniformly in $G$, where $G$ is an instance of the DC-PPM. Initially, it is the only active vertex: $A(0) = \{v_1\}$. All other vertices are neutral at start: $U(0) = V \setminus \{v_1\}$. No vertex has been explored yet: $E(0) = \emptyset$. At each time $m$, we arbitrarily pick an active vertex $u$ in $A(m)$ that has shortest distance to $v_1$, and explore all its edges in $G$: if $uv$ is an edge for some $v \in U(m)$, then we set $v$ active in step $m+1$; otherwise it remains neutral. At the end of step $m+1$, we designate $u$ to be explored. Thus,

$$E(m+1) = E(m) \cup \{u\},$$
$$A(m+1) = \left(A(m) \setminus \{u\}\right) \cup \left(N(u) \cap U(m)\right),$$

and,

$$U(m+1) = U(m) \setminus N(u).$$
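The bookkeeping above is a breadth-first search; a minimal sketch (function and variable names ours), with the graph given as an adjacency-list dictionary `adj`:

```python
from collections import deque

def explore(adj, root, max_depth):
    """BFS exploration mirroring the E/A/U bookkeeping: explored, active
    and neutral vertex sets, processed in order of distance to the root."""
    explored, active, neutral = set(), {root}, set(adj) - {root}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if depth[u] >= max_depth:
            break
        # Explore all edges from u into the neutral set.
        for v in adj[u]:
            if v in neutral:
                neutral.remove(v)
                active.add(v)
                depth[v] = depth[u] + 1
                queue.append(v)
        active.discard(u)
        explored.add(u)
    return explored, active, neutral, depth
```

Processing vertices in FIFO order guarantees the "shortest distance first" rule, so after the loop the explored set is exactly the neighbourhood up to the chosen depth.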

Our aim in this section is to show that the exploration process and the branching process agree up to depth $R(n)$ (defined in Theorem 3.1) with probability tending to one for large $n$. We do this in two steps:

Firstly, we establish that the i.i.d. vertices in $U(m)$ follow a law $\mu^{(m)}$ such that

$$\left\|\mu^{(m)} - \mu\right\|_{TV} = O\!\left(\frac{m}{n}\right).$$

This is the content of the following:

###### Lemma 3.2.

Let $1, \dots, m$ be the explored vertices (those in $E(m)$), with types $x_1, \dots, x_m$. Then, the vertices in $U(m)$ have i.i.d. types with law $\mu^{(m)} = \mu^{(m)}_{x_1,\dots,x_m}$, where

$$d\mu^{(m)}(\cdot) = \frac{g(\cdot)\,d\mu(\cdot)}{\int_S g(z)\,d\mu(z)}, \tag{3.4}$$

with

$$g(\cdot) = \prod_{i=1}^{m}\left(1 - \frac{\kappa(x_i,\cdot)}{n}\right). \tag{3.5}$$

Further, if $m = o(n)$, then there exists $n_0$ such that for all $n \ge n_0$:

$$\left\|\mu^{(m)}_{x_1,\dots,x_m} - \mu\right\|_{TV} \le \frac{2\kappa_{\max}m}{n}.$$

Secondly, if $u$ has type $x$, then its neighbours in $U(m)$ (i.e., those vertices that will be added to $A(m+1)$) are i.i.d. with law $\mu^{*(m)}_x$, which is $O(m/n)$ away from $\mu^*_x$ in total variation distance. Further, we can approximate the number of neighbours by $\mathrm{Poi}(\lambda_x(S))$ with error $O(m/n)$:

###### Lemma 3.3.

Assume $u$ has type $x$. Let $D$ be the number of neighbours of $u$ in $U(m)$. Then, the types of those neighbours are i.i.d. with law $\mu^{*(m)}_x$, where

$$d\mu^{*(m)}_x(\cdot) = \frac{\kappa(x,\cdot)\,d\mu^{(m)}(\cdot)}{\int_S \kappa(x,y)\,d\mu^{(m)}(y)}. \tag{3.6}$$

Recall the bound from Lemma 3.2; then, for $n$ large enough,

$$\left\|\mu^{*(m)}_x - \mu^*_x\right\|_{TV} \le \frac{4\kappa_{\max}^3}{\kappa_{\min}^2}\,\frac{m}{n}. \tag{3.7}$$

Further,

$$\left\|D - \mathrm{Poi}(\lambda_x(S))\right\|_{TV} \le \kappa_{\max}\frac{n - |U(m)|}{n} + \frac{3\kappa_{\max}^2 m}{n}. \tag{3.8}$$

To establish the desired coupling, let us give names to all good events:

$$A_{r+1} = \left\{\forall u \in \partial G_r : D_u = \hat D_u\right\},$$
$$B_{r+1} = \left\{\forall u \in \partial G_r,\ v \in \{1,\dots,D_u\} : U_{uv} = \hat U_{uv}\right\},$$
$$C_r = \left\{|\partial G_s| \le g(s) = 2^s M^s \log(n)\ \ \forall s \le r\right\},$$

where, for $u \in \partial G_r$ (we identify vertices with their exploration indices),

• $D_u$ denotes the number of neighbours of $u$ in $U$;

and where, conditional on $u$ having type $x$,

• $\hat D_u$ is a random variable with law $\mathrm{Poi}(\lambda_x(S))$;

• moreover, for $v \in \{1,\dots,D_u\}$:

• $U_{uv}$ denotes the type of child $v$ of vertex $u$;

• $\hat U_{uv}$ is a random variable with law $\mu^*_x$.

The types attached to siblings are independent conditional on their parent's type.

With the above lemmas established, the event

$$E_r = \bigcap_{s=1}^{r}\left\{A_s \cap B_s \cap C_s\right\}$$

indeed happens w.h.p.:

###### Lemma 3.4.

For any integer $r \ge 1$,

$$\mathbb{P}(E_{r+1} \mid E_r) \ge 1 - n^{3C\log(2\kappa_{\max})-1} - n^{-\log(4/e)},$$

for large enough $n$.

The events $A_s$ and $B_s$ alone do not contain enough information to completely reconstruct the neighbourhood of $\rho$: vertices in the graph might be merged among each other, or it is possible that two boundary vertices share a child in $U$. But those events are rare. Indeed, let $K_r$ be the event that no vertex in $U$ has more than one neighbour in $\partial G_r$ and that there are no edges within $\partial G_r$. Then, we have the following:

###### Lemma 3.5.

Let $r \le R$; then

$$\mathbb{P}(K_r \mid C_R) \ge 1 - n^{3C\log(2\kappa_{\max})-1},$$

for large enough $n$.

###### Proof of Lemma 3.2.

Consider a vertex $v \in U(m)$ with type $Y$. We show first that, conditional on $v \notin N(1,\dots,m)$ and $X_1 = x_1, \dots, X_m = x_m$, $Y$ has law $\mu^{(m)}$. To this end we shall calculate, for $y \in S$,

$$\mathbb{P}\left(Y \le y \,\middle|\, v \notin N(1,\dots,m),\, X_1 = x_1,\dots,X_m = x_m\right) \tag{3.9}$$

since this conditional distribution function determines the law of $Y$. Recall (3.5) and observe that

$$g(\cdot) = \mathbb{P}\left(v \notin N(1,\dots,m) \,\middle|\, Y = \cdot\,,\, X_1 = x_1,\dots,X_m = x_m\right).$$

Hence, the denominator in (3.9) is just

$$\mathbb{P}\left(v \notin N(1,\dots,m) \,\middle|\, X_1 = x_1,\dots,X_m = x_m\right) = \int_S g(z)\,d\mu(z). \tag{3.10}$$

Evaluating the numerator yields,

$$\mathbb{P}(Y \le y)\,\mathbb{P}\left(v \notin N(1,\dots,m) \,\middle|\, Y \le y,\, X_1 = x_1,\dots,X_m = x_m\right) = \mathbb{P}(Y \le y)\int_{\{z \le y\}} g(z)\,dp(z) = \int_{\{z \le y\}} g(z)\,d\mu(z), \tag{3.11}$$

where $p$ denotes the law of $Y$ conditioned on $\{Y \le y\}$. By combining (3.10) and (3.11) we establish (3.4), i.e., $Y$ has distribution $\mu^{(m)}$.

To see that $\mu^{(m)}$ indeed approximates $\mu$, observe that

$$\left(1 - \frac{\kappa_{\max}}{n}\right)^{m} \le \frac{g(\cdot)}{\int_S g(z)\,d\mu(z)} \le \frac{1}{\left(1 - \frac{\kappa_{\max}}{n}\right)^{m}}.$$

We use Taylor's theorem to appropriately bound both sides for large $n$:

$$\left(1 - \frac{\kappa_{\max}}{n}\right)^{m} = 1 - \frac{\kappa_{\max}m}{n} + O\!\left(\frac{\kappa_{\max}^2}{2}\left(\frac{m}{n}\right)^2\right),$$

and similarly for the reciprocal on the right-hand side. Consequently, there exists $n_0$ such that

$$\left\|\mu^{(m)} - \mu\right\|_{TV} \le \int_S \left|\frac{g(y)}{\int_S g(z)\,d\mu(z)} - 1\right| d\mu(y) \le \frac{2\kappa_{\max}m}{n},$$

for all $n \ge n_0$. ∎

###### Proof of Lemma 3.3.

Put $n_m = |U(m)|$ and let $Y_1, \dots, Y_D$ denote the types of the neighbours of $u$ in $U(m)$.

Let $f$ be an arbitrary non-negative measurable function on $S$. The first claim follows if we prove that

$$\mathbb{E}\left[e^{-\sum_{j=1}^{D}f(Y_j)} \,\middle|\, D = d\right] = \left(\int_S e^{-f(y)}\,d\mu^{*(m)}_x(y)\right)^{d}. \tag{3.12}$$

Now, decompose the event $\{D = d\}$ according to which set $F \subset [n_m]$ of $d$ vertices of $U(m)$ forms the neighbourhood of $u$ there (we abbreviate the event $\{N(u) \cap U(m) = F\}$ by $F$). By exchangeability of the vertices in $U(m)$,

$$\mathbb{E}\left[e^{-\sum_{j=1}^{D}f(Y_j)} \,\middle|\, D = d\right] = \frac{1}{\binom{n_m}{d}}\sum_{F \subset [n_m],\,|F| = d}\mathbb{E}\left[e^{-\sum_{j \in F}f(Y_j)} \,\middle|\, F\right].$$

Conditional on $F$, the types $(Y_j)_{j \in F}$ are i.i.d., thus

$$\mathbb{E}\left[e^{-\sum_{j \in F}f(Y_j)} \,\middle|\, F\right] = \left(\frac{\int_S e^{-f(y)}\frac{\kappa(x,y)}{n}\,d\mu^{(m)}(y)}{\int_S \frac{\kappa(x,y)}{n}\,d\mu^{(m)}(y)}\right)^{d},$$

which combined with (3.6) gives (3.12), our first claim.

Further, an explicit calculation yields

$$\int_S \left|d\mu^{*(m)}_x - d\mu^*_x\right| \le \frac{4\kappa_{\max}^3}{\kappa_{\min}^2}\,\frac{m}{n},$$

establishing (3.7).

For the last claim, observe that $D \sim \mathrm{Bin}(n_m, p)$, where $p = \frac{1}{n}\int_S \kappa(x,y)\,d\mu^{(m)}(y)$. Hence,

$$\left\|\mathrm{Bin}(n_m,p) - \mathrm{Poi}(n_m p)\right\|_{TV} \le \sum_{i=1}^{n_m} p^2 \le \frac{\kappa_{\max}^2}{n}.$$

Standard bounds for Poisson random variables entail the existence of a constant $C_{\mathrm{Poi}}$ such that $\left\|\mathrm{Poi}(\lambda) - \mathrm{Poi}(\lambda')\right\|_{TV} \le C_{\mathrm{Poi}}\,|\lambda - \lambda'|$ for all $\lambda, \lambda' \ge 0$. Consequently,

$$\begin{aligned}
\frac{1}{C_{\mathrm{Poi}}}\left\|\mathrm{Poi}(n_m p) - \mathrm{Poi}(\lambda_x(S))\right\|_{TV}
&\le |n_m - n|\,p + \left|\int_S \kappa(x,y)\,d\mu^{(m)}(y) - \int_S \kappa(x,y)\,d\mu(y)\right|\\
&\le \kappa_{\max}\frac{|n_m - n|}{n} + \kappa_{\max}\int_S \left|d\mu^{(m)}(y) - d\mu(y)\right|\\
&\le \kappa_{\max}\frac{|n_m - n|}{n} + \frac{2\kappa_{\max}^2 m}{n}.
\end{aligned}$$

Thus, combining the last two displays with the triangle inequality establishes (3.8). ∎

###### Proof of Lemma 3.4.

Write $n_r = |\partial G_r|$. We have

$$\mathbb{P}(E_{r+1} \mid E_r) \ge \mathbb{P}(B_{r+1} \mid E_r) - \mathbb{P}(\neg A_{r+1} \mid E_r) - \mathbb{P}(\neg C_{r+1} \mid E_r).$$

Now,

$$\mathbb{P}(B_{r+1} \mid E_r, n_r) \ge 1 - \sum_{u=1}^{n_r}\mathbb{P}\left(\neg B^{(u)}_{r+1} \,\middle|\, \bigcap_{v=1}^{u-1}B^{(v)}_{r+1},\, E_r\right), \tag{3.13}$$

where $B^{(u)}_{r+1}$ denotes the restriction of the event $B_{r+1}$ to vertex $u$. Denote the already explored vertices by $1, \dots, m$ and their types by $X_1, \dots, X_m$. Conditional on those types, the vertices in $U(m)$ are i.i.d. with distribution $\mu^{(m)}$. Hence:

$$\mathbb{P}\left(B^{(u)}_{r+1} \,\middle|\, \bigcap_{v=1}^{u-1}B^{(v)}_{r+1},\, E_r,\, n_r,\, X_1,\dots,X_m\right) = \mathbb{P}\left(B^{(u)}_{r+1} \,\middle|\, X_1,\dots,X_m\right) \tag{3.14}$$
$$\ge \mathbb{P}\bigl(B^{(u)}_{r+1} \bigm| D_u \le$$