Efficient inference in stochastic block models with vertex labels

Abstract

We study the stochastic block model with two communities where vertices contain side information in the form of a vertex label. These vertex labels may have arbitrary label distributions, depending on the community memberships. We analyze a linearized version of the popular belief propagation algorithm. We show that this algorithm achieves the highest accuracy possible whenever a certain function of the network parameters has a unique fixed point. Whenever this function has multiple fixed points, the belief propagation algorithm may not perform optimally. We show that increasing the information in the vertex labels may reduce the number of fixed points and hence lead to optimality of belief propagation.

I Introduction

Many real-world networks contain community structures: groups of densely connected nodes. Finding these group structures based on the connectivity matrix of the network is a problem of interest, and several algorithms have been developed to extract them; see [6] for an overview. In many applications, however, the network contains more information than just the connectivity matrix. For example, the edges can be weighted, or the vertices can carry information. This extra network information may help in extracting the community structure of the network. In this paper, we study the setting where the vertices have labels, which arises in particular when vertices can be distinguished into different types. For example, in social networks a vertex type may encode a person's interests, age or city of residence. We investigate how knowledge of these vertex types helps us in identifying the community structures.

We focus on the stochastic block model (SBM), a popular random graph model for analyzing community detection problems [8, 4, 18]. In the simplest case, the stochastic block model generates a random graph with 2 communities. First, the vertex set is partitioned into two communities. Then, any two vertices are connected independently with a probability determined by their community memberships through a connection probability matrix. To include the vertex labels, we then attach a label to every vertex, where the label distribution depends on the community membership of the vertex.
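
As an illustration, a graph from the two-community labeled SBM described above could be sampled as follows. This is a minimal sketch under our own illustrative notation, not necessarily the paper's: `a/n` and `b/n` are assumed within- and between-community edge rates, `p_plus` the probability of the + community, and `q_plus`, `q_minus` the two label distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

def labeled_sbm(n, p_plus, a, b, q_plus, q_minus):
    """Sample a labeled two-community SBM (illustrative parametrization).

    Each vertex gets spin +1 with probability p_plus, else -1.  Edges
    within a community appear with probability a/n, across communities
    with probability b/n, and each vertex draws a label from q_plus or
    q_minus depending on its spin.
    """
    spins = np.where(rng.random(n) < p_plus, 1, -1)
    # Edge probability matrix: a/n if spins agree, b/n otherwise.
    same = np.equal.outer(spins, spins)
    probs = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = upper | upper.T  # symmetric adjacency matrix, no self-loops
    labels = np.where(spins == 1,
                      rng.choice(len(q_plus), size=n, p=q_plus),
                      rng.choice(len(q_minus), size=n, p=q_minus))
    return adj, spins, labels

adj, spins, labels = labeled_sbm(500, 0.5, 8.0, 2.0,
                                 [0.8, 0.2], [0.2, 0.8])
```

Here the labels are a noisy version of the spins (correct label with probability 0.8), one of the special cases discussed below.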

In the stochastic block model with two equally sized communities, it is not always possible to infer the community structure from the connectivity matrix. A phase transition occurs at the so-called Kesten-Stigum threshold, expressed in terms of the second largest eigenvalue of a matrix related to the connectivity matrix and the average degree in the network. Below the Kesten-Stigum threshold, no algorithm is able to infer the community memberships better than a random guess, even though a community structure may be present [14]; in this regime it is even impossible to discriminate between a graph generated by the stochastic block model and an Erdős-Rényi random graph with the same average degree [13]. Above the Kesten-Stigum threshold, the communities can be efficiently reconstructed [11, 15].

A popular algorithm for community detection is belief propagation [5] (BP). This algorithm starts with initial beliefs on the community memberships, and iterates until these beliefs converge to a fixed point. Above the Kesten-Stigum threshold, a fixed point that is correlated with the true community memberships is believed to be the only stable fixed point, so that the algorithm always converges to that fixed point. Underneath the Kesten-Stigum threshold, the fixed point correlated with the true community memberships becomes unstable and the belief propagation algorithm will in general not result in a partition that is correlated with the true community memberships. However, when the belief propagation algorithm is initialized with the real community memberships, there is still a regime of the parameters where the fixed point correlated with the true community spins can be distinguished from the other fixed points. In this regime, community detection is believed to be possible (for example by exhaustive search of the state space), but not in polynomial time.

When the two communities are equally sized (the symmetric stochastic block model), the phase where community detection may only be possible by non-polynomial time algorithms is not present [15, 11, 14]. In the case of unbalanced communities (the asymmetric stochastic block model) it has been shown that it is possible to infer the community structure better than random guessing even below the Kesten-Stigum threshold [17]. Thus, according to the conjecture of [5], a regime where community detection is possible but not in polynomial time may be present in the case of two unbalanced communities.

In this paper, we investigate the performance of the belief propagation algorithm on the asymmetric stochastic block model when vertices contain side information. We are interested in the fraction of correctly inferred community labels by the algorithm, and we say that the algorithm performs optimally if it achieves the highest possible fraction of correctly inferred community labels among all algorithms. Some special cases of stochastic block models with side information have already been studied. One such case is the setting where a fraction of the vertices reveals its true group membership [21]. Typically, it is assumed that the fraction of vertices that reveal their true membership tends to zero when the graph becomes large [21, 3]. In this setting, a variant of the belief propagation algorithm including the vertex labels seems to perform optimally in a symmetric stochastic block model [21], but may not perform optimally if the communities are not of equal size [3]. Another special case of the label distribution is when the observed labels are a noisy version of the community memberships, where a fraction of vertices receives the label corresponding to their community, and a fraction of vertices receives the label corresponding to the other community. It was conjectured in [16] that for this label distribution the belief propagation algorithm always performs optimally in the symmetric stochastic block model.

Our contribution

We focus on asymmetric stochastic block models with arbitrary label distributions, generalizing the above examples.

  • We provide an algorithm that uses both the label distribution and the network connectivity matrix to estimate the group memberships.

  • The algorithm is a local algorithm, which means that it only depends on a finite neighborhood of each vertex. In particular, this implies that the running time of the algorithm is linear, allowing it to be used on large networks. The algorithm is a variant of the belief propagation algorithm, and a generalization of the algorithms provided in [16, 3] to include arbitrary label distributions and an asymmetric stochastic block model.

  • In a regime where the average vertex degrees are large, we obtain an expression for the probability that the community of a vertex is identified correctly. Furthermore, we show that this algorithm performs optimally if a function of the network parameters has a unique fixed point.

  • Similarly to belief propagation without labels, we show that when multiple fixed points exist, the belief propagation algorithm may not converge to the fixed point containing the most information about the community structure. This phenomenon was previously observed in a setting where the information carried by the vertex labels tends to zero in the large graph limit [3], but we show that this may also happen if the information carried by the vertex labels does not tend to zero. The existence of multiple fixed points either indicates that the optimal fixed point can still be found by an exhaustive search of the partition space or it may indicate that no algorithm is able to detect the community partition.

  • We show that increasing the correlation between the vertex covariate and the community structure changes the number of fixed points of the BP algorithm for a specific example of node covariates. In particular it is possible that the BP algorithm does not converge to the fixed point that is the most informative on the vertex spins if the correlation between the vertex covariates and the vertex spins is small, but that BP does converge to this fixed point if the vertex labels contain more information on the vertex spins. This shows that including node covariates for community detection is helpful, and that it may significantly improve the performance of polynomial time algorithms for community detection.

We start by showing with an example that in some cases vertex labels allow us to efficiently detect communities even below the Kesten-Stigum threshold.

Example 1.

We now present a simple example where it is not possible to detect communities using the connectivity matrix only, but where it is possible when we also use knowledge of the vertex labels. Consider an SBM with four equally sized communities 1, 2, 3 and 4, where the probability that a vertex in one community connects to a vertex in another is given by the corresponding entry of the connection probability matrix defined as

The nonzero eigenvalues of this matrix are given by (appears with multiplicity two) and . In this example, community detection cannot obtain a partition that is better than a random guess below the Kesten-Stigum threshold, which is

Now suppose that all vertices in communities 1 and 2 have one label, and all vertices in communities 3 and 4 have another. Then half of the vertices carry each label. Thus, using the labels alone we cannot distinguish between vertices in communities 1 and 2, or between vertices in communities 3 and 4, so that we can correctly infer at most half of the community spins.

Now suppose we split the network into two smaller networks based on the label of the vertices. Then, we obtain two small networks with connection probability matrices

Community detection can therefore achieve a partition that is better than a random guess in these two networks as long as , i.e., above the corresponding Kesten-Stigum threshold. Hence, in the regime

it is impossible to infer the community structure better than a random guess using either the graph alone or the vertex labels alone. However, combining the vertex label information with the underlying graph structure, one can correctly infer the community structure of strictly more than half of the vertices.
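
The Kesten-Stigum comparisons in this example can be checked numerically. The specific matrix of Example 1 is not reproduced in this extract, so the sketch below uses a generic connection probability matrix `B` with illustrative values; it implements one common statement of the Kesten-Stigum condition for an SBM with k equally sized communities and edge probabilities B[i][j]/n, namely that the second-largest-modulus eigenvalue of B/k exceeds the square root of the largest.

```python
import numpy as np

def above_kesten_stigum(B, k):
    """Kesten-Stigum check for an SBM with k equally sized communities
    and connection probabilities B[i][j]/n (B symmetric).

    With mu_1 >= |mu_2| >= ... the eigenvalues of B/k ordered by
    modulus, one standard form of the detectability condition is
    mu_2**2 > mu_1.
    """
    eig = np.linalg.eigvalsh(np.asarray(B, dtype=float) / k)
    eig = eig[np.argsort(-np.abs(eig))]  # sort by decreasing modulus
    return eig[1] ** 2 > eig[0]

# Two balanced communities: the condition reduces to (a-b)^2 > 2(a+b).
print(above_kesten_stigum([[8.0, 2.0], [2.0, 8.0]], 2))  # True: detectable
print(above_kesten_stigum([[5.0, 4.0], [4.0, 5.0]], 2))  # False: undetectable
```

Splitting the network by label, as in the example, amounts to applying this check to each of the two smaller connection probability matrices separately.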

Notation

We say that a sequence of events E_n happens with high probability (w.h.p.) if P(E_n) tends to one as n tends to infinity. Furthermore, we write f(n) = o(g(n)) if f(n)/g(n) tends to zero, and f(n) = O(g(n)) if f(n)/g(n) is uniformly bounded, where g is nonnegative.

I-A Model

Let be a labeled SBM with two communities. That is, every vertex has a spin, drawn independently for all vertices. Each pair of nodes is connected with a probability that depends on whether their spins agree, with a common scaling that controls the average degree in the graph. When the communities do not have equal degrees, partitioning vertices based on their degrees already results in a community detection algorithm that correctly identifies the spin of a vertex with probability strictly larger than random guessing [3]. We therefore assume that all vertices have the same average degree, that is

(1)

so that the average degree is .

Besides the vertex spins, every vertex has a label attached to it. Let be a finite set of labels. Then vertices in community + have label with probability , and vertices in community - have label with probability .

For an estimator of the community spins , let be the estimated label of vertex in graph under estimator . We then define the success probability of an estimator as

(2)

where the subtracted term gives a performance measure of zero to estimators that do not depend on the graph structure . Let be a uniformly chosen vertex. By [3, Proposition 3],

(3)

where and are the conditional distributions of , given that and respectively and denotes the total variation distance. We say that the community detection problem is solvable if the estimator maximizing (2) satisfies

(4)

Note that the estimator that estimates community one if and community two otherwise has a success probability of

(5)

Thus, the community detection problem is always solvable when . Furthermore, an estimator performs better when combining the network data and the vertex labels than when only using the vertex labels if

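The label-only baseline (5) can be made concrete. The sketch below computes the success probability of the MAP estimator that ignores the graph entirely; the parameter names (`p_plus` for the + community probability, `q_plus`/`q_minus` for the label distributions) are our own illustrative notation.

```python
def label_only_success(p_plus, q_plus, q_minus):
    """Success probability of the MAP estimator that ignores the graph:
    a vertex with label l is assigned to community + iff
    p_plus * q_plus[l] >= (1 - p_plus) * q_minus[l].  Summing the
    larger of the two joint probabilities over all labels gives the
    probability of a correct assignment."""
    return sum(max(p_plus * qp, (1 - p_plus) * qm)
               for qp, qm in zip(q_plus, q_minus))

# Informative labels: correct label observed with probability 0.8.
print(label_only_success(0.5, [0.8, 0.2], [0.2, 0.8]))  # 0.8
# Uninformative labels: best guess is the larger community.
print(label_only_success(0.7, [0.5, 0.5], [0.5, 0.5]))  # 0.7
```

An estimator that also uses the graph is only worthwhile if it beats this baseline, which is the comparison made in the display above.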
I-B Labeled Galton-Watson trees

A widely used algorithm to detect communities in the stochastic block model is belief propagation [5]. The algorithm computes the belief that a specific vertex belongs to a given community, given the beliefs of the other vertices. Because the stochastic block model is locally tree-like, we study a Galton-Watson tree that behaves similarly to the labeled stochastic block model. We denote this labeled Galton-Watson tree by , where is a Galton-Watson tree rooted at with a Poisson offspring distribution. Each vertex in the tree has two covariates: its spin and its vertex label. The root has spin + with probability and spin - with probability . Given the spin of the parent of a node, the child keeps the parent's spin with a fixed probability and flips it otherwise. Given the spin of a node, its label is drawn from the label distribution of the corresponding community.
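
Such a labeled Galton-Watson tree could be sampled as follows. This is an illustrative sketch: the paper's exact spin-transition probability is not reproduced in this extract, so the flip probability is exposed here as a free parameter `flip`, and `q_plus`/`q_minus` again denote assumed label distributions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_labeled_gw(depth, d, flip, p_plus, q_plus, q_minus):
    """Sample a labeled Galton-Watson tree with Poisson(d) offspring.

    The root has spin +1 with probability p_plus; each child keeps its
    parent's spin with probability 1 - flip ('flip' stands in for the
    spin-transition probability of the text, which depends on the SBM
    parameters).  Labels are drawn from q_plus or q_minus per spin.
    """
    def node(spin, level):
        label = int(rng.choice(len(q_plus),
                               p=q_plus if spin == 1 else q_minus))
        children = []
        if level < depth:
            for _ in range(rng.poisson(d)):
                child_spin = spin if rng.random() > flip else -spin
                children.append(node(child_spin, level + 1))
        return {"spin": spin, "label": label, "children": children}
    return node(1 if rng.random() < p_plus else -1, 0)

tree = sample_labeled_gw(3, 2.0, 0.3, 0.5, [0.7, 0.3], [0.3, 0.7])
```

In the analysis below, the labels of such a tree are observed while the spins are hidden, and belief propagation reconstructs the root spin.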

Let denote such a tree of depth rooted at , where the labels are observed, but the spins of the nodes are not observed. Let denote the set of children of vertex . Then Bayes’ rule together with (1) yields

If we define

(6)

we can write the recursion

(7)

with ,

(8)

and

(9)

I-C Local algorithms

A local algorithm is an algorithm that bases the estimate of the spin of a vertex only on a neighborhood of that vertex of fixed radius. In general, local algorithms are not able to obtain a success probability (2) larger than zero in the stochastic block model [9], so that an estimator based on a local algorithm does not satisfy (4). However, when a vanishing fraction of vertices reveals their labels, a local algorithm is able to achieve the maximum possible success probability (2) when the parameters of the stochastic block model are above the Kesten-Stigum threshold [9].

I-D Local, linearized belief propagation

The specific local algorithm we consider is a version of the widely used belief propagation [5]. Algorithm 1 uses the observed labels to initialize the belief propagation, and then updates the beliefs as in (7). Here denotes the neighbors of vertex . Since the algorithm only uses the neighborhood of vertex up to depth , it is indeed a local algorithm. Note that Algorithm 1 does require knowledge of all parameters of the stochastic block model: as well as the label distributions and . Furthermore, if the underlying graph is a tree, then (11) is the same as (7).

1 Set for all and .
2 for  do
3       For all let
(10)
4 end for
5 For all set
(11)
For all , set if , and set if
Algorithm 1 Local linearized belief propagation with vertex labels.
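
The exact linearized updates (10)-(11) are not reproduced in this extract. As a hedged sketch, the following implements the classical log-likelihood-ratio BP recursion on which Algorithm 1 is based, initialized from the vertex labels exactly as in step 1 of the algorithm; `a/n` and `b/n` are assumed within/between edge rates and `q_plus`/`q_minus` assumed label distributions, all illustrative names.

```python
import numpy as np

def label_bp(adj, labels, q_plus, q_minus, a, b, depth):
    """Label-initialized belief propagation on the boolean adjacency
    matrix 'adj'.  Messages are log-likelihood ratios for spin +; this
    is the standard BP recursion underlying Algorithm 1, not the exact
    linearized update (10)-(11), which is not reproduced here."""
    n = len(labels)
    # Edge factor: contribution of a neighbor carrying message x.
    f = lambda x: np.log((a * np.exp(x) + b) / (b * np.exp(x) + a))
    # Step 1: initialize all messages with the label log-likelihood ratio.
    prior = np.array([np.log(q_plus[l] / q_minus[l]) for l in labels])
    msg = {(v, u): prior[v]
           for v in range(n) for u in np.flatnonzero(adj[v])}
    # Steps 2-4: iterate the message updates to the given depth.
    for _ in range(depth):
        msg = {(v, u): prior[v] + sum(f(msg[(w, v)])
                                      for w in np.flatnonzero(adj[v])
                                      if w != u)
               for (v, u) in msg}
    # Step 5: aggregate incoming messages into beliefs, then threshold.
    belief = prior + np.array([sum(f(msg[(w, v)])
                                   for w in np.flatnonzero(adj[v]))
                               for v in range(n)])
    return np.where(belief >= 0, 1, -1)

# Toy instance: two triangles joined by one edge, labels correlated
# with the two halves; BP recovers the halves as the two communities.
adj = np.zeros((6, 6), dtype=bool)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = True
print(label_bp(adj, [0, 0, 0, 1, 1, 1],
               [0.9, 0.1], [0.1, 0.9], a=8.0, b=2.0, depth=2))
```

As in Algorithm 1, the estimate of a vertex depends only on its depth-`depth` neighborhood, so the running time is linear in the number of edges for bounded degrees.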

II Properties of local, linearized belief propagation

We now consider the setting where and grow large. In this regime, we give specific performance guarantees on the success probability of the local belief propagation algorithm. Define

We focus on the regime where is fixed, where is the smallest eigenvalue of , and define . We let the average degree tend to infinity. Then, corresponds to the Kesten-Stigum bound [3]. Furthermore, we assume that the average degree in each community is equal, so that (1) holds. Under this assumption, . Thus, if we let , then in the regime we are interested in, . Then also

(12)

Define

(13)

where

(14)

Here is a random variable, and is a random variable independent of which takes values with probability . Let

(15)

and let denote a random variable which takes values with probability . Then the following theorem gives the success probability of Algorithm 1 in terms of the function and a fixed point of and compares it with the performance of the optimal estimator .

Theorem 1.

Let denote the estimator given by Algorithm 1 up to depth . Then,

(16)

Furthermore, if has a unique fixed point, then

(17)

and the estimator of Algorithm 1 is asymptotically optimal.

We now comment on this result and its implications.

Special cases of

The function has been investigated for two special cases of the labeled stochastic block model. Analyzing for these special cases already turned out to be difficult, but some conjectures on its behavior have been made based on simulations. In [16], it was conjectured that for the special case where and the vertex labels are noisy versions of the spins, the function only has one fixed point for all possible values of . Thus, Algorithm 1 is conjectured to perform optimally for the symmetric stochastic block model with noisy spins as vertex labels.

The asymmetric stochastic block model where the information about the community memberships carried by the vertex labels goes to zero was studied in [3]. Instead of noisy spins as labels, a fraction of vertices reveals their true spins while the other vertices carry an uninformative label, where tends to zero as the graph grows large. In that setting, it was conjectured that the function may have 2 or 3 fixed points for small values of and .

Influence of the initial beliefs on the performance of Algorithm 1

When has more than one fixed point, the success probability of Algorithm 1 corresponds to the smallest fixed point of . If the belief propagation is initialized with the true (unknown) beliefs instead, the success probability corresponds to the largest fixed point of . Since is increasing, this also implies that the success probability when initializing with the true unknown beliefs is higher than the success probability of Algorithm 1.

Multiple fixed points of

Figures 1(a) and 1(b) show that in the setting of Theorem 1, where the information in the labels about the vertex spins does not vanish, the function may have more than one fixed point, even when the probability of observing the correct label does not go to as . This is very different from the special case where , where the function was conjectured to have at most one fixed point [16]. Indeed, Figures 1(c) and 1(d) show that for the symmetric stochastic block model only has one fixed point. For the asymmetric stochastic block model on the other hand, there is a region of parameters where Algorithm 1 may not achieve the highest possible accuracy among all algorithms. Belief propagation initialized with the true beliefs corresponds to the highest fixed point of , and thus results in a better estimator than belief propagation initialized with beliefs based on the vertex labels. In this case, exhaustive search of the partition space may still find all fixed points of the belief propagation algorithm, one of which corresponds to the fixed point having maximal overlap with the true partition. However, whether this fixed point can be distinguished from the other fixed points without knowledge of the true community spins is unknown. If this is possible, it would indicate a phase where community detection is possible, but computationally hard. If the fixed point is indistinguishable from the other fixed points, even exhaustive search of the partition space will not result in a better partition. In the asymmetric stochastic block model without vertex labels, it was shown that it is sometimes indeed possible to detect communities even underneath the Kesten-Stigum threshold [17] (in non-polynomial time). It would be interesting to see in which cases this also holds for the stochastic block model including vertex labels.

Increasing vertex label information

Interestingly, Figure 1(b) shows an example where the lowest and the highest fixed point of are stable, but the middle fixed point is unstable. Thus, to converge to a fixed point with a better correlation with the network partition than the one obtained by initializing at , the initial beliefs should correspond to an -value that is at least as large as the second fixed point of in this example. Note that the case is similar to the community detection problem without extra vertex information, because for the vertex labels are independent of the community memberships. Thus, in the asymmetric stochastic block model, there is a fixed point of corresponding to a partition with non-trivial overlap with the true partition, but the BP algorithm does not find this partition when initialized with random beliefs. The same situation occurs when the information about the community membership carried by the vertex labels is small (for example when ). However, when the information carried by the vertex labels is sufficiently large, the BP algorithm starts to converge to the largest fixed point of , and the BP algorithm performs optimally. Thus, including node covariates in the BP algorithm for the asymmetric stochastic block model may change the number of fixed points, and therefore significantly improve the performance of the BP algorithm.
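
The fixed-point function of Theorem 1 is not reproduced in this extract. As an illustration of the structure discussed above, the sketch below locates the fixed points of a generic increasing scalar map (denoted `h` here for concreteness); the S-shaped toy function has three fixed points, mimicking the stable-unstable-stable configuration of Figure 1(b).

```python
import numpy as np

def fixed_points(h, lo=0.0, hi=1.0, grid=100001):
    """Locate fixed points of a scalar function h on [lo, hi]: scan a
    fine grid for sign changes of g(x) = h(x) - x, refine each bracket
    by bisection, and merge near-duplicate roots."""
    xs = np.linspace(lo, hi, grid)
    g = np.array([h(x) - x for x in xs])
    roots = []
    for i in np.flatnonzero(np.sign(g[:-1]) != np.sign(g[1:])):
        a, b = xs[i], xs[i + 1]
        for _ in range(60):                 # bisection refinement
            m = 0.5 * (a + b)
            if (h(a) - a) * (h(m) - m) <= 0:
                b = m
            else:
                a = m
        r = 0.5 * (a + b)
        if not roots or r - roots[-1] > 1e-6:
            roots.append(r)
    return roots

# An S-shaped toy map with three fixed points (two stable, one unstable).
h = lambda x: 0.5 * (1.0 + np.tanh(6.0 * (x - 0.5)))
print(len(fixed_points(h)))  # three fixed points
```

Iterating such a map from a small starting value converges to the lowest stable fixed point, which is exactly the behavior of Algorithm 1 with weakly informative labels described above; starting above the middle (unstable) fixed point leads to the highest one.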

Fig. 1: The function appearing in Theorem 1 for noisy labels and various parameter values; panels (a) and (c) show the function itself, and panels (b) and (d) are zoomed-in views. The black line is the identity line.

Success probability

Figures 2(a) and 2(b) plot the success probability of Algorithm 1 given by equation (16) against for the case of noisy labels and revealed labels respectively. We see that for small and large values of , there is a rapid increase in the success probability. This increase is caused by the shape of , shown in Figures 1(a) and 1(c) for the setting with noisy labels. The location of the fixed point of is much closer to the origin for than for . This difference causes the increase in the success probability.

Figures 3(a) and 3(b) show the success probability given by (16) as a function of for unbalanced communities. Here we also see that there is a small range of where the success probability increases rapidly in . Figure 4 shows that the accuracy obtained in Theorem 1 is higher than the accuracy obtained when only using the vertex labels to distinguish the communities, even underneath the Kesten-Stigum threshold .

(a) Noisy vertex labels: , .
(b) Revealed vertex labels: , .
Fig. 2: as a function of for
(a) Noisy vertex labels: , .
(b) Revealed vertex labels: , .
Fig. 3: as a function of for
Fig. 4: Success probability of Algorithm 1 and the success probability when only using the vertex labels for , , and .

III Proof of Theorem 1

Because the SBM is locally tree-like, we first investigate Algorithm 1 on the Galton-Watson tree defined in Section I-B, where we study the recursion (7). Denote by the value of for a randomly chosen vertex in community + and define similarly. Then, and . We first investigate the distribution of .

Lemma 2.

As ,

(18)
(19)

where denotes convergence in the Wasserstein metric (see for example [7, Section 2]).

Proof.

From the recursion (13) we obtain

Furthermore, we can write as

(20)

where , independent of . Subtracting and adding the mean of the Poisson variable yields

(21)

In the regime we are interested in, , , and for some (see (12)). Then, Taylor expanding around results in

This shows that the last term in (21) can be rewritten as

By [3, Corollary A3],

as . Then,

Thus, as

and similar arguments prove the lemma for . ∎

We now proceed to the distribution of for by using induction.

Lemma 3.

Assume that

(22)
(23)

for some . Then,

(24)
(25)

as .

Proof.

Define

(26)

and define as the value of for a randomly chosen in community +. We start by investigating the first moment of . Using Wald's equation, we obtain

(27)

where the second line uses (12). We then use that [3, Eq. (A4)]

Taylor expanding then results in

(28)

For all bounded continuous functions by [3, Lemma A6]

(29)

Denote and . Then,

Combining this with (27) and (28) gives

Combining this with the induction hypothesis results in

For the variance, we obtain using Wald's equation

(30)

where we used (28) again. Similar computations as for the expected value then lead to

(31)

Thus, the first and second moments of are of the correct size. The proof that converges to a normal distribution then follows exactly the same lines as the proof of [3, Proposition 23]. ∎

We now study the total variation distance of a labeled Galton-Watson tree where the root is in community and a Galton-Watson tree where the root is in community .

Lemma 4.

Let and denote the conditional distributions of conditionally on the spin of being + and - respectively. Then,

(32)
Proof.

By (3), the term on the left hand side is the same as the success probability of the estimator of Algorithm 1 on a Galton-Watson tree. Using that and converge to normal distributions in the large graph limit, we then obtain for the total variation distance that

Finally, we need to relate our results on labeled Galton-Watson trees to the SBM. Denote by the subgraph of induced by all vertices at distance at most from vertex . Let denote the spins of all vertices in . Similarly, let denote the spins of the vertices in . Then, the following lemma can be proven analogously to [14].

Lemma 5.

For such that , there exists a coupling between and such that with high probability.

This lemma allows us to finish the proof of Theorem 1.

Proof of Theorem 1.

On the event that , the estimator of Algorithm 1 is the same as the estimator based on the sign of . Therefore,