# Community detection by label propagation with compression of flow

## 1 Introduction

Real-life complex systems in many research fields, such as biology, sociology, economics and computer science, can be studied as networks, with nodes representing individuals and links representing interactions or relations between them. Many networks exhibit so-called community structure: nodes tend to organize themselves into groups such that connections are denser within groups and sparser between groups. Community structure is a prominent feature of complex networks, as it often represents functional modules whose nodes share common properties and accounts for the functionality of the system. Community detection enables us to probe the organization and functional behavior of real-world systems; it has therefore received much attention and has been applied to many kinds of networks, including collaboration networks [1], social networks [2], and biological networks [3].

Community detection has been studied as graph partitioning in computer science for decades and remains quite challenging. Algorithms that detect communities of reasonably good quality have been proposed and improved extensively [4], especially in recent years. Examples include the Girvan-Newman algorithm [5], spectral clustering [6], multi-state spin models [8] (e.g., the q-state Potts model), random walks [11], modularity optimization [14] and statistical inference [18].

As one of the fastest algorithms for community detection, the label propagation algorithm (LPA) [22] uses the network structure alone to guide its process and requires neither parameters nor optimization of any objective function. It starts by assigning each node a unique label, indicating the community it belongs to. At every label propagation step, each node in turn updates its label to the one held by most of its neighbors. If more than one label is the most frequent, the new label is chosen randomly among them. Through this iterative process, densely connected groups of nodes reach consensus on one label and form communities. LPA converges when every node has a label that is the most frequent among its neighbors' labels, so that no node changes its label anymore. Nodes with the same label are then classified into the same community. Because of its nearly linear time complexity, and because it requires neither parameters nor a priori information about communities, LPA is suitable for processing large-scale networks with millions of nodes and edges.
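The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation of Ref. [22]: the adjacency-dictionary input format, the function name `label_propagation`, the helper `clique` and the `max_sweeps` safety cap are assumptions made for the example.

```python
import random
from collections import Counter

def label_propagation(adj, seed=0, max_sweeps=100):
    """Plain LPA on an undirected graph given as {node: set_of_neighbors}."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}            # every node starts with a unique label
    nodes = list(adj)

    def top_labels(v):
        # labels that are most frequent among the neighbors of v
        counts = Counter(labels[u] for u in adj[v])
        best = max(counts.values())
        return [lab for lab, c in counts.items() if c == best]

    for _ in range(max_sweeps):             # max_sweeps is a safety cap (assumption)
        rng.shuffle(nodes)                  # random sequential update order
        for v in nodes:
            if adj[v]:                      # isolated nodes keep their label
                labels[v] = rng.choice(top_labels(v))   # random tie-break
        # stop once every node holds one of its most frequent neighbor labels
        if all(not adj[v] or labels[v] in top_labels(v) for v in nodes):
            break
    return labels

def clique(nodes):
    """Hypothetical helper: complete graph on the given nodes."""
    return {v: set(nodes) - {v} for v in nodes}

# Toy input: two disconnected 4-cliques.
graph = {**clique([0, 1, 2, 3]), **clique([4, 5, 6, 7])}
labels = label_propagation(graph)
```

On this toy input, labels cannot cross between the two components, so each clique ends up with its own single label. On real networks the result generally depends on the random seed.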

Owing to frequent tie-breaks and the random update order, LPA usually delivers multiple partitions from the same initial condition when run with different random seeds. Raghavan et al. [22] proposed labeling each node with the set of all labels it receives in different partitions in order to detect possible overlapping communities. However, Tibély and Kertész [23] showed that this method is equivalent to finding the local minima of a simple zero-temperature kinetic Potts model. The number of such local minima was found to be much larger than the number of nodes in the underlying network. Aggregating partitions as suggested by Raghavan et al. [22] therefore fragments the resulting partition into small clusters when the number of aggregated partitions is large.

In order to eliminate undesired solutions, Barber and Clark [24] proposed a modularity-specialized LPA (LPAm) that constrains the label propagation process. LPAm, however, is prone to getting stuck in poor local maxima of modularity. To solve this problem, Liu et al. [25] introduced an advanced modularity-specialized LPA (LPAm+), which is more stable than LPAm. Because both algorithms rely on modularity, their capability is affected by the resolution limit [26].

Leung et al. [27] found that LPA often yields partitions with one giant community together with much smaller ones when applied to online social networks. To avoid this undesirable feature, they proposed a modified method (LPA-) that adds a decreasing score to each label during label propagation, which encourages the formation of stronger local communities and deters trivial solutions. Tests of LPA- on the LFR benchmark produced good results [28]. To reduce the running time of LPA-, Leung et al. proposed skipping the label update of nodes with high neighbor purity [27]. Since neighbor purity ignores the contribution of small-degree nodes to community detection, the detection precision is not high enough.

In this paper, we propose LPAf, which introduces a new rule that updates the label of a node by taking into account the compression of flow (random walks on a network), and which uses an incomplete update condition in the label propagation process to speed up convergence. Like LPAm+, LPAf employs a multi-step greedy agglomerative algorithm (MSG) [29] to simultaneously merge multiple pairs of communities. Although LPAf is also applicable to weighted and directed networks, we currently focus on unweighted and undirected networks. The paper is organized as follows. In Section 2, we present our new method in detail. Experimental results on synthetic and real-world networks are shown in Section 3. Finally, the main findings are summarized in Section 4.

## 2 Algorithm

To reveal community structures in networks, Rosvall and Bergstrom [30] introduced an information-theoretic approach (known as the Infomap algorithm). They use the probability flow of random walks on a network as a proxy for information flows in real systems and decompose the network into communities by compressing a description of the probability flow.

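For a network partition $\mathsf{M}$ of $n$ nodes containing $c$ communities, the average description length of random walks is defined as [30],

$$L(\mathsf{M}) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{c} p_{i\circlearrowright} H(\mathcal{P}^{i}), \tag{1}$$

where

$$H(\mathcal{Q}) = -\sum_{i=1}^{c} \frac{q_{i\curvearrowright}}{q_{\curvearrowright}} \log \frac{q_{i\curvearrowright}}{q_{\curvearrowright}} \tag{2}$$

and

$$H(\mathcal{P}^{i}) = -\frac{q_{i\curvearrowright}}{p_{i\circlearrowright}} \log \frac{q_{i\curvearrowright}}{p_{i\circlearrowright}} - \sum_{\alpha \in i} \frac{p_{\alpha}}{p_{i\circlearrowright}} \log \frac{p_{\alpha}}{p_{i\circlearrowright}}, \tag{3}$$

in which $q_{\curvearrowright} = \sum_{i=1}^{c} q_{i\curvearrowright}$ and $p_{i\circlearrowright} = q_{i\curvearrowright} + \sum_{\alpha \in i} p_{\alpha}$.

Here $q_{i\curvearrowright}$ is the probability of exiting community $i$, $q_{\curvearrowright}$ is the probability that the random walker switches to a different community at any given time step, $p_{\alpha}$ is the probability of visiting node $\alpha$, and $p_{i\circlearrowright}$ is the fraction of time the random walker spends in community $i$ plus the probability of exiting that community.

By combining Eqs. (1), (2) and (3), the expanded form of the map equation can be written as

$$L(\mathsf{M}) = q_{\curvearrowright} \log q_{\curvearrowright} - 2\sum_{i=1}^{c} q_{i\curvearrowright} \log q_{i\curvearrowright} - \sum_{\alpha=1}^{n} p_{\alpha} \log p_{\alpha} + \sum_{i=1}^{c} p_{i\circlearrowright} \log p_{i\circlearrowright}. \tag{4}$$

Note that the term $\sum_{\alpha} p_{\alpha} \log p_{\alpha}$ is independent of the partitioning. Consequently, when we update the label of node $\alpha$ from $i$ to $j$, it is sufficient to keep track only of the changes of $q_{i\curvearrowright}$ and $p_{i\circlearrowright}$ for the two communities involved. These changes can be easily derived for any update event, and updating them is fast and straightforward (see Appendix A for details).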
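As an illustration, the expanded map equation of Ref. [30] can be evaluated directly for a given partition. The sketch below assumes an undirected, unweighted network without self-loops, stored as an adjacency dictionary. In that case the visit probability of a node is its degree divided by $2m$, and the exit probability of a community is its number of boundary edges divided by $2m$. The names `description_length` and `plogp` are hypothetical.

```python
from math import log2

def plogp(x):
    """x * log2(x), with the convention 0 * log 0 = 0."""
    return x * log2(x) if x > 0 else 0.0

def description_length(adj, labels):
    """Average description length (in bits) of a partition.

    adj    : {node: set_of_neighbors}, undirected and unweighted
    labels : {node: community label}
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())      # 2m
    visit = {v: len(adj[v]) / two_m for v in adj}        # node visit probabilities
    exit_p, inside = {}, {}
    for v, nbrs in adj.items():
        c = labels[v]
        inside[c] = inside.get(c, 0.0) + visit[v]
        cut = sum(1 for u in nbrs if labels[u] != c)     # edges leaving community c
        exit_p[c] = exit_p.get(c, 0.0) + cut / two_m
    q_total = sum(exit_p.values())
    return (plogp(q_total)
            - 2 * sum(plogp(q) for q in exit_p.values())
            - sum(plogp(p) for p in visit.values())
            + sum(plogp(exit_p[c] + inside[c]) for c in exit_p))

# A 4-cycle as one community: L reduces to the entropy of the visit rates (2 bits).
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
length_cycle = description_length(cycle, {v: "all" for v in cycle})

# Two triangles joined by one edge: splitting them compresses the description.
bridge = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
length_one = description_length(bridge, {v: "all" for v in bridge})
length_two = description_length(bridge, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
```

For the bridged triangles, the two-community partition gives a shorter description than the single-community one, which is the behavior that community detection by flow compression relies on.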

We extend the LPA by modifying the label update rule so that the average description length $L$ is minimized. When updating the label of node $\alpha$, we pick the label with the smallest $\Delta L$ (as illustrated on the *karate* network in Fig. ?). Hence, our new update rule can be expressed as

$$l_{\alpha}^{\mathrm{new}} = \operatorname*{arg\,min}_{l \in N_{\alpha}^{l}} \Delta L_{l_{\alpha} \to l},$$

where $l_{\alpha}$ is the current label of node $\alpha$, $l_{\alpha}^{\mathrm{new}}$ is the new label of node $\alpha$, $N_{\alpha}^{l}$ includes the labels of the neighboring nodes of $\alpha$, $\Delta L_{l_{\alpha} \to l}$ is the change of $L$ when the label of node $\alpha$ is updated from community $l_{\alpha}$ to $l$ (see Appendix A for details), and $\arg\min$ returns the label that minimizes $\Delta L$. If more than one label shares the same minimum of $\Delta L$, the new label is chosen randomly among them. The label propagation step is performed iteratively until $L$ no longer decreases.
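A single application of this update rule can be sketched as follows. For clarity the sketch recomputes the full description length for every candidate label instead of using the incremental changes of Appendix A, so it illustrates the rule rather than an efficient implementation. The function names are hypothetical, and including the node's current label among the candidates (so that the description length never increases) is an assumption of this sketch.

```python
import random
from math import log2

def plogp(x):
    return x * log2(x) if x > 0 else 0.0

def description_length(adj, labels):
    """Expanded map equation for an undirected, unweighted graph."""
    two_m = sum(len(n) for n in adj.values())
    q, vol = {}, {}
    for v, nbrs in adj.items():
        c = labels[v]
        vol[c] = vol.get(c, 0.0) + len(nbrs) / two_m
        q[c] = q.get(c, 0.0) + sum(1 for u in nbrs if labels[u] != c) / two_m
    return (plogp(sum(q.values())) - 2 * sum(plogp(x) for x in q.values())
            - sum(plogp(len(n) / two_m) for n in adj.values())
            + sum(plogp(q[c] + vol[c]) for c in q))

def best_label(adj, labels, node, rng):
    """Label among the node's neighbors (or its own) that minimizes L."""
    candidates = {labels[u] for u in adj[node]} | {labels[node]}
    scores = {}
    for lab in candidates:
        trial = dict(labels)
        trial[node] = lab                    # tentatively relabel the node
        scores[lab] = description_length(adj, trial)
    low = min(scores.values())
    ties = sorted(lab for lab, s in scores.items() if abs(s - low) < 1e-12)
    return rng.choice(ties)                  # random tie-break among equal minima

# Two triangles joined by the edge 2-3; node 2 currently sits alone in "c".
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = {0: "a", 1: "a", 2: "c", 3: "b", 4: "b", 5: "b"}
new_label = best_label(graph, labels, 2, random.Random(0))
```

In this example node 2 joins community `"a"`, completing its triangle, because that choice gives the shortest description among the three candidates.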

In our tests, this update rule helps form local subgroups. However, on its own it does not perform satisfactorily on large-scale networks, as it usually gets stuck in poor local minima in $L$-space.

In order to escape such local minima, we adopt a greedy rule for merging communities that minimizes $L$: when the LPA with our new update rule gets stuck in a local minimum (no decrease in $L$ can be achieved via further label propagation), we calculate the changes of $L$ for merging pairs of communities, and merge those pairs that decrease $L$ the most. In practice, we employ the MSG technique to merge multiple pairs of communities simultaneously (as illustrated in Fig. ?). Merging communities takes us out of the local minimum, after which we perform another round of label propagation using the new update rule. This is analogous to moving downhill into another local minimum. However, the new local minimum is not guaranteed to be good enough. Hence the above process is repeated until $L$ no longer decreases.
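The merging stage can be illustrated with a deliberately simplified stand-in for MSG: the sketch below merges only one pair of communities per step (the pair that lowers the description length the most), whereas MSG [29] merges several pairs simultaneously. It again recomputes the description length from scratch, and all function names are hypothetical.

```python
from math import log2

def plogp(x):
    return x * log2(x) if x > 0 else 0.0

def description_length(adj, labels):
    """Expanded map equation for an undirected, unweighted graph."""
    two_m = sum(len(n) for n in adj.values())
    q, vol = {}, {}
    for v, nbrs in adj.items():
        c = labels[v]
        vol[c] = vol.get(c, 0.0) + len(nbrs) / two_m
        q[c] = q.get(c, 0.0) + sum(1 for u in nbrs if labels[u] != c) / two_m
    return (plogp(sum(q.values())) - 2 * sum(plogp(x) for x in q.values())
            - sum(plogp(len(n) / two_m) for n in adj.values())
            + sum(plogp(q[c] + vol[c]) for c in q))

def greedy_merge(adj, labels):
    """Repeatedly merge the single best pair of adjacent communities."""
    labels = dict(labels)
    current = description_length(adj, labels)
    while True:
        # only communities joined by at least one edge are merge candidates
        pairs = {tuple(sorted((labels[u], labels[v])))
                 for u in adj for v in adj[u] if labels[u] != labels[v]}
        best_pair, best_len = None, current
        for a, b in sorted(pairs):
            trial = {v: (a if lab == b else lab) for v, lab in labels.items()}
            length = description_length(adj, trial)
            if length < best_len - 1e-12:
                best_pair, best_len = (a, b), length
        if best_pair is None:                # no merge shortens the description
            return labels, current
        a, b = best_pair
        labels = {v: (a if lab == b else lab) for v, lab in labels.items()}
        current = best_len

graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
singletons = {v: v for v in graph}
merged, final_length = greedy_merge(graph, singletons)
```

In the full algorithm, a stage of this kind would alternate with label propagation until neither step shortens the description any further.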

To avoid unnecessary updates in each iteration of LPAf, the incomplete update condition proposed in Ref. [31] was adopted. Consequently, we only update the labels of the active nodes which would change their labels if they attempt to update. A list containing all currently active nodes is maintained to allow the algorithm to finish execution when the list is empty (i.e., we only track the nodes that potentially change their labels). The pseudo-code of our algorithm is presented in Algorithm ?.
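The bookkeeping of active nodes can be sketched as below. To keep the example short it uses the plain majority-label rule of LPA rather than the description-length rule, so it illustrates only the active-list mechanism; the queue-based layout and the function names are assumptions of this sketch and are not taken from Ref. [31].

```python
import random
from collections import Counter, deque

def top_labels(adj, labels, v):
    counts = Counter(labels[u] for u in adj[v])
    best = max(counts.values())
    return [lab for lab, c in counts.items() if c == best]

def is_active(adj, labels, v):
    # active = the node would change its label if it attempted an update
    return bool(adj[v]) and labels[v] not in top_labels(adj, labels, v)

def propagate_active(adj, seed=0):
    rng = random.Random(seed)
    labels = {v: v for v in adj}
    queue = deque(v for v in adj if is_active(adj, labels, v))
    queued = set(queue)
    while queue:                              # finish when no active node is left
        v = queue.popleft()
        queued.discard(v)
        if not is_active(adj, labels, v):
            continue                          # became passive while waiting
        labels[v] = rng.choice(top_labels(adj, labels, v))
        for u in adj[v]:                      # only neighbors can change status
            if u not in queued and is_active(adj, labels, u):
                queue.append(u)
                queued.add(u)
    return labels

def clique(nodes):
    """Hypothetical helper: complete graph on the given nodes."""
    return {v: set(nodes) - {v} for v in nodes}

graph = {**clique([0, 1, 2, 3]), **clique([4, 5, 6, 7])}
labels = propagate_active(graph)
```

With the majority rule, each accepted update strictly increases the number of edges whose endpoints share a label, so the queue is guaranteed to empty. Nodes whose labels are already among the most frequent in their neighborhood are never revisited unless a neighbor changes.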

## 3 Results

Many metrics have been proposed to quantify the quality of a network partition. When the ground truth is unknown, a common measure for the significance of the identified community structure is *modularity* [5], which is defined as

$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - P_{ij} \right) \delta(C_i, C_j),$$

where $m$ is the total number of edges in the network, $A_{ij} = 1$ if nodes $i$ and $j$ are connected and 0 otherwise, $P_{ij}$ is the probability in the null model that an edge exists between nodes $i$ and $j$, and $\delta$ is the Kronecker function: two vertices $i$ and $j$ provide a non-zero contribution to the value of $Q$ if and only if they belong to the same community. The concept of *modularity* is based on the idea that a random graph is not expected to exhibit community structure.
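For concreteness, modularity can be computed as in the sketch below. It assumes the commonly used configuration-model null term, in which the expected number of edges between two nodes is the product of their degrees divided by twice the number of edges; the text above does not fix the null model, so this choice and the function name `modularity` are assumptions.

```python
def modularity(adj, labels):
    """Newman-Girvan modularity for an undirected, unweighted graph.

    adj    : {node: set_of_neighbors}
    labels : {node: community label}
    Assumes the configuration-model null term k_i * k_j / (2m).
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())
    internal, volume = {}, {}
    for v, nbrs in adj.items():
        c = labels[v]
        volume[c] = volume.get(c, 0) + len(nbrs)            # sum of degrees in c
        # ordered pairs (v, u) inside c: each internal edge is counted twice
        internal[c] = internal.get(c, 0) + sum(1 for u in nbrs if labels[u] == c)
    return sum(internal[c] / two_m - (volume[c] / two_m) ** 2 for c in volume)

# Two triangles joined by one edge.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
q_split = modularity(graph, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
q_whole = modularity(graph, {v: "all" for v in graph})
```

Here the two-triangle partition has modularity 5/14, while placing all nodes in one community gives zero, in line with the idea that a partition is significant only if it has more internal edges than the null model predicts.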

For a more thorough assessment of the significance of detected communities, we also adopt the *modularity density* [32] and the *conductance* [33] metrics.

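Given an undirected network, the modularity density is defined as

$$Q_{ds} = \sum_{c_i \in C} \left[ \frac{|E_{c_i}^{in}|}{|E|} d_{c_i} - \left( \frac{2|E_{c_i}^{in}| + |E_{c_i}^{out}|}{2|E|} d_{c_i} \right)^{2} - \sum_{c_j \in C,\, c_j \neq c_i} \frac{|E_{c_i,c_j}|}{2|E|} d_{c_i,c_j} \right],$$

$$d_{c_i} = \frac{2|E_{c_i}^{in}|}{|c_i|\,(|c_i| - 1)}, \qquad d_{c_i,c_j} = \frac{|E_{c_i,c_j}|}{|c_i|\,|c_j|},$$

where $C$ is the set of all the communities, $c_i$ is any given community in $C$, $d_{c_i}$ is the internal density of community $c_i$, $d_{c_i,c_j}$ is the pair-wise density between communities $c_i$ and $c_j$, $|E_{c_i}^{in}|$ is the number of edges between nodes within community $c_i$, $|E_{c_i}^{out}|$ is the number of edges from the nodes in community $c_i$ to the nodes of other communities, $|E_{c_i,c_j}|$ is the number of edges between communities $c_i$ and $c_j$, and $|E|$ is the total number of edges. Compared to modularity, the *modularity density* is an improved measure for assessing the quality of communities, since it does not suffer from the well-known resolution limit of modularity.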

For a community $c$, the conductance is defined as

$$\phi(c) = \frac{\sum_{i \in c} \sum_{j \notin c} A_{ij}}{\sum_{i \in c} k_i},$$

where $k_i$ is the degree of node $i$. Informally, *conductance* is the fraction of total edge volume that points outside the community $c$. Lower values of *conductance* imply that the communities have more internal connections than external ones, and thus represent more significant communities. Because conductance cannot be easily extended to the entire community structure of a network, results are commonly assessed at different scales separately in the form of *network community profile (NCP)* [34] plots.
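A small sketch of the conductance of a single community is given below. It follows the informal description above (the fraction of the community's edge volume that points outside it); other references normalize by the smaller of the volumes of the community and its complement, so the normalization used here is an assumption, as is the function name.

```python
def conductance(adj, community):
    """Fraction of the community's edge volume that leaves the community.

    adj       : {node: set_of_neighbors}, undirected and unweighted
    community : iterable of nodes
    """
    community = set(community)
    volume = sum(len(adj[v]) for v in community)             # sum of degrees
    cut = sum(1 for v in community for u in adj[v] if u not in community)
    return cut / volume if volume else 0.0

# Two triangles joined by one edge: each triangle has volume 7 and one outgoing edge.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi_triangle = conductance(graph, {0, 1, 2})
phi_single = conductance(graph, {2})
```

A lone node always has conductance 1 under this definition, whereas the triangle reaches 1/7, reflecting its much stronger internal connectivity.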

For networks with known community structures, two metrics from the field of information theory [35] are adopted to compare identified communities with the true ones. The first one, *normalized mutual information (NMI)* [36], estimates the amount of information correctly extracted by the detection algorithms and has become a de facto standard for quantifying the quality of a detected partition with respect to the ground truth. It is defined as

$$I_{\mathrm{norm}}(X, Y) = \frac{2\, I(X, Y)}{H(X) + H(Y)},$$

where $X$ and $Y$ denote two partitions of the network, $I(X, Y) = H(X) - H(X|Y)$, $H(X)$ is the Shannon entropy of $X$ and $H(X|Y)$ is the conditional entropy of $X$ given $Y$. *NMI* equals 1 if the detected partition is identical to the real one, whereas it has an expected value of 0 if the detected partition is totally independent of the real one.

The second metric is the *variation of information (VOI)* [37], which has several desirable properties compared with *NMI*. Specifically, it can be regarded as a kind of distance in the space of partitions. The *VOI* of $X$ and $Y$ is defined as

$$V(X, Y) = H(X|Y) + H(Y|X).$$

Thus, lower values represent higher similarities between partitions. The value of *VOI* ranges from 0 to $\log n$, where $n$ is the network size. Therefore, we divide the obtained values by $\log n$ for meaningful comparisons.
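Both measures can be computed from label counts alone, as in the sketch below. It assumes the usual forms of the two quantities: NMI as twice the mutual information divided by the sum of the two entropies, and VOI as the sum of the two conditional entropies, divided by the logarithm of the network size. Partitions are passed as equal-length label sequences, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(counts, n):
    """Shannon entropy (bits) of a distribution given by occurrence counts."""
    return -sum(c / n * log2(c / n) for c in counts.values())

def nmi_and_voi(x, y):
    """Return (NMI, VOI / log2 n) for two partitions of the same n > 1 nodes."""
    n = len(x)
    h_x = entropy(Counter(x), n)
    h_y = entropy(Counter(y), n)
    h_xy = entropy(Counter(zip(x, y)), n)        # joint entropy H(X, Y)
    mutual = h_x + h_y - h_xy                    # I(X, Y) = H(X) - H(X|Y)
    # two trivial partitions carry no entropy; treat them as identical
    nmi = 2 * mutual / (h_x + h_y) if h_x + h_y > 0 else 1.0
    voi = (h_xy - h_y) + (h_xy - h_x)            # H(X|Y) + H(Y|X)
    return nmi, voi / log2(n)

same = nmi_and_voi([0, 0, 1, 1], [5, 5, 7, 7])   # same grouping, different names
indep = nmi_and_voi([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent
```

The first call returns an NMI of 1 and a normalized VOI of 0, since the two partitions group the nodes identically; the second returns an NMI of 0 and a normalized VOI of 1.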

We have tested our algorithm on both synthetic and real-world networks. For comparison, five algorithms are included in the experiments as references: the original LPA [22], the neighbor strength driven LPA (nsdLPA) [31], the Louvain method [14], the Infomap algorithm [30], and the fine-tuned modularity density algorithm (FineTune) [38]. The nsdLPA enhances the basic LPA by taking into account the positive neighborhood strength, and is generally efficient in practice [31]. The Louvain method is a greedy optimization algorithm that attempts to optimize the modularity of a partition; it usually produces high modularity values and is by far one of the most widely used methods for detecting communities in large networks [14]. The Infomap algorithm decomposes a network into communities by compressing a description of information flow on the network, as mentioned above [30]. The FineTune algorithm iteratively attempts to improve the modularity density by splitting and merging the communities of a given network partition [38].

### 3.1 Tests on synthetic networks

We first tested our method on the well-known GN benchmark [39] and compared the results with those of the other methods. The GN benchmark network consists of 128 nodes, each with expected degree 16, which are divided into four groups of 32 nodes each. The mixing parameter $\mu$ measures the ratio of the external degree of a node with respect to its community to the total degree of the node.
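A GN-style benchmark graph can be generated as sketched below. The sketch assumes that edges are placed independently, with an intra-group probability and an inter-group probability chosen so that every node has expected degree 16 and an expected fraction of external links equal to the mixing parameter. The function name and its parameters are hypothetical and are not taken from Ref. [39].

```python
import random

def gn_benchmark(mu, seed=0, groups=4, size=32, degree=16):
    """Planted-partition graph with equal-sized groups.

    mu is the expected fraction of a node's links that leave its group.
    Returns (adjacency dict, ground-truth labels).
    """
    rng = random.Random(seed)
    n = groups * size
    p_in = degree * (1 - mu) / (size - 1)     # expected internal degree: degree * (1 - mu)
    p_out = degree * mu / (n - size)          # expected external degree: degree * mu
    truth = {v: v // size for v in range(n)}
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if truth[u] == truth[v] else p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, truth

adj, truth = gn_benchmark(mu=0.2)
total_degree = sum(len(nbrs) for nbrs in adj.values())
mean_degree = total_degree / len(adj)
external = sum(1 for u in adj for v in adj[u] if truth[u] != truth[v]) / total_degree
```

The realized mean degree and external fraction fluctuate around 16 and the chosen mixing parameter, respectively, from one random seed to another.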

The results of different methods on the GN benchmark networks are shown in Fig. ?. As can be seen, the Louvain method performs fairly well on the GN benchmark network. This indicates that the community size of the GN benchmark network is not below the resolution limit, and the optimization of modularity indeed reveals the true partitions. LPAf ranks next to the Louvain method and performs significantly better than the other four methods. All the methods except Louvain and FineTune arrive at the same stable value at high $\mu$, which corresponds to the trivial partition. LPAf cannot detect the real communities in this range by minimizing $L$, because the trivial partition has a lower $L$ than the real partition.

We also adopted the LFR benchmark [28], which is a special case of the planted partition model [40]. LFR networks are similar to real-world networks, since they are characterized by heterogeneous distributions of node degrees and community sizes. In our experiments, the parameters are fixed as follows: node degrees and community sizes follow power laws with exponents -2 and -1, respectively; the maximum degree is 50; the ranges of community sizes are [10,50] and [20,100] for smaller and bigger communities, respectively; the network size is either 1000 or 5000. The significance of the community structure is controlled by a mixing parameter $\mu$, the expected fraction of links of a node that connect to other communities; smaller values of $\mu$ correspond to more pronounced community structure.

Results are assessed in terms of average NMI, shown in Fig. ?. LPAf consistently outperforms the other methods over a wide range of $\mu$. In contrast to the GN benchmark, the Louvain method fails to detect the real communities even when $\mu$ is small for larger networks with smaller communities. This is due to the well-known resolution limit of modularity, i.e., there exists a size cutoff below which modularity cannot identify communities [26]. In order to optimize modularity, the Louvain method tends to merge natural communities into much larger ones, which leads to rather poor performance. FineTune does not perform remarkably either, as it also starts to fail at low values of $\mu$. The nsdLPA performs better than LPA owing to its consideration of the positive neighbor strength. Infomap performs comparably with LPAf on larger networks but is outperformed by LPAf when networks are small. Moreover, LPAf is more stable than LPA, as shown by the lower standard deviation of its NMI scores. The results thus confirm that LPAf performs better than, or at least as well as, the other five methods on all the LFR networks.

To further assess the validity of LPAf, we also computed the average ratio of the number of detected communities to the number of actual ones; the results are shown in Fig. ?. As can be seen, the number of communities detected by LPAf is very close to the actual one up to a high $\mu$ in all cases. The number of communities detected by nsdLPA is larger than the actual one at high values of $\mu$, which implies that nsdLPA tends to form local subgroups and favors smaller communities owing to its consideration of neighborhood strength. The Louvain method tends to find fewer communities than planted ones because of the resolution limit of modularity, whereas FineTune detects more communities than actual ones in most cases. This indicates that FineTune resolves, to a certain degree, the resolution problem of the Louvain method. In most cases, Infomap tends to find slightly fewer communities than actual ones.

To compare the computational loads of different methods, we plot the average elapsed times in Fig. ?. Generally, the running times of all methods increase as $\mu$ grows. This is because, when $\mu$ is small, the communities are well separated and all the methods can detect them quickly. When $\mu$ increases to a value at which the community structure still persists but is much more difficult to reveal, convergence slows down, which produces the peaks of the curves. When $\mu$ increases further, most of the methods cannot detect non-trivial communities and converge slightly faster than at the transition stage. Specifically, LPA and nsdLPA are faster than the other four algorithms. LPAf, Louvain and Infomap exhibit similar running-time patterns.

To test how well LPAf performs in finding the local minimum in $L$-space, we computed the values of $L$ for the partitions detected by LPAf and plotted them in Fig. ?. Since the global minimum of $L$ is not available, the $L$-values of the planted partitions are adopted as references. As can be seen, when $\mu$ is small, i.e., the community structure is clear enough, the detected partitions and the true partitions have almost the same values of $L$, which indicates that LPAf correctly finds the real communities in the corresponding range of $\mu$. When $\mu$ increases to a specific value, $L$ decreases rapidly to a stable value on smaller networks, which corresponds to the trivial partition in which the whole network is regarded as a single community. As the trivial partition has a lower value of $L$ than the planted one above a certain value of $\mu$, LPAf cannot detect any non-trivial communities within this range. On larger networks, however, LPAf yields a larger $L$ than that of the planted partition above a certain value of $\mu$, which implies that LPAf is trapped in a suboptimal valley in $L$-space.

### 3.2 Tests on real-world networks

We also applied the algorithms to several real-world networks that are commonly used for tests. The details of such networks are listed in Table ?.

We first compared the stability of the different methods directly. Each method is applied to each network 1000 times and the number of distinct detected partitions is reported. The pairwise VOI of the partitions is also computed to further evaluate the robustness of the methods. FineTune is not considered here since it is a deterministic algorithm. Because of the time complexity, the two larger networks, *power* and *mat-cond*, are excluded from the analysis. Results are shown in Tables ? and ?. LPAf is comparatively stable, with fewer distinct partitions in most cases. LPA and Infomap are relatively unstable, even on smaller networks. The Louvain method and nsdLPA have similar robustness, except on the *netsci* network, where the Louvain method yields the most stable results. Moreover, as shown in Table ?, the values of pairwise VOI between the partitions revealed by LPAf are lower than those for other methods in most cases. We conclude that LPAf is significantly more robust than LPA and is fairly stable.

Next, we analyzed in detail the three networks (*karate*, *dolphins*, and *football*) that have known community structures. Fig. ? shows the communities detected by LPAf on the *karate* and *dolphins* networks with the lowest $L$. Zachary's karate club is a social network of friendships between 34 members of a karate club at a US university in the 1970s. The club split into two smaller clubs after a dispute between club president John (node 34) and instructor Mr. Hi (node 1). As can be seen, three communities are discovered in this network by our algorithm. One of the two real communities is divided into two small ones (as shown in Fig. ? (left panel)). The dolphin social network describes the frequent associations between 62 dolphins living off Doubtful Sound, New Zealand. A link indicates that two dolphins were observed together more often than expected by chance during the years from 1994 to 2001. The four communities identified by our algorithm in this network are shown in Fig. ? (right panel).

The *football* network describes football games among Division IA colleges during the regular season of Fall 2000. As shown in Fig. ?, the 115 nodes in the network represent teams, which are grouped into eleven different conferences, except for five independent teams. The regular season games between pairs of teams are shown as the 613 edges of the network. Our algorithm identifies eleven communities within this network, as shown in Fig. ?. Among them, eight conferences are correctly identified. The three remaining communities closely resemble the Conference USA, Sun Belt and Western Athletic conferences. The five independent teams that do not belong to any conference tend to be grouped with the conferences with which they are most closely associated.

For comparison, we applied the different methods to the *karate*, *dolphins*, *books* and *football* networks, and measured the NMI between the real partitions and those detected by each method. The average values of NMI over 1000 runs are shown in Table ?. FineTune is deterministic and thus we run it only once. As one can see, LPAf performs fairly well on the *karate* and the *football* networks, although it is not the best. However, it does not work well on the other two networks. The reason could be that the known partitions of these two networks do not have the lowest values of $L$, which prevents LPAf from detecting the real communities on these networks.

In Table ?, we also report the average modularity of the detected partitions for all networks so as to enable a complete comparison. It is not surprising that the Louvain method yields the highest modularity values on almost all networks, because it is based on the optimization of modularity. Therefore, for clarity, we show the results of the Louvain method in the rightmost column of the table and mark the best results of the remaining methods in bold type. As one can see, in most cases LPAf achieves the best performance among the methods that do not directly optimize modularity.

In Table ?, we present the average modularity density of the partitions detected by the different methods. FineTune is based on the optimization of modularity density. Therefore, we show the results of FineTune in the last column of the table and highlight the best results of the remaining methods in bold type. As one can see, LPAf performs quite well in terms of modularity density in most cases.

In Table ?, we compare the different methods in terms of $L$. The $L$-values of the true partitions are presented as references. As can be seen, LPAf achieves the best performance in most cases. It should be pointed out that the true partitions do not possess the global minimum of $L$. On some networks, LPAf always obtains a lower $L$ than that of the true partition. This explains why LPAf cannot detect the real communities correctly on these networks.

Lastly, we further analyzed the two larger networks, *power* and *mat-cond*. For simplicity, we only compared LPA and LPAf. We ran each method 100 times and analyzed the conductances of the detected communities at various scales. The results are given in the form of NCP plots, as shown in Fig. ?. NCP plots evaluate the quality of the best community (in terms of conductance) as a function of its size. Previous studies show that many kinds of real-world networks exhibit a common characteristic structure of NCP plots, i.e., an initial decreasing trend followed by an increasing one [34].

In the case of the *power* network, LPAf detects communities on a much broader scale with significantly lower conductances, including larger communities with around 80 nodes. On the *mat-cond* network, both LPAf and LPA find the best communities at the same scale (i.e., at around 15 nodes), while the conductances of LPAf are slightly lower than those of LPA. Note that LPA reveals a number of larger communities with significantly higher conductances in both networks (i.e., blue circles in the top right part of the top two plots of Fig. ?). A possible reason is that the many tie-breaks encountered in the label propagation process contribute to the formation of some large communities with high conductances.

### 3.3 Time complexity

Given a network with nodes and edges, let be the maximum degree of nodes in this network. The time complexity of each step of LPAf is roughly estimated as follows:

Initialization

takes time of . Assigning a unique label to each node takes time of .

Label propagation

takes time at most . For each node, it iterates through at most neighbors, thus, the upper bound of cost time of this step is .

Merging communities

takes time at most . Merging pairs of communities using MSG requires a time of in the worst case (see Ref. [29] for detailed analysis).

Steps 2 and 3 are repeated, so the time per iteration is . Consequently, the time complexity of LPAf is roughly .

To evaluate the efficiency of LPAf, we ran LPA, nsdLPA, LPAf, Infomap and the Louvain method on LFR networks of different sizes. Because of its high time complexity, FineTune is not considered in this benchmark. We repeated each experiment 30 times and report the average running times. As shown in Fig. ?, the running times of LPA and nsdLPA are considerably lower than those of the other three methods. Still, all methods exhibit near-linear time complexity and can be easily scaled to larger networks.

## 4 Conclusion

In this paper, we propose a modified label propagation algorithm (LPAf) to detect community structures in networks. In this algorithm, we introduce a new update rule which updates the label of a node by compressing a description of probability flow. In addition, by employing a multi-step greedy agglomerative algorithm, we merge pairs of communities so as to escape local minima in $L$-space. Furthermore, an incomplete update condition is adopted to accelerate convergence.

We test the proposed algorithm on both synthetic and real-world networks, and compare its performance with that of five other widely used methods in terms of *modularity*, *modularity density*, NMI, VOI and *conductance*. First, we find that LPAf performs very well on synthetic networks. In contrast to the Louvain method, LPAf is able to detect small communities in large networks. Second, we find that LPAf detects communities with lower conductances than those found by LPA; that, because it minimizes $L$, LPAf may fail to detect a real community structure that does not have the lowest $L$; and that LPAf is generally more stable than LPA. Finally, we analyze the time complexity of LPAf and find that it depends linearly on the network size in sparse networks.

In future work, we intend to test our algorithm on weighted and directed networks. We also plan to extend our approach to overlapping community detection by allowing each node to possess multiple labels.

## Acknowledgements

This work was in part supported by the Program of Introducing Talents of Discipline to Universities under grant no. B08033, and National Natural Science Foundation of China (Grant No. 11505071).

## Author contribution statement

J.H. designed the algorithm, implemented the experiments, and prepared all the figures. J.H., L.Z. and Z.S. analyzed the results. All authors wrote, reviewed and approved the manuscript.

## Appendix A: The change of average description length when a node moves from one community to another

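From Eq. (4), for undirected and unweighted networks, the change of average description length when a node $\alpha$ updates its label from $i$ to $j$ is given by

$$\Delta L = q'_{\curvearrowright} \log q'_{\curvearrowright} - q_{\curvearrowright} \log q_{\curvearrowright} - 2 \sum_{c \in \{i,j\}} \left( q'_{c\curvearrowright} \log q'_{c\curvearrowright} - q_{c\curvearrowright} \log q_{c\curvearrowright} \right) + \sum_{c \in \{i,j\}} \left( p'_{c\circlearrowright} \log p'_{c\circlearrowright} - p_{c\circlearrowright} \log p_{c\circlearrowright} \right),$$

with

$$q'_{i\curvearrowright} = q_{i\curvearrowright} - \frac{k_{\alpha} - 2|N_{\alpha} \cap i|}{2m}, \qquad q'_{j\curvearrowright} = q_{j\curvearrowright} + \frac{k_{\alpha} - 2|N_{\alpha} \cap j|}{2m},$$

$$q'_{\curvearrowright} = q_{\curvearrowright} + \frac{|N_{\alpha} \cap i| - |N_{\alpha} \cap j|}{m},$$

$$p'_{i\circlearrowright} = p_{i\circlearrowright} - \frac{k_{\alpha} - |N_{\alpha} \cap i|}{m}, \qquad p'_{j\circlearrowright} = p_{j\circlearrowright} + \frac{k_{\alpha} - |N_{\alpha} \cap j|}{m},$$

where $m$ is the total number of edges of the network, $i$ and $j$ are the sets of nodes in communities $i$ and $j$ respectively, $k_{\alpha}$ is the degree of node $\alpha$, and $N_{\alpha}$ is the set of neighbors of $\alpha$. Extension to directed and weighted networks is straightforward.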
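The incremental bookkeeping can be checked numerically. The sketch below assumes an undirected, unweighted network without self-loops, for which moving a node changes the exit probabilities of its old and new communities only through the node's degree and the number of its neighbors in each of the two communities. It compares the incremental change with a brute-force recomputation; for brevity the community statistics are rebuilt inside `delta_length`, whereas an actual implementation would store and update them. All function names are hypothetical.

```python
from math import log2

def plogp(x):
    return x * log2(x) if x > 0 else 0.0

def community_stats(adj, labels):
    """Exit probability q_c and p_c = q_c + (sum of visit rates) per community."""
    two_m = sum(len(n) for n in adj.values())
    q, p = {}, {}
    for v, nbrs in adj.items():
        c = labels[v]
        cut = sum(1 for u in nbrs if labels[u] != c)
        q[c] = q.get(c, 0.0) + cut / two_m
        p[c] = p.get(c, 0.0) + (cut + len(nbrs)) / two_m
    return q, p

def partition_terms(q, p):
    """Partition-dependent part of the expanded map equation."""
    return (plogp(sum(q.values())) - 2 * sum(map(plogp, q.values()))
            + sum(map(plogp, p.values())))

def delta_length(adj, labels, node, new):
    """Change of description length when `node` moves to existing community `new`."""
    old = labels[node]
    if new == old:
        return 0.0
    two_m = sum(len(n) for n in adj.values())
    q, p = community_stats(adj, labels)       # recomputed here only for brevity
    k = len(adj[node])
    k_old = sum(1 for u in adj[node] if labels[u] == old)
    k_new = sum(1 for u in adj[node] if labels[u] == new)
    q_tot = sum(q.values())
    q_old2 = q[old] - (k - 2 * k_old) / two_m
    q_new2 = q[new] + (k - 2 * k_new) / two_m
    p_old2 = p[old] - 2 * (k - k_old) / two_m
    p_new2 = p[new] + 2 * (k - k_new) / two_m
    q_tot2 = q_tot + 2 * (k_old - k_new) / two_m
    return (plogp(q_tot2) - plogp(q_tot)
            - 2 * (plogp(q_old2) + plogp(q_new2) - plogp(q[old]) - plogp(q[new]))
            + plogp(p_old2) + plogp(p_new2) - plogp(p[old]) - plogp(p[new]))

graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
before = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
after = dict(before)
after[2] = "b"
incremental = delta_length(graph, before, 2, "b")
brute_force = (partition_terms(*community_stats(graph, after))
               - partition_terms(*community_stats(graph, before)))
```

Moving node 2 out of its triangle increases the description length in this example, so the update rule would reject the move.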

### References

1. M.E.J. Newman, Proceedings of the National Academy of Sciences of the United States of America **98**, 404 (2001), `0007214`
2. J. Scott, *Social Network Analysis: A Handbook*, Vol. 3 (2000), ISBN 0761963391, `http://www.amazon.com/dp/0761963391`
3. D.A. Fell, A. Wagner, Nat Biotech **18**, 1121 (2000)
4. S. Fortunato, Physics Reports **486**, 75 (2010)
5. M.E.J. Newman, M. Girvan, Phys. Rev. E **69**, 026113 (2004)
6. M.E.J. Newman, Phys. Rev. E **74**, 036104 (2006)
7. S. White, P. Smyth, *A Spectral Clustering Approach To Finding Communities in Graph.*, in *SDM* (2005), `citeseer.ist.psu.edu/734075.html`
8. J. Reichardt, S. Bornholdt, Phys. Rev. Lett. **93**, 218701 (2004)
9. S.W. Son, H. Jeong, J.D. Noh, The European Physical Journal B - Condensed Matter and Complex Systems **50**, 431 (2006)
10. J.M. Kumpula, J. Saramäki, K. Kaski, J. Kertész, The European Physical Journal B **56**, 41 (2007)
11. H. Zhou, R. Lipowsky, in *Computational Science - ICCS 2004*, edited by M. Bubak, G. van Albada, P. Sloot, J. Dongarra (Springer Berlin Heidelberg, 2004), Vol. 3038 of *Lecture Notes in Computer Science*, pp. 1062–1069, ISBN 978-3-540-22116-6, `http://dx.doi.org/10.1007/978-3-540-24688-6_137`
12. P. Pons, M. Latapy, in *Computer and Information Sciences - ISCIS 2005*, edited by P. Yolum, T. Güngör, F. Gürgen, C. Özturan (Springer Berlin Heidelberg, 2005), Vol. 3733 of *Lecture Notes in Computer Science*, pp. 284–293, ISBN 978-3-540-29414-6, `http://dx.doi.org/10.1007/11569596_31`
13. J.K. Ochab, Z. Burda, Eur. Phys. J. Special Topics **216**, 73 (2013)
14. V.D. Blondel, J.L. Guillaume, R. Lambiotte, E. Lefebvre, Journal of Statistical Mechanics: Theory and Experiment **2008**, P10008 (2008)
15. M.J. Barber, Eur. Phys. J. B **86**, 385 (2013)
16. L. Waltman, N.J. van Eck, Eur. Phys. J. B **86**, 471 (2013)
17. J. Xiang, X.G. Hu, X.Y. Zhang, J.F. Fan, X.L. Zeng, G.Y. Fu, K. Deng, K. Hu, Eur. Phys. J. B **85**, 352 (2012)
18. M.E.J. Newman, E.A. Leicht, Proceedings of the National Academy of Sciences **104**, 9564 (2007), `http://www.pnas.org/content/104/23/9564.full.pdf`
19. M. Mungan, J.J. Ramasco, Journal of Statistical Mechanics: Theory and Experiment **2010**, P04028 (2010)
20. W. Ren, G. Yan, X. Liao, L. Xiao, Phys. Rev. E **79**, 036111 (2009)
21. J.M. Hofman, C.H. Wiggins, Phys. Rev. Lett. **100**, 258701 (2008)
22. U.N. Raghavan, R. Albert, S. Kumara, Phys. Rev. E **76**, 036106 (2007)
23. G. Tibély, J. Kertész, Physica A: Statistical Mechanics and its Applications **387**, 4982 (2008)
24. M.J. Barber, J.W. Clark, Phys. Rev. E **80**, 026129 (2009)
25. X. Liu, T. Murata, Physica A: Statistical Mechanics and its Applications **389**, 1493 (2010)
26. S. Fortunato, M. Barthélemy, Proceedings of the National Academy of Sciences **104**, 36 (2007), `http://www.pnas.org/content/104/1/36.full.pdf`
27. I.X.Y. Leung, P. Hui, P. Liò, J. Crowcroft, Phys. Rev. E **79**, 066107 (2009)
28. A. Lancichinetti, S. Fortunato, F. Radicchi, Phys. Rev. E **78**, 046110 (2008)
29. A. Oades, Physical Review E - Statistical, Nonlinear, and Soft Matter Physics **77**, 1 (2008), `0712.1163`
30. M. Rosvall, D. Axelsson, C.T. Bergstrom, The European Physical Journal Special Topics **178**, 13 (2009)
31. J. Xie, B. Szymanski, *Community detection using a neighborhood strength driven Label Propagation Algorithm*, in *Network Science Workshop (NSW), 2011 IEEE* (2011), pp. 188–195
32. M. Chen, T. Nguyen, B.K. Szymanski, *On Measuring the Quality of a Network Community Structure*, in *Social Computing (SocialCom), 2013 International Conference on* (2013), pp. 122–127
33. B. Bollobas, *Modern Graph Theory*, Graduate Texts in Mathematics (Springer New York, 1998), ISBN 9780387984889, `https://books.google.ca/books?id=SbZKSZ-1qrwC`
34. J. Leskovec, K.J. Lang, A. Dasgupta, M.W. Mahoney, Internet Mathematics **6**, 29 (2009)
35. D.J. MacKay, *Information Theory, Inference and Learning Algorithms* (Cambridge University Press, 2003)
36. L. Danon, A. Díaz-Guilera, J. Duch, A. Arenas, Journal of Statistical Mechanics: Theory and Experiment **2005**, P09008 (2005)
37. M. Meilă, Journal of Multivariate Analysis **98**, 873 (2007)
38. M. Chen, K. Kuzmin, B.K. Szymanski, IEEE Transactions on Computational Social Systems **1**, 46 (2014)
39. M. Girvan, M.E.J. Newman, Proceedings of the National Academy of Sciences **99**, 7821 (2002), `http://www.pnas.org/content/99/12/7821.full.pdf`
40. A. Condon, R.M. Karp, Random Structures & Algorithms **18**, 116 (2001)
41. W.W. Zachary, Journal of Anthropological Research **33**, 452 (1977)
42. D. Lusseau, K. Schneider, O. Boisseau, P. Haase, E. Slooten, S. Dawson, Behavioral Ecology and Sociobiology **54**, 396 (2003)
43. V. Krebs (2008)
44. M.E.J. Newman, SIAM Review **45**, 167 (2003), `http://dx.doi.org/10.1137/S003614450342480`
45. L.A. Adamic, N. Glance, *The Political Blogosphere and the 2004 U.S. Election: Divided They Blog*, in *Proceedings of the 3rd International Workshop on Link Discovery* (ACM, New York, NY, USA, 2005), LinkKDD ’05, pp. 36–43, ISBN 1-59593-215-1, `http://doi.acm.org/10.1145/1134271.1134277`
46. D.J. Watts, S.H. Strogatz, Nature **393**, 440 (1998)