Community detection by label propagation with compression of flow
Abstract
The label propagation algorithm (LPA) has been proved to be a fast and effective method for detecting communities in large complex networks. However, its performance suffers from unstable and trivial solutions. In this paper, we propose a modified label propagation algorithm, LPAf, to efficiently detect community structures in networks. Instead of the majority voting rule of the basic LPA, LPAf updates the label of a node by considering the compression of a description of random walks on a network. A multi-step greedy agglomerative strategy is employed to enable LPAf to escape local optima. Furthermore, an incomplete update condition is adopted to speed up convergence. Experimental results on both synthetic and real-world networks confirm the effectiveness of our algorithm.
PACS: 89.75.Fb Structures and organization in complex systems; 89.75.Hc Networks and genealogical trees
1 Introduction
Real-life complex systems in many research fields, such as biology, sociology, economics and computer science, can be studied as networks, with nodes representing individuals and links representing interactions or relations between them. Many networks exhibit the so-called community structure: nodes tend to organize themselves into groups such that connections are dense within groups and sparse between groups. Community structure is a prominent feature of complex networks, as it often represents functional modules whose nodes share common properties, and it accounts for the functionality of the system. Community detection enables us to probe the organization and functional behavior of real-world systems; it has therefore attracted much attention and has been applied to many kinds of networks, including collaboration networks Newman2001 (), social networks Scott2000 () and biological networks Fell2000 ().
Community detection has been studied as graph partitioning in computer science for decades and remains quite challenging. Algorithms that detect communities of reasonably good quality have been proposed and improved extensively Fortunato201075 (), especially in recent years. Examples include the Girvan-Newman algorithm PhysRevE.69.026113 (), spectral clustering PhysRevE.74.036104 (); White05 (), multistate spin models PhysRevLett.93.218701 (); Son2006 (); Kumpula2007 () (e.g., the q-state Potts model), random walks Bubak2004 (); Pons2005 (); Ochab2013 (), modularity optimization 17425468200810P10008 (); Barber2013 (); Waltman2013 (); Xiang2012 () and statistical inference Newman05062007 (); 17425468201004P04028 (); PhysRevE.79.036111 (); PhysRevLett.100.258701 ().
As one of the fastest algorithms for community detection, the label propagation algorithm (LPA) PhysRevE.76.036106 () uses the network structure alone to guide its process and requires neither parameters nor the optimization of any objective function. It starts by assigning each node a unique label indicating the community it belongs to. At every label propagation step, each node sequentially updates its label to the one held by most of its neighbors. If more than one label is the most frequent, the new label is chosen randomly among them. The label propagation step is performed iteratively until each node has a label that is the most frequent among its neighbors' labels. Through this iterative process, densely connected groups of nodes reach consensus on one label and thus form communities. LPA converges when no node changes its label anymore, and nodes with the same label are then classified into the same community. In addition to its nearly linear time complexity, LPA introduces no parameters and requires no a priori information about communities, and is thus suitable for processing large-scale networks with millions of nodes and edges.
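The asynchronous procedure described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the reference implementation of PhysRevE.76.036106 (): the graph is an adjacency dict `{node: set(neighbors)}`, and the function names and the `max_sweeps` safety cap are hypothetical.

```python
import random
from collections import Counter


def is_stable(adj, labels, v):
    """True if v's label is one of the most frequent labels among its neighbors."""
    if not adj[v]:
        return True
    counts = Counter(labels[u] for u in adj[v])
    return counts[labels[v]] == max(counts.values())


def label_propagation(adj, seed=None, max_sweeps=100):
    """Basic asynchronous LPA on an undirected graph {node: set(neighbors)}."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}            # each node starts with a unique label
    order = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(order)                  # random update order in every sweep
        for v in order:
            if not adj[v]:
                continue                    # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            # adopt a most frequent neighbor label, breaking ties uniformly at random
            labels[v] = rng.choice([l for l, c in counts.items() if c == best])
        if all(is_stable(adj, labels, v) for v in adj):
            break                           # every node already holds a majority label
    return labels
```

On two 5-cliques joined by a single edge, any converged labeling is uniform inside each clique, although both cliques may end up sharing one label; this is exactly the kind of trivial solution discussed next.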
Due to frequent tie-breaks and the random update order, LPA usually delivers different partitions from the same initial condition when run with different random seeds. Raghavan et al. PhysRevE.76.036106 () proposed labeling each node with the set of all labels it has in different partitions in order to detect possible overlapping communities. However, Tibély and Kertész Tibély20084982 () later showed that this method is equivalent to finding the local minima of a simple zero-temperature kinetic Potts model. The number of such local minima was found to be much larger than the number of nodes in the underlying network. Aggregating partitions as suggested by Raghavan et al. PhysRevE.76.036106 () therefore fragments the resulting partition into small clusters when the number of aggregated partitions is large.
In order to eliminate undesired solutions, Barber and Clark PhysRevE.80.026129 () proposed a modularity-specialized LPA (LPAm) that constrains the label propagation process; however, LPAm tends to get stuck in poor local maxima of modularity. To solve this problem, Liu et al. Liu20101493 () introduced an advanced modularity-specialized LPA (LPAm+), which is more stable than LPAm. Because both algorithms rely on modularity, their capability is limited by its resolution limit Fortunato02012007 ().
Leung et al. PhysRevE.79.066107 () found that LPA often yields partitions with one giant community together with much smaller ones when applied to online social networks. To avoid this undesirable feature, they modified LPA by assigning each label a score that decreases as the label propagates, which encourages the formation of stronger local communities and deters trivial solutions. Tests of this modified LPA on the LFR benchmark PhysRevE.78.046110 () produced good results. To reduce the running time, Leung et al. also proposed skipping the label updates of nodes with high neighbor purity PhysRevE.79.066107 (). Since neighbor purity ignores the contribution of small-degree nodes to community detection, the detection precision is not high enough.
In this paper, we propose LPAf, which introduces a new rule that updates the label of a node by taking into account the compression of flow (random walks on a network), and uses an incomplete update condition in the label propagation process to speed up convergence. Like LPAm+, LPAf employs a multi-step greedy agglomerative algorithm (MSG) Oades2008 () to merge multiple pairs of communities simultaneously. Although LPAf is also applicable to weighted and directed networks, we focus here on unweighted and undirected networks. The paper is organized as follows. In Sec. 2, we present our new method in detail. Experimental results on synthetic and real-world networks are shown in Sec. 3. Finally, the main findings are summarized in Sec. 4.
2 Algorithm
To reveal community structures in networks, Rosvall and Bergstrom Rosvall2009 () introduced an information-theoretic approach (known as the Infomap algorithm). They use the probability flow of random walks on a network as a proxy for information flows in real systems and decompose the network into communities by compressing a description of the probability flow.
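For a partition $\mathsf{M}$ of $n$ nodes into $K$ communities, the average description length of random walks is defined as Rosvall2009 (),
$$L(\mathsf{M}) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{K} p_{\circlearrowright}^{i} H(\mathcal{P}^{i}), \tag{1}$$
where
$$H(\mathcal{Q}) = -\sum_{i=1}^{K} \frac{q_{i\curvearrowright}}{q_{\curvearrowright}} \log \frac{q_{i\curvearrowright}}{q_{\curvearrowright}} \tag{2}$$
and
$$H(\mathcal{P}^{i}) = -\frac{q_{i\curvearrowright}}{p_{\circlearrowright}^{i}} \log \frac{q_{i\curvearrowright}}{p_{\circlearrowright}^{i}} - \sum_{\alpha \in i} \frac{p_{\alpha}}{p_{\circlearrowright}^{i}} \log \frac{p_{\alpha}}{p_{\circlearrowright}^{i}}, \tag{3}$$
in which $q_{\curvearrowright} = \sum_{i=1}^{K} q_{i\curvearrowright}$ and $p_{\circlearrowright}^{i} = q_{i\curvearrowright} + \sum_{\alpha \in i} p_{\alpha}$.
Here $q_{i\curvearrowright}$ is the probability of exiting community $i$, $q_{\curvearrowright}$ is the probability that the random walker switches to a different community at any given time step, $p_{\alpha}$ is the probability of visiting node $\alpha$, and $p_{\circlearrowright}^{i}$ is the fraction of time the random walker spends in community $i$ plus the probability of exiting that community.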
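By combining Eqs. (1), (2) and (3), the expanded form of the map equation can be written as
$$L(\mathsf{M}) = q_{\curvearrowright}\log q_{\curvearrowright} - 2\sum_{i=1}^{K} q_{i\curvearrowright}\log q_{i\curvearrowright} - \sum_{\alpha=1}^{n} p_{\alpha}\log p_{\alpha} + \sum_{i=1}^{K} p_{\circlearrowright}^{i}\log p_{\circlearrowright}^{i}. \tag{4}$$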
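Note that the term $-\sum_{\alpha} p_{\alpha}\log p_{\alpha}$ is independent of the partition. Consequently, when we update the label of node $\alpha$ from $a$ to $b$, it is sufficient to keep track of the changes of $q_{i\curvearrowright}$ and $p_{\circlearrowright}^{i}$ (and hence of $q_{\curvearrowright}$) for the two communities involved. These changes are easily derived for any update event, and updating them is fast and straightforward (see Appendix A for details).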
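To make Eqs. (1)-(4) concrete, the sketch below evaluates the map equation for an undirected, unweighted graph and checks that the two-level form and the expanded form agree. It assumes the stationary random walk without teleportation, i.e. $p_\alpha = k_\alpha/2m$ and $q_{i\curvearrowright}$ equal to the number of edges leaving community $i$ divided by $2m$, base-2 logarithms and no isolated nodes; the function names are our own.

```python
import math
from collections import defaultdict


def plogp(x):
    """x * log2(x), with the convention 0 log 0 = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def flows(adj, labels):
    """Visit rates p_alpha = k_alpha / 2m and exit rates q_i = (edges leaving i) / 2m."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    p = {v: len(adj[v]) / two_m for v in adj}
    q = defaultdict(float)
    for v in adj:
        q[labels[v]] += sum(1 for u in adj[v] if labels[u] != labels[v]) / two_m
    return p, q


def map_equation(adj, labels):
    """L(M) from the two-level form, Eqs. (1)-(3)."""
    p, q = flows(adj, labels)
    members = defaultdict(list)
    for v, c in labels.items():
        members[c].append(v)
    q_tot = sum(q.values())
    L = -sum(plogp(q[i] / q_tot) for i in members) * q_tot if q_tot > 0 else 0.0
    for i, nodes in members.items():
        p_circ = q[i] + sum(p[v] for v in nodes)
        H_i = -plogp(q[i] / p_circ) - sum(plogp(p[v] / p_circ) for v in nodes)
        L += p_circ * H_i
    return L


def map_equation_expanded(adj, labels):
    """The same L(M) from the expanded form, Eq. (4)."""
    p, q = flows(adj, labels)
    p_circ = defaultdict(float)
    for v, c in labels.items():
        p_circ[c] += p[v]
    for i in p_circ:
        p_circ[i] += q[i]
    return (plogp(sum(q.values())) - 2 * sum(map(plogp, q.values()))
            - sum(map(plogp, p.values())) + sum(map(plogp, p_circ.values())))
```

For two 5-cliques joined by one edge, splitting the graph into the two cliques gives a shorter description (about 2.64 bits) than putting all nodes in one community (about 3.32 bits).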
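We extend LPA by modifying the label update rule so that the average description length $L$ is minimized. When updating the label of node $\alpha$, we pick, among the labels of its neighbors, the one that yields the smallest $L$ (as illustrated on the karate network in Fig. 1). Hence, our new update rule can be expressed as
$$\ell_{\alpha}^{\mathrm{new}} = \arg\min_{\ell \in \Omega_{\alpha}} \Delta L(\ell_{\alpha}^{\mathrm{old}} \rightarrow \ell), \tag{5}$$
where $\ell_{\alpha}^{\mathrm{old}}$ is the current label of node $\alpha$, $\ell_{\alpha}^{\mathrm{new}}$ is its new label, $\Omega_{\alpha}$ is the set of labels of the neighbors of $\alpha$, $\Delta L(\ell_{\alpha}^{\mathrm{old}} \rightarrow \ell)$ is the change of $L$ when the label of node $\alpha$ is updated from $\ell_{\alpha}^{\mathrm{old}}$ to $\ell$ (see Appendix A for details), and $\arg\min$ returns the label that minimizes $\Delta L$. If more than one label attains the same minimum of $\Delta L$, the new label is chosen randomly among them. The label propagation step is performed iteratively until $L$ no longer decreases.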
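The update rule of Eq. (5) can be sketched as below. For transparency, this sketch recomputes $L$ from scratch for every candidate label, which costs $O(m)$ per evaluation; Appendix A gives incremental formulas that avoid this. It also assumes, as a detail not fixed by the text, that a node keeps its current label when no candidate lowers $L$. Function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def plogp(x):
    return x * math.log2(x) if x > 0 else 0.0


def description_length(adj, labels):
    """L(M) via the expanded map equation, Eq. (4), for an undirected unweighted graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    q, p_circ = defaultdict(float), defaultdict(float)
    node_term = 0.0
    for v, nbrs in adj.items():
        p_v = len(nbrs) / two_m
        node_term += plogp(p_v)
        exits = sum(1 for u in nbrs if labels[u] != labels[v]) / two_m
        q[labels[v]] += exits
        p_circ[labels[v]] += p_v + exits
    return (plogp(sum(q.values())) - 2 * sum(map(plogp, q.values()))
            - node_term + sum(map(plogp, p_circ.values())))


def update_label(adj, labels, v, rng):
    """Apply Eq. (5) to node v; returns True if its label changed."""
    old = labels[v]
    base = description_length(adj, labels)
    deltas = {}
    for cand in {labels[u] for u in adj[v]} - {old}:
        labels[v] = cand                          # tentatively move v ...
        deltas[cand] = description_length(adj, labels) - base
    labels[v] = old                               # ... and undo the move
    if not deltas or min(deltas.values()) >= 0:
        return False                              # assumption: stay put if no move lowers L
    best = min(deltas.values())
    labels[v] = rng.choice(sorted(c for c, d in deltas.items() if d == best))
    return True
```

For example, if node 0 of a 5-clique is mislabeled with the label of the adjacent clique, one call moves it back, and a second call leaves it unchanged.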
In our tests, this update rule helps form local subgroups. However, on its own it does not perform satisfactorily on large-scale networks, as it usually gets stuck in poor local minima in $L$ space.
In order to escape such local minima, we adopt a greedy rule for merging communities that minimizes $L$: when LPA with our new update rule gets stuck in a local minimum (no decrease in $L$ can be achieved via further label propagation), we calculate the change of $L$ caused by merging each pair of communities and merge the pairs that decrease $L$ the most. In practice, we employ the MSG technique to merge multiple pairs of communities simultaneously (as illustrated in Fig. 2). Merging communities moves the search out of the local minimum, after which another round of label propagation with the new update rule is performed; this is analogous to descending into another local minimum. Since the new local minimum is not guaranteed to be good enough, the above process is repeated until $L$ no longer decreases.
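A simplified rendering of the merging step is sketched below; it is our own illustration of the multi-step greedy idea, not the MSG algorithm of Oades2008 (). For every pair of adjacent communities it computes, from Eq. (4), the change of $L$ that merging them would cause: the $a$-$b$ edges disappear from both exit flows, from $q_{\curvearrowright}$ and from the merged module flow. It then merges up to `max_pairs` disjoint pairs with the largest decrease. The scores are computed against the current partition, so for simultaneous merges they are only approximately additive. The function name and `max_pairs` parameter are hypothetical, and labels are assumed to be integers.

```python
import math
from collections import defaultdict


def plogp(x):
    return x * math.log2(x) if x > 0 else 0.0


def merge_step(adj, labels, max_pairs=2):
    """Merge up to `max_pairs` disjoint pairs of adjacent communities whose merger
    lowers L (Eq. 4) the most. Returns the number of merges performed."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    q, p_circ, shared = defaultdict(float), defaultdict(float), defaultdict(float)
    for v, nbrs in adj.items():
        a = labels[v]
        p_circ[a] += len(nbrs) / two_m
        for u in nbrs:
            b = labels[u]
            if b != a:
                q[a] += 1 / two_m
                p_circ[a] += 1 / two_m
                # each a-b edge is seen from both ends, so `shared` accumulates 2 e_ab / 2m
                shared[(min(a, b), max(a, b))] += 1 / two_m
    q_tot = sum(q.values())
    scored = []
    for (a, b), w in shared.items():
        dL = (plogp(q_tot - w) - plogp(q_tot)
              - 2 * (plogp(q[a] + q[b] - w) - plogp(q[a]) - plogp(q[b]))
              + plogp(p_circ[a] + p_circ[b] - w) - plogp(p_circ[a]) - plogp(p_circ[b]))
        scored.append((dL, a, b))
    scored.sort()
    used, merges = set(), 0
    for dL, a, b in scored:
        if dL >= 0 or merges == max_pairs:
            break
        if a in used or b in used:
            continue                      # keep the simultaneously merged pairs disjoint
        used.update((a, b))
        for v in labels:
            if labels[v] == b:
                labels[v] = a
        merges += 1
    return merges
```

On two 5-cliques joined by a bridge and each split in two, one call merges the halves of each clique, while the merger across the bridge would increase $L$ and is skipped.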
To avoid unnecessary updates in each iteration of LPAf, we adopt the incomplete update condition proposed in Ref. 6004645 (): we only update the labels of active nodes, i.e., nodes that would change their labels if they attempted to update. A list of all currently active nodes is maintained, and the algorithm finishes when the list is empty (i.e., we only track the nodes that may potentially change their labels). The pseudocode of our algorithm is presented in Algorithm 1.
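One common way to maintain such an active list is a queue in which a node is re-activated only when one of its neighbors changes label; the sketch below shows this bookkeeping under that assumption and is not necessarily the exact condition of Ref. 6004645 (). The update rule is passed in as a callback (hypothetical name `try_update`). For the demonstration we use a toy strict-majority rule, which only switches to a strictly more frequent neighbor label and therefore always terminates.

```python
import random
from collections import Counter, deque


def propagate_active(adj, try_update, seed=None):
    """Update only 'active' nodes. Every node starts active; when a node changes its
    label, its neighbors are re-activated. Stops when the active list is empty.
    `try_update(v)` must update the label of v in place and return True iff it changed."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    queue, active = deque(order), set(order)
    attempts = 0
    while queue:
        v = queue.popleft()
        active.discard(v)
        attempts += 1
        if try_update(v):
            for u in adj[v]:
                if u not in active:          # avoid duplicate queue entries
                    active.add(u)
                    queue.append(u)
    return attempts


def strict_majority_rule(adj, labels):
    """Toy rule for the demo: switch only to a strictly more frequent neighbor label.
    Each switch increases the number of same-label edges, so the loop terminates."""
    def try_update(v):
        counts = Counter(labels[u] for u in adj[v])
        if not counts:
            return False
        label, count = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        if count > counts[labels[v]]:
            labels[v] = label
            return True
        return False
    return try_update
```

When the queue empties, no node has been touched by a neighbor change since its last unsuccessful attempt, so every node's label is stable under the rule.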
3 Results
Many metrics have been proposed to quantify the quality of a network partition. When the ground truth is unknown, a common measure for the significance of the identified community structure is modularity PhysRevE.69.026113 (), which is defined as,
$$Q = \frac{1}{2m}\sum_{i,j}\left(A_{ij} - P_{ij}\right)\delta(C_i, C_j), \tag{6}$$
where $m$ is the total number of edges in the network, $A_{ij} = 1$ if nodes $i$ and $j$ are connected and 0 otherwise, $P_{ij}$ is the probability in the null model that an edge exists between nodes $i$ and $j$, and $\delta(C_i, C_j)$ is the Kronecker delta of the communities $C_i$ and $C_j$ of nodes $i$ and $j$: two vertices $i$ and $j$ contribute to $Q$ if and only if they belong to the same community. The concept of modularity is based on the idea that a random graph is not expected to exhibit community structure.
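A minimal sketch of Eq. (6), assuming the commonly used configuration-model null model $P_{ij} = k_i k_j / 2m$ (the text above leaves the null model generic). With this choice the double sum collapses to the fraction of edge ends inside communities minus the squared community volumes.

```python
from collections import defaultdict


def modularity(adj, labels):
    """Eq. (6) with the configuration-model null model P_ij = k_i k_j / 2m."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    # sum_ij A_ij delta(C_i, C_j): ordered pairs of linked nodes in the same community
    same = sum(1 for v in adj for u in adj[v] if labels[u] == labels[v])
    volume = defaultdict(int)
    for v, nbrs in adj.items():
        volume[labels[v]] += len(nbrs)
    # sum_ij k_i k_j / 2m * delta(C_i, C_j) / 2m = sum_c (vol(c) / 2m)^2
    return same / two_m - sum((vol / two_m) ** 2 for vol in volume.values())
```

For two 5-cliques joined by one edge, the two-clique partition gives $Q = 40/42 - 1/2 \approx 0.452$, while the single-community partition gives $Q = 0$.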
For a more thorough assessment of the significance of the detected communities, we also adopt the modularity density modularitydensity () and conductance bollobas1998modern () metrics.
Given an undirected network, the modularity density is defined as
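$$Q_{ds} = \sum_{c_i \in C}\left[\frac{|E_{c_i}^{in}|}{|E|} d_{c_i} - \left(\frac{2|E_{c_i}^{in}| + |E_{c_i}^{out}|}{2|E|} d_{c_i}\right)^{2} - \sum_{c_j \in C,\, c_j \neq c_i} \frac{|E_{c_i,c_j}|}{2|E|} d_{c_i,c_j}\right], \tag{7}$$
where $C$ is the set of all the communities, $c_i$ is any given community in $C$, $|E|$ is the total number of edges, $d_{c_i} = \frac{2|E_{c_i}^{in}|}{|c_i|(|c_i|-1)}$ is the internal density of community $c_i$, $d_{c_i,c_j} = \frac{|E_{c_i,c_j}|}{|c_i||c_j|}$ is the pairwise density between communities $c_i$ and $c_j$, $|E_{c_i}^{in}|$ is the number of edges between nodes within community $c_i$, $|E_{c_i}^{out}|$ is the number of edges from the nodes in community $c_i$ to the nodes of other communities, and $|E_{c_i,c_j}|$ is the number of edges between communities $c_i$ and $c_j$. Compared to modularity, the modularity density is an improved measure for assessing the quality of communities, since it does not suffer from the well-known resolution limit of modularity.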
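Eq. (7) can be evaluated directly from edge counts, as in the sketch below. It follows the definitions just given; the treatment of singleton communities (internal density set to 0, since $|c_i|(|c_i|-1) = 0$) is our own assumption, and the function name is hypothetical.

```python
from collections import defaultdict


def modularity_density(adj, labels):
    """Q_ds of Eq. (7) for an undirected, unweighted graph {node: set(neighbors)}."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2          # |E|
    members = defaultdict(set)
    for v, c in labels.items():
        members[c].add(v)
    e_in, e_out, e_between = defaultdict(int), defaultdict(int), defaultdict(int)
    for v, nbrs in adj.items():
        for u in nbrs:
            a, b = labels[v], labels[u]
            if a == b:
                e_in[a] += 1                 # each internal edge is counted twice here
            else:
                e_out[a] += 1                # |E_a^out|
                e_between[(a, b)] += 1       # |E_{a,b}|, counted once from a's side
    q_ds = 0.0
    for c, nodes in members.items():
        size, internal = len(nodes), e_in[c] / 2
        # assumption: a singleton community is given internal density 0
        d_c = 2 * internal / (size * (size - 1)) if size > 1 else 0.0
        q_ds += internal / m * d_c - ((2 * internal + e_out[c]) / (2 * m) * d_c) ** 2
        for (a, b), e_ab in e_between.items():
            if a == c:
                d_ab = e_ab / (size * len(members[b]))
                q_ds -= e_ab / (2 * m) * d_ab
    return q_ds
```

For two 5-cliques joined by one edge, each clique contributes $10/21 - 1/4 - 1/1050$, so $Q_{ds} \approx 0.4505$.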
For a community $S$, the conductance is defined as
$$\phi(S) = \frac{\sum_{i \in S,\, j \notin S} A_{ij}}{\min\left(\sum_{i \in S} k_i,\ \sum_{i \notin S} k_i\right)}, \tag{8}$$
where $A_{ij}$ is the adjacency matrix and $k_i$ is the degree of node $i$. Informally, conductance is the fraction of the total edge volume that points outside the community $S$. Lower values of conductance imply that the communities have more internal connections than external ones, and thus represent more significant communities. Because conductance cannot easily be extended to the entire community structure of a network, results are commonly assessed at different scales separately in the form of network community profile (NCP) leskovec2009community () plots.
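A direct sketch of Eq. (8) on an adjacency dict; the function name and the choice to raise an error when the denominator vanishes (for the empty set or the whole graph, where conductance is undefined) are our own.

```python
def conductance(adj, S):
    """phi(S) of Eq. (8): cut edges over the smaller of vol(S) and vol(rest of graph)."""
    S = set(S)
    cut = sum(1 for v in S for u in adj[v] if u not in S)
    vol_S = sum(len(adj[v]) for v in S)
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_S
    if min(vol_S, vol_rest) == 0:
        raise ValueError("conductance is undefined when S or its complement has zero volume")
    return cut / min(vol_S, vol_rest)
```

For two 5-cliques joined by one edge, a whole clique has $\phi = 1/21$, whereas a 2-node fragment of a clique has $\phi = 6/8 = 0.75$.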
For networks with known community structures, two metrics from the field of information theory MacKay03a () are adopted to compare identified communities with the true ones. The first one, normalized mutual information (NMI) 17425468200509P09008 (), estimates the amount of information correctly extracted by the detection algorithms and has become a de facto standard to quantify the quality of a detected partition with respect to the ground truth. It is defined as,
$$\mathrm{NMI}(X, Y) = \frac{2 I(X;Y)}{H(X) + H(Y)}, \tag{9}$$
where $X$ and $Y$ denote two partitions of the network, $I(X;Y) = H(X) - H(X|Y)$, $H(X)$ is the Shannon entropy of $X$ and $H(X|Y)$ is the conditional entropy of $X$ given $Y$. NMI equals 1 if the detected partition is identical to the real one, whereas it has an expected value of 0 if the detected partition is totally independent of the real one.
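Eq. (9) can be computed from label counts alone, as sketched below. Partitions are given as lists of labels aligned by node; the convention of returning 1 when both partitions are trivial (zero entropy) is our own assumption, since Eq. (9) is then 0/0.

```python
import math
from collections import Counter


def entropy(x):
    """Shannon entropy (natural log) of a partition given as a list of node labels."""
    n = len(x)
    return -sum(c / n * math.log(c / n) for c in Counter(x).values())


def mutual_information(x, y):
    """I(X;Y) from the joint label counts of two aligned label lists."""
    n = len(x)
    cx, cy = Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (cx[a] * cy[b]))
               for (a, b), c in Counter(zip(x, y)).items())


def nmi(x, y):
    """Eq. (9): 2 I(X;Y) / (H(X) + H(Y)); x[i] and y[i] are the labels of node i."""
    h = entropy(x) + entropy(y)
    return 1.0 if h == 0 else 2 * mutual_information(x, y) / h
```

Relabeled copies of the same partition give NMI = 1, while two balanced partitions that cut across each other evenly give NMI = 0.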
The second metric is the variation of information (VOI) MEILA2007873 (), which has several desirable properties compared with NMI. Specifically, it can be regarded as a distance in the space of partitions. The VOI of $X$ and $Y$ is defined as
$$\mathrm{VOI}(X, Y) = H(X) + H(Y) - 2I(X;Y). \tag{10}$$
Thus, lower values represent higher similarity between partitions. The value of VOI ranges from 0 to $\log n$, where $n$ is the network size. Therefore, we divide the obtained values by $\log n$ for meaningful comparisons.
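A sketch of Eq. (10) with the normalization by $\log n$ described above. It uses the equivalent form $H(X|Y) + H(Y|X)$, which avoids computing the mutual information separately; the function name is hypothetical.

```python
import math
from collections import Counter


def variation_of_information(x, y, normalize=True):
    """Eq. (10) computed as H(X|Y) + H(Y|X); divided by log n when `normalize` is set."""
    n = len(x)
    cx, cy = Counter(x), Counter(y)
    voi = -sum(c / n * (math.log(c / cy[b]) + math.log(c / cx[a]))
               for (a, b), c in Counter(zip(x, y)).items())
    return voi / math.log(n) if normalize else voi
```

The normalized value is 0 for identical partitions and reaches 1 for the most dissimilar pair, e.g. all singletons versus a single community.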
We have tested our algorithm on both synthetic and real-world networks. For comparison, five algorithms are included in the experiments as references: the original LPA PhysRevE.76.036106 (), the neighbor strength driven LPA (nsdLPA) 6004645 (), the Louvain method 17425468200810P10008 (), the Infomap algorithm Rosvall2009 (), and the fine-tuned modularity density algorithm (FineTune) FineTune (). nsdLPA enhances the basic LPA by taking into account the positive neighborhood strength and is generally efficient in practice 6004645 (). The Louvain method is a greedy algorithm that attempts to optimize the modularity of a partition; it usually produces high modularity values and is one of the most widely used methods for detecting communities in large networks 17425468200810P10008 (). The Infomap algorithm decomposes a network into communities by compressing a description of information flow on the network, as mentioned above Rosvall2009 (). The FineTune algorithm iteratively attempts to improve the modularity density by splitting and merging the communities of a given community structure FineTune ().
3.1 Tests on synthetic networks
We first tested our method on the well-known GN benchmark Girvan11062002 () and compared the results with those of the other methods. The GN benchmark network consists of 128 nodes, each with expected degree 16, divided into four groups of 32 nodes each. The mixing parameter $\mu$ is the ratio of the external degree of a node (with respect to its community) to its total degree.
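A GN-style benchmark graph can be generated as a planted partition, as in the sketch below (our own generator, not the original code of Girvan11062002 ()). Each node has $(1-\mu)\cdot 16$ expected links inside its group of 32 and $\mu \cdot 16$ expected links to the other 96 nodes, so the pair probabilities are $p_{in} = 16(1-\mu)/31$ and $p_{out} = 16\mu/96$. The helper `empirical_mixing` measures the realized mixing as the fraction of edge ends pointing out of their community.

```python
import random


def gn_benchmark(mu, groups=4, group_size=32, mean_degree=16, seed=None):
    """GN-style planted partition: expected degree `mean_degree`, of which a fraction
    mu (in expectation) links to nodes outside the node's own group."""
    rng = random.Random(seed)
    n = groups * group_size
    p_in = (1 - mu) * mean_degree / (group_size - 1)      # expected internal degree / partners
    p_out = mu * mean_degree / (n - group_size)           # expected external degree / partners
    community = {v: v // group_size for v in range(n)}
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if community[u] == community[v] else p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, community


def empirical_mixing(adj, community):
    """Fraction of edge endpoints whose edge leaves the endpoint's community."""
    ends = sum(len(nbrs) for nbrs in adj.values())
    external = sum(1 for v in adj for u in adj[v] if community[u] != community[v])
    return external / ends if ends else 0.0
```

With $\mu = 0.3$, for example, the realized mixing and mean degree fluctuate around 0.3 and 16, respectively.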
The results of different methods on the GN benchmark networks are shown in Fig. 3. As can be seen, the Louvain method performs fairly well on the GN benchmark. This indicates that the community size of the GN benchmark is not below the resolution limit, so optimizing modularity indeed reveals the true partitions. LPAf performs second only to the Louvain method and considerably better than the remaining four methods. All methods except Louvain and FineTune reach the same stable value at high $\mu$, which corresponds to the trivial partition. LPAf cannot detect the real communities in this range by minimizing $L$, because the trivial partition has a lower $L$ than the real partition.
We also adopted the LFR benchmark PhysRevE.78.046110 (), which is a special case of the planted partition model RSA:RSA1001 (). LFR networks resemble real-world networks in that both are characterized by heterogeneous distributions of node degrees and community sizes. In our experiments, the parameters are fixed as follows: node degrees and community sizes follow power laws with exponents 2 and 1, respectively; the maximum degree is 50; community sizes range over [10,50] for the smaller communities and [20,100] for the bigger ones; and the network size is either 1000 or 5000. The significance of the community structure is controlled by the mixing parameter $\mu$, the expected fraction of links of a node connecting to other communities; smaller values correspond to more pronounced community structure.
Results are assessed in terms of average NMI and shown in Fig. 4, which shows that LPAf consistently outperforms the other methods over a wide range of $\mu$. In contrast to the GN benchmark, the Louvain method fails to detect the real communities of the larger networks with smaller communities even when $\mu$ is small. This is due to the well-known resolution limit of modularity, i.e., there exists a size cutoff below which modularity cannot identify communities Fortunato02012007 (). In optimizing modularity, the Louvain method tends to merge natural communities into much larger ones, which leads to rather poor performance. FineTune does not perform remarkably either, as it also starts to fail at low values of $\mu$. nsdLPA performs better than LPA owing to its consideration of the positive neighbor strength. Infomap performs comparably with LPAf on larger networks but is outperformed by LPAf on small ones. Moreover, LPAf is more stable than LPA, as indicated by the lower standard deviation of its NMI scores. The results thus confirm that LPAf performs better than, or at least as well as, the other five methods on all the LFR networks.
To further assess the validity of LPAf, we also computed the average ratio of the number of detected communities to the number of actual ones, shown in Fig. 5. As can be seen, the number of communities detected by LPAf is very close to the actual one up to high values of $\mu$ in all cases. The number of communities detected by nsdLPA exceeds the actual one at high values of $\mu$, which implies that nsdLPA tends to form local subgroups and favors smaller communities because of its consideration of neighborhood strength. The Louvain method tends to find fewer communities than the planted ones due to the resolution limit of modularity, whereas FineTune detects more communities than the actual ones in most cases. This indicates that FineTune resolves, to a certain degree, the resolution problem of the Louvain method. In most cases, Infomap finds slightly fewer communities than the actual ones.
To compare the computational loads of different methods, we plot the average elapsed times in Fig. 6. Generally, the running times of all methods increase as $\mu$ gets larger. This is because when $\mu$ is small, the communities are well separated and all methods can detect them quickly. When $\mu$ increases to a value where the community structure still persists but is much more difficult to reveal, convergence slows down, which produces the peaks of the curves. As $\mu$ increases further, most methods cannot detect nontrivial communities and converge slightly faster than at the transition stage. Specifically, LPA and nsdLPA are faster than the remaining four algorithms, while LPAf, Louvain and Infomap exhibit similar running-time patterns.
To test how well LPAf performs in finding local minima in $L$ space, we computed $L$ for the partitions detected by LPAf and plotted the values in Fig. 7. Because the global minimum of $L$ is not available, the $L$ values of the planted partitions are adopted as references. As can be seen, when $\mu$ is small, i.e., the community structure is clear enough, the detected and true partitions have almost the same values of $L$, which indicates that LPAf correctly finds the real communities in this range of $\mu$. When $\mu$ increases to a certain value, $L$ decreases rapidly to a stable value on smaller networks, which corresponds to the trivial partition in which the whole network is regarded as a single community. As the trivial partition has a lower value of $L$ than the planted one above a certain value of $\mu$, LPAf cannot detect any nontrivial communities in this range. In larger networks, however, LPAf yields a larger $L$ than that of the planted partition above a certain value of $\mu$, which implies that LPAf is trapped in a suboptimal valley of $L$ space.
3.2 Tests on realworld networks
We also applied the algorithms to several real-world networks that are commonly used for testing. The details of these networks are listed in Table 1.
Network  Reference  Vertices  Edges 

karate  Zachary’s karate club 10.2307/3629752 ()  34  78 
dolphins  Dolphin social network Lusseau2003 ()  62  159 
books  Books about US politics Krebs2008 ()  105  441 
football  American College football doi:10.1137/S003614450342480 ()  115  613 
blogs  Political blogs Adamic:2005:PBU:1134271.1134277 ()  1490  16715 
netsci  Network scientists PhysRevE.74.036104 ()  1589  2742 
power  US power grid Watts1998 ()  4941  6594 
matcond  Condensed matter collaborations Newman2001 ()  16726  47594 
Table 1: Real-world networks with community structure.
We first directly compared the stability of the different methods. Each method was applied to each network 1000 times, and the numbers of distinct detected partitions are reported. The pairwise VOI of the partitions was also computed to further evaluate the robustness of the methods. FineTune is not considered here since it is a deterministic algorithm. Due to their computational cost, the two larger networks, power and matcond, are excluded from this analysis. Results are shown in Tables 2 and 3. LPAf is comparatively stable, yielding fewer distinct partitions in most cases. LPA and Infomap are relatively unstable, even on smaller networks. The Louvain method and nsdLPA have similar robustness, except on the netsci network, where the Louvain method yields the most stable results. Moreover, as shown in Table 3, the pairwise VOI values between the partitions revealed by LPAf are lower than those of the other methods in most cases. This indicates that LPAf is considerably more robust than LPA and performs fairly stably.
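The two stability statistics can be computed as sketched below; this is our own illustration with hypothetical function names. Partitions are compared up to relabeling by turning each into a set of node sets, and VOI is divided by $\log n$ as described in Sec. 3. The loop over all pairs is quadratic in the number of runs (499,500 pairs for 1000 runs), which is acceptable for a sketch.

```python
import math
from collections import Counter
from itertools import combinations


def canonical(labels):
    """A partition as a frozenset of node sets, so relabeled copies compare equal."""
    groups = {}
    for v, c in labels.items():
        groups.setdefault(c, set()).add(v)
    return frozenset(frozenset(g) for g in groups.values())


def normalized_voi(first, second):
    """VOI between two {node: label} partitions of the same nodes, divided by log n."""
    nodes = sorted(first)
    x, y = [first[v] for v in nodes], [second[v] for v in nodes]
    n = len(nodes)
    cx, cy = Counter(x), Counter(y)
    voi = -sum(c / n * (math.log(c / cy[b]) + math.log(c / cx[a]))
               for (a, b), c in Counter(zip(x, y)).items())
    return voi / math.log(n)


def stability_summary(runs):
    """Number of distinct partitions and mean pairwise normalized VOI over repeated runs."""
    distinct = len({canonical(r) for r in runs})
    pairs = list(combinations(runs, 2))
    mean_voi = sum(normalized_voi(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0
    return distinct, mean_voi
```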
Network  LPA  nsdLPA  LPAf  Louvain  Infomap 

karate  81  11  9  23  32 
dolphins  425  52  72  39  609 
books  191  75  10  73  725 
football  464  78  33  47  706 
netsci  1000  1000  496  181  1000 
Table 2: Number of distinct partitions detected by each method in 1000 runs.
Network  LPA  nsdLPA  LPAf  Louvain  Infomap 

karate  0.5189(4)  0.2269(2)  0.00482(2)  0.1967(2)  0.2021(2) 
dolphins  0.4308(2)  0.1387(1)  0.2130(1)  0.2089(2)  0.3177(1) 
books  0.2989(1)  0.1803(1)  0.03818(9)  0.1677(1)  0.3095(1) 
football  0.14251(9)  0.05192(4)  0.02157(4)  0.05211(6)  0.12204(8) 
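netsci  0.037384(6)  0.027995(5)  0.008604(5)  0.006858(5)  0.018963(6) 
Table 3: Average pairwise VOI, divided by log n, between the partitions detected by each method in 1000 runs.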
Next, we analyzed in detail the three networks (karate, dolphins and football) that have known community structures. Fig. 8 shows the partitions with the lowest $L$ detected by LPAf on the karate and dolphin networks. Zachary's karate club is a social network of friendships between 34 members of a karate club at a US university in the 1970s. The club split into two smaller clubs after a dispute between the club president John (node 34) and the instructor Mr. Hi (node 1). As can be seen, our algorithm discovers three communities in this network: one of the two real communities is divided into two smaller ones (Fig. 8, left panel). The dolphin social network describes frequent associations among 62 dolphins living off Doubtful Sound, New Zealand; links connect dolphins that were observed together more often than expected by chance between 1994 and 2001. The four communities identified by our algorithm in this network are shown in Fig. 8 (right panel).
The football network describes football games among Division I-A colleges during the regular season of Fall 2000. As shown in Fig. 9, the 115 nodes represent teams, which are grouped into eleven conferences, apart from five independent teams. The regular-season games between pairs of teams form the 613 edges of the network. Our algorithm identifies eleven communities in this network (Fig. 9), eight of which correctly match conferences. The three remaining communities closely resemble the Conference USA, Sun Belt and Western Athletic conferences. The five independent teams, which do not belong to any conference, tend to be grouped with the conferences with which they are most closely associated.
For comparison, we applied the different methods to the karate, dolphins, books and football networks and measured the NMI between the real partitions and the detected ones. The average NMI values over 1000 runs are shown in Table 4; FineTune is deterministic and is therefore run only once. As one can see, LPAf performs fairly well on the karate and football networks, although it is not the best. However, it does not work well on the other two networks, possibly because the known partitions of these two networks do not have the lowest values of $L$, which prevents LPAf from detecting the real communities.
Network  LPA  nsdLPA  LPAf  Infomap  Louvain  FineTune 

karate  0.689(7)  0.833(3)  0.821(1)  0.751(2)  0.651(1)  0.5925 
dolphins  0.622(3)  0.606(1)  0.520(1)  0.506(1)  0.493(1)  0.4338 
books  0.5494(9)  0.5395(7)  0.5391(3)  0.5414(9)  0.5421(7)  0.4146 
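football  0.8834(9)  0.9039(3)  0.9197(3)  0.8994(6)  0.8787(5)  0.9242 
Table 4: Average NMI between the real partitions and those detected by each method over 1000 runs (FineTune, being deterministic, is run once).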
In Table 5, we also report the average modularity of the detected partitions on all networks to enable a complete comparison. It is not surprising that the Louvain method yields the highest values on almost all networks, because it is based on the optimization of modularity. Therefore, for clarity, we show its results in the rightmost column of the table and mark the best results of the remaining methods in bold type. As one can see, LPAf achieves the highest or second-highest modularity on every network among the methods that do not directly optimize modularity.
Network  True  LPA  nsdLPA  LPAf  Infomap  FineTune  Louvain 

karate  0.3718  0.344(3)  0.3747(2)  0.4008(1)  0.3994(2)  0.4174  0.4154(2) 
dolphins  0.3787  0.482(1)  0.5239(1)  0.5216(2)  0.5067(5)  0.4547  0.5206(1) 
books  0.4149  0.4959(5)  0.5183(2)  0.52641(6)  0.5163(2)  0.4855  0.52626(6) 
football  0.554  0.5893(4)  0.5673(5)  0.60052(4)  0.5907(2)  0.6005  0.60402(5) 
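netsci  –  0.9028(1)  0.9093(1)  0.9314(1)  0.9313(2)  0.7641  0.95904(2) 
power  –  0.7175(4)  0.7204(3)  0.8295(2)  0.8297(2)  0.6036  0.93584(7) 
matcond  –  0.7167(3)  0.7270(2)  0.7695(1)  0.7758(2)  0  0.8479(1) 
Table 5: Average modularity Q of the detected partitions; the True column gives Q of the known partitions where available.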
In Table 6, we present the average modularity density of the partitions detected by the different methods. Since FineTune is based on the optimization of modularity density, we show its results in the last column of the table and highlight the best results of the remaining methods in bold type. As one can see, LPAf yields the highest $Q_{ds}$ among these methods on the dolphins, books and football networks.
Network  True  LPA  nsdLPA  LPAf  Infomap  Louvain  FineTune 

karate  0.1823  0.197(2)  0.1849(7)  0.2168(1)  0.2164(8)  0.2284(3)  0.231 
dolphins  0.1362  0.184(2)  0.1967(7)  0.2060(5)  0.196(1)  0.2009(9)  0.264 
books  0.1267  0.174(1)  0.1952(7)  0.1986(3)  0.193(1)  0.1972(4)  0.2506 
football  0.4281  0.432(3)  0.465(2)  0.482(2)  0.457(2)  0.437(2)  0.4909 
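netsci  –  0.6417(6)  0.6409(4)  0.6136(4)  0.6093(5)  0.5029(3)  0.4866 
power  –  0.2309(3)  0.2339(3)  0.1527(2)  0.1462(2)  0.02067(7)  0.3106 
matcond  –  0.3036(3)  0.2979(3)  0.2526(1)  0.2401(2)  0.07047(9)  0.0003 
Table 6: Average modularity density Q_ds of the detected partitions; the True column gives Q_ds of the known partitions where available.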
In Table 7, we compare the different methods in terms of $L$, with the $L$ values of the true partitions given as references. As seen, LPAf achieves the lowest $L$ in most cases. It should be pointed out that the true partitions do not necessarily attain the global minimum of $L$: on the karate, books and football networks, LPAf obtains a lower $L$ than that of the true partition. This explains why LPAf cannot detect the real communities correctly on these networks.
Network  True  LPA  nsdLPA  LPAf  Infomap  Louvain  FineTune 

karate  4.3408  4.392  4.3483  4.2996(4)  4.319(1)  4.418(1)  4.4018 
dolphins  5.0786  5.068  4.9982(9)  5.099(1)  5.159(2)  5.095(1)  5.7197 
books  6.0373  5.658(2)  5.618(2)  5.5875(7)  5.653(2)  5.603(1)  6.1893 
football  6.3784  6.091(3)  6.311(4)  6.0503(3)  6.104(2)  5.9811(4)  6.055 
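netsci  –  4.135(1)  4.061(1)  3.7949(4)  3.8142(7)  3.9716(5)  6.7201 
power  –  8.467(1)  8.417(1)  6.8032(6)  6.8549(6)  7.348(1)  10.4736 
matcond  –  9.419(4)  9.239(3)  8.497(1)  8.608(2)  9.125(2)  13.4179 
Table 7: Average description length L of the detected partitions; the True column gives L of the known partitions where available.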
Lastly, we further analyzed the two larger networks, power and matcond. For simplicity, we only compared LPA and LPAf. We ran each method 100 times and analyzed the conductances of the detected communities at various scales. The results are given as NCP plots in Fig. 10. An NCP plot shows the quality of the best community (in terms of conductance) as a function of its size. Previous studies show that many kinds of real-world networks exhibit a common characteristic NCP shape, i.e., an initially decreasing and subsequently increasing trend leskovec2009community ().
In the power network, LPAf detects communities over a much broader range of scales with markedly lower conductances, including larger communities of around 80 nodes. On the matcond network, LPAf and LPA find their best communities at the same scale (i.e., at around 15 nodes), while the conductances of LPAf are slightly lower than those of LPA. Note that LPA reveals a number of larger communities with markedly high conductances in both networks (i.e., the blue circles in the top right part of the top two plots of Fig. 10), possibly because the many tie-breaks encountered in the label propagation process contribute to the formation of large communities with high conductances.
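An NCP restricted to the detected communities can be assembled as sketched below: for each community size, keep the lowest conductance (Eq. 8) of any community of that size found across all runs. This is our own simplified illustration; the NCP of leskovec2009community () scores the best sets found by dedicated search procedures rather than only the output of one detection algorithm. The function name is hypothetical.

```python
from collections import defaultdict


def ncp_from_partitions(adj, partitions):
    """For each community size k, the lowest conductance among all detected communities
    of size k across the given runs (a restricted NCP: only detected communities count)."""
    total_volume = sum(len(nbrs) for nbrs in adj.values())
    best = {}
    for labels in partitions:
        groups = defaultdict(set)
        for v, c in labels.items():
            groups[c].add(v)
        for S in groups.values():
            cut = sum(1 for v in S for u in adj[v] if u not in S)
            vol_S = sum(len(adj[v]) for v in S)
            denom = min(vol_S, total_volume - vol_S)
            if denom == 0:
                continue                      # conductance undefined (e.g. S is the whole graph)
            phi = cut / denom
            if phi < best.get(len(S), float("inf")):
                best[len(S)] = phi
    return dict(sorted(best.items()))
```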
3.3 Time complexity
Given a network with $n$ nodes and $m$ edges, let $k_{\max}$ be the maximum degree of the nodes in this network. The time complexity of each step of LPAf is roughly estimated as follows:
1. Initialization takes $O(n)$ time, since it assigns a unique label to each node.
2. Label propagation takes at most $O(n k_{\max})$ time: each node iterates through at most $k_{\max}$ neighbors.
3. Merging pairs of communities using MSG takes time whose worst-case bound is analyzed in Ref. Oades2008 ().
Steps 2 and 3 are repeated, so the cost of each iteration is the sum of their costs, and the overall time complexity of LPAf is this per-iteration cost multiplied by the number of iterations.
To evaluate the efficiency of LPAf, we ran LPA, nsdLPA, LPAf, Infomap and the Louvain method on LFR networks of different sizes. Due to its high time complexity, FineTune is not included in this benchmark. We repeated each experiment 30 times and report the average running times. As shown in Fig. 11, the running times of LPA and nsdLPA are considerably lower than those of the other three methods. Still, all methods exhibit near-linear scaling and can easily be applied to larger networks.
4 Conclusion
In this paper, we propose a modified label propagation algorithm (LPAf) to detect community structures in networks. We introduce a new rule that updates the label of a node by compressing a description of probability flow. In addition, by employing a multi-step greedy agglomerative algorithm, we merge pairs of communities to escape local minima in $L$ space. Furthermore, an incomplete update condition is adopted to accelerate convergence.
We tested the proposed algorithm on both synthetic and real-world networks and compared its performance with that of five other widely used methods in terms of modularity, modularity density, NMI, VOI and conductance. First, we find that LPAf performs very well on synthetic networks; in contrast to the Louvain method, it is able to detect small communities in large networks. Second, we find that LPAf detects communities with lower conductances than those of LPA; that, by minimizing $L$, LPAf may fail to detect a real community structure that does not have the lowest $L$; and that LPAf is generally more stable than LPA. Finally, our analysis of the time complexity of LPAf shows that it depends linearly on the network size in sparse networks.
In future work, we intend to test our algorithm on weighted and directed networks. We also plan to extend our approach to overlapping community detection by allowing each node to possess multiple labels.
Acknowledgements
This work was supported in part by the Program of Introducing Talents of Discipline to Universities under Grant No. B08033, and by the National Natural Science Foundation of China (Grant No. 11505071).
Author contribution statement
J.H. designed the algorithm, implemented the experiments, and prepared all the figures. J.H., L.Z. and Z.S. analyzed the results. All authors wrote, reviewed and approved the manuscript.
Appendix A: The change of average description length when a node moves from one community to another
From Eq. (4), for undirected and unweighted networks, the change in the average description length when a node updates its label from to is given by
with
where is the total number of edges in the network, and are the sets of nodes in communities and , respectively, and is the set of neighbors of . The extension to directed and weighted networks is straightforward.
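As a numerical check on a closed-form expression of this kind, the change can also be obtained by brute force: compute the description length before and after the move and subtract. The sketch below assumes that the average description length is the standard two-level map equation of Rosvall et al. for an undirected, unweighted network, with visit rates p = k/(2m) and module exit rates q_i = (edges leaving module i)/(2m). The function names are hypothetical. The closed form above depends only on m, the two communities involved and the node's neighbours, whereas this sketch recomputes L in O(m) time per move, so it is useful for verification rather than as an efficient update rule.

```python
import math
from collections import defaultdict


def plogp(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def map_equation(adj, labels):
    """Two-level map equation L(M), in bits, for an undirected, unweighted graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())   # 2m = sum of degrees
    exit_edges = defaultdict(int)                     # edges leaving each module
    module_flow = defaultdict(float)                  # sum of p over a module's nodes
    node_term = 0.0
    for v, nbrs in adj.items():
        p = len(nbrs) / two_m                         # stationary visit rate of v
        node_term += plogp(p)
        module_flow[labels[v]] += p
        exit_edges[labels[v]] += sum(1 for u in nbrs if labels[u] != labels[v])
    q = {i: e / two_m for i, e in exit_edges.items()}  # exit rate of each module
    q_total = sum(q.values())
    return (plogp(q_total)
            - 2 * sum(plogp(qi) for qi in q.values())
            - node_term
            + sum(plogp(q[i] + module_flow[i]) for i in module_flow))


def delta_description_length(adj, labels, node, new_label):
    """L after moving node to new_label minus L before, by direct recomputation."""
    moved = dict(labels)
    moved[node] = new_label
    return map_equation(adj, moved) - map_equation(adj, labels)
```

With all nodes in a single module, the exit terms vanish and L reduces to the entropy of the visit rates, which gives a simple sanity check. For two triangles joined by one edge, splitting the graph into the two triangles lowers L, and moving a node across the bridge afterwards gives a positive change.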
References
 (1) M.E.J. Newman, Proceedings of the National Academy of Sciences of the United States of America 98, 404 (2001), 0007214
 (2) J. Scott, Social Network Analysis: A Handbook, Vol. 3 (2000), ISBN 0761963391, http://www.amazon.com/dp/0761963391
 (3) D.A. Fell, A. Wagner, Nat Biotech 18, 1121 (2000)
 (4) S. Fortunato, Physics Reports 486, 75 (2010)
 (5) M.E.J. Newman, M. Girvan, Phys. Rev. E 69, 026113 (2004)
 (6) M.E.J. Newman, Phys. Rev. E 74, 036104 (2006)
 (7) S. White, P. Smyth, A Spectral Clustering Approach To Finding Communities in Graphs, in SDM (2005), citeseer.ist.psu.edu/734075.html
 (8) J. Reichardt, S. Bornholdt, Phys. Rev. Lett. 93, 218701 (2004)
 (9) S.W. Son, H. Jeong, J.D. Noh, The European Physical Journal B - Condensed Matter and Complex Systems 50, 431 (2006)
 (10) J.M. Kumpula, J. Saramäki, K. Kaski, J. Kertész, The European Physical Journal B 56, 41 (2007)
 (11) H. Zhou, R. Lipowsky, in Computational Science - ICCS 2004, edited by M. Bubak, G. van Albada, P. Sloot, J. Dongarra (Springer Berlin Heidelberg, 2004), Vol. 3038 of Lecture Notes in Computer Science, pp. 1062–1069, ISBN 9783540221166, http://dx.doi.org/10.1007/9783540246886_137
 (12) P. Pons, M. Latapy, in Computer and Information Sciences - ISCIS 2005, edited by P. Yolum, T. Güngör, F. Gürgen, C. Özturan (Springer Berlin Heidelberg, 2005), Vol. 3733 of Lecture Notes in Computer Science, pp. 284–293, ISBN 9783540294146, http://dx.doi.org/10.1007/11569596_31
 (13) J.K. Ochab, Z. Burda, Eur. Phys. J. Special Topics 216, 73 (2013)
 (14) V.D. Blondel, J.L. Guillaume, R. Lambiotte, E. Lefebvre, Journal of Statistical Mechanics: Theory and Experiment 2008, P10008 (2008)
 (15) M.J. Barber, Eur. Phys. J. B 86, 385 (2013)
 (16) L. Waltman, N.J. van Eck, Eur. Phys. J. B 86, 471 (2013)
 (17) J. Xiang, X.G. Hu, X.Y. Zhang, J.F. Fan, X.L. Zeng, G.Y. Fu, K. Deng, K. Hu, Eur. Phys. J. B 85, 352 (2012)
 (18) M.E.J. Newman, E.A. Leicht, Proceedings of the National Academy of Sciences 104, 9564 (2007), http://www.pnas.org/content/104/23/9564.full.pdf
 (19) M. Mungan, J.J. Ramasco, Journal of Statistical Mechanics: Theory and Experiment 2010, P04028 (2010)
 (20) W. Ren, G. Yan, X. Liao, L. Xiao, Phys. Rev. E 79, 036111 (2009)
 (21) J.M. Hofman, C.H. Wiggins, Phys. Rev. Lett. 100, 258701 (2008)
 (22) U.N. Raghavan, R. Albert, S. Kumara, Phys. Rev. E 76, 036106 (2007)
 (23) G. Tibély, J. Kertész, Physica A: Statistical Mechanics and its Applications 387, 4982 (2008)
 (24) M.J. Barber, J.W. Clark, Phys. Rev. E 80, 026129 (2009)
 (25) X. Liu, T. Murata, Physica A: Statistical Mechanics and its Applications 389, 1493 (2010)
 (26) S. Fortunato, M. Barthélemy, Proceedings of the National Academy of Sciences 104, 36 (2007), http://www.pnas.org/content/104/1/36.full.pdf
 (27) I.X.Y. Leung, P. Hui, P. Liò, J. Crowcroft, Phys. Rev. E 79, 066107 (2009)
 (28) A. Lancichinetti, S. Fortunato, F. Radicchi, Phys. Rev. E 78, 046110 (2008)
 (29) A. Oades, Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 77, 1 (2008), 0712.1163
 (30) M. Rosvall, D. Axelsson, C.T. Bergstrom, The European Physical Journal Special Topics 178, 13 (2009)
 (31) J. Xie, B. Szymanski, Community detection using a neighborhood strength driven Label Propagation Algorithm, in Network Science Workshop (NSW), 2011 IEEE (2011), pp. 188–195
 (32) M. Chen, T. Nguyen, B.K. Szymanski, On Measuring the Quality of a Network Community Structure, in Social Computing (SocialCom), 2013 International Conference on (2013), pp. 122–127
 (33) B. Bollobas, Modern Graph Theory, Graduate Texts in Mathematics (Springer New York, 1998), ISBN 9780387984889, https://books.google.ca/books?id=SbZKSZ1qrwC
 (34) J. Leskovec, K.J. Lang, A. Dasgupta, M.W. Mahoney, Internet Mathematics 6, 29 (2009)
 (35) D.J. MacKay, Information Theory, Inference and Learning Algorithms (Cambridge University Press, 2003)
 (36) L. Danon, A. Díaz-Guilera, J. Duch, A. Arenas, Journal of Statistical Mechanics: Theory and Experiment 2005, P09008 (2005)
 (37) M. Meilă, Journal of Multivariate Analysis 98, 873 (2007)
 (38) M. Chen, K. Kuzmin, B.K. Szymanski, IEEE Transactions on Computational Social Systems 1, 46 (2014)
 (39) M. Girvan, M.E.J. Newman, Proceedings of the National Academy of Sciences 99, 7821 (2002), http://www.pnas.org/content/99/12/7821.full.pdf
 (40) A. Condon, R.M. Karp, Random Structures & Algorithms 18, 116 (2001)
 (41) W.W. Zachary, Journal of Anthropological Research 33, 452 (1977)
 (42) D. Lusseau, K. Schneider, O. Boisseau, P. Haase, E. Slooten, S. Dawson, Behavioral Ecology and Sociobiology 54, 396 (2003)
 (43) V. Krebs (2008)
 (44) M.E.J. Newman, SIAM Review 45, 167 (2003), http://dx.doi.org/10.1137/S003614450342480
 (45) L.A. Adamic, N. Glance, The Political Blogosphere and the 2004 U.S. Election: Divided They Blog, in Proceedings of the 3rd International Workshop on Link Discovery (ACM, New York, NY, USA, 2005), LinkKDD ’05, pp. 36–43, ISBN 1595932151, http://doi.acm.org/10.1145/1134271.1134277
 (46) D.J. Watts, S.H. Strogatz, Nature 393, 440 (1998)