Opinion Divergence Reveals the Complexity of Community Structure in Networks

Opinion Divergence Reveals the Complexity of Community Structure in Networks

Ren Ren Jinliang Shao University of Electronic Science and Technology of China, Sichuan, China
Abstract

Community structure plays an important role on the analysis of real-world (biological, social, neural) networks. The observation on realistic organizations shows that communities often exhibit hierarchical and overlapping structures. Here, applying a primary DeGroot model in opinion dynamics to the network, we study the divergence of the agents’ opinions in the progress of reaching consensus and then propose a robust and efficient framework of synchronization to detect hierarchical and overlapping communities. The proposed framework and obtained results suggest a set of ”functional definitions” of the hierarchy and overlapping in modular structures, among which the interpretation and necessity of highly overlapping is firstly reported, shedding a light on the study of the complexity of modular networks.

keywords:
Complex Networks, Opinion Dynamics, Community Detection, Hierarchical Structure, Overlapping Structure
journal: Journal Name

1 Introduction

A remarkable feature found in actual observations of complex networks is the structure of clusters, for example, the karate network Zachary (1977), dolphins’ social network Lusseau (2003), the citation networks Martin et al. (2013) are all reported to contain clusters. To acquire better comprehension of the key feature, the concept community structure was proposed Girvan and Newman (2002). Usually, it believes that communities are groups of nodes linked densely inside their own group but weakly with other groups. Giving hints of latent semantic structure, communities reveal functions and organizations of complex systems. However, uncovering community structure (called community detection) in networks is a great challenge.

Without strict definition accepted commonly, many metrics of networks have been proposed to describe and evaluate communities Girvan and Newman (2002); Lancichinetti et al. (2009); Ahn et al. (2010); Ding et al. (2016); Chakraborty et al. (2017). Therein, the modularity of Newman and Girvan Girvan and Newman (2002) is the most exploited. Given these fitness functions, community detection is changed into a optimization problem. Newman optimized modularity exploiting graph spectra theory Newman (2013). Based on the works of Newman, Smyth maps the nodes into the spectral space and then applies traditional clustering methods to them Smyth and White (2005). Besides the modularity, other fitness functions are also widely used to identify communities.

Apart from static optimization, community structure can also be detected by simulating dynamical processes on the network. Random walk is one of the longest studied dynamics so far Pons and Latapy (2006); Newman and Girvan (2004); Tabrizi et al. (2013). In some approaches, with node similarity or edge betweenness measured during the process, the communities are finally determined by the fitness functions such as modularity. Other methods using no metrics of networks have also been developed, hence offer new understandings on community structure Rosvall and Bergstrom (2008, 2007); Mor?rescu and Girard (2011); Arenas et al. (2006); Quiles et al. (2016); Raghavan et al. (2007), among which Morarescu Mor?rescu and Girard (2011) and Arenas Arenas et al. (2006) detect communities by grouping nodes reaching local consensus in the process of synchronization simulation. More detailed, Arenas identifies communities according to the time of convergence, while Morarescu employs a system with time-decaying confidence field. Unfortunately, those approaches relying on local consensus may be unreliable and unstable Fortunato and Castellano (2012); Said and Johansson (2015), as it shown by the authors and other researchers.

It is not enough to reveal communities of one layer merely. Often presenting a hierarchical structure, community structure has its own complexity, where large communities can contain small sub-communities Clauset et al. (2008); Girvan and Newman (2002), for instance, the network of traffic Guimera et al. (2005). The main issue of detecting hierarchical communities is to find meaningful levels in the network after getting a hierarchical dendrogram of partitions. A series of indicators have been employed to determine good levels, but none of them is accepted widely Ahn et al. (2010); Lancichinetti et al. (2009); Jeub et al. (2017).

In addition to hierarchy, another complexity of communities is that one node can belong to more than one group, leading to overlapping communities Palla et al. (2005). Assigning nodes to several groups simultaneously is even more difficult especially when the communities are highly overlapping Lee et al. (2010). One of the mainstream approaches of detecting overlapping community structure is link clustering Ahn et al. (2010); Evans and Lambiotte (2009), which assumes that the links in networks are rarely overlapping. Therefore, there are some arguments on the efficiency of this kind of approaches Fortunato (2010); Fortunato and Hric (2016). Another mainstream approach is the local expansion methods Lee et al. (2010); Lancichinetti et al. (2009); Evans (2010); Whang et al. (2016), relying on empirical fitness functions to decide the range of communities. Which is better, vertex or link clustering? Are the local benefit functions reasonable? The necessity of highly overlapping structures (both highly overlaps on vertices and edges) is still unclear, though Li et.al provide a ”functional definition” of single overlapping nodes in the framework of synchronization Li et al. (2008).

In fact, the emergence of communities in social networks may be relevant to the evolution of the agents’ opinions. A living example is the karate network with two communities reported by Zachary Zachary (1977). Because of the divergence on fees of the members in the karate club, they split into two groups and hence two communities in the network. Intuitively, an agent is much more influenced by agents in the same groups than those in other groups now that intra-community links are dense while inter-community links are weak. Therefore, the divergence of the opinions between different communities will be large, resulting in the emergence of communities in the end. This interesting case inspired us to focus on the divergence of opinion dynamics on networks to explore the community structure of high complexity.

In this paper, we propose a new method detecting hierarchical and overlapping community structure in complex networks based on opinion dynamics. A robust and practical synchronization method are given at first time. Moreover, we offer a functional interpretation of highly overlapping structure, hence highly overlapping structure in network is of necessity. Last but not least, able to find hierarchical and overlapping communities at the same time, we provide new understandings of community structure from the view of opinion dynamics.

2 Definitions and Declarations

2.1 the DeGroot Model

In opinion dynamics, agents interact with their neighbors and update their opinions coded by real numbers according to a set of rules (protocol) to get global consensus. We use the simplest discrete-time DeGroot model in this work. Given a set of agents , denoting their relations by , thus the set of vertices and the set of edges form the graph describing the opinion dynamics system. The opinion of agent at time is denoted by , . and is the vector of opinions. Considering the discrete-time model, the protocol can be simplified to . In this article, we discuss the simple protocol below:

(1)

where is the neighbors of agent , is the weight of the edge connecting and , and is the weighted degree of node . Writing the system in the form of matrix,

(2)
(3)

where and is the diagonal matrix of , . is the Laplacian matrix, and is called random walk normalized Laplacian matrix. Notice that the graph is connected, according to linear algebra theory, satisfies and , then establishes. So,

(4)

2.2 Propositions

We use the model above to simulate the communication of opinions on the network. To describe the divergence of two agents who communicate directly in the procedure of reaching consensus, we define the accumulative opinion distance:

Definition 1.

The accumulative opinion distance from to if :

(5)

is the initial time of calculating the accumulative opinion distance, specially, is the total opinion distance of node and . For the sake of simplicity, we use ”accumulative opinion distance of an edge ” to express the meaning ”accumulative opinion distance between node and ”. When the system reaches a global consensus, the accumulative opinion distance of each edge converge to a constant. In this case, we declare that:

Proposition 1.

An edge with high accumulative opinion distance have more possibility to be intra-community than inter-community.

Once the proposition is plausible, we can get robust communities by removing edges with high accumulative opinion distance iteratively. Since that there is no strict definition of communities, we give an informal explanation from the view of the spectral methods Smyth and White (2005); Zhang et al. (2007).

Let us extend the equation 5 first. Though is not hermitian, it has real eigenvalues.

(6)

where is the diagonal matrix of eigenvalues, , and . is the matrix whose columns are eigenvectors. For the convenience of statement, we arrange from small to large. Let , then

(7)

Let . Notice that is symmetric, .

(8)

We finally get

(9)

then we have

(10)

where is the eigenvalue of from large to small and . , . Further, we assume that the initial opinions of the agents are stochastic and independent with each other in our simulation. For convenience, we assume .

Since that is positive for all , equals

(11)

where represents the expectation of the random variable . Now we can define normalized accumulative opinion distance only decided by the graph :

(12)

Scott White and Padhraic Smyth embed the nodes into a Euclidean space of dimension using ’s eigenvectors of top eigenvalues ( is the number of communities), each node corresponds to a dimensional vector. After that they employ traditional clustering methods such as K-means to cluster the embedded nodes hence get communities.

As shown in our equation 12, when is large, is probably large, which means node has a long distance with node in the Euclidean space, they are more likely to belong to different clusters.

2.3 A Further Comprehension

To acquire a further understanding of the eigenvectors, we construct a simple model of opinion dynamics, seeing communities as groups of agents with similar opinions.

Only considering non-overlapping communities, the two nodes of can belong to two communities at most. Let and represent the two communities, and the vector indicate the preference of the nodes for communities. means that node prefers , whereas means prefers to . Note that it does not mean we can only deal with the graphs with two communities, since only indicates the preference not belonging. In fact, nodes can be divided into multi-clusters based on the solutions of a series of two-side problems.

In the model there are agents with different opinions, they communicate with others via the edges, giving and receiving influence on opinions. Each agent spreads its influence, i.e., its expectation or hope for others’ opinions according to its own. The model is realistic in the situation that everyone has little information about others. Obviously, can represent the strength of the interaction from agent to . Let be the influence passes to . Considering the meanings of , , and , we have:

(13)

We can simply choose the form of . However, to avoid the influence of different quantity of interactions, we normalize it,

(14)

Consider the expression , where is the vector of opinions, and let be the element of , we have . which is the sum of others’ influence on node omitting the self-loops. Hence we get a linear system, whose balanced state suggests that each node holds an opinion consistent with all others’ expectation, just like the construction of the concept eigenvector centrality of a node Bonacich (1972). , thus satisfies , where is the eigenvector of , , is the matrix of , each column of which indicates a stationary state of the system. For the convenience of our statement, we arrange from large to small, so , .

Based on the model and the expressions above, we rewrite the equation 12,

(15)

One can believe that is the weighted total preference to and in state, thus the term reflects the unbalance of community sizes under partition. Observing the expression 15, when the opinion distance between and is large, must be large for some ’s, which indicate that in our model of community structure, the preference or possibility of node and node to be in the different communities is large. We can declare that:

In our model of community structure, Proposition 1 is reasonable. Moreover, as seen in the formula 15, the more unbalanced the communities’ sizes are, the higher the edges’ opinion distance is.

3 Algorithms

3.1 Detecting Communities

In social relationships, small gaps between of the opinions of the agents in the same party or group also exist while the gaps between different communities are sharp. So we not only study the opinion distance of the edges but also the difference with the former link’s when sorted in ascending order. We can detect communities of complex networks by removing the edges of large opinion distance iteratively. Now we state our algorithm as follows:

  1. Remove the hanging edges recursively.

  2. Initiate randomly, then calculate and the accumulative of each edge according to formula 1 and 5. Repeat the simulation for some times and average the opinion distance. For the stability of the algorithm, update synchronously.

  3. Sort the edges by their accumulative opinion distance in an ascending order, mark the edges on the ”steep tail” of opinion distance.

  4. Delete the marked edges in a descend order if the edge is not a hanging edge after previous removal. Put the part falling off from the network aside.

  5. Repeat the steps above until one of the terminating conditions stated later holds. The connected components we get are communities.

In the early period of simulation, the opinions of agents shake sharply and even depart from the mean value, will avoid the influence of the shake, thus improve the algorithm. The phrase ”steep tail” suggests the accumulative opinion distance of a link is large and much larger than that of the link before in the sorted sequence, see Fig. (b)b. When the intra-community links are very weak after removal, the standard deviation of the converged opinions will be quite large (at least two orders of magnitude larger than before). Iterate for several times more, if there is no subgraph is split, the operation should be terminated. In this situation, the removed links are those with small-degree nodes at the periphery, as the term in formula 12 indicates. We give two terminating conditions for our algorithm.

1. There is no ”steep tail” in the sorted edges. See Fig. (b)b.

Or,

2. the variance of opinions increases sharply but no new connected component subgraph emerges. See Fig. (c)c.

(a) 2 components of Dolphins
(b) opinion distance in iterations
Figure 1: Dolphins are divided into 2 connected components after 3 iterations, orange edges are marked in the first iteration, blue ones in the second iteration, in the last iteration no edges are marked.

We emphasize that the time of updating the agents’ opinions is limited, since it is unnecessary to make the agents converge completely. still, we suggest that the simulation should keep until it has reduced two orders of magnitude of the opinions’ standard deviation at least.

In order to explain our algorithm more clearly. We take the famous Dolphin social network Lusseau (2003) as an example (Fig. 1). As it shown, the dolphins are divided into two groups corresponding to the ground truth. The orange edges are marked in the first iteration, since the opinion distance between node and is less than the opinion distance between node and , node belongs to the left community. And the blue edges are marked in the second iteration, node belongs to the right community. Fig. (b)b shows the ”steep tail” phenomenon, in which we can see that the yellow line of the third iteration has no ”tail”.

3.2 Hierarchical Structure

Most of the published algorithms detect hierarchical communities by adjusting the parameters of the fitness functions in the algorithms. However, the hierarchy is natural in our framework. From top to down, nested communities are identified just by applying our approach to the big subgraphs generated at last level until one of the terminating conditions mentioned above is satisfied. Considering how the hierarchical communities are identified, it is believed that the communities at each level all report a consensus of different extent.

3.3 Overlapping Communities

Overlaps belong to several communities at the same time, in other words, they do not really belong to any group. We found that when the overlapping nodes are connected to each other, the opinion distance between connected overlaps is lower than their distance with main parts of communities, hence some small group of nodes will emerge during edge removal. The nodes in the groups are often weakly linked but have much more external edges. Just opposite to real communities, we call these small groups ”pseudo communities”. By studying the pseudo communities, dense overlaps can be revealed. As for single overlapping vertices, they should have similar opinion distance to more than one community.

To judge the component is pseudo or real, we simply use the proportion , where is the external degree of component , and is the internal degree of . We also design a score describing the preference of single nodes and pseudo communities to real communities to determine the assignments of overlaps. The preference of singles:

(16)

where is a component, is the set of the neighbors of node , and is the initial accumulative opinion distance between and before iterations of edge removal. The denominator makes the score satisfy the normalization condition . Now we can assign preference of each node to all the communities. The additional steps for overlapping community detection:

  1. Calculate for every components we get in non-overlapping community detection, and mark components of small size and high (at least larger than ) as ”pseudo communities” .

  2. For every nodes in the pseudo communities, calculate their preference score to the real communities they connect. If there is any node only connected to pseudo communities, assign its scores by the mean preference scores to other real communities of its component.

  3. For each pseudo community, calculate the mean preference scores of their nodes, and let the average score be the small component’s preference to real communities. Simply, let the preference of a real community to itself be and others .

  4. Calculate for vertices in real communities. If is pseudo, convert the score to preference to real communities by multiplying the fake community’s preferences.

  5. With the preference of each node to all the real communities obtained, remove the rather small scores, for example, less than , then re-normalize the scores so that they sum up to .

4 Results

4.1 Real-world Networks

To demonstrate the effectiveness of our methods on the real-world networks, we test our algorithm on Karate Zachary (1977), Dolphins Lusseau (2003), Football (corrected) Newman and Girvan (2004); Evans (2010). See our results on Dolphins, Karate, Football and Lesmis in Fig. 1-4.

Our method gets completely correct result on Dolphins, almost correct partitions with only one misplacement on Karate, while other algorithms tend to get more communities Blondel et al. (2008); Rosvall et al. (2009); Raghavan et al. (2007); Tabrizi et al. (2013). We detect the communities consisting of non-independent nodes correctly in Football. Besides the dense communities, there is a ”loose” community in the corrected Football network, composed of only five independent nodes.

By the contrast of the networks before and after edge removal, it is obvious that we delete the intra-community links without destroying the components. It is convincing that the opinion distance can embody the relationships between communities in real-world networks, so that the approach unveils community structure efficiently exploiting the dynamics on the network merely.

(a) original Karate network
(b) detected components of Karate
Figure 2: The results of the Karate, the nodes’ colors in (a) show the ground truth, and the colors in (b) show the partitions of the BGLL algorithm Blondel et al. (2008).
(a) corrected Football network
(b) identified components of Football
Figure 3: Results on the corrected Football network-nodes of the same color are in the same community. Note that the white nodes are independents.
(a) original weighted Lesmis network
(b) identified components of Lesmis
Figure 4: Results on the Lesmis, where the nodes’ colors show the partitions of BGLL algorithm Blondel et al. (2008).

4.2 Synthetic Networks

LFR benchmarks Lancichinetti and Fortunato (2009); Lancichinetti et al. (2008) based on the statistical property of complex networks such as power-law degree distribution are widely used to evaluate the performance of community detection algorithms. The key parameter determine the complexity of community detection is the mixing parameter , which is the fraction of a node’s edges shared with members of other groups. Using generally accepted Normalized Mutual Information (NMI Fortunato (2010)) as the performance measure, we generated four LFR networks with as benchmarks. The detailed parameters of the benchmarks are displayed in Table 1, and the comparison between ours and several other famous methods is shown in Table 2.

Networks
LFR1 0.1 1000 20 2.5 1.5 20 200 13
LFR2 0.3 1000 20 2.5 1.5 20 200 20
LFR3 0.5 1000 30 2 1.2 40 200 11
LFR4 0.6 1000 30 2 1.2 40 200 13
Table 1: The parameters of 4 LFR benchmark graphs. : the mixing parameter, : number of nodes, : average degree of nodes, : negative exponent of degree’s power:law distribution, : negative exponent of the community size’s power-law distribution, : the minimum size of communities, : the maximum size of communities, : number of communities.
Our Method Infomap Rosvall et al. (2009) Louvain Blondel et al. (2008) WalkTrap Pons and Latapy (2006)
C Iter NMI C NMI C NMI C NMI
LFR1 13 1 1.00 13 1.00 13 1.00 13 1.00
LFR2 20 2 1.00 20 1.00 20 1.00 20 1.00
LFR3 11 3 1.00 11 1.00 11 1.00 11 1.00
LFR4 13 7 0.88 425 / 13 0.98 13 0.95
Karate 2 2 0.84 3 0.70 4 0.59 2 0.73
Dolphins 3 2 1.00 4 0.50 4 0.48 2 0.82
Table 2: Comparison of several methods on some networks. C: number of detected communities; Iter: number of iterations; NMI: normalized mutual information.

The comparison between our algorithm and other recognized approaches shows that our method is robust and efficient. Moreover, it is noteworthy that our methods is excellent to reveal the true relationships in real-world social networks.

Given the efficiency of our algorithm on real-world and synthetic networks, we propose a novel understanding of communities in complex networks:
A community is a group of agents easy to get local agreement but have significant divergence with agents in other groups in the process of communication.

4.3 Highly Overlapping Networks

Testing our algorithms on some real-world networks, we found some small loose groups, which are identified as connected overlapping nodes. Hence the links in the fragments are overlapping if the two end points have common preference for communities. Our results on the Facebook relationship network of Caltech Traud et al. (2011) (Fig. (a)a) show that this situation is not unusual. We detected communities in the network, whose extent of overlapping is displayed via the nodes’ colors. The deeper a node’s color (both warm and cold) is, the more communities the node belongs to. The nodes belong to communities at most and in average, thus the network is highly overlapping.

Note the community in warm colors emphasized at the right-upper corner of the figure, most of its agents are overlaps. It breaks the general belief that the overlaps are present at the frontier of communities. However, this pattern is not hard to understand. For instance, with the development of the interdisciplinary, a new independent discipline may appear. Concisely, union of the overlaps may lead to new communities. In fact, Yang and Leskovec reported the presence of dense overlaps and the indistinguishability from community cores Yang and Leskovec (2014).

Another interesting structure we found in the Facebook network is the triangle of two overlaps and a node only connecting to them. The two overlaps both belong to communities while the node with only links (specially in black for emphasis) belongs to communities due to the third node receive information from more than groups via the overlaps. It is also acceptable to view the triangle as a independent group when the connection between the three are strong enough. The two structures discussed above reflect the complexity of overlapping structure in networks.

(a) Communities detected in Caltech
(b) Comparison of community sizes
Figure 5: (a): overlapping communities detected in the Facebook100-Caltech network. The colors get deeper with the extent of overlapping increase. The community in warm colors are extremely highly overlapping. The black node only connected with two overlaps belongs to communities. (b): Our method gets 880 communities on the citation network, the GCE algorithm gets 1539 communities using 3-cliques, 1064 communities using 4-cliques.

For comparison, the GCE method Lee et al. (2010) gets communities with default parameters. The two more communities are small, with 17 and 38 nodes. The nodes belong to communities in average. We ignored the two smallest communities, viewing all their nodes as overlaps. The MOSES McDaid and Hurley (2010) method acquires communities with mean overlapping groups for each node.

Apart from dense networks, our algorithm has superior on sparse networks with small communities. On the citation network reported by Newman in 2001 Newman (2001), we detected communities with 3 members and communities with 4 members. To get triangular groups, clique expansion methods must use 3-cliques as seeds for expansion, causing the loss of accuracy. It is recommended to use 4-cliques as seeds in some articles Lee et al. (2010). See the sizes of detected communities in Fig. (b)b for details.

In the experiments, we found some interesting overlapping structures in real-world networks. Our results on some long studied social networks suggest that highly overlaps of both vertices and edges are significant and common. We give a functional interpretation for highly overlapping structure from the view of opinion dynamics:

The overlapping agents connected together have less divergence with each other, but more independence from main groups. The interaction of overlaps partly causes the complexity of overlapping structures in networks.

4.4 Hierarchical Networks

To validate our algorithm on networks with nested communities, we applied it to the real-world social network Jazz Gleiser and Danon (2003). The splits of graphs were at the 2nd, 8th and 14th iteration (), and no other fallen subgraphs except several triangles weak linked with the main part of large communities. Since the amount of iterations between divisions are negligible, it is reasonable to see each split as one level. See the results on Fig. 6.

(a) Jazz
(b) Final results
(c) Detected structure
Figure 6: (a) The ranking colors of links denote the order of removal. The redder, the fewer iterations; the bluer, the more iterations before removal. (b) The nodes in deep red are connected overlaps, the accumulative opinion distance between them are lower the that between the nodes outside their small group since communities scramble for overlaps. (c) The detected hierarchical structure of Jazz, the numbers are sizes of communities.

As for the synthetic hierarchical networks, we introduce a hierarchical benchmark proposed by Fortunato et al Jeub et al. (2017). For authority, we generated a benchmark graph with suggested parameters whose structure is displayed in Fig. 7.

After 5 iterations, the standard deviation of the converged opinions increased from initial to . The smallest community at the high level was separated. We put the small subgraph aside, and the remaining graph was divided into two groups just at the next iteration. Therefore, the three communities we got so far are of the same level. Continuing dealing with the three subgraphs independently, we eventually got all the communities of the lower level, shown in Fig. 8. Fig. (c)c shows the final state of the small communities, where the standard deviation of the opinions after communication is large ( ), while the subgraph is not divided. In this situation, we should terminate the detection, hence there are only two layers of communities.

(a) The structure of the benchmark graph
(b) Orginal benchmark graph
Figure 7: The details of the benchmark graph with 1000 nodes and 14334 edges. The degree of nodes is between minimum 5 and maximum 70, subjecting to power-law distribution with exponent 2.
(a) First group was sperated
(b) Final results
(c) Terminating state
Figure 8: (a) The communities of the upper level. (b) The communities of the lower level. (c) Applying the algorithm on one of the small communities, the nodes appear radially. There is no sub-groups any more.
Figure 9: Some histograms of edges and opinion distance at two levels after each iteration. The edges with zero opinion distance are removed or hanging. Only hanging edges are removed at Iteration 0.

We found out all the communities of all levels with few misplaced nodes, revealing the hierarchy of the network naturally in our framework. For a better understanding of hierarchical structure, we drew the histograms of all kind of edges in some iterations in Fig. 9 now that the communities is acquired. The data on the horizontal axis is the accumulative opinion distance (), and the data on the vertical axis is the number of links belonging to the small intervals of opinion distance. As it shown clearly, the links lying between higher level of communities have larger median opinion distance than links lying between lower communities, while the internal edges have smallest mid-value of opinion distance.

The difference shown in the histograms, on the one hand, is an evidence of the validity of the proposed method, edges between high-level communities are prior to be removed. On the other hand, it provides a new comprehension of hierarchy:

Hierarchy reflects the extent of opinion divergence, or in other words, the difficulty of reaching consensus at distinct levels, that is, communities at different levels have different extent of inner consensus and external divergence. More specifically, the opinion divergence between upper communities are deeper than that between lower communities, thus a great divergence between lower groups may be a small divergence relatively in an upper group. It is in accordance with our experience on government organizations, knowledge systems, etc.

5 Discussion

5.1 Time Complexity

It is quite complex to analyze the time complexity of our algorithms accurately. Generally, the time complexity of our algorithm is , where is the number of iterations, is the time of opinion convergence, is the number of repeats of the simulation to reduce the influence of random, determined by users, is the number of the links in the network, and is the number of nodes (the time complexity of removing hanging edges, detecting connected components, calculating preference scores are all and sorting the edges costs time). is related to the topology and the extent of convergence, to estimate is a challenge. In our practice, we set as a constant for the adequacy of convergence. The number of iterations partly depends on the parameter of the network and the policy of the choice of links to be removed. Let be the fraction of removed edges in average at each iteration, . because of the terminating condition 1. It is believed that the complexity of the algorithm is where is a constant.

5.2 Discussion and Future Works

Presenting a novel method for detecting communities, we provide a new understanding of community structure and its hierarchy from the view of opinion dynamics. Without any external fitness index, the algorithm reveals communities naturally, offering an evidence of community structure in complex networks. And by changing the protocols of opinion dynamics system, this set of approaches can deal with networks with different communication rules. Furthermore, the idea can be used for disease control and network defense now that the key connections are found. In the future, we will study the difference of community structure in networks of different functions.

Extending our algorithm to overlapping community detection, we show that highly overlaps in networks are significant and natural. More importantly, we give a functional definition of highly overlapping structures, shedding some light on the origin of overlaps in networks. Offering some evidence on the presence of overlapping edges, we suggest that link clustering methods assuming that links are rarely overlapping may need further study.

We stress that our approaches are not only able to work independently, but can assist to improve the performance of other methods. Now that the two nodes of high opinion distance are probably to be in different communities, they can be seen as ”guards” to prevent unnecessary merging and expansion in community detection. Moreover, communities are enhanced now that the external links removed much more the internal links.

6 Conclusion

In this paper, we studied the divergence of agents in the process of reaching a global agreement in opinion dynamics. The divergence between communities are much greater than that inside. Exploiting this point, we developed a novel and efficient method to detect hierarchical communities. The practice on real-world networks and synthetic networks testifies that the algorithm can reveal communities conforming to the ground truth robustly. The strength of our method is that it determines hierarchical communities at significative levels naturally. Using no metrics and benefit functions, our innovative approach provides an appealing interpretation of community structure and its hierarchy: A community is a group composed of nodes with light divergence, but diverges greatly with other communities. Furthermore, levels of divergence reveal the hierarchy of communities in complex networks.

Calculating the preference of a vertex to all the communities based on the divergence, we detect highly overlapping communities with a low additional cost. More importantly, we give a functional interpretation for highly overlapping nodes and links from the perspective of opinion dynamics-the opinion divergence between overlaps is lower than that from main groups. The overlapping structure in complex networks is significant and essential.

Unified singly, highly overlapping and hierarchy, we provide a complete framework to understand and identify the complexity of community structure from the prospective of opinion dynamics, unveiling the sociological significance of communities in complex networks.

References

  • Zachary (1977) W. W. Zachary, An Information Flow Model for Conflict and Fission in Small Groups, Journal of Anthropological Research 33 (1977) 452–473.
  • Lusseau (2003) D. Lusseau, The emergent properties of a dolphin social network, Proceedings of the Royal Society B: Biological Sciences 270 (2003) S186–S188.
  • Martin et al. (2013) T. Martin, B. Ball, B. Karrer, M. E. Newman, Coauthorship and citation patterns in the Physical Review, Physical Review E 88 (2013) 1–10.
  • Girvan and Newman (2002) M. Girvan, M. E. J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences 99 (2002) 7821–7826.
  • Lancichinetti et al. (2009) A. Lancichinetti, S. Fortunato, J. Kertész, Detecting the overlapping and hierarchical community structure in complex networks, New Journal of Physics 11 (2009).
  • Ahn et al. (2010) Y. Y. Ahn, J. P. Bagrow, S. Lehmann, Link communities reveal multiscale complexity in networks, Nature 466 (2010) 761–764.
  • Ding et al. (2016) Z. Ding, X. Zhang, D. Sun, B. Luo, Overlapping Community Detection based on Network Decomposition, Scientific Reports 6 (2016) 1–11.
  • Chakraborty et al. (2017) T. Chakraborty, A. Dalmia, A. Mukherjee, N. Ganguly, Metrics for Community Analysis, ACM Computing Surveys 50 (2017) 1–37.
  • Newman (2013) M. E. J. Newman, Spectral methods for community detection and graph partitioning, Physical Review E 88 (2013) 042822.
  • Smyth and White (2005) S. Smyth, S. White, A spectral clustering approach to finding communities in graphs, Proceedings of the 5th SIAM International Conference on Data Mining (2005) 76–84.
  • Pons and Latapy (2006) P. Pons, M. Latapy, Computing Communities in Large Networks Using Random Walks, Journal of Graph Algorithms and Applications 10 (2006) 191–218.
  • Newman and Girvan (2004) M. E. J. Newman, M. Girvan, Finding and evaluating community structure in networks, Physical Review E 69 (2004) 26113.
  • Tabrizi et al. (2013) S. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, Personalized pagerank clustering: A graph clustering algorithm based on random walks, Physica A: Statistical 392 (2013) 5772–5785.
  • Rosvall and Bergstrom (2008) M. Rosvall, C. T. Bergstrom, Maps of random walks on complex networks reveal community structure., Proceedings of the National Academy of Sciences 105 (2008) 1118–23.
  • Rosvall and Bergstrom (2007) M. Rosvall, C. T. Bergstrom, An information-theoretic framework for resolving community structure in complex networks, Proceedings of the National Academy of Sciences 104 (2007) 7327–7331.
  • Mor?rescu and Girard (2011) I. C. Mor?rescu, A. Girard, Opinion dynamics with decaying confidence: Application to community detection in graphs, IEEE Transactions on Automatic Control 56 (2011) 1862–1873.
  • Arenas et al. (2006) A. Arenas, A. Díaz-Guilera, C. J. Pérez-Vicente, Synchronization Reveals Topological Scales in Complex Networks, Physical Review Letters 96 (2006) 114102.
  • Quiles et al. (2016) M. G. Quiles, E. E. N. Macau, N. Rubido, Dynamical detection of network communities, Scientific Reports 6 (2016) 1–10.
  • Raghavan et al. (2007) U. N. U. Raghavan, R. Albert, S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Physical Review E 76 (2007) 1–11.
  • Fortunato and Castellano (2012) S. Fortunato, C. Castellano, Community structure in graphs, Computational Complexity: Theory, Techniques, and Applications 9781461418 (2012) 490–512.
  • Said and Johansson (2015) I. Said, A. Johansson, Multi-agent Approach to Community Detection in Complex Networks, Ph.D. thesis, 2015.
  • Clauset et al. (2008) A. Clauset, C. Moore, M. E. J. Newman, Hierarchical structure and the prediction of missing links in networks, Nature 453 (2008) 98–101.
  • Guimera et al. (2005) R. Guimera, S. Mossa, A. Turtschi, L. A. N. Amaral, The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles, Proceedings of the National Academy of Sciences 102 (2005) 7794–7799.
  • Jeub et al. (2017) L. G. S. Jeub, O. Sporns, S. Fortunato, Multiresolution Consensus Clustering in Networks (2017) 1–15.
  • Palla et al. (2005) G. Palla, I. Derényi, I. Farkas, T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature 435 (2005) 814–818.
  • Lee et al. (2010) C. Lee, F. Reid, A. McDaid, N. Hurley, Detecting highly overlapping community structure by greedy clique expansion (2010) 33–42.
  • Evans and Lambiotte (2009) T. S. Evans, R. Lambiotte, Line graphs, link partitions, and overlapping communities, Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 80 (2009) 1–9.
  • Fortunato (2010) S. Fortunato, Community detection in graphs, Physics Reports 486 (2010) 75–174.
  • Fortunato and Hric (2016) S. Fortunato, D. Hric, Community detection in networks: A user guide, 2016.
  • Evans (2010) T. S. Evans, Clique graphs and overlapping communities, Journal of Statistical Mechanics: Theory and Experiment 2010 (2010) 1–21.
  • Whang et al. (2016) J. J. Whang, D. F. Gleich, I. S. Dhillon, Overlapping Community Detection Using Neighborhood-Inflated Seed Expansion, IEEE Transactions on Knowledge and Data Engineering 28 (2016) 1272–1284.
  • Li et al. (2008) D. Li, I. Leyva, J. A. Almendral, I. Sendiña-Nadal, J. M. Buldú, S. Havlin, S. Boccaletti, Synchronization interfaces and overlapping communities in complex networks, Physical Review Letters 101 (2008) 2–5.
  • Zhang et al. (2007) S. Zhang, R.-S. Wang, X.-S. Zhang, Identification of overlapping community structure in complex networks using fuzzy -means clustering, Physica A: Statistical Mechanics and its Applications 374 (2007) 483–490.
  • Bonacich (1972) P. Bonacich, Factoring and Weighting Approaches to Clique Detection, Journal of Mathematical Sociology 2 (1972) 113–120.
  • Blondel et al. (2008) V. D. Blondel, J. L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (2008).
  • Rosvall et al. (2009) M. Rosvall, D. Axelsson, C. T. Bergstrom, The map equation, European Physical Journal: Special Topics 178 (2009) 13–23.
  • Lancichinetti and Fortunato (2009) A. Lancichinetti, S. Fortunato, Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities, Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 80 (2009) 1–8.
  • Lancichinetti et al. (2008) A. Lancichinetti, S. Fortunato, F. Radicchi, Benchmark graphs for testing community detection algorithms, Physical Review E 78 (2008) 046110.
  • Traud et al. (2011) A. L. Traud, P. J. Mucha, M. A. Porter, Social Structure of Facebook Networks, SSRN Electronic Journal 391 (2011) 4165–4180.
  • Yang and Leskovec (2014) J. Yang, J. Leskovec, Structure and Overlaps of Ground-Truth Communities in Networks, ACM Transactions on Intelligent Systems and Technology 5 (2014) 1–35.
  • McDaid and Hurley (2010) A. McDaid, N. Hurley, Detecting highly overlapping communities with model-based overlapping seed expansion, in: 2010 International Conference on Advances in Social Network Analysis and Mining, pp. 112–119.
  • Newman (2001) M. E. J. Newman, From the Cover: The structure of scientific collaboration networks, Proceedings of the National Academy of Sciences 98 (2001) 404–409.
  • Gleiser and Danon (2003) P. Gleiser, L. Danon, Community Structure in Jazz, Advances in Complex Systems 06 (2003) 565–573.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
148395
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description