On Rich Clubs of Path-Based Centralities in Networks

Soumya Sarkar, Indian Institute of Technology Kharagpur, soumya015@iitkgp.ac.in; Sanjukta Bhowmick, University of Nebraska Omaha, sbhowmick@unomaha.edu; and Animesh Mukherjee, Indian Institute of Technology Kharagpur, animeshm@cse.iitkgp.ernet.in
Abstract.

Many scale-free networks exhibit a “rich club” structure, where high degree vertices form tightly interconnected subgraphs. In this paper, we explore the emergence of “rich clubs” in the context of shortest path based centrality metrics. We term these subgraphs of connected high closeness or high betweenness vertices rich centrality clubs (RCC).

Our experiments on real world and synthetic networks highlight the inter-relations between RCCs, expander graphs, and the core-periphery structure of the network. We show empirically and theoretically that RCCs exist if the core-periphery structure of the network is such that each shell is an expander graph and the density of the shells decreases from the inner to the outer shells.

We further demonstrate that in addition to being an interesting topological feature, the presence of RCCs is useful in several applications. The vertices in the subgraph forming the RCC are effective seed nodes for spreading information. Moreover, networks with RCCs are robust under perturbations to their structure.

Given these useful properties of RCCs, we present a network modification model that can efficiently create an RCC within networks where one is not present, while retaining other structural properties of the original network.

The main contributions of our paper are: (i) we demonstrate that the formation of RCCs is related to the core-periphery structure, and particularly to the expander-like properties of each shell, (ii) we show that the RCC property can be used to find effective seed nodes for spreading information and for improving the resilience of the network under perturbation and, finally, (iii) we present a modification algorithm that can insert RCCs into networks without affecting their other structural properties. Taken together, these contributions present one of the first comprehensive studies of the properties and applications of rich clubs for path based centralities.

1. Introduction

In many social networks, the high degree nodes form a densely connected subgraph. This is known as the “rich club” phenomenon. In this paper, we extend the definition of rich clubs from high degree vertices to shortest path based centralities, particularly high betweenness and high closeness centrality vertices. We term these extended rich clubs rich centrality clubs (RCC). We present the global topological properties that lead to the formation of RCCs in complex networks (sections 4 and 7). We further show how RCCs can be leveraged for spreading information efficiently and for increasing network resilience (section 5). Finally, we present a network modification algorithm to create RCCs in a network without disturbing other structural properties of the original network (section 6).

Our study is motivated by the fact that over the last few years several papers (Shin et al., 2016, 2017; Li et al., 2015; Kitsak et al., 2010) have independently reported that vertices in the inner shells of networks can be leveraged to identify high centrality nodes or serve as seeds for community detection. However, each paper focused on only one type of analysis, and there was rarely any overlap between the networks studied in these papers. When we conducted an integrated study over a large set of real-world and synthetic networks, we observed that the reported properties of the vertices in the inner shells hold only for certain types of networks. This observation led us to investigate the topological properties of networks whose inner shells contain high centrality nodes.

We observed that the inner shells of networks are typically dense; thus, if they contain high centrality nodes, then by virtue of being dense, these cores form a rich centrality club. However, unlike degree, which is a local property, closeness and betweenness centralities are based on shortest paths and are therefore global properties. Building on this observation, we demonstrate that networks with RCC also exhibit a global pattern. Specifically, each shell is an expander graph, and going from the inner to the outer shells, the shells have gradually decreasing density. In other words, visually and quantitatively, networks with RCC expand out from a dense inner core to sparse outer shells (see Figure 1).

The presence of rich centrality clubs confers several favorable properties on the networks. In particular, due to the presence of many high path based centrality vertices within a small subgraph, the vertices in the RCC can be effective seed nodes for quickly spreading information across the network. Moreover, similar to the traditional rich club, the presence of RCC increases the resilience of the networks under edge perturbations.

Given these favorable properties, we posit that, in many cases, the presence of RCC is desirable. To this end, we propose a modification model that can form an RCC in a network where it is absent. Our model keeps other properties of the original network, including the power law exponent, the average degree, shortest path based centralities and the clustering coefficient, unchanged.

The key contributions of our paper are as follows.

  • We study the formation of rich clubs of shortest path based centralities in complex networks and observe that their presence can lead to faster identification of high centrality nodes and communities. We demonstrate theoretically and empirically that in networks containing RCC, the shells are expander-like and the density of the shells decreases from the inner to the outer shells (sections 4 and 7).

  • We empirically show that networks containing RCC have several favorable properties (section 5). Specifically, the vertices within the RCC are effective seed nodes for information spreading and networks containing RCC are resilient to perturbations to their edges.

  • We propose a modification model that can insert RCC into a network while maintaining other structural properties of the original network (section 6). Our model is reversible: when its operations are applied in reverse (deletion instead of addition of edges), the RCC can be removed from a network, again while maintaining the other structural properties. Our model requires only the degrees of the vertices, which are much cheaper to compute than the betweenness and closeness centralities.

Figure 1. Comparing the core-periphery structure of two networks, visualized with the method of (Alvarez-Hamelin et al., 2006). Left: the network of software dependencies demonstrates the presence of RCC; its shells are arranged as concentric circles. Right: the network of protein interactions in yeast does not demonstrate the presence of RCC; its innermost cores are not at the center of the network.

2. Definitions and Datasets

We briefly describe the definitions of the network properties and the test suite used for the experiments.

2.1. Definitions of network properties

Definition 1 ($k$-core).

$k$-core: Given a graph $G = (V, E)$, where $V$ and $E$ are the sets of vertices and edges, a $k$-core is a maximal subgraph $H$ of $G$ such that each node in $H$ has degree at least $k$ within $H$.

Definition 2 ($k$-shell).

$k$-shell: Given a graph $G = (V, E)$, a $k$-shell is the induced subgraph over the maximal set of nodes $S_k \subseteq V$ such that (1) the $k$-shell does not include nodes from any higher shell, and (2) each node in the $k$-shell has at least $k$ connections to nodes in the $k$-shell or the higher shells.

The core number of a node $v$ is the highest value $k$ such that $v$ is part of a $k$-core. The $k$-core decomposition (Govindan et al., 2017) is the assignment of core numbers to nodes. Core numbers can be computed in $O(|V| + |E|)$ time.

To find a $k$-core, the $k$-core decomposition algorithm recursively removes nodes with degree less than $k$. We denote the set of all nodes belonging to the $k$-core by $C_k$. Note that if $G$ is connected then $C_1$ is equivalent to $V$. Subsequent inner cores are part of the outer cores, i.e., $C_{k_{\max}} \subseteq \cdots \subseteq C_2 \subseteq C_1$.

Note that the $k$-core is the induced subgraph of the union of the $i$-shells for $i \ge k$. Hence the shells of the graph partition the set $V$ into disjoint sets of vertices.
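The recursive removal described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: it assumes the graph is an undirected edge list with comparable node labels, and it uses a binary heap, so it runs in $O(|E| \log |V|)$ rather than the linear-time bucket variant. The helper names `core_numbers` and `shells` are hypothetical.

```python
import heapq
from collections import defaultdict

# Illustrative sketch; function names are hypothetical, not taken from the paper.


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops do not count towards the degree here
            adj[u].add(v)
            adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    removed, core, k = set(), {}, 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != degree[v]:
            continue  # stale entry: v was already peeled or its degree dropped
        k = max(k, d)  # the peeling level never decreases
        core[v] = k
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
    return core


def shells(core):
    """Group nodes by core number: {k: set of nodes in the k-shell}."""
    result = defaultdict(set)
    for v, k in core.items():
        result[k].add(v)
    return dict(result)
```

For a triangle with one pendant vertex attached, the three triangle vertices fall in the 2-shell and the pendant vertex in the 1-shell.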

Definition 3 (Expansion property).

Expansion property (Malliaros and Megalooikonomou, 2011): Given a graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges, the expansion of a set of nodes $S \subset V$ is a function of the number of nodes in $V \setminus S$ to which $S$ is connected. That is, if $N(S)$ is the set of nodes in $V \setminus S$ to which $S$ is connected, then the expansion of $S$ is $\frac{|N(S)|}{|S|}$.

A graph with a high value of expansion is known as an expander graph. In an expander graph, any subset $S$ (where $|S| \le |V|/2$) has many neighbors. The expansion of a graph can be measured using the Cheeger constant $h(G)$. A low Cheeger constant indicates a “bottleneck”, i.e., the graph can be partitioned by removing very few edges. A high Cheeger constant indicates that no matter how the graph is partitioned, the number of edges across the partitions is always large.

Exact computation of the Cheeger constant is NP-hard. For $d$-regular graphs it can be approximated by the second smallest eigenvalue of the normalized graph Laplacian $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$, where $A$ is the adjacency matrix and $D$ the diagonal degree matrix. The second smallest eigenvalue $\lambda_2$, also known as the eigengap, is related to the Cheeger constant by Cheeger's inequality, $\frac{\lambda_2}{2} \le h(G) \le \sqrt{2\lambda_2}$. It was shown in (Chung, 1996) that the lower bound also holds for general graphs.
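To make the eigengap and Cheeger's inequality concrete, the sketch below builds $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$ for a toy graph with NumPy, reads off $\lambda_2$, and computes the Cheeger constant by exhaustive search. As an assumption, it uses the volume-normalized form of the Cheeger constant (conductance), $h(G) = \min_S \frac{|E(S, \bar S)|}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar S))}$, which is the form paired with the normalized Laplacian. The exhaustive search is exponential, which is exactly why the eigengap serves as a proxy; function names are hypothetical.

```python
import itertools

import numpy as np

# Illustrative sketch; conductance is used as the Cheeger constant (an assumption).


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2}; assumes no isolated vertices."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def eigengap(adj):
    """Second smallest eigenvalue of the normalized Laplacian."""
    return np.linalg.eigvalsh(normalized_laplacian(adj))[1]  # eigvalsh sorts ascending


def conductance(adj):
    """Brute-force Cheeger constant; exponential in the number of nodes."""
    n = len(adj)
    deg = adj.sum(axis=1)
    total = deg.sum()
    best = np.inf
    for r in range(1, n):
        for subset in itertools.combinations(range(n), r):
            mask = np.zeros(n, dtype=bool)
            mask[list(subset)] = True
            cut = adj[mask][:, ~mask].sum()  # edges leaving S
            vol = deg[mask].sum()  # sum of degrees inside S
            best = min(best, cut / min(vol, total - vol))
    return best


def adjacency(n, edges):
    adj = np.zeros((n, n))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    return adj


# Two triangles joined by a single edge: an obvious bottleneck.
A = adjacency(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
lam2, h = eigengap(A), conductance(A)
print(f"lambda_2 = {lam2:.3f}, h(G) = {h:.3f}")
print(lam2 / 2 <= h <= np.sqrt(2 * lam2))  # Cheeger's inequality
```

The bottleneck edge gives $h(G) = 1/7$ (cutting one edge between two halves of volume 7), and the small eigengap reflects it.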

2.2. Test suite of networks

We used a diverse set of networks for our experiments: 8 publicly available real-world networks (Leskovec and Krevl, 2014; Kunegis, 2013) and 15 synthetic networks of varying sizes and core structures, generated with the MUSKETEER synthetic network generation tool (Gutfraind et al., 2015). A summary of the properties of these 23 networks is given in Table 1. The scale-free exponent $\gamma$ is obtained by fitting the empirical degree distribution to a power law $p_k \propto k^{-\gamma}$, where $p_k$ is the fraction of nodes having degree $k$. All the networks are treated as undirected.
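The text does not state how $\gamma$ was fitted. As an illustration only, one widely used option is the discrete maximum-likelihood approximation of Clauset, Shalizi and Newman (2009), $\hat\gamma \approx 1 + n \left[\sum_i \ln \frac{k_i}{k_{\min} - 1/2}\right]^{-1}$ over the $n$ degrees $k_i \ge k_{\min}$. The sketch assumes $k_{\min}$ is given (choosing it is a separate problem), uses the hypothetical name `powerlaw_exponent`, and will not necessarily reproduce the values in Table 1.

```python
import math

# Illustrative sketch; not necessarily the fitting procedure behind Table 1.


def powerlaw_exponent(degrees, k_min=1):
    """Approximate discrete MLE of gamma for p_k ~ k^(-gamma), k >= k_min (k_min >= 1).

    gamma ~= 1 + n / sum(ln(k / (k_min - 1/2))) over the n degrees k >= k_min.
    """
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    return 1.0 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)
```

A heavier tail (more large degrees) increases the sum in the denominator and therefore lowers the estimated exponent.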

Network Nodes Edges $\gamma$ $\langle k \rangle$ $\langle C \rangle$ $\langle B \rangle$ LCN
AS (Leskovec and Krevl, 2014) 6474 13895 1.235 4.29 0.27 0.004 12
Caida (Leskovec and Krevl, 2014) 16493 33372 1.17 4.04 0.27 0.001 20
Bible (Kunegis, 2013) 1707 9059 1.523 10.61 0.31 0.001 15
Software (Kunegis, 2013) 994 4645 1.168 9.32 0.34 0.002 11
Protein (Kunegis, 2013) 1458 1993 2.106 2.73 0.15 0.004 5
Facebook (Leskovec and Krevl, 2014) 7178 10298 2.896 2.86 0.11 0.001 5
Hepth (Leskovec and Krevl, 2014) 2694 4255 1.487 3.15 0.18 0.001 7
Power (Kunegis, 2013) 4941 6594 2.845 2.66 0.05 0.003 5
N1 14212 34901 1.215 16.3 0.25 0.0007 16
N2 10162 25154 1.28 4.95 0.27 0.0008 14
N3 36469 96990 1.237 2.66 0.24 0.003 27
N4 65630 170061 2.29 2.66 0.13 0.003 12
N5 4091 33352 1.438 2.66 0.33 0.003 21
N6 6785 44381 1.421 2.66 0.31 0.003 19
N7 6009 13585 2.016 2.66 0.18 0.003 6
N8 3863 22356 1.268 2.66 0.34 0.003 23
N9 11278 19616 2.737 3.47 0.03 0.003 5
N10 19623 33711 2.832 3.43 0.05 0.00049 5
N11 5980 9501 3.027 3.17 0.05 0.05 6
N12 6045 13592 2.373 4.49 0.11 0.0035 7
N13 7783 35185 2.517 3.61 0.06 0.0033 6
N14 15988 28373 3.029 3.54 0.09 0.004 5
N15 28651 51159 3.073 3.57 0.04 0.0028 6
Table 1. Test suite of networks and their properties. $\gamma$: power-law exponent, $\langle k \rangle$: average degree, $\langle C \rangle$: average clustering coefficient, $\langle B \rangle$: average betweenness centrality, LCN: largest core number in the network.

3. Motivating Experiments

We present the experiments that motivated our research. We test whether vertices with high core numbers (i) have high centralities and (ii) can be used as seed nodes for community detection.

3.1. Correlation with other centrality metrics

Several papers (Shanahan and Wildie, 2012; Holme, 2005a; Silva et al., 2008; Lin et al., 2014; Meyer et al., 2014) claim that vertices with high core numbers should also have high centrality values. To test this claim, we compute the Jaccard coefficient $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$ between the set $A$ of vertices with the highest core numbers and the set $B$ of an equal number of top ranked nodes for each of the centrality metrics.
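A minimal sketch of this overlap test, assuming core numbers and centrality scores are already available as dictionaries keyed by node. Here $A$ is read as the innermost-core nodes, and ties in the centrality ranking are broken by sort order; both choices, and the function names, are assumptions for illustration.

```python
# Illustrative sketch; function names are hypothetical.


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of nodes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def core_centrality_jaccard(core, centrality):
    """J between the innermost-core nodes and an equal number of top-centrality nodes."""
    k_max = max(core.values())
    top_core = {v for v, k in core.items() if k == k_max}
    ranked = sorted(centrality, key=centrality.get, reverse=True)  # highest first
    return jaccard(top_core, ranked[:len(top_core)])
```

The same function can be called once per centrality metric (degree, closeness, betweenness) to fill a row of Table 2.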

The results in Table 2 show a clear separation of the networks. In the first group (blue), all the networks have a high $J$, implying that a significant number of high centrality nodes also have the highest core numbers. In the second group (brown), the high core numbered nodes do not have high centrality, as indicated by the low $J$ scores.

Network degree closeness betweenness
AS 0.6 0.75 0.5
Caida 0.56 0.69 0.49
Bible 0.49 0.64 0.43
Software 0.40 0.5 0.23
Protein 0 0 0
Facebook 0.10 0 0
Power 0 0 0
Hepth 0.05 0.05 0.05
N1 0.63 0.86 0.68
N2 0.722 0.530 0.653
N3 0.74 0.78 0.80
N4 0.68 0.81 0.68
N5 0.80 0.82 0.78
N6 0.57 0.71 0.6
N7 0.39 0.48 0.37
N8 0.79 0.79 0.79
N9 0 0 0
N10 0 0 0
N12 0.02 0 0
N13 0.06 0 0
N14 0 0 0
N15 0.002 0 0
Table 2. Jaccard index between nodes with highest coreness and equal number of high centrality nodes. Results clearly separate the two categories of networks into ones that have an RCC (blue) and ones that do not have an RCC (brown).

3.2. High core numbers to detect communities

In existing techniques of community detection that utilize the $k$-core structure (Peng et al., 2014), the network is reduced to its $k$-core subgraph for a predetermined value of $k$, and the communities in the subgraph are computed. The vertices in these communities are then used as seed nodes to propagate the community information to the other vertices.

We note that this process succeeds only if the communities are well represented in the reduced network. Therefore, to test the applicability of this algorithm, we tabulate how the vertices in the innermost core are distributed across the communities.

Figure 2 plots the community ids of each network on the x-axis and the number of nodes from the innermost core that are members of each community on the y-axis. We include all communities that have at least one vertex in the innermost core.
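The tabulation behind Figure 2 can be sketched as follows, assuming core numbers and a community assignment (from whichever community detection method is used; it is not specified here) are given as dictionaries. Communities are ordered by their total size, as on the figure's x-axis; the function name is hypothetical.

```python
from collections import Counter

# Illustrative sketch; function name is hypothetical.


def innermost_core_community_profile(core, community):
    """[(community id, #innermost-core nodes in it)], largest communities first.

    Only communities containing at least one innermost-core node are listed.
    """
    k_max = max(core.values())
    size = Counter(community.values())  # total nodes per community
    inner = Counter(community[v] for v, k in core.items() if k == k_max)
    return sorted(inner.items(), key=lambda item: size[item[0]], reverse=True)
```

A profile with a single dominant entry corresponds to the “concentrated” pattern seen in networks without an RCC.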

We again observe a separation between two classes of networks. In one group, the vertices from the innermost core are spread over multiple communities, whereas in the other group they are concentrated in one or two communities. Clearly, the first group is more suitable for the community detection algorithm described earlier. Figure 3 shows the community distribution in the innermost core of two networks from our test suite. These results demonstrate that vertices with high core numbers are not always distributed across multiple communities, and in some cases can be concentrated in only one community.

Figure 2. Distribution of innermost core nodes across the communities of each network. Panels: (a) AS, (b) Bible, (c) Caida, (d) Software, (e) N6, (f) N7, (g) Power, (h) Protein, (i) Hepth, (j) Facebook, (k) N8, (l) N9. The x-axis shows the community ids of a network, ordered by the number of nodes in each community, and includes every community that contains at least one node from the innermost core. The y-axis shows the number of innermost core nodes in the community with that id. For networks with an RCC (top), the high core vertices are spread across more communities than for networks without an RCC (bottom).
Figure 3. Visualization of the subgraph formed by the innermost core nodes. Panels: (a) Software, (b) Protein. Nodes belonging to the same community are annotated with the same color and id. In the Software network the innermost core clearly has nodes from several communities; in the Protein network all nodes in the innermost core belong to the same community.

3.3. Evidence of rich centrality club

These motivating experiments demonstrate the existence of two groups of networks. In one group, the vertices from the inner cores have high overlap with the top vertices of different centrality metrics and can be used as seed nodes for community detection. In the other group, the vertices from the inner cores have no correlation with the high centrality vertices and are concentrated in one or two communities.

To visualize how rich centrality clubs are formed in the innermost cores, we divide the vertices in the network into two sets of nodes, based on whether they are part of the innermost core or not. We partition the nodes outside the innermost core into their respective communities and combine each community into a single supervertex. The set of nodes of the innermost core are also combined into a supervertex. Two supervertices are connected if there is at least one edge between the nodes comprising them.
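The supervertex construction is a quotient graph. A short sketch, assuming `core`, `community` and `centrality` dictionaries are given; the function names and the `"core"` / `("community", id)` labels are hypothetical. The mean centrality per supervertex corresponds to the node sizes in Figure 4.

```python
from collections import defaultdict

# Illustrative sketch; function names and supervertex labels are hypothetical.


def super_label(v, core, community, k_max):
    """Innermost-core nodes collapse into one supervertex; others into their community."""
    return "core" if core[v] == k_max else ("community", community[v])


def supervertex_graph(edges, core, community):
    """Set of supervertex pairs joined by at least one original edge."""
    k_max = max(core.values())
    super_edges = set()
    for u, v in edges:
        a = super_label(u, core, community, k_max)
        b = super_label(v, core, community, k_max)
        if a != b:  # edges inside a supervertex are dropped
            super_edges.add(frozenset((a, b)))
    return super_edges


def supervertex_mean_centrality(core, community, centrality):
    """Average centrality of the nodes collapsed into each supervertex."""
    k_max = max(core.values())
    groups = defaultdict(list)
    for v, c in centrality.items():
        groups[super_label(v, core, community, k_max)].append(c)
    return {s: sum(vals) / len(vals) for s, vals in groups.items()}
```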

Figure 4. Network formed by two categories of supervertices: communities and the innermost core. Panels: (a) Caida, (b) Power. Two supervertices are connected if the nodes from which they are formed are connected by at least one edge. Larger supervertices indicate higher average centrality of their constituent vertices.

Figure 4 shows a visualization of the reduced network for two benchmark networks. Each supervertex is labeled either as a community or as the innermost core, and supervertices are sized by the average centrality (closeness and betweenness) of their constituent vertices. For Caida, which is in the first group, the supervertex corresponding to the innermost core has significantly higher centrality and lies at the center. For Power, which is in the second group, there is no distinctively high centrality supervertex.

Since in the first group the high centrality nodes are in the innermost cores, and since, by the definition of a $k$-core, these vertices are densely connected to each other, our experiments demonstrate that a rich club of high centrality vertices is formed in these networks. In the next section, we present the topological properties of these networks that lead to the formation of RCC.

4. Properties of Networks Containing Rich Centrality Clubs

We define structural properties of networks containing RCC and present empirical results to support our definition. In section 7 we provide a theoretical rationale for our definition.

Figure 5. (a) The average degree of, and (b) the number of nodes in, the shell based subgraphs for different buckets of shells for each network. (c) Average eigengap of the shell based subgraphs for different buckets of shells for each network. Results show graceful degradation for the networks with an RCC, but an abrupt fall for the networks without an RCC.

4.1. Formal definition and rationale

Let $G_i$ be the subgraph induced by the vertices in shell $i$ and their neighbors. Let $d_i$ be the average degree and $n_i$ the number of nodes of $G_i$. Let $\lambda_i$ be the second smallest eigenvalue of the normalized Laplacian matrix of $G_i$. Let $\mathrm{dist}(i, j)$ be the average distance from a vertex in shell $i$ to a vertex in an inner shell $j$, $j > i$.

Given these parameters, we state that a network contains an RCC if the following properties hold.

  1. For any two shells $i$ and $j$ with $i < j$, $n_i > n_j$ and $d_i < d_j$.

  2. For all shells $i$, $\lambda_i \ge \alpha$.

  3. For all shells $i$, $\mathrm{dist}(i, c) \le \delta$, where $c$ is a high numbered core with density close to 1.

The first property requires that the shells have progressively fewer vertices and become denser from the outer to the inner shells. The second property gives a lower bound on the second smallest eigenvalue, and hence, by Cheeger's inequality, on the Cheeger constant of each shell: the higher $\alpha$, the more expander-like the associated shell. If a shell has multiple components, then each of them should satisfy this property. The third property states that the number of hops needed to travel from the outer shells to the inner cores should be small.

The values of the parameters $\alpha$ and $\delta$ are determined by the size and density of the whole network. In our experiments, suitable settings of $\alpha$ and $\delta$ clearly distinguish between networks that contain an RCC and those that do not.
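Given per-shell statistics, the three properties can be checked mechanically. This sketch assumes the thresholds $\alpha$ (eigengap) and $\delta$ (distance) are supplied by the caller, since their values are not reproduced here; it reads property 2 as $\lambda_i \ge \alpha$ and checks property 1 on consecutive shells, which for strict inequalities is equivalent to checking all pairs. The function name and the tuple layout are hypothetical.

```python
# Illustrative sketch; function name and stats layout are hypothetical.


def satisfies_rcc_properties(stats, alpha, delta):
    """stats: {shell index i: (n_i, d_i, lambda_i, dist_i)}.

    dist_i is the average distance from shell i to the chosen high-numbered core.
    """
    shells = sorted(stats)  # outer (low index) to inner (high index)
    # Property 1: moving inward, shells get strictly smaller and denser.
    for i, j in zip(shells, shells[1:]):
        n_i, d_i = stats[i][0], stats[i][1]
        n_j, d_j = stats[j][0], stats[j][1]
        if not (n_i > n_j and d_i < d_j):
            return False
    # Property 2: every shell subgraph is expander-like (eigengap at least alpha).
    if any(stats[i][2] < alpha for i in shells):
        return False
    # Property 3: every shell is within delta hops of the dense inner core.
    return all(stats[i][3] <= delta for i in shells)
```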

4.2. Density of shells

The first condition is a feature of the core-periphery structure of almost all scale-free networks, whether they contain an RCC or not. To demonstrate this, we first subdivide the vertices into the subgraphs $G_i$ and compute the average degree and the number of nodes of each. Since the number of shells varies across networks, for a uniform presentation we group the shells into three buckets: starting from the innermost shell, the first group of shells falls in the first bucket, the next group in the second bucket and the final group in the last bucket. For each bucket we calculate the mean and the standard deviation of the average degrees and numbers of nodes of its shells. These values are plotted in Figure 5(a) (average degree) and (b) (number of nodes). As the figure shows, apart from slight deviations, the first property holds in both sets of networks.
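The exact split of shells into buckets is not reproduced here. As an assumption, the sketch below divides the shells, innermost first, into three contiguous groups of nearly equal size (leftover shells go to the inner buckets) and reports the mean and population standard deviation of any per-shell quantity, such as $d_i$, $n_i$ or $\lambda_i$. The function name is hypothetical.

```python
import statistics

# Illustrative sketch; the equal-size split is an assumption, not the paper's rule.


def bucket_shells(per_shell, n_buckets=3):
    """per_shell: {shell index: value}. Returns [(mean, std), ...], innermost bucket first."""
    order = sorted(per_shell, reverse=True)  # highest shell index = innermost
    if len(order) < n_buckets:
        raise ValueError("need at least one shell per bucket")
    size, extra = divmod(len(order), n_buckets)
    summary, start = [], 0
    for b in range(n_buckets):
        end = start + size + (1 if b < extra else 0)  # spread leftovers over inner buckets
        values = [per_shell[s] for s in order[start:end]]
        summary.append((statistics.mean(values), statistics.pstdev(values)))
        start = end
    return summary
```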

4.3. Eigenvalue of shells

For each $G_i$, we compute the normalized Laplacian (see section 2), extract its spectrum by eigenvalue decomposition, and compute the eigengap. For each bucket defined in the previous section, we calculate the average eigengap and its standard deviation. These values are plotted in Figure 5(c).
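The per-shell computation can be sketched as follows: build $G_i$ from a shell and its neighbors, then return $n_i$, $d_i$ and $\lambda_i$. It assumes an adjacency dictionary and precomputed core numbers, uses NumPy's dense symmetric eigensolver (adequate only for moderately sized shells), and the function name is hypothetical. Note that a disconnected $G_i$ yields $\lambda_i = 0$, which is why multi-component shells would have to be checked per component.

```python
import numpy as np

# Illustrative sketch; function name is hypothetical.


def shell_subgraph_stats(adj, core, shell):
    """(n_i, d_i, lambda_i) for G_i = subgraph induced by a shell and its neighbors.

    adj: {node: set of neighbors}; core: {node: core number}.
    """
    shell_nodes = {v for v, c in core.items() if c == shell}
    members = set(shell_nodes)
    for v in shell_nodes:
        members |= adj[v]  # add every neighbor of a shell node
    nodes = sorted(members)
    index = {v: i for i, v in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for v in nodes:
        for w in adj[v]:
            if w in index:  # keep only edges inside G_i (induced subgraph)
                A[index[v], index[w]] = 1.0
    n = len(nodes)
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # every node of G_i has a neighbor inside G_i
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    gap = np.linalg.eigvalsh(L)[1]  # second smallest eigenvalue
    return n, deg.mean(), gap
```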

We observe that in the networks we hypothesize to contain an RCC, the average eigengap declines slowly: the Cheeger constant is high in the inner shells and gradually decreases from the inner to the outer shells. In the other group of networks, the average eigengap falls abruptly after the first bucket of inner shells. The eigengaps of the first group therefore admit a larger lower bound than those of the second group, corroborating the second property.

4.4. Distance between shells

The third property implies that, on average, two vertices in outer shells are likely to be connected through the inner dense shells. We show this in Figure 6: in networks with an RCC, the average shortest distance from the nodes in the outer shells to the innermost and the second innermost shells is low (2-3 hops), compared to networks without an RCC (10-50 hops).
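A sketch of the distance measurement, assuming an adjacency dictionary and precomputed core numbers. As an assumption, it reads “shortest distance to a shell” as the distance to the nearest node of that shell, computed with a single multi-source breadth-first search; averaging over all node pairs would be an equally plausible reading. Function names are hypothetical.

```python
from collections import defaultdict, deque

# Illustrative sketch; "distance to a shell" = distance to its nearest node (assumption).


def bfs_from_set(adj, sources):
    """Hop distance from every reachable node to the nearest node in `sources`."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def avg_distance_to_shell(adj, core, target_shell):
    """{outer shell k: mean distance of its nodes to the nearest node of target_shell}."""
    dist = bfs_from_set(adj, [v for v, c in core.items() if c == target_shell])
    total, count = defaultdict(float), defaultdict(int)
    for v, c in core.items():
        if c < target_shell and v in dist:  # outer shells only, reachable nodes only
            total[c] += dist[v]
            count[c] += 1
    return {c: total[c] / count[c] for c in count}
```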

Figure 6. The average shortest distance of a node in the outer shells to a node in the innermost and the second innermost shells.

5. Application

In this section, we demonstrate how the presence of RCC can be leveraged in some important applications.

5.1. RCC as influential nodes

Vertices that accelerate the diffusion process when selected as seed nodes are known as influential nodes. We hypothesize that the RCC, if present in a network, is a natural choice of seed nodes for spreading. To test this hypothesis, we execute a diffusion process adapted from (Maiya and Berger-Wolf, 2014) on the two groups of networks in our dataset, i.e., those with an RCC and those without.

We choose a seed set of fixed size (our experiments with different sizes yield similar results) and populate it using one of the following: high centrality nodes (degree, closeness, betweenness), innermost shell nodes (these are the nodes of the RCC in networks that demonstrate its presence), or a random set of nodes. The seed nodes initially hold the information and pass it to all their neighbors using a flooding based broadcast (all neighbors of an informed vertex become informed). This approach spreads the message very quickly, and modified versions have been used in peer-to-peer networks for sending queries and searching (Jiang et al., 2008). Although this method is difficult to implement in real world systems due to scalability issues, our goal here is to study how effective the vertices in the RCC are as seed nodes.
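A minimal sketch of the flooding broadcast, assuming an adjacency dictionary and a seed list chosen by any score (degree, closeness, betweenness, core number, or random). It records the informed fraction after each step, which is the quantity plotted in Figure 7. This is plain synchronous flooding, not necessarily the exact adaptation of Maiya and Berger-Wolf; function names are hypothetical.

```python
# Illustrative sketch of synchronous flooding; function names are hypothetical.


def flooding_curve(adj, seeds):
    """Fraction of informed nodes after each flooding step (list index = step)."""
    informed = set(seeds)
    frontier = set(seeds)
    curve = [len(informed) / len(adj)]
    while True:
        # every neighbor of a newly informed node becomes informed in the next step
        new = {w for v in frontier for w in adj[v]} - informed
        if not new:
            return curve
        informed |= new
        frontier = new
        curve.append(len(informed) / len(adj))


def steps_to_reach(curve, fraction):
    """First step at which at least `fraction` of the nodes are informed (None if never)."""
    return next((t for t, f in enumerate(curve) if f >= fraction), None)


def top_seeds(score, size):
    """The `size` nodes with the largest score."""
    return sorted(score, key=score.get, reverse=True)[:size]
```

Only the newly informed frontier needs to be expanded at each step, because the neighbors of earlier nodes are already informed.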

In Figure 7, the x-axis shows the fraction of vertices that have received the message and the y-axis the number of steps needed to reach that fraction. The networks form two groups. In the group that demonstrates the presence of RCC, the vertices from the innermost core are effective seed nodes for broadcasting, and the time is comparable to that obtained when high centrality vertices are selected as seeds. In the other group, the vertices from the innermost core perform very poorly as seed nodes: the time to spread the information is equal to or worse than that of a random selection. These results show that vertices in the innermost core are effective seed nodes for spreading information only in networks that demonstrate the presence of RCC.

Figure 7. Time required, in number of steps, for the information to disseminate to the nodes of the network. Panels: (a) AS, (b) Caida, (c) Bible, (d) Software, (e) N1, (f) N3, (g) Power, (h) Protein, (i) Hepth, (j) Facebook, (k) N8, (l) N9. In the top panel, for the networks that demonstrate the presence of RCC, coreness based seed nodes are consistently good choices as message initiators. In the bottom panel, for the networks that do not demonstrate the presence of RCC, coreness based initiators perform equal to or worse than random initiators.

5.2. Robustness

A desirable property of a network is its ability to retain the ranking of its top-$k$ high centrality nodes under perturbation. We hypothesize that networks which demonstrate the presence of RCC are robust to minor perturbations in the form of random deletion of edges.

To test this, we compute the closeness and betweenness centrality based rankings of the nodes in the original network and compare the top-$k$ ranked nodes with those of the perturbed network. Perturbation is done by randomly deleting 1% to 8% of the edges from the original network. For the comparison we use the standard Kendall $\tau$ measure, as in Laishram et al. (Laishram et al., 2018). Following that work we use $k = 50$; experiments with other values of $k$ yield similar results. Results for both categories of networks, i.e., those with an RCC and those without, are illustrated in Figure 8.
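The perturbation and comparison steps can be sketched as below. As assumptions, edges are deleted uniformly at random, and Kendall's $\tau_a$ is computed over the pairs of the original top-$k$ nodes using their original and perturbed scores (ties count as neither concordant nor discordant); Laishram et al. may use a different top-$k$ variant. Recomputing closeness or betweenness on the perturbed graph is left to a graph library and not shown. Function names are hypothetical.

```python
import random
from itertools import combinations

# Illustrative sketch; the tau-a variant over the original top-k is an assumption.


def perturb_edges(edges, fraction, seed=None):
    """Copy of `edges` with round(fraction * |E|) edges deleted uniformly at random."""
    rng = random.Random(seed)
    edges = list(edges)
    drop = set(rng.sample(range(len(edges)), round(fraction * len(edges))))
    return [e for i, e in enumerate(edges) if i not in drop]


def kendall_tau_top_k(original, perturbed, k):
    """Kendall tau-a over the top-k nodes of `original` (dicts node -> centrality)."""
    top = sorted(original, key=original.get, reverse=True)[:k]
    concordant = discordant = 0
    for u, v in combinations(top, 2):
        # same sign: both scores order the pair the same way
        s = (original[u] - original[v]) * (perturbed.get(u, 0.0) - perturbed.get(v, 0.0))
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(top) * (len(top) - 1) / 2
    return (concordant - discordant) / n_pairs if n_pairs else 1.0
```

A value near 1 means the top-$k$ ranking survived the perturbation; values near 0 or below mean it was scrambled.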

Our results show that under minor perturbations of the topology, networks with RCC are robust in terms of preserving the top-$k$ high centrality nodes. However, if the perturbation is too strong, the ranking is disrupted. In networks without an RCC, even a small perturbation substantially disturbs the ranking of the top-$k$ high centrality nodes.

Figure 8. Robustness in terms of the Kendall score comparing the rankings of the top-50 high closeness and betweenness nodes of the original and the perturbed network. Perturbation is done by randomly deleting 1% to 8% of the edges. Results are averaged over 10 runs, and error bars are reported. Results for networks with an RCC (panels a and b) are compared against networks without an RCC (panels c and d). The results indicate that networks with RCC are robust to minor perturbations.
Figure 9. Simplified models of a network with (left) and without (right) an RCC. Red vertices have core number 4, green vertices have core number 3 and brown vertices have core number 2. Note that the RCC is formed in the innermost core.

6. Algorithm for Forming RCC

We now present a simple yet effective modification algorithm for inserting RCC into a network and conversely removing RCC from a network containing it.

Rationale for the algorithm: To explain the rationale for our algorithm, we present two simplified models of a network, with and without an RCC (Figure 9). If the network contains an RCC, then the inner shells are expander-like, and communities meet at the RCC. An example model conforming to this structure is a large clique in the center surrounded by smaller cliques.

In a network without an RCC, most communities do not meet through the inner core. This indicates that the inner core is not in any special position with respect to the paths connecting the communities. One example model of such a network is a ring of cliques of different sizes, where the smaller cliques can have connections between them. Here the highest core is at the side of the network rather than at the center.¹

¹ We emphasize that these models are only idealized representations of the two types of networks, and more complicated connections occur in real-world networks. Nevertheless the principal idea is maintained, i.e., for networks with an RCC, the innermost core is at the center of the network.

As the examples in Figure 9 suggest, to introduce an RCC we can simply connect the high degree vertices across communities. The high degree of these vertices ensures that the clique (or near clique) formed by them has a high core number, and joining communities ensures that the communities connect within this subgraph. For the network without an RCC in Figure 9, we would connect all the vertices in the ring.

Conversely, to destroy the RCC property of the network, we will simply delete the edges in the inner core, such that the connections between the communities are destroyed, and the highest numbered core moves away from the center.

Algorithm for forming RCC: Connecting the communities via high degree vertices, as proposed above, is however expensive, because finding communities is itself a computationally intensive operation. A faster alternative is to simply add (or remove) edges between the high degree vertices. This works because, in networks without an RCC, high degree vertices within the same community are likely to be already connected; therefore, any vertex pair connected by the modification algorithm is likely to lie in different communities.

Moreover, increasing the connections among the high degree nodes also brings their (usually low degree) neighbors closer to each other in the network. Thus, the nodes in the innermost cores will have high centrality, as is characteristic of networks with an RCC. Conversely, in a network with an RCC the high degree vertices are in the inner core, so removing edges between them disconnects the cores.

It might seem from our approach that rich clubs of high degree vertices are also the rich centrality clubs. Figure 9 shows a counter example. The right hand graph has a rich club of high degree vertices but not a dense subgraph of high centrality vertices.

Experiments. The pseudocode of the algorithm is given in Algorithm 1. Figure 10 plots the eigengaps of the networks before (blue lines) and after (green lines) the modification. To clearly compare the original and modified networks, we plot the eigengap of each shell rather than an aggregate over shells as in Figure 5(c). For the networks that demonstrate the presence of RCC (AS, Bible and Software), the eigengap of the modified network is smaller, i.e., the green line is lower and has a steeper slope than that of the original network. For the networks that do not demonstrate the presence of RCC (Power, Protein and Facebook), the green line is higher, showing that the eigengap increased, and it has a more gradual slope. We report the statistics of the modified networks in Table 3. The table shows that our model largely preserves crucial structural properties of the original network, e.g., the scale-free exponent and the average degree.²

² We set one model parameter to a fixed value; results are similar for two other values tested. The other parameter is set to 0.2.

Network |V| |E| $\gamma$ $\langle k \rangle$ $\langle C \rangle$ $\langle B \rangle$ LCN
AS 6474 12439 1.245 4.07 0.26 0.0012 9
Bible 1773 8600 1.557 10.07 0.298 0.006 11
Software 1003 4400 1.236 8.85 0.339 0.004 9
Protein 1870 2052 1.756 2.89 0.17 0.016 7
Facebook 7178 10349 2.311 2.82 0.125 0.0123 6
Power 4941 6698 2.344 2.71 0.0715 0.017 7
Table 3. Network statistics for the modified graphs. Note that the parameter values are comparable to those of the original networks in Table 1.
Input:
Output:
Parameters:
1 Sort vertices in G based on decreasing degree;
2 Select the top nodes based on degree;
3 if  then
4      find possible edges that could be formed among the nodes
5 else
6      find actual edges that are present among the nodes
7 Select edges randomly from where ;
8 if flag == 1 then  ;
9 else ;
10 return ;
Algorithm 1 Algorithm for increasing/decreasing () expansion property.
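Algorithm 1 can be sketched in Python as follows. Several symbols of the listing did not survive extraction, so the parameter names `top_k` (how many highest-degree vertices to select), `fraction` (share of candidate pairs to modify) and `flag` are hypothetical stand-ins rather than the paper's notation. The uniform random choice in step 7 follows our reading of the listing.

```python
import random
from itertools import combinations

def modify_expansion(adj, top_k, fraction, flag, seed=None):
    """Densify (flag == 1) or sparsify (otherwise) the subgraph of the top_k
    highest-degree vertices; returns a modified copy, leaving adj unchanged.

    adj maps vertex -> set of neighbours (undirected, no self-loops).
    top_k, fraction and flag are hypothetical names for the lost parameters.
    """
    rng = random.Random(seed)
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # defensive copy

    # Steps 1-2: sort by decreasing degree and keep the top_k vertices.
    top = sorted(g, key=lambda v: len(g[v]), reverse=True)[:top_k]

    # Steps 3-6: candidate pairs among the selected vertices.
    if flag == 1:
        # absent edges that could be formed
        candidates = [(u, v) for u, v in combinations(top, 2) if v not in g[u]]
    else:
        # edges that are already present
        candidates = [(u, v) for u, v in combinations(top, 2) if v in g[u]]

    # Step 7: pick a random subset of the candidates.
    chosen = rng.sample(candidates, round(fraction * len(candidates)))

    # Steps 8-9: insert the chosen edges, or delete them.
    for u, v in chosen:
        if flag == 1:
            g[u].add(v)
            g[v].add(u)
        else:
            g[u].discard(v)
            g[v].discard(u)
    return g
```

Densifying the club (flag == 1) is the direction that should create an RCC, and sparsifying it removes one. Only edges among the selected vertices are touched, which is why global statistics such as those in Table 3 change little.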
Figure 10. The outcome of the modification model. The first three networks (top panel), AS, Bible and Software, which originally demonstrate the presence of an RCC, are transformed into networks with no RCC. The last three networks (bottom panel), Power, Protein and Facebook, which originally do not demonstrate the presence of an RCC, are converted into networks with an RCC. These plots are similar to the eigengap chart of Figure 5(c), except that we show the eigengap of each individual shell rather than of groups of shells. The blue (green) plot shows the eigengap for the original (modified) network.

7. Theoretical insights of definition

We now theoretically demonstrate how the three properties described in Section 4 lead to the formation of rich centrality clubs. We consider an ideal network where the vertices of each shell form a connected component and, as per property 2, the values of are large enough that each shell is an expander graph. An expander graph has no bottleneck or clear partition, and random graphs typically fulfill these criteria. We therefore assume that each subgraph induced by a shell and its neighbors is an Erdős-Rényi random graph, with vertices and average degree, and the probability of connection among a pair of vertices ; thus, .

In (Chung and Lu, 2002), the authors prove several bounds on the average path length in a random graph . In particular, they state that if , then the average distance is constant times . Using this result, we assume that the average distance between two vertices in subgraph is .
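For intuition about this assumption, the sketch below samples an Erdős-Rényi graph and compares its empirical mean BFS distance with ln n / ln d. This is the leading-order scaling of the Chung and Lu result when the average degree d exceeds 1. The values n = 2000, d = 10 and the 50 BFS sources are arbitrary illustrative choices of ours. At finite n the ratio is close to 1 but not exactly 1, because the result only fixes the distance up to a (1 + o(1)) factor.

```python
import math
import random
from collections import deque

def gnp(n, p, seed=0):
    """Erdős-Rényi G(n, p) as a dict of neighbour sets."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def mean_distance(adj, sources):
    """Mean BFS distance from the given sources to every vertex they reach."""
    total, count = 0, 0
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count

n, d = 2000, 10.0
g = gnp(n, d / (n - 1), seed=1)               # expected degree d
empirical = mean_distance(g, range(0, n, 40))  # 50 BFS sources
predicted = math.log(n) / math.log(d)
print(f"empirical {empirical:.2f} vs ln n / ln d = {predicted:.2f}")
```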

Now consider a path between two vertices and , with the sequence of vertices . Let the core numbers of these vertices be . Let be the highest numbered shell in this sequence. We assume that for all shortest paths between and , .

This means that the shortest paths travel monotonically from a low source shell up to the highest shell required, and then back down from that shell to the low destination shell. Note that this assumption also allows a path to remain in the same shell throughout. However, paths that zig-zag from a high shell to a low shell and back to a high shell are not allowed. The rationale is that, since lower numbered shells are sparser, paths are more likely to connect through higher shells than through lower ones.

With these assumptions in place, we can state the following lemma.

Lemma 7.1.

If there exists a shell , such that, , for all and for at least one , then high centrality vertices are located in core .

Proof.

Let be the lowest numbered shell that satisfies the equation in the lemma. Consider the path between two vertices and . If either or is in , then the path between them has to pass through . If neither vertex is in , we have to consider two cases.

First case: the two vertices are in the same shell , . Then, on average, the distance between them will be .

Second case: the vertices are in two different shells and , . On average, the length of the shortest path will be . This value is greater than the length of the path that simply goes through .

Thus, in both cases, if , the shortest path between any two vertices in the graph will on average pass through . Therefore, the core will contain the vertices with high closeness and betweenness centrality.

As per property 1, and , where . Therefore, the condition will hold for any two shells. To maintain the condition of the lemma, we have to ensure that the number of steps needed to go from one shell to another, , is small enough. In other words, the number of steps to go from shell to core must be smaller than the difference of their average distances. We have observed that for networks with RCC, , where is the innermost or second innermost core, the value is between 2 and 4. The number of nodes in the outer shells can go up to thousands, thus easily satisfying the condition.
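The number of hops from each shell to the inner core can be estimated with a single multi-source BFS. The sketch below makes an assumption of ours: "hops from a vertex to the core" means its shortest-path distance to the nearest vertex of the innermost core. It then averages that distance per shell. The peeling-based `core_numbers` is a simple quadratic-time helper for illustration, and vertices not connected to the innermost core are skipped.

```python
from collections import deque

def core_numbers(adj):
    """Core numbers by repeatedly removing a minimum-degree vertex (O(n^2) sketch)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    core, k = {}, 0
    while alive:
        v = min(alive, key=deg.__getitem__)
        k = max(k, deg[v])
        core[v] = k
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
    return core

def hops_to_inner_core(adj):
    """Mean BFS hops from each shell's vertices to the nearest innermost-core vertex."""
    core = core_numbers(adj)
    k_max = max(core.values())
    # every innermost-core vertex is a BFS root at distance 0
    dist = {v: 0 for v in adj if core[v] == k_max}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    per_shell = {}
    for v, d in dist.items():
        per_shell.setdefault(core[v], []).append(d)
    return {k: sum(ds) / len(ds) for k, ds in sorted(per_shell.items())}
```

The BFS runs in time linear in the number of edges, so this check is cheap even for large networks.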

It might seem that, because computing eigenvalues is an expensive operation, identifying networks with RCC would also be more expensive than simply finding the high centrality vertices. However, note that our lemma is based on the average path length per shell. This metric can be computed in parallel for each shell, and doing so is faster than computing the centralities over the whole network.
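The per-shell computation can be parallelised as in the sketch below. We assume here that the average path length of a shell is the mean BFS distance over connected ordered pairs inside the subgraph induced by that shell's vertices. The worker count is an arbitrary choice. Threads keep the example simple and portable, but CPython threads do not speed up this pure-Python BFS; a process pool or a compiled graph library would be needed for real gains.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def core_numbers(adj):
    """Core numbers by repeatedly removing a minimum-degree vertex (O(n^2) sketch)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    core, k = {}, 0
    while alive:
        v = min(alive, key=deg.__getitem__)
        k = max(k, deg[v])
        core[v] = k
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
    return core

def avg_path_length(adj, vertices):
    """Mean BFS distance over connected ordered pairs of the induced subgraph."""
    members = set(vertices)
    total, pairs = 0, 0
    for s in vertices:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in members and w not in dist:  # stay inside the shell
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def shell_path_lengths(adj, workers=4):
    """Average path length of every shell, one task per shell."""
    shells = {}
    for v, k in core_numbers(adj).items():
        shells.setdefault(k, []).append(v)
    ks = sorted(shells)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda k: avg_path_length(adj, shells[k]), ks)
        return dict(zip(ks, results))
```

Each shell task is independent. The cost is governed by the size of the largest shell rather than by the whole graph, which is the source of the savings described above.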

8. Related Work

In this paper, we bring together several concepts, from rich clubs and core-periphery structure, to the applications of the latter in information spreading and community detection, to expander graphs. To the best of our knowledge, this is the first paper to combine these different concepts within a single framework.

Rich club: Rich club structure is well studied in the context of infrastructure networks (Zhou and Mondragón, 2004b, a). Rich clubs have also been shown to emerge in biological networks, e.g., brain networks and metabolic networks (Cinelli et al., 2018; Bertolero et al., 2017). A recurring theme of research on characterizing rich club structure has been distinguishing degree assortativity from the rich club structure.

Detection of cores: The k-core decomposition, introduced by Seidman (Seidman, 1983), has made the algorithmic detection of core-periphery structure one of the most promising areas in network analysis. Several works (Holme, 2005b; Rombach et al., 2014) have presented automatic techniques for separating the core nodes from the sparse periphery. Batagelj and Zaversnik (Batagelj and Zaversnik, 2003) proposed a k-core decomposition algorithm that requires O(m) runtime and linear space. In (Cheng et al., 2011; Khaouid et al., 2015), the authors proposed modifications of this linear approach that scale the computation to millions of nodes and billions of edges. In many networked systems, we are only concerned with estimating the importance of a subset of nodes instead of the entire network. In (OBrien and Sullivan, 2014), the authors proposed a technique for computing the coreness of a node that takes into account only its neighborhood. Model-based approaches have been presented in (Rombach et al., 2014; Zhang et al., 2015), where the authors designed objective functions to estimate the coreness of nodes.
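The linear-time idea of Batagelj and Zaversnik can be sketched as follows. This is our own 0-indexed Python transcription of their bin-sort scheme, not code from their paper. Vertices are kept in an array sorted by current degree, with `bins[d]` marking where degree-d vertices start. Processing vertices in that order, each neighbour with a higher degree is swapped to the front of its bin and demoted by one. This keeps the array sorted in O(1) per edge.

```python
def bz_core_numbers(adj):
    """O(m) core decomposition in the style of Batagelj and Zaversnik (2003).

    adj maps vertex -> set of neighbours; returns a dict vertex -> core number.
    """
    verts = list(adj)
    n = len(verts)
    deg = {v: len(adj[v]) for v in verts}
    max_deg = max(deg.values(), default=0)

    # Bin sort: bins[d] = start index of degree-d vertices in `order`.
    counts = [0] * (max_deg + 1)
    for v in verts:
        counts[deg[v]] += 1
    bins, start = [0] * (max_deg + 1), 0
    for d in range(max_deg + 1):
        bins[d] = start
        start += counts[d]
    order, pos = [None] * n, {}
    for v in verts:
        pos[v] = bins[deg[v]]
        order[pos[v]] = v
        bins[deg[v]] += 1
    for d in range(max_deg, 0, -1):  # restore the bin start positions
        bins[d] = bins[d - 1]
    bins[0] = 0

    # Process vertices in increasing order of current degree.
    for i in range(n):
        v = order[i]
        for u in adj[v]:
            if deg[u] > deg[v]:
                du, pu = deg[u], pos[u]
                pw = bins[du]
                w = order[pw]
                if u != w:  # move u to the front of its bin
                    order[pu], order[pw] = w, u
                    pos[u], pos[w] = pw, pu
                bins[du] += 1  # u now belongs to bin du - 1
                deg[u] -= 1
    return deg
```

The returned degrees are the core numbers, since each vertex's degree is frozen at the moment it is processed.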

Correlation of coreness with centrality measures: Coreness has been shown to be correlated with several centrality metrics. A strong Spearman’s rank correlation between degree and coreness has been presented in (Shin et al., 2016, 2017), and the authors developed an anomaly detection system based on this correlation. In contrast, Li et al. (Li et al., 2015) showed that the core number has a low Pearson’s correlation with centrality metrics such as degree, closeness and betweenness. One explanation is that many nodes in a network can share the same core number while differing in their centrality values, which lowers the Pearson’s correlation. A more accurate comparison is to consider the overlap of the top-ranked nodes based on core numbers and on other centrality metrics, which is what we do here.
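The top-ranked overlap can be computed as below. The choice of the Jaccard index is our assumption; the fraction |A ∩ B| / k would serve equally well. Because many vertices share a core number, ties at the cut-off are broken by dictionary order here, which is arbitrary. A careful comparison might instead include all tied vertices.

```python
def top_k_overlap(score_a, score_b, k):
    """Jaccard overlap of the top-k vertices under two scores (dicts vertex -> value).

    Ties at the cut-off are broken by the insertion order of each dict.
    """
    top_a = set(sorted(score_a, key=score_a.get, reverse=True)[:k])
    top_b = set(sorted(score_b, key=score_b.get, reverse=True)[:k])
    return len(top_a & top_b) / len(top_a | top_b)
```

For example, `score_a` could hold core numbers and `score_b` closeness values for the same vertices.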

Coreness for community detection: k-core decomposition outputs an ordered partition of the graph after processing it hierarchically. In (Giatsidis et al., 2014), the authors proposed that this hierarchical information can be utilized by any graph clustering algorithm to obtain more meaningful partitions. In (Peng et al., 2014), the authors proposed a framework that uses the k-core information to accelerate the computation of node labels in modularity maximization. They estimate a maximum speedup of through rigorous experiments.

Coreness for spreading: The core number, though derived from degree, is a better indicator than degree of a node’s capacity for information dissemination. Strategically placed nodes, as detected by the k-core decomposition, are able to spread information to a larger portion of the graph. This result has been shown by several works (Kitsak et al., 2010; Bae and Kim, 2014; Pei and Makse, 2013). In (Cohen, 2008; Wang and Cheng, 2012; Rossi et al., 2015), the authors apply k-truss decomposition, a triangle-based extension of k-core decomposition, with the objective of finding a refined set of influential nodes among all potential high core nodes. The authors show that k-truss decomposition extracts influential spreaders that can infect a large portion of target nodes within the first few steps of the SIR epidemic model.
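Seed sets such as the innermost-core vertices are usually compared by simulating an epidemic from them. The sketch below is a generic discrete-time SIR model; it is not the exact protocol of the cited works. In each step, every infected vertex infects each susceptible neighbour with probability `beta` and then recovers with probability `gamma`. The parameter values are the caller's choice, and outcomes should be averaged over many runs.

```python
import random

def sir_outbreak(adj, seeds, beta, gamma=1.0, rng=None):
    """Discrete-time SIR from the given seeds; returns the final number of recovered vertices.

    adj maps vertex -> set of neighbours. gamma must be in (0, 1] so the process ends.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    rng = rng or random.Random(0)
    infected, recovered = set(seeds), set()
    while infected:
        newly = set()
        for u in infected:
            for w in adj[u]:
                susceptible = w not in infected and w not in recovered
                if susceptible and rng.random() < beta:
                    newly.add(w)
        still = set()
        for u in infected:
            if rng.random() < gamma:
                recovered.add(u)  # recovered vertices can no longer be infected
            else:
                still.add(u)
        infected = still | newly
    return len(recovered)
```

Averaging `sir_outbreak(adj, seeds, beta)` over many random generators, once for innermost-core seeds and once for an equally sized random seed set, gives the kind of comparison made in the spreading experiments.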

Expander graphs: Analyzing graphs using spectral techniques has a long history (Maiya and Berger-Wolf, 2014; Estrada, 2006). The main idea behind these approaches is to use information about the spectrum of a matrix representation of the graph (mainly, the adjacency matrix or the Laplacian). Several researchers (Malliaros et al., 2015; Krivelevich, 2017; Kannan et al., 2004) have shown that the expansion properties of a graph provide crucial signals for understanding the degree of cohesion in the underlying subgraph structure. A graph with high expansion can be simultaneously sparse and tightly connected. Graphs with this non-trivial structural property are called expander graphs; a comprehensive review of the topic can be found in Hoory et al. (2006).
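The "no bottleneck" property can be made concrete with the edge expansion h(G), the minimum over vertex sets S with |S| ≤ n/2 of |E(S, V - S)| / |S|, as defined in the survey by Hoory et al. The brute-force sketch below is exponential in n and meant only for tiny examples. Two cliques joined by a single edge have a small h(G), while a complete graph has a large one.

```python
from itertools import combinations

def edge_expansion(adj):
    """Brute-force edge expansion: min over |S| <= n/2 of |E(S, V - S)| / |S|.

    adj maps vertex -> set of neighbours. Exponential in n: tiny graphs only.
    """
    verts = list(adj)
    best = float("inf")
    for size in range(1, len(verts) // 2 + 1):
        for subset in combinations(verts, size):
            s = set(subset)
            # edges leaving S, each counted once from its endpoint inside S
            boundary = sum(1 for u in s for w in adj[u] if w not in s)
            best = min(best, boundary / size)
    return best
```

Two 4-cliques joined by one edge give h = 0.25 (the cut between the cliques), while the complete graph on 8 vertices gives h = 4.0.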

9. Conclusion

We study the properties of networks that demonstrate the presence of a rich club of nodes with high shortest path based centrality. We find that in these networks, the vertices of the innermost core constitute the rich centrality club. Our main observations are as follows.

  • The rich centrality clubs, if formed, are located in the inner cores of the network. These nodes can also be used as seed nodes for community detection.

  • The networks with RCC typically have cores with expander-like structures. The density of the cores decreases from the inner to the outer cores. The average number of hops needed to travel from an outer core to an inner core is small (2-4 hops).

  • The nodes in the RCC of a network constitute very effective seeds for information diffusion. Further, presence of an RCC makes a network very resilient to small random structural perturbations.

  • A simple model can convert a network with an RCC into one without an RCC, and vice versa. The model has just two global parameters to be tuned, and builds on the idea of increasing the density (or sparsity) of connections among the high degree nodes, i.e., the innermost core nodes.

Our experiments provide, for the first time, a deeper understanding of the rich club of high centrality vertices and its interplay with the core-periphery structure. In future work, we aim to study dynamic networks to observe the evolution of RCCs over time.

References

  • Alvarez-Hamelin et al. (2006) Ignacio Alvarez-Hamelin, Luca Dall’Asta, Alain Barrat, and Alessandro Vespignani. 2006. LaNet-vi in a Nutshell. (2006).
  • Bae and Kim (2014) Joonhyun Bae and Sangwook Kim. 2014. Identifying and ranking influential spreaders in complex networks by neighborhood coreness. Physica A: Statistical Mechanics and its Applications 395 (2014), 549–559.
  • Batagelj and Zaversnik (2003) Vladimir Batagelj and Matjaz Zaversnik. 2003. An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003).
  • Bertolero et al. (2017) MA Bertolero, BTT Yeo, and M D’Esposito. 2017. The diverse club. Nature communications 8, 1 (2017), 1277.
  • Cheng et al. (2011) James Cheng, Yiping Ke, Shumo Chu, and M Tamer Özsu. 2011. Efficient core decomposition in massive networks. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on. IEEE, 51–62.
  • Chung (1996) F.R.K. Chung. 1996. Laplacians of graphs and Cheeger inequalities. (1996).
  • Chung and Lu (2002) Fan Chung and Linyuan Lu. 2002. The average distances in random graphs with given expected degrees. Proceedings of the National Academy of Sciences 99, 25 (2002), 15879–15882.
  • Cinelli et al. (2018) Matteo Cinelli, Giovanna Ferraro, and Antonio Iovanella. 2018. Rich-club ordering and the dyadic effect: Two interrelated phenomena. Physica A: Statistical Mechanics and its Applications 490 (2018), 808–818.
  • Cohen (2008) Jonathan Cohen. 2008. Trusses: Cohesive subgraphs for social network analysis. National Security Agency Technical Report 16 (2008).
  • Estrada (2006) Ernesto Estrada. 2006. Spectral scaling and good expansion properties in complex networks. EPL (Europhysics Letters) 73, 4 (2006), 649.
  • Giatsidis et al. (2014) Christos Giatsidis, Fragkiskos D Malliaros, Dimitrios M Thilikos, and Michalis Vazirgiannis. 2014. CoreCluster: A Degeneracy Based Graph Clustering Framework.
  • Govindan et al. (2017) Priya Govindan, Chenghong Wang, Chumeng Xu, Hongyu Duan, and Sucheta Soundarajan. 2017. The k-peak Decomposition: Mapping the Global Structure of Graphs. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1441–1450.
  • Gutfraind et al. (2015) Alexander Gutfraind, Ilya Safro, and Lauren Ancel Meyers. 2015. Multiscale network generation. In Information Fusion (Fusion), 2015 18th International Conference on. IEEE, 158–165.
  • Holme (2005a) P Holme. 2005a. Core-periphery organization of complex networks. Phys Rev E 72, 4 (2005), 046111.
  • Holme (2005b) Petter Holme. 2005b. Core-periphery organization of complex networks. Physical Review E 72, 4 (2005), 046111.
  • Hoory et al. (2006) Shlomo Hoory, Nathan Linial, and Avi Wigderson. 2006. Expander graphs and their applications. Bull. Amer. Math. Soc. 43, 4 (2006), 439–561.
  • Jiang et al. (2008) Song Jiang, Lei Guo, Xiaodong Zhang, and Haodong Wang. 2008. Lightflood: Minimizing redundant messages and maximizing scope of peer-to-peer search. IEEE Transactions on Parallel and Distributed Systems 19, 5 (2008), 601–614.
  • Kannan et al. (2004) Ravi Kannan, Santosh Vempala, and Adrian Vetta. 2004. On clusterings: Good, bad and spectral. Journal of the ACM (JACM) 51, 3 (2004), 497–515.
  • Khaouid et al. (2015) Wissam Khaouid, Marina Barsky, Venkatesh Srinivasan, and Alex Thomo. 2015. K-core decomposition of large networks on a single PC. Proceedings of the VLDB Endowment 9, 1 (2015), 13–23.
  • Kitsak et al. (2010) Maksim Kitsak, Lazaros K Gallos, Shlomo Havlin, Fredrik Liljeros, Lev Muchnik, H Eugene Stanley, and Hernán A Makse. 2010. Identification of influential spreaders in complex networks. Nat. Phys. 6, 11 (2010), 888–893.
  • Krivelevich (2017) Michael Krivelevich. 2017. Finding and using expanders in locally sparse graphs. arXiv preprint arXiv:1704.00465 (2017).
  • Kunegis (2013) Jérôme Kunegis. 2013. Konect: the koblenz network collection. In WWW. ACM, 1343–1350.
  • Laishram et al. (2018) Ricky Laishram, Ahmet Erdem Sariyüce, Tina Eliassi-Rad, Ali Pinar, and Sucheta Soundarajan. 2018. Measuring and Improving the Core Resilience of Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 609–618.
  • Leskovec and Krevl (2014) Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data. (June 2014).
  • Li et al. (2015) Cong Li, Qian Li, Piet Van Mieghem, H Eugene Stanley, and Huijuan Wang. 2015. Correlation between centrality metrics and their application to the opinion model. The European Physical Journal B 88, 3 (2015), 65.
  • Lin et al. (2014) J-H Lin, Q Guo, W-Z Dong, Li-Y Tang, and J-G Liu. 2014. Identifying the node spreading influence with largest k-core values. Physics Letters A 378, 45 (2014), 3279–3284.
  • Maiya and Berger-Wolf (2014) Arun S Maiya and Tanya Y Berger-Wolf. 2014. Expansion and decentralized search in complex networks. Knowledge and information systems 38, 2 (2014), 469–490.
  • Malliaros and Megalooikonomou (2011) Fragkiskos D Malliaros and Vasileios Megalooikonomou. 2011. Expansion properties of large social graphs. In International Conference on Database Systems for Advanced Applications. Springer, 311–322.
  • Malliaros et al. (2015) Fragkiskos D Malliaros, Vasileios Megalooikonomou, and Christos Faloutsos. 2015. Estimating robustness in large social graphs. Knowledge and Information Systems 45, 3 (2015), 645–678.
  • Meyer et al. (2014) P. Meyer, H. Siy, and S. Bhowmick. 2014. Identifying Important Classes of Large Software Systems Through K-core Decomposition. Advances in Complex Systems 17, 07n08 (2014), 1550004. https://doi.org/10.1142/S0219525915500046 arXiv:http://www.worldscientific.com/doi/pdf/10.1142/S0219525915500046
  • OBrien and Sullivan (2014) Michael P OBrien and Blair D Sullivan. 2014. Locally estimating core numbers. In Data Mining (ICDM), 2014 IEEE International Conference on. IEEE, 460–469.
  • Pei and Makse (2013) Sen Pei and Hernán A Makse. 2013. Spreading dynamics in complex networks. Journal of Statistical Mechanics: Theory and Experiment 2013, 12 (2013), P12002.
  • Peng et al. (2014) Chengbin Peng, Tamara G Kolda, and Ali Pinar. 2014. Accelerating community detection by using k-core subgraphs. arXiv preprint arXiv:1403.2226 (2014).
  • Rombach et al. (2014) M Puck Rombach, Mason A Porter, James H Fowler, and Peter J Mucha. 2014. Core-periphery structure in networks. SIAM Journal on Applied mathematics 74, 1 (2014), 167–190.
  • Rossi et al. (2015) Maria-Evgenia G Rossi, Fragkiskos D Malliaros, and Michalis Vazirgiannis. 2015. Spread it good, spread it fast: Identification of influential nodes in social networks. In Proceedings of the 24th International Conference on World Wide Web. ACM, 101–102.
  • Seidman (1983) Stephen B Seidman. 1983. Network structure and minimum degree. Social networks 5, 3 (1983), 269–287.
  • Shanahan and Wildie (2012) M Shanahan and M Wildie. 2012. Knotty-centrality: finding the connective core of a complex network. PLoS One 7, 5 (2012), e36579.
  • Shin et al. (2016) Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. 2016. CoreScope: Graph Mining Using k-Core Analysis—Patterns, Anomalies and Algorithms. In Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 469–478.
  • Shin et al. (2017) Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos. 2017. Patterns and anomalies in k-cores of real-world graphs with applications. Knowledge and Information Systems (28 Jun 2017). https://doi.org/10.1007/s10115-017-1077-6
  • Silva et al. (2008) MRd Silva, H Ma, and A-P Zeng. 2008. Centrality, network capacity, and modularity as parameters to analyze the core-periphery structure in metabolic networks. Proc. IEEE 96, 8 (2008), 1411–1420.
  • Wang and Cheng (2012) Jia Wang and James Cheng. 2012. Truss decomposition in massive networks. Proceedings of the VLDB Endowment 5, 9 (2012), 812–823.
  • Zhang et al. (2015) Xiao Zhang, Travis Martin, and Mark EJ Newman. 2015. Identification of core-periphery structure in networks. Physical Review E 91, 3 (2015), 032803.
  • Zhou and Mondragón (2004a) Shi Zhou and Raúl J Mondragón. 2004a. Accurately modeling the Internet topology. Physical Review E 70, 6 (2004), 066108.
  • Zhou and Mondragón (2004b) Shi Zhou and Raúl J Mondragón. 2004b. The rich-club phenomenon in the Internet topology. IEEE Communications Letters 8, 3 (2004), 180–182.