Understanding Stability of Noisy Networks through Centrality Measures and Local Connections
Networks created from real-world data contain some inaccuracies or noise, manifested as small changes in the network structure. An important question is whether these small changes can significantly affect the analysis results.
In this paper, we study the effect of noise in changing the ranks of the high centrality vertices. We compare, using the Jaccard Index (JI), how many of the top-$k$ high centrality nodes from the original network are also among the top-$k$ ranked nodes of the noisy network. We deem a network stable if the JI value is high.
We observe two features that affect the stability. First, the stability is dependent on the number of top-ranked vertices considered. When the vertices are ordered according to their centrality values, they group into clusters. Perturbations to the network can change the relative ranking within the cluster, but vertices rarely move from one cluster to another. Second, the stability is dependent on the local connections of the high ranking vertices. The network is highly stable if the high ranking vertices are connected to each other.
Our findings show that the stability of a network is affected by the local properties of high centrality vertices, rather than the global properties of the entire network. Based on these local properties we can identify the stability of a network, without explicitly applying a noise model.
CIKM'16, October 24-28, 2016, Indianapolis, IN, USA. ISBN 978-1-4503-4073-1/16/10. DOI: http://dx.doi.org/10.1145/2983323.2983692
CCS Concepts: Networks → Network reliability; Theory of computation → Graph algorithms analysis; Theory of computation → Random networks
Network analysis is an effective tool for understanding complex systems of interacting entities that arise in diverse applications. Analysis of network models provides insights into the properties of the underlying systems.
However, measurements of real-world systems are influenced by the experimental setup. Modeling this data as networks is also affected by subjective choices. Given these uncertainties in data collection, any network created from real-world data will contain some inaccuracy (or noise). Noise in networks manifests as extra or missing edges.
Measuring the effect of noise: In recent studies, researchers perturb the network by adding/deleting a specified percentage of edges (noise level). Then they observe by how much the properties of the network alter and correlate this change in the properties with the noise level. A network is deemed stable if the noise does not affect the properties. Most of these studies focus on vertex-based measurements including centrality metrics and core numbers [1, 3, 9, 5, 8].
Despite these studies, there is as yet no definite answer to a key question: how can we identify whether a network will be stable under noise? Although some earlier studies claim that stability is affected by global properties, the experiments reported here show that these correlations do not always hold. Instead, our observations lead us to conclude that it is the local structure around the high centrality vertices that significantly affects the stability.
Noise model: We consider an additive model, where a percentage of edges, selected from the complementary graph at random, are added to the existing network. We focus only on edge addition because missing edges can be predicted using link prediction algorithms. Therefore understanding the effect of extraneous edges is the more critical problem.
Key observations and contributions: We observe that two important factors affect the stability of the networks. First, the stability of the rankings depends on the value of $k$. The top-$k$ ranked vertices arrange themselves into clusters. Within a cluster, the centrality values are very similar. However, the difference in values between the last vertex of a cluster and the first vertex of the next cluster is large. This phenomenon makes the distribution of the high centrality values look like a step function. If the value of $k$ falls at the end of a cluster, the results are stable; otherwise they become unstable. Second, we observe that the stability of a network depends on the density of the subgraph induced by the high centrality vertices, i.e., a rich-club. This is because the centrality metrics are more affected by changes to their immediate neighbors than to nodes at a larger distance.
These observations highlight that network stability depends on localized subgraphs induced by the top-$k$ high centrality vertices, and not on the global topology. Our findings allow users to identify stable networks with respect to a centrality metric without applying the noise model to the network. Our main contributions are as follows:
Demonstrating that stability depends on the value of $k$ and that stability over successive values of $k$ is non-monotonic.
Demonstrating that stability of networks is high if the subgraph induced by the high centrality vertices is dense.
Providing a template to identify highly stable networks.
2 Experiments and Observations
Here we describe our experimental setup and report the behavior of the results under the perturbations. We used the following centrality metrics as defined in [7].
Degree centrality of a vertex $v$ measures the number of its neighbors. Closeness centrality of a vertex $v$ is computed as $CC(v) = \frac{1}{\sum_{s \neq v \in V} dis(v,s)}$, where $dis(v,s)$ is the length of a shortest path between $v$ and $s$. Betweenness centrality of a vertex $v$ is defined as $BC(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the total number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ is the total number of shortest paths between $s$ and $t$ that pass through $v$. We used the real-world networks listed in Table 1 from [6, 2, 4].
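As an illustration of the three metrics, the following is a minimal sketch for small, connected, unweighted, undirected graphs stored as a dictionary of neighbor sets. The function names are hypothetical and not taken from the paper; betweenness is accumulated with Brandes' algorithm, which is a choice made here, and each unordered pair of endpoints is counted once.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    """Number of neighbors of each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """1 / (sum of shortest-path lengths); assumes a connected graph."""
    result = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        result[v] = 1.0 / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes-style accumulation; each unordered pair {s, t} counts once."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every pair was visited from both endpoints in an undirected graph.
    return {v: value / 2.0 for v, value in bc.items()}


# Path graph 0 - 1 - 2: the middle vertex lies on the only 0-2 path.
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(degree_centrality(path))
print(closeness_centrality(path))
print(betweenness_centrality(path))
```

For larger networks a graph library would be the practical choice; the sketch only makes the definitions concrete.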
Noise model: Our noise model is as follows. Of all the possible edges in a graph with $n$ vertices, we pick each edge with probability $\epsilon/n$; $0 \le \epsilon \le n$. If the edge is not part of the network, it is added to the network.
If the degree of a vertex is $d$, then there are $n-d-1$ nodes that can become its neighbors due to perturbation. Therefore the expected number of edges added to it will be $\epsilon(n-d-1)/n$. Thus vertices with higher degree will have fewer edge additions.
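The additive noise model can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the per-pair probability is taken to be `eps / n` (consistent with an average of `eps` added edges per vertex), the graph is a dictionary of neighbor sets, and the names `add_noise` and `expected_additions` are hypothetical.

```python
import random
from itertools import combinations


def add_noise(adj, eps, rng=None):
    """Return a copy of adj in which each missing edge is added with
    probability eps / n (assumed form of the noise level)."""
    rng = rng or random.Random()
    n = len(adj)
    p = min(1.0, eps / n)
    noisy = {v: set(nbrs) for v, nbrs in adj.items()}
    for u, v in combinations(list(adj), 2):
        # Existing edges are left untouched; only non-edges can be added.
        if v not in noisy[u] and rng.random() < p:
            noisy[u].add(v)
            noisy[v].add(u)
    return noisy


def expected_additions(n, degree, eps):
    """Expected number of new edges at a vertex with the given degree:
    (n - degree - 1) candidate endpoints, each chosen with probability eps / n."""
    return eps * (n - degree - 1) / n


base = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
noisy = add_noise(base, eps=1.0, rng=random.Random(7))
print(noisy)
print(expected_additions(n=10, degree=3, eps=2.5))
```

Passing a seeded `random.Random` keeps runs reproducible, which helps when averaging over several perturbed copies of the same network.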
Measuring stability: For a given network and a given centrality metric, we compute the stability as follows. We apply the noise model to the network at several noise levels $\epsilon$. For each perturbed network we compute the centrality values. We then compute the Jaccard Index (JI) to see how many of the top-$k$ vertices in the original network are also among the top-$k$ vertices in the perturbed network. For two sets $A$ and $B$, $JI(A,B) = \frac{|A \cap B|}{|A \cup B|}$. The highest value of JI is 1 (the two sets have exactly the same elements) and the lowest value is 0 (the sets have no elements in common).
We conducted these experiments for each network over 10 perturbed networks per noise level. The JI presented in the results is the mean over the 10 networks. We classified the stability into three groups by JI value: High Stability, Medium Stability, and Low Stability.
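The stability measurement can be sketched as below, assuming centrality scores are given as dictionaries mapping vertex to value. The cut-offs `high=0.7` and `low=0.4` in `classify` are placeholders chosen for illustration only; the thresholds used in the paper are not recoverable from the text here. All function names are hypothetical.

```python
def top_k(scores, k):
    """Set of the k highest-scoring vertices (ties broken by label)."""
    ranked = sorted(scores, key=lambda v: (-scores[v], str(v)))
    return set(ranked[:k])


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(original, perturbed_runs, k):
    """Mean JI between the original top-k set and each perturbed top-k set."""
    base = top_k(original, k)
    values = [jaccard(base, top_k(run, k)) for run in perturbed_runs]
    return sum(values) / len(values)


def classify(ji, high=0.7, low=0.4):
    """Map a JI value to a label; the thresholds are assumed placeholders."""
    if ji >= high:
        return "High"
    if ji >= low:
        return "Medium"
    return "Low"


original = {"a": 5, "b": 4, "c": 3, "d": 1}
runs = [
    {"a": 5, "b": 4, "c": 3, "d": 1},  # ranking unchanged
    {"a": 5, "b": 2, "c": 3, "d": 1},  # b drops out of the top 2
]
mean_ji = stability(original, runs, k=2)
print(mean_ji, classify(mean_ji))
```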
Results: In Figure 1, each line represents a network. The X-axis represents the noise levels ($\epsilon$). For ease of visualization we plot only the even values of $k$ from 2 to 10. The Y-axis measures the dominant stability, i.e., the longest consecutively occurring stability range for that noise level. For example, 5H denotes that at that noise level the stability was high for all five values of $k$. 3M denotes that the stability was in the medium range for three consecutive values of $k$.
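The dominant-stability label can be computed with a short run-length scan. The sketch below assumes the per-$k$ stability labels are given in order of increasing $k$ as single letters ("H", "M", "L"); the function name and the tie-breaking rule (the earliest longest run wins) are assumptions made here.

```python
from itertools import groupby


def dominant_stability(labels):
    """Longest run of identical consecutive labels, formatted like '5H'."""
    if not labels:
        raise ValueError("labels must not be empty")
    best_label, best_len = None, 0
    for label, run in groupby(labels):
        length = len(list(run))
        if length > best_len:  # strict '>' keeps the earliest longest run
            best_label, best_len = label, length
    return f"{best_len}{best_label}"


print(dominant_stability(["H", "H", "H", "H", "H"]))
print(dominant_stability(["H", "M", "M", "M", "L"]))
```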
Figure 2 shows the changes for the individual networks per value of $k$, not just the dominant stability. We have included two networks that were consistently in the high range (AS1 and C. elegans), two that were consistently in the low range (Power Grid and Football) and two that changed their stability values according to the centrality metric and noise level (GrQc and Railway). The results show that the stability value can change depending on the value of $k$.
Observations: The results show that even a small amount of noise (average edges added per vertex is 2.5) can significantly change the analysis results. However, the behavior of the three centrality metrics varies as follows:
Degree: The dominant stability decreases monotonically for degree centrality. With the exception of Power Grid (has some middle level stability) and Football (all stabilities are low) all other networks show high stability.
Closeness: Several networks show predominantly low dominant stability (Power Grid, Football, GrQc and Dolphins). Networks HepTh and Railway start with high stability, but their stability decreases at higher noise levels.
Betweenness: Here, Power Grid, Football and Dolphins have low dominant stability. GrQc goes from high, to medium to low. Railway also starts from high and ends at low.
To summarize, our main observations are as follows:
The dominant stability decreases with increasing levels of noise. However, the individual stability changes non-monotonically with the value of $k$.
Among the centrality metrics, degree is the most stable, closeness has a clearer separation between the high and low stability networks, and in betweenness the separation is not as clear. The same network (e.g., GrQc at noise level 1.5) can have high (degree), low (closeness) and medium (betweenness) stability depending on the centrality metric.
The global topology of the network is not a deciding factor. In Table 1, the clustering coefficients are very diverse and the slope of the degree distribution is between 1.5 and 3. Neither of these parameters seems to correlate strongly with the stability values.
3 Factors Affecting Stability
This section contains our main contribution, where we explain how properties of the network affect the stability.
Stability of centrality metrics:
Consider two nodes $v$ and $u$, whose values for a centrality metric $C$ are $C_v$ and $C_u$ respectively. In the original network, $C_v > C_u$, thus $v$ has a higher rank than $u$.
Our goal is to identify the lower bound on the difference between $C_v$ and $C_u$ such that, after perturbation, $C_v$ will remain greater than $C_u$. We consider the most favorable situation for $C_u$ to become larger than $C_v$: we assume that $C_v$ has the maximum decrease after perturbation and $C_u$ has the maximum increase, given that on average $\epsilon$ edges are added per vertex. Our computations for each centrality metric are as follows:
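Degree centrality: The degree of a vertex will either increase or remain the same. Thus the maximum decrease of the degree $D_v$ of the higher ranked vertex $v$ is zero. With on average $\epsilon$ edges added per vertex, the degree of the lower ranked vertex $u$ becomes $D_u + \epsilon$. Therefore, if $D_v - D_u > \epsilon$, then the ranking will not change.
For most networks, the difference between the degrees of the higher ranked vertices is larger than the maximum $\epsilon$ we set for our experiments, so the ranking of the vertices remains relatively stable.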
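The degree argument can be turned into a quick check on the sorted degree sequence. The sketch below assumes the condition is that the degree gap between consecutively ranked vertices exceeds the noise level `eps` (the average number of edges added per vertex); the function name is hypothetical.

```python
def degree_gaps_stable(degrees, k, eps):
    """For ranks 1..k, report whether the gap to the next-ranked vertex
    exceeds eps, i.e. whether that boundary is expected to survive noise."""
    ranked = sorted(degrees.values(), reverse=True)[:k + 1]
    return [
        (rank + 1, ranked[rank] - ranked[rank + 1] > eps)
        for rank in range(len(ranked) - 1)
    ]


degrees = {"a": 20, "b": 19, "c": 10, "d": 9}
# Gaps are 1, 9, 1: only the boundary after rank 2 exceeds eps = 2.5.
print(degree_gaps_stable(degrees, k=3, eps=2.5))
```

A `True` entry at rank `r` suggests that cutting the ranking at `k = r` should be robust for degree centrality at that noise level.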
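Closeness centrality: For simplicity, we consider $CC'(v)$ to be the inverse of closeness centrality, i.e., $CC'(v) = \sum_{s \neq v \in V} dis(v,s)$, so the higher ranked vertex $v$ has the smaller value, $CC'(v) < CC'(u)$.
Since we are adding edges, shortest-path distances can only shrink, so this value will either decrease or remain the same. The change in $CC'$ will depend on where the edges are added.
Assume, due to perturbations, an edge is added from the lower ranked vertex $u$ to a vertex $x$, which is at distance $d_x$ from $u$. Therefore $x$, and other vertices whose shortest paths to $u$ passed through $x$, will have their distance to $u$ reduced by $d_x - 1$. The maximum decrease is $\sum_{x \in X} (d_x - 1)\, n_x$, where $X$ is the set of nodes that are newly connected to $u$, $d_x$ is the distance of $x$ from $u$ in the original graph and $n_x$ is the number of vertices whose shortest path to $u$ passes through $x$. Thus the following has to hold for the ordering between these two vertices to be stable: $CC'(u) - CC'(v) > \sum_{x \in X} (d_x - 1)\, n_x$. $|X|$ will increase with $\epsilon$. The values of $d_x$ depend on the depth of the BFS tree originating from $u$.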
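To see how one added edge changes the sum of distances from a vertex, the sketch below recomputes BFS distances before and after the addition rather than using a closed-form bound. The helper names `farness` and `farness_drop` are hypothetical. In the example, the new neighbor was at distance 3 and has two leaves behind it, so three vertices each get 2 hops closer; in general, other vertices may get closer as well.

```python
from collections import deque


def farness(adj, source):
    """Sum of hop distances from source to all reachable vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(dist.values())


def farness_drop(adj, u, x):
    """Decrease in the farness of u after adding the edge (u, x)."""
    before = farness(adj, u)
    patched = {v: set(nbrs) for v, nbrs in adj.items()}
    patched[u].add(x)
    patched[x].add(u)
    return before - farness(patched, u)


# Path 0 - 1 - 2 - 3 with two leaves (4 and 5) attached to vertex 3.
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
print(farness(graph, 0))          # 1 + 2 + 3 + 4 + 4 = 14
print(farness_drop(graph, 0, 3))  # vertices 3, 4 and 5 each get 2 closer
```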
Betweenness centrality: By adding edges, the betweenness centrality of a vertex can increase if it gets connected to another high centrality vertex. It can also decrease if new edges lead to alternate or shorter shortest paths.
Assume that, due to the addition of edges to a vertex, there are new pairs of vertices whose shortest paths pass through it. Also, due to the addition of edges in other parts of the network, there are pairs of vertices whose shortest paths used to pass through the vertex in the original network but do not in the perturbed network. There are also pairs of vertices whose shortest-path length does not change, but between which there are new shortest paths after perturbation.
We assume that the higher ranked vertex $v$ sees only a decrease in its BC value and the lower ranked vertex $u$ sees only an increase. The decrease for $v$ comes from the pairs that no longer route through $v$ and from the pairs that gained new shortest paths avoiding $v$, which lowers the fraction of shortest paths through $v$ for each such pair. The increase for $u$ comes from the new pairs whose shortest paths pass through $u$.
Therefore, the difference between $BC(v)$ and $BC(u)$ must be larger than the sum of this maximum decrease and maximum increase.
The number of new pairs routed through a vertex will increase as $\epsilon$ increases. The number of pairs that lose paths through the vertex or gain alternate paths depends on the length of the shortest paths. If most of the shortest paths through the vertex are already short, then there is less chance that they will become even shorter or that alternate paths will be found with the addition of new edges. Based on these bounds we observe that the stability decreases with higher $\epsilon$. For closeness and betweenness centrality, the increase (if any) also depends on network structure.
Stability based on centrality values:
We observe that the relative differences of consecutive centrality values can indicate whether the ordering will be maintained.
Figure 3 plots the change in stability as the noise level remains constant and the value of $k$ changes (line graphs), together with the centrality values of the top-10 high centrality vertices (scatter plots).
Stable clusters: This phenomenon occurs because it is difficult to reverse the ranking between two vertices if they have a large difference in their values. However, if the values are very close, then a slight perturbation can change the rankings.
Therefore, we can use the relative difference between consecutively ranked vertices to group similarly valued vertices into stable clusters. If the value of $k$ falls within a cluster, the Jaccard Index is likely to change. On the other hand, if $k$ is selected such that it falls at the beginning of a cluster, then the ranking becomes more stable due to the large relative difference. This is borne out in Figure 3.
Identifying stable clusters: To identify the stable groups we compare the difference between the centrality values of consecutively ordered vertices. A break between clusters occurs between two vertices that have a high relative difference. We continue dividing the vertices into clusters until the difference is lower than a certain threshold. Identifying these stable clusters gives us an improved understanding of how the network will behave under various levels of noise. Networks where the clusters are small and the differences between clusters are large should have high stability.
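The clustering step can be sketched as a single pass over the sorted centrality values. This is one possible reading of the procedure, under assumptions: values are positive, the relative difference is measured against the larger of the two consecutive values, and the threshold (0.2 in the example) is a hypothetical choice, not one given in the paper. Function names are likewise hypothetical.

```python
def stable_clusters(values, threshold):
    """Group descending-sorted values; start a new cluster wherever the
    relative drop to the next value is at least `threshold`."""
    ordered = sorted(values, reverse=True)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        rel_gap = (prev - cur) / prev if prev else 0.0
        if rel_gap >= threshold:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


def cluster_boundaries(clusters):
    """Values of k that do not split any cluster (cumulative cluster sizes)."""
    sizes, total = [], 0
    for cluster in clusters:
        total += len(cluster)
        sizes.append(total)
    return sizes


values = [100, 98, 97, 60, 59, 30]
groups = stable_clusters(values, threshold=0.2)
print(groups)
print(cluster_boundaries(groups))
```

Breaking at every gap above the threshold yields the same partition as repeatedly splitting at the largest remaining gap until all gaps fall below it.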
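Top 10 High Ranked Vertices

| Network | Stability, Closeness (JI) | Stability, Betweenness (JI) | Subgraph Density, Closeness | Subgraph Density, Betweenness | Common Neighbors, Closeness | Common Neighbors, Betweenness |
|---|---|---|---|---|---|---|
| Dense Cluster and High Stability Networks | | | | | | |
| AS20000101 | High (.96) | High (1) | .97 | .71 | High | High |
| AS20000102 | High (1) | High (.78) | .95 | .71 | High | High |
| C. elegans | High (.94) | High (.76) | .82 | .66 | High | High |
| Les Mis | High (.8) | High (.76) | .66 | .46 | Medium | Medium |
| Sparse Cluster and Low Stability Networks | | | | | | |
| GrQc | Low (.26) | Medium (.64) | .26 | .11 | Low | Low |
| Dolphin | Low (.1) | Low (.1) | .36 | .24 | Low | Low |
| Football | Low (0) | Low (0) | .16 | .09 | Low | Low |
| Power Grid | Low (0) | Low (0) | .24 | .15 | Low | Low |
| Outliers | | | | | | |
| Email | High (.98) | High (.96) | .31 | .24 | Low | Low |
| Railway | Medium (.68) | Medium (.68) | .67 | .38 | Medium | Low |
| HepTh | Medium (.68) | High (.72) | .17 | .13 | Low | Low |
| Synthetic Networks | | | | | | |
| LFR5000 | High (.78) | High (1) | .77 | .44 | High | Medium |
| RMAT12 | Medium (.58) | Medium (.48) | .04 | .04 | Low | Low |

Top 6 High Ranked Vertices

| Network | Stability, Closeness (JI) | Stability, Betweenness (JI) | Subgraph Density, Closeness | Subgraph Density, Betweenness | Common Neighbors, Closeness | Common Neighbors, Betweenness |
|---|---|---|---|---|---|---|
| Dense Cluster and High Stability Networks | | | | | | |
| AS20000101 | High (.86) | High (1) | 1 | .93 | High | High |
| AS20000102 | High (.8) | High (.96) | 1 | .93 | High | High |
| C. elegans | High (.96) | High (1) | 1 | .87 | High | High |
| Les Mis | High (.90) | High (1) | .87 | .60 | Medium | Medium |
| Sparse Cluster and Low Stability Networks | | | | | | |
| GrQc | Low (.22) | Medium (.60) | .20 | .13 | Low | Low |
| Dolphin | Low (.12) | Low (0) | .47 | .40 | Low | Low |
| Football | Low (0) | Low (.10) | .27 | .07 | Low | Low |
| Power Grid | Low (0) | Low (0) | .27 | .13 | Low | Low |
| Outliers | | | | | | |
| Email | High (.90) | High (.76) | .33 | .27 | Low | Low |
| Railway | Medium (.68) | Medium (.62) | .73 | .40 | Medium | Low |
| HepTh | Medium (.66) | High (1) | .27 | .13 | Low | Low |
| Synthetic Networks | | | | | | |
| LFR5000 | High (.72) | High (.84) | .73 | .4 | High | Medium |
| RMAT12 | Medium (.64) | Medium (.6) | .07 | .07 | Low | Low |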
Stability based on network structure: We now see how the network structure affects its stability. The slopes of the degree distributions for most of the networks in our test suite range from 1.5 to 3, their average local clustering coefficients vary widely, and neither of these correlates with the stability of the networks. Instead, as suggested by the earlier analysis, the stability seems to depend on the local structure around the high centrality vertices.
High ranked common neighbors: We investigated whether the common neighbors of the top-$k$ nodes also have high rank. For each pair of nodes within the top-$k$ ($k$=10 and $k$=6) set (for a given centrality metric) we calculated the Jaccard Index between their connections to the top 100, 50, 25, and 10 high ranking nodes, and computed the average JI for each set of neighbors (top 100, top 50, etc.).
As the range of high ranked neighbors decreases (from 100 down to 10), the average JI value increases (Figure 4). This indicates that the top-$k$ high rank nodes have more common neighbors among the high-ranked nodes. The curves divide into three regions. The networks at the top are the ones with high stability (e.g., C. elegans), the networks in the middle have moderate stability (e.g., Email) and the ones at the bottom show low stability (e.g., Football).
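The pairwise comparison of high ranked neighbors can be sketched as follows, for a graph stored as a dictionary of neighbor sets and a dictionary of centrality scores. Two choices here are assumptions rather than details from the paper: the two vertices of a pair are excluded from each other's neighbor sets, and a pair with no high ranked neighbors at all contributes a JI of 0. The function name is hypothetical.

```python
from itertools import combinations


def avg_common_neighbor_ji(adj, scores, k, m):
    """Average JI, over all pairs of top-k vertices, between their
    neighbor sets restricted to the top-m ranked vertices."""
    ranked = sorted(scores, key=lambda v: -scores[v])
    top_k_nodes = ranked[:k]
    top_m = set(ranked[:m])
    values = []
    for a, b in combinations(top_k_nodes, 2):
        # Restrict to high ranked neighbors, ignoring the pair itself.
        na = (adj[a] & top_m) - {a, b}
        nb = (adj[b] & top_m) - {a, b}
        union = na | nb
        values.append(len(na & nb) / len(union) if union else 0.0)
    return sum(values) / len(values) if values else 0.0


adj = {
    "h1": {"x", "y", "z"},
    "h2": {"x", "y"},
    "x": {"h1", "h2"},
    "y": {"h1", "h2"},
    "z": {"h1"},
}
scores = {"h1": 5, "h2": 4, "x": 3, "y": 2, "z": 1}
for m in (5, 4):
    print(m, avg_common_neighbor_ji(adj, scores, k=2, m=m))
```

In this toy graph the average rises from 2/3 to 1 as the neighbor range narrows from the top 5 to the top 4, mirroring the trend described for Figure 4.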
Subgraph induced by high ranked vertices: For each metric, we identified the top-$k$ high ranked vertices and then computed the density of the subgraph induced by the vertices in this set. Networks that achieve more instances of high stability have denser subgraphs (Figure 5).
Summary: Table 2 summarizes the results for $k=10$ and $k=6$. The density of a subgraph is the ratio of the number of edges in the subgraph to the total number of possible edges.
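The density of the subgraph induced by a vertex set can be computed directly from the definition. The sketch below assumes a simple undirected graph (no self-loops or parallel edges) stored as a dictionary of neighbor sets; the function name is hypothetical.

```python
def induced_density(adj, nodes):
    """Edges among `nodes` divided by the number of possible edges."""
    nodes = set(nodes)
    if len(nodes) < 2:
        return 0.0
    # Each internal edge is seen from both endpoints, hence the division by 2.
    edges = sum(len(adj[v] & nodes) for v in nodes) / 2
    possible = len(nodes) * (len(nodes) - 1) / 2
    return edges / possible


# Triangle 1-2-3 with a pendant vertex 4 attached to 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(induced_density(graph, [1, 2, 3]))
print(induced_density(graph, [1, 2, 3, 4]))
```

With six vertices there are 15 possible edges, so a value such as .93 in Table 2 corresponds to 14 of the 15 edges being present.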
If the networks have high stability for the top-$k$ vertices, then the subgraph induced by those vertices is also dense ($\geq$ .60). Conversely, if the network has low stability, then the corresponding subgraphs are sparse ($\leq$ .40, with Dolphin being the exception). This pattern is also observed when comparing their common neighbors. For high (low) stability networks, the corresponding line in Figure 4 is in the high (low) range. The results are similar for $k=10$ and $k=6$.
The exceptions are listed under Outliers. For example, Email and HepTh have high stability but low subgraph density. In these cases, a smaller subgroup of the high centrality vertices forms a dense cluster, and the remaining high centrality vertices connect to that cluster. Another case is Railway, with medium, tending to high, stability for BC. Here, the subgraph for betweenness consists of two smaller clusters connected to each other (see Figure 5). Similar characteristics appear for Les Mis and GrQc (high BC, low density).
Table 2 also shows the results for two synthetic networks, RMAT12 (a random network created using the RMAT generator) and LFR5000 (a scale-free network created using the LFR generator with $\mu$ = .1). The subgraph density of RMAT12 is constant for both centralities. LFR5000, however, shows strong subgraphs for closeness and a strong cluster over a subset of vertices for betweenness centrality. Therefore, compared to random graphs, scale-free networks with strong communities are more stable.
Template to detect high stability networks: Based on our observations, we propose a template to identify stable networks as follows:
1. Identify the top-$k$ centrality nodes and their values.
2. Stability Condition 1: Identify the lower bound on the differences between the centrality values that will maintain the ordering. If the difference between the high centrality nodes is greater than the lower bound, then the network is stable for that range of $k$.
3. Stability Condition 2: Find stable clusters based on the values of the high centrality nodes. If $k$ falls at the beginning of the cluster, then the network is stable for that range of $k$.
4. Stability Condition 3: Find the subgraph induced by the top-$k$ nodes. If the subgraphs are dense and the number of common high ranked neighbors is high, then the network is stable for that range of $k$.
If all these conditions are satisfied, the network should be highly stable. Conversely, if none of these conditions are satisfied the network should have low stability. Note that our method does not require the user to actually perturb the network to estimate its stability.
Our experiments demonstrate two findings that, to our knowledge, have not been reported before. The first is that networks where the high centrality vertices are very well-connected, i.e., they form a "rich-club", are more stable. The second is that the stability of the rankings of nodes depends on the number of top ranked nodes ($k$) being investigated. The top nodes seem to arrange themselves into groups; if the value of $k$ is such that it does not split a group, then the results are stable, otherwise they are unstable.
Based on these conditions of stability, users can evaluate the stability of their networks, without applying the noise model. They can also use these conditions to improve the stability of their data collection methods.
In the future, we plan to extend this study to other noise models and other network properties. We also plan to develop methods to determine the thresholds automatically. A final direction is to analyze the performance of the stability detection algorithm on other networks and in other application areas.
Sandjukta Bhowmick and Vladimir Ufimtsev thank NSF Award Number 1533881 for funding this project.
- We consider ranking from 1 (high) to $n$ (low). The vertex with the highest centrality value is ranked 1.
- The rank 1 node is not shown due to its high value. Plotting it would make the relative differences between the other vertices hard to visualize.
[1] A. Adiga and A. K. Vullikanti. How robust is the core of a network? In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Volume 8188, ECML PKDD 2013, pages 541–556, 2013.
[2] D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner. Graph partitioning and graph clustering. 10th DIMACS Implementation Challenge. http://www.cc.gatech.edu/dimacs10/archive/clustering.shtml, 2012.
[3] S. P. Borgatti, K. M. Carley, and D. Krackhardt. On the robustness of centrality measures under conditions of imperfect data. Social Networks, 28(2):124–136, 2006.
[4] S. Ghosh, A. Banerjee, N. Sharma, S. Agarwal, and N. Ganguly. Statistical analysis of the Indian railway network: a complex network approach. Acta Physica Polonica B Proceedings Supplement, 4:123–137, 2011.
[5] M. Herland, P. Pastran, and X. Zhu. An empirical study of robustness of network centrality scores in various networks and conditions. In Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on, pages 221–228. IEEE, 2013.
[6] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
[7] M. Newman. Networks: an introduction. Oxford, 2010.
[8] S. Tsugawa and H. Ohsaki. Analysis of the robustness of degree centrality against random errors in graphs. In Complex Networks VI, pages 25–36. Springer, 2015.
[9] D. J. Wang, X. Shi, D. A. McFarland, and J. Leskovec. Measurement error in network data: A re-classification. Social Networks, 34(4):396–409, 2012.