Leveraging percolation theory to single out influential spreaders in networks
Abstract
Among the consequences of the disordered interaction topology underlying many social, technological and biological systems, a particularly important one is that some nodes, just because of their position in the network, may have a disproportionate effect on dynamical processes mediated by the complex interaction pattern. For example, the early adoption by an opinion leader in a social network may change the fate of a commercial product, or just a few superspreaders may determine the virality of a meme in social media. Despite many recent efforts, the formulation of an accurate method to optimally identify influential nodes in complex network topologies remains an unsolved challenge. Here, we present the exact solution of the problem for the specific, but highly relevant, case of the Susceptible-Infected-Removed (SIR) model for epidemic spreading at criticality. By exploiting the mapping between bond percolation and the static properties of SIR, we prove that the recently introduced Non-Backtracking centrality is the optimal criterion for the identification of influential spreaders in locally tree-like networks at criticality. By means of simulations on synthetic networks and on a very extensive set of real-world networks, we show that the Non-Backtracking centrality is a highly reliable metric to identify top influential spreaders also in generic graphs not embedded in space, and for non-critical spreading.
I Introduction
Social, technological and biological systems are often characterized by underlying interaction topologies with complex features Albert and Barabási (2002); Newman (2010). In a complex network, the roles played by individual nodes are highly heterogeneous. Understanding the impact of individual vertices on the global functionality of the system is one of the most fundamental, yet not fully solved, problems of network science. Centrality measures have indeed the purpose of quantitatively gauging the importance of individual vertices Wasserman and Faust (1994). Among the most natural and widely used ones are degree, betweenness centrality Freeman (1977), k-shell (or k-core) index Seidman (1983), and eigenvector centrality Bonacich (1972).
Spreading is at the root of a vast class of phenomena occurring on network substrates: the propagation of contagious diseases Pastor-Satorras et al. (2015), the diffusion of information or memes Ratkiewicz et al. (2011), the adoption of innovations Bakshy et al. (2009), etc. Considerable interest has recently been devoted to the identification of influential spreaders (often called superspreaders), i.e., nodes that, if chosen as initiators, maximize the extent of a spreading process. The goal is to identify which of the many centrality metrics that can be computed using only topological information is most strongly correlated with the ability of a node to originate massive spreading events.
Probably, a fully universal method, able to perfectly single out the most influential nodes for arbitrary spreading dynamics on arbitrary networks, does not exist. It is in fact reasonable to expect that the predictive power of the different centralities strongly depends not only on the topology of the underlying network but also on the details of the spreading process. Numerical evidence in this sense can be found in Borge-Holthoefer and Moreno (2012); de Arruda et al. (2014). A much more reasonable goal is instead to identify a metric able to optimally solve the problem for specific types of dynamics. Here, we take this path and concentrate our attention on the Susceptible-Infected-Removed (SIR) model for epidemics. SIR is a paradigmatic model for spreading, and the vast majority of the investigations about the identification of influential spreaders in complex networks have dealt with it. In random networks, classical results on the SIR model relate the epidemic threshold to moments of the degree distribution Pastor-Satorras et al. (2015). Hence, a naive hypothesis is to assume that the spreading ability is strongly correlated to the degree of the initiator. This view has been challenged by Kitsak et al., who proposed the k-core (also called k-shell) index (which singles out nodes belonging to dense, mutually interconnected, subgraphs) as a proper indicator of the spreading ability Kitsak et al. (2010). This seminal paper has been followed by an avalanche of other studies aimed at investigating the issue for the same or different dynamical processes, synthetic or real-world networks, using a wide range of centralities proposed as predictors of the spreading ability of the different vertices Bauer and Lizier (2012); Klemm et al. (2012); Chen et al. (2012); da Silva et al. (2012); Chen et al. (2013); Liu et al. (2013); Ren et al. (2014); de Arruda et al. (2014); Liu et al. (2015a, b, 2016).
Many empirical investigations have cast doubt on the ability of the k-shell index to identify influential spreaders in various topologies da Silva et al. (2012); Liu et al. (2015a). However, in a very recent work, Ferraz de Arruda et al. have reaffirmed the superiority of the k-shell index and degree centrality as predictors for top influential spreaders in non-spatial networks de Arruda et al. (2014). The authors of that paper also proposed an additional centrality metric, the so-called generalized random walk accessibility, to overcome limitations of the k-shell index in spatially embedded networks. The picture emerging from all these efforts is not satisfactory: all heuristics proposed are motivated by physical intuition but involve uncontrolled approximations; no exact result is available even for synthetic idealized but non-trivial topologies; and methods are generally validated numerically on a very limited number of networks, with no complete control of their topological properties. In this paper we fill this gap, presenting a physically grounded method which solves the problem exactly in a non-trivial case, and performs very well in a very broad spectrum of situations.
Our work is based on the connection existing between bond percolation and the static properties of the SIR model for epidemics Grassberger (1983a); Newman (2002); Pastor-Satorras et al. (2015). Very recent results have pointed out the crucial role played by the spectral properties of the Non-Backtracking (NB) matrix in determining the properties of the bond percolation process in complex networks Karrer et al. (2014); Hamilton and Pryadko (2014a); Radicchi and Castellano (2015). Combining these two well-established facts, we therefore propose the NB centrality Martin et al. (2014) as the quantity of choice for the identification of influential spreaders in disordered topologies. In particular we show that, on locally tree-like networks, the NB centrality provides the exact solution to the problem of finding the best single influential spreader, if critical spreading is considered. We complement this result with a thorough empirical investigation of the problem on a very large set of real-world topologies, of social, technological and biological origin, exhibiting a large variety of size, sparsity, heterogeneity and other topological features. We compare the performance, as predictors of the spreading power of single nodes, of the most important centralities proposed so far. We show that NB centrality turns out to be, in the majority of cases, the best quantity able to single out the most influential initiators of spreading processes in networks.
II The problem of influential spreaders
II.1 The spreading dynamics
To model the spreading dynamics, we consider the SIR model, the simplest and most studied dynamics for epidemics in the presence of acquired immunity Pastor-Satorras et al. (2015). Each vertex of a network can be either in state S (susceptible), I (infected) or R (interpreted either as recovered or removed). We consider the continuous-time version of the dynamics. At each instant of time, two elementary events may occur: (i) $I \xrightarrow{\nu} R$, meaning that, at rate $\nu$, a spontaneous recovery/removal event may turn a node in state I into state R; (ii) $I + S \xrightarrow{\beta} 2I$, indicating the spreading of the infection, at rate $\beta$, among pairs of connected nodes in states I and S. Starting from an initial configuration where all vertices are in state S and only node $i$ is in state I, a connected set of contiguous vertices may be infected, but after some time all infected nodes eventually switch to the R state and the outbreak ends. The total number of nodes whose final state is R represents the extent of the spreading event originated by the single seed $i$. The problem of interest here is the identification of influential spreaders, i.e., finding, based only on the topology of the network, which node must be selected as the initiator of the epidemic in order to maximize the average outbreak size. The asymptotic behavior of the SIR model depends on the ratio $\lambda = \beta/\nu$. If initiators are randomly chosen, then one can define a critical threshold $\lambda_c$. For $\lambda$ smaller than the epidemic threshold $\lambda_c$, spreading events are of finite (subextensive) size. For $\lambda > \lambda_c$, instead, the infection involves a finite fraction of the whole system. It is therefore reasonable to expect that the identity and role of influential spreaders also depend on the value of $\lambda$. See Appendix C for details on how $\lambda_c$ is determined numerically.
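The continuous-time dynamics described above can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code used for the results reported here. It assumes the network is given as a dictionary `adj` mapping each node to a list of neighbors, and it uses the fact that, for exponentially distributed recovery and transmission times, an infected node passes the infection along an edge only if the transmission time drawn for that edge is shorter than the node's own recovery time. The function names `sir_outbreak_size` and `spreading_power` are hypothetical.

```python
import heapq
import random


def sir_outbreak_size(adj, seed, beta, nu, rng=None):
    """Final number of nodes in state R for one continuous-time SIR run."""
    rng = rng or random.Random()
    reached = {seed}   # nodes that have left state S (currently I or R)
    events = []        # min-heap of (infection time, target node)

    def infect(node, now):
        # The infectious period of `node` is exponential with rate nu.
        recovery = rng.expovariate(nu)
        if beta <= 0:
            return
        for nb in adj[node]:
            if nb in reached:
                continue
            # Transmission along an edge is exponential with rate beta and
            # succeeds only if it happens before `node` recovers.
            delay = rng.expovariate(beta)
            if delay < recovery:
                heapq.heappush(events, (now + delay, nb))

    infect(seed, 0.0)
    while events:
        now, node = heapq.heappop(events)
        if node in reached:   # already infected through another edge
            continue
        reached.add(node)
        infect(node, now)
    return len(reached)


def spreading_power(adj, seed, beta, nu, runs, rng=None):
    """Average outbreak size over `runs` independent runs from `seed`."""
    rng = rng or random.Random()
    total = sum(sir_outbreak_size(adj, seed, beta, nu, rng) for _ in range(runs))
    return total / runs
```

Processing infection attempts in time order through a heap guarantees that each node is infected by the earliest successful transmission, which is what the rate-based description prescribes.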
II.2 Numerical simulations
For a given network, we rank the nodes on the basis of their spreading power. We numerically simulate the SIR dynamical process with a single initial seed $i$ in state I and all other nodes in state S. After the dynamics has ended, we record the number of nodes in state R. We then repeat the procedure multiple times, and quantify the spreading power of node $i$ as $q_i$, that is, the average size of the outbreak generated from the initial seed $i$. The measure $q_i$ and its associated ranking are the benchmarks against which we compare the centralities proposed to identify influential spreaders. We consider four standard centrality metrics: degree, k-core, eigenvector centrality, and the generalized Random Walk Accessibility (RWA). Eigenvector centrality has been indicated as an effective predictor within mean-field analyses, i.e. neglecting dynamical correlations between states of adjacent vertices Klemm et al. (2012). RWA has been recently identified by Ferraz de Arruda et al. as the best predictor for influential spreaders in spatial networks de Arruda et al. (2014). Quantitative comparisons of performance among centrality metrics are based on two complementary measures: the imprecision function $\epsilon(\rho)$ Kitsak et al. (2010), and the Jaccard distance $J(\rho)$. Both measures take as input two sets of nodes. The first is the list of the first $\rho N$ actual top spreaders, with $0 < \rho \le 1$ and $N$ the size of the network, as identified from the results of numerical simulations of the SIR model, hence ranked on the basis of the score $q_i$. The second set is the list of the top $\rho N$ nodes when nodes are ranked according to the centrality score $c_i$. Both $\epsilon(\rho)$ and $J(\rho)$ return a value ranging between 0, for perfect matching (i.e. the centrality perfectly predicts the spreading influence of the fraction $\rho$ of top spreaders), and 1, for completely failed prediction. The complementarity between the two measures of performance is apparent from their definitions (see Appendix A).
The Jaccard distance measures the difference between the identities of the nodes included in the sets of true and predicted top influencers. The imprecision function is completely insensitive to the identity of the nodes, and is determined instead only by their spreading power.
III Results
III.1 Exact solution on locally tree-like networks: Non-Backtracking centrality
The Hashimoto or Non-Backtracking (NB) matrix is a special representation of the structure of a network Hashimoto (1989). In an arbitrary undirected and unweighted network with $E$ edges, the NB matrix $M$ is a $2E \times 2E$ array defined as follows. Every edge $(i,j)$ is split into two directed edges $i \to j$ and $j \to i$. The generic entry of the NB matrix is $M_{i \to j, k \to \ell} = \delta_{j,k}(1 - \delta_{i,\ell})$, where $\delta$ is the Kronecker symbol. $M_{i \to j, k \to \ell}$ is different from zero and equals one only if the edges $i \to j$ and $k \to \ell$ define a non-backtracking path of length two. $M$ is an asymmetric matrix with real and positive principal eigenvalue. The components of the principal eigenvector, namely $v_{i \to j}$, can be used to define the NB centrality $x_i$ of vertex $i$, with $A_{ij}$ the entries of the adjacency matrix, as Martin et al. (2014)
(1) $x_i = \sum_j A_{ij} \, v_{i \to j}$
This centrality is similar to the common eigenvector centrality, but it disregards the contribution of vertex $i$ to the centrality of its neighbors, thus avoiding the self-reinforcement effect responsible in some cases for the localization of the eigenvector centrality Martin et al. (2014); Pastor-Satorras and Castellano (2016a). We remark that the computation of the principal eigenpair of the matrix $M$ can be performed using a simple power-iteration method. This allows one to estimate the NB centrality of all nodes in a time that scales almost linearly with the system size. The Ihara-Bass determinant formula may be further used to reduce memory storage in the computation of the NB centrality Bass (1992).
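As an illustration of the power-iteration approach, the following Python sketch computes the NB centrality without ever building the NB matrix explicitly. It is only a sketch under stated assumptions: the network is an undirected graph stored as a dictionary `adj` of neighbor lists, the value assigned to the directed edge $i \to j$ is updated from the values of the edges $j \to \ell$ with $\ell \neq i$, and the centrality of node $i$ is the sum of the values on its outgoing edges. The identity shift (iterating with the NB matrix plus the identity) is our own choice to avoid oscillations on periodic structures; it leaves the eigenvectors unchanged. The function name `nb_centrality` and the fixed number of iterations are hypothetical.

```python
def nb_centrality(adj, iterations=300):
    """NB centrality of every node via shifted power iteration."""
    # One entry per directed edge i -> j of the undirected graph.
    edges = [(i, j) for i in adj for j in adj[i]]
    v = {e: 1.0 for e in edges}
    for _ in range(iterations):
        new = {}
        for (i, j) in edges:
            # Apply (M + I): keep the current value and add the values of
            # all edges j -> l that continue i -> j without backtracking.
            new[(i, j)] = v[(i, j)] + sum(v[(j, l)] for l in adj[j] if l != i)
        norm = sum(new.values())
        v = {e: val / norm for e, val in new.items()}
    # Centrality of node i: sum of eigenvector components on edges i -> j.
    x = {i: sum(v[(i, j)] for j in adj[i]) for i in adj}
    total = sum(x.values())
    return {i: xi / total for i, xi in x.items()}
```

Each iteration touches every pair of consecutive directed edges once, so the cost per iteration is proportional to the sum of squared degrees; a production implementation would precompute, for each node, the sum of its outgoing edge values to bring this down to a cost linear in the number of edges.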
The NB matrix has been shown to play a crucial role in the problem of graph clustering Krzakala et al. (2013) and, more recently, in percolation Karrer et al. (2014); Hamilton and Pryadko (2014b). In particular, the percolation threshold in locally tree-like networks is exactly given by the inverse of the largest eigenvalue of the NB matrix for both bond and site percolation Karrer et al. (2014); Hamilton and Pryadko (2014b); Radicchi and Castellano (2015). As a consequence, the probability that node $i$ is part of the percolating cluster immediately above the threshold is given by the expression of Eq. (1).
The mapping between the static properties of the SIR model and bond percolation Grassberger (1983b); Newman (2002); Pastor-Satorras et al. (2015) reveals that SIR epidemic outbreaks coincide with the clusters of the associated bond percolation process, where the bond occupation probability $p$ for percolation and the effective spreading rate $\lambda$ for SIR are related by $p = \lambda/(1+\lambda)$. This connection has a very important consequence: the relative size of an epidemic outbreak started from a specific node $i$ is proportional to the probability that $i$ belongs to the percolating cluster. At the critical point, this probability coincides with the NB centrality $x_i$ of Eq. (1); thus, denoting by $q_i$ the average size of the outbreaks seeded at node $i$,
(2) $q_i \propto x_i$
As a consequence, the top spreaders are the vertices with the highest NB centrality. Note that this is an exact result, provided the network structure is locally tree-like. (Footnote: The mapping is strictly valid for a SIR model where the recovery time is fixed. For the SIR version considered in simulations here, the recovery time is non-degenerate, which implies that the mapping is not strictly exact Meyers et al. (2006); Miller (2007); Kenah and Robins (2007).) In Fig. 1 we test numerically the validity of this connection in synthetic networks. Panel (a) confirms that Eq. (2) is generally well obeyed and becomes more and more precise as the system size grows, since the network then becomes more and more locally tree-like. Panel (b) shows instead that the values of the other node centralities have a lower degree of proportionality with the average outbreak size initiated by the corresponding nodes.
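The correspondence between outbreaks and percolation clusters can be made concrete with a short Python sketch. It is an illustration under explicit assumptions, not the procedure used for the figures: each edge is occupied independently with probability `p`, the outbreak seeded at a node is identified with the cluster of occupied edges containing that node, and the conversion from the spreading rate uses the standard transmission probability for exponentially distributed recovery times, $\lambda/(1+\lambda)$. The names `occupation_probability` and `percolation_cluster_size` are hypothetical, and node labels are assumed to be mutually comparable (e.g. integers).

```python
import random


def occupation_probability(lam):
    """Bond occupation probability for effective spreading rate lam."""
    return lam / (1.0 + lam)


def percolation_cluster_size(adj, seed, p, rng=None):
    """Size of the bond-percolation cluster that contains `seed`."""
    rng = rng or random.Random()
    occupied = {}   # lazily sampled state of each undirected edge

    def is_open(a, b):
        key = (a, b) if a <= b else (b, a)
        if key not in occupied:
            occupied[key] = rng.random() < p
        return occupied[key]

    cluster = {seed}
    stack = [seed]
    while stack:
        node = stack.pop()
        for nb in adj[node]:
            if nb not in cluster and is_open(node, nb):
                cluster.add(nb)
                stack.append(nb)
    return len(cluster)
```

Averaging `percolation_cluster_size` over many realizations for a fixed seed gives an estimate of the expected cluster size of that node, which can then be compared with its NB centrality.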
The exact equivalence between SIR outbreak size and NB centrality holds only at criticality, i.e. for $\lambda = \lambda_c$. As we depart from $\lambda_c$ the equivalence becomes less accurate. The probability of belonging to the percolating cluster is no longer equal to the NB centrality. Moreover, below $\lambda_c$ the largest cluster does not dominate the cluster size distribution of percolation. We stress however that criticality is the regime where the identification of influential spreaders really matters: the further we move away from the critical point, the less interesting and non-trivial the problem becomes. For large values of $\lambda$ in the supercritical regime, any seed will lead to large outbreaks involving a very large portion of the entire network. In the deeply subcritical regime instead, at very low values of $\lambda$, all spreading events involve a very small neighborhood of the initial seed. Only around criticality may the choice of the initiator have a substantial impact on the spreading event, i.e., determine whether the spreading phenomenon remains confined to a few nodes or reaches an extensive fraction of the network.
III.2 Top spreaders in synthetic networks
We now test the implications of the results in the previous section for spreading on locally tree-like synthetic networks.
We consider a network with degree distribution decaying as a power law, $P(k) \sim k^{-\gamma}$, and compare the performance of the various centralities as predictors of the top spreaders in the network. In Fig. 2 we plot the two dissimilarity measures $\epsilon$ and $J$, for the various centralities, as a function of the fraction $\rho$ of top-ranking nodes. The imprecision function provides a very clear picture: the outbreaks started in nodes with highest NB centrality are of the same size as those initiated by the best influential spreaders in numerical simulations. The degree, the eigenvector centrality, the generalized random walk accessibility, and, most markedly, the k-shell index perform much worse. The plot for $J$ gives a similar message, with the difference that the measure does not vanish for the NB centrality. This last observation can be understood by considering that the NB centralities of distinct nodes do not differ much (see Fig. 1). Therefore, it is likely that small uncertainties in the values calculated numerically may considerably alter the ranking of the nodes, leading to a non-vanishing Jaccard dissimilarity. For the very same reason, the numerical uncertainties have no appreciable effect on the imprecision function $\epsilon$, which is very close to 0.
If the same analysis is repeated for other values of the exponent $\gamma$ or for Erdős-Rényi networks, a very similar phenomenology is found (see SM1 and SM2): NB centrality outperforms eigenvector centrality and generalized random walk accessibility in the identification of influential spreaders. Degree and k-core centrality still perform poorly.
We conclude that NB centrality is the optimal choice for the selection of influential spreaders on locally tree-like networks at criticality. The same considerations extend to the subcritical and supercritical regimes, provided that $\lambda$ is not too far away from the critical point (Fig. SM3).
III.3 Top spreaders in real-world networks
As substrates for the spreading dynamics, we now consider a very large collection of real-world topologies of diverse origin, size and topological features. Many of these networks have a sizeable clustering coefficient, so that they cannot be considered, even approximately, as tree-like. Details of all the networks analyzed can be found in the SM.
In Fig. 3, we present the results for two such networks: a graph of email contacts Kitsak et al. (2010), and the Gnutella peer-to-peer network Leskovec et al. (2007). It turns out very clearly that, for these structures, k-shell centrality and degree perform very badly; RWA performs slightly better, but still poorly; eigenvector and NB centralities are instead very effective in identifying influential spreaders. Between these two, NB centrality provides a slightly more effective recipe for the identification of top influential spreaders. The picture obtained for synthetic networks is then essentially confirmed. We have repeated the same analysis for a very large set of networks with non-spatial embedding (Tables SM1, SM2, and SM3). The set of networks includes graphs of different nature (e.g., biological, technological, social), thus with large variability in their topological features (e.g., degree distribution, size, clustering coefficient). Whereas some variation exists depending on the detailed topology, overall the message is clear: the NB centrality of a node is, in most of the networks analyzed, the most accurate predictor of the spreading ability of individual nodes (Fig. 4). NB centrality outperforms all other centrality metrics in real non-spatial networks. Only in graphs with spatial embedding does RWA provide better performance (Fig. SM6).
The previous results are obtained for critical spreading, where the susceptibility of the system is maximal. We repeat the analysis in the subcritical regime and in the supercritical phase (Figs. SM4 and SM5). The overall picture is again similar to the one observed for critical spreading: the NB centrality is in the majority of cases the most accurate predictor to identify influential spreaders.
IV Conclusions
The present analysis provides convincing evidence that the centrality determined from the Non-Backtracking (NB) matrix of a graph represents the best predictor for the identification of SIR influential spreaders in the network. The choice of this centrality measure is motivated by recent theoretical progress in the study of percolation processes in arbitrary locally tree-like graphs, and by the equivalence between the SIR model and the bond percolation model at criticality. Even in real networks, where the locally tree-like ansatz is violated, NB centrality turns out to greatly outperform other centrality metrics in the task of identifying top influential spreaders. We remark also that NB centrality can be computed in a time that scales almost linearly with the system size, and it is thus applicable to very large networks.
An additional, interesting result emerging from our systematic analysis of real networks is that k-shell centrality generally performs very unsatisfactorily, not only compared to NB centrality, but also to degree, eigenvector centrality and generalized random walk accessibility. This is at odds with what was claimed in the seminal paper by Kitsak and collaborators Kitsak et al. (2010), and more recently remarked by Ferraz de Arruda et al. de Arruda et al. (2014), on the basis of very small samples of real-world networks. Given the number of real-world graphs considered in our study, we believe that our message is conclusive: the k-shell index can be easily outperformed by other simple centrality metrics in the identification of influential nodes in dynamical processes on complex networks. One of the main reasons for the poor ability of the k-core to identify top spreaders is rooted in the very definition of the k-core index, which necessarily involves a large degeneracy da Silva et al. (2012); Pei and Makse (2013). The metric is not able to make a distinction among top vertices in the ranking, since, by definition, at least $k_{\max}+1$ nodes must be tied at the top position if $k_{\max}$ is the maximal value of the k-shell centrality measured in a network. This fact is clearly illustrated for artificial graphs in Fig. 1, where the k-core index is the same for very large groups of vertices, whereas their spreading power is highly heterogeneous. Similar considerations are valid also for real graphs. In Fig. 5, we consider SIR on a substrate given by an air transportation network within the US (Bureau of Transportation Statistics). The spreading influence of individual nodes is well reproduced by the NB centrality and, more approximately, by the RWA. According to the k-shell centrality, several airports are ranked at the top of the list.
The top tier is, however, composed of airports with fundamentally different values of their spreading power: for example, “Hartsfield-Jackson Atlanta International Airport”, the actual top spreader in the network, is tied with “Wilmington International Airport”, even though the latter has a spreading power that is half that of the top spreader.
Beyond their applicability to relevant real situations, these results open new exciting perspectives for other, related, problems. A first question is the validity of the NB centrality solution for other types of spreading dynamics, different from the SIR class. While NB centrality is unlikely to perform well for rumor dynamics Borge-Holthoefer and Moreno (2012); de Arruda et al. (2014), the question is open for more complex modeling frameworks for epidemics, such as metapopulations Grenfell and Harwood (1997); Colizza and Vespignani (2008). Secondly, the problem studied here refers to individual spreaders. Substantially different results may arise in the case of optimal multiple spreaders, i.e. the identification of the subset of network vertices (of a given number of nodes) maximizing the extent of a spreading process seeded in all of them at the same time. As already noted in Ref. Kitsak et al. (2010), starting the process in the best single spreaders often results in suboptimal propagation, because of the overlap among the areas of influence of the best individual spreaders. Finding the best set of multiple spreaders is a different, highly non-trivial, NP-complete optimization problem Kempe et al. (2003), for which many clever approximation schemes have been proposed Kempe et al. (2003); Altarelli et al. (2013); Morone and Makse (2015), but a scalable and accurate general approach is still not available. The insights provided by the mapping to percolation and the consideration of the NB centrality may pave the way for further progress also in this context. Another exciting line of research regards the identification of influential spreaders from empirical data on real-world spreading phenomena González-Bailón et al. (2011); Pei et al. (2014).
In this respect, the problem is further complicated by the fact that the spreading dynamics at the microscopic level are not known a priori and may contain additional ingredients not included in the simple models usually considered.
Acknowledgements.
FR acknowledges support from the National Science Foundation (Grant CMMI-1552487) and the US Army Research Office (W911NF-16-1-0104).
Appendix A Measures of performance
A.1 The imprecision function
The imprecision function $\epsilon(\rho)$ Kitsak et al. (2010) quantifies the difference between the average size of the spreading events initiated (as single spreaders) by the first $\rho N$ vertices according to a given centrality $c$ and the analogous size for the actual $\rho N$ most efficient spreaders in SIR simulations. $N$ is the number of nodes in the network and $0 < \rho \le 1$. In more detail, let us define $S_c(\rho)$ as the set of the top $\rho N$ vertices according to the centrality $c$ and $S_0(\rho)$ as the set of the actual top $\rho N$ spreaders, as measured in SIR simulations. Denoting by $q_i$ the average size of the outbreaks seeded at node $i$, the quantity
(3) $M_c(\rho) = \frac{1}{\rho N} \sum_{i \in S_c(\rho)} q_i$
is the average size of outbreaks originated in the $\rho N$ most highly ranked nodes according to the centrality $c$. If $M_0(\rho)$ is the same quantity as in Eq. (3) but computed over the set $S_0(\rho)$, the imprecision function is defined as
(4) $\epsilon(\rho) = 1 - \frac{M_c(\rho)}{M_0(\rho)}$
If the centrality $c$ perfectly identifies the most efficient spreaders, the imprecision function equals zero. High values of $\epsilon(\rho)$ indicate that the centrality $c$ is not a good predictor of the spreading power of the top spreaders. To account for possible ties in the centrality metric $c$, we average the imprecision function over at least 10 realizations of the set $S_c(\rho)$.
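The definition of the imprecision function translates directly into code. The sketch below is a minimal illustration in Python, assuming that both the centrality scores and the measured spreading powers are given as dictionaries keyed by node; the function name `imprecision` is hypothetical. For simplicity, ties in the centrality are broken by the order in which nodes appear in the dictionary, rather than by averaging over several random realizations as described above.

```python
def imprecision(centrality, spreading_power, rho):
    """One minus the ratio between the mean outbreak size of the top nodes
    by centrality and that of the true top spreaders."""
    n = len(spreading_power)
    top = max(1, int(rho * n))   # number of nodes in each top set
    by_centrality = sorted(centrality, key=centrality.get, reverse=True)[:top]
    by_power = sorted(spreading_power, key=spreading_power.get, reverse=True)[:top]
    mean_predicted = sum(spreading_power[i] for i in by_centrality) / top
    mean_actual = sum(spreading_power[i] for i in by_power) / top
    return 1.0 - mean_predicted / mean_actual
```

Because only the outbreak sizes of the selected nodes enter the formula, two different sets of nodes with the same average spreading power give the same value, which is why the measure is insensitive to node identity.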
A.2 The Jaccard distance
The Jaccard distance is a measure of the dissimilarity between two sets $A$ and $B$. This quantity is defined as
(5) $J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$
where $|X|$ stands for the number of elements in the set $X$. Clearly, if the two sets $A$ and $B$ coincide the distance vanishes, while if they have null intersection, then their distance equals one.
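For completeness, a minimal Python sketch of this distance is given below. The function name `jaccard_distance` is hypothetical, and the convention of returning zero for two empty sets is our own assumption, since that case is not covered by the definition.

```python
def jaccard_distance(a, b):
    """One minus the ratio between intersection size and union size."""
    a, b = set(a), set(b)
    union = a | b
    if not union:          # assumption: two empty sets are identical
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

In the present context the two inputs would be the set of top nodes according to a centrality and the set of actual top spreaders of the same size.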
Appendix B Centrality measures
We consider the following centrality measures.

Degree centrality. This is the simplest centrality measure that can be defined for nodes in a network. The degree of node $i$ equals the number of neighbors of vertex $i$ in the network.

k-shell centrality. A k-core is a subset of nodes composed of vertices that have at least $k$ neighbors within the set itself. The k-shell or k-core index of a node equals the largest value of $k$ for which the node belongs to a k-core.

Eigenvector centrality. The score assigned to each node equals the value of the corresponding component of the principal eigenvector of the adjacency matrix of the network.

Generalized Random Walk Accessibility (RWA). The score of node $i$ is defined in terms of the powers of the random walk transition matrix of the graph de Arruda et al. (2014). The exact computation of the RWA score for all nodes in the network requires the diagonalization of this matrix, an unfeasible task for medium and large-size networks. Good approximations of RWA scores can be obtained by means of agent-based simulations of the random walk dynamics. Our estimates of RWA are based on average values obtained over independent walks of bounded maximal length started from every node in the network.
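Among the measures listed above, the k-shell index is the one whose definition is most naturally expressed as an algorithm. The following Python sketch uses the standard pruning procedure (iteratively removing nodes whose remaining degree does not exceed the current value of $k$); it is an illustration only, assuming the graph is a dictionary `adj` of neighbor lists, and the function name `k_shell_index` is hypothetical.

```python
def k_shell_index(adj):
    """k-shell index of every node, computed by iterative pruning."""
    degree = {i: len(adj[i]) for i in adj}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Raise k to the smallest degree still present in the graph.
        k = max(k, min(degree[i] for i in remaining))
        queue = [i for i in remaining if degree[i] <= k]
        while queue:
            node = queue.pop()
            if node not in remaining:
                continue
            remaining.remove(node)
            shell[node] = k
            # Removing `node` lowers the degree of its surviving neighbors,
            # which may in turn have to be pruned at the same level k.
            for nb in adj[node]:
                if nb in remaining:
                    degree[nb] -= 1
                    if degree[nb] <= k:
                        queue.append(nb)
    return shell
```

On a triangle with one pendant node attached, the pendant node gets index 1 and the three triangle nodes all get index 2, which illustrates the ties that the index necessarily produces among the top-ranked nodes.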
Appendix C Numerical determination of the epidemic thresholds
For a given network, we determine the critical value $\lambda_c$ in the following way. For a given value of $\lambda$, we start from a configuration where all nodes are in state S, except for one randomly chosen vertex in state I. We run the SIR model, and measure the size $s$ of the outbreak. We repeat the procedure multiple times, every time choosing at random a node as initial seed of the epidemic, and compute the first and second moments of the size of the outbreak, namely $\langle s \rangle$ and $\langle s^2 \rangle$. The critical value $\lambda_c$ is determined from the position of the peak of a ratio built from these two moments Pastor-Satorras and Castellano (2016b). Values of $\lambda_c$ for all networks analyzed in this paper are reported in Tables SM1, SM2, and SM3.
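The peak-finding procedure can be sketched as follows in Python. This is a hedged illustration: the specific ratio used here, $\langle s^2 \rangle / \langle s \rangle^2$, is an assumption on our part, chosen because it is a common susceptibility-like quantity, and may differ from the exact combination of moments used for the reported thresholds. The names `moment_ratio` and `estimate_threshold` are hypothetical, and `sample_outbreaks` stands for any user-supplied function that returns a list of simulated outbreak sizes for a given spreading rate.

```python
def moment_ratio(sizes):
    """Second moment of the outbreak sizes divided by the squared mean."""
    n = len(sizes)
    first = sum(sizes) / n
    second = sum(s * s for s in sizes) / n
    return second / first ** 2


def estimate_threshold(lambdas, sample_outbreaks):
    """Return the value in `lambdas` where the moment ratio peaks.

    `sample_outbreaks(lam)` must return a list of outbreak sizes obtained
    from independent runs with randomly chosen seeds at spreading rate lam.
    """
    return max(lambdas, key=lambda lam: moment_ratio(sample_outbreaks(lam)))
```

The resolution of the estimate is set by the grid of candidate values, so in practice one would refine the grid around the first peak found.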
References
 Albert and Barabási (2002) R. Albert and A.-L. Barabási, Reviews of Modern Physics 74, 47 (2002).
 Newman (2010) M. Newman, Networks: an introduction (OUP Oxford, 2010).
 Wasserman and Faust (1994) S. Wasserman and K. Faust, Social network analysis: Methods and applications, vol. 8 (Cambridge university press, 1994).
 Freeman (1977) L. C. Freeman, Sociometry 40, 35 (1977).
 Seidman (1983) S. B. Seidman, Social Networks 5, 269 (1983).
 Bonacich (1972) P. Bonacich, Journal of Mathematical Sociology 2, 113 (1972).
 Pastor-Satorras et al. (2015) R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani, Rev. Mod. Phys. 87, 925 (2015).
 Ratkiewicz et al. (2011) J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, S. Patil, A. Flammini, and F. Menczer, in Proceedings of the 20th international conference companion on World wide web (ACM, 2011), pp. 249–252.
 Bakshy et al. (2009) E. Bakshy, B. Karrer, and L. A. Adamic, in Proceedings of the 10th ACM conference on Electronic commerce (ACM, 2009), pp. 325–334.
 Borge-Holthoefer and Moreno (2012) J. Borge-Holthoefer and Y. Moreno, Phys. Rev. E 85, 026116 (2012).
 de Arruda et al. (2014) G. F. de Arruda, A. L. Barbieri, P. M. Rodríguez, F. A. Rodrigues, Y. Moreno, and L. d. F. Costa, Phys. Rev. E 90, 032812 (2014).
 Kitsak et al. (2010) M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse, Nature Physics 6, 888 (2010).
 Bauer and Lizier (2012) F. Bauer and J. T. Lizier, EPL (Europhysics Letters) 99, 68007 (2012).
 Klemm et al. (2012) K. Klemm, M. Á. Serrano, V. M. Eguíluz, and M. San Miguel, Scientific reports 2, 292 (2012).
 Chen et al. (2012) D. Chen, L. Lü, M.-S. Shang, Y.-C. Zhang, and T. Zhou, Physica A: Statistical Mechanics and its Applications 391, 1777 (2012).
 da Silva et al. (2012) R. A. P. da Silva, M. P. Viana, and L. da Fontoura Costa, Journal of Statistical Mechanics: Theory and Experiment 2012, P07005 (2012).
 Chen et al. (2013) D.-B. Chen, R. Xiao, A. Zeng, and Y.-C. Zhang, EPL (Europhysics Letters) 104, 68006 (2013).
 Liu et al. (2013) J.-G. Liu, Z.-M. Ren, and Q. Guo, Physica A: Statistical Mechanics and its Applications 392, 4154 (2013).
 Ren et al. (2014) Z.-M. Ren, A. Zeng, D.-B. Chen, H. Liao, and J.-G. Liu, EPL (Europhysics Letters) 106, 48005 (2014).
 Liu et al. (2015a) Y. Liu, M. Tang, T. Zhou, and Y. Do, Scientific reports 5, 9602 (2015a).
 Liu et al. (2015b) Y. Liu, M. Tang, T. Zhou, and Y. Do, Scientific reports 5, 13172 (2015b).
 Liu et al. (2016) J.-G. Liu, J.-H. Lin, Q. Guo, and T. Zhou, Scientific reports 6, 21380 (2016).
 Grassberger (1983a) P. Grassberger, Mathematical Biosciences 63, 157 (1983a).
 Newman (2002) M. E. J. Newman, Phys. Rev. E 66, 016128 (2002).
 Karrer et al. (2014) B. Karrer, M. E. J. Newman, and L. Zdeborová, Phys. Rev. Lett. 113, 208702 (2014).
 Hamilton and Pryadko (2014a) K. E. Hamilton and L. P. Pryadko, Phys. Rev. Lett. 113, 208701 (2014a).
 Radicchi and Castellano (2015) F. Radicchi and C. Castellano, Nature communications 6, 10196 (2015).
 Martin et al. (2014) T. Martin, X. Zhang, and M. E. J. Newman, Phys. Rev. E 90, 052808 (2014).
 Hashimoto (1989) K. Hashimoto, Adv. Stud. Pure Math. 15, 211 (1989).
 PastorSatorras and Castellano (2016a) R. PastorSatorras and C. Castellano, Scientific Reports 6, 18847 (2016a).
 Bass (1992) H. Bass, Int. J. Math. 3, 717 (1992).
 Krzakala et al. (2013) F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, Proceedings of the National Academy of Sciences 110, 20935 (2013).
 Hamilton and Pryadko (2014b) K. E. Hamilton and L. P. Pryadko, Phys. Rev. Lett. 113, 208701 (2014b).
 Catanzaro et al. (2005) M. Catanzaro, M. Boguñá, and R. PastorSatorras, Phys. Rev. E 71, 027103 (2005).
 Grassberger (1983b) P. Grassberger, Math. Biosci. 63, 157 (1983b).
 Meyers et al. (2006) L. A. Meyers, M. E. J. Newman, and B. Pourbohloul, Journal of theoretical biology 240, 400 (2006).
 Miller (2007) J. C. Miller, Phys. Rev. E 76, 010101 (2007).
 Kenah and Robins (2007) E. Kenah and J. M. Robins, Phys. Rev. E 76, 036113 (2007).
 Leskovec et al. (2007) J. Leskovec, J. Kleinberg, and C. Faloutsos, ACM Transactions on Knowledge Discovery from Data (TKDD) 1, 2 (2007).
 (40) Bureau of Transportation Statistics, http://www.transtats.bts.gov, accessed: 2015-01-18.
 Radicchi (2015) F. Radicchi, Nature Physics 11, 597 (2015).
 Pei and Makse (2013) S. Pei and H. A. Makse, Journal of Statistical Mechanics: Theory and Experiment 2013, P12002 (2013).
 Grenfell and Harwood (1997) B. Grenfell and J. Harwood, Trends in ecology & evolution 12, 395 (1997).
 Colizza and Vespignani (2008) V. Colizza and A. Vespignani, Journal of Theoretical Biology 251, 450 (2008).
 Kempe et al. (2003) D. Kempe, J. Kleinberg, and E. Tardos, in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (ACM, New York, NY, USA, 2003), KDD ’03, pp. 137–146.
 Altarelli et al. (2013) F. Altarelli, A. Braunstein, L. Dall’Asta, and R. Zecchina, Journal of Statistical Mechanics: Theory and Experiment 2013, P09011 (2013).
 Morone and Makse (2015) F. Morone and H. A. Makse, Nature 524, 65 (2015).
 GonzálezBailón et al. (2011) S. GonzálezBailón, J. BorgeHolthoefer, A. Rivero, and Y. Moreno, Scientific reports 1, 197 (2011).
 Pei et al. (2014) S. Pei, L. Muchnik, J. S. Andrade Jr, Z. Zheng, and H. A. Makse, Scientific reports 4, 5547 (2014).
 PastorSatorras and Castellano (2016b) R. PastorSatorras and C. Castellano, (submitted) (2016b).
Network  Subcritical  Critical  Supercritical  Refs.  URL  
Social 3  ✓  ✓  ✓  Milo et al. (2004)  url  
Karate club  ✓  ✓  ✓  Zachary (1977)  url  
Protein 2  ✓  ✓  ✓  Milo et al. (2004)  url  
Dolphins  ✓  ✓  ✓  Lusseau et al. (2003)  url  
Social 1  ✓  ✓  ✓  Milo et al. (2004)  url  
Les Miserables  ✓  ✓  ✓  Knuth et al. (1993)  url  
Protein 1  ✓  ✓  ✓  Milo et al. (2004)  url  
E. Coli, transcription  ✓  ✓  ✓  Mangan and Alon (2003)  url  
Political books  ✓  ✓  ✓  Adamic and Glance (2005)  url  
David Copperfield  ✓  ✓  ✓  Newman (2006)  url  
College football  ✓  ✓  ✓  Girvan and Newman (2002)  url  
S 208  ✓  ✓  ✓  Milo et al. (2004)  url  
High school, 2011  ✓  ✓  ✓  Fournet and Barrat (2014)  url  
Bay Dry  ✓  ✓  ✓  Ulanowicz et al. (1998); Kunegis (2013)  url  
Bay Wet  ✓  ✓  ✓  Kunegis (2013)  url  
Radoslaw Email  ✓  ✓  ✓  Michalski et al. (2011); Kunegis (2013)  url  
High school, 2012  ✓  ✓  ✓  Fournet and Barrat (2014)  url  
Little Rock Lake  ✓  ✓  ✓  Martinez (1991); Kunegis (2013)  url  
Jazz  ✓  ✓  ✓  Gleiser and Danon (2003)  url  
S 420  ✓  ✓  ✓  Milo et al. (2004)  url  
C. Elegans, neural  ✓  ✓  ✓  Watts and Strogatz (1998)  url  
Network Science  ✓  ✓  ✓  Newman (2006)  url  
Dublin  ✓  ✓  ✓  Isella et al. (2011); Kunegis (2013)  url  
US Air Transportation  ✓  ✓  ✓  Colizza et al. (2007)  url  
S 838  ✓  ✓  ✓  Milo et al. (2004)  url  
Yeast, transcription  ✓  ✓  ✓  Milo et al. (2002)  url  
URV email  ✓  ✓  ✓  Guimera et al. (2003)  url  
Political blogs  ✓  ✓  ✓  Adamic and Glance (2005)  url  
Air traffic  ✓  ✓  ✓  Kunegis (2013)  url  
Yeast, protein  ✓  ✓  ✓  Jeong et al. (2001)  url  
Petster, hamster  ✓  ✓  ✓  Kunegis (2013)  url  
UC Irvine  ✓  ✓  ✓  Opsahl and Panzarasa (2009); Kunegis (2013)  url  
Yeast, protein  ✓  ✓  ✓  Bu et al. (2003)  url  
Japanese  ✓  ✓  ✓  Milo et al. (2004)  url  
Open flights  ✓  ✓  ✓  Opsahl et al. (2010); Kunegis (2013)  url  
GRQC, 1993-2003  ✓  ✓  ✓  Leskovec et al. (2007a)  url  
Tennis  ✓  ✓  ✓  Radicchi (2011)  url  
US Power grid  ✓  ✓  ✓  Watts and Strogatz (1998)  url  
HT09  ✓  ✓  ✓  Isella et al. (2011)  url  
HepTh, 1995-1999  ✓  ✓  ✓  Newman (2001)  url 
Reactome  ✓  ✓  ✓  JoshiTope et al. (2005); Kunegis (2013)  url  
Jung  ✓  ✓  ✓  Šubelj and Bajec (2012); Kunegis (2013)  url  
Gnutella, Aug. 8, 2002  ✓  ✓  ✓  Ripeanu et al. (2002); Leskovec et al. (2007a)  url  
JDK  ✓  ✓  ✓  Kunegis (2013)  url  
AS Oregon  ✓  ✓  ✓  Leskovec et al. (2005)  url  
English  ✓  ✓  ✓  Milo et al. (2004)  url  
Gnutella, Aug. 9, 2002  ✓  ✓  ✓  Ripeanu et al. (2002); Leskovec et al. (2007a)  url  
French  ✓  ✓  ✓  Milo et al. (2004)  url  
HepTh, 1993-2003  ✓  ✓  ✓  Leskovec et al. (2007a)  url  
Gnutella, Aug. 6, 2002  ✓  ✓  ✓  Ripeanu et al. (2002); Leskovec et al. (2007a)  url  
Gnutella, Aug. 5, 2002  ✓  ✓  ✓  Ripeanu et al. (2002); Leskovec et al. (2007a)  url  
PGP  ✓  ✓  ✓  Boguñá et al. (2004)  url  
Gnutella, Aug. 4, 2002  ✓  ✓  ✓  Ripeanu et al. (2002); Leskovec et al. (2007a)  url  
HepPh, 1993-2003  ✓  ✓  ✓  Leskovec et al. (2007a)  url  
Spanish  ✓  ✓  ✓  Milo et al. (2004)  url  
DBLP, citations  ✓  ✓  ✓  Ley (2002); Kunegis (2013)  url  
✓  ✓  ✓  Kitsak et al. (2010)  url  
Spanish  ✓  ✓  ✓  Kunegis (2013)  url  
CondMat, 1995-1999  ✓  ✓  ✓  Newman (2001)  url  
Astrophysics  ✓  ✓  ✓  Newman (2001)  url  
✓  ✓  ✓  Palla et al. (2007)  url  
AstroPhys, 1993-2003  ✓  ✓  ✓  Leskovec et al. (2007a)  url  
CondMat, 1993-2003  ✓  ✓  ✓  Leskovec et al. (2007a)  url  
Gnutella, Aug. 25, 2002  ✓  ✓  ✓  Ripeanu et al. (2002); Leskovec et al. (2007a)  url  
Internet  ✓  ✓  ✓    url  
Thesaurus  ✓  ✓  ✓  Kiss et al. (1973); Kunegis (2013)  url  
Cora  ✓  ✓  ✓  Šubelj and Bajec (2013); Kunegis (2013)  url  
Linux, mailing list  ✓  ✓  ✓  Kunegis (2013)  url  
AS Caida  ✓  ✓  ✓  Leskovec et al. (2005)  url  
Gnutella, Aug. 24, 2002  ✓  ✓  ✓  Ripeanu et al. (2002); Leskovec et al. (2007a)  url  
HepTh, citations  ✓  ✓  ✓  Leskovec et al. (2007a); Kunegis (2013)  url  
CondMat, 1995-2003  ✓  ✓  ✓  Newman (2001)  url  
Digg  ✓  ✓  ✓  De Choudhury et al. (2009); Kunegis (2013)  url  
Linux, soft.  ✓  ✓  ✓  Kunegis (2013)  url  
Enron  ✓  ✓  ✓  Leskovec et al. (2009)  url  
HepPh, citations  ✓  ✓  ✓  Leskovec et al. (2007a); Kunegis (2013)  url  
CondMat, 1995-2005  ✓  ✓  ✓  Newman (2001)  url  
Gnutella, Aug. 30, 2002  ✓  ✓  ✓  Ripeanu et al. (2002); Leskovec et al. (2007a)  url  
Adult IMDB  ✓  ✓  ✓  Kitsak et al. (2010)  url  
Slashdot  ✓  ✓  ✓  Gómez et al. (2008); Kunegis (2013)  url 
Gnutella, Aug. 31, 2002  ✓  ✓  ✓  Ripeanu et al. (2002); Leskovec et al. (2007a)  url  
✓  ✓  ✓  Viswanath et al. (2009)  url  
Epinions  ✓  ✓  ✓  Richardson et al. (2003); Kunegis (2013)  url  
Slashdot zoo  ✓  ✓  ✓  Kunegis et al. (2009); Kunegis (2013)  url  
Wikipedia, edits  ✓  ✓  ✗  Brandes and Lerner (2010); Kunegis (2013)  url  
Gowalla  ✓  ✓  ✗  Cho et al. (2011); Kunegis (2013)  url  
EU email  ✓  ✓  ✗  Leskovec et al. (2007a); Kunegis (2013)  url  
Amazon, Mar. 2, 2003  ✓  ✓  ✗  Leskovec et al. (2007b)  url  
DBLP, collaborations  ✓  ✓  ✗  Ley (2002); Kunegis (2013)  url  
Web Notre Dame  ✓  ✓  ✗  Albert et al. (1999)  url  
MathSciNet  ✓  ✓  ✗  Palla et al. (2008)  url  
CiteSeer  ✓  ✓  ✗  Bollacker et al. (1998); Kunegis (2013)  url  
Amazon, Mar. 12, 2003  ✓  ✓  ✗  Leskovec et al. (2007b)  url  
Amazon, Jun. 6, 2003  ✓  ✓  ✗  Leskovec et al. (2007b)  url  
Amazon, May 5, 2003  ✓  ✓  ✗  Leskovec et al. (2007b)  url 
References
 Milo et al. (2004) R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. ShenOrr, I. Ayzenshtat, M. Sheffer, and U. Alon, Science 303, 1538 (2004).
 Zachary (1977) W. W. Zachary, Journal of Anthropological Research, 452 (1977).
 Lusseau et al. (2003) D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Behavioral Ecology and Sociobiology 54, 396 (2003).
 Knuth et al. (1993) D. E. Knuth, The Stanford GraphBase: a platform for combinatorial computing, Vol. 37 (Addison-Wesley, Reading, 1993).
 Mangan and Alon (2003) S. Mangan and U. Alon, Proceedings of the National Academy of Sciences 100, 11980 (2003).
 Adamic and Glance (2005) L. A. Adamic and N. Glance, in Proceedings of the 3rd international workshop on Link discovery (ACM, 2005) pp. 36–43.
 Newman (2006) M. E. Newman, Physical review E 74, 036104 (2006).
 Girvan and Newman (2002) M. Girvan and M. E. Newman, Proceedings of the National Academy of Sciences 99, 7821 (2002).
 Fournet and Barrat (2014) J. Fournet and A. Barrat, PloS one 9, e107878 (2014).
 Ulanowicz et al. (1998) R. Ulanowicz, C. Bondavalli, and M. Egnotovich, Annual Report to the United States Geological Service Biological Resources Division Ref. No.[UMCES] CBL , 98 (1998).
 Kunegis (2013) J. Kunegis, in Proc. Int. Conf. on World Wide Web Companion (2013) pp. 1343–1350.
 Michalski et al. (2011) R. Michalski, S. Palus, and P. Kazienko, in Lecture Notes in Business Information Processing, Vol. 87 (Springer Berlin Heidelberg, 2011) pp. 197–206.
 Martinez (1991) N. D. Martinez, Ecological Monographs, 367 (1991).
 Gleiser and Danon (2003) P. M. Gleiser and L. Danon, Advances in complex systems 6, 565 (2003).
 Watts and Strogatz (1998) D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
 Isella et al. (2011) L. Isella, J. Stehlé, A. Barrat, C. Cattuto, J.F. Pinton, and W. Van den Broeck, Journal of theoretical biology 271, 166 (2011).
 Colizza et al. (2007) V. Colizza, R. PastorSatorras, and A. Vespignani, Nature Physics 3, 276 (2007).
 Milo et al. (2002) R. Milo, S. ShenOrr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Science 298, 824 (2002).
 Guimera et al. (2003) R. Guimera, L. Danon, A. DiazGuilera, F. Giralt, and A. Arenas, Physical review E 68, 065103 (2003).
 Jeong et al. (2001) H. Jeong, S. P. Mason, A.L. Barabási, and Z. N. Oltvai, Nature 411, 41 (2001).
 Opsahl and Panzarasa (2009) T. Opsahl and P. Panzarasa, Social networks 31, 155 (2009).
 Bu et al. (2003) D. Bu, Y. Zhao, L. Cai, H. Xue, X. Zhu, H. Lu, J. Zhang, S. Sun, L. Ling, N. Zhang, et al., Nucleic acids research 31, 2443 (2003).
 Opsahl et al. (2010) T. Opsahl, F. Agneessens, and J. Skvoretz, Social Networks 32, 245 (2010).
 Leskovec et al. (2007a) J. Leskovec, J. Kleinberg, and C. Faloutsos, ACM Transactions on Knowledge Discovery from Data (TKDD) 1, 2 (2007a).
 Radicchi (2011) F. Radicchi, PloS one 6, e17249 (2011).
 Newman (2001) M. E. Newman, Proceedings of the National Academy of Sciences 98, 404 (2001).
 JoshiTope et al. (2005) G. JoshiTope, M. Gillespie, I. Vastrik, P. D’Eustachio, E. Schmidt, B. de Bono, B. Jassal, G. Gopinath, G. Wu, L. Matthews, et al., Nucleic acids research 33, D428 (2005).
 Šubelj and Bajec (2012) L. Šubelj and M. Bajec, in Proceedings of the First International Workshop on Software Mining (ACM, 2012) pp. 9–16.
 Ripeanu et al. (2002) M. Ripeanu, I. Foster, and A. Iamnitchi, arXiv preprint cs/0209028 (2002).
 Leskovec et al. (2005) J. Leskovec, J. Kleinberg, and C. Faloutsos, in Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (ACM, 2005) pp. 177–187.
 Boguñá et al. (2004) M. Boguñá, R. PastorSatorras, A. DíazGuilera, and A. Arenas, Physical Review E 70, 056122 (2004).
 Ley (2002) M. Ley, in String Processing and Information Retrieval (Springer, 2002) pp. 1–10.
 Kitsak et al. (2010) M. Kitsak, L. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. Stanley, and H. Makse, Nature Physics 6, 888 (2010).
 Palla et al. (2007) G. Palla, I. J. Farkas, P. Pollner, I. Derenyi, and T. Vicsek, New Journal of Physics 9, 186 (2007).
 Kiss et al. (1973) G. R. Kiss, C. Armstrong, R. Milroy, and J. Piper, The Computer and Literary Studies, 153 (1973).
 Šubelj and Bajec (2013) L. Šubelj and M. Bajec, in Proceedings of the 22nd international conference on World Wide Web companion (International World Wide Web Conferences Steering Committee, 2013) pp. 527–530.
 De Choudhury et al. (2009) M. De Choudhury, H. Sundaram, A. John, and D. D. Seligmann, in Computational Science and Engineering, 2009. CSE’09. International Conference on, Vol. 4 (IEEE, 2009) pp. 151–158.
 Leskovec et al. (2009) J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, Internet Mathematics 6, 29 (2009).
 Gómez et al. (2008) V. Gómez, A. Kaltenbrunner, and V. López, in Proceedings of the 17th international conference on World Wide Web (ACM, 2008) pp. 645–654.
 Viswanath et al. (2009) B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi, in Proceedings of the 2nd ACM workshop on Online social networks (ACM, 2009) pp. 37–42.
 Richardson et al. (2003) M. Richardson, R. Agrawal, and P. Domingos, in The Semantic Web - ISWC 2003 (Springer, 2003) pp. 351–368.
 Kunegis et al. (2009) J. Kunegis, A. Lommatzsch, and C. Bauckhage, in Proceedings of the 18th international conference on World wide web (ACM, 2009) pp. 741–750.
 Brandes and Lerner (2010) U. Brandes and J. Lerner, Journal of classification 27, 279 (2010).
 Cho et al. (2011) E. Cho, S. A. Myers, and J. Leskovec, in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (ACM, 2011) pp. 1082–1090.
 Leskovec et al. (2007b) J. Leskovec, L. A. Adamic, and B. A. Huberman, ACM Transactions on the Web (TWEB) 1, 5 (2007b).
 Albert et al. (1999) R. Albert, H. Jeong, and A.L. Barabási, Nature 401, 130 (1999).
 Palla et al. (2008) G. Palla, I. J. Farkas, P. Pollner, I. Derényi, and T. Vicsek, New Journal of Physics 10, 123026 (2008).
 Bollacker et al. (1998) K. D. Bollacker, S. Lawrence, and C. L. Giles, in Proceedings of the second international conference on Autonomous agents (ACM, 1998) pp. 116–123.
 Barabási and Albert (1999) A.L. Barabási and R. Albert, Science 286, 509 (1999).