The role of centrality for the identification of influential spreaders in complex networks
The identification of the most influential spreaders in networks is important to control and understand the spreading capabilities of the system as well as to ensure an efficient information diffusion such as in rumor-like dynamics. Recent works have suggested that the identification of influential spreaders is not independent of the dynamics being studied. For instance, the key disease spreaders might not necessarily be so when it comes to analyzing social contagion or rumor propagation. Additionally, it has been shown that different metrics (degree, coreness, etc.) might identify different influential nodes even for the same dynamical processes, with varying degrees of accuracy. In this paper, we investigate how nine centrality measures correlate with the disease and rumor spreading capabilities of the nodes that make up different synthetic and real-world (both spatial and non-spatial) networks. We also propose a generalization of the random walk accessibility as a new centrality measure and derive analytical expressions for the latter measure for simple network configurations. Our results show that for non-spatial networks, the $k$-core and degree centralities are most correlated to epidemic spreading, whereas the average neighborhood degree, the closeness centrality and accessibility are most related to rumor dynamics. On the contrary, for spatial networks, the accessibility measure outperforms the rest of the centrality metrics in almost all cases regardless of the kind of dynamics considered. Therefore, an important consequence of our analysis is that previous studies performed in synthetic random networks cannot be generalized to the case of spatial networks.
Spreading phenomena are ubiquitous in Nature Hethcote (2000); da F. Costa et al. (2011). Rumors and viruses spread from person to person, worms contaminate computers worldwide and innovations are diffused from place to place. The advent of new technology and modern means of transportation has led to radical changes of classical transmission channels, making in many cases natural and manmade systems more prone to contagion processes. On the other hand, new tools have been developed to study such phenomena, for instance, by explicitly dealing with the topology and dynamics of so-called complex networks, which are nothing else but the backbone on top of which information and diseases propagate Barrat et al. (2008); Newman (2010).
Networks are made up of nodes, which represent the elements of the system, and edges, which define the possible interaction patterns among nodes Costa et al. (2007); Boccaletti et al. (2006). A large body of recent studies has verified that the way in which such nodes are organized plays a fundamental role in spreading processes Newman (2002a); Boccaletti et al. (2006). For instance, Pastor-Satorras and Vespignani showed that a disease outbreak takes place when the spreading rate, $\lambda$, is larger than the epidemic threshold Pastor-Satorras and Vespignani (2001), i.e., if $\lambda > \langle k \rangle / \langle k^2 \rangle$, where $\langle k^n \rangle$ is the $n$-th moment of the degree distribution. Therefore, most scale-free networks (those for which the degree distribution follows a power law $P(k) \sim k^{-\gamma}$ with $2 < \gamma \leq 3$) are particularly prone to the spreading of diseases, since $\langle k^2 \rangle \to \infty$ when $N \to \infty$. Additional network properties, such as assortativity Newman (2002b); Boguná et al. (2003) or modular organization Liu and Hu (2005), also play a fundamental role in disease spreading.
One of the most interesting challenges in network science is to understand the relation between the structure of the system and its emergent dynamical properties. This is why finding the determinant structural factors is important, as a better knowledge would allow controlling the function of the system, which, for the scope of this paper, means determining what network properties are more closely related to the diffusion of information and viruses. In particular, we will focus our attention on one topological feature: centrality. Since the most central nodes can diffuse their influence to the whole network faster than the rest of the nodes, it is expected that such agents are the most influential spreaders. Recently, Kitsak et al. Kitsak et al. (2010) found evidence that confirmed this hypothesis for the case of epidemic outbreaks. The authors verified that the most influential spreaders can be predicted from the $k$-shell decomposition analysis. Such agents are located within the core of the network and do not need to be the most connected. Silva et al. da Silva et al. (2012) explored the correlations between heterogeneous spread and central attributes of the vertices that were first seeded with a disease, finding that degree and accessibility are the measures most related to the efficient spread of the disease. On the other hand, Borge-Holthoefer and Moreno Borge-Holthoefer and Moreno (2012) showed that, for standard rumor models, it is not possible to identify the most influential spreaders using the same metrics.
Although many works have provided evidence for the presence of influential spreaders in epidemic spreading, the conclusions are not general. Indeed, there is no general consensus on the definition of network “centrality”, because there are many measures able to quantify the centrality of a node, each one considering specific concepts Newman (2010). For instance, the betweenness and closeness centralities take into account only the shortest distance between pairs of nodes Boccaletti et al. (2006); Newman (2010), ignoring alternative paths. At the same time, the $k$-core decomposition may eliminate important sets of vertices, which can be connected to the main core through nodes with a small number of links Seidman (1983). Thus, to overcome the lack of a universal definition of node centrality, it is necessary to look at additional measures. In this paper, we study the problem of the identification of influential spreaders using eight centrality measures in order to complement previous studies Kitsak et al. (2010); Borge-Holthoefer and Moreno (2012). Moreover, we introduce a new metric, the generalized accessibility, as a centrality measure that is based on random walks. We observe that in social and scale-free networks, the accessibility, average neighborhood degree and closeness centrality are the measures most related to rumor spreading. Other measures, such as the $k$-core and degree, correlate well only with epidemic spreading in social networks, as found previously in Kitsak et al. (2010); Borge-Holthoefer and Moreno (2012).
Another important result is related to the kind of networks studied in this work. Despite the fact that many diffusion processes take place on spatially embedded networks Barthélemy (2011), previous studies have disregarded spatial networks Kitsak et al. (2010); Borge-Holthoefer and Moreno (2012); da Silva et al. (2012). These networks have several topological constraints that greatly influence the way connections are established, and thus, one expects an impact on network centrality metrics and consequently on the spreading dynamics. In this paper, we intend to fill this gap by exploring the role of centrality measures in predicting the spreading capabilities of nodes of spatial networks. Specifically, we consider both real networks (road networks of four countries) and artificial spatial networks with exponential and power-law degree distributions and find that the correlations between spreading capacity and centrality measures in spatial networks differ significantly from those observed in non-spatial networks.
This paper is organized as follows. Sec. II presents the centrality measures considered in our investigations. The generalized random walk accessibility is introduced in Sec. III. The analytical expressions for complete graphs, stars and rings are also evaluated in this section. Concepts of epidemic and rumor spreading are discussed in Sec. IV and the databases are described in Sec. V. The analysis of spatial networks is outlined in Sec. VI, where it is shown that the accessibility is strongly correlated to the node capacity for rumor and epidemic spreading. Sec. VII presents the analysis of non-spatial networks, which complements the investigations in Kitsak et al. (2010); Borge-Holthoefer and Moreno (2012). Our final conclusions are presented in Sec. VIII.
II Centrality measures
As mentioned before, one can in principle consider several metrics to define the centrality of a node Newman (2010). For completeness, here we provide the basic definitions of those that will be used in the rest of the paper. For more details, we refer the reader to the literature cited.
Basic centrality measures.
The most basic definition of centrality takes into account the number of connections of a node $i$, called node degree, $k_i$. In this case, the most central node has the largest number of connections. Alternatively, the centrality of a vertex can be defined in terms of the degrees of its neighbors, since strongly connected vertices can surround a central node. In this case, the average degree of the nearest neighbors of $i$ is defined as
$$ k_{nn}(i) = \frac{1}{k_i} \sum_{j \in \mathcal{N}(i)} k_j, $$
where $\mathcal{N}(i)$ is the set of nodes connected to $i$. It has been verified that the average neighborhood degree is related to epidemic spreading in networks Barrat et al. (2008).
The eigenvector centrality considers that the centrality of each node is proportional to the sum of the centrality values of the nodes that it is connected to. It is defined by the eigenvector associated with the largest eigenvalue of the adjacency matrix $A$. Formally,
$$ x_i = \frac{1}{\lambda_1} \sum_{j=1}^{N} A_{ij} x_j, $$
or in the matrix form $A\mathbf{x} = \lambda_1 \mathbf{x}$, where $\mathbf{x}$ is the right leading eigenvector Newman (2010) and $\lambda_1$ is the largest eigenvalue.
Distance-based centrality metrics.
Centrality can also be established in terms of the shortest distances between pairs of nodes, since the more central a node is, the lower its total distance to all other nodes. The closeness centrality of $i$ is defined as Newman (2010)
$$ C_i = \frac{N}{\sum_{j=1,\, j \neq i}^{N} d_{ij}}, $$
where $d_{ij}$ is the shortest distance between nodes $i$ and $j$, and $N$ is the number of nodes in the network.
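As an illustration, the closeness of a single node can be obtained with one breadth-first search. The sketch below is not the authors' implementation; it assumes an unweighted, connected graph with at least two nodes, stored as a dictionary of neighbor lists, and it assumes the normalization $N / \sum_j d_{ij}$. The function name `closeness` is chosen here for illustration only.

```python
from collections import deque


def closeness(adj, source):
    """Closeness of `source` in an unweighted, connected graph.

    adj: dict mapping each node to an iterable of its neighbours.
    Assumed normalisation: N divided by the sum of shortest distances.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:  # first visit gives the shortest distance
                dist[u] = dist[v] + 1
                queue.append(u)
    return len(adj) / sum(dist.values())


# Path graph 0 - 1 - 2: the middle node is the closest to all others.
path = {0: [1], 1: [0, 2], 2: [1]}
middle, end = closeness(path, 1), closeness(path, 0)
```

In this small path the middle node obtains a larger value than the end nodes, in line with the idea that central nodes have a lower total distance to the rest of the network.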
Alternatively, the effective load of a node can also be considered as a centrality measure. Betweenness centrality quantifies the load as the number of times a node acts as a bridge along the shortest path between two other nodes Girvan and Newman (2002). Thus, for a node $i$,
$$ B_i = \sum_{(a,b)} \frac{\sigma(a,i,b)}{\sigma(a,b)}, $$
where $\sigma(a,i,b)$ is the number of shortest paths connecting vertices $a$ and $b$ that pass through vertex $i$ and $\sigma(a,b)$ is the total number of shortest paths between $a$ and $b$. The sum is over all pairs $(a,b)$ of distinct vertices. In this case, a central node should be crossed by many paths and shows the highest value of $B_i$.
The clustering coefficient quantifies the occurrence of triangles in the networks. It is defined as Boccaletti et al. (2006)
$$ c_i = \frac{N_\triangle(i)}{N_3(i)}, $$
where $N_\triangle(i)$ is the number of triangles involving the node $i$ and $N_3(i)$ is the number of connected triples centered around $i$. $c_i$ can also be understood as a centrality measure in the sense that if two nodes are connected only via the node $i$, this node can control the information flow Newman (2010). Thus, the clustering coefficient could be thought of as a local version of the betweenness centrality. Note that $c_i$ takes smaller values for more central nodes, in contrast to the other centrality measures.
The $k$-shell decomposition partitions a network into sub-structures and assigns an integer index to each node $i$, $k_c(i)$, in such a way that $k_c(i) = k$ if $i$ belongs to the $k$-core, but it is not in the $(k+1)$-core Seidman (1983). Nodes with low values of $k_c$ are located at the periphery of the network. This measure was adopted recently to detect influential spreaders in networks Kitsak et al. (2010). The most central nodes should have the highest values of coreness, whereas high-degree nodes localized in the periphery of networks should display small values of coreness Kitsak et al. (2010). Therefore, only hubs at the main core of networks present the highest values of $k_c$.
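The peeling procedure behind this decomposition can be sketched in a few lines. The code below is a minimal illustration, not the implementation used in the paper: it repeatedly removes the node with the smallest remaining degree and records the largest such degree seen so far as the core index. The function name `core_numbers` and the toy graph are assumptions made for this example.

```python
def core_numbers(adj):
    """Core index of every node by iterative peeling.

    adj: dict mapping each node to a set of neighbours
    (undirected graph without self-loops).
    """
    degree = {v: len(nb) for v, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # The node with the smallest remaining degree is peeled next.
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Toy graph: a 4-clique {0, 1, 2, 3}, a bridge node 4, and a
# peripheral hub 5 that has three leaves (6, 7, 8).
adj = {
    0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2},
    4: {0, 5}, 5: {4, 6, 7, 8}, 6: {5}, 7: {5}, 8: {5},
}
core = core_numbers(adj)
```

In this toy graph node 5 has the largest degree together with node 0, yet it ends up in the outermost shell, which illustrates why a peripheral hub can have a small coreness.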
Random-walk based centrality measures.
The number of visits that a given node receives when an agent travels through the network without a preferential route can also be taken into account to quantify the node centrality. In this case, a possible measure is the Google PageRank Brin and Page (1998). PageRank is calculated as
$$ \boldsymbol{\pi}^{T} = \boldsymbol{\pi}^{T} G, $$
where $G$ is the Google matrix, i.e.,
$$ G = \alpha \left( P + \frac{\mathbf{a}\mathbf{1}^{T}}{N} \right) + \frac{1-\alpha}{N}\, \mathbf{1}\mathbf{1}^{T}, $$
and $\mathbf{a}$ is the binary vector called dangling node vector ($a_i$ is equal to one if $i$ is a dangling node and 0 otherwise), $\mathbf{1}$ is a vector of ones of length $N$ and $P$ is the transition probability matrix of the respective network ($P_{ij} = A_{ij}/k_i$, where $A_{ij}$ are the elements of the adjacency matrix). The original version of the algorithm considers $\alpha = 0.85$ Brin and Page (1998). The PageRank of a node $i$, $\pi_i$, is given by the $i$-th entry of the dominant eigenvector $\boldsymbol{\pi}$ of $G$, given that $\sum_i \pi_i = 1$. $\pi_i$ can be understood as the probability of arriving at the node $i$ after a large number of steps following a random walk navigation through the network.
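The dominant eigenvector of the Google matrix is usually approximated by power iteration. The following sketch assumes the standard construction (damping factor 0.85, dangling nodes redistributing their probability uniformly) and a dense 0/1 adjacency matrix given as a list of lists; the function name `pagerank` and the tolerance values are illustrative choices, not taken from the paper.

```python
def pagerank(adj, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a 0/1 adjacency matrix.

    adj[i][j] == 1 means there is a link from i to j.
    Dangling nodes (no outgoing links) spread their mass uniformly.
    """
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(pi[i] for i in range(n) if out_deg[i] == 0)
        # Teleportation term plus the uniform share of dangling mass.
        new = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for i in range(n):
            if out_deg[i]:
                share = alpha * pi[i] / out_deg[i]
                for j in range(n):
                    if adj[i][j]:
                        new[j] += share
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
pr_ring, pr_star = pagerank(ring), pagerank(star)
```

On the ring all nodes are equivalent and receive the same score, whereas on the star the hub receives the largest score.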
III Generalized random walk accessibility
The accessibility is related to the diversity of access of individual nodes through random walks Travençolo and da F. Costa (2008). This measure has been considered for identification of the border of complex networks Travençolo et al. (2009). Let $P^{(h)}(j \mid i)$ be the probability of reaching node $j$ by performing random walks of length $h$ departing from $i$. The accessibility of the node $i$ for a given distance $h$ is defined by the exponential of the Shannon entropy Travençolo and da F. Costa (2008), i.e.,
$$ \mathcal{A}_h(i) = \exp\left( -\sum_{j} P^{(h)}(j \mid i) \log P^{(h)}(j \mid i) \right), $$
where $1 \leq \mathcal{A}_h(i) \leq N$. The maximum value corresponds to the case in which all nodes are reached with the same probability $1/N$. Note that this metric was defined in a multilevel fashion, depending on the parameter $h$ that defines the scale of the dynamics Travençolo and da F. Costa (2008); Travençolo et al. (2009). In addition, though here we will be constrained to random walks, virtually any other type of dynamics yielding transition probabilities between adjacent nodes can be considered in the accessibility, which makes this measurement adaptable to the dynamics of each problem being studied.
In order to generalize the accessibility, here we introduce a new version of this metric, which is based on the matrix exponential operation Bhatia (1997). This matrix enables the calculation of the probability of transition considering walks of all lengths between any pair of vertices. In this way, if $P$ is the transition matrix, the exponential of $P$ is defined as
$$ \mathbf{W} = e^{P} = \sum_{k=0}^{\infty} \frac{1}{k!} P^{k}. \tag{9} $$
The matrix $\mathbf{W}$ is based on a modified random walk, which penalizes longer paths. To construct such a stochastic process we consider a usual random walk $(X_n)_{n \geq 0}$, where $X_n$ represents the node visited by the agent at time $n$. We take a collection of independent and identically distributed uniform random variables in the interval $[0,1]$, i.e. $(U_n)_{n \geq 1}$, which represent a kind of “fitness” associated with each step of the walk. Also, we assume independence between the collection of uniform random variables and the random walk. This modified random walk, which we call accessibility random walk (ARW) in the rest of the paper, considers walks through the network such that all associated fitnesses along a trajectory are in ascending order. We say that node $j$ is visited by the ARW at time $n$ if $X_n = j$ and $U_1 < U_2 < \cdots < U_n$. We denote by $(Y_n)_{n \geq 0}$ the new process and note that $Y_n = j$ implies $X_n = j$, but the opposite is not necessarily true. A quantity of interest is the number of visits that a given node $j$ receives when an agent travels through the network according to the ARW. This quantity can be written as $\sum_{n=0}^{\infty} \mathbb{1}_{\{Y_n = j\}}$, where $\mathbb{1}_{\{Y_n = j\}}$ is the indicator function of the event $\{Y_n = j\}$. We are interested in the mean of this value, assuming that the agent starts from node $i$, i.e. $\mathbb{E}_i\left[\sum_{n=0}^{\infty} \mathbb{1}_{\{Y_n = j\}}\right]$. In order to compute this value we observe that the $n$-th term of the sum is the probability $\mathbb{P}_i(Y_n = j)$ which, by our definition, is equal to $\mathbb{P}_i(X_n = j,\, U_1 < \cdots < U_n)$. This probability is exactly $P^{(n)}_{ij}/n!$, where $P^{(n)}_{ij}$ is the probability of transition from $i$ to $j$ through walks of length $n$, since the fitnesses are independent of the walk and $n$ independent uniform variables are in ascending order with probability $1/n!$. Therefore, the matrix $\mathbf{W}$ considered in Eq. (9) is a matrix of mean values associated with the ARW. The element $\mathbf{W}_{ij}$ provides the mean number of visits that node $j$ receives when the agent starts at node $i$ and follows the ARW.
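The probability of transition between any pair of vertices through ARW is given by
$$ \mathbf{P} = \frac{\mathbf{W}}{e}, $$
where the factor $1/e$ normalizes each row, since every row of $\mathbf{W}$ sums to $e$ when the transition matrix is row-stochastic. Note that the matrix $\mathbf{W}$ weights all walks by the inverse of the factorial of their lengths. Therefore, this definition penalizes longer walks, i.e., the shortest walks receive more weight than the longest ones. We define the generalized expression for the accessibility as
$$ \mathcal{A}(i) = \exp\left( -\sum_{j} \mathbf{P}_{ij} \log \mathbf{P}_{ij} \right), $$
which we call generalized random walk accessibility. Figure 1 illustrates this measure.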
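To make the definition concrete, the sketch below computes the generalized random walk accessibility of every node of a small graph. It rests on stated assumptions rather than on the authors' code: the exponential of the transition matrix is approximated by a truncated Taylor series (chosen only to keep the example dependency-free; a Padé-based routine would be preferable in practice), each row is normalized by its own sum (which equals $e$ for a row-stochastic matrix), and the accessibility is taken as the exponential of the Shannon entropy of that row. All function names are hypothetical.

```python
import math


def transition_matrix(adj):
    """Row-normalise a 0/1 adjacency matrix (list of lists)."""
    P = []
    for row in adj:
        k = sum(row)
        P.append([a / k if k else 0.0 for a in row])
    return P


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exp_matrix(P, terms=30):
    """Truncated Taylor series: sum of P^k / k! for k = 0..terms."""
    n = len(P)
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in W]  # holds P^k / k!, starting at k = 0
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, P)]
        W = [[W[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return W


def generalized_accessibility(adj, terms=30):
    W = exp_matrix(transition_matrix(adj), terms)
    acc = []
    for row in W:
        total = sum(row)  # equals e when the row of P sums to one
        probs = [w / total for w in row]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        acc.append(math.exp(entropy))
    return acc


# Star graph with N = 5 nodes; node 0 is the hub.
N = 5
star = [[0] * N for _ in range(N)]
for j in range(1, N):
    star[0][j] = star[j][0] = 1
acc = generalized_accessibility(star)
```

For the star graph, powers of the transition matrix repeat with period two, so the hub row of the exponential has entries $\cosh(1)$ (hub to hub) and $\sinh(1)/(N-1)$ (hub to each leaf); the numerical hub value can be checked against the entropy of these two numbers divided by $e$.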
We note that the exponential matrix is also considered in the definition of the communicability Estrada and Hatano (2008); Estrada et al. (2011). The difference is that the accessibility is based on the concept of diversity Hill (1973); Jost (2006) whereas the communicability is associated with the communication between any pair of vertices Estrada et al. (2011). Moreover, the former is related to the probability transition matrix, whereas the latter is based on the adjacency matrix. In this way, there is no trivial relation between these two measures in irregular graphs.
Let us provide in what follows some exact expressions for the metric just introduced. Although the graphs considered below are not representative of real-world networks, we believe that the analysis helps in understanding what can be learned from the new metric. In addition, some of these structures already capture important features of real networks, such as the star graph, which is an extreme example of a heterogeneous configuration but has provided insightful hints about the dynamics under study in other cases Gómez-Gardeñes et al. (2011); Peron and Rodrigues (2012).
III.1 Accessibility in star graphs
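In a star graph with $N$ nodes in which node 1 is the hub, the usual random walk moves from the hub to each leaf with probability $1/(N-1)$ and from each leaf to the hub with probability one. Hence $P^{3} = P$ and Eq. (9) reduces to $\mathbf{W} = I + \sinh(1)\, P + [\cosh(1) - 1]\, P^{2}$. The entries of $\mathbf{W}$ from the hub to itself and to a leaf $i$ are
$$ \mathbf{W}_{11} = \cosh(1), \qquad \mathbf{W}_{1i} = \frac{\sinh(1)}{N-1}, $$
and between the leaves and central node,
$$ \mathbf{W}_{i1} = \sinh(1). $$
The entry associated with the transition between leaves $i$ and $j$ is given by
$$ \mathbf{W}_{ij} = \frac{\cosh(1) - 1}{N-1}, \quad i \neq j, $$
and for $i = j$,
$$ \mathbf{W}_{ii} = 1 + \frac{\cosh(1) - 1}{N-1}. $$
Therefore, the general form of the exponential matrix, considering the node number one as the hub of the star graph, is given as
$$ \mathbf{W} = \begin{pmatrix} \cosh(1) & \frac{\sinh(1)}{N-1} & \cdots & \frac{\sinh(1)}{N-1} \\ \sinh(1) & 1 + \frac{\cosh(1)-1}{N-1} & \cdots & \frac{\cosh(1)-1}{N-1} \\ \vdots & \vdots & \ddots & \vdots \\ \sinh(1) & \frac{\cosh(1)-1}{N-1} & \cdots & 1 + \frac{\cosh(1)-1}{N-1} \end{pmatrix}. \tag{17} $$
In this way, since the ARW transition probabilities are the entries of $\mathbf{W}$ divided by $e$ (each row of $\mathbf{W}$ sums to $e$), the accessibility of the hub is
$$ \mathcal{A}_{\mathrm{hub}} = \exp\left\{ -\left[ a \log a + (N-1)\, b \log b \right] \right\}, \tag{18} $$
where $a = \cosh(1)/e$ and $b = \sinh(1)/[e(N-1)]$. For any leaf $i$ connected with the hub,
$$ \mathcal{A}_{\mathrm{leaf}} = \exp\left\{ -\left[ p \log p + q \log q + (N-2)\, r \log r \right] \right\}, \tag{19} $$
where $p = \sinh(1)/e$, $q = [N - 2 + \cosh(1)]/[e(N-1)]$ and $r = [\cosh(1) - 1]/[e(N-1)]$.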
We show in Fig. 2 the results obtained for the accessibility on top of different networks and configurations. As can be seen, Eq. 18 can be considered a good predictor of the accessibility of the hubs in scale-free networks. However, as expected from the fact that the star graph does not capture any topological aspect of homogeneous networks, the star-graph approximation is not accurate for random Erdős-Rényi networks.
III.1.1 Eigendecomposition analysis
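The exact values of accessibility in star graphs can also be calculated by the eigendecomposition analysis of $P$. The exponential matrix, Eq. 9, can be obtained as
$$ \mathbf{W} = V e^{\Lambda} V^{-1}, \tag{20} $$
where $V$ is a matrix whose columns are the eigenvectors of the matrix $P$ (which are also eigenvectors of $\mathbf{W}$) and $e^{\Lambda}$ is a matrix whose diagonal presents the exponential of each eigenvalue of $P$,
$$ P \mathbf{v} = \lambda \mathbf{v}, \tag{21} $$
where $P$ is the transition matrix, $\lambda$ is its eigenvalue and $\mathbf{v}$ is the associated eigenvector.
In this way, for the star graph, the transition matrix is sparse and its characteristic polynomial, $p(\lambda) = \det(P - \lambda I)$, is calculated by the Laplace rule as
$$ p(\lambda) = (-1)^{N} \lambda^{N-2} \left( \lambda^{2} - 1 \right), $$
whose solutions are $\lambda_1 = 1$, $\lambda_2 = -1$ and $\lambda_i = 0$, $i = 3, \ldots, N$. Therefore, using the definition of an eigenvalue and eigenvector problem, it is possible to obtain the following equations for the eigenvectors. For $\lambda_1 = 1$,
$$ \frac{1}{N-1} \sum_{j=2}^{N} v_j^{(\lambda_1)} = v_1^{(\lambda_1)}, \qquad v_1^{(\lambda_1)} = v_j^{(\lambda_1)}, \quad j = 2, \ldots, N, $$
where $v_i^{(\lambda)}$ is the $i$-th element of the eigenvector associated with the eigenvalue $\lambda$. For $\lambda_2 = -1$,
$$ \frac{1}{N-1} \sum_{j=2}^{N} v_j^{(\lambda_2)} = -v_1^{(\lambda_2)}, \qquad v_1^{(\lambda_2)} = -v_j^{(\lambda_2)}, \quad j = 2, \ldots, N, $$
finally, for $\lambda_i = 0$ where $i = 3, \ldots, N$, which has multiplicity $N-2$,
$$ \sum_{j=2}^{N} v_j^{(\lambda_i)} = 0, \qquad v_1^{(\lambda_i)} = 0, $$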
which yields the matrix
whose inverse is
Note that we used non-unit vectors to construct the eigenvector matrix. Unit norms are not necessary, since the matrix of exponentiated eigenvalues is also multiplied by the inverse of the eigenvector matrix and the non-unit norms are compensated. Substituting matrices 26 and 27 in Eq. 20, after some algebra, we recover Eq. 17. The accessibility of hubs and leaves is calculated by Eqs. 18 and 19, respectively.
III.2 Accessibility in ring graphs
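The generalized random walk accessibility can also be calculated exactly in rings, which are a special case of $K$-regular graphs, where $K = 2$. The probability transition matrix has the circulant form
$$ P_{jl} = \frac{1}{2} \left( \delta_{l,\, j+1} + \delta_{l,\, j-1} \right), $$
with indices taken modulo $N$, whose eigenvalues are
$$ \lambda_k = \cos\left( \frac{2\pi k}{N} \right), \tag{29} $$
where $k = 0, 1, \ldots, N-1$ and the associated elements of the eigenvector can be expressed as
$$ v_j^{(k)} = c\, \exp\left( \frac{2\pi \mathrm{i}\, j k}{N} \right), $$
where $c = 1/\sqrt{N}$ is just a normalization factor. This set of eigenvectors diagonalizes the matrix as $P = V \Lambda V^{\dagger}$, where $\Lambda$ is the diagonal matrix with the eigenvalues of $P$ (Eq. 29), $V$ is the matrix whose columns are the eigenvectors of $P$ and $V^{\dagger}$ is the conjugate transpose of $V$. We can write the closed expression for the ARW transition probabilities $\mathbf{P} = e^{P}/e$ as
$$ \mathbf{P}_{jl} = \frac{1}{eN} \sum_{k=0}^{N-1} \exp\left[ \cos\left( \frac{2\pi k}{N} \right) \right] \exp\left[ \frac{2\pi \mathrm{i}\, k (j-l)}{N} \right], $$
which is a closed form for the evaluation of $\mathbf{P}$ in ring graphs. Furthermore we can use some graph spectra properties to separate the first eigenvalue ($\lambda_0 = 1$) from the summation
$$ \mathbf{P}_{jl} = \frac{1}{N} + \frac{1}{eN} \sum_{k=1}^{N-1} \exp\left[ \cos\left( \frac{2\pi k}{N} \right) \right] \exp\left[ \frac{2\pi \mathrm{i}\, k (j-l)}{N} \right]. $$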
Figure 2 shows the comparison between network models and the analytical solutions for the regular structures. Note that the solution for the ring does not depend on the network size. The results for the line graph, which again do not depend on the network size, are also presented in this figure. We also remark that the extremes of the line present the lowest values of accessibility, whereas the nodes in the center have the highest values.
III.3 Accessibility in complete graphs
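The generalized random walk accessibility can also be calculated exactly for a complete graph, in which every pair of nodes is connected without self connections. In this way, the probability of transition between any pair of distinct nodes is $P_{ij} = 1/(N-1)$ and the off-diagonal elements of the exponential matrix (see Eq. 9) are given by
$$ \mathbf{W}_{ij} = \frac{e - e^{-1/(N-1)}}{N}, \quad i \neq j. $$
The main diagonal of $\mathbf{W}$, which considers the paths starting and ending at the same node, is expressed as
$$ \mathbf{W}_{ii} = \frac{e + (N-1)\, e^{-1/(N-1)}}{N}. $$
Therefore, the general form of the exponential matrix is given as
$$ \mathbf{W} = e^{-1/(N-1)}\, I + \frac{e - e^{-1/(N-1)}}{N}\, J, $$
where $I$ is the identity matrix and $J$ is the matrix of ones.
The accessibility of each node is
$$ \mathcal{A} = \exp\left\{ -\left[ u \log u + (N-1)\, w \log w \right] \right\}, \qquad u = \frac{\mathbf{W}_{ii}}{e}, \quad w = \frac{\mathbf{W}_{ij}}{e}. $$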
In the complete graph all nodes present the same value of accessibility and, since a random walker needs just one step to reach any other node, this value is the upper bound of the accessibility for a network with $N$ nodes. Figure 2 shows the variation of the accessibility in complete graphs as a function of the network size.
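The dependence on the network size can be evaluated directly from a closed form. The sketch below is our own derivation under stated assumptions, not an expression quoted from the paper: writing the transition matrix as $(J - I)/(N-1)$, with $J$ the matrix of ones, its exponential has diagonal entries $[e + (N-1)e^{-1/(N-1)}]/N$ and off-diagonal entries $[e - e^{-1/(N-1)}]/N$, and the accessibility is the exponential of the entropy of one row divided by $e$. The function name is hypothetical.

```python
import math


def complete_graph_accessibility(n):
    """Generalized accessibility of any node of the complete graph K_n.

    Assumes n >= 2 and the closed-form entries of exp(P) described
    in the accompanying text.
    """
    decay = math.exp(-1.0 / (n - 1))
    p_self = (math.e + (n - 1) * decay) / (n * math.e)   # return to start
    p_other = (math.e - decay) / (n * math.e)            # any other node
    entropy = -(p_self * math.log(p_self)
                + (n - 1) * p_other * math.log(p_other))
    return math.exp(entropy)


values = {n: complete_graph_accessibility(n) for n in (2, 10, 11, 100)}
```

Because the probability of returning to the starting node is always larger than the probability of reaching any other given node, the value stays strictly below $N$.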
III.3.1 Eigendecomposition analysis
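The exact values of accessibility in complete graphs can also be obtained by the eigendecomposition analysis, using the graph spectrum and its eigenvectors, as performed for the star graph. In this way, we get the following system (from Eq. 21)
$$ \frac{1}{N-1} \sum_{j \neq i} v_j^{(\lambda)} = \lambda\, v_i^{(\lambda)}, \quad i = 1, \ldots, N, $$
where $v_i^{(\lambda)}$ is the $i$-th element of the eigenvector associated with the eigenvalue $\lambda$.
The eigenvalues of $P$ for a complete graph are given by the spectrum of the adjacency matrix multiplied by $1/(N-1)$, i.e., $\lambda_1 = 1$ and $\lambda_i = -1/(N-1)$, $i = 2, \ldots, N$ Mieghem (2011). Therefore, for $\lambda_1 = 1$ we have that
$$ \sum_{j \neq i} v_j^{(\lambda_1)} = (N-1)\, v_i^{(\lambda_1)}, \quad i = 1, \ldots, N. $$
The solution is $v_1^{(\lambda_1)} = v_2^{(\lambda_1)} = \cdots = v_N^{(\lambda_1)}$. On the other hand, for $\lambda_i = -1/(N-1)$, where $i = 2, \ldots, N$,
$$ \sum_{j \neq l} v_j^{(\lambda_i)} = -v_l^{(\lambda_i)}, \quad l = 1, \ldots, N. $$
The respective solution is $\sum_{j=1}^{N} v_j^{(\lambda_i)} = 0$. Note that both solutions are not unique, whereas Eq. 20 has a unique solution. Without loss of generality, we assume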
whose inverse is
As a practical comment about the matrix exponential, it is important to mention that it should be computed by the Padé approximation Golub and Loan (1996); Higham (2005) and not by the truncated Taylor series or by Eq. 20. The former method is more precise and has a lower computational cost. However, Eq. 20 is important for theoretical analysis, since it transforms the calculation of accessibility into an eigenvector and eigenvalue problem, which is well studied in the literature.
IV Epidemic and rumor spreading
Many mathematical models have been developed to study epidemic spreading in complex networks Keeling and Eames (2005); Keeling and Rohani (2008). A particularly important model is the susceptible-infectious-recovered (SIR), in which each node can be in one of three states: (i) susceptible, (ii) infected, or (iii) recovered. Susceptible nodes are healthy and can catch the disease, whereas infected individuals are the ones actually transmitting the disease. Finally, individuals in the recovered state are immune to the disease and, therefore, play no role in the dynamics. The transition between the first two states, i.e., from healthy to infected subjects, occurs via contacts between individuals. At each time step, the infectious nodes spread the disease to their susceptible neighbors with probability $\beta$ and an infected node becomes recovered with probability $\mu$. The latter is a spontaneous process and does not depend on any contact. The epidemic spreading process terminates when there is no infected node in the network and the disease cannot propagate anymore.
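A discrete-time simulation of this process can be sketched as follows. This is a minimal illustration under assumptions that are ours: updates are synchronous, the infection and recovery probabilities are passed as `beta` and `mu` (names chosen here), `mu` must be positive so that the loop ends, and the function returns the final fraction of recovered nodes for a given seed.

```python
import random


def sir_final_recovered_fraction(adj, seed, beta, mu, rng=None):
    """Run one SIR realisation from `seed`; return the recovered fraction.

    adj: dict mapping each node to an iterable of neighbours.
    beta: per-contact infection probability; mu: recovery probability (> 0).
    """
    rng = rng or random.Random()
    state = {v: "S" for v in adj}
    state[seed] = "I"
    infected = {seed}
    while infected:
        newly_infected, newly_recovered = set(), set()
        for v in infected:
            for u in adj[v]:
                if state[u] == "S" and rng.random() < beta:
                    newly_infected.add(u)
            if rng.random() < mu:  # spontaneous recovery
                newly_recovered.add(v)
        for u in newly_infected:
            state[u] = "I"
        for v in newly_recovered:
            state[v] = "R"
        infected = (infected | newly_infected) - newly_recovered
    return sum(1 for s in state.values() if s == "R") / len(adj)


# Path graph with five nodes.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
full = sir_final_recovered_fraction(path, 0, beta=1.0, mu=1.0)
none = sir_final_recovered_fraction(path, 2, beta=0.0, mu=1.0)
```

With certain transmission the whole path is eventually reached, while with zero transmission only the seed recovers; in practice the outcome would be averaged over many realisations per seed.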
Rumor dynamics are in some aspects similar to epidemic spreading Daley et al. (2001); Castellano et al. (2009). Rumor diffusion is simulated considering that nodes are spreaders, ignorants, or stiflers. Spreaders are those individuals that know the rumor and want to spread it to ignorants, whereas stiflers are those that know the rumor but are not interested in the information anymore. The main difference between rumor and epidemic spreading is that a spreader turns into a stifler by a process that involves contacts, whereas infected nodes become recovered by a spontaneous process. The fractions of ignorants, spreaders, and stiflers at time $t$ are defined such that they sum to one. The process starts with one spreader and $N-1$ ignorants, where $N$ is the number of nodes in the network. At each time step, spreaders try to spread the rumor to their ignorant neighbors at a given spreading rate. On the other hand, if a spreader contacts another spreader or a stifler, such spreader becomes a stifler at a given stifling rate. This process corresponds to the model proposed by Maki and Thompson (MT model) Castellano et al. (2009). In the version proposed by Daley and Kendall (DK model), two interacting spreaders become stiflers at a given rate Castellano et al. (2009). Moreover, Monte Carlo simulations of a rumor spreading dynamics can be performed in two different ways. In a contact process (CP), only one random neighbor of a spreader is contacted at each time step. In the truncated process (TP) the neighbors of a spreader are contacted in a random way until all of them are contacted or the spreader turns into a stifler. The rumor dynamics terminates when there is no spreader in the network and the rumor cannot propagate anymore.
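The MT model in its contact-process variant can be sketched along the same lines. The code below is only an illustration under our own assumptions: spreaders are updated sequentially within a time step, the spreading and stifling rates are treated as probabilities named `spread_rate` and `stifle_rate` (hypothetical names), both must be positive for the run to end, and the graph has no isolated nodes.

```python
import random


def mt_rumor_cp(adj, seed, spread_rate, stifle_rate, rng=None):
    """One realisation of the Maki-Thompson model as a contact process.

    adj: dict mapping each node to a non-empty list of neighbours.
    Returns the final fraction of stiflers.
    """
    rng = rng or random.Random()
    state = {v: "ignorant" for v in adj}
    state[seed] = "spreader"
    spreaders = {seed}
    while spreaders:
        for v in list(spreaders):
            u = rng.choice(adj[v])  # contact process: one random neighbour
            if state[u] == "ignorant":
                if rng.random() < spread_rate:
                    state[u] = "spreader"
                    spreaders.add(u)
            elif rng.random() < stifle_rate:
                # MT rule: only the initiating spreader becomes a stifler.
                state[v] = "stifler"
                spreaders.discard(v)
    return sum(s == "stifler" for s in state.values()) / len(adj)


pair = {0: [1], 1: [0]}
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
f_pair = mt_rumor_cp(pair, 0, 1.0, 1.0, random.Random(1))
f_ring = mt_rumor_cp(ring, 0, 0.5, 0.5, random.Random(42))
```

A truncated-process variant would instead loop over the shuffled neighbours of each spreader until all are contacted or the spreader becomes a stifler.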
Here, we consider that the spreading dynamics begins in a single seed node, whereas the remaining nodes are in the susceptible (or ignorant) state. In the SIR model, the spreading potential of each vertex is quantified in terms of the total prevalence of the epidemic process. The spreading capacity of a node $i$ is the fraction of recovered vertices at the end of the process given that the dynamics started in $i$. Similarly, the spreading capacity of a node $i$ in rumor dynamics is quantified by the fraction of stiflers at the end of the process given that the spreading started at $i$.
We performed numerical simulations of epidemic and rumor spreading processes on top of real-world and artificial networks. Table 1 presents some network properties of the road maps and networks generated by the spatial models.
V.1 Network models
Barabási and Albert proposed a model which considers growth and preferential attachment rules Albert (1999). In this case, a network is generated starting with a set of $m_0$ connected vertices. After that, new vertices with $m \leq m_0$ edges are included in the network. The probability of the new vertex $i$ to connect with a vertex $j$ in the network is proportional to the number of connections of $j$, i.e.,
$$ \mathcal{P}(i \to j) = \frac{k_j}{\sum_{u} k_u}. $$
The most connected vertices have greater probability of receiving new vertices. In this way, networks generated by this model present a power-law degree distribution, $P(k) \sim k^{-\gamma}$, where $\gamma = 3$ in the thermodynamic limit ($N \to \infty$) Albert (1999), $N$ being the number of nodes.
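The growth rule can be sketched with a list in which every node appears once per incident edge, so that a uniform draw from the list is proportional to degree. This is an illustrative sketch, not the generator used in the paper; starting from a fully connected seed of `m + 1` nodes is our assumption, and the function name is hypothetical.

```python
import random


def barabasi_albert(n, m, rng=None):
    """Edge list of a preferential-attachment graph with n nodes.

    Assumption: the seed is a clique of m + 1 nodes; each new node
    attaches to m distinct existing nodes chosen proportionally to degree.
    """
    rng = rng or random.Random()
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node is listed once per incident edge (degree-weighted urn).
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = barabasi_albert(50, 2, random.Random(0))
```

Drawing targets before the new node is added to the urn rules out self-loops, and collecting them in a set rules out repeated edges.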
We also take into account two spatial models. The model proposed by Waxman Waxman (1988) considers that nodes are uniformly distributed in a square of unit area and each pair of nodes is connected according to a probability that depends on their distance, i.e.,
$$ \mathcal{P}(i \to j) \sim e^{-\lambda d_{ij}}, $$
where $\lambda$ is a parameter that controls the average degree and $d_{ij}$ is the Euclidean distance between nodes $i$ and $j$. Such a model generates networks with an exponential degree distribution, which means that the probability of a node having a degree different from $\langle k \rangle$ decays exponentially.
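A generator in the spirit of this model can be sketched as follows. The exact connection probability is an assumption of this example: nodes are placed uniformly in the unit square and each pair is linked with probability $e^{-\lambda d_{ij}}$, with the proportionality constant taken as one. The function name `waxman` and the parameter name `lam` are hypothetical.

```python
import math
import random


def waxman(n, lam, rng=None):
    """Random spatial graph on the unit square.

    Assumption: nodes i and j are linked with probability exp(-lam * d_ij),
    where d_ij is their Euclidean distance.
    Returns (positions, edge list).
    """
    rng = rng or random.Random()
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i):
            d = math.dist(pos[i], pos[j])
            if rng.random() < math.exp(-lam * d):
                edges.append((j, i))
    return pos, edges


pos, dense = waxman(6, 0.0, random.Random(0))
_, sparse = waxman(6, 1e9, random.Random(0))
```

Setting the decay parameter to zero connects every pair, while a very large value removes all links, so intermediate values tune the average degree.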
The model introduced by Barthélemy Barthélemy (2003), on the other hand, produces scale-free networks embedded in space. Considering a regular $d$-dimensional lattice with length $L$, the algorithm has three main steps. Initially, $n_0$ initial active nodes are selected at random. Next, an inactive node $i$ is randomly selected, and connected to an active node $j$ with probability
$$ \mathcal{P}(i \to j) \propto \frac{k_j + 1}{\exp(d_{ij}/r_c)}, $$
where $k_j$ is the number of connections of node $j$, $r_c$ is a finite scale parameter and $d_{ij}$ is the Euclidean distance between nodes $i$ and $j$. Finally, the node $i$ becomes active and the second and third steps are repeated until all nodes are active. For each node, the second and third steps are repeated $m$ times in order to set the average connectivity as $\langle k \rangle = 2m$ Barthélemy (2003). The parameter $r_c$ controls the clustering coefficient Watts (1999) and assortativity Newman (2002b) of the network. The parameter values considered here are similar to those used in the original paper Barthélemy (2003).
V.2 Road networks
The road networks have been extracted from the maps available in portable document format (PDF) at the United Nations website (http://www.un.org). Initially, the maps have been pre-processed in order to eliminate irrelevant information and keep only the main roads. After that, a skeletonization procedure has extracted the so-called skeleton of the image Costa and Cesar Jr (2000). The node identification has been performed by applying an 8-connected hit-or-miss convolution filter Dougherty (1992). Finally, a label propagation procedure has been implemented from each node. When the labels propagated from two nodes meet, a connection is established between them. Here, we have considered the networks extracted from maps of Germany, Japan, England and the United States.
V.3 Social networks
The social networks considered here are: (i) the email contact network obtained from messages exchanged between users within the Universitat Rovira i Virgili Guimera et al. (2003); (ii) the political blogs network, composed of hyperlinks between web blogs obtained over the period of two months preceding the U.S. Presidential Election of 2004 Adamic and Glance (2005); (iii) the Advogato network, which is an online community dedicated to free software development launched in 1999 Massa et al. (2009); kon (2014a) and (iv) the Google+ network, which is composed of users connected according to their circles of friendships McAuley and Leskovec (2012); kon (2014b). The Advogato, political blogs and Google+ networks are directed networks. Moreover, Advogato is also a weighted network. However, here we consider only the unweighted and undirected versions of these networks. In addition, our analysis uses only the nodes in the giant component.
VI Spatial networks
As outlined in Section II, we have studied different centrality metrics: the degree, clustering coefficient, betweenness centrality, average neighborhood degree, PageRank, eigenvector centrality, $k$-core index, closeness centrality and accessibility. We have considered only the unweighted and undirected versions of these measures. Table 1 presents the average values obtained for the road maps and networks generated by the Waxman and scale-free spatial models. Spatial networks are sparse, have large characteristic path lengths and non-zero clustering coefficients. In addition, scale-free spatial networks have the smallest average geodesic distance due to the presence of hubs.
We have conducted numerical simulations of the SIR (epidemic) and MT (rumor) models to inspect correlations between the nodes’ centrality (as given by the different metrics above) and the final dynamical outcome of the system, the latter being measured by the density of recovered nodes and of stiflers, respectively, after the dynamics has come to an end. Such correlations have been determined by the Spearman rank correlation coefficient, which is defined as the Pearson correlation coefficient between the ranked variables Wolfe and Hollander (1973). The reason for our choice is that the Spearman coefficient quantifies monotonic relationships, whereas the Pearson correlation measures linear relationships. As shown below, these correlations can be monotonic, but not necessarily linear.
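The distinction between the two coefficients can be illustrated with a short sketch. It follows the definition quoted above, computing the Pearson coefficient of the ranked variables, with tied values receiving the average of their ranks (a common convention that is assumed here). The function names are chosen for this example.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Pearson correlation of the ranked variables."""
    return pearson(average_ranks(x), average_ranks(y))


centrality = [1, 2, 3, 4, 5]
outcome = [c ** 3 for c in centrality]  # monotonic but not linear
rho, r = spearman(centrality, outcome), pearson(centrality, outcome)
```

For this monotonic but nonlinear relation the Spearman coefficient equals one, whereas the Pearson coefficient is smaller, which is the reason a rank-based coefficient suits the comparison between centrality and spreading capacity.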
Figures 3 and 4 show the scatter plots for the epidemic and rumor dynamics in the US road network, respectively. The strongest correlation corresponds to the degree centrality, while for other metrics, correlations are weak and positive, though not zero. On the contrary, the clustering coefficient leads to a negative correlation because the more central a node is, the smaller its clustering coefficient is. On the other hand, Figures 5 and 6 show that the correlations between the generalized random walk accessibility and the potential of rumor and epidemic spreading processes are almost linear and positive for all road networks analyzed.
Furthermore, Table 2 shows that for both spreading processes, the highest correlations between a centrality measure and the impact of the disease or rumor correspond to the generalized accessibility centrality, with values often higher than 0.7. Interestingly, the k-core centrality yields small correlation values, contrary to what has been observed in Kitsak et al. (2010), which considered networks not embedded in space. However, this result agrees with Borge-Holthoefer and Moreno (2012) in the case of rumor dynamics. The node degree is highly correlated with the final fraction of recovered nodes, but less so with the final fraction of stiflers, mainly for the case of an MT model simulated using a contact process setting, again as found in Borge-Holthoefer and Moreno (2012). Moreover, the PageRank, closeness and betweenness centrality metrics do not show significant correlations with disease and rumor spreading capabilities except when the parameter in rumor models is small, in which case the closeness gives high correlation. It is also worth noticing that the eigenvector centrality shows high correlation only for the spatial scale-free network model.
VI.1 Road networks
Focusing on real networks, Figure 7 shows results obtained for the generalized random walk accessibility of each node for the road networks of Japan, England, United States and Germany. In Japan, the most influential spreaders are the cities of Nagoya, Osaka and Hiroshima. Tokyo is highly connected, but does not have the same spreading capability as these cities, since it is a peripheral hub. London, Liverpool and Manchester have the highest values of accessibility in England, while in the US, the cities with the highest accessibility are New York, Houston, Dallas and Chicago. Interestingly enough, these cities are also air transportation hubs. Finally, Berlin, München and Düsseldorf have the highest accessibility in Germany. Note that nodes at the border of the countries present the smallest values of accessibility. Therefore, this measure can be considered for identifying the borders of networks, as previously pointed out for the original definition of accessibility in Travençolo et al. (2009).
Figure 8 presents the probability distribution of the accessibility. For all cases, the distribution is asymmetric, presenting a long tail for higher values of accessibility, and is centered at the same value. It is interesting to note that Germany and England have the smallest variation in the accessibility, whereas Japan has the highest one. This fact can be related to the rough terrain of Japan, which directly influences how highways are distributed.
VII Non-spatial networks
We have also studied non-spatial networks using the same set of measurements considered in Secs. VI and II. Table 1 presents the average values of these measures calculated in the social networks and in synthetic BA networks. Table 3 presents the Spearman correlation coefficient calculated between the centrality metrics and the final fraction of recovered nodes or stiflers in the epidemic and rumor processes, respectively. The results agree with the analysis of epidemic spreading presented in Kitsak et al. (2010) and with the study of rumor diffusion in Borge-Holthoefer and Moreno (2012). In the case of the SIR model, the k-core and degree centralities are the most correlated with the final fraction of recovered nodes. Thus, the main hubs of the social networks are located in the center of the network, because they have the highest coreness, suggesting that such networks tend not to present peripheral hubs. Moreover, correlations are stronger when the parameter is decreased. In contrast, the random walk accessibility yields the highest Spearman correlation for BA networks and for political blogs (for ), although the correlation values are close to those obtained for the degree and k-core. All the remaining metrics exhibit smaller correlation coefficients than the k-core, degree and accessibility.
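The quantity correlated with each centrality, namely the final fraction of recovered nodes when the outbreak starts at a given node, can be illustrated with a minimal sketch. It assumes a synchronous, discrete-time SIR process with a per-contact infection probability `beta` and a recovery probability `mu`; the update scheme, the function names and the parameter names are assumptions made for illustration and do not reproduce the exact simulation settings behind Table 3.

```python
import random

def sir_final_fraction(adj, seed_node, beta, mu, rng):
    """One SIR realisation seeded at seed_node; returns the final recovered fraction.

    adj maps each node to a list of its neighbours (undirected, unweighted).
    mu must be positive, otherwise infected nodes never recover.
    """
    state = {v: "S" for v in adj}
    state[seed_node] = "I"
    infected = {seed_node}
    while infected:
        newly_infected = set()
        for v in sorted(infected):
            for u in adj[v]:
                # Each infected node tries to infect every susceptible neighbour.
                if state[u] == "S" and rng.random() < beta:
                    newly_infected.add(u)
        # Nodes infected at the start of the step may recover at its end.
        recovered_now = {v for v in sorted(infected) if rng.random() < mu}
        for u in newly_infected:
            state[u] = "I"
        for v in recovered_now:
            state[v] = "R"
        infected = (infected - recovered_now) | newly_infected
    return sum(1 for s in state.values() if s == "R") / len(state)

def spreading_capacity(adj, node, beta, mu, runs=100, seed=0):
    """Average final recovered fraction over several runs seeded at node."""
    rng = random.Random(seed)
    return sum(sir_final_fraction(adj, node, beta, mu, rng) for _ in range(runs)) / runs

# Hypothetical toy network: a ring of six nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(spreading_capacity(ring, 0, beta=0.5, mu=1.0))
```

Computing `spreading_capacity` for every node and ranking the results against a centrality measure gives the kind of node-level comparison summarized by the Spearman coefficients.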
With respect to the rumor dynamics, the CP and TP cases present different results. In the first case, the eigenvector and accessibility centralities are strongly correlated with the final fraction of stiflers, whereas, for the second case, closeness centrality and average neighborhood degree show the highest correlations. Considering the TP case with a stifling rate , if the spreading rate is high, the average neighborhood degree is more related to the dynamics. However, for lower spreading rates the distance from one node to the rest of the network is more critical. This property is evident in Table 3. Note that the average neighborhood degree presents higher correlations for higher spreading rates, whereas the closeness centrality is more correlated when spreading rates are smaller. Such analysis suggests that shortest paths become more important for information propagation as the spreading rate decreases. Furthermore, the k-core and degree centralities have not been found to exhibit strong correlations with the final fraction of stiflers, supporting the results in Borge-Holthoefer and Moreno (2012). Finally, we note that, at variance with previous cases, for the rumor dynamics on non-spatial networks there is no single metric that yields the highest correlations for all the networks analyzed. In particular, the accessibility centrality does not seem to be as distinct in this case as before, likely because, as seen in Figure 9, the distributions of accessibility in non-spatial networks are asymmetric, with different mean values and characterized by a long tail.
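The distinction between the CP and TP settings can be made concrete with a minimal sketch of a Maki-Thompson-type rumor process. It assumes that in the CP case a spreader contacts a single random neighbor per time step, while in the TP case it contacts all its neighbors in random order and stops as soon as it turns into a stifler. These contact rules, the parameter names `lam` (spreading rate) and `alpha` (stifling rate), and the function name are assumptions made for illustration, not a transcription of the simulation code used here.

```python
import random

def mt_final_stiflers(adj, seed_node, lam, alpha, rng, truncated=True, max_steps=100000):
    """One rumor realisation seeded at seed_node; returns the final stifler fraction.

    adj maps each node to a non-empty list of neighbours.
    truncated=True sketches the TP setting, truncated=False the CP setting.
    max_steps is a safety cap, since the process may run long for small rates.
    """
    state = {v: "ignorant" for v in adj}
    state[seed_node] = "spreader"
    spreaders = [seed_node]
    steps = 0
    while spreaders and steps < max_steps:
        steps += 1
        rng.shuffle(spreaders)
        new_spreaders = []
        for v in spreaders:
            neighbours = list(adj[v])
            if truncated:
                rng.shuffle(neighbours)                # TP: try all neighbours
            else:
                neighbours = [rng.choice(neighbours)]  # CP: one random neighbour
            for u in neighbours:
                if state[u] == "ignorant":
                    if rng.random() < lam:
                        state[u] = "spreader"
                        new_spreaders.append(u)
                elif rng.random() < alpha:
                    # Contact with a spreader or stifler: v stops spreading.
                    state[v] = "stifler"
                    break
        # Nodes that heard the rumor in this step start spreading in the next one.
        spreaders = [v for v in spreaders if state[v] == "spreader"] + new_spreaders
    return sum(1 for s in state.values() if s == "stifler") / len(state)

# Hypothetical toy network: a ring of eight nodes.
ring8 = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(mt_final_stiflers(ring8, 0, lam=0.5, alpha=0.5, rng=random.Random(1)))
```

Averaging the returned fraction over many realisations per seed node gives a node-level rumor spreading capacity that can then be ranked against each centrality measure.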
In this paper we have studied the relation between the centrality of a node and the outcome of epidemic and rumor processes initiated at that node by means of extensive numerical simulations on top of several complex networks. We have considered nine network centrality metrics and two different kinds of networks: spatial and non-spatial ones. Networks generated by the Barabási-Albert, Waxman and scale-free spatial models have also been considered. We have proposed a generalization of the accessibility measure introduced in Travençolo and da F. Costa (2008), which quantifies the potential of each node to access other nodes in a balanced and homogeneous manner. Such generalization takes into account walks of all lengths weighted by the inverse of the factorial of their lengths.
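The construction summarized above can be sketched numerically. The sketch assumes an unbiased random walk with transition matrix P, accumulates the series of P^k / k! up to a finite walk length, normalizes each row into a probability distribution, and takes the exponential of its Shannon entropy (the diversity index of order one). The truncation length, the row normalization and the function name are assumptions made for illustration; the exact definition is the one given in the earlier sections of the paper.

```python
import math

def generalized_accessibility(adj, max_len=30):
    """Sketch of a random walk accessibility with walks weighted by 1/k!.

    adj maps each node to a non-empty list of neighbours (connected,
    undirected, unweighted graph). Returns a dict node -> accessibility.
    """
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    # Row-stochastic transition matrix of the unbiased random walk.
    P = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for u in adj[v]:
            P[index[v]][index[u]] = 1.0 / len(adj[v])
    # term holds P^k / k!, starting from the identity (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    W = [row[:] for row in term]
    for k in range(1, max_len + 1):
        term = [[sum(term[i][m] * P[m][j] for m in range(n)) / k
                 for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                W[i][j] += term[i][j]
    accessibility = {}
    for v in nodes:
        row = W[index[v]]
        total = sum(row)
        probs = [w / total for w in row]  # normalise the row to sum to one
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        accessibility[v] = math.exp(entropy)  # diversity index of order one
    return accessibility

# Hypothetical toy network: a path of five nodes, 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
for node, value in generalized_accessibility(path).items():
    print(node, round(value, 3))
```

On this toy path the central node obtains a larger value than the two end nodes, in line with the observation that nodes at the border of a network present the smallest accessibility.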
Our results have shown that the generalized accessibility is the best metric to measure a node's spreading capacity in spatial networks. On the contrary, in non-spatial networks, the best correlations between a centrality metric and the dynamical outcome depend on the process. Thus, the degree and coreness (as given by the k-core) are the best suited for analyzing epidemic spreading, confirming the results in Kitsak et al. (2010). However, these measures are not the best when a rumor model is considered. Indeed, for the latter case, the average neighborhood degree, the closeness centrality and the accessibility give higher correlations.
We verified that the generalized accessibility is more related to spreading processes in spatial networks than in non-spatial networks. Indeed, Table 2 shows that this metric is the structural property that exhibits the highest correlation in most of the cases when the underlying network is spatial. Figs. 10 and 11 show that the relationship between the accessibility and the dynamical measures is almost linear in spatial networks, whereas in non-spatial networks such relationship is also almost linear, but only for values below a given threshold. Beyond that value, the fraction of stiflers and recovered nodes reaches a plateau, which is the maximum value of the dynamic measure in the networks. Such plateau reduces the Spearman correlation between the accessibility and the fraction of stiflers, since the relationship between these structural and dynamical measures is better defined for low values of accessibility. In spatial networks, by contrast, the larger distances prevent the dynamical outcome from saturating as the accessibility grows (i.e., there is no plateau), resulting in higher correlations.
The previous conclusions can be understood by looking more carefully at the meaning of the new metric discussed here. The definition of the accessibility in terms of random walks is strictly related to the spreading processes Pinto et al. (2012) and it is defined in terms of the diversity index of order one Hill (1973). Thus, the higher the number of neighbors that a node can access with similar probability, the higher the expected number of infected nodes. In this way, the accessibility quantifies how many nodes can be effectively accessed during the spreading process. As reported in Viana et al. (2012), this quantity is maximum whenever the exploration time is minimum. Thus, nodes presenting higher values of accessibility propagate viruses or rumors to the whole network faster than the nodes with smaller values, which results in a higher fraction of infected nodes before they become recovered. In summary, nodes with higher accessibility values should be the most influential spreaders.
The analysis presented here can be extended by considering other definitions of the accessibility in terms of other diversity indices Jost (2006). The role of the generalized random walk accessibility in other types of dynamical processes, such as social dynamics models Castellano et al. (2009) and synchronization, is also a possible direction for further research. Ultimately, one important conclusion of our study, beyond the fact that the new metric appears to be the best way to detect influential spreaders in spatial networks, is that previous claims about which class of nodes is influential depend on both the metric used and, most importantly, the kind of network under study.
FAR acknowledges CNPq (grant 305940/2010-4), Fapesp (grants 2011/50761-2 and 2013/26416-9) and NAP eScience - PRP - USP for financial support. LFC would like to acknowledge CNPq and Fapesp for the financial support. PMR is supported by Fapesp (grant 2013/03898-8) and CNPq (grant 479313/2012-1). GFA acknowledges Fapesp and ALB acknowledges CAPES for sponsorship provided. YM is partially supported by the EC FET-Proactive Project MULTIPLEX (grant 317532).
- Hethcote (2000) H. W. Hethcote, SIAM Review 42, 599 (2000).
- da F. Costa et al. (2011) L. da F. Costa, O. Oliveira Jr, G. Travieso, F. A. Rodrigues, P. R. V. Boas, L. Antiqueira, M. P. Viana, and L. E. C. Rocha, Advances in Physics 60, 329 (2011).
- Barrat et al. (2008) A. Barrat, M. Barthélemy, and A. Vespignani, Dynamical processes on complex networks (Cambridge University Press, New York, NY, USA, 2008).
- Newman (2010) M. Newman, Networks: an introduction (Oxford University Press, Inc., 2010).
- Costa et al. (2007) L. Costa, F. Rodrigues, G. Travieso, and P. Boas, Advances in Physics 56, 167 (2007), ISSN 0001-8732.
- Boccaletti et al. (2006) S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D. Hwang, Physics Reports 424, 175 (2006).
- Newman (2002a) M. E. Newman, Physical Review E 66, 016128 (2002a).
- Pastor-Satorras and Vespignani (2001) R. Pastor-Satorras and A. Vespignani, Physical Review Letters 86, 3200 (2001).
- Newman (2002b) M. E. Newman, Physical Review Letters 89, 208701 (2002b), ISSN 1079-7114.
- Boguñá et al. (2003) M. Boguñá, R. Pastor-Satorras, and A. Vespignani, Physical Review Letters 90, 028701 (2003).
- Liu and Hu (2005) Z. Liu and B. Hu, EPL (Europhysics Letters) 72, 315 (2005).
- Kitsak et al. (2010) M. Kitsak, L. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. Stanley, and H. Makse, Nature Physics 6, 888 (2010).
- da Silva et al. (2012) R. A. P. da Silva, M. P. Viana, and L. da Fontoura Costa, Journal of Statistical Mechanics: Theory and Experiment 2012, P07005 (2012).
- Borge-Holthoefer and Moreno (2012) J. Borge-Holthoefer and Y. Moreno, Physical Review E 85, 026116 (2012).
- Seidman (1983) S. Seidman, Social Networks 5, 269 (1983).
- Barthélemy (2011) M. Barthélemy, Physics Reports 499, 1 (2011).
- Girvan and Newman (2002) M. Girvan and M. Newman, Proceedings of the National Academy of Sciences 99, 7821 (2002).
- Brin and Page (1998) S. Brin and L. Page, in Computer Networks and ISDN Systems (Elsevier Science Publishers B. V., 1998), pp. 107–117.
- Travençolo and da F. Costa (2008) B. A. N. Travençolo and L. da F. Costa, Physics Letters A 373, 89 (2008).
- Travençolo et al. (2009) B. Travençolo, M. Viana, and L. d. F. Costa, New Journal of Physics 11, 063019 (2009).
- Zachary (1977) W. Zachary, Journal of Anthropological Research 33, 452 (1977).
- Bhatia (1997) R. Bhatia, Matrix analysis, vol. 169 (Springer Verlag, 1997).
- Estrada and Hatano (2008) E. Estrada and N. Hatano, Physical Review E 77, 036111 (2008).
- Estrada et al. (2011) E. Estrada, N. Hatano, and M. Benzi, CoRR abs/1109.2950 (2011).
- Hill (1973) M. O. Hill, Ecology 54, 427 (1973).
- Jost (2006) L. Jost, Oikos 113, 363 (2006).
- Gómez-Gardeñes et al. (2011) J. Gómez-Gardeñes, S. Gómez, A. Arenas, and Y. Moreno, Physical Review Letters 106, 128701 (2011).
- Peron and Rodrigues (2012) T. K. D. Peron and F. A. Rodrigues, Physical Review E 86, 016102 (2012).
- LeVeque (2007) R. J. LeVeque, Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems (SIAM, 2007), ISBN 978-0-89871-629-0.
- Mieghem (2011) P. V. Mieghem, Graph Spectra for Complex Networks (Cambridge University Press, New York, NY, USA, 2011), ISBN 9780521194587.
- Golub and Loan (1996) G. H. Golub and C. F. V. Loan, Matrix Computations (The Johns Hopkins University Press, 1996), 3rd ed.
- Higham (2005) N. J. Higham, SIAM J. Matrix Analysis Applications 26, 1179 (2005).
- Keeling and Eames (2005) M. J. Keeling and K. T. Eames, Journal of the Royal Society Interface 2, 295 (2005).
- Keeling and Rohani (2008) M. J. Keeling and P. Rohani, Modeling infectious diseases in humans and animals (Princeton University Press, 2008).
- Daley et al. (2001) D. J. Daley, J. Gani, and J. M. Gani, Epidemic modelling: an introduction, vol. 15 (Cambridge University Press, 2001).
- Castellano et al. (2009) C. Castellano, S. Fortunato, and V. Loreto, Reviews of Modern Physics 81, 591 (2009).
- Albert (1999) R. Albert, Science 286, 509 (1999).
- Waxman (1988) B. Waxman, Selected Areas in Communications, IEEE Journal on 6, 1617 (1988).
- Barthélemy (2003) M. Barthélemy, EPL (Europhysics Letters) 63, 915 (2003).
- Watts (1999) D. Watts, Small Worlds: The Dynamics of Networks Between Order and Randomness (1999).
- Costa and Cesar Jr (2000) L. d. F. Costa and R. Cesar Jr, Shape analysis and classification: theory and practice (CRC Press, Inc., 2000).
- Dougherty (1992) E. R. Dougherty, An introduction to morphological image processing (SPIE Optical Engineering Press Bellingham, Wash., USA, 1992), ISBN 081940845.
- Guimera et al. (2003) R. Guimera, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, Physical Review E 68, 065103 (2003).
- Adamic and Glance (2005) L. A. Adamic and N. Glance, in Proceedings of the 3rd international workshop on Link discovery (ACM, 2005), pp. 36–43.
- Massa et al. (2009) P. Massa, M. Salvetti, and D. Tomasoni, in Proc. Int. Conf. Dependable, Autonomic and Secure Computing (2009), pp. 658–663.
- kon (2014a) Advogato network dataset – KONECT (2014a), URL http://konect.uni-koblenz.de/networks/advogato.
- McAuley and Leskovec (2012) J. McAuley and J. Leskovec, in Advances in Neural Information Processing Systems (2012), pp. 548–556.
- kon (2014b) Google+ network dataset – KONECT (2014b), URL http://konect.uni-koblenz.de/networks/ego-gplus.
- Wolfe and Hollander (1973) D. A. Wolfe and M. Hollander, Nonparametric statistical methods (1973).
- Pinto et al. (2012) P. C. Pinto, P. Thiran, and M. Vetterli, Physical Review Letters 109, 068702 (2012).
- Viana et al. (2012) M. P. Viana, J. L. Batista, and L. d. F. Costa, Physical Review E 85, 036105 (2012).