Characteristic exponents of complex networks
We present a novel way to characterize the structure of complex networks by studying the statistical properties of the trajectories of random walks over them. We consider time series corresponding to different properties of the nodes visited by the walkers. We show that the analysis of the fluctuations of these time series allows us to define a set of characteristic exponents which capture the local and global organization of a network. This approach provides a way of solving two classical problems in network science, namely the systematic classification of networks, and the identification of the salient properties of growing networks. The results contribute to the construction of a unifying framework for the investigation of the structure and dynamics of complex systems.
PACS: 89.75.Hc, 05.45.-a, 05.45.Tp
Networks are the fabric of complex systems, and network science has provided a deeper understanding of the basic mechanisms underlying the functioning and the evolution of diverse biological, technological and social systems, from the human brain to the Internet [1-6]. Recently, networks have also been successfully employed in the study of dynamical systems. The basic idea consists in transforming a time series into a graph, by means of state-space proximity and recurrence [7-9], transition probabilities [10, 11] or visibility relationships [12], and then inferring information about the time series from the analysis of the corresponding network. These studies have revealed intimate connections between the statistical properties of a time series and the topology of the network constructed from it [13-16]. However, apart from a few exceptions [11, 17-19], little attention has been devoted to the dual problem, i.e. studying the structure of complex networks by analyzing time series associated with them.
In this Letter we aim at bridging this gap, by showing that a standard analysis of the statistical properties of time series constructed from random walks on graphs allows us to characterize the topology of complex networks. In particular, the study of fluctuations in time series corresponding to different node properties, such as the degree, the average degree of nearest neighbours and the clustering coefficient, can reveal the existence of local and global correlations in the underlying graph. In this way it is possible to associate with each network a set of characteristic exponents which describe the scaling of the fluctuations of each node property and capture the intrinsic complexity of a graph in a concise way. We show that these exponents can be employed to check the stability of the structure of growing networks, and also allow us to construct a taxonomy of networks, thus providing a quantitative, effective way of discriminating social from biological and technological systems by looking only at their structural properties.
Let $\mathcal{G}$ be a connected undirected graph consisting of $N$ nodes and $K$ edges, and denote by $A=\{a_{ij}\}$ the adjacency matrix of $\mathcal{G}$, whose entry $a_{ij}=1$ if there is an edge between node $i$ and node $j$, while $a_{ij}=0$ otherwise. Let us consider a random walk on $\mathcal{G}$ described by a time-invariant transition matrix $\Pi=\{\pi_{ij}\}$. At each time step, a walker moves from the current node $i$ to node $j$ with probability $\pi_{ij}$. The probabilities satisfy the normalization condition $\sum_{j}\pi_{ij}=1$ for every $i$.
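According to this definition, a walk on $\mathcal{G}$ corresponds to a discrete time-invariant Markov chain defined by the transition matrix $\Pi$ on the state space $\{1, 2, \ldots, N\}$ of the nodes of $\mathcal{G}$. Let us now consider an instance $w$ of the walk defined by $\Pi$ on $\mathcal{G}$, and a real-valued property of node $i$, $f(i)$. If we indicate as $\{i_1, i_2, \ldots, i_t, \ldots\}$ the sequence of nodes visited by $w$, we can construct the time series $\{f(i_1), f(i_2), \ldots, f(i_t), \ldots\}$. For instance, if $f(i)=k_i$ is the degree of node $i$, we get the time series of the degrees of the visited nodes.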
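The construction of a node-property time series from a walk can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the transition probabilities (called $\pi_{ij}$ here) are stored as a Python dict mapping each node to a dict `{j: pi_ij}`, and the function name `walk_time_series` is hypothetical.

```python
import random

def walk_time_series(pi, f, length, start, seed=None):
    """Sample a walk i_1, i_2, ... from the Markov chain with transition
    probabilities pi[i][j] and return the series [f(i_1), f(i_2), ...].

    pi : dict mapping node i to a dict {j: pi_ij}; each row sums to 1
    f  : callable returning a real-valued property of a node
    """
    rng = random.Random(seed)
    node = start
    series = []
    for _ in range(length):
        series.append(f(node))                            # property of the visited node
        targets = list(pi[node])
        weights = [pi[node][j] for j in targets]
        node = rng.choices(targets, weights=weights)[0]   # one Markov step
    return series


# Usage: path graph 0 - 1 - 2 with f = degree (hypothetical toy example).
pi = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 1.0}}
deg = {0: 1, 1: 2, 2: 1}
print(walk_time_series(pi, deg.get, 6, start=0, seed=1))  # → [1, 2, 1, 2, 1, 2]
```

On the path graph the degree series alternates deterministically, whatever the random choices at the central node, which makes it a convenient sanity check.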
We denote such a time series as $X^{f}_{w}$, because it depends on the node property $f$ and on the specific order in which the nodes are visited by the walk $w$. However, if the walk defined by the transition matrix $\Pi$ on $\mathcal{G}$ is irreducible, then the topology of $\mathcal{G}$ completely determines which sequences of values can be produced by the walk and with which frequency [20]. Hence, any two time series $X^{f}_{w}$ and $X^{f}_{w'}$ constructed from two walkers $w$ and $w'$ on $\mathcal{G}$, corresponding to the same walk rule $\Pi$ and the same node property $f$, will have, in the limit of infinitely long time series, the same statistical properties, and will carry the same information about the structure of $\mathcal{G}$. We can therefore indicate any time series produced by a transition matrix $\Pi$ and by node property $f$ as $X^{f}_{\Pi}$. We will now show that the analysis of the time series produced by different node properties can provide useful insights into the microscopic structure of a complex network and into its overall organization. We focus on the case of classical random walks, i.e. we set $\pi_{ij}=a_{ij}/k_i$, where $k_i=\sum_j a_{ij}$ is the degree of node $i$. Notice that with this rule the walkers visit each edge of a connected graph with uniform probability, so that the time series constructed from random walks on $\mathcal{G}$ contain information about the distribution and correlations of the chosen node property throughout the network. We consider three possible choices of $f$, namely the node degree, $f(i)=k_i$, the average degree of the first neighbours of a node, $f(i)=k_{nn,i}=\frac{1}{k_i}\sum_j a_{ij}k_j$, and the node clustering coefficient, $f(i)=C_i$, where $C_i$ is the number of closed triads centered on $i$ divided by the total possible number of such triads, $k_i(k_i-1)/2$. We decided to focus on these three node properties because broad-tailed degree distributions ($k_i$, $k_{nn,i}$), the presence of non-trivial degree correlations ($k_{nn,i}$) and the abundance of triangles ($C_i$) are the basic features of most complex networks [2-4].
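The classical transition matrix and the three node properties can be computed directly from an adjacency structure. The sketch below assumes an undirected graph stored as a dict mapping each node to the set of its neighbours; all function names are hypothetical, and setting $C_i=0$ for nodes with fewer than two neighbours is an assumed convention that the text does not specify.

```python
from itertools import combinations

def classical_transition_matrix(adj):
    """pi_ij = a_ij / k_i: the walker picks one neighbour uniformly at random."""
    return {i: {j: 1.0 / len(nbrs) for j in nbrs} for i, nbrs in adj.items()}

def degree(adj, i):
    """k_i = sum_j a_ij."""
    return len(adj[i])

def avg_neighbour_degree(adj, i):
    """k_nn,i = (1/k_i) * sum_j a_ij k_j."""
    return sum(len(adj[j]) for j in adj[i]) / len(adj[i])

def clustering(adj, i):
    """C_i = closed triads centred on i divided by k_i (k_i - 1) / 2.
    Nodes with k_i < 2 get C_i = 0 (assumed convention)."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    closed = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return closed / (k * (k - 1) / 2)


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(degree(adj, 0), avg_neighbour_degree(adj, 3), clustering(adj, 1))  # → 3 3.0 1.0
```

Passing `lambda i: clustering(adj, i)` (or one of the other functions) as the node property turns any walk sampler into a generator of the corresponding time series.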
In fig. 1(a) and 1(b) we report, respectively, the autocorrelation function (ACF) and the power spectrum (PS) of the degree-based time series ($f(i)=k_i$) obtained in an Erdős-Rényi random graph (ER), a scale-free graph (SF) constructed by the configuration model [6], and two real-world complex networks, namely the Internet at the level of Autonomous Systems (Internet) [21] and the network of co-authorship in condensed matter (SCN) [22]. As expected, the ACF of ER and SF decays rapidly and the corresponding PS is almost flat, indicating the absence of degree correlations. Conversely, the degree-based time series obtained from real-world networks exhibit broad tails both in the ACF and in the PS, a clear indication of the presence of long-range degree correlations. The peaks at even values of the lag $\tau$ in the ACF of the Internet are due to the presence of disassortative degree correlations [23]. An iterative surrogate analysis [24] has also confirmed that these time series are highly non-linear. A non-parametric statistical test [25], which does not depend on delay-embedding reconstruction, suggested with a high confidence level that such time series are non-linear or non-stationary.
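The two diagnostics of fig. 1(a)-(b) can be sketched in pure Python. This is an illustration under stated assumptions: the ACF is the standard sample estimator, the power spectrum is a naive $O(n^2)$ periodogram (an FFT would be used in practice), and the star graph is a hypothetical, extreme disassortative example in which the walk alternates hub and leaves, so that the ACF has positive peaks at even lags, in line with note [23].

```python
import cmath
import random

def autocorrelation(x, max_lag):
    """Sample ACF r(tau), tau = 0..max_lag, normalised so that r(0) = 1."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + tau] for t in range(n - tau)) / var
            for tau in range(max_lag + 1)]

def periodogram(x):
    """Naive |DFT|^2 / n of the mean-removed series at frequencies m / n,
    m = 1, ..., n // 2."""
    n = len(x)
    mean = sum(x) / n
    return [abs(sum((v - mean) * cmath.exp(-2j * cmath.pi * m * t / n)
                    for t, v in enumerate(x))) ** 2 / n
            for m in range(1, n // 2 + 1)]


# Degree series of a classical random walk on a star with 4 leaves:
# the walker alternates the hub (k = 4) and a leaf (k = 1).
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
rng, node, x = random.Random(0), 0, []
for _ in range(100):
    x.append(len(star[node]))
    node = rng.choice(star[node])

acf = autocorrelation(x, 4)
spec = periodogram(x)
print([round(r, 2) for r in acf])  # → [1.0, -0.99, 0.98, -0.97, 0.96]
```

For this series all the spectral power sits at the highest frequency ($m = n/2$), the spectral counterpart of the alternating ACF.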
In the following we report the results of the multifractal Detrended Fluctuation Analysis (DFA) [26], a standard non-linear analysis technique which allows us to detect the presence of long-range correlations and to quantify the self-affinity of a time series, even if it is generated by a non-stationary process. Given a time series $X$, we first build its profile (the cumulative sum of the deviations from the mean) and divide it into time-windows of length $t$; then, we remove the local linear trend in each time-window to obtain the detrended time series, and we compute the local variance of the detrended fluctuations. We evaluate the structure function $F(t)$ as the square root of the local variance averaged over all time-windows of length $t$, and we plot $F(t)$ as a function of $t$. The procedure can be generalized to build a set of structure functions $F_q(t)$ depending on a parameter $q$ [27-29], but here we focus on $q=2$, which allows a physical interpretation of the results in terms of diffusivity.
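The DFA steps can be made concrete with a compact implementation of the $q=2$ case. This is a simplified sketch, assuming non-overlapping windows taken from the start of the profile and first-order (linear) detrending; the full multifractal procedure of [26] also scans the series from its end and introduces the moment order $q$, both omitted here. The function name `dfa` is hypothetical.

```python
import random

def dfa(x, window_sizes):
    """Detrended fluctuation analysis (q = 2, linear detrending).

    Returns {t: F(t)}, where F(t) is the square root of the mean squared
    residual of the profile around its local linear fit, averaged over the
    non-overlapping windows of length t (trailing points are discarded).
    """
    n = len(x)
    mean = sum(x) / n
    profile, s = [], 0.0
    for v in x:                  # profile: cumulative sum of deviations from the mean
        s += v - mean
        profile.append(s)

    F = {}
    for t in window_sizes:
        if t < 3 or t > n:
            continue             # a line fits 2 points exactly, so skip t < 3
        u_mean = (t - 1) / 2
        suu = sum((u - u_mean) ** 2 for u in range(t))
        total, n_win = 0.0, n // t
        for w in range(n_win):
            seg = profile[w * t:(w + 1) * t]
            y_mean = sum(seg) / t
            slope = sum((u - u_mean) * (y - y_mean) for u, y in enumerate(seg)) / suu
            icpt = y_mean - slope * u_mean
            # local variance of the detrended fluctuations in this window
            total += sum((y - icpt - slope * u) ** 2 for u, y in enumerate(seg)) / t
        F[t] = (total / n_win) ** 0.5
    return F


# Usage on uncorrelated Gaussian noise (synthetic, not network data).
rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(4096)]
F = dfa(noise, [8, 16, 32, 64, 128])
```

Plotting `F` on log-log axes should give a straight line whose slope is the scaling exponent discussed next.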
If the graph $\mathcal{G}$ is $f$-uncorrelated, i.e. if the probability of finding an edge connecting node $i$ to node $j$ does not depend on the values $f(i)$ and $f(j)$, then the fluctuations of the corresponding time series obtained from a random walk on $\mathcal{G}$ will be indistinguishable from uncorrelated Gaussian noise, for which $F(t)\sim t^{1/2}$. Conversely, a scaling behaviour $F(t)\sim t^{\alpha}$ with $\alpha\neq 1/2$ is a clear signal of the existence of $f$-correlations in the original graph $\mathcal{G}$, and the value of $\alpha$ is a proxy for the magnitude of such correlations.
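The exponent $\alpha$ is the slope of $\log F(t)$ against $\log t$. A minimal least-squares estimator is sketched below on synthetic structure functions (hypothetical values, not measured data); the tolerance used to decide whether $\alpha$ departs from $1/2$ is an illustrative assumption, since the text does not give one.

```python
import math

def scaling_exponent(F):
    """Least-squares slope of log F(t) versus log t, i.e. alpha in F(t) ~ t^alpha."""
    pts = [(math.log(t), math.log(v)) for t, v in F.items() if v > 0]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    sxy = sum((a - mx) * (b - my) for a, b in pts)
    sxx = sum((a - mx) ** 2 for a, _ in pts)
    return sxy / sxx

def looks_correlated(alpha, tol=0.05):
    """Hypothetical rule of thumb: flag f-correlations when alpha departs
    from the Gaussian-noise value 1/2 by more than tol."""
    return abs(alpha - 0.5) > tol


# Synthetic structure functions (hypothetical).
ts = [8, 16, 32, 64, 128, 256]
gaussian_like = {t: 2.0 * t ** 0.5 for t in ts}
correlated = {t: 0.7 * t ** 0.8 for t in ts}
alpha = scaling_exponent(gaussian_like)
print(round(alpha, 6), looks_correlated(alpha))  # → 0.5 False
```

The prefactor of $F(t)$ does not affect the slope, so only the scaling with $t$ is compared against the uncorrelated reference.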
In fig. 1(c) we report the results of the DFA of the degree-based time series for the same four networks considered in panels (a) and (b). As expected, degree fluctuations in ER and SF are compatible with Gaussian noise ($\alpha\simeq 1/2$), since the node degrees in these networks are uncorrelated. Conversely, the plots corresponding to time series generated by walkers on the Internet and on the SCN deviate appreciably from Gaussian noise and are characterized by two different regimes. (To observe the two scaling regimes, it is necessary to analyse sufficiently long time series, so as to guarantee that each edge of the graph has been traversed a sufficiently high number of times.) In the first regime, corresponding to small values of $t$, both time series are super-diffusive, i.e. $F(t)\sim t^{\alpha_1}$ with $\alpha_1>1/2$, while for large values of $t$ their behaviour is almost Gaussian ($F(t)\sim t^{\alpha_2}$ with $\alpha_2\simeq 1/2$). In fig. 2 we report the results of the DFA of the time series generated by $k_i$, $k_{nn,i}$ and $C_i$ in six real-world networks of different nature [30]. The same two-regime behaviour shown in fig. 1 for degree-based time series is also found for the time series generated by $k_{nn,i}$ and $C_i$. The two scaling regimes are a signature that the networks look different, with respect to degree, degree correlations and clustering, when observed at a local or at a global scale. On the one hand, the super-diffusive behaviour observed for small values of $t$ indicates that a walker which explores the network for relatively short time intervals will observe correlated fluctuations in the properties of the nodes it visits, a clear signal of the presence of $f$-correlations. On the other hand, the almost-Gaussian behaviour corresponding to large values of $t$ suggests that at a larger scale (i.e., if the walk continues for a sufficiently long time), the network appears uncorrelated.
The transition point $t^*$ that separates the two regimes corresponds to the typical scale of $f$-correlations, i.e. the typical walk length above which local heterogeneities and correlations in the values of $f$ become less important, and all the walks on the network can be considered a homogeneous representation of the typical $f$-fluctuations of the graph. We notice that in some cases the exponent $\alpha_2$ can be substantially larger than $1/2$, as in the case of the US power grid [33], for which this happens for all three time series. In this particular case, the super-diffusive behaviour for large values of $t$ is due to the fact that the network is embedded in a 2D space and has a strongly self-similar structure [36].
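The text reports only approximate values of the transition point (called $t^*$ here) and of the two exponents. One simple way to extract them, assumed here and not necessarily the authors' procedure, is a grid search over breakpoints that fits one straight line on each side in log-log scale and keeps the split with the smallest total squared error. The function names and the synthetic $F(t)$ are hypothetical.

```python
import math

def _line_fit(xs, ys):
    """Least-squares line; returns (slope, sum of squared residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    icpt = my - slope * mx
    sse = sum((y - icpt - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, sse

def two_regime_fit(F, min_points=3):
    """Split log F(t) vs log t into two straight segments.

    Tries every breakpoint leaving at least `min_points` points on each side
    and keeps the one with the smallest total squared error. Returns
    (t_star, alpha1, alpha2), t_star being the first t of the second regime.
    """
    ts = sorted(F)
    lx = [math.log(t) for t in ts]
    ly = [math.log(F[t]) for t in ts]
    best = None
    for k in range(min_points, len(ts) - min_points + 1):
        a1, e1 = _line_fit(lx[:k], ly[:k])
        a2, e2 = _line_fit(lx[k:], ly[k:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, ts[k], a1, a2)
    return best[1:]


# Synthetic F(t): super-diffusive (alpha1 = 0.9) up to t = 100, Gaussian-like
# (alpha2 = 0.5) beyond, joined continuously. Hypothetical values.
def synthetic_F(t):
    return t ** 0.9 if t <= 100 else 100 ** 0.9 * (t / 100) ** 0.5

F = {2 ** p: synthetic_F(2 ** p) for p in range(1, 12)}
t_star, alpha1, alpha2 = two_regime_fit(F)
print(t_star, round(alpha1, 3), round(alpha2, 3))  # → 128 0.9 0.5
```

With logarithmically spaced window sizes the breakpoint is only resolved to the grid spacing, which is consistent with $t^*$ being quoted as an approximate value.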
Although the presence of two scaling regimes seems to be a ubiquitous feature of different real-world networks, independently of their origin and nature, fig. 2 indicates that the actual values of the two exponents $\alpha_1$ and $\alpha_2$ may vary considerably for different node properties of the same network and, more importantly, for the same node property across different networks. In the following we show that these scaling exponents capture some key properties of a graph and can be employed to construct a taxonomy of networks [37, 38].
We considered a data set of 39 medium-to-large real-world networks representing different social, biological and technological systems. We assigned to each graph a point in $\mathbb{R}^6$ identified by the values of the six scaling exponents ($\alpha_1$ and $\alpha_2$ for each node property) obtained from the DFA of the time series of degree, clustering coefficient and average degree of first neighbours. Then, we performed a hierarchical clustering on the resulting set of points, merging together at each step the two clusters whose points were separated by the smallest distance in $\mathbb{R}^6$. In fig. 3(a) we report the resulting dendrogram, where the six large clusters identified (highlighted with different colors) correspond to networks with different functions. From left to right: the green cluster contains all the co-authorship [22, 39], trust (PGP [40]) and collaboration networks (IMDb co-starring network [33]); the blue cluster includes spatial networks (the US power grid [33] and the Pennsylvania road network [35]); the bright-cyan cluster contains information networks, such as the WWW [35], citation networks [34] and email communication networks [35, 41]; the dark-cyan cluster includes online social networks [35, 42, 43] and proteomes [33, 44]; the purple cluster contains technological networks, including snapshots of the Internet sampled at different times by different institutions [21, 45, 46] and the Gnutella peer-to-peer file-sharing network [47]. Finally, the networks of US airports at two different times [31] are grouped together in the yellow cluster.
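Merging, at each step, the two clusters whose points are closest corresponds to single-linkage agglomerative clustering. The sketch below assumes Euclidean distance (the text does not name the metric) and uses made-up exponent vectors for four hypothetical networks; it is an illustration, not the pipeline that produced fig. 3.

```python
import math
from itertools import combinations

def single_linkage(points, threshold=math.inf):
    """Agglomerative clustering with single linkage and Euclidean distance.

    points: dict name -> tuple of exponents (a point in R^6 in the text).
    At each step the two clusters whose closest members are nearest are merged,
    until one cluster is left or the smallest distance exceeds `threshold`.
    Returns (clusters, merges), merges being (cluster_a, cluster_b, distance).
    """
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # single linkage: distance between the closest pair of members
            d = min(math.dist(points[p], points[q]) for p in a for q in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b, d))
    return clusters, merges


# Hypothetical exponent vectors (alpha1, alpha2 for k, C and k_nn).
pts = {
    "A": (0.80, 0.50, 0.70, 0.50, 0.90, 0.50),
    "B": (0.82, 0.50, 0.72, 0.50, 0.88, 0.50),
    "C": (0.60, 0.50, 0.60, 0.50, 0.60, 0.50),
    "D": (0.62, 0.51, 0.60, 0.50, 0.61, 0.50),
}
groups, _ = single_linkage(pts, threshold=0.2)
print(sorted(sorted(g) for g in groups))  # → [['A', 'B'], ['C', 'D']]
```

The list of merges with their distances is exactly the information a dendrogram displays; cutting it at a threshold yields the flat clusters.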
The accuracy of the characteristic exponents in classifying networks of different nature is quite remarkable, and becomes evident by comparing the results of fig. 3(a) with those of a hierarchical clustering based on the mean and standard deviation of $k_i$, $k_{nn,i}$ and $C_i$, reported in fig. 3(b). While in the former case clusters represent homogeneous groups of networks, in the latter case each cluster always contains networks of different nature. (We also tried to perform a hierarchical clustering of the 39 networks by using, for each time series, only the exponent $\alpha_1$ corresponding to the scaling of $F(t)$ for small values of $t$. However, the resulting classification is not as neat and clear as the one reported in fig. 3.)
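The baseline of fig. 3(b) replaces the exponents with six simple moments of the node properties. A sketch of that feature vector follows, assuming population standard deviations (the text does not say which estimator was used), a dict-of-sets adjacency layout, and $C_i=0$ for nodes with fewer than two neighbours; the function name is hypothetical. Either feature vector can then be fed to the same hierarchical clustering procedure.

```python
from itertools import combinations
from statistics import mean, pstdev

def baseline_features(adj):
    """Mean and (population) standard deviation of k_i, k_nn,i and C_i.

    adj: dict mapping each node to the set of its neighbours.
    Returns (mean k, std k, mean k_nn, std k_nn, mean C, std C).
    """
    k = {i: len(nb) for i, nb in adj.items()}
    knn = {i: sum(k[j] for j in nb) / k[i] for i, nb in adj.items()}
    C = {}
    for i, nb in adj.items():
        if k[i] < 2:
            C[i] = 0.0  # assumed convention for nodes with fewer than two neighbours
            continue
        closed = sum(1 for u, v in combinations(nb, 2) if v in adj[u])
        C[i] = closed / (k[i] * (k[i] - 1) / 2)
    feats = []
    for prop in (k, knn, C):
        vals = list(prop.values())
        feats += [mean(vals), pstdev(vals)]
    return tuple(feats)


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
features = baseline_features(adj)
```

Unlike the characteristic exponents, these moments ignore the order in which a walker encounters the nodes, so they cannot see the correlations that the DFA exponents capture.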
The results shown in fig. 2 and fig. 3 suggest that the scaling exponents of the time series produced by random walkers visiting a complex network are indeed a key feature to characterize the network. Hence, we name them characteristic exponents of the network (Table 1 reports the characteristic exponents of all the complex networks considered in this study).
It is also interesting to investigate how the characteristic exponents of growing graphs change over time. In fig. 4(a) and 4(b) we show the temporal evolution of the characteristic exponents of $k_i$, $k_{nn,i}$ and $C_i$, respectively for the collaboration network of authors in APS Physical Review E (PRE) and for the Internet. Both networks have grown by about an order of magnitude in the considered time intervals. However, while in PRE the characteristic exponents for $k_i$ and $C_i$ exhibit a clear decrease over time, the characteristic exponents of the Internet have remained constant over the considered 10-year interval. The different temporal behaviour of the characteristic exponents is probably due to the peculiar dynamics of edge formation in the two networks. In fact, in a co-authorship network a node continues to accumulate edges over time, even if the majority of these edges correspond to collaborations which are no longer active. Plausibly, the continuous addition of edges drives the network towards a homogenization of degree and clustering correlations. Conversely, the number of neighbours of a node in the Internet cannot increase indefinitely, due to technological and economic constraints. In fact, connecting to more peers usually implies handling more Internet traffic, which in turn requires more bandwidth and new hardware, and translates into an economic investment. These constraints are mostly independent of network size, and thus have the same impact on the network growth at different times. This might explain why the structure of correlations has remained stable over time.
Finally, we check whether, and to what extent, the position $t^*$ of the cut-off of the structure function depends on the size of the graph. To this aim, we show in fig. 5 the approximate value of $t^*$ for the PRE and Internet networks as a function of time. Notice that as the networks grow the corresponding values of $t^*$ change slightly for all the time series, but we observe opposite trends in the two cases. In particular, $t^*$ usually decreases for PRE and increases for the Internet. This indicates that $t^*$ is not simply determined by the size of the network (otherwise we should have observed a similar behaviour in both networks), but is instead intimately related to the local organization of the graph.
It is also worth noticing that, despite the presence of these trends, $t^*$ usually remains of the same order while both networks have grown by an order of magnitude in the considered time intervals. For instance, in PRE [see fig. 5(a)] $t^*$ remains within a limited range both for the time series of degrees and for the other node properties. If we take into account the fact that a random walk on any of the snapshots of the PRE collaboration network typically requires many more than $t^*$ time steps in order to visit all the nodes at least once, and that this network has strong communities and a high clustering coefficient (both factors contribute to keeping a walker confined to a small set of nodes), then we realise that $t^*$ indeed corresponds to the exploration of a relatively small region of the graph, which usually includes no more than a few hundred nodes. Similarly, in the Internet [fig. 5(b)] the values of $t^*$ for all three time series again correspond to visiting a relatively small portion of the graph. These results suggest that the values of the cut-off in the scaling of the structure functions tend to remain practically stable over time, even when the network undergoes substantial expansion.
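The argument above compares the cut-off $t^*$ between the two scaling regimes with the number of steps a walker needs to visit the whole graph. Both quantities can be estimated by simulation. The sketch below, assuming a dict-of-lists adjacency layout and hypothetical function names, counts the distinct nodes visited within a given number of steps (for instance $t^*$) and estimates the mean cover time by Monte Carlo; on a ring of $n$ nodes the expected cover time is known to be $n(n-1)/2$, which gives a sanity check.

```python
import random

def distinct_visited(adj, start, steps, rng):
    """Number of distinct nodes seen by a classical random walk in `steps` steps."""
    node, seen = start, {start}
    for _ in range(steps):
        node = rng.choice(adj[node])
        seen.add(node)
    return len(seen)

def cover_time(adj, start, rng):
    """Steps needed by a classical random walk to visit every node at least once."""
    node, seen, steps = start, {start}, 0
    while len(seen) < len(adj):
        node = rng.choice(adj[node])
        seen.add(node)
        steps += 1
    return steps

def mean_cover_time(adj, start, runs=1000, seed=0):
    """Monte Carlo estimate of the expected cover time from `start`."""
    rng = random.Random(seed)
    return sum(cover_time(adj, start, rng) for _ in range(runs)) / runs


# Ring of 6 nodes: the expected cover time is 6 * 5 / 2 = 15 steps.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(mean_cover_time(ring, 0, runs=2000))
```

Running `distinct_visited` with `steps` set to an estimated $t^*$ on a large network would quantify how small the region explored at the crossover actually is.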
Summing up, in this work we reported on the discovery of an intimate connection between the structure of a network, the properties of time-series extracted from it, and the capability of such time-series to carry useful information about the overall organization of the network. We have shown that the characteristic exponents corresponding to degree, node clustering coefficient and average degree of first neighbours can be used to cluster networks, and to distinguish social, collaboration, biological, information, transportation and spatial networks only by looking at their structure. The procedure described in this work is quite general, and can be used to extract characteristic exponents corresponding to any desired node or link property, thus allowing for a finer and more accurate classification of complex networks.
Acknowledgements. VN and VL acknowledge support from the EU-LASAGNE Project, Contract No. 318132 (STREP), funded by the European Commission. MDD is supported by the FET-Proactive project PLEXMATH (FP7-ICT-2011-8; grant number 317614) and MULTIPLEX (317532), funded by the European Commission.
- (1) Strogatz, S. H. Nature 410, 268 (2001).
- (2) Albert, R. and Barabasi, A.-L. Rev. Mod. Phys. 74, 47 (2002).
- (3) Newman, M. E. J. SIAM Review 45, 167 (2003).
- (4) Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. and Hwang, D.-U. Phys. Rep. 424, 175 (2006).
- (5) Barrat, A., Barthélemy, M. and Vespignani, A. Dynamical Processes on Complex Networks, Cambridge University Press (2008).
- (6) Newman, M. E. J. Networks: An Introduction, Oxford University Press (2010).
- (7) Zhang, J. and Small, M. Phys. Rev. Lett. 96, 238701 (2006).
- (8) Xu, X., Zhang, J. and Small, M. Proc. Natl. Acad. Sci. USA 105, 19601 (2008).
- (9) Donner, R. V., Zou, Y., Donges, J. F., Marwan, N. and Kurths, J. New J. Phys. 12, 033025 (2010).
- (10) Shirazi, A. H., Jafari, G. R., Davoudi, J., Peinke, J., Tabar, M. R. R. and Sahimi, M. J. Stat. Mech. 2009, P07046 (2009).
- (11) Campanharo, A. S. L. O., Sirer, M. I., Malmgren, R. D., Ramos, F. M. and Amaral, L. A. N. PLoS ONE 6, e23378 (2011).
- (12) Lacasa, L., Luque, B., Ballesteros, F., Luque, J. and Nuño, J. C. Proc. Natl. Acad. Sci. USA 105, 4972 (2008).
- (13) Luque, B., Lacasa, L., Ballesteros, F. and Luque, J. Phys. Rev. E 80, 046103 (2009).
- (14) Lacasa, L. and Toral, R. Phys. Rev. E 82, 036120 (2010).
- (15) Donner, R. V., Small, M. M., Donges, J. F., Marwan, N., Zou, Y., Xiang, R. and Kurths, J. Int. J. Bifurcat. Chaos 21, 1019 (2011).
- (16) Nuñez, A., Lacasa, L., Valero, E., Gómez, J. P. and Luque, B. Int. J. Bifurcat. Chaos 22, 1250160 (2012).
- (17) Tadić, B. and Thurner, S. Physica A 332, 566 (2004).
- (18) Shimada, Y., Ikeguchi, T. and Shigehara, T. Phys. Rev. Lett. 109, 158701 (2012).
- (19) Lacasa, L. and Gómez-Gardeñes, J. Phys. Rev. Lett. 110, 168703 (2013).
- (20) Cover, T. M. and Thomas, J. A. Elements of Information Theory Wiley (1991).
- (21) Pastor-Satorras, R., Vazquez, A. and Vespignani, A. Phys. Rev. Lett. 87, 258701 (2001).
- (22) Newman, M. E. J. Phys. Rev. E 64, 016131 (2001).
- (23) In disassortative networks the neighbours of a hub tend to have small degree, while the neighbours of small-degree nodes tend to have large degree. Thus, the degrees of nodes whose distance is a multiple of two tend to be positively correlated.
- (24) Schreiber, T. and Schmitz, A. Phys. Rev. Lett. 77, 635 (1996).
- (25) De Domenico, M. and Latora, V. Europhys. Lett. 91, 30005 (2010).
- (26) Kantelhardt, J. W., Zschiegner, S. A., Koscielny-Bunde, E., Havlin, S., Bunde, A. and Stanley, H. E. Physica A 316, 87 (2002).
- (27) Hurst, H. E. Trans. Am. Soc. Civil Engrs. 116, 770 (1951).
- (28) Hurst, H. E. Proc. Am. Soc. Civil Engrs. 5, 519 (1956).
- (29) Heneghan, C. and McDarby, G. Phys. Rev. E 62, 6103 (2000).
- (30) For each of the networks and for each node property we constructed several long time series. Each characteristic exponent was obtained as the average of the fits of the structure functions of the corresponding time series. The standard deviation of the values of the characteristic exponents was always a small fraction of the corresponding mean value.
- (31) Colizza, V., Pastor-Satorras, R. and Vespignani, A. Nat. Phys. 3, 276 (2007).
- (32) Sun, S., Ling, L., Zhang, N., Li, G. F. and Chen, R. Nucleic Acids Res. 31, 2443 (2003).
- (33) Watts, D. J. and Strogatz, S. H. Nature 393, 440 (1998).
- (34) Gehrke, J., Ginsparg, P. and Kleinberg, J. M. SIGKDD Explorations 5, 149 (2003).
- (35) Leskovec, J., Lang, K., Dasgupta, A. and Mahoney, M. Internet Math. 6, 29 (2009).
- (36) Daqing, L., Kosmidis, K., Bunde, A. and Havlin, S. Nat. Phys. 7, 481 (2011).
- (37) Onnela, J.-P., Fenn, D. J., Reid, S., Porter, M. A., Mucha, P. J., Fricker, M. D. and Jones, N. S. Phys. Rev. E 86, 036104 (2012).
- (38) Estrada, E. Phys. Rev. E 75, 016103 (2007).
- (39) Leskovec, J., Kleinberg, J. and Faloutsos, C. “Graph Evolution: Densification and Shrinking Diameters”, ACM Transactions on Knowledge Discovery from Data (ACM TKDD) 1 (2007).
- (40) Boguñá, M., Pastor-Satorras, R., Díaz-Guilera, A. and Arenas, A. Phys. Rev. E 70, 056122 (2004).
- (41) Guimerà, R., Danon, L., Díaz-Guilera, A., Giralt, F. and Arenas, A. Phys. Rev. E 68, 065103 (2003).
- (42) Richardson, M., Agrawal, R. and Domingos, P. “Trust Management for the Semantic Web”, in Proceedings of The Semantic Web - ISWC2003, Lecture Notes in Computer Science 2870, 351 (2003).
- (43) Cho, E., Myers, S. A. and Leskovec, J. “Friendship and Mobility: User Movement in Location-Based Social Networks”, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2011), 1082 (2011).
- (44) Colizza, V., Flammini, A., Maritan, A. and Vespignani, A. Physica A 352, 1 (2005).
- (45) Leskovec, J., Kleinberg, J. and Faloutsos, C. “Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations”, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2005), 177 (2005).
- (46) COSIN web page http://www.cosin.org
- (47) Ripeanu, M., Foster, I. and Iamnitchi, A. IEEE Internet Comput. 6, 50 (2002).