Cover time for random walks on arbitrary complex networks
We present an analytical method for computing the mean cover time of a random walk process on arbitrary, complex networks. The cover time is defined as the time a random walker requires to visit every node in the network at least once. This quantity is particularly important for random search processes and target localization in network topologies. Using the global mean first passage times of the target nodes, we derive an estimate for the cumulative distribution function of the cover time. We show that our result can be applied to various model networks, including Erdős–Rényi and Barabási–Albert networks, as well as various real-world networks. Our results reveal an intimate link between first passage and cover time statistics in networks in which structurally induced temporal correlations decay quickly, and they offer a computationally efficient way of estimating cover times in network-related applications.
Random walks have been studied extensively for more than a century and have emerged as an efficient descriptive model for spreading and diffusion processes in physics, biology, social sciences, epidemiology, and computer science [1-6]. Because of their wide applicability and relevance to dynamic phenomena, random walk processes on complex networks have become a topic of interest [1]. In this context, the calculation of a resistor network’s total resistance [5], synchronization phenomena in networks of coupled oscillators [4], the global spread of infectious diseases on the global air traffic network [7] and the ranking of single websites in the world wide web [8] are just a few examples of systems that have been investigated based on concepts derived from random walk theory.
Especially important is the understanding of temporal aspects of stochastic processes and of how different network structures influence the equilibration process. Consequently, a lot of theoretical work has focused on understanding the connection between network structure and relaxation time scales or first passage times (FPTs), i.e. the time it takes a single walker to travel from one node to another. Relaxation times and first passage times quantify different aspects of the process, but neither captures the characteristic time a walker requires on average to visit every node in a network. This time is the cover time of the process. It has important practical applications from biology to computer science, for instance for estimating how long it will take to distribute a chemical or a certain commodity to every node in a network. Researchers have been able to obtain analytical, asymptotic results for the cover time of model networks, e.g. complete graphs, Erdős–Rényi (ER) and Barabási–Albert (BA) networks. Despite these results, analytical results, approximations or heuristics concerning the mean cover time of real-world networks are lacking. It is also unclear how a real-world network’s mean cover time is related to other temporal features of random walks on networks and how higher-order structures of a real network may impact the mean cover time.
In the following, we present a theoretical approach that predicts the cover time on arbitrary complex networks using only FPT statistics. We show that for networks on which random walks equilibrate quickly, the cover time can be estimated accurately by the maximum of a set of FPTs drawn from the ensemble of FPT distributions of all target nodes. Our method’s predictions are in excellent agreement with results provided by computer simulations for a variety of real-world networks, as well as for ER networks, BA networks, complete graphs and the configuration model (uniform degree random networks). We furthermore show that our method produces results with significant deviations for lattice-like networks (e.g. subway networks [9] and two-dimensional regular lattices) that violate the assumption of rapid relaxation.
II.1 Random walks and first passage times
We investigate a simple discrete time random walk on complex networks. Given an unweighted, undirected network composed of $N$ nodes, $m$ links and adjacency matrix $A_{ij}$, with $A_{ij}=1$ if nodes $i$ and $j$ are connected and $A_{ij}=0$ otherwise, a walker starts at $t=0$ on any node $u$. Then it chooses one of $u$’s neighbors $v$ at random and jumps to $v$, completing the transition from time $t$ to time $t+1$. This process is repeated indefinitely and governed by the master equation
$$P_i(t+1) = \sum_{j=1}^{N} \frac{A_{ij}}{k_j} P_j(t) = \sum_{j=1}^{N} W_{ij} P_j(t),$$
where $P_i(t)$ is the probability that the walker is at node $i$ at time $t$, $k_j = \sum_{i} A_{ij}$ is the degree of node $j$ and $W_{ij} = A_{ij}/k_j$ is the transition probability of the walker going from node $j$ to node $i$ in one time step. We assume that the network has a single component, so every node can be reached, in principle, from every other node. Under relatively general conditions this process will approach the equilibrium
$$P_i^{\star} = \frac{k_i}{2m}.$$
Central questions for random walks are often connected to first passage times (FPTs), e.g. the mean first passage time (MFPT) $\tau_{uv}$ between two nodes $u$ and $v$. This time is defined as the mean number of steps it takes a random walker starting at node $u$ to first arrive at node $v$. Another important quantity is the global mean first passage time (GMFPT) of node $v$,
$$\tau_v = \frac{1}{N-1} \sum_{u \neq v} \tau_{uv}. \tag{1}$$
The GMFPT can be used as a measure of centrality for node $v$, since a node that is quickly reachable from the whole network may be interpreted as “important”. These passage times are well studied and can often be computed efficiently from network properties. One of those is the graph Laplacian, which emerges naturally when investigating diffusion on networks and is defined as
$$L_{ij} = k_i \delta_{ij} - A_{ij},$$
where $\delta_{ij}$ denotes Kronecker’s delta. Using this operator, the MFPT between two nodes can be computed by inversion of a reduced Laplacian (note that the operator above is singular and hence not invertible) [5]. A method described in [10] computes the GMFPT from the graph Laplacian’s spectrum. Given its eigenvalues $0 = \lambda_1 < \lambda_2 \leq \dots \leq \lambda_N$ in increasing order and corresponding orthonormal eigenvectors $\mu_i$ with components $\mu_{iv}$, one can compute node $v$’s exact GMFPT as
$$\tau_v = \frac{N}{N-1} \sum_{i=2}^{N} \frac{1}{\lambda_i} \Big( 2m\, \mu_{iv}^2 - \mu_{iv} \sum_{u=1}^{N} k_u \mu_{iu} \Big). \tag{2}$$
A computationally more efficient method estimates the GMFPT by its lower bound, which is given by
$$\tau_v \geq \frac{2m}{k_v}, \tag{3}$$
if the process relaxes quickly, e.g. in a time $t \ll N$, as derived in [11]. In the same study it was shown that this result holds remarkably well for Erdős–Rényi (ER) and Barabási–Albert (BA) networks, as well as for a variety of real-world networks.
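The two routes to the GMFPT described above can be compared on a small graph. The following Python sketch is an illustration under stated assumptions, not the implementation used in the cited works: it assumes the unnormalized Laplacian $L = D - A$, a spectral expression obtained from the Laplacian pseudo-inverse, and a degree-based value of the form $2m/k_v$. All function names and the example graph are hypothetical.

```python
import numpy as np


def gmfpt_spectral(A):
    """GMFPT of every node from the spectrum of L = D - A (connected graph assumed)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k = A.sum(axis=1)                      # degrees
    two_m = k.sum()                        # twice the number of links
    lam, mu = np.linalg.eigh(np.diag(k) - A)
    lam, mu = lam[1:], mu[:, 1:]           # drop the zero mode; mu[v, i] is component v of mode i
    proj = k @ mu                          # sum_u k_u mu_iu for every mode i
    return n / (n - 1) * ((two_m * mu**2 - mu * proj) / lam).sum(axis=1)


def gmfpt_direct(A):
    """GMFPT of every node by solving the hitting-time equations target by target."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    W = A / A.sum(axis=1)[:, None]         # W[u, w]: probability of stepping from u to w
    tau = np.empty(n)
    for v in range(n):
        keep = np.arange(n) != v
        # For u != v: h_u = 1 + sum_{w != v} W[u, w] h_w
        h = np.linalg.solve(np.eye(n - 1) - W[np.ix_(keep, keep)], np.ones(n - 1))
        tau[v] = h.mean()                  # average over the N - 1 start nodes
    return tau


def gmfpt_degree_estimate(A):
    """Degree-based value 2m / k_v (assumed form of the quick estimate)."""
    k = np.asarray(A, dtype=float).sum(axis=1)
    return k.sum() / k


# Hypothetical example graph: a triangle with a two-node tail.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
A = np.zeros((5, 5))
for a, b in edges:
    A[a, b] = A[b, a] = 1.0

print(gmfpt_spectral(A))
print(gmfpt_direct(A))
print(gmfpt_degree_estimate(A))
```

The spectral and the direct route should agree to numerical precision, while the degree-based value only needs the degree sequence and can deviate noticeably on small or sparse graphs.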
Another temporal characteristic with practical relevance is the mean cover time $\langle T_u \rangle$, defined as the mean number of steps it takes a random walker starting at node $u$ to visit every other node at least once. For various network models, simple heuristics concerning the asymptotic scaling of the mean cover time with growing node number $N$ have been derived [12-14]. For ER, BA and fully connected graphs it was shown that
$$\langle T \rangle \sim c\, N \log N, \tag{4}$$
with a network-specific prefactor $c$. Such scaling relationships are useful for comparative analyses, e.g. when networks of different sizes from the same class are compared. They are less helpful when actual expected cover times need to be computed for empirical networks where $N$ is fixed and comparative or relative statements are insufficient.
Unfortunately, a general procedure for estimating the actual cover time of arbitrary complex networks is lacking, as is a connection between the mean cover time and FPT observables. In the following we present a method that estimates the cover time using passage time statistics.
II.2 Cover Time
Recently it has been found that if a random walk process equilibrates quickly, i.e. the initial concentration of random walkers approaches the equilibrium concentration in a small number of time steps, the information about the start node is lost [11] and the first passage time $t$ at destination $v$ is, for large times, distributed asymptotically according to
$$p_v(t) = \frac{1}{\tau_v} \exp\left(-\frac{t}{\tau_v}\right), \tag{5}$$
where $\tau_v$ is the GMFPT of Eq. (1). The GMFPT $\tau_v$ can differ between nodes and depends on the topological features of the network only. Note that in the following paragraphs we will often refer to the FPT decay rate $\beta_v = \tau_v^{-1}$ instead of the GMFPT, simplifying the notation.
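In order to find the mean cover time from the collection of distributions $p_v(t)$ we proceed as illustrated in Fig. 1. Excluding the starting node $u$, we pick an FPT $t_v$ for each target node $v$ from its respective FPT distribution at random, resulting in a set of $N-1$ FPTs that we call $\mathcal{T}_u$. Consequently, the cover time $T_u$ is given as the maximum element of $\mathcal{T}_u$. In order to find the distribution of this maximum, we compute the probability that the maximum of this set does not exceed a time $t$, which is the probability that no element of $\mathcal{T}_u$ is larger than $t$, yielding
$$P(T_u \leq t) = \prod_{v \neq u} P(t_v \leq t). \tag{6}$$
We further approximate our result by assuming a continuous time distribution, easing the computations without significantly changing the outcome. Then, the probability that any time $t_v$ is lower than or equal to $t$ is
$$P(t_v \leq t) = \int_0^t dt'\, \beta_v e^{-\beta_v t'} = 1 - e^{-\beta_v t}, \tag{7}$$
and the expected cover time
$$\langle T_u \rangle = \int_0^\infty dt\, \big[ 1 - P(T_u \leq t) \big] = \int_0^\infty dt\, \Big[ 1 - \prod_{v \neq u} \big( 1 - e^{-\beta_v t} \big) \Big]. \tag{8}$$
We can find the global mean cover time by averaging over start nodes as
$$\langle T \rangle = \frac{1}{N} \sum_{u=1}^{N} \langle T_u \rangle.$$
However, as argued in App. B, without introducing too much error for $N \gg 1$, we will make use of a simpler integral to find
$$\langle T \rangle = \int_0^\infty dt\, \Big[ 1 - \prod_{v=1}^{N} \big( 1 - e^{-\beta_v t} \big) \Big] = \sum_{S \in \mathcal{P}(V)} \frac{(-1)^{|S|+1}}{\sum_{v \in S} \beta_v}. \tag{9}$$
Here, $V$ is the set of all nodes and $\mathcal{P}(V)$ is the set of all possible subsets of $V$ excluding the empty set. Even though one can solve the integral Eq. (9) analytically to obtain the result above, in practice it is more feasible to solve the integral numerically than to iterate over $\mathcal{P}(V)$, which has $2^N - 1$ elements and hence becomes very large rather quickly.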
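To illustrate the trade-off between the subset sum and the numerical integral, the following Python sketch evaluates both for a handful of decay rates. It is a minimal example under assumptions: the integrand is taken to be one minus the product of $(1 - e^{-\beta_v t})$ over all nodes, the decay rates are made-up numbers, and the trapezoidal rule with a fixed cutoff is just one convenient choice.

```python
import math
from itertools import combinations


def cover_time_subset_sum(rates):
    """Sum over all non-empty subsets S of (-1)^(|S|+1) / (sum of the rates in S)."""
    total = 0.0
    for size in range(1, len(rates) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(rates, size):
            total += sign / sum(subset)
    return total


def cover_time_integral(rates, steps=200_000, decay_lengths=40.0):
    """Trapezoidal estimate of the integral of 1 - prod_v (1 - exp(-rate_v * t))."""
    t_max = decay_lengths / min(rates)       # assumed cutoff: slowest mode has decayed to e^-40
    dt = t_max / steps

    def integrand(t):
        prod = 1.0
        for rate in rates:
            prod *= -math.expm1(-rate * t)   # equals 1 - exp(-rate * t)
        return 1.0 - prod

    total = 0.5 * (integrand(0.0) + integrand(t_max))
    for step in range(1, steps):
        total += integrand(step * dt)
    return total * dt


rates = [0.5, 0.25, 0.1]                     # hypothetical decay rates
exact = cover_time_subset_sum(rates)         # 2**3 - 1 = 7 terms
approx = cover_time_integral(rates)
print(exact, approx)
```

The subset sum needs $2^N - 1$ terms and is only practical for very small $N$, whereas the cost of the integral grows linearly with the number of rates for a fixed number of integration steps.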
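Now, the estimation of the global mean cover time reduces to an efficient estimation of the FPT decay rates $\beta_v$. There are two ways to estimate the decay rates from the GMFPTs, as described in Sec. II.1. First, using the lower bound Eq. (3), the estimated global mean cover time is given as
$$\langle T \rangle_{\mathrm{est}} = \int_0^\infty dt\, \Big[ 1 - \prod_{v=1}^{N} \big( 1 - e^{-k_v t/(2m)} \big) \Big]. \tag{10}$$
The advantage of this method is that only the network’s degree sequence needs to be known in order to estimate the global mean cover time. However, since it is built on the lower bound of the GMFPTs, this method can only account for a lower bound. Second, we can compute the exact GMFPTs using Eq. (2). Then the computed global mean cover time is
$$\langle T \rangle_{\mathrm{comp}} = \int_0^\infty dt\, \Big[ 1 - \prod_{v=1}^{N} \big( 1 - e^{-t/\tau_v} \big) \Big]. \tag{11}$$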
| Network | $N$ | $\langle k \rangle$ | Cover time (simulation) | Estimate, Eq. (10) | Rel. error | Computed, Eq. (11) | Rel. error |
|---|---|---|---|---|---|---|---|
| First author’s Facebook friends network [15] | 329 | 11.9 | 11.61 | 8.45 | 0.37 | 12.36 | 0.061 |
| C. elegans neural network [16] | 297 | 14.6 | 8.64 | 9.15 | 0.06 | 8.69 | 0.006 |
| E. coli protein interaction [17] | 329 | 2.8 | 5.57 | 4.27 | 0.30 | 7.24 | 0.231 |
| Intra-org. contacts - Cons. (info) [18] | 43 | 15.3 | 2.34 | 2.41 | 0.03 | 2.38 | 0.018 |
| Intra-org. contacts - Cons. (value) | 44 | 16.0 | 2.00 | 2.02 | 0.01 | 2.07 | 0.036 |
| Intra-org. contacts - Manuf. (awareness) | 77 | 25.5 | 3.39 | 3.47 | 0.02 | 3.46 | 0.021 |
| Intra-org. contacts - Manuf. (info) | 76 | 23.3 | 2.35 | 2.29 | 0.03 | 2.37 | 0.009 |
| Social interaction in dolphins [19] | 62 | 5.1 | 4.79 | 4.46 | 0.07 | 4.86 | 0.015 |
| American college football [20] | 115 | 10.7 | 1.37 | 1.27 | 0.08 | 1.40 | 0.017 |
| Food web of grassland species [21] | 75 | 3.0 | 4.66 | 3.97 | 0.17 | 5.17 | 0.099 |
| Zachary’s Karate club [22] | 34 | 4.5 | 3.01 | 3.29 | 0.09 | 3.05 | 0.015 |
| Interactions in “Les Misérables” [23] | 77 | 6.6 | 6.75 | 6.21 | 0.09 | 7.21 | 0.063 |
| Matches of the NFL 2009 [24] | 32 | 13.2 | 1.20 | 1.27 | 0.06 | 1.21 | 0.015 |
| Network of associations between terrorists [25] | 62 | 4.9 | 4.63 | 4.47 | 0.04 | 4.87 | 0.049 |
| Connections between 500 largest US airports [26] | 500 | 11.9 | 12.29 | 10.30 | 0.19 | 13.18 | 0.067 |
II.3 Cover time of networks with equal GMFPTs
Let us consider a network where all nodes have approximately the same GMFPT $\tau$ and the structure is sufficiently random that we can estimate the mean cover time using Eq. (6). We find
$$\langle T \rangle = \int_0^\infty dt\, \Big[ 1 - \big( 1 - e^{-t/\tau} \big)^{N-1} \Big] = \tau \left[ \gamma + \psi(N) \right], \tag{12}$$
where $\gamma$ is the Euler-Mascheroni constant and $\psi$ the polygamma function of order zero (the digamma function).
An example of networks fulfilling the conditions above is the configuration model where all nodes have identical degree. This includes, e.g., the complete graph. The cover time of the complete graph scales as $N \log N$ for large $N$ [14], a result which is reproduced by Eq. (12) since the GMFPT for each node is $\tau = N - 1$ (see App. A) and $\psi(N) \approx \log N$ for large $N$.
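The complete-graph case can be checked with a few lines of Python. The sketch below is an illustration under assumptions: it takes the equal-GMFPT estimate to be the mean of the maximum of $N-1$ independent exponential FPTs with mean $\tau$, which equals $\tau H_{N-1}$ with the harmonic number $H_{N-1} = \gamma + \psi(N)$, and it compares this with the coupon-collector result for the complete graph. The function names are hypothetical.

```python
import math


def harmonic(n):
    """Harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return math.fsum(1.0 / j for j in range(1, n + 1))


def equal_gmfpt_cover_time(tau, n_nodes):
    """Mean of the maximum of n_nodes - 1 i.i.d. exponential FPTs with mean tau."""
    return tau * harmonic(n_nodes - 1)


def complete_graph_cover_time(n_nodes):
    """Mean cover time on the complete graph from the coupon-collector argument."""
    # With j nodes still unvisited, a step reaches a new node with probability j / (N - 1).
    return sum((n_nodes - 1) / j for j in range(1, n_nodes))


for n in (10, 1000, 100_000):
    estimate = equal_gmfpt_cover_time(tau=n - 1, n_nodes=n)
    # The ratio to N log N approaches one slowly, since the correction is of order 1 / log N.
    print(n, estimate, estimate / (n * math.log(n)))
```

Under the stated assumption the estimate coincides with the coupon-collector value for the complete graph, and the ratio to $N \log N$ tends to one only logarithmically.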
We compared the predictions of Eqs. (10) and (11) with simulation results for single component ER, BA and real-world networks, as well as Eq. (12) for random networks with a strongly peaked degree distribution. On every node $u$ we placed a walker at time $t=0$. Subsequently, we let each walker perform a random walk as described in Sec. II.1. Each walker proceeded until it had visited each node at least once, completing total coverage and marking the cover time $T_u$. Then $\langle T \rangle$ was computed as the average of all $T_u$. For a more detailed description of the numerical methods as well as the code used, see App. C.
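The simulation procedure described above can be sketched in plain Python. This is a minimal illustration, not the package referenced in App. C: the adjacency-list representation, the function names, the number of runs and the seed are all assumptions made for the example, and the graph is assumed to be connected (otherwise the walk would never terminate).

```python
import random


def cover_time(neighbors, start, rng):
    """Number of steps until a walker starting at `start` has visited every node."""
    visited = {start}
    node, steps = start, 0
    while len(visited) < len(neighbors):
        node = rng.choice(neighbors[node])   # jump to a uniformly chosen neighbor
        visited.add(node)
        steps += 1
    return steps


def global_mean_cover_time(neighbors, runs_per_node=1000, seed=0):
    """Average cover time over all start nodes, with `runs_per_node` walks per node."""
    rng = random.Random(seed)
    total = 0
    for start in range(len(neighbors)):
        for _ in range(runs_per_node):
            total += cover_time(neighbors, start, rng)
    return total / (runs_per_node * len(neighbors))


# Hypothetical test case: complete graph with 6 nodes, stored as adjacency lists.
n = 6
complete = [[v for v in range(n) if v != u] for u in range(n)]
print(global_mean_cover_time(complete))
```

For the complete graph the result can be compared with the coupon-collector value $(N-1)\sum_{j=1}^{N-1} 1/j$, which is a convenient sanity check before running the procedure on larger networks.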
For both ER and BA networks, we generated networks with $N$ nodes, ER networks with integer mean degree $\langle k \rangle$, and BA networks parametrized by the number of new links per node at creation. In order to test Eq. (12), we generated networks using the configuration model with $N$ nodes and a uniform degree sequence $k_v = k$, scanning integer degrees $k$. After creation of each network, we extracted the giant component, deleted self-links and multiple occurrences of unique links, ran the random walks and estimated the cover time using Eqs. (10), (11) and Eq. (12), respectively, for 1000 networks each. The theoretical results are in excellent agreement with the simulation results, as can be seen in Fig. 2. The relative error made by our method decreases with increasing number of nodes as well as increasing mean degree and quickly reaches values below 1%. Unsurprisingly, our method performs better than the results of [12, 13] due to the asymptotic nature of the latter.
We furthermore simulated random walks on the giant component of 15 real-world networks, listed in Tab. 1. Some of those are directed networks that we converted to undirected networks by converting every directed link to an undirected link. For weighted networks we assigned an undirected link based on the link weight. For the intra-organizational networks from [18], employees had to fill out questionnaires regarding their relationships to co-workers. Here, we assigned an undirected link if both $i$ and $j$ marked something other than “I do not know this person”. As can be seen in Tab. 1, our method produces results that are very close to the simulated values (relative errors mostly below 0.1 when exact GMFPTs are used). One exception is the computed cover time for the E. coli protein interaction network [17], with a relatively high relative error of 0.231. Generally, the more exact GMFPTs calculated via the graph Laplacian give results with lower relative error than the lower bound GMFPTs, as expected.
Additionally, we performed simulations on two-dimensional square lattices of $N$ nodes. For those networks the GMFPT is roughly equal for all nodes; however, information about the starting node is very important for picking the right FPT of targets. Simply picking one FPT from a distribution is not a valid procedure anymore, due to the high correlation of FPTs with lattice distance. Hence, we expect that our method will not perform well for lattice-like networks. Indeed, as can be seen in Fig. 4, the relative error between simulation and theoretical estimates increases with increasing $N$, both for the estimated GMFPTs and for the exact GMFPTs. Similar results are obtained for lattice-like real-world networks such as subway networks [9] (shown in Tab. 2 and Fig. 3). Here, the estimate from estimated GMFPTs systematically underestimates the cover time, while using exact GMFPTs yields an overestimation of the cover time.
We studied the cover time of simple discrete time random walks on complex networks with $N$ nodes. Treating each target node as independent of the start node, we were able to find the cumulative distribution function of the cover time as that of the maximum of FPTs drawn from the target nodes’ FPT distributions, which depend solely on their GMFPTs. With this method, the complexity of finding the mean cover time of an arbitrary complex network is greatly reduced: the problem is practically reduced to finding the nodes’ GMFPTs using simple estimates or spectral methods, which is computationally much cheaper than simulating random walks starting on every node, especially for large networks.
We showed that this procedure yields reliable estimates of the mean cover time for a variety of networks on which random walks relax quickly, namely ER networks, BA networks, random graphs with strongly peaked degree distributions, and a collection of real-world networks. We furthermore showed that for lattice-like networks our method does not produce reliable results, since the information about the start node is important for picking the right FPT from the FPT distribution of the target nodes. While the spatial correlation of nodes is responsible for this deviation, it remains unclear how to determine the conditions under which our method is safely applicable. This is a task for future investigations.
Finally, note that even though we derived our results for unweighted networks, they can be easily made applicable to weighted networks and subclasses of directed networks, as they only depend on the calculation of GMFPTs. Those, in turn, depend solely on the transition matrix which is similarly defined for weighted and directed networks.
Appendix A The GMFPT of a complete graph
Suppose a random walker starts at any node $u$. The probability to reach any other node of the network in one time step is $p = 1/(N-1)$. Looking at a single target node $v$, we want to calculate the probability that $v$ is first reached at time $t$, which is given as
$$p_v(t) = p\,(1-p)^{t-1}.$$
Hence, the GMFPT for every target node is
$$\tau = \sum_{t=1}^{\infty} t\, p\,(1-p)^{t-1} = \frac{1}{p} = N - 1.$$
Appendix B Approximation of mean cover time integral
defining . Note that the cover time cdf is given by Eq. (8), s.t. both
meaning that for both integration limits, the integrand approaches 0. In the following we assume that the distribution of decay rates is relatively homogeneous in the region of small rates (implying that there are some rates that are of the same order as ). This is a relatively safe assumption for most network models and real-world networks as in most cases there are more nodes with small degree (hence small decay rates) than nodes with high degree (hence high decay rates). Now suppose the integration approaches a time where , implying that there are still some terms , such that Furthermore, there will already be a majority of terms which leads to approaching . Hence, we can safely assume that for a network with a larger number of nodes the integrand approaches zero at all times while the global mean cover time grows quickly and thus the relative error of Eq. (9) is approaching
Appendix C Simulations and numerical evaluations of the mean cover time
The code used for the random walk simulations is a standard implementation of discrete time random walks on networks and available online as a C++/Python/Matlab package, see https://github.com/benmaier/cNetworkDiff.
In order to evaluate the mean cover time, the integrals Eqs. (10) and (11) were solved numerically. Note that for the numerical integration, an upper integration bound of infinity can be problematic when the decay region of the integrand is unknown. Hence, we chose a finite upper integration limit. The code is available online as a Python package, see https://github.com/benmaier/nwDiff. For solving the integral Eq. (11) we computed the GMFPT of each node via Eq. (2), with eigenvalues and eigenvectors computed using the NumPy implementation [27] of the standard algorithm for eigenvalue and eigenvector computation of Hermitian matrices [28].
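One possible way to pick a finite upper limit is sketched below in Python. It is not the procedure of the referenced package: the decay rates are assumed to take the degree-based form $k_v/(2m)$, the cutoff rule uses the bound that the integrand is at most $N e^{-\beta_{\min} t}$, and the tolerance, step count and function name are hypothetical choices.

```python
import math


def cover_time_from_degrees(degrees, eps=1e-10, steps=20_000):
    """Estimate the mean cover time from a degree sequence by Simpson integration."""
    two_m = sum(degrees)
    rates = [k / two_m for k in degrees]     # assumed decay rates k_v / (2m)
    # 1 - prod_v (1 - x_v) <= sum_v x_v <= N * exp(-min_rate * t): stop once this is below eps.
    t_max = math.log(len(rates) / eps) / min(rates)
    dt = t_max / steps                       # steps must be even for Simpson's rule

    def integrand(t):
        prod = 1.0
        for rate in rates:
            prod *= -math.expm1(-rate * t)   # equals 1 - exp(-rate * t)
        return 1.0 - prod

    total = integrand(0.0) + integrand(t_max)
    for step in range(1, steps):
        weight = 4.0 if step % 2 == 1 else 2.0
        total += weight * integrand(step * dt)
    return total * dt / 3.0


# Hypothetical example: a 3-regular network with 10 nodes, so every rate is 1/10.
print(cover_time_from_degrees([3] * 10))
```

For a regular degree sequence all rates are equal to $1/N$, so the result can be checked against $N$ times the $N$-th harmonic number.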
- (1) N. Masuda, M. A. Porter, and R. Lambiotte, “Random walks and diffusion on networks,” arXiv:1612.03281 [cond-mat, physics:physics], Dec. 2016.
- (2) H. C. Berg, Random walks in biology. Princeton, NJ: Princeton University Press, 1993.
- (3) J. Klafter and I. M. Sokolov, First steps in random walks: from tools to applications. Oxford; New York: Oxford University Press, 2011.
- (4) A. Barrat, M. Barthelemy, and A. Vespignani, Dynamical processes on complex networks. Cambridge, UK; New York: Cambridge University Press, 2008.
- (5) M. Newman, Networks: An Introduction. New York, NY, USA: Oxford University Press, Inc., 2010.
- (6) B. Oksendal, Stochastic Differential Equations: An Introduction with Applications. Universitext, Berlin: Springer, 1992.
- (7) F. Iannelli, A. Koher, D. Brockmann, P. Hövel, and I. M. Sokolov, “Effective distances for epidemics spreading on complex networks,” Physical Review E, vol. 95, p. 012313, Jan. 2017.
- (8) L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” Technical Report 1999-66, Stanford InfoLab, Nov. 1999.
- (9) C. Roth, S. M. Kang, M. Batty, and M. Barthelemy, “A long-time limit for world subway networks,” Journal of The Royal Society Interface, 2012.
- (10) Y. Lin, A. Julaiti, and Z. Zhang, “Mean first-passage time for random walks in general graphs with a deep trap,” The Journal of Chemical Physics, vol. 137, p. 124104, Sept. 2012.
- (11) H. W. Lau and K. Y. Szeto, “Asymptotic analysis of first passage time in complex networks,” EPL (Europhysics Letters), vol. 90, no. 4, p. 40005, 2010.
- (12) C. Cooper and A. Frieze, “The cover time of sparse random graphs,” Random Structures and Algorithms, vol. 30, pp. 1–16, Jan. 2007.
- (13) C. Cooper and A. Frieze, “The cover time of the preferential attachment graph,” Journal of Combinatorial Theory, Series B, vol. 97, no. 2, pp. 269–290, 2007.
- (14) L. Lovász, “Random Walks on Graphs: A Survey,” in Combinatorics, Paul Erdős is Eighty (D. Miklós, V. T. Sós, and T. Szőnyi, eds.), vol. 2, pp. 353–398, Budapest: János Bolyai Mathematical Society, 1996.
- (15) B. F. Maier, “B.F. Maier’s FB friends network, https://github.com/benmaier/BFMaierFBnetwork,” Apr. 2017.
- (16) D. J. Watts and S. H. Strogatz, “Collective dynamics of ’small-world’ networks,” Nature, vol. 393, pp. 440–442, June 1998.
- (17) S. S. Shen-Orr, R. Milo, S. Mangan, and U. Alon, “Network motifs in the transcriptional regulation network of Escherichia coli,” Nature Genetics, vol. 31, no. 1, pp. 64–68, 2002.
- (18) R. Cross and A. Parker, The Hidden Power of Social Networks. Boston, MA: Harvard Business School Press, 2004.
- (19) D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, “The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations,” Behavioral Ecology and Sociobiology, vol. 54, pp. 396–405, Sept. 2003.
- (20) M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, pp. 7821–7826, June 2002.
- (21) H. A. Dawah, B. A. Hawkins, and M. F. Claridge, “Structure of the Parasitoid Communities of Grass-Feeding Chalcid Wasps,” Journal of Animal Ecology, vol. 64, no. 6, pp. 708–720, 1995.
- (22) W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, pp. 452–473, 1977.
- (23) D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing. Reading, MA: Addison-Wesley, 1993.
- (24) C. Aicher, A. Z. Jacobs, and A. Clauset, “Learning latent block structure in weighted networks,” Journal of Complex Networks, vol. 3, pp. 221–248, June 2015.
- (25) V. Krebs, “Mapping networks of terrorist cells,” Connections, vol. 24, pp. 43–52, 2002.
- (26) V. Colizza, R. Pastor-Satorras, and A. Vespignani, “Reaction-diffusion processes and metapopulation models in heterogeneous networks,” Nature Physics, vol. 3, pp. 276–282, Apr. 2007.
- (27) E. Jones, T. Oliphant, P. Peterson, and others, SciPy: Open source scientific tools for Python. 2001.
- (28) G. Strang, Linear algebra and its applications. Belmont, CA: Thomson, Brooks/Cole, 2nd ed., 1980.