# Distribution of shortest cycle lengths in random networks

Haggai Bonneau Racah Institute of Physics, The Hebrew University, Jerusalem 91904, Israel    Aviv Hassid Racah Institute of Physics, The Hebrew University, Jerusalem 91904, Israel    Ofer Biham Racah Institute of Physics, The Hebrew University, Jerusalem 91904, Israel    Reimer Kühn Department of Mathematics, King’s College London, Strand, London WC2R 2LS, UK    Eytan Katzav Racah Institute of Physics, The Hebrew University, Jerusalem 91904, Israel
###### Abstract

We present analytical results for the distribution of shortest cycle lengths (DSCL) in random networks. The approach is based on the relation between the DSCL and the distribution of shortest path lengths (DSPL). We apply this approach to configuration model networks, for which analytical results for the DSPL were obtained before. We first calculate the fraction of nodes in the network which reside on at least one cycle. Conditioning on being on a cycle, we provide the DSCL over ensembles of configuration model networks with degree distributions which follow a Poisson distribution (Erdős-Rényi network), degenerate distribution (random regular graph) and a power-law distribution (scale-free network). The mean and variance of the DSCL are calculated. The analytical results are found to be in very good agreement with the results of computer simulations.

###### pacs:
64.60.aq,89.75.Da

## I Introduction

Network models provide a useful conceptual framework for the study of a large variety of systems and processes in science, technology and society Havlin2010 (); Newman2010 (); Estrada2011 (); Barrat2012 (). These models consist of nodes and edges, where the nodes represent physical objects, while the edges represent the interactions between them. Unlike regular lattices in which all the nodes have the same coordination number, network models are characterized by a degree distribution, $P(k)$, with a mean degree denoted by $\langle K \rangle$. An important distinction is between networks which exhibit a narrow degree distribution (such as the Poisson distribution), and those which exhibit a broad degree distribution, which is typically a power-law distribution of the form $P(k) \sim k^{-\gamma}$. The latter networks are called scale-free networks. They exhibit some highly connected nodes, called hubs, which are essential for the integrity of these networks, and play a dominant role in dynamical processes.

While pairs of adjacent nodes exhibit direct connections, the interactions between most pairs of nodes are mediated by intermediate nodes and edges. A pair of nodes, $i$ and $j$, may be connected by many paths of different lengths. However, the distance, $\ell_{ij}$, between nodes $i$ and $j$, is given by the length of the shortest path between them. The mean distance between all pairs of nodes in a network is denoted by $\langle L \rangle$. A central feature of random networks is the small-world property, namely the fact that the mean distance scales like $\langle L \rangle \sim \ln N$, where $N$ is the network size Milgram1969 (); Watts1998 (); Chung2002 (); Chung2003 (). Moreover, it was shown that scale-free networks may be ultrasmall depending on the exponent $\gamma$. In particular, for $2 < \gamma < 3$, their mean distance scales like $\langle L \rangle \sim \ln \ln N$ Cohen2003 ().

The distribution of shortest path lengths (DSPL) between all pairs of nodes in a network is a fundamental property of the network structure. The DSPL regulates the temporal evolution of dynamical processes on networks, such as signal propagation Maayan2005 (), navigation Dijkstra1959 (); Delling2009 (); Abraham2013 () and epidemic spreading Satorras2001 (); Satorras2015 (). Properties of the DSPL have been studied in different types of networks Newman2001 (); Blondel2007 (); Dorogotsev2003 (); Hofstad2007 (); Hofstad2008 (); Esker2008 (); Shao2008 (); Shao2009 (). However, in spite of its importance it has not attracted nearly as much attention as the degree distribution.

Recently, an analytical approach was developed for calculating the DSPL Katzav2015 () in the Erdős-Rényi (ER) network, which is the simplest mathematical model of a random network Erdos1959 (); Erdos1960 (); Erdos1961 (). The study of the DSPL was later extended to other network models Nitzan2016 (); Melnik2016 (); Steinbock2017 (). Using recursion equations, analytical results for the DSPL were obtained in different regimes, including sparse and dense networks of small as well as asymptotically large sizes. The resulting distributions were found to be in good agreement with the results of computer simulations.

ER networks are random graphs which exhibit a Poisson degree distribution, with no degree-degree correlations between pairs of adjacent nodes. In fact, ER networks can be considered as a maximum entropy ensemble, under the constraint that the mean degree is fixed. Moreover, the broader class of configuration model networks generates maximum entropy ensembles under conditions in which the entire degree distribution is constrained Newman2010 (); Newman2001 (); Fronczak2004 (); Molloy1995 (); Molloy1998 (). For any given degree distribution, one can produce an ensemble of configuration model networks and perform a statistical analysis of its properties. Therefore, the configuration model provides a powerful platform for the analysis of random networks. It is the ideal model to use as a null model when one tries to analyze an empirical network of which the degree distribution is known. To this end, one constructs configuration model networks of the same size and the same degree distribution as the empirical network. Properties of interest such as the DSPL Giot2003 (), the betweenness centrality Goh2003 () and the abundance of network motifs Milo2002 (); Caldarelli2004 (); Klemm2006 () are compared between the two networks. The discrepancies provide a rigorous test of the systematic features of the empirical network versus the corresponding ensemble of random networks.

In addition to open paths between pairs of distinct nodes, networks may exhibit cycles, namely closed paths which return to their initial nodes. The length of a cycle is given by the number of edges (or nodes) which reside along the cycle. The shortest possible cycle is the triangle, of length $\ell = 3$. The longest possible cycle is a Hamiltonian cycle of length $\ell = N$. Some nodes in a network may not reside on any cycle. Other nodes may reside on one or more cycles. In the latter case, the shortest among these cycles is of particular importance. The shortest cycle on which a given node resides provides the shortest feedback loop for signals originating from that node and the strongest correlations between signals reaching the node via different links. Therefore, the distribution of shortest cycle lengths (DSCL) provides useful information on chemical networks Gleiss2001 (), biological networks Klamt2009 (), feedback processes Zanudo2017 (), oscillations Vladimirov2012 (); Goldental2015 (); Goldental2017 () and synchronization Barrat2012 () in complex networks, as well as for ranking of nodes Kerrebroeck2008 (); Giscard2017 (). Moreover, the partition functions of statistical physics models on networks can be expressed in terms of the combinatorial properties of the cycles, using high temperature expansions and low temperature expansions Yeomans1992 ().

An important class of networks consists of tree networks, in which any pair of nodes is connected by a single path. Thus, in tree networks the shortest path between any pair of nodes is the only path between them and there are no cycles. Tree structures appear in the dilute limit of random networks such as the ER network and the configuration model network, below the percolation transition. Above the percolation transition long cycles start to emerge in the giant cluster. As the network becomes more strongly connected, the size of the giant cluster increases and the cycles become more numerous and shorter.

In this paper we present analytical results for the DSCL in configuration model networks. We first calculate the probability that a random node resides on at least one cycle. We then calculate the DSCL for all the nodes which reside on at least one cycle. We apply this approach to networks with Poisson, degenerate and power-law degree distributions. It is found that the analytical results are in very good agreement with numerical simulations. Using the tail-sum formula we calculate the mean and the variance of the DSCL for these networks.

The paper is organized as follows. In Sec. II we present the configuration model. In Sec. III we consider the percolation transition and the giant cluster in configuration model networks. In Sec. IV we consider properties of the DSPL to be used in the calculation of the DSCL. In Sec. V we present analytical results for the fraction of nodes which reside on at least one cycle. In Sec. VI we present analytical results for the DSCL of configuration model networks, expressed in terms of the degree distributions and the DSPL. In Sec. VII we apply these results to ER networks, regular graphs and scale-free networks. The results are discussed in Sec. VIII and summarized in Sec. IX. In Appendix A we present the short-distance behavior of the DSPL between pairs of nodes of given degrees. In Appendices B, D and E we summarize the properties of the giant clusters in ER networks, random regular graphs and scale-free networks, respectively. In Appendix C we provide some explicit expressions for the probabilities that random nodes of given degrees reside on at least one cycle.

## II The configuration model

The configuration model is a maximum entropy ensemble of networks under the condition that the degree distribution is imposed Newman2001 (); Newman2010 (). Here we focus on the case of undirected networks, in which all the edges are bidirectional. To construct such a network of $N$ nodes, one can draw the degrees of all nodes from a desired degree distribution, $P(K=k)$, producing a degree sequence of the form $k_1, k_2, \dots, k_N$ (where $\sum_{i=1}^{N} k_i$ must be even). The mean degree over the ensemble of networks is $\langle K \rangle$. For brevity, in the rest of the paper we use a more compact notation, in which $P(K=k)$ is replaced by $P(k)$, except for a few places in which the more detailed notation is needed for clarity.

A convenient way to construct a configuration model network is to prepare the $N$ nodes such that each node, $i$, is connected to $k_i$ half edges Newman2010 (). Pairs of half edges from different nodes are then chosen randomly and are connected to each other in order to form the network. The result is a network with the desired degree sequence but no correlations. Note that towards the end of the construction the process may get stuck. This may happen in the case in which the only remaining pairs of half edges belong to the same node or to pairs of nodes which are already connected to each other. In such cases one may perform some random reconnections in order to enable completion of the construction.
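
The half-edge construction can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used for the simulations in this paper: the function name `configuration_model` and its parameters are hypothetical, and instead of the random reconnections mentioned above it simply restarts the whole pairing whenever a self-loop or a repeated edge appears.

```python
import random

def configuration_model(degrees, seed=None, max_tries=1000):
    """Pair half edges uniformly at random; restart on self-loops or multi-edges.

    degrees[i] is the prescribed degree k_i of node i.
    Returns a set of edges (i, j) with i < j.
    """
    if sum(degrees) % 2:
        raise ValueError("the sum of the degrees must be even")
    rng = random.Random(seed)
    # One entry per half edge, labelled by the node it belongs to.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    for _ in range(max_tries):
        rng.shuffle(stubs)
        edges = set()
        simple = True
        # Consecutive entries of the shuffled list form the pairs of half edges.
        for a, b in zip(stubs[::2], stubs[1::2]):
            edge = (min(a, b), max(a, b))
            if a == b or edge in edges:
                simple = False  # self-loop or repeated edge: reject this pairing
                break
            edges.add(edge)
        if simple:
            return edges
    raise RuntimeError("no simple graph found within max_tries attempts")
```

For example, `configuration_model([3] * 10, seed=1)` returns the edge set of a random regular graph with ten nodes of degree three. Restarting is simple to reason about but becomes slow for dense or highly heterogeneous degree sequences, which is where local reconnections are preferable.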

## III The percolation transition and the giant cluster

Configuration model networks generically consist of many connected components. In some cases the size of the largest component scales linearly with the network size, $N$. In such cases, the largest component is called a giant cluster. All the other components are non-extensive and are called finite or isolated components, and below are referred to as non-giant components. The size of the giant cluster is determined by the degree distribution, $P(k)$. Some families of degree distributions can be parametrized such that in a certain range of parameters there is no giant cluster, while in the complementary range there is a giant cluster. On the boundary between these two domains in the parameter space there is a phase transition, which is referred to as a percolation transition.

Consider a configuration model network of $N$ nodes with a given degree distribution $P(k)$. In this paper we will employ two different sampling procedures. The degrees of nodes which are sampled randomly from the network follow the overall degree distribution $P(k)$. However, nodes which are sampled as random neighbors of random nodes follow a modified degree distribution, which takes the form

$$
\widetilde{P}(k) = \frac{k}{\langle K \rangle} P(k). \tag{1}
$$

This is due to the fact that such nodes are selected proportionally to their degrees. Each one of these degree distributions has a generating function associated with it. The generating function of $P(k)$ is

$$
G_0(x) = \sum_{k=0}^{\infty} P(k) x^{k}, \tag{2}
$$

while the generating function of $\widetilde{P}(k)$ is

$$
G_1(x) = \sum_{k=0}^{\infty} \widetilde{P}(k) x^{k-1}. \tag{3}
$$

From the definitions of $G_0(x)$ and $G_1(x)$ in Eqs. (2) and (3), respectively, we find that $G_0(1) = 1$ and $G_1(1) = 1$. In some networks there are no isolated nodes (of degree $k=0$) and no leaf nodes (of degree $k=1$). In such networks $P(k) > 0$ only for $k \ge 2$. For these networks we find that $G_0(0) = 0$ and $G_1(0) = 0$. This implies that in such networks both $x=0$ and $x=1$ are fixed points of both $G_0(x)$ and $G_1(x)$.

In what follows we review the well known analysis of the percolation probability in configuration model networks, following Refs. Havlin2010 (); Newman2010 (). Our main motivation for doing so is that it allows us to highlight two lesser known facts about the problem, which we will need in our evaluation of the DSCL below. These concern the degree-dependent probabilities of randomly chosen nodes and randomly chosen neighbors of randomly chosen nodes to belong to the giant cluster. The probability that a random node resides on the giant cluster is denoted by $g$. In the case in which a giant cluster exists, $g > 0$, while in the case in which there is no giant cluster, $g = 0$. To obtain the probability $g$, one needs to first calculate the probability $\tilde{g}$ that a random neighbor of a random node, $i$, belongs to the giant cluster in the reduced network, which does not include the node $i$. In the thermodynamic limit, $N \rightarrow \infty$, the probability $\tilde{g}$ is given as a solution of the self-consistency equation Havlin2010 ()

$$
1 - \tilde{g} = G_1(1 - \tilde{g}). \tag{4}
$$

The left hand side of this equation is the probability that a random neighbor of a random node does not reside on the giant cluster. The right hand side represents the same quantity in terms of its neighbors, namely as the probability that none of the neighbors of such a node resides on the giant cluster. Once $\tilde{g}$ is known, the probability $g$ can be obtained from

$$
g = 1 - G_0(1 - \tilde{g}). \tag{5}
$$

This relation is based on the same consideration as Eq. (4), where the difference is that the reference node is a random node rather than a random neighbor of a random node.
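
Equations (4) and (5) are straightforward to solve numerically. The sketch below is one possible way to do so, assuming the degree distribution is supplied as a list `P` with `P[k]` the probability of degree $k$, truncated at some maximal degree; the helper names `poisson_pmf` and `giant_cluster` are illustrative. It iterates $u \leftarrow G_1(u)$ with $u = 1 - \tilde{g}$, starting from $u = 0$ so that the iteration does not sit on the trivial fixed point $u = 1$.

```python
import math

def poisson_pmf(c, kmax):
    """Poisson degree distribution P(k) for k = 0..kmax, built iteratively."""
    p = [math.exp(-c)]
    for k in range(1, kmax + 1):
        p.append(p[-1] * c / k)
    return p

def giant_cluster(P, max_iter=10000, tol=1e-14):
    """Return (g, g_tilde) from Eqs. (4) and (5) for a degree distribution P."""
    mean_k = sum(k * pk for k, pk in enumerate(P))

    def G0(x):
        return sum(pk * x**k for k, pk in enumerate(P))

    def G1(x):
        # P_tilde(k) = k P(k) / <K>; the k = 0 term vanishes and is skipped.
        return sum(k * pk * x**(k - 1) for k, pk in enumerate(P) if k > 0) / mean_k

    u = 0.0  # u = 1 - g_tilde
    for _ in range(max_iter):
        u_new = G1(u)
        converged = abs(u_new - u) < tol
        u = u_new
        if converged:
            break
    return 1.0 - G0(u), 1.0 - u

g, g_tilde = giant_cluster(poisson_pmf(2.0, 100))
print(g, g_tilde)
```

For a Poisson degree distribution the two generating functions coincide, so the two returned values agree; for other degree distributions they generally differ. Convergence becomes slow close to the percolation threshold, where a root finder would be a better choice than plain iteration.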

Below we consider the more specific case of nodes of a given degree. The probability that a random node of a given degree, $k$, resides on the giant cluster is denoted by $g_k$. Using the degree distribution, $P(k)$, the probability, $g$, that a random node of an unspecified degree resides on the giant cluster can be expressed in terms of $g_k$ by

$$
g = \sum_{k=0}^{\infty} P(k) g_k. \tag{6}
$$

A node of degree $k$ resides on the giant cluster if at least one of its neighbors resides on the giant cluster. Therefore,

$$
g_k = 1 - (1 - \tilde{g})^{k}. \tag{7}
$$

Thus, high degree nodes are more likely to reside on the giant cluster than low degree nodes. Similarly, the probability that a random neighbor of a random node resides on the giant cluster can be expressed in the form

$$
\tilde{g} = \sum_{k=0}^{\infty} \widetilde{P}(k) \tilde{g}_k, \tag{8}
$$

where $\tilde{g}_k$ is the probability that a random neighbor of a random node resides on the giant cluster, under the condition that its degree is $k$. Using similar considerations, we find that the probability $\tilde{g}_k$ is given by

$$
\tilde{g}_k = 1 - (1 - \tilde{g})^{k-1}. \tag{9}
$$

In Appendices B, D and E we apply these considerations to the analysis of the giant clusters in ER networks, random regular graphs and scale-free networks, respectively.

## IV The distribution of shortest path lengths

Consider a pair of random nodes, $i$ and $j$, in a network of $N$ nodes. Assuming that the two nodes reside on the same connected component, they may be connected to each other by a large number of paths. The distance between the two nodes is equal to the length of the shortest among these paths (possibly more than one). Below we briefly review the approach introduced in Ref. Nitzan2016 () for the calculation of the DSPL in configuration model networks of a given size, $N$, and a given degree distribution, $P(k)$. The DSPL can be expressed in the form of a tail distribution, where $P_{\rm PL}(L>\ell)$ is the probability that the shortest path length between a random pair of nodes is larger than $\ell$. The tail distribution can be expressed as a product of the form

$$
P_{\rm PL}(L>\ell) = \prod_{\ell'=1}^{\ell} P_{\rm PL}(L>\ell' | L>\ell'-1), \tag{10}
$$

where $P_{\rm PL}(L>\ell' | L>\ell'-1)$ is the conditional probability that the distance between a random pair of nodes is larger than $\ell'$ conditioned on it being larger than $\ell'-1$. In the analysis below we use different types of tail distributions for the DSPL. In Table I we summarize these distributions and list the equations from which each one of them can be evaluated.

A path of length $\ell$ from node $i$ to node $j$ can be decomposed into a single edge connecting node $i$ and node $r \in \partial_i$ (where $\partial_i$ is the set of all nodes directly connected to $i$), and a shorter path of length $\ell-1$ connecting $r$ and $j$. Thus, the existence of a path of length $\ell$ between nodes $i$ and $j$ can be ruled out if there is no path of length $\ell-1$ between any of the nodes $r \in \partial_i$, and $j$. For sufficiently large networks, the argument presented above translates into the recursion equation Nitzan2016 ()

$$
P_{\rm PL}(L>\ell | L>\ell-1) = G_0\left[ \widetilde{P}_{\rm PL}(L>\ell-1 | L>\ell-2) \right], \tag{11}
$$

where the generating function $G_0(x)$ is given by Eq. (2). Here we distinguish between the conditional probability $P_{\rm PL}(L>\ell | L>\ell-1)$ between nodes $i$ and $j$ and the probability $\widetilde{P}_{\rm PL}(L>\ell-1 | L>\ell-2)$ between a node $r \in \partial_i$ and node $j$, on the reduced network from which node $i$ was removed. The reason for this distinction is that the former probability involves two random nodes, while the latter probability involves a node, $r$, which is a random neighbor of a random node, and a random node, $j$. The conditional probability $\widetilde{P}_{\rm PL}(L>\ell | L>\ell-1)$ satisfies the recursion equation

$$
\widetilde{P}_{\rm PL}(L>\ell | L>\ell-1) = G_1\left[ \widetilde{P}_{\rm PL}(L>\ell-1 | L>\ell-2) \right], \tag{12}
$$

where $G_1(x)$ is given by Eq. (3), which is valid for $\ell \ge 2$.

The case of $\ell = 1$ deserves special attention. On a network of size $N$ (sufficiently large), the probability that two random nodes are not connected is given by Nitzan2016 ()

$$
P_{\rm PL}(L>1 | L>0) \simeq 1 - \frac{\langle K \rangle}{N-1} + O\left( \frac{1}{N^2} \right), \tag{13}
$$

while the probability that a random neighbor of a random node and a random node are not connected is given by

$$
\widetilde{P}_{\rm PL}(L>1 | L>0) \simeq 1 - \frac{\langle K^2 \rangle - \langle K \rangle}{\langle K \rangle (N-1)} + O\left( \frac{1}{N^2} \right). \tag{14}
$$

The difference between Eqs. (13) and (14) is due to the fact that the degree distribution of random neighbors of random nodes, given by Eq. (1), is generically distinct from the degree distribution of random nodes.
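
The recursion of Eqs. (10)-(14) can be iterated directly. The following sketch is an illustration under stated assumptions rather than the code used for the results of this paper: the degree distribution is passed as a truncated list `P`, the $O(1/N^2)$ corrections in Eqs. (13) and (14) are dropped, and the names `poisson_pmf` and `dspl_tail` are hypothetical.

```python
import math

def poisson_pmf(c, kmax):
    """Poisson degree distribution P(k) for k = 0..kmax, built iteratively."""
    p = [math.exp(-c)]
    for k in range(1, kmax + 1):
        p.append(p[-1] * c / k)
    return p

def dspl_tail(P, N, lmax):
    """Return [P_PL(L > l) for l = 0..lmax] by iterating Eqs. (10)-(14)."""
    k1 = sum(k * pk for k, pk in enumerate(P))      # <K>
    k2 = sum(k * k * pk for k, pk in enumerate(P))  # <K^2>

    def G0(x):
        return sum(pk * x**k for k, pk in enumerate(P))

    def G1(x):
        return sum(k * pk * x**(k - 1) for k, pk in enumerate(P) if k > 0) / k1

    m = 1.0 - k1 / (N - 1)                   # Eq. (13), leading order in 1/N
    mt = 1.0 - (k2 - k1) / (k1 * (N - 1))    # Eq. (14), leading order in 1/N
    tail = [1.0, m]                          # P(L > 0) = 1 and P(L > 1)
    for _ in range(2, lmax + 1):
        m = G0(mt)    # Eq. (11): uses the tilde conditional of the previous step
        mt = G1(mt)   # Eq. (12): advance the tilde conditional
        tail.append(tail[-1] * m)            # Eq. (10): running product
    return tail

tail = dspl_tail(poisson_pmf(3.0, 80), N=1000, lmax=20)
```

Note that `m` is evaluated before `mt` is updated, so that Eq. (11) at step $\ell$ uses the conditional probability of Eq. (12) at step $\ell-1$.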

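Actually, there are two other types of DSPLs in random networks, which are needed for the analysis of shortest cycles. One of them is the DSPL between a random node and a random neighbor of a random node, denoted by $\widetilde{P}_{\rm PL}(L>\ell)$. The other one is the DSPL between two random neighbors of random nodes, denoted by $\widehat{P}_{\rm PL}(L>\ell)$. The DSPL between a random node and a random neighbor of a random node is expressed as a product of the form

$$
\widetilde{P}_{\rm PL}(L>\ell) = \prod_{\ell'=1}^{\ell} \widetilde{P}_{\rm PL}(L>\ell' | L>\ell'-1), \tag{15}
$$

where $\widetilde{P}_{\rm PL}(L>\ell' | L>\ell'-1)$ is obtained by iterating Eq. (12), using Eq. (14) as an initial condition.

The DSPL between two random neighbors of random nodes, $\widehat{P}_{\rm PL}(L>\ell)$, requires careful attention. The initial condition in this case, namely the probability that two such nodes are not connected on a network of size $N$, is

$$
\widehat{P}_{\rm PL}(L>1 | L>0) = \sum_{k=0}^{\infty} \widetilde{P}(k) \sum_{k'=0}^{\infty} \widetilde{P}(k') \left[ 1 - \frac{k'-1}{(N-1) \langle K \rangle} \right]^{k-1}. \tag{16}
$$

Using a binomial approximation and performing the summations, we obtain

$$
\widehat{P}_{\rm PL}(L>1 | L>0) = 1 - \frac{\langle K \rangle}{N-1} \left( \frac{\langle K^2 \rangle - \langle K \rangle}{\langle K \rangle^2} \right)^2 + O\left( \frac{1}{N^2} \right). \tag{17}
$$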
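
The step from Eq. (16) to Eq. (17) can be spelled out as follows. This is a short sketch that keeps only the leading order in $1/N$. The binomial approximation and the mean of $k-1$ under $\widetilde{P}(k)$ read

$$
\left[ 1 - \frac{k'-1}{(N-1) \langle K \rangle} \right]^{k-1} \simeq 1 - \frac{(k-1)(k'-1)}{(N-1) \langle K \rangle},
\qquad
\sum_{k=0}^{\infty} \widetilde{P}(k) (k-1) = \frac{\langle K^2 \rangle - \langle K \rangle}{\langle K \rangle}.
$$

Since $k$ and $k'$ are drawn independently, averaging the first expression over both gives

$$
1 - \frac{1}{(N-1) \langle K \rangle} \left( \frac{\langle K^2 \rangle - \langle K \rangle}{\langle K \rangle} \right)^2
= 1 - \frac{\langle K \rangle}{N-1} \left( \frac{\langle K^2 \rangle - \langle K \rangle}{\langle K \rangle^2} \right)^2,
$$

which is the leading term of Eq. (17).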

This initial condition is fed into the recursion equation

$$
\widehat{P}_{\rm PL}(L>\ell | L>\ell-1) = G_1\left[ \widehat{P}_{\rm PL}(L>\ell-1 | L>\ell-2) \right]. \tag{18}
$$

The DSPL between random neighbors of random nodes is then obtained as a product of the conditional probabilities:

$$
\widehat{P}_{\rm PL}(L>\ell) = \prod_{\ell'=1}^{\ell} \widehat{P}_{\rm PL}(L>\ell' | L>\ell'-1). \tag{19}
$$

In the analysis above, we considered only pairs of nodes which reside on the same cluster. Since not all pairs of random nodes reside on the same cluster, the DSPL needs to be adjusted. Taking a random pair of nodes, $i$ and $j$, the probability that they reside on the same cluster is negligible, unless they both reside on the giant cluster. The probability that both nodes reside on the giant cluster is $g^2$. Therefore, the probability that the distance between them is infinite is $1 - g^2$. This implies that the DSPL between all pairs of nodes in the network (without assuming that they reside on the same cluster) is

$$
Q_{\rm PL}(L>\ell) = g^2 P_{\rm PL}(L>\ell) + (1 - g^2). \tag{20}
$$

Using a similar argument for the DSPL between a random node and a random neighbor of a random node, we find that the DSPL between all such pairs is given by

$$
\widetilde{Q}_{\rm PL}(L>\ell) = g \tilde{g} \, \widetilde{P}_{\rm PL}(L>\ell) + (1 - g \tilde{g}). \tag{21}
$$

Similarly, the DSPL between all pairs of random neighbors of random nodes is given by

$$
\widehat{Q}_{\rm PL}(L>\ell) = \tilde{g}^2 \widehat{P}_{\rm PL}(L>\ell) + (1 - \tilde{g}^2). \tag{22}
$$

In cases where $g < 1$, the overall DSPLs, $Q_{\rm PL}(L>\ell)$, $\widetilde{Q}_{\rm PL}(L>\ell)$ and $\widehat{Q}_{\rm PL}(L>\ell)$, approach a non-zero asymptotic value at large $\ell$, unlike the original DSPLs, $P_{\rm PL}(L>\ell)$, $\widetilde{P}_{\rm PL}(L>\ell)$ and $\widehat{P}_{\rm PL}(L>\ell)$, which decay to zero.

To obtain the DSPL between random pairs of nodes of known degrees, consider two random nodes, $i$ and $j$, of degrees $k$ and $k'$, respectively, which do not share any neighbors and thus the distance between them satisfies $\ell > 2$. Since node $i$ has $k$ neighbors and node $j$ has $k'$ neighbors, the probability that the distance between them is longer than $\ell$ is equal to the probability that the distance between any neighbor of $i$ and any neighbor of $j$ is longer than $\ell - 2$. Therefore,

$$
P_{\rm PL}(L>\ell | k,k') = \left[ \widehat{P}_{\rm PL}(L>\ell-2) \right]^{k k'}, \tag{23}
$$

where $\widehat{P}_{\rm PL}(L>\ell)$ is the DSPL between two random neighbors of random nodes, given by Eq. (19). Similarly, the DSPL between a random node, of degree $k$, and a random neighbor of a random node, of degree $k'$, is given by

$$
\widetilde{P}_{\rm PL}(L>\ell | k,k') = \left[ \widehat{P}_{\rm PL}(L>\ell-2) \right]^{k (k'-1)}. \tag{24}
$$

The DSPL between pairs of random neighbors of random nodes, under the condition that their degrees are $k$ and $k'$, is given by

$$
\widehat{P}_{\rm PL}(L>\ell | k,k') = \left[ \widehat{P}_{\rm PL}(L>\ell-2) \right]^{(k-1)(k'-1)}. \tag{25}
$$

It is important to note that Eqs. (23)-(25) are valid for $\ell > 2$. The corresponding equations for the conditional probabilities $P_{\rm PL}(L>\ell | k,k')$, $\widetilde{P}_{\rm PL}(L>\ell | k,k')$ and $\widehat{P}_{\rm PL}(L>\ell | k,k')$, with $\ell \le 2$ are presented in Appendix A.

Using the results presented above we now provide the overall DSPLs between random pairs of nodes of known degrees. Considering two random nodes of degrees $k$ and $k'$, we obtain

$$
Q_{\rm PL}(L>\ell | k,k') = g_k g_{k'} P_{\rm PL}(L>\ell | k,k') + (1 - g_k g_{k'}), \tag{26}
$$

where $g_k$ is given by Eq. (7). Similarly, the DSPL between a random node of degree $k$, and a random neighbor of a random node, of degree $k'$, is given by

$$
\widetilde{Q}_{\rm PL}(L>\ell | k,k') = g_k \tilde{g}_{k'} \widetilde{P}_{\rm PL}(L>\ell | k,k') + (1 - g_k \tilde{g}_{k'}), \tag{27}
$$

where $\tilde{g}_{k'}$ is given by Eq. (9). Lastly, the DSPL between pairs of random neighbors of random nodes, conditioned on their degrees, $k$ and $k'$, is given by

$$
\widehat{Q}_{\rm PL}(L>\ell | k,k') = \tilde{g}_k \tilde{g}_{k'} \widehat{P}_{\rm PL}(L>\ell | k,k') + (1 - \tilde{g}_k \tilde{g}_{k'}). \tag{28}
$$

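The moments of $P_{\rm PL}(L>\ell)$ provide useful information about the network. The $n$th moment, $\langle L^n \rangle_{\rm PL}$, can be obtained using the tail-sum formula Pitman1993 ()

$$
\langle L^n \rangle_{\rm PL} = \sum_{\ell=0}^{N-2} \left[ (\ell+1)^n - \ell^n \right] P_{\rm PL}(L>\ell). \tag{29}
$$

Note that the sum in Eq. (29) does not extend to $\infty$ because the longest possible shortest path in a network of size $N$ is $N-1$. The mean distance in configuration model networks has been studied extensively Newman2001 (); Chung2002 (); Chung2003 (); Hofstad2007 (); Esker2008 (). It was found that

$$
\langle L \rangle_{\rm PL} \simeq \frac{\ln N}{\ln \left( \frac{\langle K^2 \rangle - \langle K \rangle}{\langle K \rangle} \right)} + O(1). \tag{30}
$$

The width of the distribution can be characterized by the variance $\sigma^2_{\rm PL} = \langle L^2 \rangle_{\rm PL} - \langle L \rangle^2_{\rm PL}$.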
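
As a quick illustration of the tail-sum formula of Eq. (29), the sketch below evaluates moments from a tail distribution stored as a list, where `tail[l]` stands for the probability that $L > \ell$. The function name `moment_from_tail` and the toy distribution are illustrative and are not taken from the paper.

```python
def moment_from_tail(tail, n):
    """n-th moment from a tail distribution: sum_l [(l+1)^n - l^n] P(L > l)."""
    return sum(((l + 1) ** n - l ** n) * t for l, t in enumerate(tail))

# Toy check: L uniform on {1, 2, 3}, so P(L>0)=1, P(L>1)=2/3, P(L>2)=1/3, P(L>3)=0.
tail = [1.0, 2.0 / 3.0, 1.0 / 3.0, 0.0]
mean = moment_from_tail(tail, 1)
variance = moment_from_tail(tail, 2) - mean ** 2
print(mean, variance)
```

For this toy distribution the mean is $2$ and the second moment is $14/3$, which can be confirmed by direct summation over the three outcomes.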

## V The fraction of nodes which reside on at least one cycle

In this section we calculate the probability that a random node, $i$, resides on at least one cycle. To do so, we first calculate the conditional probability, $P(i \in {\rm cycle} | k)$, that a node of a given degree, $k$, resides on at least one cycle. It is convenient to express this probability as $P(i \in {\rm cycle} | k) = 1 - P(i \notin {\rm cycle} | k)$. Clearly, nodes of degree $k=0$ or $k=1$ cannot reside on any cycle and thus

$$
P(i \notin {\rm cycle} | 0) = P(i \notin {\rm cycle} | 1) = 1. \tag{31}
$$

For a node $i$ of degree $k \ge 2$ to reside on a cycle, two of its neighbors must be connected to each other on the reduced network from which $i$ is removed. The probability that a neighbor of a random node $i$, on the reduced network from which $i$ is removed, is part of the giant cluster of the reduced network is equal to $\tilde{g}$. The probability that a given pair of neighbors will reside on the giant cluster of the reduced network is $\tilde{g}^2$. This pair of neighbors will reside on the same component only if this component is the giant cluster (up to negligible probability). Hence, the probability that a given pair of neighbors of $i$ is not connected is $1 - \tilde{g}^2$. Since there are $\binom{k}{2}$ pairs of neighbors of node $i$, the probability that none of these pairs are connected on the reduced network from which node $i$ is removed is

$$
P(i \notin {\rm cycle} | k) = \left( 1 - \tilde{g}^2 \right)^{\binom{k}{2}}. \tag{32}
$$

Note that this result is based on the assumption that the paths between different pairs of neighbors of $i$ are independent. This assumption is expected to hold in ensembles of uncorrelated networks, such as the configuration model or any other network model in which the clustering coefficient is small.

Using the arguments discussed above we find that the probability that a random node of unspecified degree resides on at least one cycle is given by

$$
P(i \in {\rm cycle}) = 1 - \left[ P(K=0) + P(K=1) + \sum_{k=2}^{\infty} P(k) P(i \notin {\rm cycle} | k) \right], \tag{33}
$$

where $P(i \notin {\rm cycle} | k)$ is given by Eq. (32). Note that in the case of $\tilde{g} = 0$ one can show, using Eq. (5), that $g = 0$ as well, meaning that there is no giant cluster. Eq. (33) shows that under these conditions $P(i \in {\rm cycle}) = 0$. This reflects the fact that for a network below the percolation threshold, in the thermodynamic limit, the number of cycles does not scale with $N$ Takacs1988 (); Bollobas2001 (). Thus, essentially all the components are trees.
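
Equations (32) and (33) translate into a one-line sum once $\tilde{g}$ is known. The sketch below assumes the degree distribution is given as a list `P` and that `g_tilde` has already been obtained from Eq. (4); the function name `cycle_fraction` is illustrative.

```python
from math import comb

def cycle_fraction(P, g_tilde):
    """P(i in cycle) from Eqs. (32)-(33) for a degree distribution P[k]."""
    # comb(k, 2) is 0 for k = 0 and k = 1, so these degrees contribute P(0) + P(1),
    # in agreement with Eq. (31).
    not_on_cycle = sum(pk * (1.0 - g_tilde**2) ** comb(k, 2)
                       for k, pk in enumerate(P))
    return 1.0 - not_on_cycle

# Example: all nodes of degree 2 and g_tilde = 0.5 gives 1 - (1 - 0.25) = 0.25.
print(cycle_fraction([0.0, 0.0, 1.0], 0.5))
```

The two limits discussed in the text are recovered directly: `g_tilde = 0` gives zero for any normalized `P`, and nodes of degree 0 or 1 never contribute.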

One should point out that Eq. (32) does not take into account certain correlations between pairs of neighbors of node $i$. To demonstrate this point, consider the $k$ neighbors of node $i$. We will denote their degrees by $k_1, k_2, \dots, k_k$. These degrees are independent of each other and are all drawn from the same distribution, $\widetilde{P}(k)$. However, the probability that a node, $r$, resides on the giant cluster depends on its degree, $k_r$ [Eq. (7)]. Since each neighbor of $i$ participates in $k-1$ such pairs, the probabilities that different pairs reside on the giant cluster are not independent. Each one of these nodes may connect to each of the other $k-1$ neighbors, with a probability which depends on its degree. Therefore, these probabilities are not independent, unlike the assumption made in Eq. (32). To account for these correlations we express $P(i \notin {\rm cycle} | k)$ in the form

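$$
P(i \notin {\rm cycle} | k) = \sum_{k_1, k_2, \dots, k_k} \prod_{r=1}^{k} \widetilde{P}(k_r) \prod_{m<n} \left( 1 - \tilde{g}_{k_m} \tilde{g}_{k_n} \right), \tag{34}
$$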

where the product runs over all pairs of neighbors of node $i$.

In summary, we have presented two approaches to the calculation of $P(i \notin {\rm cycle} | k)$. The simpler approach of Eq. (32) provides a good approximation in most cases. For highly heterogeneous networks one may need the more detailed approach of Eq. (34), which is much more elaborate to implement. More specifically, it requires summation over all possible degree sequences of length $k$, which becomes prohibitive when $k$ is large.

## VI The distribution of shortest cycle lengths

Consider a random node, $i$, in a configuration model network of size $N$ with degree distribution $P(k)$. A node of degree $k \ge 2$ may reside on one or more cycles. Here we focus on the shortest among these cycles. More specifically, we calculate the distribution of lengths of the shortest cycles on which a random node of degree $k$ resides. We label the neighbors of node $i$ by $r = 1, 2, \dots, k$. A cycle of length $\ell$ on which $i$ resides consists of the edges connecting $i$ to two of its neighbors, $m$ and $n$, and a path of length $\ell - 2$ connecting $m$ and $n$. The number of possible shortest cycles is $\binom{k}{2}$, namely the number of pairs of neighbors of $i$. In Fig. 1 we present an illustration of the cycles on which a random reference node (black filled circle) resides. Its neighbors are shown by empty circles. The edges between the reference node and its neighbors are shown by dashed lines. The paths connecting pairs of neighbors are shown by solid lines. The shortest among these paths is shown by a thick solid line (blue), and the shortest cycle on which the reference node resides is longer than this path by two edges. The other paths are shown by thin solid lines (red).

The tail distribution of the lengths of shortest cycles on which random nodes of degree $k$ reside is denoted by $P_{\rm CL}(L>\ell | k)$. In order that the shortest cycle will be longer than $\ell$, the distances between all pairs of neighbors must be longer than $\ell - 2$. Therefore

$$
P_{\rm CL}(L>\ell | k) = \left[ \widehat{Q}_{\rm PL}(L>\ell-2) \right]^{\binom{k}{2}}, \tag{35}
$$

where $\widehat{Q}_{\rm PL}(L>\ell)$ is given by Eq. (22). This equation is based on the assumption that the distances between all pairs of neighbors of node $i$ are independent of each other. This assumption is expected to be satisfied in configuration model networks. Note that nodes of degrees $k=0$ and $k=1$ do not reside on any cycle, and thus $P_{\rm CL}(L>\ell | k) = 1$ for any value of $\ell$.

For a random node, $i$, of unknown degree, the DSCL is obtained by averaging over all possible degrees according to

$$
P_{\rm CL}(L>\ell) = \sum_{k=0}^{\infty} P(k) P_{\rm CL}(L>\ell | k). \tag{36}
$$

Writing this equation in a more explicit form, we obtain

$$
P_{\rm CL}(L>\ell) = P(K=0) + P(K=1) + \sum_{k=2}^{\infty} P(k) \left[ \widehat{Q}_{\rm PL}(L>\ell-2) \right]^{\binom{k}{2}}. \tag{37}
$$

This equation is expected to provide an accurate description of the DSCL of configuration model networks, when the degree distribution is not too broad. The corresponding probability distribution function, $P_{\rm CL}(L=\ell)$, can be easily obtained by

$$
P_{\rm CL}(L=\ell) = P_{\rm CL}(L>\ell-1) - P_{\rm CL}(L>\ell). \tag{38}
$$
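
The chain of steps leading to Eq. (37) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the code behind the figures: the degree distribution is a truncated list `P`, the $O(1/N^2)$ term of Eq. (17) is dropped, $\tilde{g}$ is found by plain fixed-point iteration of Eq. (4), and the names `poisson_pmf` and `dscl_tail` are hypothetical.

```python
import math
from math import comb

def poisson_pmf(c, kmax):
    """Poisson degree distribution P(k) for k = 0..kmax, built iteratively."""
    p = [math.exp(-c)]
    for k in range(1, kmax + 1):
        p.append(p[-1] * c / k)
    return p

def dscl_tail(P, N, lmax):
    """Return [P_CL(L > l) for l = 0..lmax] from Eq. (37); assumes lmax >= 2."""
    k1 = sum(k * pk for k, pk in enumerate(P))
    k2 = sum(k * k * pk for k, pk in enumerate(P))

    def G1(x):
        return sum(k * pk * x**(k - 1) for k, pk in enumerate(P) if k > 0) / k1

    # g_tilde from Eq. (4): iterate u <- G1(u) with u = 1 - g_tilde.
    u = 0.0
    for _ in range(2000):
        u = G1(u)
    gt = 1.0 - u

    # DSPL between two random neighbors of random nodes, Eqs. (17)-(19).
    mh = 1.0 - (k1 / (N - 1)) * ((k2 - k1) / k1**2) ** 2
    hatP = [1.0, mh]
    for _ in range(2, lmax + 1):
        mh = G1(mh)
        hatP.append(hatP[-1] * mh)

    # Eq. (22), written as 1 - gt^2 (1 - hatP), which is algebraically identical.
    hatQ = [1.0 - gt**2 * (1.0 - p) for p in hatP]

    tail = [1.0, 1.0]  # no cycle has length 1 or 2
    for l in range(2, lmax + 1):
        q = hatQ[l - 2]
        tail.append(sum(pk * q ** comb(k, 2) for k, pk in enumerate(P)))
    return tail

tail = dscl_tail(poisson_pmf(3.0, 80), N=1000, lmax=30)
pdf = [tail[l - 1] - tail[l] for l in range(1, len(tail))]  # Eq. (38), l = 1..lmax
```

At large $\ell$ the tail saturates at the probability that the node resides on no cycle at all, which matches the bracketed term in Eq. (33).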

Similarly to the discussion of $P(i \notin {\rm cycle} | k)$ in the previous section, to obtain more accurate results for $P_{\rm CL}(L>\ell | k)$ in a network which exhibits a broad degree distribution, one needs to take into account the heterogeneity of the network. Consider the first shell around the random node, $i$, which consists of the nodes $r = 1, 2, \dots, k$, of degrees $k_1, k_2, \dots, k_k$. The distribution of shortest path lengths between a pair of neighbors, $m$ and $n$, depends on their degrees, $k_m$ and $k_n$. Therefore, in this analysis one should use the conditional probabilities $\widehat{Q}_{\rm PL}(L>\ell-2 | k_m, k_n)$. The shortest cycle on which $i$ resides consists of the shortest path among all the paths connecting the pairs of neighbors of $i$. Since each neighbor, such as $m$, of degree $k_m$, participates in $k-1$ such pairs, these conditional distributions are not independent. Thus, one should properly condition on the degrees of pairs of neighbors. Implementing these considerations, one can replace Eq. (35) by

$$
P_{\rm CL}(L>\ell | k) = \sum_{k_1, k_2, \dots, k_k} \prod_{r=1}^{k} \widetilde{P}(k_r) \prod_{m<n} \widehat{Q}_{\rm PL}(L>\ell-2 | k_m, k_n), \tag{39}
$$

where $\widehat{Q}_{\rm PL}(L>\ell | k, k')$ is given by Eq. (28). Actually, for $\ell = N$ this equation coincides with Eq. (34). This is due to the fact that the maximal length of a cycle is $N$. Hence, the probability that the length of the shortest cycle is larger than $N$ is equivalent to the probability that there is no cycle. Plugging Eq. (39) into Eq. (36), we obtain a more accurate expression for the DSCL.

In practice, for networks with broad degree distributions, the summation over the whole range of values of $k$ and $k_1, k_2, \dots, k_k$ may be impractical. In such cases, one can evaluate Eq. (39) using Monte Carlo methods Newman1999 (). The simplest approach is to draw the degree $k$ from the distribution $P(k)$ and then draw the degrees $k_1, k_2, \dots, k_k$ from the distribution $\widetilde{P}(k)$. One then calculates $\widehat{Q}_{\rm PL}(L>\ell-2 | k_m, k_n)$ for all the combinations of degrees, $k_m$ and $k_n$, and multiplies them to obtain one data point for $P_{\rm CL}(L>\ell)$. In Fig. 2 we present flow charts illustrating the sequence of intermediate steps in the calculation of the DSCL. The simpler approach of Eq. (35) is shown in Fig. 2(a) and the more detailed approach of Eq. (39) is shown in Fig. 2(b).
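
A Monte Carlo evaluation of the degree-resolved sum in Eq. (39) might look like the sketch below, shown here for a fixed degree $k$ of the reference node. The estimator is generic: the pair factor is supplied by the caller as a function `pair_tail(km, kn)`, which stands for the degree-conditioned pair probability at the desired $\ell$. The names `mc_conditional_tail` and `pair_tail`, and the sample size, are assumptions made for illustration.

```python
import random
from itertools import combinations

def mc_conditional_tail(k, P_tilde, pair_tail, samples=1000, seed=None):
    """Monte Carlo estimate of the sum over neighbor degrees in Eq. (39).

    k          degree of the reference node
    P_tilde    list with P_tilde[d] the probability that a neighbor has degree d
    pair_tail  callable (km, kn) -> pair factor for neighbors of degrees km, kn
    """
    rng = random.Random(seed)
    degrees = range(len(P_tilde))
    total = 0.0
    for _ in range(samples):
        # Draw the degrees of the k neighbors independently from P_tilde.
        ks = rng.choices(degrees, weights=P_tilde, k=k)
        prod = 1.0
        # Multiply the pair factors over all pairs m < n of neighbors.
        for km, kn in combinations(ks, 2):
            prod *= pair_tail(km, kn)
        total += prod  # one data point
    return total / samples
```

Averaging the result over $k$ drawn from $P(k)$, as described above, then gives an estimate of the full tail distribution. When `pair_tail` does not depend on the degrees, the estimate reduces to the pair factor raised to the power $\binom{k}{2}$, as in Eq. (35).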

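The mean of the DSCL is given by the first moment

$$
\langle L \rangle_{\rm CL} = \sum_{\ell=2}^{N-1} P_{\rm CL}(L>\ell). \tag{40}
$$

The variance of the DSCL is given by

$$
\sigma^2_{\rm CL} = \langle L^2 \rangle_{\rm CL} - \langle L \rangle^2_{\rm CL}, \tag{41}
$$

where

$$
\langle L^2 \rangle_{\rm CL} = \sum_{\ell=2}^{N-1} (2\ell+1) P_{\rm CL}(L>\ell). \tag{42}
$$

Similarly, higher order moments can be obtained using the tail-sum formula, as in Eq. (29).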

## VII Applications to specific network models

Here we apply the approach presented above for the calculation of the DSCL in three examples of configuration model networks, namely ER networks, random regular graphs and scale-free networks.

### VII.1 Erdős-Rényi networks

The Erdős-Rényi (ER) network is the simplest kind of a random network, and a special case of the configuration model, in which only the mean degree, $\langle K \rangle = c$, is constrained. ER networks can be constructed by independently connecting each pair of nodes with probability $p = c/(N-1)$. In the thermodynamic limit the resulting degree distribution follows a Poisson distribution of the form

$$
P(k) = \frac{e^{-c} c^{k}}{k!}. \tag{43}
$$

In Appendix B we briefly summarize the properties of the giant cluster of the ER network and present a closed form expression for $g$ as a function of $c$. More generally, in ER networks there is no distinction between the statistical properties of a random node and a random neighbor of a random node. As a result, $\tilde{g} = g$ and the different DSPLs are identical, namely $P_{\rm PL}(L>\ell) = \widetilde{P}_{\rm PL}(L>\ell) = \widehat{P}_{\rm PL}(L>\ell)$. Similarly, for the overall DSPLs we obtain $Q_{\rm PL}(L>\ell) = \widetilde{Q}_{\rm PL}(L>\ell) = \widehat{Q}_{\rm PL}(L>\ell)$. Inserting the degree distribution of Eq. (43) into the generating functions $G_0(x)$ and $G_1(x)$ in Eqs. (11) and (12), respectively, one obtains the conditional probabilities $P_{\rm PL}(L>\ell | L>\ell-1)$. Inserting them into Eq. (10), one obtains the tail DSPL between pairs of nodes which reside on the same cluster, denoted by $P_{\rm PL}(L>\ell)$. This DSPL essentially accounts only for pairs of nodes which both reside on the giant cluster, because for a pair of nodes on the non-giant components it is extremely unlikely that they reside on the same non-giant component. In order to obtain the overall DSPL between all pairs of nodes, one needs to adjust the results for the fraction of pairs of nodes in which both of them reside on the giant cluster, which is given by $g^2$. Inserting the probability $g$ into Eq. (20), one obtains the overall DSPL, $Q_{\rm PL}(L>\ell)$.

In Fig. 3 we present the tail distributions , for ER networks of nodes, with mean degree (a), (b) and (c). The analytical results (solid lines), obtained from Eq. (20), are found to be in very good agreement with the results of computer simulations (circles). The tail distributions exhibit the characteristic shape of a monotonically decreasing sigmoid function between two plateaus. Their inflection points coincide with the peaks of the corresponding probability distribution functions. The tail distributions exhibit non-zero asymptotic values at large distances, which account for the probability that two randomly selected nodes do not reside on the same cluster, and thus the distance between them is . As is increased, the inflection point shifts to the left, which means that distances in the network become shorter. This can be understood in the framework of small-world theory, where the mean distance is given by . Concurrently, the asymptotic value of decreases, due to the increasing size of the giant cluster.

Using Eqs. (32) and (33), the probability that a random node in an ER network resides on at least one cycle can be expressed in the form

$$P(i \in {\rm cycle}) = 1 - \sum_{k=0}^{\infty} \frac{e^{-c} c^{k}}{k!} \left(1-g^{2}\right)^{k(k-1)/2}, \tag{44}$$

where $g$ is given by Eq. (67). In Fig. 4 we present the probability $P(i \in {\rm cycle})$ as a function of the mean degree, $c$, for ER networks of nodes. The analytical results (solid lines), obtained from Eq. (44), are found to be in very good agreement with the results of computer simulations (circles). It is found that for $c<1$ there are no cycles and thus $P(i \in {\rm cycle})=0$. As $c$ is increased above $1$, the probability $P(i \in {\rm cycle})$ increases sharply.
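Eq. (44) is straightforward to evaluate numerically. The following is a minimal sketch, assuming that $g$ is the solution of the self-consistent equation $g = 1 - e^{-cg}$ (the relation that also appears in Eq. (48)) and that it can be found by fixed-point iteration starting from $g=1$. The function names, the tolerance and the truncation `k_max` of the Poisson sum are illustrative choices.

```python
import math


def giant_fraction(c, tol=1e-14, max_iter=10000):
    """Solve g = 1 - exp(-c*g) by fixed-point iteration, starting from g = 1."""
    g = 1.0
    for _ in range(max_iter):
        g_next = 1.0 - math.exp(-c * g)
        if abs(g_next - g) < tol:
            return g_next
        g = g_next
    return g


def prob_on_cycle(c, k_max=200):
    """Evaluate Eq. (44) for mean degree c > 0, truncating the sum at k_max."""
    g = giant_fraction(c)
    total = 0.0
    log_pk = -c                      # log of the Poisson weight P(0)
    for k in range(k_max + 1):
        if k > 0:
            log_pk += math.log(c) - math.log(k)
        pairs = k * (k - 1) // 2     # number of pairs of neighbors
        total += math.exp(log_pk) * (1.0 - g * g) ** pairs
    return 1.0 - total
```

Below the percolation threshold the iteration converges to $g=0$ and the sketch returns a vanishing probability, while above it the probability grows with $c$.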

To obtain more accurate results, we consider a random node of a given degree, $k$, and express the probability that it resides on at least one cycle in the form

 P(i∈cycle|k)=1−∑k1,k2,…,kkk∏r=1˜P(kr)∏m

In the case of an ER network, where is a Poisson distribution, and , where is the degree of the neighbor of node on the reduced network from which was removed. Therefore, in the case of an ER network

 P(i∈cycle|k)=1−∑k1,k2,…,kkk∏r=1P(kr)∏m

The evaluation of this product requires moments of $g_k$, which can be expressed in a closed form as

$$\langle g_k^{n} \rangle = \sum_{r=0}^{n} \binom{n}{r} (-1)^{r} e^{-c\left[1-(1-g)^{r}\right]}. \tag{47}$$

The two lowest order moments are

$$\langle g_k \rangle = 1 - e^{-cg} = g, \tag{48}$$

and

$$\langle g_k^{2} \rangle = 1 - 2e^{-cg} + e^{-cg(2-g)} = g^{2} + (1-g)^{2}\left(e^{cg^{2}}-1\right). \tag{49}$$

Inserting these moments into Eq. (46) we find that the probability that a node of degree $k=2$ resides on at least one cycle is

$$P(i \in {\rm cycle}|K=2) = g^{2}. \tag{50}$$

Incidentally, this result coincides with the simpler form which comes from Eq. (32). For nodes of degree $k=3$

$$P(i \in {\rm cycle}|K=3) = 3g^{2} - 3g^{2}\langle g_k^{2} \rangle + \langle g_k^{2} \rangle^{3}. \tag{51}$$

At this order the result already deviates from the one obtained from the simpler approach of Eq. (32). Analytical expressions for nodes of higher degrees are presented in Appendix C.
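The moments of Eqs. (47)-(49) and the result of Eq. (51) can be cross-checked numerically. The sketch below rests on two assumptions that are not spelled out in the text above: that $g_k = 1-(1-g)^{k}$ is the probability that a node of degree $k$ resides on the giant cluster, and that for $k=3$ the averaged product runs over the three pairs of neighbors, with one factor $(1 - g_{k_m} g_{k_n})$ per pair. All function names and the truncation `k_max` are hypothetical.

```python
import math
from itertools import product


def giant_fraction(c, tol=1e-14, max_iter=10000):
    """Solve g = 1 - exp(-c*g) by fixed-point iteration, starting from g = 1."""
    g = 1.0
    for _ in range(max_iter):
        g_next = 1.0 - math.exp(-c * g)
        if abs(g_next - g) < tol:
            return g_next
        g = g_next
    return g


def moment_closed_form(n, c, g):
    """n-th moment of g_k over the Poisson degree distribution, Eq. (47)."""
    return sum(math.comb(n, r) * (-1) ** r * math.exp(-c * (1.0 - (1.0 - g) ** r))
               for r in range(n + 1))


def p_cycle_k3_brute(c, g, k_max=40):
    """Brute-force average over the degrees of three neighbors (assumed form)."""
    pmf = [math.exp(-c)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * c / k)
    gk = [1.0 - (1.0 - g) ** k for k in range(k_max + 1)]   # assumed g_k
    no_cycle = 0.0
    for k1, k2, k3 in product(range(k_max + 1), repeat=3):
        weight = pmf[k1] * pmf[k2] * pmf[k3]
        no_cycle += (weight * (1 - gk[k1] * gk[k2])
                     * (1 - gk[k1] * gk[k3]) * (1 - gk[k2] * gk[k3]))
    return 1.0 - no_cycle


c = 2.0
g = giant_fraction(c)
m1 = moment_closed_form(1, c, g)                   # compare with Eq. (48)
m2 = moment_closed_form(2, c, g)                   # compare with Eq. (49)
p3_eq51 = 3 * g**2 - 3 * g**2 * m2 + m2**3         # Eq. (51)
p3_brute = p_cycle_k3_brute(c, g)
```

Under these assumptions the brute-force average agrees with Eq. (51), and the first two moments agree with Eqs. (48) and (49).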

In Fig. 5 we present the conditional probability , that a random node of degree resides on at least one cycle as a function of the mean degree , for (a), (b) and (c). The analytical results obtained from the simpler approach of Eq. (32) are shown in dashed lines. The analytical results obtained from the more detailed approach of Eq. (34) are given explicitly in Eqs. (50), (51), (68) and (69). These results are shown in solid lines. Incidentally, the two analytical curves coincide for , while for and , the more detailed theory is found to be in a better agreement with the results of computer simulations (circles).

The DSCL of an ER network is given by

$$P_{CL}(L>\ell) = (1+c)e^{-c} + \sum_{k=2}^{\infty} \frac{e^{-c} c^{k}}{k!} \left[Q_{PL}(L>\ell-2)\right]^{\binom{k}{2}}. \tag{52}$$

In Fig. 6 we present the tail distributions for ER networks of nodes, where (a), (b) and (c). The analytical results (solid lines), obtained from Eq. (52), are in good agreement with computer simulations (circles). The tail distribution exhibits a monotonically decreasing sigmoid shape from the plateau on the left to on the right, since the height of the second plateau represents the fraction of nodes which do not reside on any cycle. This fraction decreases as the mean degree, , is increased, namely the probability that a random node resides on at least one cycle increases as is increased. The peak of the corresponding probability distribution function, , shifts to the left as is increased. These results imply that as the network becomes more strongly connected the shortest cycles become more numerous and shorter.

In Fig. 7 we present the conditional tail distribution for an ER network of nodes and , where (a), (b) and (c). The analytical results obtained from the simpler approach of Eq. (35) are shown in dashed lines, while the analytical results obtained from the more detailed approach of Eq. (39) are shown in solid lines. The two analytical curves are almost indistinguishable for , and are both in very good agreement with the results of computer simulations (circles). For and , the more detailed theory provides a better agreement with the results of computer simulations (circles).

The conditional tail distribution retains the qualitative features of the sigmoid shape. Its asymptotic value at large $\ell$ is the probability that a random node of degree $k$ does not reside on any cycle. This value decreases as $k$ is increased, which means that the probability that a random node of degree $k$ resides on at least one cycle increases as $k$ is increased. The peak of the corresponding probability distribution function shifts to the left as $k$ is increased, which means that for nodes of higher degree the shortest cycle tends to be shorter.

The probability that a random node, $i$, of degree $k$ resides on at least one cycle is a monotonically increasing function of $k$. The length of the shortest cycle tends to decrease as a function of $k$. This is due to the fact that the length of the shortest cycle is determined by the shortest path among all the paths connecting pairs of neighbors of $i$, and the number of such pairs increases quadratically with $k$.
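The relation between the shortest cycle and the shortest paths between neighbors can be illustrated on a single network instance. The following is a minimal sketch, assuming a simple undirected graph stored as a dictionary that maps each node to the set of its neighbors; the function name and the data layout are illustrative choices and not a description of the simulation code behind the figures. For each neighbor of the node $i$, a breadth-first search is carried out on the reduced network from which $i$ is removed, and the search stops at the first other neighbor of $i$ that is reached.

```python
from collections import deque


def shortest_cycle_length(adj, i):
    """Length of the shortest cycle through node i, or None if there is none.

    adj: dict mapping each node to the set of its neighbors
    (simple undirected graph, no self-loops or multiple edges).
    """
    best = None
    for src in adj[i]:
        dist = {src: 0}              # distances on the reduced network
        queue = deque([src])
        found = None
        while queue and found is None:
            u = queue.popleft()
            for v in adj[u]:
                if v == i or v in dist:
                    continue         # skip the removed node and visited nodes
                dist[v] = dist[u] + 1
                if v in adj[i]:      # nearest other neighbor of i
                    found = dist[v]
                    break
                queue.append(v)
        # the cycle consists of the path plus the two edges attached to i
        if found is not None and (best is None or found + 2 < best):
            best = found + 2
    return best


# Hypothetical example: a square 0-1-2-3 with a leaf node 4 attached to node 3
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2, 4}, 4: {3}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```

In the example, node 0 of the square resides on a cycle of length 4, while the leaf node 4 does not reside on any cycle.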

In Fig. 8 we present analytical results for the mean, $\langle L \rangle_{CL}$, of the DSCL as a function of the mean degree, $c$, for ER networks of nodes (solid line). The results are in very good agreement with computer simulations (circles). The mean $\langle L \rangle_{CL}$ is a monotonically decreasing function of $c$. It exhibits a sharp decrease in the dilute network limit, which becomes more moderate as the network becomes denser. For comparison, we also present analytical (dashed line) and numerical results for the mean of the DSPL as a function of $c$. It is found that for the entire range of values of $c$, the mean of the DSCL is slightly larger than the mean of the corresponding DSPL. This can be understood as follows. The shortest cycle on which a random node, $i$, resides consists of the shortest path between a pair of its neighbors, plus the two edges connecting $i$ to these neighbors, which add $2$ to its length. This suggests that the mean of the DSCL should be longer by about two units than the mean of the DSPL. However, the shortest path between neighbors of $i$ which is incorporated in the shortest cycle is the shortest among the shortest paths connecting all pairs of neighbors of $i$. Thus, it tends to be shorter than the path between two random nodes. As a result, the difference between the two means is smaller than $2$.

In Fig. 9 we present the standard deviation of the DSCL, $\sigma_{CL}$, as a function of the mean degree, $c$, for ER networks of nodes. For small values of $c$, the analytical results (solid line) underestimate the standard deviation, as can be seen from the comparison with the results of computer simulations (circles). We also show the analytical (dashed line) and numerical results for the standard deviation of the DSPL for the same networks, which exhibits the same qualitative features.

### VII.2 Random regular graphs

In a random regular graph with the giant cluster encompasses the whole network. Therefore, (for more details see Appendix D). Moreover, in this case the DSPLs and the overall DSPLs are identical since all pairs of nodes reside on the giant cluster. The generating functions for the random regular graph are given by Eqs. (72) and (73). The DSCL is given by

$$P_{CL}(L>\ell) = \left[\hat{P}_{PL}(L>\ell-2)\right]^{\binom{c}{2}}. \tag{53}$$

In order to proceed we shall first calculate the conditional probabilities using the recursion equation (18) and the initial condition (17). This yields

$$\hat{P}_{PL}(L>\ell|L>\ell-1) = \left[1 - \frac{(c-1)^{2}}{(N-1)c}\right]^{(c-1)^{\ell-1}}. \tag{54}$$

Assuming that the network size $N$ is large, we can approximate the above by

$$\ln \hat{P}_{PL}(L>\ell|L>\ell-1) \simeq -\frac{(c-1)^{\ell+1}}{cN} + O\left(\frac{1}{N^{2}}\right). \tag{55}$$

By inserting the conditional distribution into Eq. (19) we can obtain the tail distribution

$$\hat{P}_{PL}(L>\ell) \simeq \exp\left[-\frac{(c-1)^{2}}{cN}\,\frac{(c-1)^{\ell}-1}{c-2}\right]. \tag{56}$$

We can use this DSPL inside Eq. (53), to get

$$P_{CL}(L>\ell) \simeq \exp\left[-\frac{(c-1)^{3}}{2N}\,\frac{(c-1)^{\ell-2}-1}{c-2}\right]. \tag{57}$$

In Fig. 10 we present the DSCL for random regular graphs of nodes with (a), (b) and (c). The analytical results (solid lines), obtained from Eq. (57), are found to be in excellent agreement with the results of computer simulations (circles). Since Eq. (57) is based on exact results for the DSPL, we conjecture that it is an exact result for the DSCL of the random regular graph.
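The closed-form expression of Eq. (57) is easy to evaluate. The following minimal sketch computes the tail distribution and, through a generic tail-sum over $\ell \ge 0$, the mean shortest cycle length. It assumes $c \ge 3$ and sets the tail to $1$ for $\ell \le 2$, since a cycle in a simple graph has at least three edges; the function names and the cutoff `eps` are illustrative choices.

```python
import math


def rrg_dscl_tail(ell, c, N):
    """P_CL(L > ell) of Eq. (57) for a random regular graph with c >= 3."""
    if ell <= 2:
        return 1.0                   # no cycle is shorter than 3
    geometric = ((c - 1) ** (ell - 2) - 1) / (c - 2)
    return math.exp(-((c - 1) ** 3) / (2.0 * N) * geometric)


def rrg_mean_cycle_length(c, N, eps=1e-15):
    """Mean of the DSCL from the tail-sum formula, summed until the tail vanishes."""
    mean, ell = 0.0, 0
    while True:
        tail = rrg_dscl_tail(ell, c, N)
        if tail < eps:
            return mean
        mean += tail
        ell += 1
```

For example, `rrg_dscl_tail(3, 3, 1000)` evaluates to $e^{-0.004}$, namely in a random regular graph of $N=1000$ nodes with $c=3$ only a small fraction of the nodes reside on a triangle. Since the exponent grows like $(c-1)^{\ell}$, the tail drops sharply once $(c-1)^{\ell}$ becomes comparable to $N$.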

### VII.3 Scale-free networks

Consider a configuration model network with a power-law degree distribution given by

$$P(k) = \frac{k^{-\gamma}}{\zeta(\gamma,k_{\rm min}) - \zeta(\gamma,k_{\rm max}+1)}, \tag{58}$$

where and is the Hurwitz zeta function Olver2010 (). Here we focus on the case in which , in which the mean degree is bounded even for . We further restrict our analysis to the case in which , namely the network does not include isolated nodes and leaf nodes. Under these conditions , namely the giant cluster encompasses the entire network (for more details see Appendix E). As a result, . Thus, the DSCL can be expressed in the form

$$P_{CL}(L>\ell) = \sum_{k=2}^{\infty} P(k) \left[\hat{P}_{PL}(L>\ell-2)\right]^{\binom{k}{2}}, \tag{59}$$

where is calculated using Eqs. (17)-(19), where and are given by Eqs. (74) and (75), respectively.
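The degree distribution of Eq. (58) and the mixture of Eq. (59) can be sketched numerically. For a finite range of degrees and $\gamma > 1$, the difference of Hurwitz zeta functions in Eq. (58) equals the finite sum of $k^{-\gamma}$ over $k_{\rm min} \le k \le k_{\rm max}$, which is what the sketch uses for the normalization. The function names are hypothetical, and the value of the DSPL tail, $\hat{P}_{PL}(L>\ell-2)$, is treated as a given input rather than computed from Eqs. (17)-(19).

```python
import math


def power_law_pmf(gamma, k_min, k_max):
    """Degree distribution of Eq. (58) on k_min <= k <= k_max, as a dict."""
    weights = {k: k ** (-gamma) for k in range(k_min, k_max + 1)}
    norm = sum(weights.values())     # equals zeta(gamma,k_min) - zeta(gamma,k_max+1)
    return {k: w / norm for k, w in weights.items()}


def dscl_tail_from_dspl(pmf, dspl_tail_value):
    """Eq. (59) for a single ell; dspl_tail_value stands for the DSPL tail at ell-2."""
    return sum(p * dspl_tail_value ** math.comb(k, 2) for k, p in pmf.items())


pmf = power_law_pmf(2.5, 2, 100)
mean_degree = sum(k * p for k, p in pmf.items())
```

Since the exponent $\binom{k}{2}$ grows quadratically with $k$, the contribution of the hubs to the sum in Eq. (59) decays much faster with $\ell$ than the contribution of nodes of degree $k=2$.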

In Fig. 11 we present the tail distribution , for a configuration model network of nodes and a power-law degree distribution with and (a), (b) and (c). The analytical results obtained from the simpler approach of Eq. (35) are shown in dashed lines, while the analytical results obtained from the more detailed approach of Eq. (39) are shown in solid lines. The results of the more detailed approach were obtained from Monte Carlo samplings of the degrees . Both results are found to be in very good agreement with the results of computer simulations (circles), except for one data point of the simpler approach, for , at , which is significantly lower than the simulation result. It is observed that as is increased, the distances in the network become shorter.

## VIII Discussion

An important distinction in network theory is between networks which exhibit a tree structure and networks which include cycles. In network growth models, the existence of cycles is determined by the growth rules of the network. For example, in the Barabási-Albert model Barabasi1999 (); Albert2002 (), the existence of cycles depends on the number of edges, $m$, which are added with each new node. In the case in which $m=1$, the model gives rise to a stochastic tree structure Drmota1997 (); Drmota2005 (), while for $m \ge 2$ it forms cycles.

In equilibrium networks such as configuration model networks, one can distinguish between three situations, which are determined by the degree distribution $P(k)$. In the sub-percolation regime of dilute networks, the network consists of finite tree components, whose size does not scale with $N$. In this regime, the number of cycles does not scale with $N$. Above percolation, the network consists of a giant cluster, which includes cycles, in addition to many finite components. As the network becomes denser, the number of cycles increases and their typical length becomes shorter. In the regime of dense networks, the giant cluster encompasses the entire network and there are many short cycles.

The degree distribution plays a crucial role in shaping the properties of cycles in a network. In particular, isolated nodes (of degree $k=0$) and leaf nodes (of degree $k=1$) cannot reside on any cycle. Only nodes of degrees $k \ge 2$ may reside on a cycle. Still, some nodes of degrees $k \ge 2$ do not reside on any cycle. Instead, they reside on a tree component which can be either isolated or connected to the giant cluster.

There are interesting connections between the DSCL and the DSPL of a configuration model network. For a random node, , the cycles on which it resides consist of paths between pairs of neighbors of and two edges from to these neighbors. The shortest cycle length is thus given by the shortest path between all pairs of neighbors of plus . A naive expectation would thus be that the shortest cycles are longer than the shortest paths by units. From Fig. 8 we observe that the mean cycle length is longer than the mean path length by about one unit over a broad range of values of in the ER network. To understand this point, we recall that the shortest cycle on which a random node of degree resides, is composed of the shortest path among all the paths connecting pairs of neighbors of . Another issue is the fact that the degrees of the neighbors of are not uniformly sampled from but from . The mean path length between pairs of neighbors of is given by , while the mean path length between pairs of random nodes is given by . Clearly, the path lengths between nodes of higher degrees are shorter than between nodes of lower degrees, as can be seen from Eqs. (23)-(25). It is thus interesting to compare the mean degrees of and . The former is given by while the latter is . In our context, the effective degree of a neighbor of a random node is given by the connective constant , where the edge connecting and its neighbor is removed. It turns out that may be larger than, equal to or smaller than in different network ensembles. In the case of the ER ensemble, a special symmetry gives rise to . In the random regular graph, it turns out that and thus . In configuration models with a power-law degree distribution and , the moment diverges and thus . In those cases in which , the mean distance between neighbors of is smaller than the mean distance between random nodes, and vice versa. 
Therefore, the difference between the mean of the DSCL and the mean of the DSPL is determined by a combination of these conflicting effects.

The results presented above have implications for the stability of configuration model networks to node deletion processes due to failures or attacks. In particular, if a node of degree $k \ge 2$, which does not reside on any cycle, is removed, the network breaks into separate components. Thus, nodes of degree $k \ge 2$ which do not reside on any cycle are articulation points Tian2017 ().

In this paper we have studied configuration model networks in which the DSCL is completely determined by the degree distribution $P(k)$. Recently, other network ensembles were introduced, which include many short cycles, where the cycle lengths are controlled by various constraints Roberts2014 (); Coolen2016 (). It would be interesting to generalize the calculation of the DSCL to such networks.

Knowing the properties of cycles is important for the study of many dynamical processes on networks. For example, shortest cycles provide the fastest feedback paths in the network and introduce correlations between the signals arriving at a given node via different links. It was found that in neural circuits the lengths of the shortest cycles determine the frequencies of broadband spontaneous macroscopic neural oscillations Vladimirov2012 (); Goldental2015 (); Goldental2017 (). In a broader context, feedback processes are affected by the entire spectrum of cycle lengths, up to the longest possible length of the Hamiltonian cycles. The number of cycles of a given length was studied extensively in Refs. Marinari2004 (); Bianconi2005 (); Marinari2006a (); Marinari2006b (); Klemm2006 (); Noh2008 ().

In the context of network control theory, it was shown that dynamical processes on complex networks can be identified and controlled by a small set of 'determining nodes', which can be identified from the network structure alone, regardless of the specific properties of the dynamical process. Moreover, this set must include at least one node from each one of the feedback loops in the network Fiedler2013 (); Mochizuki2013 (). This approach was recently applied Zanudo2017 () to the analysis of real biological, technological and social networks, providing predictions for the set of nodes whose control can push the network dynamics towards any desired asymptotic state (fixed point, cycle or limit cycle).

Analytical techniques for treating spin models on networks are mostly exact on tree structures. Utilizing the local tree structure of random networks, they provide accurate results for short range properties. However, in order to obtain insight about collective and long range correlations, one needs to take into account the large scale structure, which notably involves the statistics of loops as done recently in Refs. Montanari2005 (); Altieri2017 ().

## IX Summary

We presented an analytical approach for the calculation of the distribution of shortest cycle lengths in configuration model networks. This approach is based on a fundamental relation between the distribution of shortest cycle lengths and the distribution of shortest path lengths in such networks. It employs an analytical approach for the calculation of the distribution of shortest path lengths, presented in Ref. Nitzan2016 (). We use this approach for the calculation of the DSCL in Erdős-Rényi networks, random regular graphs and scale-free configuration model networks, and obtain very good agreement with the results of computer simulations. The mean and standard deviation of the DSCL in these networks are also calculated. We also obtain a closed form expression for the fraction of nodes which do not reside on any cycle. While in this paper we have focused on the case of undirected networks, cycles are known to be important also in directed networks, in contexts such as gene regulation networks, neural networks and food webs Johnson2017 (). It would thus be interesting to study the DSCL on directed networks. Another interesting direction is the study of properties of long cycles Marinari2007 (). In this context, an open question is the distribution of longest cycle lengths on random networks.

## Appendix A The conditional DSPL for short distances

The conditional DSPLs presented in Eqs. (23)-(25) apply for the case in which $\ell > 2$. Here we provide the expressions for the conditional DSPLs for the special cases of $\ell=1$ and $\ell=2$. Starting from $\ell=1$, the probability that two random nodes of degrees $k$ and $k'$ are not connected to each other is given by

$$P_{PL}(L>1|k,k') = 1 - \frac{k k'}{(N-1)\langle K \rangle}. \tag{60}$$

Similarly, when the node of degree $k$ is selected as a random neighbor of a random node, while the node of degree $k'$ is a random node, one obtains

$$\tilde{P}_{PL}(L>1|k,k') = 1 - \frac{(k-1) k'}{(N-2)\langle K \rangle}. \tag{61}$$

Finally, in the case in which both nodes are selected as random neighbors of random nodes, one obtains

$$\hat{P}_{PL}(L>1|k,k') = 1 - \frac{(k-1)(k'-1)}{(N-3)\langle K \rangle}. \tag{62}$$

Proceeding to $\ell=2$, one first evaluates the conditional probability $P_{PL}(L>2|L>1;k,k')$, which is given by

$$P_{PL}(L>2|L>1;k,k') = \left[1 - \sum_{k''=0}^{\infty} P(k'') \frac{k k' k''(k''-1)}{N(N-1)\langle K \rangle^{2}}\right]^{N-2}. \tag{63}$$

Carrying out the summation and multiplying by $P_{PL}(L>1|k,k')$, we obtain

$$P_{PL}(L>2|k,k') = 1 - \frac{\langle K^{2} \rangle k k'}{N \langle K \rangle^{2}} + O\left(\frac{1}{N^{2}}\right), \tag{64}$$

which is valid under the assumption that $\langle K^{2} \rangle$ is finite. Using similar considerations, one can show that
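For completeness, the intermediate steps leading from Eq. (63) to Eq. (64) can be sketched as follows, keeping terms up to order $1/N$. The summation over $k''$ yields

$$\sum_{k''=0}^{\infty} P(k'') \frac{k k' k''(k''-1)}{N(N-1)\langle K \rangle^{2}} = \frac{k k' \left(\langle K^{2} \rangle - \langle K \rangle\right)}{N(N-1)\langle K \rangle^{2}},$$

so that, to leading order in $1/N$,

$$P_{PL}(L>2|L>1;k,k') \simeq 1 - \frac{k k' \left(\langle K^{2} \rangle - \langle K \rangle\right)}{N \langle K \rangle^{2}}.$$

Multiplying by $P_{PL}(L>1|k,k') \simeq 1 - k k' \langle K \rangle / (N \langle K \rangle^{2})$ from Eq. (60), the two terms proportional to $\langle K \rangle$ cancel, which leaves only the term proportional to $\langle K^{2} \rangle$ that appears in Eq. (64). The same cancellation, with $k$ replaced by $k-1$ and (or) $k'$ replaced by $k'-1$, underlies the two expressions that follow.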

$$\tilde{P}_{PL}(L>2|k,k') = 1 - \frac{\langle K^{2} \rangle (k-1) k'}{N \langle K \rangle^{2}} + O\left(\frac{1}{N^{2}}\right) \tag{65}$$

and

$$\hat{P}_{PL}(L>2|k,k') = 1 - \frac{\langle K^{2} \rangle (k-1)(k'-1)}{N \langle K \rangle^{2}} + O\left(\frac{1}{N^{2}}\right). \tag{66}$$

## Appendix B The giant cluster in Erdős-Rényi networks

In the asymptotic limit $N \to \infty$ the ER network exhibits a percolation transition at $c=1$, such that for $c<1$ the network consists only of finite components while for $c>1$ there is a giant cluster. At a higher value of the connectivity, namely at $c = \ln N$, there is a second transition, above which the giant cluster encompasses the entire network and there are no non-giant components. We denote the probability that a randomly selected node belongs to the giant cluster by