Correlations in connected random graphs
We study the properties of the giant connected component in random
graphs with arbitrary degree distribution. We concentrate on the
degree-degree correlations. We show that the adjoining nodes in the
giant connected component are correlated and derive analytic
formulas for the joint nearest-neighbor degree probability
distribution. Using those results we describe correlations in
maximal entropy connected random graphs. We show that connected
graphs are disassortative and that correlations are strongly related
to the presence of one-degree nodes (leaves). We propose an
efficient algorithm for generating connected random graphs. We
illustrate our results with several examples.
Published in: Phys. Rev. E 77, 036124 (2008). PACS numbers: 89.75.Hc, 05.10.–a, 05.90.+m
In the last decade or so, there has been a great increase of interest in the theory of random graphs and networks (in the following we will use those two terms interchangeably). While in principle this is a branch of mathematics, much of this effort was fueled by the availability of “experimental” data on real graphs (see Ref. review () for a review). These data are compared to the predictions of various random graph models. Probably the best known and simplest example of such reference models is the ensemble of all labeled graphs with $N$ vertices and $L$ links (without multiple- and self-links), chosen with uniform probability. We will call this model Erdös-Rényi (ER) graphs after the authors, who were the first to introduce and study them ErdosRenyi ().
The ER ensemble is the simplest example of the so-called “maximally random” graphs. Intuitively those are the ensembles where the distributions of vertices and links joining them are “as random as possible” for a given set of constraints. In the case of ER graphs the only constraints are the fixed number of links and vertices. The “maximal randomness” can be formalized using the notion of entropy (see next section). The maximally random ensembles serve as null hypotheses. For example, it was the deviation of data collected on the World Wide Web (WWW) graph from the predictions of the ER model that triggered the interest in random networks nature (), because it implied that those graphs were not created just by joining vertices at random, but required the existence of another mechanism preferential ().
A popular generalization of the ER ensemble is the ensemble of graphs with a given degree distribution (the degree of a node is the number of links attached to it) MolloyReed (); Newman2001 (); BCK (); BauerBernard (); BurdaKrzywicki (); fronczak (). One feature of those ensembles is the absence of correlations between neighboring nodes’ degrees, at least for degree distributions without heavy tails (see the discussion in Sec. IV.3). The object of our study was to find what happens when we restrict the ensemble to connected graphs only. A simple argument indicated that correlations would appear: a neighbor of a node with degree one (leaf) must have its degree greater than 1; otherwise, they would form a separate connected component. Similarly, all neighbors of a node cannot have their degree equal to 1, as such a “hedgehog” would also form a separate connected component pb1 (); oles (). This obviously leads to correlations. It is not clear, however, how strong they are and whether they survive the large-$N$ limit. We have already studied those correlations numerically in Ref. oles () and found that they also appear in large graphs. In this paper we derive the analytic formulas describing them. We also found a strong indication that the described mechanism is the only one responsible for the correlation in maximally random connected graphs: when we forbid vertices with degree 1, correlations disappear.
Connectivity is a nonlocal constraint hard to deal with. To study the properties of connected graphs we use another feature of maximally random graphs with a given degree distribution: the appearance of a connected component that includes a finite fraction of all the vertices (and links). From the properties of this giant connected component we can infer the properties of connected graphs.
The paper is organized as follows: Section II introduces some basic definitions concerning random graphs. In Sec. III we present the method of generating functions used to study the properties of the giant connected component in random graphs with arbitrary degree distribution Newman2001 (). Then we calculate degree-degree correlations in the giant component. Section IV contains some examples where we compare our predictions with the results of Monte Carlo (MC) simulations. Finally, we show in Sec. V how to relate connected random graphs to giant connected components in other ensembles. In Sec. VI we address the situation when correlations in random graphs are suppressed by the absence of vertices with degree one (leaves). The paper is summarized in Sec. VII.
II Random Graphs
II.1 Average degree
Formally we consider random graphs as an ensemble of graphs $\mathcal{G}$ with probability $P(G)$ assigned to every graph $G \in \mathcal{G}$. Using this definition we introduce the entropy of the ensemble:
$$S = -\sum_{G \in \mathcal{G}} P(G) \ln P(G). \tag{1}$$
The maximally random ensembles described in the previous section are those which for given constraints have maximal entropy.
Denoting by $O(G)$ some property of graph $G$ we can calculate its average over the whole ensemble:
$$\langle O \rangle = \sum_{G \in \mathcal{G}} P(G)\, O(G). \tag{2}$$
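The most widely studied example is the probability distribution of node degrees:
$$p_k \equiv \left\langle \frac{n_k(G)}{N(G)} \right\rangle,$$
where $n_k(G)$ is the number of vertices with degree $k$ and $N(G)$ is the total number of vertices in graph $G$ (in the following we will often omit the argument $G$). The mean of this distribution is the “link density,”
$$\langle k \rangle \equiv \sum_k k\, p_k = \left\langle \frac{2L(G)}{N(G)} \right\rangle,$$
because $\sum_k k\, n_k(G) = 2L(G)$; by $L(G)$, we denote the number of links in graph $G$.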
However, what is frequently observed is not an average (2), but the properties of a single graph (e.g., WWW). That is why we are actually interested in the probability that our model will produce a graph with those properties. It is described by the distribution
$$P(O) = \sum_{G \in \mathcal{G}} P(G)\, \delta\big(O - O(G)\big).$$
In many cases this distribution is sufficiently well characterized by its mean (2) with relative fluctuations disappearing in the large-$N$ limit. In this situation we will say that $O$ is self-averaging. In such a case one can infer the properties of the whole ensemble from the properties of just one large graph. We want to emphasize, however, that this is only an assumption that has to be checked for each particular model (see Santo () for a discussion of self-averaging in real graphs).
In Appendix A we show for illustration a definition of a non-self-averaging ensemble. Although this is an artificial example, let it serve as a warning. In this paper we assume that our models are self-averaging without any further formal proofs.
We end with the following comment: as in the self-averaging ensemble fluctuations do not matter, in the large-volume limit we have
We will use approximations of this kind in the following sections.
The distribution $p_k$ does not give any information about the correlations between vertices. An obvious generalization is the joint distribution $p_{k,q}$, which describes the probability that a pair of nearest neighbors (NNs) has degrees $k$ and $q$ (we assume that we pick a pair of NNs with uniform probability):
$$p_{k,q} \equiv \left\langle \frac{n_{k,q}(G)}{2L(G)} \right\rangle, \tag{7}$$
where $n_{k,q}$ is the number of links with their start point having degree $k$ and endpoint having degree $q$. Note that we treat each undirected link as two directed links. On an undirected graph,
$$n_{k,q} = n_{q,k}.$$
If vertex degrees are independent, the probability (7) should factorize:
leading to the relation
One should, however, keep in mind that this defines the absence of correlations in the ensemble of graphs. A more appropriate question could be, are the vertices on individual graphs uncorrelated (see previous section)? The condition for absence of correlations between vertices in each individual graph is
or, after averaging,
As already pointed out, for a large class of ensembles conditions (10) and (12) are equivalent in the large-volume limit. However, it is easy to check that for the non-self-averaging ensemble in Appendix A vertices on each individual graph are uncorrelated according to the condition (12), but correlated according to (10). Again, we leave this as a warning and proceed further with the assumption that our models are self-averaging and that those two conditions are equivalent.
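The quantity $\bar{k}_{nn}(k)$ describes the average degree of the nearest neighbors of a vertex with degree $k$. Obviously $\bar{k}_{nn}(k)$ is defined for a given $k$ only if $p_k > 0$. $\bar{k}_{nn}(k)$ can be interpreted as the first moment of the conditional probability $p(q|k)$ that a neighbor of a vertex with degree $k$ has degree $q$:
$$\bar{k}_{nn}(k) = \sum_q q\, p(q|k), \qquad p(q|k) = \frac{p_{k,q}}{\sum_{q'} p_{k,q'}}.$$
If the degrees are independent, $\bar{k}_{nn}(k)$ should not depend on $k$ and (12) implies
$$\bar{k}_{nn}(k) = \frac{\langle k^2 \rangle}{\langle k \rangle}. \tag{16}$$
When $\bar{k}_{nn}(k)$ grows with $k$ the graph is called assortative and when it shrinks disassortative.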
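As an illustration of these definitions, the following minimal Python sketch measures the degree distribution, the joint nearest-neighbor distribution, and the average nearest-neighbor degree on a single graph. The function names and the edge-list representation are illustrative assumptions and not part of the original analysis; each undirected link is counted as two directed links, as described in the text.

```python
from collections import Counter


def degree_statistics(n_vertices, edges):
    """Return (p_k, p_kq) measured on one simple undirected graph.

    edges: list of (a, b) pairs, each undirected link listed once.
    """
    degree = [0] * n_vertices
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    # Degree distribution: fraction of vertices with degree k.
    p_k = {k: n / n_vertices for k, n in Counter(degree).items()}

    # Joint distribution: every undirected link contributes the two
    # directed pairs (k, q) and (q, k), so the normalization is 2L.
    n_kq = Counter()
    for a, b in edges:
        n_kq[(degree[a], degree[b])] += 1
        n_kq[(degree[b], degree[a])] += 1
    p_kq = {pair: n / (2 * len(edges)) for pair, n in n_kq.items()}
    return p_k, p_kq


def average_neighbor_degree(p_kq):
    """First moment of the conditional distribution p(q|k) for each k."""
    numerator, marginal = Counter(), Counter()
    for (k, q), prob in p_kq.items():
        numerator[k] += q * prob
        marginal[k] += prob
    return {k: numerator[k] / marginal[k] for k in marginal}


def uncorrelated_value(p_k):
    """<k^2>/<k>: the k-independent value expected without correlations."""
    mean_k = sum(k * p for k, p in p_k.items())
    mean_k2 = sum(k * k * p for k, p in p_k.items())
    return mean_k2 / mean_k


# Path graph 0-1-2-3: two leaves attached to two degree-2 vertices.
p_k, p_kq = degree_statistics(4, [(0, 1), (1, 2), (2, 3)])
k_nn = average_neighbor_degree(p_kq)
```

On this small connected graph the neighbor of a leaf always has degree 2, whereas a degree-2 vertex sees an average neighbor degree of 1.5, so the measured average nearest-neighbor degree decreases with the degree.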
III Connected components
In general, maximally random graphs with a given degree distribution do not need to be connected. However, if
$$\sum_k k(k-2)\, p_k > 0 \tag{17}$$
(which translates into $\langle k \rangle > 1$ in the case of ER graphs), one of the connected components (called the giant connected component) will gather a finite fraction of all links and vertices Newman2001 (). This is a phenomenon akin to percolation. In Ref. Newman2001 () the size of the giant component and the size distribution of finite components were calculated. The degree distribution in the giant component was calculated in Ref. BauerBernard (). Here we generalize those results and calculate the two-point distributions $p_{k,q}$ and $\bar{k}_{nn}(k)$ for the giant component.
We will use the method of generating functions introduced in Newman2001 (). The crucial observation is that the finite connected components are essentially trees. That is because a link emerging from one of the vertices in the component has a probability of order $s/N$ of connecting back to a node from this component, where $s$ is the size of the component. So for finite $s$ this becomes negligible in the large-$N$ limit.
Now let us pick a link from the graph at random. It belongs to some connected component. We will call $h^{(1)}_s$ the probability that cutting this link will split the component into two parts, one of them finite and having size $s$. Stated differently, $h^{(1)}_s$ is the probability that a randomly chosen link will lead into a finite part of size $s$. By the argument above this finite “half” will be a tree. Because of that, one can write down the equation for the generating function $H_1(x) = \sum_s h^{(1)}_s x^s$ Newman2001 ():
$$H_1(x) = x\, G_1\big(H_1(x)\big), \tag{18}$$
where
$$G_0(x) = \sum_k p_k x^k, \qquad G_1(x) = \frac{G_0'(x)}{G_0'(1)} = \frac{1}{\langle k \rangle} \sum_k k\, p_k x^{k-1}. \tag{19}$$
We denote by $u$ the value of $H_1(1)$:
$$u \equiv H_1(1) = \sum_s h^{(1)}_s. \tag{20}$$
When there is no giant component in the graph, all connected components are finite and are trees. This means that cutting each link will result in two finite parts; thus, $u = 1$. However, when the giant component appears, then there is a nonzero probability that the chosen link will belong to this component and either cutting it will split the component into two infinite parts, or will not split it at all. As this probability is missing from the sum (20), $u$ will be smaller than one. $u$ is to be interpreted as the probability that a randomly chosen link leads into a finite part of the graph on a given side fronczak (). It follows that $u^2$ is the probability that a random link belongs to a finite component of arbitrary size.
That can be derived in a more explicit way. Let us denote by $c_s$ the probability that a randomly chosen link belongs to a component of size $s$. Then,
$$c_s = \sum_{s'} h^{(1)}_{s'}\, h^{(1)}_{s-s'}.$$
It is a convolution of the size distribution $h^{(1)}_s$ of the finite parts with itself, so its generating function is just $H_1^2(x)$. Then $H_1^2(1) = u^2$ is the probability that a link belongs to a finite connected component of arbitrary size and $1 - u^2$ is the probability that it is inside the giant component.
Finally, if we denote by $h^{(0)}_s$ the probability that a randomly chosen vertex belongs to a finite component of size $s$, we can obtain its generating function $H_0(x)$ from Newman2001 ():
$$H_0(x) = x\, G_0\big(H_1(x)\big).$$
By the same arguments as above,
$$H_0(1) = G_0(u) \tag{23}$$
is the probability that a randomly chosen vertex belongs to a finite connected component and $1 - G_0(u)$ is the probability that it belongs to the giant component.
Setting $x = 1$ in Eq. (18) gives the equation for $u$:
$$u = G_1(u). \tag{24}$$
From the definition (19) it is easy to note that $u = 1$ is always a solution, but when condition (17) is fulfilled the above equation has a solution smaller than 1 as well Newman2001 (). As argued, this signals the appearance of a giant component.
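The self-consistency condition for the probability that a link leads into a finite part can be explored numerically. What follows is a minimal sketch, assuming the standard generating functions of Ref. Newman2001 (), $G_0(x) = \sum_k p_k x^k$ and $G_1(x) = G_0'(x)/G_0'(1)$, and writing $u$ for the smallest root of $u = G_1(u)$ in $[0, 1]$. The function names and the list representation `p[k]` are illustrative choices.

```python
def G0(p, x):
    """Generating function of the degree distribution p[k]."""
    return sum(pk * x**k for k, pk in enumerate(p))


def G1(p, x):
    """G0'(x) / G0'(1): generating function seen when following a random link."""
    mean_k = sum(k * pk for k, pk in enumerate(p))
    return sum(k * pk * x**(k - 1) for k, pk in enumerate(p) if k > 0) / mean_k


def solve_u(p, tol=1e-13, max_iter=1_000_000):
    """Smallest root of u = G1(u) in [0, 1], by fixed-point iteration from u = 0.

    G1 is non-decreasing on [0, 1], so the iterates grow monotonically
    towards the smallest fixed point (u = 1 when there is no giant component).
    """
    u = 0.0
    for _ in range(max_iter):
        new_u = G1(p, u)
        if abs(new_u - u) < tol:
            return new_u
        u = new_u
    return u


# Half of the vertices are leaves, half have degree 3: sum_k k(k-2) p_k = 1 > 0.
p = [0.0, 0.5, 0.0, 0.5]
u = solve_u(p)
link_fraction_in_giant = 1.0 - u**2
vertex_fraction_in_giant = 1.0 - G0(p, u)
```

For this two-valued distribution the equation reduces to $u = 1/4 + 3u^2/4$, whose roots are $u = 1/3$ and $u = 1$. A distribution without leaves gives $u = 0$, and a distribution violating the giant-component condition gives $u = 1$.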
III.1 Average degree
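Using the results of the previous section it is easy to derive formulas for the average degree in the giant component $\langle k \rangle_g$ and in the rest of the graph $\langle k \rangle_f$:
$$\langle k \rangle_g = \langle k \rangle\, \frac{1 - u^2}{1 - G_0(u)}, \tag{25a}$$
$$\langle k \rangle_f = \langle k \rangle\, \frac{u^2}{G_0(u)}. \tag{25b}$$
As we have already pointed out, the giant connected component is not a tree. With $N_g = N[1 - G_0(u)]$ vertices and $L_g = \tfrac{1}{2} N \langle k \rangle (1 - u^2)$ links in the giant component, the number of independent loops that it contains equals
$$L_g - N_g + 1 = N\left[\tfrac{1}{2}\langle k \rangle (1 - u^2) - \big(1 - G_0(u)\big)\right] + 1,$$
and as all the remaining connected components are trees, this is also the number of loops in the whole graph.
We can also easily calculate the number of finite connected components $n_c$ knowing that they form a forest. The number of links in the forest is $L_f = N_f - n_c$, which gives
$$n_c = N\left[G_0(u) - \tfrac{1}{2}\langle k \rangle u^2\right].$$
From that we can derive the formula for the average size of the finite connected component:
$$\langle s \rangle = \frac{N_f}{n_c} = \frac{G_0(u)}{G_0(u) - \tfrac{1}{2}\langle k \rangle u^2}.$$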
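These bulk quantities follow from simple bookkeeping once $u$ is known. The sketch below assumes that a fraction $u^2$ of the links and a fraction $G_0(u)$ of the vertices lie in finite tree components, as argued in the text. The function name and the dictionary keys are hypothetical, and the $O(1/N)$ term in the loop count is neglected.

```python
def component_summary(p, u):
    """Per-vertex bulk properties for a degree distribution p[k] and 0 < u < 1.

    Assumes a fraction u**2 of links and G0(u) of vertices are in finite trees.
    """
    mean_k = sum(k * pk for k, pk in enumerate(p))
    finite_vertices = sum(pk * u**k for k, pk in enumerate(p))  # G0(u)
    finite_links = 0.5 * mean_k * u**2        # links per vertex in the forest
    giant_links = 0.5 * mean_k * (1 - u**2)   # links per vertex in the giant part
    components = finite_vertices - finite_links  # forest: n_c = N_f - L_f
    return {
        "mean_degree_giant": 2 * giant_links / (1 - finite_vertices),
        "mean_degree_finite": 2 * finite_links / finite_vertices,
        # Independent loops per vertex, L_g - N_g, dropping the O(1/N) term.
        "loops_per_vertex": giant_links - (1 - finite_vertices),
        "finite_components_per_vertex": components,
        "mean_finite_component_size": finite_vertices / components,
    }


# Example: p_1 = p_3 = 1/2, for which u = 1/3 solves u = G1(u) exactly.
summary = component_summary([0.0, 0.5, 0.0, 0.5], 1.0 / 3.0)
```

In this example the finite components have mean size 5/2 and mean degree 6/5, consistent with the tree relation $\langle k \rangle_f = 2(1 - 1/\langle s \rangle)$.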
III.2 Degree distribution
In this section we will calculate the degree distribution $p^{f}_k$ in the nongiant component part of the graph. From the relation
$$p_k = \big(1 - G_0(u)\big)\, p^{g}_k + G_0(u)\, p^{f}_k \tag{29}$$
we automatically get the distribution in the giant component. This has been already done in BauerBernard (), but we find it instructive to use the same method of generating functions as described in Sec. III. The idea is to apply it only to the graph with the giant component excluded—i.e., to the finite connected components. We will use a tilde to denote the generating functions of the sought probability:
Using the argument from Ref. Newman2001 () we obtain the same equations
for the generating functions of the probabilities and . Here is the probability that a vertex belongs to a finite component of size provided that it belongs to a finite component and is the probability that a link leads into a finite component of size provided that it leads into a finite component. From this we can write the relations
which leads to
so that Eq. (31a) can be rewritten as
Comparing with (18) we see that it will be fulfilled if
Inserting this into (33b) we get
because of Eq. (24), which can be solved by putting .
From that and relation (29) we get the formula for the degree distribution in the giant component:
$$p^{g}_k = p_k\, \frac{1 - u^k}{1 - G_0(u)}.$$
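The giant-component degree distribution is easy to evaluate once $u$ is known. The sketch below uses the standard form $p_k (1 - u^k) / (1 - G_0(u))$ (cf. Ref. BauerBernard ()); the notation and the function name are assumptions made for illustration.

```python
def giant_degree_distribution(p, u):
    """Degree distribution inside the giant component for p[k] and u < 1.

    A vertex of degree k is outside the giant component only if all of its
    k links lead into finite parts, which happens with probability u**k.
    """
    g0_u = sum(pk * u**k for k, pk in enumerate(p))  # G0(u)
    return [pk * (1 - u**k) / (1 - g0_u) for k, pk in enumerate(p)]


# Example: p_1 = p_3 = 1/2, for which u = 1/3.
p = [0.0, 0.5, 0.0, 0.5]
p_giant = giant_degree_distribution(p, 1.0 / 3.0)
mean_degree_giant = sum(k * pk for k, pk in enumerate(p_giant))
```

Leaves are underrepresented in the giant component (9/22 instead of 1/2 in this example) and the mean degree rises from 2 to 24/11.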
In the limit and this reduces to
In this limit the connected giant cluster is a tree. Indeed, one can check that
To calculate we use the relation
We have already assumed that vertex degrees are uncorrelated; we further assume that this is also true for the finite connected components (nongiant) part of the graph. Assuming self-averaging and using Eq. (10) for and we obtain
In the derivation we have used the relation , which should be valid for self-averaging quantities in the large-$N$ limit. Comparing this with formulas (10) and (16) we note that the correlations disappear in the limit $u \to 0$. In the tree limit the formulas above take the form
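The two assumptions (uncorrelated full graph and uncorrelated finite part) can be turned into a small numerical sketch. It assumes that a fraction $u^2$ of the links lies in the finite part, whose degree distribution is $p_k u^k / G_0(u)$, so that subtracting the finite part from the factorized full-graph distribution gives $p^{g}_{k,q} = \frac{k p_k\, q p_q}{\langle k \rangle^2} \frac{1 - u^{k+q-2}}{1 - u^2}$. This is a reconstruction under the stated assumptions in assumed notation, not a transcription of the original formulas, and the function names are hypothetical.

```python
def joint_giant(p, u, k, q):
    """Joint nearest-neighbor degree distribution in the giant component.

    Valid for k, q >= 1 and u < 1; p[k] is the full-graph degree distribution.
    """
    mean_k = sum(j * pj for j, pj in enumerate(p))
    uncorrelated = k * p[k] * q * p[q] / mean_k**2
    return uncorrelated * (1 - u**(k + q - 2)) / (1 - u**2)


def knn_giant(p, u, k):
    """Average nearest-neighbor degree of degree-k vertices in the giant component."""
    degrees = range(1, len(p))
    marginal = sum(joint_giant(p, u, k, q) for q in degrees)
    return sum(q * joint_giant(p, u, k, q) for q in degrees) / marginal


# Example with leaves: p_1 = p_3 = 1/2 and u = 1/3.
p = [0.0, 0.5, 0.0, 0.5]
u = 1.0 / 3.0
knn_leaf = knn_giant(p, u, 1)   # neighbors of leaves
knn_hub = knn_giant(p, u, 3)    # neighbors of degree-3 vertices

# Example without leaves: p_2 = p_3 = 1/2 and u = 0.
p_no_leaves = [0.0, 0.0, 0.5, 0.5]
knn_flat = [knn_giant(p_no_leaves, 0.0, k) for k in (2, 3)]
```

In the first example two leaves are never adjacent, so the neighbor of a leaf always has degree 3, while the neighbors of degree-3 vertices have a lower average degree: the giant component is disassortative. In the second example the result is independent of the degree and equals $\langle k^2 \rangle / \langle k \rangle$.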
While deriving our formulas we have made several assumptions: (i) the vertex degrees are uncorrelated, (ii) the measured quantities are self-averaging, and of course (iii) all the derivations are only valid in the large-$N$ limit. To check to what extent those assumptions are satisfied and, more importantly, to check the magnitude of the finite-size effects, we have compared our predictions to the results of MC simulations of moderate-sized graphs (5000 vertices). To simulate ER graphs we used a straightforward algorithm which connects vertices at random. To generate maximally random graphs with a given distribution we used the method described in Refs. BurdaKrzywicki (); BogaczBurdaWaclaw () and implemented in Ref. graphgen (). This method consists of generating graphs with suitably chosen one-point weights using a Metropolis-type algorithm.
IV.1 Erdös-Rényi graphs
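For ER graphs the distribution is Poissonian, and
$$G_0(x) = G_1(x) = e^{\langle k \rangle (x - 1)}.$$
It follows that $H_0(x) = H_1(x)$, so $H_0(1) = u$ with $u$ being the positive solution closest to one (from below) of the equation
$$u = e^{\langle k \rangle (u - 1)}.$$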
The results for and are shown in Fig. 1. They are compared with the results of the MC simulations of ER graphs. The agreement is perfect, and there are no visible finite-size effects (error bars are smaller than the size of the points). The degree distribution can be now easily obtained from (41). The results are presented in Fig. 2. Again, the agreement is very good without any noticeable finite-size effects.
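A comparison of this kind can be reproduced with a short simulation. The sketch below assumes that the ER prediction follows from $u = e^{\langle k \rangle (u - 1)}$, with giant-component vertex fraction $1 - u$ and mean degree $\langle k \rangle (1 + u)$; it builds one graph with a fixed number of random links and finds its largest component by breadth-first search. The function names, the seed, and the value $\langle k \rangle = 2$ are illustrative choices, not those of the original simulations.

```python
import math
import random
from collections import deque


def solve_u_er(mean_k, tol=1e-13, max_iter=1_000_000):
    """Smallest root of u = exp(mean_k * (u - 1)), iterating from u = 0."""
    u = 0.0
    for _ in range(max_iter):
        new_u = math.exp(mean_k * (u - 1.0))
        if abs(new_u - u) < tol:
            return new_u
        u = new_u
    return u


def random_er_graph(n, n_links, rng):
    """Adjacency lists of a simple graph with n vertices and n_links random links."""
    links = set()
    while len(links) < n_links:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:  # no self-links; the set removes multiple links
            links.add((min(a, b), max(a, b)))
    adjacency = [[] for _ in range(n)]
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def largest_component(adjacency):
    """Vertices of the largest connected component (breadth-first search)."""
    seen = [False] * len(adjacency)
    best = []
    for start in range(len(adjacency)):
        if seen[start]:
            continue
        seen[start] = True
        component, queue = [start], deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if not seen[w]:
                    seen[w] = True
                    component.append(w)
                    queue.append(w)
        if len(component) > len(best):
            best = component
    return best


rng = random.Random(12345)
n, mean_k = 5000, 2.0
adjacency = random_er_graph(n, int(n * mean_k / 2), rng)
giant = largest_component(adjacency)

u = solve_u_er(mean_k)
predicted_fraction = 1.0 - u              # for ER graphs G0 = G1, so G0(u) = u
predicted_mean_degree = mean_k * (1.0 + u)
measured_fraction = len(giant) / n
measured_mean_degree = sum(len(adjacency[v]) for v in giant) / len(giant)
```

A single graph of this size is expected to reproduce the predicted values to within a few percent; averaging over many graphs would be needed for a quantitative estimate of the finite-size effects.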
In this case it may be instructive to derive those results in a simpler way: when we omit the giant component from our considerations we are left with a graph with $Nu$ vertices and $Lu^2$ links on average. As there are no further restrictions, we can assume that this graph is an Erdös-Rényi graph as well. This means that its degree distribution is again Poissonian with mean $\langle k \rangle u$:
From the relation we obtain formula (40). Finally, for we get
The results are presented in Fig. 3. One can see clearly the appearance of correlations in the giant connected component as advocated in the introduction. The agreement with the predicted values is again very good.
IV.2 Exponential degree distribution
As the second example we take graphs with exponential degree distribution
The average degree in this case is
and Newman2001 ()
This implies . The giant component appears for . The results for and are presented in Fig. 4. As in the previous example, there are no visible deviations from the theoretical predictions.
IV.3 Scale-free graphs
Probably the most interesting case is that of scale-free graphs with distribution $p_k \sim k^{-\gamma}$. While studying them we have to consider two scenarios: $\gamma \le 3$ and $\gamma > 3$. In the first case we expect correlations between node degrees, as pointed out in Refs. BurdaKrzywicki (); dogorovtsev (); bpv (); cbp (). This invalidates the derivation of both Eqs. (18) and (45). Additionally the quantity $\langle k^2 \rangle$ diverges and so $\bar{k}_{nn}(k)$ is not defined. Because our aim was to investigate the correlations appearing solely as an effect of the connectedness of graphs, we have decided not to study the case $\gamma \le 3$ in this paper. This is, however, an interesting issue and merits further investigation. One line of pursuit is to use the algorithm proposed in cbp () to generate uncorrelated graphs with heavy tails. Then one should obtain predictions at least for the joint probability $p_{k,q}$, which does not contain any divergences. One could also use the $N$-dependent “cutoff” distribution as proposed in cbp () instead of the “full” distribution $p_k$. This would yield $N$-dependent results, but may not be feasible analytically. In the case of $\gamma \le 2$ already the first moment of the distribution is not defined and the generating function approach fails completely.
When $\gamma > 3$ the second moment $\langle k^2 \rangle$ is finite and there are no correlations, at least in the infinite-size limit bpv (); cbp (). However, for finite $N$ we expect strong finite-size effects for $\gamma$ close to 3. To see this let us estimate the asymptotic behavior of $\langle k^2 \rangle$:
In the above we have assumed the natural cutoff dogorovtsev (); BurdaKrzywicki (); bpv (); cbp (). For close to 3, this converges very slowly. To observe those effects we have simulated our system at , when approaches its asymptotic value as . The results of our simulations of graphs with 5000 vertices are presented in Figs. 7 and 8. As expected the data for and distributions show strong cutoff effects around , but for smaller values of the agreement with theoretical predictions is rather good. Looking at the results for we notice two things: (i) Data for the full graph show a deviation from a straight line, indicating the presence of some correlations due to heavy tails. (ii) Data for the giant connected component show a very strong effect of correlations. The agreement with theoretical values is very poor, so we have not included them in the picture. This is due to the described cutoff effect on . We can obtain a better agreement if we use in Eq. (46) the actual value of measured in simulations instead of its infinite-volume limit.
V Connected graphs
Finally, we would like to calculate the properties of the maximally random connected graphs. To this end we assume that the ensemble of giant connected components of the maximal entropy graphs with distribution is a maximal entropy ensemble of connected graphs with distribution (we neglect the fluctuations in the number of vertices and links of the giant component). This is a plausible assumption as we do not put any additional constraints except connectivity. In Appendix B we provide a more detailed argumentation. With this assumption the properties of the maximal entropy connected random graphs with distribution and/or average degree are the same as that of the maximal entropy random graphs with distribution and/or average degree given by Eqs. (41) and (25a).
V.1 Connected ER graphs
By connected ER graphs we mean maximal entropy connected graphs with a given average degree . According to the arguments from the previous section this ensemble corresponds to the ensemble of giant components in ER graphs with average degree related by Eq. (25a). For a given we solve this equation for (numerically) and use formulas (41) and (52) for degree distribution and for respectively. The results are presented in Figs. 9 and 10 and compared with the MC data for connected graphs taken from oles (). The agreement is very good which confirms the validity of the assumption made in the previous section.
V.2 Connected random graphs with arbitrary degree distribution
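To calculate the properties of connected random graphs with arbitrary degree distribution $p^{g}_k$ we need to invert Eq. (41). This can be done by rewriting it as
$$p_k = \big(1 - G_0(u)\big)\, \frac{p^{g}_k}{1 - u^k}, \qquad k \ge 1,$$
where $u$ satisfies Eq. (24):
$$u = \frac{\sum_k k\, p^{g}_k\, u^{k-1} / (1 - u^k)}{\sum_k k\, p^{g}_k / (1 - u^k)}.$$
The above equation can be solved by a simple iteration procedure. To prove that it has a solution we rewrite it as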
It is easy to check that
So for connected graphs is positive () and negative ().
Once we know we can calculate and from the normalization of the distribution and Eq. (23):
Because , those two equations are not independent and we can set . Then,
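The inversion can be sketched as a short iteration. The code below assumes the giant-component relation $p^{g}_k = p_k (1 - u^k) / (1 - G_0(u))$ together with $u = G_1(u)$, rewritten in terms of the target distribution, and sets the fraction of isolated vertices to zero. The function name, the starting point $u = 0$, and the stopping rule are assumptions of this sketch, and convergence is not proven here.

```python
def invert_giant_distribution(p_giant, tol=1e-13, max_iter=1_000_000):
    """Find (p, u) such that the giant component of a graph with degree
    distribution p has degree distribution p_giant (with p[0] set to 0).

    p_giant[k] is the target distribution on connected graphs, p_giant[0] = 0.
    """
    def next_u(u):
        # u = G1(u), with p_k replaced by p_giant[k] / (1 - u**k) up to a constant.
        num = sum(k * pk * u**(k - 1) / (1 - u**k)
                  for k, pk in enumerate(p_giant) if k > 0)
        den = sum(k * pk / (1 - u**k)
                  for k, pk in enumerate(p_giant) if k > 0)
        return num / den

    u = 0.0
    for _ in range(max_iter):
        new_u = next_u(u)
        if new_u >= 1.0:
            raise ValueError("iteration reached u = 1 (tree limit)")
        converged = abs(new_u - u) < tol
        u = new_u
        if converged:
            break

    weights = [pk / (1 - u**k) if k > 0 else 0.0 for k, pk in enumerate(p_giant)]
    norm = sum(weights)  # equals 1 / (1 - G0(u)) when p[0] = 0
    return [w / norm for w in weights], u


# Target: connected graphs with 9/22 leaves and 13/22 degree-3 vertices.
p_target = [0.0, 9 / 22, 0.0, 13 / 22]
p_full, u = invert_giant_distribution(p_target)
```

For this target the procedure returns $u = 1/3$ and the full-graph distribution $p_1 = p_3 = 1/2$, whose giant component reproduces the target distribution.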
V.3 Simulating connected graphs
This procedure may actually be used to generate connected random graphs in an efficient way. Instead of generating connected graphs with degree distribution and checking the connectivity after every move, we can generate graphs with distribution given by (57) and use the giant connected component. This still requires calculating the connected parts, but it needs to be done only once before each measurement.
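The idea can be sketched in a few lines. Note that the sketch below does not use the Metropolis-type generator of Ref. graphgen (); as a stand-in it pairs link ends at random (stub matching) and simply drops self- and multiple links, which slightly distorts the degree sequence. All function names, the seed, and the example distribution are hypothetical.

```python
import random
from collections import deque


def sample_simple_graph(degrees, rng):
    """Pair link ends ("stubs") at random; self- and multiple links are dropped."""
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    links = set()
    for a, b in zip(stubs[0::2], stubs[1::2]):
        if a != b:
            links.add((min(a, b), max(a, b)))
    adjacency = [[] for _ in degrees]
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def giant_component(adjacency):
    """Vertices of the largest connected component (breadth-first search)."""
    seen = [False] * len(adjacency)
    best = []
    for start in range(len(adjacency)):
        if seen[start]:
            continue
        seen[start] = True
        component, queue = [start], deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if not seen[w]:
                    seen[w] = True
                    component.append(w)
                    queue.append(w)
        if len(component) > len(best):
            best = component
    return best


def connected_graph_from_giant(p, n, rng):
    """Draw n degrees from p, build a random graph, keep its largest component.

    Connectivity is computed once, after the graph has been generated.
    """
    degrees = rng.choices(range(len(p)), weights=p, k=n)
    if sum(degrees) % 2:  # the number of stubs must be even
        degrees[0] += 1
    adjacency = sample_simple_graph(degrees, rng)
    return {v: adjacency[v] for v in giant_component(adjacency)}


rng = random.Random(2008)
n = 3000
# Full-graph distribution p_1 = p_3 = 1/2; its giant component should
# contain about 22/27 of the vertices.
graph = connected_graph_from_giant([0.0, 0.5, 0.0, 0.5], n, rng)
giant_fraction = len(graph) / n
```

The returned dictionary maps each vertex of the giant component to its neighbors, so the result is connected by construction. For precise measurements one would replace the stub-matching step with a sampler that produces the maximal entropy ensemble exactly.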
As an example, we have generated connected maximally random graphs with Poissonian degree distribution
with . For this distribution , , and . Using the program graphgen () we have simulated a maximally random graph with vertices and links with degree distribution (57). We generated 10 000 independent graphs. The average size of the giant component was with standard deviation . The degree distribution in the connected component agrees very well with the desired one, as can be seen in Fig. 11.
VI Uncorrelated connected graphs
An interesting situation arises when $p_1 = 0$; i.e., vertices with degree 1 (leaves) are forbidden. Then $u = 0$ and $G_0(u) = p_0$. This means that the resulting graph consists of one giant connected component and isolated vertices only. It is easy to understand: finite connected components are trees, but there are no trees without leaves, except the degenerate ones made of a single vertex. If we additionally set $p_0 = 0$ then we will obtain a graph containing only the giant component, i.e., a connected graph.
But as observed in Sec. III.3, $u = 0$ implies the absence of correlations. That would support our argument made in the Introduction about the role of the one-degree vertices in the appearance of correlations in a connected graph. Using the results of the previous section we can state that vertex degrees in the maximal entropy random graphs are uncorrelated if and only if $p_1 = 0$; i.e., there are no leaves in the graph.
As a check, we have carried out simulations with the exponential degree distribution and no leaves:
for (). The results for the giant component, which consisted on average of more than of the whole graph, are presented in Figs. 5 and 6 (squares). As predicted, vertices are uncorrelated, in stark contrast to the case plotted in the same figures.
We have also performed simulations for the scale-free distribution and no leaves. The results are presented in Figs. 7 and 8 (squares). We see that correlations are very much suppressed compared to the case when we admit leaves (presented in the same figures). The slight remaining correlation is due to long tails as explained in Sec. IV.3.
In this paper we have studied the correlations in connected random graphs. We have extended the results of Refs. Newman2001 (); BauerBernard (); fronczak () and calculated correlations in the giant connected components of random graphs. We argue that those correlations are related to the presence of nodes with degree 1, suggesting that the only cause of correlations is the absence of “hedgehogs.” This has been already stated in pb1 () where it has been shown that in the grand-canonical ensemble of arbitrary-sized trees, where “hedgehogs” appear, correlations vanish. We find this to be a very interesting issue that merits further studies.
The correlations observed in connected random graphs are an example of the so-called “structural” or “kinematic” correlations, as they appear in consequence of some global constraint. This should be contrasted with “dynamic” correlations which are the result of local two-point interactions between vertices. Such correlations may be generated by two-point weights pb2 (). This distinction can be important in simplicial quantum gravity where degree-degree correlations are interpreted as curvature-curvature correlations (see, for example, SmitBaker ()). However, as the simplicial manifolds are connected by definition those correlations are due to the above described mechanism rather than to some kind of gravitational interaction pb1 (); bbpt (). We believe that our results may help in clarifying such issues and in the interpretation of data obtained from MC simulations.
Finally, we have shown how to relate the giant connected components to the maximal entropy connected graphs ensemble. This allowed us to propose an efficient method for generating connected random graphs based on the Metropolis algorithm.
Acknowledgements. We would like to thank Zdzislaw Burda, Jerzy Jurkiewicz, Andrzej Krzywicki, and Bartłomiej Wacław for valuable discussions. This work was supported by KBN Grant No. 1P03B-04029 and EU Grants Nos. MTKD-CT-2004-517186 (COCOS) and MRNT-CT-2004-005616 (ENRAGE).
Appendix A Non-self-averaging ensemble
Denoting by the ensemble of all simple regular graphs with vertices and degree (in a regular graph all vertices have the same degree), we define
where denotes the number of graphs in the ensemble and is an arbitrary probability distribution. With this definition we find
It is easy to note that this poorly describes the distributions of single graphs which are just ’s. The variance of is
and indeed does not disappear in the large- limit.
For correlations we obtain
So the condition (10) is not satisfied. It means that vertices on each particular graph are uncorrelated, but correlated if the whole ensemble is considered. This is easy to explain: if we pick a link from a graph with a given degree $r$, then the information about the first vertex does not provide any additional information; however, if we do not know $r$, then the degree of the first vertex will give us immediately the value of $r$.
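The argument can be made concrete with a small numerical sketch. It assumes that the ensemble is a mixture of regular graphs in which a graph with common degree $r$ is drawn with total probability $P(r)$, so that every directed link on such a graph joins two vertices of degree $r$. The dictionary `P` and the function name are hypothetical.

```python
def regular_mixture_statistics(P):
    """Ensemble-level joint distribution for a mixture of regular graphs.

    P[r] is the probability of drawing an r-regular graph. Returns the joint
    distribution p_kq and the product of its marginals for comparison.
    """
    # On an r-regular graph all links join degrees (r, r).
    p_kq = {(k, q): (P[k] if k == q else 0.0) for k in P for q in P}
    marginal = {k: sum(p_kq[(k, q)] for q in P) for k in P}
    product = {(k, q): marginal[k] * marginal[q] for k in P for q in P}
    return p_kq, product


# Equal mixture of 3-regular and 4-regular graphs.
P = {3: 0.5, 4: 0.5}
p_kq, product = regular_mixture_statistics(P)

# Variance of n_k / N over the ensemble: n_k / N is either 0 or 1 on each
# graph, so the variance P(k) (1 - P(k)) does not depend on the graph size.
variance = {k: P[k] * (1.0 - P[k]) for k in P}
```

The ensemble joint distribution is diagonal while the product of its marginals is not, although on every single graph of the mixture the factorization holds trivially.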
Appendix B Entropy of the giant connected components
Let and define a maximal entropy ensemble with vertices, links, and vertex degree distribution . We assume that the probability factorizes:
where are the connected components of the graph .
Let denote the ensemble of all giant connected components. We assume that we can neglect the fluctuations, so all the graphs in this ensemble have vertices and links. The degree distribution in this ensemble is . Because of the property (70), the entropy (1) of the whole ensemble is the sum of the entropy of the giant connected component ensemble and the rest:
Now we assume that there exists a probability defined on the ensemble such that the entropy
is greater than , but the vertex degree probability distribution remains unchanged. Then we can define a new probability on the ensemble :
where is the giant connected component of graph . The degree distribution of the ensemble would be the same as that of ensemble, but according to (71), its entropy would be greater. This contradicts the assumption that is the maximal entropy ensemble and proves that the ensemble of giant connected components is a maximal entropy ensemble.
- (1) R. Albert and A.-L. Barabasi, Rev. Mod. Phys. 74, 47 (2002).
- (2) P. Erdös and A. Rényi, Publ. Math. 6, 290 (1959); Publ. Math. Inst. Hung. Acad. Sci. 5, 17 (1961).
- (3) R. Albert, H. Jeong, and A.-L. Barabasi, Nature (London) 401, 130 (1999).
- (4) A.-L. Barabasi and R. Albert, Science 286, 509 (1999).
- (5) M. Molloy and B. Reed, Random Struct. Algorithms 6, 161 (1995); Combinatorics, Probab. Comput. 7, 295 (1998).
- (6) M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Phys. Rev. E 64, 026118 (2001).
- (7) Z. Burda, J. D. Correia, and A. Krzywicki, Phys. Rev. E 64, 046118 (2001).
- (8) M. Bauer and D. Bernard, e-print arXiv:cond-mat/0206150.
- (9) Z. Burda and A. Krzywicki, Phys. Rev. E 67, 046118 (2003).
- (10) A. Fronczak, P. Fronczak, and J. Hołyst, in Science of Complex Networks: From Biology to the Internet and WWW; CNET 2004, edited by J. F. F. Mendes et al., AIP Conf. Proc. No. 776 (AIP, Melville, NY, 2005), p. 52. In this reference is denoted by .
- (11) P. Bialas, Phys. Lett. B 373, 289 (1996).
- (12) A. K. Oleś, Master’s thesis (in Polish), Jagiellonian University, 2006.
- (13) M. Serrano, A. Maguitman, M. Boguñá, S. Fortunato, and A. Vespignani, ACM Trans. Web 1, 10 (2007).
- (14) R. Pastor-Satorras, A. Vazquez, and A. Vespignani, Phys. Rev. Lett. 87, 258701 (2001).
- (15) L. Bogacz, Z. Burda, and B. Wacław, Physica A 366, 587 (2006).
- (16) L. Bogacz, Z. Burda, W. Janke, and B. Wacław, Comput. Phys. Commun. 173, 162 (2005).
- (17) S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Phys. Rev. E 63, 062101 (2001).
- (18) M. Boguñá, R. Pastor-Satorras, and A. Vespignani, Eur. Phys. J. B 38, 205 (2004).
- (19) M. Catanzaro, M. Boguñá, and R. Pastor-Satorras, Phys. Rev. E 71, 027103 (2005).
- (20) P. Bialas, Nucl. Phys. B 575, 645 (2000).
- (21) B. V. de Bakker and J. Smit, Nucl. Phys. B 454, 343 (1995).
- (22) P. Bialas, Z. Burda, B. Petersson, and J. Tabaczek, Nucl. Phys. B 495, 463 (1997).