Generation of arbitrarily two-point correlated random networks

Sebastian Weber Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 8, 64289 Darmstadt, Germany    Markus Porto Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 8, 64289 Darmstadt, Germany
July 3, 2019
Abstract

Random networks are intensively used as null models to investigate properties of complex networks. We describe an efficient and accurate algorithm to generate arbitrarily two-point correlated undirected random networks without self- or multiple-edges among vertices. With the goal of systematically investigating the influence of two-point correlations, we furthermore develop a formalism to construct a joint degree distribution which allows one to fix an arbitrary degree distribution and an arbitrary average nearest neighbor function simultaneously. Using the presented algorithm, this formalism is demonstrated with scale-free networks and empirical complex networks (degree distribution taken from the network) as examples. Finally, we generalize our algorithm to annealed networks, which allows networks to be represented in a mean-field-like manner.

pacs:
89.75.Hc, 05.40.–a

I Introduction

The fast developing research field of complex networks Albert and Barabási (2002); Dorogovtsev and Mendes (2002) focuses on three main aspects: (i) measuring network topology, (ii) investigating dynamics on networks, and (iii) studying the interplay between dynamical processes on networks and the network topology. Surprisingly, empirical networks from a vast variety of scientific fields share many characteristic features. Prominent examples are the small-world property Watts and Strogatz (1998), high clustering Newman (2003a), and the scale-free degree distribution Barabási and Albert (1999). One possibility to unravel the properties of empirical networks is to compare them to null models. Appropriate null models are random networks that preserve some of the statistical features of the empirical network under investigation. This idea gave birth to the well-known configuration model (CM) algorithm Bender and Canfield (1978); Bollobas (1980); Molloy and Reed (1995, 1998); Catanzaro et al. (2005a), which is capable of generating random networks with an a priori given degree distribution. Some extensions to this model have been proposed to conserve statistical properties beyond the plain degree distribution, for instance the degree-dependent clustering coefficient Serrano and Boguñá (2005).

A fundamental way to categorize and distinguish empirical networks beyond the degree distribution and clustering has been proposed by Newman Newman (2003b, 2002), who introduced the Newman factor. This number is basically the Pearson correlation coefficient of the degrees (the number of edges emanating from a vertex) of connected vertices in a network and is therefore fully defined by the two-point correlations in a network. The Newman factor lies in the interval [-1, 1], where positive (negative) values indicate that vertices with the same (different) degree tend to be connected, while a value of 0 means no correlation. Practically all empirical networks show a non-trivial two-point correlation structure. An astonishing observation is, for example, that biological networks show negative Newman factors, technological networks display rather small values close to zero, and social networks tend to have rather large positive values Newman (2003c). The evident importance of correlations within the degree distribution has led to many efforts: for example, a hidden variable approach has been developed in Ref. Boguñá and Pastor-Satorras (2003), and so-called dK-series networks, which systematically describe the full correlation structure of a network, have been introduced in Ref. Mahadevan et al. (2006) together with an algorithm for the lowest classes of the series. Thus, an efficient random network generator which constructs null model networks on the basis of an a priori prescribed two-point correlation structure is very important. Such a generator is presented below; it allows one to construct undirected random networks with a prescribed two-point correlation structure and hence much more realistic null models. The major advantage of our generator in comparison with similar algorithms previously introduced Boguñá and Pastor-Satorras (2003); Mahadevan et al. (2006); Vázquez and Weigt (2003) is its high accuracy and the generality of the approach, which allows one to construct networks with an arbitrary two-point correlation structure. As an application of this scheme and in order to investigate the influence of two-point correlations within empirical networks, we address the question of how one can model two-point correlations while preserving the degree distribution of a network. This is fundamental, for instance, in order to shed light on the interplay between dynamical processes on networks and the underlying network topology with respect to two-point correlations.

The modeling of two-point correlations is especially interesting for the verification of theoretical predictions from theories describing dynamical processes on networks which do incorporate two-point correlations. Due to the small-world effect present in networks, it is common to use a mean-field (MF) ansatz. Hence, within these theories the network is modeled using a probabilistic approach, and vertices are only connected to each other with a certain probability. The idea to represent a network by probabilities has already been brought up in the context of Kauffman’s model of random complex automata Derrida and Pomeau (1986); Bastolla and Parisi (1996). This so-called annealed network changes in every time step such that all edges are redistributed. A similar approach has recently been applied by Stauffer and Sahimi to scale-free networks to study the effect of ‘annealed disorder’ on a diffusion process Stauffer and Sahimi (2005). Such annealed networks are ideally suited to test the validity of MF theories of dynamics on networks. We extend this approach below by generalizing our algorithm to allow for the construction of two-point correlated annealed networks.

This paper is organized as follows: Section II introduces the network correlation measures used in this paper. Section III describes the algorithm to construct arbitrarily two-point correlated networks. Section IV develops a formalism which allows one to fix a degree distribution and to arbitrarily choose the two-point correlations at the same time. The formalism is demonstrated with scale-free networks and empirical networks as examples. Section VI introduces the notion of a two-point correlated annealed network. We conclude and give an outlook in section VII.

II Correlation Measures

The following is a short summary of common definitions, adapted to our purposes, which will be used frequently within this paper. Two-point correlations are statistically described by the joint degree distribution, which is the probability that a randomly chosen edge of the network connects vertices with two given degrees at its ends. This distribution is a symmetric function of the two degrees in the case of undirected networks. By summing over either argument of the joint degree distribution, one obtains the distribution over edge ends,

(1)

which is related to the distribution of vertices by

(2)

This last relation (2) between the edge end distribution and the degree distribution can easily be understood by the fact that every vertex with degree has probability of being drawn at random from the network. Therefore, the probability to draw an edge end connected to a vertex of degree is proportional to . Normalizing this last expression yields the edge end distribution . Here, denotes the mean with respect to the degree distribution . This mean has to be carefully distinguished from the mean with respect to the edge end distribution which we denote by . It is convenient Song et al. (2006) to extract the actual correlations from by relating it to the uncorrelated case , which has the special product form

(3)

By taking the ratio between and , this defines

(4)

as a correlation function.
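For orientation, the relations referred to as Eqs. (1) to (4) can be written out as follows. The symbols are notation assumed for this sketch and are not taken from the text: $P(j,k)$ for the joint degree distribution, $P_e(k)$ for the edge end distribution, $P(k)$ for the degree distribution, $P_u(j,k)$ for the uncorrelated joint degree distribution, $f(j,k)$ for the correlation function, and $\langle k \rangle$ for the mean degree. The forms themselves follow from the verbal definitions above.

```latex
\begin{align}
P_e(k)   &= \sum_{j} P(j,k), \qquad P(j,k) = P(k,j), \tag{1}\\
P_e(k)   &= \frac{k\,P(k)}{\langle k \rangle}, \tag{2}\\
P_u(j,k) &= P_e(j)\,P_e(k), \tag{3}\\
f(j,k)   &= \frac{P(j,k)}{P_u(j,k)} = \frac{P(j,k)}{P_e(j)\,P_e(k)}. \tag{4}
\end{align}
```

In this notation, an uncorrelated network has $f(j,k) = 1$ for all pairs of degrees.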

However, the joint degree distribution and the correlation function are complex functional objects which are hard to visualize. A way to quantify the overall correlation present in a network was introduced by Newman Newman (2002). He defined the Newman factor to be the Pearson correlation coefficient of the remaining degrees of the two vertices at either end of a randomly chosen edge. The use of the remaining degree, which is the actual degree of a vertex minus one, is only an arithmetic trick to suppress some terms in the calculations performed by Newman. In this paper, we directly use the degrees of the vertices, which is equivalent to Newman’s definition in the limit of large networks,

(5)

The Newman factor is normalized by the variance of the edge end distribution to fall into the range [-1, 1]. A positive (negative) value means that vertices preferentially attach to vertices with a degree of the same (different) order, which is referred to as assortative (disassortative) mixing. A vanishing Newman factor is obtained in the case of no correlation, which can be seen by substituting the uncorrelated joint degree distribution of Eq. (3) into Eq. (5). It is clear that the Newman factor quantifies the correlations present in a network only on a global scale. An intermediate approach, on the level of degrees, has been introduced in Ref. Boguñá et al. (2003) with the average nearest neighbor function. Using the conditional probability

(6)

which is the probability that a randomly chosen neighbor of any vertex with degree has the degree , one defines to be

(7)

In the case of an assortative (disassortative) network, the average nearest neighbor function has to be an increasing (decreasing) function of the degree, while it is constant for uncorrelated networks. It is interesting to note that

(8)

is generally valid, which can be seen by plugging Eq. (6) into Eq. (7) and averaging the resulting equality over the degree with respect to the edge end distribution.
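The measures summarized in this section can be sketched in a few lines of Python. This is a minimal illustration, assuming the network is given as a list of undirected edges without self- or multiple-edges; the function name `correlation_measures` and the dictionary-based representation are choices made here, not part of the original work.

```python
from collections import Counter

def correlation_measures(edges):
    """Measure two-point correlations of an undirected simple graph.

    Returns (P, q, f, knn, r): joint degree distribution P[(j, k)], edge end
    distribution q[k], correlation function f[(j, k)], average nearest
    neighbor function knn[k], and Newman factor r (computed from degrees).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Count every edge in both orientations so that P(j, k) = P(k, j).
    counts = Counter()
    for u, v in edges:
        counts[(degree[u], degree[v])] += 1
        counts[(degree[v], degree[u])] += 1
    ends = 2 * len(edges)
    P = {jk: c / ends for jk, c in counts.items()}

    # Edge end distribution: marginal of the joint degree distribution.
    q = Counter()
    for (j, _k), w in P.items():
        q[j] += w
    q = dict(q)

    # Correlation function: ratio to the uncorrelated product form.
    f = {(j, k): w / (q[j] * q[k]) for (j, k), w in P.items()}

    # Average nearest neighbor function from P(k | j) = P(j, k) / q(j).
    knn = {}
    for (j, k), w in P.items():
        knn[j] = knn.get(j, 0.0) + k * w / q[j]

    # Newman factor: Pearson coefficient of the degrees at the edge ends.
    mean = sum(k * w for k, w in q.items())
    var = sum(k * k * w for k, w in q.items()) - mean ** 2
    cov = sum(j * k * w for (j, k), w in P.items()) - mean ** 2
    r = cov / var if var > 0 else float("nan")
    return P, q, f, knn, r

# Star graph with one hub (degree 3) and three leaves (degree 1).
P, q, f, knn, r = correlation_measures([(0, 1), (0, 2), (0, 3)])
```

For the star graph every edge joins the hub to a leaf, so the sketch yields a Newman factor of -1, and the average of `knn` with respect to `q` equals the mean of the edge end distribution, as stated by the identity of Eq. (8). For a regular graph the variance vanishes and the Newman factor is undefined, which the sketch reports as `nan`.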

III Algorithm

The well-known CM algorithm Bollobas (1980); Bender and Canfield (1978); Molloy and Reed (1998, 1995) fixes a priori a degree sequence which is usually drawn from a given degree distribution. Each element of this degree sequence is the number of desired edges emanating from a vertex. These may be thought of as half-edges which still need to be joined with half-edges of other vertices. To construct the network, the CM algorithm may be implemented by placing all half-edges of all vertices into a single list, which is a discrete representation of the edge end distribution. An edge is formed by selecting two random members of that list. If the constraint of neither self- nor multiple-edges is met, the edge is created and the two half-edges are removed from the list. As the first and the second draw are done from the same list or, equivalently, each draw is done independently with the edge end distribution, the resulting network is always uncorrelated. Only the constraint of self- and multiple-edge prevention induces some intrinsic correlations, which can be avoided if the maximal degree is limited (cf. section IV.1). The CM algorithm paired with the correct choice of the maximal degree is also known as the uncorrelated CM (UCM) algorithm Catanzaro et al. (2005a). However, almost all empirical networks do display two-point correlations in their topology. The algorithm discussed below allows one to fix a priori an arbitrary joint degree distribution and generates a network which is completely random in all other topological aspects, just as the CM algorithm does with respect to the degree distribution.

A major computational complication arises from the fact that the probabilities in the joint degree distribution matrix may become very small, since the probability for a single edge is of the order of the inverse number of edges, which is computationally hard to handle for large networks. Due to this problem, we sample in a first step a half-edge with the usual edge end distribution; in a second step, we sample a half-edge from the conditional probability distribution. These two distributions are much easier to sample, as they result from sums over the joint degree distribution and therefore contain larger probabilities.

The overall scheme of the algorithm to construct a network with vertices and a given joint degree distribution is the following:

  1. As in the CM algorithm, one first has to draw a degree sequence by calculating the theoretical (continuous) edge end distribution from the joint degree distribution and transform that into a degree distribution . From this distribution, a degree sequence of length is drawn.

  2. Each element of the degree sequence represents a vertex. All vertices with the same degree are then sorted into degree classes, each containing only vertices of the same degree .

  3. To compensate for discretization effects caused by the finiteness of the sampled network, one has to calculate the discrete edge end distribution from the generated degree sequence. To do so, one acquires, by estimating the size of each degree class, the discrete degree distribution , which corresponds to a discrete edge end distribution by .

  4. Next, the discrete conditional probability is set up. To obtain a matrix which accommodates the discretization effects, one replaces the continuous edge end distributions in the definition of the conditional probability distribution of Eq. (6) by the discrete edge end distributions and obtains therefore

    (9)

    Since we mix the discrete edge end distribution and the continuous correlation function, the resulting conditional degree distribution is only approximately normalized for a given degree class. To obtain a conditional probability distribution suitable for sampling degree classes, we normalize each degree class separately, leading to the final form

    (10)

    This definition is consistent with the limit of infinite network size, as the discrete edge end distribution becomes equal in this limit to the continuous edge end distribution and the ratios of the two become exactly 1.

  5. After all base data structures have been initialized, the algorithm starts to draw edges by drawing edge ends. The first edge end is selected by first drawing a degree class from the edge end distribution and then randomly choosing a vertex from that degree class.

  6. The second end of the edge is chosen in the same two step manner. However, the first draw of a degree class is done with the appropriate conditional probability distribution instead of the edge end distribution . This construction scheme yields correctly correlated graphs, since we have

    (11)

    An edge is created whenever the constraints of neither self- nor multiple-edges are met. Otherwise, the drawn edge is rejected and the algorithm continues with step five.

  7. If the edge is created, the probability weights of the two edge ends are removed from the corresponding degree classes in the edge end distribution and the conditional probability distribution matrix . The removal of the probability weight is equivalent to the removal of the two half-edges from the list of eligible half-edges in the CM algorithm.

  8. The steps five to seven are repeated until no edge ends are left and all edges are formed.
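The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the degree sequence is taken as given (step 1 is assumed done), the correlation function is passed as a hypothetical non-negative callable `f(j, k)`, class weights are kept as counts of unused half-edges so that the weighted draws normalize the distributions implicitly, a vertex within a class is chosen in proportion to its unused half-edges, and a cap on consecutive rejections (`max_rejects`) is added so that the loop always terminates.

```python
import random

def build_correlated_network(degrees, f, rng=None, max_rejects=10_000):
    """Sketch of steps 2-8: draw edges degree class by degree class.

    degrees     -- prescribed degree sequence (step 1 is assumed done)
    f           -- hypothetical callable f(j, k) >= 0, symmetric in j and k
    max_rejects -- assumed safeguard: stop after this many consecutive rejections
    """
    rng = rng or random.Random()
    free = list(degrees)                       # unused half-edges per vertex
    classes = {}                               # step 2: degree classes
    for v, k in enumerate(degrees):
        if k > 0:
            classes.setdefault(k, []).append(v)
    ks = sorted(classes)
    # Step 3: unnormalized discrete edge end distribution (free half-edges).
    weight = {k: k * len(classes[k]) for k in ks}
    edges = set()
    rejects = 0

    def pick_vertex(k):
        # Assumption: choose within a class in proportion to free half-edges.
        cand = [v for v in classes[k] if free[v] > 0]
        return rng.choices(cand, weights=[free[v] for v in cand])[0]

    while sum(weight.values()) >= 2 and rejects < max_rejects:
        # Step 5: first end from the edge end distribution.
        j = rng.choices(ks, weights=[weight[k] for k in ks])[0]
        u = pick_vertex(j)
        # Step 6: second end from the conditional distribution; the weighted
        # draw normalizes each degree class implicitly (cf. Eq. (10)).
        cond = [f(j, k) * weight[k] for k in ks]
        if sum(cond) <= 0:
            rejects += 1
            continue
        k = rng.choices(ks, weights=cond)[0]
        v = pick_vertex(k)
        edge = (min(u, v), max(u, v))
        if u == v or edge in edges:            # self- or multiple-edge
            rejects += 1
            continue
        # Step 7: create the edge and remove the used probability weight.
        edges.add(edge)
        rejects = 0
        for x, kx in ((u, j), (v, k)):
            free[x] -= 1
            weight[kx] -= 1
    return edges

# Uncorrelated case: a constant correlation function.
degree_sequence = [3, 2, 2, 2, 2, 1, 1, 1]
edges = build_correlated_network(degree_sequence, lambda j, k: 1.0,
                                 rng=random.Random(1))
```

With a constant correlation function the sketch reduces to a configuration-model-like draw. Because of the rejection cap, a run may end with a few half-edges unmatched; every returned edge set is nevertheless free of self- and multiple-edges and never exceeds the prescribed degrees. The linear scan in `pick_vertex` is chosen for brevity and is not meant to reflect the cost estimate discussed in the text.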

The principal numerical costs of the algorithm arise from the continuous sampling of degree classes in the steps five and six above. Since the algorithm has to sample only the degree classes actually realized, whose number is significantly lower than the system size , the numerical costs are of the order with . Furthermore, due to the removal of the probability weight of used half-edges throughout the construction procedure, the algorithm samples only the possible configuration space which remains valid in each iteration step, just as in the CM algorithm. The memory usage of the algorithm scales with the square of the number of realized degree classes. This can become a significant advantage over the CM procedure as described above, since the memory usage of the CM procedure scales with the number of half-edges needed to construct the network.

To validate our algorithm, we use three empirical networks as test cases: (i) a social network in which the vertices are actors and an edge is assigned between two actors if they performed together in at least one movie Barabási and Albert (1999); (ii) a subset of the WWW containing web pages which are connected if there exists a link between them Albert et al. (1999); (iii) the yeast protein-interaction network, whose vertices are proteins Jeong et al. (2001). The data has been downloaded from Barabási’s web site http://www.nd.edu/~networks. All self- and multiple-edges were removed from each network. The actor network is assortatively, the WWW network weakly, and the yeast protein-interaction network disassortatively correlated. To test the correctness of the algorithm, one measures the joint degree distribution of the base networks and uses this function as input for the construction algorithm. The resulting random network has to display the same degree distribution and joint degree distribution as the empirical one. A very sensitive test of whether the correlation structures of the reference and the random network indeed match is on the level of the correlation function, which varies on a much smaller scale than the joint degree distribution. Thus, we compare the reference correlation function, obtained from the empirical network, with the correlation function of the network generated by the algorithm by means of a correlation coefficient (1 means total agreement, -1 indicates that the two functions are of opposite sign, and 0 means no correlation between the two functions). This comparison reveals almost complete agreement for all three networks. A density plot of the reference correlation function versus the resulting correlation function in Fig. 1 verifies the excellent agreement. The plot shows the corresponding values of the two correlation functions for all pairs of indices on the two axes. Ideally, all data points would lie on the diagonal, which would be the case if the two functions were identical, and the density plot would show a delta-shaped line along the diagonal. As one can see from the plots, the highest density of points, which is indicated by darker red, is almost solely centered at the diagonal. Just as the correlation functions coincide, the degree distributions show the same very good agreement, which is illustrated in Fig. 2. The statistics per curve are randomized realizations for the actor-, for the WWW- and for the yeast-network in both figures.
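The comparison of two correlation functions by a correlation coefficient can be sketched as below. The text does not specify how the coefficient is evaluated, so this sketch assumes an unweighted Pearson coefficient over all degree pairs present in both functions; the name `pearson_between` and the dictionary representation are illustrative choices.

```python
from math import sqrt

def pearson_between(f_ref, f_gen):
    """Unweighted Pearson coefficient of two correlation functions,
    evaluated on the degree pairs that occur in both (an assumption)."""
    keys = sorted(set(f_ref) & set(f_gen))
    x = [f_ref[key] for key in keys]
    y = [f_gen[key] for key in keys]
    n = len(keys)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy data: a reference function, an identical copy, and a mirrored copy.
f_ref = {(1, 1): 0.5, (1, 2): 1.2, (2, 2): 0.9}
same = pearson_between(f_ref, dict(f_ref))
mirrored = pearson_between(f_ref, {key: 2.0 - val for key, val in f_ref.items()})
```

Identical functions give a coefficient of 1 and a mirrored function gives -1. The sketch does not guard against a constant function, for which the coefficient is undefined.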

Figure 1: (color online) Density plot of the correlation function of the empirical network versus the correlation function of the corresponding random network as generated by the algorithm for all indices and . Darker red regions contain a higher density of data points, while lighter red indicates a lower density. The reference line is drawn as a guide to the eye.
Figure 2: (color online) Degree distribution of empirical networks and their corresponding degree distribution as generated by the algorithm. The red squares denote the reference points as measured from the empirical networks and the black circles mark values measured from the randomized networks.

IV Controlling Correlations in Networks

The algorithm described in this paper constructs undirected random networks with an arbitrary two-point correlation structure. This allows us to test explicitly the influence of two-point correlations present in a network on its properties. For example, being able to control the two-point correlation structure of a network allows one to directly test its influence on dynamical processes taking place on the network. We therefore aim at developing a formalism which allows one to control the two-point correlations of a whole network in terms of the average nearest neighbor degree and the Newman factor, given a fixed degree distribution.

As we want to preserve a given degree distribution , which translates into a given edge end distribution , while varying the joint degree distribution , some restrictions apply to the joint degree distribution. We begin with an ansatz by writing the joint degree distribution in product form as in Eq. (4),

(12)

It is clear that the correlations in the network are encoded by this ansatz within the correlation function . The relation to the Newman factor from the definition Eq. (5) is

(13)

By the double-index notation for the average, we indicate that the average with respect to the edge end distribution is to be taken simultaneously over both indices, in the same way as the single-index average is taken over one index. The correlation function is also tightly connected to the average nearest neighbor degree function. Using that the conditional probability is the product of the correlation function and the edge end distribution, the definition of Eq. (7) turns into

(14)

Multiplying the average nearest neighbor function by the degree and the edge end distribution and summing over all degrees, we are led to

(15)

which we can substitute into Eq. (13), leading us finally to

(16)

From the constraint of a given degree distribution it follows that an integration over either argument of the joint degree distribution has to be equal to the corresponding edge end distribution (or ). Thus, the correlation function has to fulfill the condition,

(17)

which means

(18)

The considerations so far are general. However, as we want to control correlations within the network, we seek an explicit correlation function which has the property of Eq. (18) and produces a joint degree distribution which yields a given average nearest neighbor degree function. To do so, we make a simple ansatz for the correlation function

(19)

This functional form may be understood as a series expansion of first order, fulfilling the necessary symmetry property that the correlation function has to be invariant under exchange of its two indices. Plugging this ansatz into Eq. (14) takes us to

(20)

which means that

(21)

The constant can easily be calculated by multiplying Eq. (21) with and summing over all . Rearranging the terms then yields

(22)

Finally, the correlation function has the form

(23)

Applying condition (18) to the ansatz in Eq. (19) yields

(24)

This property is consistent with the functional form in Eq. (21), since its average with respect to the edge end distribution yields zero by Eq. (8). Eq. (8) furthermore helps to construct valid average nearest neighbor functions with an arbitrary functional dependence upon the degree. Taking a sufficiently smooth and positive weighting function, the corresponding average nearest neighbor function compatible with Eq. (8) is then

(25)
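As a reading aid, one chain of relations consistent with the derivation described in Eqs. (19) to (25) is written out below. The notation is assumed for this sketch: $P_e(k)$ is the edge end distribution, $\langle\cdot\rangle_e$ the average with respect to it, $k_{nn}(k)$ the average nearest neighbor function, $h(k)$ the first-order function of the ansatz, and $g(k)$ the weighting function. In particular, the product form $1 + h(j)\,h(k)$ of the ansatz is an assumption inferred from the description as a symmetric first-order expansion, and the equation tags are an assumed mapping to the numbering in the text.

```latex
\begin{align}
f(j,k) &= 1 + h(j)\,h(k), \tag{19}\\
k_{nn}(j) &= \sum_k k\,f(j,k)\,P_e(k)
           = \langle k\rangle_e + h(j)\,\langle k\,h\rangle_e, \tag{20}\\
h(j) &= \frac{k_{nn}(j)-\langle k\rangle_e}{\langle k\,h\rangle_e}, \tag{21}\\
\langle k\,h\rangle_e^{2} &= \langle k\,k_{nn}\rangle_e-\langle k\rangle_e^{2}, \tag{22}\\
f(j,k) &= 1+\frac{\bigl[k_{nn}(j)-\langle k\rangle_e\bigr]\bigl[k_{nn}(k)-\langle k\rangle_e\bigr]}
                 {\langle k\,k_{nn}\rangle_e-\langle k\rangle_e^{2}}, \tag{23}\\
\langle h\rangle_e &= 0, \tag{24}\\
k_{nn}(k) &= \langle k\rangle_e\,\frac{g(k)}{\langle g\rangle_e}. \tag{25}
\end{align}
```

Under these assumptions, Eq. (24) follows from requiring $\sum_k f(j,k)\,P_e(k) = 1$, and it is satisfied by Eq. (21) because $\langle k_{nn}\rangle_e = \langle k\rangle_e$. The final form (23) depends only on the product $h(j)\,h(k)$ and therefore remains real when the denominator is negative, as for decreasing $k_{nn}$.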

However, the resulting correlation function is still constrained by even further conditions Lee et al. (2006); Dorogovtsev et al. (2005); Boguñá et al. (2004). For example, the ratio as introduced in Ref. Boguñá et al. (2004) is defined as the actual number of connections () divided by the maximal number of connections among the degree classes and . For networks without multiple edges this ratio is given by

(26)

It is clear that this ratio must always be in the range between 0 and 1 for all valid degree classes present in the network,

(27)

From this condition the admissible degree range becomes dependent upon the details of the correlation function . To proceed, we choose as an example the average nearest neighbor function to be a power law , as this functional form roughly approximates the measured average nearest neighbor function of various empirical networks. Using this ansatz, one obtains the final form of the correlation function as

(28)

Up to this point, the degree distribution or, equivalently, the edge end distribution is still arbitrary, as it enters Eq. (28) only via the averages used in the definition of the correlation function. Nevertheless, the range of the exponent is limited, since the condition of Eq. (27) has to be fulfilled. A further complication arises from intrinsic correlations caused by the constraint of the absence of self- and multiple-edges. In the following we discuss these issues for scale-free networks and empirical networks in detail.
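The construction of a correlation function from a power-law average nearest neighbor function can be sketched numerically as follows. This is an illustration under assumptions: the edge end distribution is passed as a dictionary `q`, the correlation function is built in the form one obtains from a symmetric first-order ansatz, and the ratio check uses the form of Ref. Boguñá et al. (2004) as recalled here, namely the mean degree times the joint degree distribution divided by the minimum of j P(j), k P(k), and N P(j) P(k). All function names are hypothetical.

```python
def powerlaw_correlation(q, alpha):
    """Correlation function for k_nn(k) proportional to k**alpha.

    q maps degree -> edge end probability (assumed normalized).
    Returns (f, knn, r): correlation function over degree pairs, average
    nearest neighbor function, and the Newman factor implied by them.
    """
    mean = sum(k * w for k, w in q.items())                # <k>_e
    mean_pow = sum(k ** alpha * w for k, w in q.items())   # <k^alpha>_e
    knn = {k: mean * k ** alpha / mean_pow for k in q}     # average equals <k>_e
    denom = sum(k * knn[k] * w for k, w in q.items()) - mean ** 2
    if alpha == 0 or denom == 0:                           # uncorrelated case
        f = {(j, k): 1.0 for j in q for k in q}
    else:
        f = {(j, k): 1.0 + (knn[j] - mean) * (knn[k] - mean) / denom
             for j in q for k in q}
    var = sum(k * k * w for k, w in q.items()) - mean ** 2
    r = denom / var if var > 0 else float("nan")
    return f, knn, r

def max_ratio(f, q, n):
    """Largest ratio of realized to possible edges between degree classes
    for a network of n vertices (assumed form, see the lead-in)."""
    inv_mean = sum(w / k for k, w in q.items())            # 1 / <k>
    p = {k: w / (k * inv_mean) for k, w in q.items()}      # degree distribution
    mean_k = 1.0 / inv_mean
    return max(mean_k * f[(j, k)] * q[j] * q[k]
               / min(j * p[j], k * p[k], n * p[j] * p[k])
               for j in q for k in q)

# Toy edge end distribution over three degree classes.
q = {2: 0.5, 3: 0.3, 5: 0.2}
f, knn, r = powerlaw_correlation(q, alpha=0.5)
worst = max_ratio(f, q, n=1000)
```

For a positive exponent the resulting Newman factor is positive, and the sketch preserves the edge end distribution, since the correlation function averaged over either index gives 1. A value of `worst` above 1, or a negative entry in `f`, would indicate that the chosen exponent is not admissible for the given degree range.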

IV.1 Scale-Free Networks

The degree distribution of a scale-free network is defined by

(29)

where is the scale-parameter. The edge end distribution is therefore given by

(30)

As we only discuss finite networks, the range of admissible degrees is limited by various conditions. First, the rapidly decreasing probability for increasing degrees requires cutting off the degree range at a maximal degree above which the accumulated probability weight is equal to the inverse of the number of vertices. This yields the so-called natural cut-off Cohen et al. (2000),

(31)

This natural cut-off is necessary to prevent large fluctuations in a finite random network ensemble and is an upper limit for the maximal degree . It is important to emphasize that this cut-off is by no means induced by the topology of the complex network.
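The natural cut-off can be illustrated with a short sketch. It assumes a continuum approximation of the power-law degree distribution, in which the tail weight above a degree k equals (k / k_min) raised to the power 1 minus the scale-parameter; setting this equal to the inverse number of vertices gives the estimate used below. The function names, the parameter names, and the redraw rule for an odd number of half-edges are illustrative assumptions.

```python
import random

def natural_cutoff(n, gamma, k_min=1.0):
    """Continuum estimate of the degree above which the tail weight of a
    power law with exponent gamma equals 1/n."""
    return k_min * n ** (1.0 / (gamma - 1.0))

def sample_degrees(n, gamma, k_min, k_max, rng=None):
    """Draw n degrees from a discrete power law truncated to [k_min, k_max].

    The draw is repeated until the number of half-edges is even, since an
    odd number cannot be paired into edges (an assumed convention).
    """
    rng = rng or random.Random()
    ks = list(range(k_min, k_max + 1))
    weights = [k ** -gamma for k in ks]
    while True:
        degrees = rng.choices(ks, weights=weights, k=n)
        if sum(degrees) % 2 == 0:
            return degrees

cutoff = natural_cutoff(10 ** 4, 3.0)          # square root of n for gamma = 3
degrees = sample_degrees(1000, 2.5, 2, 30, rng=random.Random(0))
```

In this approximation the natural cut-off grows faster than the square root of the number of vertices whenever the scale-parameter is below 3.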

However, it turns out that the natural cut-off is not always compatible with the condition of Eq. (27), which can easily be used to determine the so-called structural cut-off. In the case of scale-free networks, Eq. (26) simplifies for sufficiently large degrees and therefore defines a maximal degree at the upper bound of the ratio. With this criterion, one obtains, in the case of uncorrelated networks having a constant correlation function, a cut-off that is independent of the scale-parameter. This is smaller than the natural cut-off for values of the scale-parameter between 2 and 3. Nevertheless, newer calculations by Dorogovtsev et al. Dorogovtsev et al. (2005) reveal that this structural cut-off is still too large in that particular range of the scale-parameter and causes intrinsic correlations to arise within otherwise uncorrelated networks without self- or multiple-edges. Due to the maximal degree being too large and the required constraints, the vertices with large degrees tend to connect preferably with low-degree vertices, which effectively yields disassortativity. The reason for the failure of condition (27) in the case of scale-free networks with a scale-parameter in this range can be seen in the diverging fluctuations in the degree distribution, as only the first moment of the degree distribution is finite. The approach taken by Dorogovtsev et al. is based upon a statistical ensemble ansatz. A canonical network ensemble is defined as the set of networks with a fixed set of vertices and a fixed number of edges. The final networks are then the outcome of an evolution process where randomly chosen edges are removed and simultaneously added to a pair of vertices in the network. The pair of vertices is chosen at random with weights given by the product of a preferential function evaluated at the degrees of the respective vertices. With an appropriate preferential function and beneath the critical temperature, the authors observe that the degree distribution becomes scale-free. However, depending upon the finiteness of the second moment of the degree distribution, Dorogovtsev et al. find different cut-offs of the degree range

(32)

The evolution process driving a network into this equilibrium network is, of course, neither the same as constructing a network with the CM algorithm nor with the algorithm developed in this paper. The CM algorithm and the algorithm presented in this paper, however, fix a priori the number of vertices and edges as well, just as in the canonical network ensemble. Thus, both algorithms can be interpreted to produce graphs which are members of the canonical network ensemble below the critical temperature, since both approaches evidently yield random networks with the correct degree distribution.

Up to this point, we have only treated the uncorrelated case, which corresponds to a vanishing exponent in Eq. (28). Numerical experiments indicated a strong deviation from the expected power law for the measured average nearest neighbor function in the case of assortative networks, which have a positive exponent, if one naively uses a cut-off as it is applicable for uncorrelated networks. The average nearest neighbor function shows that the vertices with the largest degree fall below their expected average nearest neighbor value and therefore tend to cause some degree of disassortativity. This effect is rooted in the constraint of the prevention of self- and multiple-edges and becomes stronger for larger values of the exponent. To compensate for this effect, we incorporated the exponent into the exponents of the maximal degrees identified so far in a simple way (an analytically exact derivation is beyond the scope of this paper) and always use the minimal resulting maximal degree,

(33)

Using a maximal degree of this form lowers (raises) the cut-off degree for assortative (disassortative) correlations with increasing (decreasing) exponent . Having fixed the maximal degree , we set the minimal degree to be in all simulations. This ensures that we always obtain a largest giant component in the network having almost the size of the entire network, which in turn guarantees that the largest giant component has the same two-point correlation structure as the entire network. This is favorable, since in most applications only the largest component of the generated random networks is of interest.

As already pointed out, it is crucial to note that only the first moment of the degree distribution is finite for values of the scale-parameter in the range while all higher moments diverge. However, already the first moment of the edge end distribution is diverging in this range of the scale-parameter . This has the important consequence that the average nearest neighbor function becomes system size dependent, as by Eq. (8). To validate the predicted power-law behavior of the average nearest neighbor function , we employ a dimensionless data-collapse of the function,

(34)

This type of plot is extremely sensitive to even the smallest deviations from the predicted power law in the average nearest neighbor function. The numerical results for various values of the scale-parameter and the exponent are shown in Fig. 3. Each data point is calculated over an ensemble of random networks. The curves run quite nicely along the predicted constant line. In particular, the curves for the uncorrelated case coincide with the constant line, which is a further, very important validation of the algorithm, since in this case the algorithm has to coincide with the well-known UCM algorithm Catanzaro et al. (2005a). Three details are interesting to note: (i) the curves become longer as the maximal degree increases, (ii) not all values of the exponent can be realized for a given value of the scale-parameter, as condition (27) is violated for some curves and would require a further adjustment of the maximal or even the minimal degree, (iii) with increasing scale-parameter, the curves for larger values of the exponent show a trend to bend slightly below the constant line, which is an indication that the cut-off of Eq. (33) still gives slightly too large values for the maximal degree. Another test of our formalism can be accomplished by comparing the Newman factor of the resulting networks to the values analytically predicted by Eq. (13). Fig. 4 shows that numerical simulations (points) and theoretical predictions (lines) coincide very well.

Figure 3: (color online) Data-collapse for average nearest neighbor function with various values of the power parameter for networks with a scale-free degree distribution with varying values of the scale parameter . The symbols used for the different values for the scale-parameter are: blue circle , pink square , dark green triangle up , red diamond , yellow triangle down , and light green star .
Figure 4: (color online) Newman factor as a function of the exponent for different values of the scale-parameter . The straight line denotes the theoretical values of the Newman factor according to Eq. (16). The symbols denote the value of the scale-parameter : blue circle , pink square , dark green triangle , and red diamond .

The diverging moments and of the edge end distribution for values of the scale-parameter within the range make a careful inspection of finite-size effects necessary. One can easily see that the ratio , appearing in the denominator of the correlation function in Eq. (28), diverges, as the ratio becomes proportional to . Nevertheless, a detailed calculation reveals certain restrictions on the maximal range of admissible degrees if is chosen to be different from . In this case, the criterion leads to a relation between the minimal degree and the maximal degree . Thus, the range of admissible degrees is limited and the moments and , which would otherwise diverge, remain finite. Fig. 5 shows the finite-size effects on the Newman factor as a function of the exponent . The plot shows only a marginal effect of the system size on the curves. However, for smaller sizes, a broader range in the exponent can be used. This is due to a violation of the criterion which, for larger networks, requires either a smaller maximal degree than the one used from Eq. (33) or a greater minimal degree . Despite the restrictions that apply to the ansatz, the accessible range of correlations covers the range of correlations found in empirical networks very well.

Figure 5: (color online) Network size dependence of the Newman factor as a function of the exponent for different values of the scale-parameter . The network size is marked by the symbols: blue circle , pink square , and dark green triangle .

IV.2 Empirical Networks

A very interesting aspect of our formalism is its applicability to empirical networks. By extracting a degree sequence from an empirical network and employing the formalism developed in the last section, it is possible to create random networks which have the same degree sequence as the empirical network and an arbitrarily chosen average nearest neighbor function , for instance following a power-law with tunable exponent . Thus, given a degree sequence from a network, one constructs from it the corresponding edge end distribution and then calculates via Eq. (28) a joint degree distribution with which one builds a randomized network. As a result, one obtains randomized versions of the empirical network with freely tunable two-point correlation strength, depending upon the choice of the exponent . However, the range of the exponent is limited by condition (27). In Fig. 6 (a), (b), and (c) the numerical results are shown for the actor-, the WWW-, and the yeast-network. The plot uses the same type of data-collapse as already presented in Fig. 3. The deviations from the expected constant value of for the data-collapse are due to intrinsic correlations which arise in networks with neither self- nor multiple-edges and are caused by the maximal degree in the degree sequence (see section IV.1). The WWW-network in particular is strongly affected by this, as it has a maximal degree of the order , while the network size is and hence only one order of magnitude greater.
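The first step of this procedure, going from a degree sequence to the edge end distribution, can be sketched as follows. The sketch also evaluates the average nearest neighbor function of a given joint degree distribution as a conditional mean. It does not implement Eq. (28): as a stand-in, it uses the uncorrelated (factorized) joint distribution, for which the average nearest neighbor function is known to be constant. The function names, the dictionary-based representation, and the toy degree sequence are assumptions made for this example.

```python
from collections import Counter


def edge_end_distribution(degrees):
    """Probability that a randomly chosen edge end belongs to a vertex of degree k."""
    counts = Counter(degrees)
    total_ends = sum(degrees)  # equals twice the number of edges
    return {k: k * n_k / total_ends
            for k, n_k in sorted(counts.items()) if k > 0}


def average_nearest_neighbor(joint):
    """Mean neighbor degree per degree class for a symmetric
    joint degree distribution given as {(j, k): probability}."""
    marginal, weighted = {}, {}
    for (j, k), p in joint.items():
        marginal[j] = marginal.get(j, 0.0) + p
        weighted[j] = weighted.get(j, 0.0) + k * p
    return {j: weighted[j] / marginal[j] for j in marginal if marginal[j] > 0}


# Toy degree sequence (hypothetical, not taken from any empirical network).
degrees = [1, 1, 2, 2, 3, 3]
q = edge_end_distribution(degrees)

# Stand-in for Eq. (28): the uncorrelated joint distribution q(j) * q(k).
uncorrelated = {(j, k): q[j] * q[k] for j in q for k in q}
knn = average_nearest_neighbor(uncorrelated)
print(q)
print(knn)
```

In the uncorrelated case every degree class has the same mean neighbor degree, namely the ratio of the second to the first moment of the degree sequence. A correlated joint distribution would instead produce a degree-dependent result.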

Figure 6: (color online) Data-collapse for average nearest neighbor function for the three empirical networks actor-, WWW- and yeast protein-interaction network. The left column (a), (b), and (c) shows the data-collapse for networks generated by the algorithm, while the right column (d), (e), and (f) shows the same data-collapse for networks simulated in an annealed manner. The statistics for each curve is , , and realizations, respectively. The different symbols indicate different values for the exponent : blue circle , pink square , dark green triangle up , red diamond , and yellow triangle down .

V Annealed Networks

To investigate, for example, a dynamical process on random networks, one typically performs the dynamical process on a whole ensemble of networks and computes averages of the observables one is interested in. The algorithm presented so far is suitable to generate such a random network ensemble. The network itself always stays constant during one dynamical process, and one typically refers to this type of network as a static or quenched network. A different approach is to change the network on a certain time-scale during a dynamical process and then calculate averages over time of the observables one is interested in. In an extreme case, the vertices of the network are reshuffled before every microscopic step of the dynamics. Such changing networks are referred to as annealed networks (see Ref. Burda et al. (2001); Dorogovtsev and Mendes (2003); Dorogovtsev et al. (2003); Stauffer and Sahimi (2005)). If the dynamics is local in each microscopic step (for instance a diffusion step from one vertex to another along an edge), it is sufficient to draw edges on demand only and to generate solely the local connections around the vertex considered. Here, we propose a scheme which efficiently simulates such annealed networks. The idea is to treat the vertices of a network as discrete while the edges are solely represented by an arbitrary joint degree distribution such that the connectivity structure of the network is only defined on average. Hence, this scheme effectively simulates the network's connectivity structure in a mean field (MF) like manner.

This is a very convenient tool as theoretical approaches to complex network topics are frequently based on MF theories. Successful examples include reaction-diffusion systems Catanzaro et al. (2005b); Weber and Porto (2006), epidemic disease spreading Boguñá et al. (2003), and phase transitions in ferromagnets Dorogovtsev et al. (2002), to mention just a few. These theories usually describe the network topology via a statistical approach. Thus, it is desirable to numerically represent networks in a probabilistic manner as well. This allows an even better test of MF based theories since the network is represented as it is done within the theory. Furthermore, by comparison of quenched with annealed simulations, one can analyze in detail which aspects of such a MF theory are an over-approximation due to the MF assumption.

We define such an annealed network to consist of a degree sequence of size and a corresponding joint degree distribution . Each element of the degree sequence represents a vertex with connections. Thus, the set of edges is not fixed; only the total number of edges () is held constant. Whenever, for example, a dynamical process requests an adjacent vertex of a given vertex, the neighbor vertex is determined on the spot by sampling one edge which emanates from the given vertex. This edge is drawn from the joint degree distribution and is discarded immediately after use.

This simulates a continuously rewired network which is only locally defined by means of one edge at a time. The first four steps to set up such an annealed network are essentially the same as in the initialization of the algorithm of section III: (i) Draw a degree sequence from the joint degree distribution or take the degree sequence from a real network. That degree sequence is (ii) sorted according to degree classes and (iii) mapped into a discrete edge end distribution . In the same manner as before, (iv) one calculates the discrete conditional degree distribution from the theoretical joint degree distribution . Now, instead of constructing the network, one only redefines how neighbors of vertices, and hence edges, have to be understood:

  • The neighbor vertices of a vertex with degree are always drawn from the conditional probability distribution .

  • An edge is sampled by first drawing a vertex via the edge end distribution and secondly the vertex neighbor is found by sampling the conditional probability distribution .

As we want the network to be free of self-connections, we ensure that the sampled vertices at both ends of a sampled edge are not the same. However, the constraint of preventing multiple-edges among vertices cannot be enforced within this local definition of the network. Therefore, these annealed networks are free of the intrinsic degree correlations which arise due to this particular constraint. This becomes apparent in Fig. 6(d), (e), and (f) where numerical results of annealed networks are shown as a data-collapse for the average nearest neighbor function , alongside the corresponding curves for the case where the network is actually constructed (Fig. 6(a), (b), and (c)). Only the curve for the WWW network, Fig. 6(e), deviates from the expected value of for very large degrees. This has to be attributed to the prevention of self-connections, which is still enforced. Since these vertices with a very large degree are not allowed to connect to themselves, they have to connect on average with vertices which have a degree below the preassigned average nearest neighbor function , causing a slight trend towards disassortativity.
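The sampling rules of this section can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the class name `AnnealedNetwork`, the nested-dictionary form of the conditional distribution, and the retry limit are hypothetical, and picking a vertex uniformly within the sampled degree class is an assumption of this sketch. Self-connections are rejected by redrawing, while multiple-edges are not tracked, since no edge is ever stored.

```python
import random
from collections import defaultdict


class AnnealedNetwork:
    """Vertices are discrete; edges exist only as on-demand samples."""

    def __init__(self, degrees, conditional, rng=None):
        # degrees[v] is the degree of vertex v.
        # conditional[k] maps a neighbor degree k' to P(k'|k); it should
        # only contain degree classes that occur in `degrees`.
        self.degrees = list(degrees)
        self.conditional = conditional
        self.rng = rng or random.Random()
        self.by_degree = defaultdict(list)
        for v, k in enumerate(self.degrees):
            self.by_degree[k].append(v)

    def draw_neighbor(self, v, max_tries=1000):
        """Sample a neighbor of v from the conditional distribution."""
        dist = self.conditional[self.degrees[v]]
        classes, weights = list(dist.keys()), list(dist.values())
        for _ in range(max_tries):
            k_nb = self.rng.choices(classes, weights=weights)[0]
            w = self.rng.choice(self.by_degree[k_nb])
            if w != v:  # reject self-connections and redraw
                return w
        raise RuntimeError("no admissible neighbor found")

    def draw_edge(self):
        """Sample an edge: first end via the edge end distribution."""
        # Choosing a vertex with weight equal to its degree is equivalent
        # to choosing an edge end uniformly at random.
        v = self.rng.choices(range(len(self.degrees)),
                             weights=self.degrees)[0]
        return v, self.draw_neighbor(v)


# Toy example (hypothetical): uncorrelated conditional distribution.
degrees = [1, 1, 2, 2, 3, 3]
q = {1: 1 / 6, 2: 1 / 3, 3: 1 / 2}
net = AnnealedNetwork(degrees, {k: q for k in q}, random.Random(1))
print(net.draw_edge())
```

A dynamical process such as a random walk would call `draw_neighbor` at every step and keep no record of the sampled edge, which corresponds to the extreme case of a network that is rewired before every microscopic step.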

VI Conclusions

In summary, we have presented an efficient and accurate algorithm which generates networks with an a priori two-point correlation structure defined by an arbitrary joint degree distribution . This provides much better null models for the investigation of empirical networks, as these are usually two-point correlated. Besides its applicability to reconstructing the two-point correlations of empirical networks, we developed a formalism which allows one to systematically tune the strength of two-point correlations in a network while preserving its degree distribution. The two-point correlations are specified in our ansatz via the average nearest neighbor function , which we exemplified by a power-law ansatz with the tunable exponent . As two important examples, we employed this formalism in the cases of scale-free networks and empirical networks. However, intrinsic degree correlations arise from the constraint of preventing self- and multiple-edges, and these cause inevitable deviations from the theoretically preassigned two-point correlations. Furthermore, we found that the maximal cut-off degree required to prevent these intrinsic correlations in artificial scale-free networks is substantially lower than previously believed.

Finally, we introduced the notion of two-point correlated annealed networks which are ideally suited to test the validity of mean field theories, since the edges of these networks are solely represented in a probabilistic manner.

Using this algorithm and the new formalism developed, one can investigate the effects of two-point correlations in empirical and artificial networks. Such a scheme is expected to be an important tool to better understand, for example, how the topology of a network influences dynamical processes on it.

References

  • Albert and Barabási (2002) R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
  • Dorogovtsev and Mendes (2002) S. Dorogovtsev and J. F. F. Mendes, Adv. Phys. 51, 1079 (2002).
  • Watts and Strogatz (1998) D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
  • Newman (2003a) M. E. J. Newman, Phys. Rev. E 68, 026121 (2003a).
  • Barabási and Albert (1999) A.-L. Barabási and R. Albert, Science 286, 509 (1999).
  • Bender and Canfield (1978) E. A. Bender and E. R. Canfield, J. Combin. Theor. A 24, 296 (1978).
  • Bollobas (1980) B. Bollobas, Eur. J. Comb. 1, 311 (1980).
  • Molloy and Reed (1995) M. Molloy and B. Reed, Random Structures and Algorithms 6, 161 (1995).
  • Molloy and Reed (1998) M. Molloy and B. Reed, Combinatorics, Probability and Computing 7, 295 (1998).
  • Catanzaro et al. (2005a) M. Catanzaro, M. Boguñá, and R. Pastor-Satorras, Phys. Rev. E 71, 027103 (2005a).
  • Serrano and Boguñá (2005) M. A. Serrano and M. Boguñá, Phys. Rev. E 72, 036133 (2005).
  • Newman (2002) M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002).
  • Newman (2003b) M. E. J. Newman, Phys. Rev. E 67, 026126 (2003b).
  • Newman (2003c) M. E. J. Newman, SIAM Review 45, 167 (2003c).
  • Boguñá and Pastor-Satorras (2003) M. Boguñá and R. Pastor-Satorras, Phys. Rev. E 68, 036112 (2003).
  • Mahadevan et al. (2006) P. Mahadevan, D. Krioukov, K. Fall, and A. Vahdat, in SIGCOMM (2006), eprint arXiv.org:cs.NI/0605007.
  • Vázquez and Weigt (2003) A. Vázquez and M. Weigt, Phys. Rev. E 67, 027101 (2003).
  • Derrida and Pomeau (1986) B. Derrida and Y. Pomeau, Europhys. Lett. 1, 45 (1986).
  • Bastolla and Parisi (1996) U. Bastolla and G. Parisi, Physica D 98, 1 (1996).
  • Stauffer and Sahimi (2005) D. Stauffer and M. Sahimi, Phys. Rev. E 72, 046128 (2005).
  • Song et al. (2006) C. Song, S. Havlin, and H. A. Makse, Nature Physics 2, 275 (2006).
  • Boguñá et al. (2003) M. Boguñá, R. Pastor-Satorras, and A. Vespignani, in Statistical Mechanics of Complex Networks, edited by R. Pastor-Satorras, M. Rubi, and A. Diaz-Guilera (Springer Verlag, Berlin, 2003), vol. 625 of Lecture Notes in Physics.
  • Albert et al. (1999) R. Albert, H. Jeong, and A.-L. Barabási, Nature 401, 130 (1999).
  • Jeong et al. (2001) H. Jeong, S. P. Mason, A.-L. Barabási, and Z. N. Oltvai, Nature 411, 41 (2001).
  • Lee et al. (2006) J.-S. Lee, K.-I. Goh, B. Kahng, and D. Kim, Eur. Phys. J. B 49, 231 (2006).
  • Dorogovtsev et al. (2005) S. Dorogovtsev, J. Mendes, A. Povolotsky, and A. Samukhin, Phys. Rev. Lett. 95, 195701 (2005).
  • Boguñá et al. (2004) M. Boguñá, R. Pastor-Satorras, and A. Vespignani, Eur. Phys. J. B 38, 205 (2004).
  • Cohen et al. (2000) R. Cohen, K. Erez, D. ben Avraham, and S. Havlin, Phys. Rev. Lett. 85, 4626 (2000).
  • Burda et al. (2001) Z. Burda, J. D. Correia, and A. Krzywicki, Phys. Rev. E 64, 046118 (2001).
  • Dorogovtsev and Mendes (2003) S. Dorogovtsev and J. Mendes, Evolution of Networks (Oxford University Press, 2003).
  • Dorogovtsev et al. (2003) S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Nucl. Phys. B 653, 307 (2003).
  • Catanzaro et al. (2005b) M. Catanzaro, M. Boguñá, and R. Pastor-Satorras, Phys. Rev. E 71, 056104 (2005b).
  • Weber and Porto (2006) S. Weber and M. Porto, Phys. Rev. E 74, 046108 (2006).
  • Dorogovtsev et al. (2002) S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes, Phys. Rev. E 66, 016104 (2002).