General expression for the component size distribution in infinite configuration networks
In the infinite configuration network the links between nodes are assigned randomly with the only restriction that the degree distribution has to match a predefined function. This work presents a simple equation that gives for an arbitrary degree distribution the corresponding size distribution of connected components. This equation is suitable for fast and stable numerical computations up to the machine precision. The analytical analysis reveals that the asymptote of the component size distribution is completely defined by only a few parameters of the degree distribution: the first three moments, scale and exponent (if applicable). When the degree distribution features a heavy tail, multiple asymptotic modes are observed in the component size distribution that, in turn, may or may not feature a heavy tail.
pacs:64.60.aq, 02.10.Ox, 89.75.Da
Random graphs provide models for complex networks, and in many cases, real-world networks have been accurately described by such models Barabási and Albert (1999); Newman et al. (2002); Vázquez (2003); Kryven et al. (2016). Within the scope of random graph models one finds: the Erdős-Rényi model, the Barabási-Albert model Barabási and Albert (1999), the node copying model Bhat et al. (2016), the small world network Newman and Watts (1999), the configuration network Molloy and Reed (1998); Newman (2010) and many others. In the configuration network nodes are assigned pre-defined degrees. The edges connecting these nodes are then considered to be random, and every distinct configuration of edges that satisfies the given degree sequence is treated as a new instance of the network in the sense of random graphs. Interesting properties emerge when the number of nodes, $N$, approaches infinity, that is, in the so-called thermodynamic limit Chung et al. (2003); Kryven (2016). In this case, the infinite degree sequence, which provides the only input information for the model, is equivalent to the frequency distribution of degrees, $u(k)$, i.e. the probability that a randomly chosen node has degree $k$.
The component-size distribution, $w(n)$, denotes the probability that a randomly chosen node is part of a connected component of finite size $n$. Connected components in the infinite configuration network can be of finite or infinite size. Molloy and Reed (1995) showed that if an infinite component exists then it is the only infinite component with probability 1. Hence, the infinite component is referred to as the giant component.
Depending upon the specific context behind the network, the component size distribution may summarise an important feature of the modelled system. In polymer chemistry, for example, the infinite configuration network is used as a toy model for hyper-branched and cross-linked polymers. In this context, the component size distribution predicts viscoelastic properties of the material while the emergence of the giant component is interpreted as a phase transition from liquid to solid state of the soft matter Kryven et al. (2016); Kryven and Iedema (2014). Since connected components are closely related to clusters in bond percolation processes, the distribution of component sizes can be used to model outbreaks for SIR epidemiological processes Newman et al. (2002). In linguistics, the component size distribution of the sentence similarity graph is an important tool when studying the structure of natural languages Biemann and van den Bosch (2011). This brief list of application cases is far from exhaustive. Despite the vast applications, the empirical component size distribution is hard to measure precisely unless the whole topology of the network is known. On the other hand, empirical observations of the degree distribution are much easier to perform.
Ref. Molloy and Reed (1995) provides an elegant criterion that connects moments of the degree distribution $u(k)$ to the fact that the network contains the giant component. A somewhat deeper question further in this direction reads: provided $u(k)$ is given, what is the component size distribution, $w(n)$? In Ref. Newman et al. (2001) Newman et al. showed that the component size distribution can be recovered by a numerical algorithm that involves solving a fixed point problem followed by an inversion of a generating function. Such an algorithm demonstrates that $u(k)$ and $w(n)$ can indeed be put into correspondence; however, it becomes computationally infeasible for large values of $n$. This numerical issue arises due to the ill-posedness of the numerical generating function inversion.
On the other hand, the component size distribution has been analytically resolved only for a limited number of special cases of the degree distribution Newman (2007). Within the scope of analytically solvable cases, only the Yule-Simon degree distribution features a heavy tail, that is to say it decays proportionally to an algebraic function at large $k$. At the same time, heavy-tailed (or scale-free) distributions are commonly observed in the empirical data collected from many real-world networks Yook et al. (2002); Ravasz and Barabási (2003); Eguiluz et al. (2005); Fu et al. (2008). Empirically observed exponents vary in a broad range. Some studies report degree exponents that are as small as in the case of the Internet topology Adamic and Huberman (2002) and in social networks Timár et al. (2016). On the opposite side of this spectrum, one finds exponent in the generalisation of the preferential attachment model Dorogovtsev et al. (2000).
The only asymptotic analysis available for the component size distribution in the configuration network states that $w(n)$ for large $n$ is either proportional to $n^{-3/2}$ or to $e^{-Cn} n^{-3/2}$, where $C$ is a constant Newman et al. (2001). The current paper uncovers new asymptotic modes for $w(n)$ that emerge only when the degree distribution features a heavy tail. The paper shows that for an arbitrary degree distribution $u(k)$, the component size distribution $w(n)$ can be expressed as a finite sum. In practice, this sum can be stably computed up to the machine precision at the cost of multiplicative operations. Finally, the paper discusses how a finite cutoff introduced in the degree distribution reflects on the distribution of component sizes.
II. Component size distribution by Lagrange inversion
It has been noticed that all components in the infinite configuration model are locally tree-like. Using this fact as a departure point, Newman et al. (2001) showed that the degree distribution can be put into correspondence with the component size distribution by applying the generating-function (GF) formalism. Here, by a GF of the degree distribution $u(k)$ we refer to the series,
$$U(x) = \sum_{k=0}^{\infty} u(k)\, x^k. \tag{1}$$
According to the approach presented in Ref. Newman et al. (2001), the generating function for the component size distribution, $W(x)$, is found as a solution of the following system of functional equations,
$$W(x) = x\, U[W_1(x)], \qquad W_1(x) = x\, U_1[W_1(x)], \tag{2}$$
where $U(x)$ is the GF of $u(k)$, $W_1(x)$ is an auxiliary GF, and $U_1(x)$ is the GF for the excess degree distribution
$$u_1(k) = \frac{k+1}{\mu_1}\, u(k+1), \tag{3}$$
where $\mu_1 = \sum_{k} k\, u(k)$ is the mean degree. Similarly to combinatorial tree-counting problems, Eq. (2) can be solved by applying the Lagrange inversion formula Bergeron et al. (1998). The original formulation of the Lagrange inversion principle is as follows. Suppose $A(x)$, $R(x)$ are formal power series such that $A(x) = x\, R[A(x)]$; then for an arbitrary formal power series $F(x)$ the coefficient of the power series $F[A(x)]$ at $x^n$ reads as,
$$[t^n]\, F[A(t)] = \frac{1}{n}\, [t^{n-1}]\, F'(t)\, R^n(t). \tag{4}$$
Here $[t^n]$, being the inverse operation to the GF transform (1), refers to the coefficient at $t^n$ of the corresponding power series. By substituting $F(x) = U(x)$, $A(x) = W_1(x)$, taking $n-1$ in place of $n$, and applying Eq. (2), one transforms the left-hand side of (4), $[t^{n-1}]\, U[W_1(t)] = [t^n]\, W(t) = w(n)$. Further on, the right-hand side of (4) is transformed by substituting $R(x) = U_1(x)$ and realising that, according to the definition (3), $U'(x) = \mu_1 U_1(x)$. Now, we are ready to write an expression for $w(n)$ even though we have no explicit expression for the generating function $W(x)$ itself,
$$w(n) = \frac{\mu_1}{n-1}\, [t^{n-2}]\, U_1^n(t), \qquad n > 1. \tag{5}$$
A similar equation was also derived in Ref. Newman (2007) by means of different reasoning. In principle, Eq. (5) provides enough information to analytically recover the component size distribution for a few special cases of the degree distribution Newman (2007). In practice, however, the main difficulty when applying Eq. (5) is that the equation employs the inverse GF transform, which limits the choices one has when searching for an exact solution or performing numerical computations. With this in mind, one may rewrite (5) so that the new expression does not involve the GF concept at all. It turns out that the only reason why Eq. (5) utilises the GF formalism is that it provides the means for computing a convolution power.
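The convolution of two distributions, $f(k) * g(k)$, is defined as a binary multiplicative operation,
$$f(n) * g(n) = \sum_{j+k=n} f(j)\, g(k), \tag{6}$$
where the summation is performed over all non-negative ordered couples $(j,k)$ that sum up to $n$. This sum contains exactly $n+1$ such couples. In this paper, the order of operations is chosen in such a way that the point-wise multiplication precedes convolution, for instance, $k f(k) * g(k) = [k f(k)] * g(k)$. The convolution can be inductively extended to the $n$-fold convolution, or the convolution power,
$$f^{*n}(k) = f^{*(n-1)}(k) * f(k), \tag{7}$$
where $f^{*1}(k) = f(k)$ by the definition. It can be shown that the convolution power can be expanded into a sum of products,
$$f^{*n}(k) = \sum_{k_1+\dots+k_n=k}\ \prod_{i=1}^{n} f(k_i), \qquad k_i \ge 0.$$
The convolution has a peculiar property with respect to the GF transform. If $F(x)$, $G(x)$ are GFs for $f(k)$ and $g(k)$ then $F(x)G(x)$ is the GF for $f(k) * g(k)$. Furthermore, if $F(x)$ is the GF for $f(k)$ then $F^n(x)$ generates $f^{*n}(k)$. By exploiting this relation one immediately reduces Eq. (5) to,
$$w(n) = \begin{cases} \dfrac{\mu_1}{n-1}\, u_1^{*n}(n-2), & n > 1,\\[4pt] u(0), & n = 1. \end{cases} \tag{8}$$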
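Here, the value of $w(1) = u(0)$ is derived directly from the formulation of the problem: nodes with degree zero are also components of size one. This simple equation is ready to be used: by combining (8) and the definitions (3),(7) one may directly express the values of the component size distribution in terms of the degree distribution $u(k)$ and its mean $\mu_1$ for $n > 1$,
$$w(n) = \frac{\mu_1^{1-n}}{n-1} \sum_{k_1+\dots+k_n=n-2}\ \prod_{i=1}^{n} (k_i+1)\, u(k_i+1), \qquad k_i \ge 0. \tag{9}$$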
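For example, the first five values of the component size distribution $w(n)$, written in terms of the degree distribution $u(k)$ and its mean $\mu_1$, read as,
$$\begin{aligned}
w(1) &= u(0),\\
w(2) &= \frac{u(1)^2}{\mu_1},\\
w(3) &= \frac{3\,u(1)^2 u(2)}{\mu_1^2},\\
w(4) &= \frac{4\,u(1)^3 u(3) + 8\,u(1)^2 u(2)^2}{\mu_1^3},\\
w(5) &= \frac{5\,u(1)^4 u(4) + 30\,u(1)^3 u(2)\,u(3) + 20\,u(1)^2 u(2)^3}{\mu_1^4}.
\end{aligned}$$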
The number of terms in this expansion increases rapidly with $n$. That said, the formula (8) can be easily readjusted for numerical computations. Namely, one can use the Fast Fourier Transform (FFT) to compute the convolution powers. In this case, multiplicative operations are sufficient to compute all values of . Alternatively, if is known can be found in the cost of Besides FFT, there are algorithms that are specifically designed for fast approximation of convolution powers, such as projection onto basis functions that are invariant under convolution Kryven and Iedema (2013).
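The numerical procedure outlined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes Eq. (8) in the form $w(1) = u(0)$ and $w(n) = \frac{\mu_1}{n-1} u_1^{*n}(n-2)$ for $n > 1$, with the excess degree distribution $u_1(k) = (k+1)u(k+1)/\mu_1$. The function name is hypothetical, and the direct `np.convolve` is used in place of FFT for simplicity; none of this is taken from the supporting code of the paper.

```python
import numpy as np

def component_size_distribution(u, n_max):
    """Return w with w[n], n = 1..n_max (n_max >= 1), assuming Eq. (8).

    u[k] is the degree distribution on k = 0..len(u)-1; w[0] is unused.
    """
    u = np.asarray(u, dtype=float)
    k = np.arange(len(u))
    mu1 = float(np.sum(k * u))           # mean degree
    w = np.zeros(n_max + 1)
    w[1] = u[0]                           # isolated nodes are components of size one
    if mu1 == 0.0:
        return w                          # no edges at all
    # Only the entries 0..n_max-2 of the convolution powers are ever needed,
    # so every array is truncated to this length.
    length = max(n_max - 1, 1)
    u1 = np.zeros(length)
    m = min(length, len(u) - 1)
    u1[:m] = k[1:m + 1] * u[1:m + 1] / mu1   # excess degree distribution
    power = u1.copy()                         # first convolution power
    for n in range(2, n_max + 1):
        power = np.convolve(power, u1)[:length]   # n-th convolution power
        w[n] = mu1 / (n - 1) * power[n - 2]
    return w
```

For instance, for the degree distribution $u(1) = 2/3$, $u(2) = 1/3$ the sketch returns $w(2) = 1/3$ and $w(3) = 1/4$, in agreement with a direct count of paths of two and three nodes.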
Analytic formulas for convolution powers (sometimes also referred to as compositas Kruchinin (2015)) have been covered in the literature for many elementary functions Ma and Liu (2004); Ma and King (2002). Convolution powers can also be found analytically by applying discrete functional transforms, for instance the Z-transform and the discrete Fourier transform. A few examples of such results are given in Table 1. Focusing on one of them, the first curve in Figure 1 demonstrates that the analytical and numerical results for the exponential degree distribution coincide.
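As a hedged illustration of the transform route mentioned above, the sketch below computes a convolution power through the discrete Fourier transform: the transform of the $n$-fold convolution is the $n$-th power of the transform, provided the input is zero-padded to the full support of the result. The function name and its arguments are hypothetical.

```python
import numpy as np

def convolution_power_fft(f, n, length):
    """First `length` entries of the n-fold convolution of f with itself (n >= 1)."""
    f = np.asarray(f, dtype=float)
    # The n-th convolution power is supported on 0..n*(len(f)-1); padding to
    # this size avoids wrap-around from the circular convolution.
    size = n * (len(f) - 1) + 1
    spectrum = np.fft.rfft(f, size)
    full = np.fft.irfft(spectrum ** n, size)
    out = np.zeros(length)
    m = min(length, size)
    out[:m] = full[:m]
    return out
```

Note that the round-off error of the FFT is roughly uniform in absolute terms, so very small tail values may lose relative accuracy; a direct convolution can be preferable when such values matter.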
Table 1: | Degree distribution, $u(k)$ | Component size distribution, $w(n)$ |
III. Asymptotic analysis
The format of Eq. (8) naturally suggests a straightforward way to perform an asymptotic analysis for the component size distribution $w(n)$. One may view the excess degree distribution $u_1(k)$ as a probability mass function, PMF (or alternatively discrete probability density function), of some discrete random variables $\xi_i$. Recall the following property of convolution powers: if i.i.d. random variables $\xi_1, \dots, \xi_n$ have PMF $u_1(k)$ then $u_1^{*n}(k)$ gives the PMF for the sum $\xi_1 + \dots + \xi_n$. The central limit theorem (CLT) gives an estimate for this sum as $n \to \infty$, and the idea is now to obtain the asymptotes of $w(n)$ by applying CLT to the definition (8).
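The property of convolution powers recalled above can be checked by brute force on a small example. The sketch below enumerates all outcomes of $n$ i.i.d. variables with a given PMF, which is also the expansion of a convolution power into a sum of products, and compares the result with repeated convolution. Both function names are hypothetical.

```python
from itertools import product

import numpy as np

def pmf_of_sum_by_enumeration(f, n):
    """PMF of the sum of n i.i.d. variables with PMF f, by listing all outcomes."""
    out = np.zeros(n * (len(f) - 1) + 1)
    for ks in product(range(len(f)), repeat=n):
        # Each ordered outcome (k_1, ..., k_n) contributes the product of f(k_i).
        out[sum(ks)] += np.prod([f[k] for k in ks])
    return out

def convolution_power(f, n):
    """n-fold convolution of f with itself, by repeated direct convolution."""
    power = np.array([1.0])              # neutral element of the convolution
    for _ in range(n):
        power = np.convolve(power, f)
    return power
```

The enumeration costs `len(f)**n` operations and is useful only as a check; the repeated convolution gives the same array at a much lower cost.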
III.1 Light-tailed degree distributions
First, let us assume that distribution decays faster than algebraically, that is
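which is also equivalent to . Then according to CLT, the convolution power $u_1^{*n}(k)$ approaches the normal distribution with mean $nM$ and variance $n\sigma^2$ when $n \to \infty$, where $M$ and $\sigma^2$ denote the mean value and variance of the excess degree distribution $u_1(k)$. The normal distribution can now replace $u_1^{*n}$ in (8), which yields the asymptote for the component size distribution,
$$w_\infty(n) = \frac{\mu_1}{(n-1)\sqrt{2\pi n \sigma^2}}\, \exp\!\left[-\frac{(n-2-nM)^2}{2 n \sigma^2}\right]. \tag{13}$$
Quantities $M$ and $\sigma^2$ are directly expressible in terms of the moments of the degree distribution, $\mu_i = \sum_k k^i u(k)$,
$$M = \frac{\mu_2 - \mu_1}{\mu_1}, \qquad \sigma^2 = \frac{\mu_1 \mu_3 - \mu_2^2}{\mu_1^2}.$$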
Two examples of component size distributions that converge with various rates to their asymptotes are given in Figure 1. Peculiarly, the only information on the degree distribution $u(k)$ that is contained in the asymptote definition (13) is the first three moments, $\mu_1, \mu_2, \mu_3$. Furthermore, depending upon the value of $\mu_2 - 2\mu_1$ the asymptotic expression (13) switches between two modes: it either decays exponentially, as $e^{-Cn} n^{-3/2}$ with a constant $C > 0$, when $\mu_2 - 2\mu_1 \neq 0$, or it decays as an algebraic function, $n^{-3/2}$, when $\mu_2 - 2\mu_1 = 0$ (see also Table 2, Case A). The last equality is the well-known giant component criterion,
$$\mu_2 - 2\mu_1 = 0. \tag{14}$$
The criterion (14) was obtained by Molloy and Reed (1995) by means of a different reasoning. In Ref. Molloy and Reed (1995), the authors prove that $\mu_2 - 2\mu_1 > 0$ implies existence of the giant component in the configuration network, whereas $\mu_2 - 2\mu_1 < 0$ implies non-existence of this component. In Ref. Newman et al. (2001), it was hypothesised that the exponent $3/2$ is universal and must hold for all degree distributions at the critical point $\mu_2 - 2\mu_1 = 0$. We will see now that when the condition (10) fails to hold, exponents distinct from $3/2$ may also appear in the asymptote of the component size distribution.
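The light-tailed asymptote discussed in this subsection can be sketched numerically as follows. The code assumes that Eq. (8) reads $w(n) = \frac{\mu_1}{n-1} u_1^{*n}(n-2)$ and simply replaces the convolution power by a normal density with mean $nM$ and variance $n\sigma^2$, where $M$ and $\sigma^2$ are the mean and variance of the excess degree distribution written through the moments $\mu_1, \mu_2, \mu_3$. The function names are hypothetical, and the exact form of the asymptote used in the paper may be simplified further.

```python
import math

def excess_mean_variance(mu1, mu2, mu3):
    """Mean and variance of the excess degree distribution from degree moments."""
    mean = (mu2 - mu1) / mu1
    var = (mu1 * mu3 - mu2 ** 2) / mu1 ** 2
    return mean, var

def w_normal_approx(n, mu1, mu2, mu3):
    """Normal (CLT) approximation of w(n) for n > 1; a sketch, not an exact value."""
    mean, var = excess_mean_variance(mu1, mu2, mu3)
    s2 = n * var                                   # variance of the n-fold sum
    density = math.exp(-(n - 2 - n * mean) ** 2 / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)
    return mu1 / (n - 1) * density
```

For a Poisson degree distribution with mean $c$ one has $\mu_1 = c$, $\mu_2 = c + c^2$, $\mu_3 = c^3 + 3c^2 + c$, so that $M = \sigma^2 = c$; the approximation is then closest to the exact values when $c$ is near the critical point $c = 1$ and $n$ is large.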
III.2 Heavy-tailed degree distributions
Suppose that, on the contrary to the condition (10), degree distribution features a heavy tail,
which is equivalent to , It turns out that exponent and the scale together with the moments provide enough information to generalise the asymptote (13) for the case of heavy-tailed degree distributions. Suppose In terms of moments this condition casts out as As follows from Gnedenko and Kolmogorov’s generalisation of CLTGnedenko and Kolmogorov (1954) the mass density distribution for approaches the stable law,
Here, we use the notation of Uchaikin & Zolotarev Uchaikin and Zolotarev (1999) which includes: exponent parameter the location parameter
and the scale parameter
No general analytical expression is known for and the stable law is defined via its Fourier transform
Consider the case when According to (16) the point in which the stable law is evaluated, , approaches positive or negative infinities depending upon the sign of Indeed, as
For these values of function is non-zero on If the function features an algebraic decay, whereas if the decay is exponential. Therefore, the limiting value switching that takes place in (20) may reflect on the asymptotic behaviour of To give a precise answer one has to consider series expansions of around the points of interest, . We use here the leading terms of these seriesUchaikin and Zolotarev (1999),
By replacing the expression for the limiting distribution (16) with the leading terms given in (21) one obtains the asymptotes for (8). This time, the asymptote has three modes: depending upon the value of it either features a heavy tail with exponent a heavy tail with exponent or an exponential decay, as shown in Table 2, Case D. A few examples of such asymptotic modes for a heavy-tailed degree distribution are given in Figure 2.
According to the definition (17), the location parameter vanishes, when . In this case, as and only one asymptotic mode is possible for . Stable law is supported on and we make use of the series expansion around
which when plugged in (16) yields faster than algebraic decay of the component size distribution, see Table 2, Case F. Due to the parametrisation scheme for the stable law, the point needs to be considered separately. In this case, when and we utilise the leading term of the series expansion,
which admits one sub-algebraic asymptotic mode for as shown in Table 2, Case E. This case is special in that the stable law is supported on but asymptotically, always tends to for large At the same time, if for small the point stays on the positive half-axis where (24) does not provide a correct description for the convergence to the asymptote will be slow. In other words, there is an intermediate asymptote that the component size distribution can be approximated with, before it eventually switches to Eq. (24). This switching point is given by such that changes the sign from to i.e. when becomes greater than By solving one obtains , which means that in principle, the switching between the intermediate and the final asymptotes may be indefinitely postponed if is small enough. The intermediate asymptote itself is deduced from the leading term of the stable law expansion at that is After the substitutions one obtains,
As illustrated in Fig. 3, similar considerations are also valid for the case where
When occurs, such switching has a practical importance when dealing with empirically observed component size data. Indeed, it may happen that one observes only the intermediate asymptote and not the final one due to a small number of samples at the tail of the component size distribution. For instance, the second curve in Fig. 3 does feature an exponential decay at infinity, but if one limits the data points to the component size distribution will seem to be a heavy-tailed one.
Finally, we consider the case when the condition (15) holds for even though has finite mean and variance it also features a heavy tail. Again, as features the limiting values that are defined by the sign of , see Eq. (20). One would expect that since is finite, this case should be also well approximated with (13). This is indeed the case for However, large deviations from zero do not follow Gaussian statistics Ramsay (2006); Nacher and Akutsu (2011), and we approximate with the Pareto stable law It turns out that behaves as the normal distribution for where is a finite positive constant, but features a heavy tail with the same exponent as when see Ref. Ramsay (2006). Thus, when the component size distribution features asymptotic modes as in (13), while when it features a heavy tail with exponent see Table 2, Case B. Interestingly, when is a small negative number, transiently follows one asymptote and then switches to the other as demonstrated in Figure 4. If there is a process that continuously changes the degree distribution so that progresses from being negative to positive, the exponent of the associated component size distribution will jump from the sub-critical branch, at , to the critical one at An example of such a transition between two power-law modes is given in Figure 4, where a component size distribution switches between power laws with exponents and .
IV. Discussion and Conclusions
The broad generality of the results obtained in the previous section is achieved due to the fact that the configuration networks are locally tree-like and have vanishing probability of clustering in the thermodynamic limit, which allows one to benefit from the tools available in analytic combinatorics. Eq. (8), which was analysed in the previous section, connects the degree distribution in a configuration network to the distribution of sizes for connected components. The main conclusion one may draw from this equation is that the convolution power provides a smoothing effect. This means that all points of the degree distribution $u(k)$ have a significant contribution to the definition of the component size distribution $w(n)$, but as $n$ increases, the system ‘forgets’ the exact shape of the degree distribution and the component size distribution tends to the asymptote that is defined by only a few parameters. The only information that is still preserved at the limit is the first three moments of the degree distribution if such does not feature a heavy tail, see for example Fig. 1. If the degree distribution does feature a heavy tail then the information that characterises the tail also becomes important: that is the scale parameter and the exponent. Depending upon the values of these parameters, many asymptotic modes exist.
The expression for the asymptote is framed in terms of small deviation statistics for a sum of random variables and in some cases can be used as a good approximation for the component size distribution. Table 2 contains the analytical expressions for the asymptotes. Additionally, supporting code computing the component size distribution and the corresponding asymptotes is provided in Ref. (32). When using the asymptotic expressions to approximate the component size distribution $w(n)$, one should pay attention to two factors that follow from central limits: firstly, $n$ should be large; secondly, the approximation is best for $\mu_2 - 2\mu_1$ close to zero. Finally, small deviations or a cutoff in a heavy-tailed degree distribution can trigger considerable and non-trivial changes in $w(n)$, for instance, the change of the asymptotic mode of the latter.
IV.1 Degree distributions with a cutoff
In practice, no empirical degree distribution is a heavy-tailed one. Most of the ‘real-world’ degree distributions feature a cutoff, and therefore fail to be heavy-tailed in the strict sense of the definition (15). It turns out that if the cutoff is featured at a large enough degree, the asymptotic analysis provided above remains relevant. This situation can be compared to how we commonly attribute the fractal dimension to real-world geometric objects that fail to be fractals on infinitesimal scales.
Suppose one applies a cutoff to a degree distribution that features a heavy tail. Since the resulting degree distribution has a finite support, the asymptote of the associated component size distribution is covered by Case A (Table 2); however, if the cutoff is large, the component size distribution may also transiently follow the original asymptote. Instead of an analytical investigation, we demonstrate the influence of the cutoff with numerical examples obtained by computing (8). This influence strongly depends on how the sign of $\mu_2 - 2\mu_1$ is affected by the introduction of the cutoff. For example, if $\mu_2 - 2\mu_1 > 0$ even after the cutoff, the cutoff will cause more nodes to appear in finite-size components, and thus the component size distribution will shift towards larger sizes. The opposite case is valid when $\mu_2 - 2\mu_1 < 0$ before (and after) the cutoff: then the cutoff causes the component size distribution to shift towards smaller sizes. The third option is when the cutoff changes the sign of $\mu_2 - 2\mu_1$ from ‘+’ to ‘−’. In this case, both shifts are possible. Fig. 5 shows how a component size distribution that corresponds to degree exponent is affected by a cutoff with various values of the cutoff.
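The effect of a cutoff on the sign of $\mu_2 - 2\mu_1$ can be explored with a short sketch. It rests on assumptions that are not taken from the paper: the degree distribution is a pure power law $u(k) \propto k^{-\beta}$ on $k \ge 1$, and the cutoff is a hard truncation at $k_{\mathrm{cut}}$ followed by renormalisation. The function names and the symbols $\beta$, $k_{\mathrm{cut}}$ are hypothetical.

```python
import numpy as np

def truncated_power_law(beta, k_cut):
    """Degree distribution u[k] proportional to k**(-beta) for 1 <= k <= k_cut."""
    k = np.arange(1, k_cut + 1, dtype=float)   # float avoids integer-power errors
    weights = k ** (-beta)
    u = np.zeros(k_cut + 1)                    # u[0] = 0: no isolated nodes
    u[1:] = weights / weights.sum()            # renormalise after the truncation
    return u

def giant_component_indicator(u):
    """Return mu_2 - 2*mu_1; a positive value signals a giant component."""
    k = np.arange(len(u))
    return float(np.sum(k * (k - 2) * u))
```

With $\beta = 2.5$, for example, the indicator is negative for a cutoff at $k_{\mathrm{cut}} = 2$ and positive for $k_{\mathrm{cut}} = 10$, so in this model moving the cutoff alone is enough to change the sign.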
IV.2 Excess degree distribution with no mean value
In principle, the excess degree distributions that do not have a mean value, i.e. , do not fall within any of the above categories. However, if one introduces a cutoff, will feature finite moments including, hence this case should be treated according to Case A of Table 2. Fig. 6 shows how cutoffs at influence an instance of component size distribution with . Unlike in the previous example, in which with no cutoff generates a valid , here the increase of results in vanishing probability of finding a finite-size component at all: for any when . This illustrates the fact that finite-size components do not exist for and the whole configuration network is connected almost surely. Non-existence of finite components for also follows from the fact that in this case $\mu_1$ diverges and the point values of $w(n)$, as given below the definition (9), tend to zero.
Suppose the cutoff in the empirical, heavy-tailed degree distribution is due to the fact that the network sample has a finite size, , then one may approximate the expected number of edges in this sample as
Subsequently, three scenarios are possible here:
i) sparse network, the asymptotic modes are given in Table 2;
ii) semi-dense network, either or the mean value of excess distribution diverges; there are finite components but no power law in the distribution of component sizes;
iii) dense network, the mean value of degree distribution , and finite components vanish as .
IV.3 The role of the giant component
All the cases presented in Table 2 depend in some way on the value of $\mu_2 - 2\mu_1$. This is not a coincidence, as the sign of $\mu_2 - 2\mu_1$ is the indicator for the giant component existence. If the degree distribution features a heavy tail with exponent , depending upon the value of $\mu_2 - 2\mu_1$ there are two possible heavy-tail exponents for the component size distribution: subcritical branch when $\mu_2 - 2\mu_1 < 0$, and critical branch when $\mu_2 - 2\mu_1 = 0$. This relation is illustrated in Fig. 7, where the component size distribution exponent is plotted versus the degree-distribution exponent. We can see that if the giant component exists, then irrespective of the degree distribution, the component size distribution always decays faster than a power law. Therefore it can be concluded that the giant component is not compatible with a heavy-tailed component size distribution. Any degree distribution with leads to a giant component since $\mu_2 - 2\mu_1$ can only be positive in this case. Furthermore, if the giant component is also the only component: with probability 1 the configuration network is fully connected.
Table 2: | Finite moments of $u(k)$ | , | Asymptote of $w(n)$ |
Acknowledgements. This work is part of the research program Veni with project number 639.071.511, which is financed by the Netherlands Organisation for Scientific Research (NWO).
- Barabási and Albert (1999) A.-L. Barabási and R. Albert, Science 286, 509 (1999).
- Newman et al. (2002) M. E. Newman, D. J. Watts, and S. H. Strogatz, PNAS 99, 2566 (2002).
- Vázquez (2003) A. Vázquez, Physical Review E 67, 056104 (2003).
- Kryven et al. (2016) I. Kryven, J. Duivenvoorden, J. Hermans, and P. D. Iedema, Macromolecular Theory and Simulations 25, 449 (2016).
- Bhat et al. (2016) U. Bhat, P. L. Krapivsky, R. Lambiotte, and S. Redner, Physical Review E 94, 062302 (2016).
- Newman and Watts (1999) M. E. J. Newman and D. J. Watts, Physical Review E 60, 7332 (1999).
- Molloy and Reed (1998) M. Molloy and B. Reed, Combinatorics, probability and computing 7, 295 (1998).
- Newman (2010) M. Newman, Networks: an introduction (Oxford university press, Oxford, 2010).
- Chung et al. (2003) F. Chung, L. Lu, and V. Vu, PNAS 100, 6313 (2003).
- Kryven (2016) I. Kryven, Phys. Rev. E 94, 012315 (2016).
- Molloy and Reed (1995) M. Molloy and B. Reed, Random structures & algorithms 6, 161 (1995).
- Kryven and Iedema (2014) I. Kryven and P. D. Iedema, Chemical Engineering Science 126, 296 (2014).
- Biemann and van den Bosch (2011) C. Biemann and A. van den Bosch, Structure discovery in natural language (Springer Science & Business Media, 2011).
- Newman et al. (2001) M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Physical review E 64, 026118 (2001).
- Newman (2007) M. E. J. Newman, Phys. Rev. E 76, 045101 (2007).
- Yook et al. (2002) S.-H. Yook, H. Jeong, and A.-L. Barabási, PNAS 99, 13382 (2002).
- Ravasz and Barabási (2003) E. Ravasz and A.-L. Barabási, Physical Review E 67, 026112 (2003).
- Eguiluz et al. (2005) V. M. Eguiluz, D. R. Chialvo, G. A. Cecchi, M. Baliki, and A. V. Apkarian, Physical review letters 94, 018102 (2005).
- Fu et al. (2008) F. Fu, L. Liu, and L. Wang, Physica A: Statistical Mechanics and its Applications 387, 675 (2008).
- Adamic and Huberman (2002) L. A. Adamic and B. A. Huberman, Glottometrics 3, 143 (2002).
- Timár et al. (2016) G. Timár, S. N. Dorogovtsev, and J. F. F. Mendes, Physical Review E 94, 022302 (2016).
- Dorogovtsev et al. (2000) S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Physical review letters 85, 4633 (2000).
- Bergeron et al. (1998) F. Bergeron, G. Labelle, and P. Leroux, Combinatorial species and tree-like structures (Cambridge University Press, Cambridge, 1998).
- Kryven and Iedema (2013) I. Kryven and P. Iedema, Macromolecular Theory and Simulations 22, 89 (2013).
- Kruchinin (2015) D. V. Kruchinin, Advances in Difference Equations 2015, 1 (2015).
- Ma and Liu (2004) N.-Y. Ma and F. Liu, Applied mathematics and computation 158, 225 (2004).
- Ma and King (2002) N.-Y. Ma and R. King, Applied mathematics and computation 133, 83 (2002).
- Gnedenko and Kolmogorov (1954) B. V. Gnedenko and A. Kolmogorov, Limit distributions for sums of independent variables (Addison-Wesley, Cambridge, 1954).
- Uchaikin and Zolotarev (1999) V. V. Uchaikin and V. M. Zolotarev, Chance and stability: stable distributions and their applications (Walter de Gruyter, Utrecht, 1999).
- Ramsay (2006) C. M. Ramsay, Communications in Statistics—Theory and Methods 35, 395 (2006).
- Nacher and Akutsu (2011) J. Nacher and T. Akutsu, Physica A: Statistical Mechanics and its Applications 390, 4636 (2011).
- (32) “Matlab/GNU Octave source code for calculating component size distribution in configuration network,” https://github.com/ikryven/PhysRevE_2017_GECS.