Preferential Attachment in Systems and Networks of Constant Size

Cornelia Metzig, Caroline Colijn
Queen Mary University London, UK, Simon Fraser University, Vancouver, Canada
c.metzig@qmul.ac.uk, ccolijn@sfu.ca
Abstract

We propose a preferential-attachment-type model for a system of constant size which applies to urn/ball systems and to networks. It yields a power law for the size (or degree) distribution, with an exponential cutoff depending on parameters. This distribution can be explained by maximization of the Gibbs-Shannon entropy, using as constraint information on the growth of individual urns, or alternatively by calculating the exact probabilities. Another distribution that often occurs together with power laws, a ‘tent-shaped’ growth rate distribution, emerges naturally from this model. We confirm our theoretical results with numerical simulations and with a second method based on recursively calculated exact probabilities.

pacs:
02.50.Fz, 05.10.Gg, 05.40.-a, 05.65.+b

Version xx as of August 23, 2019

Comment to c.metzig@qmul.ac.uk

I Introduction

Complex systems self-organize and form scale-free distributions, from word frequency in language mandelbrot1953informational and web databases babbar2014power , city and company sizes marsili1998interacting ; axtell2001zipf to high-energy physics biro2005power . Often, scale-free distributions appear as degree distributions of networks, e.g. in protein interaction networks albert2005scale , brain functional networks eguiluz2005scale , email networks ebel2002scale , road networks lammer2006scaling , and various social networks barabasi2000scale , e.g. respiratory contact networks eubank2004modelling .

Many of these systems have been described by preferential-attachment-type models, where the probability that a ball joins an urn (or that a link attaches to a node) is proportional to the urn’s size (the node’s degree): either the Yule model simon1955class for urn/ball-type problems (originally biological genera and species yule1925mathematical ) or, for networks, the Barabasi-Albert model barabasi1999emergence . Both models yield a scale-free distribution above some minimum size. They can furthermore explain another scaling phenomenon, a ‘tent-shaped’ probability density of the growth rate, which often occurs in combination with a scale-free distribution alves2013scaling ; schwarzkopf2010explanation ; bottazzi2006explaining ; takayasu2014generalised ; bottazzi2002corporate ; alfarano2012statistical ; picoli2006scaling ; stanley1996scaling . However, preferential attachment as an explanation for scaling relies on the system growing constantly, which is often not the case, especially for social systems, because people also exit the system. An ansatz for systems of constant size is a linear Langevin equation with a multiplicative noise term takayasu1997stable ; biro2005power (together with an additive noise term). This yields a scale-free distribution above some value, but since the noise terms can be any i.i.d. random variables, it does not additionally explain the scaling of the growth rate.

Here, we introduce a micro-founded model for a scale-free distribution and growth rate scaling, which preserves a stationary number of urns and balls by ‘preferential’ deletion of balls and/or by deletion of urns. Like the Yule process, this algorithm can be extended to networks, where links and nodes enter and exit the network. The stationary size distribution of urns (degree distribution of nodes) is a power law, with or without exponential cutoff, which we derive in two ways: via maximization of the Gibbs-Shannon entropy, and by computing the exact probabilities of the evolution.

II The process

We consider a system of urns and balls. Each urn i is filled with n_i balls. The dynamics are framed in terms of urns receiving and losing balls, in discrete timesteps t. The two key features are that the average numbers of urns and of balls are conserved over time, and that every ball has the same chance of attracting another ball or vanishing, which implies ‘preferential’ growth simon1955class ; barabasi1999emergence . We now give the succession of events in one iteration.

  1. Growth of urns: every ball has probability p of attracting another ball from a reservoir. Let a_i be the number of new balls in urn i; a_i is binomial with mean p·n_i, such that the urn grows on average to (1+p)·n_i.

  2. Shrinking of urns: every ball has probability q of disappearing, which is adjusted as a result of the growth step such that the average number of balls is conserved. Let d_i be the number of disappearances in urn i; d_i is a random variable with a binomial distribution whose mean is q times the urn’s current size. The system shrinks in number of balls, but since some urns might be left with 0 balls, possibly also in number of urns.

  3. Exit of urns (and balls): every urn has probability r of exiting, i.e. being set to size 0, so the system loses the corresponding balls.

  4. Entry of urns (and balls): Urns that have lost all their balls due to steps (2) or (3) are replaced by urns that contain one ball.

A varying number of balls exits at each iteration, depending on the exact number and sizes of the exiting urns. Even when step (3) is omitted, some urns will exit, as urns can vanish by losing all their balls. Steps 3 and 4 conserve the number of urns but may still leave the system with a net loss of balls. To conserve the average number of balls, the probability p of attracting a new ball from the reservoir is adjusted for the next iteration.
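To make the four steps concrete, the following minimal sketch implements one iteration in Python/NumPy. The function and variable names are ours, not from the paper, and the conservation of the ball count is handled here entirely through the shrink probability q (a simplification of the adjustment rules described above).

```python
import numpy as np

rng = np.random.default_rng(0)

def iterate(sizes, p, r, target_balls):
    """One growth / shrink / exit / entry cycle for an integer array of urn sizes."""
    # 1. Growth: every ball attracts another ball from the reservoir with probability p.
    sizes = sizes + rng.binomial(sizes, p)

    # 2. Shrinking: every ball disappears with probability q, chosen here so that the
    #    expected ball count returns to the target (a simplification: the text also
    #    adjusts p in the next iteration to compensate the loss through urn exits).
    q = max(0.0, 1.0 - target_balls / sizes.sum())
    sizes = sizes - rng.binomial(sizes, q)

    # 3. Exit of urns: every urn is emptied with probability r.
    sizes[rng.random(sizes.size) < r] = 0

    # 4. Entry of urns: urns that lost all their balls re-enter with one ball.
    sizes[sizes == 0] = 1
    return sizes

# Example: 10^4 urns with mean size 10 (mixed case III).
sizes = np.full(10_000, 10)
for _ in range(5_000):
    sizes = iterate(sizes, p=0.05, r=0.001, target_balls=100_000)
```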

II.1 Possible cases

This general process can be reduced to two limiting scenarios with the same growth but different shrinking mechanisms, plus their combination. These are: (I) No deletion of urns of nonzero size; the system stays at a constant size (in terms of the number of balls) because the overall shrinking of urns equals the overall growth of urns. (II) Urns only grow and do not shrink, but exit (with their balls) at rate r and get replaced by urns of size 1, allowing the system to stay at constant size. (III) A combination of both.

  1. Urns do not exit (step 3 is omitted), i.e. r = 0. For an urn of size n_i, the probability distribution of its size after a growth-and-shrink cycle can be written as a discrete Gaussian centered around n_i and with standard deviation

    σ(n_i) ∝ n_i^(1/2)    (1)

    with standard deviation scaling exponent β = 1/2 (see equations (9)-(10) in supplementary information V). The number of urns that attain size 0 and need to be refilled in step (4) is very low, since the probability to reach size 0 decreases rapidly with urn size, so that in practice the average urn size is conserved.

  2. Urns do not shrink (step 2 is omitted). The number of balls in the system then shows very high fluctuations, since each time a fraction r of urns is deleted (and replaced by urns of size 1), and the number of exiting balls varies strongly with the sizes of the deleted urns. The expected size of an individual urn after growth, deletion and replacement differs from its previous size, i.e. the average urn size is not conserved for individual urns. Each ball attracts another ball with probability p, and the binomial distribution of the number of new balls a_i has standard deviation

    σ(a_i) = (n_i p (1−p))^(1/2) ∝ n_i^(1/2)    (2)

    again with scaling exponent β = 1/2.

  3. Mixed case. Steps 2 and 3 can be combined such that some balls disappear from the system through the shrinking of urns, and some because urns exit with probability r together with their balls. Since the exiting urns have the same mean size as all urns in the system, on average a fraction r of balls exits with them. The turnover rate τ can then be defined as the fraction of balls removed through the exit of urns, normalized by the total number of balls removed in one timestep (see the bookkeeping sketch after this list).
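A variant of the iteration above that records how the removed balls split between the two mechanisms gives the turnover rate τ directly; the bookkeeping names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def removal_split(sizes, p, r, target_balls):
    """One iteration that also returns the turnover rate tau."""
    sizes = sizes + rng.binomial(sizes, p)
    q = max(0.0, 1.0 - target_balls / sizes.sum())
    shrunk = rng.binomial(sizes, q)                # balls removed by shrinking
    sizes = sizes - shrunk
    exiting = rng.random(sizes.size) < r
    removed_by_exit = sizes[exiting].sum()         # balls removed with exiting urns
    sizes[exiting] = 0
    sizes[sizes == 0] = 1
    tau = removed_by_exit / max(1, removed_by_exit + shrunk.sum())
    return sizes, tau

# Case (I):   r = 0, shrinking only            -> tau = 0
# Case (II):  shrinking disabled (q set to 0)  -> tau = 1
# Case (III): both mechanisms active           -> 0 < tau < 1
```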

III Results

The size distribution of urns converges to one that maximizes the Gibbs-Shannon entropy produced in one timestep. Which urn size distribution has the highest entropy, given that every urn has a probability to change size that can be approximated by a Gaussian with σ(n_i) ∝ n_i^(1/2)? If there were a distribution that allowed a higher multiplicity of outcomes of all the individual size changes, it would be preferred. We use the fact that, for urns that do not exit, the size change is either Gaussian or binomially distributed, with associated entropy H_i ≈ (1/2) ln(2πe σ²(n_i)). Using (1) or (2), this term becomes (1/2) ln n_i plus a constant in both cases. At stationary state, the sum of these entropies is also stationary. Formulated differently, the size distribution maximizes entropy under the constraint that the total entropy of the individual size changes is stationary, which (subtracting the constant) can be written as

Σ_i ln n_i = const.    (3)

For case (I), where urns shrink, another constraint is the conservation of the individual average urn size, ⟨n_i(t+1)⟩ = n_i(t), or, summed over all urns, Σ_i n_i = const.

For case (II), where urns exit, this constraint does not hold, since ⟨n_i(t+1)⟩ > n_i(t) for most urns (except for the fraction r that exits and is replaced by urns of size 1). The mean number of balls per urn is only conserved for the system as a whole, because of the reintroduction of urns, but not for individual urns. The entropy functional of the urn size distribution P(n) is

S[P] = − Σ_n P(n) ln P(n) − λ_0 ( Σ_n P(n) − 1 ) − λ_1 ( Σ_n P(n) ln n − c_1 ) − λ_2 ( Σ_n P(n) n − ⟨n⟩ )    (4)

with the second constraint (on ⟨n⟩) applying only in case (I). To determine the distribution that maximizes S[P], we calculate ∂S/∂P(n) and set it to 0, leading to

P(n) = C n^(−γ) e^(−λn)    (5)

with γ = λ_1 and λ = λ_2. The normalization constant C follows from Σ_n P(n) = 1; in the continuous approximation it can be expressed with the upper incomplete Gamma function. For λ → 0, the constant in equation (5) becomes the normalization of a pure power law, if urn sizes can take values in {1, 2, …}. Knowing C, the exponent γ can be determined from the condition on ⟨ln n⟩ in constraint (3). In the continuous approximation this result is independent of the growth probability p, and for λ → 0 it simplifies to

γ = 1 + 1/⟨ln n⟩    (6)

For λ → 0, γ depends only on ⟨ln n⟩, which is the logarithm of the geometric mean of urn sizes. Exponential decay is only present if, in addition, the individual mean urn size is conserved.
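As a quick numerical check of (6), the exponent can be estimated directly from simulated urn sizes; a minimal sketch (the function name is ours, and it assumes a pure power law with minimum size 1):

```python
import numpy as np

def exponent_from_sizes(sizes):
    """gamma = 1 + 1 / <ln n>, valid for a pure power law with minimum size 1."""
    return 1.0 + 1.0 / np.mean(np.log(sizes))

# e.g. after the simulation above has converged:
# gamma_hat = exponent_from_sizes(sizes)
```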

III.1 Numerical size distribution

Numerically, the system converges to a power law distribution of urn sizes, in agreement with (5) derived from our entropy argument. However, this comparison requires an adjustment to how the constraint (3) is calculated. Approximating the entropy of a binomial by (1/2) ln(2πe σ²) holds for large n_i, but fails for the smallest urns. Urns of size 1 make up a large fraction of all urns, and since they can grow, we need to account for their contribution to the total entropy. We therefore calculated the exact entropies of the smallest sizes from the definition H = −Σ_k p_k ln p_k, weight them by the fraction of urns of those sizes, and keep the large-n approximation for the remaining urns; this correction is stable over a wide range of p. We use an adjusted

(7)

The correction is significant for high turnover rates, where a large fraction of urns has size 1. The theoretical exponent γ is confirmed by the numerical one (see figures 1a and 3), so our entropy argument can be used to describe the size distribution. Simulations of the different cases show further that

  • Case (I): The power law distribution has an exponential cutoff, in agreement with (5). Although (6) holds only for λ → 0, it only slightly overestimates γ in the presence of a cutoff, since the exponential cutoff affects only a small fraction of urns. In the presence of the cutoff, γ can be smaller than 2, resulting in a mean urn size that would diverge without the exponential cutoff.

  • Cases (II) and (III): The individual mean urn size is not conserved, and already for low turnover rates the cutoff diminishes rapidly (see figure 1). The larger the turnover rate and the mean urn size, the larger the fluctuations in the number of balls removed in step (3), and the more the urn size distribution fluctuates. At each timepoint, (7) and (6) give the correct exponent γ if the distribution follows a power law starting at size 1, even if this exponent fluctuates over time. For low turnover, a range between power law and exponential decay can form; in that case two separate power laws could also be fitted. Large urns then become very rare and do not interact strongly with the smaller urns, and another power law forms among the large urns. It can again be described by a system maximizing entropy. Such distributions have been observed empirically mandelbrot1953informational .

Both the exponent and the cutoff are independent of system size, except if the system size is too low for convergence, in which case they increase (see figure 1c). Simulation results are independent of the urn growth rate p in one timestep, in agreement with our theoretical result in (6).
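A power law with exponential cutoff of the form (5) can be fitted to simulated sizes by maximum likelihood; the sketch below is one possible implementation (the function name, the normalization over a finite support, and the starting values and bounds are our choices, not the authors' procedure).

```python
import numpy as np
from scipy.optimize import minimize

def fit_powerlaw_cutoff(sizes, n_max=None):
    """Maximum-likelihood fit of P(n) ~ n^(-gamma) * exp(-lam * n) on {1, ..., n_max}."""
    sizes = np.asarray(sizes)
    if n_max is None:
        n_max = int(sizes.max())
    support = np.arange(1, n_max + 1)

    def neg_loglik(params):
        gamma, lam = params
        log_weights = -gamma * np.log(support) - lam * support
        shift = log_weights.max()
        log_z = np.log(np.exp(log_weights - shift).sum()) + shift   # normalization
        return -(np.sum(-gamma * np.log(sizes) - lam * sizes) - sizes.size * log_z)

    res = minimize(neg_loglik, x0=[2.0, 0.01],
                   bounds=[(1.0, 6.0), (0.0, 1.0)], method="L-BFGS-B")
    return res.x          # fitted (gamma, lambda)
```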

Figure 1: Simulation results for different turnover rates. (a) Example of a size distribution, in log-linear and double-logarithmic scale. (b) Numerical vs. mean exponent. (c) Exponential cutoff fitted to simulations for different system sizes and average urn sizes; for small system sizes and high turnover the cutoff eventually is no longer exponential and resembles another power law with a higher exponent, which is why some values are missing. (d) Exponent vs. mean size for zero turnover rate and different mean degrees (in parentheses).

III.2 Growth rate probability density

It follows from the binomially (or normally) distributed size change, with σ(n_i) ∝ n_i^(1/2), that an urn’s growth rate, defined as g_i = (n_i(t+1) − n_i(t)) / n_i(t), is also normally distributed

P(g | n_i) ∝ exp( −(g − ⟨g⟩)² / (2 σ_g²(n_i)) )    (8)

with σ_g(n_i) ∝ n_i^(−1/2), i.e. scaling exponent −1/2. Dropping the index i, the aggregate growth rate distribution is P(g) = Σ_n P(g | n) P(n), or in the continuous limit P(g) = ∫ P(g | n) P(n) dn. This can be evaluated using (8) for P(g | n) and expression (5) for P(n), which yields an expression involving the upper incomplete Gamma function, shown in figure 2 and in metzig2014model .
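The aggregate growth rate density can be collected directly from a simulation by recording the relative size change of every urn over successive iterations; a minimal sketch, reusing the hypothetical iterate function and sizes array from the sketch in section II (exiting urns are not treated specially here, which one may wish to change):

```python
import numpy as np

rates = []
for _ in range(200):
    before = sizes.copy()
    sizes = iterate(sizes, p=0.05, r=0.001, target_balls=100_000)
    rates.append((sizes - before) / before)        # g = (n(t+1) - n(t)) / n(t)
rates = np.concatenate(rates)

# Density on a log-linear scale: the 'tent shape' arises because small urns
# contribute broad Gaussians (sigma_g ~ n^(-1/2)) and large urns narrow ones.
hist, edges = np.histogram(rates, bins=200, density=True)
```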

Figure 2: Aggregate growth rate distribution, simulation and fit.

Such ‘tent-shaped’ growth rate distributions are often observed for quantities that themselves follow a power-law picoli2006scaling ; bottazzi2006explaining ; schwarzkopf2010explanation ; stanley1996scaling ; mondani2014fat ; alfarano2012statistical ; fu2005growth . This result adds credibility to our model, in particular since a tent-shaped growth rate distribution does not result from simpler multiplicative noise Langevin models for power law formation.

III.3 Extension to Networks

The algorithm can be adapted to derive the degree distribution for networks, where nodes are connected by undirected and unweighted links. The substeps become: (1. and 2.) A random link is broken, and one of its end nodes is chosen to receive an additional link (i.e. every node is picked with probability proportional to its degree). Its new neighbour is also picked with probability proportional to its degree. (3.) Nodes are removed at random at rate r; their links are broken. (4.) Nodes are re-introduced and linked to an existing node; the probability of selecting a node as neighbour is proportional to its degree. New links are added to keep the mean degree constant; each node has a probability of receiving such a link proportional to its degree. Compared to an urn/ball system, the exponential cutoff is increased for the following reasons: (i) the finite number of available neighbours, the exclusion of multiple links between the same nodes, and the exclusion of self-links; (ii) additional features that can make the model more plausible, e.g. for epidemiology, clustering metzig2017impact ; metzig2018phylogenies or different exit rules, e.g. removal of a node after a given timespan instead of exit at rate r. These features cannot be written as a simple constraint in the entropy functional (4), but it is possible to infer γ and λ in (5) numerically from the mean degree, the turnover rate, and the additional features (see figure 4).
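A minimal sketch of one network update, using networkx; the handling of invalid rewirings and the bookkeeping of links lost through node removal are our simplifications, not the authors' implementation.

```python
import random
import networkx as nx

def preferential_node(G):
    """Node picked with probability proportional to its degree, obtained by
    taking a uniformly random link and one of its two end nodes."""
    u, v = random.choice(list(G.edges()))
    return random.choice((u, v))

def network_step(G, r=0.001):
    # (1, 2) Break a random link; one of its end nodes receives a new link
    #        to a preferentially chosen neighbour.
    u, v = random.choice(list(G.edges()))
    G.remove_edge(u, v)
    a = random.choice((u, v))
    b = preferential_node(G)
    if a != b and not G.has_edge(a, b):     # no self-links, no multiple links
        G.add_edge(a, b)

    # (3) Remove nodes at random at rate r; their links are broken.
    removed = [n for n in list(G.nodes()) if random.random() < r]
    G.remove_nodes_from(removed)

    # (4) Re-introduce the removed nodes, each linked preferentially to an
    #     existing node, keeping the number of nodes constant. (Links lost
    #     through removals would additionally have to be re-added to keep
    #     the mean degree constant; this is omitted here.)
    for n in removed:
        target = preferential_node(G)
        G.add_node(n)
        G.add_edge(n, target)
    return G

G = nx.gnm_random_graph(1_000, 5_000, seed=1)
for _ in range(50_000):
    G = network_step(G)
```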

IV Discussion

We introduced a micro-founded growth process for systems and networks of constant size, whose stationary size distribution follows a power law with or without an exponential cutoff, depending on the mechanism that keeps the system size constant: deletion of balls (links), or deletion of urns (nodes). The stationary size distribution has been derived by maximizing the Gibbs-Shannon entropy under two constraints. The theoretical and numerical exponents agree. Furthermore, if the average size and turnover rate are known, the power law exponent γ and the exponential decay λ can be inferred numerically (see figure 1). The method of using the entropies of the individual urns’ transition probabilities as a constraint to derive the urn size distribution can also reproduce established results: trivially for Brownian motion (where the standard deviation does not scale with urn size, so the constraint on ⟨ln n⟩ is absent), and for the Yule process (or the Barabasi-Albert model for networks), where the scaling exponent is the same but, since urns neither shrink nor exit, ⟨ln n⟩ takes larger values. Our method also holds for multiplicative noise Langevin systems takayasu1997stable ; biro2005power . They can be written in a form where the noise appears as an additive term whose standard deviation scales linearly with urn size, i.e. with scaling exponent 1. In this case, a much larger number of urns will shrink to zero, since the relative size fluctuation does not decrease for larger urns. Conservation of individual mean sizes does not hold, so there is no constraint that accounts for exponential decay, which is also absent in the numerical results. We derived the same size distributions for cases (I) and (II) numerically with another method, using the exact probabilities for every individual urn, which we calculated with a recursion equation (see supplementary material V and figure 5). This method can also reproduce Zipf’s law for multiplicative noise Langevin systems.

Newman newman2005power derives the same equation (6) as a means to determine the exponent of power-law distributed data: it is the exponent that maximizes the log-likelihood of the data under the assumption of a pure power law.

Figure 3: Fitted vs. calculated exponent γ, for three different ways of accounting for urns of size 1.
Figure 4: Simulated clustered network, and simulation results for different turnover rates.

References

  • (1) Reka Albert. Scale-free networks in cell biology. Journal of cell science, 118(21):4947–4957, 2005.
  • (2) Simone Alfarano, Mishael Milaković, Albrecht Irle, and Jonas Kauschke. A statistical equilibrium model of competitive firms. Journal of Economic Dynamics and Control, 36(1):136–149, 2012.
  • (3) Luiz GA Alves, Haroldo V Ribeiro, and Renio S Mendes. Scaling laws in the dynamics of crime growth rate. Physica A: Statistical Mechanics and its Applications, 2013.
  • (4) Robert L Axtell. Zipf distribution of us firm sizes. Science, 293(5536), 2001.
  • (5) Rohit Babbar, Cornelia Metzig, Ioannis Partalas, Eric Gaussier, and Massih-Reza Amini. On power law distributions in large-scale taxonomies. ACM SIGKDD Explorations Newsletter, 16(1):47–56, 2014.
  • (6) Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. science, 286(5439):509–512, 1999.
  • (7) Albert-László Barabási, Réka Albert, and Hawoong Jeong. Scale-free characteristics of random networks: the topology of the world-wide web. Physica A: Statistical Mechanics and its Applications, 281(1):69–77, 2000.
  • (8) Tamás S Biró and Antal Jakovác. Power-law tails from multiplicative noise. Physical review letters, 94(13):132302, 2005.
  • (9) Giulio Bottazzi, Elena Cefis, and Giovanni Dosi. Corporate growth and industrial structures: some evidence from the italian manufacturing industry. Industrial and Corporate Change, 11(4):705–723, 2002.
  • (10) Giulio Bottazzi and Angelo Secchi. Explaining the distribution of firm growth rates. The RAND Journal of Economics, 37(2):235–256, 2006.
  • (11) Sergey N Dorogovtsev and José Fernando F Mendes. Scaling behaviour of developing and decaying networks. EPL (Europhysics Letters), 52(1):33, 2000.
  • (12) Holger Ebel, Lutz-Ingo Mielsch, and Stefan Bornholdt. Scale-free topology of e-mail networks. Physical review E, 66(3):035103, 2002.
  • (13) Victor M Eguiluz, Dante R Chialvo, Guillermo A Cecchi, Marwan Baliki, and A Vania Apkarian. Scale-free brain functional networks. Physical review letters, 94(1):018102, 2005.
  • (14) Stephen Eubank, Hasan Guclu, VS Anil Kumar, Madhav V Marathe, Aravind Srinivasan, Zoltan Toroczkai, and Nan Wang. Modelling disease outbreaks in realistic urban social networks. Nature, 429(6988):180–184, 2004.
  • (15) Dongfeng Fu, Fabio Pammolli, Sergey V Buldyrev, Massimo Riccaboni, Kaushik Matia, Kazuko Yamasaki, and H Eugene Stanley. The growth of business firms: Theoretical framework and empirical evidence. Proceedings of the National Academy of Sciences of the United States of America, 102(52):18801–18806, 2005.
  • (16) Stefan Lämmer, Björn Gehlsen, and Dirk Helbing. Scaling laws in the spatial structure of urban road networks. Physica A: Statistical Mechanics and its Applications, 363(1):89–95, 2006.
  • (17) Benoit Mandelbrot. An informational theory of the statistical structure of language. Communication theory, 84:486–502, 1953.
  • (18) Matteo Marsili and Yi-Cheng Zhang. Interacting individuals leading to zipf’s law. Physical Review Letters, 80(12):2741–2744, 1998.
  • (19) Cornelia Metzig and Caroline Colijn. Phylogenies from dynamic networks. forthcoming.
  • (20) Cornelia Metzig and Mirta B Gordon. A model for scaling in firms’ size and growth rate distribution. Physica A, 2014.
  • (21) Cornelia Metzig, Julian Surey, Marie Francis, Jim Conneely, Ibrahim Abubakar, and Peter J White. Impact of hepatitis c treatment as prevention for people who inject drugs is sensitive to contact network structure. Scientific Reports, 7(1):1833, 2017.
  • (22) Hernan Mondani, Petter Holme, and Fredrik Liljeros. Fat-tailed fluctuations in the size of organizations: the role of social influence. 2014.
  • (23) Cristopher Moore, Gourab Ghoshal, and Mark EJ Newman. Exact solutions for models of evolving networks with addition and deletion of nodes. Physical Review E, 74(3):036121, 2006.
  • (24) Mark EJ Newman. Power laws, pareto distributions and zipf’s law. Contemporary physics, 46(5):323–351, 2005.
  • (25) S Picoli Jr, RS Mendes, LC Malacarne, and EK Lenzi. Scaling behavior in the dynamics of citations to scientific journals. EPL (Europhysics Letters), 75(4):673, 2006.
  • (26) Nima Sarshar and Vwani Roychowdhury. Scale-free and stable structures in complex ad hoc networks. Physical Review E, 69(2):026101, 2004.
  • (27) Yonathan Schwarzkopf, Robert Axtell, and J Farmer. An explanation of universality in growth fluctuations. 2010.
  • (28) Herbert A Simon. On a class of skew distribution functions. Biometrika, 42(3/4):425–440, 1955.
  • (29) Michael HR Stanley, Luis AN Amaral, Sergey V Buldyrev, Shlomo Havlin, Heiko Leschhorn, Philipp Maass, Michael A Salinger, and H Eugene Stanley. Scaling behaviour in the growth of companies. Nature, 379(6568):804–806, 1996.
  • (30) Hideki Takayasu, Aki-Hiro Sato, and Misako Takayasu. Stable infinite variance fluctuations in randomly amplified langevin systems. Physical Review Letters, 79(6):966–969, 1997.
  • (31) Misako Takayasu, Hayafumi Watanabe, and Hideki Takayasu. Generalised central limit theorems for growth rate distribution of complex systems. Journal of Statistical Physics, 155(1):47–71, 2014.
  • (32) G Udny Yule. A mathematical theory of evolution, based on the conclusions of dr. jc willis, frs. Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character, 213:21–87, 1925.

V Supplementary material

In one growth-and-shrink cycle, an urn of size 1 can reach 3 possible states: 0, 1 and 2. Their probabilities can be calculated from the probability p to grow by one ball in the growth step and the probability q for each ball to disappear in the shrink step. From this it follows that

(9)
(10)

This probability mass function has a mean and a variance that follow directly from (9)-(10). For an urn of size n, growth and disappearance act independently on each ball, so the variance is proportional to n, and thus the standard deviation of an urn’s next size scales as

σ(n) ∝ n^(1/2)    (11)

with its size n. This scaling holds whenever an urn’s growth is the sum of independent growth events of its individual balls.
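The scaling (11) can be checked by direct enumeration. The sketch below computes the one-cycle transition distribution of an urn of size n by convolving binomial growth and shrink steps (assuming, as one reading of the process, that newly attracted balls can also disappear within the same cycle), and verifies that the standard deviation grows like n^(1/2); the exact entropies used in section III.1 can be computed from the same distribution.

```python
import numpy as np
from scipy.stats import binom

def transition_distribution(n, p, q):
    """Exact P(n -> n') for one growth-and-shrink cycle of an urn of size n."""
    dist = np.zeros(2 * n + 1)
    for g in range(n + 1):                     # g new balls ~ Binomial(n, p)
        pg = binom.pmf(g, n, p)
        m = n + g                              # size after the growth step
        losses = np.arange(m + 1)              # d disappearing balls ~ Binomial(m, q)
        dist[m - losses] += pg * binom.pmf(losses, m, q)
    return dist

p = q = 0.05
for n in (1, 10, 100):
    dist = transition_distribution(n, p, q)
    values = np.arange(dist.size)
    mean = (values * dist).sum()
    std = np.sqrt(((values - mean) ** 2 * dist).sum())
    entropy = -np.sum(dist[dist > 0] * np.log(dist[dist > 0]))   # exact, cf. sec. III.1
    print(n, std / np.sqrt(n), entropy)        # std / sqrt(n) is roughly constant
```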

V.1 Size distribution with exact probabilities

  1. From (9)-(10), the transition probabilities P(n → n') can be calculated recursively, similarly to Pascal’s triangle for binomial coefficients. The lowest possible size after one cycle for an urn of size n is always 0 (all balls leave); the largest is always 2n (all balls attract another ball). Every probability is itself a sum of terms

    (12)

    We calculated the coefficients recursively from the coefficients of the corresponding addends in the three contributing terms, together with the corresponding powers of p and q:

    (13)

    if the corresponding coefficient exists. The coefficients with the smallest indices are calculated first, and no coefficient is used in two addends for the same probability. With (13), the coefficients and probabilities have been computed up to a maximum size. Care has been taken in the implementation, since (12) and (13) sum over terms of very different orders of magnitude.

  2. With the transition probabilities, the size distribution of an urn that started at size 1 can be calculated recursively, applying the transition probabilities once per timestep. The probability of having died out grows with the number of timesteps and approaches 1, since over time the probability of having reached size 0 increases.

  3. Assuming that equilibrium is maintained by continuously replacing urns of size 0 by urns of size 1, the equilibrium distribution is the normalized sum of the single-urn distributions over all ages. It is shown in figure 5.

The obtained size distribution can again be fitted by a power law with exponential cutoff (see figure 5). The method applies to other processes whenever the transition probabilities are known; a sketch of such a computation follows below. We also used it for multiplicative noise systems, where Zipf’s law is recovered (figure 5).
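As an alternative to the coefficient recursion (12)-(13), the same kind of computation can be sketched by iterating an exact transition matrix of the growth-and-shrink cycle and re-injecting urns of size 1. This formulation is ours: the state space is truncated at a maximum size, and fixed probabilities p ≈ q are used as an approximation of the globally adjusted shrink probability.

```python
import numpy as np
from scipy.stats import binom

def transition_matrix(n_max, p, q):
    """T[n, n'] = probability to go from size n to n' in one cycle,
    with all mass above n_max lumped into n_max."""
    T = np.zeros((n_max + 1, n_max + 1))
    T[0, 0] = 1.0
    for n in range(1, n_max + 1):
        grow = binom.pmf(np.arange(n + 1), n, p)
        for g, pg in enumerate(grow):
            m = n + g
            losses = np.arange(m + 1)
            dest = np.minimum(m - losses, n_max)
            np.add.at(T[n], dest, pg * binom.pmf(losses, m, q))
    return T

def equilibrium(n_max=200, p=0.05, q=0.05, steps=5_000):
    """Stationary size distribution when dead urns re-enter with one ball."""
    T = transition_matrix(n_max, p, q)
    P = np.zeros(n_max + 1)
    P[1] = 1.0
    for _ in range(steps):
        P = P @ T
        P[1] += P[0]          # urns of size 0 are replaced by urns of size 1
        P[0] = 0.0
        P /= P.sum()
    return P                  # P[n] for n = 1, ..., n_max
```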

Figure 5: (a) Numerical normalized probability density obtained with Gaussian transition probabilities, without and with turnover, both in log-linear and double-logarithmic scale. (b) Numerical probability density for noise whose standard deviation scales linearly with size, which generates Zipf’s law.