Network Growth with Arbitrary Initial Conditions: Degree Dynamics for Uniform and Preferential Attachment
Abstract
This paper provides time-dependent expressions for the expected degree distribution of a growing network. We consider both uniform attachment, where incoming nodes link to existing nodes selected uniformly at random, and preferential attachment, where selection probabilities are proportional to the degrees of the existing nodes. We consider the cases of single and multiple links being formed by each newly introduced node. The initial conditions are arbitrary, that is, the solution depends on the degree distribution of the initial graph, which is the substrate of the growth. Previous work in the literature focuses on the asymptotic state, that is, when the number of nodes added to the initial graph tends to infinity, rendering the effect of the initial graph negligible. Our contribution provides a solution for the expected degree distribution as a function of time, for arbitrary initial conditions. Previous results match our results in the asymptotic limit. The results are discrete in the degree domain and continuous in the time domain, where the addition of new nodes to the graph is approximated by a continuous arrival rate.
I Introduction
The complex network literature spans various strands of research such as sociology Granovetter (1973); Wasserman and Faust (1994); Degenne and Forse (1999), economics Jackson and Rogers (2005); Bala and Goyal (2000), computer science Adamic (1999); Adamic et al. (2001); Leskovec et al. (), marketing Leskovec et al. (2007); Delre et al. (2010); Kim et al. (2006); Martins et al. (2009), epidemiology Newman (2002); Pastor-Satorras and Vespignani (2001a), genetics Elowitz and Leibler (2000), and bibliometrics Redner (2005). These domains aim to extract macroscale behavior from given microscale interactions.
The structure of the underlying graph, which connects the agents and consequently regulates their interactions, is essential for studying the dynamics of various phenomena, such as the flow of information (news, rumors, trends, etc.) in society Ben-Naim et al. (2003); Acemoglu et al. (2011), resilience against node or link failures (for the internet, this means survival of the system if certain nodes are shut down) Holme et al. (2002); Cohen et al. (2001), the pace of diffusion of a contagious disease throughout a population and optimal immunization strategies Pastor-Satorras and Vespignani (2001b, 2002a, 2002b), and the effect of the network structure among actors on their chance of winning awards G. Rossmana (2010), to name a few. Models have been proposed to emulate different structural properties observed in real-life graphs D.S.Price (2007); Price (1965); Erdős and Rényi (1959, 1960); Bollobás (2001a, b); Watts and Strogatz (1998); Newman and Watts (1999); Barabasi and Albert (1999); Barabasi et al. (1999).
In many applications, such as the world wide web Jeong et al. (2007); Zhou and Mondragón (2004) and scientific collaborations Newman (2004), networks are dynamic, that is, subject to growth. This provides motivation to view the problem of network formation dynamically. In this formulation, nodes are introduced successively, and they select from existing nodes whom to attach to. It mimics, for example, the mechanism by which new papers cite existing ones. Barabasi et al. (1999) takes this approach and introduces the preferential attachment mechanism, which is explained below. Also, in Krapivsky et al. (2000); Krapivsky and Redner (2001); Krapivsky et al. (2001); Krapivsky and Redner (2002) the problem is tackled by the conventional techniques of polymer physics. Both of these approaches employ approximations to solve the problem. In what follows, we go over these approximations and the corresponding results.
I.1 Previous Work: Network Growth
In the linear preferential attachment scheme introduced in Barabasi et al. (1999), the growth mechanism is as follows. The growth process starts with nodes. Then, nodes are introduced one per unit time. Each node picks existing nodes to link to, with probabilities assigned to them proportional to their degrees. This means that an existing node with a higher degree will be more likely to attract the newly introduced node. Denote the degree distribution of the graph when the total number of nodes is by . Their result can be expressed as follows:
(1) 
The analysis is done within the mean-field simplification and the solution is valid in the asymptotic case of . In Bollobás (2001b), this result is improved by reformulating the problem more rigorously. Denote by the number of links that each newly born node sends to the existing nodes. The network growth process starts from a cycle. Let be defined as above. Also, define:
(2) 
In Theorem 1 in Bollobás (2001b), it is shown that in the limit , for any positive and for , the following holds:
(3) 
Note that the expression in (2) agrees with (1) for large values of .
The problem is also closely related to the so-called Pólya's urn problem F.Chung et al. (2003) in combinatorics. Given a finite number of bins, additional balls arrive one at a time. With a given probability, a new bin is created for the new ball. The ball otherwise joins an existing bin. It picks the destination bin with probabilities dependent on the existing number of balls within the bins. In F.Chung et al. (2003), the case where probabilities are proportional to is solved. The case of is akin to the linear preferential attachment scheme mentioned above.
A novel way to tackle the problem was presented in Krapivsky et al. (2000); Krapivsky and Redner (2001); Krapivsky et al. (2001); Krapivsky and Redner (2002, 2003) by employing the master equation approach, which the authors borrow from polymer physics. The result of (2) for the case of has been obtained using this approach (equation (2) in Krapivsky et al. (2000), (5) in Krapivsky and Redner (2003), (2) in Krapivsky and Redner (2001) and (2) in Krapivsky and Redner (2002)). For a treatment of finite-size effects (when is not infinitely large) with primary focus on nodes with degree , see Krapivsky and Redner (2002). In Dorogovtsev et al. (2000), the generating function approach has been used to solve the master equation, and the asymptotic distribution (2) for has been recovered. The asymptotic degree distribution up to the leading order of has also been obtained there in the form of , as a function of time and time of birth , which is the time at which each node is introduced to the network. That result holds for the initial condition of a single node with a specified number of incoming links from outside the network, since in Dorogovtsev et al. (2000) directed links can originate from unspecified sources, even from outside the network. In the present paper, we seek for arbitrary times; links necessarily emanate from the newly introduced node at each time step, and the links are undirected.
In Krapivsky et al. (2000), the uniform attachment scheme is also examined. This means that new nodes attach to existing nodes with equal probabilities, regardless of their degrees. If we start from a single node at the outset, the resulting graph is called a Random Recursive Tree (RRT). The result presented in Krapivsky et al. (2000) for the asymptotic degree distribution of RRTs is as follows:
(4) 
The same result is also presented in Theorem 1 in Na and Rapoport (1970) and equation (49) in Janson (2005) following a combinatorial approach.
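The uniform attachment rule that generates an RRT is simple enough to check numerically. The following is a minimal sketch, not the authors' simulation code: the function names are hypothetical, and the comparison assumes that the asymptotic result in (4) is the familiar closed form in which the fraction of nodes with degree k tends to 2^(-k).

```python
import random
from collections import Counter


def simulate_rrt_degrees(num_new_nodes, seed=0):
    """Grow a random recursive tree from a single node; return the degree list."""
    rng = random.Random(seed)
    degrees = [0]  # the single initial node has no links yet
    for _ in range(num_new_nodes):
        target = rng.randrange(len(degrees))  # uniform choice among existing nodes
        degrees[target] += 1
        degrees.append(1)  # the newcomer arrives with exactly one link
    return degrees


def degree_fractions(degrees):
    """Map each observed degree to the fraction of nodes that have it."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


if __name__ == "__main__":
    fractions = degree_fractions(simulate_rrt_degrees(200_000))
    for k in range(1, 8):
        # Assumed closed form for comparison: 2 ** (-k)
        print(k, round(fractions.get(k, 0.0), 4), 2.0 ** (-k))
```

A single long run is used here instead of an average over many trials, which is enough to see the geometric decay for small degrees.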
I.2 Time-dependent Solution, Motivation
Previous work has primarily revolved around the asymptotic degree distribution, that is, when the number of nodes tends to infinity. Also, in some cases, further simplification is achieved by limiting the range of degrees. For long times, the effect of the initial graph is neglected. In this contribution, we start from an initial graph with known degree distribution . We solve for the expected degree distribution at time . We consider both uniform and linear preferential attachment (eventuating in a scale-free graph in the long run). The time-dependent solution first develops intuition about the growth process and the path that the system follows until it reaches the steady state. More importantly, the effect of the initial conditions is taken into account. Different substrates approach the available equilibrium approximation of the degree distribution at different paces. The time-dependent solution illuminates the effect of the initial condition on the accuracy of the above-mentioned approximations.
Equipped with the time-dependent solution, one can also examine the short-time behavior, in marked contrast with the convention, which limits the solution to the long-time behavior. As an example of how the need for extracting the short-time growth of an existing graph arises in realistic applications, consider the network of supporters in a political campaign. Nodes are fanatics who absorb new people into the campaign, causing the network to expand throughout the potential electorate. The change in the network of followers in one day is not substantial compared to the existing size of the network. As another example, consider the social network within a country, with a small number of immigrants joining and enlarging the network. The number of immigrants typically constitutes a small fraction of the population of the host country (with possible exceptions of wars or other abrupt phase transitions, to minor degrees). Then, if one wants to study the social network of the host country, the conventional models cease to perform, because the ratio of new nodes to existing nodes does not tend to infinity, but is small. The same is true for any slowly growing realistic network where an extrapolation to the near future, given information on the current state, is called for.
I.3 Organization of the Paper
First, in subsection II.1, we consider the uniform attachment scheme, with each newly introduced node linking to one existing node picked uniformly at random. We compare our results with the ones present in the literature. Then, in subsection II.2, we consider uniform attachment with multiple linking, where each new node connects to existing nodes drawn uniformly at random. In Section III we examine the preferential attachment scheme. First, in III.1, we consider each new node linking to only one existing node, with probabilities assigned to existing nodes proportional to their degrees. Then, in III.2, we assume each new node attaches to existing nodes. So each new node has degree upon birth. We solve for the expected degree distribution in all cases. Throughout the paper, we compare our theoretical findings with simulations.
II Uniform Attachment
II.1 Single Connection
We start from an initial graph at time with nodes. We denote the degree distribution at the outset by . At each timestep, a new node is introduced. It picks one of the existing nodes uniformly at random and connects to it. Nodes are added one by one. If the initial condition is a single node, the resulting graph will be the conventional Random Recursive Tree Na and Rapoport (1970); Janson (2005); Krapivsky et al. (2000).
Let represent the rate at which new nodes are introduced, that is, nodes are added in a time interval of duration . It also means that one node is added every time units. So for example, if , then 100 nodes are introduced per unit time, and one node arrives every 0.01 time units. At time there are nodes. Let denote the expected number of nodes whose degree is at time . Let us focus on the expected variation in in the time increment within which one new node is added.
With probability , a node with degree receives a link, and its degree increments. Consequently, decrements and increments, both by one. Similarly, with probability , a node with degree receives a link, hence increments and decrements, both by one. So we have:
(5) 
Note that the case of is distinct. Each new node increments by one. So,
(6) 
These two equations can be condensed into one:
(7) 
where is the Kronecker delta function (i.e., if , and otherwise). Dividing both sides by , and denoting by (which means that one node arrives per ), we can recast this equation as the following:
(8) 
In the limit , the following differential equation is obtained for the dynamics of the expected degree distribution:
(9) 
where is the first derivative of with respect to time, and explicit dependence on time is omitted for expositional simplification.
Approximating the difference equation (7) with its differential analog (9) has an error of order (readily seen through the Taylor expansion of ), which can be controlled by rescaling of time. The error shrinks as grows:
(10) 
This continuous approximation is justified more rigorously using martingales in Wormald (1995); Kurtz (1981) (also see Mitzenmacher (2004); Drinea et al. (2000)). In this paper the goodness of this approximation is empirically verified through simulations.
Note that, from (7), we see that the increments take rational values proportional to . Approximating the left-hand side with a differential yields (9). Since the denominator of (7) has the factor , the approximation becomes more accurate as grows, hence increasing . The approximation is also more accurate when is large. When both and are small, the continuous approximation in the domain becomes less accurate. Note that, as long as is large, need not be large. This is particularly important for applications where networks are already large (such as those mentioned in Section I), and one would like to predict the short-term evolution of the degree distribution. In these settings, the expressions obtained throughout this paper are applicable for any time regime.
To solve (9), we use the generating function , which is the conventional Z-transform in the domain. Using (9) we get
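The dependence of the continuous approximation on the size of the initial graph can be illustrated numerically. The sketch below rests on assumptions that should be read as such: it takes a unit arrival rate, it encodes the single-link uniform update as described in the text (a node of a given degree is hit with probability equal to the number of such nodes divided by the current number of nodes, and each newcomer adds one node of degree 1), and it integrates the differential analog with a plain forward Euler scheme. All function names are hypothetical.

```python
def _initial_vector(initial_counts, k_max):
    """Dense list of expected counts indexed by degree 0..k_max."""
    counts = [0.0] * (k_max + 1)
    for k, c in initial_counts.items():
        counts[k] = float(c)
    return counts


def _rates(counts, total):
    """Expected change of each count per arriving node (uniform, single link)."""
    rates = [0.0] * len(counts)
    rates[0] = -counts[0] / total
    for k in range(1, len(counts)):
        rates[k] = (counts[k - 1] - counts[k]) / total
    rates[1] += 1.0  # every newcomer is a fresh degree-1 node
    return rates


def discrete_counts(initial_counts, steps, k_max=30):
    """Iterate the one-node-per-step difference equation."""
    n0 = sum(initial_counts.values())
    counts = _initial_vector(initial_counts, k_max)
    for t in range(steps):
        rates = _rates(counts, n0 + t)
        counts = [c + r for c, r in zip(counts, rates)]
    return counts


def continuous_counts(initial_counts, t_end, k_max=30, substeps=200):
    """Forward-Euler integration of the differential analog (unit rate)."""
    n0 = sum(initial_counts.values())
    counts = _initial_vector(initial_counts, k_max)
    h = 1.0 / substeps
    for i in range(t_end * substeps):
        rates = _rates(counts, n0 + i * h)
        counts = [c + h * r for c, r in zip(counts, rates)]
    return counts


def max_fraction_gap(n0, t_end):
    """Largest gap between the two degree distributions for a ring of n0 nodes."""
    ring = {2: n0}  # every node of a ring has degree 2
    d = discrete_counts(ring, t_end)
    c = continuous_counts(ring, t_end)
    return max(abs(x - y) for x, y in zip(d, c)) / (n0 + t_end)


if __name__ == "__main__":
    for n0 in (5, 50, 500):
        print(n0, max_fraction_gap(n0, 20))
```

Running the loop for a few initial sizes at a fixed horizon shows the gap between the difference equation and its continuous counterpart shrinking as the initial graph grows, which is the qualitative point made in the paragraph above.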
(11) 
So we intend to solve the following differential equation in the time domain:
(12) 
After solving this equation and applying the initial conditions (Appendix A), we obtain the generating function:
(13) 
To take the inverse transform, first note that:
(14) 
Also, denoting by , note that we have:
(15) 
Using the Taylor expansion of the exponential, we get:
(16) 
So the inverse transforms are:
(17) 
where is the Heaviside step function (i.e., for , and for ). Finally, note that the multiplication of Z-transforms yields convolution after inversion. Let us denote the degree distribution of the initial graph by , that is, is the fraction of nodes at the outset with degree . So by inverting (13), for and we obtain:
(18) 
where denotes the convolution operator. For general sequences and , the convolution is another sequence in the domain which is defined as follows:
(19) 
If the sequences are zero for negative values of (which is the case in our problem), this can be simplified to:
(20) 
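The convolution of two sequences that vanish for negative indices can be written directly as a double loop. This is a small illustrative sketch with a hypothetical function name; the kernel values in the example are arbitrary numbers chosen only to show the mechanics.

```python
def causal_convolution(a, b):
    """Return c with c[k] = sum over j of a[j] * b[k - j], for sequences starting at index 0."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            c[i + j] += a_i * b_j
    return c


if __name__ == "__main__":
    # A regular initial graph puts all probability mass on one degree (here degree 2),
    # so convolving with it simply shifts the other sequence by that degree.
    initial_distribution = [0.0, 0.0, 1.0]
    kernel = [0.5, 0.3, 0.2]  # arbitrary illustrative values
    print(causal_convolution(initial_distribution, kernel))
```

For a regular initial graph the initial distribution is a single spike, so the convolution reduces to a shift of the other sequence, which is why regular substrates are convenient test cases.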
To get the degree distribution, we divide the result in (18) by the total number of nodes at time , which is equal to . So for and we obtain:
(21) 
As , the effect of the initial conditions vanishes, so the first term dominates. In this limit, the asymptotic degree distribution is:
(22) 
Note that this matches the asymptotic behavior previously found for RRTs as presented in Na and Rapoport (1970); Janson (2005); Krapivsky et al. (2000).
To compare theoretical predictions with simulation results, we first start with a 6-regular graph, in which every node has 6 neighbors. We build this graph by first making a ring of 50 nodes and then connecting each node to its second and third closest neighbors on each side. Figure 1 shows the degree distribution at time , that is, . The results are averaged over 50 Monte Carlo trials. Also, , so nodes are introduced one at a time. As can be seen in the figure, the second most frequent degree is 1, which corresponds to the newly born nodes. Nodes of degree 6 are mostly the initial ones that have received no new link yet, and those with degree 7 have received only one additional link.
For the next simulation, we take a ring (2-regular) of 50 nodes and we plot as a function of time, for . The simulations and theoretical results are shown in Figure 2. Nodes with degree 3 are the ones that have received one link from the newly added nodes; they outnumber those that have received two, as seen in the figure. This is expected because initially a new node is less likely to attach to a node that has already received a link than to a node whose degree is still , since the latter outnumber the former at early times; thus the population of degree-3 nodes grows substantially. After a while, many nodes have degree 3, and as they receive new links their degrees become 4, reducing the population of degree-3 nodes. Figure 3 shows for the case of . It can be seen that the fraction of nodes with degree decreases and that of those with degree 1 increases. The reason is that each new node that is added to the network has degree 1, while existing nodes with degree 2 receive links from the new nodes, so their degree is no longer 2, hence the decline in the population of nodes with degree 2.
Next, the mean and variance of simulations are tabulated to provide estimates for fluctuations around the mean values which are solved for. Table 1 presents these values for a 4-regular ring of 20 nodes, at different times.
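The regular initial graphs used in the simulations can be built with a few lines of code. The sketch below is one possible construction under the reading that each node is linked to its nearest neighbors on both sides of a ring; the function name and its parameters are hypothetical.

```python
def ring_lattice(num_nodes, neighbors_each_side):
    """Adjacency sets of a ring in which each node links to its nearest
    `neighbors_each_side` nodes on both sides (degree = 2 * neighbors_each_side)."""
    if num_nodes <= 2 * neighbors_each_side:
        raise ValueError("ring too small for the requested degree")
    adjacency = {v: set() for v in range(num_nodes)}
    for v in range(num_nodes):
        for offset in range(1, neighbors_each_side + 1):
            w = (v + offset) % num_nodes
            adjacency[v].add(w)
            adjacency[w].add(v)
    return adjacency


if __name__ == "__main__":
    six_regular = ring_lattice(50, 3)  # ring plus second and third closest neighbors
    print(sorted(len(nbrs) for nbrs in six_regular.values())[:5])
```

With `neighbors_each_side=1` the same function gives the plain 2-regular ring, and with `neighbors_each_side=2` a 4-regular ring.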
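A Monte Carlo estimate of the mean and variance of the degree fractions can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code behind the table: it adds one node per time step, uses the population variance across trials, and all names are hypothetical.

```python
import random
from statistics import mean, pvariance


def uniform_growth_fractions(initial_degrees, steps, rng):
    """Run one realization of single-link uniform attachment; return degree fractions."""
    degrees = list(initial_degrees)
    for _ in range(steps):
        target = rng.randrange(len(degrees))  # uniform choice among existing nodes
        degrees[target] += 1
        degrees.append(1)
    n = len(degrees)
    fractions = {}
    for d in degrees:
        fractions[d] = fractions.get(d, 0.0) + 1.0 / n
    return fractions


def monte_carlo_summary(initial_degrees, steps, trials, degrees_of_interest, seed=0):
    """Return {degree: (mean fraction, population variance)} across independent trials."""
    rng = random.Random(seed)
    samples = {k: [] for k in degrees_of_interest}
    for _ in range(trials):
        fractions = uniform_growth_fractions(initial_degrees, steps, rng)
        for k in degrees_of_interest:
            samples[k].append(fractions.get(k, 0.0))
    return {k: (mean(v), pvariance(v)) for k, v in samples.items()}


if __name__ == "__main__":
    four_regular_ring = [4] * 20  # only the degree sequence matters for this process
    for t in (5, 10, 20, 40):
        print(t, monte_carlo_summary(four_regular_ring, t, 500, [1, 4, 5]))
```

Only the degree sequence of the initial graph enters this process, so the initial graph is passed as a list of degrees instead of an adjacency structure.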
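Table 1: Mean and variance of the simulated degree distribution for uniform attachment with a single connection, starting from a 4-regular ring of 20 nodes. Rows: time (t = 5, 10, 15, 20, 25, 30, 35, 40); columns: degree.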
II.2 Multiple Connections
Now, let us consider multiple attachments. Each new node that is introduced chooses existing nodes (where is an integer) uniformly at random and links to them. An essential difference between this scheme and the previous one is that, if one starts from a disconnected graph, then the probability of ending up with a connected graph is nonzero. This probability was zero in the previous case, because each newly introduced node linked to only one existing node and could not make a connection between two disconnected components. Also note that in this case one must have , so that the growth mechanism can start. Otherwise, link multiplicity arises, that is, more than one link would have to be allowed between two nodes, which is tacitly assumed not to be the case throughout.
Taking steps similar to those that led to (7), the change in is given by:
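The multiple-link uniform scheme, including the requirement that the initial graph be large enough for a newcomer to find distinct targets, can be sketched as below. The function name and parameters are hypothetical, and sampling the targets without replacement is the assumption used here to rule out multiple links between the same pair of nodes.

```python
import random


def grow_uniform_multi(initial_degrees, links_per_node, steps, seed=0):
    """Each newcomer links to `links_per_node` distinct existing nodes chosen uniformly."""
    if len(initial_degrees) < links_per_node:
        raise ValueError("initial graph has fewer nodes than links per newcomer")
    rng = random.Random(seed)
    degrees = list(initial_degrees)
    for _ in range(steps):
        # Sampling without replacement guarantees no repeated link to the same node.
        targets = rng.sample(range(len(degrees)), links_per_node)
        for v in targets:
            degrees[v] += 1
        degrees.append(links_per_node)  # the newcomer is born with this degree
    return degrees


if __name__ == "__main__":
    final_degrees = grow_uniform_multi([2] * 30, 3, 100)
    print(len(final_degrees), sum(final_degrees), min(final_degrees))
```

Each newcomer adds its own degree plus one unit of degree to each target, so the total degree grows by twice the number of links per newcomer at every step.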
(23) 
Note that the last term indicates that each new node adds one to , because its degree is . The differential equation analog for becomes
(24) 
Taking the Z-transform, we get
(25) 
So we arrive at the following differential equation:
(26) 
The solution procedure is given in appendix B. The generating function is
(27) 
Now we must invert this, term by term. This is done in appendix C. After inversion, for and we obtain:
(28) 
Then we divide by the total number of nodes to get the degree distribution for and . The result is:
(29) 
Now let us look at the long-time behavior of the result. When , the second and the third terms vanish. The first term prevails and tends to the following:
(30) 
Note that for the case of , the same asymptotic distribution is obtained as in the previous section. Also note that in the asymptotic limit, all nodes have degree at least and the fraction of nodes with degree less than tends to zero.
Figure 4 shows the simulation results for a ring (2-regular) of 30 nodes, and is depicted versus time, for . The value of is 3. The number of Monte Carlo trials is 30. It can be seen in the graph that the nodes with degree 4, who are mostly the initial nodes who have received one link from the newcomers, outgrow those with degree 5. This is because they also outnumber them, giving them greater link reception probabilities. After a while, this trend declines because many nodes will have degree 4, and now that they receive a new link, they will turn into nodes of degree 5, enhancing the growth of the degree 5 nodes, diminishing the portion of nodes with degree 4. Similarly, the overshoot of degree 6 curve happens after that of degree 5, and so on.
Figure 5 shows for , for a 6-regular ring of total 50 nodes. The number of Monte Carlo trials is 50. The value of is 3. The peak at seen in the figure is due to the newly added nodes, who all have degree 3. Most of the initial nodes have received zero or one links by this time, hence the other peak at .
The mean and variance of simulations are presented in Table 2.
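Table 2: Mean and variance of the simulated degree distribution for uniform attachment with multiple connections. Rows: time (t = 5, 10, 15, 20, 25, 30, 35, 40); columns: degree.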
III Preferential Linking
In this section we focus on preferential attachment. New nodes, instead of selecting from the existing nodes uniformly at random, assign to them connection probabilities proportional to their degrees. So, each existing node receives a link from the newly introduced node with probability equal to its degree divided by the sum of the degrees of all existing nodes. We first consider the case where a new node attaches to a single existing node, and then the case of multiple connections.
III.1 Single Connection
As mentioned above, in the preferential attachment scheme, an existing node with degree receives a link with probability , where the denominator is the sum of the degrees of all existing nodes. So the probability that the destination node selected by a newly born node has degree is equal to . Using the same approach that led to (9), we arrive at the following differential equation for the evolution of :
(31) 
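Degree-proportional selection can be implemented without computing probabilities explicitly. The sketch below uses a list of link endpoints, a common simulation device that is not taken from the paper; the function name is hypothetical and the initial graph is assumed to contain at least one link.

```python
import random


def grow_preferential_single(initial_degrees, steps, seed=0):
    """Each newcomer links to one existing node chosen with probability
    proportional to that node's current degree."""
    rng = random.Random(seed)
    degrees = list(initial_degrees)
    # Node v appears degrees[v] times in this list, so a uniform draw from it
    # selects v with probability degrees[v] / sum(degrees).
    endpoints = [v for v, d in enumerate(degrees) for _ in range(d)]
    for _ in range(steps):
        target = rng.choice(endpoints)
        newcomer = len(degrees)
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend((target, newcomer))  # the new link has two endpoints
    return degrees


if __name__ == "__main__":
    final_degrees = grow_preferential_single([2] * 30, 5000)
    share_degree_one = final_degrees.count(1) / len(final_degrees)
    print(len(final_degrees), sum(final_degrees), round(share_degree_one, 3))
```

Because every new link contributes two endpoints, the length of the endpoint list always equals the sum of the degrees, which is the denominator of the attachment probabilities.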
Now to proceed as before, we take the Z-transform of this equation. First note that if is the Z-transform of a discrete function , then is the Z-transform of the function . This means that, if the Z-transform of is , then the Z-transform of the first term on the right-hand side is as follows:
(32) 
Second, note that the denominator of the attachment probabilities, , is twice the number of links in the graph. Let us denote the number of links in the graph by . Note that, since each new node adds one new link, we have:
(33) 
Twice the number of links in the initial graph equals , where denotes the average degree of the initial graph. So we get:
(34) 
We will temporarily use
(35) 
for brevity. The Z-transform of (31) is:
(36) 
This is a first-order partial differential equation. We solve this equation using the method of characteristics. For the convenience of the reader, we briefly illustrate how this method works through a simple example in appendix D (we refer the reader to R. Courant (1989); Zauderer (2011); Zwillinger (1998), or other elementary references on partial differential equations, for further details), and then provide the solution in appendix E, where we obtain:
(37) 
Note that from the argument of the logarithm, we know the region of convergence of the Z-transform is , since the logarithm is not defined otherwise (this agrees with what one would expect intuitively, that since is zero for by definition, the region of convergence would be ). Now let us define the new variable
(38) 
This quantity is positive and less than unity at all times. Now, note that we have:
(39) 
So we simplify (37) further and arrive at:
(40) 
Note that the two terms cancel out. Also, note that there are three terms having the factor . These three terms add up to:
(41) 
These simplifications transform (40) into the following:
(42) 
We find the inverse Z-transform of this expression in appendix F. The result is:
(43) 
Replacing by , for and we get:
(44) 
Now to get the degree distribution, we divide this expression by the number of nodes at time , which is equal to . As above, we denote the degree distribution of the initial graph by , that is, is the fraction of nodes at the outset with degree . Thus the final result for the degree distribution for and is the following:
(45) 
Now to find the asymptotic limit of this expression, first by combining (38) and (35), we get
(46) 
Now note that as we have:
(47) 
Thus, the asymptotic behavior of the degree distribution is given by:
(48) 
which simplifies to
(49) 
As we mentioned previously, this asymptotic result was derived in Krapivsky et al. (2000); Krapivsky and Redner (2001, 2002). Also, for large values of , this approaches the power law derived in Barabasi et al. (1999).
Figure 6 shows for , for a 6-regular ring of total 50 nodes. It is seen in the figure that newcomers form the second most frequent group. The initial nodes of degree 6 that have received no new link are the most frequent. Those that have received one new link, and hence have degree 7, are the third most frequent.
Figure 7 illustrates the simulation results and theoretical predictions for a ring (2-regular) of 30 nodes, and is presented as a function of time, for . As seen in the figure, nodes with degree 2 are mostly the initial nodes that have received no link from the newcomers, and their population diminishes as they receive new links and consequently turn into nodes of degree 3.
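The approach to the asymptotic regime can be checked by iterating the expected counts directly. The following sketch makes two assumptions that are not stated in this form in the text: it uses a one-node-per-step recurrence in which a node of degree k is hit with probability k times the number of such nodes divided by the total degree, and it takes the asymptotic result in (49) to be the familiar closed form 4 / (k (k + 1) (k + 2)). The function names are hypothetical.

```python
def expected_fractions_preferential(initial_counts, steps, k_max):
    """Iterate expected counts under single-link preferential attachment and
    return the expected degree fractions for degrees 0..k_max.
    `k_max` must be at least the largest degree in `initial_counts`."""
    n0 = sum(initial_counts.values())
    degree_sum = sum(k * c for k, c in initial_counts.items())
    counts = [0.0] * (k_max + 1)
    for k, c in initial_counts.items():
        counts[k] = float(c)
    for t in range(steps):
        total_degree = degree_sum + 2 * t  # each new link adds two to the degree sum
        new = counts[:]
        for k in range(1, k_max + 1):
            new[k] += ((k - 1) * counts[k - 1] - k * counts[k]) / total_degree
        new[1] += 1.0  # the newcomer has degree 1
        counts = new
    return [c / (n0 + steps) for c in counts]


def asymptotic_fraction(k):
    """Assumed closed form of the long-time limit for single-link preferential attachment."""
    return 4.0 / (k * (k + 1) * (k + 2))


if __name__ == "__main__":
    ring = {2: 50}
    for steps in (100, 1000, 20000):
        f = expected_fractions_preferential(ring, steps, 20)
        print(steps, [round(f[k], 4) for k in range(1, 5)])
    print("limit", [round(asymptotic_fraction(k), 4) for k in range(1, 5)])
```

Counts above `k_max` are simply not tracked; since each count depends only on lower degrees, the values that are tracked are unaffected by the truncation.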
The mean and variance of simulations are presented in Table 3.
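Table 3: Mean and variance of the simulated degree distribution for preferential attachment with a single connection. Rows: time (t = 5, 10, 15, 20, 25, 30, 35, 40); columns: degree.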
III.2 Multiple Connections
Now let us consider the preferential attachment scheme again, but this time, each new node attaches to existing nodes. At time , the number of nodes will be . Also, at time , the sum of the degrees of all nodes (which equals twice the number of links) will be . Each newly born node adds one to at that instant. Let us once again denote by . Similar to (31), the evolution of is
(50) 
Taking the Z-transform leads us to:
(51) 
We solve this differential equation via the method of characteristics in appendix G. The result is:
(52) 
where the function is defined as follows:
(53) 
Let us generalize (38) and define the following:
(54) 
As above, this quantity is less than one and tends to one as . Then (52) is simplified to:
(55) 
We invert the generating function in appendix H. Consequently, for and we arrive at:
(56) 
Dividing this by the number of nodes at time , which is equal to , yields the degree distribution at time . As above, let us denote the degree distribution of the initial graph by . The final result for the degree distribution for and is as follows:
(57) 
The equivalence of this result for the special case of with (45) is proved in appendix I.
Now let us focus on the long-time behavior. When , we have:
(58) 
Using these values, the asymptotic degree distribution is obtained:
(59) 
which matches (2).
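The same kind of numerical check can be made for multiple links per newcomer. The sketch below is a mean-field style recurrence and is an assumption in two respects: it treats the links of a newcomer as independent degree-proportional draws (ignoring the constraint that targets be distinct), and it compares against the closed form 2 m (m + 1) / (k (k + 1) (k + 2)) for degrees k of at least m, which is how the result in (2) is usually stated. Names are hypothetical.

```python
def expected_fractions_preferential_multi(initial_counts, links_per_node, steps, k_max):
    """Iterate approximate expected counts when each newcomer sends
    `links_per_node` links with degree-proportional probabilities.
    `k_max` must be at least the largest initial degree and `links_per_node`."""
    m = links_per_node
    n0 = sum(initial_counts.values())
    degree_sum = sum(k * c for k, c in initial_counts.items())
    counts = [0.0] * (k_max + 1)
    for k, c in initial_counts.items():
        counts[k] = float(c)
    for t in range(steps):
        total_degree = degree_sum + 2 * m * t  # m new links per step
        new = counts[:]
        for k in range(1, k_max + 1):
            new[k] += m * ((k - 1) * counts[k - 1] - k * counts[k]) / total_degree
        new[m] += 1.0  # the newcomer is born with degree m
        counts = new
    return [c / (n0 + steps) for c in counts]


def asymptotic_fraction_multi(k, m):
    """Assumed long-time limit for degrees k >= m."""
    return 2.0 * m * (m + 1) / (k * (k + 1) * (k + 2))


if __name__ == "__main__":
    f = expected_fractions_preferential_multi({2: 30}, 3, 50000, 20)
    for k in range(3, 8):
        print(k, round(f[k], 4), round(asymptotic_fraction_multi(k, 3), 4))
```

With three links per newcomer and a ring as the initial graph, the fraction of degree-2 nodes decays toward zero while the fractions for degrees of three and above settle near the assumed limit.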
Figure 8 shows for , for a 6-regular ring of total 50 nodes. The value of is 3. As seen in the figure, the leftmost peak belongs to nodes with degree , which are the newly added nodes. Nodes with degree 6 and 7, which are mostly the initial nodes that have received zero and one links from the newcomers respectively, are the first and second most frequent.
Figure 9 depicts the simulation results and theoretical predictions for a ring (2-regular) of 30 nodes, and is presented as a function of time, for . The value of is 3. It is observable in the figure that the network keeps losing nodes of degree 2 (the initial nodes) as they receive links from the newcomers.
The mean and variance of simulations are presented in Table 4.
Let us also compare theoretical predictions and simulation results for a graph which is not regular, that is, one in which the nodes do not necessarily all have the same degree.
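Table 4: Mean and variance of the simulated degree distribution for preferential attachment with multiple connections. Rows: time (t = 5, 10, 15, 20, 25, 30, 35, 40); columns: degree.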
IV Summary and Future Work
Previous work in the literature of network growth models mainly focuses on the degree distribution of the graph in the asymptotic limit, that is, when the number of nodes tends to infinity and the effect of initial conditions can be neglected. In this contribution we found time-dependent expressions for the expected degree distribution, which depend explicitly on the degree distribution of the initial graph. We considered two growth schemes: one in which new nodes choose from existing nodes uniformly at random and then connect to them, and another where these probabilities are proportional to degrees. Single and multiple attachments for the newly born nodes are considered separately for both cases. Simulation results accompanied the theoretical predictions for each case.
One possible extension of the results presented in this work is as follows. Suppose a given graph is subject to growth. The current state of the graph is known, and the growth mechanism can be approximated as either uniform or preferential attachment. Suppose that, in contrast to previous work in the literature, we are interested in the short-time behavior of the degree distribution. Then one could employ the results in this work, expand the expressions in the vicinity of the initial condition up to arbitrary order of , and find the degree distribution perturbatively, to arbitrary precision.
Our analysis focuses on the expected degree distribution. Due to the random nature of the growth process, has a distribution of its own, whose mean value is presented in this work. One can also focus on the variance, or other statistical properties, of this distribution.
V Acknowledgment
This work was funded in part by the Natural Sciences and Engineering Research Council of Canada.
Appendix A Solving Equation (12) for Uniform Single Attachment
The equation is repeated here for easy reference:
(60) 
with the following general form of a first order linear equation in time domain:
(61) 
Multiply both sides by an unknown integrating factor to make both sides equal to . Then