# Generalized preferential attachment: tunable power-law degree distribution and clustering coefficient

Liudmila Ostroumova (Yandex, Moscow, Russia; Moscow State University, Moscow, Russia), Alexander Ryabchenko (Yandex, Moscow, Russia; Moscow Institute of Physics and Technology, Moscow, Russia), Egor Samosvat (Yandex, Moscow, Russia; Moscow Institute of Physics and Technology, Moscow, Russia)
###### Abstract

We propose a common framework for analysis of a wide class of preferential attachment models, which includes LCD, Buckley–Osthus, Holme–Kim and many others. The class is defined in terms of constraints that are sufficient for the study of the degree distribution and the clustering coefficient. We also consider a particular parameterized model from the class and illustrate the power of our approach as follows. Applying our general results to this model, we show that both the parameter of the power-law degree distribution and the clustering coefficient can be controlled via variation of the model parameters. In particular, the model turns out to be able to reflect realistically these two quantitative characteristics of a real network, thus performing better than previous preferential attachment models. All our theoretical results are illustrated empirically.

Keywords: networks, random graph models, preferential attachment, power-law degree distribution, clustering coefficient

## 1 Introduction

The authors are given in alphabetical order.

Numerous random graph models have been proposed to reflect and predict important quantitative and topological aspects of growing real-world networks, from the Internet and society [1, 5, 8] to biological networks [2]. Such models are of use in experimental physics, bioinformatics, information retrieval, and data mining. An extensive review can be found elsewhere (e.g., see [1, 5, 6]). Though largely successful in capturing key qualitative properties of real-world networks, such models may lack some of their important characteristics.

The simplest characteristic of a vertex in a network is its degree, the number of adjacent edges. Probably the most extensively studied property of networks is their vertex degree distribution. For the majority of studied real-world networks, the portion of vertices with degree $d$ was observed to decrease as $d^{-\gamma}$, usually with $2 < \gamma < 3$, see [3, 5, 9, 16]. Such networks are often called scale-free.

Another important characteristic of networks is their clustering coefficient, a measure capturing the tendency of a network to form clusters, i.e., densely interconnected sets of vertices. Various definitions of the clustering coefficient can be found in the literature; see [6] for a discussion of their relationship. We consider the two most popular ones: the global clustering coefficient and the average local clustering coefficient (see Section 3.3 for definitions). For the majority of studied real-world networks, the average local clustering coefficient varies in the range from to and does not change much as the network grows [5]. Modeling real-world networks so as to capture accurately not only their power-law degree distribution but also their clustering coefficient has been a challenge.

In order to combine tunable degree distribution and clustering in one model, some authors [2, 20, 21] proposed to start with a concrete prior distribution of vertex degrees and clustering and then generate a random graph under such constraints. However, adjusting a model to a particular graph is not generic enough and may be suspected of "overfitting". A more natural approach is to consider a graph as the result of a random process defined by reasonable, realistic rules that guarantee the desired properties observed in real networks. Perhaps the most widely studied realization of this approach is preferential attachment. In Section 2, we give a background on previous studies in this field.

In this paper, we propose a new class of preferential attachment random graph models thus generalizing some previous approaches. We study this class theoretically: we prove the power law for the degree distribution and approximate the clustering coefficient. We demonstrate that in preferential attachment graphs two definitions of the clustering coefficient give quite different values. We also propose a concrete parameterized model from our class where both the power-law exponent and the clustering coefficient can be tuned. All our theoretical results are illustrated experimentally.

The remainder of the paper is organized as follows. In Section 2, we give a background on previous studies of preferential attachment models. In Section 3, we propose a definition of a new class of models, and obtain some general results for all models in this class. Then, in Section 4, we describe one particular model from the proposed class. We demonstrate results obtained for graphs generated in this model in Section 5. Section 6 concludes the paper.

## 2 Preferential Attachment Random Graph Models

In 1999, Barabási and Albert observed [3] that the degree distribution of the World Wide Web follows the power law with the parameter . As a possible explanation for this phenomenon, they proposed a graph construction stochastic process, which is a Markov chain of graphs, governed by the preferential attachment. At each time step in the process, a new vertex is added to the graph and is joined to different vertices already existing in the graph chosen with probabilities proportional to their degrees.

Denote by $d_v^n$ the degree of a vertex $v$ in the growing graph at time $n$. At each step $m$ edges are added, so we have $\sum_v d_v^n = 2mn$. This observation and the preferential attachment rule imply that

$$\mathsf{P}\left(d_v^{n+1} = d+1 \mid d_v^{n} = d\right) = \frac{d}{2n}, \tag{1}$$

where $\mathsf{P}$ denotes the probability of an event. Note that the condition (1) on the attachment probability does not specify the joint distribution of the vertices to be joined to, in particular their dependence. Therefore, it would be more accurate to say that Barabási and Albert proposed not a single model, but a class of models. As was shown later, there is a whole range of models that fit the Barabási–Albert description but exhibit very different behavior.
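As a rough illustration of how much freedom condition (1) leaves, the following sketch implements just one process of this type, for the special case of a single edge per step. It is an example under stated assumptions, not the construction of [3] or [7]: the seed graph (one vertex with a self-loop), the exclusion of self-loops for new vertices, and the names `ba_tree_like` and `ends` are all choices made here. Picking a uniformly random entry of the endpoint list selects an existing vertex with probability proportional to its degree, which is $d/(2n)$ when the graph has $n$ vertices.

```python
import random
from collections import Counter


def ba_tree_like(n, seed=None):
    """Grow a graph on vertices 0..n-1, adding one edge per new vertex.

    Returns a Counter mapping each vertex to its degree.
    """
    rng = random.Random(seed)
    # Assumed seed graph: vertex 0 with one self-loop, so its degree is 2.
    ends = [0, 0]
    for v in range(1, n):
        # A vertex of degree d occupies d slots of `ends`, so a uniform
        # choice over `ends` is a degree-proportional choice over vertices.
        target = rng.choice(ends)
        ends.extend((v, target))
    return Counter(ends)


degrees = ba_tree_like(1000, seed=1)
print(max(degrees.values()), sum(degrees.values()))
```

Other members of the class differ only in how the $m$ targets of one step depend on each other, which this single-edge sketch cannot show.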

###### Theorem 2.1 (Bollobás, Riordan [6])

Let , , be any integer valued function with and for every , such that as . Then there is a random graph process satisfying (1) such that, with probability , has exactly triangles for all sufficiently large .

In [7], Bollobás and Riordan proposed a concrete precisely defined model of the Barabási–Albert type, known as the LCD-model, and proved that for , the portion of vertices with degree asymptotically almost surely obeys the power law with the parameter . Recently Grechnikov substantially improved this result [17] and removed the restriction on . It was shown also that the expectation of the global clustering coefficient in the model is asymptotically proportional to and therefore tends to zero as the graph grows [6].

One obtains a natural generalization of the LCD-model, requiring the probability of attachment of a new vertex to a vertex to be proportional to , where is a constant representing the initial attractiveness of a vertex. Buckley and Osthus [10] proposed a precisely defined model with a nonnegative integer . Móri [19] generalized this model to real . For both models, the degree distribution was shown to follow the power law with the parameter in the range of small degrees. The recent result of Eggemann and Noble [15] implies that the expectation of the global clustering coefficient in the Móri model with is asymptotically proportional to . For , the Móri model is almost identical to the LCD-model. Therefore the authors of [15] emphasize the confusing difference between clustering coefficients ( versus ).

The main drawback of the described preferential attachment models is the unrealistic behavior of the clustering coefficient. In fact, for all the models discussed, the clustering coefficient tends to zero as the graph grows, while in real-world networks the clustering coefficient is approximately constant [5].

A model with asymptotically constant (average local) clustering coefficient was proposed by Holme and Kim [18]. The idea is to mix preferential attachment steps with triangle formation steps. This model allows one to tune the clustering coefficient by varying the probability of the triangle formation step. However, experiments and empirical analysis show that the degree distribution in this model obeys the power law with a fixed parameter close to $3$, which does not suit most real networks. RAN (random Apollonian network), proposed in [22], is another interesting example of a Barabási–Albert type model with asymptotically constant (average local) clustering.

There is a variety of other models, not mentioned here, that are also based on the idea of preferential attachment. The analyses of the properties of all these models are often very similar. In the next section, we consider theorems aimed at simplifying these analyses and providing a general framework for them. In order to do this, we define a new class of preferential attachment models that generalizes the models mentioned above, as well as many others. We also propose a new parameterized model from this class that allows one to tune both the power-law exponent and the clustering coefficient by adjusting the parameters.

## 3 Theoretical Results

In this section, we define a general class of preferential attachment models. For all models in this class we are able to prove the power-law degree distribution. If an additional property is fulfilled, we are able to analyze the behavior of the clustering coefficient as the network grows.

### 3.1 Definition of the PA-class

Let () be a graph with vertices and edges obtained as a result of the following random graph process. We start at the time from an arbitrary graph with vertices and edges. On the -th step (), we make the graph from by adding a new vertex and edges connecting this vertex to some vertices from the set . Denote by the degree of a vertex in . If for some constants and the following conditions are satisfied

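$$\mathsf{P}\left(d_v^{n+1} = d_v^{n} + j \mid G_m^n\right) = O\left(\frac{(d_v^n)^2}{n^2}\right), \quad 2 \le j \le m, \quad 1 \le v \le n, \tag{4}$$

$$\mathsf{P}\left(d_{n+1}^{n+1} = m + j\right) = O\left(\frac{1}{n}\right), \quad 1 \le j \le m, \tag{5}$$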

then we say that the random graph process is a model from the PA-class. Condition (5) means that the probability of a self-loop at the added vertex is small. As we will show later, certain minor details of the models from this class, such as whether loops and multiple edges are allowed, are irrelevant.

Since we add edges at each step, summing up the equalities (3)-(5) (with corresponding coefficients) over all vertices and neglecting error terms we get . It is possible to prove that the sum of error terms in this case is , but for simplicity we just set . Furthermore, we have (for (3) we need and we set , therefore ).

Here we want to emphasize that we have indeed defined not a single model but a class of models. Even fixing the values of the parameters $A$ and $B$ does not specify a concrete procedure for constructing a network. What this definition lacks is a precise description of the distribution of the vertices a new incoming vertex is connected to, and therefore there is a range of models possessing very different properties and satisfying the conditions (2)–(5). For example, the LCD, the Holme–Kim and the RAN models belong to the PA-class with and . The Buckley–Osthus (Móri) model also belongs to the PA-class with and . Another example is considered in detail in Sections 4 and 5. This situation is somewhat similar to that of the definition of the Barabási–Albert models, though our class is wider in the sense that the exponent of the power-law degree distribution is tunable.

In the mathematical analysis of network models, there is a tendency to consider only fully and precisely defined models. In contrast, we provide results about general properties of the whole PA-class in the next two subsections.

### 3.2 Power Law Degree Distribution

Even though the precise description of the distribution of vertices a new incoming vertex is going to be connected to is not specified, we are still able to describe the degree distribution of the network.

First, we estimate $N_n(d)$, the number of vertices with given degree $d$ in $G_m^n$. We prove the following result on the expectation $\mathsf{E}N_n(d)$ of $N_n(d)$.

###### Theorem 3.1

For every we have , where

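$$c(m,d) = \frac{\Gamma\left(d+\frac{B}{A}\right)\Gamma\left(m+\frac{B+1}{A}\right)}{A\,\Gamma\left(d+\frac{B+A+1}{A}\right)\Gamma\left(m+\frac{B}{A}\right)} \;\overset{d\to\infty}{\sim}\; \frac{\Gamma\left(m+\frac{B+1}{A}\right) d^{-1-\frac{1}{A}}}{A\,\Gamma\left(m+\frac{B}{A}\right)},$$

and $\Gamma$ is the gamma function.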

Second, we show that the number of vertices with given degree is highly concentrated around its expectation.

###### Theorem 3.2

For every model from the PA-class and for every we have

$$\mathsf{P}\left(\left|N_n(d) - \mathsf{E}N_n(d)\right| \ge d\sqrt{n}\,\log n\right) = O\left(n^{-\log n}\right).$$

Therefore, for any there exists a function such that

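$$\lim_{n\to\infty} \mathsf{P}\left(\exists\, d \le n^{\frac{A-\delta}{4A+2}} : \left|N_n(d) - \mathsf{E}N_n(d)\right| \ge \varphi(n)\,\mathsf{E}N_n(d)\right) = 0.$$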

These two theorems mean that the degree distribution follows (asymptotically) the power law with the parameter $\gamma = 1 + \frac{1}{A}$, which is the exponent of $d$ in the asymptotics of $c(m,d)$.
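To get a feel for the constant $c(m,d)$ of Theorem 3.1, it can be evaluated numerically. The sketch below is only an illustration: the function names are hypothetical, and it assumes $A > 0$ and $B \ge 0$ so that all gamma arguments are positive. It works with log-gamma values to avoid overflow and compares the exact expression with its large-$d$ asymptotic form.

```python
from math import exp, lgamma, log


def c_exact(m, d, A, B):
    """Exact c(m, d) from Theorem 3.1, computed through log-gamma."""
    log_c = (lgamma(d + B / A) + lgamma(m + (B + 1) / A)
             - log(A) - lgamma(d + (B + A + 1) / A) - lgamma(m + B / A))
    return exp(log_c)


def c_asymptotic(m, d, A, B):
    """Large-d approximation: a constant times d ** (-1 - 1/A)."""
    log_const = lgamma(m + (B + 1) / A) - log(A) - lgamma(m + B / A)
    return exp(log_const) * d ** (-1 - 1 / A)


# Illustrative parameter choice: m = 1, A = 1/2, B = 0.
for d in (1, 10, 100, 1000):
    print(d, c_exact(1, d, 0.5, 0.0), c_asymptotic(1, d, 0.5, 0.0))
```

With $m = 1$, $A = 1/2$, $B = 0$ the gamma functions reduce to factorials and the exact expression simplifies to $4/(d(d+1)(d+2))$, which decays as $d^{-3}$.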

Theorem 3.1 is proved by induction on and . It is easy to see that given a graph , we can express the conditional expectation of the number of vertices with degree in (i.e., ) in terms of . Here we use the fact that the probability of having an edge between the vertex and a vertex depends on the degree of (see (2)). Using the law of total expectation we obtain the recurrent relation for and prove the statement of Theorem 3.1 by induction.

We use the Azuma–Hoeffding inequality to prove the concentration result of Theorem 3.2. In order to do this, we consider the martingale , . The complete proofs of these theorems are technical and are placed in Appendix due to space constraints.

### 3.3 Clustering Coefficient

Here we consider the clustering coefficient in models of the -class. There are two popular definitions of the clustering coefficient. The global clustering coefficient is the ratio of three times the number of triangles to the number of pairs of adjacent edges in G. The average local clustering coefficient is defined as follows: , where is the local clustering coefficient for a vertex : , where is the number of edges between neighbors of the vertex and is the number of pairs of neighbors. Results for some classical preferential attachment models (LCD and Móri) are mentioned in Section 2.
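The two definitions can be made concrete on a small graph. The sketch below is illustrative only: the adjacency-dictionary representation, the function name, the restriction to simple undirected graphs, and the convention that a vertex with fewer than two neighbours has local coefficient 0 are assumptions made here, not choices fixed by the text.

```python
from itertools import combinations


def clustering_coefficients(adj):
    """Return (global, average local) clustering of a simple graph.

    adj maps each vertex to the set of its neighbours.
    """
    closed_total = 0   # sums to three times the number of triangles
    pairs_total = 0    # number of pairs of adjacent edges
    local = []
    for v, nbrs in adj.items():
        pairs = len(nbrs) * (len(nbrs) - 1) // 2
        # Count pairs of neighbours of v that are themselves adjacent.
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        closed_total += closed
        pairs_total += pairs
        local.append(closed / pairs if pairs else 0.0)
    global_c = closed_total / pairs_total if pairs_total else 0.0
    return global_c, sum(local) / len(local)


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficients(graph))
```

On this example the global coefficient is $3/5$ while the average local one is $7/12$, so the two definitions already disagree on four vertices.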

Here we generalize these results. First, we study the random variable equal to the number of ’s in a random graph from an arbitrary model that belongs to the PA-class. In the theorems below, we use the following notation. By whp (“with high probability”) we mean that for some sequence of events, as . We say if , and we say if for some constants .

###### Theorem 3.3

For every model from the -class, we have

• if , then whp

• if , then whp

• if , then for any whp .

The ideas of the proof of Theorem 3.3 are given in Appendix. Here it is worth noting that the value in scale-free graphs is usually determined by the power-law exponent . Indeed, we have where is the maximum possible degree of a vertex in . Therefore if , then is linear in . However, if , then is superlinear.

Next, we study the random variable equal to the number of triangles in . Note that in any model from the -class we have since at each step we add at most triangles. If we combine this fact with the previous observation, we see that if , then in any preferential attachment model (in which the out-degree of each vertex equals ) the global clustering coefficient tends to zero as grows.

Our aim is to find models with constant clustering coefficient. Let us consider a subclass of the -class with the following property:

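$$\mathsf{P}\left(d_i^{n+1} = d_i^{n}+1,\; d_j^{n+1} = d_j^{n}+1 \mid G_m^n\right) = e_{ij}\,\frac{D}{mn} + O\left(\frac{d_i^n d_j^n}{n^2}\right). \tag{6}$$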

Here $e_{ij}$ is the number of edges between vertices $i$ and $j$ in $G_m^n$ and $D$ is a positive constant. Note that this property still does not define the correlation between edges completely.

###### Theorem 3.4

Let satisfy the condition (6). Then whp

The proof of this theorem is straightforward. The expectation of the number of triangles we add at each step is . The fact that the sum of over all adjacent vertices is can be shown by induction using the conditions (2)–(5). It is also possible to first prove that the maximum degree grows as and then use this fact to estimate the sum of error terms. Therefore . The Azuma–Hoeffding inequality can be used to prove concentration.

As a consequence of Theorems 3.3 and 3.4, we get the following result on the global clustering coefficient of the graph .

###### Theorem 3.5

Let belong to the -class and satisfy the condition (6). Then

• If then whp

• If then whp

• If then for any whp

Theorem 3.5 shows that in some cases () the global clustering coefficient tends to zero as the number of vertices grows. We empirically show in Section 5 that the average local clustering coefficient behaves differently.

The theoretical analysis in this case is much harder, but we can easily show why does not tend to zero if the condition (6) holds. From Theorems 3.1 and 3.2 it follows that whp the number of vertices with degree in is greater than for some positive constant . The expectation of the number of triangles we add at each step is . Therefore whp

In the next section we introduce a concrete nontrivial model from the -class.

## 4 Polynomial Model

In this section, we consider polynomial random graph models that belong to the general PA-class defined above. Applying our theoretical results to polynomial models, we find them to be very flexible: one can tune both the parameter of the degree distribution and the clustering coefficient.

Definition of Polynomial Model Let us define the polynomial model. As in the random graph process from Subsection 3.1, we construct a graph step by step. On the -th step the graph is made from the graph by adding a new vertex and sequentially drawing edges (multiple edges and self-loops are allowed).

We say that an edge is directed from to if , so the out-degree of each vertex equals . We also say that and are respectively source and target ends of . We consider different approaches to add new edges from the vertex . We first choose an edge from the existing graph uniformly and independently at random and then have three options:

• Preferential attachment (PA): draw one edge from to the target end of the chosen edge

• Uniform (U): draw one edge from to the source end of the chosen edge

• Triangle formation (TF): draw two edges from to target and source ends of the chosen edge

Let us now specify how to draw $m$ edges from the vertex $n+1$. Consider a collection of positive parameters $\alpha_{k,l}$ for $0 \le k \le m/2$ and $0 \le l \le m-2k$ such that $\sum_{k,l}\alpha_{k,l} = 1$; these parameters fully define our model. At the beginning of the step, with probability $\alpha_{k,l}$ we choose some $k$ and $l$, then we draw $l$ edges using PA, $2k$ edges using TF and $m-l-2k$ edges using U. This random graph process defines the polynomial model, and from the definition it follows that graphs in this model can be generated in linear time. This model belongs to the PA-class. Indeed, one can formally show by simple calculations that the conditions (2)–(5) hold for this model.
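The generation procedure can be sketched in a few lines. This is a minimal illustration under several assumptions that the text does not fix: a pair $(k, l)$ is read as $k$ triangle-formation draws (two edges each), $l$ preferential-attachment edges and $m - l - 2k$ uniform edges; the seed graph is a single vertex with $m$ self-loops; edges are sampled only from those present before the current step; and TF draws are made first. The function name and the dictionary format of `alpha` are hypothetical.

```python
import random


def polynomial_graph(n, m, alpha, seed=None):
    """Grow a directed multigraph on vertices 0..n-1 with out-degree m.

    alpha maps (k, l) to the probability of using k TF draws,
    l PA edges and m - l - 2k U edges in one step.
    Returns the list of edges as (source, target) pairs.
    """
    rng = random.Random(seed)
    choices = list(alpha)
    weights = [alpha[c] for c in choices]
    edges = [(0, 0)] * m              # assumed seed graph: m self-loops
    for v in range(1, n):
        k, l = rng.choices(choices, weights=weights)[0]
        old = len(edges)              # sample among pre-existing edges only
        new_edges = []
        for _ in range(k):            # TF: join both ends of a random edge
            s, t = edges[rng.randrange(old)]
            new_edges += [(v, t), (v, s)]
        for _ in range(l):            # PA: join the target end
            _, t = edges[rng.randrange(old)]
            new_edges.append((v, t))
        for _ in range(m - l - 2 * k):  # U: join the source end
            s, _ = edges[rng.randrange(old)]
            new_edges.append((v, s))
        edges.extend(new_edges)
    return edges


example = polynomial_graph(1000, 4, {(0, 2): 0.5, (1, 1): 0.3, (2, 0): 0.2}, seed=7)
print(len(example))
```

Each step does a constant amount of work per edge, so the running time is proportional to $mn$, in line with the linear-time remark above.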

At this point the model is defined but let us explain why we call it polynomial. Denote by the in-degree of a vertex in . Let us recall that by we denote the number of edges between vertices and . For every such that and , let This is a monomial depending on and . We define the polynomial . It is easy to check that

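$$\mathsf{P}\left(\text{edges } e_1,\dots,e_m \text{ go to vertices } i_1,\dots,i_m, \text{ respectively}\right) = \sum_{k=0}^{m/2}\;\sum_{l=0}^{m-2k} \alpha_{k,l}\, M^{n,m}_{k,l}(i_1,\dots,i_m). \tag{7}$$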

Many models are special cases of the polynomial model. If we consider the polynomial , then we obtain a model that is practically identical to the LCD-model. The Buckley–Osthus model can be also interpreted in terms of the polynomial model.

Properties It is easy to check that the parameters $\alpha_{k,l}$ from (7) and $A$ from (2) are related in the following way:

$$A = \sum_{k,l} \alpha_{k,l}\,\frac{l+k}{m}. \tag{8}$$

This means that we can use an arbitrary value of and any power-law exponent in the graph generation. Also note that  .
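Relation (8) is easy to evaluate for a given parameter collection. The sketch below is a hedged illustration: the function name and the dictionary format are hypothetical, the expression $D = \sum k\,\alpha_{k,l}$ is taken from the comparison table in Section 5.2, and the exponent $\gamma = 1 + 1/A$ is read off from the asymptotics of $c(m,d)$ in Theorem 3.1.

```python
def polynomial_parameters(alpha, m):
    """Return (A, D, gamma) for parameters alpha[(k, l)] and out-degree m."""
    A = sum(p * (l + k) / m for (k, l), p in alpha.items())   # relation (8)
    D = sum(p * k for (k, l), p in alpha.items())             # from the table
    gamma = 1 + 1 / A                                         # needs A > 0
    return A, D, gamma


print(polynomial_parameters({(0, 2): 0.5, (1, 1): 0.3, (2, 0): 0.2}, m=4))
```

For this particular collection every term has $l + k = 2$, so $A = 1/2$ regardless of how the probability mass is split, while $D$ does depend on the split.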

In the next section we analyze experimentally some properties of graphs in the polynomial model. We generate polynomial graphs and compare their properties with theoretical results we obtained.

## 5 Experiments

In this section, we choose a three-parameter model from the family of polynomial graph models defined in Section 4 and analyze the properties of the generated graphs depending on the parameters.

### 5.1 Description of Empirically Studied Polynomial Model

We study empirically graphs in the polynomial model with and the probability to draw edges to vertices equals

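$$\prod_{k=1}^{p}\left(\alpha\,\frac{\hat d^{\,n}_{i_{2k}}\,\hat d^{\,n}_{i_{2k-1}}}{(mn)^2} + \beta\,\frac{e_{i_{2k} i_{2k-1}}}{2mn} + \frac{\delta}{n^2}\right).$$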

Here we need and , therefore, we have three independent model parameters: , , and . Note that here we write the polynomial in a symmetric form as we ignore the order of edges.

Based on our theoretical results, we have certain expectations about the properties of generated graphs. From (8) we obtain that in this model , , therefore, due to Theorem 3.1 and Theorem 3.5, we get that

$$C_1(n) \sim \frac{3(1-2\alpha-\beta)\,\beta}{5m-1-2(2m-1)(2\alpha+\beta)}, \qquad \gamma = 1 + \frac{2}{2\alpha+\beta}. \tag{9}$$
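For quick numerical use, the two expressions in (9) can be evaluated directly. The helper names below are hypothetical, and the clustering expression is presumably meaningful only in the regime $2\alpha + \beta < 1$, where the comparison table reports a constant global clustering coefficient; the sketch does not check this.

```python
def predicted_gamma(alpha, beta):
    """Power-law exponent from (9)."""
    return 1 + 2 / (2 * alpha + beta)


def predicted_global_clustering(alpha, beta, m):
    """Limit of the global clustering coefficient from (9)."""
    numerator = 3 * (1 - 2 * alpha - beta) * beta
    denominator = 5 * m - 1 - 2 * (2 * m - 1) * (2 * alpha + beta)
    return numerator / denominator


# Illustrative values only; they are not the settings used in the experiments.
print(predicted_gamma(0.1, 0.2), predicted_global_clustering(0.1, 0.2, 2))
```

Increasing $\beta$ at fixed $\alpha$ raises the triangle-formation probability but also lowers $\gamma$, so the two quantities cannot be moved independently with $\beta$ alone.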

### 5.2 Empirical Results

Degree Distribution and Clustering Coefficient First, we study two polynomial graphs with , , and , assigning for the first graph and for the second one. The observed degree distributions are almost identical and follow the power law with the expected parameter , see Fig. 1a.

For both cases, we also study the behavior of the global and the average local clustering coefficients of generated graphs, samples for each , see Fig. 1bc. In the first case we observe , (as ) and in the second case (as was expected due to (9)) and .

We also generate graphs with , , and varying (we took and ). In other words, we fix the probability of a triangle formation and vary the parameter of the power-law degree distribution. The obtained results are shown in Fig. 2a. The behavior of the clustering coefficients is quite different. If grows, then grows (therefore ), the number of vertices with small degrees and hence high local clustering also grows (therefore increases).

To demonstrate the difference between the global clustering and the average local clustering we generated graphs with , , and varying (Fig. 2b). In this case we have and , as expected. However, for the local clustering we obtain .

Comparison with Other Models The following table summarizes our results for the polynomial model in comparison with other mentioned preferential attachment models:

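| Model | $A$ | $D$ | $\gamma$ | Global clustering | Average local clustering |
|---|---|---|---|---|---|
| LCD | $1/2$ | $0$ | $3$ | tends to zero | tends to zero |
| BO/Móri | $1/(2+\beta)$ | $0$ | $(2,\infty)$ | tends to zero | tends to zero |
| HK | $1/2$ | $m_t$ | $3$ | tends to zero | constant |
| RAN | $1/2$ | $3$ | $3$ | tends to zero | constant |
| Polynomial | $\sum \alpha_{k,l}\frac{l+k}{m}$ | $\sum k\,\alpha_{k,l}$ | $(2,\infty)$ | constant for $A<\frac{1}{2}$ | constant |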

The polynomial model seems to be the only model where one can control the exponent in the power law of the degree distribution, and at the same time guarantee a positive clustering coefficient.

## 6 Conclusions

In this paper, we introduced the PA-class of random graph models that generalizes previous preferential attachment approaches. We proved that any model from the PA-class possesses the power-law degree distribution with a tunable parameter. We also estimated its clustering coefficient. Next, we described one particular model from the proposed class, in which both the degree distribution parameter and the clustering coefficient are tunable. Experiments with generated graphs illustrated our theoretical results. We also demonstrated the different behavior of the two versions of the clustering coefficient in preferential attachment models.

While the degree distribution of a preferential attachment model can be adjusted to match real networks, the clustering coefficient still poses a problem in some cases. For most real-world networks the parameter of their degree distribution belongs to . As we showed in Section 3, once in a preferential attachment model, the global clustering coefficient decreases as the graph grows, which does not correspond to the majority of real-world networks. The reason is that the number of edges added with a new vertex at each step is a constant and consequently the number of triangles grows too slowly.

Fortunately, there are many ways to overcome this obstacle. Cooper proposed a model in which the number of added edges is a random variable [11]. In collaboration with Prałat he also considered a modification of the Barabási–Albert model, where a new vertex added at time generates edges [13]. Preferential attachment models with random initial degrees were considered in [14]. There are also models that add edges between already existing nodes (e.g. [12]). Using one of these ideas for the PA-class is a topic for future research.

Acknowledgements Special thanks to Evgeniy Grechnikov, Gleb Gusev, Andrei Raigorodskii and anonymous reviewers for the careful reading and useful comments.

## References

• [1] R. Albert, A.-L. Barabási, Statistical mechanics of complex networks, Reviews of modern physics, vol. 74, pp. 47–97 (2002)
• [2] S. Bansal, S. Khandelwal, L.A. Meyers, Exploring biological network structure with clustered random networks, BMC Bioinformatics, 10:405 (2009)
• [3] A.-L. Barabási, R. Albert, Science 286, 509 (1999); A.-L. Barabási, R. Albert, H. Jeong, Physica A 272, 173 (1999); R. Albert, H. Jeong, A.-L. Barabási, Nature 401, 130 (1999)
• [4] V. Batagelj, U. Brandes, Efficient generation of large random networks, Phys. Rev. E, vol. 71, 036113 (2005)
• [5] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Hwang, Complex networks: Structure and dynamics, Physics reports, vol. 424(45), pp. 175-308 (2006)
• [6] B. Bollobás, O.M. Riordan, Mathematical results on scale-free random graphs, Handbook of Graphs and Networks: From the Genome to the Internet, pp. 1-3 (2003)
• [7] B. Bollobás, O.M. Riordan, J. Spencer, G. Tusnády, The degree sequence of a scale-free random graph process, Random Structures and Algorithms, vol. 18(3), pp. 279-290 (2001)
• [8] C. Borgs, M. Brautbar, J. Chayes, S. Khanna, B. Lucier, The power of local information in social networks, preprint (2012)
• [9] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, J. Wiener, Graph structure in the web, Computer Networks, vol. 33(16), pp. 309-320 (2000)
• [10] P.G. Buckley, D. Osthus, Popularity based random graph models leading to a scale-free degree sequence, Discrete Mathematics, vol. 282, pp. 53-63 (2004)
• [11] C. Cooper, Distribution of Vertex Degree in Web-Graphs, Combinatorics, Probability and Computing, vol. 15 , pp 637-661, (2006)
• [12] C. Cooper, A. Frieze, A General Model of Web Graphs, Random Structures and Algorithms, 22(3), pp. 311-335 (2003)
• [13] C. Cooper, P. Prałat, Scale-free graphs of increasing degree, Random Structures and Algorithms, vol. 38(4), pp. 396-421, (2011)
• [14] M. Deijfen, H. van den Esker, R. van der Hofstad, G. Hooghiemstra, A preferential attachment model with random initial degrees, Ark. Mat., vol.47, pp. 41-72 (2009)
• [15] N. Eggemann, S.D. Noble, The clustering coefficient of a scale-free random graph, Discrete Applied Mathematics, vol. 159(10), pp. 953-965 (2011)
• [16] M. Faloutsos, P. Faloutsos, Ch. Faloutsos, On power-law relationships of the Internet topology, Proc. SIGCOMM’99 (1999)
• [17] E.A. Grechnikov, An estimate for the number of edges between vertices of given degrees in random graphs in the Bollobás–Riordan model, Moscow Journal of Combinatorics and Number Theory, vol.1(2), pp. 40–73 (2011)
• [18] P. Holme, B.J. Kim, Growing scale-free networks with tunable clustering, Phys. Rev. E, vol. 65(2), 026107 (2002)
• [19] T.F. Móri, The maximum degree of the Barabási-Albert random tree, Combinatorics, Probability and Computing, vol. 14, pp. 339-348, (2005)
• [20] M.Á. Serrano, M. Boguñá, Tuning clustering in random networks with arbitrary degree distributions, Phys. Rev. E, vol. 72(3),036133 (2005)
• [21] E. Volz, Random Networks with Tunable Degree Distribution and Clustering, Phys. Rev. E, vol. 70(5), 056115 (2004)
• [22] T. Zhou, G. Yan and B.-H. Wang, Maximal planar networks with large clustering coefficient and power-law degree distribution, Phys. Rev. E, vol. 71(4), 46141 (2005)

## Appendix: Proofs

### Proof of Theorem 3.1

In this proof we use the notation for error terms. By we denote a function such that . We also need the following notation:

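$$p_n^j(d) := \mathsf{P}\left(d_v^{n+1} = d+j \mid d_v^n = d\right) = O\left(\frac{d^2}{n^2}\right), \quad 2 \le j \le m. \tag{12}$$

$$p_n := \sum_{k=1}^{m} \mathsf{P}\left(d_{n+1}^{n+1} = m+k\right) = O\left(\frac{1}{n}\right). \tag{13}$$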

Note that the remainder term of can depend on . We omit in notation for simplicity of proofs.

Put . Note that . We use this equality several times in this proof.

We want to prove that with some constant and some function . The proof is by induction on and then on . First, we prove the theorem for and all . Then, if we proved the theorem for some and all , we are able to prove it for and for all .

We use the following equalities

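$$\mathsf{E}\left(N_{i+1}(m) \mid N_i(m)\right) = N_i(m)\left(1 - p_i(m)\right) + 1 - p_i, \tag{14}$$

$$\mathsf{E}\left(N_{i+1}(d) \mid N_i(d), N_i(d-1), \dots, N_i(d-m)\right) = N_i(d)\left(1 - p_i(d)\right) + N_i(d-1)\,p_i^1(d-1) + \sum_{j=2}^{m} N_i(d-j)\,p_i^j(d-j) + O(p_i). \tag{15}$$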
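The recurrences (14) and (15) can be iterated numerically as a sanity check on the constant $c(m,d)$. The sketch below rests on assumptions that go beyond what is shown here: all error terms are dropped, jumps by $j \ge 2$ are ignored, and both $p_i(d)$ and $p_i^1(d)$ are replaced by the leading term $(Ad+B)/i$. The function names, the starting index and the zero initial condition are hypothetical choices.

```python
from math import exp, lgamma, log


def c_exact(m, d, A, B):
    """Constant c(m, d) from Theorem 3.1, computed through log-gamma."""
    return exp(lgamma(d + B / A) + lgamma(m + (B + 1) / A) - log(A)
               - lgamma(d + (B + A + 1) / A) - lgamma(m + B / A))


def iterate_expected_counts(n, m, d_max, A, B, start=100):
    """Iterate the simplified recurrences from step `start` up to step n."""
    counts = {d: 0.0 for d in range(m, d_max + 1)}   # assumed initial state
    for i in range(start, n):
        new = {m: counts[m] * (1 - (A * m + B) / i) + 1}
        for d in range(m + 1, d_max + 1):
            stay = counts[d] * (1 - (A * d + B) / i)
            arrive = counts[d - 1] * (A * (d - 1) + B) / i
            new[d] = stay + arrive
        counts = new
    return counts


n = 20000
counts = iterate_expected_counts(n, m=2, d_max=5, A=0.4, B=0.4)
for d in range(2, 6):
    print(d, counts[d] / n, c_exact(2, d, 0.4, 0.4))
```

Under these simplifications the ratio `counts[d] / n` settles near $c(m,d)$, because $c(m,d)\,i$ is an exact solution of the simplified recurrence and the effect of the initial state decays polynomially in $i$.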

Consider the case . For constant number of small we obviously have with some . Assume that . From (14) we obtain

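$$\begin{aligned}
\mathsf{E}N_{i+1}(m) &= \mathsf{E}N_i(m)\left(1 - p_i(m)\right) + 1 - p_i \\
&= \left(\frac{i}{Am+B+1} + \theta(C_1)\right)\left(1 - p_i(m)\right) + 1 + \theta(C_2/i) \\
&= \frac{i+1}{Am+B+1} + \theta(C_1)\left(1 - p_i(m)\right) + \theta\left(\frac{C_3}{i}\right)\frac{1}{Am+B+1} + \theta(C_2/i).
\end{aligned}$$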

It remains to show that

$$C_1\,p_i(m) \ge \frac{C_3}{i\,(Am+B+1)} + \theta(C_2/i).$$

We have . It gives us

$$C_1(Am+B) \ge \frac{C_1 C_0}{i} + \frac{C_3}{Am+B+1} + C_2.$$

This inequality holds for large $C_1$ and $i$. This completes the proof for $d = m$.

Recall that the proof is by induction on $d$ and $i$. Consider $d > m$ and assume that we can prove the theorem for all smaller degrees. Now we use induction on $i$.

We have , therefore . In particular, for , where the constant depends only on the parameters of the model and will be defined later, we have with some . Assume that

$$\mathsf{E}N_i(d) = c(m,d)\left(i + \theta\left(C d^{2+1/A}\right)\right).$$

From (15) we obtain

We need to prove that there exists a constant $C$ such that

$$\frac{C d^{2+1/A}}{i} \ge \frac{C_7\, C\, d^{4+1/A}}{i^2} + \frac{C_8\, C\, d^{1+1/A}}{i} + \frac{C_9\, d^{2+1/A}}{i}.$$

This inequality holds for large and . For constant number of small there exists a function such that

Thus the final is . This concludes the proof.

### Proof of Theorem 3.2

To prove Theorem 3.2 we need the Azuma–Hoeffding inequality:

###### Theorem .1 (Azuma, Hoeffding)

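Let $(X_i)_{i=0}^{n}$ be a martingale such that $|X_i - X_{i-1}| \le c_i$ for any $1 \le i \le n$. Then

$$\mathsf{P}\left(|X_n - X_0| \ge x\right) \le 2\exp\left(-\frac{x^2}{2\sum_{i=1}^{n} c_i^2}\right)$$

for any $x > 0$.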
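The inequality can be checked empirically on the simplest martingale with bounded increments. The sketch below is only an illustration: it uses a symmetric $\pm 1$ random walk (so $X_0 = 0$ and every $c_i = 1$), which is an assumption made here and not the martingale used in the proof, and the function names are hypothetical.

```python
import math
import random


def azuma_bound(x, c):
    """Right-hand side of the Azuma-Hoeffding inequality."""
    return 2 * math.exp(-x * x / (2 * sum(ci * ci for ci in c)))


def empirical_tail(n, x, trials, seed=0):
    """Estimate P(|X_n - X_0| >= x) for a symmetric +-1 random walk."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        position = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(position) >= x:
            hits += 1
    return hits / trials


n, x = 100, 20
print(empirical_tail(n, x, trials=2000), azuma_bound(x, [1] * n))
```

The observed frequency should fall well below the bound, which is not tight for this walk; the bound's usefulness in the proof comes from its exponential decay in $x^2$.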

Suppose we are given some . Fix and : . Consider the random variables , .

Let us explain the meaning of the random variable . For any let be the expectation of the number of vertices with degree we may have at the step of the process if we fix first steps of the evolution and allow the rest steps to be arbitrary. Note that and . It is easy to see that is a martingale.

We will prove below that for any

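$$|X_{i+1}(d) - X_i(d)| \le Md,$$

where $M$ is some constant. Theorem 3.2 follows from this statement immediately. Put $c_i = Md$ for all $i$. Then from the Azuma–Hoeffding inequality it follows that

$$\mathsf{P}\left(\left|N_n(d) - \mathsf{E}N_n(d)\right| \ge d\sqrt{n}\,\log n\right) \le 2\exp\left\{-\frac{n\,d^2\log^2 n}{2\,n\,M^2 d^2}\right\} = O\left(n^{-\log n}\right).$$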

If , then the value of is considerably greater than . This is exactly what we need.

It remains to estimate the quantity . The proof is by a direct calculation.

Fix and some graph . Note that

Put , . We need to estimate the difference .

For put

$$\delta^i_t(d) = \mathsf{E}\left(N_t(d) \mid \hat G^{i+1}_m\right) - \mathsf{E}\left(N_t(d) \mid \bar G^{i+1}_m\right).$$

First let us note that for , then we have for some constant .

Now we want to prove that by induction. Suppose that . Fix . Graphs and are obtained from the graph by adding the vertex and edges. Therefore

Now consider : , . Note that

 E(Nt+1(m)∣Gim)=E(Nt(m)∣Gim)(1−pt(m))+1+O(1/t),
 E(Nt+1(d)∣Gim)=E(Nt(d)∣Gim)(1−pt(d))+
 +E(Nt(d−1)∣G<