# Global clustering coefficient in scale-free weighted and unweighted networks

Liudmila Ostroumova Prokhorenkova

Yandex, Moscow, Russia; Moscow State University, Moscow, Russia

###### Abstract

In this paper, we present a detailed analysis of the global clustering coefficient in scale-free graphs. Many observed real-world networks of diverse nature have a power-law degree distribution. Moreover, the observed degree distribution usually has an infinite variance. Therefore, we are especially interested in such degree distributions. In addition, we analyze the clustering coefficient for both weighted and unweighted graphs.

There are two well-known definitions of the clustering coefficient of a graph: the global and the average local clustering coefficients. There are several models proposed in the literature for which the average local clustering coefficient tends to a positive constant as a graph grows. On the other hand, there are no models of scale-free networks with an infinite variance of the degree distribution and with an asymptotically constant global clustering coefficient. Models with constant global clustering and finite variance were also proposed. Therefore, in this paper we focus only on the most interesting case: we analyze the global clustering coefficient for graphs with an infinite variance of the degree distribution.

For unweighted graphs, we prove that the global clustering coefficient tends to zero with high probability, and we also estimate the largest possible clustering coefficient for such graphs. In contrast, for weighted graphs, a constant global clustering coefficient can be obtained even in the case of an infinite variance of the degree distribution.

## 1 Introduction

In this paper, we analyze the global clustering coefficient of graphs with a power-law degree distribution. Namely, we consider a sequence of graphs with degree distributions following a regularly varying distribution $F$. It was previously shown in [9] that if a graph has a power-law degree distribution with an infinite variance, then the global clustering coefficient tends to zero with high probability. Namely, an upper bound for the number of triangles is obtained in [9]. In addition, a construction procedure that yields a sequence of graphs with a superlinear number of triangles is presented there. However, the number of triangles in the constructed graphs grows slower than the upper bound obtained. In this paper, we close this gap by improving the upper bound obtained in [9]. Moreover, we also analyze graphs with multiple edges and show that weighted scale-free graphs with an asymptotically constant global clustering coefficient and with an infinite variance of the degree distribution do exist.

The rest of the paper is organized as follows. In the next section, we discuss several definitions of the clustering coefficient for weighted and unweighted graphs. Then, in Section 3, we formally define our restriction on a sequence of graphs. Section 4 contains auxiliary results. In Sections 5 and 6, we analyze the global clustering coefficient for the unweighted and the weighted case respectively. Section 7 concludes the paper.

## 2 Clustering coefficients

There are two well-known definitions of the clustering coefficient [2, 5] of an unweighted graph $G$ on $n$ vertices. The global clustering coefficient $C_1(G)$ is the ratio of three times the number of triangles to the number of pairs of adjacent edges in $G$. The average local clustering coefficient is defined as follows: $C_2(G) = \frac{1}{n}\sum_{i=1}^{n} C(i)$, where $C(i)$ is the local clustering coefficient for a vertex $i$: $C(i) = \frac{T^i}{P_2^i}$, where $T^i$ is the number of edges between the neighbors of the vertex $i$ and $P_2^i$ is the number of pairs of neighbors. Note that both clustering coefficients equal 1 for a complete graph.
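To make the difference between the two definitions concrete, the following minimal sketch computes both coefficients for a simple graph given as an adjacency dictionary. The function name `clustering_coefficients` and the convention that a vertex of degree less than 2 has local coefficient 0 are our assumptions for illustration; they are not taken from [2, 5].

```python
from itertools import combinations


def clustering_coefficients(adj):
    """Return (global, average local) clustering of a simple undirected graph.

    adj maps each vertex to the set of its neighbours.
    Assumption: a vertex with fewer than two neighbours gets local value 0.
    """
    closed = 0  # closed pairs of neighbours over all centres = 3 * triangles
    pairs = 0   # pairs of adjacent edges (paths of length two)
    local = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        p_v = k * (k - 1) // 2
        # count pairs of neighbours of v that are themselves adjacent
        t_v = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        closed += t_v
        pairs += p_v
        local.append(t_v / p_v if p_v else 0.0)
    c_global = closed / pairs if pairs else 0.0
    c_local = sum(local) / len(local)
    return c_global, c_local


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 0.
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficients(example))
```

For this example the global coefficient is 3/5, while the average local coefficient is 7/12, so the two definitions already disagree on a four-vertex graph.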

It was mentioned in [2, 5] that in research papers either the average local or the global clustering coefficients are considered, and it is not always clear which definition is used. On the other hand, these two clustering coefficients differ: e.g., it was demonstrated in [7] that for networks based on the idea of preferential attachment the difference between these two clustering coefficients is crucial.

It is also reasonable to study the global clustering coefficient for graphs with multiple edges. This agrees well with reality: for example, the Web host graph has many multiple edges, since there can be several edges between the pages of two hosts. Multiple edges occur even in the Internet graph (vertices are web pages and edges are links between them).

We refer to the paper [6] for the definition of the global clustering coefficient for weighted graphs. They propose the following generalization of the global clustering coefficient to multigraphs:

$$C_1(G) = \frac{\text{total value of closed triplets}}{\text{total value of triplets}}.$$

There are several ways to define the value of a triplet. First, the triplet value can be defined as the arithmetic mean of the weights of the ties that make up the triplet. Second, it can be defined as the geometric mean of the weights of the ties. Third, it can be defined as the maximum or minimum value of the weights of the ties. In addition to these methods proposed in [6], we also propose the following natural definition of the weight: the weight of a triplet is the product of the weights of the ties. This definition agrees with the following property: the total value of all triplets located in a vertex is close to its degree squared.
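The triplet values listed above can be compared on a small example. The sketch below is only an illustration under our own assumptions: the graph is stored as a nested dictionary `w[u][v]` of edge multiplicities, a triplet is identified by its centre and an unordered pair of neighbours, and the name `weighted_global_clustering` is hypothetical.

```python
from itertools import combinations
from math import sqrt

# Value of a triplet as a function of the weights of its two ties.
TRIPLET_VALUE = {
    "arithmetic": lambda a, b: (a + b) / 2,
    "geometric": lambda a, b: sqrt(a * b),
    "max": max,
    "min": min,
    "product": lambda a, b: a * b,
}


def weighted_global_clustering(w, method="product"):
    """Total value of closed triplets divided by total value of all triplets."""
    value = TRIPLET_VALUE[method]
    closed = total = 0.0
    for v, nbrs in w.items():
        for a, b in combinations(nbrs, 2):
            val = value(w[v][a], w[v][b])
            total += val
            if b in w[a]:  # the triplet is closed if its end points are adjacent
                closed += val
    return closed / total if total else 0.0


# Triangle 0-1-2 with weights 2, 1, 1 and a pendant vertex 3 joined to 0 by a triple edge.
w = {0: {1: 2, 2: 1, 3: 3}, 1: {0: 2, 2: 1}, 2: {0: 1, 1: 1}, 3: {0: 3}}
for name in TRIPLET_VALUE:
    print(name, weighted_global_clustering(w, name))
```

When all weights equal 1, every method reduces to the unweighted global clustering coefficient.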

## 3 Scale-free graphs

We consider a sequence of graphs $\{G_n\}$. Each graph $G_n$ has $n$ vertices. As in [9], we assume that the degrees of the vertices are independent random variables following a regularly varying distribution with a cumulative distribution function $F$ satisfying

$$1 - F(x) = L(x)\,x^{-\gamma}, \quad x > 0, \tag{1}$$

where $L(\cdot)$ is a slowly varying function, that is, for any fixed constant $t > 0$

$$\lim_{x \to \infty} \frac{L(tx)}{L(x)} = 1.$$

There is another obvious restriction on the function $L(\cdot)$: the function $1 - L(x)\,x^{-\gamma}$ must be a cumulative distribution function of a random variable taking positive integer values with probability 1.

Note that Equation (1) describes a broad class of heavy-tailed distributions without imposing the rigid Pareto assumption. The power-law distribution with parameter $\gamma + 1$ corresponds to the cumulative distribution $1 - F(x) = L(x)\,x^{-\gamma}$. Further by $\xi, \xi_1, \xi_2, \ldots$ we denote random variables with the distribution $F$. Note that for any $\alpha < \gamma$ the moment $E\xi^{\alpha}$ is finite.

Models with $\gamma > 2$ and with the global clustering coefficient tending to some positive constant were already proposed (see, e.g., [7]). Therefore, in this paper we consider only the case $1 < \gamma < 2$.

One small problem remains: we can construct a graph with a given degree distribution only if the sum of all degrees is even. This problem is easy to solve: we can either regenerate the degrees until their sum is even or we can add 1 to the last variable if their sum is odd [3]. For the sake of simplicity we choose the second option, i.e., if $\xi_1 + \ldots + \xi_n$ is odd, then we replace $\xi_n$ by $\xi_n + 1$. It is easy to see that this modification does not change any of our results; therefore, in what follows we do not focus on the parity of the sum.
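The degree model and the parity fix can be sketched as follows. As an assumption made only for this illustration, we sample $\xi = \lfloor U^{-1/\gamma} \rfloor$ with $U$ uniform on $(0, 1]$, which gives $P(\xi > k) = (k+1)^{-\gamma}$ for integer $k \ge 0$, a regularly varying tail whose slowly varying factor tends to 1; the function name `sample_degrees` is hypothetical.

```python
import random


def sample_degrees(n, gamma, rng):
    """Sample n i.i.d. positive integer degrees with a tail of index gamma.

    Assumption: xi = floor(U ** (-1 / gamma)), U uniform on (0, 1].
    If the sum is odd, 1 is added to the last degree, as described above.
    """
    degrees = [int((1.0 - rng.random()) ** (-1.0 / gamma)) for _ in range(n)]
    if sum(degrees) % 2 == 1:
        degrees[-1] += 1
    return degrees


degrees = sample_degrees(10_000, 1.5, random.Random(0))
print(max(degrees), sum(degrees) / len(degrees))
```

With $1 < \gamma < 2$ the sample mean stabilizes as $n$ grows, while the sample variance does not, which is the regime studied in this paper.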

## 4 Auxiliary results

In this section, we prove several auxiliary lemmas. These lemmas generalize several results from [9]. In order to prove these lemmas we use the following theorem (see, e.g., [1]).

###### Theorem 1 (Karamata’s theorem)

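Let $L$ be slowly varying and locally bounded in $[x_0, \infty)$ for some $x_0 \ge 0$. Then

1. for $\alpha > -1$

$$\int_{x_0}^{x} t^{\alpha} L(t)\,dt = (1 + o(1))\,(\alpha + 1)^{-1} x^{\alpha+1} L(x), \quad x \to \infty.$$

2. for $\alpha < -1$

$$\int_{x}^{\infty} t^{\alpha} L(t)\,dt = -(1 + o(1))\,(\alpha + 1)^{-1} x^{\alpha+1} L(x), \quad x \to \infty.$$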
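As a quick sanity check of the first statement, one can take a case where the integral is available in closed form. The choice $L(t) = \ln t$, $\alpha = 0$, $x_0 = e$ is our own illustrative assumption and is not an example from [1]:

```latex
\int_{e}^{x} \ln t \, dt
  = \bigl[\, t \ln t - t \,\bigr]_{e}^{x}
  = x \ln x - x
  = \left(1 - \frac{1}{\ln x}\right) x \ln x
  = (1 + o(1))\,(\alpha + 1)^{-1} x^{\alpha + 1} L(x),
  \qquad x \to \infty .
```

The correction term $1/\ln x$ vanishes only logarithmically, which illustrates why the $o(1)$ terms in the lemmas below cannot in general be replaced by polynomial rates.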

We also use the following known lemma (its proof can be found, e.g., in [8]).

###### Lemma 1

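Let $\xi_1, \ldots, \xi_n$ be mutually independent random variables, $E\xi_i = 0$, $E|\xi_i|^{\alpha} < \infty$, $1 \le \alpha \le 2$, then

$$E\left(|\xi_1 + \ldots + \xi_n|^{\alpha}\right) \le 2^{\alpha}\left(E(|\xi_1|^{\alpha}) + \ldots + E(|\xi_n|^{\alpha})\right).$$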

We need the following notation:

$$S_{n,c}(x) = \sum_{i=1}^{n} \xi_i^{c}\, I[\xi_i > x],$$

$$\bar{S}_{n,c}(x) = \sum_{i=1}^{n} \xi_i^{c}\, I[\xi_i \le x],$$

here $c \ge 0$ and $x \ge 0$.

###### Lemma 2

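Fix any $c$ such that $0 \le c < \gamma$, any $\beta$ such that $1 < \beta < \gamma/c$ and $\beta \le 2$, and any $\varepsilon > 0$. Then for any $x = x(n) > 0$ such that $x(n) \to \infty$ we have

$$E S_{n,c}(x) = \frac{\gamma}{\gamma - c}\, n x^{c-\gamma} L(x)\,(1 + o(1)), \quad n \to \infty,$$

$$P\left(|S_{n,c}(x) - E S_{n,c}(x)| > \varepsilon\, E S_{n,c}(x)\right) = O\left(\left(\frac{x^{\gamma}}{n L(x)}\right)^{\beta - 1}\right).$$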
###### Proof

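We now assume that $c \ne 0$.

First, we estimate the expectation of $S_{n,c}(x)$:

$$\begin{aligned}
E S_{n,c}(x) &= n\int_{x}^{\infty} t^{c}\,dF(t) = -n\int_{x}^{\infty} t^{c}\,d(1 - F(t)) \\
&= -n\,t^{c}(1 - F(t))\Big|_{x}^{\infty} + nc\int_{x}^{\infty} t^{c-1}(1 - F(t))\,dt \\
&= n x^{c-\gamma} L(x) + nc\int_{x}^{\infty} t^{c-1-\gamma} L(t)\,dt \\
&\sim n x^{c-\gamma} L(x) - nc\,(c - \gamma)^{-1} x^{c-\gamma} L(x) = \frac{\gamma}{\gamma - c}\, n x^{c-\gamma} L(x).
\end{aligned}$$

Here the asymptotic equality follows from the second statement of Theorem 1. Then, we estimate

$$E\left(\xi^{c} I[\xi > x]\right)^{\beta} = \frac{1}{n}\, E S_{n,c\beta}(x) \sim \frac{\gamma}{\gamma - \beta c}\, x^{\beta c - \gamma} L(x)$$

and, using Lemma 1, get

$$\begin{aligned}
P\left(|S_{n,c}(x) - E S_{n,c}(x)| > \varepsilon\, E S_{n,c}(x)\right)
&\le \frac{E|S_{n,c}(x) - E S_{n,c}(x)|^{\beta}}{(\varepsilon\, E S_{n,c}(x))^{\beta}}
= O\left(\frac{n\, E\left(\xi^{c} I[\xi > x]\right)^{\beta}}{(E S_{n,c}(x))^{\beta}}\right) \\
&= O\left(\frac{n x^{\beta c - \gamma} L(x)}{n^{\beta} x^{\beta(c - \gamma)} (L(x))^{\beta}}\right)
= O\left(n^{1-\beta} x^{-(1-\beta)\gamma} (L(x))^{1-\beta}\right).
\end{aligned}$$

The case $c = 0$ can be considered similarly:

$$E S_{n,0}(x) = n P(\xi > x) = n x^{-\gamma} L(x),$$

$$P\left(|S_{n,0}(x) - E S_{n,0}(x)| > \varepsilon\, E S_{n,0}(x)\right) = O\left(\frac{n x^{-\gamma} L(x)}{(n x^{-\gamma} L(x))^{\beta}}\right) = O\left(\left(\frac{x^{\gamma}}{n L(x)}\right)^{\beta - 1}\right).$$
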
###### Lemma 3

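Fix any $c$ such that $c > \gamma$ and any $\varepsilon > 0$. Then for any $x = x(n) > 0$ such that $x(n) \to \infty$ we have

$$E \bar{S}_{n,c}(x) = \frac{\gamma}{c - \gamma}\, n x^{c-\gamma} L(x)\,(1 + o(1)), \quad n \to \infty,$$

$$P\left(|\bar{S}_{n,c}(x) - E \bar{S}_{n,c}(x)| > \varepsilon\, E \bar{S}_{n,c}(x)\right) = O\left(\frac{x^{\gamma}}{n L(x)}\right).$$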
###### Proof

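Again, first we estimate the expectation of $\bar{S}_{n,c}(x)$:

$$\begin{aligned}
E \bar{S}_{n,c}(x) &= n\int_{0}^{x} t^{c}\,dF(t) = -n\int_{0}^{x} t^{c}\,d(1 - F(t)) \\
&= -n\,t^{c}(1 - F(t))\Big|_{0}^{x} + nc\int_{0}^{x} t^{c-1}(1 - F(t))\,dt \\
&= -n x^{c-\gamma} L(x) + nc\int_{0}^{x} t^{c-1-\gamma} L(t)\,dt \\
&\sim -n x^{c-\gamma} L(x) + nc\,(c - \gamma)^{-1} x^{c-\gamma} L(x) = \frac{\gamma}{c - \gamma}\, n x^{c-\gamma} L(x).
\end{aligned}$$

Then, we estimate

$$E\left(\xi^{c} I[\xi \le x]\right)^{2} = \frac{1}{n}\, E \bar{S}_{n,2c}(x) \sim \frac{\gamma}{2c - \gamma}\, x^{2c - \gamma} L(x)$$

and get

$$\begin{aligned}
P\left(|\bar{S}_{n,c}(x) - E \bar{S}_{n,c}(x)| > \varepsilon\, E \bar{S}_{n,c}(x)\right)
&\le \frac{E|\bar{S}_{n,c}(x) - E \bar{S}_{n,c}(x)|^{2}}{(\varepsilon\, E \bar{S}_{n,c}(x))^{2}}
= O\left(\frac{n\, E\left(\xi^{c} I[\xi \le x]\right)^{2}}{(E \bar{S}_{n,c}(x))^{2}}\right) \\
&= O\left(\frac{n x^{2c - \gamma} L(x)}{n^{2} x^{2(c - \gamma)} (L(x))^{2}}\right)
= O\left(\frac{x^{\gamma}}{n L(x)}\right).
\end{aligned}$$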

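We prove two more lemmas. Put $\xi_{\max} = \max\{\xi_1, \ldots, \xi_n\}$.

###### Lemma 4

For any $\varepsilon > 0$ and any $\alpha > 0$

$$P\left(\xi_{\max} > n^{\frac{1}{\gamma} - \varepsilon}\right) = 1 - O(n^{-\alpha}).$$

Also, for any $\delta < \gamma\varepsilon$

$$P\left(\xi_{\max} \le n^{\frac{1}{\gamma} + \varepsilon}\right) = 1 - O(n^{-\delta}).$$

###### Proof

$$\begin{aligned}
P\left(\xi_{\max} \le n^{\frac{1}{\gamma} - \varepsilon}\right)
&= \left[P\left(\xi \le n^{\frac{1}{\gamma} - \varepsilon}\right)\right]^{n}
= \exp\left(n \log\left(1 - P\left(\xi > n^{\frac{1}{\gamma} - \varepsilon}\right)\right)\right) \\
&= \exp\left(n \log\left(1 - L\left(n^{\frac{1}{\gamma} - \varepsilon}\right) n^{-\gamma\left(\frac{1}{\gamma} - \varepsilon\right)}\right)\right)
= \exp\left(-n L\left(n^{\frac{1}{\gamma} - \varepsilon}\right) n^{-\gamma\left(\frac{1}{\gamma} - \varepsilon\right)} (1 + o(1))\right) \\
&= \exp\left(-L\left(n^{\frac{1}{\gamma} - \varepsilon}\right) n^{\gamma\varepsilon} (1 + o(1))\right) = O(n^{-\alpha}),
\end{aligned}$$

$$P\left(\xi_{\max} > n^{\frac{1}{\gamma} + \varepsilon}\right) \le n P\left(\xi > n^{\frac{1}{\gamma} + \varepsilon}\right) \le n L\left(n^{\frac{1}{\gamma} + \varepsilon}\right) n^{-\gamma\left(\frac{1}{\gamma} + \varepsilon\right)} = O(n^{-\delta}).$$
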
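The two estimates for the maximum degree can be evaluated numerically in a simple special case. The sketch below assumes a pure power-law tail $1 - F(x) = x^{-\gamma}$ for $x \ge 1$ (that is, $L \equiv 1$, ignoring integrality), which is our simplification for illustration; the function name `max_degree_tail_bounds` is hypothetical.

```python
import math


def max_degree_tail_bounds(n, gamma, eps):
    """Evaluate both tail estimates for the maximum of n degrees.

    Assumption: 1 - F(x) = x ** (-gamma) for x >= 1 (slowly varying factor 1).
    Returns (P(max <= n^(1/gamma - eps)), union bound on P(max > n^(1/gamma + eps))).
    """
    lo = n ** (1.0 / gamma - eps)
    hi = n ** (1.0 / gamma + eps)
    # P(max <= lo) = F(lo) ** n, computed through log1p for numerical accuracy
    p_below = math.exp(n * math.log1p(-lo ** (-gamma)))
    # union bound: P(max > hi) <= n * P(xi > hi)
    p_above = min(1.0, n * hi ** (-gamma))
    return p_below, p_above


for n in (10**4, 10**6, 10**8):
    print(n, max_degree_tail_bounds(n, 1.5, 0.1))
```

The first probability decays faster than any power of $n$, while the second bound decays only polynomially, as $n^{-\gamma\varepsilon}$, which matches the asymmetry between the two statements of the lemma.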
###### Lemma 5

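For any $\varepsilon > 0$ and any $\delta$ such that $0 < \delta < \frac{\gamma\varepsilon}{\gamma + 2}$

$$P\left(\bar{S}_{n,2}(\infty) \le n^{\frac{2}{\gamma} + \varepsilon}\right) = 1 - O(n^{-\delta}).$$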
###### Proof

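Choose $\varphi$ such that $\frac{\delta}{\gamma} < \varphi < \frac{\varepsilon}{\gamma + 2}$. From Lemma 4 we get

$$P\left(\xi_{\max} \le n^{\frac{1}{\gamma} + \varphi}\right) = 1 - O(n^{-\delta}).$$

From Lemma 2 and Lemma 3, with probability

$$1 - O\left(\frac{n^{\gamma\left(\frac{1}{\gamma} - \varphi\right)}}{n L\left(n^{\frac{1}{\gamma} - \varphi}\right)}\right) = 1 - O(n^{-\delta})$$

we have

$$\bar{S}_{n,2}\left(n^{\frac{1}{\gamma} - \varphi}\right) \le (1 + \varepsilon)\frac{\gamma}{2 - \gamma}\, n^{\frac{2}{\gamma} + \varphi\gamma - 2\varphi} L\left(n^{\frac{1}{\gamma} - \varphi}\right),$$

$$S_{n,0}\left(n^{\frac{1}{\gamma} - \varphi}\right) \le (1 + \varepsilon)\, n^{\varphi\gamma} L\left(n^{\frac{1}{\gamma} - \varphi}\right).$$

In this case,

$$\begin{aligned}
\bar{S}_{n,2}\left(n^{\frac{1}{\gamma} + \varphi}\right)
&\le \bar{S}_{n,2}\left(n^{\frac{1}{\gamma} - \varphi}\right) + \xi_{\max}^{2}\, S_{n,0}\left(n^{\frac{1}{\gamma} - \varphi}\right) \\
&\le (1 + \varepsilon)\frac{\gamma}{2 - \gamma}\, n^{\frac{2}{\gamma} + \varphi\gamma - 2\varphi} L\left(n^{\frac{1}{\gamma} - \varphi}\right) + n^{\frac{2}{\gamma} + 2\varphi} (1 + \varepsilon)\, n^{\varphi\gamma} L\left(n^{\frac{1}{\gamma} - \varphi}\right) \le n^{\frac{2}{\gamma} + \varepsilon}
\end{aligned}$$

for large enough $n$. Since $\bar{S}_{n,2}(\infty) = \bar{S}_{n,2}\left(n^{\frac{1}{\gamma} + \varphi}\right)$ whenever $\xi_{\max} \le n^{\frac{1}{\gamma} + \varphi}$, this concludes the proof.

Note that we estimated only the upper bound for $\bar{S}_{n,2}(\infty)$, since the lower bound can be obtained using the lower bound for $\xi_{\max}$. Here we may use the inequality $\bar{S}_{n,2}(\infty) \ge \xi_{\max}^{2}$.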

## 5 Clustering in unweighted graphs

### 5.1 Previous results

The behavior of the global clustering coefficient in scale-free unweighted graphs was considered in [9]. In the case of an infinite variance, the reasonable question is whether there exists a simple graph (i.e., a graph without loops and multiple edges) with a given degree distribution. The following theorem is proved in [9].

###### Theorem 2

With high probability there exists a simple graph on $n$ vertices with the degree distribution defined in Section 3.

So, with high probability such a graph exists and it is reasonable to discuss its global clustering coefficient. The following upper bound on the global clustering coefficient is obtained in [9].

###### Theorem 3

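For any $\varepsilon > 0$ with high probability the global clustering coefficient satisfies the following inequality

$$C_1(G_n) \le n^{-\frac{(\gamma - 2)^{2}}{2\gamma} + \varepsilon}.$$

Taking small enough $\varepsilon$ one can see that with high probability $C_1(G_n) \to 0$ as $n$ grows.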

In addition, using simulations and empirical observations, the authors of [9] claimed that with high probability there exists a graph with about $n^{\frac{3}{\gamma + 1}}$ triangles and with the required degree distribution, while the theoretical upper bound on the number of triangles is about $n^{2 - \frac{\gamma}{2}}$. For the considered case $1 < \gamma < 2$ we have $\frac{3}{\gamma + 1} < 2 - \frac{\gamma}{2}$ and there is a gap between the number of constructed triangles and the obtained upper bound.

Further in this section we close this gap by improving the upper bound. We also rigorously prove the lower bound.

### 5.2 Upper bound

We prove the following theorem.

###### Theorem 4

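For any $\varepsilon > 0$ and any $\alpha$ such that $0 < \alpha < \frac{1}{\gamma + 1}$ with probability $1 - O(n^{-\alpha})$ the global clustering coefficient satisfies the following inequality

$$C_1(G_n) \le n^{-\frac{2 - \gamma}{\gamma(\gamma + 1)} + \varepsilon}.$$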
###### Proof

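The global clustering coefficient is

$$C_1(G_n) = \frac{3\,T(n)}{P_2(n)},$$

where $T(n)$ is the number of triangles and $P_2(n)$ is the number of pairs of adjacent edges in $G_n$.

The number of pairs of adjacent edges at a vertex of maximum degree is a lower bound for $P_2(n)$, that is, $P_2(n) \ge \xi_{\max}(\xi_{\max} - 1)/2$. Therefore, from Lemma 4 we get that for any $\delta > 0$ with probability $1 - O(n^{-\alpha})$

$$P_2(n) > n^{\frac{2}{\gamma} - \delta}.$$

It remains to estimate $T(n)$. Obviously, for any $x$

$$T(n) \le |\{i : \xi_i > x\}|^{3} + \sum_{i : \xi_i \le x} \xi_i^{2}. \tag{2}$$

The first term in (2) is the upper bound for the number of triangles with all vertices among the set $\{i : \xi_i > x\}$. The second term is the upper bound for the number of triangles with at least one vertex among $\{i : \xi_i \le x\}$.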
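The bound (2) is easy to evaluate for a concrete degree sequence, and the threshold $x$ can be tuned to balance its two terms. The following minimal sketch uses the hypothetical names `triangle_upper_bound` and `best_triangle_bound`, and the toy degree sequence is our own example.

```python
def triangle_upper_bound(degrees, x):
    """Right-hand side of (2): (number of degrees > x)^3 + sum of squared degrees <= x."""
    high = sum(1 for d in degrees if d > x)
    low = sum(d * d for d in degrees if d <= x)
    return high ** 3 + low


def best_triangle_bound(degrees):
    """Smallest value of the bound over integer thresholds, with the threshold attaining it."""
    return min((triangle_upper_bound(degrees, x), x) for x in range(max(degrees) + 1))


degrees = [5, 4, 3, 2, 2, 1, 1]
print([triangle_upper_bound(degrees, x) for x in range(6)])
print(best_triangle_bound(degrees))
```

For small $x$ the cubic term dominates and for large $x$ the sum of squares dominates, so the best threshold lies in between; this is the same trade-off that motivates the choice of $x$ in the rest of the proof.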

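From Lemma 2 and Lemma 3 we get

$$|\{i : \xi_i > x\}| = S_{n,0}(x) \le (1 + \varepsilon)\, n x^{-\gamma} L(x), \qquad \sum_{i : \xi_i \le x} \xi_i^{2} = \bar{S}_{n,2}(x) \le (1 + \varepsilon)\frac{\gamma}{2 - \gamma}\, n x^{2 - \gamma} L(x)$$

with probability $1 - O\left(\frac{x^{\gamma}}{n L(x)}\right)$.

Now we can fix $x = n^{\frac{1}{\gamma + 1}}$. So, with probability

$$1 - O\left(\frac{n^{-\frac{1}{\gamma + 1}}}{L\left(n^{\frac{1}{\gamma + 1}}\right)}\right) = 1 - O(n^{-\alpha})$$

we have

$$T(n) \le n^{\frac{3}{\gamma + 1} + \delta}.$$

Taking small enough $\delta$, we obtain

$$C_1(G_n) \le n^{\varepsilon - \frac{2 - \gamma}{\gamma(\gamma + 1)}}.$$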

This concludes the proof.
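To see how much the new upper bound improves on the bound from [9], one can compare the two exponents of $n$ (ignoring the arbitrarily small $\varepsilon$). The function names below are hypothetical and serve only this illustration.

```python
def theorem3_exponent(gamma):
    """Exponent of n in the bound of [9]: -(gamma - 2)^2 / (2 * gamma)."""
    return -((gamma - 2) ** 2) / (2 * gamma)


def theorem4_exponent(gamma):
    """Exponent of n in the improved bound: -(2 - gamma) / (gamma * (gamma + 1))."""
    return -(2 - gamma) / (gamma * (gamma + 1))


for gamma in (1.1, 1.3, 1.5, 1.7, 1.9):
    print(gamma, round(theorem3_exponent(gamma), 4), round(theorem4_exponent(gamma), 4))
```

For every $\gamma$ in $(1, 2)$ the improved exponent is strictly smaller, and both exponents tend to zero as $\gamma$ approaches 2.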

### 5.3 Lower bound

We prove the following theorem.

###### Theorem 5

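For any $\varepsilon > 0$ and any sufficiently small $\alpha > 0$, with probability $1 - O(n^{-\alpha})$ there exists a graph with the required degree distribution and the global clustering coefficient satisfying the following inequality

$$C_1(G_n) \ge n^{-\frac{2 - \gamma}{\gamma(\gamma + 1)} - \varepsilon}.$$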
###### Proof

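Again,

$$C_1(G_n) = \frac{3\,T(n)}{P_2(n)}.$$

The upper bound for $P_2(n)$ follows from Lemma 5. Fix $\varepsilon'$ such that $0 < \varepsilon' < \varepsilon$. Then,

$$P\left(P_2(n) \le n^{\frac{2}{\gamma} + \varepsilon'}\right) \ge P\left(\bar{S}_{n,2}(\infty) \le n^{\frac{2}{\gamma} + \varepsilon'}\right) = 1 - O(n^{-\alpha}).$$

Now we present the lower bound for $T(n)$. Fix any $\delta$ such that $0 < \delta < \frac{\varepsilon - \varepsilon'}{3\gamma}$. It follows from Lemma 2 that with probability $1 - O(n^{-\alpha})$

$$S_{n,0}\left(n^{\frac{1}{\gamma + 1} + \delta}\right) \le (1 + \varepsilon)\, n^{\frac{1}{\gamma + 1} - \gamma\delta} L\left(n^{\frac{1}{\gamma + 1} + \delta}\right) \le n^{\frac{1}{\gamma + 1} + \delta}.$$

Let us denote by $A$ the set of vertices whose degrees are greater than $n^{\frac{1}{\gamma + 1} + \delta}$. The size of $A$ equals $S_{n,0}\left(n^{\frac{1}{\gamma + 1} + \delta}\right)$. Since the number of vertices in $A$ is not greater than the minimum degree in $A$, a clique on $A$ can be constructed. Therefore, with probability $1 - O(n^{-\alpha})$

$$S_{n,0}\left(n^{\frac{1}{\gamma + 1} + \delta}\right) \ge (1 - \varepsilon)\, n^{\frac{1}{\gamma + 1} - \gamma\delta} L\left(n^{\frac{1}{\gamma + 1} + \delta}\right)$$

and

$$3\,T(n) \ge 3\binom{S_{n,0}\left(n^{\frac{1}{\gamma + 1} + \delta}\right)}{3} \ge n^{\frac{3}{\gamma + 1} - (\varepsilon - \varepsilon')}.$$

Finally, we get

$$C_1(G_n) = \frac{3\,T(n)}{P_2(n)} \ge \frac{n^{\frac{3}{\gamma + 1} - (\varepsilon - \varepsilon')}}{n^{\frac{2}{\gamma} + \varepsilon'}} = n^{-\frac{2 - \gamma}{\gamma(\gamma + 1)} - \varepsilon}.$$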

It remains to prove that after we have constructed a clique on the set $A$, with high probability we can still construct a graph without loops and multiple edges. This can be proved similarly to Theorem 2. Namely, we use the following theorem by Erdős and Gallai [4].

###### Theorem 6 (Erdős–Gallai)

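A sequence of non-negative integers $d_1 \ge \ldots \ge d_n$ can be represented as the degree sequence of a finite simple graph on $n$ vertices if and only if

1. $d_1 + \ldots + d_n$ is even;

2. $\sum_{i=1}^{k} d_i \le k(k - 1) + \sum_{i=k+1}^{n} \min(d_i, k)$ holds for $1 \le k \le n$.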
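The two conditions of the theorem translate directly into a test for a given degree sequence. The sketch below is a straightforward quadratic-time check written for readability rather than speed; the function name `is_graphical` is hypothetical.

```python
def is_graphical(degrees):
    """Check the Erdős–Gallai conditions for a sequence of integers."""
    d = sorted(degrees, reverse=True)
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 == 1:
        return False
    for k in range(1, n + 1):
        lhs = sum(d[:k])  # sum of the k largest degrees
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True


print(is_graphical([3, 3, 3, 3]))  # degree sequence of the complete graph on 4 vertices
print(is_graphical([3, 3, 1, 1]))  # even sum, but the condition fails for k = 2
```

In the second example the two vertices of degree 3 would need four edge endpoints among the remaining vertices, which can absorb only two.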

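Let us order the random variables $\xi_1, \ldots, \xi_n$ and obtain the ordered sequence $d_1 \ge \ldots \ge d_n$. In order to apply the theorem of Erdős and Gallai we assume that the set $A$ is now a single vertex with the degree

$$\deg(A) = S_{n,1}\left(n^{\frac{1}{\gamma + 1} + \delta}\right) - 2\binom{S_{n,0}\left(n^{\frac{1}{\gamma + 1} + \delta}\right)}{2}.$$

It is sufficient to prove that with probability $1 - O(n^{-\alpha})$ the following condition is satisfied

$$\deg(A) + \sum_{i=|A|+1}^{k} d_i \le (k - |A|)(k - |A| + 1) + \sum_{i=k+1}^{n} \min(d_i, k - |A| + 1) \tag{3}$$

for all $k$ such that $|A| \le k \le n$.

Let us now prove that with probability $1 - O(n^{-\alpha})$ this condition is satisfied. For some large enough $C$, if $k \ge C\sqrt{n}$, then

$$\deg(A) + \sum_{i=|A|+1}^{k} d_i \le (k - |A|)(k - |A| + 1).$$

This holds since with probability $1 - O(n^{-\alpha})$

$$|A| = S_{n,0}\left(n^{\frac{1}{\gamma + 1} + \delta}\right) \le (1 + \varepsilon)\, n^{\frac{1}{\gamma + 1} - \gamma\delta} L\left(n^{\frac{1}{\gamma + 1} + \delta}\right) \le n^{\frac{1}{\gamma + 1}}$$

and the sum of all degrees grows linearly with $n$:

$$P\left(|S_{n,1}(0) - n E\xi| > \frac{n}{2} E\xi\right) \le \frac{4^{\alpha + 1}\, n\, E|\xi - E\xi|^{\alpha + 1}}{n^{\alpha + 1} (E\xi)^{\alpha + 1}} = O(n^{-\alpha}).$$

Here we used Lemma 1 and the fact that $E|\xi - E\xi|^{\alpha + 1}$ is finite for $\alpha + 1 < \gamma$.

Finally, consider the case $k < C\sqrt{n}$. Note that $d_i \ge 1$ and $k - |A| + 1 \ge 1$, so

$$\sum_{i=k+1}^{n} \min(d_i, k - |A| + 1) \ge n - C\sqrt{n}.$$

It remains to show that with probability $1 - O(n^{-\alpha})$

$$\deg(A) + \sum_{i=|A|+1}^{k} d_i \le n - C\sqrt{n}.$$

It is sufficient to show that

$$\sum_{i=1}^{[C\sqrt{n}]} d_i \le n - C\sqrt{n}.$$

This inequality is easy to prove using Lemma 2. For any $\delta$ such that $\frac{1}{3\gamma} < \delta < \frac{1}{2\gamma}$ with probability $1 - O(n^{-\alpha})$ we have

$$S_{n,0}(n^{\delta}) > C\sqrt{n}$$

and

$$S_{n,1}(n^{\delta}) \le n^{\frac{2\gamma + 1}{3\gamma}} \le n - C\sqrt{n}.$$

Therefore, the condition (3) is satisfied.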

## 6 Clustering in weighted graphs

In this section, we analyze the global clustering coefficient of graphs with multiple edges. First, let us note that the case when we allow both loops and multiple edges is not very interesting: we can get a high clustering coefficient just by avoiding triplets. Namely, we can construct several triangles and then just create loops in all vertices. Then, we can connect the remaining half-edges for the vertices with odd degrees. Therefore, further we assume that loops are not allowed. We show that even with this restriction it is possible to obtain a constant global clustering coefficient.

Several definitions of the global clustering coefficient for graphs with multiple edges are presented in Section 2. The following theorem holds for any of these definitions of the global clustering coefficient $C_1(G_n)$.

###### Theorem 7

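Fix any $\delta > 0$. For any $\alpha$ such that $0 < \alpha < \frac{\gamma - 1}{\gamma + 1}$ with probability $1 - O(n^{-\alpha})$ there exists a multigraph with the required degree distribution and the global clustering coefficient satisfying the following inequality

$$C_1(G_n) \ge \frac{2 - \gamma}{2 + \gamma} - \delta.$$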
###### Proof

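Fix some $\varepsilon > 0$. From Lemma 2 with $c = 0$ it follows that with probability $1 - O\left(\left(\frac{x^{\gamma}}{n L(x)}\right)^{\beta - 1}\right)$

$$(1 - \varepsilon)\, n x^{-\gamma} L(x) \le S_{n,0}(x) \le (1 + \varepsilon)\, n x^{-\gamma} L(x). \tag{4}$$

Let us prove that for large enough $n$ there always exists $x_0$ such that

$$(1 + \varepsilon)\, n x_0^{-\gamma} L(x_0) \le x_0 \le (1 + 2\varepsilon)\, n x_0^{-\gamma} L(x_0). \tag{5}$$

In other words, we want to find $x_0$ such that

$$\frac{1}{(1 + 2\varepsilon) n} \le \frac{x_0^{-\gamma} L(x_0)}{x_0} \le \frac{1}{(1 + \varepsilon) n}.$$

Recall that $x^{-\gamma} L(x) = 1 - F(x)$, where $F(x)$ is a cumulative distribution function. Therefore, $x^{-\gamma - 1} L(x) = (1 - F(x))/x$ monotonically decreases to zero on $(0, \infty)$. The only problem is that this function is discontinuous. In order to guarantee the existence of the required value $x_0$, we have to prove that (for large enough $n$) the function cannot jump over the interval $\left[\frac{1}{(1 + 2\varepsilon) n}, \frac{1}{(1 + \varepsilon) n}\right]$. This can be proved as follows. The function $F$ has jumps only at integer points, and by (1) and the slow variation of $L$ the relative size of the jump of $1 - F$ at an integer point $k$ tends to zero as $k \to \infty$. For large enough $n$ (and this leads to large enough $x$) the relative size of every relevant jump is therefore smaller than $\frac{1 + 2\varepsilon}{1 + \varepsilon} - 1$. This concludes the proof of the fact that the required $x_0$ exists.

We take any value that satisfies Equation (5) and further denote it by $x_0$. Note that, up to a slowly varying multiplier, $x_0$ is of order $n^{\frac{1}{\gamma + 1}}$. Therefore, $\frac{x_0^{\gamma}}{n L(x_0)} = O(n^{-\alpha})$. From Equations (4) and (5) it follows that with probability $1 - O(n^{-\alpha})$ the number of vertices with degree greater than $x_0$ (i.e., $S_{n,0}(x_0)$) is not larger than $x_0$. Denote this set of vertices by $A$. In this case, a clique on $A$ can be constructed.

In addition, we want all vertices from the set $A$ to be connected only to each other. This is possible, since multiple edges are allowed. If the sum of degrees in $A$ is odd, then we allow one edge (from the vertex with the smallest degree in $A$) to go outside this set.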

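We are ready to estimate the global clustering coefficient:

$$C_1(G_n) = \frac{\text{total value of closed triplets}}{\text{total value of triplets}}.$$

The total value of closed triplets is at least $3\binom{S_{n,0}(x_0)}{3}$ regardless of the definition of the value of a triplet. With probability $1 - O(n^{-\alpha})$

$$3\binom{S_{n,0}(x_0)}{3} \ge \frac{1}{2}(1 - \varepsilon)^{3} n^{3} x_0^{-3\gamma} L^{3}(x_0).$$

The total value of all triplets includes:

• The total value of closed triplets on $A$ estimated above,

• The total value of triplets on the remaining vertices, which is not greater than $\bar{S}_{n,2}(x_0)$,

• (optionally) Some unclosed triplets on the vertex with the smallest degree in $A$, if the sum of degrees in $A$ is odd.

Since the smallest degree in the set $A$ is of order $x_0$, we can estimate the last two summands in the total value of triplets by

$$\bar{S}_{n,2}(x_0) + O(x_0^{2}) \le (1 + \varepsilon)\frac{\gamma}{2 - \gamma}\, n x_0^{2 - \gamma} L(x_0).$$

By Lemma 3, this holds with probability $1 - O(n^{-\alpha})$.

Finally, with probability $1 - O(n^{-\alpha})$ we have

$$\begin{aligned}
C_1(G_n) &\ge \frac{\frac{1}{2}(1 - \varepsilon)^{3} n^{3} x_0^{-3\gamma} L^{3}(x_0)}{\frac{1}{2}(1 - \varepsilon)^{3} n^{3} x_0^{-3\gamma} L^{3}(x_0) + (1 + \varepsilon)\frac{\gamma}{2 - \gamma}\, n x_0^{2 - \gamma} L(x_0)} \\
&\ge \frac{\frac{1}{2}(1 - \varepsilon)^{3} n^{2} x_0^{-2\gamma} L^{2}(x_0)}{\frac{1}{2}(1 - \varepsilon)^{3} n^{2} x_0^{-2\gamma} L^{2}(x_0) + (1 + \varepsilon)\frac{\gamma}{2 - \gamma}(1 + 2\varepsilon)^{2} n^{2} x_0^{-2\gamma} L^{2}(x_0)} \\
&= \frac{\frac{1}{2}(1 - \varepsilon)^{3}}{\frac{1}{2}(1 - \varepsilon)^{3} + (1 + \varepsilon)\frac{\gamma}{2 - \gamma}(1 + 2\varepsilon)^{2}} \ge \frac{2 - \gamma}{2 + \gamma} - \delta
\end{aligned}$$

for sufficiently small $\varepsilon$. Here in the second inequality we used Equation (5).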
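The last step of the estimate can be checked numerically: the ratio depends only on $\gamma$ and $\varepsilon$, and it approaches $\frac{2 - \gamma}{2 + \gamma}$ from below as $\varepsilon$ tends to zero. The function name `clustering_lower_bound` is hypothetical and used only for this illustration.

```python
def clustering_lower_bound(gamma, eps):
    """Final ratio of the estimate above, after the common factor has been cancelled."""
    closed = 0.5 * (1 - eps) ** 3
    other = (1 + eps) * gamma / (2 - gamma) * (1 + 2 * eps) ** 2
    return closed / (closed + other)


gamma = 1.5
print("limit:", (2 - gamma) / (2 + gamma))
for eps in (0.1, 0.01, 0.001):
    print(eps, clustering_lower_bound(gamma, eps))
```

For a target accuracy $\delta$, it therefore suffices to pick $\varepsilon$ small enough that the printed value exceeds $\frac{2 - \gamma}{2 + \gamma} - \delta$.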

Recall that loops are not allowed. Therefore, it remains to prove that 1) a multi-clique on $A$ can be constructed; 2) a graph on the remaining vertices can be constructed. Note that a multigraph without loops can always be constructed if the sum of the degrees is even and the maximum degree is not larger than the sum of the other degrees.
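The condition on the maximum degree is constructive. One simple way to realize such a degree sequence, sketched below under our own choice of procedure (it is not taken from the paper), is to repeatedly join the two vertices with the largest remaining degrees; the function name `loopless_multigraph` is hypothetical.

```python
import heapq
from collections import Counter


def loopless_multigraph(degrees):
    """Return a list of edges (u, v), u != v, realising the non-negative degrees, or None.

    Greedy sketch: always connect the two vertices with the largest remaining degrees.
    """
    total = sum(degrees)
    if total % 2 == 1 or (degrees and 2 * max(degrees) > total):
        return None  # odd sum, or the maximum degree exceeds the sum of the others
    heap = [(-d, v) for v, d in enumerate(degrees) if d > 0]
    heapq.heapify(heap)
    edges = []
    while heap:
        d1, u = heapq.heappop(heap)
        d2, v = heapq.heappop(heap)
        edges.append((u, v))
        if d1 + 1 < 0:
            heapq.heappush(heap, (d1 + 1, u))
        if d2 + 1 < 0:
            heapq.heappush(heap, (d2 + 1, v))
    return edges


edges = loopless_multigraph([4, 3, 2, 1])
print(edges)
print(Counter(x for e in edges for x in e))
```

Joining the two largest remaining degrees keeps the maximum not larger than the sum of the others, so the heap never ends up with a single vertex that still has free half-edges.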

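A multi-clique on $A$ can be constructed if

$$\xi_{\max} \le S_{n,1}(x_0) - \xi_{\max} - x_0^{2}. \tag{6}$$

Here $x_0^{2}$ is the upper bound for the number of half-edges already involved in the required clique. From Lemma 2, with probability $1 - O(n^{-\alpha})$

$$S_{n,1}(x_0) \ge (1 - \varepsilon)\frac{\gamma}{\gamma - 1}\, n x_0^{1 - \gamma} L(x_0). \tag{7}$$

Fix some $\varepsilon'$ such that $0 < \varepsilon' < \frac{\gamma - 1}{\gamma(\gamma + 1)} - \frac{\alpha}{\gamma}$. In this case we have $\alpha < \gamma\left(\frac{\gamma - 1}{\gamma(\gamma + 1)} - \varepsilon'\right)$, therefore Lemma 4 gives that

$$P\left(\xi_{\max} \le n^{\frac{2}{\gamma + 1} - \varepsilon'}\right) = P\left(\xi_{\max} \le n^{\frac{1}{\gamma} + \frac{\gamma - 1}{\gamma(\gamma + 1)} - \varepsilon'}\right) = 1 - O(n^{-\alpha}). \tag{8}$$

Now Equation (6) follows immediately from (7), (8), and the fact that $x_0$ is of order $n^{\frac{1}{\gamma + 1}}$.

Similarly, it is easy to show that the graph on the remaining vertices can be constructed:

$$x_0 \le \bar{S}_{n,1}(x_0) - x_0$$

since $\bar{S}_{n,1}(x_0)$ grows linearly with $n$.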

## 7 Conclusion

In this paper, we fully analyzed the behavior of the global clustering coefficient in scale-free graphs with an infinite variance of the degree distribution. We considered both unweighted graphs and graphs with multiple edges. For the unweighted case, we first obtained an upper bound for the global clustering coefficient. In particular, we proved that the global clustering coefficient tends to zero with high probability. We also presented a construction procedure that attains this upper bound. The situation turns out to be different for graphs with multiple edges. In this case, it is possible to construct a sequence of graphs with an asymptotically constant clustering coefficient.

## References

• [1] N.H. Bingham, C.M. Goldie, and J.L. Teugels, Regular Variation, Cambridge University Press, Cambridge (1987).
• [2] B. Bollobás, O.M. Riordan, Mathematical results on scale-free random graphs, Handbook of Graphs and Networks: From the Genome to the Internet, pp. 1-3 (2003).
• [3] T. Britton, M. Deijfen, and A. Martin-Löf, Generating simple random graphs with prescribed degree distribution, J. Stat. Phys., 124(6), pp. 1377–1397 (2006).
• [4] P. Erdős and T. Gallai, Graphs with given degrees of vertices, Mat. Lapok, 11, pp. 264–274 (1960).
• [5] M.E.J. Newman, The structure and function of complex networks, SIAM Review, vol. 45, pp. 167–256 (2003).
• [6] T. Opsahl, P. Panzarasa, Clustering in weighted networks, Social Networks, 31(2), pp. 155–163 (2009).
• [7] L. Ostroumova, A. Ryabchenko, E. Samosvat, Generalized Preferential Attachment: Tunable Power-Law Degree Distribution and Clustering Coefficient, Algorithms and Models for the Web Graph, Lecture Notes in Computer Science, vol. 8305, pp. 185–202 (2013).
• [8] L. Ostroumova, E. Samosvat, Recency-based preferential attachment models, http://arxiv.org/abs/1406.4308 (2014).
• [9] L. Ostroumova Prokhorenkova, E. Samosvat, Global Clustering Coefficient in Scale-free networks, Lecture Notes in Computer Science, vol. 8882, pp. 47–58 (2014).