# Some Considerations on Six Degrees of Separation from A Theoretical Point of View

## Abstract

In this article we discuss six degrees of separation, proposed by Milgram, from a theoretical point of view. Simply put, if one has $K$ friends, the number $N$ of indirect friends grows to about $K^L$ within $L$ degrees of separation, so it would easily reach the population of the whole world. That estimate, however, cannot be accepted as it stands: mainly because of a nonzero clustering coefficient $C$, $N$ does not become $K^L$. In this article, we first discuss the relation between six degrees of separation and the clustering coefficient in the small world network proposed by Watts and Strogatz. In particular, the conditions under which $N$ reaches the population of the U.S.A. or of the whole world in the WS model are explored from theoretical and numerical points of view. Secondly, we introduce an index that represents the velocity of propagation with respect to the number of friends, and obtain an analytical formula for it as a function of $C$, of $K$, which is the average degree over all nodes, and of a parameter $q$ concerned with network topology. Finally, the index is calculated numerically to study the relation between $C$, $K$, $q$ and $N$.

keywords: Six Degrees of Separation, Small World Network, Propagation Coefficient, Watts-Strogatz Model, Clustering Coefficient, Average Path Length

## 1 Introduction

In 1967, Milgram made a great impact on the world by advocating the concept of "six degrees of separation", based on a social experiment, in a celebrated paper [1]. "Six degrees of separation" indicates that people are linked through a narrow circle of acquaintances. A series of social experiments made by him suggests that all people in the USA are connected through about 6 intermediate acquaintances. In this paper we show, from a rather theoretical point of view, that this phenomenon, the so-called "six degrees of separation", is not so surprising and is, if anything, natural.

This article is first motivated by the following simple consideration. If every person has $K$ acquaintances, then after $L$ steps of intermediate acquaintances a person would be able to convey a mail, or information in general, to about $K^L$ persons. Under the proviso that the network of acquaintances has a tree structure without any loops (zero clustering coefficient), a more detailed evaluation gives

$$S=\sum_{i=0}^{L-1}K(K-1)^{i}=\frac{K\left((K-1)^{L}-1\right)}{K-2}, \tag{1}$$

where it is assumed that the relation of acquaintance is symmetric. Thus information spreads from one person to exponentially many persons within $L$ steps. A person who receives information may, of course, convey it to only a part of his or her acquaintances. Still, when a network has a tree structure, six degrees of separation between any two persons would not be so mysterious.
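As a small numerical illustration of the tree estimate, the sum in Eq. (1) can be evaluated directly and compared with the closed form of the geometric series, $K((K-1)^L-1)/(K-2)$. This is only a sketch; the function names are hypothetical and the values of $K$ and $L$ in the usage lines are arbitrary illustrative choices.

```python
def tree_reach(K: int, L: int) -> int:
    """Persons reached within L steps on a loop-free tree: sum_{i=0}^{L-1} K (K-1)^i."""
    return sum(K * (K - 1) ** i for i in range(L))


def tree_reach_closed_form(K: int, L: int) -> int:
    """Closed form of the same geometric series (requires K > 2)."""
    if K <= 2:
        raise ValueError("the closed form divides by K - 2, so K must exceed 2")
    return K * ((K - 1) ** L - 1) // (K - 2)


if __name__ == "__main__":
    # Illustrative values only: a few degrees K at L = 6 steps.
    for K in (10, 20, 50):
        print(K, tree_reach(K, 6), tree_reach_closed_form(K, 6))
```

The two functions agree for every integer $K > 2$, which is a quick check that the sum and the closed form describe the same quantity.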

Real networks, however, naturally have structures with loops. If there are loops, that is, an effective clustering coefficient, in the network of acquaintances, this discussion will be greatly altered. One of the aims of this article is to evaluate the effect of the clustering coefficient on the propagation of information. How much does the clustering coefficient reduce the population to which information is provided? We consider this from the following three points of view.

1. To inspect six degrees of separation for the Watts-Strogatz type model, for which analytic expressions for the clustering coefficient and the average path length are known.

2. To inspect six degrees of separation for more general small world networks with a uniform clustering coefficient.

3. To consider empirical networks such as a network in Mixi and a network of actors and actresses based on data of the Bacon game.

The plan of this article is as follows. In the next section we discuss Watts-Strogatz type small world networks [2, 3], for which theoretical evaluations of the average path length and the clustering coefficient have been made. We study the propagation of information on these networks by numerical analyses based on these formulae. To study more general small world networks, we present a propagation coefficient model in Section 3. At present, however, we cannot analyze a large class of small world networks, including scale free networks, for technical reasons. We adopt a homogeneous hypothesis, which is explained in detail in Section 3. Various types of homogeneity are postulated in the propagation coefficient model. In Section 4, we analyze empirical data, such as a network in Mixi and a network based on data of the Bacon game, in order to estimate the clustering coefficient. Concluding remarks are given in the last section.

## 2 Watts-Strogatz Model

In this section we investigate the effect of the clustering coefficient on the diffusion of information in Watts-Strogatz type small world networks. Analytic expressions for the clustering coefficient and the average path length in these networks have been found. For the original Watts-Strogatz version of the model, the clustering coefficient has been given [18] by

$$C(p)=\frac{3K-6}{4K-2}(1-p)^{3}, \tag{2}$$

where $K$ is the average degree and $p$ is the rewiring probability. Moreover, the average node-node separation in a modified version of the original Watts-Strogatz model has been found [19] in the limit of a low density of shortcuts:

$$L(p)=\frac{2n}{K}F\!\left(\frac{nKp}{2}\right),\qquad F(x)=\frac{1}{2\sqrt{x^{2}+2x}}\tanh^{-1}\sqrt{\frac{x}{x+2}},\quad\text{for small } p, \tag{3}$$

where $n$ is the network size, that is, the number of nodes on the network.

In the modified version, shortcut edges are added between randomly chosen pairs of nodes and no edges are removed. Since edges are added rather than rewired, the modified network is guaranteed to stay connected. We attempt an analysis of six degrees of separation based on these two formulae.
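The two formulae can be evaluated numerically with a few lines of code. The sketch below transcribes Eq. (2) and Eq. (3) as written above. The function names are hypothetical, and the inverse hyperbolic tangent is rewritten through the identity $\tanh^{-1}y=\tfrac12\ln\frac{1+y}{1-y}$ with $1-y^2=2/(x+2)$, an implementation choice made here only to stay accurate when $x$ is very large.

```python
import math


def clustering_ws(K: float, p: float) -> float:
    """Eq. (2): C(p) = (3K - 6) / (4K - 2) * (1 - p)^3."""
    return (3 * K - 6) / (4 * K - 2) * (1 - p) ** 3


def scaling_F(x: float) -> float:
    """Eq. (3): F(x) = atanh(sqrt(x / (x + 2))) / (2 sqrt(x^2 + 2x))."""
    if x <= 0:
        return 0.25  # limit of F(x) as x -> 0
    y = math.sqrt(x / (x + 2.0))
    # atanh(y) = 0.5 * ln((1 + y) / (1 - y)) and 1 - y^2 = 2 / (x + 2)
    atanh_y = 0.5 * math.log((1.0 + y) ** 2 * (x + 2.0) / 2.0)
    return atanh_y / (2.0 * math.sqrt(x * x + 2.0 * x))


def path_length_ws(n: float, K: float, p: float) -> float:
    """Eq. (3): L(p) = (2n / K) * F(n K p / 2), valid for small p."""
    return 2.0 * n / K * scaling_F(n * K * p / 2.0)


if __name__ == "__main__":
    # Illustrative parameters only.
    print(clustering_ws(20, 0.01), path_length_ws(1.0e6, 20, 0.01))
```

At $p=0$ the path length reduces to $n/(2K)$, the value for a ring lattice without shortcuts, which serves as a sanity check of the transcription.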

### 2.1 Parameter Regions for Six Degrees of Separation

In this subsection, we explore the regions of the parameters introduced above that are compatible with six degrees of separation. First we look for the parameter regions in which information can spread among $n$ persons on average within $L$ steps. Taking $L = 6$ and fixing $n$ in Eq. (3), we explore the possible regions in the $p$-$K$ space. They can be found numerically within meaningful values of the parameters, and are shown in Fig. 1. As long as $p$ is not extremely small, that is, within the small world region, an average number of contacts $K$ of several tens is sufficient for information to spread from one person to $n$ persons.

The solution in the $C$-$K$ space, where $p$ is eliminated between Eq. (2) and Eq. (3), is shown in Fig. 2. Though Eq. (3) holds only at small $p$, we can at least solve the two equations simultaneously. Since $C$ becomes small as $p$ becomes large, the validity of the analysis is lost for small $C$. Fig. 2 indicates that a value of $K$ of several tens is sufficient for relatively large values of $C$.

Fig. 1. $p$-$K$ plots in the SW model.

Fig. 2. $C$-$K$ plots in the SW model.

Through these analyses, we conclude that roughly the following regions of parameters are needed,

$$C = 0.55\sim0.75,\qquad p = 0.01\sim0.04,\qquad K = 15\sim25,$$

in order that information can spread from one person to $n$ persons in about 6 steps on Watts-Strogatz type small world networks.
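A parameter scan of this kind can be sketched as follows: for a given $K$, solve $L(p)=6$ in Eq. (3) for $p$ by bisection, then insert the result into Eq. (2) to obtain $C$. The network size `n = 3.0e8` used below is a hypothetical, roughly U.S.-scale value chosen only for illustration, and the function names and the bisection bracket are likewise assumptions of this sketch, not the procedure used to produce Figs. 1 and 2.

```python
import math


def scaling_F(x: float) -> float:
    """F(x) of Eq. (3), written in a form that stays accurate for large x."""
    if x <= 0:
        return 0.25  # limit of F(x) as x -> 0
    y = math.sqrt(x / (x + 2.0))
    atanh_y = 0.5 * math.log((1.0 + y) ** 2 * (x + 2.0) / 2.0)
    return atanh_y / (2.0 * math.sqrt(x * x + 2.0 * x))


def path_length_ws(n: float, K: float, p: float) -> float:
    """Eq. (3): L(p) = (2n / K) * F(n K p / 2)."""
    return 2.0 * n / K * scaling_F(n * K * p / 2.0)


def clustering_ws(K: float, p: float) -> float:
    """Eq. (2): C(p) = (3K - 6) / (4K - 2) * (1 - p)^3."""
    return (3 * K - 6) / (4 * K - 2) * (1 - p) ** 3


def rewiring_for_target(n, K, L_target, lo=1e-12, hi=1.0, iters=200):
    """Solve L(p) = L_target for p by bisection on log(p); L(p) decreases with p.

    Returns None when the target is not bracketed by [lo, hi].
    """
    if path_length_ws(n, K, lo) < L_target or path_length_ws(n, K, hi) > L_target:
        return None
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if path_length_ws(n, K, mid) > L_target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    n = 3.0e8  # hypothetical population size, for illustration only
    for K in (15, 20, 25):
        p = rewiring_for_target(n, K, 6.0)
        print(K, p, clustering_ws(K, p))
```

Because Eq. (3) is only valid for small $p$, any solution returned with $p$ close to 1 should be discarded rather than interpreted.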

### 2.2 Population propagated at L=6

In this subsection we consider to how many persons information can propagate from one person in exactly six steps. We look for adequate regions of the parameter space in which the population reached falls within a prescribed range at $L = 6$. For each $K$, typical values of $C$ are listed in Table 1.

Table 1. $K$ and $C$ at $L = 6$

| $K$ | 9 | 10 | 12 | 15 | 20 | 25 | 49 | 100 | 147 | 194 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $C$ | 0.54 | 0.57 | 0.61 | 0.64 | 0.68 | 0.69 | 0.724 | 0.738 | 0.74 | 0.744 |

As $K$ becomes large, $C$ obviously becomes large. From Table 1, $K$ need not be so large even for rather large $C$. Thus information can readily spread to about a billion persons in 6 steps even if $K$ takes a fairly small value.

Table 2. Empirical networks with large $C$

| Network | Number of Vertices | Remaining columns, including $C$ and ending with the scale-free index (values in source order) |
| --- | --- | --- |
| Company directors | 7673 | 14.4, 0.59 |
| Coauthorships in the SPIRES e-archive | 56627 | 0.726, 4, 1.2 |
| Collaboration net collected from math. journals | 70975 | 0.59, 2.1294, 9.5 |
| Collaboration net collected from neurosci. journals | 209293 | 0.76, 2.4, 6 |
| Metabolic network | 315 | 0.59 |
| World Web | 470000 | 0.69/0.44, 1.5/2.7, 2.65 |

So far we have seen that it is possible to realize six degrees of separation even for rather large $C$. There are, however, not so many empirical networks with large $C$. Networks with rather large $C$ that have been discovered so far are listed in Table 2. Here blanks mean that the values are unknown, and the index in the rightmost column is the scale-free exponent. These networks mostly have a scale-free nature, so they are not Watts-Strogatz type small world networks.

## 3 Propagation Coefficient Model

We study Milgram-like propagation of information in a wider class of networks, including those other than small world networks. First we focus on any one node of a general network, which is the node in the 0-generation. Next we explore all nodes connected with this first node, which are the nodes in the 1-generation. Next we explore all nodes connected with the nodes in the 1-generation, apart from the node of the 0-generation and the nodes already in the 1-generation, which are the nodes in the 2-generation. We continue this procedure until all nodes of the network are covered. The procedure works for any connected complex network. In a complete graph, for example, there are only two generations. Counting the 0-generation, the number of generations exceeds the maximal generation number by 1, and the maximal generation number is at most the diameter of the network, with equality when the first node is an endpoint of a longest shortest path. This picture of networks makes the analysis of the propagation of information on a network manageable.
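The decomposition into generations described above is a breadth-first search from the first node. A minimal sketch follows, assuming the network is given as an adjacency dictionary; this representation and the function name are assumptions of the example.

```python
from collections import deque


def generations(adj, root):
    """Split the nodes reachable from `root` into generations G_0, G_1, ...

    `adj` maps each node to an iterable of its neighbours (undirected graph).
    Returns a list of lists; entry i holds the nodes of generation i.
    """
    depth = {root: 0}
    layers = [[root]]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:  # first visit fixes the generation of v
                depth[v] = depth[u] + 1
                if depth[v] == len(layers):
                    layers.append([])
                layers[depth[v]].append(v)
                queue.append(v)
    return layers


if __name__ == "__main__":
    # Complete graph on 4 nodes: only two generations appear.
    complete = {u: [v for v in range(4) if v != u] for u in range(4)}
    print([len(g) for g in generations(complete, 0)])
```

The generation sizes $n_i$ used in the rest of this section are simply the lengths of the returned lists.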

Now we introduce some geometrical quantities used in this article, together with their notation.

$G_i$ means the $i$-th generation and $n_i$ is the number of nodes in generation $G_i$.

$N$ is the total number of nodes of a network, that is, the size of the network.

$k^{(j)}_{i,i+1}$ is the number of edges from node $j$ in $G_i$ to nodes in $G_{i+1}$.

$k_{i,i}$ is the number of edges connecting nodes within the same generation $G_i$.

$C_i$ is the contribution to the clustering coefficient produced by edges in $G_i$.

$K^{(j)}$ is the degree of node $j$.

We define the propagation coefficient from $G_i$ to $G_{i+1}$ by $\bar k_{i,i+1}$, the average of $k^{(j)}_{i,i+1}$ over the nodes $j$ in $G_i$.

We make the following assumptions for simplicity of analysis.
1. The size of the network is infinite.
2. A parameter $q$ is the probability that a node has two parents.
3. There is no backflow in the propagation of information.
4. The homogeneous hypothesis:

$$C_i=\bar C=\text{const.} \tag{4}$$

$$q_j=q=\text{const.},\ \text{but } q=0 \text{ at the } G_0 \text{ generation.} \tag{5}$$

$$K^{(j)}=K=\text{const.}\ (\text{nearly equal degree}). \tag{6}$$

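Under these assumptions, the following relations hold:

$$\bar k_{i,i+1}=\frac{\sum_{j\in G_i}^{n_i}k^{(j)}_{i,i+1}}{n_i}, \tag{7}$$

$$K=1+\frac{2k_{i,i}}{n_i}+\bar k_{i,i+1}, \tag{8}$$

$$n_{i+1}=\bar k_{i,i+1}\,n_i\,(1-q), \tag{9}$$

$$N=\sum_{i}^{d}n_i. \tag{10}$$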
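Relations (9) and (10) can be iterated directly once the propagation coefficients are known. The following sketch does so for a given list of $\bar k_{i,i+1}$, starting from $n_0 = 1$ and using $q = 0$ at the $G_0$ generation as stated in Eq. (5). The function names and the sample coefficients are hypothetical.

```python
def generation_sizes(k_bars, q):
    """Iterate Eq. (9): n_{i+1} = k_bar_{i,i+1} * n_i * (1 - q), with n_0 = 1.

    `k_bars[i]` is the propagation coefficient from G_i to G_{i+1}.
    Following Eq. (5), q is taken to be 0 for the step out of G_0.
    """
    sizes = [1.0]
    for i, k in enumerate(k_bars):
        q_i = 0.0 if i == 0 else q
        sizes.append(k * sizes[-1] * (1.0 - q_i))
    return sizes


def network_size(k_bars, q):
    """Eq. (10): N is the sum of n_i over all generations considered."""
    return sum(generation_sizes(k_bars, q))


if __name__ == "__main__":
    # Illustrative coefficients only.
    print(generation_sizes([4, 3, 3], 0.5), network_size([4, 3, 3], 0.5))
```

With $q = 0$ the sizes reduce to plain products of the propagation coefficients, which is the tree-like case.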

In order to investigate the effect of the clustering coefficient upon the propagation of information on a network, we consider when the clustering coefficient increases in this picture of networks. Notice that two cases contribute to the clustering coefficient in this picture. One is the case in which edges link nodes within the same generation. The other is the case in which one node has two parent nodes that are linked to each other. They are shown in Fig. 3.

Fig. 3. Two patterns that contribute to the clustering coefficient in generation $G_i$. The probability that the right-hand pattern occurs is $q$.

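Since each node has $K$ edges by assumption 4, $C_i$ is generally obtained by

$$C_i=\frac{t_i(q)}{{}_{K}C_{2}}, \tag{11}$$

where $t_i(q)$ counts the triangles formed in $G_i$ and ${}_{K}C_{2}=K(K-1)/2$.

We investigate the two types of contributions to $t_i(q)$ in order.

First we consider the case on the left-hand side of Fig. 3, that is to say $q=0$. When one edge within the same generation $G_i$ is added, the average probability that it connects two children of a given common parent node in $G_{i-1}$ is ${}_{\bar k_{i-1,i}}C_{2}/{}_{n_i}C_{2}$. So when $k_{i,i}$ edges are added in $G_i$ in total, the number of triangles in $G_i$ becomes

$$t_i(0)=\frac{{}_{\bar k_{i-1,i}}C_{2}\,k_{i,i}\,n_{i-1}}{{}_{n_i}C_{2}}=\frac{k_{i,i}\,(\bar k_{i-1,i}-1)}{\bar k_{i-1,i}\,n_{i-1}-1}=\frac{k_{i,i}\,(\bar k_{i-1,i}-1)}{\prod_{l=1}^{i}\bar k_{l-1,l}-1}, \tag{12}$$

since there are $n_{i-1}$ families in $G_i$, and $n_i=\bar k_{i-1,i}\,n_{i-1}$, which is Eq. (9) at $q=0$, is used in the last two equalities. We can obtain explicit expressions for the first few $t_i(0)$, using $\bar k_{0,1}=K$:

$$t_1(0)=k_{1,1},\qquad t_2(0)=k_{2,2}\,\frac{\bar k_{1,2}-1}{K\bar k_{1,2}-1},\qquad t_3(0)=k_{3,3}\,\frac{\bar k_{2,3}-1}{K\bar k_{1,2}\bar k_{2,3}-1},\qquad\cdots. \tag{13}$$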

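In the other case, the probability that two parents are linked within their generation is $k_{i-1,i-1}/{}_{n_{i-1}}C_{2}$, and the number of nodes in $G_i$ that have two parent nodes is $q\,n_{i-1}\,\bar k_{i-1,i}$. Combining these two facts, the contribution of this case to $t_i(q)$ is given by

$$t'(q)=\left(q\,n_{i-1}\,\bar k_{i-1,i}\right)\frac{k_{i-1,i-1}}{{}_{n_{i-1}}C_{2}}=\frac{2q\,\bar k_{i-1,i}\,k_{i-1,i-1}}{n_{i-1}-1}=\frac{2q\,\bar k_{i-1,i}\,k_{i-1,i-1}}{(1-q)^{i-2}\prod_{l=1}^{i-1}\bar k_{l-1,l}-1}, \tag{14}$$

where Eq. (9), with $q=0$ at the $G_0$ generation, is used in the last equality, which holds for $i\geq 2$.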

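Thus $t_i(q)$ is finally obtained by

$$t_i(q)=\frac{1}{{}_{n_i}C_{2}}\left(k_{i,i}\,n_{i-1}\times{}_{\bar k_{i-1,i}}C_{2}\right)+\frac{1}{{}_{n_{i-1}}C_{2}}\left(q\,k_{i-1,i-1}\,n_{i-1}\,\bar k_{i-1,i}\right), \tag{15}$$

where the terms in parentheses show the contributions from the left-hand and the right-hand patterns in Fig. 3, respectively.

Thus the clustering coefficients in generation $G_i$ are obtained by

$$C_i=\frac{\bar k_{i-1,i}\,n_{i-1}}{K(K-1)}\left(\frac{(\bar k_{i-1,i}-1)(K-1-\bar k_{i,i+1})}{\bar k_{i-1,i}\,n_{i-1}-1}+\frac{2q\,(K-1-\bar k_{i-1,i})}{n_{i-1}-1}\right), \tag{16}$$

where the equation derived from Eq. (8),

$$k_{i,i}=\frac{n_i}{2}\left(K-1-\bar k_{i,i+1}\right)=\frac{n_{i-1}\,\bar k_{i-1,i}}{2}\left(K-1-\bar k_{i,i+1}\right), \tag{17}$$

is used.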

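By using $n_0=1$ and $\bar k_{0,1}=K$, we can give explicit expressions for the first few $C_i$:

$$C_1=1-\frac{\bar k_{1,2}}{K-1},\qquad C_2=\frac{\bar k_{1,2}}{K-1}\left(\frac{(\bar k_{1,2}-1)(K-1-\bar k_{2,3})}{K\bar k_{1,2}-1}+\frac{2q\,(K-1-\bar k_{1,2})}{K-1}\right),\qquad\cdots. \tag{18}$$

Using the homogeneous hypothesis, Eq. (4), we can express every $\bar k_{i,i+1}$ in terms of $\bar C$. By way of example, we obtain

$$\bar k_{1,2}=(K-1)(1-\bar C),\qquad \bar k_{2,3}=(K-1)-\bar C\left(\frac{1-2q(1-\bar C)}{1-\bar C}\cdot\frac{K(K-1)(1-\bar C)-1}{(K-1)(1-\bar C)-1}\right),\qquad\cdots. \tag{19}$$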
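As a worked step, the first propagation coefficient in Eq. (19) follows by imposing the homogeneous hypothesis $C_1=\bar C$ on the expression for $C_1$ in Eq. (18). The numerical values $K=50$ and $\bar C=0.3$ in the last line are hypothetical and serve only to illustrate the arithmetic.

```latex
\begin{align*}
\bar C = C_1 &= 1-\frac{\bar k_{1,2}}{K-1}
  \quad\Longrightarrow\quad
  \bar k_{1,2} = (K-1)\,(1-\bar C), \\
K=50,\ \bar C=0.3 &:\quad \bar k_{1,2} = 49\times 0.7 = 34.3 .
\end{align*}
```

The same substitution applied to $C_2$ then fixes $\bar k_{2,3}$, and so on generation by generation.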

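In general we notice that the following recursion relation, obtained by solving Eq. (16) with $C_i=\bar C$, is satisfied:

$$A_{i,i+1}=\frac{\left[\bar C\,K(K-1)(n_{i-1}-1)-2q\,A_{i-1,i}\,\bar k_{i-1,i}\,n_{i-1}\right]\left(\bar k_{i-1,i}\,n_{i-1}-1\right)}{\bar k_{i-1,i}\,(\bar k_{i-1,i}-1)\,n_{i-1}\,(n_{i-1}-1)}, \tag{20}$$

where

$$A_{i,i+1}\equiv K-1-\bar k_{i,i+1}. \tag{21}$$

By Eq. (8), $A_{i,i+1}=2k_{i,i}/n_i$ measures the number of edges per node connecting nodes within the same generation $G_i$. We can numerically solve this recursion relation with respect to $\bar k_{i,i+1}$.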
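One possible way to carry out this numerical solution is sketched below. It solves Eq. (16) with $C_i=\bar C$ for $A_{i,i+1}$ generation by generation, starting from $n_1=K$ and $\bar k_{1,2}=(K-1)(1-\bar C)$, and updates $n_i$ by Eq. (9). The function name, the stopping rule ($\bar k_{i,i+1}\le 1$) and the sample parameters are assumptions of this sketch, which is not claimed to reproduce the tabulated values.

```python
def propagation_coefficients(K, C_bar, q, steps):
    """Return [k_bar_{0,1}, k_bar_{1,2}, ...] under the homogeneous hypothesis.

    Each new coefficient is obtained by solving Eq. (16), with C_i = C_bar,
    for A_{i,i+1} = K - 1 - k_bar_{i,i+1}. Iteration stops early if a
    coefficient would drop to 1 or below (an assumption of this sketch).
    """
    k_bars = [float(K), (K - 1) * (1.0 - C_bar)]  # k_bar_{0,1} = K and Eq. (19)
    n_prev = float(K)                             # n_1 = K, since q = 0 at G_0
    while len(k_bars) < steps:
        k = k_bars[-1]          # k_bar_{i-1,i}
        A_prev = K - 1 - k      # A_{i-1,i}
        bracket = (C_bar * K * (K - 1) / (k * n_prev)
                   - 2.0 * q * A_prev / (n_prev - 1.0))
        A_next = bracket * (k * n_prev - 1.0) / (k - 1.0)
        k_next = K - 1 - A_next
        if k_next <= 1.0:
            break
        k_bars.append(k_next)
        n_prev = k * n_prev * (1.0 - q)  # Eq. (9): n_i from n_{i-1}
    return k_bars


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    print(propagation_coefficients(K=50, C_bar=0.3, q=0.1, steps=6))
```

For the second coefficient the loop reproduces the closed expression for $\bar k_{2,3}$ that follows from Eq. (18), which is a useful consistency check before iterating further.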

In order to get numerical values of $\bar k_{i,i+1}$, we need to fix three parameters, $K$, $\bar C$, and $q$. Once $\bar k_{i,i+1}$ is calculated, $N$ is found from Eqs. (9) and (10). If the network topology were a tree structure, information would spread over about $K^l$ persons in $l$ steps, as stated before. In the present case, the clustering coefficient is expected to have a strong influence on the propagation of information, so the spread of information would be restricted to rather fewer persons in networks with large $\bar C$. We consider how much the spread of information is restricted owing to $\bar C$. We measure it by the propagation ratio $R$, the ratio of the population actually reached in the present case to $K^l$:

$$R=\frac{N}{K^{l}}. \tag{22}$$
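The arithmetic behind Eq. (22) is easy to make concrete. In the sketch below the function names are hypothetical, and the sample values ($R=0.02$, $K=50$, $l=6$) are illustrative choices, not results quoted from the tables.

```python
def propagation_ratio(N: float, K: float, l: int) -> float:
    """Eq. (22): R = N / K^l."""
    return N / K ** l


def reached_population(R: float, K: float, l: int) -> float:
    """Invert Eq. (22): N = R * K^l."""
    return R * K ** l


if __name__ == "__main__":
    # Illustrative values only: a ratio of a few percent with K = 50 and l = 6.
    print(reached_population(0.02, 50, 6))
```

With these illustrative numbers, a ratio of a few percent still leaves a reached population of a few hundred million, because $K^l$ itself is of order $10^{10}$.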

We evaluate $R$ at $l=6$ over a wide range of the three parameters $K$, $\bar C$, and $q$, within the region of positive $\bar k_{i,i+1}$. The results are listed in Table 3.

Table 3. Propagation ratio $R$ and $\bar C$ in the $K$-$q$ parameter space at $l=6$

| $K$ / $q$ | $q=0$ | $q=0.1$ | $q=0.2$ | $q=0.3$ | $q=0.4$ | $q=0.5$ |
| --- | --- | --- | --- | --- | --- | --- |
|  | 0.996-0.229 | 0.996-0.032 | 0.996-0.020 | 0.996-0.025 | 0.996-0.028 | 0.996-0.013 |
|  | 0.29 | 0.33 | 0.38 | 0.42 | 0.46 | 0.51 |
|  | 0.961-0.035 | 0.961-0.037 | 0.961-0.025 | 0.961-0.029 | 0.961-0.016 | 0.961-0.018 |
|  | 0.28 | 0.32 | 0.37 | 0.41 | 0.46 | 0.50 |
|  | 0.876-0.004 | 0.876-0.041 | 0.876-0.028 | 0.876-0.001 | 0.876-0.001 | 0.876-0.023 |
|  | 0.27 | 0.31 | 0.36 | 0.4 | 0.46 | 0.49 |
|  | 0.82-0.018 | 0.82-0.0015 | 0.82-0.0054 | 0.82-0.0102 | 0.82-0.0147 | 0.82-0.0175 |
|  | 0.27 | 0.32 | 0.36 | 0.4 | 0.44 | 0.48 |
|  | 0.664-0.046 | 0.664-0.041 | 0.664-0.039 | 0.664-0.006 | 0.664-0.006 | 0.664-0.030 |
|  | 0.21 | 0.25 | 0.29 | 0.33 | 0.39 | 0.42 |

By observing $R$ over the entire parameter region of the $K$-$q$ space, we find the following facts:
(1),
(2),
(3).

Information can spread over a large population even when $\bar C$ has a rather large value. This resembles the "small world network" proposed by Watts and Strogatz [2]. From Table 3, we find that a person needs to have only about 50 acquaintances in order that information can spread from that single person to a few hundred million persons, even in the worst case with the largest clustering coefficient. This condition may be satisfied rather easily in actual society.

## 4 Some Empirical Networks

We consider data on Mixi and Bacon Game as suitable networks for our aim.

### 4.1 Mixi Data

Table 4 shows to how many persons a person can convey information at each generation. The first three columns of Table 4 are based on the data described by Masuda [11]. The last column gives the average propagation coefficient at each generation. Every column of Table 4 except the first is plotted in Fig. 4, where the length means the distance on the graph between the root, that is, the first person, and a descendant. Notice that the length is the same as the generation number $i$. The distribution of the number of new nodes at $G_i$ is bell-shaped, with its peak at $G_5$ in Table 4. Its cumulative distribution has a logistic shape, which shows that nearly all of the population is included within the first several generations after a drastic rise around the peak. It is, meanwhile, speculated that some hubs contribute to the distribution. This suggests that this network is scale free, as also suggested by Yuta et al. [12]. The propagation coefficient exhibits a strange behavior at first glance. We speculate that this is incidental, as will be observed in the next subsection.

Table 4. Data on Mixi by ATR group

| generation $G_i$ | number of new nodes at $G_i$ | number of total nodes | propagation coefficient |
| --- | --- | --- | --- |
| 0 | 1 | 1 | 28 |
| 1 | 28 | 29 | 9.4642857 |
| 2 | 265 | 294 | 24.430189 |
| 3 | 6474 | 6768 | 13.792864 |
| 4 | 89295 | 96063 | 1.9807044 |
| 5 | 176867 | 272930 | 0.417692 |
| 6 | 73876 | 346806 | 0.167334452 |
| 7 | 12362 | 359168 | 0.1173758 |
| 8 | 1451 | 360619 | 0.0978635 |
| 9 | 142 | 360761 | 0.2112676 |
| 10 | 30 | 360791 | 0.133333333 |
| 11 | 4 | 360795 | 1.250 |
| 12 | 5 | 360800 | 0.4 |
| 13 | 2 | 360802 |  |

Fig. 4. Propagation in Mixi.

Since we can find $n_i$ and $\bar k_{i,i+1}$ from Table 4, once $K$ is determined we can evaluate the clustering coefficients $C_i$ by Eq. (16) for various values of $q$. Although an average degree for the Mixi data is reported by Yuta et al. [12], we instead take $K$ to be the number of edges leaving the first person, that is, the propagation coefficient at generation 0 in Table 4, so as to get adequate values for the clustering coefficient. Needless to say, this is not always the case, but the assumption yields meaningful clustering coefficients in the present case. Their values are given in Table 5. They roughly show that the following relation holds:

$$C_i\sim10^{-(G_i+1)}. \tag{23}$$

For such small $C_i$, the structure of the Mixi network is practically that of a random graph. This also means that $C_i$ is not constant, and so the homogeneous hypothesis is rejected in the present case. This is due to the almost scale-free nature of the Mixi network and to finite size effects. Though the considerations in Section 2 are not adequate for such cases, they would be adequate for Watts-Strogatz type networks.

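Table 5. Clustering coefficients for Mixi data based on propagation coefficients

| Generation | $q=0.5$ | $q=0.8$ |
| --- | --- | --- |
| $G_1$ | none | none |
| $G_2$ | $9.6\times10^{-3}$∼4 | $1.4\times10^{-3}$ |
| $G_3$ | $7.6\times10^{-5}$ | $8.3\times10^{-5}$ |
| $G_4$ | $7.4\times10^{-6}$ | $8.8\times10^{-6}$ |
| $G_5$ | $5.7\times10^{-7}$ | $7.5\times10^{-7}$ |
| $G_6$ | none | $1.9\times10^{-8}$ |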

### 4.2 Bacon Game

We can draw a good deal of information from the web page [17]. Fig. 5 displays plots similar to those of Fig. 4 for the Bacon game. The upper panels of Fig. 5 are drawn with Bacon himself as the starting person, and the lower ones with the person who has the longest average path length on the actor network as the starting person. They are quite alike, apart from the position of the peak of the distribution (left and right panels) or of the maximum gradient (middle panel). Key actors are connected with all the other actors by short distances, so the distance at the peak is also small. In particular, the propagation coefficient has its peak at the first step for the leading actor Bacon. These observations demonstrate that the homogeneous hypothesis is not valid here, as in subsection 4.1.

In these empirical networks, we find that $C_i$ does not take a constant value, mainly due to finite size effects and the almost scale-free nature of the networks. For the same reasons, a simple estimation of $C_i$ occasionally leads to a negative value. Such a problem, in which $C_i$ becomes negative, crops up for some parameter choices in subsection 4.1. On closer investigation, finite size effects come into play and only hubs are connected to extraordinarily many persons. These cause negative $C_i$.

In order to solve these problems, we should consider this subject in more detail, beyond the homogeneous hypothesis, including the distribution of the clustering coefficient of each individual, the degree correlation between neighboring persons, and so on.

Fig. 5. Propagation in the Bacon game. Left: the number of new nodes appearing at generation $G_i$. Middle: the total number of nodes appearing up to generation $G_i$. Right: the propagation coefficient at generation $G_i$.

## 5 Summary

In this article we investigated three points about six degrees of separation proposed by Milgram. The first is an analysis of the Watts-Strogatz type small world network, based on analytic expressions for the clustering coefficient and the average path length. Second, we discussed the propagation coefficient model based on the homogeneous hypothesis for information transmission. Though empirical networks do not necessarily support the hypothesis, we established a formalism for calculating the propagation coefficient and related quantities in schemes where information propagates from generation to generation. Third, some empirical networks were investigated and the validity of the homogeneous hypothesis was examined, so that some points at issue were made clear. Numerical analyses were carried out for all three subjects. The knowledge gained through these investigations is summarized as follows.

1. The effect of the clustering coefficient on the diffusion of information over networks of human relations is not so crucial. It only reduces the population that can receive information to a few percent of that in a tree graph. Though this is a decrease of about two orders of magnitude, each person needs only dozens of contacts for six degrees of separation. Thus "six degrees of separation" is not such an amazing phenomenon.

2. The homogeneous hypothesis should be made more accurate. By way of example, the distribution of the clustering coefficient over nodes, the degree correlation between neighbors, the degree distribution, and so on should be taken into account. These are future issues to be addressed.

3. The finite size effect of networks has to be considered in the analysis of empirical data.

### References

1. S. Milgram, "The small world problem", Psychology Today 2, 60-67 (1967)
2. D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks", Nature 393, 440-442 (1998)
3. D. J. Watts, "Six Degrees: The Science of a Connected Age", W. W. Norton and Company, New York (2003)
4. R. Albert and A.-L. Barabási, "Statistical mechanics of complex networks", Rev. Mod. Phys. 74, 47-97 (2002)
5. A.-L. Barabási and R. Albert, "Emergence of scaling in random networks", Science 286, 509-512 (1999)
6. A.-L. Barabási, "Linked: The New Science of Networks", Perseus Books Group (2002); reissued as "Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life", Plume (2003), ISBN 0452284392
7. P. Erdős and A. Rényi, "On random graphs I", Publicationes Mathematicae Debrecen 6, 290-297 (1959)
8. M. E. J. Newman, A.-L. Barabási and D. J. Watts, "The Structure and Dynamics of Networks", Princeton Univ. Press (2006)
9. S. N. Dorogovtsev, A. V. Goltsev and J. F. F. Mendes, "Pseudofractal scale-free web", Phys. Rev. E 65, 066122 (2002)
10. S. N. Dorogovtsev and J. F. F. Mendes, "Evolution of Networks", Oxford Univ. Press, Oxford (2003)
11. N. Masuda and N. Konno, "Fukuzatsu network towa nanika" (in Japanese), Kodansha (2006)
12. K. Yuta, N. Ono and Y. Fujiwara, "A gap in the community-size distribution of a large-scale social networking site", preprint arXiv:physics/0701168
13. M. E. J. Newman, "The structure of scientific collaboration networks", Proc. Natl. Acad. Sci. USA 98, 404 (2001); "Scientific collaboration networks. I. Network construction and fundamental results", Phys. Rev. E 64, 016131 (2001); "Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality", Phys. Rev. E 64, 016132 (2001)
14. A.-L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert and T. Vicsek, "Evolution of the social network of scientific collaborations", Physica A 311, 590 (2002)
15. R. Ferrer i Cancho and R. V. Solé, "The small world of human language", Proc. R. Soc. London B 268, 2261 (2001)
16. D. A. Fell and A. Wagner, "The small world of metabolism", Nature Biotechnology 18, 1121 (2000)
17. Patrick Reynolds and the CS Web Team, "The Oracle of Bacon at Virginia", http://oracleofbacon.org/
18. A. Barrat and M. Weigt, "On the properties of small-world networks", Eur. Phys. J. B 13, 547 (2000)
19. M. E. J. Newman, C. Moore and D. J. Watts, "Mean-field solution of the small-world network model", Phys. Rev. Lett. 84, 3201 (2000)