Measuring the similarity of nodes in complex networks
Abstract
Measuring the similarity of nodes in complex networks has attracted the interest of many researchers. In this paper, a new method based on degree centrality and relative entropy is proposed to measure the similarity of nodes in complex networks. The results show that nodes with a common structural property have a high similarity to other nodes. Nodes that are highly influential have a low similarity to other nodes, and marginal nodes also have a low similarity to other nodes. These results indicate that the proposed method is a useful and reasonable way to measure the similarity of nodes in complex networks.
Keywords:
Complex networks, Similarity of nodes, Cross-entropy
1 Introduction
Complex networks are a new way to describe complex systems mathematically. Many real-world systems can be modeled as complex networks, such as biological, social and technological systems [1, 2, 3, 4, 5]. Many properties of complex networks have been studied by researchers in this field, such as network topology and dynamics [6, 7, 8, 9], the properties of network structure [2, 10], the self-similarity and fractal properties of complex networks [11, 12, 13], evolutionary games on complex networks [14, 15], the controllability and synchronization of complex networks [16, 17], and so on [18, 19, 20, 8, 21, 22, 23].
The similarity of nodes in complex networks is a new research direction. Typical questions are "How similar are these two vertices?" and "Which node is most similar to the other nodes?". Many methods have been proposed to address this problem [24, 25, 26, 27, 28]. In this paper, a new method based on the relative entropy (Kullback–Leibler divergence) [29] is proposed to describe the similarity of nodes in complex networks. The probabilities of each node are defined based on the degree distribution.
The rest of this paper is organised as follows. Section 2 introduces some preliminaries of this work. In Section 3, a new method to measure the similarity of nodes in complex networks is proposed. The application of the proposed method is illustrated in Section 4. The conclusion is given in Section 5.
2 Preliminaries
2.1 Local network in the complex network
Existing research on complex networks shows that many of their properties depend on their structure [2]. In a complex network, each node's influence on the whole network is determined by its neighbour nodes. Based on existing research on the local structure of complex networks [25, 30], a local network of each node in a complex network has been proposed [31]. The details of the local network are as follows:
Each local network of a target node contains the target node and the neighbour nodes of the target node.
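As a minimal sketch of this definition, the local network of a node can be read directly off an adjacency mapping. The function name `local_network` and the four-node graph are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
def local_network(adjacency, target):
    """Return the nodes of the local network of `target`:
    the target node itself plus its direct neighbours."""
    return {target} | set(adjacency[target])


# Hypothetical undirected example graph, stored as an adjacency mapping.
adjacency = {
    1: [2, 3],
    2: [1, 3, 4],
    3: [1, 2],
    4: [2],
}

for node in adjacency:
    print(node, sorted(local_network(adjacency, node)))
```

In this toy graph the local network of node 2 is the whole graph, while the local network of the leaf node 4 contains only nodes 2 and 4.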
2.2 Relative entropy (Kullback–Leibler divergence)
The relative entropy (Kullback–Leibler divergence) is a basic concept in probability theory and information theory. It was proposed by Kullback and Leibler [29]. The relative entropy is a nonsymmetric measure of the difference between two probability distributions. For two probability distributions $P$ and $Q$, the relative entropy is defined in Eq. (1):
$$D_{KL}(P \| Q) = \sum_{i=1}^{n} P(i) \log \frac{P(i)}{Q(i)} \qquad (1)$$
where $P$ and $Q$ have the same number of components, $n$.
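The relative entropy of Eq. (1) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the natural logarithm is assumed, terms with a zero first argument are taken to contribute 0, and the result is infinite when the second distribution is zero where the first is not. The function name `relative_entropy` and the example vectors are hypothetical.

```python
import math


def relative_entropy(p, q):
    """Kullback-Leibler divergence sum_i p[i] * log(p[i] / q[i]).

    Assumptions (not stated in the paper): natural logarithm,
    0 * log(0 / q) = 0, and the divergence is infinite when
    q[i] == 0 while p[i] > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of components")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # zero-probability terms contribute nothing
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical two-component distributions.
p = [0.5, 0.5]
q = [0.25, 0.75]

print(relative_entropy(p, q))  # about 0.1438
print(relative_entropy(q, p))  # about 0.1308, so the measure is nonsymmetric
print(relative_entropy(p, p))  # 0.0 for identical distributions
```

The two directions give different values, which illustrates why the measure is called nonsymmetric.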
3 Measuring the similarity of each node
The proposed method is based on the definition of the local network and the relative entropy. The definition of the new method can be divided into two parts.

The definition of the probabilities of each node. First, calculate the degree of each node and find the maximum degree in the network. Second, set the size of each node's probabilities (the number of components) based on the maximum degree. Third, use the degrees of the nodes in each node's local network as the components of its probabilities. Finally, sort the probabilities from high to low.

The relative entropy of each node to the other nodes. Calculate the relative entropy between the probabilities of each pair of nodes.
Based on the local network of each node and the degree centrality, the probabilities of each node are defined as follows. Let $L_i$ denote the local network of node $i$. In the local network $L_i$, the total degree is denoted by $D(L_i)$, with $D(L_i) = \sum_{j=1}^{N_i} d(j)$, where $j$ indexes the $j$th node in $L_i$ and $N_i$ is the number of nodes in the local network. Let $d_{max}$ be the maximum degree in the whole network. The number of components of each node's probabilities is then $m = d_{max} + 1$. The probabilities of node $i$ are defined in Eq. (2):
$$P(i) = [p(i,1), p(i,2), \ldots, p(i,m)], \qquad p(i,k) = \begin{cases} d(k)/D(L_i), & k \le N_i \\ 0, & k > N_i \end{cases} \qquad (2)$$
where $d(k)$ in Eq. (2) is the degree of the $k$th node in the local network.
In $P(i)$, the value of $p(i,k)$ is thus defined from the degree of the $k$th node in the local network. If the number of nodes in the local network, $N_i$, is smaller than $m$, the remaining components $p(i,k)$ are set to 0. Finally, the components are sorted from high to low.
An example of this definition is shown in Fig. 2.
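The construction of a node's probabilities can be sketched as follows. This is one reading of the procedure, not the authors' code, and it rests on three assumptions: the vector has (maximum degree + 1) components so that the largest local network fits, the target node's own degree is included along with its neighbours' degrees, and degrees are counted in the whole network. The function name `degree_probabilities` and the example graph are hypothetical.

```python
def degree_probabilities(adjacency, node):
    """Sorted, zero-padded degree probabilities of the local network of `node`.

    Assumes an undirected graph without isolated nodes, given as an
    adjacency mapping {node: [neighbours]}.
    """
    degrees = {v: len(neighbours) for v, neighbours in adjacency.items()}
    size = max(degrees.values()) + 1        # assumed vector length: max degree + 1
    local = [node] + list(adjacency[node])  # target node plus its neighbours
    total = sum(degrees[v] for v in local)  # total degree in the local network
    probs = sorted((degrees[v] / total for v in local), reverse=True)
    return probs + [0.0] * (size - len(probs))  # pad unused components with 0


# Hypothetical undirected example graph.
adjacency = {
    1: [2, 3],
    2: [1, 3, 4],
    3: [1, 2],
    4: [2],
}

for node in adjacency:
    print(node, [round(p, 2) for p in degree_probabilities(adjacency, node)])
```

In this toy graph, node 4 has degree 1 and its only neighbour has degree 3, which gives leading components 0.75 and 0.25 followed by zeros.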
The similarity of node $i$ and node $j$ is then defined as follows:
(3) 
The sum of each node's similarity to the other nodes in the network is used to identify which node is most similar to the others. The larger this sum, the more similar the node is to the other nodes.
To illustrate the usefulness of the new method, an example network (Network A21) is used and the similarity of its nodes is measured. The details of the example network (Network A21) are shown in Fig. 3.
The probabilities of each node in the example network (Network A21) are shown in Table 1.
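The ranking step can be illustrated with a short sketch that takes an already computed similarity matrix as input. The matrix values below are made up for illustration and are not results from the paper. Excluding the diagonal (a node's similarity to itself) from the sum is an assumption, and the function name `rank_by_total_similarity` is hypothetical.

```python
def rank_by_total_similarity(similarity):
    """Return (totals, most_similar, least_similar) for a square similarity matrix.

    totals[i] is the sum of node i's similarity to every other node;
    the diagonal entry is skipped (an assumption of this sketch).
    """
    n = len(similarity)
    totals = [
        sum(similarity[i][j] for j in range(n) if j != i)
        for i in range(n)
    ]
    most_similar = max(range(n), key=lambda i: totals[i])
    least_similar = min(range(n), key=lambda i: totals[i])
    return totals, most_similar, least_similar


# Made-up symmetric similarity matrix for four nodes (indices 0..3).
similarity = [
    [1.0, 0.9, 0.8, 0.2],
    [0.9, 1.0, 0.6, 0.3],
    [0.8, 0.6, 1.0, 0.1],
    [0.2, 0.3, 0.1, 1.0],
]

totals, most_similar, least_similar = rank_by_total_similarity(similarity)
print([round(t, 2) for t in totals])
print("most similar node index:", most_similar)
print("least similar node index:", least_similar)
```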
P(1) = [0.27  0.27  0.27  0.18  0.00  0.00  0.00] 
P(2) = [0.31  0.31  0.19  0.19  0.00  0.00  0.00] 
P(3) = [0.36  0.21  0.21  0.21  0.00  0.00  0.00] 
P(4) = [0.44  0.33  0.22  0.00  0.00  0.00  0.00] 
P(5) = [0.23  0.23  0.14  0.14  0.14  0.14  0.00] 
P(6) = [0.36  0.27  0.27  0.09  0.00  0.00  0.00] 
P(7) = [0.24  0.24  0.14  0.14  0.14  0.10  0.00] 
P(8) = [0.31  0.31  0.19  0.19  0.00  0.00  0.00] 
P(9) = [0.75  0.25  0.00  0.00  0.00  0.00  0.00] 
P(10) = [0.29  0.21  0.21  0.14  0.14  0.00  0.00] 
P(11) = [0.45  0.36  0.18  0.00  0.00  0.00  0.00] 
P(12) = [0.33  0.28  0.22  0.17  0.00  0.00  0.00] 
P(13) = [0.50  0.33  0.17  0.00  0.00  0.00  0.00] 
P(14) = [0.40  0.40  0.20  0.00  0.00  0.00  0.00] 
P(15) = [0.24  0.16  0.16  0.12  0.12  0.12  0.08] 
P(16) = [0.44  0.33  0.22  0.00  0.00  0.00  0.00] 
P(17) = [0.38  0.25  0.19  0.19  0.00  0.00  0.00] 
P(18) = [0.35  0.24  0.18  0.12  0.12  0.00  0.00] 
P(19) = [0.35  0.24  0.18  0.12  0.12  0.00  0.00] 
P(20) = [0.33  0.27  0.20  0.20  0.00  0.00  0.00] 
P(21) = [0.43  0.21  0.21  0.14  0.00  0.00  0.00] 
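As a quick check on Table 1, the sketch below groups nodes whose probabilities are identical at the two-decimal precision of the table. Identical probabilities give a relative entropy of zero under Eq. (1), so such nodes cannot be told apart by the method. The values are copied from Table 1; the variable names are hypothetical.

```python
from collections import defaultdict

# Probabilities of each node, copied from Table 1 (rounded to two decimals).
table_1 = {
    1: (0.27, 0.27, 0.27, 0.18, 0.00, 0.00, 0.00),
    2: (0.31, 0.31, 0.19, 0.19, 0.00, 0.00, 0.00),
    3: (0.36, 0.21, 0.21, 0.21, 0.00, 0.00, 0.00),
    4: (0.44, 0.33, 0.22, 0.00, 0.00, 0.00, 0.00),
    5: (0.23, 0.23, 0.14, 0.14, 0.14, 0.14, 0.00),
    6: (0.36, 0.27, 0.27, 0.09, 0.00, 0.00, 0.00),
    7: (0.24, 0.24, 0.14, 0.14, 0.14, 0.10, 0.00),
    8: (0.31, 0.31, 0.19, 0.19, 0.00, 0.00, 0.00),
    9: (0.75, 0.25, 0.00, 0.00, 0.00, 0.00, 0.00),
    10: (0.29, 0.21, 0.21, 0.14, 0.14, 0.00, 0.00),
    11: (0.45, 0.36, 0.18, 0.00, 0.00, 0.00, 0.00),
    12: (0.33, 0.28, 0.22, 0.17, 0.00, 0.00, 0.00),
    13: (0.50, 0.33, 0.17, 0.00, 0.00, 0.00, 0.00),
    14: (0.40, 0.40, 0.20, 0.00, 0.00, 0.00, 0.00),
    15: (0.24, 0.16, 0.16, 0.12, 0.12, 0.12, 0.08),
    16: (0.44, 0.33, 0.22, 0.00, 0.00, 0.00, 0.00),
    17: (0.38, 0.25, 0.19, 0.19, 0.00, 0.00, 0.00),
    18: (0.35, 0.24, 0.18, 0.12, 0.12, 0.00, 0.00),
    19: (0.35, 0.24, 0.18, 0.12, 0.12, 0.00, 0.00),
    20: (0.33, 0.27, 0.20, 0.20, 0.00, 0.00, 0.00),
    21: (0.43, 0.21, 0.21, 0.14, 0.00, 0.00, 0.00),
}

# Group node labels by their (rounded) probability vector.
groups = defaultdict(list)
for node, probabilities in table_1.items():
    groups[probabilities].append(node)

identical = [nodes for nodes in groups.values() if len(nodes) > 1]
print(identical)  # [[2, 8], [4, 16], [18, 19]]
```

Because the table is rounded, this check can only confirm equality to two decimals; it does not prove that the unrounded probabilities coincide.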
The similarity matrix of the nodes in the example network (Network A21) is shown in Eq. (4):
(4) 
From the similarity matrix, we find that node 2 and node 8, node 4 and node 16, and node 18 and node 19 have the same structure in the example network (Network A21). The details are shown in Fig. 6:
From the similarity matrix, we also find that node 9 has the lowest similarity to the other nodes and node 12 has the highest similarity to the other nodes.
The results of our test on the example network (Network A21) indicate that measuring the similarity of nodes based on the relative entropy is a reasonable and useful method. The method can also be used for node classification in complex networks. Node 12 has the highest similarity to the other nodes. The degree of node 12 is equal to 3, and 3 is the most common node degree in the example network (Network A21). This shows, from another angle, that the degree is very important for describing the structural properties of complex networks. Node 9 is a marginal node, because no node has a high similarity to it.
4 Application
In this section, the new method is used to find the most similar node in four real networks. The four networks are Zachary's Karate Club network (Karate) [32], the US airport network (Usairport) [33], the Email network (Email) [33] and the Germany highway network (Highway) [34]. The results are shown as follows:
Network    Nodes  Edges  High similarity node  Low similarity node
Karate     34     78     28                    12
Usairport  332    2126   55                    118
Email      1133   10902  855                   644
Highway    1168   2481   31                    798
5 Conclusion
Measuring the similarity of nodes in complex networks is an interesting topic. In this paper, a new method based on the relative entropy is proposed to measure the similarity of nodes in complex networks. Nodes with a common structure have a high similarity to the others. When the similarity between two nodes is equal to 1, the two nodes have the same structural property in the network. Nodes that are influential and nodes that are marginal in the network have a low similarity to the others. The results in this paper show that the proposed method is a useful and reasonable way to measure the similarity of nodes in complex networks.
Acknowledgments
The work is partially supported by the National Natural Science Foundation of China (Grant No. 61174022), the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20131102130002), the R&D Program of China (2012BAH07B01), the National High Technology Research and Development Program of China (863 Program) (Grant No. 2013AA013801), the open funding project of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (Grant No. BUAAVR14KF02), the Fundamental Research Funds for the Central Universities (No. XDJK2015D009), and the Chongqing Graduate Student Research Innovation Project (Grant No. CYS14062).
References
 (1) R. Albert, H. Jeong, A. Barabási, Error and attack tolerance of complex networks, Nature 406 (6794) (2000) 378–382.
 (2) M. Newman, The structure and function of complex networks, SIAM Review (2003) 167–256.
 (3) P. De Meo, E. Ferrara, G. Fiumara, A. Provetti, On facebook, most ties are weak, Communications of the ACM 57 (11) (2014) 78–84.
 (4) P. Csermely, Creative elements: network-based predictions of active centres in proteins and cellular and social networks, Trends in Biochemical Sciences 33 (12) (2008) 569–576.
 (5) P. Csermely, Weak links: the universal key to the stability of networks and complex systems, Springer Science & Business Media, 2009.
 (6) D. Watts, S. Strogatz, Collective dynamics of 'small-world' networks, Nature 393 (6684) (1998) 440–442.
 (7) M. Newman, A.-L. Barabási, D. J. Watts, The structure and dynamics of networks, Princeton University Press, 2006.
 (8) E. Ferrara, O. Varol, F. Menczer, A. Flammini, Traveling trends: social butterflies or frequent fliers?, in: Proceedings of the first ACM conference on Online social networks, ACM, 2013, pp. 213–222.
 (9) E. Ferrara, A large-scale community structure analysis in Facebook, EPJ Data Science 1 (1) (2012) 1–30.
 (10) M. Barthelemy, Betweenness centrality in large complex networks, The European Physical Journal B-Condensed Matter and Complex Systems 38 (2) (2004) 163–168.
 (11) C. Song, S. Havlin, H. Makse, Self-similarity of complex networks, Nature 433 (7024) (2005) 392–395.
 (12) D. Wei, B. Wei, Y. Hu, H. Zhang, Y. Deng, A new information dimension of complex networks, Physics Letters A 378 (16) (2014) 1091–1094.
 (13) Q. Zhang, C. Luo, M. Li, Y. Deng, S. Mahadevan, Tsallis information dimension of complex networks, Physica A: Statistical Mechanics and its Applications 419 (2015) 707–717.
 (14) Z. Wang, C.-Y. Xia, S. Meloni, C.-S. Zhou, Y. Moreno, Impact of social punishment on cooperative behavior in complex networks, Scientific Reports 3. doi:10.1038/srep03055.
 (15) Z. Wang, L. Wang, M. Perc, Degree mixing in multilayer networks impedes the evolution of cooperation, Phys. Rev. E 89 (2014) 052813.
 (16) Y.-Y. Liu, J.-J. Slotine, A.-L. Barabási, Controllability of complex networks, Nature 473 (7346) (2011) 167–173.
 (17) A. Arenas, A. Díaz-Guilera, J. Kurths, Y. Moreno, C. Zhou, Synchronization in complex networks, Physics Reports 469 (3) (2008) 93–153.
 (18) A.-L. Barabási, et al., Scale-free networks: a decade and beyond, Science 325 (5939) (2009) 412.
 (19) A. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (5439) (1999) 509–512.
 (20) P. De Meo, E. Ferrara, F. Abel, L. Aroyo, G.-J. Houben, Analyzing user behavior across social sharing environments, ACM Transactions on Intelligent Systems and Technology (TIST) 5 (1) (2013) 14.
 (21) G. Teixeira, M. Aguiar, C. Carvalho, D. Dantas, M. Cunha, J. Morais, H. Pereira, J. Miranda, Complex semantic networks, International Journal of Modern Physics C 21 (03) (2010) 333–347.
 (22) P. Csermely, Strong links are important, but weak links stabilize them, Trends in biochemical sciences 29 (7) (2004) 331–334.
 (23) Z. Wang, A. Szolnoki, M. Perc, Evolution of public cooperation on interdependent networks: The impact of biased utility functions, EPL (Europhysics Letters) 97 (4) (2012) 48001.
 (24) E. Leicht, P. Holme, M. E. Newman, Vertex similarity in networks, Physical Review E 73 (2) (2006) 026120.
 (25) T. Zhou, L. Lü, Y.-C. Zhang, Predicting missing links via local information, The European Physical Journal B-Condensed Matter and Complex Systems 71 (4) (2009) 623–630.
 (26) Y. Pan, D.-H. Li, J.-G. Liu, J.-Z. Liang, Detecting community structure in complex networks via node similarity, Physica A: Statistical Mechanics and its Applications 389 (14) (2010) 2849–2857.
 (27) W. Lu, J. Janssen, E. Milios, N. Japkowicz, Node similarity in networked information spaces, in: Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research, IBM Press, 2001, p. 11.
 (28) W. Lu, J. Janssen, E. Milios, N. Japkowicz, Y. Zhang, Node similarity in the citation graph, Knowledge and Information Systems 11 (1) (2007) 105–129.
 (29) S. Kullback, R. A. Leibler, On information and sufficiency, The annals of mathematical statistics (1951) 79–86.
 (30) R. E. Ulanowicz, D. Baird, Nutrient controls on ecosystem dynamics: the chesapeake mesohaline community, Journal of Marine Systems 19 (1) (1999) 159–172.
 (31) Q. Zhang, M. Li, Y. Du, Y. Deng, Local structure entropy of complex networks, arXiv preprint arXiv:1412.3910.
 (32) UCI network data repository, http://networkdata.ics.uci.edu/data.php?id=105 (2014).
 (33) Pajek datasets, http://vlado.fmf.uni-lj.si/pub/networks/data/ (2014).
 (34) Tore Opsahl, http://toreopsahl.com/datasets/ (2014).