A Probabilistic Framework for Structural Analysis in Directed Networks
Abstract
In our recent work, we developed a probabilistic framework for structural analysis in undirected networks. The key idea of that framework is to sample a network by a symmetric bivariate distribution and then use that bivariate distribution to formally define various notions, including centrality, relative centrality, community, and modularity. The main objective of this paper is to extend the probabilistic framework to directed networks, where the sampling bivariate distributions could be asymmetric. Our main finding is that we can relax the assumption from symmetric bivariate distributions to bivariate distributions that have the same marginal distributions. By using such a weaker assumption, we show that various notions for structural analysis in directed networks can also be defined in the same manner as before. However, since the bivariate distribution could be asymmetric, the community detection algorithms proposed in our previous work cannot be directly applied. For this, we show that one can construct another sampled graph with a symmetric bivariate distribution so that for any partition of the network, the modularity index remains the same as that of the original sampled graph. Based on this, we propose a hierarchical agglomerative algorithm that returns a partition of communities when the algorithm converges.
keywords: centrality, community, modularity, PageRank
I Introduction
With the advent of online social networks, structural analysis of networks has become a very active research topic. There are various notions that are widely used for structural analysis of networks, including centrality, relative centrality, similarity, community, modularity, and homophily (see, e.g., the book by Newman [1]). In order to make these notions more mathematically precise, we developed in [2, 3] a probabilistic framework for structural analysis of undirected networks. The key idea of the framework is to “sample” a network to generate a bivariate distribution $p(v, w)$ that specifies the probability that a pair of two nodes $v$ and $w$ is selected from a sample. The bivariate distribution can be viewed as a normalized similarity measure [4] between the two nodes $v$ and $w$. A graph associated with a bivariate distribution is then called a sampled graph.
In [2, 3], the bivariate distribution is assumed to be symmetric. Under this assumption, the two marginal distributions of the bivariate distribution, denoted by $p_V(\cdot)$ and $p_W(\cdot)$, are the same and they represent the probability that a particular node is selected in the sampled graph. As such, the marginal distribution $p_V(v)$ can be used for defining the centrality of a node $v$ as it represents the probability that node $v$ is selected. The relative centrality of a set of nodes $S_1$ with respect to another set of nodes $S_2$ is then defined as the conditional probability that one node of the selected pair of two nodes is in the set $S_1$ given that the other node is in the set $S_2$. Based on the probabilistic definitions of centrality and relative centrality in the framework, the community strength for a set of nodes $S$ is defined as the difference between its relative centrality with respect to itself and its centrality. Moreover, a set of nodes with a nonnegative community strength is called a community. In the probabilistic framework, the modularity for a partition of a sampled graph is defined as the average community strength of the community to which a randomly selected node belongs. As such, a high modularity for a partition of a graph implies that there are communities with strong community strengths. It was further shown in [3] that the Newman modularity in [5] and the stability in [6, 7] are special cases of the modularity for certain sampled graphs.
The main objective of this paper is to extend the probabilistic framework in [2, 3] to directed networks, where the sampling bivariate distributions could be asymmetric. Our main finding is that we can relax the assumption from symmetric bivariate distributions to bivariate distributions that have the same marginal distributions. By using such a weaker assumption, we show that the notions of centrality, relative centrality, community and modularity can be defined in the same manner as before. Moreover, the equivalent characterizations of a community still hold. Since the bivariate distribution could be asymmetric, the agglomerative community detection algorithms in [2, 3] cannot be directly applied. For this, we show that one can construct another sampled graph with a symmetric bivariate distribution so that for any partition of the network, the modularity index remains the same as that of the original sampled graph. Based on this, we propose a hierarchical agglomerative algorithm that returns a partition of communities when the algorithm converges.
In this paper, we also address two methods for sampling a directed network with a bivariate distribution that has the same marginal distributions: (i) PageRank and (ii) random walks with self loops and backward jumps. Experiments show that sampling by a random walk with self loops and backward jumps performs better than sampling by PageRank for community detection. This might be due to the fact that PageRank adds weak links between all pairs of nodes, which changes the topology of the network and thus affects the results of community detection.
II Sampling networks by bivariate distributions with the same marginal distributions
In [3], a probabilistic framework for network analysis for undirected networks was proposed. The main idea in that framework is to characterize a network by a sampled graph. Specifically, suppose a network is modelled by a graph $G(V_g, E_g)$, where $V_g$ denotes the set of vertices (nodes) in the graph and $E_g$ denotes the set of edges (links) in the graph. Let $n = |V_g|$ be the number of vertices in the graph and index the vertices from $1, 2, \ldots, n$. Also, let $A = (a_{ij})$ be the adjacency matrix of the graph, i.e.,
$$a_{ij} = \begin{cases} 1, & \text{if there is an edge from vertex } i \text{ to vertex } j, \\ 0, & \text{otherwise.} \end{cases}$$
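A sampling bivariate distribution $p(\cdot, \cdot)$ for a graph $G(V_g, E_g)$ is the bivariate distribution that is used for sampling a network by randomly selecting an ordered pair of two nodes $(V, W)$, i.e.,
$$\mathsf{P}(V = v, W = w) = p(v, w). \tag{1}$$
Let $p_V(v)$ (resp. $p_W(w)$) be the marginal distribution of the random variable $V$ (resp. $W$), i.e.,
$$p_V(v) = \mathsf{P}(V = v) = \sum_{w=1}^{n} p(v, w), \tag{2}$$
and
$$p_W(w) = \mathsf{P}(W = w) = \sum_{v=1}^{n} p(v, w). \tag{3}$$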
Definition 1
(Sampled graph) A graph $G(V_g, E_g)$ that is sampled by randomly selecting an ordered pair of two nodes $(V, W)$ according to a specific bivariate distribution $p(\cdot, \cdot)$ in (1) is called a sampled graph and it is denoted by the two-tuple $(G(V_g, E_g), p(\cdot, \cdot))$.
For a given graph $G(V_g, E_g)$, there are many methods to generate sampled graphs by specifying the needed bivariate distributions. In [3], the bivariate distributions are all assumed to be symmetric, and that limits the applicability of the framework to undirected networks. One of the main objectives of this paper is to relax the symmetry assumption for the bivariate distribution so that the framework can be applied to directed networks. The key idea of doing this is to assume that the bivariate distribution has the same marginal distributions, i.e.,
$$p_V(v) = p_W(v) \quad \text{for all } v \in V_g. \tag{4}$$
Note that a symmetric bivariate distribution has the same marginal distributions, and thus the assumption in (4) is strictly weaker than symmetry.
IIA PageRank
One approach for sampling a network with a bivariate distribution that has the same marginal distributions is to sample a network by an ergodic Markov chain. From Markov chain theory (see, e.g., [8]), it is well known that an ergodic Markov chain converges to its steady state in the long run. Hence, the joint distribution of two successive steps of a stationary and ergodic Markov chain can be used as the needed bivariate distribution. Specifically, suppose that a network $G(V_g, E_g)$ is sampled by a stationary and ergodic Markov chain with the state space being the nodes in $V_g$. Let $P = (p_{ij})$ be the transition probability matrix and $\pi = (\pi_1, \pi_2, \ldots, \pi_n)$ be the steady state probability vector of the stationary and ergodic Markov chain. Then we can choose the bivariate distribution
$$\mathsf{P}(V = v, W = w) = p(v, w) = \pi_v p_{vw}. \tag{5}$$
As the Markov chain is stationary, we have
$$p_V(v) = p_W(v) = \pi_v. \tag{6}$$
It is well known that a random walk on the graph $G(V_g, E_g)$ induces a Markov chain with the state transition probability matrix $P = (p_{ij})$ with
$$p_{ij} = \frac{a_{ij}}{k_i^{out}}, \tag{7}$$
where
$$k_i^{out} = \sum_{j=1}^{n} a_{ij} \tag{8}$$
is the number of outgoing edges from vertex $i$. In particular, if the graph is an undirected graph, i.e., $a_{ij} = a_{ji}$, then the induced Markov chain is reversible and the steady state probability of state $i$, i.e., $\pi_i$, is $k_i^{out}/2m$, where $m$ is the total number of edges of the undirected graph.
One problem for sampling a directed network by a simple random walk is that the induced Markov chain may not be ergodic even when the network itself is weakly connected. One common remedy is to allow random jumps between states in the random walk. PageRank [9], proposed by Google, is one such example that has been successfully used for ranking web pages. The key idea behind PageRank is to model the behavior of a web surfer by a random walk (the random surfer model) and then use that to compute the steady state probability for a web surfer to visit a specific web page. Specifically, suppose that there are $n$ web pages and a web surfer uniformly selects a web page with probability $1/n$. Once he/she is on a web page, he/she continues web surfing with probability $\lambda$. This is done by selecting uniformly one of the hyperlinks in that web page. On the other hand, with probability $1 - \lambda$ he/she starts a new web page uniformly among all the $n$ web pages. The transition probability from state $i$ to state $j$ for the induced Markov chain is then
$$p_{ij} = (1 - \lambda)\frac{1}{n} + \lambda \frac{a_{ij}}{k_i^{out}}, \tag{9}$$
where $a_{ij} = 1$ if there is a hyperlink pointing from the $i$-th web page to the $j$-th web page and $k_i^{out}$ is the total number of hyperlinks on the $i$-th web page. Let $\pi_i$ be the steady state probability of visiting the $i$-th web page by the web surfer. It then follows that
$$\pi_i = (1 - \lambda)\frac{1}{n} + \lambda \sum_{j=1}^{n} \frac{a_{ji}}{k_j^{out}} \pi_j. \tag{10}$$
PageRank then uses $\pi_i$ as the centrality of the $i$-th web page and ranks web pages by their centralities. Unlike the random walk on an undirected graph, the steady state probabilities in (10) have no closed-form expression in general; they must be obtained by solving the system of linear equations in (10), which is computationally expensive for large networks.
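The construction above can be sketched numerically. The following is a minimal sketch, assuming a small hypothetical directed graph in which every node has at least one outgoing edge, and using plain power iteration (one common way to solve (10), not necessarily the one used by the authors). The function names are illustrative only.

```python
import numpy as np

def pagerank_transition(A, lam=0.9):
    """Transition matrix of (9): p_ij = (1 - lam)/n + lam * a_ij / k_i_out.

    Assumption of this sketch: every node has at least one outgoing edge.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k_out = A.sum(axis=1)
    if np.any(k_out == 0):
        raise ValueError("every node needs at least one outgoing edge")
    return (1.0 - lam) / n + lam * A / k_out[:, None]

def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration for pi = pi P, started from the uniform distribution."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

def markov_bivariate(P, pi):
    """Bivariate distribution of (5): p(v, w) = pi_v * p_vw."""
    return pi[:, None] * P

# Hypothetical directed graph: cycle 0 -> 1 -> 2 -> 0 plus the edge 0 -> 2.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
P = pagerank_transition(A, lam=0.9)
pi = stationary_distribution(P)
p = markov_bivariate(P, pi)

# Row sums and column sums of p are the two marginals; both should equal pi.
print(p.sum(axis=1), p.sum(axis=0))
```

The resulting matrix `p` is not symmetric, yet its two marginals coincide, which is exactly the property required in (4).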
IIB Random walks with self loops and backward jumps
Another way to look at the Markov chain induced by PageRank in (9) is that it is in fact a random walk on a different graph with the adjacency matrix $\tilde{A}$ that is constructed from the original graph with additional edge weights, i.e.,
$$\tilde{A} = (1 - \lambda)\frac{1}{n}\mathbf{1} + \lambda D^{-1} A, \tag{12}$$
where $\mathbf{1}$ is an $n \times n$ matrix with all its elements being 1 and $D = (d_{ij})$ is the diagonal matrix with $d_{ii} = k_i^{out}$ for all $i = 1, 2, \ldots, n$.
In view of (12), another solution for the ergodicity problem is to consider a random walk on the graph with the adjacency matrix
$$\tilde{A} = \lambda_0 I + \lambda_1 A + \lambda_2 A^T, \tag{13}$$
where $I$ is the $n \times n$ identity matrix and $A^T$ is the transpose of $A$. The three parameters $\lambda_0$, $\lambda_1$, $\lambda_2$ are positive and
$$\lambda_0 + \lambda_1 + \lambda_2 = 1.$$
A random walk on the graph with the adjacency matrix $\tilde{A}$ in (13) induces an ergodic Markov chain if the original graph is weakly connected. Also, with the additional edges from the identity matrix and the transpose matrix, such a random walk can be viewed as a random walk on the original graph with self loops and backward jumps.
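As a rough illustration of this second sampling method, the sketch below builds the weighted adjacency matrix of (13), row-normalizes it into a transition matrix, and extracts the steady state vector as the left eigenvector for eigenvalue 1. The graph and the parameter values (`lam0=0.1`, `lam1=0.6`, `lam2=0.3`) are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np

def self_loop_backward_walk(A, lam0, lam1, lam2):
    """Sample a directed graph by a random walk with self loops and backward jumps.

    Returns the bivariate distribution p(v, w) = pi_v * p_vw and the steady
    state vector pi. Assumes the graph is weakly connected and lam0 > 0.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    A_tilde = lam0 * np.eye(n) + lam1 * A + lam2 * A.T      # adjacency matrix in (13)
    P = A_tilde / A_tilde.sum(axis=1, keepdims=True)        # random-walk transition matrix
    # Steady state: left eigenvector of P associated with the eigenvalue 1.
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()
    return pi[:, None] * P, pi

# Hypothetical graph: cycle 0 -> 1 -> 2 -> 0 plus 2 -> 3, where node 3 has no
# outgoing edge, so the graph is weakly but not strongly connected.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])
p, pi = self_loop_backward_walk(A, lam0=0.1, lam1=0.6, lam2=0.3)
print(pi)
```

Even though a simple random walk on this graph would get stuck at node 3, the backward jumps keep the chain ergodic, and no edge is created between nodes that were not already adjacent in the underlying undirected graph.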
III The framework for directed networks
IIIA Centrality and relative centrality
Centrality [10, 11, 1] is usually used as a measure for ranking the importance of a set of nodes in a (social) network. Under the assumption in (4), such a concept can be directly mapped to the probability that a node is selected as in [3].
Definition 2
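(Centrality) For a sampled graph $(G(V_g, E_g), p(\cdot, \cdot))$ with the bivariate distribution $p(\cdot, \cdot)$ that has the same marginal distributions in (4), the centrality of a set of nodes $S$, denoted by $C(S)$, is defined as the probability that a node in $S$ is selected, i.e.,
$$C(S) = \mathsf{P}(V \in S) = \mathsf{P}(W \in S). \tag{14}$$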
As a generalization of centrality, relative centrality in [3] is a (probability) measure that measures how important a set of nodes in a network is with respect to another set of nodes.
Definition 3
(Relative centrality) For a sampled graph $(G(V_g, E_g), p(\cdot, \cdot))$ with the bivariate distribution $p(\cdot, \cdot)$ that has the same marginal distributions in (4), the relative centrality of a set of nodes $S_1$ with respect to another set of nodes $S_2$, denoted by $C(S_1|S_2)$, is defined as the conditional probability that the randomly selected node $W$ is inside $S_1$ given that the randomly selected node $V$ is inside $S_2$, i.e.,
$$C(S_1|S_2) = \mathsf{P}(W \in S_1 \mid V \in S_2). \tag{15}$$
We note that if we choose $S_2 = V_g$, then the relative centrality of a set of nodes $S_1$ with respect to $V_g$ is simply the centrality of the set of nodes $S_1$.
Example 4
(Relative PageRank) PageRank described in Section IIA has been commonly used for ranking the importance of nodes in a directed network. Here we can use Definition 3 to define relative PageRank that can be used for ranking the relative importance of a set of nodes to another set of nodes in a directed network. Specifically, let $\pi_i$ be the PageRank for node $i$ in (10) and $p_{ij}$ be the transition probability from state $i$ to state $j$ for the induced Markov chain in (9). Then the relative PageRank of a set $S_1$ with respect to another set $S_2$ is
$$C(S_1|S_2) = \frac{\sum_{i \in S_2} \sum_{j \in S_1} \pi_i p_{ij}}{\sum_{i \in S_2} \pi_i}. \tag{16}$$
Analogous to the relative centrality in [3], there are also several properties of relative centrality in Definition 3. However, the reciprocity property in Proposition 5(iv) is much weaker than that in [3]. The proof of Proposition 5 is given in Appendix A.
Proposition 5
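For a sampled graph $(G(V_g, E_g), p(\cdot, \cdot))$ with the bivariate distribution $p(\cdot, \cdot)$ that has the same marginal distributions in (4), the following properties for the relative centrality defined in Definition 3 hold.
(i) $0 \le C(S_1|S_2) \le 1$ and $0 \le C(S_1) \le 1$. Moreover, $C(V_g|S_2) = 1$ and $C(V_g) = 1$.
(ii) (Additivity) If $S_1$ and $S_2$ are two disjoint sets, i.e., $S_1 \cap S_2$ is an empty set, then for an arbitrary set $S_3$,
$$C(S_1 \cup S_2 | S_3) = C(S_1|S_3) + C(S_2|S_3). \tag{17}$$
In particular, when $S_3 = V_g$, we have
$$C(S_1 \cup S_2) = C(S_1) + C(S_2). \tag{18}$$
(iii) (Monotonicity) If $S_1$ is a subset of $S_2$, i.e., $S_1 \subset S_2$, then $C(S_1|S_3) \le C(S_2|S_3)$ for an arbitrary set $S_3$ and $C(S_1) \le C(S_2)$.
(iv) (Reciprocity) Let $S^c = V_g \setminus S$ be the set of nodes that are not in $S$. Then
$$C(S)\, C(S^c|S) = C(S^c)\, C(S|S^c).$$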
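Definitions 2 and 3 translate directly into sums over blocks of the bivariate matrix. The following sketch uses a hypothetical $3 \times 3$ asymmetric distribution with equal marginals and checks two of the properties numerically: additivity, and the reciprocity identity in the form $C(S)C(S^c|S) = C(S^c)C(S|S^c)$, which is how reciprocity is used in Appendix B. The function names are illustrative.

```python
import numpy as np

def centrality(p, S):
    """C(S) = P(V in S): total probability mass of the rows indexed by S."""
    return float(p[list(S), :].sum())

def relative_centrality(p, S1, S2):
    """C(S1 | S2) = P(W in S1 | V in S2) for a bivariate matrix p[v, w]."""
    S1, S2 = list(S1), list(S2)
    return float(p[np.ix_(S2, S1)].sum() / p[S2, :].sum())

# Hypothetical asymmetric bivariate distribution whose row sums and column
# sums are both (0.35, 0.30, 0.35), so assumption (4) holds.
p = np.array([[0.10, 0.20, 0.05],
              [0.05, 0.10, 0.15],
              [0.20, 0.00, 0.15]])

S, Sc = [0], [1, 2]
lhs = centrality(p, S) * relative_centrality(p, Sc, S)    # C(S) C(S^c | S)
rhs = centrality(p, Sc) * relative_centrality(p, S, Sc)   # C(S^c) C(S | S^c)
print(lhs, rhs)
```

Both products equal the probability mass that crosses between $S$ and $S^c$ in one direction, and the equal-marginal assumption forces the two directions to carry the same mass.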
IIIB Community strength and communities
The notions of community strength and modularity in [3] generalize Newman’s original definition [12] and unify various other generalizations, including the stability in [6, 7]. In this section, we further extend these notions to directed networks.
Definition 6
(Community strength and communities) For a sampled graph $(G(V_g, E_g), p(\cdot, \cdot))$ with a bivariate distribution $p(\cdot, \cdot)$ that has the same marginal distributions in (4), the community strength of a subset of nodes $S \subset V_g$, denoted by $Str(S)$, is defined as the difference between the relative centrality of $S$ with respect to itself and its centrality, i.e.,
$$Str(S) = C(S|S) - C(S). \tag{19}$$
In particular, if a subset of nodes $S \subset V_g$ has a nonnegative community strength, i.e., $Str(S) \ge 0$, then it is called a community.
In the following theorem, we show various equivalent statements for a set of nodes to be a community. The proof of Theorem 7 is given in Appendix B.
Theorem 7
Consider a sampled graph $(G(V_g, E_g), p(\cdot, \cdot))$ with a bivariate distribution $p(\cdot, \cdot)$ that has the same marginal distributions in (4), and a set $S$ with $0 < C(S) < 1$. Let $S^c = V_g \setminus S$ be the set of nodes that are not in $S$. The following statements are equivalent.
(i) The set $S$ is a community, i.e., $Str(S) = C(S|S) - C(S) \ge 0$.
(ii) The relative centrality of $S$ with respect to $S$ is not less than the relative centrality of $S$ with respect to $S^c$, i.e., $C(S|S) \ge C(S|S^c)$.
(iii) The relative centrality of $S^c$ with respect to $S$ is not greater than the centrality of $S^c$, i.e., $C(S^c|S) \le C(S^c)$.
(iv) The relative centrality of $S$ with respect to $S^c$ is not greater than the centrality of $S$, i.e., $C(S|S^c) \le C(S)$.
(v) The set $S^c$ is a community, i.e., $Str(S^c) = C(S^c|S^c) - C(S^c) \ge 0$.
(vi) The relative centrality of $S^c$ with respect to $S^c$ is not less than the relative centrality of $S^c$ with respect to $S$, i.e., $C(S^c|S^c) \ge C(S^c|S)$.
IIIC Modularity and community detection
As in [3], we define the modularity index for a partition of a network as the average community strength of a randomly selected node in Definition 8.
Definition 8
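(Modularity) Consider a sampled graph $(G(V_g, E_g), p(\cdot, \cdot))$ with a bivariate distribution $p(\cdot, \cdot)$ that has the same marginal distributions in (4). Let $\mathcal{P} = \{S_c, c = 1, 2, \ldots, C\}$ be a partition of $V_g$, i.e., $S_c \cap S_{c'}$ is an empty set for $c \ne c'$ and $\cup_{c=1}^{C} S_c = V_g$. The modularity index $Q(\mathcal{P})$ with respect to the partition $S_c$, $c = 1, 2, \ldots, C$, is defined as the weighted average of the community strength of each subset with the weight being the centrality of each subset, i.e.,
$$Q(\mathcal{P}) = \sum_{c=1}^{C} C(S_c) \cdot Str(S_c). \tag{20}$$
We note the modularity index in (20) can also be written as follows:
$$Q(\mathcal{P}) = \sum_{c=1}^{C} \Big( \mathsf{P}(V \in S_c, W \in S_c) - \mathsf{P}(V \in S_c)\mathsf{P}(W \in S_c) \Big) = \sum_{c=1}^{C} \sum_{v \in S_c} \sum_{w \in S_c} \big( p(v, w) - p_V(v) p_W(w) \big). \tag{21}$$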
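The equivalence of the two expressions for the modularity index can be checked numerically. The sketch below assumes a hypothetical $3 \times 3$ bivariate distribution with equal marginals and compares the centrality-weighted average of community strengths with the double-sum form. The function names are illustrative.

```python
import numpy as np

def community_strength(p, S):
    """Str(S) = C(S|S) - C(S), following Definition 6."""
    S = list(S)
    c = p[S, :].sum()                        # C(S) = P(V in S)
    return float(p[np.ix_(S, S)].sum() / c - c)

def modularity(p, partition):
    """Sum over sets of P(V in S, W in S) - P(V in S) P(W in S)."""
    p_v, p_w = p.sum(axis=1), p.sum(axis=0)
    return float(sum(p[np.ix_(S, S)].sum() - p_v[S].sum() * p_w[S].sum()
                     for S in map(list, partition)))

# Hypothetical asymmetric bivariate distribution with equal marginals.
p = np.array([[0.10, 0.20, 0.05],
              [0.05, 0.10, 0.15],
              [0.20, 0.00, 0.15]])
partition = [[0, 1], [2]]

# Weighted average of community strengths, weights being the centralities.
q_weighted = sum(p[S, :].sum() * community_strength(p, S) for S in partition)
# Direct double-sum form.
q_direct = modularity(p, partition)
print(q_weighted, q_direct)
```

For this example both sets have nonnegative community strength, in line with Theorem 7: a set is a community exactly when its complement is.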
As the modularity index for a partition of a network is the average community strength of a randomly selected node, a good partition of a network should have a large modularity index. In view of this, one can then tackle the community detection problem by looking for algorithms that yield large values of the modularity index. For sampled graphs with symmetric bivariate distributions, there are already various community detection algorithms in [2, 3] that find local maxima of the modularity index. However, they cannot be directly applied as the bivariate distributions for sampling directed networks could be asymmetric. For this, we show in the following lemma that one can construct another sampled graph with a symmetric bivariate distribution so that for any partition of the network, the modularity index remains the same as that of the original sampled graph. The proof of Lemma 9 is given in Appendix C.
Lemma 9
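Consider a sampled graph $(G(V_g, E_g), p(\cdot, \cdot))$ with a bivariate distribution $p(\cdot, \cdot)$ that has the same marginal distributions in (4). Construct the sampled graph $(G(V_g, E_g), \tilde{p}(\cdot, \cdot))$ with the symmetric bivariate distribution
$$\tilde{p}(v, w) = \frac{p(v, w) + p(w, v)}{2}. \tag{22}$$
Let $Q(\mathcal{P})$ (resp. $\tilde{Q}(\mathcal{P})$) be the modularity index for the partition $\mathcal{P}$ of the sampled graph $(G(V_g, E_g), p(\cdot, \cdot))$ (resp. the sampled graph $(G(V_g, E_g), \tilde{p}(\cdot, \cdot))$). Then
$$\tilde{Q}(\mathcal{P}) = Q(\mathcal{P}). \tag{23}$$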
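A quick numerical sanity check of Lemma 9 is sketched below, assuming a hypothetical $3 \times 3$ asymmetric distribution with equal marginals. It symmetrizes the distribution and compares the modularity index of the original and symmetrized versions over every partition of the three nodes.

```python
import numpy as np

def modularity(p, partition):
    """Sum over sets of P(V in S, W in S) - P(V in S) P(W in S)."""
    p_v, p_w = p.sum(axis=1), p.sum(axis=0)
    return float(sum(p[np.ix_(S, S)].sum() - p_v[S].sum() * p_w[S].sum()
                     for S in map(list, partition)))

# Hypothetical asymmetric bivariate distribution with equal marginals.
p = np.array([[0.10, 0.20, 0.05],
              [0.05, 0.10, 0.15],
              [0.20, 0.00, 0.15]])
p_sym = (p + p.T) / 2.0                      # symmetrized distribution

# All partitions of the node set {0, 1, 2}.
partitions = [
    [[0], [1], [2]],
    [[0, 1], [2]],
    [[0, 2], [1]],
    [[1, 2], [0]],
    [[0, 1, 2]],
]
gaps = [abs(modularity(p, part) - modularity(p_sym, part)) for part in partitions]
print(max(gaps))
```

The check works because symmetrization leaves both the marginals and the within-set mass $\sum_{v, w \in S} p(v, w)$ unchanged.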
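As $\tilde{Q}(\mathcal{P}) = Q(\mathcal{P})$, one can then use the community detection algorithms for the sampled graph $(G(V_g, E_g), \tilde{p}(\cdot, \cdot))$ with the symmetric bivariate distribution to solve the community detection problem for the original sampled graph $(G(V_g, E_g), p(\cdot, \cdot))$. Analogous to the hierarchical agglomerative algorithms in [12, 13], in the following we propose a hierarchical agglomerative algorithm for community detection in directed networks. The idea behind this algorithm is modularity maximization. For this, we define the correlation measure between two nodes $v$ and $w$ as follows:
$$q(v, w) = \tilde{p}(v, w) - p_V(v) p_V(w) = \frac{p(v, w) + p(w, v)}{2} - p_V(v) p_V(w). \tag{24}$$
For any two sets $S_1$ and $S_2$, define the correlation measure between these two sets as
$$q(S_1, S_2) = \sum_{v \in S_1} \sum_{w \in S_2} q(v, w). \tag{25}$$
Also, define the average correlation measure between two sets $S_1$ and $S_2$ as
$$\bar{q}(S_1, S_2) = \frac{1}{|S_1| \cdot |S_2|} q(S_1, S_2). \tag{26}$$
With this correlation measure, we have from Lemma 9, (21) and (25) that the modularity index for the partition $\mathcal{P} = \{S_c, c = 1, 2, \ldots, C\}$ is
$$Q(\mathcal{P}) = \sum_{c=1}^{C} q(S_c, S_c). \tag{27}$$
Moreover, a set $S$ is a community if and only if $q(S, S) \ge 0$.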
Algorithm 1: a hierarchical agglomerative algorithm for community detection in a directed network
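(P0) Input a sampled graph $(G(V_g, E_g), p(\cdot, \cdot))$ with a bivariate distribution $p(\cdot, \cdot)$ that has the same marginal distributions in (4).
(P1) Initially, there are $n$ sets, indexed from 1 to $n$, with each set containing exactly one node. Specifically, let $S_i$ be the set of nodes in set $i$. Then $S_i = \{i\}$, $i = 1, 2, \ldots, n$.
(P2) For all $i, j = 1, 2, \ldots, n$, compute the correlation measures $q(S_i, S_j) = q(\{i\}, \{j\})$ from (24).
(P3) If there is only one set left or there do not exist nonnegative correlation measures between two distinct sets, i.e., $q(S_i, S_j) < 0$ for all $i \ne j$, then the algorithm outputs the current sets.
(P4) Find two distinct sets that have a nonnegative correlation measure. Merge these two sets into a new set. Suppose that set $i$ and set $j$ are grouped into a new set $k$. Then $S_k = S_i \cup S_j$ and update
$$q(S_k, S_k) = q(S_i, S_i) + 2 q(S_i, S_j) + q(S_j, S_j). \tag{28}$$
Moreover, for all $\ell \ne k$, update
$$q(S_k, S_\ell) = q(S_\ell, S_k) = q(S_i, S_\ell) + q(S_j, S_\ell). \tag{29}$$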
(P5) Repeat from (P3).
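Steps (P0) to (P5) can be sketched in a few lines. The following is a minimal, unoptimized sketch of the greedy variant (merging the pair with the largest average correlation measure), assuming the node-level correlation measure is the symmetrized bivariate distribution minus the product of the marginals. The example distribution comes from a hypothetical weighted directed graph whose in-weights equal its out-weights, so that normalizing the weights already gives equal marginals.

```python
import numpy as np

def agglomerative_communities(p):
    """Greedy hierarchical agglomerative community detection (sketch)."""
    p = np.asarray(p, dtype=float)
    n = p.shape[0]
    p_v = p.sum(axis=1)
    # (P2) node-level correlation measure from the symmetrized distribution.
    q = (p + p.T) / 2.0 - np.outer(p_v, p_v)
    members = [[i] for i in range(n)]         # (P1) one node per set
    active = list(range(n))                   # indices of the current sets
    while len(active) > 1:                    # (P3) stop when one set is left
        best, pair = None, None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                i, j = active[a], active[b]
                if q[i, j] >= 0:
                    avg = q[i, j] / (len(members[i]) * len(members[j]))
                    if best is None or avg > best:
                        best, pair = avg, (i, j)
        if pair is None:                      # (P3) no nonnegative pair left
            break
        i, j = pair
        # (P4) merge set j into set i; the two in-place sums implement the
        # update rules for q(S_k, S_k) and q(S_k, S_l).
        q[i, :] += q[j, :]
        q[:, i] += q[:, j]
        members[i] = members[i] + members[j]
        active.remove(j)
    return [sorted(members[i]) for i in active]

# Hypothetical weighted digraph: two directed 3-cycles with weight 3 per edge,
# linked by the weight-1 edges 2 -> 3 and 3 -> 2.
W = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]:
    W[u, v] = 3.0
W[2, 3] = W[3, 2] = 1.0
p = W / W.sum()

communities = agglomerative_communities(p)
print(communities)
```

The quadratic scan over all pairs in each iteration keeps the sketch short; a practical implementation would keep the candidate pairs in a priority queue.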
The hierarchical agglomerative algorithm in Algorithm 1 has the following properties.
Theorem 10
(i) For the hierarchical agglomerative algorithm in Algorithm 1, the modularity index is nondecreasing in every iteration and thus converges to a local optimum.
(ii) When the algorithm converges, every set returned by the hierarchical agglomerative algorithm is indeed a community.
(iii) If, furthermore, we use the greedy selection that selects the two sets with the largest average correlation measure to merge in (P4) of Algorithm 1, then the average correlation measure of the two selected sets in each merge operation is nonincreasing.
The proof of Theorem 10 is given in Appendix D. For (i) and (ii) of Theorem 10, it is not necessary to specify how we select a pair of two sets with a nonnegative correlation. One advantage of using the greedy selection in (iii) of Theorem 10 is the monotonicity property for the dendrogram produced by a greedy hierarchical agglomerative algorithm (see [14], Chapter 13.2.3). With such a monotonicity property, there is no crossover in the produced dendrogram.
IV Experimental results
In this section, we compare the sampling methods by PageRank in Section IIA and random walks with self loops and backward jumps in Section IIB for community detection. We conduct various experiments based on the stochastic block model with two blocks. The stochastic block model, as a generalization of the Erdős–Rényi random graph, is a commonly used method for generating random graphs that can be used for benchmarking community detection algorithms. In a stochastic block model with two blocks (communities), the nodes in the random graph are evenly distributed to these two blocks. The probability that there is an edge between two nodes within the same block is $p_{in}$ and the probability that there is an edge between two nodes in two different blocks is $p_{out}$. These edges are generated independently. Let $c_{in} = n \cdot p_{in}$ and $c_{out} = n \cdot p_{out}$.
In our experiments, the number of nodes in the stochastic block model is 200 with 100 nodes in each of these two blocks. The average degree of a node is set to be 3. The values of $c_{in} - c_{out}$ of these graphs are in the range from 2.5 to 5.9 with a common step of 0.1. We generate 100 graphs for each value of $c_{in} - c_{out}$. Isolated vertices are removed. Thus, the exact numbers of vertices used in this experiment might be slightly less than 200. For PageRank, the parameter $\lambda$ is chosen to be 0.9. For the random walk with self loops and backward jumps, the three parameters are , and . We run the greedy hierarchical agglomerative algorithm in Algorithm 1 until there are only two sets (even when there do not exist nonnegative correlation measures between two distinct sets). We then evaluate the overlap with the true labeling. In Figure 1, we show the experimental results, where each point is averaged over 100 random graphs from the stochastic block model. The error bars are the confidence intervals. From Figure 1, one can see that the performance of random walks with self loops and backward jumps is better than that of PageRank. One reason for this is that PageRank uniformly adds an edge (with a small weight) between any two nodes and these added edges change the network topology. On the other hand, mapping by a random walk with backward jumps in (13) does not change the network topology when it is viewed as an undirected network.
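For readers who want to set up a similar benchmark, the sketch below generates a directed two-block stochastic block model and removes isolated vertices. Several details are assumptions of this sketch rather than statements from the paper: each ordered pair of distinct nodes receives a directed edge independently, the edge probabilities are obtained by dividing the hypothetical values `c_in = 5.0` and `c_out = 1.0` by the number of nodes, and `overlap` is taken to be the fraction of correctly labelled nodes up to swapping the two labels.

```python
import numpy as np

def directed_sbm_two_blocks(n, p_in, p_out, seed=None):
    """Directed two-block stochastic block model with n nodes (n even)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)               # first half block 0, second half block 1
    same_block = labels[:, None] == labels[None, :]
    prob = np.where(same_block, p_in, p_out)
    A = (rng.random((n, n)) < prob).astype(int)      # independent directed edges
    np.fill_diagonal(A, 0)                           # no self loops in the raw graph
    return A, labels

def remove_isolated(A, labels):
    """Drop vertices that have neither incoming nor outgoing edges."""
    keep = (A.sum(axis=0) + A.sum(axis=1)) > 0
    return A[np.ix_(keep, keep)], labels[keep]

def overlap(labels_true, labels_found):
    """Fraction of agreeing labels, maximized over swapping the two labels."""
    agree = float(np.mean(np.asarray(labels_true) == np.asarray(labels_found)))
    return max(agree, 1.0 - agree)

n = 200
c_in, c_out = 5.0, 1.0                               # hypothetical values
A, labels = directed_sbm_two_blocks(n, c_in / n, c_out / n, seed=0)
A2, labels2 = remove_isolated(A, labels)
print(A2.shape, A2.sum() / A2.shape[0])              # size and average out-degree
```

With these hypothetical values the expected out-degree of a node is close to 3, which matches the average degree used in the experiments.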
V Conclusion
In this paper, we extended our previous work in [2, 3] to directed networks. Our approach is to introduce bivariate distributions that have the same marginal distributions. By doing so, we were able to extend the notions of centrality, relative centrality, community strength, community and modularity to directed networks. For community detection, we proposed a hierarchical agglomerative algorithm that guarantees every set returned from the algorithm is a community. We also tested the algorithm by using PageRank and random walks with self loops and backward jumps. The experimental results show that sampling by random walks with self loops and backward jumps performs better than sampling by PageRank for community detection.
Appendices
Appendix A
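In this section, we prove Proposition 5. Since the relative centrality is a conditional probability and the centrality is a probability, the properties in (i), (ii) and (iii) follow trivially from the properties of probability measures.
(iv) Let $S^c = V_g \setminus S$. Since $\mathsf{P}(V \in S) = \mathsf{P}(V \in S, W \in S) + \mathsf{P}(V \in S, W \in S^c)$ and $\mathsf{P}(W \in S) = \mathsf{P}(V \in S, W \in S) + \mathsf{P}(V \in S^c, W \in S)$, the assumption of equal marginal distributions in (4) gives $\mathsf{P}(V \in S, W \in S^c) = \mathsf{P}(V \in S^c, W \in S)$. By (14) and (15), the left-hand side is $C(S) C(S^c|S)$ and the right-hand side is $C(S^c) C(S|S^c)$.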
Appendix B
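In this section, we prove Theorem 7. We first prove that the first four statements are equivalent by showing (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii) $\Rightarrow$ (iv) $\Rightarrow$ (i).
(i) $\Rightarrow$ (ii): Note from Proposition 5 (i) and (ii) that $C(S|S) + C(S^c|S) = 1$ and $C(S) + C(S^c) = 1$. It then follows from the reciprocal property in Proposition 5(iv) that
$$C(S^c)\big(C(S|S) - C(S|S^c)\big) = C(S^c) C(S|S) - C(S) C(S^c|S) = (1 - C(S)) C(S|S) - C(S)(1 - C(S|S)) = C(S|S) - C(S) = Str(S).$$
As we assume that $0 < C(S) < 1$, we also have $0 < C(S^c) < 1$. Thus,
$$C(S|S) - C(S|S^c) = \frac{Str(S)}{C(S^c)} \ge 0.$$
(ii) $\Rightarrow$ (iii): Since we assume that $C(S|S) \ge C(S|S^c)$, we have from $C(S|S) + C(S^c|S) = 1$ that
$$1 - C(S^c|S) \ge C(S|S^c).$$
Multiplying both sides by $C(S^c)$ yields
$$C(S^c) - C(S^c) C(S^c|S) \ge C(S^c) C(S|S^c).$$
From the reciprocal property in Proposition 5(iv) and $C(S) + C(S^c) = 1$, it follows that
$$C(S^c) - C(S^c) C(S^c|S) \ge C(S) C(S^c|S) = (1 - C(S^c)) C(S^c|S),$$
which is equivalent to $C(S^c) \ge C(S^c|S)$.
(iii) $\Rightarrow$ (iv): Note from the reciprocal property in Proposition 5(iv) that
$$C(S) C(S^c|S) = C(S^c) C(S|S^c). \tag{35}$$
It then follows from $C(S^c|S) \le C(S^c)$ that $C(S|S^c) \le C(S)$.
(iv) $\Rightarrow$ (i): Since we assume that $C(S|S^c) \le C(S)$, it follows from (35) that $C(S^c|S) \le C(S^c)$. In conjunction with $C(S|S) + C(S^c|S) = 1$ and $C(S) + C(S^c) = 1$, we have
$$C(S|S) - C(S) = C(S^c) - C(S^c|S) \ge 0.$$
Now we show that (iv) and (v) are equivalent. Since $C(S|S^c) + C(S^c|S^c) = 1$ and $C(S) + C(S^c) = 1$, we have
$$C(S|S^c) - C(S) = C(S^c) - C(S^c|S^c) = -Str(S^c).$$
Thus, $C(S|S^c) \le C(S)$ if and only if $Str(S^c) \ge 0$.
Replacing $S$ by $S^c$, we see that (v) and (vi) are also equivalent because (i) and (ii) are equivalent.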
Appendix C
In this section, we prove Lemma 9.
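The body of this proof is not reproduced here. One possible argument, written as a sketch under the assumptions that the symmetrized distribution is $\tilde{p}(v, w) = (p(v, w) + p(w, v))/2$ and that the modularity index is the sum over the sets $S_c$ of the partition of $\mathsf{P}(V \in S_c, W \in S_c) - \mathsf{P}(V \in S_c)\mathsf{P}(W \in S_c)$, goes as follows.

```latex
% Step 1: symmetrization preserves the (common) marginal distribution.
\tilde{p}_V(v) = \sum_{w} \frac{p(v,w) + p(w,v)}{2}
              = \frac{p_V(v) + p_W(v)}{2} = p_V(v),
\qquad \tilde{p}_W(v) = p_V(v) \text{ by symmetry of } \tilde{p}.

% Step 2: symmetrization preserves the probability mass inside any set S,
% because swapping the roles of v and w leaves the double sum unchanged.
\sum_{v \in S} \sum_{w \in S} \tilde{p}(v,w)
  = \frac{1}{2} \sum_{v \in S} \sum_{w \in S} p(v,w)
  + \frac{1}{2} \sum_{v \in S} \sum_{w \in S} p(w,v)
  = \sum_{v \in S} \sum_{w \in S} p(v,w).

% Step 3: combine the two steps term by term over the partition.
\tilde{Q}(\mathcal{P})
  = \sum_{c=1}^{C} \Big( \sum_{v \in S_c} \sum_{w \in S_c} \tilde{p}(v,w)
      - \Big( \sum_{v \in S_c} p_V(v) \Big)^{2} \Big)
  = Q(\mathcal{P}).
```

Step 1 is the only place where the equal-marginal assumption in (4) is used.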
Appendix D
In this section, we prove Theorem 10.
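(i) Since we choose two sets that have a nonnegative correlation measure, i.e., $q(S_i, S_j) \ge 0$, to merge, it is easy to see from (28) and (27) that the modularity index is nondecreasing in every iteration.
(ii) Suppose that there is only one set left. Then this set is $V_g$ and it is the trivial community. On the other hand, suppose that there are $C \ge 2$ sets $\{S_1, S_2, \ldots, S_C\}$ left when the algorithm converges. Then we know that $q(S_i, S_j) < 0$ for $i \ne j$.
Note from (24) and (25) that for any node $v$,
$$q(\{v\}, V_g) = \sum_{w \in V_g} q(v, w) = \frac{p_V(v) + p_W(v)}{2} - p_V(v) = 0. \tag{39}$$
Thus, for any set $S$,
$$q(S, V_g) = \sum_{v \in S} q(\{v\}, V_g) = 0. \tag{40}$$
Since $\{S_1, S_2, \ldots, S_C\}$ is a partition of $V_g$, it then follows that
$$0 = q(S_i, V_g) = q(S_i, S_i) + \sum_{j \ne i} q(S_i, S_j). \tag{41}$$
Since $q(S_i, S_j) < 0$ for $i \ne j$, we conclude that $q(S_i, S_i) > 0$ and thus $S_i$ is a community.
(iii) Suppose that $S_i$ and $S_j$ are merged into the new set $S_k$. According to the update rules in the algorithm and the symmetric property of $q(\cdot, \cdot)$, we know that
$$q(S_k, S_\ell) = q(S_\ell, S_k) = q(S_i, S_\ell) + q(S_j, S_\ell)$$
for all $\ell \ne k$. Thus,
$$\bar{q}(S_k, S_\ell) = \frac{|S_i|}{|S_i| + |S_j|} \bar{q}(S_i, S_\ell) + \frac{|S_j|}{|S_i| + |S_j|} \bar{q}(S_j, S_\ell).$$
Since we select the two sets with the largest average correlation measure in each merge operation, we have $\bar{q}(S_i, S_\ell) \le \bar{q}(S_i, S_j)$ and $\bar{q}(S_j, S_\ell) \le \bar{q}(S_i, S_j)$. These then lead to
$$\bar{q}(S_k, S_\ell) \le \bar{q}(S_i, S_j).$$
Thus, $\bar{q}(S_i, S_j)$ is not less than the average correlation measure between any two sets after the merge operation. As such, the average correlation measure at each merge is nonincreasing.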
References
[1] M. Newman, Networks: an introduction. OUP Oxford, 2009.
[2] C.-S. Chang, C.-Y. Hsu, J. Cheng, and D.-S. Lee, “A general probabilistic framework for detecting community structure in networks,” in IEEE INFOCOM ’11, April 2011.
[3] C.-S. Chang, C.-J. Chang, W.-T. Hsieh, D.-S. Lee, L.-H. Liou, and W. Liao, “Relative centrality and local community detection,” Network Science, vol. FirstView, pp. 1–35, 9 2015.
[4] D. Liben-Nowell and J. Kleinberg, “The link prediction problem for social networks,” in Proceedings of the twelfth international conference on Information and knowledge management. ACM, 2003, pp. 556–559.
[5] M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical review E, vol. 69, no. 2, p. 026113, 2004.
[6] R. Lambiotte, “Multiscale modularity in complex networks,” in Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), 2010 Proceedings of the 8th International Symposium on. IEEE, 2010, pp. 546–553.
[7] J.-C. Delvenne, S. N. Yaliraki, and M. Barahona, “Stability of graph communities across time scales,” Proceedings of the National Academy of Sciences, vol. 107, no. 29, pp. 12755–12760, 2010.
[8] R. Nelson, Probability, stochastic processes, and queueing theory: the mathematics of computer performance modeling. Springer Verlag, 1995.
[9] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Computer networks and ISDN systems, vol. 30, no. 1, pp. 107–117, 1998.
[10] L. C. Freeman, “A set of measures of centrality based on betweenness,” Sociometry, pp. 35–41, 1977.
[11] L. C. Freeman, “Centrality in social networks conceptual clarification,” Social networks, vol. 1, no. 3, pp. 215–239, 1979.
[12] M. E. Newman, “Fast algorithm for detecting community structure in networks,” Physical review E, vol. 69, no. 6, p. 066133, 2004.
[13] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.
[14] S. Theodoridis and K. Koutroumbas, Pattern Recognition. Elsevier Academic press, USA, 2006.