Dynamics-based centrality for general directed networks
Determining the relative importance of nodes in directed networks is important in, for example, ranking websites, publications, and sports teams, and for understanding signal flows in systems biology. A prevailing centrality measure in this respect is the PageRank. In this work, we focus on another class of centrality derived from the Laplacian of the network. We extend the Laplacian-based centrality, which has mainly been applied to strongly connected networks, to the case of general directed networks such that we can quantitatively compare arbitrary nodes. Toward this end, we adopt the idea used in the PageRank to introduce global connectivity between all the pairs of nodes with a certain strength. Numerical simulations are carried out on some networks. We also offer interpretations of the Laplacian-based centrality for general directed networks in terms of various dynamical and structural properties of networks. Importantly, the Laplacian-based centrality defined as the stationary density of the continuous-time random walk with random jumps is shown to be equivalent to the absorption probability of the random walk with sinks at each node but without random jumps. Similarly, the proposed centrality represents the importance of nodes in dynamics on the original network supplied with sinks but not with random jumps.
A network is a set of nodes and a set of links that connect pairs of nodes (see Albert and Barabási (2002); Boccaletti et al. (2006); Newman (2010) for reviews). In applications including information science, sociology, and biology, it is often necessary to determine important nodes in a network. Various definitions of the importance of nodes, or centrality measures, have been proposed since the first classical studies on social network analysis in the 1950s Katz (1953); Boccaletti et al. (2006); Wasserman94-Brandes05LNCSbook .
It is often more suitable to consider links to be directed, where the direction of a link represents relationships such as the control of one node over another, unidirectional flow, and citation. Many centrality measures, including degree centrality, betweenness centrality, and eigenvector centrality, can be adapted to the case of directed networks. Nevertheless, the most popular centrality for directed networks appears to be the PageRank, which takes nontrivial values only in directed networks. It was originally developed for ranking websites Brin98-Langville06book . The PageRank of a node is large when the node receives many links from important nodes that do not have too many outgoing links.
In the present study, we focus on another important class of centrality for directed networks, i.e., those derived from the Laplacian of the network. This class of centrality has a long history Daniels69Biom-Berman80SIAM ; Moon and Pullman (1970); Biggs (1997); Agaev and Chebotarev (2000); Chebotarev and Agaev (2005); Borm (2002) and is mathematically close to the PageRank (see Sec. V). Furthermore, for strongly connected networks, i.e., networks in which there exists a path of directed links between an arbitrary ordered pair of nodes, the Laplacian-based centrality value of a node, which we also call the influence of a node, represents its importance in various dynamics on networks Masuda et al. (2009a, b); Klemm et al. (2010).
The Laplacian-based centrality measure has mostly been analyzed for strongly connected networks Daniels69Biom-Berman80SIAM ; Moon and Pullman (1970); Biggs (1997); Masuda et al. (2009a, b); Klemm et al. (2010). However, real directed networks may not be strongly connected. This is typically the case when the network is sparse (i.e., the number of links is relatively small) or small. Although the Laplacian-based centrality in its original form is applicable when all the nodes can be reached along directed paths from a certain specified node, such a network is not generic. The Laplacian-based centrality has been generalized to the case of general directed networks Agaev and Chebotarev (2000); Borm (2002); Chebotarev and Agaev (2005). In the generalized version, nodes in an uppermost component have positive centrality values, whereas nodes in a downstream component have zero centrality values (see Secs. II and III for definitions of uppermost and downstream components). However, we may want to compare the importance of nodes in downstream components. We may also wish to compare node 1 in an uppermost component with node 2 in a downstream component that is not under the control of node 1.
In this paper, we extend the Laplacian-based centrality measure (i.e., influence) to the case of general directed networks. Networks do not have to be strongly connected and can be composed of disconnected components. The extended centrality measure, which we call the influence or the extended influence, is a one-parameter family of centrality measures with parameter $\alpha$ such that the previous definition Agaev and Chebotarev (2000); Borm (2002); Chebotarev and Agaev (2005) is recovered in the limit $\alpha \to 0$. The extended influence is a relative of the PageRank; the influence and the PageRank correspond to continuous-time and discrete-time simple random walks, respectively. The present paper is organized as follows. In Secs. II and III, we review previous works on the influence for strongly connected and general directed networks, respectively. In Sec. IV, we present new interpretations of the centrality measure introduced in Sec. III. In Sec. V, we extend the concept of the influence by borrowing the idea used in the PageRank to introduce some global connectivity to the original network. We also show that the proposed influence can be interpreted in terms of dynamical properties of nodes on the original network without additional global connectivity. In Secs. VI and VII, we apply the influence to toy examples and relatively large networks, respectively. In Sec. VIII, we summarize and discuss our results, with an emphasis on the comparison of the influence and the PageRank.
II Influence for networks with a single zero Laplacian eigenvalue
Consider a directed and weighted network having $N$ nodes. The weight of the link from node $i$ to node $j$ is denoted by $w_{ij}$ and assumed to be nonnegative. $w_{ij}$ represents the strength with which node $i$ governs node $j$. $w_{ij}$ and $w_{ji}$ are generally different from each other.
The Laplacian-based centrality measure, called the influence of node $i$ and denoted by $v_i$, is defined as the solution of the following set of linear equations:

$$v_i = \frac{\sum_{j=1}^{N} w_{ij}\,v_j}{\sum_{j=1}^{N} w_{ji}} \quad (1 \le i \le N). \tag{1}$$

The normalization is given by $\sum_{i=1}^{N} v_i = 1$. We can rewrite Eq. (1) as

$$\mathbf{v}L = \mathbf{0}, \tag{2}$$

where $\mathbf{v} = (v_1, \ldots, v_N)$ is a row vector and $L$ is the asymmetric Laplacian defined by

$$L_{ij} = \delta_{ij}\sum_{k=1}^{N} w_{ki} - w_{ji}. \tag{3}$$

$v_i$ represents the importance of node $i$ in various dynamics on networks, such as the voter model, a random walk, DeGroot's model of consensus formation, and the response of synchronized networks Masuda et al. (2009a).
If a network is strongly connected, that is, if any node can be reached from an arbitrary node along directed links, the Perron-Frobenius theorem guarantees that $\mathbf{v}$ is unique and $v_i > 0$ ($1 \le i \le N$). In particular, for undirected networks (i.e., $w_{ij} = w_{ji}$), which are strongly connected as long as they are connected, we have $v_i = 1/N$. Therefore, the influence is a centrality measure that is relevant only in directed networks.
To discuss the uniqueness of the zero eigenvector of $L$, we use the concept of the root node Agaev and Chebotarev (2000). Consider a set $R$ of $N_{\rm r}$ nodes in a given network ($1 \le N_{\rm r} \le N$). We define $R$ to be a set of root nodes if an arbitrary node can be reached along directed links from a node included in $R$ and $R$ is minimal. In the example shown in Fig. 1, different sets of nodes qualify as $R$, whereas a set that reaches all the nodes but contains a redundant node does not qualify because it is not minimal. The minimality indicates that some nodes cannot be reached from $R \setminus \{i\}$, where $R \setminus \{i\}$ is the set of $N_{\rm r} - 1$ nodes obtained by removing an arbitrary node $i$ from $R$. For strongly connected networks, for which $N_{\rm r} = 1$, $R$ can be a set of any single node. As this exercise suggests, $R$ is generally not unique for a given network. However, $N_{\rm r}$ is uniquely determined from a network Agaev and Chebotarev (2000). The directed chain shown in Fig. 2 is a network that is not strongly connected but has $N_{\rm r} = 1$. In Fig. 2, we obtain $v_1 = 1$ for the unique root node 1 and $v_i = 0$ ($2 \le i \le N$).
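The definition of $R$ and $N_{\rm r}$ can be checked mechanically on small networks. The following Python sketch is an illustration under our own assumptions (nodes labelled 0, ..., N-1, and links given as hypothetical `(i, j)` pairs meaning a link from node i to node j). It finds the uppermost SCCs by brute-force reachability; choosing one node from each returned component gives a valid $R$, and the number of components gives $N_{\rm r}$.

```python
from collections import deque


def reachable_from(adj, s):
    """Return the set of nodes reachable from s along directed links (s included)."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def uppermost_sccs(n, links):
    """Return the uppermost SCCs (as sorted lists) of a directed graph on nodes 0..n-1.

    links: iterable of (i, j) pairs, each meaning a link from node i to node j.
    """
    adj = [[] for _ in range(n)]
    for i, j in links:
        adj[i].append(j)
    reach = [reachable_from(adj, s) for s in range(n)]
    # Group nodes into SCCs: t is in the SCC of s iff s and t reach each other.
    sccs, assigned = [], set()
    for s in range(n):
        if s not in assigned:
            comp = {t for t in reach[s] if s in reach[t]}
            assigned |= comp
            sccs.append(comp)
    # An SCC is uppermost if no node outside it can reach it.
    upper = []
    for comp in sccs:
        rep = min(comp)
        if all(rep not in reach[t] for t in range(n) if t not in comp):
            upper.append(sorted(comp))
    return upper


# Two source nodes (0 and 1) feeding node 2: N_r = 2 and R = {0, 1}.
print(uppermost_sccs(3, [(0, 2), (1, 2)]))  # [[0], [1]]
```

For a directed chain such as `[(0, 1), (1, 2), (2, 3)]` the function returns `[[0]]`, i.e., $N_{\rm r} = 1$ even though the chain is not strongly connected.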
The multiplicity of the zero eigenvalue of $L$, also called the geometric multiplicity of the eigenvalue Golub and Loan (1996); Horn and Johnson (1985), is equal to $N_{\rm r}$ Abramson (1964); Ermentrout (1992); Agaev and Chebotarev (2000). Therefore, the influence given by Eq. (2) is well defined only for networks with $N_{\rm r} = 1$, and most previous papers that treat Eq. (2) concentrate on strongly connected networks Daniels69Biom-Berman80SIAM ; Moon and Pullman (1970); Biggs (1997); Masuda and Ohtsuki (2009); Masuda et al. (2009a, b); Klemm et al. (2010). In this case, $\mathbf{v}$ can be readily calculated by the power iteration or the enumeration of directed spanning trees Masuda et al. (2009a, b).
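For $N_{\rm r} = 1$, Eqs. (1)-(3) reduce to a small linear system. The following minimal NumPy sketch assumes that `W[i, j]` stores $w_{ij}$ (the weight of the link from node i to node j, with 0-based indices) and that the network has a single zero Laplacian eigenvalue. One redundant row of $L^{\top}\mathbf{v}^{\top} = \mathbf{0}$ is replaced by the normalization $\sum_i v_i = 1$. The function names are ours, not taken from the cited works.

```python
import numpy as np


def laplacian(W):
    """Asymmetric Laplacian of Eq. (3): L_ij = delta_ij * sum_k w_ki - w_ji.

    W[i, j] is the weight w_ij of the link from node i to node j (0-based).
    """
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=0)) - W.T


def influence(W):
    """Solve v L = 0 with sum(v) = 1, assuming a single zero eigenvalue (N_r = 1)."""
    L = laplacian(W)
    n = L.shape[0]
    # v L = 0 is L^T v^T = 0; the rows of L^T add up to the zero vector, so one row
    # is redundant and can be replaced by the normalization sum_i v_i = 1.
    A = L.T.copy()
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


# Directed 3-cycle with w_01 = 2 and w_12 = w_20 = 1 (strongly connected).
W = np.array([[0, 2, 0], [0, 0, 1], [1, 0, 0]])
print(influence(W))  # approximately [0.4, 0.2, 0.4]
```

For a two-node network, the same function gives $v_1 = w_{12}/(w_{12} + w_{21})$, which can be checked directly against Eq. (1).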
III Case of multiple zero Laplacian eigenvalues
In this section, we treat networks with multiple zero Laplacian eigenvalues. Such a network is not strongly connected. The influence explained in Sec. II was extended to accommodate this case by Agaev–Chebotarev Agaev and Chebotarev (2000); Chebotarev and Agaev (2005) and Borm et al. Borm (2002). We develop a new centrality measure in Sec. V by generalizing their definitions. In this section, we explain their centrality measure and examine its properties.
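Consider a continuous-time simple random walk on the network generated by reversing the direction of all the links of the original network. We select each node $i$ ($1 \le i \le N$) as the initial location of the random walker with probability $1/N$. For directed networks that are not necessarily strongly connected, Agaev–Chebotarev Agaev and Chebotarev (2000); Chebotarev and Agaev (2005) and Borm et al. Borm (2002) defined a centrality measure, which we call the influence and denote by $v_i$ without ambiguity, as the probability that the walker is located at node $i$ in the limit $t \to \infty$. For a strongly connected network, $v_i$ is equal to the stationary density of the random walk and coincides with $v_i$ defined by Eq. (2) Masuda et al. (2009a); Masuda and Ohtsuki (2009). For a network with a single root node $i_0$ that receives no links, such as the directed chain in Fig. 2, node $i_0$ is the unique absorbing boundary, and any random walker is eventually trapped at node $i_0$. Therefore, $v_{i_0} = 1$ and $v_i = 0$ ($i \neq i_0$), which is again consistent with Eq. (2) Masuda et al. (2009a).
Because the generator of the continuous-time random walk is equal to $-L$, we obtain

$$\mathbf{v} = \lim_{t\to\infty}\frac{1}{N}\mathbf{1}^{\top}e^{-Lt}, \tag{4}$$

where $\mathbf{1} = (1, \ldots, 1)^{\top}$ is the $N$-dimensional column vector whose elements are all equal to unity.
The spectral decomposition of $e^{-Lt}$ yields

$$e^{-Lt} = \sum_{\ell=1}^{N_{\rm r}}\mathbf{u}^{(\ell)}\mathbf{v}^{(\ell)} + e^{-\lambda_2 t}\,\mathbf{u}^{(\lambda_2)}\mathbf{v}^{(\lambda_2)} + \cdots, \tag{5}$$

where the row vectors $\mathbf{v}^{(\ell)}$ and the column vectors $\mathbf{u}^{(\ell)}$ ($1 \le \ell \le N_{\rm r}$) are the zero left and right eigenvectors of $L$, respectively. Without loss of generality, we assume that they are normalized and orthogonalized such that

$$\mathbf{v}^{(\ell)}\mathbf{u}^{(\ell')} = \delta_{\ell\ell'} \quad (1 \le \ell, \ell' \le N_{\rm r}), \tag{6}$$

$$\sum_{i=1}^{N} v_i^{(\ell)} = 1 \quad (1 \le \ell \le N_{\rm r}). \tag{7}$$

$\lambda_2$ is the spectral gap (i.e., the smallest positive real part among the eigenvalues of $L$). $\mathbf{v}^{(\lambda_2)}$ and $\mathbf{u}^{(\lambda_2)}$ are a pair of left and right eigenvectors of $L$ corresponding to $\lambda_2$, normalized such that $\mathbf{v}^{(\lambda_2)}\mathbf{u}^{(\lambda_2)} = 1$. Other modes, which decay at least as fast as $e^{-\lambda_2 t}$, are omitted in Eq. (5). Note that Eq. (5) is also valid when $L$ is not diagonalizable and has a nondiagonal Jordan normal form. The combination of Eqs. (4) and (5) leads to

$$\mathbf{v} = \frac{1}{N}\sum_{\ell=1}^{N_{\rm r}}\left(\mathbf{1}^{\top}\mathbf{u}^{(\ell)}\right)\mathbf{v}^{(\ell)}. \tag{8}$$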
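To gain insights into Eq. (8), we consider the decomposition of directed networks into strongly connected components (SCCs). We define an uppermost SCC as an SCC that is not downstream to any other SCC along directed links. The number of uppermost SCCs in a given network is equal to $N_{\rm r}$ Agaev and Chebotarev (2000). The choice of $R$, the set of root nodes, is unique up to the arbitrariness of the choice of a node in each uppermost SCC. This is consistent with the fact that the set of any single node qualifies as $R$ in a strongly connected network. By ordering the nodes according to the SCCs, we can write $L$ in the block lower triangular form

$$L = \begin{pmatrix} L_{11} & 0 & \cdots & 0 \\ L_{21} & L_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{N_{\rm SCC}1} & L_{N_{\rm SCC}2} & \cdots & L_{N_{\rm SCC}N_{\rm SCC}} \end{pmatrix}. \tag{9}$$

The diagonal block $L_{\ell\ell}$ ($1 \le \ell \le N_{\rm SCC}$), where $N_{\rm SCC}$ is the number of SCCs, corresponds to the $\ell$th SCC. We denote the number of nodes in the $\ell$th SCC by $N_\ell$. Then, $L_{\ell\ell}$ is an $N_\ell \times N_\ell$ matrix and $\sum_{\ell=1}^{N_{\rm SCC}} N_\ell = N$. The lower triangular nature of Eq. (9) implies that the SCCs are ordered such that links may exist from a node in the $\ell$th SCC to a node in the $\ell'$th SCC only when $\ell < \ell'$.
Because $N_{\rm r}$ out of the $N_{\rm SCC}$ SCCs do not receive links from other SCCs, the uppermost SCCs occupy the first $N_{\rm r}$ rows of blocks in Eq. (9), and we obtain

$$L_{\ell\ell'} = 0 \quad (1 \le \ell \le N_{\rm r},\ 1 \le \ell' \le \ell - 1). \tag{10}$$

Equation (9) constrained by Eq. (10) is called the Frobenius normal form (Bapat and Raghavan, 1997, p. 38). In addition, $L_{\ell\ell}$ ($1 \le \ell \le N_{\rm r}$) is the Laplacian matrix of the $\ell$th SCC, which has a single zero eigenvalue. The eigenequations for this submatrix are represented by

$$\tilde{\mathbf{v}}^{(\ell)}L_{\ell\ell} = \mathbf{0}, \qquad L_{\ell\ell}\mathbf{1}_{N_\ell} = \mathbf{0}, \tag{11}$$

where $\tilde{\mathbf{v}}^{(\ell)}$ is the $N_\ell$-dimensional row vector normalized as $\sum_i \tilde{v}_i^{(\ell)} = 1$, and $\mathbf{1}_{N_\ell}$ is the $N_\ell$-dimensional column vector whose elements are all equal to unity.
It is easy Agaev and Chebotarev (2000) to verify that the left zero eigenvectors of $L$ are given by

$$\mathbf{v}^{(\ell)} = \big(\underbrace{0, \ldots, 0}_{N_1 + \cdots + N_{\ell-1}},\ \tilde{\mathbf{v}}^{(\ell)},\ 0, \ldots, 0\big) \quad (1 \le \ell \le N_{\rm r}). \tag{12}$$

To satisfy the normalization condition given by Eq. (6) and the first $N_1 + \cdots + N_{N_{\rm r}}$ rows of $L\mathbf{u}^{(\ell)} = \mathbf{0}$, we should take the elements of $\mathbf{u}^{(\ell)}$ to be unity for the nodes in the $\ell$th uppermost SCC and zero for the nodes in the other uppermost SCCs, i.e.,

$$\mathbf{u}^{(\ell)} = \big(\underbrace{0, \ldots, 0}_{N_1 + \cdots + N_{\ell-1}},\ \mathbf{1}_{N_\ell}^{\top},\ \underbrace{0, \ldots, 0}_{N_{\ell+1} + \cdots + N_{N_{\rm r}}},\ (\hat{\mathbf{u}}^{(\ell)})^{\top}\big)^{\top}, \tag{13}$$

where $\hat{\mathbf{u}}^{(\ell)}$ ($1 \le \ell \le N_{\rm r}$) is the $(N - \sum_{\ell'=1}^{N_{\rm r}} N_{\ell'})$-dimensional column vector determined by the remaining rows of $L\mathbf{u}^{(\ell)} = \mathbf{0}$:

$$L_{mm}\mathbf{u}_m^{(\ell)} = -\sum_{m'=1}^{m-1}L_{mm'}\mathbf{u}_{m'}^{(\ell)} \quad (N_{\rm r}+1 \le m \le N_{\rm SCC}). \tag{14}$$

Here, $\mathbf{u}_m^{(\ell)}$ denotes the block of $\mathbf{u}^{(\ell)}$ corresponding to the $m$th SCC, such that $\hat{\mathbf{u}}^{(\ell)}$ consists of $\mathbf{u}_{N_{\rm r}+1}^{(\ell)}, \ldots, \mathbf{u}_{N_{\rm SCC}}^{(\ell)}$.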
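To show that Eq. (14) has a unique nonnegative solution, we decompose the diagonal block $L_{mm}$ ($N_{\rm r}+1 \le m \le N_{\rm SCC}$) as

$$L_{mm} = \tilde{L}_{mm} + D_m, \tag{15}$$

where $\tilde{L}_{mm}$ is the Laplacian of the $m$th SCC and $D_m$ is the diagonal matrix whose $i$th element is equal to the total weight of incoming links from SCCs $1, \ldots, m-1$ to the $i$th node in the $m$th SCC. Because the $m$th SCC is not uppermost, at least one diagonal element of $D_m$ is positive, and Eq. (15) implies that $L_{mm}$ is irreducibly diagonally dominant. Therefore, by applying the Jacobi or Gauss-Seidel iteration to the $N_{N_{\rm r}+1}$ rows of Eq. (14) with $m = N_{\rm r}+1$, we can uniquely calculate $\mathbf{u}_{N_{\rm r}+1}^{(\ell)}$, the block of $\mathbf{u}^{(\ell)}$ for the $(N_{\rm r}+1)$th SCC Golub and Loan (1996). Furthermore, $L_{mm}$ is a nonsingular M-matrix Berman and Plemmons (1979). Because all the elements of the off-diagonal blocks $L_{mm'}$ ($m' < m$) are nonpositive and the elements of $\mathbf{u}^{(\ell)}$ determined so far are nonnegative, the right-hand side of Eq. (14) is nonnegative, and all the elements of $\mathbf{u}_{N_{\rm r}+1}^{(\ell)}$ are guaranteed to be nonnegative (Berman and Plemmons, 1979, p. 136). By substituting the obtained $\mathbf{u}_{N_{\rm r}+1}^{(\ell)}$ in Eq. (14) and applying the Jacobi or Gauss-Seidel iteration to the next $N_{N_{\rm r}+2}$ rows, we can uniquely determine $\mathbf{u}_{N_{\rm r}+2}^{(\ell)}$. By repeating the same procedure, we can successively determine $\hat{\mathbf{u}}^{(\ell)}$, whose elements are unique and nonnegative.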
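The projection of the unit matrix onto the eigenspaces of $L$ yields

$$I = \sum_{\ell=1}^{N_{\rm r}}\mathbf{u}^{(\ell)}\mathbf{v}^{(\ell)} + \sum_{\lambda \neq 0}P_\lambda, \tag{16}$$

where $I$ is the unit matrix and $P_\lambda$ is the spectral projector onto the generalized eigenspace of $L$ associated with a nonzero eigenvalue $\lambda$. Note that Eq. (16) is valid even if $L$ is not diagonalizable. By multiplying the $N$-dimensional column vector $\mathbf{1} = (1, \ldots, 1)^{\top}$, a zero right eigenvector of $L$, from the right to both sides of Eq. (16) and using Eqs. (6) and (7), we obtain

$$\sum_{\ell=1}^{N_{\rm r}}\mathbf{u}^{(\ell)} = \mathbf{1}. \tag{17}$$

Equation (17) implies $\sum_{\ell=1}^{N_{\rm r}} u_i^{(\ell)} = 1$ ($1 \le i \le N$), where $u_i^{(\ell)}$ is the $i$th element of $\mathbf{u}^{(\ell)}$ and represents the probability that the random walk starting from node $i$ is trapped by the $\ell$th uppermost SCC. $u_i^{(\ell)}$ can be interpreted as the magnitude of the influence that the $\ell$th SCC exerts on node $i$. Note that $u_i^{(\ell)} = 0$ if node $i$ cannot be reached from the $\ell$th uppermost SCC along directed links in the original network.
By substituting Eq. (12) in Eq. (8), we obtain

$$v_i = v_i^{(\ell)} \times \frac{1}{N}\sum_{j=1}^{N}u_j^{(\ell)} \tag{18}$$

for node $i$ that belongs to the $\ell$th uppermost SCC, where $v_i^{(\ell)}$ is the $i$th element of $\mathbf{v}^{(\ell)}$. For these nodes, $v_i > 0$ is satisfied. For nodes that do not belong to an uppermost SCC, we obtain $v_i = 0$. Equation (17) guarantees that $\sum_{i=1}^{N} v_i = 1$. Equation (18) generalizes the definition for strongly connected networks given by Eq. (1). We interpret the right-hand side of Eq. (18) as the product of the influence of node $i$ within the $\ell$th SCC (i.e., $v_i^{(\ell)}$) and the relative influence of the $\ell$th SCC in the entire network (i.e., $\sum_{j=1}^{N} u_j^{(\ell)}/N$).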
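Equation (18) can be evaluated directly once the uppermost SCCs are known. The sketch below is our own illustration (NumPy; helper names are hypothetical; `W[i, j]` stores $w_{ij}$). It finds SCCs through a Warshall transitive closure, computes $\tilde{\mathbf{v}}^{(\ell)}$ from each uppermost diagonal block, obtains the downstream part of $\mathbf{u}^{(\ell)}$ from Eq. (14) with a single dense linear solve (instead of the block-by-block Jacobi or Gauss-Seidel sweeps described above), and then applies Eq. (18). It is intended for small illustrative networks only.

```python
import numpy as np


def laplacian(W):
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=0)) - W.T  # Eq. (3)


def reach_matrix(W):
    """R[i, j] is True if node j can be reached from node i (each node reaches itself)."""
    R = (np.asarray(W) > 0) | np.eye(len(W), dtype=bool)
    for k in range(len(W)):  # Warshall's transitive closure
        R = R | (R[:, [k]] & R[[k], :])
    return R


def influence_general(W):
    """Influence of Eq. (18) for a directed network with any number of root nodes."""
    L = laplacian(W)
    n = L.shape[0]
    R = reach_matrix(W)
    same_scc = R & R.T
    upper, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        comp = np.flatnonzero(same_scc[i])
        seen.update(comp.tolist())
        # Uppermost SCC: the only nodes that can reach it are its own members.
        if np.array_equal(np.flatnonzero(R[:, i]), comp):
            upper.append(comp)
    U = np.concatenate(upper)          # nodes in uppermost SCCs
    D = np.setdiff1d(np.arange(n), U)  # nodes in downstream SCCs
    v = np.zeros(n)
    for comp in upper:
        # Left zero eigenvector of the diagonal block, normalized to sum 1 (Eq. 11).
        A = L[np.ix_(comp, comp)].T.copy()
        A[-1, :] = 1.0
        b = np.zeros(len(comp))
        b[-1] = 1.0
        v_tilde = np.linalg.solve(A, b)
        # u^(l): 1 on this SCC, 0 on the other uppermost SCCs (Eq. 13);
        # the downstream entries follow from the remaining rows of L u = 0 (Eq. 14).
        u = np.zeros(n)
        u[comp] = 1.0
        if D.size > 0:
            u[D] = np.linalg.solve(L[np.ix_(D, D)], -L[np.ix_(D, U)] @ u[U])
        v[comp] = v_tilde * u.sum() / n  # Eq. (18)
    return v


# Nodes 0 and 1 are root nodes that each send a unit-weight link to node 2.
W = np.zeros((3, 3))
W[0, 2] = W[1, 2] = 1.0
print(influence_general(W))  # approximately [0.5, 0.5, 0.0]
```

In this example, a walker starting at node 2 on the link-reversed network is trapped by node 0 or node 1 with probability 1/2 each, so each root node receives $(1 + 1/2)/3 = 1/2$.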
For pedagogical purposes, the calculations of $\mathbf{v}$ for two toy networks with multiple root nodes are presented in the Appendix.
IV Interpretation of influence for networks with multiple root nodes
Borm and colleagues defined the Laplacian-based centrality measure on the basis of the continuous-time simple random walk on networks. In this section, we further motivate this definition by showing that $v_i$ given by Eq. (18) has other interpretations, as is the case for $v_i$ formulated for strongly connected networks Masuda et al. (2009a).
IV.1 Collective responses in the DeGroot model of consensus formation
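The DeGroot model represents dynamical opinion formation in a population of interacting individuals Degroot74-Friedkin91 . The dynamics of the continuous-time version of the DeGroot model Olfati-Saber et al. (2007), also known as Abelson's model Abramson (1964), are defined by

$$\frac{d\mathbf{x}}{dt} = -L\mathbf{x}, \tag{19}$$

where $\mathbf{x}(t) = (x_1(t), \ldots, x_N(t))^{\top}$ represents the time-dependent opinion vector. In components, Eq. (19) reads $dx_i/dt = \sum_{j=1}^{N} w_{ji}(x_j - x_i)$, so that each node moves its opinion toward those of the nodes that send links to it. For networks with $N_{\rm r} = 1$, including strongly connected networks, consensus, i.e., synchrony, is asymptotically reached. In this case, the final synchronized opinion is given by $\mathbf{v}\mathbf{x}(0) = \sum_{i=1}^{N} v_i x_i(0)$, because $\mathbf{v}L = \mathbf{0}$ implies that $\mathbf{v}\mathbf{x}(t)$ is conserved. Therefore, $v_i$ is equal to the fraction of the initial opinion at node $i$ reflected in the final opinion of the entire network Degroot74-Friedkin91 ; Olfati-Saber et al. (2007); Kori et al. (2009); Masuda et al. (2009a).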
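When $N_{\rm r} \ge 2$, synchrony is neutrally but not asymptotically stable. Therefore, the consensus of the entire network is not generally reached from a general initial condition. The final opinion vector is given by

$$\lim_{t\to\infty}\mathbf{x}(t) = \sum_{\ell=1}^{N_{\rm r}}\mathbf{u}^{(\ell)}\mathbf{v}^{(\ell)}\mathbf{x}(0). \tag{20}$$

If we set $\mathbf{x}(0) = \mathbf{e}_i$ ($1 \le i \le N$) to introduce a different opinion of unit strength at node $i$ to the initial all-0 consensus state, the average response of the nodes induced by the different opinion at node $i$ is equal to

$$\frac{1}{N}\mathbf{1}^{\top}\lim_{t\to\infty}\mathbf{x}(t) = \frac{1}{N}\sum_{\ell=1}^{N_{\rm r}}\left(\mathbf{1}^{\top}\mathbf{u}^{(\ell)}\right)v_i^{(\ell)}, \tag{21}$$

where $\mathbf{e}_i$ is the $N$-dimensional unit column vector such that the $i$th element is equal to 1 and the other elements are equal to 0. Because Eq. (21) coincides with Eq. (18), the amount of the initial opinion of node $i$ reflected in the final opinion of the entire network is given by Eq. (18).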
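The DeGroot interpretation can be checked numerically by integrating Eq. (19). This is a rough forward-Euler sketch under our own assumptions (the step size, the number of steps, the function names, and the convention that `W[i, j]` stores $w_{ij}$). With $\mathbf{x}(0) = \mathbf{e}_i$, the mean final opinion should approach the value given by Eq. (21), i.e., $v_i$ of Eq. (18).

```python
import numpy as np


def laplacian(W):
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=0)) - W.T  # Eq. (3)


def degroot_final(W, x0, dt=0.01, steps=20000):
    """Integrate dx/dt = -L x (Eq. 19) with forward Euler; return x(steps * dt)."""
    L = laplacian(W)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = x - dt * (L @ x)
    return x


def average_response(W, i, **kwargs):
    """Mean final opinion after a unit opinion is placed at node i (left side of Eq. 21)."""
    n = len(W)
    return degroot_final(W, np.eye(n)[i], **kwargs).mean()


# Two root nodes (0 and 1) each send a unit-weight link to node 2 (N_r = 2).
W = np.zeros((3, 3))
W[0, 2] = W[1, 2] = 1.0
print([round(average_response(W, i), 4) for i in range(3)])  # close to [0.5, 0.5, 0.0]
```

Because $\mathbf{v}(I - \Delta t\,L) = \mathbf{v}$, the Euler scheme conserves $\mathbf{v}\mathbf{x}$ exactly (up to rounding) for $N_{\rm r} = 1$, so the final consensus matches $\mathbf{v}\mathbf{x}(0)$ once transients have decayed.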
IV.2 Stationary density of voter model
The so-called link dynamics is a stochastic interacting particle system on networks in which each node takes one of the two opinions $A$ and $B$ Antal06prl-Sood08pre . In each time step, one link is selected from the network with a probability proportional to the weight of the link. Then, the state of the source node of the link replaces that of the target node of the link if their states are different. Note that opinions $A$ and $B$ are equally strong in the dynamics. The dynamics halt when $A$ or $B$ takes over the entire network. The fixation probability of node $i$ is defined as the probability that $A$ takes over the network when the initial configuration is such that node $i$ takes $A$ and the other nodes take $B$. When $N_{\rm r} = 1$, $v_i$ is equal to the fixation probability of node $i$ Masuda and Ohtsuki (2009); Masuda et al. (2009a).
When $N_{\rm r} \ge 2$, the fixation of $A$ introduced at node $i$ never occurs. If node $i$ is located in a downstream SCC, $A$ eventually vanishes because $B$ in the uppermost SCCs is permanent and replaces $A$ in the downstream SCCs. If node $i$ is located in an uppermost SCC, this SCC ends up being entirely occupied by $A$ with a positive probability. However, the other uppermost SCCs are permanently occupied by $B$, such that consensus is never reached.
In this situation, consider the expected fraction of $A$ in the network in the stationary state when we start from the initial configuration with a single $A$ at node $i$. The probability that $A$ takes over the $\ell$th uppermost SCC to which node $i$ belongs is equal to $v_i^{(\ell)}$. Under the condition that the $\ell$th uppermost SCC is entirely occupied by $A$ and the other uppermost SCCs are entirely occupied by $B$, the master equation for the probability $p_j$ that node $j$ in a downstream SCC is occupied by $A$ is given by

$$\frac{dp_j}{dt} = \sum_{k=1}^{N}w_{kj}\left(p_k - p_j\right), \tag{22}$$

where time is measured in units such that each link is selected at a rate equal to its weight, and we set $p_k = 1$ when node $k$ belongs to the $\ell$th uppermost SCC and $p_k = 0$ when node $k$ belongs to one of the other uppermost SCCs. The stationary solution of Eq. (22) satisfies the rows of $L\mathbf{p} = \mathbf{0}$ for the downstream nodes, i.e., Eq. (14), such that $p_j = u_j^{(\ell)}$. Therefore, the expected fraction of $A$ in the stationary state is $v_i^{(\ell)}\sum_{j=1}^{N}u_j^{(\ell)}/N$, which coincides with Eq. (18).
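The fixation-probability interpretation for $N_{\rm r} = 1$ can be illustrated by a Monte Carlo simulation of the link dynamics. This is a sketch under stated assumptions: links are hypothetical `(source, target, weight)` triples, the random seed and the number of runs are arbitrary choices, and the network must have $N_{\rm r} = 1$ (otherwise fixation never occurs and the loop would not terminate).

```python
import itertools
import random


def fixation_probability(links, n, i, runs=10000, seed=1):
    """Monte Carlo estimate of the probability that opinion A, initially held only by
    node i, takes over the whole network under link dynamics.

    links: list of (source, target, weight) triples with positive weights.
    Assumes N_r = 1; with several uppermost SCCs fixation never occurs.
    """
    rng = random.Random(seed)
    cum_weights = list(itertools.accumulate(w for _, _, w in links))
    fixed = 0
    for _ in range(runs):
        is_a = [False] * n  # True: opinion A, False: opinion B
        is_a[i] = True
        n_a = 1
        while 0 < n_a < n:
            # Select one link with probability proportional to its weight.
            src, tgt, _ = rng.choices(links, cum_weights=cum_weights)[0]
            if is_a[src] != is_a[tgt]:
                # The opinion of the source node replaces that of the target node.
                is_a[tgt] = is_a[src]
                n_a += 1 if is_a[src] else -1
        fixed += n_a == n
    return fixed / runs


# Directed 3-cycle with w_01 = 2, w_12 = w_20 = 1; Eq. (1) gives v = (0.4, 0.2, 0.4).
links = [(0, 1, 2.0), (1, 2, 1.0), (2, 0, 1.0)]
print([fixation_probability(links, 3, i) for i in range(3)])
```

The estimates should agree with $\mathbf{v}$ within sampling error (a standard error of roughly 0.005 for 10000 runs), because $\sum_j v_j\,[\text{node } j \text{ holds } A]$ is a martingale of the link dynamics.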
IV.3 Enumeration of spanning trees
When $N_{\rm r} = 1$, the matrix-tree theorem implies that $v_i$ is proportional to the sum of the weights of all the possible directed spanning trees rooted at node $i$ Biggs (1997); Agaev and Chebotarev (2000); Masuda et al. (2009a). The weight of a spanning tree is defined as the product of the weights of all the links included in the spanning tree.
The Markov chain tree theorem extends this result to the case $N_{\rm r} \ge 2$ Leighton and Rivest (1986). According to this theorem, for general directed networks, the $(j, i)$ element of $\lim_{t\to\infty}e^{-Lt}$, i.e., the probability that the random walker on the link-reversed network starting from node $j$ is found at node $i$ in the long run, is proportional to the sum of the weights of all the arborescences such that node $i$ is a root node of the arborescence and node $j$ belongs to the tree emanating from node $i$. An arborescence is a subgraph of the original network with $N$ nodes such that the indegree of each node restricted to the arborescence is at most one, it has no cycles, and it contains the maximal number of links. The nodes whose indegrees are zero within the arborescence are called the root nodes of the arborescence. They form $R$, such that the concept of the root node for the arborescence and that for the network Laplacian Agaev and Chebotarev (2000) are identical. Therefore, the number of links in an arborescence is equal to $N - N_{\rm r}$, and the arborescence is composed of $N_{\rm r}$ disconnected directed trees, each of which emanates from a root node. Intuitively, this quantity represents the number of different ways in which node $i$ influences node $j$.
The influence of node $i$ defined by Eq. (18) is proportional to the sum of the modified weights of all the arborescences. The modified weight of an arborescence is defined as the product of the weights of all the links included in the arborescence and the number of nodes included in the directed tree rooted at node $i$ in the arborescence. If node $i$ is not a root of the arborescence, we set the modified weight of this arborescence to zero.
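The arborescence characterization can be verified by brute force on tiny networks. The sketch below is our own illustration (exponential in $N$; `W[i, j]` stores $w_{ij}$). It enumerates every choice of at most one parent per node among its in-neighbors, keeps the acyclic choices with the maximal number of links (the arborescences), and accumulates the modified weights described above.

```python
import itertools

import numpy as np


def influence_by_arborescences(W):
    """Influence from modified arborescence weights, by brute-force enumeration.

    W[i, j] is the weight of the link from node i to node j. Each node picks at most
    one parent among its in-neighbours; acyclic choices with the maximal number of
    links are the arborescences. The cost is exponential: tiny networks only.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    options = [[None] + [k for k in range(n) if k != j and W[k, j] > 0] for j in range(n)]

    def n_links(parent):
        return sum(p is not None for p in parent)

    forests = []
    for parent in itertools.product(*options):
        roots = []
        for j in range(n):
            node, steps = j, 0
            while parent[node] is not None and steps <= n:
                node, steps = parent[node], steps + 1
            if steps > n:  # the parent pointers contain a cycle
                break
            roots.append(node)
        else:
            forests.append((parent, roots))
    max_links = max(n_links(parent) for parent, _ in forests)
    score = np.zeros(n)
    for parent, roots in forests:
        if n_links(parent) < max_links:
            continue
        weight = np.prod([W[parent[j], j] for j in range(n) if parent[j] is not None])
        for r in set(roots):
            # Modified weight: forest weight times the size of the tree rooted at r.
            score[r] += weight * roots.count(r)
    return score / score.sum()


cyc = np.array([[0, 2, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
print(influence_by_arborescences(cyc))  # approximately [0.4, 0.2, 0.4]
```

For a strongly connected network every arborescence is a spanning tree containing all $N$ nodes, so the modified weight is $N$ times the ordinary tree weight and the matrix-tree result for $N_{\rm r} = 1$ is recovered.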
V Influence of nodes in downstream components
V.1 Definition of the extended influence
With the definition given by Eq. (18), nodes that do not belong to any uppermost SCC have $v_i = 0$. In practice, however, we often need to assess the relative importance of different nodes in a downstream SCC and that of nodes in different downstream SCCs. There also arise occasions when we want to compare uninfluential nodes in an uppermost SCC with influential nodes in a downstream SCC.
An extreme situation in which this is the case is realized by the network shown in Fig. 3(a), in which nodes 1 and 2 are connected by links in both directions and node 3 sends a link to node 2. As long as these links have positive weights, node 3 is the unique root node, and we obtain $v_3 = 1$ and $v_1 = v_2 = 0$. However, when the link from node 3 to node 2 is weak and the link from node 2 to node 1 is much weaker than that from node 1 to node 2, node 1 may be regarded as more central than node 3 because node 1 is much more central than node 2 and node 3 only weakly influences the noncentral node 2. To cope with such a situation, we extend the influence to a one-parameter family of centrality measures by adopting the concept behind the definition of the PageRank.
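The PageRank of node $i$, denoted by $p_i$, is defined as the stationary density of the discrete-time simple random walk as follows Brin98-Langville06book :

$$p_i = (1-\alpha)\sum_{j:\,k_j^{\rm out}>0}\frac{w_{ji}}{k_j^{\rm out}}\,p_j + (1-\alpha)\,\delta_i\,p_i + \frac{\alpha}{N}, \tag{23}$$

where $k_j^{\rm out} = \sum_{k=1}^{N} w_{jk}$ is the outdegree of node $j$, $\delta_i = 1$ if $k_i^{\rm out} = 0$, and $\delta_i = 0$ if $k_i^{\rm out} > 0$. The so-called teleportation probability $\alpha$ represents the probability that the random walker jumps from any node to an arbitrary node in one step. The same concept underlies the definition of the centrality based on the adjacency matrix Katz (1953); Newman (2010). According to the second term on the right-hand side of Eq. (23), the random walker stays at a node without outgoing links with probability $1-\alpha$. The introduction of $\alpha$ is necessary for treating networks that are not strongly connected.
The PageRank was originally designed for web graphs. Therefore, receiving links increases $p_i$, which is opposite to the notion underlying the influence, for which sending links increases $v_i$ [see Eq. (1)]. To relate the PageRank to the influence, we consider the PageRank in the network generated by reversing all the links of the original network Masuda et al. (2009a). We denote this quantity for node $i$ by $\tilde{p}_i$, which is determined by

$$\tilde{p}_i = (1-\alpha)\sum_{j:\,k_j^{\rm in}>0}\frac{w_{ij}}{k_j^{\rm in}}\,\tilde{p}_j + (1-\alpha)\,\tilde{\delta}_i\,\tilde{p}_i + \frac{\alpha}{N}, \tag{24}$$

where $k_j^{\rm in} = \sum_{k=1}^{N} w_{kj}$ is the indegree of node $j$, $\tilde{\delta}_i = 1$ if $k_i^{\rm in} = 0$, and $\tilde{\delta}_i = 0$ if $k_i^{\rm in} > 0$.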
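Equations (23) and (24) can be solved by power iteration. The following is a minimal NumPy sketch assuming that `W[i, j]` stores $w_{ij}$ and using the dangling-node rule of Eq. (23) (the walker stays put with probability $1-\alpha$). The tolerance, the iteration cap, and the function names are our own choices.

```python
import numpy as np


def pagerank(W, alpha, tol=1e-12, max_iter=10000):
    """Power iteration for Eq. (23); W[i, j] is the weight of the link from i to j."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    k_out = W.sum(axis=1)
    P = np.zeros_like(W)
    has_out = k_out > 0
    P[has_out] = W[has_out] / k_out[has_out][:, None]  # move along outgoing links
    stay = np.flatnonzero(~has_out)
    P[stay, stay] = 1.0  # a walker at a node without outgoing links stays there
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_new = (1 - alpha) * (p @ P) + alpha / n
        if np.abs(p_new - p).sum() < tol:
            return p_new
        p = p_new
    return p


def reversed_pagerank(W, alpha, **kwargs):
    """PageRank on the link-reversed network, Eq. (24)."""
    return pagerank(np.asarray(W, dtype=float).T, alpha, **kwargs)


# Chain 0 -> 1 -> 2: the reversed PageRank favours the upstream node 0.
chain = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
print(reversed_pagerank(chain, 0.15))  # approximately [0.8575, 0.0925, 0.05]
```

Because each update is a contraction with factor $1-\alpha$ in the $\ell_1$ norm, the iteration converges for any $0 < \alpha \le 1$.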
As explained in Sec. III, the influence corresponds to the continuous-time random walk on the link-reversed network. In a strongly connected network, the influence of each node is equal to the stationary density of the continuous-time random walk on the link-reversed network Masuda and Ohtsuki (2009); Masuda et al. (2009a). As in the definition of the PageRank, let us introduce random global jumps to the continuous-time random walk on the link-reversed network. We do so by assuming that the walker jumps from any node to an arbitrary node with rate $\alpha$. Note that $\alpha$ represents a probability in the PageRank, whereas it is a rate in the influence. In the following, we allow $\alpha$ to exceed unity unless otherwise stated. The destination of the random jump is chosen from all the nodes with equal probability $1/N$. We denote by $v_i(\alpha)$ the stationary density of the modified random walk at node $i$. The normalization is given by $\sum_{i=1}^{N} v_i(\alpha) = 1$. The stationary density is obtained from

$$\sum_{j=1}^{N}w_{ij}\,v_j(\alpha) - \left(\sum_{j=1}^{N}w_{ji}\right)v_i(\alpha) - \alpha\,v_i(\alpha) + \frac{\alpha}{N} = 0 \quad (1 \le i \le N). \tag{25}$$

We define the extended influence by the solution of Eq. (25). We note that the link-reversed version of Eq. (25), with a different structure of the global jump, was proposed as an alternative to the PageRank to be applied to web graphs LiuGao08ACM-LiuLiu10InfRetrieval .
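In the vector notation, Eq. (25) is represented by

$$\mathbf{v}(\alpha)\left(L + \alpha I - \frac{\alpha}{N}E\right) = \mathbf{0}, \tag{26}$$

where $\mathbf{v}(\alpha) = (v_1(\alpha), \ldots, v_N(\alpha))$ and $E$ is the $N \times N$ matrix whose elements are all equal to unity. Because the normalization implies $\mathbf{v}(\alpha)E = \mathbf{1}^{\top} = (1, \ldots, 1)$, Eq. (26) is equivalent to $\mathbf{v}(\alpha)(L + \alpha I) = (\alpha/N)\mathbf{1}^{\top}$. If $\alpha > 0$, $L + \alpha I$ is strictly diagonally dominant, and this equation can be solved by the Jacobi or Gauss-Seidel iteration. A large $\alpha$ guarantees exponentially fast convergence of the iteration Golub and Loan (1996).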
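The Jacobi iteration mentioned above can be written directly in terms of Eq. (25): each sweep sets $v_j \leftarrow (\alpha/N + \sum_i w_{ji} v_i)/(k_j^{\rm in} + \alpha)$. The following minimal NumPy sketch assumes that `W[i, j]` stores $w_{ij}$; the tolerance, the iteration cap, and the function name are our own choices.

```python
import numpy as np


def extended_influence_jacobi(W, alpha, tol=1e-13, max_iter=100000):
    """Jacobi iteration for v(alpha)(L + alpha I) = (alpha / N) 1^T (Eqs. 25 and 26).

    Each sweep applies v_j <- (alpha/N + sum_i w_ji v_i) / (k_j^in + alpha),
    where W[i, j] is the weight of the link from node i to node j.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    k_in = W.sum(axis=0)
    v = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        v_new = (alpha / n + W @ v) / (k_in + alpha)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    raise RuntimeError("Jacobi iteration did not converge")


# Directed chain 0 -> 1 -> 2 -> 3 with unit weights and alpha = 0.5.
chain = np.diag(np.ones(3), k=1)
print(extended_influence_jacobi(chain, 0.5))
```

The error contracts at least by the factor $\max_j k_j^{\rm in}/(k_j^{\rm in}+\alpha)$ per sweep (in a weighted $\ell_1$ norm), which makes explicit why a large $\alpha$ speeds up convergence.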
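We note that, for $\alpha > 0$,

$$\mathbf{v}(\alpha) = \frac{\alpha}{N}\mathbf{1}^{\top}\left(L + \alpha I\right)^{-1} \tag{27}$$

and

$$\left(L + \alpha I\right)^{-1} = \int_0^{\infty}e^{-\alpha t}e^{-Lt}\,dt, \tag{28}$$

where the integral converges because all the eigenvalues of $L$ have nonnegative real parts. These relations lead to

$$\mathbf{v}(\alpha) = \alpha\int_0^{\infty}e^{-\alpha t}\,\frac{1}{N}\mathbf{1}^{\top}e^{-Lt}\,dt. \tag{29}$$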
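In the limit $\alpha \to 0$, Eq. (29) implies that $\mathbf{v}(\alpha) \to \mathbf{v}$, where $\mathbf{v}$ is defined by Eq. (8). Substituting Eq. (5) in Eq. (29) shows that the modes decaying as $e^{-\lambda_2 t}$ contribute to $\mathbf{v}(\alpha)$ with the factor $\alpha/(\alpha + \lambda_2)$. For these terms to be comparable with the zero-mode contribution given by Eq. (8), $\alpha$ must be at least approximately $\lambda_2$. If this is the case, $\mathbf{v}(\alpha)$ can quantitatively differentiate various nodes, including those in downstream components.
For large $\alpha$, expanding $\mathbf{v}(\alpha) = (\alpha/N)\mathbf{1}^{\top}(L + \alpha I)^{-1}$ in powers of $L/\alpha$ yields

$$v_i(\alpha) = \frac{1}{N}\left(1 + \frac{k_i^{\rm out} - k_i^{\rm in}}{\alpha}\right) + O\!\left(\alpha^{-2}\right), \tag{30}$$

where $k_i^{\rm out}$ and $k_i^{\rm in}$ are the outdegree and the indegree of node $i$, respectively. The Taylor expansion is justified when $\alpha > |\lambda_N|$, where $\lambda_N$ is the eigenvalue of $L$ with the largest modulus. If $\alpha$ is large relative to $|\lambda_N|$, the influence is determined by the outdegree and the indegree and is independent of the global structure of networks. Therefore, in practice, $\alpha$ should not be too large as compared to $|\lambda_N|$. This is surprising because a large $\alpha$ implies a strong global connectivity. As a rule of thumb, we recommend setting $\lambda_2 \lesssim \alpha \lesssim |\lambda_N|$. A suitable range of the teleportation probability for the PageRank can also be obtained by applying the same criterion to the PageRank matrix implied in Eq. (23).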
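The large-$\alpha$ expansion and the suggested range $\lambda_2 \lesssim \alpha \lesssim |\lambda_N|$ can be explored numerically. The sketch below (NumPy; function names are hypothetical; `W[i, j]` stores $w_{ij}$) solves $\mathbf{v}(\alpha)(L + \alpha I) = (\alpha/N)\mathbf{1}^{\top}$ directly, evaluates the first-order approximation of Eq. (30), and reports the smallest positive real part and the largest modulus among the eigenvalues of $L$. Using the real part for complex eigenvalues and the numerical zero threshold are our assumptions.

```python
import numpy as np


def laplacian(W):
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=0)) - W.T  # Eq. (3)


def extended_influence(W, alpha):
    """Exact v(alpha) from v(alpha)(L + alpha I) = (alpha / N) 1^T."""
    L = laplacian(W)
    n = L.shape[0]
    return np.linalg.solve((L + alpha * np.eye(n)).T, np.full(n, alpha / n))


def large_alpha_approx(W, alpha):
    """First-order expansion of Eq. (30): (1/N) * (1 + (k_out - k_in) / alpha)."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    return (1.0 + (W.sum(axis=1) - W.sum(axis=0)) / alpha) / n


def suggested_alpha_range(W, zero_tol=1e-9):
    """(lambda_2, |lambda_N|): smallest positive real part and largest modulus of
    the eigenvalues of L (using real parts for lambda_2 is our assumption)."""
    eig = np.linalg.eigvals(laplacian(W))
    return eig.real[eig.real > zero_tol].min(), np.abs(eig).max()


cyc = np.array([[0, 2, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
print(suggested_alpha_range(cyc))  # eigenvalues of L are 0 and 2 +/- i
for alpha in (1.0, 10.0, 100.0):
    err = np.abs(extended_influence(cyc, alpha) - large_alpha_approx(cyc, alpha)).max()
    print(alpha, err)
```

The approximation error shrinks roughly as $\alpha^{-2}$ once $\alpha$ exceeds $|\lambda_N|$, at which point $\mathbf{v}(\alpha)$ carries only local (degree) information.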
V.2 Interpreting the extended influence without regard to global jumps
We have extended the influence by introducing global jumps to the continuous-time random walk on the link-reversed network. However, the meaning of the teleportation term in terms of the dynamical and structural properties of the nodes in the network and its rationale are somewhat vague. We show that the extended influence defined by Eq. (26) allows another interpretation: absorption probability of the random walk on the link-reversed network with a sink attached to each node but without global jumps. A similar interpretation was made for the PageRank in Ref. Avrachenkov2007siam .
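We assume $N$ additional source nodes indexed by $N+1, \ldots, 2N$, and directed links with weight $\alpha$ from node $N+i$ to node $i$ ($1 \le i \le N$) in the original network. The extension of the network shown in Fig. 3(a) is depicted in Fig. 3(b). The extended network has $2N$ nodes. Nodes $N+1, \ldots, 2N$ are the unique root nodes of the extended network. Node $N+i$ forms the $i$th uppermost SCC in the extended network. The multiplicity of the zero Laplacian eigenvalue of the extended network is equal to $N$.
We then reverse all the links and consider the probability that the random walker starting from an arbitrary node among the $2N$ nodes with equal probability is absorbed at node $N+i$. This probability is given by $v^{\rm ext}_{N+i}$, the influence of node $N+i$ in the extended network defined by Eq. (18). Because it is obvious and uninformative that the random walker starting from the auxiliary node $N+i$ is necessarily absorbed at node $N+i$, we would like to exclude this factor. Therefore, we examine the quantity given by

$$2\left(v^{\rm ext}_{N+i} - \frac{1}{2N}\right). \tag{31}$$

The subtraction of $1/(2N)$ in Eq. (31) accounts for the exclusion of the random walker starting from and absorbed at node $N+i$. The multiplicative factor 2 accounts for the fact that we effectively start the random walk from nodes $1, \ldots, N$ with equal probability $1/N$.
Equation (18) implies that the calculation of $v^{\rm ext}_{N+i}$ involves the element of the left zero eigenvector of the extended Laplacian corresponding to the $i$th uppermost SCC. Because this uppermost SCC consists of the single node $N+i$, this element is equal to unity. The calculation of $v^{\rm ext}_{N+i}$ also involves $\mathbf{u}^{(i)}$, the zero right eigenvector of the $2N$-dimensional Laplacian, whose element is unity for node $N+i$, zero for the other added nodes, and $\hat{u}_j^{(i)}$ for original node $j$. $\hat{\mathbf{u}}^{(i)}$ is an $N$-dimensional column vector. By substituting these expressions and Eq. (18) in Eq. (31), the quantity given by Eq. (31) is equal to

$$\frac{1}{N}\sum_{j=1}^{N}\hat{u}_j^{(i)}. \tag{32}$$

We calculate $\hat{\mathbf{u}}^{(i)}$ from

$$\begin{pmatrix} O & O \\ -\alpha I & L + \alpha I \end{pmatrix}\begin{pmatrix} \mathbf{e}_i \\ \hat{\mathbf{u}}^{(i)} \end{pmatrix} = \mathbf{0}, \tag{33}$$

where the first and second block rows correspond to the added nodes $N+1, \ldots, 2N$ and the original nodes $1, \ldots, N$, respectively, $O$ is the $N \times N$ zero matrix, and $L$ is the Laplacian of the original network. Equation (33) is equivalent to

$$\left(L + \alpha I\right)\hat{\mathbf{u}}^{(i)} = \alpha\,\mathbf{e}_i. \tag{34}$$

Because Eq. (26) implies $\mathbf{v}(\alpha) = (\alpha/N)\mathbf{1}^{\top}(L + \alpha I)^{-1}$, Eqs. (32) and (34) show that the quantity given by Eq. (31) is equal to $v_i(\alpha)$.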
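The equivalence between Eq. (31) and $v_i(\alpha)$ can be checked by building the extended network explicitly. In this sketch (NumPy; `W[i, j]` stores $w_{ij}$; 0-based labels, so the added node for original node i has index N + i; the function name is ours), the right zero eigenvector is obtained from the original-node rows of the extended Laplacian, as in Eq. (33), and Eq. (18) is applied to the single-node uppermost SCC $\{N+i\}$.

```python
import numpy as np


def laplacian(W):
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=0)) - W.T  # Eq. (3)


def influence_via_sources(W, alpha):
    """Quantity of Eq. (31) for each original node, via the 2N-node extended network.

    Original nodes are 0..N-1; the added node N + i sends a link of weight alpha to node i.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    W_ext = np.zeros((2 * n, 2 * n))
    W_ext[:n, :n] = W
    W_ext[n + np.arange(n), np.arange(n)] = alpha
    L_ext = laplacian(W_ext)
    orig, added = np.arange(n), n + np.arange(n)
    result = np.empty(n)
    for i in range(n):
        u_added = np.zeros(n)
        u_added[i] = 1.0  # Eq. (13): 1 on node N + i, 0 on the other added nodes
        # Original-node rows of L_ext u = 0 (Eq. 33), i.e., (L + alpha I) u_hat = alpha e_i.
        u_hat = np.linalg.solve(L_ext[np.ix_(orig, orig)],
                                -L_ext[np.ix_(orig, added)] @ u_added)
        v_ext = (1.0 + u_hat.sum()) / (2 * n)  # Eq. (18) for the single-node SCC {N + i}
        result[i] = 2 * (v_ext - 1 / (2 * n))  # Eq. (31)
    return result


cyc = np.array([[0, 2, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
L = laplacian(cyc)
direct = np.linalg.solve((L + 0.5 * np.eye(3)).T, np.full(3, 0.5 / 3))
print(influence_via_sources(cyc, 0.5), direct)  # the two vectors should agree
```

The design mirrors the text: no global jumps are simulated; each original node simply leaks probability into its own sink at rate $\alpha$.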
With this interpretation, we gain an intuitive understanding of why the extended influence is a local quantity when $\alpha$ is large. In this situation, the tendency of the random walker to exit from each node to its sink is strong, and the walker does not travel a long distance before being absorbed. Therefore, it is natural that $v_i(\alpha)$ at large $\alpha$ is efficiently approximated by local quantities of nodes such as the outdegree and the indegree, as discussed using Eq. (30).
VI Toy examples
In this and the next section, we apply the extended influence to various networks.
VI.1 Network with $N = 3$
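Consider the network shown in Fig. 3(a). We are concerned with the situation in which the link from node 1 to node 2 is much stronger than that from node 2 to node 1, such that node 1 is apparently much more central than node 2. If node 3 is absent, Eq. (1) gives $v_1 = w_{12}/(w_{12} + w_{21}) \approx 1$; node 1 is actually much more influential than node 2 Kori et al. (2009); Masuda et al. (2009b). However, regardless of the link weights, node 3 takes all the share of the influence, i.e., $v_3 = 1$, if we use $\mathbf{v}$ defined by Eq. (18).
The extended influence $v_i(\alpha)$ ($i = 1, 2, 3$) is equal to the quantity given by Eq. (32) for the network shown in Fig. 3(b), and it can be obtained in closed form by solving Eq. (34).
For suitable parameter values, $v_1(\alpha)$ exceeds $v_3(\alpha)$. When the link from node 3 to node 2 is weak and $\alpha$ is not too small, we obtain the intuitive result that node 1 is more influential than node 3.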
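The toy example can be explored numerically. The link weights below are hypothetical values chosen only to realize the situation described in the text (a strong link from node 1 to node 2, a weak link from node 2 to node 1, and a very weak link from node 3 to node 2); nodes 1, 2, and 3 are stored at indices 0, 1, and 2, and the function name is ours.

```python
import numpy as np


def extended_influence(W, alpha):
    """v(alpha) from v(alpha)(L + alpha I) = (alpha / N) 1^T; W[i, j] = weight of i -> j."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    L = np.diag(W.sum(axis=0)) - W.T  # Eq. (3)
    return np.linalg.solve((L + alpha * np.eye(n)).T, np.full(n, alpha / n))


# Hypothetical weights: strong link 1 -> 2, weak link 2 -> 1, very weak link 3 -> 2.
# Nodes 1, 2, 3 of Fig. 3(a) are stored at indices 0, 1, 2.
a, b, eps = 1.0, 0.1, 0.01
W = np.zeros((3, 3))
W[0, 1], W[1, 0], W[2, 1] = a, b, eps

for alpha in (1e-6, 1e-2, 1.0):
    print(alpha, extended_influence(W, alpha).round(3))
```

With these illustrative weights, $v_3(\alpha)$ is close to 1 for $\alpha = 10^{-6}$, whereas $v_1(\alpha) > v_3(\alpha)$ at $\alpha = 1$ (roughly 0.48 versus 0.34).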
VI.2 Directed chain
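Consider a directed chain having $N$ nodes defined by $w_{i,i+1} = 1$ ($1 \le i \le N-1$) and $w_{ij} = 0$ (otherwise). The network is schematically shown in Fig. 2. We obtain $v_1 = 1$ and $v_i = 0$ ($2 \le i \le N$) Masuda et al. (2009b). However, nodes with small $i$ are located relatively upstream in the chain and intuitively appear influential as compared to nodes with large $i$. We can calculate $v_i(\alpha)$ either by solving Eq. (34) or by analyzing random walks with traps on the network obtained by reversing all the links shown in Fig. 2. On the link-reversed chain, a walker at node $j \ge 2$ moves to node $j-1$ with rate 1 and exits to its sink with rate $\alpha$, whereas a walker at node 1 can only exit. When the random walker on the link-reversed network starts from node $j$ ($i \le j \le N$), the probability that the walker exits from node $i$ to the absorbing node $N+i$ is equal to

$$\begin{cases} (1+\alpha)^{-(j-1)} & (i = 1), \\ \alpha\,(1+\alpha)^{-(j-i+1)} & (2 \le i \le N). \end{cases} \tag{35}$$

Therefore, by averaging Eq. (35) over the $N$ starting nodes, we obtain

$$v_1(\alpha) = \frac{1+\alpha}{N\alpha}\left[1 - (1+\alpha)^{-N}\right], \qquad v_i(\alpha) = \frac{1}{N}\left[1 - (1+\alpha)^{-(N-i+1)}\right] \quad (2 \le i \le N). \tag{36}$$

We note that $\lim_{\alpha\to 0}v_1(\alpha) = 1$, $\lim_{\alpha\to 0}v_i(\alpha) = 0$ ($2 \le i \le N$), and $v_i(\alpha)$ monotonically decreases with $i$ for any $\alpha > 0$.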
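Equation (36) can be cross-checked against a direct numerical solution of Eq. (34) for the chain. This NumPy sketch assumes unit link weights (the same assumption as in Eq. (36)); nodes are 0-based in the code, so `v[0]` corresponds to node 1, and the function names are ours.

```python
import numpy as np


def chain_closed_form(n, alpha):
    """Eq. (36) for the directed chain 1 -> 2 -> ... -> N with unit weights."""
    v = np.empty(n)
    v[0] = (1 + alpha) / (n * alpha) * (1 - (1 + alpha) ** (-n))
    i = np.arange(2, n + 1)  # 1-based node labels 2..N
    v[1:] = (1 - (1 + alpha) ** (-(n - i + 1))) / n
    return v


def chain_numeric(n, alpha):
    """Same quantity from (L + alpha I) u_hat = alpha e_i (Eq. 34), averaged via Eq. (32)."""
    W = np.zeros((n, n))
    W[np.arange(n - 1), np.arange(1, n)] = 1.0
    L = np.diag(W.sum(axis=0)) - W.T
    # Column i of U is u_hat^(i); v_i(alpha) is the mean of that column.
    U = np.linalg.solve(L + alpha * np.eye(n), alpha * np.eye(n))
    return U.mean(axis=0)


print(chain_closed_form(4, 0.5))
print(chain_numeric(4, 0.5))
```

For $N = 4$ and $\alpha = 0.5$ both functions give $(195/324, 19/108, 5/36, 1/12) \approx (0.602, 0.176, 0.139, 0.083)$, which decreases along the chain as stated above.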
VII Numerical results
In this section, we examine the influence in three directed networks: a random graph, a neural network, and an online social network.
VII.1 Descriptions of networks
We generate a directed random network with $N$ nodes and expected degree $\langle k \rangle$ by connecting each ordered pair of nodes independently with probability $p$. Because $\langle k \rangle$ is relatively small, the generated network is not strongly connected, whereas it is weakly connected, i.e., not divided into disconnected components. The generated network has three root nodes, each of which forms an SCC. The largest SCC contains 94 nodes and is downstream to the three root nodes. We also computed the extremal Laplacian eigenvalues $\lambda_2$ and $|\lambda_N|$ of this network.
We construct a C. elegans neural network with $N = 279$ nodes on the basis of published data Chen06pnas-wormatlas . In this network, there exist two types of links: undirected gap junctions and directed chemical synapses. A pair of neurons can be connected by multiple synapses. We regard this network as a weighted directed network, where the weight of the link from neuron $i$ to neuron $j$ is defined as the sum of the number of gap junctions between $i$ and $j$ and the number of chemical synapses from $i$ to $j$. The network has 2993 links. The largest SCC has 274 nodes Masuda et al. (2009a). Four of the five remaining nodes are located upstream to the largest SCC and form individual SCCs. The other node is located downstream to the largest SCC. We also computed the extremal Laplacian eigenvalues $\lambda_2$ and $|\lambda_N|$ of this network.
The third network that we use is an online social network among students at the University of California, Irvine Panzarasa et al. (2009). This network has $N = 1899$ nodes and 20296 directed and weighted links. We focus on the largest weakly connected component of this network, which contains 1893 nodes and 13835 links. There exist 103 root nodes, each of which forms an SCC. The largest SCC has 1023 nodes and is downstream to these root nodes. We also computed the extremal Laplacian eigenvalues $\lambda_2$ and $|\lambda_N|$ of this network.
VII.2 Analysis of influence in the three networks
The rank plots of the influence $v_i(\alpha)$ for various values of $\alpha$ for the random graph, neural network, and online social network are shown in Figs. 4(a)–(c), respectively. In the figure, the values of $v_i(\alpha)$ are shown in ascending order for each $\alpha$ for clarity.
When $\alpha$ takes the smallest value examined (thickest lines), $\mathbf{v}(\alpha)$ is similar to $\mathbf{v}$ for the three networks. Therefore, the root nodes have exclusively large $v_i(\alpha)$, whereas the other nodes have $v_i(\alpha) \approx 0$. Accordingly, we find a sudden jump in the rank plot for each network. Such a small value of $\alpha$ does not allow us to quantitatively compare the centrality of nodes in downstream components. This is also anticipated from the fact that this value of $\alpha$ is smaller than $\lambda_2$ in each of the three networks. In the other extreme, $v_i(\alpha) \approx 1/N$ is roughly satisfied when $\alpha$ takes the largest value examined (thinnest lines). This is consistent with the fact that this value of $\alpha$ exceeds $|\lambda_N|$ in each of the three networks. In this range of $\alpha$, the influence is not an adequate centrality measure. For intermediate values of $\alpha$, $v_i(\alpha)$ is reasonably dispersed, and nodes that are not roots are also endowed with non-negligible $v_i(\alpha)$. We consider that the influence with intermediate values of $\alpha$ enables us to compare the importance of nodes in downstream SCCs and to quantify the relative importance of nodes in uppermost SCCs and nodes in downstream SCCs.
The influence with intermediate values of $\alpha$ is distinct from an interpolation between the influence for $\alpha \to 0$ (i.e., $\mathbf{v}$) and that for $\alpha \to \infty$ (i.e., $v_i = 1/N$). The order of the nodes in terms of the value of $v_i(\alpha)$ drastically changes as $\alpha$ varies. To demonstrate this, we examine the dependence of $v_i(\alpha)$ on $\alpha$ for some selected nodes.
For the random graph, we select the three root nodes, for which is the largest at , and the three nodes whose is the largest at . The dependence of on for the six nodes is shown in Fig. 5(a). The three root nodes [solid lines in Fig. 5(a)] and the three nodes with the largest at (dashed lines) do not overlap. In particular, the root node with the third largest for does not have large when is larger than approximately 1. Although the indegree of this root node is zero, the links from this root node presumably point to nodes with small influence in the largest SCC. This phenomenon is essentially the same as that shown in Fig. 3.
The neural network has four root nodes. The dependence of on for the root nodes and the three nodes whose is among the four largest values at is shown in Fig. 5(b). In the neural network, one of the four roots is among the nodes with the four largest values of at . For the online social network, the relationships between and for the five root nodes with the largest at and the five nodes with the largest at are shown in Fig. 5(c). The results for the neural network and the online social network are qualitatively the same as those for the random graph. In particular, some root nodes (solid lines) do not have particularly large when is larger than approximately unity.
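The reversal seen in Figs. 5(a)–(c), where a root node that feeds a weak part of the downstream SCC loses its top position once the parameter is large, can be illustrated on a hypothetical five-node network. This sketch assumes that the influence is the normalized left null vector of $L + \alpha(I - \mathbf{1}\mathbf{1}^\top/N)$ with $L = \mathrm{diag}(k^{\rm in}) - W^\top$, where `alpha` is a hypothetical parameter name and `W[i, j]` is the weight of the link i → j. Node 0 is the only root and links only to node 1, which has many in-links; node 2 has three out-links and one in-link.

```python
import numpy as np

def influence(W, alpha):
    """Assumed definition: normalized v with v [L + alpha (I - 11^T/N)] = 0,
    where L = diag(indegree) - W^T and W[i, j] is the weight of link i -> j."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    L = np.diag(W.sum(axis=0)) - W.T
    # Zero row sums of L turn the eigenproblem into v (L + alpha I) = (alpha/N) 1^T.
    return np.linalg.solve((L + alpha * np.eye(n)).T, np.full(n, alpha / n))

def top_nodes(W, alphas, k=2):
    """For each alpha, the k nodes with the largest influence (largest first)."""
    return {a: [int(i) for i in np.argsort(-influence(W, a))[:k]] for a in alphas}

# Hypothetical network: root 0 feeds node 1 inside the SCC {1, 2, 3, 4}.
W = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 1), (2, 3), (2, 4), (3, 1), (4, 1)]:
    W[i, j] = 1.0

for alpha, nodes in top_nodes(W, (1e-3, 1e-1, 1.0, 1e1, 1e3)).items():
    print(f"alpha = {alpha:g}: top nodes {nodes}")
```

The root dominates for small `alpha`, whereas node 2, which has the largest outdegree minus indegree, takes the top position for large `alpha`.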
Finally, we quantify the dependence of the influence on by calculating the Kendall rank correlation coefficient. It is defined as $\tau = 4 n_{\rm c}/[N(N-1)] - 1$, where $N$ is the number of nodes and $n_{\rm c}$ is the number of pairs $(i, j)$ ($1 \le i < j \le N$) such that the sign of the difference between the influence of nodes $i$ and $j$ for is the same as that for . The correlation coefficient falls between $-1$ and $1$. The correlation coefficient for the random graph for various values of is shown in Fig. 6(a). As anticipated, the correlation decreases with . Figure 6(a) also indicates that the ranking on the basis of the influence is fairly insensitive to in two ranges of , i.e., for smaller than and for larger than . The ranking is sensitive to between these two ranges of . For comparison, the correlation coefficient for the PageRank for various values of is shown in Fig. 6(b). As in the case of the influence, the correlation decreases with . The correlation between the influence and the PageRank [Fig. 6(c)] is generally small regardless of the two values of . On this basis, we claim that the influence and the PageRank are distinct centrality measures. This result generalizes that when directed networks are strongly connected and Masuda et al. (2009a).
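The coefficient defined above can be transcribed directly. This is a short standard-library sketch; in the analysis, the two input sequences would be the influence values of the same nodes at two parameter values, or the influence and the PageRank. The $O(N^2)$ pair loop is chosen for clarity. For thousands of nodes, an $O(N \log N)$ algorithm or `scipy.stats.kendalltau` may be preferable; the latter computes the tau-b variant by default, which coincides with this formula when there are no ties.

```python
from itertools import combinations

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def kendall_tau(x, y):
    """tau = 4 n_c / (N (N - 1)) - 1, where n_c counts pairs (i, j), i < j,
    for which sign(x[i] - x[j]) equals sign(y[i] - y[j])."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length N >= 2")
    n_c = sum(
        1
        for i, j in combinations(range(n), 2)
        if sign(x[i] - x[j]) == sign(y[i] - y[j])
    )
    return 4 * n_c / (n * (n - 1)) - 1

# Hypothetical influence values of four nodes at two parameter values.
print(kendall_tau([0.4, 0.3, 0.2, 0.1], [0.4, 0.2, 0.3, 0.1]))
```

In this example, one of the six pairs is ordered differently, so $n_{\rm c} = 5$ and $\tau = 2/3$.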
We have proposed a centrality measure (influence) for general directed networks. It is a generalization of a Laplacian-based centrality measure that is often used for strongly connected networks Daniels69Biom-Berman80SIAM ; Moon and Pullman (1970); Biggs (1997); Masuda et al. (2009a, b); Klemm et al. (2010). It also generalizes the formulation of the same centrality measure developed for networks that are not necessarily strongly connected Agaev and Chebotarev (2000); Borm (2002); Chebotarev and Agaev (2005). Unlike the previous measure Agaev and Chebotarev (2000); Borm (2002); Chebotarev and Agaev (2005), the proposed measure is suitable for comparing the importance of nodes that are in downstream SCCs and comparing nodes in different SCCs. It has a free parameter . For networks that are not strongly connected, we suggest using (Sec. V.1). A small value of implies that the centrality values concentrate on nodes in uppermost components. A large value of makes the influence close to a degree centrality, i.e., outdegree minus indegree. The choice of is up to users’ preferences. We acknowledge that various mathematical properties of the matrix associated with the influence (i.e., ) have been analyzed in Agaev and Chebotarev (2000); Chebotarev and Agaev (2005). In Chebotarev and Agaev (2005), the use of this matrix for the centrality measure is briefly mentioned.
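The large-parameter limit stated above can be made explicit with a first-order expansion. This worked derivation assumes (our reconstruction; the normalization used in the paper may differ) that the influence $v$ with parameter $\alpha$ satisfies $v[L + \alpha(I - \mathbf{1}\mathbf{1}^\top/N)] = 0$ and $\sum_i v_i = 1$, where $L_{ii} = k_i^{\rm in}$, $L_{ij} = -w_{ji}$ for $i \neq j$, and $w_{ji}$ is the weight of the link $j \to i$. Then every row of $L$ sums to zero, and column $j$ sums to $k_j^{\rm in} - k_j^{\rm out}$.

```latex
\begin{align}
v\,(L + \alpha I) &= \frac{\alpha}{N}\,\mathbf{1}^\top
  \quad\Longrightarrow\quad
  v = \frac{1}{N}\,\mathbf{1}^\top \Bigl(I + \frac{L}{\alpha}\Bigr)^{-1}
    = \frac{1}{N}\,\mathbf{1}^\top \Bigl(I - \frac{L}{\alpha} + O(\alpha^{-2})\Bigr), \\
v_j &= \frac{1}{N} - \frac{1}{\alpha N}\sum_{i=1}^{N} L_{ij} + O(\alpha^{-2})
     = \frac{1}{N} + \frac{k_j^{\rm out} - k_j^{\rm in}}{\alpha N} + O(\alpha^{-2}).
\end{align}
```

The first line uses $v\mathbf{1} = 1$, and the expansion is valid once $\alpha$ exceeds the spectral radius of $L$. To first order, ranking nodes by $v_j$ is therefore ranking them by outdegree minus indegree.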
Arguably, the most frequently used centrality measure for directed networks appears to be the PageRank Brin98-Langville06book . Beyond the World Wide Web, for which the PageRank was originally designed, the PageRank has been applied to rank, for example, academic papers and journals (e.g., Palacios04-Fersht09 ). The PageRank is interpreted as the stationary density of the discrete-time simple random walk with global jumps on the network.
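The stationary-density interpretation of the PageRank translates directly into power iteration. This NumPy sketch assumes a weight matrix `W` with `W[i, j]` the weight of the link i → j, uses `q` (a name chosen here) for the teleportation probability, and rejects networks with dangling nodes, whose treatment requires an extra convention.

```python
import numpy as np

def pagerank(W, q=0.15, tol=1e-12, max_iter=10_000):
    """Stationary density of a discrete-time walk that follows a weighted out-link with
    probability 1 - q and jumps to a uniformly random node with probability q."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    k_out = W.sum(axis=1)
    if np.any(k_out == 0):
        raise ValueError("dangling nodes require an explicit convention")
    P = W / k_out[:, None]            # row-stochastic transition matrix
    p = np.full(n, 1.0 / n)           # start from the uniform density
    for _ in range(max_iter):
        p_new = (1 - q) * (p @ P) + q / n
        if np.abs(p_new - p).sum() < tol:   # L1 change between iterations
            return p_new
        p = p_new
    raise RuntimeError("power iteration did not converge")

# Hypothetical three-node example: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
W = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
print(pagerank(W))
```

Because each update contracts differences by a factor of $1 - q$, the iteration converges for any $0 < q \le 1$.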
We have defined the influence as a continuous-time counterpart of the PageRank. Furthermore, we have interpreted the influence as the probability that the continuous-time random walk without global random jumps is absorbed by the sink attached to each node. As a corollary, the PageRank can be interpreted as the absorption probability of a discrete-time random walk with sinks but without teleportation. In addition, a suitable range of the teleportation probability in the PageRank can be estimated by adapting the criterion to the discrete-time random walk.
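Both absorption-probability statements can be checked numerically on a toy network. This sketch assumes that the influence with parameter `alpha` (a hypothetical name) is the stationary density of the continuous-time walk with generator $-[L + \alpha(I - \mathbf{1}\mathbf{1}^\top/N)]$, where $L = \mathrm{diag}(k^{\rm in}) - W^\top$, so that the walker moves from a node to one of its in-neighbors. It compares this density with the probability that the walk without global jumps, started at a uniformly random node and absorbed at rate `alpha` by the sink at its current node, ends in the sink of each node, i.e., $(\alpha/N)\mathbf{1}^\top(L + \alpha I)^{-1}$. The discrete-time analogue compares the PageRank with a walk that is absorbed with probability `q` at each step. The weighted network is hypothetical and has no dangling nodes.

```python
import numpy as np

def stationary(G_T, n):
    """Solve G_T @ p = 0 with sum(p) = 1 by replacing one balance equation with the normalization."""
    A = G_T.copy()
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def influence_stationary(W, alpha):
    n = W.shape[0]
    L = np.diag(W.sum(axis=0)) - W.T                      # rows sum to zero
    Q = -(L + alpha * (np.eye(n) - np.ones((n, n)) / n))  # generator with global jumps
    return stationary(Q.T, n)                             # p Q = 0

def influence_absorption(W, alpha):
    n = W.shape[0]
    L = np.diag(W.sum(axis=0)) - W.T
    # Uniform start p0; integral over t >= 0 of alpha * p0 exp(-(L + alpha I) t).
    return np.linalg.solve((L + alpha * np.eye(n)).T, np.full(n, alpha / n))

def pagerank_stationary(P, q):
    n = P.shape[0]
    G = (1 - q) * P + q / n                               # Google matrix
    return stationary(G.T - np.eye(n), n)                 # p G = p

def pagerank_absorption(P, q):
    n = P.shape[0]
    # Sum over t of q * p0 ((1 - q) P)^t: absorbed with probability q at each step.
    return np.linalg.solve((np.eye(n) - (1 - q) * P).T, np.full(n, q / n))

# Hypothetical network: roots 0 and 1 feed the SCC {2, 3, 4}; W[i, j] = weight of i -> j.
W = np.zeros((5, 5))
for i, j, w in [(0, 2, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 4, 0.5), (4, 2, 1.0)]:
    W[i, j] = w
P = W / W.sum(axis=1)[:, None]

print(np.allclose(influence_stationary(W, 0.5), influence_absorption(W, 0.5)))
print(np.allclose(pagerank_stationary(P, 0.15), pagerank_absorption(P, 0.15)))
```

Both comparisons print `True`: the global jumps of rate `alpha` (or probability `q`) play exactly the role of the sinks, because in both cases the uniform restart distribution coincides with the uniform initial condition of the absorbed walk.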
For the case of strongly connected networks, we refer to our previous work Masuda and Ohtsuki (2009); Masuda et al. (2009a) for a discussion of continuous-time versus discrete-time random walks. We have shown that controls the relative importance of nodes in upstream SCCs and nodes in downstream SCCs. The teleportation probability in the PageRank plays the same role. Why, then, do we need a new centrality measure?
First, the extended influence inherits the definition of the influence for strongly connected networks and one-root networks (i.e., influence when ), and therefore it represents the importance of nodes in various dynamics and in the enumeration of spanning trees (Sec. IV). Admittedly, for each dynamics considered in Sec. IV, we can consider a discrete-time version and relate the importance of nodes in the dynamics to the PageRank. We have explained this correspondence for the random walk (Sec. V). In addition, the DeGroot model of opinion formation was originally proposed in discrete time Degroot74-Friedkin91 . The choice between the two centrality measures should therefore depend on whether continuous-time or discrete-time dynamics are assumed to occur on the network in question.
In the discrete-time interpretation, the indegree is essentially normalized to unity. Therefore, if the weight of a link represents a value that should not be normalized, such as the rate of interaction, nominal connection strength, amount of signal or monetary flux, or the number of wins and losses between a pair of sports teams, the continuous-time interpretation, that is, the influence, appears to be more appropriate. On the other hand, the PageRank is more appropriate in scientometrics; if a paper cites many papers, the value of each citation should be considered small, and being cited by this paper should not be of great importance. This distinction may explain why the PageRank and the Laplacian-based centrality have been used in somewhat different research communities and for different types of data. In this light, we have extended the Laplacian-based centrality so that, like the PageRank, it is applicable to general directed networks.
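The normalization argument can be illustrated with consensus dynamics, used here as a stand-in for the dynamics discussed in Sec. IV (an assumption of this sketch). In continuous time, $dx_i/dt = \sum_j w_{ji}(x_j - x_i)$ conserves $\sum_i v_i x_i$ for the left null vector $v$ of $L = \mathrm{diag}(k^{\rm in}) - W^\top$, so the consensus value weights the initial opinions by this Laplacian-based centrality. In a DeGroot-type discrete-time version, each node instead averages its in-neighbors with weights normalized by its indegree. Multiplying all in-links of one node by a constant therefore changes the continuous-time consensus but not the discrete-time one. The three-node network and the factor 3 are hypothetical.

```python
import numpy as np

def continuous_consensus(W, x0, dt=0.01, steps=5_000):
    """Euler integration of dx_i/dt = sum_j W[j, i] (x_j - x_i), i.e., dx/dt = -L x."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=0)) - W.T
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - dt * (L @ x)
    return x

def degroot(W, x0, steps=500):
    """x_i(t+1) = sum_j T[i, j] x_j(t) with T[i, j] = W[j, i] / indegree(i)."""
    W = np.asarray(W, dtype=float)
    T = W.T / W.sum(axis=0)[:, None]     # each row sums to one: indegree normalized to unity
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = T @ x
    return x

# Hypothetical strongly connected network: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
W = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
W_scaled = W.copy()
W_scaled[:, 2] *= 3.0                    # triple every link into node 2
x0 = [1.0, 0.0, 0.0]                     # only node 0 starts with opinion 1

for name, M in (("original", W), ("scaled", W_scaled)):
    print(name, continuous_consensus(M, x0).round(6), degroot(M, x0).round(6))
```

With these choices, the continuous-time consensus moves from 0.5 to 0.6 after the rescaling, whereas the DeGroot consensus stays at 0.4.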
Second, the PageRank has a subtle arbitrariness in determining the behavior of the random walk that has reached a dangling node. Depending on the implementation, the walker at a dangling node hops to a randomly chosen node even with probability () Brin98-Langville06book or stays at the same node with probability Fortunato et al. (2006). The theoretical justification for either assumption is not clear. In the influence, we have the sole control parameter , and the influence unambiguously corresponds to the discrete-time case in which the walker stays at the dangling node with probability .
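The two dangling-node conventions can be compared directly. In this NumPy sketch, the weights `W[i, j]` for the link i → j and the teleportation probability `q` are naming choices made here. With `dangling="uniform"`, the walker at a dangling node always hops to a uniformly random node; with `dangling="stay"`, it remains at the dangling node with probability 1 - q. Teleportation with probability q applies in both cases. The three-node chain is hypothetical.

```python
import numpy as np

def pagerank(W, q=0.15, dangling="uniform", tol=1e-12, max_iter=10_000):
    """Power iteration for the PageRank with an explicit dangling-node convention."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    k_out = W.sum(axis=1)
    has_out = k_out > 0
    P = np.zeros((n, n))
    P[has_out] = W[has_out] / k_out[has_out][:, None]
    for i in np.flatnonzero(~has_out):
        if dangling == "uniform":
            P[i, :] = 1.0 / n      # hop to a uniformly random node
        elif dangling == "stay":
            P[i, i] = 1.0          # remain at the dangling node
        else:
            raise ValueError(f"unknown convention: {dangling}")
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_new = (1 - q) * (p @ P) + q / n
        if np.abs(p_new - p).sum() < tol:
            return p_new
        p = p_new
    raise RuntimeError("power iteration did not converge")

# Hypothetical chain 0 -> 1 -> 2, where node 2 is dangling.
W = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
for convention in ("uniform", "stay"):
    print(convention, pagerank(W, dangling=convention).round(4))
```

On this chain with q = 0.15, the "stay" convention places about 0.86 of the density on the dangling node, whereas the "uniform" convention gives it about 0.47, so the choice alone can reshape the ranking.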
Acknowledgements
We thank Yoji Kawamura for the helpful discussions and for careful reading of the paper. N.M. acknowledges the support provided by the Grants-in-Aid for Scientific Research (Grant Nos. 20760258 and 20540382, and Innovative Areas “Systems Molecular Ethology”) from MEXT, Japan.
Appendix: Influence for two toy networks with multiple root nodes
For the network with four nodes and two root nodes shown in Fig. 7(a), we obtain , , , , ,
Therefore, the influence is given by
Nodes 2 and 3 have the same influence because they are equally strong within their SCC. Although the two upstream SCCs are upstream of node 4 in the same manner, is smaller than because controls two nodes and the SCC of nodes 2 and 3 controls three nodes.
For the network with four nodes and two root nodes shown in Fig. 7(b), we obtain , , , , ,
The influence is given by