Effective and efficient approximations of the generalized inverse of the graph Laplacian matrix with an application to current-flow betweenness centrality
We devise methods for finding approximations of the generalized inverse of the graph Laplacian matrix, which arises in many graph-theoretic applications. Finding this matrix in its entirety involves solving a matrix inversion problem, which is resource demanding in terms of consumed time and memory and hence impractical whenever the graph is relatively large. Our approximations use only few eigenpairs of the Laplacian matrix and are parametric with respect to this number, so that the user can compromise between effectiveness and efficiency of the approximated solution. We apply the devised approximations to the problem of computing current-flow betweenness centrality on a graph. However, given the generality of the Laplacian matrix, many other applications can be sought. We experimentally demonstrate that the approximations are effective already with a constant number of eigenpairs. These few eigenpairs can be stored with a linear amount of memory in the number of nodes of the graph and, in the realistic case of sparse networks, they can be efficiently computed using one of the many methods for retrieving few eigenpairs of sparse matrices that abound in the literature.
keywords:Spectral Theory; Graph Laplacian; Network Science; Current flow; Betweenness Centrality
The graph Laplacian is an important matrix that can tell us much about graph structure. It places on the diagonal the degrees of the graph nodes and elsewhere information about the distribution of edges among nodes in the graph. The graph Laplacian matrix, as well as its Moore-Penrose generalized inverse (Ben-Israel and Greville, 2003), are useful tools in network science that turn up in many different places, including random walks on networks, resistor networks, resistance distance among nodes, node centrality measures, graph partitioning, and network connectivity (Ghosh et al., 2008; Newman, 2010). In particular, among measures of centrality of graph nodes, betweenness quantifies the extent to which a node lies between other nodes. Nodes with high betweenness are crucial actors of the network since they control over information (or over whatever else flows on the network) passing between others. Moreover, removal from the network of these brokers might seriously disrupt communications between other vertices (Newman, 2005; Brandes and Fleischer, 2005).
The computation of the generalized inverse of the Laplacian matrix is demanding in terms of consumed time and space and thus it is not feasible on relatively large networks. On the other hand, there are today large databases from which real networks can be constructed, including technological, information, social, and biological networks (Brandes and Erlebach, 2005; Newman et al., 2006; Newman, 2010). These networks are voluminous and grow in time as more data are acquired. We have, therefore, the problem of running computationally heavy algorithms over large networks. The solution investigated in the present work is the use of approximation methods: algorithms that compute a solution near to the exact one and, as a compromise, that run using much less resources than the exact algorithm.
We propose a couple of approximation methods to compute the generalized inverse of the Laplacian matrix of a graph. Both methods are based on the computation of few eigenpairs (eigenvalues and the corresponding eigenvectors) of the Laplacian matrix (Golub and Meurant, 2010), where the number of computed eigenpairs is a parameter of the algorithm. The first method, called cutoff approximation, uses the computed eigenpairs in a suitable way for the approximation of the actual entries of the generalized inverse matrix. The second method, named stretch approximation, takes advantage of both the computed eigenpairs as well as of an estimation of the excluded ones. Both approximation methods can be applied to estimate current-flow betweenness centrality scores for the nodes of a graph. We experimentally show, using both random and scale-free network models, that the proposed approximation are both effective and efficient compared to the exact methods. In particular the stretch method allows to estimate, using a feasible amount of time and memory, a ranking of current-flow betweenness scores that strongly correlates with the exact ranking.
The layout of the paper is as follows. Section 2 introduces the notions of Laplacian matrix and its Moore-Penrose generalized inverse and recalls some basic properties of these matrices. In Section 3 we define cutoff and stretch approximations. Moreover, we theoretically show that stretch approximation is more effective than cutoff approximation. We review the methods for inverting a matrix and for finding few eigenpairs of a matrix, which are crucial operations in our contribution, in Section 4. Current-flow betweenness centrality is illustrated in Section 5. We formulate the definition in terms of the generalized inverse of the Laplacian matrix, which allows us to use cutoff and stretch approximations to estimate betweenness scores. A broad experimental analysis is proposed in Section 6 in order to investigate effectiveness and efficiency of the devised approximation methods. Section 7 concludes the paper.
2 The graph Laplacian and its generalized inverse
Let be an undirected weighted graph with the set of nodes, the set of edges, and a vector such that is the positive weight of edge , for . We denote by the number of nodes and the number of edges of the graph. The weighted Laplacian of is the symmetric matrix
where is the weighted adjacency matrix of the graph and is the diagonal matrix of the generalized degrees (the sum of the weights of the incident arcs) of the nodes.
In order to obtain more insight on the properties of the graph Laplacian it is useful to express the matrix in another form. Let be the incidence matrix of the graph such that, if edge connects two arbitrarily ordered nodes and then , , while for . Given a vector , the square diagonal matrix whose diagonal entries are the elements of is denoted with . It holds that . Thus besides symmetric is positive semidefinite, so that it has real and nonnegative eigenvalues that is useful to order . If denotes a vector of ones, then so that . It follows that is the smallest eigenvalue of . We will assume throughout the paper that is connected. In this case all other eigenvalues of are strictly positive (Ghosh et al., 2008):
Since , the determinant of is null and hence cannot be inverted. As a substitute for the inverse of we use the Moore-Penrose generalized inverse of , that we simply call generalized inverse of (Ben-Israel and Greville, 2003). As customary, we denote this kind of generalized inverse with . It is convenient to define starting from the spectral decomposition of . Actually, since is symmetric it admits the spectral decomposition
where and the columns of are the eigenvectors of . Notice that is an orthogonal matrix, that is .
By using the spectral decomposition of , its generalized inverse can be defined as follows
where (respectively, ) denotes the -th column (respectively, row) of matrix .
Observe that inherits from the property of being symmetric and positive semidefinite. Moreover, shares the same nullspace of , as is true in general for the Moore-Penrose generalized inverse of a symmetric matrix. Thus, since , it turns out that . By setting , it follows that , where is a matrix of all zeros. By using the eigendecompositions of and it is easy to show that . It follows that
The generalized inverse of the graph Laplacian is useful to solve a linear system of the form for some known vector , which arises in many applications. The range of a matrix is the linear space of vectors for which the system has a solution. Since is symmetric its range is the space orthogonal to its nullspace. The nullspace of is one-dimensional and spanned by the vector with all components equal to unity. Hence, the range of is made up by the vectors such that , or, equivalently, that sum up to zero. It follow that the linear system has solutions if sums up to zero. By the linearity, the difference of two solution belongs to the nullspace of , and this implies that if we are able to find an arbitrary solution , then all the other solutions are of the form , .
As well known, is the minimum Euclidean norm solution of the system , i.e., it is the element having minimum Euclidean norm in the affine space of the solutions (Ben-Israel and Greville, 2003). For completeness, we notice that, no matter if belongs to the range of or not, is the minimum Euclidean norm solution of the problem .
3 Approximations of the generalized inverse
In this section we propose two approximations of the the generalized inverse of the graph Laplacian matrix . For , we define the -th cutoff approximation of as:
In all computations, we never materialize matrix , but we represent it with the eigenpairs that define it. This representation of can be stored using space, that is if is constant. Moreover, computing an entry of using its eigenpair representation costs , that is if is constant.
As increases, the matrices are more and more accurate approximations of . Actually and, for ,
It holds that for every having rank less or equal to (Golub and Van Loan, 1996). Moreover, for , the relative 2-norm error of the -th cutoff approximation is:
The second approximation of exploits the following observation. If many of the excluded eigenvalues , for larger than , are close to each other, we might approximate them with a suitable value . We define the -th stretch approximation of as:
It is worth observing that the use of does not involve any significant additional cost with respect to the use of . Indeed, since , then
Notice that the normalization of the eigenvector of associated with the eigenvalue yields , where is a vector with all components equal to unity. It follows that, knowing the value , the -th stretch approximation can be represented using eigenpairs, hence the space needed to store the representation of and the time needed to compute its entries do not increase with respect to the use of .
On the other hand, the use of instead of allows to improve the bound on the approximation error. Actually, since
from Equation 4 we obtain
Assuming, as reasonable, that , we have that
Therefore, the relative 2-norm error of the stretch approximation is strictly less than the relative 2-norm error of the cutoff approximation , as soon as we chose within and . Moreover, the closer and , the better the stretch approximation.
The optimal choice for , that is, the value that minimizes the 2-norm relative error, is the harmonic mean of and :
With this choice we reduce of one half the bound of the approximation error:
We have used this choice of in all our experiments. Notice that the computation of implies computing two additional eigenvalues, namely and , but not the corresponding eigenvectors. To avoid this additional cost, we might reasonably assume that is big so that its reciprocal is small, and that is close to , so that the optimal value of is approximately .
4 Methods for matrix inversion and for finding few eigenpairs
Instead of computing the entire generalized inverse of the Laplacian matrix, our approximation methods compute and store only few eigenpairs of the Laplacian matrix. If a matrix is big and sparse, then the computation of few eigenpairs of can be made by means of iterative methods whose basic building block is the product of by a vector, which has linear complexity if is sparse. One of the simplest among these methods is orthogonal iteration (Golub and Van Loan, 1996), a generalization of the power method. The method, while simple, can be quite slow since the number of iterations depends on the distance between the sought eigenvalues of and experimental evidence shows that the eigenvalues nearest to zero are clustered, in particular for sparse networks (Zhan et al., 2010).
On the other hand, one of the most widely used algorithms is the Lanczos method with implicit restart, implemented by ARPACK (Lehoucq et al., 1998). This is the method we have used in our experiments. For the computation of few smallest eigenpairs of a matrix the method works in the so called shift and invert mode. In other words the Lanczos method is applied to being a suitable shift. To do this, the matrix is factorized before the iteration begins and the factorization is used to solve the sequence of linear systems that arises during the calculation. This accelerates the method but the factors are surely much less sparse than the matrix itself. This, combined with the clustering of the eigenvalues near zero, leads to a nonlinear scaling, which was observed also in our experiments.
Alternative approaches are Jacobi-Davidson and Deflation Accelerated Conjugate Gradient (Bergamaschi and Putti, 2002), that seem to be highly competitive with the Lanczos method. In particular in the Jacobi-Davidson method it is still needed to solve inner linear systems, but the factorization is avoided and substituted by the use of preconditioned iterative Krylov spaces based methods. Deflation Accelerated Conjugate Gradient sequentially computes the eigenpairs by minimizing the Rayleigh quotient over the subspace orthogonal to the eigenvectors previously computed. Finally, we mention the multilevel algorithm implemented in the HSL_MC73 routine of the HSL mathematical software library, that however only computes the second smallest eigenpair (Hu and Scott, 2003).
5 Current-flow betweenness centrality
A large volume of research on networks has been devoted to the concept of centrality (Sabidussi, 1966; Freeman, 1979; Borgatti, 2005; Newman, 2010). This research addresses the fundamental question: Which are the most important or central vertices in a network? There are four measures of centrality that are widely used in network analysis: degree centrality, eigenvector centrality, closeness, and betweenness. Here, we focus on betweenness centrality.
Betweenness measures the extent to which a node lies on paths between other nodes. Nodes with high betweenness might have considerable influence within a network by virtue of their control over information (or over whatever else flows on the network) passing between others. They are also the ones whose removal from the network will most disrupt communications between other vertices because they lie on the largest number of paths between other nodes.
Typically, only geodesic paths are considered in the definition, obtaining a measure that is called shortest-path betweenness. Here, we study current-flow betweenness, which includes contributions of all paths, although longer paths give a lesser contribution (Newman, 2005; Brandes and Fleischer, 2005). For a given node, current-flow betweenness measures the current flow that passes through the vertex when a unit of current is injected in a source node and removed from a target node, averaged over all source-target pairs. Equivalently, it is equal to the net number of times that a random walk on the graph passes through the node on its journey, averaged over a large number of trials of the random walk.
We next give the precise definition of current-flow betweenness centrality in terms of resistor networks. Consider a network in which the edges are resistors and the nodes are junctions between resistors. Each edge is assigned with a positive weight indicating the conductance of the edge. The resistance of an edge is the inverse of its conductance. Outlets are particular nodes where current enters and leaves the network. A vector called supply defines them: a node such that is an outlet; in particular, if then node is a source and current enters the network through it, while if then node is a target and current leaves the network through it. Since there should be as much current entering the network as leaving it, we have that . We consider the case where a unit of current enters the network at a single source and leaves it at a single target . That is, for , , and . We are interested in how current flows through the network, for an arbitrary choice of source and target outlets.
Let be the potential of node , measured relative to any convenient reference potential, for source and target outlets. Kirchhoff’s law of current conservation implies that the node potentials satisfy the following equation for every node :
where is the weighted adjacency matrix of the network. The current flow through edge is the quantity , that is, the difference of potentials between the involved nodes multiplied by the conductance of the edge: a positive value indicates that the current flows in a direction (say from to ), and negative value means that the current flows in the opposite direction. Hence, Krichhoff’s law states that the current flowing in or out of any node is zero, with the exception of the source and target nodes.
In matrix form, equation 6 reads:
where is a diagonal matrix such that the -th element of the diagonal is equal to , that is, it is the (generalized) degree of node . Recall that is the graph Laplacian matrix. As noticed in Section 2, if is the generalized inverse of the Laplacian matrix , then the potential vector:
This means that the potential of node with respect to source and target outlets is given by . Therefore, the generalized inverse matrix contains information to compute all node potentials for any pair of source-target nodes.
An example of resistor network with node potential solution is provided in Figure 1. Notice that Kirchhoff’s law is satisfied for each node. For instance, the current entering in node B is 0.47 (from node A) which equals the current leaving node B, which is again 0.47 (0.13 to E, 0.27 to F, and 0.07 to C). Moreover, the current leaving the source node A is 1, and the current entering the target node H is also 1. Notice that there is no current on the edge from C to D, since both nodes have the same potential. Any other potential vector obtained from the given solution by adding a constant is also a solution, since the potential differences remain the same, and hence Kirchhoff’s law is satisfied. The given potential vector is, however, the solution with minimum Euclidean norm.
We have now all the ingredients to define current-flow betweenness centrality. As observed above, given a source and a target , the absolute current flow through edge is the quantity . By Kirchhoff’s law the current that enters a node is equal to the current that leaves the node. Hence, the current flow through a node different from the source and a target is half of the absolute flow on the edges incident in :
Moreover, the current flows and through both and are set to 1, if end-points of a path are considered part of the path (this is our choice in the rest of this paper), or to 0 otherwise. Figure 2 gives an example. Notice that the flow from A to H through node G is 1 (all paths from A to H pass eventually through G), the flow through F is 0.4 (a proper subset of the paths from A to H go through F and these paths are generally longer than for G), and the flow through E is 0.13 (a proper subset of the paths from A to H go through E and these paths are generally longer than for F)
Finally, the current-flow betweenness centrality of node is the flow through averaged over all source-target pairs :
Since the potential , with the generalized inverse of the graph Laplacian, Equation 9 can be expressed in terms of elements of as follows:
Hence, if we replace in Equation 11 matrix with its -th cutoff approximation (or its -th stretch approximation ), we get an approximated value of current-flow betweenness centrality for node .
The computational complexity of Equation 10 is as follows. We denote with the number of neighbors of node , that is, the number of edges incident in . Assuming we have matrix , computing Equation 11 for given , , and costs , if has at least one neighbor, or otherwise. Hence Equation 10 for a specific costs . Since , Equation 10 for all nodes costs , that is, if the graph is sparse.111This cost can be improved to , hence for sparse networks, as shown in Brandes and Fleischer (2005). Furthermore, the complexity of computing the equation using either cutoff or stretched approximations increases of a factor of , since an entry of or of can be obtained in . If is contant with respect to , then there is no asymptotic increase of complexity. Clearly, if is large, the cost of Equation 10 is prohibitive.
A possible solution for the bottleneck of Equation 10 is the adoption of a probabilistic approach by computing Equation 11 only for a sample of source-target pairs (Brandes and Fleischer, 2005). If we choose at random source nodes, with , and, for each source, we choose at random target nodes, then the sample of source-target pairs has size . For instance, if , then and hence the cost of computing Equation 10 declines of two orders of magnitude.
An alternative solution to avoid the bottleneck of Equation 10 is to use cutoff approximation with (only the second smallest eigenpair is used). In this case Equation 10 significantly simplifies. Indeed, if , then . It follows that flow can be approximated by:
Notice that the sum in the formula for the flow is now independent on the source-target pair. Hence the approximated betweenness of is:
where is a constant. This means that the approximated betweenness is proportional to the quantity . Notice that is a local version of that depends only on the neighborhood of . Ignoring the multiplicative constant , the approximated betweenness can be computed in for all nodes, hence in linear time with respect to the size of the network.
We conclude this section by computing exact and approximated betweenness centrality on a real network. The instance is a social network of dolphins (Tursiops truncatus) belonging to a community that lives in the fjord of Doubtful Sound in New Zealand. The unusual conditions of this fjord, with relatively cool water and a layer of fresh water on the surface, have limited the departure of dolphins and the arrival of new individuals in the group, facilitating a strong social relationship within the dolphin community. The network is an undirected unweighted graph containing 62 nodes (dolphins) and 159 non-directional connections between pairs of dolphins. Two dolphins are joined by an edge if, during the observation period lasted from 1994 to 2001, they were spotted together more often than expected by chance. This network has been extensively studied by David Lusseau and co-authors, see, for instance, (Lusseau and Newman, 2004).
The ranking obtained with the cutoff approximation method using only three eigenpairs of the graph Laplacian correlates at 0.92 with the exact ranking, and the mean change of rank between the two compilations is 5.4. Moreover, the stretch approximation method performs even better. Using the same number of eigenpairs (three), the approximated and exact rankings correlate at 0.99 and the mean change of rank between the two compilations is just 2 positions. Figure 3 depicts the dolphin social network where the size of the node is proportional to its exact current-flow betweenness (graph on the left) and to its approximated current-flow betweenness (graph on the right). Moreover, Table 1 shows the top-10 of both compilations and Figure 4 gives the scatter plot of the two rankings. Nine dolphins over 10 are present in both top-10 rankings: the missing dolphins (DN63 and Grin) both rank 11th in the other ranking. Six dolphins have the same rank in both compilations (notably, the top-4 is the same). In fact, the stretch approximation method is already effective using just one eigenpair (correlation at 0.98 and mean change at 2.9). These outcomes suggest that the proposed approximation methods for betweenness, in particular the stretch version, are effective.
6 Experimental evaluation
This section is devoted to the experimental evaluation of effectiveness and efficiency of the proposed approximations for the generalized inverse of the Laplacian of a graph, and in particular for the current-flow betweenness centrality scores of nodes.
6.1 Experimental setting
In our experiments, we took advantage of the following two well-known network models. The first model is the random model, also known as Erdős-Rényi model (ER model for short) (Solomonoff and Rapoport, 1951; Erdős and Rényi, 1960). According to this model, a network is generated by laying down a number of nodes and adding edges between them with independent probability for each node pair. This model generates a small world network with a binomial node degree distribution, which is peaked at a characteristic scale value. We will denote a graph drawn according to the ER model by , where is the number of nodes and is the edge probability. Such a graph has roughly edges. An ER graph is not necessarily connected, but it contains a giant connected component containing most of the nodes as soon as . We extracted the giant component from the generated graphs and used this component in our experiments.
However, many real networks are scale-free, that is, the degree distribution has a long-tail that roughly follows a power law (Newman, 2010). A network model that captures the power law distribution of node degrees is known as cumulative advantage (Simon, 1955; de Solla Price, 1976), which was later rediscovered and further investigated under the name preferential attachment or Barabási-Albert model (BA model for short) (Barabási and Albert, 1999; Barabási et al., 1999). Such a model, as described by Barabási and Albert (1999), has the following two main ingredients:
Startup: the graph has a single isolated node;
Growth: additional nodes are added to the network one at a time;
Preferential attachment: each node connects to the existing nodes with a fixed number of links. The probability that it will choose a given node is proportional to the number of links the chosen node already has.
The resulting network is a small-world graph with a power law degree distribution: most of the nodes (the trivial many) will have low degree and a small but significant share of nodes (the vital few or hubs) will have an extraordinary high degree. We will denote a graph drawn according to the BA model as , where is the number of nodes and is the number of edges attached to each node added during the preferential attachment step. Such a graph has edges. Notice that the resulting graph is connected but it is not necessarily simple: multiple edges might exist between the same pair of nodes. We simplified the graph in our experiments, removing the multiple edges, if any.
In this section we give results for unweighted networks only. We also experimented with weighted graphs with edge weights drawn from a random uniform distribution, but we noticed no particular discrepancy with respect to the unweighted case.
The experiments have been performed within the R computing environment, taking advantage of the igraph package for the generation of networks. We used of R interface to LAPACK (Linear Algebra Package) (Anderson et al., 1999) for matrix inversion and eigenpairs generation, as well as the R interface to ARPACK (Arnoldi Package) (Lehoucq et al., 1998), when few eigenpairs of large sparse matrices are necessary. Moreover, we exploited the C programming language for a faster implementation of the computation of betweenness scores using Equation 10, calling the C compiled code from the R environment (R is rather slow at iterative algorithms that require loops iterated many times). All experiments are run on a MacBook Pro equipped with a 2.3 GHz Intel Core i5 processor, 4 GB 1333 MHz DDR3 memory, running Mac OS X Lion 10.7.2.
6.2 Distribution of eigenvalues and of node degrees
We begin the experimental investigation with an exploratory analysis of the distribution of eigenvalues of the Laplacian, comparing with the distribution of node degrees on both ER and BA networks. This study is interesting to understand the behavior of the proposed approximations.
The exploratory experiment uses graphs and . Notice that the used graphs have the same number of nodes and approximately the same number of edges, hence a similar edge density. Nevertheless, they differ in the way the edges are placed among nodes.
Figures 5 and 6 explore the distribution of eigenvalues of the Laplacian as well as the distribution of node degrees on ER and BA graphs, respectively. We notice that the eigenvalues of the Laplacian are well approximated by the degrees of the nodes, an interesting phenomenon already investigated in Zhan et al. (2010). For the ER model, eigenvalues and degrees have a correlation of 0.989; the Euclidean distance between the two sorted vectors, relative to the norm-2 of the eigenvalue vector, is 0.13. Using a BA model, eigenvalues and degrees are correlated at 0.983 and the relative Euclidean distance between the two sorted vectors is 0.02. Figure 7 shows a scatter plot of Laplacian eigenvalues versus node degrees.
The plots of the sorted eigenvalues of the Laplacian on ER and BA models are visibly different (Figure 8, left plot). In the initial part (up to index 800), ER eigenvalues grow faster than BA eigenvalues; however, in the tail of the plot, BA eigenvalues rapidly catch up, overtaking ER eigenvalues around index 960. From here to the end of the sequence, BA eigenvalues literally explode (notice that, in order to appreciate the behavior of smaller eigenvalues, the figure shows the eigenvalue curve up to index 980). The eigenvalues of the generalized inverse of the Laplacian, which are the inverse of the Laplacian eigenvalues, are shown in Figure 8, right plot. Both curves decrease rapidly in the beginning, but the ER curve has a more significant drop; after this initial fall, both curves decreases gently, although the BA line declines faster. This might be a hint of the effectiveness of the stretch approximation method: the tail of the eigenvalue curve has a small slope, hence the eigenvalues of the tail can be well approximated by a single middle value.
6.3 Effectiveness of approximations
In this section we investigate the effectiveness of approximations of the generalized inverse of the Laplacian and, in particular, of the current-flow betweenness rankings. We first study the relative 2-norm error of the -th cutoff and stretch approximations, defined as the ratio between and (cutoff approximation error), and the ratio between and (stretch approximation error). Figure 9 shows the approximation error as the number of eigenpairs grows from to . On both network models, the stretch approximation is significantly more effective than the cutoff one: on average, the stretch approximation error is 30% and 50% of the cutoff approximation error on ER and BA graphs, respectively. In particular, the stretch approximation is more effective on ER graphs, as predicted by the spectral behavior highlighted in Figure 8.
We next test the effectiveness of the approximations applied to current-flow betweenness. The quality of the approximations is establishing by computing the correlation coefficient among the approximated and exact node rankings. We used the Pearson product-moment correlations, where the input data is logarithmically transformed when it is not normally distributed. This coefficient is largely used in sciences to measure the similarity of two rankings; it runs from to where values close to indicate negative correlation (one ranking is the reverse of the other), values close to correspond to null correlation (the two rankings are independent), and values close to denote positive correlation (the two rankings are similar). A good approximation, in our assessment, is a method that approaches the exact ranking with correlation of at least 0.9.
The effectiveness experiments are run on ER and BA graphs with 100 nodes and an increasing number of edges: we used for the the ER model and for the BA model. The corresponding ER and BA graphs have roughly the same density. Notice that a BA graph with is a tree, hence an acyclic graph, while generates graphs with loops. We always generated a sample of 100 such graphs and took the average of the observed correlation coefficients.
Figure 10 shows the effectiveness of the cutoff approximation for betweenness on ER and BA graphs with increasing density. The quality of the approximation increases with the number of eigenpairs that are used in the approximation and decreases, for both network models, as the graphs become more dense. A cumulative comparison between cutoff and stretch approximations as well as among ER and BA network models is given in Figures 11. The stretch approximation performs neatly better than the cutoff one. Regardless of the graph model and of the graph density, the effectiveness of the stretch method is well above the quality threshold of 0.9 already with a single eigenpair.
Finally, we tested the probabilistic approximation of betweenness that uses only a sample of node pairs for solving Equation 10. In Figures 12 we compare the probabilistic approach on both the exact version of betweenness, that uses the full generalized Laplacian inverse matrix for the computation of betweenness scores, and the stretch version of betweenness, that uses the stretch approximation method with one eigenpair only. The plot shows the effectiveness of the combined approximation algorithm that uses the stretch method (with a single eigenpair) for the approximation of the generalized Laplacian inverse and the probabilistic approach for the computation of betweenness scores: less than 10% of the node pair space is sufficient to get an approximated ranking that correlates at 0.9 with the exact ranking.
6.4 Efficiency of approximations
In this section we show the outcomes of experiments about the efficiency of the proposed approximations for the generalized inverse of the Laplacian matrix. We compare elapsed time of the computation of the full generalized inverse of the Laplacian with that of the computation of the approximation that uses only one eigenpair (the second smallest eigenvalue and the corresponding eigenvector).
Figure 13 shows the elapsed time necessary for the computation of the generalized Laplacian inverse for BA and ER networks. The time growth is and the memory requirement is on both models, where is the number of nodes of the graph. These costs make the computation not feasible on large networks. For instance, for a large network with nodes, the necessary time would be 642 days with a storage of about one terabyte. On the other hand, Figure 14 gives the elapsed time necessary for the computation of the approximation that uses only one eigenpair for BA and ER networks. For both models, the time growth is and the memory requirement is . Notice that the computation on ER graphs is one order of magnitude more efficient compared to BA graphs. These complexity make it possible to approximate the generalized inverse of the Laplacian, and in particular current-flow betweenness scores, on relatively large networks: on a standard machine, the computation on a random network with nodes would last less than one hour and it would take about 12 hours (a reasonable time) on a scale-free network with the same number of nodes. In both cases the required storage is only one megabyte.
We have proposed effective and efficient methods for approximating the generalized inverse Laplacian matrix of a graph, a matrix that arises in many graph-theoretic applications. A notable such application investigated in the present contribution is current-flow betweenness centrality, a measure for finding central nodes in a graph that lie between many other nodes. Our approximation methods turn out to be suitable when the network at hand is large, so that computing the full generalized inverse is not realistic.
Many other applications can be sought, for instance, the approximation of resistance distance matrix of a graph. Resistance distance is a metric on the graph, alternative to the classical shortest-path distance, that was defined, independently, by Stephenson and Zelen (1989), following an information-theoretic approach, and by Klein and Randić (1993), following an electrical-theoretic approach. The resistance distance notion has different interesting interpretations and many applications, even outside of network science (Ghosh et al., 2008). It turns out that resistance distance matrix can be immediately defined in terms of the generalized inverse Laplacian matrix (Ghosh et al., 2008), allowing us to extend our approximation methods to resistance distance matrix as well.
- Aho et al. (1974) Aho, A. V., Hopcroft, J. E., Ullman, J. D., 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley Publishing Company.
- Anderson et al. (1999) Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Croz, J. D., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D., 1999. LAPACK users guide, 3rd Edition. SIAM, Philadelphia.
- Barabási and Albert (1999) Barabási, A.-L., Albert, R., 1999. Emergence of scaling in random networks. Science 286, 509–512.
- Barabási et al. (1999) Barabási, A.-L., Albert, R., Jeong, H., 1999. Mean-field theory for scale-free random networks. Physica A 272 (1-2), 173–187.
- Ben-Israel and Greville (2003) Ben-Israel, A., Greville, T. N., 2003. Generalized Inverses. Theory and applications, 2nd Edition. Springer, New York.
- Bergamaschi and Putti (2002) Bergamaschi, L., Putti, M., 2002. Numerical comparison of iterative eigensolvers for large sparse symmetric positive definite matrices. Computer Methods in Applied Mechanics and Engineering 191 (45), 5233–5247.
- Borgatti (2005) Borgatti, S. P., 2005. Centrality and network flow. Social Networks 27 (1), 55â71.
- Brandes and Erlebach (2005) Brandes, U., Erlebach, T. (Eds.), 2005. Network Analysis: Methodological Foundations. Vol. 3418 of Lecture Notes in Computer Science. Springer.
- Brandes and Fleischer (2005) Brandes, U., Fleischer, D., 2005. Centrality measures based on current flow. In: STACS. pp. 533–544.
- de Solla Price (1976) de Solla Price, D., 1976. A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science 27, 292–306.
- Erdős and Rényi (1960) Erdős, P., Rényi, A., 1960. Connectivity of random nets. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17–61.
- Freeman (1979) Freeman, L. C., 1979. Centrality in networks: I. conceptual clarification. Social Networks 1, 215â239.
- Ghosh et al. (2008) Ghosh, A., Boyd, S., Saberi, A., 2008. Minimizing effective resistance of a graph. SIAM Review 50 (1), 37–66.
- Golub and Meurant (2010) Golub, G. H., Meurant, G., 2010. Matrices, moments and quadrature with applications. Princeton Series in Applied Mathematics. Princeton University Press.
- Golub and Van Loan (1996) Golub, G. H., Van Loan, C. F., 1996. Matrix Computations, 3rd Edition. The Johns Hopkins University Press, Baltimore.
- Hu and Scott (2003) Hu, Y., Scott, J. A., 2003. HSL_MC73: A fast multilevel Fiedler and profile reduction code. Tech. Rep. RAL-TR-2003-036, Rutherford Appleton Laboratory.
- Klein and Randić (1993) Klein, D. J., Randić, M., 1993. Resistance distance. Journal of Mathematical Chemistry 12 (1), 81–95.
- Lehoucq et al. (1998) Lehoucq, R. B., Sorensen, D. C., Yang, C., 1998. ARPACK users guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods. SIAM, Philadelphia.
- Lusseau and Newman (2004) Lusseau, D., Newman, M. E. J., 2004. Identifying the role that animals play in their social networks. Proceedings of the Royal Society B: Biological Sciences 271, S477–S481.
- Newman (2005) Newman, M., 2005. A measure of betweenness centrality based on random walks. Social Networks 27 (1), 39–54.
- Newman (2010) Newman, M. E. J., 2010. Networks: An introduction. Oxford University Press.
- Newman et al. (2006) Newman, M. E. J., Barabási, A.-L., Watts, D. J., 2006. The Structure and Dynamics of Networks. Princeton University Press.
- Raz (2003) Raz, R., 2003. On the complexity of matrix product. SIAM Journal on Computing 32 (5), 1356–1369.
- Saad (2003) Saad, Y., 2003. Iterative methods for sparse linear systems, 2nd Edition. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
- Sabidussi (1966) Sabidussi, G., 1966. The centrality index of a graph. Psychometrika 31, 581â603.
- Simon (1955) Simon, H. A., 1955. On a class of skew distribution functions. Biometrika 42, 425–440.
- Solomonoff and Rapoport (1951) Solomonoff, R., Rapoport, A., 1951. Connectivity of random nets. Bulletin of Mathematical Biophysics 13, 107–117.
- Stephenson and Zelen (1989) Stephenson, K., Zelen, M., 1989. Rethinking centrality: Methods and examples. Social Networks 11 (1), 1–37.
- Zhan et al. (2010) Zhan, C., Chen, G., Yeung, L. F., 2010. On the distributions of Laplacian eigenvalues versus node degrees in complex networks. Physica A: Statistical Mechanics and its Applications 389 (8), 1779–1788.