Co-Betweenness: A Pairwise Notion of Centrality
Betweenness centrality is a metric that seeks to quantify a sense of the importance of a vertex in a network graph in terms of its ‘control’ on the distribution of information along geodesic paths throughout that network. This quantity, however, does not capture how different vertices participate together in such control. In order to allow for the uncovering of finer details in this regard, we introduce here an extension of betweenness centrality to pairs of vertices, which we term co-betweenness, that provides the basis for quantifying various analogous pairwise notions of importance and control. More specifically, we motivate and define a precise notion of co-betweenness, we present an efficient algorithm for its computation, extending the algorithm of Brandes [1] in a natural manner, and we illustrate the use of co-betweenness on a handful of different communication networks. From these real-world examples, we show that co-betweenness allows one to identify certain vertices which are not the most central vertices but which nevertheless act as important actors in the relaying and dispatching of information in the network.
In social network analysis, the problem of determining the importance of actors in a network has been studied for a long time (see, for example, [2]). It is in this context that the concept of the centrality of a vertex in a network emerged. Numerous measures have been proposed to quantify centrality numerically; they differ both in the nature of the underlying notion of vertex importance that they seek to capture and in the manner in which that notion is encoded through some functional of the network graph. See, for example, [3] for a recent review and categorization of centrality measures.
Paths, as the routes by which flows (e.g., of information or commodities) travel over a network, are fundamental to the functioning of many networks. Therefore, not surprisingly, a number of centrality measures quantify importance with respect to the sharing of paths in the network. One popular measure is betweenness centrality. First introduced in its modern form by Freeman [4], the betweenness centrality is essentially a measure of how many geodesic (i.e., shortest) paths run over a given vertex. In a social network, for example, the betweenness centrality measures the extent to which an actor “lies between” other individuals in the network, with respect to the network path structure. As such, it is a measure of the control that actor has over the distribution of information in the network.
The betweenness centrality – as with all other centrality measures of which we are aware – is defined specifically with respect to a single given vertex. In particular, vertex centralities produce an ordering of the vertices in terms of their individual importance, but do not provide insight into the manner in which vertices act together in the spread of information across the network. Insight of this kind can be important in presenting an appropriately more nuanced view of the roles of the different vertices, beyond their individual importance. A first natural extension of the idea of centrality in this manner is to pairs of vertices.
In this paper, we introduce such an extension, which we term the co-betweenness centrality, or simply the co-betweenness. The co-betweenness of two vertices is essentially a measure of how many geodesic paths are shared by the vertices, and as such provides us with a sense of the interplay of vertices across the network. For example, the co-betweenness alone quantifies the extent to which pairs of vertices jointly control the distribution of information in the network. Alternatively, a standardized version of co-betweenness produces a well-defined measure of correlation between flows over the two vertices. Finally, an alternative normalization quantifies the extent to which one vertex controls the distribution of information to another vertex.
This paper is organized as follows. In Section II, we briefly review necessary technical background. In Section III, we provide a precise definition for the co-betweenness and related measures, and motivate each in the context of an Internet communication network. An algorithm for the efficient computation of co-betweenness, for all pairs of vertices in a network, is sketched in Section IV, and its properties are discussed. In Section V, we further illustrate our measures using two social networks whose ties are reflective of communication. Some additional discussion is provided in Section VI. Finally, a formal description of our algorithm, as well as pseudo-code, may be found in the appendix.
Let $G = (V, E)$ denote an undirected, connected network graph with $N_v = |V|$ vertices in $V$ and $N_e = |E|$ edges in $E$. A walk on $G$, from a vertex $v_0$ to another vertex $v_\ell$, is an alternating sequence of vertices and edges, say $\{v_0, e_1, v_1, \ldots, v_{\ell-1}, e_\ell, v_\ell\}$, where the endpoints of $e_i$ are $\{v_{i-1}, v_i\}$. The length of this walk is said to be $\ell$. A trail is a walk without repeated edges, and a path is a trail without repeated vertices. A shortest path between two vertices $u, v \in V$ is a path between $u$ and $v$ whose length is a minimum. Such a path is also called a geodesic, and its length is the geodesic distance between $u$ and $v$. In the case that the graph is weighted, i.e., there is a collection of edge weights $\{w_e\}_{e \in E}$ with $w_e > 0$, shortest paths may instead be defined as paths for which the total sum of edge weights is a minimum. In the material that follows, we will restrict our exposition primarily to the case of unweighted graphs, but extensions to weighted graphs are straightforward. For additional background of this type, see, for example, the textbook [5].
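Let $\sigma(s,t)$ denote the total number of shortest paths that connect vertices $s$ and $t$ (with $\sigma(s,s) = 1$), and let $\sigma(s,t|v)$ denote the number of shortest paths between $s$ and $t$ that also run over vertex $v$. Then we define the betweenness centrality of a vertex $v$ as a weighted sum of the number of paths through $v$,
$$B(v) = \sum_{s \neq t \in V \setminus \{v\}} \frac{\sigma(s,t|v)}{\sigma(s,t)}. \qquad (1)$$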
Note that this definition excludes the shortest paths that start or end at $v$. However, in a connected graph we will have $\sigma(s,t|v) = \sigma(s,t)$ whenever $v = s$ or $v = t$, so the exclusion amounts to removing a constant term that would otherwise be present in the betweenness centrality of every vertex.
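As a concrete check of this definition, the following minimal Python sketch (ours, not the authors' Matlab software) evaluates $B(v)$ by brute force on a small connected, unweighted graph stored as an adjacency dictionary. It sums over ordered pairs $(s,t)$ and relies on the standard fact that $\sigma(s,t|v) = \sigma(s,v)\,\sigma(v,t)$ when $v$ lies on a geodesic from $s$ to $t$, i.e., when the geodesic distances satisfy $d(s,v) + d(v,t) = d(s,t)$, and $\sigma(s,t|v) = 0$ otherwise. The function names and the five-vertex example graph (edges a-b, a-c, b-d, c-d, d-e) are hypothetical illustrations.

```python
from collections import deque
from itertools import permutations


def bfs_counts(adj, s):
    """Geodesic distances d(s, .) and shortest-path counts sigma(s, .) from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:               # first visit: w is one step further out
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:      # every geodesic to v extends to w
                sigma[w] += sigma[v]
    return dist, sigma


def betweenness(adj):
    """B(v): sum over ordered pairs s != t, both different from v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    B = {v: 0.0 for v in adj}
    for s, t in permutations(adj, 2):
        dist_s, sigma_s = info[s]
        for v in adj:
            if v in (s, t):
                continue                    # paths that start or end at v are excluded
            dist_v, sigma_v = info[v]
            if dist_s[v] + dist_v[t] == dist_s[t]:   # v lies on some s-t geodesic
                B[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return B


adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
print(betweenness(adj))  # {'a': 1.0, 'b': 2.0, 'c': 2.0, 'd': 7.0, 'e': 0.0}
```

Counting each unordered pair once instead would halve every value. The direct evaluation costs on the order of $N_v^3$ operations, which is the kind of computation the faster algorithms discussed below avoid.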
As an illustration, which we will use throughout this section and the next, consider the network in Figure 1.
This is the Abilene network, an Internet network that is part of the Internet2 project [6], a research project devoted to the development of the ‘next generation’ Internet. It serves as a so-called ‘backbone’ network for universities and research labs across the United States, in a manner analogous to the federal highway system of roads. We use this network for illustration because, as a technological communication network, the notions of connectivity, information, flows, and paths are all explicit and physical, and hence facilitate our initial discussion of betweenness and co-betweenness. Later, in Section V, we will illustrate further with two communication networks from the social network literature.
The information traversing this network takes the form of so-called ‘packets’, and the packets flow between origins and destinations on this network along paths strictly determined according to a set of underlying routing protocols. (Technically, the Abilene network is more accurately described by a directed graph. But, given that routing is typically symmetric in this network, we follow the Internet2 convention of displaying Abilene using an undirected graph.) A reasonable first approximation of the routing of information in this network is with respect to a set of unique shortest paths. In this case, the betweenness $B(v)$ of any given vertex $v$ will be exactly equal to the number of shortest paths through $v$. The vertices in Figure 1 correspond to metropolitan regions, and have been laid out roughly with respect to their true geographical locations. Intuitively, and according to earlier work on centrality in spatial networks [7], one might suspect that vertices near the central portion of the network, such as Denver or Indianapolis, have larger betweenness, as they are likely forced to support most of the flows of communication between east and west. We will see in Section III that such is indeed the case.
Until recently, standard algorithms for computing betweenness centralities for all vertices in a network had $O(N_v^3)$ running times, which was a stumbling block to their application in large-scale network analyses. Faster algorithms now exist, such as those introduced in [1], which have running time of $O(N_v N_e)$ on unweighted networks and $O(N_v N_e + N_v^2 \log N_v)$ on weighted networks, with an $O(N_v + N_e)$ space requirement. These improvements derive from exploiting a clever recursive relation for the partial sums $\delta(s|v) = \sum_{t \in V \setminus \{s, v\}} \sigma(s,t|v)/\sigma(s,t)$. As we will see, the need for efficient algorithms is even more important in the case of the co-betweenness, and we will make similar usage of recursions in developing an efficient algorithm for computing this quantity.
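We extend the concept of vertex betweenness centrality to pairs of vertices $v_1$ and $v_2$ by letting $\sigma(s,t|v_1,v_2)$ denote the number of shortest paths between vertices $s$ and $t$ that pass through both $v_1$ and $v_2$, and defining the vertex co-betweenness as
$$C(v_1,v_2) = \sum_{s \neq t \in V \setminus \{v_1, v_2\}} \frac{\sigma(s,t|v_1,v_2)}{\sigma(s,t)}. \qquad (2)$$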
Thus co-betweenness gives us a measure of the number of shortest paths that run through both vertices $v_1$ and $v_2$.
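To make the definition in (2) concrete, here is a minimal brute-force sketch, assuming a small connected, unweighted graph given as an adjacency dictionary. It enumerates every geodesic explicitly and, for each ordered pair $(s,t)$, credits $1/\sigma(s,t)$ to every pair of interior vertices of each $s$-$t$ geodesic. Explicit enumeration is only feasible on tiny graphs; the function names and the example graph (edges a-b, a-c, b-d, c-d, d-e) are hypothetical.

```python
from collections import deque


def geodesics_from(adj, s):
    """Map every vertex t to the list of all shortest s-t paths (vertex lists)."""
    dist, paths = {s: 0}, {s: [[s]]}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                paths[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:      # extend each geodesic to v by the edge (v, w)
                paths[w].extend(p + [w] for p in paths[v])
    return paths


def co_betweenness(adj):
    """C(v1, v2) keyed by frozenset({v1, v2}); pairs with zero value are omitted."""
    C = {}
    for s in adj:
        paths = geodesics_from(adj, s)
        for t in adj:
            if t == s:
                continue
            for path in paths[t]:
                interior = path[1:-1]       # s and t themselves never count
                for i, v1 in enumerate(interior):
                    for v2 in interior[i + 1:]:
                        key = frozenset((v1, v2))
                        C[key] = C.get(key, 0.0) + 1.0 / len(paths[t])
    return C


adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
print(co_betweenness(adj))
```

On this graph only the pairs {b, d} and {c, d} have non-zero co-betweenness (1.0 each, summing over ordered pairs), because the two geodesics between a and e are the only ones with more than one interior vertex.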
To gain some insight into the relation between betweenness and co-betweenness, consider the following statistical perspective. Recall the Abilene network described in the previous section, and suppose that $x_{ij}$ is a measure of the information (i.e., Internet packets) flowing between vertices $i$ and $j$ in the network. Similarly, let $y_v$ be the total information flowing through vertex $v$. Next, define $\mathbf{x}$ to be the vector of values $x_{ij}$, of length $N_f$, where $N_f$ is the total number of pairs of vertices exchanging information, and $\mathbf{y}$ to be the vector of values $y_v$, of length $N_v$. A common expression modeling the relation between these two quantities is simply $\mathbf{y} = B\mathbf{x}$, where $B$ is an $N_v \times N_f$ matrix (i.e., the so-called ‘routing matrix’) of 0’s and 1’s, indicating through which vertices each given routed path goes.
Now if $\mathbf{x}$ is considered as a random vector with uncorrelated elements of unit variance, then its covariance matrix is simply the identity matrix. The elements of $\mathbf{y}$, however, will be correlated, and their covariance matrix takes the form $BB^T$, by virtue of the linear relation between $\mathbf{x}$ and $\mathbf{y}$. Importantly, note that the diagonal elements of $BB^T$ are the betweenness values $B(v)$. Furthermore, the off-diagonal elements are the co-betweenness values $C(v_1,v_2)$. When shortest paths are not unique, the same results hold if the matrix $B$ is expanded so that each shortest path between a pair of vertices $s$ and $t$ is afforded a separate column, and the non-zero entries of each such column have the value $1/\sqrt{\sigma(s,t)}$, rather than 1, so that each of the $\sigma(s,t)$ shortest paths contributes $1/\sigma(s,t)$ to the corresponding entries of $BB^T$.
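The covariance identity can be checked numerically. The sketch below, assuming a small connected, unweighted graph and ordered pairs $(s,t)$, builds the expanded routing matrix with one column per geodesic; since each of the $\sigma(s,t)$ geodesics of a pair should contribute $1/\sigma(s,t)$ to $BB^T$, it assigns the value $1/\sqrt{\sigma(s,t)}$ to the interior vertices of each path. The helper names and the example graph (edges a-b, a-c, b-d, c-d, d-e) are hypothetical, and plain lists stand in for a matrix library.

```python
import math
from collections import deque


def geodesics_from(adj, s):
    """Map every vertex t to the list of all shortest s-t paths (vertex lists)."""
    dist, paths = {s: 0}, {s: [[s]]}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                paths[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                paths[w].extend(p + [w] for p in paths[v])
    return paths


def routing_matrix(adj):
    """Rows are vertices; one column per geodesic of each ordered pair (s, t)."""
    vertices = list(adj)
    columns = []
    for s in vertices:
        paths = geodesics_from(adj, s)
        for t in vertices:
            if t == s:
                continue
            weight = 1.0 / math.sqrt(len(paths[t]))   # 1/sqrt(sigma(s, t))
            for path in paths[t]:
                interior = set(path[1:-1])
                columns.append([weight if v in interior else 0.0 for v in vertices])
    rows = [list(r) for r in zip(*columns)]           # transpose: one row per vertex
    return vertices, rows


def gram(rows):
    """B B^T for a matrix stored as a list of rows."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in rows] for r1 in rows]


adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
vertices, rows = routing_matrix(adj)
G = gram(rows)
for v, row in zip(vertices, G):
    print(v, [round(x, 6) for x in row])
```

For this graph the diagonal of $BB^T$ reproduces the betweenness values 1, 2, 2, 7, 0 of vertices a through e, and the only non-zero off-diagonal entries belong to the pairs {b, d} and {c, d}, each equal to 1 up to floating-point rounding, matching their co-betweenness.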
To illustrate, in Figure 2, we show a network graph representation of the matrix $BB^T$ for the Abilene network.
The vertices are again placed roughly with respect to their actual geographic location, but are now drawn in proportion to their betweenness. Edges between pairs of vertices now represent non-zero co-betweenness for the pair, and are also drawn with a thickness in proportion to their value. A number of interesting features are evident from this graph. First, we see that, as surmised earlier, the more centrally located vertices tend to have the largest betweenness values. And it is these vertices that typically are involved with the larger co-betweenness values. Since the paths going through both a vertex $v_1$ and a vertex $v_2$ are a subset of the paths going through each of them individually, we have $C(v_1,v_2) \le \min\{B(v_1), B(v_2)\}$, so this tendency for large co-betweenness to associate with large betweenness should not be a surprise. Also note that the co-betweenness values tend to be smaller between vertices separated by a larger geographical distance, which again seems intuitive.
Somewhat more surprising perhaps, however, is the manner in which the network becomes disconnected. The Seattle vertex is now isolated, as there are no paths that route through that vertex; paths only originate or terminate there. Additionally, the vertices Houston, Atlanta, and Washington now form a separate component in this graph, indicating that information is routed on paths running through both the first two and the last two, but not through all three, and also not through any of these and some other vertex. Overall, one gets the impression of information being routed primarily over paths along the upper portion of the network in Figure 1. A similar observation has been made in [8], using different techniques.
While the raw co-betweenness values appear to be quite informative, one can imagine contexts in which it would be useful to compare co-betweenness values across pairs of vertices in a manner that adjusts for the unequal betweenness of the participating vertices. The value
$$\bar{C}(v_1,v_2) = \frac{C(v_1,v_2)}{\sqrt{B(v_1)\,B(v_2)}} \qquad (3)$$
is a natural candidate for a standardized version of the co-betweenness in (2), being simply the corresponding entry of the correlation matrix deriving from $BB^T$.
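As a small worked example of (3), the sketch below standardizes co-betweenness values stored in dictionaries. The input numbers are hypothetical: they are the ordered-pair betweenness and co-betweenness values of a five-vertex graph with edges a-b, a-c, b-d, c-d, d-e, worked out by hand. Pairs involving a vertex of zero betweenness are skipped to avoid division by zero; the function name is our own.

```python
import math


def standardized_co_betweenness(B, C):
    """C_bar(v1, v2) = C(v1, v2) / sqrt(B(v1) * B(v2)).

    B maps vertices to betweenness; C maps frozenset({v1, v2}) to co-betweenness.
    """
    C_bar = {}
    for pair, c in C.items():
        v1, v2 = tuple(pair)
        if B[v1] > 0 and B[v2] > 0:          # guard against dividing by zero
            C_bar[pair] = c / math.sqrt(B[v1] * B[v2])
    return C_bar


# Hypothetical inputs: ordered-pair values for the graph a-b, a-c, b-d, c-d, d-e.
B = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 7.0, "e": 0.0}
C = {frozenset({"b", "d"}): 1.0, frozenset({"c", "d"}): 1.0}
C_bar = standardized_co_betweenness(B, C)
print(C_bar)
```

Here $\bar{C}(b,d) = 1/\sqrt{2 \cdot 7} \approx 0.267$: once the heavy traffic through d is accounted for, the pair is only moderately ‘correlated’. Being an entry of a correlation matrix whose underlying entries are non-negative, $\bar{C}$ always lies between 0 and 1.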
Figure 3 shows a network graph representation of the quantities $\bar{C}(v_1,v_2)$ in (3) for the Abilene network, with edges again drawn in proportion to these values and vertices now naturally all drawn to be the same size (the diagonal entries of a correlation matrix all equal one).
Much of this network looks like that in Figure 2. The one notable exception is that the magnitude of the values $\bar{C}(v_1,v_2)$ between the three vertices in the lower subgraph component is now of a similar order to most of the other values in the other component. This may be interpreted as indicating that among themselves, adjusting for the lower levels of information flowing through this part of the network, these vertices are as strongly ‘correlated’ as many of the others.
The co-betweenness may also be used to define a directed notion of the strength of pairwise relationships. Let
$$B(v_1|v_2) = \frac{C(v_1,v_2)}{B(v_2)} \qquad (4)$$
denote the relative proportion of shortest paths through $v_2$ that also go through $v_1$. This quantity may be interpreted as a measure of the control that vertex $v_1$ has over the information that passes through vertex $v_2$. Alternatively, under uniqueness of shortest paths, if from among the set of shortest paths through $v_2$ one is chosen uniformly at random, the value $B(v_1|v_2)$ is the probability that the chosen path will also go through $v_1$. We call $B(v_1|v_2)$ the conditional betweenness of $v_1$, given $v_2$. Note that, in general, $B(v_1|v_2) \neq B(v_2|v_1)$.
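The asymmetry of (4) is easy to see numerically. This sketch takes the same kind of hypothetical dictionary inputs (ordered-pair betweenness and co-betweenness of the five-vertex graph with edges a-b, a-c, b-d, c-d, d-e, worked out by hand) and returns $B(v_1|v_2)$ keyed by the ordered pair $(v_1, v_2)$; the function name is our own.

```python
def conditional_betweenness(B, C):
    """B(v1 | v2) = C(v1, v2) / B(v2), keyed by the ordered pair (v1, v2).

    C maps frozenset({v1, v2}) to co-betweenness; both orientations are produced.
    """
    cond = {}
    for pair, c in C.items():
        u, w = tuple(pair)
        for v1, v2 in ((u, w), (w, u)):
            if B[v2] > 0:                    # conditioning vertex must carry traffic
                cond[(v1, v2)] = c / B[v2]
    return cond


# Hypothetical inputs: ordered-pair values for the graph a-b, a-c, b-d, c-d, d-e.
B = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 7.0, "e": 0.0}
C = {frozenset({"b", "d"}): 1.0, frozenset({"c", "d"}): 1.0}
cond = conditional_betweenness(B, C)
print(cond[("b", "d")], cond[("d", "b")])   # 1/7 and 1/2
```

Here $B(b|d) = 1/7$ while $B(d|b) = 1/2$: half of the geodesic traffic through b also passes through d, but only a seventh of the traffic through d passes through b, so in the sense described above d exercises more control over b than the reverse.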
Figure 4 shows a graph representation of the values $B(v_1|v_2)$ for the Abilene network.
Due to the asymmetry of these values in $v_1$ and $v_2$, arcs are used rather than edges, with an arc from $v_2$ to $v_1$ corresponding to $B(v_1|v_2)$. The thickness of the arcs is proportional to these values, and is therefore indicative of the control exercised on the vertex at the tail by the vertex at the head. For improved visualization, we have used a simple circular layout for the vertices. Examination of this figure shows symmetry in the relationships between some pairs of vertices, but a strong asymmetry between most others. For example, vertices like Indianapolis, which were seen previously to have a large betweenness, clearly exercise a strong degree of control over almost all other vertices with which they share paths. More interestingly, note that certain vertices that are neighbors in the original Abilene network have more symmetric relationships than others. The conditional betweenness values for Atlanta and Washington, DC, are fairly similar in magnitude, while those for Los Angeles and Sunnyvale are quite dissimilar, with the latter evidently exercising a noticeably greater degree of control over the former.
IV Computation of Co-Betweenness
We discuss here the calculation of the co-betweenness values in (2), for all pairs $(v_1, v_2)$, from which the other quantities in (3) and (4) follow trivially. At first glance, it would appear that a prohibitively expensive algorithm is necessary, given that the number of vertex pairs grows as the square of the number of vertices. Such an implementation would render co-betweenness infeasible to compute for all but network graphs of relatively modest size. However, by exploiting ideas similar to those underlying the algorithms of Brandes [1] for calculating the betweenness values $B(v)$, a decidedly more efficient implementation may be obtained, as we now describe briefly. Details may be found in the appendix.
Our algorithm for computing co-betweenness involves a three-stage procedure for each vertex $s \in V$. In the first stage, we perform a breadth-first traversal of the network graph $G$ to quickly compute intermediary quantities such as $\sigma(s,t)$, the number of shortest paths from a source $s$ to each other vertex $t$ in the network; in the process we form a directed acyclic graph that contains all shortest paths leading from vertex $s$. In the second stage, we iterate through each vertex $v$ in order of decreasing distance from $s$ and compute a score $\delta(s|v)$ for each vertex that is related to its contribution to the co-betweenness. These contributions are then aggregated in a depth-first traversal of the directed acyclic graph, which is carried out in the third and final stage.
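In order to compute the number of shortest paths in the first stage, we note that the number of shortest paths from $s$ to a vertex $t$ is the sum of the numbers of shortest paths to each parent of $t$ in the directed acyclic graph rooted at $s$, namely,
$$\sigma(s,t) = \sum_{u \in \mathcal{P}_s(t)} \sigma(s,u), \qquad (5)$$
where $\mathcal{P}_s(t)$ denotes the set of parents of $t$ in this graph.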
In the case of an undirected, unweighted graph, this can be computed in the course of a breadth-first search with a running time of $O(N_v + N_e)$.
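A minimal Python sketch of this first stage, assuming an unweighted, undirected graph stored as an adjacency dictionary: one breadth-first search from $s$ returns the visiting order (non-decreasing distance from $s$, which the second stage consumes in reverse), the distances, the counts $\sigma(s,t)$ accumulated over the parents of $t$, and the parent lists of the shortest-path DAG. The function name and the example graph (edges a-b, a-c, b-d, c-d, d-e) are hypothetical.

```python
from collections import deque


def shortest_path_dag(adj, s):
    """Stage 1 for source s: returns (order, dist, sigma, parents)."""
    dist, sigma, parents = {s: 0}, {s: 1}, {s: []}
    order = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)                      # sigma[v] is final once v is dequeued
        for w in adj[v]:
            if w not in dist:                # first visit: w lies one step further out
                dist[w] = dist[v] + 1
                sigma[w] = 0
                parents[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:       # v is a parent of w in the DAG
                sigma[w] += sigma[v]         # sum sigma(s, u) over the parents u of w
                parents[w].append(v)
    return order, dist, sigma, parents


adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
order, dist, sigma, parents = shortest_path_dag(adj, "a")
print(sigma)  # {'a': 1, 'b': 1, 'c': 1, 'd': 2, 'e': 2}
```

From source a, $\sigma(a,d) = \sigma(a,e) = 2$ because d can be reached through either b or c. Each edge is examined a constant number of times, which gives the linear running time per source.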
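In the second stage, we compute $\delta(s|v)$ using the recursive relation established in Theorem 6 of Brandes [1],
$$\delta(s|v) = \sum_{w \in \mathcal{C}_s(v)} \frac{\sigma(s,v)}{\sigma(s,w)}\bigl(1 + \delta(s|w)\bigr), \qquad (6)$$
where $\mathcal{C}_s(v)$ denotes the set of child vertices of $v$ in the directed acyclic graph rooted at $s$.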
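A sketch of the second stage, again assuming an unweighted graph stored as an adjacency dictionary (the function name and example graph are hypothetical). It repeats the breadth-first search so that the block is self-contained, records the children of each vertex, and then processes vertices in reverse BFS order, so every child $w$ already has its final $\delta(s|w)$ when recursion (6) uses it.

```python
from collections import deque


def dependencies(adj, s):
    """Stages 1 and 2 for one source s: returns (sigma, delta) where
    sigma[v] = sigma(s, v) and delta[v] = delta(s|v) from recursion (6)."""
    # Stage 1: breadth-first search, recording vertices by non-decreasing distance.
    dist, sigma, order = {s: 0}, {s: 1}, []
    children = {v: [] for v in adj}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                children[v].append(w)        # w is a child of v in the DAG rooted at s
    # Stage 2: visit vertices in order of decreasing distance from s.
    delta = {v: 0.0 for v in order}
    for v in reversed(order):
        if v == s:
            continue                         # delta(s|s) is never needed
        for w in children[v]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    return sigma, delta


adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
sigma, delta = dependencies(adj, "a")
print(delta)  # {'a': 0.0, 'b': 1.0, 'c': 1.0, 'd': 1.0, 'e': 0.0}
```

Summing $\delta(s|v)$ over all sources $s \neq v$ recovers the betweenness $B(v)$ (over ordered pairs), which is how Brandes' algorithm obtains betweenness; for the example graph this gives 1, 2, 2, 7, 0 for a through e.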
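Finally, in the third stage, we compute the co-betweenness values by interpreting the relation
$$C(v_1,v_2) = \sum_{s \in V \setminus \{v_1, v_2\}} \frac{\sigma(s,v_2|v_1)}{\sigma(s,v_2)}\,\delta(s|v_2), \qquad (7)$$
in which, for each source $s$, $v_1$ denotes whichever of the two vertices is closer to $s$ (sources equidistant from both contribute nothing), as assigning a contribution of $\delta(s|v_2)/\sigma(s,v_2)$ to $C(v_1,v_2)$ for each of the $\sigma(s,v_2|v_1)$ shortest paths to $v_2$ that run through $v_1$. We accumulate these contributions during the depth-first traversal: each time we visit a vertex $v_2$ along a shortest path from $s$, we add $\delta(s|v_2)/\sigma(s,v_2)$ to $C(v_1,v_2)$ for every ancestor $v_1 \neq s$ of $v_2$ on that path.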
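Putting the three stages together, here is a compact Python sketch of the procedure just described, assuming a connected, unweighted, undirected graph stored as an adjacency dictionary and summing over ordered pairs $(s,t)$. It is our reading of the description, not the authors' Matlab implementation, and the function name and example graph (edges a-b, a-c, b-d, c-d, d-e) are hypothetical. Stage 3 uses an explicit stack in which each entry represents one shortest path from $s$, so its work grows with the total number of shortest paths.

```python
from collections import deque


def co_betweenness_all_pairs(adj):
    """Return (B, C): betweenness and co-betweenness (keyed by frozenset)."""
    B = {v: 0.0 for v in adj}
    C = {}
    for s in adj:
        # Stage 1: BFS gives sigma(s, .) and the children lists of the DAG rooted at s.
        dist, sigma, order = {s: 0}, {s: 1}, []
        children = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    children[v].append(w)
        # Stage 2: dependencies delta(s|v) by recursion (6), farthest vertices first.
        delta = {v: 0.0 for v in order}
        for v in reversed(order):
            if v == s:
                continue
            for w in children[v]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            B[v] += delta[v]
        # Stage 3: depth-first enumeration of the shortest paths leaving s.
        # A stack entry (v2, ancestors) is one path; ancestors excludes s itself.
        stack = [(s, ())]
        while stack:
            v2, ancestors = stack.pop()
            if v2 != s and delta[v2] > 0:
                share = delta[v2] / sigma[v2]    # per-path contribution from (7)
                for v1 in ancestors:
                    key = frozenset((v1, v2))
                    C[key] = C.get(key, 0.0) + share
            below = ancestors if v2 == s else ancestors + (v2,)
            for w in children[v2]:
                stack.append((w, below))
    return B, C


adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
B, C = co_betweenness_all_pairs(adj)
print(B)  # {'a': 1.0, 'b': 2.0, 'c': 2.0, 'd': 7.0, 'e': 0.0}
```

On this example the sketch finds non-zero co-betweenness only for {b, d} and {c, d} (1.0 each), which agrees with a brute-force enumeration of all geodesics. Skipping vertices with $\delta(s|v_2) = 0$ only keeps zero entries out of the dictionary; it does not change any non-zero value.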
Our proposed algorithms exploit recursions analogous to those of Brandes [1] to produce run-times that are in the worst case , but in empirical studies were found to vary like in general, or in the case of sparse graphs. Here is related to the total number of shortest paths in the network and seems to lie comfortably between and in our experience. In the case of unique shortest paths, it may be shown rigorously that the running time reduces to , and if the network is sparse as well as ‘small-world’ (i.e., with diameter of size ). See the appendix for details.
V Additional Illustrations
We provide in this section additional illustrations of the use of co-betweenness, based on two other network graphs. Both graphs originally derive from social network analyses in which one goal was to understand the flow of certain information among actors.
V.1 Michael’s Strike Network
Our first illustration involves the strike dataset of Michael [9], which is also analyzed in detail in Chapter 7 of [10]. New management took over at a forest products manufacturing facility, and this management team proposed certain changes to the compensation package of the workers. The changes were not accepted by the workers, and a strike ensued, which was then followed by a halt in negotiations. At the request of management, who felt that the information about their proposed changes was not being communicated adequately, an outside consultant analyzed the communication structure among relevant actors.
The social network graph in Figure 5 represents the communication structure among these actors, with an edge between two actors indicating that they communicated at some minimally sufficient level of frequency about the strike.
Three subgroups are present in the network: younger Spanish-speaking employees (black vertices); younger English-speaking employees (gray vertices); and older English-speaking employees (white vertices). In addition, the two union negotiators, Sam and Wendle, are indicated by asterisks next to their names. It was these last two who were responsible for explaining the details of the proposed changes to the employees. When the structure of this network was revealed, two additional actors, Bob and Norm, were approached and had the changes explained to them; they then discussed the changes with their colleagues, and within two days the employees requested that their union representatives re-open negotiations. The strike was resolved soon thereafter.
That such a result could follow by targeting Bob and Norm is not entirely surprising, from the perspective of the network structure. Both are cut-vertices (i.e., their removal would disconnect the network), and both are incident to edges serving as bridges (i.e., edges whose removal would likewise disconnect the network) from their respective groups to at least one of the other groups.
Co-betweenness provides a useful alternative characterization, one which explicitly emphasizes the patterns of communication in the network, as shown in Figure 6.
As with Figure 2, vertices (now arranged in a circular layout) are drawn in proportion to their betweenness, and edges in proportion to their co-betweenness. Bob and Norm clearly have the largest betweenness values, followed by Alejandro, who, we remark, is also a cut-vertex, but incident to a bridge to a smaller subnetwork than the other two (i.e., four younger Spanish-speakers, in comparison to nine younger English-speakers and 11 older English-speakers, for Bob and Norm, respectively). The importance of these three actors to the communication process is evident from the distinct triangle formed by their large co-betweenness values. Note that for the two union representatives, the co-betweenness values suggest that Sam also plays a non-trivial role in facilitating communication, but that Wendle is not well-situated in this regard. In fact, Wendle is not even connected to the main component of the co-betweenness graph in Figure 6, since his betweenness is zero (as is also true for six other actors).
A plot of the standardized co-betweenness shows similar patterns overall, and we have therefore not included it here. The conditional betweenness for this network primarily shows most of the actors with large arcs pointing to Bob and Norm, and much smaller arcs pointing in the opposite direction. This pattern further confirms the influence that these two actors can have on the other actors in the communication process. However, there are also some interesting asymmetric relationships among the actors with smaller roles. For example, consider Figure 7, which shows the conditional betweenness among the older English-speaking employees.
Ultrecht, for example, clearly has potential for a large amount of control on the communication of information passing through Russ, and similarly, Karl, on that through John.
V.2 Zachary’s Karate Club Network
Our second illustration uses the karate club dataset of Zachary [11]. Over the course of a couple of years in the 1970s, Zachary collected information from the members of a university karate club, including the number of situations (both inside and outside of the club) in which interactions occurred between members. During the course of this study, there was a dispute between the club’s administrator and the principal karate instructor. As a result, the club eventually split into two smaller clubs of approximately equal size: one centered around the administrator and the other centered around the instructor.
Figure 8 displays the network of social interactions between club members.
The gray vertices represent members of one of the two smaller clubs and the white vertices represent members who went to the other club. The edges are drawn with a width proportional to the number of situations in which the two members interacted. The graph clearly shows that the original club was already polarized into two groups centered about actors 1 and 34, who were the key players in the dispute that split the club in two.
The co-betweenness for this network is shown in Figure 9.
As in Figure 8, the layout is done using an energy minimization algorithm. Again, as in our other examples, the co-betweenness entries are dominated by a handful of larger values. As might be expected, actors 1 and 34, who were at the center of the dispute, have the largest betweenness centralities and are also involved in the largest co-betweenness values. More interesting, however, is the fact that these two actors have a large co-betweenness with each other, despite not being directly connected in the original network graph. This indicates that they are nevertheless jointly involved in connecting a large number of other pairs, probably through key intermediaries such as actors 3 and 32. These latter two actors, while certainly not cut-vertices, nevertheless seem to operate like conduits between the two groups, quite likely due to their direct ties to both actor 1 and either of actors 33 and 34, the latter of which are both central to the group of white vertices. The co-betweenness for actors 1 and 32 is in fact the largest in the entire network.
Also of potential interest are the 14 vertices that are isolated from the network in the co-betweenness representation. Some of these vertices, such as actor 8, have strong social interactions with certain other actors (i.e., with actors 1, 2, 3 and 4), but evidently play a peripheral role in the communication patterns of the network, as evidenced by their lack of betweenness. Alternatively, there are the vertices like those representing actors 5 and 11, who have some betweenness centrality but nonetheless find themselves cut off from the connected component in the co-betweenness graph. An examination of the definition of the co-betweenness tells us that such vertices must be bridge-vertices, in the sense that they only serve to connect pairs of other vertices, i.e., they only occur in the middle of paths of length two.
We introduced in this paper the notion of co-betweenness as a natural and interpretable metric for quantifying the interplay between pairs of vertices in a network graph. As we illustrated with several real-world examples, this quantity has several interesting features. In particular, unlike the usual betweenness centrality, which orders the vertices according to their importance in the information flow on the network, co-betweenness gives additional information about the flow structure and the correlations between different actors. Using this quantity, we were able to identify vertices which are not the most central ones, but which nevertheless play a very important role in relaying information and which therefore appear as crucial vertices in the control of the information flow.
In principle, of course, one could continue to define higher-order analogues, involving three or more vertices at a time. But the computational requirements associated with calculating such analogues would soon become burdensome. In the case of triplets of vertices, one can expect algorithms analogous to those presented here to scale no better than . Additionally, we remark that, in keeping with the statistical analogy made in Section III, it is likely that the pairwise ‘correlations’ picked up by co-betweenness capture to a large extent the more important elements of vertex interplay in the network, with respect to shortest paths.
Following trends in the statistical physics literature on complex networks [12, 13], it may be of some interest to explore the statistical properties of co-betweenness in large-scale networks. Some work in this direction may be found in [14], where co-betweenness and functions thereof were examined in the context of standard network graph models. The most striking properties discovered were certain basic scaling relations with the distance between vertices.
On a final note, we point out that, while our discussion here has focused on co-betweenness for pairs of vertices in unweighted graphs, we have also developed the analogous quantities and algorithms for vertex co-betweenness on weighted graphs and for edge co-betweenness on unweighted and weighted graphs. Also see [8], where a result is given relating edge betweenness to the eigenvalues of an edge-betweenness ‘covariance’ matrix, defined in analogy to the matrix $BB^T$ in Section III.
This appendix contains details specific to the proposed algorithm for computing co-betweenness, including a derivation of key expressions and a rough analysis of algorithmic complexity. The pseudo-code can be found at [15]. Actual software implementing our algorithm, written in the Matlab software environment, is available at [16].
Appendix A Derivation of Key Expressions
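Central to our algorithm are the expressions in (6) and (7), the derivations for which we present here. Before doing so, however, we need to introduce some definitions and relations. First note that a simple combinatorial argument will show that
$$\sigma(s,t|v_1,v_2) = \begin{cases} \sigma(s,v_1)\,\sigma(v_1,v_2)\,\sigma(v_2,t), & \text{if } d(s,v_1) + d(v_1,v_2) + d(v_2,t) = d(s,t),\\ \sigma(s,v_2)\,\sigma(v_2,v_1)\,\sigma(v_1,t), & \text{if } d(s,v_2) + d(v_2,v_1) + d(v_1,t) = d(s,t),\\ 0, & \text{otherwise,} \end{cases}$$
where $d(u,v)$ denotes the geodesic distance between $u$ and $v$.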
For the sake of notational simplicity, we will assume, without loss of generality, that
$$d(s,v_1) < d(s,v_2)$$
for the remainder of this discussion, so that only the first of these cases can arise.
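The remaining quantities we need to introduce are notions of the path-dependency of vertices. In the spirit of [1], we define the “dependency” of vertices $s$ and $t$ on the vertex pair $(v_1, v_2)$ as
$$\delta(s,t|v_1,v_2) = \frac{\sigma(s,t|v_1,v_2)}{\sigma(s,t)},$$
and we define the dependency of $s$ alone on the pair of vertices as
$$\delta(s|v_1,v_2) = \sum_{t \in V \setminus \{s, v_1, v_2\}} \delta(s,t|v_1,v_2).$$
Similarly, we define the pair-wise dependency of $s$ and $t$ on a single vertex $v$ as
$$\delta(s,t|v) = \frac{\sigma(s,t|v)}{\sigma(s,t)},$$
and the dependency of $s$ alone on $v$ as
$$\delta(s|v) = \sum_{t \in V \setminus \{s, v\}} \delta(s,t|v).$$
Together with the combinatorial identity above, these definitions allow us to show that, whenever $d(s,v_1) + d(v_1,v_2) + d(v_2,t) = d(s,t)$,
$$\delta(s,t|v_1,v_2) = \frac{\sigma(s,v_1)\,\sigma(v_1,v_2)}{\sigma(s,v_2)} \cdot \frac{\sigma(s,v_2)\,\sigma(v_2,t)}{\sigma(s,t)} = \delta(s,v_2|v_1)\,\delta(s,t|v_2),$$
while otherwise both sides vanish. Summing over $t$ then gives
$$\delta(s|v_1,v_2) = \delta(s,v_2|v_1)\,\delta(s|v_2) = \frac{\sigma(s,v_2|v_1)}{\sigma(s,v_2)}\,\delta(s|v_2).$$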
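We use this result to re-express the co-betweenness defined in (2) as
$$C(v_1,v_2) = \sum_{s \in V \setminus \{v_1, v_2\}} \delta(s|v_1,v_2) = \sum_{s \in V \setminus \{v_1, v_2\}} \frac{\sigma(s,v_2|v_1)}{\sigma(s,v_2)}\,\delta(s|v_2),$$
where, for each source $s$, the labels $v_1$ and $v_2$ are assigned so that $d(s,v_1) < d(s,v_2)$, and sources equidistant from the two vertices contribute nothing. This is the relation (7).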
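The relation between $\delta(s|v_1,v_2)$ and $\delta(s|v_2)$ behind this re-expression can be spot-checked by brute force. The sketch below, assuming a small connected, unweighted graph stored as an adjacency dictionary, enumerates all geodesics from $s$ and compares $\delta(s|v_1,v_2)$ with $\sigma(s,v_2|v_1)\,\delta(s|v_2)/\sigma(s,v_2)$ for a triple satisfying $d(s,v_1) < d(s,v_2)$. The helper names and the example graph (edges a-b, a-c, b-d, c-d, d-e) are hypothetical.

```python
from collections import deque


def geodesics_from(adj, s):
    """Distances d(s, .) and, for every t, the list of all shortest s-t paths."""
    dist, paths = {s: 0}, {s: [[s]]}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                paths[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:
                paths[w].extend(p + [w] for p in paths[v])
    return dist, paths


def check_factorization(adj, s, v1, v2):
    """Return (delta(s|v1,v2), sigma(s,v2|v1)/sigma(s,v2) * delta(s|v2)).

    Assumes s is neither v1 nor v2 and d(s, v1) < d(s, v2)."""
    _, paths = geodesics_from(adj, s)

    def frac(t, *vs):
        # fraction of shortest s-t paths whose interior contains every vertex in vs
        ps = paths[t]
        return sum(all(v in p[1:-1] for v in vs) for p in ps) / len(ps)

    lhs = sum(frac(t, v1, v2) for t in adj if t not in (s, v1, v2))
    delta_v2 = sum(frac(t, v2) for t in adj if t not in (s, v2))
    rhs = frac(v2, v1) * delta_v2
    return lhs, rhs


adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
print(check_factorization(adj, "a", "b", "d"))  # (0.5, 0.5)
```

For source a with $v_1 = b$ and $v_2 = d$, both sides equal 1/2: one of the two geodesics from a to e passes through both b and d, while half of the geodesics from a to d pass through b and $\delta(a|d) = 1$.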
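Lastly, to establish the recursive relation in (6), note that for a child vertex $w \in \mathcal{C}_s(v)$ every shortest path from $s$ to $v$ gives rise to exactly one shortest path to $w$ by following the edge $(v,w)$. This means that
$$\sigma(s,w|v) = \sigma(s,v).$$
Also note that for $t \notin \mathcal{C}_s(v)$ we have
$$\sigma(s,t|v) = \sum_{w \in \mathcal{C}_s(v)} \sigma(s,t|v,w),$$
since every shortest path from $s$ to $t$ through $v$ continues through exactly one child of $v$. This allows us to decompose $\delta(s|v)$ in essentially the same manner as Brandes [1], namely,
$$\begin{aligned} \delta(s|v) &= \sum_{t \in \mathcal{C}_s(v)} \delta(s,t|v) + \sum_{t \notin \mathcal{C}_s(v)} \sum_{w \in \mathcal{C}_s(v)} \delta(s,t|v,w)\\ &= \sum_{w \in \mathcal{C}_s(v)} \bigl[\delta(s,w|v) + \delta(s|v,w)\bigr]\\ &= \sum_{w \in \mathcal{C}_s(v)} \delta(s,w|v)\bigl(1 + \delta(s|w)\bigr)\\ &= \sum_{w \in \mathcal{C}_s(v)} \frac{\sigma(s,v)}{\sigma(s,w)}\bigl(1 + \delta(s|w)\bigr), \end{aligned}$$
where the second equality uses the fact that $\delta(s,t|v,w) = 0$ whenever $t$ is itself a child of $v$, the third uses the factorization of $\delta(s|v_1,v_2)$ established above (with $v_1 = v$ and $v_2 = w$), and the last is due to the fact that, since $w$ is a child of $v$, we have $\sigma(s,w|v) = \sigma(s,v)$ and thus $\delta(s,w|v) = \sigma(s,v)/\sigma(s,w)$.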
Appendix B Algorithmic Complexity
Standard breadth-first search results put the running time for the first stage of our algorithm at , and since we touch each edge at most twice when we compute the dependency scores , the running time for the second stage is also . Since we repeat each stage for each vertex in the network, the first two stages have a running time of . The running time for the depth-first traversal, that occurs during the third stage, depends on the number and length of all shortest paths in the network. Overall, we visit every shortest path once and compute a co-betweenness contribution for each edge of every shortest path. For ‘small-world’ networks i.e., networks with an diameter, we must compute contributions, where is the total number of shortest paths in the network. So the overall running time for the algorithm is . Empirical evidence suggests that the upper bound for the average ranges from to for common random graph models, and at worst has been seen to reach in the case of a network of airports. (In the latter case, there were extreme fluctuations in so the total number of shortest paths, , might be much smaller than times this upper bound.) This suggests a running time of , though it is an open question to show this rigorously. In the case of sparse networks, where , this reduces to a running time of .
- [1] U. Brandes, Journal of Mathematical Sociology 25, 163 (2001).
- [2] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press (1994).
- [3] S.P. Borgatti and M.G. Everett, Social Networks 28, 466-484 (2006).
- [4] L.C. Freeman, Sociometry 40, 35-41 (1977).
- [5] J. Clark and D.A. Holton, A First Look at Graph Theory, World Scientific (1991).
- [6] http://www.internet2.edu/
- [7] A. Barrat, M. Barthélemy, and A. Vespignani, J. Stat. Mech. (2005) P05003.
- [8] D.B. Chua, E.D. Kolaczyk, and M. Crovella, IEEE Journal on Selected Areas in Communications, Special issue on ‘Sampling the Internet’, 24, 2263-2272 (2006).
- [9] J.H. Michael, Forest Products Journal 47, 41-45 (1997).
- [10] W. de Nooy, A. Mrvar, and V. Batagelj, Exploratory Social Network Analysis with Pajek, Cambridge University Press (Cambridge, UK, 2005).
- [11] W. Zachary, Journal of Anthropological Research 33, 452-473 (1977).
- [12] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
- [13] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Internet: A Statistical Physics Approach, Cambridge University Press (Cambridge, 2003).
- [14] D.B. Chua, PhD thesis (2007).
- [15] http://math.bu.edu/people/kolaczyk/pubs/ChuaThesis/
- [16] http://math.bu.edu/people/kolaczyk/software.html