Faster Betweenness Centrality Updates in Evolving Networks[1]

[1] This work was partially supported by DFG grants ME-3619/3-1 (FINCA) and Br 2158/11-1 within the SPP 1736 Algorithms for Big Data. A. S. acknowledges support by the RISE program of DAAD.

Elisabetta Bergamini, Karlsruhe Institute of Technology (KIT), Germany (elisabetta.bergamini@kit.edu)
Henning Meyerhenke, Karlsruhe Institute of Technology (KIT), Germany (meyerhenke@kit.edu)
Mark Ortmann, University of Konstanz, Germany (mark.ortmann@uni-konstanz.de)
Arie Slobbe, Australian National University, Australia (arieslobbe1@gmail.com)
July 3, 2019
Abstract

Finding central nodes is a fundamental problem in network analysis. Betweenness centrality is a well-known measure which quantifies the importance of a node based on the fraction of shortest paths going through it. Due to the dynamic nature of many of today's networks, algorithms that quickly update centrality scores have become a necessity. For betweenness, several dynamic algorithms have been proposed over the years, targeting different update types (incremental- and decremental-only, fully-dynamic). In this paper we introduce a new dynamic algorithm for updating betweenness centrality after an edge insertion or an edge weight decrease. Our method is a combination of two independent contributions: a faster algorithm for updating pairwise distances as well as numbers of shortest paths, and a faster algorithm for updating dependencies. Whereas the worst-case running time of our algorithm is the same as recomputation, our techniques considerably reduce the number of operations performed by existing dynamic betweenness algorithms. Our experimental evaluation on a variety of real-world networks reveals that our approach is significantly faster than the current state-of-the-art dynamic algorithms, approximately by one order of magnitude on average.

Keywords: graph algorithms, shortest paths, distances, dynamic algorithms

Subject classification: G.2.2 Graph Theory

1 Introduction

Over the last years, increasing attention has been devoted to the analysis of complex networks. A common sub-problem for many graph-based applications is to identify the most central nodes in a network. Examples include facility location [13], marketing strategies [12] and identification of key infrastructure nodes as well as disease propagation control and crime prevention [1]. As the meaning of "central" heavily depends on the context, various centrality measures have been proposed (see [4] for an overview). Betweenness centrality is a well-known measure which ranks nodes according to their participation in the shortest paths of the network. Formally, the betweenness of a node $v$ is defined as $c_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths between two nodes $s$ and $t$ and $\sigma_{st}(v)$ is the number of these paths that go through node $v$. The fastest algorithm for computing betweenness centrality is due to Brandes [6], which we refer to as BA (Brandes's algorithm). This algorithm is composed of two parts: an augmented APSP (all-pairs shortest paths) step, where pairwise distances and shortest paths are computed, and a dependency accumulation step, where the actual betweenness scores are computed. The augmented APSP is computed by running an SSSP (single-source shortest paths) computation from each node $s$, and the dependency accumulation is performed by traversing only once the edges that lie in shortest paths between $s$ and the other nodes. Therefore, BA requires $O(nm)$ time on unweighted and $O(nm + n^2 \log n)$ time on weighted graphs (i.e. the time of running $n$ SSSPs).

Networks such as the Web graph and social networks continuously undergo changes. Since an update in the graph might affect only a small fraction of nodes, recomputing betweenness with BA after each update would be very inefficient. For this reason, several dynamic algorithms have been proposed over the last years [9, 14, 11]. Like BA, these approaches usually solve two sub-tasks: the update of the augmented APSP data structures and the update of the betweenness scores. Although none of these algorithms is in general asymptotically faster than recomputation with BA, good speedups over BA have been reported for some of them, in particular for [11] and [14]. Nonetheless, an exhaustive comparison of these methods is missing in the literature.

In this paper, we only consider incremental updates, i.e. edge insertions or edge weight decreases (node insertions can be handled by treating the new node as an isolated node and adding its neighboring edges one by one). Although it might seem restrictive to only consider these kinds of updates, it is important to note that several real-world dynamic networks evolve only this way and do not shrink. For example, in a co-authorship network, a new author (node) or a new edge (coauthored publication) might be added to the network, but existing nodes or edges will not disappear. Another possible application is the centrality maximization problem, which consists of finding a set of edges that, if added to the graph, would maximize the centrality of a certain node. The problem can be approximated with a heuristic [7], which requires adding several edges to the graph and recomputing distances after each edge insertion.

Our contribution

We present a new algorithm for updating betweenness centrality after an edge insertion or an edge weight decrease. Our method is a combination of two contributions: a new dynamic algorithm for the augmented APSP, and a new approach for updating the betweenness scores. Based on properties of the newly-created shortest paths, our dynamic APSP algorithm efficiently identifies the node pairs affected by the edge update (i.e. those for which the distance and/or number of shortest paths change as a consequence of the update). The betweenness update method works by accumulating values in a fashion similar to that of BA. However, differently from BA, our method only processes nodes that lie in shortest paths between affected pairs.

We compare our new approach with two of the dynamic algorithms for which the best speedups over recomputation have been reported in the literature, i.e. KWCC [11] and KDB [14]. Compared to them, our algorithm for the augmented APSP update is asymptotically faster on dense graphs: $O(n^2)$ in the worst case versus $O(nm)$. This is due to the fact that we iterate over the edges between affected nodes only once, whereas KDB and KWCC do it several times. Moreover, our dependency update also works for weighted graphs (whereas KDB does not) and it is asymptotically faster than the dependency update of KWCC for sparse graphs ($O(nm)$ in the worst case versus $O(n^3)$).

Our experimental evaluation on a variety of real-world networks reveals that our approach is significantly faster than both KDB and KWCC, on average by a factor 14.7 and 7.4, respectively.

2 Preliminaries

2.1 Notation

Let $G = (V, E, \omega)$ be a graph with node set $V$, edge set $E$ and edge weights $\omega \colon E \to \mathbb{R}_{>0}$. In the following we will use $n$ to denote the number of nodes and $m$ for the number of edges. Let $d(s,t)$ be the shortest-path distance between any two nodes $s, t \in V$. On a shortest path from $s$ to $t$ in $G$, we say $p$ is a predecessor of $t$, or $t$ is a successor of $p$, if $(p,t) \in E$ and $d(s,t) = d(s,p) + \omega(p,t)$. We denote the set of predecessors of $t$ in the shortest paths from $s$ as $P_s(t)$. For a given source node $s$, we call the graph composed of the nodes reachable from $s$ and the edges that lie in at least one shortest path from $s$ to any other node the SSSP DAG of $s$. We use $\sigma_{st}$ to denote the number of shortest paths between $s$ and $t$ and we use $\sigma_{st}(v)$ for the number of shortest paths between $s$ and $t$ that go through $v$. Then, the betweenness centrality of a node $v$ is defined as $c_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$.

Our goal is to keep track of the betweenness scores of all nodes after an update in the graph, which could either be an edge insertion or an edge weight decrease. We use $G'$ to denote the new graph after the edge update and $d'$, $\sigma'$ and $P'_s$ to denote the new distances, numbers of shortest paths and sets of predecessors, respectively. Also, we define the set of affected sources of a node $t$ as $S(t) := \{s \in V : d'(s,t) < d(s,t) \vee \sigma'(s,t) \neq \sigma(s,t)\}$. Analogously, we define the set of affected targets of $s$ as $T(s) := \{t \in V : d'(s,t) < d(s,t) \vee \sigma'(s,t) \neq \sigma(s,t)\}$. In the following we will assume $G$ to be directed. However, the algorithms can be easily extended to undirected graphs.

2.2 Related Work

The basic idea of dynamic betweenness algorithms is to keep track of the old betweenness scores (and additional data structures) and efficiently update the information after some modification in the graph. Based on the type of updates they can handle, dynamic algorithms are classified as incremental (only edge insertions and weight decreases), decremental (only edge deletions and weight increases) or fully-dynamic (all kinds of edge updates). However, one commonality of all these approaches is that they build on the techniques used by BA [6], which we therefore describe in Section 3 in more detail.

The approach proposed by Green et al. [9] for unweighted graphs maintains all previously calculated betweenness values and additional information, such as pairwise distances, numbers of shortest paths and lists of predecessors of each node in the shortest paths from each source node $s$. Using this information, the algorithm tries to limit the recomputation to the nodes whose betweenness has been affected by the edge insertion. Kourtellis et al. [14] modify the approach by Green et al. [9] in order to reduce the memory requirements from $O(nm)$ to $O(n^2)$. Instead of being stored, the predecessors are recomputed every time the algorithm requires them. The authors show that using less memory not only allows them to scale to larger graphs, but also that their approach (which we refer to as KDB, from the authors' initials) turns out to be faster than the one by Green et al. [9] in practice (most likely because of the cost of maintaining the data structures of the algorithm by Green et al.).

Kas et al. [11] extend an existing algorithm for the dynamic all-pairs shortest paths (APSP) problem by Ramalingam and Reps [21] to also update betweenness scores. Differently from the previous two approaches, this algorithm can also handle weighted graphs. Although good speedups have been reported for this approach, no experimental evaluation compares its performance with that of the approaches by Green et al. [9] and Kourtellis et al. [14]. We refer to this algorithm as KWCC, from the authors' initials.

Nasre et al. [19] compare the distances between each node pair before and after the update and then recompute the dependencies from scratch as in BA (see Section 3). Although this algorithm is faster than recomputation on some graph classes (i.e. when only edge insertions are allowed and the graph is sparse and weighted), it was shown in [3] that its practical performance is much worse than that of the algorithm proposed by Green et al. [9]. This is quite intuitive, since recomputing all dependencies requires as much work as the dependency accumulation of BA, independently of the number of nodes that are actually affected by the insertion.

Pontecorvi and Ramachandran [20] extend existing fully-dynamic APSP algorithms with new data structures to update all shortest paths and then recompute the dependencies as in BA. To our knowledge, this algorithm has never been implemented, probably because of the quite complicated data structures it requires. Also, since it recomputes the dependencies from scratch as Nasre et al. [19] do, we expect its practical performance to be similar.

Differently from the other algorithms, the approach by Lee et al. [16] is not based on dynamic APSP algorithms. The idea is to decompose the graph into its biconnected components and then recompute the betweenness values from scratch only for the nodes in the component affected by the update. Although this allows for a smaller memory requirement (linear in the size of the graph versus the $\Theta(n^2)$ needed by the other approaches), the speedups on recomputation reported in [16] are significantly worse than those reported, for example, by Kourtellis et al. [14].

To summarize, KDB [14] and KWCC [11] are the most promising methods for a comparison with our new algorithm. For this reason, we will describe them in more detail in Section 4 and Section 5 and evaluate them in our experiments.

Since computing betweenness exactly can be too expensive for large networks, several approximation algorithms and heuristics have been introduced in the literature [5, 8, 22, 23] and, recently, also dynamic algorithms that update an approximation of betweenness centrality have been proposed [2, 3, 10, 23]. However, we will not consider them in our experimental evaluation since our focus here is on exact methods.

3 Brandes’s algorithm (Ba)

Betweenness centrality can be easily computed in $O(n^3)$ time by simply applying its definition. In 2001, Brandes proposed an algorithm (BA) [6] which requires $O(nm)$ time for unweighted and $O(nm + n^2 \log n)$ time for weighted graphs, i.e. the time of computing $n$ single-source shortest paths (SSSPs). The algorithm is composed of two parts: the augmented APSP computation phase based on SSSPs and the dependency accumulation phase. As dynamic algorithms based on BA build on these two steps as well, we explain them now in more detail.

Augmented APSP

In this first part, BA needs to perform an augmented APSP, meaning that instead of simply computing the distances $d(s,t)$ between all node pairs, it also finds the numbers of shortest paths $\sigma_{st}$ and the sets of predecessors $P_s(t)$. This can be done while computing an SSSP from each node $s$ (i.e. a BFS for unweighted and Dijkstra for weighted graphs). When a node $t$ is extracted from the SSSP (priority) queue, BA computes $P_s(t)$ as $\{p \in V : (p,t) \in E \wedge d(s,t) = d(s,p) + \omega(p,t)\}$ and $\sigma_{st}$ as $\sum_{p \in P_s(t)} \sigma_{sp}$.
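To make this bookkeeping concrete, the following minimal sketch (our own C++ illustration for the unweighted case, not the code of [6] or NetworKit) performs the augmented BFS from one source:

#include <queue>
#include <vector>

// Augmented BFS from source s on an unweighted graph given as an
// adjacency list: computes d(s,.), sigma_{s.} and P_s(.).
struct AugmentedSSSP {
    std::vector<int> dist;                 // d(s,t); -1 if t is unreachable
    std::vector<double> sigma;             // number of shortest s-t paths
    std::vector<std::vector<int>> pred;    // P_s(t)
    std::vector<int> order;                // nodes in non-decreasing distance
};

AugmentedSSSP augmentedBFS(const std::vector<std::vector<int>>& adj, int s) {
    int n = adj.size();
    AugmentedSSSP r{std::vector<int>(n, -1), std::vector<double>(n, 0.0),
                    std::vector<std::vector<int>>(n), {}};
    std::queue<int> q;
    r.dist[s] = 0;
    r.sigma[s] = 1.0;
    q.push(s);
    while (!q.empty()) {
        int t = q.front(); q.pop();
        r.order.push_back(t);
        for (int w : adj[t]) {
            if (r.dist[w] == -1) {             // w discovered for the first time
                r.dist[w] = r.dist[t] + 1;
                q.push(w);
            }
            if (r.dist[w] == r.dist[t] + 1) {  // t is a predecessor of w
                r.sigma[w] += r.sigma[t];      // sigma_{sw} = sum over p in P_s(w)
                r.pred[w].push_back(t);
            }
        }
    }
    return r;
}

Running this from every source yields the augmented APSP; the order vector already records the processing order needed by the dependency accumulation below.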

Dependency accumulation

Brandes defines the one-side dependency of a node $s$ on a node $v$ as $\delta_s(v) := \sum_{t \neq s,v} \frac{\sigma_{st}(v)}{\sigma_{st}}$. It can be proven [6] that

$$\delta_s(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \left( 1 + \delta_s(w) \right). \qquad (1)$$

Intuitively, the term $\frac{\sigma_{sv}}{\sigma_{sw}} \delta_s(w)$ in Eq. (1) represents the contribution of the sub-DAG (of the SSSP DAG of $s$) rooted in $w$ to the betweenness of $v$, whereas the term $\frac{\sigma_{sv}}{\sigma_{sw}}$ is the contribution of $w$ itself. For all nodes $t$ such that $\{w : t \in P_s(w)\} = \emptyset$ (i.e. the nodes that have no successors), we know that $\delta_s(t) = 0$. Starting from these nodes, we can compute $\delta_s$ by "walking up" the SSSP DAG rooted in $s$, using Eq. (1). Notice that it is fundamental that we process the nodes in order of decreasing distance from $s$, because to correctly compute $\delta_s(v)$, we need to know $\delta_s(w)$ for all successors $w$ of $v$. This can be done by inserting the nodes into a stack as soon as they are extracted from the SSSP (priority) queue in the first step. The betweenness of $v$ is then simply computed as $c_B(v) = \sum_{s \neq v} \delta_s(v)$.
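Continuing the sketch above (and reusing its AugmentedSSSP structure), the accumulation for one source could look as follows; summing the calls over all sources $s$ yields $c_B$:

#include <vector>

// Dependency accumulation for one source s, following Eq. (1).
// Assumes the AugmentedSSSP structure from the previous sketch.
void accumulate(const AugmentedSSSP& a, int s, std::vector<double>& betweenness) {
    std::vector<double> delta(a.sigma.size(), 0.0);  // delta_s(.)
    // Process nodes in decreasing distance from s, so that the dependency
    // of every successor of a node is final before the node is reached.
    for (auto it = a.order.rbegin(); it != a.order.rend(); ++it) {
        int w = *it;
        for (int p : a.pred[w])                      // p in P_s(w)
            delta[p] += (a.sigma[p] / a.sigma[w]) * (1.0 + delta[w]);
        if (w != s)
            betweenness[w] += delta[w];              // c_B(w) = sum_s delta_s(w)
    }
}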

4 Dynamic augmented APSP

As mentioned in Section 3, also dynamic algorithms based on BA build on its two steps. In the following, we will see how KDB [14] and KWCC [11] update the augmented APSP data structures (i.e. distances and number of shortest paths) after an edge insertion or a weight decrease. One difference between these two approaches is that KDB does not store the predecessors explicitly, whereas KWCC does. However, since in [14] it was shown that keeping track of the predecessors only introduces overhead, we report a slightly-modified version of KWCC that recomputes them “on the fly” when needed (we will also use this version in our experiments in Section 7). We will then introduce our new approach in Section 4.3.

4.1 Algorithm by Kourtellis et al. (KDB)

Let $(u,v)$ be the new edge inserted into $G$ (we recall that KDB works only on unweighted graphs, so edge weight modifications are not supported). For each source node $s$, there are three possibilities: $d(s,v) < d(s,u) + 1$, $d(s,v) = d(s,u) + 1$ and $d(s,v) > d(s,u) + 1$ (in undirected graphs, where both orientations of the new edge need to be considered, let us assume that $d(s,u) \leq d(s,v)$ without loss of generality). We recall that $d$ denotes the distances before the edge insertion.

In the first case ($d(s,v) < d(s,u) + 1$), it is easy to see that the insertion does not affect any shortest path rooted in $s$, and therefore nothing needs to be updated for $s$.

In case $d(s,v) = d(s,u) + 1$, the distance between $s$ and the other nodes is not affected, since there already existed a shortest path from $s$ to $v$ of the same length. However, the insertion creates new shortest paths from $s$ through $(u,v)$ to $v$ and consequently to all the nodes in the sub-DAG (of the SSSP DAG from $s$) rooted in $v$. To account for this, for each of these nodes $t$, we add $\sigma_{su} \cdot \sigma_{vt}$ to the old value of $\sigma_{st}$ (where $\sigma_{su} \cdot \sigma_{vt}$ is the number of new shortest paths between $s$ and $t$ going through $(u,v)$).

Finally, in case $d(s,v) > d(s,u) + 1$, a part of the sub-DAG rooted in $v$ might get closer to $s$. This case is handled with a BFS traversal rooted in $v$. In the traversal, all neighbors $w$ of a node $x$ extracted from the BFS queue are examined and the ones such that $d(s,w) \geq d(s,x) + 1$ are also enqueued. For each traversed node $x$, the new distance is computed as $d'(s,x) = \min_{p : (p,x) \in E} d'(s,p) + 1$ and the new number of shortest paths as $\sigma'_{sx} = \sum_{p \in P'_s(x)} \sigma'_{sp}$.

4.2 Algorithm by Kas et al. (KWCC)

Figure 1: Insertion of edge $(u,v)$.

KWCC updates the augmented APSP based on a dynamic APSP algorithm by Ramalingam and Reps [21]. Instead of checking for each source $s$ whether the new edge (or the weight decrease) changes the SSSP DAG rooted in $s$, KWCC first identifies the affected sources $S(v)$. These are exactly the nodes for which there is some change in the SSSP DAG. The affected sources are identified by running a pruned BFS rooted in $u$ on $G$ transposed (i.e. the graph obtained by reversing the direction of the edges in $G$). For each node $s$ traversed in the BFS, KWCC checks whether the in-neighbors of $s$ are also affected sources and, if not, it does not continue the traversal from them. Notice that even on weighted graphs a (pruned) BFS is sufficient, since we already know all distances to $u$ and $v$ and we can basically sidestep the use of a priority queue.
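As an illustration, this pruned BFS can be sketched as follows for the unweighted case (our own sketch under the assumption that the full distance matrix is stored, with unreachable pairs holding a large sentinel value):

#include <queue>
#include <vector>

// Pruned BFS on the transposed graph: collects the affected sources S(v)
// after the insertion of the unweighted edge (u, v). adjT[x] lists the
// in-neighbors of x; dist is the stored all-pairs distance matrix.
std::vector<int> findAffectedSources(const std::vector<std::vector<int>>& adjT,
                                     const std::vector<std::vector<int>>& dist,
                                     int u, int v) {
    int n = adjT.size();
    std::vector<char> visited(n, 0);
    std::vector<int> affected;
    std::queue<int> q;
    q.push(u);                   // u itself is affected: d(u,u) + 1 <= d(u,v)
    visited[u] = 1;
    while (!q.empty()) {
        int s = q.front(); q.pop();
        affected.push_back(s);
        for (int z : adjT[s]) {  // z reaches u through s in one more hop
            if (!visited[z] && dist[z][u] + 1 <= dist[z][v]) {
                visited[z] = 1;  // z is an affected source, continue from it
                q.push(z);
            } // otherwise the traversal is pruned at z
        }
    }
    return affected;
}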

Once all affected sources are identified, KWCC starts a pruned BFS rooted in $v$ for each of them. In the pruned BFS for an affected source $s$, only the nodes $t$ such that $d(s,u) + \omega'(u,v) + d(v,t) \leq d(s,t)$ are traversed (the affected targets of $s$). The new distance $d'(s,t)$ is set to $d(s,u) + \omega'(u,v) + d(v,t)$ and the new number of shortest paths $\sigma'(s,t)$ is obtained by adding $\sigma_{su} \cdot \sigma_{vt}$, as in KDB. Compared to KDB, the augmented APSP update of KWCC requires fewer operations. First, it efficiently identifies the affected sources instead of checking all nodes. Second, KDB might traverse more nodes than KWCC. For example, assume $(u,v)$ is a new edge and the resulting SSSP DAG of $s$ is as in Figure 1. Then, KWCC will prune the BFS at the nodes whose distance and number of shortest paths from $s$ are unchanged, skipping the sub-DAGs rooted in them. On the contrary, KDB will traverse those sub-DAGs entirely, although neither the distances nor the numbers of shortest paths from $s$ to their nodes are affected. The reason for this will be made clearer in Section 5.1.

4.3 Faster augmented APSP update

Figure 2: Affected targets (in green) and affected sources.

To explain our idea for improving the APSP update step, let us start with an example, shown in Figure 2. The insertion of $(u,v)$ decreases the distance from the affected sources to all the nodes shown in green. KWCC would first identify the affected sources and, for each of them, run a pruned BFS rooted in $v$. This means we are repeating almost exactly the same procedure for each of the affected sources. We clearly have to update the distances and numbers of shortest paths between each affected source and the affected targets (and this cannot be avoided). However, KWCC also goes through the outgoing edges of each affected target multiple times, once for each affected source, leading to a worst-case running time of $O(nm)$.[2] Our basic idea is to avoid this redundancy; it is based on the following proposition (a similar result was proven also in [18]).

[2] Notice that this is true also for KDB, with the difference that KDB starts a BFS from each node instead of first identifying the affected sources, and that it also visits additional nodes.

Proposition 1. Let $t \in V$ and let $p \in P'_v(t)$ be a predecessor of $t$ in the new SSSP DAG rooted in $v$. Then, $S(t) \subseteq S(p)$.

Proof.

Let $s$ be any node in $S(t)$, i.e. either $d'(s,t) = d(s,t)$ and $\sigma'(s,t) \neq \sigma(s,t)$ (case (i)), or $d'(s,t) < d(s,t)$ (case (ii)). We want to show that $s \in S(p)$.

Before proving this, we show that $(u,v)$ has to be in the new shortest paths between $s$ and $t$. In fact, if $s \in S(t)$, there have to be new shortest paths between $s$ and $t$ going through $(u,v)$, i.e. $d'(s,t) = d(s,u) + \omega'(u,v) + d'(v,t)$. On the other hand, we know $d'(s,t) \leq d(s,t)$ and thus

$$d(s,u) + \omega'(u,v) + d'(v,t) \leq d(s,t). \qquad (2)$$

Now, $d'(v,t)$ cannot be larger than $d(v,t)$, or this would mean that the update has increased the distance between $v$ and $t$, which is impossible for an edge insertion or weight decrease. Also, $d'(v,t)$ cannot be smaller than $d(v,t)$, since a shorter path from $v$ to $t$ would have to use $(u,v)$ and would therefore contain a cycle through $v$. Thus, $d'(v,t) = d(v,t)$. If we substitute this in Eq. (2), we obtain $d(s,u) + \omega'(u,v) + d(v,t) \leq d(s,t)$, which means $d'(s,t) = d(s,u) + \omega'(u,v) + d(v,t)$. Moreover, since $p \in P'_v(t)$, at least one of the new shortest paths between $s$ and $t$ goes through $(u,v)$ and then through $p$, which implies $d'(s,t) = d'(s,p) + \omega(p,t)$.

Now, let us consider case (i). We have two options: either $p$ was a predecessor of $t$ from $s$ also before the edge update, i.e. $p \in P_s(t)$, or it was not. If it was not, it means $d(s,p) + \omega(p,t) > d(s,t)$, which implies $d'(s,p) = d'(s,t) - \omega(p,t) = d(s,t) - \omega(p,t) < d(s,p)$ and thus $s \in S(p)$. If it was, we can similarly show that $d'(s,p) = d(s,p)$. Since we have seen before that the new shortest paths between $s$ and $t$ go through $(u,v)$ and through $p$, there has to be at least one new shortest path from $s$ to $p$ in $G'$ going through $(u,v)$, which means $\sigma'(s,p) \neq \sigma(s,p)$ and therefore $s \in S(p)$.

Case (ii) can be easily proven by contradiction. We know $d(s,t) \leq d(s,p) + \omega(p,t)$ (by the triangle inequality) and that $d'(s,t) = d'(s,p) + \omega(p,t)$. Thus, if it were true that $d'(s,p) = d(s,p)$, then

$$d'(s,t) = d(s,p) + \omega(p,t) \geq d(s,t), \qquad (3)$$

which contradicts our hypothesis that $d'(s,t) < d(s,t)$ (case (ii)). Thus, $d'(s,p) \neq d(s,p)$. Since pairwise distances in $G'$ can only be equal to or shorter than pairwise distances in $G$, $d'(s,p) \neq d(s,p)$ implies $d'(s,p) < d(s,p)$ and thus $s \in S(p)$. ∎

In particular, this implies that $S(t) \subseteq S(p(t))$ for every node $t$ traversed by a BFS from $v$ along the new SSSP DAG of $v$, where $p(t)$ denotes the predecessor from which $t$ was enqueued. Consequently, it is sufficient to compute the affected sources and the affected targets once, via two pruned BFSs. Our approach is described in Algorithm 1. The pruned BFS to compute $S(v)$ is performed in Line 3. Then, a pruned BFS from $v$ is executed, whereby for each traversed node $w$ we store the predecessor $p(w)$ from which it was enqueued (Line 25).

Let $\tilde{d}(s,t)$ be the length of a shortest path between $s$ and $t$ going through $(u,v)$, i.e. $\tilde{d}(s,t) = d(s,u) + \omega'(u,v) + d(v,t)$. To finally compute $d'(s,t)$ and $\sigma'(s,t)$, all that is left to do is to test whether $\tilde{d}(s,t) \leq d(s,t)$ for each $s \in S(p(t))$ once we remove $t$ from the queue (Lines 11 - 12). Note that Proposition 1 guarantees that all affected sources of $t$ are among these candidates and that $d'(s,p(t))$ was already computed. In case $\tilde{d}(s,t) < d(s,t)$, the path from $s$ to $t$ via edge $(u,v)$ is shorter than before and therefore we set $d'(s,t)$ to $\tilde{d}(s,t)$ and $\sigma'(s,t)$ to $\sigma_{su} \cdot \sigma_{vt}$, since all new shortest paths now go through $(u,v)$. Also in case of equality ($\tilde{d}(s,t) = d(s,t)$), $s$ is in $S(t)$, since its number of shortest paths has changed. Consequently, we set $\sigma'(s,t)$ to $\sigma_{st} + \sigma_{su} \cdot \sigma_{vt}$ (since in this case also the old shortest paths are still valid). If $\tilde{d}(s,t) > d(s,t)$, the edge $(u,v)$ does not lie on any shortest path from $s$ to $t$, hence $s \notin S(t)$ (and $s$ is not added to $S(t)$ in Lines 18 - 20).

Input : Graph G = (V, E, ω), edge insertion or weight decrease (u,v) with new weight ω'(u,v), old values d and σ
Output : Updated d', σ'
Assume : Initially d'(s,t) = d(s,t) and σ'(s,t) = σ(s,t) for all s, t ∈ V
1 vis(t) ← false for all t ∈ V;
2 if ω'(u,v) ≤ d(u,v) then
3        S(v) ← findAffectedSources(G, (u,v), d);
4        S(t) ← ∅ for all t ≠ v;
5        p(v) ← v;
6        Q ← empty FIFO queue;
7        Q.push(v);
8        vis(v) ← true;
9        while Q ≠ ∅ do
10               t ← Q.pop();
11               foreach s ∈ S(p(t)) do
12                      if d(s,u) + ω'(u,v) + d(v,t) ≤ d(s,t) then
13                             if d(s,u) + ω'(u,v) + d(v,t) < d(s,t) then
14                                    d'(s,t) ← d(s,u) + ω'(u,v) + d(v,t);
15                                    σ'(s,t) ← 0;
16                             end if
17                             σ'(s,t) ← σ'(s,t) + σ(s,u) · σ(v,t);
18                             if t ≠ v then
19                                    S(t) ← S(t) ∪ {s};
20                             end if
21                      end if
22               end foreach
23               foreach w s.t. (t,w) ∈ E and d(v,w) = d(v,t) + ω(t,w) do
24                      if not vis(w) and S(t) ≠ ∅ then
25                             p(w) ← t;
26                             Q.push(w);
27                             vis(w) ← true;
28                      end if
29               end foreach
30        end while
31 end if
Algorithm 1 Augmented APSP update
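For concreteness, the main loop of Algorithm 1 could be implemented along the following lines for unweighted graphs (our own sketch, not the NetworKit code: dist and sigma are the stored n x n matrices with dist[x][x] = 0 and sigma[x][x] = 1, unreachable entries hold the sentinel n, and findAffectedSources is the routine sketched in Section 4.2):

#include <queue>
#include <vector>

std::vector<int> findAffectedSources(const std::vector<std::vector<int>>&,
                                     const std::vector<std::vector<int>>&,
                                     int, int);  // see the sketch in Section 4.2

// Sketch of Algorithm 1 after inserting the unweighted edge (u, v).
void updateAPSP(const std::vector<std::vector<int>>& adj,   // out-neighbors
                const std::vector<std::vector<int>>& adjT,  // in-neighbors
                std::vector<std::vector<int>>& dist,
                std::vector<std::vector<double>>& sigma,
                int u, int v) {
    int n = adj.size();
    if (dist[u][v] < 1) return;                  // nothing can get closer
    std::vector<std::vector<int>> S(n);          // S(t) for every target t
    S[v] = findAffectedSources(adjT, dist, u, v);
    std::vector<char> vis(n, 0);
    std::vector<int> p(n, -1);                   // BFS predecessor p(t)
    std::queue<int> q;
    p[v] = v; vis[v] = 1; q.push(v);
    while (!q.empty()) {
        int t = q.front(); q.pop();
        for (int s : S[p[t]]) {                  // candidates: S(p(t)), a superset of S(t)
            int viaNew = dist[s][u] + 1 + dist[v][t];  // path through (u,v)
            if (viaNew > dist[s][t]) continue;   // pair (s,t) is not affected
            double newPaths = sigma[s][u] * sigma[v][t];
            if (viaNew < dist[s][t]) {           // strictly shorter: replace
                dist[s][t] = viaNew;
                sigma[s][t] = newPaths;
            } else {                             // equal length: add the new paths
                sigma[s][t] += newPaths;
            }
            if (t != v) S[t].push_back(s);
        }
        // Continue along the (unchanged) SSSP DAG of v, pruning where S(t) is empty.
        for (int w : adj[t]) {
            if (!vis[w] && !S[t].empty() && dist[v][w] == dist[v][t] + 1) {
                vis[w] = 1; p[w] = t; q.push(w);
            }
        }
    }
}

Note that dist[s][u], sigma[s][u], dist[v][t] and sigma[v][t] are never modified by the update itself (a shorter path to u or from v would have to contain a cycle through the new edge), so reading them from the matrices being updated is safe.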

5 Dynamic dependency accumulation

After updating distances and number of shortest paths, dynamic algorithms need to update the betweenness scores. This means increasing the score of all nodes that lie in new shortest paths, but also decreasing that of nodes that used to be in old shortest paths between affected nodes. Again, we will first see how KDB and KWCC update the dependencies and then we will present our new approach in Section 5.3.

5.1 Algorithm by Kourtellis et al. (KDB)

In addition to $d$ and $\sigma$, KDB keeps track of the old dependencies $\delta_s(w)$. The dependency update is done in a way similar to BA (see Section 3). Also in this case, nodes are processed in decreasing order of their new distance from $s$ (otherwise it would not be possible to apply Eq. (1)). However, in this case we would only like to process nodes for which the dependency has actually changed. To do this, while still making sure that the nodes are processed in the right order, KDB replaces the stack used in BA with a bucket list. Every node that is traversed during the APSP update is inserted into the bucket list in a position equal to its new distance from $s$. Then, nodes are extracted from the bucket list starting from the ones with maximum distance. Every time a node $w$ is extracted, we compute its new dependency $\delta'_s(w)$ according to Eq. (1), applied to the updated distances and numbers of shortest paths. Since we are processing the nodes in order of decreasing new distance, we can be sure that $\delta'_s(w)$ is computed correctly. The score of $w$ is then updated by adding the new dependency $\delta'_s(w)$ and subtracting the old $\delta_s(w)$, which was previously stored. Also, all predecessors of $w$ that are not in the bucket list yet are inserted at level $d'(s,w) - 1$. Notice that, in the example in Figure 1, all the nodes in the sub-DAGs below the inserted edge are necessary to compute the new dependencies of the affected nodes above them, although they have not been affected by the insertion. This is why they are traversed during the APSP update.

5.2 Algorithm by Kas et al. (KWCC)

KWCC does not store dependencies. On the contrary, for every node pair $(s,t)$ for which either $d(s,t)$ or $\sigma_{st}$ has been affected by the insertion, all the nodes in the new shortest paths and the ones in the old shortest paths between $s$ and $t$ are processed. More specifically, starting from $t$, all the nodes $p \in P'_s(t)$ are inserted into a queue. When a node $w$ is extracted, we increase its betweenness by $\sigma'_{sw} \cdot \sigma'_{wt} / \sigma'_{st}$ (i.e. the fraction of new shortest paths between $s$ and $t$ going through $w$). Then, $w$ also enqueues all nodes in $P'_s(w)$ and the process is repeated until we reach $s$. Decreasing the betweenness of nodes in the old paths is done in a similar fashion, with the only difference that nodes in $P_s(w)$ are enqueued (instead of nodes in $P'_s(w)$) and that $\sigma_{sw} \cdot \sigma_{wt} / \sigma_{st}$ is subtracted from the scores of the processed nodes. Notice that the worst-case complexity of this approach is $O(n^3)$, whereas that of KDB is $O(nm)$. This cubic running time is due to the fact that, for each affected node pair (at most $O(n^2)$ many), there could be up to $O(n)$ nodes lying in either one of the old or new shortest paths between $s$ and $t$. This means that, if many nodes are affected, KWCC can even be slower than recomputation with BA. On the other hand, we have seen in Section 4.2 that KDB also processes nodes for which the betweenness has not changed (see Figure 1 and its explanation), which in some cases might result in a higher running time than KWCC.

5.3 Faster betweenness update

We propose a new approach for updating the betweenness scores. As KWCC, we do not store the old dependencies (resulting in a lower memory requirement) and we only process the nodes whose betweenness has actually been affected. However, we do this by accumulating contributions of nodes only once for each affected source, in a fashion similar to KDB. For an affected source $s \in S(v)$ and for any node $w$, let us define $\hat{\delta}_s^{old}(w)$ as $\sum_{t \in T(s)} \frac{\sigma_{st}(w)}{\sigma_{st}}$. This is the contribution to the dependency $\delta_s(w)$ of the targets whose old shortest paths from $s$ went through $w$ and which have been affected by the edge update. Analogously, we can define $\hat{\delta}_s^{new}(w)$ as $\sum_{t \in T(s)} \frac{\sigma'_{st}(w)}{\sigma'_{st}}$. Then, the new dependency can be expressed as:

$$\delta'_s(w) = \delta_s(w) - \hat{\delta}_s^{old}(w) + \hat{\delta}_s^{new}(w). \qquad (4)$$

Notice that for all nodes $t \notin T(s)$, $\sigma'_{st} = \sigma_{st}$ and $\sigma'_{st}(w) = \sigma_{st}(w)$, therefore their contribution to $\delta_s(w)$ is not affected by the edge update. The new betweenness can then be computed as $b'(w) = b(w) + \sum_{s \in S(v)} \left( \hat{\delta}_s^{new}(w) - \hat{\delta}_s^{old}(w) \right)$. The following theorem allows us to compute $\hat{\delta}_s^{old}$ and $\hat{\delta}_s^{new}$ efficiently.

Theorem 5.3. For any $w \in V$:

$$\hat{\delta}_s^{old}(w) = \sum_{x : w \in P_s(x)} \frac{\sigma_{sw}}{\sigma_{sx}} \left( \mathbb{1}_{T(s)}(x) + \hat{\delta}_s^{old}(x) \right),$$

where $\mathbb{1}_{T(s)}(x) = 1$ if $x \in T(s)$ and $0$ otherwise. Similarly:

$$\hat{\delta}_s^{new}(w) = \sum_{x : w \in P'_s(x)} \frac{\sigma'_{sw}}{\sigma'_{sx}} \left( \mathbb{1}_{T(s)}(x) + \hat{\delta}_s^{new}(x) \right).$$

Proof.

We prove only the equation for $\hat{\delta}_s^{old}$; the one for $\hat{\delta}_s^{new}$ can be proven analogously. Let $t$ be any node in $T(s)$, $t \neq w$. Then, $\sigma_{st}(w)$ can be rewritten as $\sum_{x : w \in P_s(x)} \sigma_{st}(w,x)$, where $\sigma_{st}(w,x)$ is the number of shortest paths between $s$ and $t$ going through both $w$ and $x$. Then:

$$\hat{\delta}_s^{old}(w) = \sum_{t \in T(s)} \frac{\sigma_{st}(w)}{\sigma_{st}} = \sum_{t \in T(s)} \sum_{x : w \in P_s(x)} \frac{\sigma_{st}(w,x)}{\sigma_{st}}.$$

Now, of the $\sigma_{st}(x)$ paths from $s$ to $t$ that go through $x$, there are $\frac{\sigma_{sw}}{\sigma_{sx}} \sigma_{st}(x)$ many that also go through $w$. Therefore, for $t \neq x$, there are $\frac{\sigma_{sw}}{\sigma_{sx}} \sigma_{st}(x)$ shortest paths from $s$ to $t$ containing both $w$ and $x$, i.e. $\sigma_{st}(w,x) = \frac{\sigma_{sw}}{\sigma_{sx}} \sigma_{st}(x)$. On the other hand, if $t = x$, $\sigma_{st}(w,x)$ is simply $\sigma_{sw}$. Therefore, we can rewrite the equation above as:

$$\hat{\delta}_s^{old}(w) = \sum_{x : w \in P_s(x)} \frac{\sigma_{sw}}{\sigma_{sx}} \left( \mathbb{1}_{T(s)}(x) + \sum_{t \in T(s)} \frac{\sigma_{st}(x)}{\sigma_{st}} \right) = \sum_{x : w \in P_s(x)} \frac{\sigma_{sw}}{\sigma_{sx}} \left( \mathbb{1}_{T(s)}(x) + \hat{\delta}_s^{old}(x) \right). \qquad ∎$$

Theorem 5.3 allows us to accumulate the dependency changes in a way similar to BA. To compute $\hat{\delta}_s^{old}$, we need to process nodes in decreasing order of $d(s,\cdot)$, whereas to compute $\hat{\delta}_s^{new}$ we need to process them in decreasing order of $d'(s,\cdot)$. To do this, we use two priority queues $Q_{old}$ and $Q_{new}$ (if the graph is unweighted, we can use bucket lists as the ones used in KDB). Notice that nodes lying on no old or new shortest path between $s$ and a node in $T(s)$ do not need to be added to the queues. $Q_{old}$ and $Q_{new}$ are filled with all nodes in $T(s)$ during the APSP update in Algorithm 1. In $Q_{old}$, nodes are inserted with priority $d(s,t)$ and in $Q_{new}$ with priority $d'(s,t)$. Algorithm 2 shows how we decrease the betweenness of nodes that lied in old shortest paths from $s$ (notice that this is repeated for each $s \in S(v)$). In Lines 7 - 11, Theorem 5.3 is applied to compute $\hat{\delta}_s^{old}(p)$ for each predecessor $p$ of the extracted node $t$. Then, $p$ is also enqueued (Lines 12 - 14) and this is repeated until $Q_{old}$ is empty (i.e. when we reach $s$). The betweenness update of nodes in the new shortest paths works in a very similar way. The only difference is that $Q_{new}$ is used instead of $Q_{old}$, that $d'$, $\sigma'$ and $P'_s$ are used instead of $d$, $\sigma$ and $P_s$, and that $\hat{\delta}_s^{new}(t)$ is added to $b(t)$ and not subtracted in Line 4. At the end of the update, $d$ is set to $d'$ and $\sigma$ is set to $\sigma'$.

In undirected graphs, we can notice that $\sigma_{st}(w)/\sigma_{st} = \sigma_{ts}(w)/\sigma_{ts}$, while each affected pair is encountered in only one orientation. Thus, to account also for the changes in the shortest paths between $t$ and the nodes in $S(t)$, $2 \hat{\delta}_s^{old}(t)$ instead of $\hat{\delta}_s^{old}(t)$ is subtracted from $b(t)$ in Line 4 (and analogously $2 \hat{\delta}_s^{new}(t)$ is added in the update of nodes in the new shortest paths).

1 δ̂_s^old(t) ← 0 for all t ∈ V;
2 while Q_old ≠ ∅ do
3        t ← Q_old.extractMax();
4        b(t) ← b(t) − δ̂_s^old(t);
5        foreach p s.t. (p,t) ∈ E do
6               if p ≠ s and d(s,p) + ω(p,t) = d(s,t) then
7                      if t ∈ T(s) then
8                             δ̂_s^old(p) ← δ̂_s^old(p) + (σ_sp / σ_st) · (1 + δ̂_s^old(t));
9                      else
10                            δ̂_s^old(p) ← δ̂_s^old(p) + (σ_sp / σ_st) · δ̂_s^old(t);
11                     end if
12                     if p ∉ Q_old then
13                            Insert p into Q_old with priority d(s,p);
14                     end if
15              end if
16        end foreach
17 end while
Algorithm 2 Betweenness update for nodes in old shortest paths
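A compact sketch of this procedure for one affected source $s$ in the unweighted case (our own illustration; dOld and sigmaOld are the values of $d(s,\cdot)$ and $\sigma_{s\cdot}$ before Algorithm 1 overwrites them, and affectedTargets is $T(s)$ as collected during the APSP update) could be:

#include <algorithm>
#include <vector>

// Removes from b the contributions of old shortest paths between s and
// its affected targets, using a bucket list in the role of Q_old.
void decreaseOldContributions(const std::vector<std::vector<int>>& adjT,
                              const std::vector<int>& dOld,
                              const std::vector<double>& sigmaOld,
                              const std::vector<int>& affectedTargets,  // T(s)
                              int s, std::vector<double>& b) {
    int n = adjT.size();
    int maxd = 0;
    std::vector<char> inT(n, 0), queued(n, 0);
    for (int t : affectedTargets) { inT[t] = 1; maxd = std::max(maxd, dOld[t]); }
    std::vector<std::vector<int>> bucket(maxd + 1);  // bucket k: d(s,.) = k
    for (int t : affectedTargets) { bucket[dOld[t]].push_back(t); queued[t] = 1; }
    std::vector<double> deltaOld(n, 0.0);            // accumulated old contributions
    for (int k = maxd; k > 0; --k) {                 // decreasing old distance from s
        for (std::size_t i = 0; i < bucket[k].size(); ++i) {
            int t = bucket[k][i];
            b[t] -= deltaOld[t];                     // undo t's old share
            for (int p : adjT[t]) {
                if (p == s || dOld[p] + 1 != dOld[t]) continue;  // p not in P_s(t)
                // Theorem 5.3: the indicator inT[t] covers the case t in T(s).
                deltaOld[p] += (sigmaOld[p] / sigmaOld[t]) * (inT[t] + deltaOld[t]);
                if (!queued[p]) { queued[p] = 1; bucket[dOld[p]].push_back(p); }
            }
        }
    }
}

The increase for nodes on new shortest paths is symmetric, using $d'$, $\sigma'$ and addition instead of subtraction; for weighted graphs the bucket list is replaced by a priority queue.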

6 Time complexity

Let us study the complexity of our two new algorithms for updating the APSP data and the betweenness scores, described in Section 4.3 and Section 5.3, respectively. We define the extended size $||A||$ of a set $A$ of nodes as the sum of the number of nodes in $A$ and the number of edges that have a node of $A$ as their endpoint. Then, the following holds.

Theorem 6.1. The running time of Algorithm 1 for updating the augmented APSP after an edge insertion (or weight decrease) $(u,v)$ is $O(||S(v)|| + ||T(u)|| + \sum_{t \in T(u)} |S(p(t))|)$, where $p(t)$ can be any node in $P'_v(t)$.

Proof.

The function findAffectedSources in Line 3 identifies the set $S(v)$ of affected sources, starting a BFS in $u$ on the transposed graph and visiting only the nodes $s$ such that $d(s,u) + \omega'(u,v) \leq d(s,v)$. This takes $O(||S(v)||)$, since this pruned BFS visits all nodes in $S(v)$ and their incident edges. Then, the while loop of Lines 9 - 30 identifies all the affected targets with a pruned BFS. This part (excluding Lines 11 - 22) requires $O(||T(u)||)$ operations, since all affected targets and their incident edges are visited. In Lines 11 - 22, for each affected node $t$, all the affected sources of the predecessor $p(t)$ of $t$ are scanned. This part requires in total $O(\sum_{t \in T(u)} |S(p(t))|)$ operations. ∎

Notice that, since $|S(p(t))|$ is $O(n)$ and both $||S(v)||$ and $||T(u)||$ are $O(n + m)$, the worst-case complexity of Algorithm 1 is $O(n^2)$. To show the complexity of the dependency update described in Algorithm 2, let us introduce, for a given source node $s$, the set $\hat{P}(s) := T(s) \cup \{w \in V : \sigma_{st}(w) > 0 \text{ for some } t \in T(s)\}$ of nodes lying on old shortest paths between $s$ and its affected targets. Then, the following theorem holds.

Theorem 6.2. The running time of Algorithm 2 is $O(||\hat{P}(s)|| + |\hat{P}(s)| \log |\hat{P}(s)|)$ for weighted graphs and $O(||\hat{P}(s)|| + n)$ for unweighted graphs.

Proof.

In the following, we assume a binary-heap priority queue for weighted graphs and a bucket-list priority queue for unweighted graphs. Then, the extractMax() operation in Line 3 requires (amortized) constant time for unweighted and logarithmic time for weighted graphs. Also, for each node $t$ extracted from $Q_{old}$, all its incident edges are scanned in Lines 5 - 16. Therefore, it is sufficient to prove that the set of nodes inserted into (and therefore extracted from) $Q_{old}$ is exactly $\hat{P}(s)$. As we said in the description of Algorithm 2, $Q_{old}$ is initially populated with the nodes in $T(s)$. Then, all nodes $p$ inserted into $Q_{old}$ in Line 13 are nodes that lied in at least one shortest path between $s$ and a node in $T(s)$ before the insertion. This means that there is at least one $t \in T(s)$ such that $\sigma_{st}(p) > 0$, which implies that $p \in \hat{P}(s)$, by definition of $\hat{P}(s)$. ∎

The running time necessary to increase the betweenness scores of the nodes lying on new shortest paths can be computed analogously, defining $\hat{P}'(s)$ with respect to the new distances, predecessors and numbers of shortest paths. Overall, the running time of the betweenness update described in Section 5.3 is $O(\sum_{s \in S(v)} (||\hat{P}(s)|| + ||\hat{P}'(s)|| + n))$ for unweighted and $O(\sum_{s \in S(v)} (||\hat{P}(s)|| + ||\hat{P}'(s)||) \log n)$ for weighted graphs. Consequently, in the worst case, this is $O(nm)$ for unweighted and $O(n(m + n \log n))$ for weighted graphs, which matches the running time of BA. For sparse graphs, this is asymptotically faster than the dependency update of KWCC, which requires $O(n^3)$ operations in the worst case.

7 Experimental Results

Implementation and settings

For our experiments, we implemented BA, KDB, KWCC, and our new approach, which we refer to as iBet (from Incremental Betweenness). All the algorithms were implemented in C++, building on the open-source NetworKit framework [24]. All implementations are sequential; they were executed on a 64-bit machine with 2 x 8 Intel(R) Xeon(R) E5-2680 cores at 2.7 GHz and 256 GB RAM, using a single thread on a single CPU.

Data sets and experimental design

For our experiments, we consider a set of real-world networks belonging to different domains, taken from SNAP [17], KONECT [15], and LASAGNE (piluc.dsi.unifi.it/lasagne). Since KDB cannot handle weighted graphs and the pseudocode given in [14] is only for undirected graphs, all graphs used in the experiments are undirected and unweighted. The networks are reported in Table 1. Due to the time required by the static algorithm and the $\Theta(n^2)$ memory requirement of all dynamic algorithms, we only considered networks with up to about 26 000 nodes.

To simulate real edge insertions, we remove an existing edge from the graph (chosen uniformly at random), compute betweenness on the graph without the edge and then re-insert the edge, updating betweenness with the incremental algorithms (and recomputing it with BA). For all networks, we consider 100 edge insertions and report the average over these 100 runs.

Experimental results

Graph  Nodes  Edges  Type  BA [s]  Speedup on BA (iBet / KDB / KWCC)
HC-BIOGRID 4 039 10 321 bio. network 6.06 77.87 10.91 18.33
Mus-musculus 4 610 5 747 bio. network 3.32 119.23 9.40 11.21
Caenor-elegans 4 723 9 842 metabolic 5.12 130.89 9.58 23.64
ca-GrQc 5 241 14 484 coauthorship 4.19 206.55 7.53 14.28
advogato 7 418 42 892 social 14.65 295.39 27.69 18.45
hprd-pp 9 465 37 039 bio. network 30.29 304.24 11.33 45.90
ca-HepTh 9 877 25 973 coauthorship 21.06 199.04 8.24 34.03
dr-melanogaster 10 625 40 781 bio. network 40.76 235.54 7.94 48.57
oregon1-010526 11 174 23 409 aut. systems 24.43 237.47 15.20 21.64
oregon2-010526 11 461 32 730 aut. systems 30.07 113.10 17.23 23.08
Homo-sapiens 13 690 61 130 bio. network 68.58 237.61 10.29 58.67
GoogleNw 15 763 148 585 hyperlinks 90.42 577.49 90.01 33.80
dip20090126 19 928 41 202 bio. network 115.56 51.54 5.38 5.73
as-caida20071105 26 475 53 381 aut. systems 154.36 173.90 18.66 19.65
Geometric mean 179.1 13.0 22.9
Table 1: The table shows the average time taken by the static algorithm BA and the average speedups on BA of the incremental algorithms (geometric means). The best result of each row is shown in bold font.

Table 1 reports the running times of BA for each graph and the speedups of the three incremental algorithms over BA. The last line shows the geometric mean of the speedups over all tested networks. Our new method iBet clearly outperforms the other two approaches and is always faster than both of them. On average, iBet is faster than BA by a factor 179.1, whereas KDB is faster by a factor 13.0 and KWCC by a factor 22.9.

Figure 3 compares the APSP update (on the left) and dependency update (on the right) steps for the oregon1-010526 graph (a similar behavior was observed also for the other graphs of Table 1). On the left, the running times of the APSP update phase of the three incremental algorithms on 100 edge insertions are reported, sorted by the running time taken by KDB. It is clear that the APSP update of iBet is always faster than the competitors. This is due to the fact that iBet processes the edges between the affected targets only once instead of doing it once for each affected source, as both KDB and KWCC do. Also, the running time of the APSP update of KDB varies significantly. On about one third of the updates, it is basically as fast as KWCC. This means that in these cases KDB only visits a small number of nodes in addition to the affected ones (see Figure 1 and its explanation). However, in other cases KDB can be much slower, as shown in the figure.

On the right of Figure 3, the running times of the dependency update step are reported. Also for this step, iBet is faster than both KDB and KWCC. However, for this part there is not a clear winner between KWCC and KDB. In fact, in some cases KDB needs to process additional nodes in order to recompute dependencies, whereas KWCC only processes nodes in the shortest paths between affected nodes. However, KDB processes each node at most once for each source node $s$, whereas KWCC might process the same node several times if it lies in several shortest paths between $s$ and other nodes (we recall that the worst-case running time of the dependency update is $O(n^3)$ for KWCC and $O(nm)$ for KDB). Notice also that in some rare cases KDB is slightly faster than iBet in the dependency update. This is probably due to the fact that our implementation of iBet is based on a priority queue, whereas KDB uses a bucket list.

Figure 3: Running times of iBet, KDB and KWCC for 100 edge updates on oregon1-010526. Left: times for the APSP update step. Right: times for the dependency update step.
Figure 4: Left: Running times of iBet, KDB, KWCC and BA on the oregon1-010526 graph for 100 edge updates. Right: Average speedups on recomputation with BA (geometric mean) over all networks of Table 1 for the three incremental algorithms. The column on the left shows the speedup of the complete update, the one in the middle the speedup of the APSP update only and the one on the right the speedup of the dependency update only.

Figure 4 on the left reports the total running times of iBet, KDB, KWCC and BA on oregon1-010526. Although the running times vary significantly among the updates, iBet is always the fastest among all algorithms. On the contrary, there is not always a clear winner between KDB and KWCC. On the right, Figure 4 shows the geometric mean of the speedups on recomputation for the three incremental algorithms, considering the complete update, the APSP update step only and the dependency update step only, respectively. iBet is the method with the highest speedup both overall and on the APSP update and dependency update steps separately, meaning that each of the improvements described in Section 4.3 and Section 5.3 contribute to the final speedup. On average, iBet is a factor 82.7 faster than KDB and a factor 28.5 faster than KWCC on the APSP update step and it is a factor 9.4 faster than KDB and a factor 4.9 faster than KWCC on the dependency update step. Overall, the speedup of iBet on KDB ranges from 6.6 to 29.7 and is on average (geometric mean of the speedups) 14.7 times faster. The average speedup on KWCC is 7.4, ranging from a factor 4.1 to a factor 16.0.

8 Conclusions and future work

Computing betweenness centrality is a problem of great practical relevance. In this paper we have proposed and evaluated new techniques for the betweenness update after the insertion (or weight decrease) of an edge. Compared to other approaches, our new algorithm is easy to implement and significantly reduces the number of operations of both the APSP update and the dependency update. Our experiments on real-world networks show that our approach outperforms existing methods, on average approximately by one order of magnitude.

Future work might include parallelization for further acceleration. Furthermore, we plan to extend our techniques also to the decremental case (where an edge can be deleted from the graph or its weight can be increased) and to batch updates, where several edge updates might occur at the same time.

Although dynamic betweenness algorithms can be much faster than recomputation, a major limitation for their scalability is their memory requirement of $\Theta(n^2)$. An interesting research direction is the design of scalable dynamic algorithms with a smaller memory footprint.

Our implementations are based on NetworKit [24], the open-source framework for network analysis, and we will publish our source code in upcoming releases of the package.

References

  • [1] D. C. Bell, J. S. Atkinson, and J. W. Carlson. Centrality measures for disease transmission networks. Social Networks, 21(1):1–21, 1999.
  • [2] E. Bergamini and H. Meyerhenke. Approximating betweenness centrality in fully dynamic networks. Internet Mathematics, 12(5):281–314, 2016.
  • [3] E. Bergamini, H. Meyerhenke, and C. Staudt. Approximating betweenness centrality in large evolving networks. In 17th Workshop on Algorithm Engineering and Experiments, ALENEX 2015, pages 133–146. SIAM, 2015.
  • [4] P. Boldi and S. Vigna. Axioms for centrality. Internet Mathematics, 10(3-4):222–262, 2014.
  • [5] M. Borassi and E. Natale. KADABRA is an adaptive algorithm for betweenness via random approximation. In 24th Annual European Symposium on Algorithms, ESA 2016, volume 57 of LIPIcs, pages 20:1–20:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.
  • [6] U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25:163–177, 2001.
  • [7] P. Crescenzi, G. D’Angelo, L. Severini, and Y. Velaj. Greedily improving our own centrality in A network. In Experimental Algorithms - 14th International Symposium, SEA 2015, Proceedings, volume 9125 of Lecture Notes in Computer Science, pages 43–55. Springer, 2015.
  • [8] R. Geisberger, P. Sanders, and D. Schultes. Better approximation of betweenness centrality. In 10th Workshop on Algorithm Engineering and Experiments (ALENEX ’08), pages 90–100. SIAM, 2008.
  • [9] O. Green, R. McColl, and D. A. Bader. A fast algorithm for streaming betweenness centrality. In SocialCom/PASSAT, pages 11–20. IEEE, 2012.
  • [10] T. Hayashi, T. Akiba, and Y. Yoshida. Fully dynamic betweenness centrality maintenance on massive networks. Proceedings of 41st International Conference on Very Large Data Bases (PVLDB 2015), 9(2):48–59, 2015.
  • [11] M. Kas, M. Wachs, K. M. Carley, and L. R. Carley. Incremental algorithm for updating betweenness centrality in dynamically growing networks. In Advances in Social Networks Analysis and Mining 2013 (ASONAM ’13), pages 33–40. ACM, 2013.
  • [12] C. Kiss and M. Bichler. Identification of influencers – measuring influence in customer networks. Decision Support Systems, 46(1):233 – 253, 2008.
  • [13] D. Koschützki, K. A. Lehmann, L. Peeters, S. Richter, D. Tenfelde-Podehl, and O. Zlotowski. Centrality indices. In Network Analysis, volume 3418 of LNCS, pages 16–61. Springer Berlin Heidelberg, 2005.
  • [14] N. Kourtellis, G. De Francisci Morales, and F. Bonchi. Scalable online betweenness centrality in evolving graphs. IEEE Transactions on Knowledge and Data Engineering, PP(99):1–1, 2015.
  • [15] J. Kunegis. KONECT: the koblenz network collection. In 22nd International World Wide Web Conference, WWW ’13, pages 1343–1350, 2013.
  • [16] M. Lee, S. Choi, and C. Chung. Efficient algorithms for updating betweenness centrality in fully dynamic graphs. Information Sciences, 326:278–296, 2016.
  • [17] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
  • [18] C.-C. Lin and R.-C. Chang. On the dynamic shortest path problem. Journal of Information Processing, 13(4):470–476, Apr. 1991.
  • [19] M. Nasre, M. Pontecorvi, and V. Ramachandran. Betweenness centrality - incremental and faster. In Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, volume 8635 of Lecture Notes in Computer Science, pages 577–588. Springer, 2014.
  • [20] M. Pontecorvi and V. Ramachandran. Fully dynamic betweenness centrality. In Algorithms and Computation - 26th International Symposium, ISAAC 2015, Proceedings, volume 9472 of Lecture Notes in Computer Science, pages 331–342. Springer, 2015.
  • [21] G. Ramalingam and T. W. Reps. On the computational complexity of dynamic graph problems. Theoretical Computer Science, 158(1&2):233–277, 1996.
  • [22] M. Riondato and E. M. Kornaropoulos. Fast approximation of betweenness centrality through sampling. Data Mining and Knowledge Discovery, 30(2):438–475, 2016.
  • [23] M. Riondato and E. Upfal. ABRA: approximating betweenness centrality in static and dynamic graphs with rademacher averages. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pages 1145–1154. ACM, 2016.
  • [24] C. L. Staudt, A. Sazonovs, and H. Meyerhenke. NetworKit: A tool suite for high-performance network analysis. Network Science, To appear.