Fully-dynamic Approximation of Betweenness Centrality


Elisabetta Bergamini and Henning Meyerhenke
Institute of Theoretical Informatics
Karlsruhe Institute of Technology (KIT), Germany
Email: {elisabetta.bergamini, meyerhenke}@kit.edu
Abstract

Betweenness is a well-known centrality measure that ranks the nodes of a network according to their participation in shortest paths. Since an exact computation is prohibitive in large networks, several approximation algorithms have been proposed. Besides that, recent years have seen the publication of dynamic algorithms for efficient recomputation of betweenness in evolving networks. In previous work we proposed the first semi-dynamic algorithms that recompute an approximation of betweenness in connected graphs after batches of edge insertions.

In this paper we propose the first fully-dynamic approximation algorithms (for weighted and unweighted undirected graphs that need not be connected) with a provable guarantee on the maximum approximation error. The transfer to fully-dynamic and disconnected graphs raises additional algorithmic problems that could be of independent interest. In particular, we propose a new upper bound on the vertex diameter for weighted undirected graphs. For both weighted and unweighted graphs, we also propose the first fully-dynamic algorithms that keep track of this upper bound. In addition, we extend our former algorithm for semi-dynamic BFS to batches of both edge insertions and deletions.

Using approximation, our algorithms are the first to make in-memory computation of betweenness in fully-dynamic networks with millions of edges feasible. Our experiments show that they can achieve substantial speedups compared to recomputation, up to several orders of magnitude.
Keywords: betweenness centrality, algorithmic network analysis, fully-dynamic graph algorithms, approximation algorithms, shortest paths

1 Introduction

The identification of the most central nodes of a network is a fundamental problem in network analysis. Betweenness centrality (BC) is a well-known index that ranks the importance of nodes according to their participation in shortest paths. Intuitively, a node has high BC when it lies on many shortest paths between pairs of other nodes. Formally, the BC of a node $v$ is defined as $c_B(v) = \frac{1}{n(n-1)} \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $n$ is the number of nodes, $\sigma_{st}$ is the number of shortest paths between two nodes $s$ and $t$, and $\sigma_{st}(v)$ is the number of these paths that go through node $v$. Since it depends on all shortest paths, the exact computation of BC is expensive: the best known algorithm [4] is quadratic in the number of nodes for sparse networks and cubic for dense networks, which is prohibitive for networks with hundreds of thousands of nodes. Many graphs of interest, however, such as web graphs or social networks, have millions or even billions of nodes and edges. For this reason, approximation algorithms [5, 8, 1] must be used in practice. In addition, many large graphs of interest evolve continuously, making the efficient recomputation of BC a necessity. In previous work, we proposed the first two approximation algorithms [3] (IA for unweighted and IAW for weighted graphs) that can efficiently recompute the approximate BC scores after batches of edge insertions or weight decreases. IA and IAW are the only semi-dynamic algorithms that can actually be applied to large networks. The algorithms build on RK [18], a static algorithm with a theoretical guarantee on the quality of the approximation, and inherit this guarantee from RK. However, IA and IAW target a relatively restricted configuration: only connected graphs and edge insertions/weight decreases.
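To make the definition concrete, the following Python sketch evaluates $c_B(v)$ directly from the formula on a small unweighted graph. It is our own illustration, not code from the paper: the adjacency-list dict and the names `bfs_counts` and `exact_bc` are hypothetical. It relies on the standard identity $\sigma_{st}(v) = \sigma_{sv}\,\sigma_{vt}$, which holds whenever $d(s,v) + d(v,t) = d(s,t)$. Because it is much slower than Brandes' algorithm [4], it is only useful as a reference on tiny inputs.

```python
from collections import deque
from itertools import permutations

def bfs_counts(adj, s):
    """BFS from s: distances d(s,x) and shortest-path counts sigma_{s,x}."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma

def exact_bc(adj):
    """c_B(v) = 1/(n(n-1)) * sum over ordered pairs s != v != t of sigma_st(v)/sigma_st."""
    n = len(adj)
    bfs = {s: bfs_counts(adj, s) for s in adj}
    bc = {v: 0.0 for v in adj}
    for s, t in permutations(adj, 2):
        dist_s, sigma_s = bfs[s]
        if t not in dist_s:
            continue  # s and t are disconnected: no shortest path
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = bfs[v]
            # v lies on a shortest s-t path iff d(s,v) + d(v,t) = d(s,t);
            # in that case sigma_st(v) = sigma_sv * sigma_vt
            if dist_s[v] + dist_v[t] == dist_s[t]:
                bc[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return {v: score / (n * (n - 1)) for v, score in bc.items()}
```

On the path 0-1-2, for example, only the ordered pairs (0,2) and (2,0) pass through node 1, so its score is $2/6 = 1/3$.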

Our contributions.

In this paper we present the first fully-dynamic algorithms (handling edge insertions, deletions and arbitrary weight updates) for BC approximation in weighted and unweighted undirected graphs. Our algorithms extend the semi-dynamic ones we presented in [3], while keeping the theoretical guarantee on the maximum approximation error. The transfer to fully-dynamic and disconnected graphs raises several additional problems compared to the restricted case we considered previously [3]. Consequently, we present the following intermediate results, all of which could be of independent interest. (i) We propose a new upper bound on the vertex diameter $VD$ (i.e. the number of nodes in the shortest path(s) with the maximum number of nodes) for weighted undirected graphs. This can significantly improve the bound used in the RK algorithm [18] if the network's weights vary in relatively small ranges: the bound goes from the size of the largest connected component to at most twice the vertex diameter times the ratio between the maximum and the minimum edge weights. (ii) For both weighted and unweighted graphs, we present the first fully-dynamic algorithm for updating an approximation of $VD$, which in unweighted graphs is essentially the diameter (it equals the diameter plus one). (iii) We extend our previous semi-dynamic BFS algorithm [3] to batches of both edge insertions and deletions. In our experiments, we compare our algorithms to recomputation with RK on both synthetic and real dynamic networks. Our results show that our algorithms can achieve substantial speedups, often several orders of magnitude on single-edge updates, and are always faster than recomputation, even on batches of more than 1000 edges.

2 Related work

2.1 Overview of algorithms for computing BC

The best static exact algorithm for BC (BA) is due to Brandes [4] and requires $O(nm)$ operations for unweighted graphs and $O(nm + n^2 \log n)$ for graphs with positive edge weights. The algorithm computes a single-source shortest path (SSSP) search from every node $s$ in the graph and adds to the BC score of each node $v \neq s$ the fraction of shortest paths from $s$ that go through $v$. Several static approximation algorithms have been proposed that compute an SSSP search from a set of randomly chosen nodes and extrapolate the BC scores of the other nodes [5, 8, 1]. The static approximation algorithm by Riondato and Kornaropoulos (RK) [18] samples a set of shortest paths and adds a contribution to each node in the sampled paths. This approach allows a theoretical guarantee on the quality of the approximation and will be described in Section 2.2. Recent years have seen the publication of a few dynamic exact algorithms [14, 10, 12, 11, 16, 9]. Most of them store the previously calculated BC values and additional information, like the distance of each node from every source, and try to limit the recomputation to the nodes whose BC has actually been affected. All the dynamic algorithms perform better than recomputation on certain inputs. Yet, none of them is in general better than BA. In fact, they all require updating an all-pairs shortest paths (APSP) search, for which no algorithm has an improved worst-case complexity compared to the best static algorithm [19]. Also, the scalability of the dynamic exact BC algorithms is strongly compromised by their memory requirement of $\Theta(n^2)$. To overcome these problems, we presented two algorithms that efficiently recompute an approximation of the BC scores instead of their exact values [3]. The algorithms have shown high speedups compared to recomputation with RK and good scalability, but they are limited to connected graphs and batches of edge insertions/weight decreases (see Section 2.3).

2.2 RK algorithm

The static approximation algorithm RK [18] is the foundation for the incremental approach we presented in [3] and for our new fully-dynamic approach. RK samples a set $S = \{p_{(1)}, \dots, p_{(r)}\}$ of $r$ shortest paths between randomly-chosen source-target pairs $(s,t)$. Then, RK computes the approximated betweenness $\tilde{c}_B(v)$ of a node $v$ as the fraction of sampled paths $p_{(k)} \in S$ that go through $v$, by adding $\frac{1}{r}$ to $v$'s score for each of these paths. In each of the $r$ iterations, the probability of a shortest path $p_{st}$ to be sampled is $\pi_G(p_{st}) = \frac{1}{n(n-1)} \cdot \frac{1}{\sigma_{st}}$. The number of samples required to approximate the BC scores with the given error guarantee is $r = \frac{c}{\epsilon^2}\left(\lfloor \log_2(VD - 2) \rfloor + 1 + \ln\frac{1}{\delta}\right)$, where $\epsilon$ and $\delta$ are constants in $(0,1)$ and $c \approx 0.5$. Then, if $r$ shortest paths are sampled according to $\pi_G$, with probability at least $1 - \delta$ the approximations $\tilde{c}_B(v)$ are within $\epsilon$ from their exact value: $\Pr\left(\exists v \in V : |c_B(v) - \tilde{c}_B(v)| > \epsilon\right) < \delta$. To sample the shortest paths according to $\pi_G$, RK first chooses a source-target node pair $(u,v)$ uniformly at random and performs a shortest-path search (Dijkstra or BFS) from $u$ to $v$, also keeping track of the number $\sigma_{uw}$ of shortest paths between $u$ and $w$ and of the list of predecessors $P_u(w)$ (i.e. the nodes that immediately precede $w$ in the shortest paths between $u$ and $w$) for any node $w$ between $u$ and $v$. Then one shortest path is selected: starting from $v$, a predecessor $z \in P_u(v)$ is selected with probability $\sigma_{uz} / \sum_{w \in P_u(v)} \sigma_{uw} = \sigma_{uz}/\sigma_{uv}$. The sampling is repeated iteratively until node $u$ is reached.
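A minimal Python sketch of RK's sampling step for unweighted graphs follows. It assumes an adjacency-list dict, and the function names are hypothetical; this is not the NetworKit implementation. Predecessors are found by scanning the neighbors one BFS level closer to $u$, and each predecessor $z$ is chosen with probability proportional to $\sigma_{uz}$, which makes the sampled path uniform among all shortest $u$-$v$ paths. Only the inner nodes of a sampled path receive the $1/r$ contribution, since the sum defining $c_B(v)$ excludes the endpoints.

```python
import random
from collections import deque

def bfs_sigma(adj, u):
    """BFS from u returning distances and shortest-path counts sigma_{u,w}."""
    dist, sigma = {u: 0}, {u: 1}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma

def sample_path(adj, u, v, rng):
    """Sample one shortest u-v path uniformly among all shortest u-v paths."""
    dist, sigma = bfs_sigma(adj, u)
    if v not in dist:
        return []  # u and v lie in different components
    path, w = [v], v
    while w != u:
        # predecessors of w: neighbors one level closer to u (scanned, not stored)
        preds = [z for z in adj[w] if dist.get(z) == dist[w] - 1]
        w = rng.choices(preds, weights=[sigma[z] for z in preds])[0]
        path.append(w)
    return path[::-1]

def rk_estimate(adj, r, rng):
    """RK-style estimate: each of the r sampled paths adds 1/r to its inner nodes."""
    nodes = list(adj)
    bc = {x: 0.0 for x in nodes}
    for _ in range(r):
        u, v = rng.sample(nodes, 2)  # uniform ordered pair with u != v
        for w in sample_path(adj, u, v, rng)[1:-1]:
            bc[w] += 1.0 / r
    return bc
```

On a path 0-1-2, `rk_estimate(adj, 3000, random.Random(1))` should give a score close to the exact value $1/3$ for node 1.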

Approximating the vertex diameter.

RK uses two upper bounds on $VD$ that can both be computed in $O(n + m)$ time. For unweighted undirected graphs, it samples a source node $s_i$ for each connected component $C_i$ of $G$, computes a BFS from each $s_i$ and sums the lengths of the two longest shortest paths starting in $s_i$. The approximation is the maximum of these sums over all components. For weighted graphs, RK approximates $VD$ with the size of the largest connected component, which can be a significant overestimation for complex networks, possibly by orders of magnitude. In this paper, we present a new approximation for weighted graphs, described in Section 3.

2.3 IA and IAW algorithms

IA and IAW are the incremental approximation algorithms (for unweighted and weighted graphs, respectively) that we presented previously [3]. The algorithms are based on the observation that if only edge insertions are allowed and the graph is connected, $VD$ cannot increase, and therefore neither can the number $r$ of samples required by RK for the theoretical guarantee. Instead of recomputing new shortest paths after a batch of edge insertions, IA and IAW replace each old shortest path $p_{st}$ with a new shortest path between the same node pair $(s,t)$. In IAW the paths are recomputed with a slightly modified T-SWSF [2], whereas IA uses a new semi-dynamic BFS algorithm. The BC scores are updated by subtracting $\frac{1}{r}$ from the BC of the nodes in the old path and adding $\frac{1}{r}$ to the BC of the nodes in the new shortest path.
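The score update performed when one sampled path is replaced can be sketched as follows. `replace_path` is a hypothetical helper; the sketch assumes that paths are lists of nodes and that, as in RK, only inner nodes carry the $1/r$ contribution.

```python
def replace_path(bc, old_path, new_path, r):
    """Swap one sampled s-t path for a new shortest s-t path with the same endpoints."""
    for w in old_path[1:-1]:
        bc[w] -= 1.0 / r   # remove the old path's contribution
    for w in new_path[1:-1]:
        bc[w] += 1.0 / r   # add the new path's contribution
    return bc
```

Since $r$ does not change under insertions in connected graphs, every replacement touches only the nodes of the two paths.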

2.4 Batch dynamic SSSP algorithms

Dynamic SSSP algorithms recompute distances from a source node after a single edge update or a batch of edge updates. Algorithms for the batch problem have been published [17, 7, 2] and compared in experimental studies [2, 6]. The experiments show that the tuned algorithm T-SWSF presented in [2] performs well on many types of graphs and edge updates. For batches of only edge insertions in unweighted graphs, we developed an algorithm asymptotically faster than T-SWSF [3]. The algorithm is in principle similar to T-SWSF, but has an improved complexity thanks to different data structures.

3 New approximation for weighted graphs

Let $G = (V, E)$ be an undirected graph. For simplicity, let $G$ be connected for now. If it is not, we compute an approximation for each connected component and take the maximum over all the approximations. Let $T_s \subseteq G$ be an SSSP tree from any source node $s \in V$. Let $p_{xy}$ denote a shortest path between $x$ and $y$ in $G$ and let $p^{T_s}_{xy}$ denote a shortest path between $x$ and $y$ in $T_s$. Let $|p_{xy}|$ be the number of nodes in $p_{xy}$ and $d(x,y)$ be the distance between $x$ and $y$ in $G$, and analogously for $|p^{T_s}_{xy}|$ and $d_{T_s}(x,y)$. Let $\omega_{max}$ and $\omega_{min}$ be the maximum and minimum edge weights, respectively. Let $u$ and $v$ be the nodes with maximum distance from $s$, i.e. $d(s,u) \ge d(s,v) \ge d(s,x)$ for all $x \in V \setminus \{u, v\}$.

We define the approximation $\widetilde{VD}_w = 1 + \frac{d(s,u) + d(s,v)}{\omega_{min}}$. Then:

Proposition 1

$VD \le \widetilde{VD}_w < 2\,\frac{\omega_{max}}{\omega_{min}}\,VD$. (Proof in Section B.1, Appendix)

To obtain the upper bound $\widetilde{VD}_w$, we can simply compute an SSSP search from any node $s$, find the two nodes with maximum distance and perform the remaining calculations. Notice that $\widetilde{VD}_w$ extends the upper bound proposed for RK [18] for unweighted graphs: when the graph is unweighted and thus $\omega_{max} = \omega_{min} = 1$, $\widetilde{VD}_w$ becomes equal to the approximation used by RK. Complex networks are often characterized by a small diameter, and in networks like coauthorship, friendship, and communication networks, $VD$ and $\omega_{max}/\omega_{min}$ can be several orders of magnitude smaller than the size of the largest component. This translates into a substantially improved approximation.
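The bound $\widetilde{VD}_w$ needs only one Dijkstra search. The Python sketch below illustrates it under our own assumptions: an undirected graph stored as a dict of neighbor-to-weight dicts, positive weights, and $\omega_{min}$ taken over the edges of the component of $s$. The function names are hypothetical.

```python
import heapq
import math

def dijkstra(adj, s):
    """Distances from s to every node of its component (positive weights)."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x].items():
            if d + w < dist.get(y, math.inf):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist

def vd_upper_bound_weighted(adj, s):
    """1 + (d(s,u) + d(s,v)) / w_min, with u, v the two farthest nodes from s."""
    dist = dijkstra(adj, s)
    two_largest = sorted(dist.values(), reverse=True)[:2]
    weights = [w for x in dist for w in adj[x].values()]
    if not weights:
        return 1.0  # isolated node: the only path has one node
    return 1 + sum(two_largest) / min(weights)
```

In the test graph below, the true $VD$ is 4 (path 0-1-2-3), the bound evaluates to 8, and $2\,(\omega_{max}/\omega_{min})\,VD = 40$, consistent with Proposition 1.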

4 New fully-dynamic algorithms

Overview.

We propose two fully-dynamic algorithms, one for unweighted (DA, dynamic approximation) and one for weighted (DAW, dynamic approximation weighted) graphs. Similarly to IA and IAW, our new fully-dynamic algorithms keep track of the old shortest paths and substitute them only when necessary. However, if $G$ is not connected or edge deletions occur, $VD$ can grow, and a simple substitution of the paths is no longer sufficient. Although many real-world networks exhibit a shrinking-diameter behavior [15], to ensure our theoretical guarantee we need to keep track of $VD$ over time and sample new paths in case $VD$ increases. The need for an efficient update of $VD$ significantly increases the difficulty of the fully-dynamic problem, as does the necessity to recompute the SSSPs after batches of both edge insertions and deletions. The building blocks for the BC update are basically two: a fully-dynamic algorithm that updates the distances and numbers of shortest paths from a certain source node (SSSP update), and an algorithm that keeps track of a $VD$ approximation for each connected component of $G$. The following paragraphs give an overview of these building blocks, which could be of independent interest. The last paragraph outlines the dynamic BC approximation algorithm. Due to space constraints, a detailed description of the algorithms, as well as the pseudocode and the omitted proofs, can be found in the Appendix.

SSSP update in weighted graphs.

Our SSSP update is based on T-SWSF [2], which recomputes distances from a source node $s$ after a batch $\beta$ of weight updates (or edge insertions/deletions). For our BC algorithm, we need two extensions of T-SWSF: an algorithm that also recomputes the number of shortest paths between $s$ and the other nodes (updateSSSP-W), and one that also updates a $VD$ approximation for the connected component of $s$ (updateApprVD-W). The $VD$ approximation is computed as described in Section 3. Thus, updateApprVD-W keeps track of the two maximum distances $d(s,u)$ and $d(s,v)$ from $s$ and of the minimum edge weight $\omega_{min}$. We call affected nodes the nodes whose distance (or also whose number of shortest paths, in updateSSSP-W) from $s$ has changed as a consequence of $\beta$. Basically, the idea is to put the set $A$ of affected nodes $w$ into a priority queue $Q$ with priority $p(w)$ equal to the candidate distance of $w$. When $w$ is extracted, if there is actually a path of length $p(w)$ from $s$ to $w$, the new distance of $w$ is set to $p(w)$; otherwise $w$ is reinserted into $Q$ with a higher candidate distance. In both cases, the affected neighbors of $w$ are inserted into $Q$. In updateApprVD-W, $d(s,u)$ and $d(s,v)$ are recomputed while updating the distances, and $\omega_{min}$ is updated while scanning $\beta$. In updateSSSP-W, the number of shortest paths $\sigma_{sw}$ of $w$ is recomputed as the sum of the numbers $\sigma_{sz}$ of the new predecessors $z$ of $w$.

Let $|A|$ represent the cardinality of $A$ and let $||A||$ represent the sum of the number of nodes in $A$ and the number of edges that have at least one endpoint in $A$. Then, the following complexity derives from feeding $Q$ with the batch $\beta$ and inserting into/extracting from $Q$ the affected nodes and their neighbors.

Lemma 1

The time required by updateApprVD-W (updateSSSP-W) to update the distances and $\widetilde{VD}$ (the number of shortest paths) is $O(|\beta|\log|\beta| + ||A||\log||A||)$.
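The distance part of such a batch update can be sketched in Python as below. This is not T-SWSF itself: we swap in the closely related scheme of Ramalingam and Reps [17], in which the candidate distance rhs(w) of a node is the best value offered by its current neighbors. Nodes whose stored distance is too high are lowered. Nodes whose distance can no longer be certified are raised to infinity and reinserted, which mirrors the "reinsert with a higher candidate distance" step described above. Counting shortest paths and tracking $d(s,u)$, $d(s,v)$ are omitted. Positive weights, the undirected dict-of-dicts graph and the names `dijkstra` and `update_distances` are assumptions of this sketch.

```python
import heapq
import math

def dijkstra(adj, s):
    """Static SSSP, used for initialization and as a correctness oracle."""
    dist = {v: math.inf for v in adj}
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue
        for y, w in adj[x].items():
            if d + w < dist[y]:
                dist[y] = d + w
                heapq.heappush(heap, (dist[y], y))
    return dist

def update_distances(adj, s, dist, batch):
    """Apply (x, y, weight) updates (weight None = deletion) to the undirected
    graph adj, then repair dist in place, touching only inconsistent nodes."""
    touched = set()
    for x, y, w in batch:
        for a, b in ((x, y), (y, x)):
            adj.setdefault(a, {})
            if w is None:
                adj[a].pop(b, None)
            else:
                adj[a][b] = w
            dist.setdefault(a, math.inf)
        touched.update((x, y))

    def rhs(v):  # best candidate distance offered by v's current neighbors
        if v == s:
            return 0.0
        return min((dist[z] + w for z, w in adj[v].items()), default=math.inf)

    queue = []
    def push_if_inconsistent(v):
        r = rhs(v)
        if r != dist[v]:
            heapq.heappush(queue, (min(r, dist[v]), v))

    for v in touched:
        push_if_inconsistent(v)
    while queue:
        key, v = heapq.heappop(queue)
        r = rhs(v)
        if r == dist[v] or key != min(r, dist[v]):
            continue  # already consistent, or a stale queue entry
        if r < dist[v]:
            dist[v] = r              # a path of the candidate length exists
        else:
            dist[v] = math.inf       # old distance no longer valid: raise it
            push_if_inconsistent(v)  # and retry with the higher candidate
        for z in adj[v]:
            push_if_inconsistent(z)
    return dist
```

Only the endpoints of changed edges and nodes reached through them enter the queue, which is what keeps the cost proportional to the affected part of the graph rather than to the whole SSSP.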

SSSP update in unweighted graphs.

For unweighted graphs, we basically replace the priority queue $Q$ of updateApprVD-W and updateSSSP-W with a list of queues, like the one we used in [3] for the incremental BFS. Each queue represents a level, from 0 (to which only the source belongs) to the maximum distance $d_{max}$. The levels replace the priorities and, also in this case, represent the candidate distances of the nodes. In order not to visit a node multiple times, we use colors to distinguish the unvisited nodes from the visited ones. The replacement of the priority queue with the list of queues decreases the complexity of the SSSP update algorithms for unweighted graphs, which we call updateApprVD-U and updateSSSP-U, in analogy with the ones for weighted graphs.

Lemma 2

The time required by updateApprVD-U (updateSSSP-U) to update the distances and $\widetilde{VD}$ (the number of shortest paths) is $O(|\beta| + ||A|| + d_{max})$, where $d_{max}$ is the maximum distance from $s$ reached during the update.
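The list of queues can be sketched in Python as a bucket queue; the class and method names are hypothetical. As long as the extracted levels are non-decreasing, as in a BFS-like update, scanning for the next non-empty level costs $O(d_{max})$ in total, which matches the additive $d_{max}$ term above. Nodes within a level are served in FIFO order.

```python
from collections import deque

class LevelQueue:
    """List of FIFO queues, one per level 0..d_max (a bucket queue)."""

    def __init__(self):
        self.levels = []   # levels[k] holds nodes with candidate distance k
        self.current = 0   # no non-empty level lies below this index
        self.size = 0

    def push(self, node, level):
        while len(self.levels) <= level:
            self.levels.append(deque())
        self.levels[level].append(node)
        self.current = min(self.current, level)
        self.size += 1

    def pop(self):
        """Return (level, node) with the smallest level; FIFO within a level."""
        if self.size == 0:
            raise IndexError("pop from empty LevelQueue")
        while not self.levels[self.current]:
            self.current += 1   # skip empty levels
        self.size -= 1
        return self.current, self.levels[self.current].popleft()

    def __len__(self):
        return self.size
```

Compared with a binary heap, push and pop avoid the logarithmic factor, which is the source of the improvement from Lemma 1 to Lemma 2.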

Fully-dynamic $VD$ approximation.

The algorithm keeps track of a $VD$ approximation for the whole graph $G$, i.e. for each connected component of $G$. It is composed of two phases. In the initialization, we compute an SSSP from a source node $s_i$ for each connected component $C_i$. During the SSSP search from $s_i$, we also compute a $VD$ approximation for $C_i$, as described in Sections 2.2 and 3. In the update, we recompute the SSSPs and the $VD$ approximations with updateApprVD-W (or updateApprVD-U). Since components might split or merge, we might need to compute new $VD$ approximations, in addition to updating the old ones. To do this, for each node, we keep track of the number of times it has been visited. This way we discard source nodes that have already been visited and compute a new $VD$ approximation for components that have become unvisited. The complexity of the update of the $VD$ approximation derives from the update in the single components, using updateApprVD-W and updateApprVD-U.

Theorem 4.1

The time required to update the $VD$ approximation is $O(\kappa|\beta|\log|\beta| + \sum_{i=1}^{\kappa}||A_i||\log||A_i||)$ in weighted graphs and $O(\kappa|\beta| + \sum_{i=1}^{\kappa}||A_i||)$ in unweighted graphs, where $\kappa$ is the number of components in $G$ before the update and $||A_i||$ is the sum of affected nodes in $C_i$ and their incident edges.
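For unweighted graphs, the initialization phase can be sketched as follows. The sketch runs one BFS from every node that has not yet been visited and computes a per-component bound of $1 + d(s,u) + d(s,v)$, which is $\widetilde{VD}_w$ of Section 3 with unit weights. The global bound is the maximum over all components. The adjacency-list dict and the function names are assumptions of this illustration.

```python
from collections import deque

def bfs_dist(adj, s):
    """Unweighted distances from s to every node of its component."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def init_appr_vd(adj):
    """One BFS per component; returns (global bound, {source: bound}, visit counts)."""
    visits = {v: 0 for v in adj}
    bounds = {}
    for s in adj:
        if visits[s]:
            continue  # s is already covered by the BFS of another source
        dist = bfs_dist(adj, s)
        for w in dist:
            visits[w] += 1
        two_largest = sorted(dist.values(), reverse=True)[:2]
        bounds[s] = 1 + sum(two_largest)  # 1 + d(s,u) + d(s,v)
    return max(bounds.values(), default=0), bounds, visits
```

After initialization every node has a visit count of exactly one, which is the invariant the update phase starts from.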

Dynamic BC approximation.

Let $G$ be an undirected graph, possibly with several connected components. Now that we have defined our building blocks, we can outline a fully-dynamic BC algorithm: we use the fully-dynamic $VD$ approximation to recompute $\widetilde{VD}$ after a batch, we update the sampled paths with updateSSSP and, if $\widetilde{VD}$ (and therefore $r$) increases, we sample new paths. However, since updateSSSP and updateApprVD share most of their operations, we can "merge" them and update at the same time the shortest paths from a source node $s$ and the $VD$ approximation for the component of $s$. We call this hybrid function updateSSSPVD. Instead of storing and updating SSSPs for the $VD$ approximation and $r$ SSSPs for the BC scores, we recompute a $VD$ approximation for each of the $r$ samples while recomputing the shortest paths with updateSSSPVD. This way we do not need to compute an additional SSSP for the components covered by sampled paths (i.e. in which the paths lie), saving time and memory. Only for components that are not covered by any of them (if they exist) do we compute and store a separate $VD$ approximation. We refer to the set of such components as $\mathcal{C}$.

1 applyBatch($G$, $\beta$);
2 for $i \leftarrow 1$ to $r$ do
3       updateSSSPVD($s_i$, $\beta$);
4       replacePath($p_{(i)}$);   /* update of BC scores */
5 end for
6 foreach $s \in \mathcal{C}$ do
7       updateApprVD($s$, $\beta$);
8 end foreach
9 foreach unvisited $s \in V$ do
10      add $s$ to $\mathcal{C}$;
11      initApprVD($s$);
12 end foreach
13 $\widetilde{VD} \leftarrow$ maximum of the $VD$ approximations of all components;
14 $r_{new} \leftarrow \frac{c}{\epsilon^2}\left(\lfloor \log_2(\widetilde{VD} - 2) \rfloor + 1 + \ln\frac{1}{\delta}\right)$;
15 if $r_{new} > r$ then
16      sampleNewPaths();   /* sample $r_{new} - r$ new paths; update of BC scores */
17      foreach $v \in V$ do
18            $\tilde{c}_B(v) \leftarrow \tilde{c}_B(v) \cdot r / r_{new}$;   /* renormalization of BC scores */
19      end foreach
20      $r \leftarrow r_{new}$;
21 end if
22 return $\{\tilde{c}_B(v) : v \in V\}$;
Algorithm 1 BC update after a batch $\beta$ of edge updates

The high-level description of the update after a batch $\beta$ is shown as Algorithm 1. After changing the graph according to $\beta$ (Line 1), we recompute the previous samples and the $VD$ approximations for their components (Lines 2-5). Like IA and IAW, we update the BC scores of the nodes in the old and in the new shortest paths. Then, we update the $VD$ approximation for the components in $\mathcal{C}$ (Lines 6-8) and compute a new approximation for the new components that have formed by applying the batch (Lines 9-12). Next, we use the results to update the number of samples (Lines 13-14). If necessary, we sample additional paths and normalize the BC scores (Lines 15-21). The difference between DA and DAW is the way the SSSPs and the $VD$ approximation are updated: in DA we use updateApprVD-U and in DAW updateApprVD-W. Unlike RK and our previous algorithms IA and IAW, in DA and DAW we scan the neighbors every time we need the predecessors instead of storing them. This allows us to use $\Theta(n)$ memory per sample (i.e., $\Theta(rn)$ in total) instead of $\Theta(n+m)$ per sample, while our experiments show that the running time is hardly affected. The number of samples $r$ depends on $\epsilon$, so in theory it can be very large. However, the experiments conducted in [3] show that relatively large values of $\epsilon$ (e.g. $\epsilon = 0.1$) lead to a good ranking of nodes with high BC, and for such values the number of samples is typically much smaller than $n$. This makes the memory requirements of our algorithms significantly less demanding than those of the dynamic exact algorithms ($\Theta(n^2)$) for many applications.
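The last part of Algorithm 1, recomputing $r$ and, if it grows, adding paths and renormalizing, can be sketched in Python as below. The sketch makes several assumptions of its own: $c = 0.5$, $r$ is rounded up to an integer, and $VD - 2$ is clamped to at least 1 so that the logarithm is defined for tiny components. New paths first receive the old contribution $1/r$, and all scores are then rescaled by $r/r_{new}$, so that every path, old or new, ends up contributing $1/r_{new}$. The function names are hypothetical.

```python
import math

def num_samples(vd, eps, delta, c=0.5):
    """r = c/eps^2 * (floor(log2(VD - 2)) + 1 + ln(1/delta)), rounded up."""
    vc = math.floor(math.log2(max(vd - 2, 1))) + 1  # clamp to >= 1 is our assumption
    return math.ceil(c / eps ** 2 * (vc + math.log(1 / delta)))

def add_samples_and_renormalize(bc, new_paths, r_old, r_new):
    """Add the r_new - r_old new paths, then rescale so each path weighs 1/r_new."""
    for path in new_paths:
        for w in path[1:-1]:        # inner nodes only
            bc[w] += 1.0 / r_old    # same weight as the existing samples
    for v in bc:
        bc[v] *= r_old / r_new      # renormalization of BC scores
    return bc
```

For example, with $\widetilde{VD} = 10$, $\epsilon = 0.1$ and $\delta = 0.05$, `num_samples` returns 350.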

Theorem 4.2

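Algorithm 7 preserves the guarantee on the maximum absolute error, i.e. naming $c'_B(v)$ and $\tilde{c}'_B(v)$ the new exact and approximated BC values, respectively, $\Pr\left(\exists v \in V : |c'_B(v) - \tilde{c}'_B(v)| > \epsilon\right) < \delta$.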

Theorem 4.3

Let $\Delta r$ be the difference between the value of $r$ before and after the batch, and let $||A_i||$ be the sum of affected nodes and their incident edges in the $i$-th SSSP. The time required for the BC update in unweighted graphs is $O(r|\beta| + \sum_{i=1}^{r}||A_i|| + \Delta r\,(n + m))$. In weighted graphs, it is $O(r|\beta|\log|\beta| + \sum_{i=1}^{r}||A_i||\log||A_i|| + \Delta r\,(m + n\log n))$.

Notice that, if $\widetilde{VD}$ does not increase, $\Delta r = 0$ and the complexities are the same as those of the only-incremental algorithms IA and IAW we proposed in [3]. Also, notice that in the worst case the complexity can be as bad as recomputing from scratch. However, no dynamic SSSP algorithm (and so probably also no dynamic BC approximation algorithm) exists that is asymptotically faster than recomputation in the worst case.

5 Experiments

Implementation and settings.

We implement our two dynamic approaches DA and DAW in C++, building on the open-source NetworKit framework [20], which also contains the static approximation RK. In all experiments we fix $\epsilon$ to 0.1 and $\delta$ to 0.05, as a good tradeoff between running time and accuracy [3]. This means that, with a probability of at least 95%, the computed BC values deviate at most 0.1 from the exact ones. In our previous experimental study [3], we showed that for such values of $\epsilon$ and $\delta$, the ranking error (how much the ranking computed by the approximation algorithm differs from the ranking of the exact algorithm) is low for nodes with high betweenness. Since our algorithms simply update the approximation of RK, our accuracy in terms of ranking error does not differ from that of RK (see [3] for details). Also, our experiments in [3] have shown that dynamic exact algorithms do not scale, because of both their time and memory requirements; therefore we do not include them in our tests. The machine used has 2 x 8 Intel(R) Xeon(R) E5-2680 cores at 2.7 GHz, of which we use only one core, and 256 GB RAM.

Graph              Type           Nodes       Edges        Weights
repliesDigg        communication  30,398      85,155       Weighted
emailSlashdot      communication  51,083      116,573      Weighted
emailLinux         communication  63,399      159,996      Weighted
facebookPosts      communication  46,952      183,412      Weighted
emailEnron         communication  87,273      297,456      Weighted
facebookFriends    friendship     63,731      817,035      Unweighted
arXivCitations     coauthorship   28,093      3,148,447    Unweighted
englishWikipedia   hyperlink      1,870,709   36,532,531   Unweighted

Table 1: Overview of real dynamic graphs used in the experiments.

Data sets and experiments.

We concentrate on two types of graphs: synthetic and real-world graphs with real edge dynamics. The real-world networks are taken from The Koblenz Network Collection (KONECT) [13] and are summarized in Table 1. All the edges of the KONECT graphs are characterized by a time of arrival. In case of multiple edges between two nodes, we extract two versions of the graph: one unweighted, where we ignore additional edges, and one weighted, where we replace the set of edges between two nodes with an edge of weight . In our experiments, we let the batch size vary from 1 to 1024 and for each batch size, we average the running times over 10 runs. Since the networks do not include edge deletions, we implement additional simulated dynamics. In particular, we consider the following experiments. (i) Real dynamics. We remove the edges with the highest timestamp from the network and we insert them back in batches, in the order of timestamps. (ii) Random insertions and deletions. We remove edges from the graph, chosen uniformly at random. To create batches of both edge insertions and deletions, we add back the deleted edges with probability and delete other random edges with probability . (iii) Random weight changes. In weighted networks, we choose edges uniformly at random and we multiply their weight by a random value in the interval .

For synthetic graphs we use a generator based on a unit-disk graph model in hyperbolic geometry [21], where edge insertions and deletions are obtained by moving the nodes in the hyperbolic plane. The networks produced by the model were shown to have many properties of real complex networks, like small diameter and power-law degree distribution (see [21] and the references therein). We generate seven networks, with ranging from about to about and approximately equal to .

Speedups.

Figure 1: Speedups of DA on RK in real unweighted networks using real dynamics.
                   Real dynamics                                   Random updates
                   Time [s]             Speedup                    Time [s]             Speedup
Graph              batch 1  batch 1024  batch 1  batch 1024        batch 1  batch 1024  batch 1  batch 1024
repliesDigg        0.078    1.028       76.11    5.42              0.008    0.832       94.00    4.76
emailSlashdot      0.043    1.055       219.02   9.91              0.038    1.151       263.89   28.81
emailLinux         0.049    1.412       108.28   3.59              0.051    2.144       72.73    1.33
facebookPosts      0.023    1.416       527.04   9.86              0.015    1.520       745.86   8.21
emailEnron         0.368    1.279       83.59    13.66             0.203    1.640       99.45    9.39
facebookFriends    0.447    1.946       94.23    18.70             0.448    2.184       95.91    18.24
arXivCitations     0.038    0.186       2287.84  400.45            0.025    1.520       2188.70  28.81
englishWikipedia   1.078    6.735       3226.11  617.47            0.877    5.937       2833.57  703.18
Table 2: Times and speedups of DA on RK in unweighted real graphs under real dynamics and random updates, for batch sizes of 1 and 1024.
Figure 2: Speedups of DA on RK in hyperbolic unit-disk graphs.

Figure 1 reports the speedups of DA on RK in real graphs using real dynamics. Although some fluctuations can be noticed, the speedups tend to decrease as the batch size increases. We can attribute the fluctuations to two main factors. First, different batches can affect areas of $G$ of varying sizes, which also influences the time required to update the SSSPs. Second, changes in the $VD$ approximation can require sampling new paths and therefore increase the running time of DA (and DAW). Nevertheless, DA is significantly faster than recomputation on all networks and for every tested batch size. Analogous results for random dynamics are reported in Figure 3 of the Appendix. Table 2 summarizes the running times of DA and its speedups on RK with batches of size 1 and 1024 in unweighted graphs, under both real and random dynamics. Even on the larger graphs (arXivCitations and englishWikipedia) and on large batches, DA requires at most a few seconds to recompute the BC scores, whereas RK requires about one hour for englishWikipedia. The results on weighted graphs are shown in Table 3 in Section C of the Appendix. In both real dynamics and random updates, the speedups vary between and for single-edge updates and between and for batches of size 1024. On hyperbolic graphs (Figure 2), the speedups of DA on RK increase with the size of the graph. Table 4 in the Appendix contains the running times and speedups on batches of 1 and 1024 edges. The speedups vary between and for single-edge updates and between and for batches of 1024 edges. The results show that DA and DAW are faster than recomputation with RK in all the tested instances, even when large batches of 1024 edges are applied to the graph. With small batches, the algorithms are always orders of magnitude faster than RK, often with running times of fractions of a second or a few seconds, compared to minutes or hours.
Such high speedups are made possible by the efficient update of the sampled shortest paths, which limits the recomputation to the nodes that are actually affected by the batch. Also, by processing the edges in batches, we avoid updating multiple times the nodes that are affected by several edges of the batch.

6 Conclusions

Betweenness is a widely used centrality measure, yet expensive if computed exactly. In this paper we have presented the first fully-dynamic algorithms for betweenness approximation (for weighted and for unweighted undirected graphs). The consideration of edge deletions and disconnected graphs is made possible by the efficient solution of several algorithmic subproblems (some of which may be of independent interest). Now BC can be approximated with an error guarantee for a much wider set of dynamic real graphs compared to previous work.

Our experiments show significant speedups over the static algorithm RK. In this context it is interesting to remark that dynamic algorithms require additional memory, and that this can limit the size of the graphs they can be applied to. By not storing the predecessors in the shortest paths, we reduce the memory requirement from $\Theta(n+m)$ per sampled path to $\Theta(n)$, and are still often more than 100 times faster than RK despite rebuilding the paths.

Future work may include the transfer of our concepts to approximating other centrality measures in a fully-dynamic manner, e.g. closeness, and the extension to directed graphs, for which a good $VD$ approximation is the only obstacle. Moreover, making the betweenness code run in parallel will further accelerate the computations in practice. Our implementation will be made available as part of a future release of the network analysis tool suite NetworKit [20].


Acknowledgements. This work is partially supported by DFG grant FINCA (ME-3619/3-1) within the SPP 1736 Algorithms for Big Data. We thank Moritz von Looz for providing the synthetic dynamic networks and the numerous contributors to the NetworKit project. We also thank Matteo Riondato (Brown University) and anonymous reviewers for their constructive comments.

References

  • [1] D. A. Bader, S. Kintali, K. Madduri, and M. Mihail. Approximating betweenness centrality. In 5th Workshop on Algorithms and Models for the Web-Graph (WAW ’07), volume 4863 of Lecture Notes in Computer Science, pages 124–137. Springer, 2007.
  • [2] R. Bauer and D. Wagner. Batch dynamic single-source shortest-path algorithms: An experimental study. In 8th Int. Symp. on Experimental Algorithms (SEA ’09), volume 5526 of LNCS, pages 51–62. Springer, 2009.
  • [3] E. Bergamini, H. Meyerhenke, and C. Staudt. Approximating betweenness centrality in large evolving networks. In 17th Workshop on Algorithm Engineering and Experiments, ALENEX 2015, pages 133–146. SIAM, 2015.
  • [4] U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25:163–177, 2001.
  • [5] U. Brandes and C. Pich. Centrality estimation in large networks. I. J. Bifurcation and Chaos, 17(7):2303–2318, 2007.
  • [6] A. D’Andrea, M. D’Emidio, D. Frigioni, S. Leucci, and G. Proietti. Experimental evaluation of dynamic shortest path tree algorithms on homogeneous batches. In 13th Int. Symp. on Experimental Algorithms (SEA ’14), volume 8504 of LNCS, pages 283–294. Springer, 2014.
  • [7] D. Frigioni, A. Marchetti-Spaccamela, and U. Nanni. Semi-dynamic algorithms for maintaining single-source shortest path trees. Algorithmica, 22:250–274, 2008.
  • [8] R. Geisberger, P. Sanders, and D. Schultes. Better approximation of betweenness centrality. In 10th Workshop on Algorithm Engineering and Experiments (ALENEX ’08), pages 90–100. SIAM, 2008.
  • [9] K. Goel, R. R. Singh, S. Iyengar, and Sukrit. A faster algorithm to update betweenness centrality after node alteration. In Algorithms and Models for the Web Graph - 10th Int. Workshop, WAW 2013, volume 8305 of Lecture Notes in Computer Science, pages 170–184. Springer, 2013.
  • [10] O. Green, R. McColl, and D. A. Bader. A fast algorithm for streaming betweenness centrality. In SocialCom/PASSAT, pages 11–20. IEEE, 2012.
  • [11] M. Kas, K. M. Carley, and L. R. Carley. An incremental algorithm for updating betweenness centrality and k-betweenness centrality and its performance on realistic dynamic social network data. Social Netw. Analys. Mining, 4(1):235, 2014.
  • [12] N. Kourtellis, G. De Francisci Morales, and F. Bonchi. Scalable online betweenness centrality in evolving graphs. Knowledge and Data Engineering, IEEE Transactions on, PP(99):1–1, 2015.
  • [13] J. Kunegis. KONECT: the Koblenz network collection. In 22nd Int. World Wide Web Conf., WWW ’13, pages 1343–1350, 2013.
  • [14] M. Lee, J. Lee, J. Y. Park, R. H. Choi, and C. Chung. QUBE: a quick algorithm for updating betweenness centrality. In 21st World Wide Web Conf. 2012, WWW 2012, pages 351–360. ACM, 2012.
  • [15] J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In 11th Int. Conf. on Knowledge Discovery and Data Mining, pages 177–187. ACM, 2005.
  • [16] M. Nasre, M. Pontecorvi, and V. Ramachandran. Betweenness centrality - incremental and faster. In Mathematical Foundations of Computer Science 2014 - 39th Int. Symp., MFCS 2014, volume 8635 of LNCS, pages 577–588. Springer, 2014.
  • [17] G. Ramalingam and T. Reps. An incremental algorithm for a generalization of the shortest-path problem. Journal of Algorithms, 21:267–305, 1996.
  • [18] M. Riondato and E. M. Kornaropoulos. Fast approximation of betweenness centrality through sampling. In 7th ACM Int. Conf. on Web Search and Data Mining (WSDM ’14), pages 413–422. ACM, 2014.
  • [19] L. Roditty and U. Zwick. On dynamic shortest paths problems. Algorithmica, 61(2):389–401, 2011.
  • [20] C. Staudt, A. Sazonovs, and H. Meyerhenke. NetworKit: An interactive tool suite for high-performance network analysis. http://arxiv.org/abs/1403.3005, 2014.
  • [21] M. von Looz, C. L. Staudt, H. Meyerhenke, and R. Prutkin. Fast generation of complex networks with underlying hyperbolic geometry. http://arxiv.org/abs/1501.03545v2, 2015.

Appendix A Description of the fully-dynamic algorithms

A.1 Dynamic approximation

Algorithm 2 describes the initialization. We first put all the nodes into a queue $Q$ and then compute an SSSP from each node $s$ we extract from $Q$. During the SSSP search, we mark as visited all the nodes we scan. When extracting nodes from $Q$, we skip those that have already been visited: this prevents us from computing multiple approximations for the same connected component. In the update (Algorithm 3), we recompute the SSSPs and the approximations with updateApprVD-W (or updateApprVD-U). Since components might split, we might need to add approximations for some new subcomponents, in addition to recomputing the old ones. Conversely, if components merge, we can discard the superfluous approximations. To do this, we keep track, for each node, of the number of times it has been visited. Let $\mathrm{vis}(v)$ denote this number for node $v$. Before the update, every node has been visited exactly once. While updating an SSSP from a source $s$, we increase (decrease) by one $\mathrm{vis}(v)$ of the nodes $v$ that become reachable (unreachable) from $s$. This way, we can skip the update of the SSSPs from sources that have also been visited from another source. After the update, for all nodes $v$ that have become unvisited ($\mathrm{vis}(v) = 0$), we compute a new approximation from scratch.
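The initialization idea can be illustrated for unweighted graphs with the following minimal Python sketch; it is not the authors' code. It assumes a graph given as a dict mapping each node to its neighbors, lets a plain loop over the nodes play the role of the queue $Q$, and uses an assumed RK-style bound per component: one plus the sum of the two largest BFS distances from the source. This bound is valid because a shortest path between $u$ and $v$ has at most $d(s,u) + d(s,v) + 1$ nodes. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, s):
    """Hop distances from s to every node reachable from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for y in adj[v]:
            if y not in dist:
                dist[y] = dist[v] + 1
                queue.append(y)
    return dist


def vd_bound(dist):
    # Any shortest u-v path has at most d(s,u) + d(s,v) + 1 nodes, so one plus
    # the two largest distances bounds the vertex diameter of the component.
    top_two = sorted(dist.values(), reverse=True)[:2]
    return 1 + sum(top_two)


def init_approx_vd(adj):
    """Return (max bound, per-source bounds, vis counts) over all components."""
    vis = {v: 0 for v in adj}
    bounds = {}
    for s in adj:               # iterating over all nodes stands in for Q
        if vis[s] > 0:
            continue            # component already covered by an earlier source
        dist = bfs_distances(adj, s)
        for v in dist:
            vis[v] += 1         # every node reached by this SSSP is now visited
        bounds[s] = vd_bound(dist)
    return max(bounds.values()), bounds, vis
```

For a path 0-1-2, an edge 3-4 and an isolated node 5, `init_approx_vd` starts one search per component (from 0, 3 and 5) and returns the bounds 4, 2 and 1, with every $\mathrm{vis}(v)$ equal to 1.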

1 ;
2 foreach node  do
3        ; insert into ;
4       
5 end foreach
6;
7 while  do
8        extract from ;
9        if  then
10               ;
               // initApprVD adds 1 to of the nodes it visits
11               initApprVD();
12               ;
13              
14        end if
15       
16 end while
17 ;
18 ;
return
Algorithm 2 Dynamic approximation (initialization)
1 ;
2 foreach  do
3        if vis( then
4               remove and ; decrease ;
5              
6        end if
7       else
               // updateApprVD updates , inserts all for which into and computes a approximation
8               updateApprVD() ;
9              
10        end if
11       
12 end foreach
13 ;
14 while  do
15        extract from ;
16        if  then
17               ;
18               initApprVD();
19               ; ;
20              
21        end if
22       
23 end while
24 reset to 1 for nodes such that ;
25 ;
return
Algorithm 3 Dynamic approximation (updateApprVD)
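The bookkeeping of the update can be illustrated with the following Python sketch. To keep it short, it assumes an unweighted graph given as a dict of neighbor sets and replaces the dynamic SSSP update of each source by a plain BFS that recomputes the set of reachable nodes; only the handling of $\mathrm{vis}(v)$, of merged components, and of nodes that became unvisited is meant to mirror the description in A.1. Dropping a source when $\mathrm{vis}(s) > 1$ is an assumption of the sketch, and the names `sources`, `reachable` and `update_sources` are hypothetical.

```python
from collections import deque


def reachable(adj, s):
    """Set of nodes reachable from s (stand-in for an updated SSSP)."""
    seen = {s}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for y in adj[v]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def update_sources(adj, sources, vis):
    """Update vis counts after a batch has already been applied to adj.

    sources maps each SSSP source to the set of nodes it reached before the
    batch; both sources and vis are modified in place.
    """
    for s in list(sources):
        if vis[s] > 1:
            # s is also reached from another source (components merged):
            # drop this SSSP and undo the visits it contributed.
            for v in sources.pop(s):
                vis[v] -= 1
            continue
        new, old = reachable(adj, s), sources[s]
        for v in new - old:
            vis[v] += 1         # became reachable from s
        for v in old - new:
            vis[v] -= 1         # became unreachable from s
        sources[s] = new
    # Nodes no longer covered by any SSSP get a fresh source (new component).
    for v in adj:
        if vis[v] == 0:
            sources[v] = reachable(adj, v)
            for x in sources[v]:
                vis[x] += 1
```

For example, merging two components leaves a single source whose SSSP covers both, while splitting a component adds a new source for the part that became unvisited.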

A.2 Dynamic SSSP update for weighted graphs

Algorithm 4 describes the SSSP update for weighted graphs. The pseudocode updates both the approximation for the connected component of $s$ and the number of shortest paths from $s$, so it essentially combines updateSSSP-W and updateApprVD-W. Initially, we scan the edges in the batch $\beta$ and, for each edge, we insert into the priority queue $PQ$ the endpoint with greater distance from $s$ (w.l.o.g., let $w$ be this endpoint). The priority of $w$ represents the candidate new distance $d'(w)$ of $w$, i.e., the minimum, over all neighbors $z$ of $w$, of $d(z)$ plus the weight $\omega(z, w)$ of the edge $\{z, w\}$. For simplicity, we write "insert $w$ into $PQ$", but this can also mean decreasing the priority of $w$ if $w$ is already in $PQ$ and the new priority is smaller than the current one (insertOrDecreaseKey in the pseudocode). When we extract a node $w$ from $PQ$, there are two possibilities: (i) there is a path of length $d'(w)$, so $d'(w)$ is actually the new distance of $w$, or (ii) there is no path of length $d'(w)$, and the new distance is greater than $d'(w)$. In the first case, we set $d(w)$ to $d'(w)$ and insert into $PQ$ the neighbors $z$ of $w$ such that $d(z) > d(w) + \omega(w, z)$ (to check whether new shorter paths to $z$ through $w$ exist). In the second case, we assume that there is no shortest path between $s$ and $w$ anymore and set $d(w)$ to $\infty$. We then compute $d'(w)$ as $\min_{z \in N(w)} \{d(z) + \omega(z, w)\}$ (the new candidate distance for $w$) and insert $w$ into $PQ$. The neighbors of $w$ could also have lost one (or all) of their old shortest paths, so we insert them into $PQ$ as well. The approximation can be updated while scanning the batch and the neighbors of $w$, and whenever we update $d(w)$. When updating $d(w)$, we also increase $\mathrm{vis}(w)$ if the old $d(w)$ was $\infty$ (i.e., $w$ has become reachable), and we decrease $\mathrm{vis}(w)$ when we set $d(w)$ to $\infty$ (i.e., $w$ has become unreachable). After updating $d(w)$, we update the number $\sigma(w)$ of shortest paths from $s$ to $w$ as the sum of the numbers of shortest paths of the predecessors of $w$.
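For comparison, the following Python sketch performs a fully-dynamic batch update of SSSP distances in a weighted undirected graph. It is deliberately organized differently from Algorithm 4: it follows a two-phase scheme in the spirit of Ramalingam and Reps [17] (first find the nodes that lost all their shortest-path predecessors, then run a Dijkstra-like repair seeded with candidate distances), and it does not maintain $\sigma$ or the vertex-diameter bound. It assumes positive edge weights, a graph stored as `adj[u][v] = weight` that already contains every node, batch entries `(u, v, w)` with `w=None` meaning deletion, and at most one entry per edge; the function names are hypothetical. It returns the nodes that became reachable or unreachable, i.e., the nodes whose $\mathrm{vis}$ count would be increased or decreased.

```python
import heapq
import math


def dijkstra(adj, s):
    """Static SSSP: distances from s (math.inf for unreachable nodes)."""
    dist = {v: math.inf for v in adj}
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        dv, v = heapq.heappop(pq)
        if dv > dist[v]:
            continue  # stale heap entry
        for y, w in adj[v].items():
            if dv + w < dist[y]:
                dist[y] = dv + w
                heapq.heappush(pq, (dist[y], y))
    return dist


def update_sssp(adj, dist, s, batch):
    """Apply batch to adj, repair dist in place, return (reachable, unreachable)."""
    old_w = {}
    for u, v, w in batch:
        old_w[(u, v)] = adj[u].get(v)
        if w is None:
            adj[u].pop(v, None)
            adj[v].pop(u, None)
        else:
            adj[u][v] = adj[v][u] = w
    old = dict(dist)

    # Phase 1: nodes whose distance may grow, found in order of old distance.
    cand = []
    for (u, v), w_old in old_w.items():
        w_new = adj[u].get(v)
        if w_old is None or (w_new is not None and w_new <= w_old):
            continue  # insertions and decreases cannot remove a shortest path
        for a, b in ((u, v), (v, u)):
            if old[a] + w_old == old[b]:
                heapq.heappush(cand, (old[b], b))
    affected = set()
    while cand:
        dv, v = heapq.heappop(cand)
        if v == s or v in affected or math.isinf(dv):
            continue
        if any(z not in affected and old[z] + w == dv for z, w in adj[v].items()):
            continue  # v still has a valid shortest-path predecessor
        affected.add(v)
        for y, w in adj[v].items():
            if old[y] == dv + w:
                heapq.heappush(cand, (old[y], y))  # y may have depended on v

    # Phase 2: seed candidate distances, then settle them Dijkstra-style.
    for v in affected:
        dist[v] = math.inf
    pq = []
    for v in affected:
        best = min((dist[z] + w for z, w in adj[v].items() if z not in affected),
                   default=math.inf)
        if best < math.inf:
            dist[v] = best
            heapq.heappush(pq, (best, v))
    for (u, v) in old_w:
        w_new = adj[u].get(v)
        if w_new is None:
            continue
        for a, b in ((u, v), (v, u)):
            if dist[a] + w_new < dist[b]:
                dist[b] = dist[a] + w_new
                heapq.heappush(pq, (dist[b], b))
    while pq:
        dv, v = heapq.heappop(pq)
        if dv > dist[v]:
            continue  # stale heap entry
        for y, w in adj[v].items():
            if dv + w < dist[y]:
                dist[y] = dv + w
                heapq.heappush(pq, (dist[y], y))

    reach = {v for v in dist if math.isinf(old[v]) and not math.isinf(dist[v])}
    unreach = {v for v in dist if not math.isinf(old[v]) and math.isinf(dist[v])}
    return reach, unreach
```

Splitting the work into an "increase" phase and a "decrease" phase keeps each phase a simple monotone scan by distance, at the cost of touching the affected region twice.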

1 empty priority queue;
2 foreach  do
3        insertOrDecreaseKey();
4       
5 end foreach
6 ;
7 while there are nodes in  do
8        extractMin();
9        ;
10        if  then
11               update and ;
12               if  then
13                      ;
14                     
15               end if
16              ; ;
17               foreach incident edge  do
18                      if  then
19                             ;
20                            
21                      end if
22                      if  then
23                             insertOrDecreaseKey();
24                            
25                      end if
26                     
27               end foreach
28              
29        end if
30        else
31               if  then
32                      ;
33                      if vis(w)=0 then
34                             insert into ;
35                            
36                      end if
37                     if  then
38                             insertOrDecreaseKey();
39                             foreach incident edge  do
40                                    if  then
41                                           insertOrDecreaseKey();
42                                          
43                                    end if
44                                   
45                             end foreach
46                            ;
47                            
48                      end if
49                     
50               end if
51              
52        end if
53       
54 end while
Algorithm 4 SSSP update for weighted graphs (updateSSSP-W)
1 Assumption: ;
2 array of empty queues;
3 foreach  do
4        ; enqueue ;
5       
6 end foreach
7 ;
8 while there are nodes in  do
9        while  do
10               dequeue ;
11               if then continue;
12               ;
13               if  then
14                      update and ;
15                      if then ;
16                      ; ; ;
17                      foreach incident edge  do
18                             if  then
19                                    ;
20                                   
21                             end if
22                             if  then
23                                    enqueue ;
24                                   
25                             end if
26                            
27                      end foreach
28                     
29               end if
30               else
31                      if  then
32                             ;
33                             ;
34                             if vis(w)=0 then
35                                    insert into ;
36                                   
37                             end if
38                            if  then
39                                    enqueue ;
40                                    foreach incident edge  do
41                                           if  then
42                                                  enqueue ;
43                                                 
44                                           end if
45                                          
46                                    end foreach
47                                   
48                             end if
49                            
50                      end if
51                     
52               end if
53              
54        end while
55       ;
56       
57 end while
58 Set to white all the nodes that have been in ;
Algorithm 5 SSSP update for unweighted graphs (updateSSSP-U)

A.3 Dynamic SSSP update for unweighted graphs

Algorithm 5 shows the pseudocode for unweighted graphs. As in Algorithm 4, we first scan the batch and insert the corresponding nodes into the queues; here, instead of a priority queue, we use an array of queues. Then we scan the queues in order of increasing distance from $s$, similarly to a priority queue. To avoid processing a node more than once, we use colors: initially all nodes are white, and we set a node $w$ to black only when we find its final distance (i.e., when we set $d(w)$ to $d'(w)$). Black nodes extracted from a queue are skipped. At the end, we reset all nodes to white.
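For unit weights, the array of queues described above can be sketched in Python as follows. The sketch shows only the settling step with the coloring: a set of black nodes replaces the colors, and creating a fresh set on each call corresponds to resetting all nodes to white. It assumes integer distances, a graph given as a dict of neighbor collections, and seeds given as (node, candidate distance) pairs whose distances are valid upper bounds; `bucket_settle` is a hypothetical name, and $\sigma$ is not updated.

```python
import math


def bucket_settle(adj, dist, seeds):
    """Settle distances in a unit-weight graph using an array of queues.

    adj: dict node -> iterable of neighbors.
    dist: dict node -> current upper bound (math.inf if unknown), updated in place.
    seeds: iterable of (node, candidate distance) pairs to start from.
    Returns the set of nodes colored black (settled) during this call.
    """
    buckets = []   # buckets[k] is the queue of nodes with tentative distance k
    black = set()  # a fresh set per call: every node starts out white

    def push(v, k):
        while len(buckets) <= k:
            buckets.append([])
        buckets[k].append(v)

    for v, k in seeds:
        dist[v] = min(dist[v], k)
        if dist[v] < math.inf:
            push(v, dist[v])

    k = 0
    while k < len(buckets):        # scan the queues by increasing distance
        for v in buckets[k]:
            if v in black:
                continue           # distance already settled: skip
            black.add(v)           # dist[v] == k is settled now
            for y in adj[v]:
                if k + 1 < dist[y]:
                    dist[y] = k + 1
                    push(y, k + 1)
        k += 1
    return black
```

After inserting an edge $\{u, w\}$, for instance, seeding with `(w, dist[u] + 1)` settles only the nodes whose distance actually decreases.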

A.4 Fully-dynamic BC approximation

Similarly to IA and IAW [3], we replace the sampled paths between vertex pairs with new shortest paths between the same vertex pairs. However, here we also check whether the vertex-diameter approximation $\widetilde{VD}$ (and consequently the number of samples $r$) has increased after the batch of edge updates. If so, we sample additional paths (computing new SSSPs from scratch) according to the new value of $r$. Instead of updating $\widetilde{VD}$ and then the paths in two successive steps, we use the SSSPs from the source nodes to compute and update $\widetilde{VD}$ as well, computing new SSSPs only for the components that are not covered by any of the source nodes. In the initialization (Algorithm 6), we first compute the $r$ SSSPs, as in RK. However, we also check which nodes have been visited, as in Algorithm 2. While computing the SSSPs with initSSSPVD, in addition to the distances and the numbers of shortest paths, we also compute a $\widetilde{VD}$ approximation for each of the source nodes and increase $\mathrm{vis}(v)$ of all the nodes $v$ we visit. Since the $r$ shortest paths may not cover all the components of $G$, we compute an additional VD approximation for the nodes in the unvisited components, as in Algorithm 2. In short, we divide the SSSPs into two sets: the set of SSSPs used to sample the shortest paths and the set of SSSPs used only for a $\widetilde{VD}$ approximation in the components that were not scanned by the initial SSSPs. We also keep track of the number of SSSPs in the second set. The BC update after a batch is described in Algorithm 7. First, we recompute the shortest paths as in our incremental algorithms IA and IAW [3]: we update the SSSPs from each source node in the first set and replace the old shortest path with a new one (subtracting $1/r$ from the scores of the nodes on the old shortest path and adding $1/r$ to those on the new shortest path). Note that here we do not store the predecessors, so we need to recompute them.
Instead of an incremental SSSP algorithm as in IA and IAW, here we use the fully-dynamic updateSSSPVD, which also updates the $\widetilde{VD}$ approximation, updates $\mathrm{vis}$, and keeps track of the nodes that become unvisited. Then, for each component that has become unvisited (i.e., that is no longer reached by any SSSP of either set), we add a new SSSP to the second set. After this, we have at least one $\widetilde{VD}$ approximation for each component of $G$. We take the maximum over all these approximations and recompute the number of samples $r$. If $r$ has increased, we need to sample new paths and therefore compute new SSSPs to add to the first set. Finally, we normalize the BC scores, i.e., we multiply them by the old value of $r$ divided by the new value of $r$.
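The path-replacement, sampling and rescaling steps can be sketched in Python for unweighted graphs as follows; this is an illustration, not the authors' implementation. The sample size follows Riondato and Kornaropoulos [18], $r = \lceil (c/\epsilon^2)(\lfloor \log_2(\widetilde{VD} - 2) \rfloor + 1 + \ln(1/\delta)) \rceil$, where the constant $c = 0.5$, the rounding up, and the guard for very small bounds are assumptions of this sketch. A path is sampled by backtracking from $v$ to $u$, choosing each predecessor $z$ of the current node $t$ with probability $\sigma_u(z)/\sigma_u(t)$, and each sampled path adds $1/r$ to its inner nodes. All helper names are hypothetical.

```python
import math
import random
from collections import deque


def num_samples(vd, eps, delta, c=0.5):
    """RK-style sample size for a vertex-diameter bound vd (c=0.5 assumed)."""
    log_term = math.floor(math.log2(max(vd - 2, 1)))  # guard for vd < 3
    return math.ceil(c / eps ** 2 * (log_term + 1 + math.log(1 / delta)))


def bfs_sigma(adj, u):
    """BFS from u: distances, numbers of shortest paths, predecessor lists."""
    dist, sigma, preds = {u: 0}, {u: 1}, {u: []}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for y in adj[v]:
            if y not in dist:
                dist[y], sigma[y], preds[y] = dist[v] + 1, 0, []
                queue.append(y)
            if dist[y] == dist[v] + 1:
                sigma[y] += sigma[v]
                preds[y].append(v)
    return dist, sigma, preds


def sample_inner_path(adj, u, v, rng):
    """Inner nodes of a uniformly random shortest u-v path ([] if none)."""
    dist, sigma, preds = bfs_sigma(adj, u)
    if v not in dist:
        return []
    path, t = [], v
    while True:
        zs = preds[t]
        z = rng.choices(zs, weights=[sigma[x] for x in zs])[0]
        if z == u:
            return path
        path.append(z)
        t = z


def replace_path(bc, old_inner, new_inner, r):
    """Swap one sampled path for a new one between the same vertex pair."""
    for z in old_inner:
        bc[z] -= 1 / r
    for z in new_inner:
        bc[z] += 1 / r


def grow_samples(adj, bc, r_old, r_new, rng):
    """Rescale scores from r_old to r_new samples and draw the extra paths."""
    for x in bc:
        bc[x] *= r_old / r_new
    nodes = list(adj)
    for _ in range(r_new - r_old):
        u, v = rng.sample(nodes, 2)
        for z in sample_inner_path(adj, u, v, rng):
            bc[z] += 1 / r_new
```

Rescaling before drawing the new paths means every old and new sample ends up weighted by $1/r$ for the new $r$, which is the same result as adding the new samples first and normalizing at the end.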

1 foreach node  do
2        ; ;
3       
4 end foreach
5 getApproxVertexDiameter();
6 ;
7 for  to  do
8        sampleUniformNodePair();
9        initSSSPVD();
10        ;
11        empty list;
12        ;
13        while  do
14               sample with probability ;
15               ;
16               add ; ;
17               ;
18              
19        end while
20       
21 end for
22 ;
23 ;
24 while  do
25        extract from ;
26        if  then
27               ;
28               initApprVD();
29               ;
30              
31        end if
32       
33 end while
34 ;
return
Algorithm 6 BC initialization
1 ;
2 for  to  do
3        ;
4        ;
        // updateSSSPVD updates , inserts all into and updates the approximation
5        updateSSSPVD();
        // we replace the shortest path between and
6        foreach  do
7               () ;
8              
9        end foreach
10       ;
11        empty list;
12        ;
13        while  do
14               sample with probability ;
15               ;
16               add to ;
17               ;
18               ;
19              
20        end while
21       
22 end for
23 for  to  do
24        updateApprVD();
25       
26 end for
27 ;
28 while  do
29        extract from ;
30        if  then
31               ;
32               initApprVD();
33               ; ;
34              
35        end if
36       
37 end while
// compute the maximum over all the computed by updateApprVD
38 ;
39 ;
40 if  then
41        sample new paths;
42        foreach  do
43              
44        end foreach
45       ;
46       
47 end if
return
Algorithm 7 Dynamic update of BC approximation (DA)

Appendix B Omitted proofs

B.1 Proof of Proposition 1

Proof

To prove the first inequality, we can notice that for all , since all the edges of are contained in those of . Also, since every edge has weight at least , . Therefore, , which can be rewritten as , for all . Thus, , where the last expression equals by definition.
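The chain of inequalities in the first part of the proof suggests how an upper bound on the vertex diameter of a component can be computed from a single Dijkstra run. The Python sketch below uses the bound $1 + \lfloor (d_1 + d_2)/\omega_{\min} \rfloor$, where $d_1$ and $d_2$ are the two largest distances from the source $s$ and $\omega_{\min}$ is the minimum edge weight in the component. This exact form is an assumption made for illustration, since the statement of Proposition 1 is not repeated here. It is a valid bound because a shortest path with $k$ nodes between $u$ and $v$ has weight at least $(k-1)\,\omega_{\min}$, and its weight $d(u,v)$ is at most $d(s,u) + d(s,v)$. The function name is hypothetical.

```python
import heapq
import math


def weighted_vd_bound(adj, s):
    """Upper bound on the vertex diameter of the component of s.

    adj maps each node to a dict {neighbor: positive weight}.
    """
    dist = {s: 0}
    pq = [(0, s)]
    while pq:  # plain Dijkstra restricted to the component of s
        dv, v = heapq.heappop(pq)
        if dv > dist[v]:
            continue  # stale heap entry
        for y, w in adj[v].items():
            if dv + w < dist.get(y, math.inf):
                dist[y] = dv + w
                heapq.heappush(pq, (dist[y], y))
    weights = [w for v in dist for w in adj[v].values()]
    if not weights:
        return 1  # isolated node: the only shortest path has one node
    d1, d2 = sorted(dist.values(), reverse=True)[:2]
    # (k - 1) * w_min <= d(u, v) <= d1 + d2 for any shortest path with k nodes
    return 1 + math.floor((d1 + d2) / min(weights))
```

On the path 0-1-2-3 with weights 2, 3 and 2, whose vertex diameter is 4, starting from node 1 gives $d_1 + d_2 = 5 + 3$ and hence the bound 5.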

To prove the second inequality, we first notice that , and analogously . Consequently, , supposing that