Fully Dynamic Single-Source Reachability in Practice: An Experimental Study

Kathrin Hanauer    Monika Henzinger    Christian Schulz
Abstract

Given a directed graph and a source vertex, the fully dynamic single-source reachability problem is to maintain the set of vertices that are reachable from the given vertex, subject to edge deletions and insertions. While there has been theoretical work on this problem, showing both linear conditional lower bounds for the fully dynamic problem and insertions-only and deletions-only upper bounds beating these conditional lower bounds, there has been no experimental study that compares the performance of fully dynamic reachability algorithms in practice. Previous experimental studies in this area concentrated only on the more general all-pairs reachability or transitive closure problem and did not use real-world dynamic graphs.

In this paper, we bridge this gap by empirically studying an extensive set of algorithms for the single-source reachability problem in the fully dynamic setting. In particular, we design several fully dynamic variants of well-known approaches to obtain and maintain reachability information with respect to a distinguished source. Moreover, we extend the existing insertions-only or deletions-only upper bounds into fully dynamic algorithms. Even though the worst-case time per operation of all the fully dynamic algorithms we evaluate is at least linear in the number of edges in the graph (as is to be expected given the conditional lower bounds) we show in our extensive experimental evaluation that their performance differs greatly, both on random as well as on real-world instances.

Kathrin Hanauer, University of Vienna, Faculty of Computer Science, Vienna, Austria. kathrin.hanauer@univie.ac.at, https://orcid.org/0000-0002-5945-837X
Monika Henzinger, University of Vienna, Faculty of Computer Science, Vienna, Austria. monika.henzinger@univie.ac.at
Christian Schulz, University of Vienna, Faculty of Computer Science, Vienna, Austria. christian.schulz@univie.ac.at, https://orcid.org/0000-0002-2823-3506

1 Introduction

Many real-world problems can be expressed using graphs and in turn be solved using graph algorithms. Often the underlying graphs or input instances change over time, i.e., vertices or edges are inserted or deleted as time passes. For example, in a social network, new users and relations between them may be created or removed over time. Another typical example is the OpenStreetMap road network, which changes over time as roads are temporarily closed or simply because new information is added to the system by users. Given a concrete graph problem, computing a new solution for every change that occurs in the graph can be an expensive task on huge networks and ignores the previously gathered information on the instance under consideration. Hence, a whole body of algorithms and data structures for dynamic graphs has been developed over the last decades. It is not surprising that dynamic algorithms and data structures are in most cases more difficult to design and analyze than their static counterparts.

Typically, dynamic graph problems are classified by the types of updates allowed. A problem is said to be fully dynamic if the update operations include insertions and deletions of edges. If only insertions are allowed, the problem is called incremental; if only deletions are allowed, it is called decremental.

One of the most basic questions that one can pose is that of reachability in graphs, i.e., answering the question whether there is a directed path between two distinct vertices. Already this simple problem has many applications such as in source code analysis [19], in the analysis of social networks—e.g., if somebody is a friend of a friend—in computational biology when analyzing metabolic or protein-protein interaction networks [6], or in the computation of (dynamic) maximum flows [7].

The single-source reachability problem has been extensively analyzed theoretically. The fully dynamic single-source reachability (SSR) problem is to maintain the set of vertices that are reachable from a given source vertex, subject to edge deletions and insertions. For the static version of the problem, i.e., when the graph does not change over time, reachability queries can be answered in constant time after linear preprocessing time by running, e.g., breadth-first search from the source vertex and marking each reachable vertex. This approach can be extended in the insertions-only case by using incremental breadth-first search so that each insertion takes amortized constant time and each query takes constant time. In the fully dynamic case, however, conditional lower bounds [12, 1] give a strong indication that no faster solution than the naive recomputation from scratch is possible after each change in the graph. There has been a large body of research on the deletions-only case [21, 10, 3], culminating in an algorithm with near-linear expected total update time [2]. However, to the best of our knowledge, there has been no prior experimental evaluation of fully dynamic single-source reachability algorithms.

In this paper, we attempt to start bridging this gap by empirically studying an extensive set of algorithms for the single-source reachability problem in the fully dynamic setting. In particular, we design several fully dynamic variants of well-known static approaches to obtain and maintain reachability information with respect to a distinguished source. Moreover, we modify existing algorithms that provide theoretical guarantees under the insertions-only or deletions-only setting to be fully dynamic. We then perform an extensive experimental evaluation on random as well as real-world instances in order to compare the performance of these algorithms. In addition, we introduce and assess different thresholds that trigger a recomputation from scratch to mitigate extreme update costs.

2 Preliminaries

2.1 Basic Concepts

Let $G = (V, E)$ be a directed multigraph with vertex set $V$ and edge multiset $E$, where $\operatorname{mult}(e)$ denotes the multiplicity of an edge $e$. Throughout this paper, let $n = |V|$ and $m = |E|$. The density of $G$ is $d = m/n$. An edge $e = (u, v)$ has tail $u$ and head $v$, and $u$ and $v$ are said to be adjacent. The in-neighborhood $N^-(v)$ (out-neighborhood $N^+(v)$) of a vertex $v$ is the set of vertices $u$ such that $(u, v) \in E$ ($(v, u) \in E$). The neighborhood of a vertex $v$ is the set of vertices adjacent to $v$, and its degree is the size of its neighborhood. A sequence of vertices $s = v_0, \dots, v_k = t$ such that each pair of consecutive vertices is connected by an edge is called an $s$-$t$ path, and $s$ can reach $t$. A graph is strongly connected if there is an $s$-$t$ path between every pair of vertices $s, t \in V$. The paper deals with the fully dynamic single-source reachability problem (SSR): Given a directed graph $G$ and a source vertex $s \in V$, answer reachability queries starting at $s$, subject to edge insertions and deletions.

2.2 Related Work

A whole body of algorithms [21, 9, 14, 10, 11, 3, 2, 13, 20] for SSR has been developed over the last decades and has been complemented by several results on lower bounds [12, 1, 22]. In the incremental setting, an incremental breadth-first or depth-first search yields a total update time of $O(m)$. The same update time can also be achieved in the decremental setting if the graph is acyclic [13]. For general graphs, the currently best decremental algorithm maintains reachability information in near-linear expected total update time [2]. In the fully dynamic setting, the fastest algorithm has a worst-case time of $O(n^{1.575})$ per update [20]. Assuming the OMv conjecture, no algorithm for SSR exists with a worst-case update time of $O(m^{1-\varepsilon_1})$ and a worst-case query time of $O(m^{1-\varepsilon_2})$, $\varepsilon_1, \varepsilon_2 > 0$ [12]. Moreover, a combinatorial SSR algorithm with a worst-case update or query time of $O(n^{2-\varepsilon})$ would also imply faster combinatorial algorithms for Boolean matrix multiplication and other problems [1, 22]. See Section A.1 for more details.

In extensive studies, Frigioni et al. [5] as well as Krommidas and Zaroliagis [15] have evaluated a huge set of algorithms for the more general fully dynamic all-pairs reachability problem experimentally on random dynamic graphs as well as on two static real-world graphs with randomly generated update operations. They concluded that, despite their simple-mindedness, static breadth-first and depth-first search outperform their dynamic competitors on a large number of instances. There has also been recent development in designing algorithms that maintain a reachability index in the static setting [18, 23, 4, 24], which were evaluated experimentally [18] on acyclic random and real-world graphs of similar sizes as in this paper.

3 Algorithms

We implemented and tested a variety of combinatorial algorithms. An overview is given in Table 1. Additionally, Table 2 summarizes the corresponding theoretical worst-case running times and space requirements. Not all of them are fully dynamic or even dynamic in their original form and have therefore been “dynamized” by us in a more or less straightforward manner. In this section, we provide a short description of these algorithms, their implementation, and the variants we considered. Each algorithm consists of up to four subroutines: initialize(), edgeInserted(), edgeDeleted(), and query(), which define the algorithm’s behavior during its initialization phase, in case that an edge $(u, v)$ is added or removed, and if it is queried whether a vertex $t$ is reachable from the source, respectively. We distinguish three groups: The first group comprises algorithms that are based on static breadth-first and depth-first search with some improvements. Algorithms in the second group are based on a simple incremental algorithm that maintains an arbitrary, not necessarily height-minimal, reachability tree, and algorithms in the third group use Even-Shiloach trees and thus maintain a (height-minimal) breadth-first search tree. We did not implement the more sophisticated deletions-only single-source reachability algorithms [10, 11, 3, 2] as they are very involved and due to their complexity we expect them to perform poorly in practice. In the following, we assume an incidence list representation of the graph, i.e., each vertex has a list of incoming and outgoing edges. The interface shared by all algorithms is sketched below.
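A minimal sketch of this common interface in C++, the language of our implementation; the type and method names here are illustrative and not taken verbatim from our code:

```cpp
#include <cstdint>

using Vertex = std::uint64_t;

// Common interface of all evaluated algorithms: up to four subroutines.
class DynamicSSRAlgorithm {
public:
    virtual ~DynamicSSRAlgorithm() = default;

    // Build the initial data structures for the current graph and source s.
    virtual void initialize() = 0;

    // Called after edge (u, v) has been inserted into / deleted from the graph.
    virtual void edgeInserted(Vertex u, Vertex v) = 0;
    virtual void edgeDeleted(Vertex u, Vertex v) = 0;

    // Report whether vertex t is currently reachable from the source s.
    virtual bool query(Vertex t) = 0;
};
```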

3.1 Dynamized Static Algorithms

Depth-first search (DFS) and breadth-first search (BFS) are the two classic approaches to obtain reachability information in a static setting. Despite their simplicity, studies for all-pairs reachability [5, 15] report even their pure versions to be at least competitive with genuine dynamic algorithms and even superior on various instances. We consider three variants each: For our variants SDFS and SBFS (Static DFS/BFS), we do not maintain any information and start the pure, static algorithm for each query anew from the source. Thus, all work is done in query().

Algorithm            Long name                  Algorithm    Long name
SDFS / CDFS / LDFS   Static/Caching/Lazy DFS    ES($c/\rho$)    Even-Shiloach
SBFS / CBFS / LBFS   Static/Caching/Lazy BFS    MES($c/\rho$)   Multi-Level Even-Shiloach
SI($r/f/\rho$)          Simple Incremental         SES($c/\rho$)   Simplified Even-Shiloach

Table 1: Algorithms and abbreviations overview.

Second, we introduce a cache as a simple means to speed up queries for our variants CDFS and CBFS (Caching DFS/BFS). The cache contains reachability information for all vertices and is recomputed entirely in query() if it has been invalidated by an update. The rules for cache invalidation are as follows: An edge insertion is considered critical if it connects a reachable vertex to a previously unreachable vertex. Similarly, an edge deletion is critical if its head is reachable. The algorithms keep track of whether a critical insertion or deletion has occurred since the last recomputation. The cache is invalidated if either a critical edge insertion has occurred and the cached reachability state of a queried vertex $t$ is unreachable, or if a critical deletion has occurred and the cached reachability state of $t$ is reachable. Both algorithms may use initialize() to build their cache.
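The invalidation test itself is a tiny predicate; the following sketch (our naming, not the literal implementation) makes the two rules explicit:

```cpp
#include <cstddef>
#include <vector>

// Cached reachability state plus the two "critical update" flags. The flags
// are set in edgeInserted()/edgeDeleted() according to the rules above and
// cleared whenever the cache is recomputed.
struct ReachabilityCache {
    std::vector<bool> reachable;     // cached state per vertex
    bool criticalInsertion = false;  // edge (u,v) inserted with u reachable, v not
    bool criticalDeletion = false;   // edge (u,v) deleted with v reachable

    // May the cached answer for vertex t still be returned?
    bool validFor(std::size_t t) const {
        if (criticalInsertion && !reachable[t]) return false;  // t may have become reachable
        if (criticalDeletion && reachable[t]) return false;    // t may have become unreachable
        return true;
    }
};
```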

Finally, we also implemented lazy, caching variants LDFS and LBFS (Lazy DFS/BFS). In contrast to the former two, these algorithms only keep reachability information of vertices they have encountered while answering a query. As a vertex can only be assumed to be unreachable if the graph traversal has been exhaustive, the algorithms additionally maintain a flag exhausted. For query($t$), the cached state of $t$ is hence returned if $t$’s cached state is reachable and no critical edge deletion has occurred. Otherwise, in case that there was no critical edge insertion and $t$’s cached state is unreachable, the algorithm has to check the flag exhausted. If it is not set, the graph traversal that has been started at a previous query is resumed, thereby updating the cache, until either $t$ is encountered or all reachable vertices have been visited. Then, the algorithm returns $t$’s (cached) state. In all other cases, the cache is invalidated and the traversal must be started anew.
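In condensed form, query() looks roughly as follows (our naming; advance() expands one vertex of the suspended traversal, restart() clears the cache and the critical-update flags and begins anew at the source; for brevity, this sketch conservatively restarts after any critical update once the positive cache hit has failed):

```cpp
#include <cstddef>
#include <deque>
#include <vector>

struct LazyBFS {
    enum class State { Unknown, Reachable, Unreachable };
    std::vector<State> state;          // cached reachability, Unknown if never visited
    std::deque<std::size_t> frontier;  // suspended BFS queue, kept across queries
    bool exhausted = false;            // all reachable vertices have been visited
    bool criticalInsertion = false, criticalDeletion = false;

    void advance();  // expand one frontier vertex, updating state[] (omitted)
    void restart();  // clear cache and flags, restart traversal at the source (omitted)

    bool query(std::size_t t) {
        if (state[t] == State::Reachable && !criticalDeletion)
            return true;                     // positive cache hit is still valid
        if (criticalInsertion || criticalDeletion)
            restart();                       // cache unusable: traverse anew
        while (!exhausted && state[t] != State::Reachable)
            advance();                       // resume (or redo) the traversal
        return state[t] == State::Reachable;
    }
};
```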

3.2 Reachability-Tree Algorithms

In a pure incremental setting, i.e., without edge deletions, an algorithm that behaves like LDFS or LBFS, but updates its cache on edge insertions rather than on queries, can answer queries in $O(1)$ time and spends only $O(n+m)$ time in total for all edge insertions, i.e., its amortized time per edge insertion is constant. We refer to this algorithm as SI (Simple Incremental) and describe various options to make it fully dynamic. For every vertex $v$, SI maintains a flag $\mathit{reachable}(v)$, which is used to implement query() in constant time, as well as a pointer $\mathit{parent}(v)$ to its parent in the reachability tree. More specifically, the algorithm implements the different operations as follows:

initialize(): During initialization, the algorithm traverses the graph using BFS starting from the source $s$ and sets $\mathit{reachable}(v)$ and $\mathit{parent}(v)$ for each vertex $v$ accordingly.

edgeInserted($u,v$): If $u$, but not $v$, was reachable before, update $\mathit{reachable}(\cdot)$ and $\mathit{parent}(\cdot)$ of all vertices that can be reached from $v$ and were unreachable before by performing a BFS starting at $v$.

edgeDeleted($u,v$): If $\mathit{parent}(v) = u$, the deletion of $(u,v)$ requires checking and updating all vertices in the subtree rooted at $v$. We consider two basic options: updating the stored reachability information or recomputing it entirely from scratch. For the former, we first identify a list L of vertices whose reachability is possibly affected by the edge deletion, which comprises all vertices in the subtree rooted at $v$ and is obtained by a simple preorder traversal. Their state is temporarily set to unknown and their parent pointers are reset. Then, the reachability of every vertex $w$ in L is recomputed by traversing the graph by a backwards BFS starting from $w$ until a reachable vertex is found or the search space is exhausted. If $w$ is reachable, the vertices on the path from the reachable vertex to $w$ are added to the reachability tree, using a vertex’s predecessor on the path as its parent. If $w$ is unreachable, so must be all vertices encountered during the backwards traversal. In both cases, this may, thus, reduce the number of vertices with state unknown. Optionally, if $w$ is reachable, the algorithm may additionally start a forward BFS traversal from $w$ to update the reachability information of all vertices with status unknown in L that are reachable from $w$. Moreover, L can be processed in order either as constructed or reversed. Independently of this choice, the worst-case running time is in $O(n(n+m))$. Recomputing from scratch, the second option, requires $O(n+m)$ worst-case update time. A sketch of the update option follows below.
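The following compact sketch illustrates this update option (our naming; the incidence-list maintenance, the reverse-order flag $r$, and the optional forward search controlled by flag $f$ are omitted for brevity):

```cpp
#include <cstddef>
#include <queue>
#include <vector>

struct SimpleIncremental {
    static constexpr std::size_t NONE = static_cast<std::size_t>(-1);
    std::vector<std::vector<std::size_t>> in, out;  // incidence lists
    std::vector<std::size_t> parent;                // parent in the reachability tree
    std::vector<char> reachable;                    // reachable-from-source flag
    double rho = 0.25;                              // recomputation threshold

    // Preorder list of the tree subtree rooted at r: the vertices possibly
    // affected by the deletion (the list L from the text).
    std::vector<std::size_t> subtree(std::size_t r) const {
        std::vector<std::size_t> L{r};
        for (std::size_t i = 0; i < L.size(); ++i)
            for (std::size_t w : out[L[i]])
                if (parent[w] == L[i]) L.push_back(w);
        return L;
    }

    void recomputeFromScratch() { /* full BFS from the source, as in initialize(); omitted */ }

    void edgeDeleted(std::size_t u, std::size_t v) {
        if (!reachable[v] || parent[v] != u) return;         // not a tree edge
        std::vector<std::size_t> L = subtree(v);
        if (L.size() > rho * parent.size()) { recomputeFromScratch(); return; }
        for (std::size_t w : L) { reachable[w] = 0; parent[w] = NONE; }
        for (std::size_t w : L) {
            if (reachable[w]) continue;                      // repaired by an earlier iteration
            // Backward BFS from w until a reachable vertex is found or exhausted.
            std::vector<std::size_t> pred(parent.size(), NONE);
            std::queue<std::size_t> q;
            q.push(w); pred[w] = w;
            std::size_t anchor = NONE;
            while (!q.empty() && anchor == NONE) {
                std::size_t x = q.front(); q.pop();
                for (std::size_t y : in[x]) {
                    if (pred[y] != NONE) continue;
                    pred[y] = x;                             // the forward edge (y, x) exists
                    if (reachable[y]) { anchor = y; break; }
                    q.push(y);
                }
            }
            if (anchor == NONE) continue;                    // w stays unreachable
            // Re-attach the path anchor -> ... -> w to the reachability tree.
            for (std::size_t x = anchor; x != w; x = pred[x]) {
                parent[pred[x]] = x;
                reachable[pred[x]] = 1;
            }
        }
    }
};
```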

Table 2: Worst-case running times (insertion, deletion, and query) and space requirements (permanent and during an update) of the algorithms SBFS/SDFS, CBFS/CDFS/LBFS/LDFS, SI($r/f/\rho$), ES($c/\rho$)/MES($c/\rho$), and SES($c/\rho$).

Thus, our implementation of SI takes three parameters: two boolean flags $r$ (negated: $\bar r$) and $f$ (negated: $\bar f$), specifying whether L should be processed in reverse order and whether a forward search should be started for each re-reachable vertex, respectively, as well as a ratio $\rho \in [0,1]$ indicating that if L contains more than $\rho \cdot n$ elements, the reachability information for all vertices is recomputed from scratch.

3.3 Shortest-Path-Tree Algorithms

In 1981, Even and Shiloach [21] described a simple decremental connectivity algorithm for undirected graphs that is based on the maintenance of a BFS tree and requires $O(n)$ amortized time per edge deletion. Such a tree is also called Even-Shiloach tree or ES tree for short. Henzinger and King [9] were the first to observe that ES trees immediately also yield a decremental algorithm for SSR on directed graphs with the same amortized update time if the source s is used as the tree’s root. We extend this data structure to make it fully dynamic and consider various variants.

For every vertex $v$, an ES tree maintains its BFS level $\mathit{level}(v)$, which corresponds to $v$’s distance from s, as well as an ordered list of in-neighbors $N^-(v)$. To efficiently manage this list in the fully dynamic setting, the algorithm additionally uses an index of size $O(m)$ that maps each edge $(u,v)$ to $u$’s position in $N^-(v)$. If $v$ is reachable, its parent in the BFS tree is the in-neighbor at level $\mathit{level}(v) - 1$ whose index in $N^-(v)$ is the smallest (invariant). The algorithm stores the parent’s index in $N^-(v)$ as $\mathit{parentIndex}(v)$. If $v$ is unreachable, $\mathit{level}(v) = \infty$ (invariant). A reachability query query($t$) can thus be answered in $O(1)$ time by testing whether $\mathit{level}(t) < \infty$.

initialize(): The ES tree is built during initialization by a BFS traversal starting from the source. In doing so, $N^-(v)$ is populated for each vertex $v$ in the order in which the edges are encountered. Thus, after the initialization, $\mathit{parentIndex}(v) = 0$ for every reachable vertex $v$. The update operations are implemented as follows.

edgeInserted($u,v$): Update the data structure in $O(n+m)$ worst-case time by starting a BFS from $v$ and checking for each vertex that is encountered whether either its level or, subordinately, its parent index can be decreased.

edgeDeleted($u,v$): If $(u,v)$ is a tree edge, the algorithm tries to find a substitute parent for $v$. To this end, $v$ is added to an initially empty FIFO queue Q containing vertices whose parent and, if necessary, whose level has to be newly determined. Vertices in Q are processed one-by-one as follows: For each vertex $w$, the parent index is increased until it either points to an in-neighbor at level $\mathit{level}(w) - 1$ or the list $N^-(w)$ is exhausted. In the latter case, if $\mathit{level}(w) + 1 < n$, $w$’s level is increased by one, $\mathit{parentIndex}(w)$ is reset to zero, and all children of $w$ in the BFS tree as well as $w$ itself are added to Q. Otherwise, $w$ is unreachable and $\mathit{level}(w) = \infty$. This operation has a worst-case running time of $O(n(n+m))$.
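An illustrative sketch of this repair loop (our field names; the maintenance of the tree-children lists and the abort counters for recomputation, introduced below, are omitted):

```cpp
#include <cstddef>
#include <limits>
#include <queue>
#include <vector>

struct ESTree {
    static constexpr std::size_t INF = std::numeric_limits<std::size_t>::max();
    std::vector<std::vector<std::size_t>> inNbr;        // ordered in-neighbor lists N^-(v)
    std::vector<std::vector<std::size_t>> treeChildren; // children in the BFS tree
    std::vector<std::size_t> level;                     // BFS level, INF if unreachable
    std::vector<std::size_t> parentIdx;                 // parent's position in inNbr[v]

    // Repair the tree after the tree edge into v0 has been deleted.
    void repairFrom(std::size_t v0) {
        std::size_t n = level.size();
        std::queue<std::size_t> q;
        q.push(v0);
        while (!q.empty()) {
            std::size_t v = q.front(); q.pop();
            if (level[v] == INF || level[v] == 0) continue;  // unreachable vertex or source
            // Advance parentIdx[v] until it points to an in-neighbor one level above v.
            while (parentIdx[v] < inNbr[v].size()) {
                std::size_t u = inNbr[v][parentIdx[v]];
                if (level[u] != INF && level[u] + 1 == level[v]) break;
                ++parentIdx[v];
            }
            if (parentIdx[v] < inNbr[v].size()) continue;    // substitute parent found
            // No parent on level level(v)-1: drop v one level and retry later.
            parentIdx[v] = 0;
            if (level[v] + 1 >= n) {                         // v has become unreachable
                level[v] = INF;
                for (std::size_t c : treeChildren[v]) q.push(c);
                continue;
            }
            ++level[v];
            for (std::size_t c : treeChildren[v]) q.push(c); // children may lose their parent
            q.push(v);                                       // re-insert v itself
        }
    }
};
```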

In view of this large update cost, we again introduce an option to alternatively recompute the BFS tree from scratch. We use two parameters to control the algorithm’s behavior: a factor $\rho$ that limits the number of vertices that may be processed in the queue to $\rho \cdot n$, as well as an upper bound $c$ on how often each vertex may be reinserted into the queue before the update operation is aborted and a recomputation is triggered. We refer to this algorithm as ES($c/\rho$) (Even-Shiloach). Observe that if the algorithm recomputes immediately, i.e., if $\rho = 0$, or each vertex may be processed in Q only a constant number of times, i.e., if $c \in O(1)$, the worst-case theoretical running time per deletion is only $O(n+m)$.

We also implemented a variation of ES that sets the parent index of a vertex $w$ in the queue directly to that of the lowest-level in-neighbor and updates $\mathit{level}(w)$ accordingly, which avoids the immediate re-insertion of $w$ into the queue. More precisely, while iterating through the list of in-neighbors $N^-(w)$, as realized by increasing $\mathit{parentIndex}(w)$, this variation keeps track of the minimum level $\ell_{\min}$ and the corresponding index $i_{\min}$ of an in-neighbor encountered thereby. If $\mathit{parentIndex}(w)$ reaches the end of $N^-(w)$, i.e., no in-neighbor at level $\mathit{level}(w) - 1$ has been found, $\mathit{parentIndex}(w)$ is reset to zero and the search continues until it attains the value it had when $w$ was removed from Q. Then, $\mathit{parentIndex}(w)$ is set to $i_{\min}$, $\mathit{level}(w)$ to $\ell_{\min} + 1$, and, if $\mathit{level}(w)$ has increased, all children of $w$ in the BFS tree, but not $w$ itself, are added to Q. As vertices may skip several levels in one step, we refer to this version of ES as MES (Multi-Level Even-Shiloach).

We also consider an even further simplification of ES, SES (Simplified Even-Shiloach), which no longer maintains an ordered list of in-neighbors for each vertex and hence also no parent index. Instead, it stores for each reachable vertex a direct pointer to its parent in the BFS tree. For each vertex $w$ in Q, SES simply iterates over all in-neighbors in arbitrary order and sets $w$’s parent to one of minimum level. If this increases $w$’s level, all children of $w$ in the BFS tree are added to Q. Both MES and SES take the same two parameters as ES to control when to recompute the data structure from scratch. SES’s per-vertex repair step is sketched below.
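A minimal sketch of this per-vertex step (again our naming, with the same omissions as in the ES sketch above):

```cpp
#include <cstddef>
#include <limits>
#include <queue>
#include <vector>

struct SESTree {
    static constexpr std::size_t INF = std::numeric_limits<std::size_t>::max();
    std::vector<std::vector<std::size_t>> inNbr;        // in-neighbors, unordered
    std::vector<std::vector<std::size_t>> treeChildren; // children in the BFS tree
    std::vector<std::size_t> level;                     // BFS level, INF if unreachable
    std::vector<std::size_t> parent;                    // direct parent pointer, INF if none

    // Recompute parent and level of vertex v; requeue children on a level increase.
    void repairVertex(std::size_t v, std::queue<std::size_t>& q) {
        std::size_t best = INF, bestLevel = INF;
        for (std::size_t u : inNbr[v])          // scan all in-neighbors in arbitrary order
            if (level[u] < bestLevel) { bestLevel = level[u]; best = u; }
        std::size_t oldLevel = level[v];
        if (bestLevel == INF || bestLevel + 1 >= level.size()) {
            level[v] = INF; parent[v] = INF;    // no usable parent: v is unreachable
        } else {
            level[v] = bestLevel + 1; parent[v] = best;
        }
        if (level[v] > oldLevel)                // level increased: children are affected
            for (std::size_t c : treeChildren[v]) q.push(c);
    }
};
```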

4 Experiments

4.1 Environmental Conditions and Methodology

We evaluated the performance of all algorithms described in Section 3 with all available parameters on both random and real-world instances. All algorithms were implemented in C++17 and compiled with GCC 7.3.0 using full optimization (-O3 -march=native -mtune=native). Experiments were run on a machine with two Intel Xeon E5-2643 v4 processors clocked at 3.4 GHz and 1.5 TB of RAM under Ubuntu Linux 18.04 LTS. Each experiment was assigned exclusively to one core.

For each algorithm and graph, we measured the time spent during initialization as well as for each insertion, deletion, and query. From these, we obtained the total insertion time, total deletion time, total update time, and total query time as the respective sums. For the smaller random instances, we ran each experiment three times and use the medians of the aggregations for the evaluation to counteract artifacts of measurement and accuracy.

In the following, we use K and M as abbreviations for $10^3$ and $10^6$, respectively.

4.2 Instances

Random Instances.

We generated a set of smaller random directed graphs according to the Erdős–Rényi model with $n = 100$K vertices, in each case along with a random sequence of operations consisting of edge insertions, edge deletions, as well as reachability queries. In the same fashion, we generated a set of larger instances with $n = 10$M vertices. For insertions, we drew pairs of vertices uniformly at random from $V \times V$, allowing also for parallel edges. For deletions and reachability queries, each edge or vertex, respectively, was equally likely to be chosen. For a fixed source vertex, we tested sequences of operations where insertions, deletions, and queries appear in batches of ten, but are processed individually by the algorithms. We evaluated different proportions of the three types of operations.

It is well known that for simple random directed graphs with $n$ vertices, an edge probability of $p = \ln n / n$ is a threshold for strong connectivity [8]. Thus, we expect to observe the largest differences in the algorithms’ performances on graphs up to a density of around $\ln n$, and a decline in the update costs for denser graphs.
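For concreteness, the threshold density can be worked out as follows (our arithmetic, assuming each of the $n(n-1)$ ordered vertex pairs becomes an edge independently with probability $p$):

```latex
% Expected density at the strong-connectivity threshold p = ln(n)/n:
d = \frac{\mathbb{E}[m]}{n} = \frac{p\,n(n-1)}{n} \approx n p = \ln n,
\qquad \ln(100\,000) \approx 11.5 .
```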

Real-World Instances from Konect.

We used all six directed, dynamic instances available from the Koblenz Network Collection (KONECT) [16], a collection of real-world graphs from various application scenarios. The graphs consist of a list of edge insertions and deletions, each of which is assigned a timestamp, and model the hyperlink structure between Wikipedia articles for six different languages. The edge insertions and deletions with the smallest timestamp form the initial graph for our evaluation, and all further updates are grouped by their timestamp. We set the source vertex to be the tail of the first edge with minimum timestamp. The smallest of these instances is the simple-English one and the largest the French one; each starts with initially less than five edges and grows through a long sequence of update operations, consisting of both edge insertions and deletions. We refer to these instances as FR, DE, IT, NL, PL, and SIM.

To see whether differences in the algorithms’ performance are rather due to the structure of the graphs or the order of updates, we generated five new, “shuffled” instances per language by randomly assigning new timestamps to the update operations. As for the original instances provided by KONECT, we ignored removals of non-existing edges.

Real-World Instances from Snap.

Additionally, we use a collection of snapshots of the graph describing relationships in the CAIDA Internet Autonomous Systems project, which is made available via the Stanford Large Network Dataset Collection (SNAP) [17]. We built a dynamic, directed graph AS-CAIDA from this collection by using the differences between two subsequent snapshots as updates. Edges are directed from provider to customer, and there is a pair of anti-parallel edges between peers and between siblings. We obtained ten instances from this graph by choosing each of the ten vertices with highest out-degree, respectively, as source.

Table A.3 lists the detailed numbers for all real-world instances. In each case, the updates are dominated by insertions, both for AS-CAIDA and for the KONECT instances. The average density is lowest for AS-CAIDA and highest for IT.

4.3 Experimental Results

4.3.1 Random graphs

Figure 1: Results on random instances with $n = 100$K vertices: relative query times of the dynamized static algorithms; relative deletion times within the SI, ES, and MES/SES groups; relative and absolute insertion and deletion times and relative update times of the fastest algorithms; and absolute initialization, absolute total, and relative total times.

For $n = 100$K, we generated several graphs per density along with a sequence of operations, where edge insertions, edge deletions, and queries were equally likely. In consequence, the density of each dynamic graph remains more or less constant during the update sequence. The timeout was set to one hour. Figure 1 depicts the results, which we will discuss in the following. A vertical dark gray line marks the strong connectivity threshold of $\ln n$, which is about $11.5$. Note that the plots use logarithmic axes in both dimensions.

Relative Performances within Groups (Figure 1).

For the discussion of the results, we group the algorithms as in Section 3 and start with the six dynamized static algorithms SBFS, SDFS, CBFS, CDFS, LBFS, and LDFS. Recall that all work is done in query() here, which is why we evaluate them based on their mean total query time. Figure 1 shows the relative performance of this algorithm group compared to LBFS, which was the best algorithm on average over all densities and for each density always at least seven times faster on average than the “pure” static algorithms SBFS and SDFS. On sparse graphs, LBFS is beaten by LDFS; however, the performance gap between LBFS and LDFS increases at least linearly as the graphs become denser. The eager caching versions CBFS and CDFS show similar performance to their lazy counterparts on sparse graphs, but then deteriorate exponentially compared to the latter and eventually even fall behind the pure static variants SBFS and SDFS, respectively. In all cases, the algorithms based on DFS are only faster than their BFS-based counterparts on sparser instances and distinctly slower on denser ones.

The second group of algorithms consists of the fully dynamic variants of the simple incremental algorithm SI. These algorithms only differ in their implementation of edgeDeleted() and, thus, we evaluated them on their mean deletion time. We tested different combinations of the boolean flags $r$ and $f$ along with different values for the recomputation threshold $\rho$. Regardless of $\rho$, the algorithms with enabled forward search were faster than the algorithms using other combinations of the flags, but the same value of $\rho$. If the flags $r$ and $f$ were fixed, smaller values for $\rho$ showed better performance than larger ones, except for extremely small ones. Recall that if $\rho$ is zero, the algorithm always discards its current reachability tree and recomputes it from scratch using BFS, whereas if $\rho$ is one, it always reconstructs a reachability tree. Hence, $\rho$ may be seen as a means to control outliers that necessitate the re-evaluation of the reachability of a large number of vertices. To keep the number of variants manageable, Figure 1 only shows the relative mean total deletion time of SI with four different parameter sets. The fastest algorithm on average across all densities in this set was SI(//.25), which is therefore also used as reference. The same algorithm with disabled forward search, i.e., SI(//.25), was distinctly slower on sparse graphs; as the graphs become denser, this gap decreases exponentially. SI(//.5) and SI(//1) show similar performance as SI(//.25) from medium densities onward, however with extreme spikes at individual densities. In conclusion, low values for $\rho$ can effectively control outliers and considerably speed up the average deletion time.

The third group of algorithms comprises those based on ES trees: ES, MES, and SES. We tested each of them with different values for the parameters $c$ and $\rho$. Here, both parameters serve to limit excessive update costs that occur when either the levels of a smaller set of vertices in the ES tree increase multiple times ($c$) or a large set of vertices is affected ($\rho$). We tested three parameter sets: an early abortion of the update process and recomputation with $c = 5$ and $\rho = 0.5$, a late variant with $c = 100$ and $\rho = 1$, and finally $c = \rho = \infty$, which does not impose any limits. Similarly as in the case of SI, the algorithms only differ in their implementation of edgeDeleted(). Figure 1 reports the mean total deletion time relative to the (on average) best algorithm in this set, SES(5/.5). For sparse graphs, the ES algorithms were considerably slower than SES(5/.5). This factor drops super-exponentially as the graphs become denser and reaches its minimum near the strong connectivity threshold. The unlimited variants showed an even worse performance on sparser graphs, with several timeouts, but a performance similar to, or, in the case of ES, even better than, their limited versions on denser graphs.

Differences between the limited versions of MES and SES are barely observable on this scale. Figure 1 zooms in on the values of interest for these algorithms. Evidently, SES(5/.5) outperforms MES(5/.5) both on very sparse instances as well as on denser ones. In the middle range, it is only slightly slower than MES(5/.5). Recall that in contrast to SES, MES stores information about the incoming neighbors of a vertex. However, for very sparse as well as denser instances, the additional knowledge available to MES seemingly cannot outweigh the increased workload that comes with the maintenance of this information: In the former case, the list of in-neighbors is short and therefore scanned very quickly by SES, whereas in the latter case, a replacement parent on the same level can be expected to be found very early in SES’s scanning process. For both SES and MES, the variants that are more reluctant to recompute from scratch perform slightly worse than their respective counterparts. The ES algorithms are almost always outperformed by all variants of MES and SES.

Update Performances (Figure 1).

Next, we compare the relative performances of the SI and the ES/MES/SES algorithm classes using SI(//.25), SI(//.25), MES(5/.5), and SES(5/.5) as representatives. Figure 1 depicts the mean total insertion times. Despite identical implementations of edgeInserted(), SI(//.25) is slightly faster than SI(//.25) on sparser instances, which may be due to structural differences in their reachability trees. MES(5/.5) and SES(5/.5) are at least four times slower than SI(//.25), with the maximum gap at medium densities. These experimental results conform with the theoretical performance analysis of SI, which yields a “perfect” amortized update time of $O(1)$ in the incremental setting. MES(5/.5) is slightly slower than SES(5/.5) due to the additional information it maintains. The overall situation is inverted in the case of deletions, as Figure 1 shows. Here, MES(5/.5) and SES(5/.5) outperform both SI(//.25) and SI(//.25), the latter even by a large factor on very sparse instances. SI(//.25) is moderately slower on average than SES(5/.5).

These findings suggest that SI(//.25) would be the best choice among these algorithms unless the proportion of edge deletions is markedly high. However, insertions and deletions are not equally costly, as the absolute running times in Figure 1 demonstrate: the mean total running times for insertions are distinctly smaller than those for deletions. Figure 1 also depicts the relative mean total update times, where insertions and deletions occur with equal probability. As deletions are distinctly more time-consuming than insertions, SES(5/.5) shows the best performance on average over all densities. Again, MES(5/.5) is slower on very sparse and slightly denser instances. SI(//.25)’s performance is roughly similar to MES(5/.5)’s, with its largest deviation from SES(5/.5)’s at medium densities.

Overall Performances (Figure 1).

Even though it is of less importance if the operation sequences are long, we take a brief look at the initialization time. The algorithms are split into three groups here: Whereas SBFS, SDFS, LBFS, and LDFS do not use this phase, all other algorithms traverse the graph once and build up their data structures. CBFS, CDFS, SI, and SES reserve and access $O(n)$ space, but ES and MES need to set up $O(n+m)$ space, which is clearly reflected in the running time, as Figure 1 shows. Note that this plot does not use logarithmic scales.

Finally, Figure 1 also depicts the absolute and relative mean total running times if insertions, deletions, and queries occur with equal probability. The fastest dynamized static algorithm, LBFS, is clearly outperformed by SI(//.25), MES(5/.5), and SES(5/.5) on all densities. For sparser graphs, however, the lazy and caching variants are faster than ES. On dense instances, where the update costs decrease rapidly, the initialization time begins to show through for SI and the ES family. The SES algorithms performed best in these experiments, with SES(5/.5) being the overall fastest on average.

Figure 2: Mean total update times in seconds relative to the mean average number of edges for varying ratios of insertions on random instances with $n = 100$K vertices and four different initial densities.

Figure 3: Results on random instances with $n = 10$M vertices: relative insertion times, relative deletion times, and absolute update times.
Ratios of Insertions, Deletions, and Queries (Figure 2).

We next investigate whether and how the picture changes if the proportion of insertions and deletions varies. Taking up on the observation that the SI algorithms were considerably faster on insertions than MES and SES, but slower on deletions, we compare the performance of the fastest of each of them, i.e., SI(//.25), MES(5/.5), and SES(5/.5), on random instances with $n = 100$K vertices and different initial densities. We sampled ten graphs per density. As unequal ratios of insertions and deletions change the density of the graphs over time, Figure 2 shows the mean total update time divided by the average number of edges. As expected, MES(5/.5) and SES(5/.5) outperform SI(//.25) for low ratios of insertions, whereas the opposite holds if there are many insertions among the updates. The threshold lies at roughly the same insertion ratio for all densities. MES(5/.5) is similarly fast as SES(5/.5) if the proportion of deletions is high (and the density is small), and becomes relatively slower as the ratio of insertions grows.

In our setting, all dynamized static algorithms were clearly inferior to their competitors. We expected a performance increase if queries occur either very rarely or, if a cache is used, very frequently. We reviewed this assumption experimentally and found it confirmed; however, none of the dynamized static algorithms could compete with the dynamic ones. See Section A.2 for details.

Large Graphs (Figure 3).

We repeated our experiments on larger graphs with $n = 10$M vertices for the algorithms MES, SES, and SI. Figure 3 shows the mean total insertion and deletion times, respectively, relative to the best algorithm SI(//.25), as well as the absolute mean total update time. As for the instances with $n = 100$K vertices, the update time is dominated heavily by the deletion time and decreases with growing density. The mean total update time relative to SI(//.25) here almost equals the deletion time, which is shown together with further plots in Figure A.6. SI still outperforms MES and SES for insertions on these instances; however, SI(//.25) also outperforms MES for deletions. Up to densities of at most five, SES(5/.5) is slower than SI(//.25), but faster for denser graphs.

Figure 4: Update times on real-world instances from KONECT and SNAP. Each bar consists of two sections: the lower, barely visible one represents the (mean) total insertion time, the upper one the (mean) total deletion time. The total height of each bar and the label on top correspond to the total (mean) update time.

4.3.2 Real-World Graphs

We also evaluated the algorithms MES, SES, and SI on real-world graphs. Figure 4 shows the results for the KONECT and SNAP instances. On all instances, SI(//.25) distinctly outperforms all competitors. SES(5/.5) and SES(100/1) behave very similarly and are always faster than MES(5/.5) by several factors. The relative performance of the other SI variant varies heavily between being second-best and by far the worst. The picture did not change for the shuffled KONECT instances, as depicted in Figure A.7. Since the operation sequences are long, the majority of updates are insertions, and SI(//.25) is reasonably fast also for deletions, the results are consistent with those for the random instances.

5 Conclusion

The fully dynamic version of the simple incremental algorithm, SI(//.25), showed the best overall performance across all tested instances. It was the fastest algorithm on all real-world instances and among the top five for random ones. On almost all instances where it was not the best, the simplified Even-Shiloach algorithm SES with parameters 5/.5 was the fastest. In particular, SES was superior in handling edge deletions, which heavily dominated the update costs in general. All algorithms benefited considerably from the introduction of recomputation thresholds. Breadth-first search and depth-first search, even with enhancements, were unable to compete with the dynamic algorithms, irrespective of the proportion of queries.

In a nutshell: For random, especially somewhat denser, instances with a high proportion of deletions, we recommend SES(5/.5), and otherwise SI(//.25).

References

  • [1] A. Abboud and V. V. Williams. Popular conjectures imply strong lower bounds for dynamic problems. In Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, FOCS ’14, pages 434–443. IEEE, 2014.
  • [2] A. Bernstein, M. Probst, and C. Wulff-Nilsen. Decremental strongly-connected components and single-source reachability in near-linear time. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing, STOC ’19, 2019.
  • [3] S. Chechik, T. D. Hansen, G. F. Italiano, J. Łącki, and N. Parotsidis. Decremental single-source reachability and strongly connected components in total update time. In 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 315–324. IEEE, 2016.
  • [4] J. Cheng, S. Huang, H. Wu, and A. W.-C. Fu. TF-Label: A topological-folding labeling scheme for reachability querying in a large graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 193–204. ACM, 2013.
  • [5] D. Frigioni, T. Miller, U. Nanni, and C. Zaroliagis. An experimental study of dynamic algorithms for transitive closure. Journal of Experimental Algorithmics (JEA), 6:9, 2001.
  • [6] A. Gitter, A. Gupta, J. Klein-Seetharaman, and Z. Bar-Joseph. Discovering pathways by orienting edges in protein interaction networks. Nucleic Acids Research, 39(4):e22, 2010.
  • [7] A. V. Goldberg, S. Hed, H. Kaplan, R. E. Tarjan, and R. F. Werneck. Maximum flows by incremental breadth-first search. In European Symposium on Algorithms, pages 457–468. Springer, 2011.
  • [8] A. J. Graham and D. A. Pike. A note on thresholds and connectivity in random directed graphs. Atlantic Electronic Journal of Mathematics, 3(1):1–5, 2008.
  • [9] M. Henzinger and V. King. Fully dynamic biconnectivity and transitive closure. In 36th Annual Symposium on Foundations of Computer Science (FOCS), pages 664–672. IEEE, 1995.
  • [10] M. Henzinger, S. Krinninger, and D. Nanongkai. Sublinear-time decremental algorithms for single-source reachability and shortest paths on directed graphs. In 46th ACM Symposium on Theory of Computing, pages 674–683. ACM, 2014.
  • [11] M. Henzinger, S. Krinninger, and D. Nanongkai. Improved algorithms for decremental single-source reachability on directed graphs. In Automata, Languages, and Programming. Springer, 2015.
  • [12] M. Henzinger, S. Krinninger, D. Nanongkai, and T. Saranurak. Unifying and strengthening hardness for dynamic problems via the online matrix-vector multiplication conjecture. In 47th ACM Symposium on Theory of Computing, STOC’15, pages 21–30. ACM, 2015.
  • [13] G. F. Italiano. Finding paths and deleting edges in directed acyclic graphs. Information Processing Letters, 28(1):5–11, 1988.
  • [14] V. King. Fully dynamic algorithms for maintaining all-pairs shortest paths and transitive closure in digraphs. In 40th Symposium on Foundations of Computer Science (FOCS), pages 81–89. IEEE, 1999.
  • [15] I. Krommidas and C. D. Zaroliagis. An experimental study of algorithms for fully dynamic transitive closure. ACM Journal of Experimental Algorithmics, 12:1.6:1–1.6:22, 2008.
  • [16] J. Kunegis. Konect: the Koblenz network collection. In 22nd International Conference on World Wide Web, pages 1343–1350. ACM, 2013.
  • [17] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
  • [18] F. Merz and P. Sanders. PReaCH: A fast lightweight reachability index using pruning and contraction hierarchies. In European Symposium on Algorithms, pages 701–712. Springer, 2014.
  • [19] T. Reps. Program analysis via graph reachability. Information and software technology, 40(11-12):701–726, 1998.
  • [20] P. Sankowski. Dynamic transitive closure via dynamic matrix inverse. In 45th Symposium on Foundations of Computer Science (FOCS), pages 509–517. IEEE, 2004.
  • [21] Y. Shiloach and S. Even. An on-line edge-deletion problem. Journal of the ACM, 28(1):1–4, 1981.
  • [22] V. V. Williams and R. Williams. Subcubic equivalences between path, matrix and triangle problems. In 51st Symposium on Foundations of Computer Science (FOCS), pages 645–654, 2010.
  • [23] Y. Yano, T. Akiba, Y. Iwata, and Y. Yoshida. Fast and scalable reachability queries on graphs by pruned labeling with landmarks and paths. In 22nd ACM International Conference on Information & Knowledge Management, pages 1601–1606. ACM, 2013.
  • [24] H. Yıldırım, V. Chaoji, and M. J. Zaki. GRAIL: A scalable index for reachability queries in very large graphs. The VLDB Journal, 21(4):509–534, 2012.

Appendix A Appendix

A.1 Related Work

In an incremental setting, where edges may only be inserted, but never deleted, a total update time of $O(m)$ for all insertions can be achieved by an incremental breadth-first or depth-first search starting from the source vertex. For a long time, the best algorithm to handle a series of edge deletions and no insertions required a total update time of $O(mn)$ and actually solved the more general single-source shortest path problem. The algorithm is due to Even and Shiloach [21, 9, 14] and maintains a breadth-first tree under edge deletions. It is widely known as ES tree. Recently, Henzinger et al. [10, 11] broke the $O(mn)$ barrier by giving probabilistic algorithms with expected total update times of $O(mn^{0.984})$ [10] and $O(mn^{0.9+o(1)})$ [11]. Shortly thereafter, Chechik et al. [3] improved this result further by presenting a randomized algorithm with $\widetilde{O}(m\sqrt{n})$ total update time. Only lately, Bernstein et al. [2] showed that reachability information in the decremental setting can be maintained in near-linear expected total update time. Whereas these algorithms all operate on general graphs, Italiano [13] observed that a total running time of $O(m)$ may indeed be achieved also in the decremental setting if the input graph is acyclic. Finally, if both edge insertions and deletions may occur, Sankowski’s algorithms [20] for transitive closure imply a worst-case per-update running time of $O(n^{1.575})$ for the fully dynamic single-source reachability problem.

On the negative side, Henzinger et al. [12] showed that, unless the Online Matrix-Vector Multiplication problem can be solved in time $O(n^{3-\varepsilon})$, $\varepsilon > 0$, no algorithm for the fully dynamic single-source reachability problem exists with a worst-case update time of $O(m^{1-\varepsilon_1})$ and a worst-case query time of $O(m^{1-\varepsilon_2})$, $\varepsilon_1, \varepsilon_2 > 0$. Furthermore, if there is a combinatorial, fully dynamic s-t reachability algorithm with a worst-case running time of $O(n^{2-\varepsilon})$ per update or query, then there are also faster combinatorial algorithms for Boolean matrix multiplication and other problems, as shown by Abboud and Vassilevska Williams [1] and Williams and Vassilevska Williams [22], respectively.

In extensive studies, Frigioni et al. [5] as well as Krommidas and Zaroliagis [15] have evaluated a huge set of algorithms for the more general fully dynamic all-pairs reachability problem experimentally on random dynamic graphs as well as on two static real-world graphs with randomly generated update operations. They concluded that, despite their simple-mindedness, static breadth-first and depth-first search outperform their dynamic competitors on a large number of instances. There has also been recent development in designing algorithms that maintain a reachability index in the static setting [18, 23, 4, 24], which were evaluated experimentally [18] on acyclic random and real-world graphs of similar sizes as in this paper.

A.2 Updates vs. Queries

All dynamized static algorithms were clearly inferior to their competitors on random instances with $n = 100$K vertices if all types of operations occurred with equal probability, which corresponds to a proportion of queries of one third. However, we expect a relative performance increase if queries occur either very rarely or very frequently, where the latter naturally only applies to those algorithms that use a cache. We review this assumption experimentally by examining the performance of CBFS, CDFS, LBFS, and LDFS in comparison to SI(//.25), MES(5/.5), and SES(5/.5) for varying ratios of queries among the operations. We did not include SBFS and SDFS, as LBFS and LDFS are always at least as fast. We again sampled ten instances with $n = 100$K vertices for each density, in each case along with a random sequence of operations. To keep the density of the graphs constant, insertions and deletions occur with equal probabilities. Figure A.5 depicts the mean total operation times. Although the results confirm our assumption, none of the dynamized static algorithms can compete with the dynamic ones, neither for sparse nor for denser graphs.

A.3 Additional Tables and Plots

Table A.3: Number of vertices, initial, average, and final numbers of edges, average density, total number of updates with percentage of additions, and query success rate for the real-world instances FR, DE, IT, NL, PL, SIM, their shuffled variants, and AS-CAIDA.

Figure A.5: Mean total operation times in seconds for varying ratios of queries and equal ratios of insertions and deletions on random instances with $n = 100$K vertices and four different initial densities.

Figure A.6: Results on random instances with $n = 10$M vertices: absolute and relative insertion, deletion, and update times, as well as absolute initialization, absolute total, and relative total times.
Figure A.7: Update times on shuffled real-world instances from KONECT. Each bar consists of two sections: the lower, barely visible one represents the mean total insertion time, the upper one the mean total deletion time. The total height of each bar and the label on top correspond to the total mean update time.