
A New Approach to Incremental Cycle Detection
and Related Problems

Michael A. Bender
Department of Computer Science
Stony Brook University
   Jeremy T. Fineman
Department of Computer Science
Georgetown University
   Seth Gilbert
Department of Computer Science
National University of Singapore
   Robert E. Tarjan
HP
and
Department of Computer Science
Princeton University
Abstract

We consider the problem of detecting a cycle in a directed graph that grows by arc insertions, and the related problems of maintaining a topological order and the strong components of such a graph. For these problems we give two algorithms, one suited to sparse graphs, the other to dense graphs. The former takes $O(m \min\{m^{1/2}, n^{2/3}\})$ time to insert $m$ arcs into an $n$-vertex graph; the latter takes $O(n^2 \log n)$ time. Our sparse algorithm is considerably simpler than a previous $O(m^{3/2})$-time algorithm; it is also faster on graphs of sufficient density. The time bound of our dense algorithm beats the previously best time bound of $O(n^{5/2})$ for dense graphs. Our algorithms rely for their efficiency on topologically ordered vertex numberings; bounds on the size of the numbers give bounds on running time.


This work was supported in part by the National Science Foundation, under grants CCF-0621439/0621425, CCF-0540897/05414009, CCF-0634793/0632838, and CNS-0627645 for M. A. Bender, CCF-0621511, CNS-0615215, CCF-0541209, and NSF/CRA sponsored CIFellows program for J. T. Fineman, and CCF-0830676 and CCF-0832797 for R. E. Tarjan. The information contained herein does not necessarily reflect the opinion or policy of the federal government and no official endorsement should be inferred.

1 Introduction

Perhaps the most basic algorithmic problem on directed graphs is cycle detection. We consider an incremental version of this problem: given an initially empty graph that grows by on-line arc insertions, report the first insertion that creates a cycle. We also consider two related problems, that of maintaining a topological order of an acyclic graph as arcs are inserted, and maintaining the strong components of such a graph.

We use the following terminology. We order pairs lexicographically: $(a, b) < (c, d)$ if and only if either $a < c$, or $a = c$ and $b < d$. We denote a list by square brackets around its elements; "[ ]" denotes the empty list. We denote list catenation by "&". In a directed graph, we denote an arc from $v$ to $w$ by $(v, w)$. We disallow multiple arcs and loops (arcs of the form $(v, v)$). We assume that the set of vertices is fixed and known in advance, although our results extend to handle on-line vertex insertions. We denote by $n$ and $m$ the number of vertices and arcs, respectively. We assume that $m$ is known in advance; our results extend to handle the alternative. To simplify expressions for bounds we make two assumptions, both of which hold if there are no isolated vertices. A vertex $v$ is a \defnpredecessor of $w$ if $(v, w)$ is an arc. The \defnsize of a vertex $v$ is the number of vertices $u$ such that there is a path from $u$ to $v$. Two vertices, two arcs, or a vertex and an arc are \defnrelated if they are on a common path, \defnmutually related if they are on a common cycle (not necessarily simple), and \defnunrelated if they are not on a common path. Relatedness is a symmetric relation. The \defnstrong components of a directed graph are the subgraphs induced by the maximal subsets of mutually related vertices.
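These definitions can be stated operationally. The following small Python sketch (ours, not part of the paper) checks relatedness via reachability and computes the size of a vertex as the number of vertices from which it is reachable, which is the reading used by the size lemma later in the paper.

```python
from collections import deque

def reaches(out_arcs, s, t):
    """Breadth-first search: is there a path from s to t (every vertex reaches itself)?"""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for w in out_arcs.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

def related(out_arcs, u, v):
    """u and v lie on a common path iff one reaches the other."""
    return reaches(out_arcs, u, v) or reaches(out_arcs, v, u)

def mutually_related(out_arcs, u, v):
    """u and v lie on a common (not necessarily simple) cycle iff each reaches the other."""
    return reaches(out_arcs, u, v) and reaches(out_arcs, v, u)

def size(out_arcs, vertices, v):
    """Number of vertices from which v is reachable, counting v itself."""
    return sum(1 for u in vertices if reaches(out_arcs, u, v))

# Example: the 2-cycle a <-> b plus the arc b -> c.
out_arcs = {"a": ["b"], "b": ["a", "c"], "c": []}
print(mutually_related(out_arcs, "a", "b"), related(out_arcs, "a", "c"),
      size(out_arcs, ["a", "b", "c"], "c"))   # True True 3
```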

A \defndag is a directed acyclic graph. A \defnweak topological order of a dag is a partial order of the vertices such that if $(v, w)$ is an arc, $v < w$; a \defntopological order of a dag is a total order of the vertices that is a weak topological order. A \defnweak topological numbering of a dag is a numbering of the vertices such that increasing numeric order is a weak topological order; a \defntopological numbering of a dag is a numbering of the vertices from 1 through $n$ such that increasing numeric order is a topological order.
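To make these definitions concrete, here is a small Python check (ours, not from the paper): a weak topological numbering only requires that numbers strictly increase along arcs, while a topological numbering must in addition use each of the numbers 1 through n exactly once.

```python
# Illustrative checkers for the definitions above (not part of the paper's algorithms).
# A graph is a list of arcs (v, w); a numbering maps each vertex to an integer.

def is_weak_topological_numbering(arcs, number):
    """Numbers must strictly increase along every arc; ties between unrelated vertices are allowed."""
    return all(number[v] < number[w] for (v, w) in arcs)

def is_topological_numbering(vertices, arcs, number):
    """A weak topological numbering that uses 1..n, each exactly once."""
    n = len(vertices)
    return (sorted(number[v] for v in vertices) == list(range(1, n + 1))
            and is_weak_topological_numbering(arcs, number))

# Example: a 3-vertex dag with arcs a -> b and a -> c.
vertices = ["a", "b", "c"]
arcs = [("a", "b"), ("a", "c")]
print(is_weak_topological_numbering(arcs, {"a": 1, "b": 2, "c": 2}))        # True
print(is_topological_numbering(vertices, arcs, {"a": 1, "b": 2, "c": 3}))   # True
```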

There has been much recent work on incremental cycle detection, topological ordering, and strong component maintenance [1, 2, 3, 8, 10, 11, 12, 15, 16, 19, 21, 7]. For a thorough discussion of this work see [8, 7]; here we discuss the heretofore best results and others related to our work. A classic result of graph theory is that a directed graph is acyclic if and only if it has a topological order [23]; a more recent generalization is that the strong components of a directed graph can be ordered topologically (so that every arc lies within a component or leads from a smaller component to a larger one) [9]. For static graphs, there are two $O(n + m)$-time algorithms to find a cycle or a topological order: repeated deletion of vertices with no predecessors [13, 14] and depth-first search [24]: the reverse postorder [25] defined by such a search is a topological order if the graph is acyclic. Depth-first search extends to find the strong components and a topological order of them in $O(n + m)$ time [24].

For incremental cycle detection, topological ordering, and strong component maintenance, there are two known fastest algorithms, one suited to sparse graphs, the other suited to dense graphs. Both are due to Haeupler et al. [8, 7]. Henceforth we denote the coauthors of these papers by HKMST. The HKMST sparse algorithm takes $O(m^{3/2})$ time for $m$ arc additions; the HKMST dense algorithm takes $O(n^{5/2})$ time. Both of these algorithms use two-way search; each is a faster version of an older algorithm. These algorithms, and the older ones on which they are based, bound the total running time by counting the number of arc pairs or vertex pairs that become related as a result of arc insertions. The HKMST sparse algorithm uses a somewhat complicated dynamic list data structure [6, 4] to represent a topological order, and it uses either linear-time selection or random sampling to guide the searches. There are examples on which the algorithm takes $\Omega(m^{3/2})$ time, so its time bound is tight for sparse graphs. The time bound of the HKMST dense algorithm is not known to be tight, but there are examples on which it takes $\Omega(n^2 2^{\sqrt{2 \lg n}})$ time [7].

Our approach to incremental cycle detection and the related problems is different. We maintain a weak topological numbering and use it to facilitate cycle detection. Our algorithms pay for cycle-detecting searches by increasing the numbers of appropriate vertices; a bound on the numbers gives a bound on the running time. One insight is that the size function is a weak topological numbering. Unfortunately, maintaining this function as arcs are inserted seems to be expensive. But we are able to maintain in $O(n^2 \log n)$ time a weak topological numbering that is a lower bound on size. This gives an incremental cycle detection algorithm with the same running time, substantially improving the time bound of the HKMST dense algorithm. Our algorithm uses one-way rather than two-way search. For sparse graphs, we use a two-part numbering scheme. The first part is a scaled lower bound on size, and the second part breaks ties. This idea yields an algorithm with a running time of $O(m \min\{m^{1/2}, n^{2/3}\})$. Our algorithm is substantially simpler than the HKMST sparse algorithm and asymptotically faster on sufficiently dense graphs. The algorithm appeared previously in [5], but the other algorithm is new to this paper.

The remainder of our paper consists of five sections. Section 2 describes the two versions of our cycle-detection algorithm for sparse graphs. Section 3 describes our cycle-detection algorithm for dense graphs. Section 4 describes several simple extensions of the algorithms. Section 5 extends the algorithms to maintain the strong components of the graph as arcs are inserted instead of stopping as soon as a cycle exists. The extensions in Sections 4 and 5 preserve the asymptotic time bounds of the algorithms. Section 6 contains concluding remarks.

2 A Two-Way-Search Algorithm for Sparse Graphs

Our algorithm for sparse graphs uses two-way search. Unlike the entirely symmetric forward and backward searches in the HKMST sparse algorithm, the two searches in our algorithm have different functions. Also unlike the HKMST sparse algorithm, our algorithm avoids the use of a dynamic list data structure, and it does not use selection or random sampling: all of its data structures are simple, as is the algorithm itself.

We use a two-part numbering scheme whose lexicographic order is topological. Specifically, we partition the vertices into levels. We maintain a weak topological numbering of the levels; within each level, we give the vertices indices ordered topologically. Each backward search proceeds entirely within a level. If the search takes too long, we stop it and increase the level of a vertex. This bounds the backward search time. Each forward search traverses only arcs that lead to a lower level, and it increases the level of each vertex visited. An overall bound on such increases gives a bound on the time of all the forward searches. If the backward and forward searches do not detect a cycle, we update vertex indices to restore topological order. To facilitate this, we make the searches depth-first.

Each vertex has a \defnlevel and an \defnindex. Indices are distinct. While the graph remains acyclic, lexicographic order on level and index is a topological order. That is, if $(v, w)$ is an arc, the (level, index) pair of $v$ is lexicographically less than that of $w$. Levels are positive integers and indices are negative integers. We make indices negative because newly assigned indices must be smaller than old ones. An alternative is to maintain the negatives of the indices and reverse the sense of all index comparisons. Initially, each vertex has level 1 and, as its index, an integer between $-n$ and $-1$ inclusive, distinct for each vertex.

In addition to levels and indices, we maintain a variable equal to the smallest index assigned so far. To represent the graph, we maintain for each vertex the set of its outgoing arcs and the set of its incoming arcs whose tail is on the same level as the vertex. Initially the smallest-index variable is $-n$ and all incident arc sets are empty. Each backward search marks the vertices it visits. Initially all vertices are unmarked. To bound backward searches, we count arc traversals; let $\Delta$ denote the bound on the number of traversals allowed per backward search (its value is fixed by the analysis). Recall that we denote a list with square brackets "[ ]" around its elements and list concatenation with "&."

The algorithm for inserting a new arc $(v, w)$ consists of the following steps:

Step 1 (test order): If the (level, index) pair of $v$ is lexicographically less than that of $w$, go to Step 5 (lexicographic order remains topological).

Step 2 (search backward): Let , , and , where denotes the assignment operator. Do , where mutually recursive procedures \procBvisit and \procBtraverse are defined as follows:

{codebox}\Procname

\limark \li\For \li\Do \End\li {codebox} \Procname \li\If \li\Thenstop the algorithm and report the detection of a cycle. \End\li \li\If \li\Then(The search ends, having traversed at least arcs without reaching .) \li \li \li \liunmark all marked vertices \ligo to Step 3 (aborting the backward search) \End\li\If is unmarked \li\Then \End

If the search ends without detecting a cycle or traversing at least arcs, test whether . If so, go to Step 4; if not, let and .

Step 3 (forward search): Do , where mutually recursive procedures \procFvisit and \procFtraverse are defined as follows:

{codebox}\Procname

\li\For \li\Do \End\li

{codebox}\Procname

\li\If or is in \li\Thenstop the algorithm and report the detection of a cycle \End\li\If \li\Then \li \li \End\li\CommentNow, \li\If \li\Thenadd to \End

Step 4 (re-index): Let . While is nonempty, let , delete the last vertex on , and let .

Step 5 (insert arc): Add $(v, w)$ to the outgoing arc set of $v$. If $v$ and $w$ are on the same level, add $(v, w)$ to the incoming arc set of $w$.
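Since the pseudocode above is garbled in this rendering, here is one way Steps 1-5 can be realized in Python. This is our sketch, not the paper's code: the class and method names are invented, `delta` stands for the traversal bound whose value the analysis fixes, and some bookkeeping (clearing a vertex's same-level in-arc set whenever its level rises, and re-indexing the affected vertices in a DFS-derived topological order) is our reading of the description above.

```python
class CycleDetected(Exception):
    """Raised when an arc insertion closes a cycle; carries the offending arc."""

class SparseIncremental:
    """Illustrative sketch of the two-way-search algorithm (names are ours)."""
    def __init__(self, vertices, delta):
        self.level = {v: 1 for v in vertices}                        # all vertices start on level 1
        self.index = {v: -(i + 1) for i, v in enumerate(vertices)}   # distinct indices in -n..-1
        self.out = {v: set() for v in vertices}                      # heads of outgoing arcs
        self.inn = {v: set() for v in vertices}                      # tails of same-level incoming arcs
        self.min_index = -len(vertices)                              # smallest index assigned so far
        self.delta = delta                                           # backward-search traversal bound

    def key(self, v):                                                # lexicographic (level, index)
        return (self.level[v], self.index[v])

    def insert_arc(self, v, w):
        if self.key(v) < self.key(w):                                # Step 1: order already topological
            return self._record(v, w)                                # Step 5
        # Step 2: backward search from v within level[v], cut off after delta traversals.
        marked, b_post, count = {v}, [], [0]
        aborted = self._bvisit(v, w, marked, b_post, count)
        run_forward = True
        if aborted:                                                  # too many traversals: raise w one level
            self.level[w] = self.level[v] + 1
            self.inn[w].clear()
            marked, b_post = set(), []                               # discard the backward set
        elif self.level[w] == self.level[v]:
            run_forward = False                                      # only re-indexing is needed
        else:                                                        # search exhausted: lift w to v's level
            self.level[w] = self.level[v]
            self.inn[w].clear()
        # Step 3: forward search from w over arcs leading to lower levels.
        f_post = []
        if run_forward:
            self._fvisit(w, self.level[w], v, marked, f_post)
        # Step 4: re-index; new indices are smaller than all old ones, so these vertices move to
        # the front of their level, in topological order among themselves.
        for u in reversed(b_post + list(reversed(f_post))):
            self.min_index -= 1
            self.index[u] = self.min_index
        self._record(v, w)                                           # Step 5

    def _record(self, v, w):
        self.out[v].add(w)
        if self.level[v] == self.level[w]:
            self.inn[w].add(v)

    def _bvisit(self, x, w, marked, b_post, count):
        for u in self.inn[x]:                                        # same-level incoming arcs only
            count[0] += 1
            if u == w:
                raise CycleDetected((w, x))                          # w reaches v, so (v, w) closes a cycle
            if count[0] >= self.delta:
                return True                                          # abort the backward search
            if u not in marked:
                marked.add(u)
                if self._bvisit(u, w, marked, b_post, count):
                    return True
        b_post.append(x)                                             # postorder = topological order of B
        return False

    def _fvisit(self, y, new_level, v, b_set, f_post):
        for z in list(self.out[y]):
            self._ftraverse(y, z, new_level, v, b_set, f_post)
        f_post.append(y)                                             # reverse postorder = topological order

    def _ftraverse(self, y, z, new_level, v, b_set, f_post):
        if z == v or z in b_set:
            raise CycleDetected((y, z))                              # z reaches v on its level: a cycle
        if self.level[z] < new_level:
            self.level[z] = new_level                                # raise z to w's level ...
            self.inn[z].clear()                                      # ... so old same-level in-arcs lapse
            self._fvisit(z, new_level, v, b_set, f_post)
        elif self.level[z] == new_level:
            self.inn[z].add(y)                                       # (y, z) is now a same-level arc
```

For example, inserting the arcs (a, b), (b, c), and then (c, a) into a three-vertex instance of this sketch raises CycleDetected on the last insertion.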

{theorem}

If a new arc creates a cycle, the insertion algorithm stops and reports a cycle. If not, lexicographic order on level and index is a topological order.\thmlabelsparsecorrect {proof} By inspection, the algorithm correctly maintains the incident arc sets and the value of . We prove the theorem by induction on the number of arc insertions. Initially, lexicographic order on level and index is a topological order since there are no arcs: any total order is topological. Suppose the lexicographic order is topological just before the insertion of an arc . If before the insertion, then lexicographic order on level and index remains topological. Thus assume .

Clearly, if the algorithm stops and reports a cycle, there is one. Suppose the insertion of creates a cycle. Such a cycle consists of the arc and a pre-existing path from to , along which levels are nondecreasing before the insertion. If and have the same level, then all vertices on the path have the same level, and either the backward search will traverse the entire path and report a cycle, or it will report a different cycle, or it will stop, will increase in level, and the algorithm will proceed to Step 3. If has larger level than , will increase in level in Step 2, and the algorithm will do Step 3. Suppose the algorithm does Step 3. At the beginning of this step, vertex has maximum level on the cycle, and is the set of vertices from which is reachable by a path all of whose vertices have level . (Either , in which case is in , or , and .) Every vertex on the cycle that is not and not in must have level less than , and the forward search from will eventually visit each such vertex, traversing the cycle forward, until traversing an arc with or in and reporting a cycle. We conclude that the algorithm reports a cycle if and only if an arc insertion creates one.

Suppose the insertion of does not create a cycle. After the backward search stops, contains the vertices from which is reachable by a path all of whose vertices have level . Also, if is an arc with not in , but in , . Step 3 increases to the level of every vertex in . After the re-indexing in Step 4, consider any arc . If neither nor is in , , since this was true before the insertion. If both and are in , because is in topological order. If is in but is not in , then . If is in but is not in , then since increased in level but did not, and before the arc insertion. If is in but is not in , then and after the insertion. We conclude that after the insertion, lexicographic order on level and index is topological.

{lemma}

The algorithm assigns no index less than $-n(m + 1)$. \lemlabelminindex {proof} All initial indices are at least $-n$. Each arc insertion decreases the minimum index by at most $n$, so after $m$ insertions the minimum index is at least $-n(m + 1)$.

{lemma}

No vertex level exceeds . \lemlabelarcmaxlevel {proof} Fix a topological order just before the last arc insertion. Let be a level assigned before the last arc insertion, and let be the lowest vertex in the fixed topological order assigned level . For to be assigned level , the insertion of an arc must cause a backward search from that traverses at least arcs both ends of which are on level . All the ends of these arcs must still be on level just before the last insertion. Thus these sets of arcs are distinct for each , as are their sets of ends. Since there are only arcs, there are at most distinct values of . Also, for each there must be at least distinct arc ends, since there are no loops or multiple arcs. Since there are only vertices, there are at most distinct values of . It follows that no vertex level exceeds , which gives the lemma.

{theorem}

The insertion algorithm takes time for arc insertions. \thmlabelsparsetime {proof} By \lemreftwominindexarcmaxlevel, all levels and indices are polynomial in , so assignments and comparisons of levels and indices take time. Each backward search takes time. The time spent adding and removing arcs from incidence sets is per arc added or removed. An arc can be added or removed only when it is inserted into the graph or when the level of one of its ends increases. By \lemrefarcmaxlevel, this can happen at most times per arc. The time for a forward search is plus per arc such that increases in level as the result of the arc insertion that triggers the search. By \lemrefarcmaxlevel, this happens times per arc.

The space needed by the algorithm is $O(n + m)$.

{theorem}

For any and with , there exists a sequence of arc insertions causing the algorithm to run in total time.

{proof}

Assume without loss of generality that and is sufficiently large. Let the vertices be through , numbered in the initial topological order. We first add arcs consistent with the initial order (so that no reordering takes place) to construct a number of cliques of consecutive vertices. An \defn-clique of vertices through is formed by adding arc for such that . An -clique consists of vertices and arcs.

Let . Construct an -clique of the first vertices. This is the \defnmain clique. The main clique contains at most vertices and at most arcs. Let . Starting with vertex , construct -cliques on disjoint sets of consecutive vertices, until running out of vertices or until arcs have been added, including those added to make the main clique. Each of the -cliques is an \defnanchor clique. The number of arcs in each anchor clique is and at least . Number the anchor cliques from through in increasing topological order. Then . So far there has been no vertex reordering, and all vertices have level 1.

Next, for from through in decreasing order, add arcs from the last vertex of anchor clique to each vertex of anchor clique . Add these arcs in decreasing topological order with respect to the end of the arc that is in anchor clique . There are at most such arc additions. Each addition of an arc from the last vertex of anchor clique to a vertex w in anchor clique triggers a backward search that traverses at least arcs and causes the level of to increase from to . Each forward search visits only a single vertex. Once all arcs from anchor clique are added, all vertices in anchor clique have level . Addition of the arcs from the last vertex of anchor clique to the vertices in anchor clique moves all vertices in anchor clique to level . After all the arcs between anchor cliques are added, every vertex in anchor clique is on level . The number of arcs added to obtain these level increases is at most .

Finally, for each anchor clique from through in decreasing order, add an arc from its first vertex in topological order to the first vertex in the main clique. There are at most such arc additions. Each addition triggers a backward search that visits only one vertex, followed by a forward search that traverses all the arcs in the main clique and increases the level of all vertices in the main clique by one. These forward searches do arc traversals altogether. At most arcs are added during the entire construction.

3 A One-Way-Search Algorithm for Dense Graphs

The two-way-search algorithm becomes less and less efficient as the graph density increases; for sufficiently dense graphs, one-way search is better. In this section we present a one-way-search algorithm that takes $O(n^2 \log n)$ time for all arc insertions. The algorithm maintains for each vertex $v$ a level $k(v)$; the levels form a weak topological numbering satisfying $k(v) \le \operatorname{size}(v)$. The algorithm pays for its searches by increasing vertex levels, using the following lemma to maintain $k(v) \le \operatorname{size}(v)$ for all $v$.

{lemma}

In an acyclic graph, if a vertex $v$ has $k$ predecessors, each of size at least $s$, then $\operatorname{size}(v) \ge s + k$. \lemlabelsize {proof} Order the vertices of the graph in topological order and let $u$ be the smallest predecessor of $v$. Then $\operatorname{size}(v) \ge \operatorname{size}(u) + k \ge s + k$. Here "$+\,k$" counts $v$ and the predecessors of $v$ other than $u$.

The algorithm uses \lemrefsize on a hierarchy of scales. For each vertex $v$, in addition to a level $k(v)$, it maintains a bound $b_j(v)$ and a count $c_j(v)$ for each integer $j$ with $0 \le j \le \lg n$, where $\lg$ is the base-2 logarithm. Initially $k(v) = 1$ for all $v$, and $b_j(v) = c_j(v) = 0$ for all $v$ and $j$. To represent the graph, for each vertex $v$ the algorithm stores the set of outgoing arcs of $v$ in a heap (priority queue) $h(v)$, each arc $(v, w)$ having a \defnpriority that is at most $k(w)$. (This priority is either $k(w)$ or a previous value of $k(w)$.) Initially all such heaps are empty.

The arc insertion algorithm maintains a set $A$ of arcs to be traversed, initially empty. To insert an arc $(v, w)$, add $(v, w)$ to $A$ and repeat the following step until a cycle is detected or $A$ is empty:

Traversal Step: {codebox} \lidelete some arc from \li\If \li\Thenstop the algorithm and report a cycle \End\li\If \li\Then \li\Else\Comment \li \li \li\If \li\Then \li \li. \End\End\lidelete from every arc with priority at most and add these arcs to . \liadd to with priority .

In a traversal step, an arc $(y, z)$ that is deleted from $h(y)$ may have priority less than $k(z)$, because $k(z)$ may have increased since $(y, z)$ was last inserted into $h(y)$. Subsequent traversal of such an arc may not increase $k(z)$. It is to pay for such traversals that we need the mechanism of bounds and counts.
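The traversal step is easier to follow in code. The sketch below is our reading of the one-way search: arcs waiting to be traversed sit in a set $A$; traversing an arc $(x, y)$ either reports a cycle (the arc's head is $v$, the tail of the newly inserted arc), raises $k(y)$ above $k(x)$, or charges a count at the scale of the gap $k(y) - k(x)$. The count-and-bound rule shown is only an illustrative placeholder for the paper's rule, whose exact thresholds are lost in this rendering; the cycle detection does not depend on it. The out-arc heaps are plain dictionaries here; the bucket-array implementation described in the next paragraph is sketched after it.

```python
import math
from collections import defaultdict

class DenseIncremental:
    """Illustrative sketch of the one-way-search algorithm (names and thresholds are ours)."""
    def __init__(self, vertices):
        self.k = {v: 1 for v in vertices}                 # levels: a weak topological numbering
        self.b = {v: defaultdict(int) for v in vertices}  # bounds b_j(v), one per scale j
        self.c = {v: defaultdict(int) for v in vertices}  # counts c_j(v), one per scale j
        self.heap = {v: {} for v in vertices}             # out-arcs of v: head -> stored priority

    def insert_arc(self, v, w):
        """Insert (v, w); return an arc closing a cycle if one is detected, else None."""
        A = [(v, w)]                                      # arcs still to be traversed
        while A:
            x, y = A.pop()
            if y == v:
                return (x, y)                             # x is reachable from w, so (v, w) lies on a cycle
            if self.k[x] >= self.k[y]:
                self.k[y] = self.k[x] + 1                 # restore k(x) < k(y)
            else:
                # Placeholder count/bound rule: charge scale j = floor(lg(k(y) - k(x))) and,
                # after enough charges, bump k(y); the paper's analysis fixes the real thresholds.
                j = int(math.log2(self.k[y] - self.k[x]))
                self.c[y][j] += 1
                if self.c[y][j] >= 3 * (1 << j):
                    self.c[y][j] = 0
                    self.k[y] = max(self.k[y], self.b[y][j] + (1 << j))
                    self.b[y][j] = self.k[y]
            # Re-queue every out-arc of y whose stored priority is now at most k(y) ...
            for z in [z for z, p in self.heap[y].items() if p <= self.k[y]]:
                del self.heap[y][z]
                A.append((y, z))
            self.heap[x][y] = self.k[y]                   # ... and store (x, y) with priority k(y)
        return None

# Example: a -> b -> c, then c -> a closes a cycle.
g = DenseIncremental(["a", "b", "c"])
print(g.insert_arc("a", "b"), g.insert_arc("b", "c"), g.insert_arc("c", "a"))  # None None ('b', 'c')
```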

We implement each heap as an array of buckets, one for each possible priority, with bucket $i$ containing the arcs with priority $i$. We also maintain the smallest index of a nonempty bucket in the heap. This index never decreases, so the total time to increment it over all deletions from the heap is $O(n)$. The time to insert an arc into a heap is $O(1)$. The time to delete a set of arcs from a bucket is $O(1)$ per arc deleted. The time for heap operations is thus $O(1)$ per arc traversal plus $O(n)$ per heap. Since there are $n$ heaps, this time totals $O(1)$ per arc traversal plus $O(n^2)$.
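A minimal sketch of such a bucket array (our code, with invented names): insertion drops an arc into the bucket of its priority, and a batch deletion sweeps the buckets from the remembered minimum up to the requested threshold, so the pointer only moves forward, as described above.

```python
class BucketHeap:
    """Out-arc heap as an array of buckets indexed by priority (illustrative sketch)."""
    def __init__(self, max_priority):
        self.buckets = [[] for _ in range(max_priority + 1)]  # bucket p holds arcs with priority p
        self.min_nonempty = max_priority + 1                  # smallest index that may be nonempty

    def insert(self, arc, priority):
        self.buckets[priority].append(arc)
        # In the dense algorithm the minimum never decreases; the min() is only defensive.
        self.min_nonempty = min(self.min_nonempty, priority)

    def delete_up_to(self, threshold):
        """Remove and return every stored arc whose priority is at most threshold."""
        removed = []
        p = self.min_nonempty
        while p <= min(threshold, len(self.buckets) - 1):
            removed.extend(self.buckets[p])
            self.buckets[p] = []
            p += 1
        self.min_nonempty = max(self.min_nonempty, p)         # pointer only moves forward
        return removed

# Example: arcs keyed by the level of their head.
h = BucketHeap(max_priority=8)
h.insert(("y", "z"), 3)
h.insert(("y", "u"), 5)
print(h.delete_up_to(4))   # [('y', 'z')]
print(h.delete_up_to(6))   # [('y', 'u')]
```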

To analyze the algorithm, we begin by bounding the total number of arc traversals, thereby showing that the algorithm terminates. Then we prove its correctness. Finally, we fill in one detail of the implementation and bound the running time.

{lemma}

While the graph remains acyclic, the insertion algorithm maintains for every vertex . \lemlabeldensemaxlevel {proof} The proof is by induction on the number of arc insertions. The inequality holds initially. Suppose it holds just before the insertion of an arc that does not create a cycle. Consider a traversal step during the insertion that deletes from and increases . If increases to , , maintaining the inequality for . The more interesting case is when and k(y) increases to . Each of the increases to since it was last zero corresponds to the traversal of an arc . When was last zero, . Since cannot decrease, when this traversal of occurs, since at this time . We consider two cases. If there were at least traversals of distinct arcs since was last zero, then by \lemrefsize, and the increase in maintains the inequality for . If not, by the pigeonhole principle there were at least three traversals of a single arc since was last zero. When each traversal happens, , but each of the second and third traversals cannot happen until increases to at least the value of at the previous traversal. This implies that when the third traversal happens, , so will not in fact increase as a result of this traversal.

{lemma}

If a new arc creates a cycle, the insertion algorithm maintains , where sizes are before the addition of . \lemlabeldensecycle {proof} Before the addition of , for every vertex , by \lemrefdensemaxlevel. Traversal of the arc can increase by at most , so the desired inequality holds after this traversal. Every subsequent traversal is of an arc other than : to traverse , an arc into must be traversed, which results in reporting of a cycle. Thus the subsequent traversals are of arcs in the acyclic graph before the addition of . The proof of \lemrefdensemaxlevel extends to prove that these traversals maintain the desired inequality: \lemrefsize holds if the size function is replaced by the size plus any constant, in particular by the size plus .

{lemma}

The total number of arc traversals over arc additions is . \lemlabeldensetraversals {proof} By \lemreftwodensemaxleveldensecycle, every label , and hence every bound , remains below . Every arc traversal increases a vertex level or increases a count. The number of level increases is . Consider a count . Each time is reset to zero from , increases by at least . Since , the total amount by which can decrease as a result of being reset is at most . Since starts at zero and cannot exceed , the total number of times increases is at most . Summing over all counts for all vertices gives a bound of on the number of count increases and hence on the number of arc traversals.

{theorem}

If the insertion of an arc creates a cycle, the insertion algorithm stops and reports a cycle. If not, the insertion algorithm maintains the invariant that is a weak topological numbering. \thmlabeldensecorrect {proof} By \lemrefdensetraversals the algorithm terminates. A straightforward induction shows that every arc traversed by the insertion algorithm is such that is reachable from , so if the algorithm stops and reports a cycle, there is one. Suppose the insertion of creates a cycle. Before the insertion of , is a weak topological numbering, so the path from to existing before the addition of has vertices in strictly increasing order. Thus has the largest level on the path. A straightforward induction shows that the algorithm will eventually traverse every arc on the path and report a cycle, unless it reports another cycle first.

Suppose addition of an arc does not create a cycle. Before the addition, is a weak topological numbering. The algorithm maintains the invariant that every arc such that is either on or is the arc being processed. Thus, once is empty, is a weak topological numbering.

{theorem}

The algorithm runs in $O(n^2 \log n)$ total time. \thmlabeldensetime {proof} The running time is $O(1)$ per arc traversal plus $O(n^2)$. This is $O(n^2 \log n)$ by \lemrefdensetraversals.

The space needed by the algorithm is $O(n \log n)$ for the labels, bounds, and counts, and $O(n^2)$ for the $n$ heaps. Storing the heaps in hash tables reduces their total space to $O(n + m)$ but makes the algorithm randomized. By using a two-level data structure [27] to store each heap, the space for the heaps can be reduced to without using randomization. This bound is if ; if not, the sparse algorithm of Section 2 is faster.

The following theorem states that our analysis for this algorithm is tight.

{theorem}

For any sufficiently large , there exists a sequence of arc insertions that causes our algorithm to do arc traversals. {proof} Without loss of generality, suppose , where is a power of . The graph we construct consists of three categories of vertices: (1) vertices , (2) sets of vertices with (so ), and (3) a set of vertices with . Initially there are no arcs in the graph, and all levels are .

First, add arcs in order for . After these arc additions, . These levels are invariant over the remainder of the arc insertions — we use these vertices as anchors to increase the levels of all the other vertices. In fact, the only time the level of any other vertex will increase is when adding an arc .

The arc insertions proceed in phases ranging from to . In phase , first insert arc for all , thereby increasing to . Next, consider each for which there exists a constant such that , i.e., is a sufficiently large multiple of . There are two cases here, described in more detail shortly. If , insert arcs from to , not causing a level increase to . If , the algorithm traverses the arcs from to again, but without causing any level increases to . Moreover, the only time any or changes, for , is when the algorithm traverses an arc from to .

Case 1 (add arcs from to ): If for some , add arcs for all , causing to increase to . Also add arcs for all and . Observe that before these arc additions . Moreover, and . For each , when the last arc insertion occurs, increases to . We have, however, that , and hence does not increase. The counter is subsequently reset to and . Finally, the priority of each of these arcs is updated to in .

Case 2 (follow arcs from to ): Otherwise, , for . Since , the arcs already exist. Before this step, we have , for each . Moreover, we have . Insert arcs , for all . Such an arc insertion causes to increase to the next multiple of . After the update, we have equal to the priority of each arc in , and hence the algorithm traverses each of the outgoing arcs. Moreover, , and hence the counter is affected. For each , the counter again reaches . Since , the level of again does not increase. The counter is subsequently reset to , each , and the priority of each of the arcs is set to in .

In both cases, whenever the phase number is a large enough multiple of , the algorithm traverses all arcs such that and . Consider a fixed . There are such arcs. Summing over all phases during which the phase number is a large enough multiple of , there are arc traversals from vertices in to vertices in . Summing over all values of yields a total of arc traversals.

The proof extends to give a slightly more general result: for any , there is a sequence of arc insertions causing the algorithm to do arc traversals. To prove this, omit from the proof of \thmrefdense-lower the sets with . The generalization implies that arcs are enough to make the algorithm take time, and arcs, for any constant , are enough to make the algorithm take time.

4 Simple Extensions

In this section we extend our sparse and dense algorithms to provide some additional capabilities possessed by previous algorithms. All the extensions are simple and preserve the asymptotic time bounds of the unextended algorithms. Our first extension eliminates ties in the vertex numbering maintained by the dense algorithm presented in \secrefdense. We break ties by giving each vertex a distinct index as in the sparse algorithm and ordering the vertices lexicographically by level and index. The indices can be arbitrary, as long as they are distinct within each level: we can use fixed indices, or we can assign new indices when vertices change level.

Assigning new indices is useful in our second extension, which explicitly maintains a doubly-linked list of the vertices in lexicographic order by level and index, and hence in a topological order. We maintain a pointer to the first vertex on the list. We also maintain for each non-empty level a pointer to the last vertex on the level. We store these pointers in an array indexed by level. When a vertex increases in level, we delete it from its current list position and re-insert it after the last vertex on its old level, unless it was the last vertex on its old level, in which case its position in the list does not change. This takes O(1) time, including all needed pointer updates. In the sparse algorithm, when moving a group of vertices whose levels change as a result of an arc insertion, we move them in decreasing order by new index. In the dense algorithm, we can move such a group of vertices in arbitrary order, but we then assign each vertex moved a new index that is less than those of vertices previously on the level. As in the sparse algorithm, we can do this by maintaining the smallest index and counting down.

Our third extension explicitly returns a cycle when one is discovered, rather than just reporting that one exists. We augment each search to grow a spanning tree represented by parent pointers as each search proceeds. In the sparse algorithm, the backward search generates an in-tree rooted at containing all visited vertices; the forward search generates an out-tree rooted at containing all vertices whose level increases. If the backward search causes to increase to and to become empty, the forward search may visit vertices previously visited by the backward search. Each such vertex acquires a new parent when the forward search visits it for the first time. When the algorithm stops and reports a cycle, a cycle can be obtained explicitly by following parent pointers. Specifically, if the backward search traverses an arc , following parent pointers from gives a path from to , which forms a cycle with and . If the forward search traverses an arc with or in , traversing parent pointers from and from gives a path from to and a path from to , which form a cycle with and . In the dense algorithm, there is only one tree, an out-tree rooted at , containing and all vertices whose level increases. Vertex has one child, . If the search traverses an arc , following parent pointers from gives a path from through to , which forms a cycle with .
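As a small illustration of the parent-pointer idea in the simplest case (the dense algorithm's single out-tree), the helper below rebuilds an explicit cycle from a hypothetical `parent` map once the search reports an offending arc; the code and names are ours.

```python
def extract_cycle(parent, root, offending_arc):
    """Return the vertex sequence of a cycle closed by offending_arc = (x, y).

    `parent` maps each visited vertex to the vertex from which the search reached it
    (the root maps to None); `root` is the vertex the path leads back to, and in this
    simple case y is assumed to equal the root.
    """
    x, y = offending_arc
    path = [x]
    while path[-1] != root:
        path.append(parent[path[-1]])     # climb the search tree from x back to the root
    path.reverse()                        # root ... x, in graph order
    return path + [y]                     # the arc (x, y) closes the cycle back to y

# Example (dense-algorithm shape): out-tree rooted at v with single child w; the search
# traversed arcs v->w, w->a, a->b and then found the arc (b, v).
parent = {"v": None, "w": "v", "a": "w", "b": "a"}
print(extract_cycle(parent, "v", ("b", "v")))   # ['v', 'w', 'a', 'b', 'v']
```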

Our fourth extension is to handle vertex insertions and to allow and to be unknown in advance. In the sparse algorithm, we insert a vertex by giving a level of 1, decrementing , and giving an index equal to . We also maintain a running count of and . Each time or doubles, we recompute , but we replace only if it doubles. It is straightforward to verify that \thmrefsparsetime remains true. In the dense algorithm, we insert a vertex by giving it a level of 1. We also maintain a running count of . Each time increases, we add a corresponding new set of bounds and counts. \thmrefdensetime remains true. We can combine this extension with any of the other extensions in this section, and with the extension described in the next section.
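The doubling rule for the sparse algorithm can be packaged as a tiny helper (ours; `compute_delta` stands for the paper's formula for the traversal bound, which we leave abstract): running counts of the vertices and arcs are kept as they arrive, the bound is recomputed when either count doubles, and it is replaced only when the recomputed value has itself doubled.

```python
import math

class DeltaUpdater:
    """Maintain the backward-search bound delta as n and m grow (illustrative sketch)."""
    def __init__(self, compute_delta):
        self.compute_delta = compute_delta    # the paper's (elided) formula, passed in
        self.n = self.m = 0
        self.n_mark = self.m_mark = 1         # counts at which we next recompute
        self.delta = 1

    def _maybe_recompute(self):
        if self.n >= self.n_mark or self.m >= self.m_mark:
            self.n_mark = max(2 * self.n, 1)
            self.m_mark = max(2 * self.m, 1)
            candidate = self.compute_delta(self.n, self.m)
            if candidate >= 2 * self.delta:   # replace delta only if it has doubled
                self.delta = candidate
        return self.delta

    def add_vertex(self):
        self.n += 1
        return self._maybe_recompute()

    def add_arc(self):
        self.m += 1
        return self._maybe_recompute()

# Example with a stand-in formula (not the paper's): delta = floor(sqrt(m)).
upd = DeltaUpdater(lambda n, m: max(1, math.isqrt(max(m, 1))))
for _ in range(20):
    upd.add_arc()
print(upd.delta)   # 4
```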

5 Maintenance of Strong Components

A less straightforward extension of our algorithms is to the maintenance of strong components. This has been done for some of the earlier algorithms by previous authors. Pearce [17] and Pearce and Kelly [18] sketched how to extend their incremental topological ordering algorithm and that of Marchetti-Spaccamela et al. [16] to maintain strong components; HKMST showed in detail how to extend their algorithms. Our strong component extensions follow the approach of HKMST.

Our extended algorithms store the vertex sets of the strong components in a disjoint set data structure [26]. Such a data structure represents a partition of a set into disjoint subsets. Each subset has a \defncanonical element that names the set. Initially all subsets are singletons; the unique element in each subset is its canonical element. The data structure supports two operations:

\procFind$(x)$: Return the canonical element of the set containing element $x$.

: Form the union of the sets whose canonical elements are and , with the union having canonical element . This operation destroys the old sets containing and .

We store each set as a tree whose nodes are the elements of the set, with each element having a pointer to its parent. If we do \procFinds using path compression, and we do \procLinking by rank or size, then the total time for any number of \procFind and \procLink operations on a partition of elements is plus per operation [26]. In our application the number of set operations is per arc examined, so the time for the set operations does not increase the asymptotic time bound.
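For concreteness, here is a standard disjoint-set forest with path compression and union by rank, matching the operations described above (our code; the paper cites [26] for the analysis). Union by rank chooses the surviving root by rank, so an algorithm that needs a particular canonical vertex can record that designated vertex alongside the root.

```python
class DisjointSets:
    """Disjoint-set forest with path compression and union by rank."""
    def __init__(self, elements):
        self.parent = {x: x for x in elements}   # each element starts as its own canonical element
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        """Return the canonical element of the set containing x (with path compression)."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:            # compress the path from x to the root
            self.parent[x], x = root, self.parent[x]
        return root

    def link(self, x, y):
        """Unite the sets with canonical elements x and y; return the new canonical element."""
        if self.rank[x] < self.rank[y]:
            x, y = y, x
        self.parent[y] = x
        if self.rank[x] == self.rank[y]:
            self.rank[x] += 1
        return x

# Example: three singleton components, then one union.
ds = DisjointSets(["a", "b", "c"])
ds.link(ds.find("a"), ds.find("b"))
print(ds.find("a") == ds.find("b"), ds.find("a") == ds.find("c"))   # True False
```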

5.1 An Extension for Sparse Graphs

Our extension of the sparse algorithm maintains for each component a level, an index, a set of arcs such that is in the component, and a set of arcs such that is in the component and the components containing and are on the same level. We store the level, index, and incident arc sets with the canonical vertex of the component. Initially every vertex is in its own component, all components are on level 1, the components have distinct indices between and , inclusive, all the incident arc lists are empty, and \idindex, the smallest index, is . Each search generates a spanning tree, represented by parent pointers: if is a canonical vertex, is its parent; if is a root, . Initially all parents are \constnull. The algorithm marks canonical vertices it finds to be in a new component. Initially all vertices are unmarked.

The algorithm adds a new arc to the appropriate incident arc lists only if its ends are in different components after its insertion. The backward searches delete arcs whose ends are in the same component, as well as the second and subsequent arcs between the same pair of components. To facilitate the latter deletions, it uses a bit matrix indexed by vertex. Initially all entries of this matrix are zero.

The algorithm for inserting a new arc consists of the following steps:

Step 1 (test order): Let and . If , go to Step 6.

Step 2 (search backward): Let ,, , , and . Do , where mutually recursive procedures \procEBvisit and \procEBtraverse are defined as follows:

: For do . Let .

: If , or , delete from and from . Otherwise, do the following. Let and . If , let , let , let , unmark any canonical vertices marked as being in a new component, make all parents null except that of , reset to zero, and go to Step 3. If , mark as being in a new component. If is marked, follow parent pointers from , marking every canonical vertex reached (including ) as being in a new component, until marking or reaching a marked vertex. If , let and do .

If the search ends before traversing at least arcs, test whether . If so, go to Step 4; if not, let and .

Step 3 (search forward): Do , where mutually recursive procedures \procEFvisit and \procEFtraverse are defined as follows:

: For in do . Let .

: If , delete from . If or is on , follow parent pointers from , marking each canonical vertex reached (including ), until marking or reaching a marked vertex. If is marked, follow parent pointers from , marking each canonical vertex reached (including ), until marking or reaching a marked vertex. If , let , let , let , and do . If , add to .

Step 4 (form component): Let . If is marked, combine the old components containing the marked vertices into a single new component with canonical vertex by uniting the incoming and outgoing arc sets of the marked vertices, and uniting the vertex sets of the components using \procUnite. Delete from all marked vertices. Unmark all marked vertices.

Step 5 (re-index): While is non-empty, let , delete the last vertex on , and let .

Step 6 (add arc): If , add to and, if ), to .

In the proofs to follow we denote levels and indices just before and just after the insertion of an arc by unprimed and primed values, respectively.

{theorem}

The extended sparse algorithm is correct. That is, it correctly maintains the strong components, all the data structures, and the following invariant on the levels and indices: if is an arc, either or . {proof} The proof is by induction on the number of arc insertions. Initially all the data structures are correct. It is straightforward to verify that the algorithm correctly maintains them, assuming that it correctly maintains the strong components and the desired invariant on levels and indices. Suppose the strong components are correct and the invariant holds before the insertion of an arc . If this insertion does not create a new component, then the algorithm does the same thing as the unextended algorithm, except that it operates on components instead of vertices. Thus after the insertion the components are correct and the invariant holds.

Suppose on the other hand that the insertion of creates a new component. Until a vertex is marked, the algorithm does the same thing as the unextended algorithm, except that it operates on components instead of vertices. Thus it will mark at least one vertex. We consider three cases: ; ; .

If , then , the insertion changes no levels, and the backward search finishes without traversing at least arcs. An old canonical vertex is in the new component if and only if it is on a simple path from to . All components along such a path must have level . An induction shows that the backward search marks a canonical vertex if and only if it is on such a path. After Step 2, contains the canonical vertices on level from which there is a path to avoiding the old component containing . The algorithm skips the forward search. It correctly forms the new component in Step 4 and deletes from all old canonical vertices in the new component except z. This includes . The canonical vertices remaining in at the end of Step 4 are not reachable from . It follows that Step 5 restores the invariant that if is an arc, or .

A similar argument applies if . In this case in Step 3. At the end of Step 3, contains all old canonical vertices reachable from by a path through components whose old levels are at most . This includes . An old canonical vertex is in the new component if and only if it is on a simple path from to . All components along such a path must have old level at most . An induction shows that the forward search marks a canonical vertex if and only if it is on such a path. It follows that the algorithm correctly forms the new component in Step 4 and restores the invariant on levels and indices in Step 5.

The remaining case, , is the most interesting: the backward search runs out of arcs to traverse, and there is a forward search. After Step 2, contains all old canonical vertices from which is reachable by a path through components of old level . After Step 3, contains all canonical vertices reachable from by a path through components with old levels less than . Thus and are disjoint. An old canonical vertex is in the new component if and only if it is on a simple path from to . Such a path passes through components in non-increasing order by old level. These components have canonical vertices in or , with those in first. An induction shows that Steps 2 and 3 mark a canonical vertex if and only if it is on such a path. It follows that the algorithm correctly forms the new component in Step 4 and restores the invariant on levels and indices in Step 5.

\lemrefminindex remains true for the extended algorithm. To bound the running time, we need to prove \lemrefarcmaxlevel for the extension. This requires some definitions. We call an arc \defnlive if and are in different strong components and \defndead otherwise. A newly inserted arc that forms a new component is dead immediately. The \defnlevel of a live arc is . The level of a dead arc is its highest level when it was live; an arc that was never live has no level. We identify each connected component with its vertex set; an arc insertion either does not change the components or combines two or more components into one. A component is \defnlive if it is a component of the current graph and \defndead otherwise. The \defnlevel of a live component is the level of its canonical vertex; the level of a dead component is its highest level when it was live. A vertex and a component are \defnrelated if there is a path that contains the vertex and a vertex in the component. The number of components, live and dead, is at most .

{lemma}

In the extended sparse algorithm, no vertex level exceeds . \lemlabelsparsecomponentlabel {proof} We claim that for any level and any level , any canonical vertex of level is related to at least arcs of level  and at least components of level . We prove the claim by induction on the number of arc insertions. The claim holds vacuously before the first insertion. Suppose it holds before the insertion of an arc . Let and before the insertion. A vertex is reachable from after the insertion if and only if it is reachable from before the insertion. The insertion increases the level only of and possibly of some vertices and components reachable from . It follows that the claim holds after the insertion for any canonical vertex not reachable from .

Consider a vertex that is reachable from and is canonical after the insertion. Since level order is topological, . For such that , is related to at least arcs of level  and components of level  before the insertion. None of these arcs or components changes level as a result of the insertion, so the claim holds after the insertion for and level . Since any arc or component of level less than that is related to is also related to , the claim holds for after the insertion if it holds for .

After the insertion, is reachable from . Also, . The claim holds for before the insertion. Let be an arc of level less than that is related to before the insertion. If is reachable from , will be dead after the insertion and hence its level will not change. Neither does its level change if is not reachable from . Arc is related to after the insertion. Consider a component of level less than that is related to before the insertion. If the component is reachable from , it is dead after the insertion of and hence does not change level; if it is not reachable from , it also does not change level. After the insertion, the component is related to . It follows that the claim holds for and any level .

One case remains: . For the level of to increase to , the backward search must traverse at least arcs of level before the insertion, each of which is related to and on level after the insertion. The ends of these arcs are in at least components of level , each of which is related to and on level after the insertion. Thus the claim holds for and level after the insertion. This completes the proof of the claim.

The claim implies that for every level other than the maximum, there are at least different arcs and different components. Since there are only arcs and at most components, the maximum level is at most . The lemma follows.

{theorem}

The extended sparse algorithm takes time for arc insertions. {proof} The proof is like the proof of \thmrefsparsetime, using \lemrefsparsecomponentlabel.

The space required by the extended algorithm is , since the bit matrix requires space (or less if bits are packed into words). If we store in a hash table, the space becomes but the algorithm becomes randomized. By using a three-level data structure [27] to store we can reduce the space to without using randomization. We obtain a simpler algorithm with a time bound of by eliminating the deletion of multiple arcs, thus avoiding the need for , and letting . If we run this simpler algorithm until , then start over with all vertices on level one and indexed in topological order and run the more-complicated algorithm with stored in a three-level data structure, we obtain a deterministic algorithm running in time and space.

5.2 An Extension for Dense Graphs

Our extension of the dense algorithm does two searches per arc addition, the first to find cycles, the second to update levels, bounds, and counts. The levels, bounds, counts, and arc heaps are of components, not vertices. We store these values with the canonical vertices of the components. Initially each vertex is in its own component, all levels are one, all bounds and counts are zero, and all heaps are empty. The algorithm deletes arcs with both ends in the same component, as well as the second and subsequent arcs between the same pair of components. As in the sparse extension, to do the latter it uses a bit matrix indexed by vertex, initially identically zero.

To insert an arc , let and . If , add to with priority . If and , do Steps 1-4 below. (If do nothing.)

Step 1 (search for cycles): Let . Mark . Repeat the following step until is empty:

Cycle Traversal: Delete some arc from and add it to . If is marked, follow parent pointers from , marking each canonical vertex reached, until reaching a previously marked vertex. If , let , let , and delete from all arcs with priority at most and add them to .

Step 2 (form component): If is marked, unite the components containing the marked canonical vertices into a single new component whose canonical vertex is . Form the new arc heap of by melding the heaps of the marked vertices, including . Unmark all marked vertices.

Step 3 (update levels, bounds, and counts): Repeat the following step until is empty:

Update Traversal: Delete some arc from . If and , proceed as follows. Let . If , increase to ; otherwise, let