Near-linear Time Algorithms for Approximate Minimum Degree Spanning Trees
Given a graph $G=(V,E)$ with $n=|V|$ vertices and $m=|E|$ edges, we wish to compute a spanning tree whose maximum vertex degree is as small as possible. Computing the exact optimal solution is known to be NP-hard, since it generalizes the Hamiltonian path problem. For the approximation version of this problem, an $\tilde{O}(mn)$ time algorithm that computes a spanning tree of degree at most $\Delta^*+1$ was previously known [Fürer, Raghavachari 1994]; here $\Delta^*$ denotes the optimal tree degree. In this paper we give the first near-linear time algorithm for this problem. Specifically, we first propose a simple time algorithm that achieves an approximation; then we further improve this algorithm to obtain a approximation in time.
Computing minimum degree spanning trees is a fundamental problem that has inspired a long line of research. Let $G=(V,E)$ be an undirected graph; we wish to compute a spanning tree of $G$ whose tree degree, i.e., its maximum vertex degree, is as small as possible. Clearly this problem is NP-hard, as the Hamiltonian path problem can be reduced to it, and so we can only hope for a good approximation in polynomial time. The optimal approximation for this problem was achieved in , where the authors proposed an $\tilde{O}(mn)$ time algorithm (here $\tilde{O}$ hides poly-logarithmic factors) that computes a spanning tree of tree degree $\Delta^*+1$; conventionally $n=|V|$, $m=|E|$, and $\Delta^*$ denotes the optimal tree degree. However, a polynomial running time does not always mean efficiency on large data sets, so finding approximation algorithms that run in almost linear time is an important topic.
1.1 Our results
The major results of this paper are two near-linear time algorithms for minimum degree spanning trees in undirected graphs. These are the first near-linear time algorithms for this problem. Formally, we prove the following two theorems.
Theorem 1. There is an time algorithm (where $\alpha$ refers to the inverse Ackermann function) that computes a spanning tree with tree degree .
As in many algorithms for this problem, such as , this algorithm iteratively improves the spanning tree T by finding replacement edges connecting two low-degree vertices. To achieve almost linear time, we fix a degree threshold , and repeatedly search for edges connecting two vertices of tree degree such that the tree path between its two endpoints contains a vertex of tree degree . We can efficiently maintain the spanning tree using the link-cut tree structure . When there are not many vertices of tree degree , we can argue a lower bound on in terms of . However, the algorithm may generate a large number of -degree vertices, which undermines the lower bound on . To circumvent this difficulty, we iteratively perform this procedure on larger and larger ’s. If the number of vertices of degree becomes smaller and smaller, we can finally bound the number of -degree vertices. The crucial observation is that if a vertex which starts out as a low-degree vertex for the previous now becomes -degree, many high-degree vertices must have lost some tree neighbours. By carefully selecting a series of thresholds ’s, we can finally either argue a lower bound on or decrease the degree of T by a constant factor.
Theorem 2. For any constant , there is an algorithm that runs in time and computes a spanning tree with tree degree at most .
The algorithm of Theorem 2 refines that of Theorem 1 using an augmenting path technique. In each iteration, the algorithm conducts a series of tree modifications to remove all augmenting paths of the shortest length, so that in the next iteration the shortest length of augmenting paths increases. To facilitate our search for shortest augmenting paths, we divide the graph into layers and then look for edges that connect two different tree components on the bottom layer. If such an edge is detected, we add this edge to the tree and propagate a sequence of tree edge insertions and deletions upwards to higher layers. When no such edges can be found, we argue that every layer yields a lower bound on , and that these bounds jointly prove a lower bound with a constant multiplicative error.
1.2 Related work
There is a line of work concerned with low-degree trees in weighted undirected graphs. In this scenario, the target low-degree tree that we wish to compute is constrained by two parameters: an upper bound on the tree degree, and an upper bound on the total weight summed over all tree edges. The problem was originally formulated in . Two subsequent papers [10, 11] proposed polynomial time algorithms that compute a tree with cost and degree , . The cost was improved from to in while the degree upper bound becomes ; the authors also proposed a quasi-polynomial algorithm that finds a tree with cost and degree . ’s result was improved by where for all , a spanning tree of degree and of cost at most the cost of the optimum spanning tree of maximum degree at most can be computed in polynomial time. The degree bound was later further improved from to the optimal in .
Another variant is the minimum degree Steiner tree problem, which is related to network broadcasting [13, 14, 4]. For undirected graphs, the authors of showed that the same approximation guarantee and running time can be achieved as with minimum degree spanning trees in undirected graphs, i.e., a solution of tree degree and a running time of . For the directed case, showed that the directed minimum degree Steiner tree problem cannot be approximated within unless , where is the set of terminals.
The minimum degree tree problem can also be formulated in directed graphs. This problem was first studied in  where the authors proposed a polynomial time algorithm that finds a directed spanning tree of degree at most . The approximation guarantee was improved to roughly in [12, 9] while the time complexity became . The problem becomes much easier when is acyclic, as shown in , where a directed spanning tree of degree is computable in polynomial time. The approximation was greatly advanced to in  by an LP-based polynomial time algorithm, and this problem has become more-or-less closed since then.
Logarithms are taken at base 2. Assume is a connected graph, , and we assume . During the execution of our algorithm, a spanning tree T will be maintained and our algorithm will repeatedly modify T to reduce its degree . For every , let be the tree degree of . For each pair , let be the unique tree path on T that connects and . For each , define , , and let , that is, the total degree of all vertices of degree at least .
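The notation above can be made concrete with a short Python sketch. It computes tree degrees from an edge list and, for a threshold `d`, the set of vertices of tree degree at least `d`, its size, and its total degree. The function names `tree_degrees` and `high_degree_stats` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from collections import defaultdict


def tree_degrees(tree_edges):
    """Return a dict mapping each vertex to its degree in the tree."""
    deg = defaultdict(int)
    for u, v in tree_edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def high_degree_stats(tree_edges, d):
    """Return (S, |S|, total degree of S) for S = vertices of tree degree >= d.

    The names are illustrative; the paper's own symbols for these
    quantities are not reproduced here.
    """
    deg = tree_degrees(tree_edges)
    high = {v for v, k in deg.items() if k >= d}
    return high, len(high), sum(deg[v] for v in high)


# A small tree: vertices 0 and 3 both have tree degree 3.
example_tree = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5)]
```

On `example_tree` with `d = 3`, the high-degree set is `{0, 3}`, so the count is 2 and the total degree is 6.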
3 An Approximation
3.1 Main algorithm
Starting from an arbitrary spanning tree T with degree , the core of the main algorithm is a near-linear time subroutine that, as long as , either reduces to or terminates with the guarantee that ; the main algorithm simply applies this subroutine repeatedly until or . This subroutine consists of two parts: (1) a low-level fast degree reduction algorithm that, given any degree threshold , modifies T to reduce the total number of high-degree vertices; (2) a high-level scheduling algorithm that selects a sequence of degree thresholds and feeds them to the low-level degree reduction algorithm as inputs. For every , let us define a sequence of degree thresholds:
Clearly ’s are increasing as ,
The last two inequalities hold as and .
The high-level scheduling algorithm (2) is described in Scheduling shown in Algorithm 1. If it returns false, an upper bound is established; otherwise, when it returns true, has been reduced to . The low-level degree reduction algorithm (1) is described in FastDegreeReduction in Algorithm 2. The rough idea is that we repeatedly look for edges that connect two vertices of tree degree from different components of and add these edges to T, while at the same time we delete some edges incident on so that T stays a tree. In order to implement this idea in near-linear time, we have to neglect those -degree vertices that have once become -degree. A key operation of our algorithm is marking. During one execution of FastDegreeReduction, a vertex gets marked whenever its tree degree becomes , and it remains marked even if its degree later decreases; instead of searching for edges between two -degree vertices, we only care about edges between two unmarked vertices.
3.2 Implementation and running time
We specify some implementation details of FastDegreeReduction.
To efficiently implement line-4, we enumerate all edges one by one. Using the union-find data structure , we check whether one of is marked or both of belong to the same component; if so, we move on to the next edge; otherwise we execute line-5 through line-13. The total running time of this part would be .
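The edge test described above can be sketched in Python with a standard union-find structure. This is only an illustration under assumptions: the class `UnionFind` and the helper `is_candidate` are hypothetical names, and the components are those tracked by the union-find, which the caller is assumed to keep in sync with the forest.

```python
class UnionFind:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the components of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def is_candidate(u, v, marked, uf):
    """True if (u, v) joins two unmarked vertices in different components."""
    if u in marked or v in marked:
        return False
    return uf.find(u) != uf.find(v)
```

An edge scan would call `is_candidate` once per edge and skip to the next edge whenever it returns false.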
For line-5, to efficiently retrieve a vertex given , we maintain T using the link-cut tree data structure . We set the weight of each to be , and the weight of each equal to . Then can be found in amortized time by querying the maximum weight vertex on the tree path using the link-cut tree data structure. Note that such is always non-empty because belong to different connected components of . For line-6, edge updates to T can be handled using the link-cut tree as well. Since there are fewer than components in , the total time would be .
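To illustrate what the path query returns, here is a deliberately naive Python sketch that replaces the link-cut tree with a breadth-first search, so each query costs linear time instead of the amortized bound above. It assumes the vertex weight is simply the tree degree, which is our simplification rather than the weighting used in the paper; the names `build_adj`, `tree_path` and `max_degree_vertex_on_path` are hypothetical.

```python
from collections import deque


def build_adj(tree_edges):
    """Adjacency lists of the tree."""
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def tree_path(adj, u, v):
    """Return the unique tree path from u to v as a list of vertices."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = []
    x = v
    while x is not None:
        path.append(x)
        x = parent[x]
    return path[::-1]


def max_degree_vertex_on_path(adj, u, v):
    """Vertex of largest tree degree on the tree path between u and v."""
    return max(tree_path(adj, u, v), key=lambda x: len(adj[x]))
```

A link-cut tree answers the same query without walking the path explicitly, which is what makes the near-linear total running time possible.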
On line-7, merging components and can be done in time using the union-find data structure.
On line-13, when a vertex is removed from , we need to add to and possibly merge some connected components. This can be done by enumerating ’s incident tree edges and using the union-find data structure. The total cost of such operations would be .
To conclude, the overall running time complexity of FastDegreeReduction is by . The running time of Scheduling then becomes since can increase to at most .
To upper bound the running time of the main algorithm, we need the following lemma that characterizes the performance of Scheduling.
Lemma 3. If Scheduling returns true, then the degree of T has dropped by at least a constant factor of .
When Scheduling returns true, we claim that declines by a factor of after each iteration of the while-loop. In fact, on the one hand, if the condition of line-5 does not hold, i.e. , then as was previously set to and is now equal to , declines by a factor of . On the other hand, if the condition of line-5 holds, then because Scheduling returns true, the condition on line-7 always fails, i.e. , and therefore when we set the value of would decrease by a factor of . Hence, the while-loop of Scheduling can iterate at most times before becomes ; that is to say, for some . By definition,
Here we use the fact that . As , must now be smaller than . ∎
Now we can upper bound the running time of the main algorithm as stated in the following lemma.
Lemma 4. The running time of the main algorithm is .
By Lemma 3, every invocation of Scheduling that returns true decreases by a factor of . Therefore, there can be at most such invocations. Also, there can be at most one instance of Scheduling that returns false, because the main algorithm terminates immediately after that. Overall, the total running time of Scheduling would be . ∎
3.3 Approximation guarantee
To prove the approximation guarantee, we will utilize the following lemmas.
Lemma 5. Let be disjoint vertex subsets. A set is called “boundary” (with respect to ) if every edge incident on whose endpoints are not both contained in a single is incident on at least one vertex from . Then, .
For any spanning tree, there are at least edges incident on whose endpoints are not both contained in a single . Then by the definition of , each of these edges is incident on at least one vertex of , and thus by the pigeonhole principle, there exists a whose tree degree is . ∎
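The pigeonhole step can be illustrated with a small Python sketch. It rests on an assumption that should be read as ours, not as the paper's exact statement: with k disjoint parts, any spanning tree contains at least k - 1 edges that are incident on the parts but not contained in a single part (contract each part to see this), so some boundary vertex has tree degree at least (k - 1) divided by the size of the boundary set. The names `is_boundary` and `pigeonhole_degree_bound` are hypothetical.

```python
from math import ceil


def is_boundary(edges, parts, s):
    """Check that every edge touching the parts, but not inside one part,
    has at least one endpoint in s."""
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    for u, v in edges:
        touches = u in part_of or v in part_of
        same_part = (
            u in part_of and v in part_of and part_of[u] == part_of[v]
        )
        if touches and not same_part and u not in s and v not in s:
            return False
    return True


def pigeonhole_degree_bound(parts, s):
    """Assumed bound: ceil((k - 1) / |s|) for k parts and boundary set s."""
    return ceil((len(parts) - 1) / len(s))


# Star with centre 0: the four leaves are the parts, {0} is the boundary.
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
leaf_parts = [{1}, {2}, {3}, {4}]
```

For the star, the only spanning tree has degree 4, and the sketch reports the weaker but valid lower bound 3.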
Lemma 6. For any vertex subset , the number of connected components in is at least .
Note that there are at least tree edges incident on , and so removing all of these edges would break T into components. Therefore, excluding singleton components formed by , there are components from . ∎
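The counting argument can be checked numerically with the following Python sketch. The closed form used here, total degree of the removed set minus twice its size plus 2, is our own derivation for a non-empty removed set (at most |S| - 1 tree edges lie inside S, and removing e edges from a tree leaves e + 1 pieces); it is not claimed to be the exact expression in the lemma. The function names are hypothetical.

```python
def components_without(n, tree_edges, removed):
    """Number of connected components of the tree after deleting `removed`."""
    adj = {v: [] for v in range(n) if v not in removed}
    for u, v in tree_edges:
        if u in adj and v in adj:
            adj[u].append(v)
            adj[v].append(u)
    seen = set()
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return count


def component_lower_bound(tree_edges, removed):
    """Assumed bound for non-empty `removed`: deg(removed) - 2|removed| + 2."""
    total_degree = sum((u in removed) + (v in removed) for u, v in tree_edges)
    return total_degree - 2 * len(removed) + 2
```

On the tree with edges (0,1), (0,2), (0,3), (3,4), (3,5), removing {0, 3} leaves four isolated vertices, and the bound evaluates to 6 - 4 + 2 = 4, so it is tight in this case.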
Now we prove that when the main algorithm terminates, . If the main algorithm terminates with , then automatically we have . Next we focus on the case when the main algorithm terminates because Scheduling returned false. In this case, there was an execution of Scheduling that returned on line-7. By the branching condition of line-6, we know that by the end of this execution of Scheduling.
Consider the most recent invocation of FastDegreeReduction. By the end of this invocation, let be the sequence of all different connected components spanned by . Let be the set of all marked vertices. To apply Lemma 5, we claim is a boundary set with respect to ; this is because, for any edge such that belong to different connected components of , one of must be marked since otherwise FastDegreeReduction would continue to merge and instead of terminating. Therefore, Lemma 5 immediately yields .
One last thing is to lower-bound and upper-bound .
Lower bounding .
Let and be the snapshots of and before this instance of FastDegreeReduction began. So by the algorithm we have and thus . Then clearly, the number of connected components of is at least
The first inequality holds by Lemma 6; the last two inequalities hold by and .
Upper bounding .
There are two kinds of marked vertices.
(1) Either was marked at the beginning, or its tree degree later increased to at some point while the algorithm kept modifying T. The total number of such vertices is at most .
(2) is a marked vertex and . In this case, before this instance of FastDegreeReduction began. Since is marked, increases to at some point. Every time we modify T, at least one vertex in loses one degree and at most two unmarked vertices each gain one degree. So for a vertex to be marked, the vertices in lose at least degrees on average. As each vertex will be removed from after it loses at most tree degree, the total number of such vertices can be at most:
Summing up (1) and (2), we have
or equivalently, .
4 A Approximation
In this section we prove Theorem 2. To obtain an improved approximation of , the rough idea is that we refine the fast degree reduction algorithm in the previous section using an augmenting path technique.
Let be a fixed parameter. The basic framework stays the same as in the previous section. One difference is that the new main algorithm consists of two phases. In the large-step phase, as long as , we repeatedly apply a near-linear time algorithm LargeStepScheduling that either reduces to or terminates with . In the small-step phase we need to deal with the situation where ; in this case we repeatedly run a weaker near-linear time algorithm SmallStepScheduling that either reduces by or provides evidence that .
Both algorithms LargeStepScheduling and SmallStepScheduling rely on a building block algorithm AugPathDegRed; similar to Scheduling, both scheduling algorithms run a while-loop and repeatedly feed inputs to AugPathDegRed. Algorithm AugPathDegRed efficiently reduces the total number of vertices of high tree degree using an augmenting path technique, which is a significant improvement over FastDegreeReduction.
For the rest of this section, we first propose and analyse the building block algorithm AugPathDegRed, which underlies the core of our main algorithm. After that we specify how the two phases, the large-step phase and the small-step phase, work. Finally, we prove Theorem 2.
4.1 Degree reduction via augmenting paths
4.1.1 Algorithm description
Let be a fixed threshold. This algorithm is, in some way, an extension of the previous algorithm FastDegreeReduction. As before, for efficiency, a vertex gets marked if its tree degree is , and it stays marked (throughout one execution of AugPathDegRed) even if its tree degree decreases afterwards. Previously, we only looked for a non-tree edge whose inclusion could directly reduce some tree degrees of vertices in , and when such edges no longer existed the procedure terminated. When that happens, AugPathDegRed instead continues to explore possibilities of improving the tree structure using the idea of augmenting paths. Intuitively, an augmenting path consists of a sequence of non-tree edges that can jointly reduce tree degrees of . Formally we give its definition below.
Definition 7 (augmenting paths).
An -length augmenting path consists of a sequence of distinct non-tree edges with the following properties.
All ’s are unmarked, ; ’s are marked for and is unmarked.
Lemma 8 (tree modification).
Given an augmenting path , one can modify T such that decreases and no vertices are added to .
We modify T in an inductive way. For , as , we can take an arbitrary tree edge , and then perform an update which guarantees that T is still a spanning tree. Note that this update also preserves the property that ; this is because, when , tree update does not change the connected components of , and thus the condition stays intact.
During the process, if any becomes , mark . By definition, decreases as loses a tree neighbour; moreover, all are unchanged, and no vertices are newly added to because . ∎
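A single step of such a tree modification, inserting one non-tree edge and deleting one tree edge on the resulting cycle, can be sketched in Python as follows. This shows only the elementary swap and a check that the result is still a spanning tree; it does not reproduce the full inductive update along an augmenting path. The names `swap_edge`, `is_spanning_tree` and `degree` are hypothetical.

```python
def is_spanning_tree(n, edges):
    """True if `edges` form a spanning tree on vertices 0..n-1."""
    edges = list(edges)
    if len(edges) != n - 1:
        return False
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for e in edges:
        u, v = tuple(e)
        ru, rv = find(u), find(v)
        if ru == rv:  # adding this edge would close a cycle
            return False
        parent[ru] = rv
    return True


def swap_edge(tree_edges, new_edge, old_edge):
    """Return a new edge set with old_edge removed and new_edge added."""
    updated = {frozenset(e) for e in tree_edges}
    updated.remove(frozenset(old_edge))  # raises KeyError if not a tree edge
    updated.add(frozenset(new_edge))
    return updated


def degree(edges, v):
    """Degree of vertex v in the given edge set."""
    return sum(v in e for e in edges)
```

For the star with edges (0,1), (0,2), (0,3), swapping in the non-tree edge (1,2) and removing (0,1) keeps a spanning tree on four vertices while the degree of vertex 0 drops from 3 to 2.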
It is easy to notice that what FastDegreeReduction does is repeatedly look for augmenting paths of length and then apply Lemma 8. To extend this algorithm, when we can no longer find any augmenting paths of length , we turn to search for augmenting paths of length , and so on. Generally speaking, when the currently shortest augmenting paths have length , we apply Lemma 8 to decrease the total number of shortest augmenting paths, and when no further progress of this kind can be made we argue that the shortest length of augmenting paths must now increase. Finally our algorithm terminates when grows to , and then we prove a lower bound on based on the structure of T.
The algorithm for finding the shortest length of augmenting paths works as follows; actually the algorithm computes an auxiliary layering of the graph that will also help tree modification later. Initially we set . Inductively, suppose we have already computed , then we compute the forest spanned by . Here is an extra notation: , for each , let be the connected component of that contains . If there exists an edge such that both are unmarked vertices, and that , then the algorithm terminates and reports that the shortest length of augmenting paths is equal to ; otherwise, we compute to be the set of all marked vertices such that there exists an unmarked adjacent vertex with , and then continue. The above procedure is summarised as the following pseudo-code Layering shown in Algorithm 3. Note that once , the algorithm would continue to compute .
After we have invoked Layering and computed a sequence of vertex subsets which naturally divides the graph into layers, we start to apply the tree modifications of Lemma 8 to decrease the total number of shortest augmenting paths. The difficulty in searching for shortest augmenting paths is that, for a search that starts from a pair of adjacent and unmarked vertices satisfying and goes up the layers , not every route can reach the top layer because the augmentations of some previous -length augmenting paths might have already blocked the road. Therefore, a depth-first search needs to be performed. To save running time, some tricks are needed: if a certain vertex has been searched and failed to lead a way upwards to , then we tag this vertex so that future depth-first searches may avoid this tagged vertex; if a certain edge has been searched before, then we tag this edge regardless of the outcome. The pseudo-code AugDFS shown in Algorithm 4 may be a better illustration of this algorithm; the recursive algorithm AugDFS searches for an -length augmenting path given input . Later we will prove that if AugDFS returns true, then the sequence is indeed an augmenting path.
Now we describe the upper-level AugPathDegRed: basically, it repeatedly applies Layering followed by several rounds of AugDFS until . The pseudo-code of AugPathDegRed is shown in Algorithm 5.
Before proving termination of AugPathDegRed, we first need to argue some properties of Layering. The following lemma will serve as the basis for our future lower bounds on .
Lemma 9 (the blocking property).
Throughout each iteration of the repeat-loop in AugPathDegRed, for any and any two adjacent vertices such that is unmarked and , we have .
By the rules of Layering, this blocking property holds right after Layering outputs them. This claim continues to hold afterwards because tree modifications only merge components ’s and never split any ’s. ∎
Here is an important corollary of Lemma 9.
Corollary 10. Throughout each iteration of the repeat-loop, for any , suppose is adjacent to an unmarked such that . Then only contains vertices from .
Suppose otherwise. Then there would be a vertex ; in this case , and thus by Lemma 9, which is a contradiction as . ∎
Now we can argue correctness of AugDFS.
Lemma 11. If AugDFS returns true, is an augmenting path.
Finally we conclude this subsection with the lemma below, from which termination of AugPathDegRed immediately follows.
Lemma 12. Every iteration of the repeat-loop, if not the last, increases by at least one.
By the rules of Layering, it is easy to see that at the beginning when Layering outputs , the shortest length of augmenting path is equal to . So it suffices to prove that by the end of this iteration the shortest augmenting path has length .
First we need to characterize all augmenting paths using . Let the sequence be an arbitrary augmenting path. We argue , and more importantly, if , it must be . We inductively prove that for . The basis is obvious as is required by property (i) in Definition 7. Now assume for some . Then, by Corollary 10, it would not be hard to see . Now, on the one hand by Corollary 10 , and on the other hand , so . Plus, we can see from the induction that, when it must be .
For any unmarked and adjacent vertices such that , consider the instance of AugDFS with input . We make two claims.
(1) If there is an -length augmenting path ending with , AugDFS would succeed in finding one.
(2) If it has returned false, then there would be no -augmenting path ending with throughout the entire repeat-loop iteration.
If (1)(2) can be proved, then by the end of this repeat-loop iteration, there would be no -length augmenting paths because any such augmenting path should end with a pair of adjacent unmarked vertices. Next we prove (1)(2).
The depth-first search of AugDFS exactly coincides with the conditions that , except that it skips all tagged vertices and edges. Now we prove that omitting tagged vertices and edges does not miss any -length augmenting paths. For a vertex to be tagged, we must have enumerated all of its untagged edges but failed to find any augmenting paths, and therefore any future depth-first searches on would still be in vain. For an edge to be tagged, a further recursion of AugDFS on line-9 has either succeeded or failed in finding an augmenting path; in the former case, and have been merged, and so the condition would be violated afterwards; in the latter case, we would not need to recur on anyway.
If AugDFS has once failed to find any augmenting paths starting with , then all vertices visited by this instance of AugDFS should be tagged and they would be omitted by all succeeding instances of AugDFS. Therefore would stay unchanged since then (although itself might change). Hence, imagine that we re-run AugDFS with ; it would return false without any recursion because all vertices in are tagged.
4.1.2 Lower bound on
Suppose AugPathDegRed has terminated with . Let us see how it yields lower bounds on . To apply Lemma 5, we first need to specify a sequence of disjoint vertex subsets, which is what the following definition is about.
Definition 13. After an instance of AugPathDegRed has been executed, an arbitrary component , , is called clean if all vertices in are unmarked.
Lemma 14. For any , suppose has clean components; then the lower bound holds.
From Lemma 14, it suffices to lower bound the total number of clean components. The next lemma describes a scenario in which must be large.
Lemma 15. Suppose an instance of AugPathDegRed has been executed. Let , and be snapshots of , and right before this instance of AugPathDegRed started; recall that and always refer to statistics of the current T after this instance of AugPathDegRed has finished.
Assume the following three conditions:
Then, for each , the number of clean components in is more than .
By Lemma 6, the number of tree components in is at least
Let be the set of all marked vertices (i.e., vertices that are initially unmarked) by the end of AugPathDegRed. Then, the number of clean components in is at least
The argument consists of a lower bound on and an upper bound on .
Lower bound on .
By the algorithm , then we have .
For any vertex , by the time was first added to some . After that, could only decrease when we modify T by an augmenting path where for some . Since , during a tree modification, at least one vertex in loses one degree and at most vertices in lose one degree separately. As the total number of the degree loss in is , we have
From above, we get a lower bound on ,