Low-Congestion Shortcuts without Embedding
This work was supported in part by KAKENHI No. 15H00852 and 16H02878 as well as NSF grants CCF-1527110 "Distributed Algorithms for Near Planar Networks" and CCF-1618280 "Coding for Distributed Computing".
©Haeupler, Izumi, Zuzic 2016. This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in Source Publication, http://dx.doi.org/10.1145/2933057.2933112 .
Abstract
Distributed optimization algorithms are frequently faced with solving subproblems on disjoint connected parts of a network. Unfortunately, the diameter of these parts can be significantly larger than the diameter of the underlying network, leading to slow running times. Recent work by [Ghaffari and Haeupler; SODA’16] showed that this phenomenon can be seen as the broad underlying reason for the pervasive lower bounds that apply to most optimization problems in the CONGEST model. On the positive side, this work also introduced low-congestion shortcuts as an elegant solution to circumvent this problem in certain topologies of interest. In particular, they showed that there exist good shortcuts for any planar network and, more generally, any bounded genus network. This directly leads to fast distributed optimization algorithms on such topologies, e.g., for MST and Min-Cut approximation, given that one can efficiently construct these shortcuts in a distributed manner.
Unfortunately, the shortcut construction of [Ghaffari and Haeupler; SODA’16] relies heavily on having access to a bounded genus embedding of the network. Computing such an embedding distributedly, however, is a hard problem, even for planar networks. No distributed embedding algorithm for bounded genus graphs is in sight.
In this work, we sidestep this problem by defining a slightly restricted and more structured form of shortcuts and giving a novel construction algorithm which efficiently finds a shortcut that is, up to a logarithmic factor, as good as the best shortcut that exists for a given network. This new construction algorithm directly leads to an $\tilde{O}(D)$-round algorithm for solving optimization problems like MST for any topology for which good restricted shortcuts exist, without the need to compute any embedding. This includes the first efficient algorithm for bounded genus graphs.
1 Introduction
1.1 Background and Motivation
Consider the problem of finding the minimum spanning tree (MST) on a distributed network with independent processing nodes. The network is abstracted as a graph $G = (V, E)$ with $n$ nodes and diameter $D$. The nodes communicate by synchronously passing $O(\log n)$-bit messages to each of their direct neighbors. The goal is to design algorithms (protocols) that minimize the number of synchronous message-passing rounds before the nodes collaboratively solve the optimization problem.
The message-passing setting we just described is a model called CONGEST [21]. The MST problem can be solved in such a setting using $\tilde{O}(\sqrt{n} + D)$ rounds of communication [13]. Moreover, and perhaps more surprisingly, this bound was shown to be the best possible (up to polylogarithmic factors). Specifically, there are graphs in which one cannot do any better than $\tilde{\Omega}(\sqrt{n} + D)$ [22, 3, 1]. While clearly no algorithm can solve any global network optimization problem faster than $\Omega(D)$, the $\tilde{\Omega}(\sqrt{n})$ factor is harder to discern. To make matters worse, the lower bound was shown to be far reaching. It applies to a multitude of important network optimization problems including MST, minimum-cut, weighted shortest-path, connectivity verification and so on [1].
While this bound precludes the existence of more efficient algorithms in the general case, it was not clear whether it holds for special families of graphs. This question is especially important because any real-world application on huge networks should exploit the special structure that the network provides. The mere existence of “hard” networks for which one cannot design any fast algorithm might not be a limiting factor.
In the first result that utilizes network topology to circumvent the lower bound, Haeupler and Ghaffari designed an $\tilde{O}(D)$-round distributed MST algorithm for planar graphs [7]. Note that this algorithm offers a huge advantage over older results for planar graphs with small diameters.
They achieve this by introducing an elegant abstraction for designing distributed algorithms named low-congestion shortcuts. Their methods could in principle be used to achieve a similar result for genus-bounded graphs, but their presented algorithms have a major technical obstacle: they require a surface embedding of the planar/genus-bounded graph to construct the low-congestion shortcuts. While computing a distributed embedding for planar graphs has a complex $\tilde{O}(D)$-round solution [6], this remains an open problem for genus-bounded graphs [7].
This paper sidesteps the issue by vastly simplifying the construction of low-congestion shortcuts. We define a more structured version of low-congestion shortcuts called tree-restricted shortcuts and propose a simple and general distributed algorithm for finding them. The algorithm is completely oblivious to any intricacies of the underlying topology and finds universally near-optimal tree-restricted shortcuts. As a simple consequence of our construction technique we get an $\tilde{O}(gD)$-round algorithm for genus-$g$ graphs, which is a novel result. We believe that this simplicity makes the algorithm usable even in practice.
1.2 A Brief Overview of Low-Congestion Shortcuts
We now give a short introduction to the general low-congestion shortcuts as defined in [7]. Consider the following scenario, which is a recurring theme throughout distributed approaches for many network optimization problems:
A graph $G = (V, E)$ is partitioned into a number of disjoint individually-connected parts $P_1, P_2, \ldots, P_N$, and we need to compute a (typically simple) function for each of the parts in isolation.
A classical example for such a scenario is the 1926 algorithm of Boruvka [20] for computing the Minimum Spanning Tree (MST): starting with a trivial partition of each node being its own part, in every iteration each part computes the minimum-weight outgoing edge and merges with the part incident to this edge. After $O(\log n)$ iterations, we arrive at the MST, where $n$ is the number of nodes in $G$.
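The merging process just described can be sketched sequentially. The following is a minimal, centralized illustration of Boruvka's phases, not the distributed algorithm of this paper; the function name `boruvka_mst` and the edge format `(weight, u, v)` are assumptions made for the example.

```python
def boruvka_mst(n, edges):
    """Sequential sketch of Boruvka's algorithm.

    n: number of nodes, labelled 0..n-1 (assumed format).
    edges: list of (weight, u, v) tuples; ties are broken by tuple order.
    Returns the set of edges chosen for the spanning forest.
    """
    part = list(range(n))  # part[v] points towards the representative of v's part

    def find(v):
        while part[v] != v:
            part[v] = part[part[v]]  # path halving
            v = part[v]
        return v

    mst = set()
    num_parts = n
    while num_parts > 1:
        best = {}  # part representative -> lightest outgoing edge
        for w, u, v in edges:
            pu, pv = find(u), find(v)
            if pu == pv:
                continue  # edge is internal to a part
            for p in (pu, pv):
                if p not in best or (w, u, v) < best[p]:
                    best[p] = (w, u, v)
        if not best:
            break  # no outgoing edges left: the graph is disconnected
        for w, u, v in best.values():
            pu, pv = find(u), find(v)
            if pu != pv:  # the two parts may already have merged in this phase
                part[pu] = pv
                mst.add((w, u, v))
                num_parts -= 1
    return mst
```

Every phase at least halves the number of parts, which is where the logarithmic number of iterations comes from.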
A key concern in designing a distributed version of Boruvka’s algorithm is finding good communication schemes that allow the nodes of each part to collaborate with each other without interference from other parts. While a natural solution would be to allow communication only inside the same part, this could take a long time. The problem appears when the diameter of a part in isolation is much larger than the diameter of the original graph $G$.
To overcome this issue, Ghaffari and Haeupler [7] introduced low-congestion shortcuts: each part is given a subgraph of extra edges that it can use to communicate more efficiently within itself. More precisely, each part $P_i$ is associated with a shortcut subgraph $H_i$ and is permitted to use $G[P_i] + H_i$ for communication.
To measure the quality of a shortcut, we characterize it with two quality parameters: congestion and dilation. A shortcut has congestion $c$ and dilation $d$ if (i) the diameter of every subgraph $G[P_i] + H_i$ is at most $d$, and (ii) every edge is assigned to at most $c$ different subgraphs $H_i$. Given a shortcut with congestion $c$ and dilation $d$, we can solve problems such as MST and Min-Cut approximation in $\tilde{O}(c + d)$ rounds [7]. Therefore, designing a distributed algorithm can be reduced to finding good-quality shortcuts.
While the pervasive lower bound clearly implies we cannot find shortcuts with $c + d$ polynomially smaller than $\sqrt{n}$ on general graphs, this might not be the case on specific families of graphs. For example, planar graphs always offer shortcuts with congestion $O(D \log D)$ and dilation $O(D \log D)$, thus bypassing the lower bound [7].
1.3 Our Contribution
Roughly speaking, there are two challenges in the design of shortcut-based algorithms. Fix a target class of graphs for which we want to design distributed algorithms. The first challenge is to identify (small) values of $c$ and $d$ such that every graph in the class has shortcuts with congestion $c$ and dilation $d$. This is purely a graph-theoretic problem. The second challenge is to convert the existential result obtained from the first challenge into a constructive one, i.e., we must design a distributed algorithm constructing efficient shortcuts for that class. This is a distinct problem of theoretical distributed computing.
A natural idea for lowering the barrier to algorithm design is to invent a generic algorithm which finds a shortcut with congestion $c$ and dilation $d$ for the best (or approximately best) $c$ and $d$, which provides an automatic conversion of the existential result to the constructive one. Unfortunately, the known construction for planar graphs [7] is far from such generality: as we already mentioned, it strongly depends on the distributed planar embedding algorithm [6], and is thus not applicable to any other graph class. This is also a primary reason why the construction for planar graphs in [7] cannot be extended even to bounded genus graphs.
The primary contribution of this paper is to present a simple algorithm for constructing shortcuts that resolves the issue mentioned above. We introduce a more structured definition of shortcuts called tree-restricted shortcuts and give a constructive algorithm that finds nearly optimal tree-restricted shortcuts in any graph that contains them. While the new shortcut definition is strictly more restrictive than the old one, the authors are not aware of any interesting (i.e. non-pathological) case where one loses power because of the restriction.
The details of our contribution are summarized as follows:

In Section 4 we introduce a new class of shortcuts, called tree-restricted shortcuts, which only use edges of some fixed spanning tree $T$. More precisely, $H_i \subseteq E(T)$ for each part $P_i$. We introduce a new quality parameter called the block parameter, which is defined as an upper bound on the number of connected components of $H_i$ that intersect $P_i$ (over all $i$). Note that these components are subtrees of $T$. The block parameter can be seen as a stronger version of dilation and will often be used instead. In Section 4.3 we propose deterministic algorithms for broadcast, convergecast, and leader election (for all parts in parallel) utilizing tree-restricted shortcuts, which are simpler and faster than the general-case randomized algorithms shown in [7].

In Section 5 we present a generic algorithm for constructing treerestricted shortcuts. Let be a spanning tree of with depth and assume there exists a treerestricted shortcut on with congestion and block parameter . We describe an algorithm that constructs a treerestricted shortcut with congestion and block parameter in CONGEST rounds. It is also possible to run our algorithm in the environment where the system is not aware of the value of and/or with extra factor, as described in Appendix A.

An important consequence of our algorithm is the first distributed algorithm constructing a good shortcut for genus-$g$ graphs. Fortunately, the known result for genus-$g$ graphs exhibits the existence of tree-restricted shortcuts with congestion $O(gD \log D)$ and block parameter $O(\log D)$ for an arbitrary BFS tree of depth $D$. Thus in Section 4.4 we obtain a distributed algorithm constructing a tree-restricted shortcut with congestion $O(gD \log D \log N)$ and block parameter $O(\log D)$ for graphs with genus at most $g$. For bounded genus graphs (i.e. $g = O(1)$), the algorithms based on our shortcut construction achieve near-optimal time complexity (up to a polylogarithmic factor). According to a very recent unpublished result that is still in preparation, a similar result is obtained for graphs with bounded pathwidth and treewidth.
2 Related Work
The complexity-theoretic issues in the design of distributed graph algorithms for the CONGEST model have received much attention in the last decade, with extensive progress on many problems: Minimum spanning tree [5, 13, 22, 12], Maximum flow [8], Minimum Cut [9, 19], Shortest paths and Diameter [18, 4, 10, 17, 15, 16, 11], and so on. Most of those problems have matching upper and lower bounds of $\tilde{\Theta}(\sqrt{n} + D)$ rounds for some sort of approximation guarantee [1, 15, 9, 2, 22]. Guaranteeing exact results sometimes yields a nearly linear-time bound [4]. Note that almost all lower bounds above hold for small-diameter graphs. Thus, in any case, the general lower bound is more expensive than the universal lower bound of $\Omega(D)$ rounds.
On the positive side, distributed algorithms typically use a variety of ideas. In an effort to unify them in an elegant framework, Ghaffari and Haeupler introduced low-congestion shortcuts [7]. Specifically, their ideas can be turned into a very short and clean $\tilde{O}(\sqrt{n} + D)$-round MST algorithm for general graphs. Furthermore, low-congestion shortcuts can serve as a simple explanation of the pervasive lower bound. However, the main contribution of their techniques is an $\tilde{O}(D)$-round algorithm for planar graphs. To the best of our knowledge, it is the first attempt that considers a non-trivial popular graph class.
3 Preliminaries
In this section, we formally define the CONGEST model and then recap the definitions of low-congestion shortcuts from [7].
3.1 CONGEST Model
We work in the classical CONGEST model [21]. In this setting, a network is given as a connected undirected graph $G = (V, E)$ with diameter $D$. Initially, nodes only know their immediate neighbors and they collaborate to compute some global function of the graph like the MST. Communication occurs in synchronous rounds; during a round, each node can send $O(\log n)$ bits to each of its neighbors (note that the nodes also know some polynomially tight bound on $n$, otherwise sending $O(\log n)$ bits does not make sense). The nodes always correctly follow the protocol and never fail. The goal is to design protocols that minimize the resource of time, i.e., the number of rounds before the nodes compute the solution.
We now precisely formalize the notion of solving a problem in this model, e.g. how the input and output are given. While the formalization is specifically given for the MST, any other problem is completely analogous. All nodes synchronously wake up in the first round and start executing some given protocol. Every node initially only knows its immediate neighbors and the weight of each of its incident edges. After a specific number of rounds, all nodes must simultaneously output (i) the weight of the computed MST and (ii) for each edge incident to it, a bit indicating whether that edge belongs to the MST.
3.2 Low-Congestion Shortcuts
Let $G = (V, E)$ be an undirected graph along with a node partition $P_1, \ldots, P_N$. Low-congestion shortcuts intuitively augment each part $P_i$ with extra edges $H_i$ that may be used to communicate within a part more efficiently. With a small abuse of notation, in the following we use the symbol $H_i$ to indicate both the edge set and the subgraph induced by the set. As communication for part $P_i$ occurs on $G[P_i] + H_i$, it is natural to try to minimize the diameters of those subgraphs. Hence we define dilation as an upper bound on the diameter of any shortcut subgraph $G[P_i] + H_i$. On the other hand, assigning an edge to almost every part will lead to over-congestion on that edge. Therefore, we define another quality measure of a shortcut, congestion, as an upper bound on the number of shortcut subgraphs $H_i$ that contain any edge $e$.
Definition 1.
Let $G = (V, E)$ be an undirected graph with vertices subdivided into disjoint and connected subsets $P_1, \ldots, P_N$. In other words, $G[P_i]$ is connected and $P_i \cap P_j = \emptyset$ for $i \neq j$. The subsets are called parts. We define a shortcut $\mathcal{H}$ as a tuple of shortcut subgraphs $(H_1, \ldots, H_N)$, $H_i \subseteq G$. A shortcut is characterized by the following parameters:

$\mathcal{H}$ has congestion $c$ if each edge $e$ is used in at most $c$ different subgraphs $H_i$, i.e. $|\{i : e \in H_i\}| \le c$.

$\mathcal{H}$ has dilation $d$ if the diameter of any subgraph $G[P_i] + H_i$ is at most $d$.
The parameters determine the efficiency of communications facilitated by the shortcut. For example, Ghaffari and Haeupler show in [7] that one can solve the Minimum Spanning Tree and Min-Cut problems in $\tilde{O}(c + d)$ rounds, given an efficient algorithm for finding shortcuts with parameters $c$ and $d$. Note that congestion and dilation are traditional parameters that are extensively used in routing [14, 7].
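To make the two parameters concrete, the following sketch measures the congestion and dilation of a given shortcut on a small graph. It is a centralized illustration only; the function names and the input format (edge lists of node pairs, one shortcut edge list per part) are assumptions made for the example, and the dilation is taken over the subgraph formed by the edges inside a part together with its shortcut edges.

```python
from collections import deque


def diameter(nodes, edges):
    """Diameter of the undirected graph (nodes, edges); inf if it is disconnected."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    worst = 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # plain BFS from s
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if len(dist) < len(adj):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst


def shortcut_quality(graph_edges, parts, shortcuts):
    """Return (congestion, dilation) for shortcut edge sets shortcuts[i] of parts[i]."""
    def norm(e):
        return tuple(sorted(e))

    # Congestion: the largest number of shortcut subgraphs sharing one edge.
    use = {}
    for h in shortcuts:
        for e in {norm(e) for e in h}:
            use[e] = use.get(e, 0) + 1
    congestion = max(use.values(), default=0)

    # Dilation: the largest diameter of (edges inside the part) + (shortcut edges).
    dilation = 0
    for p, h in zip(parts, shortcuts):
        inside = {norm(e) for e in graph_edges if e[0] in p and e[1] in p}
        edges = inside | {norm(e) for e in h}
        nodes = set(p) | {v for e in edges for v in e}
        dilation = max(dilation, diameter(nodes, edges))
    return congestion, dilation
```

For instance, on a path 0-1-2-3-4 whose endpoints are both adjacent to an extra node 5, giving the path part the two edges through node 5 turns it into a 6-cycle and reduces its diameter from 4 to 3 at congestion 1.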
4 Tree-Restricted Shortcuts
In this section, we define tree-restricted shortcuts: a narrower notion of shortcuts which (i) are simpler to work with, (ii) are often equally powerful as the general shortcuts, (iii) offer deterministic routing schemes and, most importantly, (iv) can be efficiently constructed on any graph that contains them. Following the definitions, we rephrase the results of [7] in our new terms, showcase an efficient deterministic routing scheme on them, and finally state our main result and show its applications.
4.1 Definition
Tree-restricted shortcuts are shortcuts with the additional property that any shortcut subgraph $H_i$ is restricted to some spanning tree $T$. The user of the shortcut can typically fix any tree $T$, so a cogent choice would be the BFS tree because of its optimal depth.
Definition 2.
Let $\mathcal{H} = (H_1, \ldots, H_N)$ be a shortcut on the graph $G = (V, E)$ with respect to the parts $P_1, \ldots, P_N$. Given a rooted spanning tree $T \subseteq G$ we say that the shortcut $\mathcal{H}$ is $T$-restricted if $H_i \subseteq E(T)$ for each $i$, i.e. every edge of $H_i$ is a tree edge of $T$.
Congestion and dilation are still well-defined for tree-restricted shortcuts. However, it is more convenient to use an alternative block parameter in place of dilation. The block parameter upper-bounds the number of connected components of each $H_i$ that intersect $P_i$. Note that, while $G[P_i]$ and therefore $G[P_i] + H_i$ are connected, $H_i$ by itself might not be. The intersection property ensures that we do not count components that have no vertices in $P_i$.
Definition 3.
Let $\mathcal{H}$ be a $T$-restricted shortcut on the graph $G = (V, E)$ with respect to the parts $P_1, \ldots, P_N$. Fix a part $P_i$ and consider the connected components of the spanning subgraph $(V, H_i)$. If such a connected component intersects $P_i$ we call it a block component. Furthermore, we define the block parameter $b$ of $\mathcal{H}$ to be any upper bound on the number of block components over all parts.
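As a small illustration of Definition 3, the sketch below counts the block components of a single part. The function name and input format are hypothetical; the shortcut edges are assumed to be tree edges given as node pairs, and a part node touched by no shortcut edge counts as a singleton block component.

```python
def block_components(part, shortcut_edges):
    """Number of connected components of (V, shortcut_edges) that intersect `part`."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in shortcut_edges:
        parent[find(u)] = find(v)  # union the two endpoints' components
    # Components containing no node of the part are never looked up, so they are not counted.
    return len({find(v) for v in part})
```

The block parameter of a whole shortcut is then any upper bound on this count over all parts.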
A block parameter implies a bound on dilation, hence the block parameter can be seen as a stronger measure of quality. Lemma 1 argues that a block parameter of $b$ implies a dilation of $O(bD)$. The Lemma also suggests that it is often beneficial to fix $T$ to a BFS tree of $G$, thereby having asymptotically minimal depth. In that case, the depth of $T$ is at most the diameter of the original graph, namely $D$. For this reason throughout this paper we denote the diameter of $G$ and the depth of $T$ by the same symbol $D$.
Lemma 1.
Let $T$ be a spanning tree with depth $D$ and let $\mathcal{H}$ be a $T$-restricted shortcut with congestion $c$ and block parameter $b$ with respect to parts $P_1, \ldots, P_N$. Then the dilation of $\mathcal{H}$ is at most $b(2D + 1)$.
Proof.
Fix a part $P_i$. If we contract every block component of $H_i$ into a supernode and remove all other nodes, the resulting supergraph will contain at most $b$ supernodes and will be connected (because $G[P_i]$ is connected). Hence its diameter is at most $b$. Every supernode consists of a block component of diameter at most $2D$, so the diameter of $G[P_i] + H_i$ is at most $b(2D + 1)$. ∎
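The counting behind this bound can be written out explicitly. The following is a sketch under the assumptions that each block component is a subtree of a tree of depth $D$ (so any two of its nodes are joined by at most $2D$ tree edges) and that a shortest path in the supergraph visits each of the at most $b$ supernodes at most once:

\[
\text{dilation} \;\le\; \underbrace{b \cdot 2D}_{\text{edges inside supernodes}} \;+\; \underbrace{(b - 1)}_{\text{edges between supernodes}} \;\le\; b\,(2D + 1) \;=\; O(bD).
\]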
Distributed representation of a tree-restricted shortcut: Before we proceed to describe algorithms for shortcut routing and construction, we specify more precisely how a shortcut is represented distributedly, i.e., what information regarding the shortcut any node is supposed to know in order to make the various routing algorithms on top of a shortcut efficient.
Formally, we say that a $T$-restricted shortcut is computed when each node $v$ knows (i) the depth in $T$ of each of its neighbors and of itself, (ii) the subset of its incident edges that are tree edges of $T$, and (iii) all the part IDs that can use $v$’s parent edge as well as the depth of their respective block component root. For the sake of clarity, the described construction algorithms do not go into details about the computation of each of those properties. However, they can be easily augmented to compute them explicitly.
4.2 Shortcuts on Genus-Bounded and Planar Graphs
Tree-restricted shortcuts are particularly useful on genus-bounded (e.g. planar) graphs. In particular, we can reinterpret the low-congestion result of Haeupler and Ghaffari [7] using our notation.
Theorem 1 (Haeupler and Ghaffari [7]).
Let $G$ be a graph with genus $g$ and diameter $D$, and let $T$ be any tree with depth $D$ (e.g. a BFS tree). There exists a $T$-restricted shortcut with congestion $O(gD \log D)$ and block parameter $O(\log D)$.
The paper originally also provided an upper bound on the dilation of the shortcut. However, such a bound can be implicitly recovered from Lemma 1 and the block parameter $O(\log D)$. Note that the Theorem proves only the existence of such shortcuts. While the original paper does describe an algorithm that can in principle be used to compute them, it requires an embedding of $G$ on a surface of genus $g$. It is an open problem to compute such an embedding efficiently in the CONGEST model [7].
4.3 Routing on Tree-Restricted Shortcuts
In this section, we show how to use tree-restricted shortcuts to efficiently communicate within parts. The tree-restricted structure of the shortcut allows for simpler and more efficient routing methods than general shortcuts. The main reason for this is that distributed approaches for various network optimization problems often use broadcasting and convergecasting as primitives, and such tasks can be efficiently and deterministically solved on subtrees, even when multiple (non-disjoint) subtrees have to execute the task in parallel. Lemma 2 formalizes this statement.
Lemma 2 (Routing on trees).
Let $T$ be a tree of depth $D$. Given a family of subtrees such that any edge of $T$ is contained in at most $c$ subtrees, there is a simple deterministic algorithm that can perform a convergecast/broadcast on all of the subtrees in $O(D + c)$ CONGEST rounds.
Specifically, for convergecasts, if multiple messages are scheduled over the same edge, the algorithm forwards the packet with the smallest depth of the subtree root, breaking ties with the smallest ID of the subtree.
Proof.
The convergecast and broadcast operations are symmetric, so we will only prove the Lemma for convergecasts.
Let $v$ be a node of $T$. We will prove that no message gets transmitted along $v$’s parent edge after $h(v) + c$ rounds, where $h(v)$ is the height of $v$ (the distance to the farthest leaf in its subtree). Note that any message that gets transmitted along $v$’s parent edge must belong to a subtree that contains that edge. Let $(T_1, \ldots, T_k)$ be the tuple of subtrees that contain $v$’s parent edge, ordered by their priority (as described in the statement). In particular, we say that $T_j$ has priority $j$ in node $v$. The congestion condition stipulates that $k \le c$.
We will prove by induction that for every $j$ the message associated with $T_j$ will be transmitted no later than after $h(v) + j$ rounds. The claim clearly holds for the leaves of $T$. Note that (i) the relative priority-ordering between subtrees is unchanged in any node of $T$, and (ii) any subtree that contains a child of $v$ but not $v$’s parent edge will have lower priority than any subtree in $(T_1, \ldots, T_k)$.
Fix $j$. By the time $h(v) + j - 1$, all of the messages corresponding to $T_1, \ldots, T_{j-1}$ will be sent by the induction hypothesis, so it is sufficient to argue that at time $h(v) + j - 1$, $v$ has received the messages corresponding to $T_j$ from all of its children contained in $T_j$. But this is exactly the induction hypothesis, as for any such child $w$, its height is at most $h(v) - 1$ and the priority of $T_j$ in $w$ is at most $j$. Hence $v$ will send the message corresponding to $T_j$ in round $h(v) + j$ or before. ∎
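The scheduling rule of Lemma 2 can be tried out in a small round-by-round simulation. This is a centralized sketch under assumed conventions, not the paper's protocol text: `parent` maps each tree node to its parent (`None` for the root), each subtree is given as `(root, node set)` under an integer ID, the aggregate is a minimum, and every node may forward one message per round over its parent edge.

```python
def convergecast_rounds(parent, subtrees, values):
    """Simulate parallel min-convergecasts on subtrees of a rooted tree.

    parent: dict node -> parent node (None for the tree root).
    subtrees: dict subtree ID -> (root, set of nodes); each set is connected in the tree.
    values: dict (subtree ID, node) -> input value of that node for that subtree.
    Returns (number of rounds, dict subtree ID -> minimum collected at its root).
    """
    def depth(v):
        d = 0
        while parent[v] is not None:
            v = parent[v]
            d += 1
        return d

    agg, waiting = {}, {}
    for sid, (root, nodes) in subtrees.items():
        for v in nodes:
            agg[(sid, v)] = values[(sid, v)]
            # number of children of v inside this subtree that have not reported yet
            waiting[(sid, v)] = sum(1 for u in nodes if u != root and parent[u] == v)
    pending = {(sid, v) for sid, (root, nodes) in subtrees.items()
               for v in nodes if v != root}
    # Priority rule: smaller root depth first, ties broken by smaller subtree ID.
    prio = {sid: (depth(root), sid) for sid, (root, _) in subtrees.items()}

    rounds = 0
    while pending:
        rounds += 1
        chosen = {}  # node -> subtree whose message it forwards this round
        for sid, v in pending:
            if waiting[(sid, v)] == 0 and (v not in chosen or prio[sid] < prio[chosen[v]]):
                chosen[v] = sid
        for v, sid in chosen.items():  # all chosen messages are delivered simultaneously
            p = parent[v]
            agg[(sid, p)] = min(agg[(sid, p)], agg[(sid, v)])
            waiting[(sid, p)] -= 1
            pending.discard((sid, v))
    return rounds, {sid: agg[(sid, root)] for sid, (root, _) in subtrees.items()}
```

On a path rooted at node 0 with two overlapping subtrees, the lower-priority subtree is delayed by at most one round per competing subtree, so the round count stays within depth plus congestion, in line with the argument above.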
Convergecasting/broadcasting on a tree is helpful in tree-restricted shortcut routing because we can intuitively envision each shortcut subgraph as a family of subtrees (in our notation: block components). This communication within each block component will be the main building block of primitives that operate on entire parts in parallel.
Theorem 2 (Routing on tree-restricted shortcuts).
On a $T$-restricted shortcut with congestion $c$ and block parameter $b$ there are deterministic distributed algorithms for:

Electing a leader for each of the parts in parallel.

Convergecasting $O(\log n)$-bit messages to the leader of each part in parallel.

Broadcasting an $O(\log n)$-bit message from the leader of each part in parallel.
Each algorithm takes $O(b(D + c))$ CONGEST rounds.
Proof.
All of these algorithms have a common flavor: for each part we perceive its shortcut subgraph as a supergraph of at most $b$ supernodes where each supernode is a block component. We proceed to describe each of the algorithms on the supergraph and implicitly assume that intra-block communication happens after each step of the algorithm.
Communication within block components can be done in parallel using Lemma 2: all the nodes of a block component convergecast the relevant information to the block root and subsequently the block root broadcasts the result back.
Electing a leader for each part is performed by electing a leader for each supernode (block component) and broadcasting the leader to all neighboring supernodes for $b$ steps. Every supernode keeps the smallest leader ID ever seen as its current leader. After $b$ steps all the supernodes have the same leader. The algorithm requires $O(b(D + c))$ rounds as each of the $b$ broadcasting steps is followed by an intra-block communication step.
Broadcasting/convergecasting from/to the leader can be done by building a BFS tree from the leader supernode. We can utilize the standard distributed BFS algorithm on the supergraph requiring $O(b)$ steps. The algorithm similarly requires $O(b(D + c))$ rounds as each of the BFS steps is followed by an intra-block communication step. ∎
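The leader election step on the supergraph can be sketched as plain minimum-ID flooding. The code below is a centralized, hypothetical illustration: supernodes are identified by integer IDs, `super_adj` lists neighboring supernodes, and the intra-block communication that follows every step in the distributed setting is not modelled.

```python
def elect_leaders(super_adj, b):
    """Flood the minimum supernode ID for b synchronous steps.

    super_adj: dict supernode ID -> iterable of neighbouring supernode IDs.
    b: an upper bound on the number of supernodes (the block parameter).
    Returns dict supernode ID -> smallest ID it has seen, i.e. its current leader.
    """
    leader = {s: s for s in super_adj}
    for _ in range(b):
        new = dict(leader)  # synchronous step: read old values, write new ones
        for s, neighbours in super_adj.items():
            for t in neighbours:
                new[s] = min(new[s], leader[t])
        leader = new
    return leader
```

Since a connected supergraph with at most b supernodes has diameter below b, b flooding steps suffice for every supernode to learn the global minimum.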
We also state a simple technical lemma that will be needed for the tree-restricted shortcut construction.
Lemma 3.
Given a $T$-restricted shortcut with congestion $c$, a deterministic distributed algorithm can find all parts whose designated shortcut subgraph has at most $b$ block components. The algorithm executes in $O(b(D + c))$ rounds.
Proof.
Similarly to the proof of Theorem 2, we consider the supergraph of each shortcut subgraph for each part. We need to find all parts whose supergraphs have at most $b$ supernodes.
Each supernode broadcasts its leader for exactly $b$ steps and every supernode keeps the minimum ID as its current leader. Subsequently, each leader $\ell$ (there may be multiple ones as we have not bounded the block parameter) tries to build a BFS tree comprised of all the supernodes that believe $\ell$ is the leader. We can detect the existence of multiple leaders, as in that case some supernode will have a neighboring supernode in a different BFS tree and will report failure. If this is not the case (all the supernodes of a part belong to the same BFS tree), we can convergecast the number of supernodes to the root and subsequently broadcast the count back. ∎
Comparison with routing on general shortcuts: Ghaffari and Haeupler [7] give a method for routing on general shortcuts in rounds that (i) is randomized and (ii) assumes a leader is already elected for each part. They describe a process of leader election via a complicated randomized bootstrapping process that takes rounds. We contrast those results with our current tree-restricted shortcut routing where leader election is essentially no more difficult than broadcasting/convergecasting and the routing is simpler and deterministic. The downside is that non-tree-restricted shortcuts sometimes offer better quality guarantees and therefore better performance.
4.4 Main Result and Applications
The main contribution of the paper is to introduce a general framework for finding good-quality shortcuts in graphs where the only assurance is that they exist. In other words, no assumption on the topology is made.
Theorem 3.
Let $G$ be a graph with a spanning tree $T$ such that there exists a $T$-restricted shortcut with congestion $c$ and block parameter $b$. There exists a distributed algorithm that finds a $T$-restricted shortcut with congestion $O(c \log N)$ and block parameter $3b$ with high probability. The shortcut can be found in rounds.
We note that Theorems 1 and 3 immediately give a novel result: an algorithm for constructing shortcuts on bounded genus graphs.
Corollary 1.
Given a genus-$g$ graph with diameter $D$ and $N$ parts there is a distributed algorithm that computes a tree-restricted shortcut with congestion $O(gD \log D \log N)$ and block parameter $O(\log D)$ in $\tilde{O}(gD)$ rounds.
Next, we explain how to use tree-restricted shortcuts to distributedly compute the Minimum Spanning Tree (MST) on genus-$g$ graphs. Similarly to [7], we incorporate the shortcuts into the classic 1926 algorithm of Boruvka [20].
Lemma 4.
Given a genus-$g$ graph with $n$ nodes and diameter $D$, there is a distributed algorithm that computes the Minimum Spanning Tree in $\tilde{O}(gD)$ rounds.
For completeness we give a brief proof outline:
Proof.
Boruvka’s algorithm runs in $O(\log n)$ phases. Each phase starts with a partition of the graph into connected parts and a computed MST for each part. Initially, the algorithm starts with the trivial partition in which each node is in its own part. At each phase, each part $P_i$ suggests a merge along the minimum-weight edge going out of $P_i$. It is well-known that all such edges belong to some MST. By computing a tree-restricted shortcut for each part in $\tilde{O}(gD)$ rounds and using our convergecast algorithm on it in $\tilde{O}(gD)$ rounds we can compute the minimum-weight outgoing edge of each part. The only slight technical difficulty that remains is to assign IDs to parts which have merged. While we can communicate efficiently within each part using the previously computed shortcuts, many parts could chain together to form a new part. This can be avoided by restricting the merge shapes to be star graphs: each part becomes a head or a tail with probability $1/2$ each and we are only allowed to merge tails into heads. The number of phases remains $O(\log n)$ as every minimum-weight outgoing edge will be used for merging with probability at least $1/4$, thus reducing the expected number of parts by a constant factor. ∎
5 Constructing Tree-Restricted Shortcuts
In this section, we describe an algorithmic framework that solves the problem of finding near-optimal tree-restricted shortcuts.
5.1 Overview of the Algorithmic Framework
Our algorithm FindShortcut uses two separate subroutines:

Core: This subroutine finds a good-quality shortcut with respect to at least a constant fraction of the parts. As a prerequisite, we must compute and fix a tree $T$ with depth $D$ such that there exists a $T$-restricted shortcut with congestion $c$ and block parameter $b$. Note that we only assume its existence.
Lemma 5.
Let $T$ be a spanning tree with depth $D$ and assume there exists a $T$-restricted shortcut with congestion $c$ and block parameter $b$. The subroutine CoreFast finds a $T$-restricted shortcut $\mathcal{H}$ with the following properties:

The congestion of $\mathcal{H}$ is $O(c)$ with high probability.

There exists a subset of parts with size at least $N/2$ such that the shortcut subgraphs corresponding to parts in that subset have block parameter $3b$.
The subroutine takes CONGEST rounds to execute. Upon completion, each node knows, for each of its incident edges, which parts it is assigned to in $\mathcal{H}$.
We divide the exposition of the core subroutine into two versions: a deterministic and simpler CoreSlow requiring rounds; and a randomized CoreFast requiring rounds. We note that the CoreFast subroutine is the only randomized building block of our framework. Therefore, we can replace it with a deterministic (albeit slower) version at a cost of an additional factor.


Verification: This subroutine is used to check which of the $T$-restricted shortcut subgraphs found by the core subroutine have a sufficiently small block parameter (in particular, at most $3b$).
Lemma 6.
Given a tree $T$ with depth $D$ and a tentative $T$-restricted shortcut with congestion $c$, the deterministic subroutine Verification finds all parts whose designated shortcuts have at most $3b$ block components. The subroutine takes CONGEST rounds to execute. Upon completion, each node knows whether its part is in the set or not.
We call the parts whose designated shortcut subgraphs have this property good and the rest bad. FindShortcut runs the core subroutine followed by a verification step, after which parts that have been marked as good are removed. This is repeated until no more bad parts remain.
5.2 FindShortcut Algorithm
Before we dive into the FindShortcut subroutine we must fix a spanning tree . As the depth of determines the efficiency of our framework, we can choose to be a BFS tree rooted at any node of the graph . This choice ensures that the depth of is {enumerate*}[label=)]
asymptotically optimal and
bounded by the diameter of . For this reason throughout this paper we denote the diameter of and the depth of by the same symbol .
Computing a BFS tree in our distributed CONGEST model is a standard subroutine and can be computed in rounds. Henceforth we assume that a tree with depth is computed.
FindShortcut subroutine: We run the CoreFast subroutine, which computes a shortcut with congestion , but possibly an unacceptably large block parameter. The next step is to run the Verification subroutine, which finds all parts whose computed shortcut subgraphs have at most block components. We call those parts good and fix their computed shortcut subgraphs. These two steps are repeated on the remaining parts until all the parts have been marked as good. Whether any bad part remains can be checked via a convergecast on the entire tree .
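The iteration just described can be sketched as a simple loop. This is only an outline under stated assumptions: `core` and `verify` are hypothetical callables standing in for the CoreFast and Verification subroutines, and `max_iterations` is an illustrative safety bound, not a value taken from the analysis.

```python
def find_shortcut(parts, core, verify, max_iterations=64):
    """Sketch of the FindShortcut loop.

    core(remaining)   -> dict mapping each remaining part to a tentative
                         shortcut subgraph (hypothetical interface)
    verify(tentative) -> set of parts whose tentative subgraph is good
                         (hypothetical interface)
    """
    remaining = set(parts)
    fixed = {}                      # part -> shortcut subgraph fixed so far
    iterations = 0
    while remaining:                # in the network: a convergecast on the tree
        if iterations == max_iterations:
            raise RuntimeError("iteration budget exceeded")
        tentative = core(remaining)
        good = verify(tentative) & remaining
        for part in good:
            fixed[part] = tentative[part]
        remaining -= good           # good parts are removed from later rounds
        iterations += 1
    return fixed, iterations
```

If every iteration marks at least half of the remaining parts as good, the loop ends after a logarithmic number of iterations in the number of parts.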
Theorem (Restated Theorem 3).
Let be a graph with a spanning tree such that there exists a restricted shortcut with congestion and block parameter . There exists a distributed algorithm that finds a restricted shortcut with congestion and block parameter with high probability. The shortcut can be found in rounds.
Proof.
Let be the set of all shortcut subgraphs that have been marked as good over the lifetime of the entire subroutine. As any shortcut subgraph in has block parameter and congestion w.h.p., it only remains to show that the algorithm terminates.
By Lemma 5, in each iteration we find a shortcut with congestion and block parameter for at least half of the parts that have not yet been marked as good, w.h.p. This implies that after iterations all the parts are marked as good. This further implies that the congestion of is , as the congestion of the union of partial shortcuts is at most the sum of the congestions of the individual partial shortcuts.
Finally, the number of rounds is at most times the combined number of rounds of the CoreFast and Verification subroutines, namely . ∎
5.3 Warmup: An Version of the Core Subroutine
In this section, we explain a simpler and deterministic, but slower version of the core subroutine named CoreSlow that takes rounds. This is improved in the next section where we present a round version of the same subroutine.
On a high level, the subroutine takes each part and tries to assign all the ancestors of nodes in to its shortcut subgraph. This may, however, lead to large congestion on some edges. We mitigate that issue by declaring an edge unusable if more than parts try to use it. This ensures the congestion is . The process provably leads to a constant fraction of the parts having both small congestion and small block parameter.
Preliminaries: As usual, assume we have fixed a spanning tree of depth such that has a restricted shortcut with congestion and block parameter . During the execution of the algorithm some of the edges will be marked as unusable. Furthermore, we say that a tree edge can see a node if is in the subtree of and no edge on the unique simple path between the lower endpoint of and is unusable. Analogously, an edge can see a part ID if it can see any node in .
Outline of the CoreSlow subroutine: Initially, no edge is unusable. Process the (tree) edges of in order of decreasing depth (bottom to top). An edge is assigned to all the parts such that can see some node , but only if it would be assigned to at most such parts. If this is not the case (more than shortcut subgraphs would contain ), we mark this edge as unusable and proceed without assigning any part to it.
Detailed description of the CoreSlow subroutine: Each node maintains a list of part IDs that its parent edge can see. All the ’s are initially empty. The subroutine runs in phases where in the th phase all the nodes at depth update in parallel and send to their parents. The update for a node works by first receiving for all its children . We assign the union of all received lists and the singleton part ID of (if any) to . If , we assign the parent edge of to all the parts in and transmit to its parent (potentially requiring rounds). Otherwise, if , we declare the parent edge unusable.
A direct implementation of this would lead to a subroutine that takes rounds in the CONGEST model. Each of the levels must propagate at most part IDs to their parent nodes. However, this bottleneck can be improved by random sampling, as we show in the next section with the subroutine CoreFast.
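The bottom-up pass of CoreSlow can be simulated centrally as follows. This is a sketch under assumptions: the names `core_slow`, `part_of` and `threshold` are introduced here for illustration, the congestion threshold is passed in as a parameter rather than fixed to the value used in the analysis, and each tree edge is identified by its lower endpoint.

```python
def core_slow(parent, depth, part_of, threshold):
    """Centralized sketch of the CoreSlow bottom-up pass.

    parent[v]  : parent of v in the tree (None for the root)
    depth[v]   : depth of v in the tree
    part_of[v] : part ID of v; nodes in no part are absent or map to None
    threshold  : largest number of parts allowed on one edge (assumed input)
    Returns (assigned, unusable); both are keyed by the lower endpoint
    of the corresponding tree edge.
    """
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)

    seen = {}          # part IDs forwarded by v over its parent edge
    assigned = {}      # usable edge -> set of parts assigned to it
    unusable = set()
    for v in sorted(parent, key=lambda x: -depth[x]):   # deepest nodes first
        if parent[v] is None:
            continue                                    # root has no parent edge
        ids = set()
        if part_of.get(v) is not None:
            ids.add(part_of[v])
        for u in children[v]:
            ids |= seen[u]        # empty when the edge below is unusable
        if len(ids) <= threshold:
            assigned[v] = ids
            seen[v] = ids
        else:
            unusable.add(v)       # too many parts: nothing is forwarded
            seen[v] = set()
    return assigned, unusable
```

An unusable edge forwards nothing, which matches the definition of "can see": no edge above it can see any node below it.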
Lemma 7.
Let be a spanning tree with depth and assume there exists a restricted shortcut with congestion and block parameter . The subroutine CoreSlow finds a restricted shortcut with the following properties:

The congestion of is at most .

There exists a subset of parts with size at least such that the shortcut subgraphs corresponding to parts in have block parameter .
The subroutine takes CONGEST rounds to execute. Upon completion, each node knows, for each of its incident edges, which parts that edge is assigned to in .
Proof.
Let be any restricted shortcut with congestion and block parameter and let be the shortcut computed by CoreSlow. We call the canonical shortcut and the computed shortcut.
By construction, the congestion of is as any edge that would be assigned to more than shortcut subgraphs is marked as unusable. Hence we proved property 1.
Let be the set of unusable edges marked by the subroutine. In this paragraph we find an upper bound for . Consider blaming a part for congesting an unusable edge when and can see , i.e. edge was not in the canonical shortcut subgraph , but was congested by part (and ultimately declared unusable). Each part can be blamed at most times because each block component can only be blamed for the first unusable edge on its tree path towards the root. Furthermore, if is unusable, it takes at least different block components (from different parts) to be blamed for congesting . Therefore .
We say that a part missed an edge when and (consequently ). Furthermore, call a part bad if it missed at least edges and good otherwise. Note that if a part is good, the block parameter of is at most . This is because each missed edge induces a new block component in (more precisely, we can identify each block component of by either a unique block component of or a unique missed edge ). Consequently, it is sufficient to prove that the subroutine finds at least good parts.
As any unusable edge is contained in at most canonical shortcut subgraphs and for a part to be bad we need at least edges to be missed, we have that the number of bad parts is at most . Hence, the subroutine finds at least good shortcuts.
The number of rounds the subroutine takes is : on each of the depths of the tree, all the nodes in parallel must send up the tree the part IDs trying to use their parent edges. A node can send up to IDs, each requiring a round for its transmission. ∎
5.4 A Faster Version of the Core Subroutine
In this section, we describe a faster version of the core subroutine named CoreFast. On a high level, we lower the running time of CoreSlow by estimating the number of parts trying to use an edge by random sampling. In particular, each part becomes active with probability and we declare an edge unusable when active parts try to use that edge.
Preliminaries: In addition to the preliminaries of CoreSlow we need shared randomness between all the nodes within a part. In other words, all the nodes of the same part must have access to the same seeds for a pseudorandom generator. This can be done by sharing random bits among all the nodes of in rounds, as described in [7].
Outline of the CoreFast subroutine: Each part becomes active with probability where is a sufficiently large constant. We essentially run the CoreSlow subroutine, but instead of propagating all part IDs of , we propagate only the active ones and declare an edge unusable if at least (active) part IDs want to use it. Hence, by a standard Chernoff bound argument, we can claim with high probability that (1) we never propagate more than part IDs through an edge, (2) each unusable edge has at least part IDs trying to use that edge, and (3) each usable (non-congested) edge has at most part IDs. After determining which edges are unusable in , CoreFast must nevertheless find the complete set of part IDs that can use each edge. This is a tree routing problem where each message (part ID) has to be routed up the tree until the first unusable edge. No message needs to travel more than edges and no edge needs to transmit more than different part IDs w.h.p. Hence this routing can be done in rounds using Lemma 2.
Detailed description of the CoreFast subroutine: Due to shared randomness, each part independently becomes active with probability (all the nodes within the part agree on this label). As in CoreSlow, each node maintains a list of active part IDs that its parent edge can see. All the lists are initially empty. The subroutine runs in phases where in the th phase all the nodes at depth try to update in parallel and send to their parents. The update for a node works by first receiving for all its children . We assign the union of all received lists and the singleton part ID of (if any) to . If , we assign the parent edge of to all the parts in and transmit to its parent (requiring rounds). Otherwise, we declare the parent edge unusable, as in CoreSlow. This finalizes the first part of the subroutine where we determine all unusable edges. It remains to forward the complete set of part IDs (and not just the sampled ones) that can use some edge to the endpoints of . This is a classic tree routing problem where no route has length larger than and no edge intersects more than paths w.h.p. Lemma 2 provides a method to route all part IDs in at most rounds. Note that any two part IDs whose routes share an edge have the same endpoint (lowest unusable ancestor edge), so any routing priority between the messages gives the aforementioned bound w.h.p.
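The sampling step can be illustrated with the following sketch, which shows how nodes holding the same random seed reach the same activation decision without communicating. Deriving the decision from a SHA-256 hash of the seed is an assumption of this example, not the mechanism of the paper, and the names `is_active` and `sample_active` are hypothetical.

```python
import hashlib

def is_active(part_seed, prob):
    """Decide whether a part is active, given the seed shared by its nodes.

    part_seed : bytes known to every node of the part (assumed input)
    prob      : activation probability in [0, 1]
    Every node that evaluates this with the same seed gets the same answer.
    """
    digest = hashlib.sha256(part_seed).digest()
    value = int.from_bytes(digest[:8], "big")    # integer in [0, 2**64)
    return value < int(prob * 2**64)             # integer comparison

def sample_active(seeds, prob):
    """Return the IDs of the active parts; `seeds` maps part ID -> seed."""
    return {part for part, seed in seeds.items() if is_active(seed, prob)}
```

Only the IDs returned by `sample_active` would be propagated in the first phase; the full set of part IDs is routed afterwards, once the unusable edges are known.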
Lemma (Restated Lemma 5).
Let be a spanning tree with depth and assume there exists a restricted shortcut with congestion and block parameter . The subroutine CoreFast finds a restricted shortcut with the following properties:

The congestion of is at most with high probability.

There exists a subset of parts with size at least such that the shortcut subgraphs corresponding to parts in have block parameter .
The subroutine takes CONGEST rounds to execute. Upon completion, each node knows, for each of its incident edges, which parts that edge is assigned to in .
Proof.
This proof makes extensive use of the methods from the proof of Lemma 7. For completeness, we redefine all of the terminology used and reprove all of the intermediate results.
Let be any restricted shortcut with congestion and block parameter and let be the shortcut computed by CoreFast. We call the canonical shortcut and the computed shortcut.
As , a standard Chernoff bound argument demonstrates that any edge that is not marked as unusable can see at most different part IDs w.h.p. Hence, the congestion of is w.h.p.
Let be the set of unusable edges marked by the subroutine. In this paragraph we find an upper bound for . Consider blaming a part for congesting an unusable edge when and can see , i.e. edge was not in the canonical shortcut subgraph , but was congested by part (and ultimately declared unusable). We can similarly argue via a Chernoff bound that each unusable edge can see at least parts, hence we blame at least parts for congesting . Each part can be blamed at most times because each block component can only be blamed for the first unusable edge on its tree path towards the root. Furthermore, if is unusable, it takes at least different block components (from different parts) to be blamed for congesting . Therefore .
We say that a part missed an edge when and (consequently ). Furthermore, call a part bad if it missed at least edges and good otherwise. Note that if a part is good, the block parameter of is at most . This is because each missed edge induces a new block component in (more precisely, we can identify each block component of by either a unique block component of or a unique missed edge ). Consequently, it is sufficient to prove that the subroutine finds at least good parts.
As any unusable edge is contained in at most canonical shortcut subgraphs and for a part to be bad we need at least edges to be missed, we have that the number of bad parts is at most . Hence, the subroutine finds at least good shortcuts.
The number of rounds the subroutine takes is : on each of the depths of the tree, all the nodes in parallel must send the active part IDs that their parent edges can see. If an edge is not unusable, a Chernoff bound proves that at most active part IDs can be seen from , hence the number of rounds for determining unusable edges is w.h.p.
Propagating the part IDs upwards along as described in Lemma 2 takes rounds, bringing the total number of rounds to . ∎
5.5 Verification Subroutine
In this section, we describe the Verification subroutine. Given a tree-restricted shortcut with congestion and possibly unbounded block parameter, it inspects each of the shortcut subgraphs in parallel and marks the ones that have at most block components.
The subroutine runs precisely the algorithm described in Lemma 3 which we restate here.
Lemma (Restated Lemma 3).
Given a restricted shortcut with congestion , a deterministic distributed algorithm can find all parts whose designated shortcut subgraph has at most block components. The algorithm executes in rounds.
This lemma provides a direct method to implement the formal requirements of the Verification subroutine, which we restate here for clarity.
Lemma (Restated Lemma 6).
Given a tree with depth and a tentative restricted shortcut with congestion , the deterministic subroutine Verification finds all parts whose designated shortcuts have at most block components. The subroutine takes CONGEST rounds to execute. Upon completion, each node knows whether its part is in the set or not.
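What Verification decides for a single part can be sketched centrally with a union-find structure. The sketch assumes that the block components of a part are the connected components formed by its shortcut edges together with the nodes of the part, so that a part node touched by no shortcut edge counts as its own component. That definition is not restated in this section and is an assumption here, as are the names `count_block_components` and `good_parts`.

```python
def count_block_components(part_nodes, shortcut_edges):
    """Count connected components of the part's nodes plus its shortcut edges."""
    leader = {}

    def find(x):
        leader.setdefault(x, x)
        while leader[x] != x:
            leader[x] = leader[leader[x]]   # path halving
            x = leader[x]
        return x

    for v in part_nodes:
        find(v)                             # register isolated part nodes
    for u, v in shortcut_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            leader[ru] = rv                 # merge the two components
    return len({find(x) for x in list(leader)})

def good_parts(parts, limit):
    """`parts` maps part ID -> (part_nodes, shortcut_edges); keep those within `limit`."""
    return {pid for pid, (nodes, edges) in parts.items()
            if count_block_components(nodes, edges) <= limit}
```

The distributed subroutine obtains the same per-part answer without any node collecting the whole subgraph; the sketch only illustrates the quantity being checked.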
References
 [1] A. Das Sarma, S. Holzer, L. Kor, A. Korman, D. Nanongkai, G. Pandurangan, D. Peleg, and R. Wattenhofer. Distributed verification and hardness of distributed approximation. In Proc. of the Symp. on Theory of Comp. (STOC), pages 363–372, 2011.
 [2] M. Elkin. Unconditional lower bounds on the time-approximation tradeoffs for the distributed minimum spanning tree problem. In Proc. of the Symp. on Theory of Comp. (STOC), pages 331–340, 2004.
 [3] M. Elkin. An unconditional lower bound on the time-approximation tradeoff for the distributed minimum spanning tree problem. SIAM Journal on Computing, 36(2):433–456, 2006.
 [4] S. Frischknecht, S. Holzer, and R. Wattenhofer. Networks cannot compute their diameter in sublinear time. In Proc. of ACM-SIAM Symp. on Disc. Alg. (SODA), pages 1150–1162, 2012.
 [5] J. Garay, S. Kutten, and D. Peleg. A sublinear time distributed algorithm for minimum-weight spanning trees. In Proc. of the Symp. on Found. of Comp. Sci. (FOCS), 1993.
 [6] M. Ghaffari and B. Haeupler. Distributed algorithms for planar networks I: Planar embedding. Manuscript, 2015.
 [7] M. Ghaffari and B. Haeupler. Distributed algorithms for planar networks II: Low-congestion shortcuts, MST, and min-cut. In Proc. of ACM-SIAM Symp. on Disc. Alg. (SODA), pages 202–219. SIAM, 2016.
 [8] M. Ghaffari, A. Karrenbauer, F. Kuhn, C. Lenzen, and B. Patt-Shamir. Near-optimal distributed maximum flow: Extended abstract. In the Proc. of the Int’l Symp. on Princ. of Dist. Comp. (PODC), pages 81–90, 2015.
 [9] M. Ghaffari and F. Kuhn. Distributed minimum cut approximation. In Proc. of the Int’l Symp. on Dist. Comp. (DISC), pages 1–15, 2013.
 [10] S. Holzer and R. Wattenhofer. Optimal distributed all pairs shortest paths and applications. In the Proc. of the Int’l Symp. on Princ. of Dist. Comp. (PODC), pages 355–364, 2012.
 [11] T. Izumi and R. Wattenhofer. Time lower bounds for distributed distance oracles. In Proc. of the International Conference on Principles of Distributed Systems, pages 60–75, 2014.
 [12] M. Khan and G. Pandurangan. A fast distributed approximation algorithm for minimum spanning trees. Distributed Computing, 20(6):391–402, 2008.
 [13] S. Kutten and D. Peleg. Fast distributed construction of k-dominating sets and applications. In the Proc. of the Int’l Symp. on Princ. of Dist. Comp. (PODC), pages 238–251, 1995.
 [14] F. T. Leighton, B. M. Maggs, and S. B. Rao. Packet routing and job-shop scheduling in O(congestion + dilation) steps. Combinatorica, 14(2):167–186, 1994.
 [15] C. Lenzen and B. Patt-Shamir. Fast routing table construction using small messages: Extended abstract. In Proc. of the Symp. on Theory of Comp. (STOC), pages 381–390, 2013.
 [16] C. Lenzen and B. Patt-Shamir. Fast partial distance estimation and applications. In the Proc. of the Int’l Symp. on Princ. of Dist. Comp. (PODC), pages 153–162, 2015.
 [17] C. Lenzen and D. Peleg. Efficient distributed source detection with limited bandwidth. In the Proc. of the Int’l Symp. on Princ. of Dist. Comp. (PODC), pages 375–382, 2013.
 [18] D. Nanongkai. Distributed approximation algorithms for weighted shortest paths. In Proc. of the Symp. on Theory of Comp. (STOC), pages 565–573, 2014.
 [19] D. Nanongkai and H.-H. Su. Almost-tight distributed minimum cut algorithms. In Proc. of the Int’l Symp. on Dist. Comp. (DISC), pages 439–453, 2014.
 [20] J. Nešetřil, E. Milková, and H. Nešetřilová. Otakar Borůvka on minimum spanning tree problem: Translation of both the 1926 papers, comments, history. Discrete Math., 233(1):3–36, 2001.
 [21] D. Peleg. Distributed Computing: A Locality-sensitive Approach. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.
 [22] D. Peleg and V. Rubinovich. A near-tight lower bound on the time complexity of distributed MST construction. In Proc. of the Symp. on Found. of Comp. Sci. (FOCS), pages 253–, 1999.
Appendix A Shortcut Construction in the Case of Unknown Parameters
The algorithm presented in Section 5 assumes that upper bounds on and are available. That is, each node must know those values before starting the algorithm. Fortunately, the lack of that knowledge is not a problem for our algorithm. A key property of our construction algorithm is that it inherently includes termination detection, which allows us to use a simple doubling mechanism: we start the first trial with small estimates of the parameters and, if the construction fails, execute the next trial after doubling the estimates. This mechanism removes the requirement of knowing and/or at the cost of an extra factor in the running time. It should be noted that this mechanism can yield much better shortcuts than the theoretical bound. For example, even for graphs with large genus , the algorithm can find a good (i.e. congestion) shortcut if one happens to exist.
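The doubling mechanism can be sketched as follows. The callable `try_construct` is a hypothetical stand-in for one trial of the construction with the given parameter estimates, returning `None` when the trial's built-in termination detection reports failure. The starting estimates and the `max_trials` bound are illustrative assumptions.

```python
def construct_with_doubling(try_construct, c0=1, b0=1, max_trials=32):
    """Sketch of the doubling search over the two parameter estimates.

    try_construct(c, b) -> a shortcut, or None if the trial failed
                           (hypothetical interface)
    Returns (shortcut, c, b, trials_used).
    """
    c, b = c0, b0
    for trial in range(1, max_trials + 1):
        shortcut = try_construct(c, b)
        if shortcut is not None:
            return shortcut, c, b, trial
        c, b = 2 * c, 2 * b          # double both estimates and retry
    raise RuntimeError("no shortcut found within the trial budget")
```

Because the estimates grow geometrically, the number of trials is logarithmic in the larger of the two true parameters, and the search stops as soon as a trial succeeds, even if the estimates are still below the worst-case bounds.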