Improved Distributed Expander Decomposition and Nearly Optimal Triangle Enumeration


Yi-Jun Chang University of Michigan, USA Thatchaphol Saranurak Toyota Technological Institute at Chicago, USA

Abstract

An $(\epsilon, \phi)$-expander decomposition of a graph $G = (V, E)$ is a clustering of the vertices $V = V_1 \cup \cdots \cup V_x$ such that (1) each cluster $V_i$ induces a subgraph with conductance at least $\phi$, and (2) the number of inter-cluster edges is at most $\epsilon |E|$. This decomposition has a wide range of applications in the centralized setting, including approximation algorithms for unique games, algorithms for flow and cut problems, and dynamic graph algorithms. Recently, the first application of expander decomposition in distributed computing was found. Chang, Pettie, and Zhang [SODA'19] showed that a variant of expander decomposition can be computed efficiently in the $\mathsf{CONGEST}$ model, and they used it to show that triangle enumeration can be solved in $\tilde{O}(n^{1/2})$ rounds, improving upon the $\tilde{O}(n^{3/4})$-round algorithm by Izumi and Le Gall [PODC'17]. It is conceivable that expander decomposition will find more applications in distributed computing.

In this paper, we give an improved distributed expander decomposition and obtain a nearly optimal distributed triangle enumeration algorithm in the $\mathsf{CONGEST}$ model. Specifically, we construct an $(\epsilon, \phi)$-expander decomposition with $\phi = (\epsilon / \log n)^{2^{O(k)}}$ in $O(n^{2/k} \cdot \mathrm{poly}(1/\phi, \log n))$ rounds for any $\epsilon \in (0,1)$ and positive integer $k$. For example, a $(0.01, 1/\mathrm{poly}\log n)$-expander decomposition can be computed in $O(n^{\gamma})$ rounds, for any arbitrarily small constant $\gamma > 0$, and an expander decomposition with $\epsilon = 1/\mathrm{poly}\log n$ only requires $n^{o(1)}$ rounds to compute, which is optimal up to subpolynomial factors. Previously, the algorithm by Chang, Pettie, and Zhang can construct a $(1/6, 1/\mathrm{poly}\log n)$-expander decomposition using $\tilde{O}(n^{1-\delta})$ rounds for any $\delta \in (0, 1/2]$, with a caveat that the algorithm is allowed to throw away some edges into an extra part which forms a subgraph with arboricity at most $n^{\delta}$. Our algorithm does not have this caveat.

By slightly modifying the distributed algorithm for routing on expanders by Ghaffari, Kuhn, and Su [PODC'17], we obtain a triangle enumeration algorithm using $\tilde{O}(n^{1/3})$ rounds. This matches the $\tilde{\Omega}(n^{1/3})$ lower bound by Izumi and Le Gall [PODC'17] and Pandurangan, Robinson, and Scquizzato [SPAA'18], which holds even in the $\mathsf{CONGESTED\text{-}CLIQUE}$ model. To the best of our knowledge, this provides the first non-trivial example of a distributed problem that has essentially the same complexity (up to a polylogarithmic factor) in both $\mathsf{CONGEST}$ and $\mathsf{CONGESTED\text{-}CLIQUE}$.

The key technique in our proof is the first distributed approximation algorithm for finding a low-conductance cut that is as balanced as possible. Previous distributed sparse cut algorithms do not have this nearly most balanced guarantee. (Kuhn and Molla [22] previously claimed that their approximate sparse cut algorithm also has the nearly most balanced guarantee, but this claim turns out to be incorrect [4, Footnote 3].)

1 Introduction

In this paper, we consider the task of finding an expander decomposition of a distributed network in the $\mathsf{CONGEST}$ model of distributed computing. Roughly speaking, an expander decomposition of a graph is a clustering of the vertices such that (1) each component induces a high-conductance subgraph, and (2) the number of inter-component edges is small. This natural bicriteria optimization problem of finding a good expander decomposition was introduced by Kannan, Vempala, and Vetta [19], and was further studied in many subsequent works [38, 28, 30, 2, 40, 27, 33]. (The existence of the expander decomposition was first exploited, implicitly, in the context of property testing [14].) The expander decomposition has a wide range of applications: it has been applied to solving linear systems [39], unique games [1, 40, 32], minimum cut [20], and dynamic algorithms [26].

Recently, Chang, Pettie, and Zhang [4] applied this technique to the field of distributed computing, and they showed that a variant of expander decomposition can be computed efficiently in $\mathsf{CONGEST}$. Using this decomposition, they showed that triangle detection and enumeration can be solved in $\tilde{O}(n^{1/2})$ rounds. (The $\tilde{O}(\cdot)$ notation hides polylogarithmic factors.) The previous state-of-the-art bounds for triangle detection and enumeration were $\tilde{O}(n^{2/3})$ and $\tilde{O}(n^{3/4})$, respectively, due to Izumi and Le Gall [16]. Later, Daga et al. [7] exploited this decomposition to obtain the first algorithm for computing the edge connectivity of a graph exactly in a sublinear number of rounds.

Specifically, the variant of the decomposition in [4] is as follows. If we allow one extra part that induces a subgraph of arboricity $n^{\delta}$ (the arboricity of a graph is the minimum number of forests into which its edge set can be partitioned), then in $\tilde{O}(n^{1-\delta})$ rounds we can construct an expander decomposition in $\mathsf{CONGEST}$ such that each remaining component has conductance $1/\mathrm{poly}\log n$ and the number of inter-component edges is at most $|E|/6$.

A major open problem left by the work [4] is to design an efficient distributed algorithm constructing an expander decomposition without the extra low-arboricity part. In this work, we show that this is possible. A consequence of our new expander decomposition algorithm is that triangle enumeration can be solved in $\tilde{O}(n^{1/3})$ rounds, matching the $\tilde{\Omega}(n^{1/3})$ lower bound [16, 29] up to a polylogarithmic factor.

The $\mathsf{CONGEST}$ Model.

In the $\mathsf{CONGEST}$ model of distributed computing, the underlying distributed network is represented as an undirected graph $G = (V, E)$, where each vertex corresponds to a computational device, and each edge corresponds to a bi-directional communication link. Each vertex $v$ has a distinct $\Theta(\log n)$-bit identifier, where $n = |V|$. The computation proceeds in synchronized rounds. In each round, each vertex can perform unlimited local computation, and may send a distinct $O(\log n)$-bit message to each of its neighbors. Throughout the paper we only consider the randomized variant of $\mathsf{CONGEST}$: each vertex is allowed to generate unlimited local random bits, but there is no global randomness. We say that an algorithm succeeds with high probability (w.h.p.) if its failure probability is at most $1/\mathrm{poly}(n)$.
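As an aside, the synchronous round structure can be mirrored in a toy centralized simulator; the following Python sketch (all names are ours, purely illustrative and not from the paper) runs a flooding protocol in which every vertex forwards one short message per incident edge per round, computing BFS distances from a source:

```python
# Toy synchronous message-passing simulator in the spirit of CONGEST:
# in each round, every vertex sends one short message per incident edge.
# All names here are illustrative, not from the paper.

def bfs_distances(adj, source):
    """Compute hop distances from `source` by synchronous flooding."""
    dist = {source: 0}
    rounds = 0
    frontier = {source}
    while frontier:
        rounds += 1
        # One round: every frontier vertex sends its distance to all
        # neighbors (a hop-count fits in an O(log n)-bit message).
        next_frontier = set()
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    next_frontier.add(v)
        frontier = next_frontier
    return dist, rounds

# 4-cycle: vertex 2 is two hops from vertex 0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
dist, rounds = bfs_distances(adj, 0)
```

In a real $\mathsf{CONGEST}$ algorithm the per-edge bandwidth limit of $O(\log n)$ bits per round is the binding constraint; the simulator only illustrates the synchronous round structure.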

The $\mathsf{CONGESTED\text{-}CLIQUE}$ model is a variant of $\mathsf{CONGEST}$ that allows all-to-all communication, and the $\mathsf{LOCAL}$ model is a variant of $\mathsf{CONGEST}$ that allows messages of unbounded length.

Terminology.

Before we proceed, we review the graph terminology related to the expander decomposition. Consider a graph $G = (V, E)$. For a vertex subset $S \subseteq V$, we write $\mathrm{vol}(S)$ to denote $\sum_{v \in S} \deg(v)$. Note that by default the degree is with respect to the original graph $G$. We write $\bar{S} = V \setminus S$, and let $\partial(S)$ be the set of edges $\{u, v\}$ with $u \in S$ and $v \in \bar{S}$. The sparsity or conductance of a cut $(S, \bar{S})$ is defined as $\Phi(S) = |\partial(S)| / \min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\}$. The conductance $\Phi_G$ of a graph $G$ is the minimum value of $\Phi(S)$ over all nonempty proper vertex subsets $S \subsetneq V$. Define the balance of a cut $S$ by $\mathrm{bal}(S) = \min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\} / \mathrm{vol}(V)$. We say that $S$ is a most-balanced cut of $G$ of conductance at most $\phi$ if $\mathrm{bal}(S)$ is maximized among all cuts of $G$ with conductance at most $\phi$. We have the following relation [17] between the mixing time $\tau_{\mathrm{mix}}(G)$ and the conductance $\Phi_G$: $\Omega(1/\Phi_G) \le \tau_{\mathrm{mix}}(G) \le O(\Phi_G^{-2} \log n)$.

Let $S$ be a vertex set. Denote by $E(S)$ the set of all edges whose two endpoints are both within $S$. We write $G[S]$ to denote the subgraph induced by $S$, and we write $G\{S\}$ to denote the graph resulting from adding $\deg_G(v) - \deg_{G[S]}(v)$ self loops to each vertex $v \in S$. Note that the degree of each vertex $v \in S$ is identical in both $G$ and $G\{S\}$. As in [35], each self loop of $v$ contributes 1 in the calculation of $\deg(v)$. Observe that we always have $\Phi_{G\{S\}} \le \Phi_{G[S]}$.

Let $v$ be a vertex. Denote by $N(v)$ the set of neighbors of $v$. These notations $\mathrm{vol}(\cdot)$, $\Phi(\cdot)$, and $N(\cdot)$ depend on the underlying graph $G$. When the choice of the underlying graph is not clear from the context, we use a subscript to indicate the underlying graph we refer to.
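As a concrete companion to these definitions, the following sketch (function and variable names are ours, for illustration only) computes $\mathrm{vol}(S)$, $|\partial(S)|$, $\Phi(S)$, and $\mathrm{bal}(S)$ for a small example graph:

```python
def cut_stats(edges, S, V):
    """Return vol(S), |boundary(S)|, conductance Phi(S), balance bal(S)."""
    S, V = set(S), set(V)
    # Degrees with respect to the whole graph, as in the paper's convention.
    deg = {v: 0 for v in V}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    vol = lambda X: sum(deg[x] for x in X)
    # Boundary edges: exactly one endpoint inside S.
    boundary = [(u, v) for u, v in edges if (u in S) != (v in S)]
    small = min(vol(S), vol(V - S))
    return vol(S), len(boundary), len(boundary) / small, small / vol(V)

# Two triangles joined by one bridge edge: a sparse, perfectly balanced cut.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
stats = cut_stats(edges, {0, 1, 2}, range(6))
```

Here the cut $S = \{0, 1, 2\}$ has $\mathrm{vol}(S) = 7$, one boundary edge, conductance $1/7$, and balance $1/2$.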

Expander Decomposition.

An $(\epsilon, \phi)$-expander decomposition of a graph $G = (V, E)$ is defined as a partition $V = V_1 \cup \cdots \cup V_x$ of the vertex set satisfying the following conditions.

  • For each component $V_i$, we have $\Phi_{G[V_i]} \ge \phi$.

  • The number of inter-component edges is at most $\epsilon |E|$.
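To make the two conditions concrete, here is an illustrative brute-force checker (our own code, not the paper's algorithm) that verifies whether a given partition is an $(\epsilon, \phi)$-expander decomposition of a small graph:

```python
from itertools import combinations

def conductance(edges, V):
    """Minimum conductance over all cuts of the graph (V, edges); brute force."""
    V = list(V)
    deg = {v: 0 for v in V}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    best = float('inf')
    for r in range(1, len(V)):
        for S in combinations(V, r):
            S = set(S)
            cut = sum(1 for u, v in edges if (u in S) != (v in S))
            small = min(sum(deg[v] for v in S),
                        sum(deg[v] for v in V if v not in S))
            if small > 0:
                best = min(best, cut / small)
    return best

def is_expander_decomposition(edges, parts, eps, phi):
    """Check both conditions of an (eps, phi)-expander decomposition."""
    part_of = {v: i for i, P in enumerate(parts) for v in P}
    inter = sum(1 for u, v in edges if part_of[u] != part_of[v])
    if inter > eps * len(edges):
        return False  # too many inter-component edges
    for P in parts:
        inside = [(u, v) for u, v in edges if u in P and v in P]
        if len(P) > 1 and conductance(inside, P) < phi:
            return False  # some part is not a phi-expander
    return True

# Two triangles joined by a bridge, split into the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
ok = is_expander_decomposition(edges, [{0, 1, 2}, {3, 4, 5}], eps=1/7, phi=0.5)
```

Only the single bridge edge is inter-component, and each triangle has conductance $1$, so the check succeeds; keeping the whole graph as one part would fail, since the bridge cut has conductance $1/7 < \phi$.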

The main contribution of this paper is the following result.

Theorem 1. Let $\epsilon \in (0,1)$, and let $k$ be a positive integer. An $(\epsilon, \phi)$-expander decomposition with $\phi = (\epsilon / \log n)^{2^{O(k)}}$ can be constructed in $O(n^{2/k} \cdot \mathrm{poly}(1/\phi, \log n))$ rounds, w.h.p.

The proof of Theorem 1 is in Section 2. We emphasize that the number of rounds does not depend on the diameter of $G$. There is a trade-off between the two parameters $\phi$ and $k$. For example, an $(\epsilon, \phi)$-expander decomposition with $\epsilon = 0.01$ and $\phi = 1/\mathrm{poly}\log n$ can be constructed in $O(n^{\gamma})$ rounds, for any arbitrarily small constant $\gamma > 0$, by setting $k$ to be a sufficiently large constant in Theorem 1. If we are allowed to spend $n^{o(1)}$ rounds, then we can achieve $\epsilon = 1/\mathrm{poly}\log n$.

Distributed Triangle Finding.

Variants of the triangle finding problem have been studied in the literature [3, 4, 8, 9, 10, 29, 16]. In the triangle detection problem, at least one vertex must report a triangle if the graph has at least one triangle. In the triangle enumeration problem, each triangle of the graph must be reported by at least one vertex. Both of these problems can be solved in $O(1)$ rounds in $\mathsf{LOCAL}$. It is the bandwidth constraint of $\mathsf{CONGEST}$ and $\mathsf{CONGESTED\text{-}CLIQUE}$ that makes these problems non-trivial.

It is important that a triangle $\{u, v, w\}$ is allowed to be reported by a vertex outside of $\{u, v, w\}$. If it is required that each triangle be reported by one of its own three vertices, then there is a near-linear lower bound [16] for triangle enumeration, in both $\mathsf{CONGEST}$ and $\mathsf{CONGESTED\text{-}CLIQUE}$. To achieve a sublinear round complexity, it is therefore necessary that some triangles are reported by vertices not in the triangle.

Dolev, Lenzen, and Peled [8] showed that triangle enumeration can be solved deterministically in $\tilde{O}(n^{1/3})$ rounds in $\mathsf{CONGESTED\text{-}CLIQUE}$. This algorithm is optimal, as it matches the $\tilde{\Omega}(n^{1/3})$-round lower bound [16, 29] in $\mathsf{CONGESTED\text{-}CLIQUE}$. Interestingly, if we only want to detect one triangle or count the number of triangles, then Censor-Hillel et al. [3] showed that the round complexity in $\mathsf{CONGESTED\text{-}CLIQUE}$ can be improved to $O(n^{1-2/\omega}) = O(n^{0.158})$, where $\omega < 2.373$ is the exponent of the complexity of matrix multiplication [23].
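As background on the algebraic connection exploited by [3]: for a simple graph with adjacency matrix $A$, the number of triangles equals $\mathrm{tr}(A^3)/6$, which is why fast matrix multiplication accelerates counting. A centralized sketch of this standard identity (not the distributed algorithm of [3]):

```python
def count_triangles(n, edges):
    """Count triangles via the identity #triangles = trace(A^3) / 6."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    # A2 = A * A, then trace(A2 * A). A fast matrix-multiplication routine
    # could replace these cubic loops; that is the source of the speedup.
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    trace = sum(A2[i][j] * A[j][i] for i in range(n) for j in range(n))
    # Each triangle is counted 6 times: 3 starting vertices x 2 directions.
    return trace // 6

# Two triangles joined by a bridge: exactly 2 triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
t = count_triangles(6, edges)
```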

For the $\mathsf{CONGEST}$ model, Izumi and Le Gall [16] showed that the triangle detection and enumeration problems can be solved in $\tilde{O}(n^{2/3})$ and $\tilde{O}(n^{3/4})$ rounds, respectively. These upper bounds were later improved to $\tilde{O}(n^{1/2})$ by Chang, Pettie, and Zhang using a variant of expander decomposition [4].

A consequence of Theorem 1 is that triangle enumeration (and hence detection) can be solved in $\tilde{O}(n^{1/3})$ rounds, almost matching the $\tilde{\Omega}(n^{1/3})$ lower bound [16, 29], which holds even in $\mathsf{CONGESTED\text{-}CLIQUE}$. To the best of our knowledge, this provides the first non-trivial example of a distributed problem that has essentially the same complexity (up to a polylogarithmic factor) in both $\mathsf{CONGEST}$ and $\mathsf{CONGESTED\text{-}CLIQUE}$, i.e., allowing non-local communication links does not help. In contrast, many other graph problems can be solved much more efficiently in $\mathsf{CONGESTED\text{-}CLIQUE}$ than in $\mathsf{CONGEST}$; see, e.g., [18, 13].

Theorem 2. Triangle enumeration can be solved in $\tilde{O}(n^{1/3})$ rounds in $\mathsf{CONGEST}$, w.h.p.

The proof of Theorem 2 is in Section 3.

1.1 Prior Work on Expander Decomposition

In the centralized setting, the first polynomial-time algorithm for constructing an expander decomposition is by Kannan, Vempala, and Vetta [19]. Afterward, Spielman and Teng [37, 38] significantly improved the running time to be near-linear in $m$, the number of edges. Within this running time, they can construct a "weak" expander decomposition. Their weak expander decomposition has only the following weaker guarantee: each part $U$ in the partition might not induce an expander, and we only know that $U$ is contained in some unknown expander. That is, there exists some $W \supseteq U$ whose induced graph has conductance at least $\phi$. Although this guarantee suffices for many applications (e.g., [21, 6]), some other applications [26, 5], including the triangle enumeration algorithm of [4], crucially need the fact that each part in the decomposition induces an expander.

Nanongkai and Saranurak [25] and, independently, Wulff-Nilsen [41] gave fast algorithms that do not weaken the guarantee as in [37, 38]. In [25], the algorithm finds an expander decomposition in time $m^{1+o(1)}$. Although the trade-off is worse in [41], the high-level approaches of the two papers are in fact the same: both give the same black-box reduction from constructing an expander decomposition to finding a nearly most balanced sparse cut, and the difference only comes from the quality of their nearly most balanced sparse cut algorithms. Our distributed algorithm also follows this high-level approach.

Most recently, Saranurak and Wang [33] gave an expander decomposition algorithm with running time $\tilde{O}(m/\phi)$, which is optimal up to a polylogarithmic factor when $\phi = 1/\mathrm{poly}\log n$. We do not use their approach, as their trimming step seems to be inherently sequential and very challenging to parallelize or make distributed.

The only previous expander decomposition algorithm in the distributed setting is by Chang, Pettie, and Zhang [4]. Their distributed algorithm gives an expander decomposition with an extra part inducing a low-arboricity subgraph, using $\tilde{O}(n^{1-\delta})$ rounds in $\mathsf{CONGEST}$. Our distributed algorithm significantly improves upon this work.

1.2 Technical Overview

For convenience, in this section we call a cut $S$ with conductance at most $\phi$ a $\phi$-sparse cut. To give a high-level idea, the most straightforward algorithm for constructing an expander decomposition of a graph $G = (V, E)$ is as follows. Find a $\phi$-sparse cut $S$. If such a cut does not exist, then return $V$ as a part in the partition. Otherwise, recurse on both sides $S$ and $V \setminus S$, so the edges in $\partial(S)$ become inter-cluster edges. To see the correctness, once the recursion stops at a vertex set $U$, we know that $\Phi_{G[U]} \ge \phi$. Also, the total number of inter-cluster edges is at most $O(\phi m \log m)$ because (1) each $\phi$-sparse cut has at most a $\phi$ fraction of the volume of its smaller side as cut edges, so each inter-cluster edge can be charged to the edges in the smaller side of some $\phi$-sparse cut, and (2) each edge can be in the smaller side of a cut at most $O(\log m)$ times.
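The straightforward recursive scheme described above can be sketched as follows, with a brute-force placeholder for the sparse cut subroutine (a hypothetical stand-in for the distributed algorithms discussed next; for simplicity, degrees are recomputed inside each part rather than preserved with self loops as done later in the paper):

```python
from itertools import combinations

def find_sparse_cut(edges, V, phi):
    """Brute-force placeholder: return a cut of conductance < phi, or None."""
    V = list(V)
    deg = {v: 0 for v in V}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    for r in range(1, len(V)):
        for S in combinations(V, r):
            S = set(S)
            cut = sum(1 for u, v in edges if (u in S) != (v in S))
            small = min(sum(deg[v] for v in S),
                        sum(deg[v] for v in V if v not in S))
            if small > 0 and cut / small < phi:
                return S
    return None

def expander_decomposition(edges, V, phi):
    """Recurse: split along sparse cuts until every part is a phi-expander."""
    V = set(V)
    inside = [e for e in edges if e[0] in V and e[1] in V]
    S = find_sparse_cut(inside, V, phi)
    if S is None:
        return [V]  # no phi-sparse cut: V induces a phi-expander
    # Cut edges become inter-cluster edges; recurse on both sides.
    return (expander_decomposition(edges, S, phi)
            + expander_decomposition(edges, V - S, phi))

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
parts = expander_decomposition(edges, range(6), 0.3)
```

On the two-triangle example, the bridge cut (conductance $1/7 < 0.3$) is found first, and each triangle is then certified as an expander.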

This straightforward approach has two efficiency issues: (1) checking whether a $\phi$-sparse cut exists does not admit a fast distributed algorithm (and is in fact NP-hard), and (2) a $\phi$-sparse cut can be very unbalanced, and hence the recursion depth can be nearly linear in $n$. Thus, even if we ignore the time spent on finding cuts, the round complexity due to the recursion depth alone is too high. At a high level, all previous algorithms (both centralized and distributed) handle the two issues in the same way, up to some extent. First, they instead use approximate sparse cut algorithms, which either find a sparse cut of conductance somewhat larger than $\phi$ or certify that there is no $\phi$-sparse cut. Second, they find a cut with some guarantee on the balance of the cut, i.e., the smaller side of the cut should be sufficiently large.

Let us contrast our approach with the only previous distributed expander decomposition algorithm, by Chang, Pettie, and Zhang [4]. They gave an approximate sparse cut algorithm such that the smaller side of the cut has $\Omega(n^{\delta})$ vertices for a constant $\delta$, so the recursion depth is $O(n^{1-\delta})$. They guarantee this property by forcing the graph to have minimum degree at least $n^{\delta}$, so any sparse cut must contain $\Omega(n^{\delta})$ vertices (this uses the fact that the graph is simple). To force the graph to have high degree, they keep removing vertices with degree at most $n^{\delta}$ at every step of the algorithm. Throughout the whole algorithm, the removed parts form a graph with arboricity at most $n^{\delta}$. This explains why their decomposition outputs the extra part that induces a low-arboricity subgraph. With some other ideas on the distributed implementation, they obtained a round complexity of $\tilde{O}(n^{1-\delta})$, roughly matching the recursion depth.

In this paper, we avoid this extra low-arboricity part. The key component is the following. Instead of just guaranteeing that the smaller side of the cut has many vertices, we give the first efficient distributed algorithm for computing a nearly most balanced sparse cut. Suppose there is a $\phi$-sparse cut with balance $\beta$; then our sparse cut algorithm returns a $\phi'$-sparse cut with balance $\Omega(\beta)$, where $\phi'$ is not much larger than $\phi$. Intuitively, given that we can find a nearly most balanced sparse cut efficiently, the recursion depth can be made very small. This intuition can be made formal using the ideas in the centralized setting from Nanongkai and Saranurak [25] and Wulff-Nilsen [41]. Our main technical contribution is twofold. First, we give the first distributed algorithm for computing a nearly most balanced sparse cut, which is our key algorithmic tool. Second, in order to obtain a fast distributed algorithm, we must modify the centralized approach of [25, 41] for constructing an expander decomposition. In particular, we need to run a low diameter decomposition whenever we encounter a graph with high diameter, as our distributed algorithm for finding a nearly most balanced sparse cut is fast only on graphs with low diameter.

Sparse Cut Computation.

At a high level, our distributed nearly most balanced sparse cut algorithm is a distributed implementation of the sequential algorithm of Spielman and Teng [38]. The algorithm of [38] involves sequential iterations of $\mathsf{Nibble}$ with a random starting vertex on the remaining subgraph. Roughly speaking, the procedure $\mathsf{Nibble}$ aims at finding a sparse cut by simulating a random walk: if the starting vertex belongs to some sparse cut $S$, then it is likely that most of the probability mass will be trapped inside $S$. Chang, Pettie, and Zhang [4] showed that simultaneous iterations of an approximate version of $\mathsf{Nibble}$, each with a random starting vertex, can be implemented efficiently in $\mathsf{CONGEST}$ in $\mathrm{poly}(1/\phi, \log n)$ rounds, where $\phi$ is the target conductance. A major difference between this work and [4] is that the expander decomposition algorithm of [4] does not need any requirement on the balance of the cut in its sparse cut computation.

Note that the sequential iterations of $\mathsf{Nibble}$ in the nearly most balanced sparse cut algorithm of [38] cannot be completely parallelized. For example, it is possible that the union of the outputs of all iterations of $\mathsf{Nibble}$ equals the entire graph. Nonetheless, we show that this process can be partially parallelized at the cost of worsening the conductance guarantee by a polylogarithmic factor.
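To illustrate the diffusion idea behind $\mathsf{Nibble}$, here is a simplified caricature (our own code, not the truncated procedure of [38]): run a few steps of the lazy random walk from a starting vertex, order vertices by normalized probability mass, and take the best sweep cut.

```python
def lazy_walk_sweep(adj, start, steps):
    """Lazy random walk from `start`, then a sweep cut by p(v)/deg(v).

    A simplified caricature of the Nibble idea: if `start` lies inside a
    sparse cut, most probability mass stays trapped on that side, and some
    prefix of the sweep order exposes a low-conductance cut.
    """
    deg = {v: len(adj[v]) for v in adj}
    p = {v: 0.0 for v in adj}
    p[start] = 1.0
    for _ in range(steps):
        q = {v: 0.5 * p[v] for v in adj}        # lazy: stay w.p. 1/2
        for v in adj:
            for u in adj[v]:
                q[u] += 0.5 * p[v] / deg[v]     # spread the rest evenly
        p = q
    order = sorted(adj, key=lambda v: p[v] / deg[v], reverse=True)
    vol_all = sum(deg.values())
    best, best_phi, S = None, float('inf'), set()
    for v in order[:-1]:
        S.add(v)
        cut = sum(1 for x in S for y in adj[x] if y not in S)
        vol_S = sum(deg[x] for x in S)
        phi = cut / min(vol_S, vol_all - vol_S)
        if phi < best_phi:
            best, best_phi = set(S), phi
    return best, best_phi

# Two triangles joined by a bridge; the walk from 0 stays mostly on its side.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S, phi = lazy_walk_sweep(adj, 0, steps=4)
```

On this example the sweep recovers the bridge cut $\{0, 1, 2\}$ of conductance $1/7$.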

Theorem 1.2 (Nearly most balanced sparse cut). Given a conductance parameter $\phi$, there is a $\mathrm{diam}(G) \cdot \mathrm{poly}(1/\phi, \log n)$-round algorithm that achieves the following w.h.p. Let $S^*$ be a most-balanced cut of $G$ of conductance at most $\phi$, and define $\beta = \mathrm{bal}(S^*)$.

  • In case $\beta > 0$, the algorithm is guaranteed to return a cut $S$ with balance $\mathrm{bal}(S) \ge \beta / 2$ and conductance $\Phi(S) = \tilde{O}(\sqrt{\phi})$.

  • In case $\beta = 0$, the algorithm either returns $S = \emptyset$ or returns a cut $S$ with conductance $\Phi(S) = \tilde{O}(\sqrt{\phi})$.

The proof of Theorem 1.2 is in Appendix A. We note again that this is the first distributed sparse cut algorithm with a nearly most balanced guarantee. The problem of finding a sparse cut in the distributed setting has been studied prior to the work of [4]. Given that there is a cut of conductance $\phi$ and balance $b$, the algorithm of Das Sarma, Molla, and Pandurangan [34] finds a cut of conductance at most $\tilde{O}(\sqrt{\phi})$ in $\mathsf{CONGEST}$, in a number of rounds that grows with $1/b$ and $1/\phi$. The round complexity was later improved by Kuhn and Molla [22]. These prior works have the following drawbacks: (1) their running time depends on the balance $b$, which can be as small as $1/\mathrm{poly}(n)$, and (2) their output cuts are not guaranteed to be nearly most balanced; as noted earlier, the claim of [22] to the contrary turns out to be incorrect [4, Footnote 3].

Low Diameter Decomposition.

The runtime of our distributed sparse cut algorithm (Theorem 1.2) is proportional to the diameter. To avoid running this algorithm on a high diameter graph, we employ a low diameter decomposition to decompose the current graph into components of small diameter.

The low diameter decomposition algorithm of Miller, Peng, and Xu [24] can already be implemented efficiently in $\mathsf{CONGEST}$. However, there is one subtle issue: the guarantee on the number of inter-cluster edges holds only in expectation. In sequential or parallel computation models, we can simply repeat the procedure several times and take the best result. In $\mathsf{CONGEST}$, however, this takes at least diameter time, which is inefficient when the diameter is large.

We provide a technique that achieves this guarantee with high probability without spending diameter time, so we can ensure that the number of inter-cluster edges is small with high probability in our expander decomposition algorithm. (We remark that the triangle enumeration algorithm of [4] still works even if the guarantee on the number of inter-cluster edges in the expander decomposition only holds in expectation.) Intuitively, the main barrier to overcome is the high dependence among the events that an edge has its endpoints in different clusters. Our strategy is to first compute a partial clustering that already has low diameter, in such a way that, when we run the low diameter decomposition algorithm of [24] on the remainder, the events that the remaining edges become inter-cluster edges have sufficiently small dependence. Then we can use a variant of the Chernoff bound for random variables with bounded dependence [31] to bound the number of inter-cluster edges with high probability.

Theorem B (Low diameter decomposition). Let $\beta \in (0, 1)$. There is a $\beta^{-1} \cdot \mathrm{poly}\log n$-round algorithm that finds a partition $V = V_1 \cup \cdots \cup V_x$ of the vertex set satisfying the following conditions w.h.p.

  • Each component $V_i$ has diameter $O(\beta^{-1} \log n)$.

  • The number of inter-component edges is at most $O(\beta |E|)$.

The proof of Theorem B is in Appendix B.
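For intuition, the clustering rule of Miller, Peng, and Xu [24] can be sketched centrally as follows: every vertex draws an exponential delay with rate $\beta$, and each vertex joins the cluster of the center whose delayed BFS reaches it first. This toy version (our own sequential code; the content of Theorem B is the $\mathsf{CONGEST}$ implementation and the high-probability guarantee) shows only the rule:

```python
import heapq
import random

def mpx_clustering(adj, beta, seed=0):
    """Cluster by exponentially shifted BFS: vertex v joins the center u
    minimizing dist(u, v) - delay[u], where delay[u] ~ Exponential(beta).
    In [24], each edge is then inter-cluster with probability O(beta) and
    clusters have diameter O(log n / beta); this sketch shows only the rule.
    """
    rng = random.Random(seed)
    delay = {v: rng.expovariate(beta) for v in adj}
    # Multi-source Dijkstra on unit-length edges with head starts -delay[v].
    dist, owner = {}, {}
    pq = [(-delay[v], v, v) for v in adj]
    heapq.heapify(pq)
    while pq:
        d, v, center = heapq.heappop(pq)
        if v in dist:
            continue  # already claimed by an earlier (smaller) shifted distance
        dist[v], owner[v] = d, center
        for u in adj[v]:
            if u not in dist:
                heapq.heappush(pq, (d + 1, u, center))
    cut = sum(1 for v in adj for u in adj[v] if u < v and owner[u] != owner[v])
    return owner, cut

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
owner, cut = mpx_clustering(adj, beta=0.5)
```

Smaller $\beta$ yields larger delays, hence fewer effective centers, larger clusters, and fewer inter-cluster edges, matching the trade-off in Theorem B.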

Triangle Enumeration.

Incorporating our expander decomposition algorithm (Theorem 1) with the triangle enumeration algorithm of [4, 11], we immediately obtain an $n^{1/3 + o(1)}$-round algorithm for triangle enumeration. This round complexity can be further improved to $\tilde{O}(n^{1/3})$ by adjusting the routing algorithm of Ghaffari, Kuhn, and Su [11] on graphs of small mixing time. The main observation is that their algorithm can be viewed as a distributed data structure with a trade-off between the query time and the pre-processing time. In particular, it is possible to achieve $\mathrm{poly}\log n$ query time by spending a small polynomial amount of time on pre-processing.

2 Expander Decomposition

The goal of this section is to prove Theorem 1.

Theorem 1 (restated). Let $\epsilon \in (0,1)$, and let $k$ be a positive integer. An $(\epsilon, \phi)$-expander decomposition with $\phi = (\epsilon / \log n)^{2^{O(k)}}$ can be constructed in $O(n^{2/k} \cdot \mathrm{poly}(1/\phi, \log n))$ rounds, w.h.p.

For the sake of convenience, we denote by $f$ a function associated with Theorem 1.2 such that, when we run the nearly most balanced sparse cut algorithm of Theorem 1.2 with conductance parameter $\phi'$, if the output subset $C$ is non-empty, then it has $\Phi(C) \le f(\phi')$. We note that $f(\phi') \ge \phi'$.

Let $\epsilon$ and $k$ be the parameters specified in Theorem 1. We define the following parameters that are used in our algorithm.

Nearly Most Balanced Sparse Cut:

We define in such a way that when we run the nearly most balanced sparse cut algorithm with this conductance parameter, any non-empty output must satisfy . For each , we define .

Low Diameter Decomposition:

The parameter for the low diameter decomposition is chosen as follows. Set as the smallest integer such that . Then we define .

We show that an -expander decomposition can be constructed in rounds, with conductance parameter . We will later see that is the smallest conductance parameter we ever use for applying the nearly most balanced sparse cut algorithm.

Algorithm.

Our algorithm has two phases. There are three places in the algorithm where we remove edges from the graph, and they are tagged Remove-1, Remove-2, and Remove-3 for convenience. Whenever we remove an edge $e = \{u, v\}$, we add a self loop at both $u$ and $v$, so the degree of a vertex never changes throughout the algorithm. We never remove self loops.

At the end of the algorithm, $V$ is partitioned into the connected components induced by the remaining edges. To prove the correctness of the algorithm, we will show that the number of removed edges is at most $\epsilon |E|$, and that $\Phi_{G[U]} \ge \phi$ for each component $U$.

Phase 1. The input graph is $G = (V, E)$. Do the low diameter decomposition algorithm (Theorem B) on the current graph, and remove all inter-cluster edges (Remove-1). For each connected component $U$ of the resulting graph, run the nearly most balanced sparse cut algorithm (Theorem 1.2) on $U$. Let $C$ be the output subset. If $C = \emptyset$, then the subgraph quits Phase 1. If $C \neq \emptyset$ but the volume of $C$ is below the threshold specified in Step 2b, then the subgraph quits Phase 1 and enters Phase 2. Otherwise, remove the cut edges (Remove-2), and then recurse on both sides $C$ and $U \setminus C$ of the cut.

We emphasize that we do not remove the cut edges in Step 2b of Phase 1.

Lemma 1.

The depth of the recursion of Phase 1 is at most .

Proof.

Suppose there is still a component entering the next depth of the recursion of Phase 1. Then, according to the threshold specified in Step 2b, we infer that the volume of this component is too small by our choice of parameters, which is impossible. ∎

Phase 2. The input graph is a component that entered Phase 2. Define a level index $j$, initialized to $j = 1$, together with a decreasing sequence of volume thresholds, one per level. Repeatedly do the following procedure. Run the nearly most balanced sparse cut algorithm (Theorem 1.2) on the current graph, and let $C$ be the output subset. If $C = \emptyset$, then the subgraph quits Phase 2. If $C \neq \emptyset$ and the volume of $C$ is below the threshold of the current level, then update $j \leftarrow j + 1$. Otherwise, remove all edges incident to $C$ (Remove-3).

Intuitively, in Phase 2 we keep calling the nearly most balanced sparse cut algorithm to find a cut and remove it. If we find a cut whose volume exceeds the current threshold, then we make good progress. Otherwise, by Theorem 1.2 we learn that the volume of the most balanced sparse cut at the current conductance parameter is small, and so we move on to the next level by setting $j \leftarrow j + 1$.

The maximum possible level is bounded by our choice of parameters. Since the threshold of the maximum level cannot be undercut by definition, there is no possibility of increasing $j$ beyond the maximum level. Once we reach the maximum level, we repeatedly run the nearly most balanced sparse cut algorithm until we get $C = \emptyset$ and quit.

When we remove a cut $C$ in Phase 2, each vertex $v \in C$ becomes an isolated vertex with self loops, as all edges incident to $C$ have been removed, and so in the final decomposition such a vertex forms a singleton component $\{v\}$. We emphasize that we only do the edge removal when the volume of $C$ is above the current threshold. Lemma 2 bounds the volume of the cuts found during Phase 2.

Lemma 2.

For each , define as the union of all subsets found in Phase 2 when . Then either or .

Proof.

We first consider the case of . Observe that the graph satisfies the property that the most balanced sparse cut of conductance at most has balance at most , since otherwise it does not meet the condition for entering Phase 2. Note that all cuts we find during Phase 2 have conductance at most , and so the union of them is also a cut of with conductance at most . This implies that .

The proof for the case of is exactly the same, as the condition for increasing is to have . Let be the graph considered in the iteration when we increase to . The existence of such a cut of implies that the most balanced sparse cut of conductance at most of has volume at most . Similarly, note that all cuts we find when have conductance at most , and so the union of them is also a cut of with conductance at most . This implies that . ∎

Conductance of Remaining Components.

For each vertex $v$, there are two possible ways for $v$ to end the algorithm:

  • During Phase 1 or Phase 2, the output of the nearly most balanced sparse cut algorithm on the component $U$ that $v$ belongs to is $\emptyset$. In this case, $U$ becomes a component in the final decomposition. If $\phi'$ is the conductance parameter used in the nearly most balanced sparse cut algorithm, then $\Phi_{G[U]} \ge \phi'$. Note that $\phi'$ is at least the smallest conductance parameter we ever use.

  • During Phase 2, $v \in C$ for the output $C$ of the nearly most balanced sparse cut algorithm, and all edges incident to $C$ are removed. In this case, $\{v\}$ itself becomes a component in the final decomposition. Trivially, a singleton component meets the conductance requirement.

Therefore, we conclude that each component $U$ in the final decomposition satisfies $\Phi_{G[U]} \ge \phi$.

Number of Removed Edges.

There are three places in the algorithm where we remove edges. We show that, for each $i \in \{1, 2, 3\}$, the number of edges removed due to Remove-$i$ is at most $\epsilon |E| / 3$, and so the total number of inter-component edges in the final decomposition is at most $\epsilon |E|$.

  1. By Lemma 1, the depth of the recursion of Phase 1 is bounded. At each depth of the recursion, the number of edges removed by the low diameter decomposition algorithm is at most an $O(\beta)$ fraction of $|E|$ by Theorem B. By our choice of the parameter $\beta$, the number of edges removed due to Remove-1 is at most $\epsilon |E| / 3$.

  2. For each edge removed due to the nearly most balanced sparse cut algorithm in Phase 1, we charge the cost of the edge removal to vertex–edge pairs on the smaller side of the cut in the following way. If $\mathrm{vol}(C) \le \mathrm{vol}(U \setminus C)$, then for each $v \in C$ and for each edge $e$ incident to $v$, we charge the pair $(v, e)$; otherwise, for each $v \in U \setminus C$ and for each edge $e$ incident to $v$, we charge the pair $(v, e)$. Note that each pair is charged at most $O(\log |E|)$ times throughout the algorithm, and the amount per charge is at most the conductance bound of the cut. Therefore, the number of edges removed due to Remove-2 is at most $\epsilon |E| / 3$ by our choice of the conductance parameter.

  3. By Lemma 2, the summation of $\mathrm{vol}(C)$ over all cuts $C$ that are found and removed during Phase 2 due to Remove-3 is at most $\epsilon |E| / 3$. Since the number of removed edges is at most the total volume of these cuts, the bound follows.

Round Complexity.

During Phase 1, each vertex participates in the nearly most balanced sparse cut algorithm and the low diameter decomposition algorithm at most once per depth of the recursion. By our choice of parameters, the round complexity of both algorithms is $\mathrm{poly}(1/\phi, \log n)$ per invocation, as we note that, whenever we run the nearly most balanced sparse cut algorithm, the diameter of each connected component is small due to the preceding low diameter decomposition.

For Phase 2, Lemma 2 guarantees that the algorithm can stay at each level for a bounded number of iterations: if we neither increase $j$ nor quit Phase 2 for too many iterations, then the total removed volume would exceed the bound of Lemma 2, which is impossible. Therefore, the round complexity of Phase 2 can be upper bounded accordingly.

During Phase 2, it is possible that the graph becomes disconnected or acquires a large diameter, but this causes no problem, since we can use all edges of the component that entered Phase 2 for communication during a sparse cut computation, and the diameter of that component is small.

3 Triangle Enumeration

We show how to derive Theorem 2 by combining Theorem 1 with other known results in [4, 11].

Theorem 2 (restated). Triangle enumeration can be solved in $\tilde{O}(n^{1/3})$ rounds in $\mathsf{CONGEST}$, w.h.p.

Chang, Pettie, and Zhang [4] showed that, given an $(\epsilon, \phi)$-expander decomposition with a constant $\epsilon$ and $\phi = 1/\mathrm{poly}\log n$, there is an algorithm $\mathcal{A}$ that finds an edge subset $E' \subseteq E$ with $|E'| \le |E| / 2$ such that each triangle in $G$ is detected by some vertex during the execution of $\mathcal{A}$, except the triangles whose three edges are all within $E'$. The algorithm $\mathcal{A}$ repeatedly solves the following routing problem in each component: given a set of routing requests where each vertex $v$ is a source or a destination for at most $\tilde{O}(\deg(v))$ messages of $O(\log n)$ bits, the goal is to deliver all messages to their destinations. Ghaffari, Kuhn, and Su [11] showed that this routing problem can be solved in $\tau_{\mathrm{mix}}(G) \cdot 2^{O(\sqrt{\log n \log\log n})}$ rounds. This was later improved to $\tau_{\mathrm{mix}}(G) \cdot 2^{O(\sqrt{\log n})}$ in $\mathsf{CONGEST}$ by Ghaffari and Li [12].

Applying our distributed expander decomposition algorithm (Theorem 1), we can find an $(\epsilon, \phi)$-expander decomposition with a small constant $\epsilon$ and $\phi = 1/\mathrm{poly}\log n$ in $O(n^{\gamma})$ rounds, for a sufficiently small constant $\gamma$, by selecting $k$ to be a sufficiently large constant. By the relation between mixing time and conductance, the mixing time of each component is at most $\mathrm{poly}\log n$. Then we apply the above algorithm $\mathcal{A}$, which takes $n^{1/3 + o(1)}$ rounds with the routing algorithm of Ghaffari and Li [12]. After that, we recurse on the edge set $E'$, and we are done enumerating all triangles after $O(\log |E|)$ iterations. This concludes the $n^{1/3 + o(1)}$-round algorithm for triangle enumeration.

To improve the complexity to $\tilde{O}(n^{1/3})$, we observe that the routing algorithm of [11] can be seen as a distributed data structure with the following properties.

Parameters:

The parameter $L$ is a positive integer that specifies the depth of the hierarchical structure in the routing algorithm. Given $L$, define $b$ as the number such that $b^L = m$, where $m$ is the total number of edges.

Pre-processing Time:

The algorithm for building the data structure consists of two parts: building the hierarchical structure [11, Lemma 3.2] and adding the portals [11, Lemma 3.3].

Query Time:

After building the data structure, each routing task can be solved in a number of rounds that depends on $L$ [11, Lemma 3.4].

The parameter $L$ can be chosen as any positive integer. In [11], $L$ is chosen to balance the pre-processing time and the query time, showing that the routing task can be solved in $\tau_{\mathrm{mix}}(G) \cdot 2^{O(\sqrt{\log n \log\log n})}$ rounds. This round complexity was later improved to $\tau_{\mathrm{mix}}(G) \cdot 2^{O(\sqrt{\log n})}$ in [12]. We note, however, that the algorithm of [12] does not admit a trade-off as above. The main reason is their special treatment of the base layer of the hierarchical structure: in [12], the base layer is a random graph of non-constant degree, and simulating one round of communication on it already costs a non-trivial number of rounds in the original graph $G$.

In the triangle enumeration algorithm $\mathcal{A}$, we need to query this distributed data structure $\tilde{O}(n^{1/3})$ times. It is possible to set $L$ to be a large enough constant so that the pre-processing costs only $\tilde{O}(n^{1/3})$ rounds, while the query time is still $\mathrm{poly}\log n$. This implies that the triangle enumeration problem can be solved in $\tilde{O}(n^{1/3})$ rounds.

Acknowledgment

We thank Seth Pettie for very useful discussion.

References

  • [1] S. Arora, B. Barak, and D. Steurer. Subexponential algorithms for unique games and related problems. J. ACM, 62(5):42:1–42:25, Nov. 2015.
  • [2] S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. J. ACM, 56(2):5:1–5:37, Apr. 2009.
  • [3] K. Censor-Hillel, P. Kaski, J. H. Korhonen, C. Lenzen, A. Paz, and J. Suomela. Algebraic methods in the congested clique. Distributed Computing, 2016.
  • [4] Y.-J. Chang, S. Pettie, and H. Zhang. Distributed Triangle Detection via Expander Decomposition. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 821–840, 2019.
  • [5] T. Chu, Y. Gao, R. Peng, S. Sachdeva, S. Sawlani, and J. Wang. Graph sparsification, spectral sketches, and faster resistance computation, via short cycle decompositions. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 361–372, 2018.
  • [6] M. B. Cohen, J. A. Kelner, J. Peebles, R. Peng, A. B. Rao, A. Sidford, and A. Vladu. Almost-linear-time algorithms for Markov chains and new spectral primitives for directed graphs. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 410–419, 2017.
  • [7] M. Daga, M. Henzinger, D. Nanongkai, and T. Saranurak. Distributed edge connectivity in sublinear time. arXiv preprint arXiv:1904.04341, 2019. To appear at STOC’19.
  • [8] D. Dolev, C. Lenzen, and S. Peled. “Tri, tri again”: Finding triangles and small subgraphs in a distributed setting. In Proceedings 26th International Symposium on Distributed Computing (DISC), pages 195–209, 2012.
  • [9] A. Drucker, F. Kuhn, and R. Oshman. On the power of the congested clique model. In Proceedings 33rd ACM Symposium on Principles of Distributed Computing (PODC), pages 367–376, 2014.
  • [10] O. Fischer, T. Gonen, F. Kuhn, and R. Oshman. Possibilities and impossibilities for distributed subgraph detection. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 153–162, New York, NY, USA, 2018. ACM.
  • [11] M. Ghaffari, F. Kuhn, and H.-H. Su. Distributed MST and routing in almost mixing time. In Proceedings 37th ACM Symposium on Principles of Distributed Computing (PODC), pages 131–140, 2017.
  • [12] M. Ghaffari and J. Li. New distributed algorithms in almost mixing time via transformations from parallel algorithms. In U. Schmid and J. Widder, editors, Proceedings 32nd International Symposium on Distributed Computing (DISC), volume 121 of Leibniz International Proceedings in Informatics (LIPIcs), pages 31:1–31:16, Dagstuhl, Germany, 2018. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
  • [13] M. Ghaffari and K. Nowicki. Congested clique algorithms for the minimum cut problem. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, PODC ’18, pages 357–366, New York, NY, USA, 2018. ACM.
  • [14] O. Goldreich and D. Ron. A sublinear bipartiteness tester for bounded degree graphs. Combinatorica, 19(3):335–373, Mar 1999.
  • [15] B. Haeupler and D. Wajc. A faster distributed radio broadcast primitive. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing (PODC), pages 361–370. ACM, 2016.
  • [16] T. Izumi and F. Le Gall. Triangle finding and listing in CONGEST networks. In Proceedings 37th ACM Symposium on Principles of Distributed Computing (PODC), pages 381–389, 2017.
  • [17] M. Jerrum and A. Sinclair. Approximating the permanent. SIAM Journal on Computing, 18(6):1149–1178, 1989.
  • [18] T. Jurdziński and K. Nowicki. MST in rounds of congested clique. In Proceedings 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2620–2632, 2018.
  • [19] R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. J. ACM, 51(3):497–515, May 2004.
  • [20] K.-I. Kawarabayashi and M. Thorup. Deterministic edge connectivity in near-linear time. J. ACM, 66(1):4:1–4:50, Dec. 2018.
  • [21] J. A. Kelner, Y. T. Lee, L. Orecchia, and A. Sidford. An almost-linear-time algorithm for approximate max flow in undirected graphs, and its multicommodity generalizations. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 217–226, 2014.
  • [22] F. Kuhn and A. R. Molla. Distributed sparse cut approximation. In Proceedings 19th International Conference on Principles of Distributed Systems (OPODIS), pages 10:1–10:14, 2015.
  • [23] F. Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC ’14, pages 296–303, New York, NY, USA, 2014. ACM.
  • [24] G. L. Miller, R. Peng, and S. C. Xu. Parallel graph decompositions using random shifts. In Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures (SPAA), pages 196–203. ACM, 2013.
  • [25] D. Nanongkai and T. Saranurak. Dynamic spanning forest with worst-case update time: adaptive, Las Vegas, and -time. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 1122–1129, 2017.
  • [26] D. Nanongkai, T. Saranurak, and C. Wulff-Nilsen. Dynamic minimum spanning forest with subpolynomial worst-case update time. In Proceedings of IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 950–961. IEEE, 2017.
  • [27] L. Orecchia and N. K. Vishnoi. Towards an SDP-based approach to spectral methods: A nearly-linear-time algorithm for graph partitioning and decomposition. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, San Francisco, California, USA, January 23-25, 2011, pages 532–545, 2011.
  • [28] L. Orecchia and Z. A. Zhu. Flow-based algorithms for local graph clustering. In Proceedings of the Twenty-fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’14, pages 1267–1286, Philadelphia, PA, USA, 2014. Society for Industrial and Applied Mathematics.
  • [29] G. Pandurangan, P. Robinson, and M. Scquizzato. On the distributed complexity of large-scale graph computations. In Proceedings 30th ACM Symposium on Parallelism in Algorithms and Architecture (SPAA), 2018.
  • [30] M. Pǎtraşcu and M. Thorup. Planning for fast connectivity updates. In Proceedings 48th IEEE Symposium on Foundations of Computer Science (FOCS), pages 263–271, 2007.
  • [31] S. V. Pemmaraju. Equitable coloring extends Chernoff-Hoeffding bounds. In M. Goemans, K. Jansen, J. D. P. Rolim, and L. Trevisan, editors, Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques, pages 285–296, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.
  • [32] P. Raghavendra and D. Steurer. Graph expansion and the unique games conjecture. In Proceedings 42nd ACM Symposium on Theory of Computing (STOC), pages 755–764, 2010.
  • [33] T. Saranurak and D. Wang. Expander decomposition and pruning: Faster, stronger, and simpler. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2616–2635, 2019.
  • [34] A. D. Sarma, A. R. Molla, and G. Pandurangan. Distributed computation of sparse cuts via random walks. In Proceedings 16th International Conference on Distributed Computing and Networking (ICDCN), pages 6:1–6:10, 2015.
  • [35] D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. In Proceedings 40th ACM Symposium on Theory of Computing (STOC), pages 563–568, 2008.
  • [36] D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings 36th Annual ACM Symposium on Theory of Computing (STOC), pages 81–90, 2004.
  • [37] D. A. Spielman and S.-H. Teng. Spectral sparsification of graphs. SIAM J. Comput., 40(4):981–1025, 2011.
  • [38] D. A. Spielman and S.-H. Teng. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM J. Comput., 42(1):1–26, 2013.
  • [39] D. A. Spielman and S.-H. Teng. Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. SIAM Journal on Matrix Analysis and Applications, 35(3):835–885, 2014.
  • [40] L. Trevisan. Approximation algorithms for unique games. Theory of Computing, 4(5):111–128, 2008.
  • [41] C. Wulff-Nilsen. Fully-dynamic minimum spanning forest with improved worst-case update time. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 1130–1143, 2017.

Appendix

Appendix A Nearly Most Balanced Sparse Cut

The goal of this section is to prove the following theorem.

\restatenearlybalcut*

Proof.

This theorem follows from a re-parameterization of Lemma 8 and Lemma 11. ∎

We will prove this theorem by adapting the nearly most balanced sparse cut algorithm of Spielman and Teng [36] (there are many versions of the paper [36]; we refer to https://arxiv.org/abs/cs/0310051v9) to in a white-box manner. Before presenting the proof, we highlight the major differences between this work and the sequential algorithm of [36]. The procedure  itself is not suitable for a distributed implementation, so we follow the idea of [4] to consider an approximate version of  (Section A.2) and use the distributed implementation described in [4] (Section A.5). The nearly most balanced sparse cut algorithm of [36] involves running iterations of  with a random starting vertex on the remaining subgraph. We will show that this sequential process can be partially parallelized at the cost of worsening the conductance guarantee by a polylogarithmic factor (Section A.4).

Terminology.

Given a parameter , we define the following functions as in [36].

Let be the adjacency matrix of the graph . We assume a 1-1 correspondence between and . In a lazy random walk, the walk stays at the current vertex with probability and otherwise moves to a random neighbor of the current vertex. The matrix realizing this walk can be expressed as , where is the diagonal matrix with on the diagonal.

Let be the probability distribution of the lazy random walk that begins at and walks for steps. In the limit, as , approaches , so it is natural to measure relative to this baseline.
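As a concrete sanity check, the lazy walk and its limiting behavior can be simulated in a few lines; the 4-cycle below is an arbitrary toy graph, not an example from the paper.

```python
import numpy as np

def lazy_walk_matrix(A):
    """W = (I + A D^{-1}) / 2: stay put with probability 1/2, otherwise
    move to a uniformly random neighbor (the columns of A D^{-1} are the
    neighbor distributions)."""
    d = A.sum(axis=0)                   # degree of each vertex
    return 0.5 * (np.eye(len(A)) + A / d)

# Toy example: a 4-cycle, walk started at vertex 0.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
W = lazy_walk_matrix(A)
p = np.array([1.0, 0.0, 0.0, 0.0])      # point mass at the start vertex
for _ in range(100):
    p = W @ p                           # p_t = W p_{t-1}
stationary = A.sum(axis=0) / A.sum()    # d(u) / 2m: the limit distribution
```

After enough steps, `p` is numerically indistinguishable from the degree distribution d(u)/2m, which is the baseline the walk is measured against.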

Let be any function. The truncation operation rounds to zero if it falls below a threshold that depends on .

As in [36], for any vertex set , we define the vector by if and if , and we define the vector by if and if . In particular, is a probability distribution on that has all its probability mass on the vertex , and is the degree distribution of . That is, .
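In code, the two vectors are a direct transcription of the definitions above, with vol(S) denoting the total degree of S:

```python
def chi(S, V):
    """Indicator vector of S: 1 on S, 0 on the rest of V."""
    return {u: (1.0 if u in S else 0.0) for u in V}

def psi(S, V, d):
    """Degree distribution of S: d(u) / vol(S) on S, 0 elsewhere,
    where vol(S) is the sum of the degrees of the vertices in S."""
    vol = sum(d[u] for u in S)
    return {u: (d[u] / vol if u in S else 0.0) for u in V}
```

The indicator of a single vertex is then the point mass used as the walk's starting distribution, and the degree distribution of the whole vertex set is the stationary distribution of the lazy walk.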

A.1 Nibble

We first review the  algorithm of [36], which computes the following sequence of vectors with truncation parameter .

We define as the normalized probability mass at at time . Due to truncation, for all and , we have and .

We define as a permutation of such that . That is, we order the vertices by their -value, breaking ties arbitrarily (e.g., by comparing IDs). We write to denote the set of vertices with . For example, is the set of the top vertices with the highest -value.
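The ordering and its prefix sets can be computed directly from the definitions; here the vertex keys themselves serve as the IDs used for tie-breaking:

```python
def sweep_order(p, d):
    """Permutation of the vertices by decreasing rho(u) = p(u) / d(u),
    ties broken by vertex ID (here, the dictionary key itself)."""
    return sorted(p, key=lambda u: (-p[u] / d[u], u))

def prefix_set(p, d, j):
    """S_j: the j vertices with the highest rho-value."""
    return set(sweep_order(p, d)[:j])
```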

Algorithm () For to , if there exists an index meeting the following conditions . . . then return and quit. Otherwise return .

Note that the definition of () is exactly the same as the one presented in [36].
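The overall control flow of Nibble can be sketched as follows. Since the stopping conditions are stated in [36] and elided above, they are abstracted into a caller-supplied predicate `good_cut`; that predicate is a placeholder, not the real conductance/volume/mass thresholds.

```python
import numpy as np

def nibble(W, d, v, t_max, eps, good_cut):
    """Skeleton of Nibble: run a truncated lazy random walk from v and,
    after each step, sweep over prefixes of the rho-ordering, returning
    the first prefix accepted by good_cut (a stand-in for the stopping
    conditions of [36])."""
    p = np.zeros(len(d))
    p[v] = 1.0                                     # point mass at v
    for t in range(1, t_max + 1):
        p = W @ p                                  # one lazy-walk step
        p = np.where(p >= eps * d, p, 0.0)         # truncation
        order = np.argsort(-p / d, kind="stable")  # sweep by rho = p / d
        for j in range(1, len(d) + 1):
            if good_cut(set(order[:j].tolist()), p, t):
                return set(order[:j].tolist())
    return set()                                   # no sweep cut accepted
```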

Definition 1.

Define as the subset of such that if we start the lazy random walk from , then for at least one of . For any edge , define .

Intuitively, if , then does not participate in () and both endpoints of are not in the output of (). In particular, is a necessary condition for . The following auxiliary lemma establishes upper bounds on and ; it will be applied to bound the amount of congestion when we execute multiple instances of  in parallel. Intuitively, if is small, then we can afford to run many instances of () in parallel for random starting vertices sampled from the degree distribution .

Lemma 3.

The following formulas hold for each vertex and each edge .

In particular, these two quantities are both upper bounded by .

Proof.

In this proof we use a superscript to indicate the starting vertex of the lazy random walk. We write . Then . Thus, to prove the lemma, it suffices to show that . This inequality follows from the fact that , as follows.