Tight Bounds for Linear Sketches of Approximate Matchings

Abstract

We resolve the space complexity of linear sketches for approximating the maximum matching problem in dynamic graph streams, where the stream may include both edge insertions and deletions. Specifically, we show that for any $\varepsilon > 0$, there exists a one-pass streaming algorithm that only maintains a linear sketch of size $\tilde{O}(n^{2-3\varepsilon})$ bits and recovers an $n^{\varepsilon}$-approximate maximum matching in dynamic graph streams, where $n$ is the number of vertices in the graph. In contrast to the extensively studied insertion-only model, to the best of our knowledge, no non-trivial single-pass streaming algorithms were previously known for approximating the maximum matching problem on general dynamic graph streams.

Furthermore, we show that our upper bound is essentially tight. Namely, any linear sketch for approximating the maximum matching to within a factor of $O(n^{\varepsilon})$ has to be of size $n^{2-3\varepsilon-o(1)}$ bits. We establish this lower bound by analyzing the corresponding simultaneous number-in-hand communication model, with a combinatorial construction based on Ruzsa-Szemerédi graphs.

1 Introduction

Massive datasets routinely arise in various application domains such as web-scale graphs and social networks. The space requirement for performing computations on these massive datasets can easily become prohibitively large. A common way of managing the space requirement is to consider algorithms in the streaming model of computation. In this model, formally introduced in the seminal work of [7], an algorithm is allowed to make a single or a few passes over the input while using space much smaller than the input size. We refer the reader to [36] for a survey of classical results in this model.

In recent years, there has been extensive work on the design of streaming algorithms for various graph problems, including connectivity, minimum spanning trees, spanners, sparsifiers, matchings, etc. (see the survey by McGregor [34] for a summary of these results). Two types of graph streams are mainly studied in the literature: in the insertion-only model, the stream contains only edge insertions, and in the dynamic model, the stream contains both edge insertions and deletions. The focus of this paper is the dynamic model. The input in this model, called a dynamic graph stream, can be defined formally as follows.

Definition 1 ([6]).

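A dynamic graph stream $S = \langle a_1, \ldots, a_t \rangle$ defines a multi-graph $G = (V, E)$ on vertices $V = [n]$. Each $a_k$ is a triple $(i_k, j_k, \Delta_k)$ where $i_k, j_k \in [n]$ and $\Delta_k \in \{-1, 1\}$. The multiplicity of an edge $(i, j)$ is defined to be:

$$A(i,j) = \sum_{k \,:\, \{i_k, j_k\} = \{i, j\}} \Delta_k.$$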

The multiplicity of every edge is required to be always non-negative.
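To make Definition 1 concrete, here is a small Python sketch that replays a dynamic graph stream and tracks edge multiplicities, rejecting any update sequence that would drive a multiplicity negative. The function name `edge_multiplicities` and the choice to normalise each edge as the ordered pair `(min(i, j), max(i, j))` are our own illustrative assumptions.

```python
from collections import defaultdict


def edge_multiplicities(stream):
    """Replay (i, j, delta) updates with delta in {-1, +1}; return {edge: multiplicity > 0}."""
    mult = defaultdict(int)
    for i, j, delta in stream:
        if delta not in (-1, 1):
            raise ValueError("each update inserts (+1) or deletes (-1) one copy")
        edge = (min(i, j), max(i, j))  # undirected: (i, j) and (j, i) are the same edge
        mult[edge] += delta
        if mult[edge] < 0:  # strict turnstile: multiplicities never go negative
            raise ValueError(f"edge {edge} deleted more often than inserted")
    return {e: m for e, m in mult.items() if m > 0}


# (1,2) inserted twice and deleted once; (2,3) inserted and then deleted.
print(edge_multiplicities([(1, 2, 1), (2, 3, 1), (2, 1, 1), (1, 2, -1), (3, 2, -1)]))
# {(1, 2): 1}
```

An edge that is inserted and later deleted disappears entirely, which is exactly what makes the dynamic model harder than the insertion-only one.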

The streaming model where the frequency of every entry is always non-negative is standard for graph problems, and this model is generally referred to as the strict turnstile model in the literature (as opposed to the turnstile model, which also allows negative frequencies). In this paper, we study the maximum matching problem for dynamic graph streams in which the algorithm is only allowed to make a single pass over the stream.

Matchings have received a lot of attention in the graph stream literature [33, 18, 15, 16, 21, 30, 40, 6, 4, 26, 22, 27, 13, 17]. We briefly summarize the previous results for adversarially ordered streams. A weaker notion of randomly ordered streams (which is less relevant to our work) is also often considered; for results in this model, we refer the reader to [30, 27] and references therein.

For the problem of recovering a maximum matching in bipartite graphs, a trivial lower bound on the space complexity of any streaming algorithm is $\Omega(n)$, which is required just for storing the matching edges. Therefore, this problem is usually studied in the semi-streaming model (originally introduced by Feigenbaum et al. [18]), where the algorithm is allowed to use $O(n \cdot \mathrm{polylog}(n))$ bits of space. Moreover, no exact algorithm that uses semi-streaming space can exist [18]. This motivates the study of $\alpha$-approximate algorithms that output a matching whose size is within a multiplicative factor $\alpha$ of the optimum. For single-pass semi-streaming algorithms in the insertion-only model, the best known approximation factor is 2, which is obtained by simply maintaining a maximal matching during the stream. On the negative side, it is shown by [21, 26] that any streaming algorithm that achieves an approximation factor better than $e/(e-1)$ requires the storage of $n^{1+\Omega(1/\log\log n)}$ bits. For dynamic graph streams, to the best of our knowledge, no non-trivial single-pass streaming algorithm using space was known. Resolving the space complexity of matchings in single-pass dynamic graph streams has been posed as an open problem at the Bertinoro workshop on sublinear and streaming algorithms in 2014 [1].

For the problem of estimating the size of a maximum matching, a strongly sublinear space regime has been considered. In the single-pass insertion-only model, when edges arrive in an adversarial order, the only known positive result for estimating the matching size is that of [17] which showed that a constant factor approximation is possible in space under the assumption that the underlying graph is planar. The same paper [17] also provides a lower bound of (resp. ) bits of space for randomized (resp. deterministic) algorithms that approximate the matching size in bipartite graphs to within a factor of . For the state of the art in the streaming model which allows multiple passes over the stream, we refer the reader to [6, 4, 22, 3, 26] and references therein.

To the best of our knowledge, the only result concerning matchings in the single-pass dynamic graph streams is the recent paper by Chitnis et al.  [12], which provides an algorithm for computing a maximal matching of size using space. For multi-pass dynamic graph streams, [3] provides a -approximation scheme for the weighted non-bipartite matching problem using passes with space (see also [34]).

Finally, closely related to our work is a recent line of work on communication complexity of approximate matchings in the multi-party setting [14, 9, 24]. The one that is closest to ours is [24], which shows a tight bound of on the total communication required to compute an -approximate matching for bipartite graphs, in the -party message passing model where the edges of the input graph are arbitrarily partitioned between the players.

Linear sketches.

One of the most powerful techniques for designing streaming algorithms is linear sketching. Let $n$ be the number of vertices in the input graph. Then edge multiplicities can be treated as a vector $x$ with $\binom{n}{2}$ entries. Let $A$ be a (possibly randomly chosen) $s \times \binom{n}{2}$ matrix. Then $Ax$ is referred to as a linear sketch of the input stream. If all that a streaming algorithm maintains is such a linear sketch, then the space requirement of the algorithm is proportional to $s$. On any incoming update $(i, j, \Delta)$, the linear sketch is updated to $Ax' = Ax + \Delta \cdot A e_{i,j}$, where $x'$ is the new vector of edge multiplicities and $e_{i,j}$ is a unit vector whose only non-zero entry is the $(i,j)$ entry. At the end of the stream, the algorithm can apply an arbitrary function to the linear sketch $Ax$ to compute the final answer.
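The update rule $Ax' = Ax + \Delta \cdot A e_{i,j}$ means that the algorithm never needs $x$ itself, only the column of $A$ indexed by the updated edge. The sketch below checks this on a toy instance. The dense random $\pm 1$ matrix and the helper names are hypothetical choices for illustration, not the sketching matrices used later in the paper.

```python
import random
from itertools import combinations


def random_columns(n, s, seed=0):
    """Hypothetical s x C(n,2) sketching matrix with random +-1 entries, stored by
    column: columns[(i, j)] is the vector A e_{i,j}."""
    rng = random.Random(seed)
    return {e: [rng.choice((-1, 1)) for _ in range(s)]
            for e in combinations(range(n), 2)}


def apply_update(sketch, columns, i, j, delta):
    """One stream update (i, j, delta): A x' = A x + delta * A e_{i,j}."""
    col = columns[(min(i, j), max(i, j))]
    return [y + delta * a for y, a in zip(sketch, col)]


def sketch_from_scratch(columns, x, s):
    """Compute A x directly from x = {edge: multiplicity}, for comparison."""
    out = [0] * s
    for e, m in x.items():
        out = [y + m * a for y, a in zip(out, columns[e])]
    return out


n, s = 5, 8
cols = random_columns(n, s)
sk = [0] * s
for i, j, d in [(0, 1, 1), (2, 3, 1), (1, 0, 1), (0, 1, -1), (3, 2, -1), (1, 4, 1)]:
    sk = apply_update(sk, cols, i, j, d)
# The incrementally maintained sketch equals A x for the final multiplicities.
assert sk == sketch_from_scratch(cols, {(0, 1): 1, (1, 4): 1}, s)
```

Storing the whole matrix explicitly, as done here, defeats the purpose in practice; real sketches generate columns on the fly from a short random seed.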

Linear sketching is the only existing technique for designing streaming algorithms in the turnstile model and even for dynamic graph streams. Linear sketches are also one of the main techniques for designing mergeable summaries [2] used in distributed computing. These facts have made linear sketches a computational model of their own. Multiple results are known about the power and limitations of linear sketches, e.g., [5, 6, 10, 23, 28]. In fact, it is shown that any one-pass turnstile streaming algorithm can be implemented by maintaining only a linear sketch of the input during the stream [32]. For an in-depth introduction to linear sketching and its applications to dynamic streams and distributed computing, we refer the reader to the recent surveys by McGregor [34] (graph streams) and Woodruff [39] (computational linear algebra).

1.1 Our results

We resolve the space complexity of linear sketches for approximating maximum matchings by proving tight upper and lower bounds on the space requirement. For the upper bound, we establish the following theorem.

Theorem 1.

There is a single-pass randomized streaming algorithm that takes as input a parameter $\varepsilon > 0$ and a bipartite graph $G$ with $n$ vertices, specified by a dynamic graph stream, uses $\tilde{O}(n^{2-3\varepsilon})$ bits of space, and outputs a matching of size $\Omega(\mathrm{opt}/n^{\varepsilon})$ with high probability, where opt is the size of a maximum matching in $G$. Moreover, the algorithm only maintains a linear sketch during the stream.

We prove this result by designing a sampling-based algorithm that takes advantage of the well-known linear sketching implementation of the $\ell_0$-sampler (see Section 2.1). The algorithm maintains a set of (edge) samplers that are coordinated in such a way that the sampled edges are “well-spread” across different parts of the graph and hence contain a relatively large matching. The main challenge is to achieve such a coordination for linear-sketching-based samplers: coordination is typically achieved via sequential operations that depend on the state of the stream, while linear sketches are inherently oblivious to the underlying state.

Note that our algorithm, though stated for bipartite graphs, also works for general graphs by applying the standard technique of choosing a random bipartition of the vertices upfront and only considering edges that cross the bipartition, while losing a factor of 2 in the approximation ratio. We further note that for weighted graphs with $\mathrm{poly}(n)$-bounded weights, the standard “grouping by weight” technique can be used to obtain a similar result for computing an approximation to the weighted matching, while losing a factor of $O(\log n)$ in the approximation ratio.

We complement our upper bound by the following (essentially) matching lower bound.

Theorem 2.

There exists a constant $\varepsilon_0 > 0$ such that for any $0 < \varepsilon < \varepsilon_0$, any randomized linear sketch that can be used to recover a matching of size $\Omega(\mathrm{opt}/n^{\varepsilon})$ with constant probability for every input bipartite graph $G$ on $n$ vertices must have worst-case space complexity of $n^{2-3\varepsilon-o(1)}$ bits. Here, opt denotes the size of a maximum matching in $G$.

This result is obtained as a corollary of our lower bound on the communication complexity of approximating maximum matchings in the number-in-hand simultaneous model (Theorem 5); see Section 2.2 for the exact definition of this model and the connection with linear sketches.

Our construction follows the line of work by [21, 26] on using Ruzsa-Szemerédi graphs to prove lower bounds on the space complexity of streaming algorithms for the maximum matching problem. However, focusing on the number-in-hand simultaneous model allows us to benefit from a different construction of Ruzsa-Szemerédi graphs that are dense, hence bypassing the limitations of the aforementioned works, both in the approximation ratios they can handle and in the value of the space lower bound they can prove. We elaborate more on this in Section 2.3.

Finally, we note that Theorem 1 and Theorem 2 provide (essentially) tight bounds on the space complexity of any streaming algorithm for dynamic graph streams that only maintains a linear sketch during the stream. This makes progress on an open problem posed at the Bertinoro workshop on sublinear and streaming algorithms in 2014 [1], regarding the possibility of having a constant factor approximation to the maximum matching in space.

Recent related work.

Independently of and concurrently with our work, Konrad [29] has also studied the problem of designing linear sketches for approximating matchings in dynamic graph streams. Konrad’s work shows that an -approximation can be obtained using a linear sketch of size , and it establishes a lower bound of on the size of any linear sketch that yields an -approximation. Our approaches for establishing the lower bound on the sketch size are in the same spirit, though the techniques and constructions are quite different.

1.2 Organization

In Section 2, we introduce the key concepts and tools used in this paper. In particular, Section 2.1 describes $\ell_0$-samplers and how we use them in our algorithm; Section 2.2 formally defines the number-in-hand simultaneous model and its connection to linear sketches; and Section 2.3 defines Ruzsa-Szemerédi graphs and describes the specific construction used in our lower bound. In Section 3, we describe a single-pass streaming algorithm for the maximum matching problem in dynamic graph streams and prove Theorem 1. In Section 4, we present our lower bound construction and prove Theorem 2. Finally, we conclude in Section 5.

2 Preliminaries

2.1 $\ell_0$-Samplers

We use the following tool developed in the streaming literature.

Definition 2 ($\ell_0$-sampler [20]).

Let $0 < \delta < 1$ be a parameter. An $\ell_0$-sampler is an algorithm which, given access to a dynamic stream, returns FAIL with probability at most $\delta$, and otherwise outputs an element $i$ along with the frequency $x_i$, where $i$ is uniformly distributed among the non-zero entries of the frequency vector $x$.

We use $\ell_0$-samplers as follows: for the input graph $G(V, E)$, let $S \subseteq V$ be a subset of vertices, and suppose we maintain an $\ell_0$-sampler over the stream where only the edges between vertices in $S$ are considered. At the end of the stream, we can use the $\ell_0$-sampler to recover one edge between the vertices in $S$, if such an edge exists.

We use the following lemma in our algorithm, which implements $\ell_0$-samplers using linear sketches.

Lemma 2.1 ([25]).

For any $\delta > 0$, there is a linear sketching implementation of an $\ell_0$-sampler for the frequency vector $x \in \mathbb{R}^n$ with probability of success $1 - \delta$, using $O(\log^2 n \cdot \log(1/\delta))$ bits of space.
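As a rough illustration of how an $\ell_0$-sampler can be realised as a linear sketch, the following Python class combines geometric subsampling levels with 1-sparse recovery. This is a simplified sketch under our own assumptions (a pairwise-independent level hash, one polynomial fingerprint per level, a fixed number of repetitions) and is not the construction of [25]; in particular, its output is only approximately uniform. The usage example also shows the restriction to a vertex subset $S$ described above: updates to edges with an endpoint outside $S$ are simply not fed to the sampler, and edge $(u, v)$ is encoded as coordinate $u \cdot n + v$ (another assumption).

```python
import random

P = (1 << 61) - 1  # the Mersenne prime 2^61 - 1, used for hashing and fingerprints


class L0Sampler:
    """Illustrative linear-sketch l0-sampler over coordinates 0..N-1.

    Simplifying assumptions (ours, not the construction of [25]): each level
    subsamples coordinates with a pairwise-independent hash and keeps the
    1-sparse recovery triple (sum x_i, sum i*x_i, sum x_i*z^i mod P).
    Every counter is a linear function of x."""

    def __init__(self, N, reps=8, seed=0):
        rng = random.Random(seed)
        self.N = N
        self.levels = N.bit_length() + 1
        # Independent repetitions, each with its own hash (a, b) and fingerprint base z.
        self.params = [(rng.randrange(1, P), rng.randrange(P), rng.randrange(1, P))
                       for _ in range(reps)]
        self.counters = [[[0, 0, 0] for _ in range(self.levels)]
                         for _ in range(reps)]

    def _depth(self, a, b, i):
        """Deepest level containing coordinate i; level l keeps about a 2^-l fraction."""
        h = (a * i + b) % P
        depth = 0
        while depth + 1 < self.levels and h < (P >> (depth + 1)):
            depth += 1
        return depth

    def update(self, i, delta):
        """Stream update x_i += delta, applied to every level that keeps i."""
        for (a, b, z), levels in zip(self.params, self.counters):
            zi = pow(z, i, P)
            for lvl in range(self._depth(a, b, i) + 1):
                cnt = levels[lvl]
                cnt[0] += delta
                cnt[1] += delta * i
                cnt[2] = (cnt[2] + delta * zi) % P

    def sample(self):
        """Return (i, x_i) from the first level passing the 1-sparse test, else None (FAIL)."""
        for (_, _, z), levels in zip(self.params, self.counters):
            for s0, s1, fp in levels:
                if s0 == 0 or s1 % s0 != 0:
                    continue
                i = s1 // s0
                if 0 <= i < self.N and fp == (s0 * pow(z, i, P)) % P:
                    return i, s0
        return None


n, S = 6, {0, 1, 2}
sampler = L0Sampler(n * n, seed=1)
for u, v, d in [(0, 1, 1), (1, 2, 1), (3, 4, 1), (0, 1, -1), (0, 2, 1), (0, 2, -1)]:
    if u in S and v in S:  # only edges inside S reach this sampler
        sampler.update(u * n + v, d)
print(sampler.sample())  # (8, 1): the only surviving edge (1, 2), encoded as 1*6 + 2
```

Each repetition keeps $O(\log N)$ levels of three counters, and because every counter is linear in $x$, two such sketches built with the same seed could be added coordinate-wise.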

2.2 The Number-in-Hand Simultaneous Model

The number-in-hand simultaneous model is defined as follows. The input vector $x$ is partitioned adversarially between $k$ different players $P_1, \ldots, P_k$, where each player $P_i$ only sees the input $x^{(i)}$, and $x = \sum_{i=1}^{k} x^{(i)}$. All players have access to an infinite shared string of random bits, referred to as public coins. The goal for the players is to compute a function $f(x)$ by simultaneously sending a (possibly randomized, using only public randomness) message to a special party called the coordinator, according to a pre-specified protocol. For any input $x$, the coordinator is then required to output $f(x)$ with probability at least $1 - \delta$ over the randomness used in the protocol. We refer the reader to [31] for more information about communication complexity in general.

To prove our lower bound in Theorem 2, we consider the maximum matching problem in the number-in-hand simultaneous model, defined formally as follows. Each player $P_i$ is given a vector $x^{(i)}$ representing the edges of a graph $G_i(V, E_i)$, with $|V| = n$. Their goal is to approximate the maximum matching in the multi-graph $G(V, E_1 \cup \cdots \cup E_k)$, where $G$ is represented by the vector $x = \sum_{i=1}^{k} x^{(i)}$.

We should note that space lower bounds for single-pass streaming algorithms are usually obtained by proving communication complexity lower bounds in a different model of communication, namely the one-way communication model, in which player $P_1$ speaks to $P_2$, who speaks to $P_3$, etc., and finally $P_k$ outputs the answer. In this model, the maximum matching problem has a simple 2-approximation algorithm using $\tilde{O}(n)$ communication per player: each player extends the matching it receives to a maximal matching of all edges seen so far and sends it to the next player. Since we are looking for a space lower bound far larger than $\tilde{O}(n)$, the one-way model cannot lead to our lower bound in Theorem 2.
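The one-way protocol mentioned above can be sketched as follows: each player greedily extends the matching it received with its own edges and forwards the result. The final matching is maximal in the union graph, hence a 2-approximation, and every message has at most $n/2$ edges. The function name and input format are illustrative assumptions.

```python
def one_way_maximal_matching(player_edge_lists):
    """Players P_1, ..., P_k in order; each greedily extends the matching it received
    with its own edges and forwards the result (at most n/2 edges) to the next player."""
    matching, matched = [], set()
    for edges in player_edge_lists:
        for u, v in edges:
            if u != v and u not in matched and v not in matched:
                matching.append((u, v))
                matched.update((u, v))
    return matching  # maximal in the union graph, hence a 2-approximation


print(one_way_maximal_matching([[(0, 1), (1, 2)], [(2, 3), (0, 3)], [(4, 5)]]))
# [(0, 1), (2, 3), (4, 5)]
```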

The following proposition enables us to consider the simultaneous model instead of the one-way model in the proof of our space lower bound. This reduction is well known in the literature (see [32], for example).

Proposition 2.2.

Suppose there is a linear sketch of size $s$ bits for a function $f$, from which $f(x)$ can be computed with failure probability at most $\delta$; then for any $k \geq 1$, there exists a public-coin number-in-hand simultaneous protocol for $k$ players to compute $f(x)$, where each player communicates a message of size $s$ and the coordinator is able to compute $f(x)$ with failure probability at most $\delta$.

Proof.

The players use the public coins to construct the set of random coin tosses required to create the matrix $A$ in the linear sketch. Then, each player $P_i$ computes $A x^{(i)}$ and sends it to the coordinator. The coordinator can now compute $Ax$ for $x = \sum_{i} x^{(i)}$ by simply computing $\sum_{i} A x^{(i)}$, and then compute $f(x)$ from $Ax$.
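The proof of Proposition 2.2 can be simulated directly. In the hypothetical Python sketch below, every player regenerates the same matrix $A$ from a shared seed (standing in for the public coins), sends only $A x^{(i)}$, and the coordinator sums the messages; linearity guarantees that the sum equals $Ax$. The small integer matrix and the helper names are assumptions made for illustration.

```python
import random


def public_coin_matrix(s, N, seed):
    """Every party derives the same s x N matrix A from the shared seed (public coins)."""
    rng = random.Random(seed)
    return [[rng.randrange(-3, 4) for _ in range(N)] for _ in range(s)]


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def simultaneous_protocol(player_inputs, s, seed=42):
    """Each player P_i sends A x^(i); the coordinator returns sum_i A x^(i) = A x."""
    N = len(player_inputs[0])
    messages = [mat_vec(public_coin_matrix(s, N, seed), x_i) for x_i in player_inputs]
    return [sum(entries) for entries in zip(*messages)]


inputs = [[1, 0, 2, 0], [0, 1, 0, 0], [1, 1, 0, 3]]  # x = [2, 2, 2, 3]
A = public_coin_matrix(5, 4, 42)
assert simultaneous_protocol(inputs, s=5) == mat_vec(A, [2, 2, 2, 3])
```

The coordinator would then apply the sketch's own decoding function to this vector, exactly as the streaming algorithm would at the end of the stream.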

2.3 Ruzsa-Szemerédi graphs

Given an undirected graph $G(V, E)$ and a set of edges $M \subseteq E$, we denote by $V(M)$ the set of vertices which are incident on at least one edge in $M$. Moreover, we denote by $E(V(M))$ the set of edges induced by $V(M)$, i.e., $E(V(M)) = \{(u, v) \in E \mid u, v \in V(M)\}$. $M$ is said to be an induced matching if no two edges in $M$ share an endpoint and $E(V(M)) = M$.

Definition 3 (Ruzsa-Szemerédi graph).

We call a graph $G(V, E)$ an $(r,t)$-Ruzsa-Szemerédi graph, $(r,t)$-RS graph for short, if the set of edges in $G$ consists of $t$ pairwise disjoint induced matchings $M_1, \ldots, M_t$, each of size $r$.
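These definitions are easy to check mechanically. The following Python helpers (names are our own) verify that a matching is induced and that a given edge set is the disjoint union of $t$ induced matchings of size $r$. The usage example is a toy $(2,2)$-RS graph on six vertices that we constructed for illustration.

```python
def vertices_of(M):
    """V(M): all endpoints of the edges in M."""
    return {v for e in M for v in e}


def is_induced_matching(E, M):
    """True iff no two edges of M share an endpoint and E(V(M)) = M."""
    E = {frozenset(e) for e in E}
    M = {frozenset(e) for e in M}
    VM = vertices_of(M)
    if len(VM) != 2 * len(M):  # some endpoint is shared
        return False
    return {e for e in E if e <= VM} == M  # edges of E induced by V(M)


def is_rs_graph(E, matchings, r):
    """True iff E is the disjoint union of the given matchings, each an induced matching of size r."""
    E = {frozenset(e) for e in E}
    parts = [{frozenset(e) for e in M} for M in matchings]
    if any(len(M) != r for M in parts):
        return False
    if set().union(*parts) != E or sum(len(M) for M in parts) != len(E):
        return False  # matchings overlap or miss some edge
    return all(is_induced_matching(E, M) for M in parts)


M1, M2 = [(0, 1), (2, 3)], [(0, 4), (2, 5)]
print(is_rs_graph(M1 + M2, [M1, M2], r=2))  # True: a toy (2,2)-RS graph
```

Adding the edge $(1, 2)$ would break the example, since it lies inside $V(M_1)$ and so $M_1$ would no longer be induced.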

In general, graphs of this type are of interest when $r$ and $t$ are relatively large as a function of the number of vertices in the graph. The first construction of an $(r,t)$-RS graph was given by Ruzsa and Szemerédi [37] with parameters and . By now, there are several known constructions of these graphs with different ranges of the parameters $r$ and $t$ [8, 11, 19] (see [8] for more information). In particular, Fischer et al. [19] introduced a construction with parameters and . This construction was further used and improved by [21, 26] to obtain their aforementioned lower bound of $n^{1+\Omega(1/\log\log n)}$ on the space complexity of streaming algorithms for the maximum matching problem in insertion-only streams.

We use the construction of $(r,t)$-RS graphs given by Alon et al. [8], which is summarized in the following theorem.

Theorem 3 ([8]).

For any sufficiently large , there exists an $(r,t)$-RS graph on vertices with and .

3 An $n^{\varepsilon}$-approximation using $\tilde{O}(n^{2-3\varepsilon})$ space

In this section, we present our algorithm for computing an approximate maximum matching in the dynamic graph streams and prove the following theorem.

Theorem 4.

There is a single-pass randomized streaming algorithm that takes as input a parameter $\varepsilon > 0$ and a bipartite graph $G$ with $n$ vertices specified by a dynamic graph stream, uses $\tilde{O}(n^{2-3\varepsilon})$ bits of space, and with high probability outputs a matching of size $\Omega(\mathrm{opt}/n^{\varepsilon})$, where opt is the size of a maximum matching in $G$.

In the following, whenever we use $\ell_0$-samplers, we always apply Lemma 2.1 with parameter $\delta = 1/\mathrm{poly}(n)$. Since the number of $\ell_0$-samplers used by our algorithm is polynomially bounded in $n$, by a union bound, with high probability none of them fails. In the rest of this section, we always assume this is the case for all $\ell_0$-samplers we use, and we do not explicitly account for the failure probability of the $\ell_0$-samplers in our proofs.

For simplicity, we assume that the algorithm is provided with a value that is a constant-factor approximation of opt, i.e., of the size of a maximum matching in $G$. This is without loss of generality, since we can run our algorithm for $O(\log n)$ different estimates of opt (e.g., all powers of 2) in parallel and output the largest matching among the matchings found for all estimates. In addition, we can assume $\mathrm{opt} \geq n^{\varepsilon}$, since otherwise a single edge is an $n^{\varepsilon}$-approximation of the maximum matching, which can be obtained by maintaining an $\ell_0$-sampler over all edges in the graph.

Algorithm 1 A single-pass dynamic streaming algorithm for the maximum matching problem.
Input: A bipartite graph with vertices on each side, specified by a dynamic graph stream, a parameter , and a -approximation to the size of a maximum matching in as .
Output: A matching with size .
Pre-processing:
  1. Let , , and .

  2. Create two collections and , each containing sets (called groups). Create two -wise independent hash functions and . Assign each vertex (resp. ) to the group (resp. ).

  3. For each , assign groups in to chosen independently and uniformly at random with replacement. For each assigned to , we say is an active partner of and form an active pair.

Streaming updates:
  1. For each and each of its active partners , maintain an $\ell_0$-sampler over the edges between the vertices in and .

Post-processing:
  1. Sample one edge from each maintained $\ell_0$-sampler and compute a maximum matching over the sampled edges.
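To see the moving parts of Algorithm 1 together, the following offline Python simulation runs it on a static bipartite graph. Several simplifications are our own assumptions: the number of groups `g` and of active partners `d` are free parameters (we do not reproduce the paper's settings), fully random group assignment replaces the limited-independence hash functions, and each $\ell_0$-sampler is replaced by a uniform choice among the edges present between its two groups at the end of the stream. Post-processing uses the textbook augmenting-path (Kuhn) bipartite matching algorithm.

```python
import random
from collections import defaultdict


def max_bipartite_matching(edges):
    """Kuhn's augmenting-path algorithm on (left, right) edge pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    match_of_right = {}

    def try_augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in match_of_right or try_augment(match_of_right[v], seen):
                    match_of_right[v] = u
                    return True
        return False

    for u in list(adj):
        try_augment(u, set())
    return {(u, v) for v, u in match_of_right.items()}


def simulate_algorithm1(left, right, edges, g, d, seed=0):
    """Offline stand-in for Algorithm 1 with hypothetical parameters g and d."""
    rng = random.Random(seed)
    group_L = {u: rng.randrange(g) for u in left}   # stands in for the hash on L
    group_R = {v: rng.randrange(g) for v in right}  # stands in for the hash on R
    # Each left group picks d right groups uniformly with replacement: the active pairs.
    active = {(a, rng.randrange(g)) for a in range(g) for _ in range(d)}
    # What each l0-sampler would see at the end of the stream: edges between its two groups.
    buckets = defaultdict(list)
    for u, v in edges:
        pair = (group_L[u], group_R[v])
        if pair in active:
            buckets[pair].append((u, v))
    sampled = [rng.choice(es) for es in buckets.values()]  # one edge per nonempty sampler
    return max_bipartite_matching(sampled)  # post-processing


left, right = list(range(20)), list(range(20, 40))
edges = [(i, i + 20) for i in range(20)] + [(i, (i + 1) % 20 + 20) for i in range(20)]
M = simulate_algorithm1(left, right, edges, g=5, d=3, seed=7)
print(len(M), sorted(M))
```

With a single group on each side (`g=1`, `d=1`) the simulation degenerates to one sampler and returns a single edge, which mirrors the small-opt fallback discussed above.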

The space complexity of Algorithm 1 is easy to verify. The algorithm stores two -wise independent hash functions and to assign vertices to their groups, which requires bits of space [35]. truly random bits are needed for identifying the active partners of each group in , and $\ell_0$-samplers are maintained for the active pairs during the stream, where each of them requires bits of space (Lemma 2.1). Hence, the total space complexity of the algorithm is:

where the last equality is by choice of .
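The limited-independence hash functions mentioned above can be realised with the classical construction of a random polynomial of degree $k-1$ over a prime field, whose values at any $k$ distinct points are independent and uniform; storing it takes $k$ coefficients, i.e., $O(k \log n)$ bits when $p = \mathrm{poly}(n)$. The Python class below is a minimal sketch of that construction, not necessarily the one of [35]. Reducing the field element modulo the number of groups `m` shifts each residue probability by at most $1/p$, which we accept for illustration. Parameter names are hypothetical.

```python
import random


class KWiseHash:
    """h(x) = ((c_0 + c_1 x + ... + c_{k-1} x^{k-1}) mod p) mod m with random c_j.

    Before the final 'mod m', the values at any k distinct points x < p are
    independent and uniform over F_p; 'mod m' shifts each probability by at most 1/p."""

    def __init__(self, k, m, p=(1 << 61) - 1, seed=None):
        rng = random.Random(seed)
        self.p, self.m = p, m
        self.coeffs = [rng.randrange(p) for _ in range(k)]  # k numbers of O(log p) bits

    def __call__(self, x):
        acc = 0
        for c in reversed(self.coeffs):  # Horner's rule, highest degree first
            acc = (acc * x + c) % self.p
        return acc % self.m


# Hypothetical use: assign each left vertex to one of m = 8 groups.
h_left = KWiseHash(k=4, m=8, seed=1)
groups = {u: h_left(u) for u in range(20)}
```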

We now prove the correctness of the algorithm. Fix a maximum matching in with size opt. The following concentration bound ensures that each group in and contains vertices of the maximum matching .

Claim 3.1 ([38]).

If is sum of -wise independent random variables taking values in , and , then:

For simplicity, in the following, we assume every group has exactly vertices of . For any group (resp. ), we refer to the edges in that are incident on (resp. ) as the matching edges of this group. Since is a matching, the number of matching edges of each group is also .

We say a pair is matchable by if and share at least one matching edge. The general idea of the proof is to show that among all active pairs, there is a subset of active pairs with the following two properties:

  1. Each pair is matchable by .

  2. No two pairs in share the same endpoints or .

Intuitively, properties (i) and (ii) together ensure that there exists a “matching” between the groups in and of size . Since we maintain an $\ell_0$-sampler for each active pair in , and each matchable active pair contains at least one edge in , the $\ell_0$-samplers for the matchable active pairs will return edges, which will form a matching of size in graph .

To prove the existence of such a set , we start by arguing that there are groups in such that (essentially) matching edges of are incident on distinct groups . Consequently, when the algorithm randomly assigns with groups in , since , with high probability, at least one of the active pairs is matchable by . This ensures that we have matchable active pairs where all ’s are distinct. Finally, we show that a constant fraction of these matchable active pairs also have distinct ’s, with a constant probability, proving property (ii).

We now provide the formal proof. To continue, we need the following definitions. We say a group is spanning if the matching edges of are incident on at least different . We say that preserves an edge in if belongs to at least one matchable active pair.

Lemma 3.2.

With probability at least , every spanning preserves an edge in .

Proof.

We argue that if is spanning, then preserves an edge in with probability at least . Then, by applying union bound over all spanning , with probability , every spanning preserves an edge in .

For any spanning , there are different ’s such that contains an edge between and , i.e., is matchable by . Recall that is assigned with groups in uniformly at random.

If different ’s are matchable with by , assigning random groups in to suffices to ensure that with probability at least , preserves an edge in .

If different ’s are matchable with by , the probability that a spanning does not preserve any edge in is at most

 
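The calculation behind this case analysis is the standard one: if a group has $s$ matchable partners among $g$ groups and picks $d$ partners uniformly with replacement, it misses all of them with probability $(1 - s/g)^d \le e^{-sd/g}$. The Python helpers below evaluate this exactly and compute how many picks push the miss probability below a target $\eta$. The names $g$, $s$, $d$ and $\eta$ are our own, since the paper's parameter values are not reproduced here.

```python
import math


def miss_probability(g, s, d):
    """Exact probability that d uniform picks (with replacement) among g groups
    all avoid a fixed set of s matchable groups."""
    return (1 - s / g) ** d


def miss_upper_bound(g, s, d):
    """(1 - s/g)^d <= exp(-s d / g), using 1 - y <= e^{-y}."""
    return math.exp(-s * d / g)


def picks_needed(g, s, eta):
    """Smallest d for which the exponential bound guarantees miss probability <= eta."""
    return math.ceil((g / s) * math.log(1 / eta))


d = picks_needed(1000, 50, 0.01)  # 93 picks
print(d, miss_probability(1000, 50, d))
```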

Lemma 3.3.

With a constant probability, at least of the ’s are spanning.

Proof.

We use the following simple balls and bins argument (see Appendix A.1 for a proof).

Claim 3.4.

Suppose we assign balls to bins independently and uniformly at random. With probability at least , the number of non-empty bins is at least .

Fix an . Consider each as a bin and each matching edge of as a ball. An edge (), i.e., a ball, is assigned to the bin iff the group assigned to vertex is . The number of balls here is and since we use a -wise independent hash function () to assign the balls to the bins, all these balls are assigned independently. By Claim 3.4, at least different ’s have edges in that are incident on and (hence is spanning), with probability at least . By Markov's inequality, with a constant probability, at least ’s are spanning.
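The balls-and-bins quantity in Claim 3.4 is easy to explore numerically. With $b$ balls thrown independently into $m$ bins, the expected number of non-empty bins is $m(1 - (1 - 1/m)^b)$, which is at least $(1 - 1/e)\min(b, m)$ by concavity of $1 - e^{-x}$. The Python sketch below compares this formula against a Monte Carlo estimate; it illustrates the expectation only and does not reproduce the exact bound or failure probability of Claim 3.4. Function names are our own.

```python
import math
import random


def expected_nonempty(b, m):
    """E[# non-empty bins] when b balls land independently and uniformly in m bins."""
    return m * (1 - (1 - 1 / m) ** b)


def concavity_lower_bound(b, m):
    """(1 - 1/e) * min(b, m), a lower bound on expected_nonempty(b, m)."""
    return (1 - 1 / math.e) * min(b, m)


def simulate_nonempty(b, m, trials=2000, seed=0):
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    return sum(len({rng.randrange(m) for _ in range(b)}) for _ in range(trials)) / trials


print(expected_nonempty(50, 100), simulate_nonempty(50, 100))
```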

Lemma 3.5.

With a constant probability, groups in are active partners of distinct spanning , such that and are matchable by .

Proof.

Suppose each spanning , when picking the ’s, only keeps the first where and are matchable by (picking more can only increase the size of the final matching). We only need to show that the number of distinct ’s that are kept by ’s is .

Suppose ; the other case when is an easy case since each spanning is matchable with fraction of the groups in . By Lemma 3.3, there are spanning ’s with high probability. Therefore, there are edges in incident on all the spanning ’s; we denote these edges of by . Since each group in has matching edge, and , it must be that at least groups in contain at least vertices incident on ; otherwise, the total number of edges incident on is less than

Let be the set of all these groups in . Conditioned on the event that preserves an edge in , for each of the groups that are matchable with by (there are at most such groups), the probability that is kept by is at least . Therefore, for each of these groups, the probability that is assigned to any spanning is at most

Hence the expected number of groups in that are not an active partner of any spanning is at most . By Markov's inequality, with a constant probability, different will be kept by some spanning .

Note that the probability of success can be boosted to any constant by allowing to repeatedly pick groups from as active partners for a constant number of times.       

Proof.

(Theorem 4) By Lemma 3.5, groups in will be assigned to distinct spanning groups in ; moreover, every such pair is matchable by . Since all these pairs are matchable by , there exists at least one edge between each of these pairs. By picking one edge for each of these pairs, using the $\ell_0$-sampler between these active pairs, we obtain a matching of size . Therefore, in the post-processing step, the algorithm can find a matching of size .

4 An $n^{2-3\varepsilon-o(1)}$ lower bound for $n^{\varepsilon}$-approximation

In this section, we provide our lower bound result for approximating the maximum matching using linear sketches. As stated in Section 2.2, we only need to prove the lower bound for the number-in-hand simultaneous model; the rest follows from Proposition 2.2.

Theorem 5.

There exists a constant $\varepsilon_0 > 0$ such that for any $0 < \varepsilon < \varepsilon_0$, any protocol for approximating the maximum matching to within a factor of $n^{\varepsilon}$ on every graph with $n$ vertices, in the number-in-hand simultaneous model with $k$ players, has to communicate $n^{2-3\varepsilon-o(1)}$ bits from at least one player.

Note that though we state Theorem 5 for general graphs, the reduction mentioned after Theorem 1 implies the same lower bound for bipartite graphs.

By Yao’s principle, it is enough to prove the lower bound on the communication complexity of deterministic protocols on some fixed distribution on the inputs (known to the players). We provide the following distribution as a hard input distribution for every deterministic protocol.

The hard input distribution (for any and any sufficiently large integer )
  • Parameters: :


  • For each player () independently,

    1. Create a set of vertices and construct an arbitrary $(r,t)$-RS graph over .

    2. Pick uniformly at random and let be the set of vertices matched in the induced matching .

    3. For each of the induced matchings, drop half of the edges uniformly at random.


  • Pick a random permutation of . For every player , let the label of to be for every and let the label of to be for . Note that the vertices with the same label correspond to the same vertex in the final graph.
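The following Python function is a hypothetical sampler for this distribution, given an explicit $(r,t)$-RS graph (as a list of $t$ induced matchings on vertices $0, \ldots, N-1$) and a number of players $k$. The concrete labelling rule is our own choice, picked only to be consistent with the remarks that follow: each player's shared vertices, in sorted order, take the first $N - 2r$ entries of the permutation, and its private vertices take a block of $2r$ entries reserved for that player. Private vertices therefore get unique labels and shared vertices reuse a common set of labels, for a total of $(N - 2r) + 2rk$ labels.

```python
import random


def sample_hard_instance(N, matchings, k, seed=0):
    """Hypothetical sampler. `matchings` lists the t induced matchings (size r each)
    of an (r,t)-RS graph on vertices 0..N-1. Returns (n, per-player labelled edges)."""
    rng = random.Random(seed)
    r = len(matchings[0])
    n = (N - 2 * r) + 2 * r * k  # shared labels + k blocks of private labels
    sigma = list(range(n))
    rng.shuffle(sigma)  # the random permutation of labels
    players = []
    for i in range(k):
        j_star = rng.randrange(len(matchings))  # this player's private matching
        private = sorted({v for e in matchings[j_star] for v in e})
        private_set = set(private)
        shared = [v for v in range(N) if v not in private_set]
        label = {v: sigma[pos] for pos, v in enumerate(shared)}  # common label set
        base = (N - 2 * r) + 2 * r * i  # label block reserved for player i
        label.update({v: sigma[base + q] for q, v in enumerate(private)})
        kept = [e for M in matchings for e in rng.sample(M, len(M) // 2)]  # drop half
        players.append([(label[u], label[v]) for u, v in kept])
    return n, players


M1, M2 = [(0, 1), (2, 3)], [(0, 4), (2, 5)]  # toy (2,2)-RS graph on 6 vertices
n, players = sample_hard_instance(6, [M1, M2], k=3, seed=5)
print(n, players)
```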

Several remarks are in order. First, one can easily verify the following relation between the parameters,

Second, for the choice of the parameters and , by Theorem 3, such an $(r,t)$-RS graph with vertices indeed exists. Moreover, note that the vertices in for all players are assigned unique labels, while the vertices in are assigned the same set of labels. Consequently, the final graph is a multi-graph with vertices and total number of edges (counting the multiplicities). We now briefly describe the intuition behind this distribution.

Each player is given an $(r,t)$-RS graph with half of the edges discarded uniformly at random from each of the induced matchings. Moreover, only a single induced matching is “private”, and the vertices that are not incident on this matching are shared among all players. In addition, the identities of the private matching and of the shared vertices are unknown to the players. Intuitively, for any deterministic protocol over this distribution, every player has to send enough information for the coordinator to recover a large fraction of the edges from every induced matching; otherwise, the coordinator will not be guaranteed to recover a large enough matching. We now make this intuition formal.

We say a vertex is good if it belongs to some for . We say a matching is trivial if the total number of good vertices matched in is at most .

Claim 4.1.

Let be a maximum matching in and be any trivial matching, then

Proof.

Since is a maximum matching, it contains at least edges (just using the induced matching between the good vertices of each player). On the other hand, since is a trivial matching, its size is at most the number of vertices shared by all players plus the number of good vertices matched in , which is at most . Since ,

 

Our goal is to prove that in any protocol in which each player transmits a “small-size” message, the expected number of good vertices matched by the final matching is small. In other words, the coordinator would only be able to recover a trivial matching.

Recall that is the graph given to the player and . With a slight abuse of notation, we refer to the induced subgraph of that is obtained by removing all isolated vertices as the graph itself, since this graph is effectively the real input to the player . Moreover, note that picking the permutation ensures that the labels of the vertices in are chosen uniformly at random from and hence reveal no extra information to the player . Let be the set of all possible graphs that can be. Since the edges of are obtained by dropping half of the edges uniformly at random from each induced matching of an $(r,t)$-RS graph, . Moreover, in the input distribution, is chosen from uniformly at random.
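As a sanity check on the size of this input space, suppose one counts only which half of each induced matching survives (our reading of the counting step above). Then each of the $t$ matchings contributes $\binom{r}{r/2}$ choices, so the input carries $t \log_2 \binom{r}{r/2} = rt - O(t \log r)$ bits of entropy. The helper below (name hypothetical) computes this for even $r$.

```python
from math import comb, log2


def log2_num_inputs(r, t):
    """log2 of binom(r, r/2)^t for even r: each of the t induced matchings keeps one of
    binom(r, r/2) possible halves, independently."""
    if r % 2:
        raise ValueError("r must be even for 'half of the edges' to be exact")
    return t * log2(comb(r, r // 2))


print(log2_num_inputs(100, 10))  # about 963 bits, close to r*t = 1000
```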

For any subset , we define the graph as the intersection graph of all graphs in , i.e., an edge belongs to the graph iff it belongs to every graph in .

Lemma 4.2.

For any , any subset , and any integer , let be the set of indices such that for any , contains at least edges from the -th induced matching; if , then .

Proof.

Let ; we can upper bound the size of as follows:

Therefore, implies ; a contradiction.       

Lemma 4.3.

Suppose for each , the player sends a message of size at most

bits to the coordinator; then, the expected number of good vertices that are matched in the matching computed by the coordinator is at most .

Proof.

Fix an index and a player . Let denote the random variable counting the number of good vertices that are matched by the coordinator from the graph provided to the player . In the following, we prove that

(1)

Having this, for , by linearity of expectation, we have , implying that the expected number of good vertices matched by the coordinator is at most .

Suppose the coordinator knows all inputs to the players except for player , i.e., the graph . Note that this is the maximum information the coordinator can obtain from other players. Define as the deterministic mapping used by the player to map the input graph to a -bit message and send it to the coordinator. Define the function such that for any , .

The important observation is that since the protocol is deterministic, the coordinator can output an edge as a matching edge for the player , only if is part of every graph in . We define to be the event that for the graph , .
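The observation above can be illustrated on a toy example: enumerate every input a player could hold, group the inputs by the message a deterministic player would send, and intersect each group. Only edges in that intersection can be output safely. The Python sketch below does this for the toy $(2,2)$-RS graph with matchings $\{(0,1),(2,3)\}$ and $\{(0,4),(2,5)\}$ and a hypothetical one-bit message function; it illustrates the intersection-graph idea and is not part of the proof.

```python
from collections import defaultdict
from itertools import combinations, product


def all_inputs(matchings):
    """Every graph obtained by keeping exactly half of each induced matching."""
    halves = [list(combinations(M, len(M) // 2)) for M in matchings]
    return [frozenset(e for half in choice for e in half) for choice in product(*halves)]


def certifiable_edges(matchings, message):
    """Group inputs by message value and intersect each group: a deterministic
    coordinator can only safely report edges that lie in every input of the group."""
    classes = defaultdict(list)
    for G in all_inputs(matchings):
        classes[message(G)].append(G)
    return {m: frozenset.intersection(*gs) for m, gs in classes.items()}


def one_bit(G):
    """Hypothetical 1-bit message: is edge (0, 1) present?"""
    return int((0, 1) in G)


M1, M2 = [(0, 1), (2, 3)], [(0, 4), (2, 5)]
print(certifiable_edges([M1, M2], one_bit))
```

Each message class certifies exactly one edge of the first matching and none of the second, in line with the intuition that a short message can only reveal a few edges.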

The following claim can be proven using a simple counting argument (see Appendix A.2 for a proof).

Claim 4.4.

For any , .

We can write the expected value of as,

(2)

By Claim 4.4, the first term in this equation is less than . For simplicity, we neglect this additive value of in Equation (2). We now bound the second term. We have:

(3)

We can now compute for any as follows. Let ; the event implies that . By Lemma 4.2, for defined as in the lemma statement, . In the input distribution, is chosen from uniformly at random. Therefore, the probability that is at most . Hence,

(4)

By plugging in inequality (4) in (3) we obtain,

This proves inequality (1), i.e., .

Proof.

(Theorem 5) By Lemma 4.3, if no player communicates a message of size bits, then the expected number of good vertices matched in the matching output by the coordinator is , and hence, by Markov's inequality, the output matching is a trivial matching with probability . By Claim 4.1, any trivial matching is at most an -approximation to the maximum matching.

Since , , , and (by Theorem 3), any simultaneous protocol that obtains a better than -approximation to the maximum matching with constant probability requires at least one player to communicate bits.

 

5 Conclusions

In this paper, we resolved the space complexity of linear sketches for approximating the maximum matching problem in dynamic graph streams. In particular, for approximating the maximum matching to within a factor of , we proved that bits of space are both sufficient and necessary for single-pass streaming algorithms that only maintain a linear sketch of the stream.

Our result suggests that achieving a better upper bound for the maximum matching problem requires a new set of techniques. Alternatively, it might be the case that any algorithm for dynamic graph streams can be implemented as a linear sketch (similar to the equivalence between linear sketches and single-pass turnstile algorithms [32]). As noted earlier, to the best of our knowledge, every known single-pass streaming algorithm for general dynamic graph streams is indeed of this form (i.e., only maintains a linear sketch). In that case, our bounds would characterize the power of any single-pass streaming algorithm for the maximum matching problem in dynamic graph streams.
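To make "maintains a linear sketch" concrete: such an algorithm fixes a (random) matrix A before the stream starts, views the stream as updates to the vector x of edge multiplicities, and stores only A·x. The sketch below uses a random ±1 matrix purely for illustration; it is not the sketch used in our upper bound, and the class name `LinearEdgeSketch` is hypothetical. Linearity is what makes deletions free: inserting and then deleting an edge leaves the stored vector unchanged, and the sketch of a concatenated stream is the sum of the sketches of its parts.

```python
import random

class LinearEdgeSketch:
    """Hypothetical illustration: store A @ x, where x[e] is the current
    multiplicity of edge e and A is a random +/-1 matrix fixed in advance."""

    def __init__(self, n, rows, seed=0):
        rng = random.Random(seed)
        pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
        self.column = {e: j for j, e in enumerate(pairs)}
        self.A = [[rng.choice((-1, 1)) for _ in pairs] for _ in range(rows)]
        self.sketch = [0] * rows

    def update(self, u, v, delta):
        """delta = +1 inserts edge {u, v}, delta = -1 deletes it: sketch += delta * A[:, e]."""
        j = self.column[(min(u, v), max(u, v))]
        for r, row in enumerate(self.A):
            self.sketch[r] += delta * row[j]

# Insert (0,1) and (1,2), then delete (0,1): same state as inserting only (1,2).
s1 = LinearEdgeSketch(5, 8, seed=1)
for u, v, d in [(0, 1, +1), (1, 2, +1), (1, 0, -1)]:
    s1.update(u, v, d)
s2 = LinearEdgeSketch(5, 8, seed=1)
s2.update(2, 1, +1)
print(s1.sketch == s2.sketch)  # True
```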

Acknowledgments

We would like to thank Michael Kapralov and David Woodruff for helpful discussions.

Appendix A Omitted Proofs

A.1 Omitted Proofs from Lemma 3.3

Claim.

Suppose we assign balls to bins independently and uniformly at random. With probability at least , the number of non-empty bins is at least .

Proof.

For each bin, the probability that the bin is empty is at most,

We consider two cases. If ,

Hence the expected number of empty bins is at most , and by Markov's inequality, with probability at least , the number of empty bins is at most .

If , since for ,

Hence the probability that a bin is non-empty is at least , and the expected number of non-empty bins is at least . Since the events that different bins are non-empty are negatively correlated, the extended Chernoff bound implies that, with probability at least , the number of non-empty bins is at least .

Hence, overall, the number of non-empty bins is at least with probability at least .
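The claim can be sanity-checked numerically. Writing m for the number of balls and n for the number of bins (our notation), each bin is empty with probability exactly (1 - 1/n)^m, which is at most e^(-m/n), so the expected number of non-empty bins is n(1 - (1 - 1/n)^m). The Python sketch below (hypothetical helper names) compares this exact expectation with a Monte Carlo simulation; it illustrates the quantities in the proof rather than the specific thresholds of the two cases.

```python
import math
import random

def expected_nonempty(m, n):
    """Exact expected number of non-empty bins: each bin is empty w.p. (1 - 1/n)**m."""
    return n * (1 - (1 - 1 / n) ** m)

def simulate_nonempty(m, n, rng):
    """Throw m balls independently and uniformly into n bins; count non-empty bins."""
    return len({rng.randrange(n) for _ in range(m)})

rng = random.Random(0)
m, n = 100, 100
avg = sum(simulate_nonempty(m, n, rng) for _ in range(1000)) / 1000
print(round(expected_nonempty(m, n), 2), round(avg, 2))
print((1 - 1 / n) ** m <= math.exp(-m / n))  # True
```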

A.2 Omitted Proofs from Lemma 4.3

Claim.

For any , .

Proof.

Let be the output of the function , and with slight abuse of notation, we let for some such that . We say is light iff . We have

 

Footnotes

  1. To the best of our knowledge, the only exception is the recent paper [12], which considers a promise problem in dynamic graph streams. However, it is worth mentioning that for the non-promise version of the problem, the algorithm given in the same work can again be viewed as a linear sketching algorithm.
  2. We emphasize that the result in [32] is proven for the turnstile model rather than the strict turnstile model.
  3. One can simply substitute in the following equations instead of and obtain the same result with a slight change in the constants.

References

  1. Bertinoro workshop 2014, problem 64. http://sublinear.info/index.php?title=Open_Problems:64. Accessed: 2015-05-1.
  2. Agarwal, P. K., Cormode, G., Huang, Z., Phillips, J. M., Wei, Z., and Yi, K. Mergeable summaries. ACM Trans. Database Syst. 38, 4 (2013), 26.
  3. Ahn, K. J., and Guha, S. Access to data and number of iterations: Dual primal algorithms for maximum matching under resource constraints. CoRR abs/1307.4359 (2013).
  4. Ahn, K. J., and Guha, S. Linear programming in the semi-streaming model with application to the maximum matching problem. Inf. Comput. 222 (2013), 59–79.
  5. Ahn, K. J., Guha, S., and McGregor, A. Analyzing graph structure via linear measurements. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms (2012), SODA ’12, SIAM, pp. 459–467.
  6. Ahn, K. J., Guha, S., and McGregor, A. Graph sketches: sparsification, spanners, and subgraphs. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24, 2012 (2012), pp. 5–14.
  7. Alon, N., Matias, Y., and Szegedy, M. The space complexity of approximating the frequency moments. In STOC (1996), ACM, pp. 20–29.
  8. Alon, N., Moitra, A., and Sudakov, B. Nearly complete graphs decomposable into large induced matchings and their applications. In Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19 - 22, 2012 (2012), pp. 1079–1090.
  9. Alon, N., Nisan, N., Raz, R., and Weinstein, O. Welfare maximization with limited interaction. Electronic Colloquium on Computational Complexity (ECCC) 22 (2015), 54.
  10. Andoni, A., Nguyên, H. L., Polyanskiy, Y., and Wu, Y. Tight lower bound for linear sketches of moments. In Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Riga, Latvia, July 8-12, 2013, Proceedings, Part I (2013), pp. 25–32.
  11. Birk, Y., Linial, N., and Meshulam, R. On the uniform-traffic capacity of single-hop interconnections employing shared directional multichannels. IEEE Transactions on Information Theory 39, 1 (1993), 186–191.
  12. Chitnis, R. H., Cormode, G., Hajiaghayi, M. T., and Monemizadeh, M. Parameterized streaming: Maximal matching and vertex cover. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015 (2015), pp. 1234–1251.
  13. Crouch, M., and Stubbs, D. S. Improved streaming algorithms for weighted matching, via unweighted matching. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2014, September 4-6, 2014, Barcelona, Spain (2014), pp. 96–104.
  14. Dobzinski, S., Nisan, N., and Oren, S. Economic efficiency requires interaction. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014 (2014), pp. 233–242.
  15. Eggert, S., Kliemann, L., and Srivastav, A. Bipartite graph matchings in the semi-streaming model. In Algorithms - ESA 2009, 17th Annual European Symposium, Copenhagen, Denmark, September 7-9, 2009. Proceedings (2009), pp. 492–503.
  16. Epstein, L., Levin, A., Mestre, J., and Segev, D. Improved approximation guarantees for weighted matching in the semi-streaming model. SIAM J. Discrete Math. 25, 3 (2011), 1251–1265.
  17. Esfandiari, H., Hajiaghayi, M. T., Liaghat, V., Monemizadeh, M., and Onak, K. Streaming algorithms for estimating the matching size in planar graphs and beyond. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015 (2015), pp. 1217–1233.
  18. Feigenbaum, J., Kannan, S., McGregor, A., Suri, S., and Zhang, J. On graph problems in a semi-streaming model. Theor. Comput. Sci. 348, 2-3 (2005), 207–216.
  19. Fischer, E., Lehman, E., Newman, I., Raskhodnikova, S., Rubinfeld, R., and Samorodnitsky, A. Monotonicity testing over general poset domains. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada (2002), pp. 474–483.
  20. Frahling, G., Indyk, P., and Sohler, C. Sampling in dynamic data streams and applications. International Journal of Computational Geometry & Applications 18, 01n02 (2008), 3–28.
  21. Goel, A., Kapralov, M., and Khanna, S. On the communication and streaming complexity of maximum bipartite matching. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms (2012), SODA ’12, SIAM, pp. 468–485.
  22. Guruswami, V., and Onak, K. Superlinear lower bounds for multipass graph processing. In Proceedings of the 28th Conference on Computational Complexity, CCC 2013, Palo Alto, California, USA, 5-7 June, 2013 (2013), pp. 287–298.
  23. Hardt, M., and Woodruff, D. P. How robust are linear sketches to adaptive inputs? In Symposium on Theory of Computing Conference, STOC’13, Palo Alto, CA, USA, June 1-4, 2013 (2013), pp. 121–130.
  24. Huang, Z., Radunovic, B., Vojnovic, M., and Zhang, Q. Communication complexity of approximate matching in distributed graphs. In 32nd International Symposium on Theoretical Aspects of Computer Science, STACS 2015, March 4-7, 2015, Garching, Germany (2015), pp. 460–473.
  25. Jowhari, H., Sağlam, M., and Tardos, G. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (2011), ACM, pp. 49–58.
  26. Kapralov, M. Better bounds for matchings in the streaming model. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013 (2013), pp. 1679–1697.
  27. Kapralov, M., Khanna, S., and Sudan, M. Approximating matching size from random streams. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014 (2014), pp. 734–751.
  28. Kapralov, M., Lee, Y. T., Musco, C., Musco, C., and Sidford, A. Single pass spectral sparsification in dynamic streams. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014 (2014), pp. 561–570.
  29. Konrad, C. Maximum matching in turnstile streams. Manuscript, May, 2015.
  30. Konrad, C., Magniez, F., and Mathieu, C. Maximum matching in semi-streaming with few passes. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 15th International Workshop, APPROX 2012, and 16th International Workshop, RANDOM 2012, Cambridge, MA, USA, August 15-17, 2012. Proceedings (2012), pp. 231–242.
  31. Kushilevitz, E., and Nisan, N. Communication complexity. Cambridge University Press, 1997.
  32. Li, Y., Nguyen, H. L., and Woodruff, D. P. Turnstile streaming algorithms might as well be linear sketches. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014 (2014), pp. 174–183.
  33. McGregor, A. Finding graph matchings in data streams. In Approximation, Randomization and Combinatorial Optimization, Algorithms and Techniques, 8th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2005 and 9th International Workshop on Randomization and Computation, RANDOM 2005, Berkeley, CA, USA, August 22-24, 2005, Proceedings (2005), pp. 170–181.
  34. McGregor, A. Graph stream algorithms: a survey. SIGMOD Record 43, 1 (2014), 9–20.
  35. Motwani, R., and Raghavan, P. Randomized Algorithms. Cambridge University Press, 1995.
  36. Muthukrishnan, S. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science 1, 2 (2005).
  37. Ruzsa, I. Z., and Szemerédi, E. Triple systems with no six points carrying three triangles. Combinatorics (Keszthely, 1976), Coll. Math. Soc. J. Bolyai 18 (1978), 939–945.
  38. Schmidt, J. P., Siegel, A., and Srinivasan, A. Chernoff-hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8, 2 (1995), 223–250.
  39. Woodruff, D. P. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science 10, 1-2 (2014), 1–157.
  40. Zelke, M. Weighted matching in the semi-streaming model. Algorithmica 62, 1-2 (2012), 1–20.