Communication complexity of approximate maximum matching in the message-passing model
We consider the communication complexity of finding an approximate maximum matching in a graph in a multi-party message-passing communication model. The maximum matching problem is one of the most fundamental graph combinatorial problems, with a variety of applications.
The input to the problem is a graph that has vertices and the set of edges partitioned over sites, and an approximation ratio parameter . The output is required to be a matching in that has to be reported by one of the sites, whose size is at least factor of the size of a maximum matching in .
We show that the communication complexity of this problem is information bits. This bound is shown to be tight up to a factor, by constructing an algorithm, establishing its correctness, and an upper bound on the communication cost. The lower bound also applies to other graph combinatorial problems in the message-passing communication model, including max-flow and graph sparsification.
Processing complex, massive volumes of data requires scaling out to parallel and distributed computation platforms. Scalable distributed computation algorithms are needed that make efficient use of scarce system resources, such as communication bandwidth between compute nodes, in order to avoid the communication network becoming a bottleneck. Particular interest has been devoted to studying scalable computation methods for graph data, which arises in a variety of applications including online services, online social networks, biological, and economic systems.
In this paper, we consider the distributed computation problem of finding an approximate maximum matching in an input graph whose edges are partitioned over different compute nodes (which we refer to as sites). Several performance measures are of interest, including the communication complexity in terms of the number of bits or messages, the time complexity in terms of the number of rounds, and the storage complexity in terms of the number of bits. We focus on the communication complexity. Our main result is a tight lower bound on the communication complexity for approximate maximum matching.
We assume a multi-party message-passing communication model, which we refer to as the message-passing model and define as follows. The message-passing model consists of sites , , , . The input is partitioned across sites, with sites , , , holding pieces of input data , , , , respectively. The goal is to design a communication protocol for the sites to jointly compute the value of a given function at point . The sites are allowed to have point-to-point communications between each other. At the end of the computation, at least one site should return the answer. The goal is to find a protocol that minimizes the total communication cost between the sites.
For technical convenience, we introduce another special party called the coordinator. The coordinator does not have any input. We require that all sites can only talk with the coordinator, and at the end of the computation, the coordinator should output the answer. We call this model the coordinator model. See Figure 1 for an illustration. Note that we have essentially replaced the clique communication topology with a star topology, which increases the total communication cost only by a factor of and thus, it does not affect the order of the asymptotic communication complexity.
The edge partition of an input graph over sites is defined by partitioning the set of edges into disjoint sets , , , , and assigning each set of edges to site . For bipartite graphs with a set of left vertices and a set of right vertices, we define an alternative edge partition, referred to as the left vertex partition, as follows: the set of left vertices is partitioned into disjoint parts, and all the edges incident to one part are assigned to a unique site. Note that the left vertex partition is more restrictive, in the sense that any left vertex partition is an instance of an edge partition. Thus, lower bounds that hold in this model are stronger, as designing algorithms might be easier in this restrictive setting. Our lower bound is proved for the left vertex partition model, while our upper bound holds for an arbitrary edge partition of any graph.
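The relation between the two partition types can be illustrated with a minimal sketch. The function names, the vertex labels, and the `owner` map below are hypothetical and chosen only for illustration; the sketch assigns each edge to the site that owns its left endpoint and then checks that the result is a valid edge partition.

```python
from collections import defaultdict


def left_vertex_partition(edges, site_of_left_vertex):
    """Assign every edge (u, v) to the site that owns its left endpoint u."""
    parts = defaultdict(set)
    for u, v in edges:
        parts[site_of_left_vertex[u]].add((u, v))
    return dict(parts)


def is_edge_partition(edges, parts):
    """True if the parts are pairwise disjoint and together cover all edges."""
    seen = set()
    for part in parts.values():
        if seen & part:
            return False
        seen |= part
    return seen == set(edges)


# Hypothetical bipartite graph: left vertices u1..u3, right vertices v1..v3.
edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v2"), ("u3", "v3")]
owner = {"u1": 0, "u2": 0, "u3": 1}  # left vertex -> site index
parts = left_vertex_partition(edges, owner)
```

The converse does not hold in general: an arbitrary edge partition may split the edges incident to one left vertex across several sites.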
1.1 Summary of results
We study the approximate maximum matching problem in the message-passing model, which we refer to as Distributed Matching Reporting (DMR) and define as follows: the input is a graph with vertices and a parameter ; the set of edges is arbitrarily partitioned into subsets such that is assigned to site ; the coordinator is required to report an -approximation of the maximum matching in graph .
In this paper, we show the following main theorem.
In this paper we are more interested in the case when , since otherwise the trivial lower bound of bits (the number of bits to describe a maximum matching) is already near-optimal.
For DMR, a seemingly weaker requirement is that, at the end of the computation, each site outputs a set of edges such that is a matching of size that is at least factor of a maximum matching. However, given such an algorithm, each site might just send to the coordinator after running the algorithm, which will increase the total communication cost by at most an additive term of . Therefore, our lower bound also holds for this setting.
A simple greedy distributed algorithm solves DMR for with the communication cost of bits. This algorithm is based on computing a maximal matching in graph . A maximal matching is a matching that cannot be enlarged by adding further edges. A maximal matching is computed using a greedy sequential procedure defined as follows. Let be the graph induced by a subset of edges . Site computes a maximal matching in , and sends it to via the coordinator. Site then computes a maximal matching in by greedily adding edges in to , and then sends to site . This procedure continues and is completed once site has computed and sent it to the coordinator. Notice that is a maximal matching in graph , hence it is a -approximation of a maximum matching in . The communication cost of this protocol is bits because each contains at most edges and each edge’s identifier can be encoded with bits. This shows that our lower bound is tight up to a factor. This protocol is essentially sequential and takes rounds in total. We show that Luby’s classic parallel algorithm for maximal matching  can be easily adapted to our model with rounds of computation and bits of communication.
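The greedy sequential procedure described above can be sketched as follows. This is a minimal single-process simulation, not a networked implementation: the function names are hypothetical, sites are represented by a list of edge lists, and the cost is counted in edges forwarded by the sites rather than in bits (relaying through the coordinator and encoding edge identifiers are left out).

```python
def extend_to_maximal(matching, edges):
    """Greedily add edges whose endpoints are both still unmatched."""
    matched = {x for e in matching for x in e}
    result = list(matching)
    for u, v in edges:
        if u != v and u not in matched and v not in matched:
            result.append((u, v))
            matched.update((u, v))
    return result


def sequential_greedy_protocol(site_edges):
    """Each site in turn extends the matching it received with its own edges.

    Returns the final matching and the total number of edges forwarded,
    counting one transmission of the current matching per site.
    """
    matching = []
    edges_sent = 0
    for edges in site_edges:
        matching = extend_to_maximal(matching, edges)
        edges_sent += len(matching)
    return matching, edges_sent


# Hypothetical instance with three sites and vertices 1..8.
site_edges = [[(1, 2), (3, 4)], [(2, 3), (5, 6)], [(1, 6), (7, 8)]]
matching, edges_sent = sequential_greedy_protocol(site_edges)
```

On this instance the final matching has four edges and nine edges are forwarded in total. Since a matching on a given number of vertices has at most half that many edges, each forwarded message stays small regardless of how many edges a site holds.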
In Section 4, we show that our lower bound is also tight with respect to the approximation ratio parameter for any up to a factor. It was shown in  that many statistical estimation problems and graph combinatorial problems require bits of communication to obtain an exact solution. Our lower bound shows that for DMR even computing a constant approximation requires this amount of communication.
The lower bound established in this paper also applies more generally to a broader range of graph combinatorial problems. Since a bipartite maximum matching can be found by solving a max-flow problem, our lower bound also holds for approximate max-flow. Our lower bound also implies a lower bound for the graph sparsification problem; see  for a definition. This is because in our lower bound construction (see Section 3), the bipartite graph under consideration contains many cuts of size which have to be included in any sparsifier. By our construction, these edges form a good approximate maximum matching, and thus any good sparsifier recovers a good matching. In , it was shown that there is a sketch-based -approximate graph sparsification algorithm with the sketch size of bits, which directly translates to an approximation algorithm of communication in our model. Thus, our lower bound is tight up to a poly-logarithmic factor for the graph sparsification problem.
We briefly discuss the main ideas and techniques of our proof of the lower bound for DMR. As a hard instance, we use a bipartite graph with . Each site holds a set of vertices which is a partition of the set of left vertices . The neighbors of each vertex in are determined by a two-party set-disjointness instance (DISJ, defined formally in Section 3.2). There are in total DISJ instances, and we want to perform a direct-sum type of argument on these DISJ instances. We show that due to symmetry, the answer of DISJ can be recovered from a reported matching, and then use information complexity to establish the direct-sum theorem. For this purpose, we use a new definition of the information cost of a protocol in the message-passing model.
We believe that our techniques would prove useful to establish the communication complexity for other graph combinatorial problems in the message-passing model. The reason is that for many graph problems whose solution certificates “span” the whole graph (e.g., connected components, vertex cover, dominating set, etc.), it is natural that a hard instance would resemble the one for the maximum matching problem, i.e., each of the sites would hold roughly vertices and the neighbourhood of each vertex would define an independent instance of a two-party communication problem.
The problem of finding an approximate maximum matching in a graph has been studied for various computation models, including the streaming computation model , MapReduce computation model , and a traditional distributed computation model known as computation model.
In , the maximum matching was presented as one of the open problems in the streaming computation model. Many results have been established since then by various authors , , , , , , , , , , and . Many of the studies were concerned with a streaming computation model that allows for space, referred to as the semi-streaming computation model. The algorithms developed for the semi-streaming computation model can be directly applied to obtain a constant-factor approximation of maximum matching in a graph in the message-passing model that has a communication cost of bits.
For the approximate maximum matching problem in the MapReduce model,  gave a -approximation algorithm, which requires a constant number of rounds and uses bits of communication, for any input graph with edges.
The approximate maximum matching has been studied in the computation model by various authors . In this computation model, each processor corresponds to a unique vertex of the graph and edges represent bidirectional communications between processors. The time advances over synchronous rounds. In each round, every processor sends a message to each of its neighbours, and then each processor performs a local computation using as input its local state and the received messages. Notice that in this model, the input graph and the communication topology are the same, while in the message-passing model the communication topology is essentially a complete graph which is different from the input graph and, in general, sites do not correspond to vertices of the topology graph.
A variety of graph and statistical computation problems have been recently studied in the message-passing model , , , , . A wide range of graph and statistical problems has been shown to be hard in the sense of requiring bits of communication, including graph connectivity , exact counting of distinct elements , and -party set-disjointness . Some of these problems have been shown to be hard even for random order inputs .
In , it has been shown that the communication complexity of the -party set-disjointness problem in the message-passing model is bits. This work was independent of and concurrent with ours. Incidentally, it uses an input distribution that is similar to, but different from, ours. Similar input distributions were also used in previous work such as  and . This is not surprising because of the nature of the message-passing model. There may exist a reduction between the -party set-disjointness and DMR, but showing this is non-trivial and would require a formal proof. The proof of our lower bound is different in that we use a reduction of the -party DMR to a -party set-disjointness using a symmetrisation argument, while  uses a coordinate-wise direct-sum theorem to reduce the -party set-disjointness to a -party -bit problem.
The approximate maximum matching has been recently studied in the coordinator model under the additional condition that the sites send messages to the coordinator simultaneously and once, referred to as the simultaneous-communication model. The coordinator then needs to report the output, which is computed using the received messages as input. It has been shown in  that for the vertex partition model, our lower bound is achievable by a simultaneous protocol for any up to a poly-logarithmic factor.
The communication/round complexity of approximate maximum matching has been studied in the context of finding efficient economic allocations of items to agents, in markets that consist of unit-demand agents in a distributed information model where agents’ valuations are unknown to a central planner, which requires communication to determine an efficient allocation. This amounts to studying the communication or round complexity of approximate maximum matching in a bipartite graph that defines preferences of agents over items. In a market with agents and items, this amounts to approximate maximum matching in the -party model with a left vertex partition.  and  studied this problem in the so called blackboard communication model, where messages sent by agents can be seen by all agents. For one-round protocols,  established a tight trade-off between message size and approximation ratio. As indicated by the authors in , their randomized lower bound is actually a special case of ours. In a follow-up work,  obtained the first non-trivial lower bound on the number of rounds for general randomized protocols.
In Section 2 we present some basic concepts of probability and information theory, communication and information complexity that are used throughout the paper. Section 3 presents the lower bound and its proof, which is the main result of this paper. Section 4 establishes the tightness of the lower bound up to a poly-logarithmic factor. Finally, in Section 5, we conclude.
2.1 Basic facts and notation
Let denote the set , for given integer . All logarithms are assumed to have base . We use capital letters to denote random variables and the lower case letters to denote specific values of respective random variables .
We write to mean that is a random variable with distribution , and to mean that is a sample from distribution . For a distribution on a domain , and , we write to denote the conditional distribution of given .
For any given probability distribution and positive integer , we denote with the -fold product distribution of , i.e. the distribution of independent and identically distributed random variables according to distribution .
We will use the following distances between two probability distributions and on a discrete set : (a) the total variation distance defined as
and, (b) the Hellinger distance defined as
The total variation distance and Hellinger distance satisfy the following relation:
With a slight abuse of notation for two random variables and , we write and in lieu of and , respectively.
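As a numerical illustration of the two distances, the sketch below assumes the common normalisations: half the L1 distance for total variation, and the square root of half the sum of squared differences of square roots for the Hellinger distance. The normalisation used in the text may differ, and the function names and the example distributions are hypothetical. Under these assumptions the standard sandwich "squared Hellinger distance at most total variation at most sqrt(2) times Hellinger distance" holds.

```python
import math


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def hellinger(p, q):
    """Hellinger distance with the normalisation h^2 = 1 - sum sqrt(p*q)."""
    keys = set(p) | set(q)
    s = sum(
        (math.sqrt(p.get(x, 0.0)) - math.sqrt(q.get(x, 0.0))) ** 2 for x in keys
    )
    return math.sqrt(0.5 * s)


# Two hypothetical distributions on a two-point set.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}
tv = total_variation(p, q)
h = hellinger(p, q)
```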
We will use the following two well-known inequalities.
Hoeffding’s inequality Let be the sum of independent and identically distributed random variables that take values in . Then, for any ,
Chebyshev’s inequality Let be a random variable with variance . Then, for any ,
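For reference, the standard textbook forms of these two inequalities can be written as below. The symbols $n$, $X_i$, $t$ and $\sigma^2$ are notation assumed here, the summands are assumed to take values in $[0,1]$, and the exact form used in the later proofs (one-sided or two-sided, and its constants) may differ.

```latex
% Hoeffding: X = X_1 + ... + X_n, independent summands with values in [0,1]
\Pr\bigl[\, |X - \mathbb{E}[X]| \ge t \,\bigr] \;\le\; 2\exp\!\left(-\frac{2t^{2}}{n}\right),
\qquad t > 0.

% Chebyshev: X has finite variance \sigma^2
\Pr\bigl[\, |X - \mathbb{E}[X]| \ge t \,\bigr] \;\le\; \frac{\sigma^{2}}{t^{2}},
\qquad t > 0.
```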
For two random variables and , let denote the Shannon entropy of the random variable , and let denote the conditional entropy of given . Let denote the mutual information between and , and let denote the conditional mutual information given . The mutual information between any and is non-negative, i.e. , or equivalently, .
We will use the following relations from information theory:
Chain rule for mutual information For any jointly distributed random variables , and ,
Data processing inequality If and are conditionally independent random variables given , then
Super-additivity of mutual information If are independent random variables, then
Sub-additivity of mutual information If are conditionally independent random variables given , then
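The chain rule and super-additivity can be checked numerically on a small example. The sketch below is illustrative only: it assumes base-2 logarithms, represents a joint distribution as a dictionary from outcome tuples to probabilities, and uses hypothetical function names. The example takes two independent fair bits and their XOR.

```python
import math
from collections import defaultdict
from itertools import product


def marginal(joint, idx):
    """Marginal distribution of the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, pr in joint.items():
        out[tuple(outcome[i] for i in idx)] += pr
    return out


def entropy(joint, idx):
    """Shannon entropy (in bits) of the coordinates listed in idx."""
    return -sum(pr * math.log2(pr) for pr in marginal(joint, idx).values() if pr > 0)


def mutual_information(joint, a, b, cond=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    a, b, c = tuple(a), tuple(b), tuple(cond)
    return (
        entropy(joint, a + c)
        + entropy(joint, b + c)
        - entropy(joint, a + b + c)
        - entropy(joint, c)
    )


# Coordinates: 0 -> X, 1 -> Y, 2 -> Z = X xor Y, with X and Y independent fair bits.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

i_xy_z = mutual_information(joint, (0, 1), (2,))          # I(X,Y; Z)
i_x_z = mutual_information(joint, (0,), (2,))             # I(X; Z)
i_y_z = mutual_information(joint, (1,), (2,))             # I(Y; Z)
i_y_z_given_x = mutual_information(joint, (1,), (2,), cond=(0,))  # I(Y; Z | X)
```

Here the pair (X, Y) determines Z, while each of X and Y alone is independent of Z, so the super-additivity inequality is strict in this example.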
In the two-party communication complexity model, two players, Alice and Bob, are required to jointly compute a function . Alice is given and Bob is given , and they want to jointly compute the value of by exchanging messages according to a randomized protocol .
We use to denote the random transcript (i.e., the concatenation of messages) when Alice and Bob run on the input , and to denote the output of the protocol. When the input is clear from the context, we will simply use to denote the transcript. We say that is a -error protocol if for every input , the probability that is not larger than , where the probability is over the randomness used in . We will refer to this type of error as worst-case error. An alternative and weaker type of error is the distributional error, which is defined analogously for an input distribution, and where the error probability is over both the randomness used in the protocol and the input distribution.
Let denote the length of the transcript in information bits. The communication cost of is
The -error randomized communication complexity of , denoted by , is the minimal cost of any -error protocol for .
The multi-party communication complexity model is a natural generalization to parties, where each party has a part of the input, and the parties are required to jointly compute a function by exchanging messages according to a randomized protocol.
For more information about communication complexity, we refer the reader to .
The communication complexity quantifies the number of bits that need to be exchanged by two or more players in order to compute some function together, while the information complexity quantifies the amount of information of the inputs that must be revealed by the protocol. The information complexity has been extensively studied in the last decade, e.g., . There are several definitions of information complexity. In this paper, we follow the definition used in . In the two-party case, let be a distribution on , we define the information cost of measured under as
where and is the public randomness used in . For notational convenience, we will omit the subscript of and simply use to denote the information cost of . It should be clear that is a function of for a fixed protocol . Intuitively, this measures how much information of and is revealed by the transcript . For any function , we define the information complexity of parametrized by and as
2.5 Information complexity and coordinator model
We can indeed extend the above definition of information complexity to the -party coordinator model. That is, let be the input of player with and be the whole transcript, then we could define . However, such a definition does not fully exploit the point-to-point communication feature of the coordinator model. Indeed, the lower bound we can prove using such a definition is at most what we can prove under the blackboard model, and our problem admits a simple algorithm with communication in the blackboard model. In this paper we give a new definition of information complexity for the coordinator model, which allows us to prove higher lower bounds compared with the simple generalization. Let be the transcript between player and the coordinator, thus . We define the information cost for a function with respect to input distribution and the error parameter in the coordinator model as
For any protocol , the expected size of its transcript is (we abuse the notation by using also for the transcript). The theorem then follows because the worst-case communication cost is at least the average-case communication cost.
The statement directly follows from the data processing inequality because given , is fully determined by the random coins used, and is thus independent of .
The lower bound is established by constructing a hard distribution for the input bipartite graph such that .
We first discuss the special case when the number of sites is equal to , and each site is assigned one unique vertex in together with all its adjacent edges. We later discuss the general case.
A natural approach to approximately compute a maximum matching in a graph is to randomly sample a few edges from each site, and hope that we can find a good matching using these edges. To rule out such strategies, we construct random edges as follows.
We create a large number of noisy edges by randomly picking a small set of nodes of size roughly and connect each node in to each node in independently at random with a constant probability. Note that there are such edges and the size of any matching that can be formed by these edges is at most , which we will show to be asymptotically , where OPT is the size of a maximum matching.
We next create a set of important edges between and such that each node in is adjacent to at most one random node in . These edges are important in the sense that although there are only of them, the size of a maximum matching they can form is large, of the order . Therefore, to compute a matching of size at least , it is necessary to find and include important edges.
We then show that finding an important edge is in some sense equivalent to solving a set-disjointness (DISJ) instance, and thus, we have to solve DISJ instances. The concrete implementation of this intuition is via an embedding argument.
In the general case, we create independent copies of the above random bipartite graph, each with vertices, and assign vertices to each site (one from each copy). We then prove a direct-sum theorem using information complexity.
In the following, we introduce the two-party AND problem and the two-party DISJ problem. These two problems have been widely studied and tight bounds are known (e.g. ). For our purpose, we need to prove stronger lower bounds for them. We then give a reduction from DISJ to DMR and prove an information cost lower bound for DMR in Section 3.3.
3.1 The two-party AND problem
In the two-party AND communication problem, Alice and Bob hold bits and respectively, and they want to compute the value of the function AND.
Next we define input distributions for this problem. Let be random variables corresponding to the inputs of Alice and Bob respectively. Let be a parameter. Let denote the probability distribution of a Bernoulli random variable that takes value with probability or value with probability . We define two input probability distributions and for as follows.
Sample , and then set the value of as follows: if , let and ; otherwise, if , let , and . Thus, we have
Sample , and then choose as above (i.e. sample according to ). Then, reset the value of to be or with equal probability (i.e. set ).
Here is an auxiliary random variable that breaks the dependence between and : and are not independent, but they are conditionally independent given . Let be the probability that under distribution , which is .
For the special case , by , it is shown that, for any private coin protocol with worst-case error probability , the information cost
where the information cost is measured with respect to and is the random variable corresponding to . Note that the above mutual information is different from the definition of information cost; it is referred to as conditional information cost in . It is smaller than the standard information cost by data processing inequality ( and are conditionally independent given ). For a fixed protocol , the joint probability distribution is determined by the distribution of and so is . Therefore, when we say the (conditional) information cost is measured w.r.t. , it means that the mutual information, , is measured under the joint distribution determined by .
The above lower bound might seem counterintuitive, as the answer to AND is always under the input distribution and a protocol can just output which does not reveal any information. However, such a protocol will have worst-case error probability , i.e., it is always wrong when the input is , contradicting the assumption. When distributional error is considered, the (distributional) error and information cost can be measured w.r.t. different input distributions. In our case, the error will be measured under and the information cost will be measured under , and we will prove that any protocol having small error under must incur high information cost under .
We next derive an extension that generalizes the result of  to any and distributional errors. We will also use the definition of one-sided error.
Recall that is the probability that when , which is . Recall that , and thus . Note that a distributional error of under is trivial, as a protocol that always outputs achieves this (but it has one-sided error ). Therefore, for two-sided error, we will consider protocols with error probability slightly better than the trivial protocol, i.e., with error probability for some .
If we set , the first part of Theorem ? recovers the result of .
We will use to denote the transcript when the input is . By definition,
With a slight abuse of notation, in (Equation 1), and are random variables with distributions and , respectively.
For any random variable with distribution , the following two inequalities were established in :
where is the Hellinger distance between two random variables and .
We can apply these bounds to lower bound the term . However, we cannot apply them to lower bound the term when because then the distribution of is not . To lower bound the term , we will use the following well-known property, whose proof can be found in the book  (Theorem 2.7.4).
Hence, the mutual information is a concave function of the distribution of , since the distribution of is fixed given .
Recall that is the probability distribution that takes value with probability and takes value with probability . Note that can be expressed as a convex combination of and (always taking value ) as follows: . (Recall that is assumed to be smaller than .) Let and . Then, using Lemma ?, we have
where the last inequality holds by (Equation 3) and non-negativity of mutual information.
Thus, we have
where the last inequality holds because .
We next show that if is a protocol with error probability smaller than or equal to under distribution , then
which together with other above relations implies the first part of the theorem.
By the triangle inequality,
where the last equality is from the cut-and-paste lemma in  (Lemma 6.3).
Thus, we have
where the last inequality is by the triangle inequality.
Similarly, it holds that
Let denote the error probability of and denote the error probability of conditioned on that the input is . Recall . We have
and clearly . Let be the output of when the input is , which is also a random variable. Note that
where denote the total variation distance between probability distributions of random variables and . Using Lemma ?, we have
By the same arguments, we also have
By (Equation 8), we have
By the Cauchy-Schwarz inequality, it follows that
Hence, we have
which combined with (Equation 4) establishes the first part of the theorem.
We now go on to prove the second part of the theorem. Assume has a one-sided error , i.e., it outputs with probability at least if the input is , and always outputs correctly otherwise. To boost the success probability, we can run parallel instances of the protocol and answer if and only if at least one instance outputs . Let be this new protocol; it is easy to see that it has a one-sided error of . By setting , it is at most , and thus the (two-sided) distributional error of under is smaller than . By the first part of the theorem, we know . We also have
where the inequality follows from the sub-additivity and the fact that are conditionally independent of each other given and . Thus, we have .
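The amplification step can be illustrated with a small calculation. The sketch below is generic: it assumes each independent run misses the positive answer with some probability `miss_prob` and never errs on the other inputs, so the combined protocol misses only when every run misses. The function name and the numeric values are hypothetical, and the specific number of repetitions chosen in the proof is not reproduced here.

```python
def repetitions_needed(miss_prob, target):
    """Smallest m >= 1 such that miss_prob ** m <= target.

    miss_prob: probability that a single run fails to report the positive answer.
    target: desired upper bound on the miss probability of the combined protocol.
    """
    if not (0.0 < miss_prob < 1.0 and 0.0 < target < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    m = 1
    while miss_prob ** m > target:
        m += 1
    return m


m_half = repetitions_needed(0.5, 0.1)
m_high = repetitions_needed(0.9, 0.5)
```

The price of this amplification is the factor by which the information cost can grow, which by the sub-additivity argument above is at most the number of parallel runs.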
3.2 The two-party DISJ communication problem
In the two-party DISJ communication problem, two players, Alice and Bob, hold strings of bits and , respectively, and they want to compute
By interpreting and as indicator vectors that specify subsets of , DISJ if and only if the two sets represented by and are disjoint. Note that this accommodates the AND problem as a special case when .
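A minimal sketch of the function follows. It assumes the convention suggested by the remark that AND is the special case of length-one strings, namely that DISJ is the OR of the coordinate-wise ANDs, so it evaluates to 0 exactly when the two sets are disjoint. If the opposite convention is intended, the output bit should be flipped. The function names are hypothetical.

```python
def and_bit(x, y):
    """Two-party AND of two bits."""
    return x & y


def disj(xs, ys):
    """OR over coordinates of AND(x_i, y_i): 1 if the sets intersect, else 0."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return int(any(and_bit(a, b) for a, b in zip(xs, ys)))
```

For example, the indicator vectors `[1, 0, 1]` and `[0, 1, 0]` describe disjoint sets, while `[1, 0, 1]` and `[0, 0, 1]` share the third element.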
Let be Alice’s input and be Bob’s input. We define two input distributions and for as follows.
For each , independently sample , and let be the corresponding auxiliary random variable (see the definition of ). Define .
Let , then pick uniformly at random from , and reset to be or with equal probability. Note that , and the probability that DISJ is equal to .
We define the one-sided error for DISJ similarly: A protocol has a one-sided error for DISJ if it is always correct when DISJ, and is correct with probability at least when DISJ.
We first consider the two-sided error case. Let be a protocol for DISJ with distributional error under . Consider the following reduction from AND to DISJ. Alice has input , and Bob has input . They want to decide the value of . They first publicly sample , and embed in the -th position, i.e. set and . Then they publicly sample according to for all . Let . Conditional on , they sample such that for each . Note that this step can be done using only private randomness, since, in the definition of , and are independent given . Then they run the protocol on the input and output whatever outputs. Let denote this protocol for AND. Let be the corresponding random variables of respectively. It is easy to see that if , then , and thus the distributional error of is under . The public coins used in include , and the public coins of .
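The embedding step of this reduction can be sketched as follows. The sketch is deliberately simplified: it ignores the split between public and private coins, and it fills the remaining coordinates from a stand-in sampler that is uniform over the three bit pairs whose AND is 0, which is an assumption made for illustration and not the distribution defined in the text. The only property it demonstrates is that, when every other coordinate has AND equal to 0, the DISJ value of the constructed instance equals the AND of the embedded pair (with DISJ taken as the OR of coordinate-wise ANDs). All names are hypothetical.

```python
import random


def sample_zero_and_pair(rng):
    """Stand-in sampler: a uniformly random bit pair whose AND is 0."""
    return rng.choice([(0, 0), (0, 1), (1, 0)])


def embed_and_instance(x, y, k, rng):
    """Place (x, y) at a random position of a length-k DISJ instance."""
    j = rng.randrange(k)
    xs, ys = [], []
    for i in range(k):
        a, b = (x, y) if i == j else sample_zero_and_pair(rng)
        xs.append(a)
        ys.append(b)
    return xs, ys, j


def disj(xs, ys):
    """OR over coordinates of AND(x_i, y_i)."""
    return int(any(a & b for a, b in zip(xs, ys)))


rng = random.Random(0)
xs, ys, j = embed_and_instance(1, 1, 8, rng)
```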
We first analyze the information cost of under . We have
where (Equation 13) is by the super-additivity of mutual information, ( ?) holds because when the conditional distribution of given is the same as the distribution of , and ( ?) follows from Theorem ? using the fact that has error under .
We have established that when , it holds
We now consider the information cost when . Recall that to sample from , we first sample , and then pick uniformly at random from and reset to or with equal probability. Let be the indicator random variable of the event that the last step does not change the value of .
We note that for any jointly distributed random variables , , and ,
To see this note that by the chain rule for mutual information, we have
Combining the above two equalities, (Equation 15) follows by the facts and .
Let and . We have
The proof for the one-sided error case is the same, except that we use the one-sided error lower bound in Theorem ? to bound ( ?).
3.3 Proof of Theorem
Here we provide a proof of Theorem ?. The proof is based on a reduction of DISJ to DMR. We first define the hard input distribution that we use for DMR.
The input graph is assumed to be a random bipartite graph that consists of disjoint, independent and identically distributed random bipartite graphs , , , . Each bipartite graph has the set of left vertices and the set of right vertices, both of cardinality . The sets of edges , , , are defined by a random variable that takes values in such that whether or not is an edge in is indicated by .
The distribution of is defined as follows. Let , , , be independent and identically distributed random variables with distribution .
We will use the following notation:
where each , and is the th bit. In addition, we will also use the following notation:
Note that is the input to DMR, and is not part of the input for DMR, but it is used to construct .
The edge partition of input graph over sites , , , is defined by assigning all edges incident to vertices , , , to site , or equivalently gets . See Figure 2 for an illustration.
Input Reduction Let be Alice’s input and be Bob’s input for DISJ. We will first construct an input of DMR from , which has the above hard distribution. In this reduction, in each bipartite graph , we carefully embed instances of DISJ. The output of a DISJ instance determines whether or not a specific edge in the graph exists. This amounts to a total of DISJ instances embedded in graph . The original input of Alice and Bob is embedded at a random position, and the other instances are sampled by Alice and Bob using public and private random coins. We then argue that if the original DISJ instance is solved, then with a sufficiently large probability, at least of the embedded DISJ instances are solved. Intuitively, if a protocol solves a DISJ instance at a random position with high probability, then it should solve many instances at other positions as well, since the input distribution is completely symmetric. We will see that the original DISJ instance can be solved by using any protocol solving DMR, the correctness of which also relies on the symmetric property.
Alice and Bob construct an input for DMR as follows:
Alice and Bob use public coins to sample an index from a uniform distribution on . Alice constructs the input for site , and Bob constructs input for other sites (see Figure 3).
Alice and Bob use public coins to sample an index from a uniform distribution on .
is sampled as follows: Alice sets , and Bob sets . Bob privately samples
For each , is sampled as follows:
Alice and Bob use public coins to sample .
Alice and Bob privately sample and from and , respectively. Bob privately and independently samples
Alice privately draws an independent sample from a uniform distribution on , and resets to or with equal probability. As a result, . For each , Bob privately draws a sample from a uniform distribution on and resets to a sample from .
Note that the input of site is determined by the public coins, Alice’s input and her private coins. The inputs are determined by the public coins, Bob’s input and his private coins.
Let denote the distribution of when is chosen according to the distribution .
Let be the approximation ratio parameter. We set in the definition of .
Given a protocol for DMR that achieves an -approximation with the error probability at most under , we construct a protocol for DISJ that has a one-sided error probability of at most as follows.
Given input , Alice and Bob construct an input for DMR as described by the input reduction above. Let be the samples used for the construction of . Let be the two indices sampled by Alice and Bob in the reduction procedure.
With Alice simulating site and Bob simulating other sites and the coordinator, they run on the input defined by . Any communication between site and the coordinator will be exchanged between Alice and Bob. For any communication among other sites and the coordinator, Bob just simulates it without any actual communication. At the end, the coordinator, that is Bob, obtains a matching .
Bob outputs if, and only if, for some ,