Improved Approximation Guarantees for Weighted Matching
in the Semi-Streaming Model
We study the maximum weight matching problem in the semi-streaming model, and improve on the currently best one-pass algorithm due to Zelke (Proc. STACS ’08, pages 669–680) by devising a deterministic approach whose performance guarantee is . In addition, we study preemptive online algorithms, a sub-class of one-pass algorithms where we are only allowed to maintain a feasible matching in memory at any point in time. All known results prior to Zelke’s belong to this sub-class. We provide a lower bound of on the competitive ratio of any such deterministic algorithm, and hence show that future improvements will have to store in memory a set of edges which is not necessarily a feasible matching.
The computational task of detecting maximum weight matchings is one of the most fundamental problems in discrete optimization, attracting plenty of attention from the operations research, computer science, and mathematics communities. (For a wealth of references on matching problems see [11].) In such settings, we are given an undirected graph whose edges are associated with non-negative weights specified by . A set of edges is a matching if no two of the edges share a common vertex, that is, the degree of any vertex in is at most . The weight of a matching is defined as the combined weight of its edges, i.e., . The objective is to compute a matching of maximum weight. We study this problem in two related computational models: the semi-streaming model and the preemptive online model.
The semi-streaming model.
Even though these settings appear to be rather simple at first glance, it is worth noting that matching problems have an abundance of flavors, usually depending on how the input is specified. In this paper, we investigate weighted matchings in the semi-streaming model, which was first suggested by Muthukrishnan [10]. Specifically, a graph stream is a sequence of distinct edges, where is an arbitrary permutation of . When an algorithm is processing the stream, edges are revealed sequentially, one at a time. Letting and , efficiency in this model is measured by the space a graph algorithm uses, the time it requires to process each edge, and the number of passes it makes over the input stream. The main restriction is that the space is limited to bits of memory. We refer the reader to a number of recent papers [10, 3, 4, 2, 9] and to the references therein for a detailed literature review.
Online graph problems.
Unlike the semi-streaming model, in online problems the size of the underlying graph is not known in advance. The online matching problem has previously been modeled as follows. Edges are presented one by one to the algorithm, along with their weight. Once an edge is presented, we must make an irrevocable decision, whether to accept it or not. An edge may be accepted only if its addition to the set of previously accepted edges forms a feasible matching. In other words, an algorithm must keep a matching at all times, and its final output consists of all edges which were ever accepted. In this model, it is easy to verify that the competitive ratio of any (deterministic or randomized) algorithm exceeds any function of the number of vertices, meaning that no competitive algorithm exists. However, if all weights are equal, a greedy approach which accepts an edge whenever possible, has a competitive ratio of , which is best possible for deterministic algorithms .
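The greedy rule for the equal-weight case can be sketched as follows. This is a minimal illustration rather than code from the paper; the function name and the edge representation are assumptions. The small example presents the middle edge of a four-vertex path first, so greedy keeps one edge while the optimum has two, which matches the standard fact that a maximal matching contains at least half as many edges as a maximum one.

```python
from typing import Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_online_matching(stream: Iterable[Edge]) -> List[Edge]:
    """Accept each arriving edge iff both of its endpoints are still unmatched.

    Decisions are irrevocable: an accepted edge is never removed and a
    rejected edge is never reconsidered.
    """
    matched = set()   # vertices covered by accepted edges
    accepted = []     # the matching built so far
    for u, v in stream:
        if u != v and u not in matched and v not in matched:
            accepted.append((u, v))
            matched.update((u, v))
    return accepted


# Path a-b-c-d with the middle edge presented first.
result = greedy_online_matching([("b", "c"), ("a", "b"), ("c", "d")])
print(result)
```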
Similarly to other online settings (such as call control problems [5]), a preemptive model can be defined, allowing us to remove a previously accepted edge from the current matching at any point in time; this event is called preemption. Nevertheless, an edge which was either rejected or preempted cannot be inserted into the matching later on. We point out that other types of online matching problems were studied as well [7, 6, 8, 1].
Comparison between the models.
Both semi-streaming algorithms and online algorithms perform a single pass over the input. However, unlike semi-streaming algorithms, online algorithms are allowed to concurrently utilize memory for two different purposes. The first purpose is obviously to maintain the current solution, which must always be a feasible matching, implying that the memory size of this nature is bounded by the maximal size of a matching. The second purpose is to keep track of arbitrary information regarding the past, without any concrete bound on the size of memory used. Therefore, in theory, online algorithms are allowed to use much larger memory than is allowed in the semi-streaming model. Moreover, although this possibility is rarely used, online algorithms may perform exponential time computations whenever a new piece of input is revealed. On the other hand, a semi-streaming algorithm may re-insert an edge into the current solution, even if it has been temporarily removed, as long as this edge was kept in memory. This extra power is not allowed for online (preemptive) algorithms, making them inferior in this sense in comparison to their semi-streaming counterparts.
Feigenbaum et al. [3] were the first to study matching problems under similar assumptions. Their main results in this context were a semi-streaming algorithm that computes a -approximation in passes for maximum cardinality matching in bipartite graphs, as well as a one-pass -approximation for maximum weighted matching in arbitrary graphs. Later on, McGregor [9] improved on these findings, to obtain performance guarantees of and for the maximum cardinality and maximum weight versions, respectively, being able to handle arbitrary graphs with only a constant number of passes (depending on ). In addition, McGregor [9] tweaked the one-pass algorithm of Feigenbaum et al. into achieving a ratio of . Finally, Zelke [12] has recently attained an improved approximation factor of , which stands as the currently best one-pass algorithm. Note that the -approximation algorithm in [3] and the -approximation algorithm in [9] are preemptive online algorithms. On the other hand, the algorithm of Zelke [12] uses the notion of shadow-edges, which may be re-inserted into the matching, and hence it is not an online algorithm.
Main result I.
The first contribution of this paper is to improve on the above-mentioned results, by devising a deterministic one-pass algorithm in the semi-streaming model, whose performance guarantee is . In a nutshell, our approach is based on partitioning the edge set into weight classes, and computing a separate maximal matching for each such class in online fashion, using memory bits overall. The crux lies in proving that the union of these matchings contains a single matching whose weight compares favorably to the optimal one. The specifics of this algorithm are presented in Section 2.
Main result II.
Our second contribution is motivated by the relation between semi-streaming algorithms and preemptive online algorithms, which must maintain a feasible matching at any point in time. To our knowledge, there are currently no lower bounds on the competitive ratio that can be achieved by incorporating preemption. Thus, we also provide a lower bound of on the performance guarantee of any such deterministic algorithm. As a result, we show that improved one-pass algorithms for this problem must store more than just a matching in memory. Further details are provided in Section 3.
2 The Semi-Streaming Algorithm
This section is devoted to obtaining main result I, that is, an improved one-pass algorithm for the weighted matching problem in the semi-streaming model. We begin by presenting a simple deterministic algorithm with a performance guarantee of . We then show how to randomize its parameters, still within the semi-streaming framework, and obtain an expected approximation ratio of . Finally, we de-randomize the algorithm by showing how to emulate the required randomness using multiple copies (constant number) of the deterministic algorithm, while paying an additional additive factor of at most , for any fixed .
2.1 A simple deterministic approach
We maintain the maximum weight of any edge seen so far in the input stream. Clearly, the maximum weight matching of the edges seen so far has weight in the interval . Note that if we disregard all edges with weight at most , the weight of the maximum weight matching in the resulting instance decreases by an additive term of at most .
Our algorithm has a parameter , and a value . We define weight classes of edges in the following way. For every , we let the class be the collection of edges whose weight is in the interval . We note that by our initial assumption, the weight of each edge is in the interval , and we say that a weight class is under consideration if its weight interval intersects . The number of classes which are under consideration at any point in time is .
Our algorithm simply maintains the list of classes under consideration and maintains a maximal (unweighted) matching for each such class. In other words, when the value of changes, we delete from the memory some of these matchings, corresponding to the classes which stop being under consideration. Note that to maintain a maximal matching in a given subgraph, we only need to check if the two endpoints of the new edge are not covered by existing edges of the matching.
To conclude, for every new edge we proceed as follows. We first check if is greater than the current value of . If so, we update and the list of weight classes under consideration accordingly. Then, we find the weight class of , and try to extend its corresponding matching, i.e., will be added to this matching if it remains a matching after doing so.
Note that at each point the content of the memory is the value and a collection of matchings, consisting of edges overall. Therefore, our algorithm indeed falls in the semi-streaming model.
At the conclusion of the input sequence, we need to return a single matching rather than a collection of matchings. To this end, we could compute a maximum weighted matching of the edges in the current memory. However, for the specific purposes of our analysis, we use the following faster algorithm. We sort the edges in memory in decreasing order of weight classes, such that the edges in appear before those in , for every . Using this sorted list of edges, we apply a greedy algorithm for selecting a maximal matching, in which the current edge is added to this matching if it remains a matching after doing so. Then, the post-processing time needed is linear in the size of the memory used, that is, . This concludes the presentation of the algorithm and its implementation as a semi-streaming algorithm.
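The procedure described in this subsection can be sketched in code. This is a hedged illustration, not the authors' implementation: the parameter names `gamma`, `phi`, `eps` and `n`, the class intervals `[phi * gamma**i, phi * gamma**(i + 1))`, and the cutoff `2 * eps * w_max / n` are assumptions made here for concreteness, since the exact expressions are not legible in the text above.

```python
import math
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, float]


def class_index(w: float, gamma: float, phi: float) -> int:
    """Index i with phi * gamma**i <= w < phi * gamma**(i + 1) (assumed form)."""
    i = math.floor(math.log(w / phi, gamma))
    # Guard against floating-point error near class boundaries.
    while phi * gamma ** i > w:
        i -= 1
    while phi * gamma ** (i + 1) <= w:
        i += 1
    return i


def one_pass_matching(stream: Iterable[Edge], n: int, gamma: float = 2.0,
                      phi: float = 1.0, eps: float = 0.1) -> List[Edge]:
    w_max = 0.0
    cutoff = 0.0
    matchings: Dict[int, List[Edge]] = {}    # class -> maximal matching
    covered: Dict[int, Set[Hashable]] = {}   # class -> covered vertices
    for u, v, w in stream:
        if u == v or w <= 0:
            continue
        if w > w_max:
            # Update the maximum and drop classes lying entirely below the cutoff.
            w_max = w
            cutoff = 2 * eps * w_max / n     # assumed cutoff, see lead-in
            for i in [i for i in matchings if phi * gamma ** (i + 1) <= cutoff]:
                del matchings[i], covered[i]
        if w <= cutoff:
            continue                          # edge is too light to matter
        i = class_index(w, gamma, phi)
        cov = covered.setdefault(i, set())
        if u not in cov and v not in cov:     # keeps the class matching maximal
            cov.update((u, v))
            matchings.setdefault(i, []).append((u, v, w))
    # Post-processing: greedy over classes in decreasing order.
    final: List[Edge] = []
    used: Set[Hashable] = set()
    for i in sorted(matchings, reverse=True):
        for u, v, w in matchings[i]:
            if u not in used and v not in used:
                final.append((u, v, w))
                used.update((u, v))
    return final


stream = [("a", "b", 1.0), ("b", "c", 5.0), ("c", "d", 1.5), ("a", "d", 4.5)]
result = one_pass_matching(stream, n=4)
print(result, sum(w for _, _, w in result))
```

On this four-cycle the two heavy edges fall into the same class and are vertex-disjoint, so the greedy pass selects both and blocks the two light edges.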
For purposes of analysis, we round down the weight of each edge such that to be . This way, we obtain rounded edge weights. Now fix an optimal solution opt and denote by opt its weight, and by its rounded weight. The next claim immediately follows from the definition of .
As an intermediate step, we analyze an improved algorithm which keeps all weight classes. That is, for each , we use to denote the maximal matching of class at the end of the input, and denote by the solution obtained by this algorithm, if we would have applied it. Similarly, we denote by the set of edges in opt which belong to . For every , we define the set of vertices , associated with , to be the set of endpoints of edges in that are not associated with higher weight classes:
For a vertex , we define its associated weight to be . For vertices which do not belong to any , we let their associated weight be zero. We next bound the total associated weight of all the vertices.
The total associated weight of all the vertices is at most .
Consider a vertex and let be the edge in adjacent to . If then we charge the weight associated with to the edge . Thus, an edge is charged at most twice from vertices associated with its own weight class. Otherwise, if then there must be some other edge , for some , that prevented us from adding to , in which case we charge the weight associated with to . Notice that , for otherwise, would not be associated with . Thus, the edge must be of the form and can only be charged twice from vertices in weight class , once through and once through .
To bound the ratio between and the total associated weight of the vertices, it suffices to bound the ratio between the weight of an edge and the total associated weight of the vertices which are charged to . Assume that , then there are at most two vertices which are charged to and class for all , and no vertex is associated to and class for . Hence, the total associated weight of these vertices is at most
and the claim follows since . ∎
It remains to bound with respect to the total associated weight.
is at most the total weight associated with all vertices.
It suffices to show that for every edge the maximum of the associated weights of and is at least the rounded weight of . Suppose that this claim does not hold, then and are not covered by , as otherwise their associated weight would be at least . Hence, when the algorithm considered , we would have added to , contradicting our assumption that and are not covered by . ∎
Using the above sequence of lemmas, and recalling that we lose another in the approximation ratio due to disregarding edges of weight at most , we obtain the following inequality:
Therefore, we establish the following theorem.
Our simple deterministic algorithm has an approximation ratio of . This ratio can be optimized to by picking .
The next example demonstrates that the analysis leading to Theorem 2.4 is tight.
Let be some large enough integer and be sufficiently small. Consider the instance depicted in Figure 1, where consists of a single edge with weight . For every , the matching consists of exactly two edges and each of weight , and consists of two edges and each of weight . In addition, there are two edges and whose weight is . It is easy to see that each is indeed maximal in its own weight class. Given these matchings, our greedy selection rule will output a single edge with total weight (notice that computing a maximum weight matching in does not help when ). Moreover, the value of the optimal solution matches our upper bound up to an additive term.
2.2 Improved approximation ratio through randomization
In what follows, we analyze a randomized variant of the deterministic algorithm which was presented in the previous subsection. In general, this variant sets the value of to be where is a random variable. This method is commonly referred to as randomized geometric grouping.
Formally, let be a continuous random variable which is uniformly distributed on the interval . We define the weight class , and run the algorithm as in the previous subsection. Note that this algorithm uses only the partition of the edges into classes and not the precise values of their weights. In addition, we denote by the resulting matching obtained by the algorithm, and by the total associated weight of the vertices, where for a vertex we define its associated weight to be (i.e., the minimal value in the interval ). We also denote by the value of for this particular .
For any fixed value of , inequality (2.1) immediately implies Note that and are random variables, such that for each realization of the above inequality holds. Hence, this inequality holds also for their expected values. That is, we have established the following lemma where represents expectation with respect to the random variable .
We next lower bound opt in terms of .
We will show the corresponding inequality for each edge . We denote by the rounded weight of for a specific value of . Then, it suffices to show that . Let be an integer, and let be the value that satisfies . Then, for , , and for , , thus the expected rounded weight of over the choices of is
and the claim follows. ∎
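The expectation in this proof can be checked numerically. The sketch below assumes that the shift is uniform on the unit interval and that the rounded weight of an edge is the largest value of the form `gamma**(i + delta)`, with integer `i`, not exceeding its weight; both are assumptions, since the exact expressions are not legible above. Under them, integrating over the shift gives an expected rounded weight of `w * (gamma - 1) / (gamma * ln(gamma))`, and the code compares a midpoint-rule average against this closed form.

```python
import math


def rounded_weight(w: float, gamma: float, delta: float) -> float:
    """Largest value gamma**(i + delta), i an integer, that is at most w."""
    i = math.floor(math.log(w, gamma) - delta)
    r = gamma ** (i + delta)
    # Correct possible floating-point error at a class boundary.
    if r > w:
        r /= gamma
    elif r * gamma <= w:
        r *= gamma
    return r


def expected_rounded_weight(w: float, gamma: float, samples: int = 20000) -> float:
    """Midpoint-rule average of the rounded weight over delta in [0, 1)."""
    total = sum(rounded_weight(w, gamma, (k + 0.5) / samples)
                for k in range(samples))
    return total / samples


gamma, w = 3.0, 7.3
estimate = expected_rounded_weight(w, gamma)
closed_form = w * (gamma - 1) / (gamma * math.log(gamma))
print(estimate, closed_form)
```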
Combining the above two lemmas we obtain that the expected weight of the resulting solution is at least . This approximation ratio is optimized for , where it is roughly . Hence, we have established the following theorem.
The randomized algorithm has an approximation ratio of roughly .
Prior to presenting our de-randomization, we slightly modify the randomized algorithm of the previous subsection. In this variation, instead of picking uniformly at random from the interval we pick uniformly at random from the discrete set . We apply the same method as in the previous subsection, where we replace by . Then, using Lemma 2.6, we obtain . To extend Lemma 2.7 to this new setting, we note that can be obtained by first picking and then rounding it down to the largest number in which is at most . In this way, we couple the distributions of and . Now consider the rounded weight of an edge in opt in the two distinct values of and . The ratio between the two rounded weights is at most . Therefore, we establish that . Hence, the resulting approximation ratio of the new variation is . By setting to be large enough (picking is enough), the resulting approximation ratio is bounded by .
De-randomizing the new variation in the semi-streaming model is straightforward. We simply run in parallel all possible outcomes of the algorithm, one for each possible value of , and pick the best solution among the solutions we obtained. Since is a constant (for fixed values of ), the resulting algorithm is still a semi-streaming algorithm whose performance guarantee is . By scaling prior to applying the algorithm, we establish the following result.
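A sketch of the parallel simulation follows. It assumes that the discrete shifts are `gamma**(j / q)` for `j = 0, ..., q - 1` and that classes have the form `[phi * gamma**i, phi * gamma**(i + 1))`; the class `ClassMatcher` and all parameter names are hypothetical, and the pruning of light classes is omitted for brevity. Every copy sees each edge exactly once, so the whole procedure remains a single pass.

```python
import math
from typing import Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]


class ClassMatcher:
    """One copy of the class-based algorithm for a fixed shift phi."""

    def __init__(self, gamma: float, phi: float) -> None:
        self.gamma, self.phi = gamma, phi
        self.matchings = {}   # class index -> list of edges
        self.covered = {}     # class index -> set of covered vertices

    def _class_index(self, w: float) -> int:
        i = math.floor(math.log(w / self.phi, self.gamma))
        while self.phi * self.gamma ** i > w:
            i -= 1
        while self.phi * self.gamma ** (i + 1) <= w:
            i += 1
        return i

    def process(self, edge: Edge) -> None:
        u, v, w = edge
        i = self._class_index(w)
        cov = self.covered.setdefault(i, set())
        if u not in cov and v not in cov:
            cov.update((u, v))
            self.matchings.setdefault(i, []).append(edge)

    def output(self) -> List[Edge]:
        final, used = [], set()
        for i in sorted(self.matchings, reverse=True):
            for u, v, w in self.matchings[i]:
                if u not in used and v not in used:
                    final.append((u, v, w))
                    used.update((u, v))
        return final


def derandomized_matching(stream: Iterable[Edge], gamma: float = 2.0,
                          q: int = 4) -> List[Edge]:
    copies = [ClassMatcher(gamma, gamma ** (j / q)) for j in range(q)]
    for edge in stream:           # single pass; each copy sees each edge once
        for copy in copies:
            copy.process(edge)
    return max((c.output() for c in copies),
               key=lambda m: sum(w for _, _, w in m))


stream = [("b", "c", 3.0), ("a", "b", 3.9), ("c", "d", 3.9)]
best = derandomized_matching(stream)
print(best)
```

In this example only the shift `2**0.75` separates the weight 3.0 from the weight 3.9, so only that copy recovers the two outer edges; the other copies keep the single middle edge.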
For any fixed , there is a deterministic one-pass semi-streaming -approximation algorithm for the weighted matching problem. This algorithm processes each input edge in constant time and requires time at the end of the input to compute the final output.
3 Online Preemptive Matching
In this section, we establish the following theorem.
The competitive ratio of any deterministic preemptive online algorithm is at least , where is the unique real solution of the equation .
Definitions of some constants.
Let for some and assume that a deterministic online algorithm achieves a competitive ratio of at most . We construct an input graph iteratively, and show that after a finite number of steps, the competitive ratio is violated.
In the construction of the input, all edge weights come from two weight sequences. The main weight sequence is , and an additional weight function is . These sequences are defined as follows:
, and for .
The first sequence is defined for only as long as . As soon as , the sequence stops with , and the length of the sequence is . We later show that such a value must exist. Let (and ).
Properties of the sequences.
By definition, since , if , then holds as well. Note that for all , by definition, since , but . In addition, we have the following:
This equality holds for since
where the first equality holds by definition of , the second equality holds by definition of , and the third one by simple algebra. In addition,
The last equality holds for since
where the first equality holds by definition of , the second by definition of , the third by simple algebra, the fourth by definition of and , and the last one by definition of .
Input construction, step 1.
To better understand our construction, we advise the reader to consult Figure 2. The input is created in steps. In the initial step, two edges and , each of weight , are introduced. Assume that after both edges have arrived, the online algorithm holds the edge . All future edges either have endpoints which are new vertices, or in the set (i.e., they do not contain as an endpoint). An optimal solution keeps .
Input construction, properties.
Every future step can be of two distinct types, which will be described later on. Among the edges introduced below, vertices called denote endpoints which occur each on a single edge.
After step , the following invariants are maintained. The algorithm keeps a single edge denoted by . If , then . If , then this edge can be one of two edges, or . If , then its weight is , and an optimal solution has one edge of each weight . No future edges will have common endpoints with these edges, except, possibly, with the endpoint of the edge of weight (the edge of this weight which this optimal solution keeps is always ). Otherwise, , and its weight is , in which case an optimal solution can have edges of weights , except for one weight for some . This index is used in the definition of the next step, and the properties of the current step. In addition to these edges, the optimal solution also has the edge . Future edges will have endpoints which are new vertices, or in the set . In the last case, the vertex is equal to the vertex . The invariants clearly hold after the first step. We next define all other steps and show that the invariants hold for each option.
Input construction, step .
If , the last step consists of an edge of weight . Let , if and otherwise . The new edge is , where is a new vertex. This edge has a common endpoint with the edge that the algorithm has. In fact, the algorithm has an edge of weight at least , and thus we assume that it does not preempt it. If the algorithm has an edge of weight , the edge does not have as an endpoint, so adding the new edge to the optimal solution does not require the removal of any edges, and the profit of the optimal solution is . If the algorithm has an edge of weight , the new edge is . We replace the edge of the optimal solution by the new edge. In addition, the edge (where is the index such that the optimal solution before the modification of the current step does not have an edge of weight ) is added to the optimal solution, since the endpoint became free, and the endpoint only has degree 1. The profit of the optimal solution is again. Recall that , and hence the algorithm earns (in both cases) at most . Note also that the optimal solution has value of and if then we can drop the edge of this weight from the optimal solution and get a solution of value . Therefore, we will use as a lower bound on the value of the optimal solution in this case. Thus we will show later that .
Input construction, step , for .
We next show how to construct the edges of step , for the case . We introduce two new edges of weight . Let , if and otherwise . The new edges are , and , where and are new vertices. Both these edges have a common endpoint with the edge that the algorithm has, and the algorithm can either preempt the edge it has, in which case we assume (without loss of generality) that it now has , or else it keeps the previous edge. If the algorithm keeps the previous edge, let , if and otherwise . In this case a third edge, , which has a weight of , is introduced. The vertex is new.
There are four cases to consider. In the first case, if the algorithm replaces the edge with the edge , then an optimal solution can add the edge to its edges, since the endpoint is new, and the endpoint was introduced in the previous step, in which the optimal solution obtained the edge .
If the algorithm replaces the edge with the edge , an optimal solution can remove the edge from its solution and add the two edges and (where is the index such that the optimal solution before the modification of the current step does not have an edge of weight ). This is possible since the endpoints and do not have other edges, and the endpoints and become free.
In the last two cases, the invariants hold. For the remaining two cases note that if or and the algorithm has a single edge of weight or , respectively, then the optimal solution is strictly positive and the value of the algorithm is non-positive, and hence the resulting approximation ratio in this case is unbounded. Hence, we can assume without loss of generality that if the algorithm has a single edge at the end of step , then its weight is strictly positive.
If the algorithm does not replace the edge with the edge , we show that it must replace it with the edge . Assume that this is not the case. Then the profit of the algorithm is and the optimal solution can omit its edge and add the edges and (since all these endpoints are introduced in steps and , except for , which becomes free). Thus the profit of the optimal algorithm is , while the profit of the online algorithm is . Thus, the algorithm must switch to the edge , and the structure of the optimal solution is according to the invariants.
If the algorithm does not replace the edge with the edge , we show that it must replace it with the edge . Assume that this is not the case. Then the profit of the algorithm is and the optimal solution can omit its edge and add the edges and (since and become free, and the other two endpoints are introduced in step ). Thus the profit of the optimal algorithm is , where and , since as , we get that the optimal profit is at least , while the profit of the online algorithm is . Thus, the algorithm must switch to the edge , and the structure of the optimal solution is according to the invariants.
Bounding the competitive ratio.
We next define a recursive formula for . By the definition of the sequence , we have
We first use this recurrence to show that if then . To see this note that by assumption , hence using the recurrence formula we conclude that
which is equivalent to , so , and we conclude that , as we argued. Therefore, it remains to show that there is a value of such that . To establish this claim, it suffices to show that there is a value of for which (since ). To prove this last claim, we will show that there is a value of such that . Finally, to show the existence of such , we will solve the linear homogeneous recurrence formula, and use the explicit form of to show that there is a value of such that .
To solve the recurrence formula (3.1), we guess solutions of the form for all , and get the following quadratic equation for :
We solve this quadratic equation and get its solutions
Note that using , and recalling that is the unique real solution of the equation , we conclude that and hence the two solutions are complex numbers whose imaginary parts are not zero. Since we got two distinct solutions of , it is known that the recurrence formula (3.1) is solved by a formula of the form where and are constants. We find the value of and using the conditions and . So we get the following set of two equations: (corresponding to ), and (corresponding to ). From the first equation we conclude that , and using this we obtain . Hence, the closed form solution of for values of is as follows.
We use the notation , and let . As noted above , and hence is a real number. We also define and such that , and also , then we get the following formula for .
Note that for all , and hence to show that the sequence changes its sign as we required, it suffices to show that the sequence changes its sign, but this last claim holds because (as the solutions and are not real numbers). Hence, the claim follows.
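The sign-change argument can be illustrated on a generic second-order linear homogeneous recurrence whose characteristic equation has non-real roots. The coefficients and initial values below are hypothetical and are not those of recurrence (3.1); the sketch only shows that the closed form built from the two complex roots agrees with direct iteration, and that such a sequence eventually becomes negative.

```python
import cmath
from typing import List


def recurrence_terms(b: float, c: float, x0: float, x1: float,
                     count: int) -> List[float]:
    """Iterate x_{k+1} = b * x_k - c * x_{k-1} directly."""
    xs = [x0, x1]
    while len(xs) < count:
        xs.append(b * xs[-1] - c * xs[-2])
    return xs[:count]


def closed_form_terms(b: float, c: float, x0: float, x1: float,
                      count: int) -> List[float]:
    """Evaluate x_k = alpha * r1**k + beta * r2**k for the same recurrence."""
    disc = cmath.sqrt(b * b - 4 * c)          # non-real when b*b < 4*c
    r1, r2 = (b + disc) / 2, (b - disc) / 2   # roots of r**2 - b*r + c = 0
    # Solve alpha + beta = x0 and alpha*r1 + beta*r2 = x1.
    beta = (x0 * r1 - x1) / (r1 - r2)
    alpha = x0 - beta
    return [(alpha * r1 ** k + beta * r2 ** k).real for k in range(count)]


# Hypothetical coefficients with b*b < 4*c, so both roots are non-real.
b, c = 1.5, 1.0
iterated = recurrence_terms(b, c, 1.0, 1.0, 12)
closed = closed_form_terms(b, c, 1.0, 1.0, 12)
first_negative = next(k for k, x in enumerate(iterated) if x < 0)
print(iterated[:5], first_negative)
```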
References
[1] N. Bansal, N. Buchbinder, A. Gupta, and J. Naor. An -competitive algorithm for metric bipartite matching. In Proceedings of the 15th Annual European Symposium on Algorithms, pages 522–533, 2007.
[2] M. Elkin and J. Zhang. Efficient algorithms for constructing -spanners in the distributed and streaming models. Distributed Computing, 18(5):375–385, 2006.
[3] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theoretical Computer Science, 348(2-3):207–216, 2005.
[4] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. Graph distances in the data-stream model. SIAM Journal on Computing, 38(5):1709–1727, 2008.
[5] J. A. Garay, I. S. Gopal, S. Kutten, Y. Mansour, and M. Yung. Efficient on-line call control algorithms. Journal of Algorithms, 23(1):180–194, 1997.
[6] B. Kalyanasundaram and K. Pruhs. Online weighted matching. Journal of Algorithms, 14(3):478–488, 1993.
[7] R. M. Karp, U. V. Vazirani, and V. V. Vazirani. An optimal algorithm for on-line bipartite matching. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, pages 352–358, 1990.
[8] S. Khuller, S. G. Mitchell, and V. V. Vazirani. On-line algorithms for weighted bipartite matching and stable marriages. Theoretical Computer Science, 127(2):255–267, 1994.
[9] A. McGregor. Finding graph matchings in data streams. In Proceedings of the 8th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, pages 170–181, 2005.
[10] S. Muthukrishnan. Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science. Now Publishers Inc, 2005.
[11] A. Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer, 2003.
[12] M. Zelke. Weighted matching in the semi-streaming model. In Proceedings of the 25th Annual Symposium on Theoretical Aspects of Computer Science, pages 669–680, 2008.