Optimal Analysis of an Online Algorithm for the Bipartite Matching Problem on a Line


Sharath Raghvendra
Abstract

In the online metric bipartite matching problem, we are given a set $S$ of server locations in a metric space. Requests arrive one at a time, and on its arrival, we need to immediately and irrevocably match each request to a server at a cost equal to the distance between these locations. An $\alpha$-competitive algorithm will assign requests to servers so that the total cost is at most $\alpha$ times the cost of $M_{\mathrm{OPT}}$, where $M_{\mathrm{OPT}}$ is the minimum-cost matching between $S$ and the set $R$ of requests.

We consider this problem in the adversarial model for the case where $S$ and $R$ are points on a line and $|S| = |R| = n$. We improve the analysis of the deterministic Robust Matching Algorithm (RM-Algorithm, Nayyar and Raghvendra FOCS’17) from $O(\log^2 n)$ to an optimal $\Theta(\log n)$. Previously, only a randomized algorithm under a weaker oblivious adversary achieved a competitive ratio of $O(\log n)$ (Gupta and Lewi, ICALP’12). The well-known Work Function Algorithm (WFA) has a competitive ratio of $O(n)$ and $\Omega(\log n)$ for this problem. Therefore, WFA cannot achieve an asymptotically better competitive ratio than the RM-Algorithm.

Bipartite Matching, Online Algorithms, Adversarial Model, Line Metric

Virginia Tech, Blacksburg, USA (sharathr@vt.edu)

Copyright Sharath Raghvendra. Subject classification: Theory of Computation, Design and Analysis of Algorithms, Online Algorithms. Funding: This work is supported by an NSF CRII grant NSF-CCF 1464276. Editors: Bettina Speckmann and Csaba D. Tóth. 34th International Symposium on Computational Geometry (SoCG 2018), June 11–14, 2018, Budapest, Hungary. Series volume 99, article no. 67.

1 Introduction

Driven by consumers’ demand for quick access to goods and services, business ventures schedule their delivery in real-time, often without the complete knowledge of the future request locations or their order of arrival. Due to this lack of complete information, decisions made tend to be sub-optimal. Therefore, there is a need for competitive online algorithms which immediately and irrevocably allocate resources to requests in real-time by incurring a small cost.

Motivated by these real-time delivery problems, we study the problem of computing the online metric bipartite matching of requests to servers. Consider servers placed in a metric space where each server has a capacity that restricts how many requests it can serve. When a new request arrives, one of the servers with positive capacity is matched to this request. After this request is served, the capacity of the server reduces by one. We assume that the cost associated with this assignment is a metric cost; for instance, it could be the minimum distance traveled by the server to reach the request.

The case where the capacity of every server is $\infty$ is the celebrated $k$-server problem. The case where every server has a capacity of $1$ is the online metric bipartite matching problem. In this case, the requests arrive one at a time, and we have to immediately and irrevocably match each of them to some unmatched server. The resulting assignment is a matching and is referred to as an online matching. An optimal assignment is impossible since an adversary can easily fill up the remaining locations of requests in $R$ in a way that our current assignment becomes sub-optimal. Therefore, we want our algorithm to compute an assignment that is competitive with respect to the optimal matching. For any input $S$, $R$ and any arrival order of requests in $R$, we say our algorithm is $\alpha$-competitive, for $\alpha > 1$, when the cost of the online matching $M$ is at most $\alpha$ times the minimum cost, i.e., $w(M) \le \alpha \cdot w(M_{\mathrm{OPT}})$.

Here $M_{\mathrm{OPT}}$ is the minimum-cost matching of the locations in $S$ and $R$. In the above discussion, note the role of the adversary. In the adversarial model, the adversary knows the server locations and the assignments made by the algorithm and generates a request sequence to maximize $w(M)/w(M_{\mathrm{OPT}})$. In the weaker oblivious adversary model, the adversary knows the randomized algorithm but does not know the random choices made by the algorithm. In this paper, we consider the online metric bipartite matching problem in the adversarial model where $S$ and $R$ are points on a line.

Consider the adversarial model. For any request, the greedy heuristic simply assigns the closest unmatched server to it. The greedy heuristic, even for the line metric, has a competitive ratio that is exponential in $n$ [3] for the online matching problem. The well-known Work Function Algorithm (WFA) chooses a server that minimizes the sum of the greedy cost and the so-called retrospective cost. For the $k$-server problem, the competitive ratio of the WFA is $2k-1$, which is near optimal given the lower bound of $k$ on the competitive ratio of any algorithm in any metric space that has at least $k+1$ points [7].
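To make the greedy heuristic concrete on a line, here is a minimal sketch. It assumes servers and requests are given as real coordinates, breaks distance ties toward the leftmost server (an assumption; tie-breaking is not specified above), and computes the optimum by pairing servers and requests in sorted order, a property of line matchings shown later in this paper. The helper names `greedy_online_matching` and `optimal_line_cost` are hypothetical.

```python
def greedy_online_matching(servers, requests):
    """Match each arriving request to the closest free server (ties: leftmost server)."""
    free = sorted(servers)
    pairs, total = [], 0
    for r in requests:
        s = min(free, key=lambda x: (abs(x - r), x))  # closest free server
        free.remove(s)
        pairs.append((s, r))
        total += abs(s - r)
    return pairs, total


def optimal_line_cost(servers, requests):
    """Minimum-cost perfect matching on a line: pair sorted servers with sorted requests."""
    return sum(abs(s - r) for s, r in zip(sorted(servers), sorted(requests)))


pairs, greedy_cost = greedy_online_matching([0, 10], [6, 11])
print(pairs, greedy_cost, optimal_line_cost([0, 10], [6, 11]))
```

On servers at $0$ and $10$ with requests arriving at $6$ and then $11$, greedy grabs the server at $10$ first and is later forced to pay $11$, for a total of $15$ against an optimum of $7$.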

In the context of the online metric matching problem, there are algorithms that achieve a competitive ratio of $2n-1$ in the adversarial model [9, 5, 4]. This competitive ratio is worst-case optimal, i.e., there exists a metric space where we cannot do better. However, for the $d$-dimensional Euclidean metric, for a long time, there was a stark contrast between the upper bound of $2n-1$ and the lower bound of $\Omega(n^{1-1/d})$. Consequently, significant effort has been made to study the performance of online algorithms in special metric spaces, especially the line metric. For example, for the line metric, it has been shown that the WFA, when applied to the online matching problem, has a lower bound of $\Omega(\log n)$ and an upper bound of $O(n)$ [2]; see also [1] for an $o(n)$-competitive algorithm for the line metric. In the oblivious adversary model, there is an $O(\log n)$-competitive algorithm [3] for the line metric. There is also an -competitive algorithm in the oblivious adversary model for any metric space with doubling dimension  [3].

Only recently, for any metric space and for the adversarial model, Raghvendra and Nayyar [8] provided a bound of $O(\mu(S)\log^2 n)$ on the competitive ratio of the RM-Algorithm, an algorithm that was introduced by Raghvendra [9]; here $\mu(S)$ is the worst-case ratio of the TSP and the diameter of any positive-diameter subset of $S$. There is a simple lower bound of $\Omega(\mu(S))$ on the competitive ratio of any algorithm for this problem. Therefore, the RM-Algorithm is near-optimal for every input set $S$. When $S$ is a set of points on a line, $\mu(S) = 2$ and therefore, their analysis bounds the competitive ratio of the RM-Algorithm by $O(\log^2 n)$ for the line metric and by $O(n^{1-1/d}\log^2 n)$ for any $d$-dimensional Euclidean metric. Furthermore, the RM-Algorithm also has a lower bound of $\Omega(\log n)$ on its competitive ratio for the line metric. In this paper, we provide a different analysis and show that the RM-Algorithm is $\Theta(\log n)$-competitive.

Overview of RM-Algorithm:

At a high level, the RM-Algorithm maintains two matchings $M$ and $M^*$, both of which match the requests seen so far to the same subset of servers in $S$. We refer to $M$ as the online matching and $M^*$ as the offline matching. For a parameter $t$ chosen by the algorithm, the offline matching $M^*$ is a $t$-approximate minimum-cost matching satisfying a set of relaxed dual feasibility conditions of the linear program for the minimum-cost matching; here each constraint is relaxed by a multiplicative factor $t$. Note that, when $t = 1$, the matching $M^*$ is simply the minimum-cost matching.

When the request $r_i$ arrives, the algorithm computes an augmenting path $P_i$ with the minimum “cost” for an appropriately defined cost function. This path starts at $r_i$ and ends at an unmatched server $s_i$. The algorithm then augments the offline matching $M^*$ along $P_i$, whereas the online matching $M$ simply matches the two endpoints of $P_i$. Note that $M$ and $M^*$ always match requests to the same subset of servers. We refer to the steps taken by the algorithm to process request $r_i$ as the $i$th phase of the algorithm. For an appropriate choice of $t$, it has been shown in [9] that the sum of the costs of all augmenting paths computed by the algorithm bounds the online matching cost from above. Nayyar and Raghvendra [8] use this property and bound the ratio of the sum of the costs of augmenting paths to the cost of the optimal matching. In order to accomplish this, they associate every request $r_i$ with the cost of $P_i$. To bound the sum of the costs of all the augmenting paths, they partition the requests into groups, and within each group they bound this ratio by a quantity that is a constant when $S$ is a set of points on a line. For the line metric, individual groups can exhibit this worst-case ratio. However, not all groups can simultaneously exhibit this worst-case behavior. Therefore, in order to improve the competitive ratio from $O(\log^2 n)$ to $O(\log n)$, one has to bound the combined ratios of several groups by a constant, which makes the analysis challenging.

1.1 Our Results and Techniques

In this paper, we show that when the points in $S \cup R$ are on a line, the RM-Algorithm achieves a competitive ratio of $O(\log n)$. Our analysis is tight, as there is an example in the line metric for which the RM-Algorithm produces an online matching whose cost is $\Omega(\log n)$ times the optimal cost. We achieve this improvement using the following new ideas:

  • First, we establish the ANFS-property of the RM-Algorithm (Section 3.1). We show that many requests are matched to an approximately nearest free server (ANFS) by the algorithm. We define certain edges of the online matching as short edges and show that every short edge matches the request to an approximately nearest free server. Let $M_S$ be the set of short edges of the online matching and $M_L$ be the set of long edges. We also show that when $t = 3$, the total cost of the short edges satisfies $w(M_S) \ge w(M)/6$ and, therefore, the cost of the long edges is $w(M_L) \le 5\,w(M_S)$.

  • For every point $v$ in $S \cup R$, the RM-Algorithm maintains a dual weight $y(v)$ (Section 2). For our analysis in the line metric, we assign an interval to every request. The length of this interval is determined by the dual weight of the request. At the start of phase $i$, consider the union of all such intervals; by its construction, this union consists of a set of interior-disjoint intervals. While processing request $r_i$, the RM-Algorithm conducts a series of dual adjustments, and a subset of requests (referred to as $B_i$) undergoes an increase in dual weights. After the dual adjustments, the union of the intervals for requests in $B_i$ forms a single interval, and these increases conclude in the discovery of the minimum-cost augmenting path. Therefore, after phase $i$, intervals of the union may grow and combine to form a single interval. Furthermore, the new online edge $(s_i, r_i)$ is also contained inside this newly formed interval (Section 3.3). Based on the length of this interval, we assign one of $O(\log n)$ levels to the edge $(s_i, r_i)$. This partitions all the online edges into $O(\log n)$ levels.

  • The online edges of any given level can be expressed as several non-overlapping well-aligned matchings of well-separated inputs (Section 4). We establish properties of such matchings (Section 3.2 and Figure 1) and use them to bound the total cost of the “short” online edges of level $k$ by the sum of $O(w(M_{\mathrm{OPT}}))$ and $\varepsilon$ times the cost of the long edges of level $k$, where $\varepsilon$ is a small positive constant (Section 4). Adding across the levels, we get $w(M_S) \le O(\log n)\,w(M_{\mathrm{OPT}}) + \varepsilon\,w(M_L)$. Using the ANFS-property of short and long edges, we immediately get $w(M_S) \le O(\log n)\,w(M_{\mathrm{OPT}}) + 5\varepsilon\,w(M_S)$. For a sufficiently small $\varepsilon$, we bound the competitive ratio, i.e., $w(M)/w(M_{\mathrm{OPT}}) \le 6\,w(M_S)/w(M_{\mathrm{OPT}})$, by $O(\log n)$.
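The last step of this outline is a short chain of inequalities. Written out, assuming the per-level bounds add up to $O(\log n)\,w(M_{\mathrm{OPT}}) + \varepsilon\,w(M_L)$ and using $w(M_L) \le 5\,w(M_S)$ from the ANFS-property with $t = 3$, it reads as follows (a sketch of the arithmetic only, not a substitute for the proofs in Section 4):

```latex
\begin{aligned}
w(M_S) &\le O(\log n)\, w(M_{\mathrm{OPT}}) + \varepsilon\, w(M_L)
        \le O(\log n)\, w(M_{\mathrm{OPT}}) + 5\varepsilon\, w(M_S),\\
(1 - 5\varepsilon)\, w(M_S) &\le O(\log n)\, w(M_{\mathrm{OPT}}),\\
w(M) = w(M_S) + w(M_L) \le 6\, w(M_S) &\le \frac{6}{1 - 5\varepsilon}\, O(\log n)\, w(M_{\mathrm{OPT}}).
\end{aligned}
```

Any fixed $\varepsilon < 1/5$ keeps the factor $6/(1-5\varepsilon)$ constant; for example, $\varepsilon = 1/10$ gives a factor of $12$.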

Organization:

The rest of the paper is organized as follows. We begin by presenting (in Section 2) the RM-Algorithm and some of its useful properties as shown in [9]. For the analysis, we establish the ANFS-property in Section 3.1. After that, we (in Section 3.2) introduce well-aligned matchings of well-separated inputs on a line. Then, in Section 3.3, we interpret the dual weight maintained for each request as an interval and study the properties of the union of these intervals. Using these properties (in Section 4) along with the ANFS-property of the algorithm, we establish a bound of $O(\log n)$ on the competitive ratio for the line metric.

2 Background and Algorithm Details

In this section, we introduce the relevant background and describe the RM-Algorithm.

A matching $M \subseteq S \times R$ is any set of vertex-disjoint edges of the complete bipartite graph denoted by $G(S, R)$. The cost of any edge $(s, r) \in S \times R$ is given by $d(s, r)$; we assume that the cost function satisfies the metric property. The cost of any matching $M$ is given by the sum of the costs of its edges, i.e., $w(M) = \sum_{(s,r) \in M} d(s, r)$. A perfect matching is a matching where every server in $S$ is serving exactly one request in $R$, i.e., $|M| = n$. A minimum-cost perfect matching is a perfect matching with the smallest cost.

Given a matching $M$, an alternating path (resp. cycle) is a simple path (resp. cycle) whose edges alternate between those in $M$ and those not in $M$. We refer to any vertex that is not matched in $M$ as a free vertex. An augmenting path $P$ is an alternating path between two free vertices. We can augment $M$ by one edge along $P$ if we remove the edges of $P \cap M$ from $M$ and add the edges of $P \setminus M$ to $M$. After augmenting, the new matching is precisely given by $M \oplus P$, where $\oplus$ is the symmetric difference operator. A matching $M$ and a set of dual weights, denoted by $y(v)$ for each point $v \in S \cup R$, is a $t$-feasible matching if, for any request $r \in R$ and server $s \in S$, the following conditions hold:

$$y(s) + y(r) \le t\, d(s, r) \tag{1}$$
$$y(s) + y(r) = d(s, r) \quad \text{if } (s, r) \in M \tag{2}$$

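Also, we refer to an edge $(s, r)$ as eligible if either $(s, r) \in M$ or $(s, r)$ satisfies inequality (1) with equality:

$$y(s) + y(r) = t\, d(s, r) \quad \text{if } (s, r) \notin M \tag{3}$$
$$y(s) + y(r) = d(s, r) \quad \text{if } (s, r) \in M \tag{4}$$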
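Conditions (1) to (4) can be checked mechanically. The sketch below is a hypothetical helper for the line metric, not part of the algorithm in [9]: it assumes servers and requests are indexed lists of coordinates, the matching is a set of `(server_index, request_index)` pairs, and the dual weights are stored in two lists `y_s` and `y_r`. It returns whether the duals are $t$-feasible and, if so, which edges are eligible.

```python
def check_t_feasible(servers, requests, matching, y_s, y_r, t, tol=1e-9):
    """Return (is_feasible, eligible_edges) under conditions (1)-(4), line metric."""
    eligible = set()
    for i, s in enumerate(servers):
        for j, r in enumerate(requests):
            d = abs(s - r)                       # d(s, r) on a line
            slack = t * d - (y_s[i] + y_r[j])    # condition (1) requires slack >= 0
            if slack < -tol:
                return False, set()
            if (i, j) in matching:
                if abs(y_s[i] + y_r[j] - d) > tol:  # condition (2) for matching edges
                    return False, set()
                eligible.add((i, j))                # matching edges are eligible, cf. (4)
            elif abs(slack) <= tol:
                eligible.add((i, j))                # tight non-matching edge, cf. (3)
    return True, eligible


# Server at 10 is matched to the request at 6; duals chosen so that (10, 11) is tight.
print(check_t_feasible([0, 10], [6, 11], {(1, 0)}, [0, 0], [4, 3], 3))
```

Raising the dual of the request at $11$ from $3$ to $4$ violates (1) on the edge to the server at $10$, and the helper then reports infeasibility.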

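For a parameter $t \ge 1$, we define the $t$-net-cost of any augmenting path $P$ with respect to $M$ to be:

$$\phi_t(P) = t \Big( \sum_{(s,r) \in P \setminus M} d(s, r) \Big) - \sum_{(s,r) \in P \cap M} d(s, r).$$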
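As a small illustration of the $t$-net-cost, the hypothetical helper below evaluates it for a path on a line. It assumes all coordinates are distinct, so that an edge can be identified by its `(server, request)` coordinate pair. For $t = 1$ the value equals $w(M \oplus P) - w(M)$, the change in matching cost caused by augmenting along $P$.

```python
def t_net_cost(path_edges, matching, t):
    """t-net-cost of an augmenting path on a line.

    path_edges: the path's edges as (server, request) coordinate pairs.
    matching:   the current matching M as a set of (server, request) pairs.
    """
    outside = sum(abs(s - r) for s, r in path_edges if (s, r) not in matching)  # P \ M
    inside = sum(abs(s - r) for s, r in path_edges if (s, r) in matching)       # P ∩ M
    return t * outside - inside


# Path: request 11 -> server 10 -> (matched) request 6 -> free server 0.
path = [(10, 11), (10, 6), (0, 6)]
M = {(10, 6)}
print(t_net_cost(path, M, 3), t_net_cost(path, M, 1))
```

Here augmenting replaces the edge $(10, 6)$ of cost $4$ by edges of total cost $7$, so the $1$-net-cost is $3$, while the $3$-net-cost is $3 \cdot 7 - 4 = 17$.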

The definitions of $t$-feasible matching, eligible edges and $t$-net-cost (when $t = 1$) are also used in describing the well-known Hungarian algorithm, which computes the minimum-cost matching. In the Hungarian algorithm, initially $M = \emptyset$ is a $1$-feasible matching with all the dual weights set to $0$. In each iteration, the Hungarian Search procedure adjusts the dual weights $y(\cdot)$ and computes an augmenting path $P$ of eligible edges while maintaining the $1$-feasibility of $M$, and then augments $M$ along $P$. The augmenting path computed by the standard implementation of the Hungarian Search procedure can be shown to also have the minimum $1$-net-cost.

Using this background, we describe the RM-Algorithm. The value of $t$ is chosen at the start of the algorithm. The algorithm maintains two matchings: an online matching $M$ and a $t$-feasible matching $M^*$ (also called the offline matching), both of which are initialized to $\emptyset$. After processing $i$ requests, both matchings $M$ and $M^*$ match each of the $i$ requests to the same subset of servers in $S$, i.e., the set of free (unmatched) servers is the same for both $M$ and $M^*$. To process the request $r_i$, the algorithm does the following:

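  1. Compute the minimum $t$-net-cost augmenting path with respect to the offline matching $M^*$. Let $P_i$ be this path starting from $r_i$ and ending at some server $s_i$.

  2. Update the offline matching $M^*$ by augmenting it along $P_i$, i.e., $M^* \leftarrow M^* \oplus P_i$, and update the online matching $M$ by matching $r_i$ to $s_i$, i.e., $M \leftarrow M \cup \{(s_i, r_i)\}$.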
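The two steps can be simulated on tiny inputs. The sketch below is a hypothetical brute-force stand-in, not the Hungarian-style search of [9]: it enumerates every simple alternating path from the new request to a free server (exponential time), keeps one with minimum $t$-net-cost, breaks ties by the number of edges (remaining ties by enumeration order), and does not maintain dual weights. Points are coordinates on a line, $t = 3$, and at most as many requests as servers are assumed.

```python
def rm_algorithm_bruteforce(servers, requests, t=3.0):
    """Exhaustive-search sketch of the two steps of each RM phase on a line.

    Returns (online, offline): online lists the edges (server index, request index)
    of M in arrival order; offline maps each matched server index to its request in M*.
    """
    offline = {}  # M*: server index -> request index
    online = []   # M: edges (s_i, r_i)

    def dist(i, j):
        return abs(servers[i] - requests[j])

    for j in range(len(requests)):
        best = None  # (t-net-cost, number of edges, alternating vertex sequence)

        def search(req, visited, cost, path):
            nonlocal best
            for i in range(len(servers)):
                if i in visited:
                    continue
                c = cost + t * dist(i, req)   # edge not in M*, weighted by t
                if i not in offline:          # free server: path is augmenting
                    cand = (c, len(path), path + [i])
                    if best is None or cand[:2] < best[:2]:
                        best = cand
                else:                         # follow the M* edge back to its request
                    r2 = offline[i]
                    search(r2, visited | {i}, c - dist(i, r2), path + [i, r2])

        search(j, frozenset(), 0.0, [j])      # Step 1: minimum t-net-cost path
        if best is None:
            raise ValueError("no free server left")
        path = best[2]                        # [r_j, s, r, s, ..., free server]
        for m in range(1, len(path), 2):      # Step 2: M* <- M* xor P
            offline[path[m]] = path[m - 1]
        online.append((path[-1], j))          # M <- M + (free server, r_j)
    return online, offline


print(rm_algorithm_bruteforce([0, 10], [6, 11]))
```

On servers at $0$ and $10$ with requests at $6$ and then $11$, the second phase picks the path $11 \to 10 \to 6 \to 0$ (net-cost $17$) over the direct edge to $0$ (net-cost $33$). The online matching pays $4 + 11 = 15$, while $M^*$ ends at cost $1 + 6 = 7$, the optimum for these points.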

While computing the minimum $t$-net-cost augmenting path in Step 1, if there are multiple paths with the same $t$-net-cost, the algorithm will simply select the one with the fewest number of edges. Throughout this paper, we set $t = 3$. In [9], we present a search algorithm, similar to the Hungarian Search, that computes the minimum $t$-net-cost path in Step 1 of any phase; we describe it next.

The implementation of Step 1 of the algorithm is similar in style to the Hungarian Search procedure. To compute the minimum $t$-net-cost path $P_i$, the algorithm grows an alternating tree consisting only of eligible edges. There is an alternating path of eligible edges from $r_i$ to every server and request participating in this tree. To grow this tree, the algorithm increases the dual weights of every request in this tree until at least one more edge becomes eligible and a new vertex enters the tree. In order to maintain feasibility, the algorithm reduces the dual weights of all the servers in this tree by the same amount. This search procedure ends when an augmenting path consisting only of eligible edges is found. Let $B_i$ (resp. $A_i$) be the set of requests (resp. servers) that participated in the alternating tree of phase $i$. Note that during Step 1, the dual weights of requests in $B_i$ may only increase and the dual weights of servers in $A_i$ may only reduce.

The second step begins once the augmenting path $P_i$ is found. The algorithm augments the offline matching $M^*$ along this path. Note that, for $M^*$ to remain $t$-feasible, the edges that newly enter $M^*$ must satisfy (2). In order to ensure this, the algorithm reduces the dual weight of each request $r$ on $P_i$ to $d(s, r) - y(s)$, where $(s, r)$ is the edge of $P_i$ incident on $r$ that newly enters $M^*$. Further details of the algorithm and the proof of its correctness can be found in [9]. In addition, it has also been shown that the algorithm maintains the following three invariants:

  1. The offline matching $M^*$ and dual weights $y(\cdot)$ form a $t$-feasible matching,

  2. For every server $s \in S$, $y(s) \le 0$, and if $s$ is free, $y(s) = 0$. For every request $r \in R$, $y(r) \ge 0$, and if $r$ has not yet arrived, $y(r) = 0$,

  3. At the end of the first step of phase $i$ of the algorithm, the augmenting path $P_i$ is found and the dual weight of $r_i$, $y(r_i)$, is equal to the $t$-net-cost $\phi_t(P_i)$.

Notations:

Throughout the rest of this paper, we will use the following notations. We index the requests in their order of arrival, i.e., $r_i$ is the $i$th request to arrive. Let $R_i = \{r_1, \ldots, r_i\}$ be the set of the first $i$ requests. Our algorithm processes the request $r_i$ by computing an augmenting path $P_i$. Let $s_i$ be the free server at the other end of the augmenting path $P_i$. Let $\mathcal{P} = \{P_1, \ldots, P_n\}$ be the set of augmenting paths generated by the algorithm. In order to compute the augmenting path $P_i$, in the first step, the algorithm adjusts the dual weights and constructs an alternating tree; let $B_i$ be the set of requests and let $A_i$ be the set of servers that participate in this alternating tree. Let $M^*_i$ be the offline matching after the $i$th request has been processed, i.e., the matching obtained after augmenting the matching $M^*_{i-1}$ along $P_i$. Note that $M^*_0 = \emptyset$ and $M^*_n$ is the final matching after all the requests have been processed. The online matching $M_i$ is the online matching after $i$ requests have been processed; $M_i$ consists of the edges $\{(s_1, r_1), \ldots, (s_i, r_i)\}$. Let $S_F^i$ be the set of free servers with respect to the matchings $M^*_{i-1}$ and $M_{i-1}$, i.e., the set of free servers at the start of phase $i$. For any path $P$, let $\ell(P)$ be its length.

Next, in Section 3.1, we present the approximate nearest free server (ANFS) property of the RM-Algorithm. In Section 3.2, we present well-aligned matchings of well-separated input instances. In Section 3.3, we interpret the execution of each phase of the RM-Algorithm in the line metric. Finally, in Section 4, we give our analysis of the algorithm for the line metric.

3 New Properties of the Algorithm

In this section, we present new properties of the RM-Algorithm. First, we show that the RM-Algorithm will assign an approximate nearest free server for many requests and show that the total cost of these “short” matches will be at least one-sixth of the online matching cost. Our proof of this property is valid for any metric space.

3.1 Approximate Nearest Free Server Property

We divide the augmenting paths computed by the RM-Algorithm into two sets, namely short and long paths. For any $i$, we refer to $P_i$ as a short augmenting path if and long otherwise. Let $\mathcal{P}_S$ be the set of all short augmenting paths and $\mathcal{P}_L$ be the set of long augmenting paths. In phase $i$, the algorithm adds an edge between $r_i$ and $s_i$ in the online matching. We refer to any edge $(s_i, r_i)$ of the online matching as a short edge if $P_i$ is a short augmenting path. Otherwise, we refer to this edge as a long edge. The set $M_S$ of all short edges and the set $M_L$ of long edges partition the edges of the online matching $M$.

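At the start of phase $i$, $S_F^i$ is the set of free servers. Let $s^*$ be the server closest to $r_i$, i.e., $s^* = \arg\min_{s \in S_F^i} d(s, r_i)$. Any other server $s \in S_F^i$ is an $\alpha$-approximate nearest free server ($\alpha$-ANFS) to the request $r_i$ if $d(s, r_i) \le \alpha\, d(s^*, r_i)$.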
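The $\alpha$-ANFS condition translates directly into a check. The helper below is a hypothetical illustration for points on a line; it assumes the free servers at the start of the phase are given as a list of coordinates and that the candidate server is one of them.

```python
def is_alpha_anfs(s, r, free_servers, alpha):
    """True if free server s satisfies d(s, r) <= alpha * d(s*, r), line metric."""
    if s not in free_servers:
        raise ValueError("s must be a free server")
    nearest = min(abs(x - r) for x in free_servers)  # d(s*, r) for the closest free server
    return abs(s - r) <= alpha * nearest


# Request at 6 with free servers at 0 and 10: the nearest free server is at distance 4.
print(is_alpha_anfs(10, 6, [0, 10], 1), is_alpha_anfs(0, 6, [0, 10], 1.5))
```

The server at $0$ is at distance $6 = 1.5 \cdot 4$, so it qualifies as a $1.5$-ANFS but not as a $1$-ANFS.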

In Lemmas 3.1 and 3.2, we show that the short edges in $M_S$ match each request to an approximate nearest free server and that the cost of $M_S$ is at least one-sixth of the cost of the online matching.

Lemma 3.1.

For any request $r_i$, if $P_i$ is a short augmenting path, then $s_i$ is a -ANFS of $r_i$.

Proof.

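Let $s^*$ be the nearest available server of $r_i$ in $S_F^i$. Both $s^*$ and $r_i$ are free, and so the edge $(s^*, r_i)$ is also an augmenting path with respect to $M^*_{i-1}$, with $t$-net-cost $t\,d(s^*, r_i)$. The algorithm computes $P_i$, which is the minimum $t$-net-cost path with respect to $M^*_{i-1}$, and so $\phi_t(P_i) \le t\,d(s^*, r_i)$.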

Since is a short augmenting path,

(5)
(6)

implying that is a -approximate nearest free server to the request . ∎

Lemma 3.2.

Let $M_S$ be the set of short edges of the online matching $M$. Then,

(7)
Proof.

Since the matchings $M^*_i$ and $M^*_{i-1}$ differ only in the edges of the augmenting path $P_i$, we have

The second equality follows from the definition of . Adding and subtracting to the RHS we get,

The last equality follows from (3.1). Rearranging terms and setting , we get,

(9)

In the second to last equation, the summation on the LHS telescopes canceling all terms except . Since and is an empty matching, we get . As is always a positive value, the second to last equation follows.

Recollect that $\mathcal{P}_S$ is the set of short augmenting paths and $\mathcal{P}_L$ is the set of long augmenting paths, with $\mathcal{P} = \mathcal{P}_S \cup \mathcal{P}_L$. We rewrite (9) as

(10)

The last two inequalities follow from the fact that is a positive term and also the definition of long paths, i.e., if is a long path then . Adding to (10) and applying (9), we get

(11)

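When request $r_i$ arrives, the edge $(s_i, r_i)$ is an augmenting path (consisting of a single edge) with respect to $M^*_{i-1}$ and has a $t$-net-cost of $t\,d(s_i, r_i)$. Since $P_i$ is the minimum $t$-net-cost path, we have $\phi_t(P_i) \le t\,d(s_i, r_i)$. Therefore, we can write (11) as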

or as desired.

If we set $t = 3$, we obtain $w(M_S) \ge w(M)/6$.

Convention and Notations for the Line Metric:

When $S$ and $R$ are points on a line, any point $v \in S \cup R$ is simply a real number. We can interpret any edge $(s, r)$ as the line segment that connects $s$ and $r$, and its cost as $|s - r|$. We will abuse notation and use $(s, r)$ to denote both the edge as well as the corresponding line segment. A matching of $S$ and $R$, therefore, is also a set of line segments, one corresponding to each edge. For any closed interval $I = [x, y]$, $\mathring{I}$ will be the open interval $(x, y)$. We define the boundary $\partial I$ of any closed (or open) interval $I$ (or $\mathring{I}$) to be the set $\{x, y\}$ of its two end points. The optimal matching of points on a line has a very simple structure, which we describe next.

Properties of optimal matching on a line:

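For any point set $V$ on a line, let $\langle v_1, v_2, \ldots \rangle$ be the sequence of points of $V$ sorted in increasing order of their coordinate values. Given sets $S$ and $R$, consider their sorted sequences $\langle s_{(1)}, \ldots, s_{(n)} \rangle$ and $\langle r_{(1)}, \ldots, r_{(n)} \rangle$. The minimum-cost matching matches server $s_{(j)}$ to request $r_{(j)}$ for every $1 \le j \le n$. We show this next.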

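Let $V = S \cup R$, so that $V$ contains $n$ points from $S$ and $n$ points from $R$, and let $\langle v_1, \ldots, v_{2n} \rangle$ be its sorted sequence. For $1 \le j < 2n$, let $V_j$ be the first $j$ points in this sequence. Note that, for any such $j$, there are precisely $c_j \ge 0$ more (or fewer) vertices of $S$ than of $R$ in $V_j$. Consider the intervals $[v_j, v_{j+1}]$, for every $1 \le j < 2n$, each with length $v_{j+1} - v_j$. Any perfect matching will have at least $c_j$ edges with one end point in $V_j$ and the other in $V \setminus V_j$. Every such edge contains the interval $[v_j, v_{j+1}]$, and so the cost of any perfect matching is at least $\sum_{j=1}^{2n-1} c_j (v_{j+1} - v_j)$.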

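We claim that the matching described above has cost $\sum_{j=1}^{2n-1} c_j (v_{j+1} - v_j)$ and so is an optimal matching. For any $j$, without loss of generality, suppose there are $c_j$ more points of $S$ than of $R$ in $V_j$, so that, for some $k$, $V_j$ contains the points $s_{(1)}, \ldots, s_{(k + c_j)}$ and $r_{(1)}, \ldots, r_{(k)}$. In this matching, $s_{(m)}$ is matched with $r_{(m)}$ for every $m \le k$; the remaining $c_j$ servers in $V_j$ are matched to requests in $V \setminus V_j$. Therefore, for every $j$, there are exactly $c_j$ edges in this matching that contain the interval $[v_j, v_{j+1}]$, and so we can express its cost as $\sum_{j=1}^{2n-1} c_j (v_{j+1} - v_j)$.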
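The argument can be checked numerically. The sketch below, with hypothetical helper names, computes the cost of the sorted matching and the cut-based lower bound $\sum_j c_j (v_{j+1} - v_j)$, where $c_j$ is the absolute difference between the numbers of servers and requests among the first $j$ sorted points; by the discussion above the two values coincide.

```python
def sorted_matching_cost(servers, requests):
    """Cost of matching the j-th leftmost server to the j-th leftmost request."""
    return sum(abs(s - r) for s, r in zip(sorted(servers), sorted(requests)))


def cut_lower_bound(servers, requests):
    """Sum over consecutive gaps [v_j, v_{j+1}] of c_j * (v_{j+1} - v_j)."""
    points = sorted([(x, 1) for x in servers] + [(x, -1) for x in requests])
    total, imbalance = 0, 0
    for (x, sign), (x_next, _) in zip(points, points[1:]):
        imbalance += sign                       # (#servers - #requests) among the first j points
        total += abs(imbalance) * (x_next - x)  # each gap is crossed by at least |imbalance| edges
    return total


print(sorted_matching_cost([1, 2, 7], [3, 5, 8]), cut_lower_bound([1, 2, 7], [3, 5, 8]))
```

For servers $\{1, 2, 7\}$ and requests $\{3, 5, 8\}$ both quantities equal $6$.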

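Any edge $(s, r)$ of a matching is called a left edge if the server $s$ is to the left of the request $r$. Otherwise, we refer to the edge as a right edge. Consider any perfect matching $M$ of $S$ and $R$, and for two consecutive points $v_j < v_{j+1}$ of $S \cup R$, let $E_j$ be the edges of $M$ that contain the interval $[v_j, v_{j+1}]$. Each left edge in $E_j$ has its server to the left of the interval and each right edge has its request to the left of it, so the imbalance between servers and requests to the left of the interval equals the difference between the numbers of left and right edges in $E_j$. Hence, if for every interval $[v_j, v_{j+1}]$ the edges in $E_j$ are either all left edges or all right edges, then every $E_j$ has the minimum possible size and, as an easy consequence of the above discussion, $M$ is an optimal matching.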

  (OPT) For every interval between two consecutive points of $S \cup R$, if the edges of $M$ that contain it are either all left edges or all right edges, then $M$ is an optimal matching.
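Property (OPT) gives a simple certificate of optimality that can be tested directly. The hypothetical helper below assumes a perfect matching given as a list of `(server, request)` coordinate pairs; it inspects every gap between consecutive endpoints and reports whether the edges covering each gap all point the same way.

```python
def satisfies_opt_precondition(matching):
    """True if, for every gap between consecutive endpoints, all covering edges are
    left edges (server left of request) or all are right edges."""
    coords = sorted({p for edge in matching for p in edge})
    for a, b in zip(coords, coords[1:]):
        # Direction of each edge (s, r) that contains the gap [a, b]: True means left edge.
        directions = {s < r for s, r in matching if min(s, r) <= a and b <= max(s, r)}
        if len(directions) > 1:
            return False
    return True


print(satisfies_opt_precondition([(0, 6), (10, 11)]),
      satisfies_opt_precondition([(10, 6), (0, 11)]))
```

The sorted matching $\{(0, 6), (10, 11)\}$ passes, whereas in $\{(10, 6), (0, 11)\}$ the gap $[6, 10]$ is covered by a right edge and a left edge, so the certificate does not apply (and indeed that matching costs $15$ rather than $7$).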

3.2 Properties of 1-dimensional Matching

In this section, we define certain input instances that we refer to as well-separated input instances. We then define matchings that are well-aligned for such instances and bound the cost of such a well-aligned matching by the cost of the optimal matching. In Section 4, we divide the edges of the online matching into well-aligned matchings on well-separated instances. This will play a critical role in bounding the competitive ratio for the line metric.

Well-separated instances and well aligned matchings:

A well-separated instance of the $1$-dimensional matching problem is defined as follows. For any and , consider the following four intervals , and . Note that is the leftmost interval, is the rightmost interval, and lies in the middle of the and . is simply the union of , and . We say that any input set of servers and requests is an -well-separated input if, for some , there is a translation of such that and .

Given an -well-separated input and , consider the intervals, and . We divide the edges of any matching of and into three groups. Any edge is close if or . Any edge is a far edge if or . Let and denote the close and far edges of respectively. For a matching edge , we denote it as a medium edge if the request is inside the interval and the server is inside the interval or . We denote this set of edges as the medium edges and denote it by . From the well-separated property of the input it is easy to see that . A matching is -well-aligned if all the edges of with both their endpoints inside (resp. ) are right (resp. left) edges. See Figure 1 for an example of -well aligned matching of an -well separated input instance. Any -well-aligned matching of an -well-separated input instance satisfies the following property.

Figure 1: All servers and requests . So, is an -well separated input instance. The matching is partitioned into , and . is -well aligned since the edges of in are left edges (server is to the left of request) and those in are right edges (server is to the right of request).
Lemma 3.3.

For any , given an -well-separated input and and an -well-aligned matching , let be the optimal matching of and and let and be as defined above. Then,

Proof.

Let and , be the edges of that are in (or ) and (or ) respectively and be the remaining edges of . Let and be the servers and requests that participate in . Similarly, we define the sets and for and the sets and for . Let denote the optimal matching of and and let denote the optimal matching of with . The following four claims will establish the lemma:

  (i) , i.e., is an optimal matching of and ,

  (ii) ,

  (iii) , and,

  (iv) .

The matching is a perfect matching of servers and requests . Note that all edges of are inside the interval and . Furthermore, those that are inside the interval are left edges and the edges that are inside are right edges. Therefore, satisfies the precondition for (OPT) and so, is an optimal matching of and implying (i).

To prove (ii), observe that any edge in has the request inside the interval . Therefore, the maximum length of any such edge is at most . On the other hand, let be the match of in the optimal matching . From the well-separated property, and since , is a lower bound on the length of . Therefore, the cost of all the edges in is bounded by .

We prove (iii) as follows: Let be the optimal matching of and . Note that every request in is contained inside the interval . Since all servers are in the interval , every edge of that is incident on any vertex of has a cost of at least . Initially set to . For every edge , we remove points and and the edges of incident on them from ; note that the other end point of the edges of incident on and can be any vertex of and including points from and . After the removal of points, the vertex set of is and . Removal of the edges can create at most free vertices in with respect to . Similarly, there are at most free vertices in with respect to . We match these free nodes arbitrarily in at a cost of at most per edge. Therefore, the total cost of the matching is at most . For every , the edge of incident on has a cost of at least . Therefore, the cost of is at least . Combined, the new matching matches to and has a cost of at most , leading to (iii).