# Optimal Distributed Weighted Set Cover Approximation

Ran Ben-Basat (Technion, sran@cs.technion.ac.il), Guy Even (Tel Aviv University, guy@eng.tau.ac.il), Ken-ichi Kawarabayashi (NII, Japan, k_keniti@nii.ac.jp), Gregory Schwartzman (NII, Japan, greg@nii.ac.jp)
###### Abstract

We present a time-optimal deterministic distributed algorithm for approximating a minimum weight vertex cover in hypergraphs of rank $f$. This problem is equivalent to the Minimum Weight Set Cover Problem in which the frequency of every element is bounded by $f$. The approximation factor of our algorithm is $(f+\varepsilon)$. Let $\Delta$ denote the maximum degree in the hypergraph. Our algorithm runs in the congest model and requires $O(\log \Delta / \log\log \Delta)$ rounds, for constants $\varepsilon \in (0,1]$ and $f \in \mathbb{N}^+$. This is the first distributed algorithm for this problem whose running time does not depend on the vertex weights or the number of vertices. It thus adds another member to the exclusive family of provably optimal distributed algorithms.

For constant values of $f$ and $\varepsilon$, our algorithm improves over the $(f+\varepsilon)$-approximation algorithm of [KMW06] whose running time is $O(\log \Delta + \log W)$, where $W$ is the ratio between the largest and smallest vertex weights in the graph.

## 1 Introduction

In the Minimum Weight Vertex Cover (mwvc) problem, we are given an undirected graph $G=(V,E)$ with vertex weights $w: V \to \{1,\ldots,W\}$, for $W = n^{O(1)}$. The goal is to find a minimum weight cover $U \subseteq V$ such that $e \cap U \neq \emptyset$ for every $e \in E$. This problem is one of the classical NP-hard problems presented in [Kar72].

In this paper, we consider the Minimum Weight Hypergraph Vertex Cover (mwhvc) problem, a generalization of the mwvc problem to hypergraphs of rank $f$. In a hypergraph $G=(V,E)$, each edge is a nonempty subset of the vertices. A hypergraph is of rank $f$ if the size of every hyperedge is bounded by $f$. The mwvc problem naturally extends to mwhvc using the above definition. Note that mwhvc is equivalent to the Minimum Weight Set Cover problem with element frequencies bounded by $f$.

We consider the mwhvc problem in the distributed setting, where the communication network is a bipartite graph $H(E \cup V, \{\{e,v\} \mid v \in e\})$. We refer to the network vertices as nodes and network edges as links. The nodes of the network are the hypergraph vertices on one side and hyperedges on the other side. There is a network link between vertex $v \in V$ and hyperedge $e \in E$ iff $v \in e$. The computation is performed in synchronous rounds, where messages are sent between neighbors in the communication network. As for message size, we consider the congest model where message sizes are bounded by $O(\log |V|)$ bits. This is more restrictive than the local model where message sizes are unbounded.

Denoting by $\Delta$ the maximum vertex degree in $G$, any distributed constant-factor approximation algorithm requires $\Omega(\log \Delta / \log\log \Delta)$ rounds to terminate, even for unweighted graphs and $f=2$ [KMW16]. Two results match the lower bound. For the Minimum Weight Vertex Cover Problem in graphs ($f=2$), the lower bound was matched by [BCS17] with a $(2+\varepsilon)$-approximation algorithm (BCS algorithm) whose round complexity is optimal for a range of values of $\varepsilon$. (Recently, the range of $\varepsilon$ for which the runtime is optimal was extended [BEKS18].) The progress of the BCS algorithm is analyzed via a trade-off between reducing the degree of the vertices and reducing the weight of the vertices. We do not know how to generalize the BCS algorithm and its analysis to hypergraphs. For the Minimum Cardinality Vertex Cover in Hypergraphs Problem, the lower bound was matched by [EGM18] with an $(f+\varepsilon)$-approximation algorithm. The round complexity in [EGM18] is optimal for constant $f$ and $\varepsilon$. The algorithm in [EGM18] and its analysis are a deterministic version of the maximal independent set algorithm of [Gha16]. We do not know how to generalize the algorithm in [EGM18] and its analysis to hypergraphs with vertex weights.

In this paper, we present a deterministic distributed $(f+\varepsilon)$-approximation algorithm for minimum weight vertex cover in $f$-rank hypergraphs, which completes in $O(\log \Delta / \log\log \Delta)$ rounds in the congest model, for any constants $\varepsilon$ and $f$. Our running time is optimal according to a lower bound by [KMW16]. (The dependence of the round complexity of our algorithm on $f$ and $\varepsilon$ is given by $O\left(\frac{\log \Delta}{\log\log \Delta} + \left(\frac{f^2}{\varepsilon}\right)^{1/\gamma} \cdot \log\log \Delta\right)$, for every constant $\gamma$ as in Theorem C.1.) This is the first distributed algorithm for this problem whose round complexity does not depend on the node weights. For constant values of $f$, Åstrand et al. [ÅS10] present an $f$-approximation algorithm whose running time depends on $W$, the ratio between the largest and smallest weights in the graph. Kuhn et al. [Kuh05, KMW06] present an $(f+\varepsilon)$-approximation algorithm whose running time also depends on $W$. To the best of our knowledge, these are the only works that deal with the Minimum Weight Hypergraph Vertex Cover Problem (mwhvc) in the distributed setting.

Our algorithm is one of a handful of distributed algorithms for local problems which are provably optimal [BCGS17, BCS17, CKP16, CV86, GS17, EGM18]. Among these are the classic Cole-Vishkin algorithm [CV86] for 3-coloring a ring, the more recent results of [BCGS17] and [BCS17] for mwvc and Maximum Matching, and the very recent result of [EGM18] for Minimum Cardinality Hypergraph Vertex Cover.

### 1.1 Tools and techniques

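Our solution employs the Primal-Dual schema. The Primal-Dual approach introduces, for every hyperedge $e \in E$, a dual variable denoted by $\delta(e)$. The dual edge packing constraints are $\sum_{e \ni v} \delta(e) \le w(v)$ for every $v \in V$. If for some vertex $v$ and parameter $\beta$ it holds that $\sum_{e \ni v} \delta(e) \ge (1-\beta) \cdot w(v)$, we say that $v$ is $\beta$-tight. Let $\beta = \varepsilon/(f+\varepsilon)$. For every feasible dual solution, the weight of the set of $\beta$-tight vertices is at most $(f+\varepsilon)$ times the weight of an optimal (fractional) solution. The algorithm terminates when the set of $\beta$-tight vertices constitutes a vertex cover.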

The challenge in designing a distributed algorithm is in controlling the rate at which we increase the dual variables. On the one hand, we must grow them rapidly to reduce the number of communication rounds. On the other hand, we may not violate the edge packing constraints. The algorithm proceeds in iterations, each of which requires a constant number of communication rounds. We initialize the dual variables in a "safe" way so that feasibility is guaranteed. We refer to the additive increase of the dual variable $\delta(e)$ in iteration $i$ by $\mathrm{deal}_i(e)$. Loosely speaking, the algorithm increases the increments exponentially (multiplication by $\alpha$) provided that no vertex $v \in e$ is "tight" with respect to the deals of the previous iteration, i.e., provided that the previous deals of the uncovered edges incident to $v$ sum to at most $\beta \cdot w(v)/\alpha$. Otherwise, the increment equals the previous increment, $\mathrm{deal}_i(e) = \mathrm{deal}_{i-1}(e)$. The analysis builds on two observations: (1) The number of times that the increment is multiplied by $\alpha$ is bounded by $O(\log_\alpha \Delta)$. (2) The number of iterations in which a vertex is tight with respect to the deals of the previous iteration is at most $\alpha/\beta$. Hence the total number of iterations is bounded by $O(\log_\alpha \Delta + f \cdot \alpha/\beta)$. Setting $\alpha = \log \Delta / \log\log \Delta$ implies that, for constant $f$ and $\varepsilon$, the number of iterations is $O(\log \Delta / \log\log \Delta)$.

## 2 Problem Formulation

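Let $G=(V,E)$ denote a hypergraph. Vertices in $V$ are equipped with nonnegative weights $w(v)$. For a subset $U \subseteq V$, let $w(U) \triangleq \sum_{v \in U} w(v)$. Let $E(U)$ denote the set of hyperedges that are incident to some vertex in $U$ (i.e., $E(U) \triangleq \{e \in E \mid e \cap U \neq \emptyset\}$).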

The Minimum Weight Hypergraph Vertex Cover Problem (mwhvc) is defined as follows.

• Input: Hypergraph $G=(V,E)$ with vertex weights $w(v)$.

• Output: A subset $C \subseteq V$ such that $E(C)=E$.

• Objective: Minimize $w(C)$.

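The mwhvc Problem is equivalent to the Weighted Set Cover Problem. Consider a set system $(X, \mathcal{U})$, where $X$ denotes a set of elements and $\mathcal{U} = \{U_1, \ldots, U_m\}$ denotes a collection of subsets of $X$. The reduction from the set system $(X, \mathcal{U})$ to a hypergraph $G=(V,E)$ proceeds as follows. The set of vertices is $V \triangleq \{u_1, \ldots, u_m\}$ (one vertex $u_i$ per subset $U_i$). The set of edges is $E \triangleq \{e_x\}_{x \in X}$ (one hyperedge $e_x$ per element $x$), where $e_x \triangleq \{u_i \mid x \in U_i\}$. The weight of vertex $u_i$ equals the weight of the subset $U_i$.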
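The reduction can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `set_system_to_hypergraph` and the dictionary-based representation are assumptions made here, not part of the paper. Each subset becomes a vertex, each element becomes a hyperedge containing the subsets that cover it, and the rank of the resulting hypergraph equals the maximum element frequency.

```python
def set_system_to_hypergraph(subsets, weights):
    """Map a weighted set system to a vertex-weighted hypergraph.

    subsets: dict mapping a subset name to the set of elements it contains.
    weights: dict mapping a subset name to its weight.
    Returns (vertices, hyperedges, vertex_weights, rank), where hyperedges
    maps each element x to the set of subset names that contain x.
    """
    vertices = list(subsets)                      # one vertex per subset
    hyperedges = {}                               # one hyperedge per element
    for name, members in subsets.items():
        for x in members:
            hyperedges.setdefault(x, set()).add(name)
    # The rank is the largest hyperedge, i.e. the maximum element frequency.
    rank = max((len(e) for e in hyperedges.values()), default=0)
    return vertices, hyperedges, dict(weights), rank
```

A vertex cover of the returned hypergraph is exactly a family of subsets that covers every element, with the same total weight.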

## 3 Distributed (f+ε)-Approximation Algorithm for Mwhvc

### 3.1 Input

The input is a hypergraph $G=(V,E)$ with non-negative vertex weights $w(v)$ and an approximation ratio parameter $\varepsilon$. We denote the rank of $G$ by $f$ (i.e., each hyperedge contains at most $f$ vertices) and the maximum degree of $G$ by $\Delta$ (i.e., each vertex belongs to at most $\Delta$ edges).

##### Assumptions.

We assume that:

(i) Vertex weights are polynomial in $n \triangleq |V|$ so that sending a vertex weight requires $O(\log n)$ bits.

(ii) Vertex degrees are polynomial in $n$ (i.e., $|E(v)| = n^{O(1)}$) so that sending a vertex degree requires $O(\log n)$ bits. Since $|E(v)| \le n^{f}$, this assumption trivially holds for constant $f$.

(iii) The maximum degree is at least $3$ so that $\log\log \Delta > 0$.

### 3.2 Output

A vertex cover $C \subseteq V$. Namely, for every hyperedge $e \in E$, the intersection $e \cap C$ is not empty. The set $C$ is maintained locally in the sense that every vertex $v$ knows whether it belongs to $C$ or not.

### 3.3 Communication Network

The communication network is a bipartite graph. There are two types of nodes in the network: servers and clients. The set of servers is $V$ (the vertex set of $G$) and the set of clients is $E$ (the hyperedges in $G$). There is a link $(v,e)$ from server $v \in V$ to a client $e \in E$ if $v \in e$. We note that the degree of the clients is bounded by $f$ and the degree of the servers is bounded by $\Delta$.

### 3.4 Parameters and Variables

• The approximation factor parameter $\varepsilon$ and the rank $f$ determine the parameter $\beta$ defined by $\beta \triangleq \varepsilon/(f+\varepsilon)$.

• The parameter $\alpha$ is the factor by which “deals” are multiplied; its value is fixed in the analysis (Corollary 4.8). For simplicity, we assume that $\Delta$ is known. The assumption that the maximal degree is known to all vertices is not required. Instead, each hyperedge $e$ can compute a local maximum degree $\Delta(e)$, where $\Delta(e) \triangleq \max_{v \in e} |E(v)|$. The local maximum degree can be used instead of $\Delta$ to define a local value of the multiplier $\alpha$. See Section C in the Appendix for a setting of $\alpha$ that reduces the dependency of the running time on $f$ and $\varepsilon$.

• We denote the dual variables at the end of iteration $i$ by $\delta_i(e)$ (see Appendix A for a description of the dual edge packing linear program). The amount by which $\delta_i(e)$ is increased in iteration $i$ is denoted by $\mathrm{deal}_i(e)$. Namely, $\delta_i(e) = \sum_{j \le i} \mathrm{deal}_j(e)$.

### 3.5 Notation

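• We say that an edge $e$ is covered by $C$ if $e \cap C \neq \emptyset$.

• Let $E(v) \triangleq \{e \in E \mid v \in e\}$ denote the set of hyperedges that contain $v$.

• For every vertex $v$, the algorithm maintains a subset $E'(v) \subseteq E(v)$ that consists of the uncovered hyperedges in $E(v)$ (i.e., $E'(v) = \{e \in E(v) \mid e \cap C = \emptyset\}$).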

### 3.6 Algorithm Mwhvc

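1. Initialization. Set $C \leftarrow \emptyset$. For every vertex $v$, set $E'(v) \leftarrow E(v)$.

2. Iteration $i=0$. Every edge $e$ collects the weight $w(v)$ and degree $|E(v)|$ from every vertex $v \in e$, and sets its initial deal $\mathrm{deal}_0(e)$ from the minimum normalized weight $\min_{v \in e}\{w(v)/|E(v)|\}$. The value $\mathrm{deal}_0(e)$ is sent to every $v \in e$. The dual variable is updated: $\delta_0(e) \leftarrow \mathrm{deal}_0(e)$.

3. For $i = 1, 2, \ldots$ do:

   (a) Every vertex $v \notin C$ checks if it is $\beta$-tight. If $\sum_{e \in E(v)} \delta_{i-1}(e) \ge (1-\beta) \cdot w(v)$, then $v$ joins the cover $C$, sends a message to every $e \in E'(v)$ that $e$ is covered, and ($v$) terminates.

   (b) For every uncovered edge $e$, if $e$ receives a message that it is covered, then it tells all its vertices that $e$ is covered, and terminates.

   (c) For every vertex $v \notin C$, if it receives a message from $e$ that $e$ is covered, then $E'(v) \leftarrow E'(v) \setminus \{e\}$. If $E'(v) = \emptyset$, then $v$ terminates (without joining the cover).

   (d) For every vertex $v \notin C$, if $\sum_{e \in E'(v)} \mathrm{deal}_{i-1}(e) \le \frac{\beta}{\alpha} \cdot w(v)$, then send the message “raise” to every $e \in E'(v)$, else send the message “stuck” to every $e \in E'(v)$.

   (e) For every uncovered edge $e$: if $e$ received a “stuck” message then $\mathrm{deal}_i(e) \leftarrow \mathrm{deal}_{i-1}(e)$, else (if all incoming messages are “raise”) $\mathrm{deal}_i(e) \leftarrow \alpha \cdot \mathrm{deal}_{i-1}(e)$. Send $\mathrm{deal}_i(e)$ to every $v \in e$, who updates $\delta_i(e) \leftarrow \delta_{i-1}(e) + \mathrm{deal}_i(e)$.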

##### Termination

Every vertex $v$ terminates when either $v \in C$ or every edge $e \in E(v)$ is covered (i.e., $E'(v) = \emptyset$). Every edge $e$ terminates when it is covered (i.e., $e \cap C \neq \emptyset$).
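To make the iteration structure concrete, here is a centralized Python simulation of the steps above. It is a hedged sketch, not the authors' implementation, and it rests on several assumptions: the function name `mwhvc` and its interface are invented here; the initial deal is taken to be $\beta \cdot \min_{v \in e} w(v)/|E(v)|$, which is one choice consistent with Claim 4.1 but is not legible in the source; the raise threshold $\beta \cdot w(v)/\alpha$ is the one implied by the analysis in Section 4.2; message passing is replaced by direct access to shared state; and at least one hyperedge is assumed. Exact rational arithmetic avoids rounding issues in the tightness test.

```python
from fractions import Fraction


def mwhvc(hyperedges, weights, eps, alpha=2):
    """Centralized simulation; returns (cover, duals, number_of_iterations)."""
    edges = [frozenset(e) for e in hyperedges]
    f = max(len(e) for e in edges)                    # rank of the hypergraph
    eps = Fraction(eps)
    beta = eps / (f + eps)
    w = {v: Fraction(x) for v, x in weights.items()}
    incident = {v: [] for v in w}                     # E(v), as edge indices
    for j, e in enumerate(edges):
        for v in e:
            incident[v].append(j)
    # ASSUMPTION (not legible in the source): deal_0(e) = beta * min w(v)/|E(v)|.
    deal = [beta * min(w[v] / len(incident[v]) for v in e) for e in edges]
    delta = list(deal)                                # delta_0(e) = deal_0(e)
    cover, uncovered, iterations = set(), set(range(len(edges))), 0
    while uncovered:
        iterations += 1
        live = {v for j in uncovered for v in edges[j]}   # vertices with E'(v) nonempty
        # Step (a): beta-tight vertices join the cover.
        tight = {v for v in live
                 if sum(delta[j] for j in incident[v]) >= (1 - beta) * w[v]}
        cover |= tight
        # Steps (b), (c): newly covered edges terminate and leave every E'(v).
        uncovered = {j for j in uncovered if not (edges[j] & tight)}
        # Step (d): a vertex is "stuck" if its previous deals are too large.
        stuck = {v for j in uncovered for v in edges[j]
                 if sum(deal[k] for k in incident[v] if k in uncovered)
                 > beta * w[v] / alpha}
        # Step (e): raise the deal only if no endpoint is stuck, then pay it.
        for j in uncovered:
            if not (edges[j] & stuck):
                deal[j] *= alpha
            delta[j] += deal[j]
    return cover, delta, iterations
```

For example, `mwhvc([{"a", "b", "c"}, {"c", "d"}], {"a": 4, "b": 4, "c": 1, "d": 4}, eps=1)` returns the cover `{"c"}`. The duals of covered edges keep their last assigned value, matching the convention used in the analysis.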

##### Execution in CONGEST.

See Section B in the Appendix for a discussion of how Algorithm mwhvc is executed in the congest model.

## 4 Algorithm Analysis

### 4.1 Approximation Ratio

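The following claim states that, in each iteration, the sum of the deals of edges incident to a vertex is bounded by $\beta \cdot w(v)$.

###### Claim 4.1.

If $v \notin C$ at the end of iteration $i$, then $\sum_{e \in E'(v)} \mathrm{deal}_i(e) \le \beta \cdot w(v)$.

###### Proof.

The proof is by induction on $i$. The induction basis, for $i=0$, holds because $\mathrm{deal}_0(e) \le \beta \cdot w(v)/|E(v)|$ for every edge $e \in E(v)$. The induction step, for $i \ge 1$, considers two cases. If $\mathrm{deal}_i(e) = \mathrm{deal}_{i-1}(e)$ for every $e \in E'(v)$, then the induction step follows from the induction hypothesis. If there exists an edge $e \in E'(v)$ such that $\mathrm{deal}_i(e) = \alpha \cdot \mathrm{deal}_{i-1}(e)$, then $v$ sent “raise” in Step 3d, which implies that $\sum_{e \in E'(v)} \mathrm{deal}_i(e) \le \alpha \cdot \sum_{e \in E'(v)} \mathrm{deal}_{i-1}(e) \le \beta \cdot w(v)$, as required. ∎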

If an edge $e$ is covered in iteration $j$, then $e$ terminates and $\delta_i(e)$ is not set for $i \ge j$. In this case, we define $\delta_i(e) = \delta_{j-1}(e)$, namely, the last value assigned to a dual variable.

###### Claim 4.2.

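For every $i \ge 0$ the dual variables $\delta_i(e)$ constitute a feasible edge packing. Namely,

$$\sum_{e \in E(v)} \delta_i(e) \le w(v) \quad \text{for every vertex } v \in V, \qquad \delta_i(e) \ge 0 \quad \text{for every edge } e \in E.$$

###### Proof.

Nonnegativity follows from the initialization and the positive increases by deals. The packing constraints are proved by induction on the number of iterations. The induction basis, for $i=0$, holds because $\sum_{e \in E(v)} \delta_0(e) = \sum_{e \in E(v)} \mathrm{deal}_0(e) \le \beta \cdot w(v) \le w(v)$. (Recall that $\beta < 1$.) The induction step is proved as follows. By Step 3e, if $e$ is covered, then $\delta_i(e) = \delta_{i-1}(e)$, otherwise $\delta_i(e) = \delta_{i-1}(e) + \mathrm{deal}_i(e)$. Hence only vertices $v \notin C$ see an increase. For such a vertex, by Step 3a, $\sum_{e \in E(v)} \delta_{i-1}(e) < (1-\beta) \cdot w(v)$. By Claim 4.1, $\sum_{e \in E'(v)} \mathrm{deal}_i(e) \le \beta \cdot w(v)$, and the claim follows. ∎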

Let opt denote the cost of an optimal (fractional) weighted vertex cover of $G$.

###### Corollary 4.3.

Upon termination, the approximation ratio of Algorithm mwhvc is $f+\varepsilon$.

###### Proof.

Throughout the algorithm, the set $C$ consists of $\beta$-tight vertices. By Claim A.1, $w(C) \le (f+\varepsilon) \cdot \mathrm{opt}$. Upon termination, $C$ constitutes a vertex cover, and the corollary follows. ∎

### 4.2 Communication Rounds Analysis

In this section, we prove that the number of communication rounds of Algorithm mwhvc is bounded by $O(\log_\alpha \Delta + f \cdot \alpha/\beta)$. It suffices to bound the number of iterations because each iteration consists of a constant number of communication rounds.

#### 4.2.1 Raise or Stuck Iterations

###### Definition 4.4.

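An iteration $i \ge 1$ is an $e$-raise iteration if $\mathrm{deal}_i(e) = \alpha \cdot \mathrm{deal}_{i-1}(e)$. An iteration $i$ is a $v$-stuck iteration if $v$ sent the message “stuck” in iteration $i$.

Note that if iteration $i$ is a $v$-stuck iteration and $e \in E'(v)$, then $\mathrm{deal}_i(e) = \mathrm{deal}_{i-1}(e)$ and $i$ is not an $e$-raise iteration.

We bound the number of $e$-raise iterations as follows.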

###### Lemma 4.5.

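The number of $e$-raise iterations is bounded by $O(\log_\alpha \Delta)$.

###### Proof.

Let $v_e$ denote a vertex with minimum normalized weight in $e$. The first deal $\mathrm{deal}_0(e)$ is set from $w(v_e)/|E(v_e)| \ge w(v_e)/\Delta$. By Claim 4.1, $\mathrm{deal}_i(e) \le \beta \cdot w(v_e)$ as long as $e$ is uncovered. Since the deal is multiplied by $\alpha$ in each $e$-raise iteration, the lemma follows. ∎

We bound the number of $v$-stuck iterations as follows.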

###### Lemma 4.6.

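The number of $v$-stuck iterations is bounded by $\alpha/\beta$.

###### Proof.

Suppose that iteration $i$ (for $i \ge 1$) is a $v$-stuck iteration. This implies that $\sum_{e \in E'(v)} \mathrm{deal}_{i-1}(e) > \frac{\beta}{\alpha} \cdot w(v)$. Deals never decrease, thus $\sum_{e \in E'(v)} \mathrm{deal}_i(e) > \frac{\beta}{\alpha} \cdot w(v)$, so the sum of the dual variables of the edges in $E(v)$ grows by more than $\frac{\beta}{\alpha} \cdot w(v)$ in iteration $i$. Had there been more than $\alpha/\beta$ iterations that are $v$-stuck, then the sum $\sum_{e \in E(v)} \delta_i(e)$ would be larger than $w(v)$, contradicting Claim 4.2. ∎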

#### 4.2.2 Putting it Together

###### Theorem 4.7.

Fix some value of the multiplier $\alpha$. The number of iterations of Algorithm mwhvc is

$$O\left(\log_\alpha \Delta + f \cdot \frac{\alpha}{\beta}\right).$$

###### Proof.

Fix an edge $e$. We bound the number of iterations until $e$ is covered as follows. Every iteration is either an $e$-raise iteration or a $v$-stuck iteration for some $v \in e$. Since $e$ contains at most $f$ vertices, we conclude that the number of iterations is bounded by the number of $e$-raise iterations plus the sum over $v \in e$ of the number of $v$-stuck iterations. The theorem follows from Lemmas 4.5 and 4.6. ∎

Finally, by setting $\alpha$ appropriately, we bound the running time as follows.

###### Corollary 4.8.

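If $\alpha = \log \Delta / \log\log \Delta$, then, for constant $f$ and $\varepsilon$, the round complexity of Algorithm mwhvc is $O(\log \Delta / \log\log \Delta)$.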

A refined assignment of $\alpha$ that leads to a reduced dependency of the running time on $f$ and $\varepsilon$ is presented in Section C in the Appendix.

## References

• [ÅS10] Matti Åstrand and Jukka Suomela. Fast distributed approximation algorithms for vertex cover and set cover in anonymous networks. In SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, Thira, Santorini, Greece, June 13-15, 2010, pages 294–302, 2010.
• [BCGS17] Reuven Bar-Yehuda, Keren Censor-Hillel, Mohsen Ghaffari, and Gregory Schwartzman. Distributed approximation of maximum independent set and maximum matching. In PODC, pages 165–174. ACM, 2017.
• [BCS17] Reuven Bar-Yehuda, Keren Censor-Hillel, and Gregory Schwartzman. A distributed $(2+\varepsilon)$-approximation for vertex cover in $O(\log \Delta / \varepsilon \log\log \Delta)$ rounds. J. ACM, 64(3):23:1–23:11, 2017.
• [BEKS18] R. Ben-Basat, G. Even, K.-i. Kawarabayashi, and G. Schwartzman. A Deterministic Distributed 2-Approximation for Weighted Vertex Cover in $O(\log n \log \Delta / \log^2 \log \Delta)$ Rounds. In SIROCCO, 2018.
• [CKP16] Yi-Jun Chang, Tsvi Kopelowitz, and Seth Pettie. An exponential separation between randomized and deterministic complexity in the LOCAL model. In FOCS, pages 615–624. IEEE Computer Society, 2016.
• [CV86] Richard Cole and Uzi Vishkin. Deterministic coin tossing with applications to optimal parallel list ranking. Information and Control, 70(1):32–53, 1986.
• [EGM18] Guy Even, Mohsen Ghaffari, and Moti Medina. Distributed Set Cover Approximation: Primal-Dual with Optimal Locality. In DISC, 2018.
• [Gha16] Mohsen Ghaffari. An improved distributed algorithm for maximal independent set. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pages 270–277. Society for Industrial and Applied Mathematics, 2016.
• [GS17] Mohsen Ghaffari and Hsin-Hao Su. Distributed degree splitting, edge coloring, and orientations. In SODA, pages 2505–2523. SIAM, 2017.
• [Kar72] Richard M. Karp. Reducibility among combinatorial problems. In Proceedings of a symposium on the Complexity of Computer Computations, held March 20-22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York., pages 85–103, 1972.
• [KMW06] Fabian Kuhn, Thomas Moscibroda, and Roger Wattenhofer. The price of being near-sighted. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pages 980–989, 2006.
• [KMW16] Fabian Kuhn, Thomas Moscibroda, and Roger Wattenhofer. Local computation: Lower and upper bounds. J. ACM, 63(2):17:1–17:44, 2016.
• [Kuh05] Fabian Kuhn. The price of locality: exploring the complexity of distributed coordination primitives. PhD thesis, ETH Zurich, 2005.

## Appendix A Primal-Dual Approach

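The fractional LP relaxation of mwhvc is defined as follows.

$$\begin{aligned} \text{minimize:} \quad & \sum_{v \in V} w(v) \cdot x(v) \\ \text{subject to:} \quad & \sum_{v \in e} x(v) \ge 1, \quad \forall e \in E \\ & x(v) \ge 0, \quad \forall v \in V \end{aligned} \tag{P}$$

The dual LP is an Edge Packing problem defined as follows:

$$\begin{aligned} \text{maximize:} \quad & \sum_{e \in E} \delta(e) \\ \text{subject to:} \quad & \sum_{e \ni v} \delta(e) \le w(v), \quad \forall v \in V \\ & \delta(e) \ge 0, \quad \forall e \in E \end{aligned} \tag{D}$$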

The following claim is used for proving the approximation ratio of the mwhvc algorithm.

###### Claim A.1.

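Let opt denote the value of an optimal fractional solution of the primal LP (P). Let $\{\delta(e)\}_{e \in E}$ denote a feasible solution of the dual LP (D). Let $\varepsilon > 0$ and $\beta \triangleq \varepsilon/(f+\varepsilon)$. Define the $\beta$-tight vertices by:

$$T_\varepsilon \triangleq \left\{ v \in V \;\middle|\; \sum_{e \ni v} \delta(e) \ge (1-\beta) \cdot w(v) \right\}.$$

Then $w(T_\varepsilon) \le (f+\varepsilon) \cdot \mathrm{opt}$.

###### Proof.

$$w(T_\varepsilon) = \sum_{v \in T_\varepsilon} w(v) \le \frac{1}{1-\beta} \cdot \left( \sum_{v \in T_\varepsilon} \sum_{e \ni v} \delta(e) \right) \le \frac{f}{1-\beta} \sum_{e \in E} \delta(e) \le (f+\varepsilon) \cdot \mathrm{opt}.$$

The second inequality holds because every hyperedge contains at most $f$ vertices. The last transition follows from $f/(1-\beta) = f+\varepsilon$ and by weak duality. The claim follows. ∎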
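The identity behind the last transition can be checked directly. The short derivation below is an illustrative worked step, taking $\beta = \varepsilon/(f+\varepsilon)$, the choice under which the bound $f/(1-\beta)$ equals the target approximation factor; the numeric instance ($f=2$, $\varepsilon=1/2$) is chosen here only as an example.

```latex
\begin{align*}
1-\beta &= 1-\frac{\varepsilon}{f+\varepsilon} = \frac{f}{f+\varepsilon},
&
\frac{f}{1-\beta} &= f\cdot\frac{f+\varepsilon}{f} = f+\varepsilon. \\[4pt]
\text{For } f=2,\ \varepsilon=\tfrac12:\quad
\beta &= \frac{1/2}{5/2} = \frac15,
&
\frac{f}{1-\beta} &= \frac{2}{4/5} = \frac52 = f+\varepsilon.
\end{align*}
```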

## Appendix B Adaptation to the CONGEST model

To complete the discussion, we need to show that the message lengths in Algorithm mwhvc are $O(\log n)$.

1. In round $0$, every vertex $v$ sends its weight $w(v)$ and degree $|E(v)|$ to every hyperedge in $E(v)$. We assume that the weights and degrees are polynomial in $n$, hence the length of the binary representations of $w(v)$ and $|E(v)|$ is $O(\log n)$.

Every hyperedge $e$ sends back to every $v \in e$ the pair $(w(v_e), |E(v_e)|)$, where $v_e$ has the smallest normalized weight, i.e., $v_e \triangleq \arg\min_{v \in e}\{w(v)/|E(v)|\}$.

Every vertex $v$ locally computes $\mathrm{deal}_0(e)$ and $\delta_0(e)$ for every $e \in E(v)$.

2. In round $i \ge 1$, the following types of messages are sent: “$e$ is covered”, “raise”, or “stuck”. These messages require only a constant number of bits. The decision whether $\mathrm{deal}_i(e) = \alpha \cdot \mathrm{deal}_{i-1}(e)$ or $\mathrm{deal}_i(e) = \mathrm{deal}_{i-1}(e)$ requires a single bit.

3. Finally, if $\alpha$ is set locally based on the local maximum degree $\Delta(e)$, then every vertex $v$ sends its degree to all the edges $e \in E(v)$. The local maximum degree for $e$ is sent to every vertex $v \in e$, and this parameter is used to compute $\alpha$ locally.

## Appendix C Improved Running Time

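In this section, we present a modified definition of the multiplier $\alpha$ that leads to an improved dependence of the running time on $f$ and $\varepsilon$.

Let $\gamma \in (0,1)$ denote a constant. Set the multiplier $\alpha$ as follows:

$$\alpha \triangleq \begin{cases} \left(\dfrac{\log \Delta}{\log\log \Delta}\right)^{1-\gamma} & \text{if } \dfrac{f}{\beta} < \left(\dfrac{\log \Delta}{\log\log \Delta}\right)^{\gamma}, \\[6pt] 2 & \text{otherwise.} \end{cases} \tag{1}$$

Note that in the following, the round complexity is monotonically nonincreasing in $\gamma$, so it may be chosen arbitrarily close to $1$.

###### Theorem C.1.

For every constant $\gamma \in (0,1)$, by setting $\alpha$ according to Eq. 1, the round complexity of Algorithm mwhvc is bounded by

$$O\left(\frac{\log \Delta}{\log\log \Delta} + \left(\frac{f^2}{\varepsilon}\right)^{1/\gamma} \cdot \log\log \Delta\right). \tag{2}$$

###### Proof.

By Theorem 4.7, the number of iterations is bounded by $O(\log_\alpha \Delta + f \cdot \alpha/\beta)$. We consider two cases.

1. Suppose that $\frac{f}{\beta} < \left(\frac{\log \Delta}{\log\log \Delta}\right)^{\gamma}$. In this case, $\alpha = \left(\frac{\log \Delta}{\log\log \Delta}\right)^{1-\gamma}$. The terms in the bound on the number of iterations satisfy:

$$\log_\alpha \Delta = O\left(\frac{\log \Delta}{\log\log \Delta}\right), \qquad f \cdot \frac{\alpha}{\beta} \le \frac{\log \Delta}{\log\log \Delta}.$$

2. Suppose that $\frac{f}{\beta} \ge \left(\frac{\log \Delta}{\log\log \Delta}\right)^{\gamma}$. In this case $\alpha = 2$, and hence

$$\log_\alpha \Delta \le \left(\frac{f}{\beta}\right)^{1/\gamma} \cdot \log\log \Delta, \qquad f \cdot \frac{\alpha}{\beta} \le O\left(\left(\frac{f}{\beta}\right)^{1/\gamma}\right).$$

In both cases, the number of iterations is bounded by the expression in Eq. 2, because $f/\beta = f(f+\varepsilon)/\varepsilon = O(f^2/\varepsilon)$ whenever $\varepsilon \le f$. The theorem follows. ∎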
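The case distinction in Eq. 1 is easy to evaluate numerically. The following Python sketch is illustrative only: the function names `choose_alpha` and `iteration_bound` are invented here, logarithms are assumed to be base 2 (the source does not state the base), and the maximum degree is assumed to be at least 3 so that $\log\log \Delta > 0$. The second function evaluates the expression inside the $O(\cdot)$ of Theorem 4.7 without any hidden constant.

```python
import math


def choose_alpha(max_degree, f, eps, gamma):
    """Multiplier alpha following the two cases of Eq. 1 (base-2 logs assumed)."""
    beta = eps / (f + eps)
    log_d = math.log2(max_degree)
    ratio = log_d / math.log2(log_d)          # log(Delta) / loglog(Delta)
    if f / beta < ratio ** gamma:
        return ratio ** (1.0 - gamma)         # first case of Eq. 1
    return 2.0                                # second case of Eq. 1


def iteration_bound(max_degree, f, eps, alpha):
    """Value of log_alpha(Delta) + f * alpha / beta (no hidden constants)."""
    beta = eps / (f + eps)
    return math.log2(max_degree) / math.log2(alpha) + f * alpha / beta
```

With $f=2$, $\varepsilon=1$ and $\gamma=1/2$, a small degree such as $\Delta=16$ falls into the second case ($\alpha=2$), while the first case only applies once $\log \Delta / \log\log \Delta$ exceeds $(f/\beta)^{1/\gamma} = 36$, for instance at $\Delta = 2^{512}$.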
