Partial resampling to approximate covering integer programs

A preliminary version of this paper appeared in the proceedings of the Symposium on Discrete Algorithms (SODA) 2016.

Antares Chen
University of California, Berkeley, CA 94720. This work was done while a student at Montgomery Blair High School, Silver Spring, MD 20901. Email: antaresc@berkeley.edu.
   David G. Harris
Department of Applied Mathematics, University of Maryland, College Park, MD 20742. Research supported in part by NSF Awards CNS-1010789 and CCF-1422569. Email: davidgharris29@hotmail.com
   Aravind Srinivasan
Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742. Research supported in part by NSF Awards CNS-1010789 and CCF-1422569, and by a research award from Adobe, Inc. Email: srin@cs.umd.edu
Abstract

We consider column-sparse positive covering integer programs, which generalize set cover and which have attracted a long line of research developing (randomized) approximation algorithms. We develop a new rounding scheme based on the Partial Resampling variant of the Lovász Local Lemma developed by Harris & Srinivasan (2013). This achieves an approximation ratio of , where is the minimum covering constraint and is the maximum -norm of any column of the covering matrix (whose entries are scaled to lie in ). When there are additional constraints on the sizes of the variables, we show an approximation ratio of to satisfy these size constraints up to multiplicative factor , or an approximation ratio of to satisfy the size constraints exactly (where is the maximum number of non-zero entries in any column of the covering matrix). We also show nearly-matching inapproximability and integrality-gap lower bounds. These results improve asymptotically, in several different ways, over results shown by Srinivasan (2006) and Kolliopoulos & Young (2005).

We show also that our algorithm automatically handles multi-criteria programs, efficiently achieving approximation ratios which are essentially equivalent to the single-criterion case and which apply even when the number of criteria is large.

1 Introduction

We consider positive covering integer programs – or simply covering integer programs (CIPs) – defined as follows (with denoting the set of non-negative integers). We have solution variables , and for , a system of covering constraints of the form:

Here is an -long non-negative vector; by scaling, we can assume that and . We can write this more compactly as . We may optionally have constraints on the size of the solution variables, namely, that we require ; these are referred to as the multiplicity constraints. Finally, we have some linear objective function, represented by a vector . Our goal is to minimize , subject to the multiplicity and covering constraints.
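Schematically, and with placeholder symbols of our choosing ($C$ for the objective vector, $A$ for the constraint matrix, $a$ for the right-hand sides, $d$ for the multiplicity bounds) standing in for the paper's notation, the CIP is
\[
\begin{aligned}
\text{minimize} \quad & \textstyle\sum_{j=1}^{n} C_j x_j \\
\text{subject to} \quad & \textstyle\sum_{j=1}^{n} A_{kj} x_j \ge a_k, \qquad k = 1, \dots, m, \\
& x_j \in \{0, 1, \dots, d_j\}, \qquad j = 1, \dots, n,
\end{aligned}
\]
where, after the scaling just described, each entry $A_{kj}$ lies in $[0,1]$.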

This generalizes the set-cover problem, which can be viewed as a special case in which . Solving set cover or integer programs exactly is NP-hard [12], so a common strategy is to obtain a solution which is approximately optimal. There are at least three ways one may obtain an approximate solution, where OPT denotes the optimal solution-value for the given instance:

  1. the solution may violate the optimality constraint, that is, ;

  2. may violate the multiplicity constraint: i.e., for some ;

  3. may violate the covering constraints: i.e., for some .

These three criteria are in competition. For our purposes, we will demand that our solution completely satisfies the covering constraints. We will seek to satisfy the multiplicity constraints and optimality constraint as closely as possible. Our emphasis will be on the optimality constraint; that is, we seek to ensure that

where is “small”. The parameter , in this context, is referred to as the approximation ratio. More precisely, we will derive a randomized algorithm with the goal of satisfying , where the expectation is taken over our algorithm’s randomness.

Many approximation algorithms for set cover and its extensions give approximation ratios as a function of , the total number of constraints: e.g., it is known that the greedy algorithm has approximation ratio [18]. We often prefer a scale-free approximation ratio, that does not depend on the problem size but only on its structural properties. Two cases that are of particular interest are when the matrix is row-sparse (a bounded number of variables per constraint) or column-sparse (each variable appears in a bounded number of constraints). We will be concerned solely with the column-sparse setting in this paper. The row-sparse setting, which generalizes problems such as vertex cover, typically leads to very different types of algorithms than the column-sparse setting.

Two common parameters used to measure the column sparsity of such systems are the maximum and norms of the columns; that is,

Since the entries of are in , we have ; it is also possible that .
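With placeholder names $\Delta_0$ and $\Delta_1$ for these two column norms (and $A$ for the constraint matrix, as above), the definitions are
\[
\Delta_0 = \max_{j} \bigl|\{\, k : A_{kj} > 0 \,\}\bigr|,
\qquad
\Delta_1 = \max_{j} \sum_{k=1}^{m} A_{kj},
\]
i.e., the maximum number of non-zero entries in any column and the maximum $\ell_1$-norm of any column, respectively.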

Approximation algorithms for column-sparse CIPs typically yield approximation ratios which are a function of or , and possibly other problem parameters as well. These algorithms fall into two main classes. First, there are greedy algorithms: they start by setting , and then repeatedly increment where is chosen myopically as whichever option “looks best” for the residual problem. These were first developed by [3] for set cover, and later analysis (see [7]) showed that they give essentially optimal approximation ratios for set cover. These were extended to CIPs in [8] and [5], showing an approximation ratio of . These greedy algorithms are often powerful, but they are somewhat rigid. In addition, the greedy algorithms do not yield “oblivious” approximation ratios — that is, the greedy algorithm can only operate with knowledge of the objective function.

An alternative, and often more flexible, class of approximation algorithms is based on linear relaxation. There are a number of possible linear relaxations, but the simplest is one which we refer to as the basic LP. This LP has the same covering constraints as the original CIP, but replaces the constraint with the weaker constraint . The set of feasible points to the basic LP is a polytope, and one can find its exact optimal fractional solution . As this is a relaxation, we have . It thus suffices to turn the solution into a random integral solution satisfying . We will also see some stronger LP formulations, such as the Knapsack-Cover (KC) inequalities.
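In the placeholder notation used above, with $y$ denoting the fractional solution, the basic LP is
\[
\text{minimize } \ \sum_{j} C_j y_j
\quad \text{subject to} \quad
\sum_{j} A_{kj} y_j \ge a_k \ \ (k = 1, \dots, m),
\qquad
0 \le y_j \le d_j \ \ (j = 1, \dots, n),
\]
where $y$ is real-valued (with the upper bounds $y_j \le d_j$ included when multiplicity constraints are imposed); its optimum can be computed in polynomial time by any LP solver.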

Randomized rounding is often employed to transform solutions to the basic LP back to a feasible integral solution. The simplest scheme, first applied to this context by [17], is to simply draw as independent , for some . When this is used, simple analysis using Chernoff bounds shows that simultaneously for all when , where is some absolute constant. Thus, the overall solution is within a factor of from the optimum, where . One noteworthy aspect here is that this randomized rounding does not depend on the specific objective function; in this sense, it is “oblivious”, yielding a good expected value for any objective function.
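As a minimal sketch of this standard independent-rounding scheme (the function name, the scaling parameter alpha, and the use of NumPy are our choices, not the paper's):

import numpy as np

def independent_rounding(A, a, y, alpha, rng=np.random.default_rng()):
    # Draw each x_j independently as Bernoulli(min(1, alpha * y_j)).
    p = np.minimum(1.0, alpha * np.asarray(y, dtype=float))
    x = (rng.random(len(p)) < p).astype(int)
    # With a suitable (Chernoff-style) choice of alpha, all covering
    # constraints A x >= a hold simultaneously with high probability.
    return x, bool(np.all(A @ x >= a))

If the returned flag is False one can simply redraw; the schemes discussed below aim to avoid the dependence on the total number of constraints that this simple analysis incurs.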

In [19], Srinivasan gave a scale-free method of randomized rounding (ignoring multiplicity constraints), based on the FKG inequality and some proof ideas behind the Lovász Local Lemma (LLL). This gave an approximation ratio of . The rounding scheme, by itself, only gave a positive (exponentially small) probability of achieving the desired approximation ratio. The algorithm of [19] also included a polynomial-time derandomization using the method of conditional expectations; this derandomization, however, requires knowledge of the objective function.

The algorithm of Srinivasan can potentially cause a large violation in the multiplicity constraints. In [13], Kolliopoulos & Young considered how to modify the algorithm of [19] to respect the multiplicity constraints. They gave two algorithms, which offer different types of approximation ratios. The first algorithm takes parameter , violates each multiplicity constraint “” to at most “”, and has approximation ratio of . (We refer to this situation as -respect multiplicity.) The second algorithm meets the multiplicity constraints exactly and achieves approximation ratio .

1.1 Our contributions

In this paper, we give a new randomized rounding scheme, based on the partial resampling variant of the LLL developed in [10] and some proof ideas developed in [9] for the LLL applied to systems of correlated constraints. We show the following result:

Theorem 1.1.

Suppose we have fractional solution for the basic LP. Let . Then our randomized algorithm yields a solution satisfying the covering constraints with probability one, and with

The expected running time of this rounding algorithm is .

This automatically implies that for . Our algorithm has several advantages over previous techniques.

  1. We give approximation ratios in terms of , the maximum -norm of the columns of . Such bounds are always stronger than those phrased in terms of the corresponding -norm.

  2. When is small, our approximation ratio is asymptotically smaller than that of [19]. In particular, we avoid the term in our approximation ratio.

  3. When is large, then our approximation ratio is roughly ; this is asymptotically optimal (including having the correct coefficient), and improves on [19].

  4. This algorithm is quite efficient, essentially as fast as reading in the matrix .

  5. The algorithm is oblivious to the objective function — although it achieves a good approximation factor for any objective , the algorithm itself does not use in any way.

We find it interesting that one can “boil down” the parameters into a single parameter , which seems to completely determine the behavior of our algorithm.

Our partial resampling algorithm in its simplest form could significantly violate the multiplicity constraints. By choosing slightly different parameters for our algorithm, we can ensure that the multiplicity constraints are nearly satisfied, at the cost of a worsened approximation ratio:

Theorem 1.2.

Suppose we have a fractional solution for the basic LP. Let . For any given , our algorithm yields a solution satisfying the covering constraints with probability one, and with

This is an asymptotic improvement over the approximation ratio of [13], in three different ways:

  1. It depends on the -norm of the columns, not the norm;

  2. When is large, it is smaller by a full factor of ;

  3. When is small, it gives an approximation ratio which approaches , at a rate independent of .

The two previous approximation ratios are both given in terms of the basic LP. We also give an approximation algorithm based on the KC inequalities, which is a stronger linear relaxation than the basic LP. This gives a different type of asymptotic guarantee, which is phrased in terms of the optimal integral solution (not the optimal basic LP solution):

Theorem 1.3.

There is a randomized algorithm running in expected polynomial time, yielding a solution which satisfies the covering constraints, multiplicity constraints, and has

This improves over the corresponding approximation ratio of [13], in that it achieves the optimal leading coefficient of .

There are many ways of parametrizing CIPs; we have chosen to focus on the parameters such as the minimum RHS value , the maximum -column norm , and most importantly the ratio . Our approximation ratios are functions of these parameters; we show a number of matching lower bounds, which demonstrate that one cannot obtain significantly improved approximation ratios which are parametrized as functions of these same parameters. The formal statements of these results contain numerous qualifiers and technical conditions, but we summarize these here.

  1. When is large, then assuming the Exponential Time Hypothesis, any polynomial-time algorithm to solve the CIP (ignoring multiplicity constraints), whose approximation ratio is parametrized as a function of , must have approximation ratio .

  2. When is large, then the integrality gap between the basic LP and integral solutions which -respect multiplicity, is of order .

  3. When is small, then the integrality gap of the basic LP is .

In this sense, our approximation algorithms are nearly optimal as functions of the parameters . On the other hand, there are many alternate parameters which could be analyzed instead, and alternate approximation ratio guarantees given (which would be incomparable to ours).

Finally, we give an extension to covering programs with multiple linear criteria. Specifically, we show that even conditional on our solution satisfying all the covering constraints, not only do we have but that in fact the values of are concentrated, roughly equivalent to the being independently distributed as Bernoulli with probability . Thus, for each there is a very high probability that we have and in particular there is a good probability that we have simultaneously for all .

Theorem 1.4 (Informal).

Suppose we are given a covering system with a fractional solution and with objective functions , whose entries are in and such that for all . Let . Then our solution satisfies the covering constraints with probability one; with probability at least ,

where . (A similar result is possible, if we also want to ensure that ; then the approximation ratio is .)

This significantly improves on [19], in terms of both the approximation ratio as well as the running time. Roughly speaking, the algorithm of [19] gave an approximation ratio of (worse than the approximation ratio in the single-criterion setting) and a running time of (polynomial time only when , the number of objective functions, is constant).

1.2 Outline

In Section 2, we develop a randomized rounding algorithm when the fractional solution satisfies ; here is a key parameter which we will discuss how to select in later sections. This randomized rounding produces a binary solution vector , for which .

In Section 3, we will develop a deterministic quantization scheme to handle fractional solutions of arbitrary size, using the algorithm of Section 2 as a subroutine. We will show an upper bound on the sizes of the variables in terms of the fractional . We will also show an upper bound on , which we state in a generalized form without making reference to column-sparsity or other properties of the matrix .

In Section 4, we consider the case in which we have a lower bound on the RHS constraint vectors , as well as an upper bound on the -norm of the columns of . Based on these values, we set key parameters of the rounding algorithm, to obtain good approximation ratios as a function of . These approximation ratios do not respect multiplicity constraints.

In Section 5, we extend these results to take into account the multiplicity constraints as well. We give two types of approximation algorithms: in the first, we -respect the multiplicity constraints. In the second, we respect the multiplicity constraints exactly.

In Section 6, we construct a variety of lower bounds on achievable approximation ratios. These are based on integrality gaps as well as hardness results. These show that the approximation ratios developed in Section 4 are essentially optimal for most values of and , particularly when .

In Section 7, we show that our randomized rounding scheme obeys a negative correlation property, allowing us to show concentration bounds on the sizes of the objective functions . This significantly improves on the algorithm of [19]; we show asymptotically better approximation ratios in many regimes, and we also give a polynomial-time algorithm regardless of the number of objective functions.

1.3 Comparison with the Lovász Local Lemma

One type of rounding scheme that has been used for similar types of integer programs is based on the LLL; we contrast this with our approach taken here.

The LLL, first introduced in [6], is often used to show that a rare combinatorial structure can be randomly sampled from a probability space. In the basic form of randomized rounding, one must ensure that the probability of a “bad-event” (an undesirable configuration of a subset of the variables) — namely, that — is on the order of ; this ensures that, with high probability, no bad events occur. This accounts for the term in the approximation ratio. The power of the LLL comes from the fact that the probability of a bad-event is not compared with the total number of events, but only with the number of events it affects. Thus, one may hope to show approximation ratios which are independent of .

At a heuristic level, the LLL should be applicable to the CIP problem. We have a series of bad-events, one for each covering constraint. Furthermore, because of our assumption that the system is column-sparse, each variable only affects a limited number of these bad-events. Thus, it should be possible to use the LLL to obtain a scale-free approximation ratio.

There has been prior work applying the LLL to packing integer programs, such as [14]. One technical problem with the LLL is that it only depends on whether bad-events affect each other, not the degree to which they do so. Bad-events which are only slightly correlated are still considered as dependent by the LLL. Thus, a weakness of the LLL for integer programs with arbitrary coefficients (i.e. allowing ), is that potentially all the entries of could be extremely small yet non-zero, causing every constraint to affect each other by a tiny amount. For this reason, typical applications of the LLL to column-sparse integer programs have been phrased in terms of the column norm . For packing problems with no constraint-violation allowed, good approximations parametrized by , but not in general by , are possible [1].

In [11], Harvey addressed this technical problem by applying a careful, multi-step quantization scheme  with iterated applications of the LLL, to discrepancy problems with coefficient matrices where the norm of each column and each row is “small”.

The LLL, in its classical form, only shows that there is a small probability of avoiding all the bad-events. Thus, it does not lead to efficient algorithms. In [15], Moser & Tardos solved this longstanding problem by introducing a resampling-based algorithm. This algorithm initially samples all random variables from the underlying probability space, and will continue resampling subsets of variables until no more bad-events occur. Most applications of the LLL, such as [11], would yield polynomial-time algorithms using this framework.

In the context of integer programming, the Moser-Tardos algorithm can be extended in ways which go beyond the LLL itself. In [10], Harris & Srinivasan described a variant of the Moser-Tardos algorithm based on “partial resampling”. In this scheme, when one encounters a bad-event, one only resamples a random subset of the variables (where the probability distribution on which variables to resample is carefully chosen). This was applied for “assignment-packing” integer programs with small constraint violation. These bounds, like those of [11], depend on .

It is possible to formulate the CIP problem in the LLL framework, and to view our algorithm as a variant of the Moser-Tardos algorithm. This would achieve qualitatively similar bounds, albeit with asymptotics which are noticeably worse than the ones we give here. In particular, using the LLL directly, one cannot achieve approximation factors of the form when ; one obtains instead an approximation ratio of where is some constant strictly larger than one. The case when is more complicated and there the LLL-based approaches appear to be asymptotically weaker by super-constant factors.

The technical core of our algorithm is an adaptation of the partial resampling MT algorithm of [10] combined with a methodology of [9] to yield improved probabilistic guarantees for LLL systems with correlated constraints. These techniques can only be used when the original fractional solution has entries which are small (and hence can be interpreted as probabilities); we develop a novel preprocessing step to handle large fractional entries while giving good guarantees on the multiplicity constraints.

Because so many different problem-specific techniques and calculations are combined with a variety of LLL techniques, it is cumbersome to derive our algorithm directly as a special case or corollary of results from [10]. For the most part, we will discuss our algorithm in a self-contained way, keeping the comparison with the LLL more as informal motivation than technical guide.

2 The RELAXATION algorithm

We first consider the case when all the values of are small; this turns out to be the critical case for this problem. In this case, we present an algorithm which we label RELAXATION. Initially, this algorithm draws each as an independent Bernoulli trial with probability , for some parameter . This will satisfy many of the covering constraints, but there will still be some left unsatisfied. We loop over all such constraints; whenever a constraint is unsatisfied, we modify the solution as follows: for each variable which has , we set to be an independent Bernoulli random variable with probability . Here is another parameter which we will also discuss how to select.

Throughout the remainder of Section 2, we assume that and are given parameters, and that in addition we have for all . We also assume that satisfies the covering constraints, i.e., for all . These assumptions will not be stated explicitly in the remainder.

1:function relaxation(, , , , )
2:     for  from  do Initialization
3:               
4:     while  do The covering constraints are not all satisfied
5:         Let be minimal such that
6:         for  from  do
7:              if  then
8:                                               
9:     return
Algorithm 1 The RELAXATION algorithm
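For concreteness, here is a minimal Python sketch of the loop structure above. The initial and resampling probabilities are supplied by the caller as init_prob and resample_prob; these names are placeholders for the paper's parameter choices, which are not reproduced here.

import numpy as np

def relaxation(A, a, init_prob, resample_prob, rng=np.random.default_rng()):
    m, n = A.shape
    # Initialization: draw each x_j independently.
    x = np.array([int(rng.random() < init_prob(j)) for j in range(n)])
    # Variables are only ever incremented from 0 to 1, so a satisfied
    # constraint remains satisfied for the rest of the run.
    while True:
        unsat = np.flatnonzero(A @ x < a)
        if unsat.size == 0:
            return x
        k = unsat[0]  # minimal index of an unsatisfied constraint
        for j in range(n):
            if x[j] == 0:  # only zero-valued variables are resampled
                # resample_prob(k, j) would naturally be 0 for variables
                # that do not appear in constraint k.
                x[j] = int(rng.random() < resample_prob(k, j))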

Note that this algorithm only increments the variables. Hence, when a constraint is satisfied, it will remain satisfied until the end of the algorithm.

Whenever we encounter an unsatisfied constraint and draw new values for the variables (lines 6–8), we refer to this as resampling the constraint . There is an alternative way of looking at the resampling procedure, which seems counterintuitive but will be crucial for our analysis. Instead of setting each variable with probability , we select a subset , where each currently satisfying goes into independently with probability . Then, for each variable , we draw , where . It is clear that this two-part sampling procedure is equivalent to the one-step procedure described in Algorithm 1. In this case, we say that is the resampled set for constraint . If (for any constraint ), we say that variable is resampled.
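For a single variable the equivalence is elementary: if the variable is placed into the resampled set with some probability $q$ and, once selected, is set to one with probability $p$ (both symbols being generic placeholders for the paper's parameters), then
\[
\Pr[\text{the variable is set to } 1] \;=\; q \cdot p,
\]
which matches the one-step resampling probability whenever the product $q \cdot p$ equals the probability used in Algorithm 1.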

For every variable , either we have at the initial sampling, or first becomes equal to one during some resampling of a constraint , or at the end of the algorithm. If for the first time at the resampling of constraint , we say turns at . If initially, we say that turns at .

In the algorithm as we have described it, the first step is to set the variables as independent Bernoulli with probability . Our analysis, following [10] and [9], is based on an inductive argument, in which we consider what occurs when is set to some arbitrary value. If , then the algorithm is already finished. If not, there will be a series of modifications made to until it terminates. Given any fixed value of , we will show upper bounds on the probability of certain future events.

Lemma 2.1.

Let be subsets of . The probability that the first resampled sets for constraint are respectively is at most , where we define

Proof.

For any integer , any integer , any sets and any vector , we define the following random process and the following event : Suppose that instead of drawing as in line 3 of RELAXATION, we set , and we continue the remaining steps of the RELAXATION algorithm (lines 4–8) until done. We say that in this process event has occurred if:

  1. There are at most total resamplings

  2. There are at least resamplings of constraint

  3. The first resampled sets for constraint are respectively .

We claim now that for any , and , and any integer , we have

(1)

(Note that by our assumption , and so the RHS of (1) is always well-defined.)

We shall prove (1) by induction on . For the base case () this is trivially true, because is impossible (there must be at least resamplings), and so the LHS of (1) is zero while the RHS is non-negative. We move on to the induction step.

If , then the RELAXATION algorithm performs no resamplings. Thus, if , then event is impossible and again (1) holds. On the other hand, if , then the RHS of (1) is equal to one, and again this holds vacuously. So we suppose ; let be minimal such that . Then the first step of RELAXATION is to resample constraint .

We observe that if for any , then the event is impossible. The reason for this is that we only resample variables which are equal to zero; thus variable can never be resampled for the remainder of the RELAXATION algorithm. In particular, we will never have in any resampled set. Thus, as , it is impossible for to eventually be the resampled sets for constraint . So if for any then (1) holds vacuously.

Let denote the value of the variables after the first resampling ( is a random variable). Then we observe that the remaining steps of the RELAXATION algorithm are equivalent to what would have occurred if we had set initially.

Now, suppose that . Then after the first resampling, the event becomes equivalent to the event . Thus, in this case, we have

and this shows the induction step as desired. (Note that here we are able to bound the probability of the event , even though is a random variable instead of a fixed vector, because our induction hypothesis applies to all vectors .)

Next, suppose that . In this case, we observe that the following are necessary events for :

  1. , where is the first resampled set for constraint .

  2. For any , in the first resampling step (which includes variable ), we draw .

The condition (B2) follows from the observation, made earlier, that is impossible if but . Any such must be resampled (due to condition (B1)), and it must be resampled to become equal to zero.

Let us first bound the probability of the condition (B1). As we put each into with probability independently, the probability that all go into is . By the same token, if , then avoids going into with probability . Therefore, the overall probability of selecting is given by
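In placeholder notation, writing $q_j$ for the probability with which variable $j$ is placed into the resampled set, the product referred to here has the generic form
\[
\Pr[\text{the resampled set equals } R] \;=\; \prod_{j \in R} q_j \;\prod_{j \notin R} (1 - q_j),
\]
where the second product runs over the variables that were eligible for resampling but not selected.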

By definition of , we have that . By Proposition A.1, we thus have:

further implying:

(2)

Next, let us consider the probability of (B2), conditional on (B1). Each is drawn independently as Bernoulli-; thus, the total probability of event (B2), conditional on (B1), is at most .

Finally, let us consider the probability of event (B3), conditional on (B1) and (B2). Observe that the event is conditionally independent of events (B1) and (B2), given . By the law of total probability, we have

Thus, as (B1), (B2), and (B3) are necessary conditions for , we have

and the induction claim again holds.

Thus, we have shown that (1) holds for any integer and any , and . Next, for any sets and any , let us define the event to be the event that, if we start the RELAXATION algorithm with , then the first resampled sets for constraint are respectively ; we make no condition on the total number of resamplings. Observe that we have the increasing chain

and . By countable additivity of the probability measure, we have:
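In other words, writing $\mathcal{E}_T$ for the event restricted to at most $T$ total resamplings and $\mathcal{E}$ for their union (placeholder names for the events just defined), continuity of the probability measure along the increasing chain gives
\[
\Pr[\mathcal{E}] \;=\; \lim_{T \to \infty} \Pr[\mathcal{E}_T] \;=\; \sup_{T} \Pr[\mathcal{E}_T],
\]
so any bound that holds for every $\Pr[\mathcal{E}_T]$, uniformly in $T$, carries over to $\Pr[\mathcal{E}]$.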

So far, we have computed the probability of having be the first resampled sets for constraint , given that is fixed to an arbitrary initial value . We now can compute the probability that are the first resampled sets for constraint given that is drawn as independent Bernoulli-.

In the first step of the RELAXATION algorithm, we claim that a necessary condition for to be the first resampled sets is that we have for each ; the rationale for this is the same as for (B2). This event has probability . Subsequently, the event must occur.

The probability of , conditional on for all , is at most (by a similar argument to that of computing the probability of (B3) conditional on (B1), (B2)). Thus, the overall probability that the first resampled sets for constraint are is at most

as desired. ∎

We next compute ; such sums will recur in our calculations.

Proposition 2.2.

Suppose . For any constraint define

Then for all .

Proof.

We have

Also, noting that satisfies the covering constraints (i.e., ), we have that

Proposition 2.3.

For any constraint and any , we have

Proof.

We have:

Now note that and hence . The claimed bound then holds. ∎

To gain some intuition about this expression , note that if we set (which is not necessarily the optimal choice for the overall algorithm), then we have

and this can be recognized as the Chernoff lower-tail bound. Namely, this is an upper bound on the probability that a sum of independent -random variables, with mean , will become as small as . This makes sense: for example, at the very first step of the algorithm (before any resamplings are performed), is precisely a sum of independent Bernoulli variables with mean . The event we are measuring (that a constraint is resampled) is precisely the event that this sum is smaller than .
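For reference, one standard form of the Chernoff lower-tail bound being invoked here: if $S$ is a sum of independent random variables taking values in $[0,1]$ with $\mathbb{E}[S] = \mu$, then for any $0 < a \le \mu$,
\[
\Pr[S \le a] \;\le\; e^{a - \mu} \Bigl(\frac{\mu}{a}\Bigr)^{a}
\;=\; \Bigl( \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \Bigr)^{\mu},
\qquad \text{where } a = (1-\delta)\mu .
\]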

We next bound the running time of the algorithm.

Proposition 2.4.

Suppose . The expected number of resampling steps made by the algorithm RELAXATION is at most .

Proof.

Consider the probability that there are resamplings of constraint . A necessary condition for this to occur is that there are sets such that are respectively the first resampled sets for constraint . Taking a union-bound over , we have:

As , this implies that the expected number of resamplings of constraint is at most
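One natural way to organize this calculation, writing $f_k(R)$ for the quantity defined in Lemma 2.1 and $S_k$ for the sum appearing in Proposition 2.2 (both names are placeholders; we sketch one plausible reading rather than the exact constants), is as a geometric series:
\[
\Pr[\text{constraint } k \text{ is resampled at least } t \text{ times}]
\;\le\; \sum_{R_1, \dots, R_t} \prod_{i=1}^{t} f_k(R_i)
\;=\; S_k^{\,t},
\qquad
\mathbb{E}[\#\text{resamplings of } k]
\;=\; \sum_{t \ge 1} \Pr[\text{at least } t]
\;\le\; \frac{S_k}{1 - S_k},
\]
which is finite whenever $S_k < 1$.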

We also give a crucial bound on the distribution of the variables at the end of the resampling process.

Theorem 2.5.

Suppose for all and . Then for any , the probability that at the conclusion of the RELAXATION algorithm is at most

Proof.

There are two possible ways to have : either turns at or it turns at for some , . The former event has probability .

Suppose that turns at . In this case, there must be sets such that:

  1. The first resampled sets for constraint are respectively

  2. During the th resampling of constraint , we set .

Now, observe that the probability of (C3), conditional on (C1), (C2), is . The reason for this is that event (C3) occurs after (C1), (C2) are already determined. Thus, we can use time-stochasticity to compute the conditional probability.

For any fixed and any fixed sets , the probability that are the first resampled sets is at most by Lemma 2.1.

Thus, in total, the probability that events (C1)–(C3) hold for a fixed