Near Optimality in Covering and Packing Games
by Exposing Global Information

Maria-Florina Balcan, School of Computer Science, Georgia Institute of Technology, Atlanta GA 30332. Email: ninamf@cc.gatech.edu.

Sara Krehbiel, School of Computer Science, Georgia Institute of Technology, Atlanta GA 30332. Email: sarak@gatech.edu.

Georgios Piliouras, School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta GA 30332. Email: georgios.piliouras@ece.gatech.edu.

Jinwoo Shin, Algorithms & Randomness Center, Georgia Institute of Technology, Atlanta GA 30332. Email: jshin72@cc.gatech.edu.

Abstract

Covering and packing problems can be modeled as games to encapsulate interesting social and engineering settings. These games have a high Price of Anarchy in their natural formulation. However, existing research applicable to specific instances of these games has only been able to prove fast convergence to arbitrary equilibria. This paper studies general classes of covering and packing games with learning dynamics models that incorporate a central authority who broadcasts weak, socially beneficial signals to agents that otherwise only use local information in their decision-making. Rather than illustrating convergence to an arbitrary equilibrium that may have very high social cost, we show that these systems quickly achieve near-optimal performance.

In particular, we show that in the public service advertising model of [1], reaching a small constant fraction of the agents is enough to bring the system to a state within a factor of optimal in a broad class of set cover and set packing games or a constant factor of optimal in the special cases of vertex cover and maximum independent set, circumventing social inefficiency of bad local equilibria that could arise without a central authority. We extend these results to the learn-then-decide model of [2], in which agents use any of a broad class of learning algorithms to decide in a given round whether to behave according to locally optimal behavior or the behavior prescribed by the broadcast signal. The new techniques we use for analyzing these games could be of broader interest for analyzing more general classic optimization problems in a distributed fashion.

1 Introduction

Set covering and packing problems are important and interesting not only from a classical optimization point of view, but also as a game theoretic framework for analyzing social problems in which willful agents are inherent cost minimizers and for solving engineering systems problems in which programmable agents have some degree of autonomy in seeking solutions to distributed optimization problems. In this paper, we model covering and packing problems as games, and we use models from learning theory to describe local decision making by players in these games. As opposed to previous work, we are interested in demonstrating convergence not to arbitrary local equilibria but to states that are low cost relative to the global optimum. We accomplish this by incorporating a globally-informed central authority into natural behavior dynamics.

Problem.

Given a universe of elements with associated costs and a collection of sets of these elements, the minimum weighted set cover optimization problem is to choose the lowest cost subset of elements such that each set is represented by at least one chosen element. While this problem is NP-hard, good approximation algorithms exist. However, such algorithms tend to be centralized in nature and require global knowledge.

Game.

We analyze a setting in which a central authority knows a good approximation, but elements are modeled as only locally aware agents with cost functions representing a natural distributed game interpretation of the core optimization problem. We generalize the problem by not requiring total coverage; rather, the importance of covering a given set is determined by its set weight. Each element $i$ that chooses to be on incurs his own cost $c_i$, and each element that is off pays the sum of the weights of the sets he participates in that do not contain any other on element. If the element costs are all smaller than the set weights, then the cost-minimizing set of on elements is also the optimal set cover. If additionally each set is of size two, then this is the special case of a minimum weighted vertex cover problem. By simply redefining the cost structure so that $i$ pays $c_i$ if he is off and the sum of weights of fully-covered sets he participates in if he is on, we can interpret this new game as a packing problem with maximum independent set as a special case.

Social and engineering applications.

Our motivation for this game theoretic approach is two-fold. The first setting is a social one in which agents have inherent costs associated with being on or off that correlate with the social objective. As a concrete example, suppose a government wishes to set up a network of offices, say homeless shelters, that provide some service to the local community. Society would like the lowest cost solution that adequately addresses the needs of most communities, but for political reasons it may not be possible to enforce an optimal solution in a top-down manner. Furthermore, individual counties have competing interests in that they desire their own area to be served but incur some cost by opening a shelter.

Another motivation is the setting in which non-autonomous agents are programmed to make decisions based on their surroundings. The extensive literature on cooperative control has shown that in this setting many optimization problems can be conveniently solved in a distributed fashion by endowing agents with artificial individual objective functions and cost-minimizing behavior. Many of these games and dynamics models result in convergence to a Nash equilibrium, or local optimum. In particular, several papers have modeled sensor networks as a special case of our set cover game. The elements are autonomous sensors, and a geographic region is a set consisting of elements corresponding to sensors that could cover that region. A sensor that is on is charged some fixed cost, whereas a sensor that is off is charged a cost proportional to the number or importance of its adjacent regions that are uncovered by any other sensor. This application is particularly well-suited for cooperative control because sensors can only observe the behavior of other sensors in their neighborhoods, and the structure of the network may not be known ahead of time, making it impossible for a central designer to program the sensors with an optimal solution.

Equilibrium quality and dynamics models.

Much of the work on cooperative control and dynamics-based algorithmic game theory only guarantees that systems converge to some equilibrium. Many games, however, have a high Price of Anarchy (PoA), where PoA means the worst-case ratio between the social cost in an equilibrium and that of the globally optimal configuration (see Section 2.1 for its formal definition). The following special case illustrates that PoA is $\Omega(n)$ in our set cover game. Suppose agents (or players) are charged some amount $c \le 1$ when they are on and otherwise penalized 1 for every incident uncovered set. Then a star graph with $n$ vertices, in which vertices are agents and edges are sets, has a global optimum with only the center on, yielding social cost $c$, compared to a low quality Nash equilibrium in which only the center is off, yielding social cost $(n-1)c$.
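The star-graph example can be checked numerically. The following is a minimal sketch rather than anything taken from the paper: the function names and the values n = 6 and c = 0.5 are illustrative assumptions. It evaluates each agent's cost under the rule just described (pay c when on, pay 1 per incident uncovered edge when off) and tests whether a configuration is a Nash equilibrium.

```python
def agent_cost(i, on, edges, c):
    """Cost of vertex i: c if on, else 1 per incident edge with no on endpoint."""
    if i in on:
        return c
    return sum(1 for e in edges if i in e and not (set(e) & on))


def social_cost(on, n, edges, c):
    """Sum of all agents' costs."""
    return sum(agent_cost(i, on, edges, c) for i in range(n))


def is_nash(on, n, edges, c):
    """True if no single agent can strictly lower its cost by flipping."""
    return all(
        agent_cost(i, on ^ {i}, edges, c) >= agent_cost(i, on, edges, c)
        for i in range(n)
    )


n, c = 6, 0.5                                # illustrative values (assumption)
edges = [(0, leaf) for leaf in range(1, n)]  # vertex 0 is the center
center_only = {0}                            # only the center is on
leaves_only = set(range(1, n))               # only the center is off

good = social_cost(center_only, n, edges, c)
bad = social_cost(leaves_only, n, edges, c)
print(good, bad, bad / good)
```

Both configurations pass `is_nash`, and the ratio of their social costs grows linearly with the number of leaves, which is the source of the high PoA.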

The more general problem of dynamics for games with high PoA is addressed in [1, 2], in which authors propose three models of distributed and semi-selfish social behavior in a general repeated game setting. The models share the common feature that a central authority has knowledge of some joint strategy profile with low social cost, and this authority broadcasts this strategy in the hopes that players will adopt their prescribed strategies. Specifically, the public service advertising model (PSA) of [1] assumes that each agent independently has an $\alpha$ probability of receiving and temporarily adopting the advertising strategy. Those that do not receive and adopt their prescribed strategy behave in a myopic best response manner. This model is well-suited for an engineering systems setting, where we do not expect all components to receive the central authority’s signal. The learning models of [2] assume that each agent uses any of a broad class of learning algorithms to continually choose between acting according to their local best response move and their broadcast signal. In the learn-then-decide (LTD) model, agents eventually commit to one of these options. These models are better motivated by a social setting where agents that are only locally aware are interested in exploring the advertising strategy with the hopes that it will benefit them personally. These papers provide high quality guarantees for particular games, including fair cost-sharing and party affiliation games.

Our results.

The positive theoretical guarantees about social welfare in the outcomes of the games studied in the advertising and learning models of [1, 2] serve as motivation to use these models in studying our general classes of set cover and packing games, which apply to engineering systems applications such as sensor networks as well as more purely game theoretic settings. For the case where costs of agents and weights of sets are bounded below and above by constants, we show the following:

  • In vertex cover games, we show that for any advertising strategy , the dynamics of agents converge to a state of expected cost in the PSA and LTD models. [Footnote 1: As mentioned earlier, a set cover game where each set has size 2 is called a vertex cover game, and in such games equilibria have natural connections to vertex covers in the graph induced by the sets (i.e. edges).]

  • In set cover games, we show that for any advertising strategy , the dynamics of agents converge to a state of expected cost , where $\Delta_2$ is the maximum number of sets containing any given pair of agents.

  • In set cover games, we show that for a specific advertising strategy , the dynamics of agents converge to a state of cost with high probability in the PSA model. Moreover, we present a poly-time algorithm to find such a specific of low cost, i.e.

Furthermore, we emphasize that all the above convergence guarantees are achieved in a polynomial number of steps in terms of the number of agents. As mentioned earlier, without such advertising strategies, agents can end up in an inefficient equilibrium state of cost , even when restricted to vertex cover games (i.e. ). We also discuss extensions to the case where the costs of agents and weights of sets are not bounded below or above by constants.

Related work.

Achieving global coordination in distributed multi-agent systems is a central problem of control theory with multiple real-world applications (see [16] and references therein). More specifically, several papers consider game theoretic formulations of covering problems which are inspired by practical sensor network problems [15, 11, 14, 3]. In particular, [3] analyzes a game that is a specific case of the problem addressed in this paper. However, [3] and many other control theory papers guarantee only convergence to stable states which are locally optimal. Since these games often have a high Price of Anarchy [10, 13], the results do not translate to global performance guarantees.

A number of approaches have been explored to circumvent such bad PoA results. In [17] the authors assume that the authorities enjoy complete control over some fraction of the agents. Similarly, [6, 7] focus on the problem of identifying and controlling the influential nodes of a network. While we also use a special type of advertising for improved results in Theorem 4, we do not require particular control over certain agents. Rather, the models we use from [1, 2] incorporate strategic behavior for all agents. Another line of research offers stronger performance guarantees using specific learning algorithms that employ equilibrium selection [9, 4, 5] or cyclic behavior [8]. Unfortunately, these techniques do not yield guarantees of fast convergence to good states in our class of games.

Our analysis builds on the works of [1, 2], in which authors propose game theoretic models of distributed and semi-selfish social behavior in a general repeated game setting. The models share the common feature that a central authority has knowledge of some joint strategy profile with low social cost, and this authority broadcasts this strategy in the hopes that players will adopt their prescribed strategies. These papers provide quality guarantees for particular games, including fair cost-sharing and party affiliation games. By using these models, we do not have to make the hard choice between enforcing top-down solutions (which may be infeasible in both engineering systems and social settings) and poor performance guarantees. Instead, we show that for a broad class of covering and packing problems, incorporating mild influence from a weak central authority guides the system into a near-optimal state when agents are only optimizing locally.

2 Preliminaries

2.1 Background on General Games

We represent a general game as a triple $G = \langle N, (A_i), (\mathrm{cost}_i) \rangle$, where $N$ is a set of $n$ players, $A_i$ is the finite action space of player $i \in N$, and $\mathrm{cost}_i$ denotes the cost function of player $i$. The joint action space of the players is $A = A_1 \times \cdots \times A_n$. For a joint action $a \in A$, we denote by $a_{-i}$ the actions of all players $j \neq i$. Players’ cost functions map joint actions to non-negative real numbers, i.e. $\mathrm{cost}_i : A \to \mathbb{R}_{\ge 0}$ for all $i \in N$. In this paper, we define a social cost function, $\mathrm{cost} : A \to \mathbb{R}_{\ge 0}$, simply as the summation of individual players’ costs, $\mathrm{cost}(a) = \sum_{i \in N} \mathrm{cost}_i(a)$. The optimal social cost is denoted by

$$\mathrm{OPT} = \min_{a \in A} \mathrm{cost}(a).$$

Given a joint action $a$, the best response of player $i$ is the set of actions $BR_i(a)$ that minimize player $i$’s cost subject to the other players’ fixed actions $a_{-i}$, i.e.

$$BR_i(a) = \arg\min_{a_i' \in A_i} \mathrm{cost}_i(a_i', a_{-i}).$$

Best response dynamics is a process in which at each time step, an arbitrary player not already playing best response updates his action to one in his current best response set. A joint action $a \in A$ is a pure Nash equilibrium if no player can benefit from deviating to another action, namely, $a_i \in BR_i(a)$ for every $i \in N$.

A game is called an exact potential game [12] if there exists a potential function $\Phi : A \to \mathbb{R}$ such that

$$\mathrm{cost}_i(a_i, a_{-i}) - \mathrm{cost}_i(a_i', a_{-i}) = \Phi(a_i, a_{-i}) - \Phi(a_i', a_{-i})$$

for all $i \in N$, $a_{-i}$, and $a_i, a_i' \in A_i$. For general potential games, only the signs of both sides of these equations must be equal. While general games are not guaranteed to have a pure Nash equilibrium, all finite potential games do, and furthermore best response dynamics in such games always converge to a pure Nash equilibrium [12, 13]. However, the convergence time can in general be exponentially large in the number of players.

Two well known concepts for quantifying the inefficiency of equilibria relative to the social optimum are Price of Anarchy and Price of Stability. For the set $\mathrm{NE}(G)$ of pure Nash equilibria of game $G$, Price of Anarchy (PoA) and Price of Stability (PoS) are defined as

$$\mathrm{PoA} = \max_{a \in \mathrm{NE}(G)} \frac{\mathrm{cost}(a)}{\mathrm{OPT}}, \qquad \mathrm{PoS} = \min_{a \in \mathrm{NE}(G)} \frac{\mathrm{cost}(a)}{\mathrm{OPT}}.$$

2.2 Covering Game

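Given agents $N = \{1, \ldots, n\}$, a collection of sets $\mathcal{F} \subseteq 2^N$, costs $c_i$ for $i \in N$, and weights $w_\sigma$ for $\sigma \in \mathcal{F}$, we describe the covering game where actions $A_i = \{\text{on}, \text{off}\}$ and cost $\mathrm{cost}_i$ as defined in (1) for every agent $i$. We let $\Delta_k$ be the ‘$k$-th order’ maximum degree of the hypergraph induced by sets. Namely,

$$\Delta_k = \max_{S \subseteq N,\ |S| = k} \left|\{\sigma \in \mathcal{F} : S \subseteq \sigma\}\right|.$$

In addition, we define

Before defining the cost functions, we introduce some notation. We say a set $\sigma \in \mathcal{F}$ is ‘covered’ in joint strategy $a$ if $a_i = \text{on}$ for some $i \in \sigma$. Otherwise, $\sigma$ is said to be ‘uncovered’. Denote the collection of sets that include agent $i$ and are uncovered in $a$ with $\mathcal{F}_i^u(a)$, or simply $\mathcal{F}_i^u$ when $a$ is clear from context. The entire set of uncovered sets is written $\mathcal{F}^u(a)$. For $S \subseteq N$, define $c(S) = \sum_{i \in S} c_i$, and for $\mathcal{F}' \subseteq \mathcal{F}$, define $w(\mathcal{F}') = \sum_{\sigma \in \mathcal{F}'} w_\sigma$. The cost function of agent $i$ is defined with respect to any joint strategy $a$ as follows:

$$\mathrm{cost}_i(a) = \begin{cases} c_i & \text{if } a_i = \text{on}, \\ w(\mathcal{F}_i^u(a)) & \text{if } a_i = \text{off}. \end{cases} \tag{1}$$

Observe that expresses how much agent $i$ prefers to cover the sets containing $i$. For example, if , then each agent prefers to avoid the situation in which there exists an uncovered set containing her. As we explain in Section 5, these covering games can be interpreted as equivalent packing games.

For a joint action (or strategy profile) $a$, let $N_{\text{on}}(a)$ and $N_{\text{off}}(a)$ be the sets of nodes that are on and off, respectively. Since every element of an uncovered set is off and pays that set's weight, it is easy to check that the social cost has the following simple form:

$$\mathrm{cost}(a) = c(N_{\text{on}}(a)) + \sum_{\sigma \in \mathcal{F}^u(a)} |\sigma| \, w_\sigma. \tag{2}$$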
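The cost rule and best response dynamics can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: agents are indexed from 0, sets are Python `frozenset`s, the names `agent_cost`, `social_cost`, `social_cost_closed_form`, and `best_response_dynamics` are hypothetical, and agents are polled in round-robin order (the model allows any order). An on agent pays its own cost; an off agent pays the weights of the sets containing it that have no on element.

```python
def agent_cost(i, on, sets, c, w):
    """c[i] if agent i is on, else total weight of uncovered sets containing i."""
    if i in on:
        return c[i]
    return sum(w[k] for k, s in enumerate(sets) if i in s and not (s & on))


def social_cost(on, sets, c, w):
    """Sum of individual costs."""
    return sum(agent_cost(i, on, sets, c, w) for i in range(len(c)))


def social_cost_closed_form(on, sets, c, w):
    """Cost of on agents plus |set| * weight for every uncovered set."""
    uncovered = sum(len(s) * w[k] for k, s in enumerate(sets) if not (s & on))
    return sum(c[i] for i in on) + uncovered


def best_response_dynamics(on, sets, c, w):
    """Flip agents that strictly gain from flipping until nobody wants to move."""
    on = set(on)
    changed = True
    while changed:
        changed = False
        for i in range(len(c)):
            flipped = on ^ {i}
            if agent_cost(i, flipped, sets, c, w) < agent_cost(i, on, sets, c, w):
                on, changed = flipped, True
    return on


# Illustrative instance (assumption): 5 agents, 3 sets, unit costs, weight 2.
sets = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3, 4})]
c = [1, 1, 1, 1, 1]
w = [2, 2, 2]
equilibrium = best_response_dynamics(set(), sets, c, w)
print(sorted(equilibrium), social_cost(equilibrium, sets, c, w))
```

On this instance the dynamics started from the all-off state settle on agents 1 and 2, which cover all three sets.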

Best response convergence.

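Recall that best response dynamics converge to pure Nash equilibria for potential games. Now observe that the covering game is an exact potential game with potential function

$$\Phi(a) = c(N_{\text{on}}(a)) + w(\mathcal{F}^u(a)). \tag{3}$$

Combining this observation with the social cost formula implies that for any $a$ we have

$$\Phi(a) \le \mathrm{cost}(a) \le F_{\max} \cdot \Phi(a), \tag{4}$$

where we let $F_{\max}$ be the size of the largest set, i.e. $F_{\max} = \max_{\sigma \in \mathcal{F}} |\sigma|$.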
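Both claims can be verified by brute force on small instances. The sketch below assumes the potential is the total cost of on agents plus the total weight of uncovered sets, and that the largest set size is the factor separating potential from social cost; the helper names and the test instance are illustrative assumptions. It enumerates every configuration, checks that a unilateral flip changes the flipping agent's cost and the potential by the same amount, and checks the sandwich bound between potential and social cost.

```python
def agent_cost(i, on, sets, c, w):
    """c[i] if agent i is on, else total weight of uncovered sets containing i."""
    if i in on:
        return c[i]
    return sum(w[k] for k, s in enumerate(sets) if i in s and not (s & on))


def social_cost(on, sets, c, w):
    return sum(agent_cost(i, on, sets, c, w) for i in range(len(c)))


def potential(on, sets, c, w):
    """Cost of on agents plus weight of uncovered sets (assumed form of the potential)."""
    return sum(c[i] for i in on) + sum(w[k] for k, s in enumerate(sets) if not (s & on))


def check_potential(sets, c, w):
    """Exhaustively verify the exact-potential identity and the sandwich bound."""
    n = len(c)
    f_max = max(len(s) for s in sets)
    for mask in range(2 ** n):
        on = {i for i in range(n) if (mask >> i) & 1}
        phi = potential(on, sets, c, w)
        if not (phi <= social_cost(on, sets, c, w) <= f_max * phi):
            return False
        for i in range(n):
            flipped = on ^ {i}
            cost_change = agent_cost(i, on, sets, c, w) - agent_cost(i, flipped, sets, c, w)
            if cost_change != phi - potential(flipped, sets, c, w):
                return False
    return True


# Illustrative instance with integer costs and weights so equality is exact.
sets = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3, 4})]
ok = check_potential(sets, c=[1, 3, 2, 1, 4], w=[2, 5, 3])
print(ok)
```

Integer costs and weights are used so the identity can be tested with exact equality; with floating-point inputs a tolerance would be needed.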

Optimization and equilibrium quality.

The star graph example from the introduction reveals that PoA in the covering game can be very large. More generally, certain covering game instances exist with PoA even restricted to the simple case . This motivates the need for efficient dynamics with better guarantees than convergence to arbitrary equilibria.

[Footnote 2: For example, let for all and let for all . Label an arbitrary set of elements , and label the other elements . Define to be all sets with one element in and one in . It is straightforward to check that the solution with all on and all off is a Nash equilibrium with cost , while the solution with all off and all on is a Nash equilibrium with cost .]

As a step in that direction, here we provide a centralized LP-rounding-based poly-time algorithm to find a low-cost configuration for the covering game as follows.

  • Solve the following linear program (LP), and obtain the solution .

    (5)
  • Set

The following lemma proves that this is a -approximation algorithm for minimizing the social cost (2).

Lemma 1.

The configuration obtained from the algorithm has

where we recall that .

Proof.

Let be the optimal configuration, i.e. . If there exists an uncovered set under the configuration , choose one element from and force it to be turned on. Repeat this procedure until all sets are covered, and let be the resulting configuration. Now observe that . Therefore, it follows that

Under the assumption (6), this is an -approximation algorithm to the optimal social cost.
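The repair step used in the proof (turn on one element of each uncovered set until every set is covered) is easy to make concrete. In this sketch the name `force_cover` is hypothetical, and picking the cheapest element of each uncovered set is an assumption made for definiteness; the proof allows any element.

```python
def force_cover(on, sets, c):
    """Extend `on` so that every set contains at least one on element.

    For each set with no on element, the cheapest element is turned on
    (an arbitrary choice; any element of the set would do).
    """
    on = set(on)
    for s in sets:
        if not (s & on):
            on.add(min(s, key=lambda i: c[i]))
    return on


# Illustrative instance (assumption): start from a configuration covering one set.
sets = [frozenset({0, 1}), frozenset({1, 2}), frozenset({3, 4})]
c = [5, 1, 2, 4, 3]
start = {0}
cover = force_cover(start, sets, c)
print(sorted(cover))
```

A single pass suffices because turning an element on can only cover more sets, and at most one element is added per initially uncovered set.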

3 Public Service Advertising

In this section and the following one, we show that price of anarchy is avoidable in covering games even using best response-inspired dynamics as long as these dynamics incorporate some form of suggestion from a weak central authority that is aware of a high quality equilibrium.

The first model we study in this paper is the public service advertising (PSA) model in [1] in which a central authority broadcasts a strategy for each agent, which some agents receive and temporarily follow. Player behavior is described in two phases:

  • (Phase 1) Play begins in an arbitrary state, and a central authority advertises joint action . Each agent receives the proposed strategy independently with probability $\alpha$. Agents that receive this signal are called receptive. Receptive agents play their advertising strategies throughout Phase 1, and non-receptive agents undergo best response dynamics to settle on a joint strategy that is a Nash equilibrium given the fixed behavior of receptive agents. We call this joint strategy .

  • (Phase 2) All agents participate in best response dynamics until convergence to some Nash equilibrium .

Since our covering game is a potential game and all finite potential games eventually converge to a Nash equilibrium under best response dynamics, both phases are guaranteed to terminate. Furthermore, convergence occurs in poly-time with respect to parameters . [Footnote 3: This is because the potential function is bounded above and below by functions of these parameters and decreases under best response dynamics.]
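The two phases can be simulated directly. The following is a rough sketch under several assumptions: the function names are hypothetical, the reception probability is passed as `alpha`, best responders are polled round-robin, and the star instance with unit costs and weight-2 edges is chosen only for illustration. Receptive agents are pinned to the advertised action in Phase 1 and released in Phase 2.

```python
import random


def agent_cost(i, on, sets, c, w):
    """c[i] if agent i is on, else total weight of uncovered sets containing i."""
    if i in on:
        return c[i]
    return sum(w[k] for k, s in enumerate(sets) if i in s and not (s & on))


def best_response_dynamics(on, sets, c, w, movers):
    """Let only the agents in `movers` flip while a strict improvement exists."""
    on, movers = set(on), list(movers)
    changed = True
    while changed:
        changed = False
        for i in movers:
            flipped = on ^ {i}
            if agent_cost(i, flipped, sets, c, w) < agent_cost(i, on, sets, c, w):
                on, changed = flipped, True
    return on


def psa(initial_on, advertised_on, sets, c, w, alpha, rng):
    """Return (receptive agents, state after Phase 1, state after Phase 2)."""
    n = len(c)
    receptive = {i for i in range(n) if rng.random() < alpha}
    # Phase 1: receptive agents adopt the advertisement, the rest best respond.
    on = (set(initial_on) - receptive) | (set(advertised_on) & receptive)
    others = [i for i in range(n) if i not in receptive]
    phase1 = best_response_dynamics(on, sets, c, w, others)
    # Phase 2: everyone best responds.
    phase2 = best_response_dynamics(phase1, sets, c, w, range(n))
    return receptive, phase1, phase2


n = 6
sets = [frozenset({0, leaf}) for leaf in range(1, n)]  # star, center is agent 0
c, w = [1] * n, [2] * (n - 1)
leaves = set(range(1, n))                               # costly starting equilibrium
receptive, phase1, phase2 = psa(leaves, {0}, sets, c, w, alpha=0.3, rng=random.Random(0))
print(sorted(receptive), sorted(phase1), sorted(phase2))
```

Setting `alpha` to 0 leaves the system in the costly all-leaves equilibrium, while setting it to 1 moves it to the advertised center-only state, which is itself stable in Phase 2.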

3.1 Effect of Advertising in PSA

In this section we show that advertising helps significantly in covering games. In particular, we show that if the advertising strategy has low social cost, then the cost of the resulting equilibrium is low even if only a small constant fraction of the agents receive and respond to the signal. Theorem 2 formalizes the general result of this section, and Theorem 4 improves this result for particular advertising strategies. For convenience, in this section we assume costs and weights are bounded above and below, i.e.

(6)
Theorem 2.

For any advertising strategy in the PSA model,

(7)

Theorem 2 implies that if is obtained from the -approximation poly-time algorithm described in Section 2.2, the following corollary holds.

Corollary 3.

There exists a poly-time algorithm to find an advertising strategy for the PSA model such that

Effective advertising.

We additionally consider advertising strategies particular to our game for improved performance of the model. We say that advertising strategy satisfies condition () if

()

where is the smallest number of sets containing a given on element in as the unique on element. We say is the ‘core’ minimum degree of on elements in . Intuitively, the condition means that each on element in the advertising strategy ‘solely’ contributes a large number of sets to cover. We establish the following stronger theorem, which implies that agents will reach a state of social cost at the end of Phase 2 if the advertising strategy satisfies the condition .

Theorem 4.

For an advertising strategy satisfying the condition in the PSA model,

The following corollary implies that it is possible to find such an advertising strategy of low cost.

Corollary 5.

There exists a poly-time algorithm to find an advertising strategy for the PSA model such that

Proof.

Here we explain how to find an advertising strategy satisfying the condition as well as being of low cost. Observe that any joint strategy with for a large enough constant (depending on constants , , ) satisfies the condition . Then starting from the joint strategy with social cost obtained from the algorithm in Section 2.2, one can greedily construct a joint strategy satisfying the condition with social cost (greedily turning off every agent that is the unique on element in fewer than sets). For the advertising strategy satisfying the condition as constructed above, the conclusion of Corollary 5 follows from Theorem 4. ∎
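The greedy pruning step in this proof can be sketched as follows. The names `core_degree` and `prune_to_core` are hypothetical, and the cutoff is passed in as a parameter `threshold` because its value is not restated here; the instance is illustrative only.

```python
def core_degree(i, on, sets):
    """Number of sets in which agent i is the unique on element."""
    return sum(1 for s in sets if s & on == {i})


def prune_to_core(on, sets, threshold):
    """Turn off every on agent that is the unique on element in fewer than
    `threshold` sets. One pass is enough: turning agents off never lowers
    the core degree of the agents that remain on."""
    on = set(on)
    for i in sorted(on):
        if core_degree(i, on, sets) < threshold:
            on.discard(i)
    return on


# Illustrative instance (assumption): agent 0 solely covers two sets, agent 1 only one.
sets = [frozenset({0, 1}), frozenset({0, 2}), frozenset({0, 3}), frozenset({1, 2})]
start = {0, 1}
pruned = prune_to_core(start, sets, threshold=2)
print(sorted(pruned))
```

Pruning can leave some sets uncovered (here the set containing agents 1 and 2), so the social cost of the pruned strategy has to be accounted for separately, as the proof does.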

Proof of Theorem 2

From (4) and , any sequence of best response moves increases social cost by at most a constant factor. All agents best respond in Phase 2, and hence . It suffices to bound . At a high level, we do this by providing a bound (i.e. Lemma 6) on the total weight of uncovered sets that are not uncovered in and then we give a bound (i.e. Lemma 7) on the number of agents that are off in but on at the end of Phase 1.

First, let us introduce some notation. We say two agents contained in a common set are neighbors. Let and denote the sets of agents that are on and off in , respectively. Let (and ) denote the set of agents in (and ) who are off (and on) in . Let denote the collection of sets uncovered in , and let denote the collection of sets not in but uncovered in . Then from (2), (6) and , we have

where we note that

Therefore, the following two lemmas bounding and lead to the desired bound on , which completes the proof of Theorem 2 from .

Lemma 6.

.

Proof.

Each set in should contain an off element in that is best responding in . Hence,

where the second inequality is from the fact that is best responding (i.e. its cost exceeds the total weight of uncovered sets including it since it chooses to be off). This completes the proof of Lemma 6. ∎

Lemma 7.

.

Proof.

Since each element in plays best response in , should be contained in a set as the unique on element. We define disjoint sets and such that

and if . By definition of , it easily follows that

(8)

Now consider . Let be the collection of ‘left’ uncovered sets, i.e.  if is a non-empty subset of . Hence, by definition of , is in for each . This implies that

(9)

We let be the collection of sets containing a unique element in . Then, we have

(10)

This is because the number of sets with more than one element in is bounded by (remember that each pair of agents is contained in at most common sets). Clearly, there are no such sets when .

We now bound the expected size of . It follows that

(11)

where we let be the collection of sets including as the unique element in . Further, we observe that

where we define such that no pair of sets in have common elements in and the size of is not too small, i.e. . From the definitions of and , the existence of such a set follows. Since no pair of sets in have common elements in , the events that all are receptive for are mutually independent, and each happens with probability at least . Therefore,

(12)

Combining (11) and (12) implies that

(13)

where the last equality follows from the following proposition, whose proof is presented in Appendix A.

Proposition 8.

For constant and ,

Finally, combining (8), (9), (10) and (13) leads to the desired conclusion of Lemma 7, where we note that when . ∎

Proof of Theorem 4

We will use the same notation and as in the proof of Theorem 2. As we explain in the proof of Theorem 2, it suffices to prove that the social cost at the end of Phase 1 is with probability .

To this end, the following lemma establishes that the condition ensures that all agents in turn on with probability at the end of Phase 1. Under this event, only sets in are uncovered, and the only additional social cost is that incurred by agents in . The lemma shows that such additional cost is at most . Hence, this completes the proof of Theorem 4.

Lemma 9.

If the advertising strategy satisfies the condition , then

Proof.

We use the same notation as in the proof of Lemma 7. As in that proof, for any there is some subset such that no pair of sets in have common elements in and . Then, as we derived in the proof of Lemma 7,

where the last inequality is from the condition . From the union bound, and hence .

Now assume the event that all nodes in are on. Observe that for each best responding , is no greater than the total weight of all sets containing as the unique on agent. Since we assume all nodes in are on, these sets are a subset of . Further, since there is no overlap in these sets between different agents in , we can sum over all to derive . This completes the proof of Lemma 9. ∎

3.2 Extension to Unbounded Costs and Weights

All the results and proof techniques in this paper naturally extend to general weights and costs. In particular, one can obtain the following theorem (analogous to Theorem 2) in the PSA model without the assumption (6) via calculating explicit quantities in each step in the proof of Theorem 2.

Theorem 10.

For any advertised strategy in the PSA model,

(14)

4 Learn-then-decide

We now study the set cover game in the learn-then-decide (LTD) model of [2]. In contrast to PSA, agents in LTD are neither strictly receptive nor strictly best responders in the initial exploration phase, but they choose one of these options for the final exploitation phase:

  • (Phase 1) Play begins in an arbitrary state, and a central authority advertises joint action . Player is associated with fixed probability . Agents are chosen to update uniformly at random for each of time steps. When updates, he plays with probability or best response with probability . The state at time is denoted .

  • (Phase 2) At time , all agents in random order individually commit arbitrarily to or best response. Then agents take turns in random order playing their chosen strategy until they reach a Nash equilibrium given the fixed behavior of followers.
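The two LTD phases can be sketched as a small simulation. This is an illustration under assumptions, not the model's definitive form: the function names are hypothetical, the per-agent probabilities are passed as a list `p`, the number of exploration steps as `steps`, and the set of agents who commit to the advertised action as `followers` (the model lets agents commit arbitrarily, so the caller decides). Ties in best response keep the current action.

```python
import random


def prefers_on(i, on, sets, c, w):
    """Best response of agent i to the others; ties keep the current action."""
    others_on = on - {i}
    off_cost = sum(w[k] for k, s in enumerate(sets) if i in s and not (s & others_on))
    if c[i] != off_cost:
        return c[i] < off_cost
    return i in on


def learn_then_decide(initial_on, advertised_on, sets, c, w, p, steps, followers, rng):
    """Return (state after the exploration phase, final state)."""
    n = len(c)
    on = set(initial_on)
    # Phase 1: a uniformly random agent updates at each step, following the
    # advertisement with probability p[i] and best responding otherwise.
    for _ in range(steps):
        i = rng.randrange(n)
        if rng.random() < p[i]:
            turn_on = i in advertised_on
        else:
            turn_on = prefers_on(i, on, sets, c, w)
        on = on | {i} if turn_on else on - {i}
    phase1 = set(on)
    # Phase 2: committed agents take turns in random order until nobody moves.
    changed = True
    while changed:
        changed = False
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            if i in followers:
                turn_on = i in advertised_on
            else:
                turn_on = prefers_on(i, on, sets, c, w)
            if turn_on != (i in on):
                on = on ^ {i}
                changed = True
    return phase1, on


n = 6
sets = [frozenset({0, leaf}) for leaf in range(1, n)]  # star, center is agent 0
c, w = [1] * n, [2] * (n - 1)
leaves = set(range(1, n))
phase1, final = learn_then_decide(
    leaves, {0}, sets, c, w, p=[0.5] * n, steps=100,
    followers={0}, rng=random.Random(0),
)
print(sorted(phase1), sorted(final))
```

Phase 2 terminates because each follower changes action at most once and every move by a best responder strictly lowers the potential.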

Effect of Advertising in LTD

For convenience, in this section we again assume costs and weights are bounded above and below, i.e. the assumption (6). The following result in the LTD model is analogous to Theorem 2 in the PSA model.

Theorem 11.

There exists a such that for any advertising strategy in the LTD model,

(15)

Theorem 11 implies that if is obtained from the -approximation poly-time algorithm described in Section 2.2, the following corollary holds.

Corollary 12.

There exists a poly-time algorithm to find an advertising strategy for the LTD model such that

Proof of Theorem 11

To begin with, we note that while LTD differs from PSA in both phases, the proof that cost is low in Phase 1 of LTD is very similar to the proof of Theorem 2. However, showing that the cost stays low in Phase 2 imposes new challenges.

We will use the same notation as in the proof of Theorem 2. We first define for as the event that every element in updates at least once before time after every element in has updated at least once, and then every element in again updates at least once at some time . Clearly there exist some such that happens with the following high probability, i.e.

Then, we have

(16)

where for the inequality we use the fact that the social cost is always bounded above by from (2) and (6).

Therefore, it suffices to bound , where our choice of is primarily for guaranteeing that happens with such a high probability. We first bound the expected social cost at the end of Phase 1 under the event , and later we bound the increase in the social cost in Phase 2.

Lemma 13.
Proof.

Similarly as in the proof of Theorem 2, we again note that

(17)
(18)

Recall that we use the same notation as in the proof of Theorem 2.

Hence, it suffices to bound and in terms of and . First consider . We separately analyze the weights of two types of . First consider a set in , i.e. a set consisting only of elements in . Suppose we attribute the weight of such a set to its element that updated most recently before the end of Phase 1. Because played best response most recently, the weight of all sets in attributed to is at most . Summing over all gives

(19)

Now consider a set in , i.e. a set which has elements in both and , all of which are off at the end of Phase 1. By definition of , . Assuming the event , the arguments used to bound in the proof of Lemma 7 apply identically in the LTD model (using instead of ), i.e. we have

(20)

From (19) and (20), it follows that

(21)

Now, assuming event , one can observe that the conclusion and proof strategy of Lemma 7 also work for in the LTD model, i.e.

(22)

Therefore, combining these bounds (17), (18), (21) and (22) leads to the desired bound of Lemma 13. ∎

We now bound the cost increase in Phase 2 assuming . From (4) and it suffices to provide a bound on the expected increase in the potential function throughout Phase 2, i.e.

(23)

The following lemma bounds the expected potential increase assuming event . Finally, combining (16), (23), Lemma 13 and Lemma 14 leads to the desired conclusion of Theorem 11.

Lemma 14.
Proof.

Since best response moves do not increase the potential function , we only consider updates of agents following the advertising strategy in Phase 2. Since each ‘ follower’ changes strategies at most once in Phase 2, it suffices to consider a single off-on move (following ) for each agent in and a single on-off move (following ) for each agent in . For each , an off-on move (i.e.  changes his decision from off to on) increases potential by at most . Hence,

the total potential increase by off-on moves is at most . (24)

Now consider the other type of move, i.e. a single on-off move for each agent in . For each that first turns off at time , let be the collection of sets containing such that all of their other elements are off at time . Then the potential increases by at most at time . Hence,

the total potential increase by on-off moves is at most . (25)

We now bound the expectation of . To this end, we consider two types of sets including an element in : (a) has an element that was on at the end of Phase 1, and (b) otherwise. Observe that the expected number of sets of type (b) that are not in is already bounded in the proof of Lemma 13 by , i.e.

Therefore, we have

(26)

Thus, we focus only on sets of type (a). Let be the sets containing with all of their elements in being off at the end of Phase 1. Our key observation here is that a set of type (a) can only become uncovered when an turns off if all but at most sets in have an element in that updates before updates. Otherwise, has too many uncovered sets to turn off before turns off (hence, it remains on). For an arbitrary update ordering of agents in , there are at least