Learning in Auctions: Regret is Hard, Envy is Easy

Learning in Auctions: Regret is Hard, Envy is Easy

Constantinos Daskalakis
EECS, MIT
costis@csail.mit.edu
Supported by a Microsoft Research Faculty Fellowship, and NSF Award CCF-0953960 (CAREER) and CCF-1551875. This work was done in part while the author was visiting the Simons Institute for the Theory of Computing.
   Vasilis Syrgkanis
Microsoft Research, NYC
 vasy@microsoft.com
This work was done in part while the author was visiting the Simons Institute for the Theory of Computing.
Abstract

A large line of recent work studies the welfare guarantees of simple and prevalent combinatorial auction formats, such as selling items via simultaneous second price auctions (SiSPAs) [CKS08, BR11, FFGL13]. These guarantees hold even when the auctions are repeatedly executed and the players use no-regret learning algorithms to choose their actions. Unfortunately, off-the-shelf no-regret learning algorithms for these auctions are computationally inefficient as the number of actions available to each player is exponential. We show that this obstacle is insurmountable: there are no polynomial-time no-regret learning algorithms for SiSPAs, unless , even when the bidders are unit-demand. Our lower bound raises the question of how good outcomes polynomially-bounded bidders may discover in such auctions.

To answer this question, we propose a novel concept of learning in auctions, termed “no-envy learning.” This notion is founded upon Walrasian equilibrium, and we show that it is both efficiently implementable and results in approximately optimal welfare, even when the bidders have valuations from the broad class of fractionally subadditive (XOS) valuations (assuming demand oracle access to the valuations) or coverage valuations (even without demand oracles). No-envy learning outcomes are a relaxation of no-regret learning outcomes, which maintain their approximate welfare optimality while endowing them with computational tractability. Our result for XOS valuations can be viewed as the first instantiation of approximate welfare maximization in combinatorial auctions with XOS valuations, where both the designer and the agents are computationally bounded and agents are strategic. Our positive and negative results extend to many other simple auction formats that have been studied in the literature via the smoothness paradigm.

Our positive results for XOS valuations are enabled by a novel Follow-The-Perturbed-Leader algorithm for settings where the number of experts and states of nature are both infinite, and the payoff function of the learner is non-linear. We show that this algorithm has applications outside of auction settings, establishing big gains in a recent application of no-regret learning in security games. Our efficient learning result for coverage valuations is based on a novel use of convex rounding schemes and a reduction to online convex optimization.

1 Introduction

A central challenge in Algorithmic Mechanism Design is to understand the effectiveness and limitations of mechanisms to induce economically efficient outcomes in a computationally efficient manner. A practically relevant and most actively studied setting for performing this investigation is that of combinatorial auctions.

This setting involves a seller with a set of indivisible items, which he wishes to sell to a set of buyers. Each buyer is characterized by a valuation function , assumed monotone, which maps each bundle of items to the buyer’s value for this bundle. This function is known to the buyer, but is unknown to the seller and the other buyers. The seller’s goal is to find a partition of the items together with prices so as to maximize the total welfare resulting from allocating bundle to each buyer and charging him . The total buyer utility from such an allocation would be and the seller’s revenue would be , so the total welfare from such an allocation would simply be .

Given the seller’s uncertainty about the buyer’s valuations, she needs to interact with them to select a good allocation. However, the buyers are strategic, aiming to optimize their own utility, . Hence, the seller needs to design her allocation and price computation rules carefully so that a good allocation is found despite the agents’ strategization in response to these rules. How much of the optimal welfare can the seller guarantee?

A remarkable result in Economics is that welfare can be exactly optimized, as long as we have unbounded computational and communication resources, via the celebrated VCG mechanism [Vic61, Cla71, Gro73]. This mechanism asks bidders to report their valuations, uses their reports at face value to select an optimal partition of the items, and computes payments in a way that it is in the best interest of all bidders to truthfully report their valuations; in particular, it is a dominant strategy truthful mechanism, and because of its truthfulness it guarantees that an optimal allocation is truly selected.

Despite its optimality and truthfulness, the VCG mechanism is overly demanding in terms of both computation and communication. Reporting the whole valuation functions is too expensive for the bidders to do for most interesting types of valuations. Moreover, optimizing welfare exactly with respect to the reported valuations is also difficult in many cases. Unfortunately, if we are only able to do it approximately, the truthfulness of the VCG mechanism disappears, and no welfare guarantees can be made. Even with computational concerns set aside, it is widely acknowledged that the VCG mechanism is rarely used in practice [AM06]. At the same time, many practical scenarios involve the allocation of items through simple mechanisms which are often not centrally designed and non-truthful. Take eBay, for example, where several different items are sold simultaneously and sequentially via ascending price and other types of auctions. Or consider sponsored search where several keywords are auctioned simultaneously and sequentially using generalized second price auctions. For most interesting families of valuations such environments induce non truthful behavior, and are thus difficult to study analytically.

The prevalence of such simple decentralized auction environments provides motivation for a quantitative analysis of the quality of outcomes in simple non-truthful mechanisms. A growing volume of research has taken up this challenge, developing tools for studying the welfare guarantees of non-truthful mechanisms; see e.g. [Bik99, CKS08, BR11, HKMN11, FKL12, ST13, FFGL13]. Using the approximation perspective, this literature bounds the Price-of-Anarchy (PoA) of simple non-truthful mechanisms, and has provided remarkable insights into their economic efficiency.

To illustrate these results, let us consider Simultaneous Second Price Auctions, which we will abbreviate to “SiSPAs” in the remainder of this paper. While we focus our attention on these auctions, our results extend to the most common other forms of auctions studied in the PoA literature; see Section 7 for a discussion. As implied by its name, a SiSPA asks every bidder to bid on each of the items separately and allocates each item using a second price auction based on the bids submitted solely for this item.

Facing a SiSPA, a bidder whose valuation is non-additive is not able to express his complex preferences over bundles of items. It is thus a priori not clear how he will bid, and what the resulting welfare will be. One situation where a prediction can be made is when the bidders have some information about each other, either knowing each other’s valuations, or knowing a distribution from which each others valuations are drawn. In this case, we can study the SiSPA’s Nash or Bayesian Nash equilibrium behavior, computing the welfare in equilibrium. Remarkably, the work on the PoA of mechanisms has shown that the equilibrium welfare of SiSPAs (and of other types of simple auctions) is guaranteed to be within a constant factor of optimum, even when the bidders’ valuations are subadditive [FFGL13].111A subadditive valuation is one satisfying for all . When bidders have no information about each other, the problem becomes ill-posed, as it is impossible for the bidders to form beliefs about each others bids in order to choose their own bid.

A way out of the conundrum comes from the realization that simple mechanisms often occur repeatedly, involving the same set of bidders; think sponsored search. In such a setting it is natural to assume that bidders engage in learning to compute their new bids as a function of their experience so far. One of the most standard types of learning behavior is that of no-regret learning. A bidder’s bids over executions of a SiSPA satisfy the no-regret learning guarantee if the bidder’s cumulative utility over the executions is within an additive of the cumulative utility that the bidder would have achieved from the best in hindsight vector of bids , if he were to place the same bid on item in all executions of the SiSPA. Assuming that bidders use no-regret learning to update their bids in repeated executions of a SiSPA (or other types of simple auctions) the afore-referenced work has shown that the average bidder welfare across the executions is within a constant factor of the otpimal welfare, even when the bidders’ valuations are subadditive [FFGL13].

These guarantees are astounding, especially given the intractability results for dominant strategy truthful mechanisms, which hold even when the bidders have submodular valuations [Dob11, DV12, DV15]—a family of valuations that is smaller than subadditive.222A submodular valuation is one satisfying , for all . However, moving to simple non-truthful auctions does not come without a cost. Cai and Papadimitriou [CP14] have recently established intractability results for computing Bayesian-Nash equilibria in SiSPAs, even for quite simple types of valuations, namely mixtures of additive and unit-demand [CP14].333A unit-demand valuation is one satisfying , for all . At the same time, implementing no-regret learning in combinatorial auctions is quite tricky as the action space of the bidders explodes. For example, in SiSPAs there is a continuum of possible bid vectors that a bidder may submit and, even if we tried to discretize this set, their number would typically be exponential in the number of items in order to maintain a good approximation from the discretization. Unfortunately, no-regret algorithms typically require in every step computation that is linear in the number of available actions, hence in our case exponential in the number of items.

An important open question in the literature has thus been whether this obstacle can be overcome via specialized no-regret algorithms that only need polynomial computation. Our first result shows that this obstacle is insurmountable. We show that in one of the most basic settings where no-regret learning is non-trivial, it cannot be implemented in polynomial-time unless RP NP.

Theorem 1.

Suppose that a unit-demand bidder whose value for each item is participates in executions of a SiSPA. Unless RP NP, there is no learning algorithm running in time polynomial in , , and and whose regret is any polynomial in , , and . The computational hardness holds even when the learner faces i.i.d. samples from a fixed distribution of competing bids, and whether or not no-overbidding is required of the bids produced by the learner.

Note that our theorem proves an intractability result even if pseudo-polynomial dependence on the description of is permitted in the regret bound and the running time. The no-overbidding assumption mentioned in the statement of our theorem represents a collection of conditions under which no-regret learning in second-price auctions gives good welfare guarantees [CKS08, FFGL13]. An example of such no-overbidding condition is this: For each subset , the sum of bids across items in does not exceed the bidder’s value for bundle . Sometimes this condition is only required to hold on average. It will be clear that our hardness easily applies whether or not no-overbidding is imposed on the learner, so we do not dwell on this issue more in this paper.

How can we show the in-existence of computationally efficient no-regret learning algorithms? A crucial (and general) connection that we establish in this paper is that it suffices to prove an inapproximability result for a corresponding offline combinatorial optimization problem. More precisely, we prove Theorem 1 by establishing an inapproximability result for an offline optimization problem related to SiSPAs, together with a “transfer theorem” that transfers in-approximability from the offline problem to intractability for the online problem. The transfer theorem is a generic statement applicable to any online learning setting. In particular, we show the following; see Section 3 for details.

  1. In SiSPAs, finding the best response payoff against a polynomial-size supported distribution of opponent bids is strongly NP-hard to additively approximate for a unit-demand bidder. Another way to say this is that one step of a specific learning algorithm, namely Follow-The-Leader (FTL), is inapproximable. See Theorems 18 and 5.

  2. In any setting where finding an optimum for an explicitly given distribution of functions over some set is hard to additively approximate, no efficient no-regret learner against sequences of functions from exists, unless . This result is generic, saying that whenever one step of FTL is inapproximable, there is no no-regret learner. See Theorem 19.

The intractability result of Theorem 1 casts shadow in the ability of computationally bounded learners to achieve no-regret guarantees in combinatorial auctions where their action space explodes with the number of items and the number of items is large. We have shown this for SiSPAs, but our techniques easily extend to Simultaneous First Price Auctions, and we expect to several other commonly studied mechanisms for which PoA bounds are known. With the absence of efficiently implementable learning algorithms, it is unclear when we should expect computationally bounded bidders to actually converge to approximately efficient outcomes in these auctions.

From a design standpoint it may be interesting to identify conditions for the bidder valuations and the format of the auction under which no-regret learning is both efficiently implementable and leads to approximately optimal outcomes. While this direction is certainly interesting, it would not address the question of what welfare we should expect of SiSPAs and other simple auctions that have been studied in the literature, or how much of the PoA bounds can be salvaged for computationally bounded bidders. Moreover, recent results of Braverman et al. [BMW16] show that for a large class of auction schemes where no-regret algorithms are efficiently computable, no-better than a logarithmic in the number of items welfare guarantee can be achieved (which is achievable by the single-bid auction of [DMSW15]).

We propose an alternative approach to obtaining robust welfare guarantees of simple auctions for computationally bounded players by introducing a new type of learning dynamics, which we call no-envy, and which are founded upon the concept of Walrasian equilibrium. In all our results, no-envy learning outcomes are a super-set of no-regret learning outcomes. We show that this super-set simultaneously achieves two important properties: i) while being a broader set, it still maintains the welfare guarantees of the set of no-regret learning outcomes established via PoA analyses; ii) there exist computationally efficient no-envy learning algorithms; when these algorithms are used by the bidders, their joint behavior converges (in a decentralized manner) to the set of no-envy learning outcomes for a large class of valuations (which includes submodular). Thus no-envy learning provides a way to overcome the computational intractability of no-regret learning in auctions with implicitly given exponential action spaces. We describe our results in the following section. We will focus our attention on SiSPAs but the definition of no-envy learning naturally extends to any mechanism and all our positive results extend to a large class of smooth mechanisms; see Section 7.

CCE No-RegretNo-EnvyNEPoA IntractablePoA TractablePoA CEfor unit-demand
Figure 1: We depict the state of the world for simultaneous second price auctions with XOS bidders. NE denotes the set of Nash equilibria, CE the set of correlated equilibria, CCE the set of coarse correlated equilibria which are equivalent to the limit empirical distributions of no-regret dynamics. Last with No-Envy we denote the limit empirical distributions of no-envy dynamics. PoA refers to the ratio of the optimal welfare over the worst welfare achieved by any solution in each set. Tractability in this figure refers to the existence of polynomial time decentralized algorithms that each player can invoke in the agnostic setting and converge to the no-regret or no-envy condition at a polynomial error rate.

1.1 No-Envy Dynamics: Computation and Welfare.

No-envy dynamics is a twist to no-regret dynamics. Recall that in no-regret dynamics the requirement is that the cumulative utility of the bidder after rounds be within an additive error of the optimum utility he would have achieved had he played the best fixed bid in hindsight. In no-envy dynamics, we require that the bidder’s cumulative utility be within an additive of the optimum utility that he would have achieved if he was allocated the best in hindsight fixed bundle of items in all rounds and paid the price of this bundle in each round. The guarantee is inspired by Walrasian equilibrium: In auctions, the prices that a bidder faces on each bundle of items is determined by the bids of the other bidders. Viewed as a price-taker, the bidder would want to achieve utility at least as large as the one he would have achieved if he purchased his favorite bundle at its price. No-envy dynamics require that the average utility of the bidder across rounds is within of what he would have achieved by purchasing the optimal bundle at its average price in hindsight.

Inspired by Walrasian equilibrium, no-envy learning defines a natural benchmark against which to evaluate an online sequence of bids. It is easy to see that in SiSPAs the no-regret learning requirement is stronger than the no-envy learning requirement. Indeed, the no-envy requirement is implied by the no-regret requirement against a subset of all possible bid vectors, namely those in . So no-envy learning is more permissive than no-regret learning, allowing for a broader set of outcomes. This not true necessarily for other auction formats, but it holds for the types of valuation functions and auctions studied in this paper; (see proof of Lemma 6 and Definition 12). In particular, in all our no-envy learning upper bounds the set of outcomes reachable via no-envy dynamics is always a superset of the outcomes reachable via no-regret dynamics. Moreover, this is true even if the no-envy dynamics are constrained to not overbid.

To summarize, for all types of valuations and auction formats studied in this paper, no-envy learning is a relaxation of no-regret learning, permitting a broader set of outcomes. While no-regret learning outcomes are intractable, we show that this broader set of outcomes is tractable. At the same time, we show that this broader set of outcomes maintains approximate welfare optimality. So we have increased the set of possible outcomes, but maintained their economic efficiency and endowed them with computational efficiency.

We proceed to describe our results for the computational and economic efficiency of no-envy learning. Before proceeding, we should point out that, while in our world the no-envy learning guarantee is a relaxation of the no-regret learning guarantee, the problem of implementing no-envy learning sequences remains similarly challenging. Take SiSPAs, for example. As we have noted no-envy learning is tantamount to requiring the bidder to not have regret against all bid vectors in . This set is exponential in the number of items , so it is unclear how to run an off-the-shelf no-regret learner efficiently. In particular, we are still suffering from the combinatorial explosion in the number of actions, which lead to our lower bound of Theorem 1. Yet the curse of dimensionality is now much more benign. Our upper bounds, discussed next, establish that we can harness big computational savings when we move from competing against any bid vector to competing against bid vectors in . Except to do this we still need to develop new general-purpose, no-regret algorithms for online learning settings where the number of experts is exponentially large and the cost/utility functions are arbitrary.

1.1.1 Efficient No-Envy Learning.

We show that no-envy learning can be efficiently attained for bidders with fractionally subadditive (XOS) valuations. A valuation belongs to this family if for some collection of vectors , where each , it satisfies:

(1)

Note that the XOS class is larger than that of submodular valuations. In many applications, the set describing an XOS valuation may be large. Thus instead of inputting this set explicitly into our algorithms, we will assume that we are given an oracle, which given returns the vector such that . Such an oracle is known as an XOS oracle [DS06, Fei06]. We will also sometimes assume, as it is customary in Walrasian equilibrium, that we are given access to a demand oracle, which given a price vector returns the bundle maximizing . We show the following.

Theorem 2.

Consider a bidder with an XOS valuation participating in a sequence of SiSPAs. Assuming access to a demand and an XOS oracle for ,444For submodular valuations this is equivalent to assuming access only to demand oracles, as XOS oracles can be simulated in polynomial time assuming demand oracles [DNS10] there exists a polynomial-time algorithm for computing the bidder’s bid vector at every time step such that after iterations the bidder’s average utility satisfies:

(2)

where is the average cost of item in the executions of the SiSPA as defined by the bids of the competing bidders, is an upper bound on the competing bid for any item and is an upper bound on . The learning algorithm with the above guarantee also satisfies the no overbidding condition that the sum of bids for any set of items is never larger than the bidder’s value for that set. Moreover, the guarantee holds with no assumption about the behavior of competing bidders. Finally, extensions of this algorithm to other smooth mechanisms are provided in Section 7.

The proof of Theorem 2 is carried out in three steps, of which the first and last are specific to SiSPAs, while the second provides a general-purpose Follow-The-Perturbed-Leader (FTPL) algorithm in online learning settings where the number of experts is exponentially large and the cost/utility functions are arbitrary:

  1. The first ingredient is simple, using the XOS oracle to reduce no-envy learning in SiSPAs to no-regret learning in a related “online buyer’s problem,” where the learner’s actions are not bid vectors but instead what bundle to purchase, prior to seeing the prices; see Definition 6. Theorem 6 provides the reduction from no-envy learning to this problem using XOS oracles. This reduction can also be done (albeit starting from approximate no-envy learning) for several mechanisms that have been analyzed through the smoothness framework of [ST13] as we elaborate in Section 7.

  2. The second step proposes a FTPL algorithm for general online learning problems where the learner chooses some action and the environment chooses some state , from possibly infinite, unstructured sets and , and where the learner’s reward is tied to these choices through some function that need not be linear. Since need not have finite-dimensional representation and need not be linear, we cannot efficiently perturb (either explicitly or implicitly) the cumulative rewards of the elements in as required in each step of FTPL [KV05]; see [BCB12] and its references for an overview of such approaches. Instead of perturbing the cumulative rewards of actions in directly, our proposal is to do this indirectly by augmenting the history that the learner has experienced so far with some randomly chosen fake history, and run Follow-The-Leader (FTL) subject to these augmentations. While it is not a priori clear whether our perturbation approach is a useful one, it is clear that our proposed algorithm only needs an offline optimization oracle to be implemented, as each step is an FTL step after the fake history is added. When applying this algorithm to the online buyer’s problem from Step 1, the required offline optimization oracle will conveniently end up being simply a demand oracle.

    Our proposed general purpose learner is presented in Section 5. The way our learner accesses function is via an optimization oracle, which given a finite multiset of elements from outputs an action in that is optimal against the uniform distribution over the multiset. See Definition 7. In Theorem 8, we bound the regret experienced by our algorithm in terms of ’s stability. Roughly speaking, the goal of our randomized augmentations of the history in each step of our learning algorithm is to smear the output of the optimization oracle applied to the augmented sequence over , allowing us to couple the choices of Be-The-Perturbed-Leader and Follow-The-Pertubed-Leader.

  3. To apply our general purpose algorithm from Theorem 8 to the online buyer’s problem for SiSPAs from Step 1, we need to bound the stability of the bidder’s utility function subject to a good choice of a history augmentation sampler. This is done in Section 5.1. There turns out to be a simple sampler for our application here, where only one price vector is added to the history, whose prices are independently distributed according to an exponential distribution with mean and variance .

  4. While our motivation comes from mechanism design, our FTPL algorithm from Step 2 is general purpose, and we believe it will find applications in other settings. We provide some relevant discussion in Section F, where we show how our algorithm implies regret bounds independent of when is finite, as well as quantitative improvements on the regret bounds of a recent paper of Balacn et al. for security games [BBHP15].

In the absence of demand oracles, we provide positive results for the subclass of XOS called coverage valuations. To explain these valuations, consider a bidder with needs, , associated with values . There are available items, each covering a subset of these needs. So we can view each item as a set of the needs it satisfies. The value that the bidder derives from a set of the items is the total value from the needs that are covered, namely:

(3)
Theorem 3.

Consider a bidder with an explicitly given coverage valuation participating in a sequence of SiSPAs. There exists a polynomial-time algorithm for computing the bidder’s bid vector at every time step such that after iterations the bidder’s utility satisfies:

(4)

where , and are as in Theorem 2, and the algorithm satisfies the same no overbidding condition stated in that theorem. There is no assumption about the behavior of the competing bidders, and extensions of this algorithm to other smooth mechanisms are provided in Section 7.

Notice that our no-envy guarantee (4) in Theorem 3 has incurred a loss of a factor of in front of , compared to the no-envy guarantee (2). This relaxed guarantee is an even broader relaxation of the no-regret guarantee. Still, as we show in the next section this does not affect our approximate welfare guarantees. We prove Theorem 3 via an interesting connection between the online buyer’s problem for coverage valuations and the convex rounding approach for truthful welfare maximization proposed by [DRY11]. In the online buyer’s problem, recall that the buyer needs to decide what set to buy at each step, prior to seeing the prices. It is natural to have the buyer include each item to his set independently, thereby defining an expert for all points , where is the probability that item is included. It turns out that the expected utility of the buyer under such distribution is not necessarily convex, so this choice of experts turns our online learning problem non-convex. Instead we propose to massage each expert into a distribution and run online learning on the massaged experts. In Definition 15 we put forth conditions for the massaging operation under which online learning becomes convex and gives approximate no-regret (Lemma 23). We then instantiate with the Poisson sampling of [DRY11], establishing Theorem 3. Our approach is summarized in Section 6 and the details can be found in Appendix E.

1.1.2 Welfare Maximization.

Arguably one of the holy grails in Algorithmic Mechanism Design, since its inception, has been to obtain polynomial-time mechanisms optimizing welfare in incomplete information settings. We show that SiSPAs achieve constant factor approximation welfare guarantees for the broad class of XOS valuations at every no-envy or approximate no-envy learning outcome. Thus the relaxation from no-regret to no-envy learning does not degrade the quality of the welfare guarantees, and has the added benefit that no-envy outcomes can be attained by computationally bounded players in a decentralized manner, using our results from the previous section. In Section 7, we show that this property applies to a large class of mechanisms that have been analyzed in the literature via the smoothness paradigm [ST13].

Corollary 4.

When each bidder participating in a sequence of SiSPAs has an XOS valuation (endowed with a demand and XOS oracle) or an explicitly given coverage valuation , there exists a polynomial-time computable learning algorithm such that, if each bidder employs this algorithm to compute his bids at each step , then after rounds the average welfare is guaranteed to be at least:

(5)

If all bidders have XOS valuations with demand and XOS oracles the factor in front of OPT is .

We regard Corollary 4, in particular our result for XOS valuations with demand queries, as alleviating the intractability of no-regret learning in simple auctions. It also provides a new perspective to mechanism design, namely mechanism design with no-envy bidders. In doing so, it proposes an answer to the question raised by [FGL15] about whether demand oracles can be exploited for welfare maximization with submodular bidders. We show a positive answer for the bigger class of XOS valuations, albeit with a different solution concept. (It still remains open whether there exist poly-time dominant strategy truthful mechanisms for submodular bidders with demand queries.) We believe that no-envy learning is a fruitful new approach to mechanism design, discussing in Section 7 the meaning of the solution concept outside of SiSPAs.

2 Preliminaries

We analyze the online learning problem that a bidder faces when participating in a sequence of repeated executions of a simultaneous second price auction (SiSPA) with items. While we focus on SiSPAs our results extend to the most commonly studied formats of simple auctions, as discussed in Section 7. A sequence of repeated executions of a SiSPA corresponds to a sequence of repeated executions of a game involving players (bidders). At each execution , each player submits a bid on each item . We denote by the vector of bidder ’s bids at time and by the profile of bids of all players on all items. Given these bids, each item is given to the bidder who bids for it the most and this bidder pays the second highest bid on the item. Ties are broken according to some arbitrary tie-breaking rule. Each player has some fixed (across executions) valuation over bundles of items. If at time he ends up winning a set of items and is asked to pay a price of for each item , then his utility is , i.e. his utility is assumed quasi-linear. An important class of valuations that we will consider in this paper is that of XOS valuations, defined in Equation (1), which are a super-set of submodular valuations but a subset of subadditive valuations. We will also consider the class of coverage valuations, defined in Equation 3, which are a subset of XOS. Different results will consider different types of access to an XOS valuation through an XOS oracle, a demand oracle, or a value oracle, as described in the introduction. For more properties of these oracles see [DNS10].

Online bidding problem.

From the perspective of a single player , all that matters to him to calculate his utility in a SiSPA is the highest bid submitted by the other bidders on each item , as well as the probability that he wins each item if he ties with the highest other bid on that item. For simplicity of notation, we will assume throughout the paper that the player always loses an item when he ties first. All our results, both positive and negative, easily extend to the more general case of arbitrary bid-profile dependent tie-breaking. Since we will analyze learning from the perspective of a single player, we will drop the index of player . For a fixed bid profile of the opponents, we will refer to the highest other bid on item as the threshold of item and denote it with . We also denote with . The player wins an item if he submits a bid and loses the item otherwise. When he wins item , he pays . We are interested in learning algorithms that achieve a no-regret guarantee even when the thresholds of the items are decided as it is customary by an adversary. Thus, the online learning problem that a player faces in a simultaneous second price auction is defined as follows:

Definition 1 (Online bidding problem).

At each execution/day/time/step , the player picks a bid vector and the adversary picks adaptively (based on the history of the player’s past bid vectors but not on the bidder’s current bid vector ) a threshold vector . The player wins the set and gets reward:

(6)

We allow a learning algorithm to be randomized, i.e. submit a random bid vector at each step whose distribution may depend on the history of past threshold vectors. We will evaluate a learning algorithm based on its regret against the best fixed bid vector in hindsight.

Definition 2 (Regret of Learning Algorithm).

The expected average regret of a randomized online learning algorithm against a sequence of threshold vectors is:

(7)

where recall that is random and depends on , as specified by the online learning algorithm. The regret of the algorithm against an adaptive adversary is the maximum regret against any adaptively chosen sequence of threshold vectors. An algorithm has polynomial regret rate if .

3 Hardness of No-Regret Learning

We will show that there does not exist an efficiently computable learning algorithm with polynomial regret rate for the online bidding problem for SiSPAs unless RP NP, proving a proof of Theorem 1. We first examine a related offline optimization problem which we show is -hard to approximate to within a small additive error. We then show how this inapproximability result implies the non-existence of polynomial-time no-regret learning algorithms for SiSPAs unless RP NP. Throughout this section we will consider the following very restricted class of valuations: the player is unit-demand and has a value for getting any item, i.e. his value for any set of items is given by . Our intractability results are strong intractability results in the sense that they hold even if we assume that is provided in the input in unary representation.

Optimal Bidding Against An Explicit Threshold Distribution is Hard.

We consider the following optimization problem:

Definition 3 (Optimal Bidding Problem).

A distribution of threshold vectors over a set of items is given explicitly as a list of vectors, where is assumed to choose a uniformly random vector from the list. A bidder has a unit-demand valuation with the same value for each item, given in unary. The problem asks for a bid vector that maximizes the bidder’s expected utility against distribution . In fact, it only asks to compute the expected value from an optimal bid vector, i.e.

(8)

We show that the optimal bidding problem is -hard via a reduction from -regular set-cover. In fact we show that it is hard to approximate, up to an additive approximation that is inverse-polynomially related to the input size. This will be useful when using the hardness of this problem to deduce the in-existence of efficiently computable learning algorithms with polynomial regret rates.

Theorem 5 (Hardness of Approximately Optimal Bidding).

The optimal bidding problem is -hard to approximate to within an additive even when: the threshold vectors in the support of (the explicitly given distribution) take values in , , and .

An interesting interpretation Theorem 5 is that the Follow-The-Leader (FTL) algorithm is intractable in SiSPAs for unit-demand bidders. Indeed, every step of FTL needs to find a bid vector that is a best response to the empirical distribution of the threshold vectors that have been encountered so far. See Theorem 18 in Appendix A and the discussion around this theorem.

Efficient No-Regret implies Poly-time Approximately Optimal Bidding.

Given the hardness of optimal bidding in SiSPAs, we are ready to sketch the proof of our main impossibility result (Theorem 1) for online bidding in SiSPAs. Our result holds even if the possible threshold vectors that the bidder may see take values in some known discrete finite set. It also holds even if we weaken the regret requirements of the online bidding problem, only requiring that the player achieves no-regret with respect to bids of the form , i.e., the bid on each item is either or an -th faction of the player’s value. Notice that any such bid is a non-overbidding bid. Hence, the no-regret requirement that we impose is weaker than achieving no-regret against any fixed bid/any fixed no-overbidding bid. We will refer to the afore-described weaker learning task as the simplified online bidding problem. We sketch here how to deduce from the inapproximability of optimal bidding the impossibility of polynomial-time no-regret learning (even for the simplified online bidding problem), deferring full details to Appendix A.2.

Proof sketch of Theorem 1. We present the structure of our proof and the challenges that arise, leaving details for Section A.2. Consider a hard distribution for the optimal bidding problem from Theorem 5, and let be the bid vector that optimizes the expected utility of the bidder when a threshold vector is drawn from . Also, let be the corresponding optimal expected utility. (Theorem 5 says that approximating is NP-hard.) Now let us draw i.i.d. samples from . Clearly, if is large enough, then, with high probability, the expected utility of against the uniform distribution over is approximately equal to .

Now let us present the sequence to a no-regret learning algorithm. The learning algorithm is potentially randomized so let us call the expected average utility (over the randomness in the algorithm and keeping sequence fixed) that the algorithm achieves when facing the sequence of threshold vectors . If the regret of the algorithm is , this means that . In particular, if scales polynomially with then, for large enough , is lower bounded by (minus some small error), and hence by (minus some small error). Hence, (plus some small error) provides an upper bound to . Moreover, if we run our no-regret learning algorithm a large enough number of times against the same sequence of threshold vectors and average the average utility achieved by the algorithm in these executions, we can get a very good estimate of , and hence a very good upper bound for , with high probability. The paragraph “Upper Bound” in Appendix A.2 gives the details of this part.

The challenge that we need to overcome now is that, in principle, the expected average utility of our no-regret learner against sequence could be much larger than and hence and , as the algorithm is allowed to change its bid vector in every step. We need to argue that this cannot happen. In particular, we would like to upper bound by . We do this via a Martingale argument exploiting the randomness in the choice of the sequence . Using Azuma’s inequality, we show that for large enough , the is upper bounded by plus some small error with high probability. In fact we show something stronger: if is large enough then, with high probability, plus some small error upper bounds the algorithm’s average utility (not just average expected utility), where now both the threshold and the bid vectors are left random. Hence, we can argue that, with high probability, if we run our algorithm times over a (long enough) sequence of random threshold vectors and we compute the average (across the executions) of the average (across the steps) utility of our algorithm, then this double average is upper bounded by plus some small error. Hence, we get a lower bound on . (One execution would indeed suffice, but we need to argue about the average across executions given the way we obtain our upper bound in the previous paragraph.) The paragraph “Lower Bound” in Appendix A.2 gives the details of this part.

Overall, if we choose and large enough polynomials in the description of the hard instance of the optimal bidding problem from Theorem 5, then all approximation errors can be made arbitrary inverse polynomials, providing any desired (inverse polynomial) approximation to the optimal utility against distribution , with high probability. Since getting an inverse polynomial approximation is an -hard problem, this implies that there cannot exist a polynomial-time no-regret learning algorithm with polynomial regret rate, unless .      

4 Walrasian Equilibria and No-Envy Learning in Auctions

The hardness of no-regret learning in simultaneous auctions motivates the investigation of other notions of learning that have rational foundations and at the same time admit efficient implementations. Our inspiration in this paper comes from the study of markets and the well-studied notion of Walrasian equilibrium. Recall that an allocation of items to buyers together with a price on each item constitutes a Walrasian equilibrium if no buyer envies some other allocation at the current prices. That is the bundle allocated to each buyer maximizes the difference of his value for the bundle minus the cost of the bundle. Implicitly the Walrasian equilibrium postulates some degree of rationality on the buyers: given the prices of the items, each buyer wants a bundle of items such that he has no-envy against getting any other bundle at the current prices.

We adapt this no-envy requirement to SiSPAs (and other mechanisms in Section 7). In a SiSPA a player is facing a set of prices on the items, which are determined by the bids of the other players and are hence unknown to him when he is choosing his bid vector. In a sequence of repeated executions of a SiSPA, the player needs to choose a bid vector at every time-step. The fact that he does not know the realizations of the item prices when making his choice turns the problem into a learning problem. We will say that the sequence of actions that he took satisfies the no-envy guarantee, if in the long run he does not regret not buying any fixed set at its average price.

Definition 4 (No-Envy Learning Algorithm).

An algorithm for the online bidding problem is a no-envy algorithm if, for any adaptively chosen sequence of threshold vectors by an adversary, the bid vectors chosen by the algorithm satisfy:

(9)

where and . It has polynomial envy rate if .

To allow for even larger classes of settings to have efficiently computable no-envy learning outcomes, we will also define a relaxed notion of no-envy. In this notion the player is guaranteed that his utility is at least some -fraction of his value for any set , less the average price of that set. The latter is a more reasonable relaxation in the online learning setting given that, unlike in a market setting, the players do not know the realization of the prices when they make their decision.

Definition 5 (Approximate No-Envy Learning Algorithm).

An algorithm for the online bidding problem is an -approximate no-envy algorithm if, for any adaptively chosen sequence of threshold vectors by an adversary, the bid vectors chosen by the algorithm satisfy:

(10)

To gain some intuition about the difference between no-envy and no-regret learning guarantees consider the following. When we compute the utility from a fixed bid vector in hindsight, then in every iteration the set of items that the player would have won is nicely correlated with that round’s threshold vector in the sense that the player wins an item in that round only when the item’s threshold is low. On the contrary, when evaluating the player’s utility had he won a specific set of items in all rounds the player may win and pay for an item even when the price of the item is high. The results of this section imply that for XOS valuations, the no-regret condition is stronger than the no-envy condition. Hence, when we analyze no-envy learning algorithms for XOS bidders we relax the algorithm’s benchmark. Correspondingly, if the bidders of a SiSPA are XOS and use no-envy learning algorithms to update their bid vectors, the set of outcomes that they may converge to is broader than the set of no-regret outcomes. So, in comparison to no-regret learning outcomes, our positive results in this section pertain to a broader set of outcomes, endowing them with computational tractability and as we will see also approximate welfare optimality.

Roadmap.

In the rest of this section we reduce the no-envy learning problem to a related online learning problem, which we call the online buyer’s problem. We show that achieving no-envy in the online bidding problem can be reduced to achieving no-regret in the online buyer’s problem. Similarly, achieving -approximate no-envy can be reduced to some form of approximate no-regret. Lastly we show that no-envy learning implies good welfare: if all players in the simultaneous second-price auction game follow a no-envy learning algorithm then the average welfare of the selected allocations is approximately optimal. In subsequent sections we will provide efficiently computable no-envy or approximate no-envy algorithms for the online buyer’s problem. Finally, our positive results extend to the most commonly studied mechanisms through the smoothness framework, as we elaborate in Section 7.

4.1 Online Buyer’s Problem

We first show that we can reduce the no-envy learning problem to a related online learning problem, which we call the online buyer’s problem.

Definition 6 (Online buyer’s problem).

Imagine a buyer with some valuation over a set of items who is asked to request a subset of the items to buy each day before seeing their prices. In particular, at each time-step an adversary picks a set of thresholds/prices for each item adaptively based on the past actions of the buyer. Without observing the thresholds at step , the buyer picks a set of items to buy. His instantaneous reward is:

(11)

i.e., the buyer receives the set and pays the price for each item in the set.

For simplicity, we overload notation and denote by the reward in the online bidding problem from a bid vector and with the reward in the online buyer’s problem from a set . We relate the online buyer’s problem to the online bidding problem in SiSPAs in a black-box way, by showing that when the valuations are XOS, then any algorithm which achieves no-regret or “approximate” no-regret for the online buyer’s problem can be turned in a black-box and efficient manner into a no-envy algorithm for the online bidding problem, assuming access to an XOS oracle.

Lemma 6 (From buyer to bidder).

Suppose that we are given access to an efficient learning algorithm for the online buyer’s problem which guarantees for any adaptive adversary:

(12)

where . Then we can construct an efficient -approximate no-envy algorithm for the online bidding problem, assuming access to XOS value oracles. Moreover, this algorithm never submits an overbidding bid.

A trivial example: efficient no-envy for -capacitated XOS.

Consider a buyer with a -capacitated XOS valuation, i.e. the valuation is XOS and for any set : . If , then it suffices for the buyer to achieve no-regret against sets of size , which are . This is polynomial if . Thus we can simply invoke any off-the-shelf no-regret learning algorithm, such as multiplicative weight updates [ACBFS95], where each set of is treated as an expert, and apply it to the online buyer’s problem. This would be efficiently computable and would lead to a regret rate of . By Lemma 6, we then get an efficiently computable exact no-envy algorithm with the same envy rate.

The challenge addressed by our paper is to remove the bound on , which we address in the next sections.

4.2 No-Envy Implies Approximately Optimal Welfare

We conclude by showing that if all players in a SiSPA use an -approximate no-envy learning algorithm, then the average welfare is a -approximation to the optimal welfare, less an additive error term stemming from the envy of the players. In other words the price of anarchy of -approximate no-envy dynamics is upper bounded by .

Theorem 7.

If players participating in repeated executions of a SiSPA use an -approximate no-envy learning algorithm with envy rate and which does not overbid, then in executions of the SiSPA the average bidder welfare is at least , where Opt is the optimal welfare for the input valuation profile .

5 Online Learning with Oracles

In this section we devise novel follow-the-perturbed leader style algorithms for general online learning problems. We then apply these algorithms and their analysis to get no-envy learning algorithms (Section 5.1) for the online bidding problem. In Section F we instantiate our analysis to learning problems where the adversary can only pick one among finitely many parameters and give implications of this setting to no-regret learning algorithms (Section F) for the online bidding problem, with a finite number of possible thereshold vectors. In Section F.2, we also give implications to security games [BBHP15].

Consider an online learning problem where at each time-step an adversary picks a parameter and the algorithm picks an action . The algorithm receives a reward: , which could be positive or negative. We will assume that the rewards are uniformly bounded by some function of the parameter , for any action , i.e.: . We will denote with a sequence of parameters . Moreover, we denote with: , the cumulative utility of a fixed action for a sequence of choices of the adversary.

Definition 7 (Optimization oracle).

We will consider the case where we are given oracle access to the following optimization problem: given a sequence of parameters compute some optimal action for this sequence:

(13)

We define a new type of perturbed leader algorithms where the perturbation is introduced in the form of extra samples of parameters:

Algorithm 1 (Follow the perturbed leader with sample perturbations).

At each time-step :

  1. Draw a random sequence of parameters independently and based on some time-independent distribution over sequences. Both the length of the sequence and the parameter at each iteration of the sequence can be random.

  2. Denote with the augmented sequence of parameters where we append the extra parameter samples at the beginning of sequence

  3. Invoke oracle and play action:

    (14)

Using a reduction of [HP05] (see their Lemma 12) we can show that to bound the regret of Algorithm 1 against adaptive adversaries it suffices to bound the regret against oblivious adversaries (who pick the sequence non-adaptively), of the following algorithm, which only draws the samples once ahead of time (see Appendix C.1). In subsequent sections, we analyze this algorithm and setting.

Algorithm 2 (Follow the perturbed leader with fixed sample perturbations).

Draw a random sequence of parameters based on some distribution over sequences and at the beginning of time. At each time-step , invoke oracle and play action: .

Perturbed Leader Regret Analysis.

We give a general theorem on the regret of a perturbed leader algorithm with sample perturbations. In the sections that follow we will give instances of this analysis in two online learning settings related to no-envy and no-regret dynamics in our bidding problem and provide concrete regret bounds.

Theorem 8.

Suppose that the distribution over sample sequences , satisfies the stability property that for any sequence of parameters and for any :

(15)

Then the expected regret of Algorithm 2 against oblivious adversaries is upper bounded by:

(16)

Hence, the regret of Algorithm 1 against adaptive adversaries is bounded by the same amount.

5.1 Efficient No-Envy Learning with Demand Oracles

We will apply the perturbed leader approach to the online buyer’s problem we defined in Section 4.1. Then using Lemma 6 we can turn any such algorithm to a no-envy learning algorithm for the original bidding problem in second price auctions, when the valuations fall into the XOS class.

In the online buyer’s problem the action space is the collection of sets , while the parameter set of the adversary is to pick a threshold for each item , i.e. . The reward , at each round from picking a set , if the adversary picks a vector is given by Equation (11). We will instantiate Algorithm 2 for this problem and apply the generic approach of the previous section. We will specify the exact distribution over sample sequences that we will use and we will bound the functions , and . First, observe that the reward is bounded by a function of the threshold vector: , where is an upper bound on the valuation function, i.e. .

Optimization oracle.

It is easy to see that the offline problem for a sequence of parameters is exactly a demand oracle, where the price on each item is its average threshold in hindsight.

Single-sample exponential perturbation.

We will use the following sample perturbation: we will only add one sample , where the coordinate of the sample is distributed independently and according to an exponential distribution with parameter , i.e. for any the density of at is , while it is for .

The most important part of the analysis is proving a stability bound for our algorithm. We provide such a proof in Appendix D.1. Given the stability bound we then apply Theorem 8 to get a bound for Algorithm 2 with a single sample exponential perturbation.

Theorem 9.

Algorithm 2 when applied to the online buyers problem with a single-sample exponential perturbation with parameter , where is the maximum threshold that the adversary can pick and is the maximum value, runs in randomized polynomial time, assuming a demand oracle and achieves regret:

Theorem 9, Lemma 6 and the reduction form oblivious to adaptive adversaries, imply a polynomial time no-envy algorithm for the online bidding problem assuming access to demand and XOS oracles. If valuations are submodular, then XOS oracles can be simulated in polynomial time via demand oracles [DNS10], thereby only requiring access to demand oracles. Thus we get Theorem 2.

6 Efficient No-Envy Learning via Convex Rounding

In this section we show how to design efficient approximate no-envy learning algorithms via the use of the convex rounding technique, which has been used in approximation algorithms and in truthful mechanism design, and via online convex optimization applied to an appropriately defined online learning problem in a relaxed convex space. Though our techniques can be phrased more generally, throughout the section we will mostly cope with the concrete case where the valuation of the player is an explicitly given coverage valuation. These valuations have been well-studied in combinatorial auctions [DRY11] and are a subset of submodular valuations. Answering value and XOS queries for such valuations can be done in polynomial time [DRY11, DNS10].

Definition 8 (Coverage valuation).

A coverage valuation is given via the means of a vertex-weighted hyper-graph . Each item corresponds to a hyper-edge. Each vertex has a weight . The value of the player for a set is the sum of the vertices of the hyper-graph, that is contained in the union of the hyper-edges corresponding to the items in .

Proving Theorem 3.

Based on Lemma 6, in order to design an -approximate no-envy algorithm for the online bidding problem, it suffices to design an efficient algorithm for the online buyer’s problem with guarantees as described in Lemma 6. In the remainder of the section we will design such an algorithm for the online buyer’s problem with and for explicit coverage valuations, thereby proving Theorem 3. Subsequently, by Theorem 7 the latter will imply a price of anarchy guarantee of for such dynamics. The only missing piece in the proof of Theorem 3 is the following lemma, whose full proof we defer to Appendix E.

Lemma 10.

If the bidder’s valuation is an explicitly given coverage valuation, there exists a polynomial-time computable learning algorithm for the online buyer’s problem that guarantees, for any adaptively chosen sequence of thresholds with $\theta^t \in [0,H]^m$:

$\mathbb{E}\left[\sum_{t=1}^{T} u(S_t, \theta^t)\right] \;\ge\; \max_{S \subseteq [m]} \sum_{t=1}^{T} \Big( \big(1-\tfrac{1}{e}\big)\, v(S) - \sum_{j \in S} \theta^t_j \Big) \;-\; O\big(\mathrm{poly}(m, H, V)\cdot\sqrt{T}\big)$   (17)

Proof sketch. Suppose that the buyer picks a set $S_t$ at each iteration at random from a distribution where each item $j$ is included in the set independently with probability $x_j$. Then for any vector $\theta$, the expected utility of the buyer from such a choice is $F(x) - \langle \theta, x \rangle$, where $F$ is the multi-linear extension of $v$ and $\langle \theta, x \rangle$ is the inner product between the vectors $\theta$ and $x$. If $F$ were concave, we could invoke online convex optimization algorithms, such as the projected gradient descent of [Zin03], and get an $O(\sqrt{T})$ regret bound, which would imply a regret bound for the buyer’s problem. However, $F$ is not concave for most valuation classes. We will instead use a convex rounding scheme, which is a mapping $r$ from any vector $x \in [0,1]^m$ to a distribution $r(x)$ over sets such that the expected value $F(r(x)) = \mathbb{E}_{S \sim r(x)}[v(S)]$ is a concave function of $x$. We also require that the marginal probability of each item under $r(x)$ be at most the original probability $x_j$ of that item in $x$. If the rounding scheme satisfies, for any integral $x$ associated with a set $S$, $F(r(x)) \ge \left(1-\frac{1}{e}\right) v(S)$, then we can call an online convex optimization algorithm on the concave functions $G_t(x) = F(r(x)) - \langle \theta^t, x \rangle$. Then we show that this yields a $\left(1-\frac{1}{e}\right)$-approximate no-envy algorithm for the online buyer’s problem. ∎
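The online convex optimization subroutine invoked in the sketch is just Zinkevich-style projected gradient ascent over the cube. A minimal sketch follows, where grad_G is assumed to return a (super)gradient of the concave round reward $G_t(x) = F(r(x)) - \langle \theta^t, x \rangle$:

    import numpy as np

    def projected_gradient_ascent(grad_G, m, T, eta):
        # Online gradient ascent on [0,1]^m [Zin03]: take a gradient step on
        # the reward of the round just played and project back onto the cube
        # by coordinate-wise clipping.
        xs = []
        x = np.full(m, 0.5)
        for t in range(T):
            xs.append(x.copy())            # x_t is played; items sampled from r(x_t)
            x = np.clip(x + eta * grad_G(t, x), 0.0, 1.0)
        return xs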

7 No-Envy Learning for General Mechanisms

In this section we generalize our approach to most smooth mechanisms [ST13] that have been analyzed in the literature. For ease of exposition we only focus on mechanisms for combinatorial auction settings, even though the approach could be employed for more general mechanism design settings.

A general mechanism for a combinatorial auction setting is defined via an action space $A_i$ available to each player $i$, an allocation function, which maps each action profile $a = (a_1, \ldots, a_n)$ to a feasible partition of the items among the players, and a payment function, which maps an action profile to a payment for each player. We denote by $X_i(a)$ and $P_i(a)$ the allocation and payment of player $i$. These functions could also output randomized allocations and payments, but for simplicity of notation we restrict attention to deterministic mechanisms.

First and foremost, we need to generalize the definition of no-envy from simultaneous second price auctions to general mechanisms. To achieve this we need to define the analogue of a threshold vector for a general mechanism. We define the notion of a threshold payment for a player $i$ and a set $S$, which coincides with the sum of the item thresholds $\sum_{j \in S} \theta_j$ in the case of a simultaneous second price auction.

Definition 9 (Threshold Payment).

Given a set $S$ and a profile $a_{-i}$ of opponent actions, the threshold payment of player $i$ for set $S$ is the minimum payment he needs to make to win the set $S$, i.e.:

$\mathcal{T}_i(S, a_{-i}) \;=\; \inf_{a_i \,:\, S \subseteq X_i(a_i, a_{-i})} P_i(a_i, a_{-i})$   (18)

The threshold function is additive if:

$\mathcal{T}_i(S, a_{-i}) \;=\; \sum_{j \in S} \tau_j(a_{-i})$   (19)

for some item-specific functions $\tau_j$, derived from the auction rules.
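As a concrete instance, in a simultaneous second price auction the item threshold is the highest opponent bid, so the threshold payment is additive (a sketch; the bid layout is an illustrative assumption):

    def sispa_threshold_payment(S, opponent_bids):
        # opponent_bids[j] lists the other players' bids on item j; the item
        # threshold tau_j is their maximum, and Eq. (19) sums it over S.
        return sum(max(opponent_bids[j]) for j in S)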

The average threshold payment for a set $S$ takes the role of the average price of the set in a repeated learning environment. Thus we can analogously define a no-envy learning algorithm for any repeated mechanism setting, where the mechanism is repeated over time among the same players for $T$ iterations and at each iteration $t$ each player $i$ picks an action $a_i^t$.

Definition 10 (No-Envy Learning for General Mechanisms).

An algorithm for a repeated mechanism setting is an $\alpha$-approximate no-envy algorithm if for any adaptively and adversarially chosen sequence of opponent actions $a_{-i}^1, \ldots, a_{-i}^T$:

$\frac{1}{T}\,\mathbb{E}\left[\sum_{t=1}^{T} u_i(a_i^t, a_{-i}^t)\right] \;\ge\; \max_{S \subseteq [m]} \left( \alpha\, v_i(S) - \frac{1}{T}\sum_{t=1}^{T} \mathcal{T}_i(S, a_{-i}^t) \right) - \epsilon(T)$   (20)

where $u_i(a) = v_i(X_i(a)) - P_i(a)$. It has polynomial envy rate if $\epsilon(T) \le \mathrm{poly}(n, m, H, V)/\sqrt{T}$.
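To make Definition 10 concrete in the additive-threshold case, one can certify (approximate) no-envy empirically. A brute-force sketch, where v, avg_thresholds and avg_utility are hypothetical inputs summarizing a play history:

    from itertools import chain, combinations

    def empirical_envy(v, avg_thresholds, avg_utility, alpha=1.0):
        # Envy after T rounds under additive thresholds: the gap between the
        # best hindsight bundle, max_S [alpha*v(S) - avg price of S], and the
        # realized average utility.  Enumeration is exponential in the number
        # of items, so this is a test harness, not the learning algorithm.
        m = len(avg_thresholds)
        bundles = chain.from_iterable(combinations(range(m), k) for k in range(m + 1))
        best = max(alpha * v(set(S)) - sum(avg_thresholds[j] for j in S)
                   for S in bundles)
        return best - avg_utility

A return value of at most $\epsilon(T)$ is exactly the guarantee in Eq. (20).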

Sufficient conditions on the mechanism.

We now give conditions on the mechanism such that it admits efficient no-envy learning dynamics and such that any approximate no-envy outcome is also approximately efficient. Our conditions can be viewed as a stronger version of the smooth mechanism definition of Syrgkanis and Tardos [ST13], as well as a generalization of the value and revenue covering formulation of Hartline et al. [HHT14].

We begin by reminding the reader of the definition of a smooth mechanism [ST13] specialized to a combinatorial auction setting.

Definition 11 ([ST13]).

A mechanism is $(\lambda,\mu)$-smooth if for any valuation profile and any action profile $a$, there exists for each player $i$ an action $a_i^*$, such that:

$\sum_{i} u_i(a_i^*, a_{-i}) \;\ge\; \lambda \cdot \mathrm{Opt} - \mu \cdot R(a)$   (21)

where $R(a)$ is the revenue of the auctioneer under action profile $a$ and $\mathrm{Opt}$ is the optimal welfare.

To apply our approach we will refine the smoothness definition and require a stronger “smoothness” property, albeit one that holds for almost all mechanisms that have been analyzed via the smooth mechanism framework. Our stronger smoothness notion is more in line with the revenue and value covering framework of [HHT14] and can be thought of as an ex-post version of that framework. However, unlike the approach in [HHT14], our definition applies to general multi-dimensional mechanism design environments.

We follow the terminology of revenue and value covering from [HHT14]. Our definition is stronger than the smooth mechanism definition in two ways. First, it requires a deviation inequality for each individual player, rather than in aggregate across players. Second, it requires a smoothness inequality not only for the optimal allocation but for every possible allocation. In that respect it is closer to the solution-based smoothness of [LST16] and to the original definition of smooth games of [Rou09]. All of these strengthenings seem essential for our approach to designing no-envy dynamics to work.

Now we are ready to present the definitions of ex-post value and threshold covering, which are a stronger version of the smoothness definition.

Definition 12 (Ex-post $\lambda$-value covered).

A mechanism is ex-post $\lambda$-value covered if for any feasible allocation profile $S = (S_1, \ldots, S_n)$, there exists for each player $i$ an action $a_i^*$ such that for any action profile $a$:

$u_i(a_i^*, a_{-i}) \;\ge\; \lambda\, v_i(S_i) - \mathcal{T}_i(S_i, a_{-i})$   (22)
Definition 13 (Ex-post $(\mu,\beta)$-threshold covered).

A mechanism is ex-post $(\mu,\beta)$-threshold covered if for any action profile $a$ and allocation profile $S = (S_1, \ldots, S_n)$:

$\sum_{i} \mathcal{T}_i(S_i, a_{-i}) \;\le\; \mu \cdot R(a) + \beta \cdot W(a)$   (23)

where $W(a) = \sum_i v_i(X_i(a))$ is the welfare of action profile $a$.

It is easy to see that if a mechanism is $\lambda$-value covered and $(\mu, 0)$-threshold covered, then it is $(\lambda,\mu)$-smooth according to [ST13]. We add the extra welfare term $\beta \cdot W(a)$ to also enable the analysis of second price auctions. This term is related to the weakly smooth mechanisms of [ST13].

No-envy learning and welfare.

Now we are ready to give the generalizations of our main theorems to general mechanisms. First we argue that if a mechanism is $(\mu,\beta)$-threshold covered and players use no-envy learning, then the average welfare is approximately optimal. The proof of this theorem follows along very similar lines as the proof of Theorem 7, and hence we omit it.

Theorem 11 (No-Envy Welfare for General Mechanisms).

If a mechanism is ex-post $(\mu,\beta)$-threshold covered and each player invokes an $\alpha$-approximate no-envy algorithm with envy rate $\epsilon(T)$, then after $T$ iterations the average welfare in the auction is at least $\frac{\alpha \cdot \mathrm{Opt} - n\,\epsilon(T)}{\beta + \max(1,\mu)}$, where $\mathrm{Opt}$ is the optimal welfare for the input valuation profile $v$.
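For intuition, the argument mirrors Theorem 7 (a sketch under the notation above, not the omitted formal proof): sum each player’s no-envy guarantee (20) at the optimal allocation $S^*$ and then apply threshold covering (23):

    \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\Big[\sum_i u_i(a^t)\Big]
      \;\ge\; \alpha \sum_i v_i(S_i^*)
        - \frac{1}{T}\sum_{t=1}^{T}\sum_i \mathcal{T}_i(S_i^*, a_{-i}^t)
        - n\,\epsilon(T)
      \;\ge\; \alpha\,\mathrm{Opt}
        - \frac{1}{T}\sum_{t=1}^{T}\big(\mu\, R(a^t) + \beta\, W(a^t)\big)
        - n\,\epsilon(T).

Since welfare equals utility plus revenue and $R(a) \le W(a)$, rearranging yields the stated bound of $\frac{\alpha\,\mathrm{Opt} - n\,\epsilon(T)}{\beta + \max(1,\mu)}$.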

Efficient no-envy algorithms.

Next we argue that if a mechanism is $\lambda$-value covered, then the existence of an efficient no-envy learning algorithm reduces to the existence of an efficient no-regret algorithm for the natural generalization of the online buyer’s problem.

Definition 14 (Online buyer’s problem for general mechanisms).

A buyer with some valuation $v$ over a set of items wants to decide on each day which items to buy. At each time-step $t$ an adversary picks an opponent action profile $a_{-i}^t$, adaptively based on the past actions of the buyer. Without observing $a_{-i}^t$, at step $t$ the buyer picks a set $S_t$ to buy. His reward is:

$u(S_t, a_{-i}^t) \;=\; v(S_t) - \mathcal{T}_i(S_t, a_{-i}^t)$   (24)

i.e., the buyer receives the set and pays the threshold price for the set.

Lemma 12 (From buyer to bidder in general mechanisms).

Suppose that the mechanism is ex-post $\lambda$-value covered and that we are given access to an efficient learning algorithm for the online buyer’s problem which guarantees, for any adaptive adversary:

$\mathbb{E}\left[\sum_{t=1}^{T} u(S_t, a_{-i}^t)\right] \;\ge\; \max_{S \subseteq [m]} \sum_{t=1}^{T} \Big( \rho\, v(S) - \mathcal{T}_i(S, a_{-i}^t) \Big) - T\,\epsilon(T)$   (25)

Then we can construct an efficient $(\lambda\rho)$-approximate no-envy algorithm for the online bidding problem, assuming access to XOS value oracles.

The proof of the latter lemma follows along very similar lines as the proof of Lemma 6, hence we omit it.

Lastly, it is easy to see that when the threshold functions are additive, the online buyer’s problem for general mechanisms is exactly the online buyer’s problem for the simultaneous second price auction mechanism. Thus our results in the main sections of the paper provide an efficient algorithm for the online buyer’s problem with $\rho = 1$ for XOS valuations, assuming access to a demand and an XOS oracle, and with $\rho = 1 - \frac{1}{e}$ for coverage valuations, assuming access to a value oracle.

Theorem 13.

Consider a bidder with an XOS valuation $v$ participating in a $\lambda$-value covered mechanism with additive threshold functions. Assuming access to a demand and an XOS oracle for $v$, there exists a polynomial-time algorithm for computing the bidder’s action at every time step $t$ such that after $T$ iterations the bidder’s average utility satisfies:

$\frac{1}{T}\,\mathbb{E}\left[\sum_{t=1}^{T} u_i(a_i^t, a_{-i}^t)\right] \;\ge\; \max_{S \subseteq [m]} \left( \lambda\, v(S) - \frac{1}{T}\sum_{t=1}^{T}\mathcal{T}_i(S, a_{-i}^t) \right) - O\!\left(\frac{\mathrm{poly}(m,H,V)}{\sqrt{T}}\right)$   (26)

where $H$ is an upper bound on the threshold function of any item and $V$ is an upper bound on $v([m])$. The guarantee holds with no assumption about the behavior of competing bidders.

Theorem 14.

Consider a bidder with an explicitly given coverage valuation $v$ participating in a $\lambda$-value covered mechanism with additive threshold functions. There exists a polynomial-time algorithm for computing the bidder’s action at every time step $t$ such that after $T$ iterations the bidder’s average utility satisfies:

$\frac{1}{T}\,\mathbb{E}\left[\sum_{t=1}^{T} u_i(a_i^t, a_{-i}^t)\right] \;\ge\; \max_{S \subseteq [m]} \left( \left(1-\tfrac{1}{e}\right)\lambda\, v(S) - \frac{1}{T}\sum_{t=1}^{T}\mathcal{T}_i(S, a_{-i}^t) \right) - O\!\left(\frac{\mathrm{poly}(m,H,V)}{\sqrt{T}}\right)$   (27)

where $H$ and $V$ are as in Theorem 13. There is no assumption about the behavior of the competing bidders.

Main result for general mechanisms.

Combining the aforementioned discussion and analysis we can draw the following main conclusion of this section.

Corollary 15.

When each bidder participating in a sequence of ex-post $\lambda$-value covered and $(\mu,\beta)$-threshold covered mechanisms with additive threshold functions has an XOS valuation (endowed with a demand and an XOS oracle) or an explicitly given coverage valuation, there exists a polynomial-time computable learning algorithm such that, if each bidder employs this algorithm to compute his action at each step $t$, then after $T$ rounds the average welfare is guaranteed to be at least:

$\frac{\left(1-\frac{1}{e}\right)\lambda}{\beta + \max(1,\mu)}\,\mathrm{Opt} \;-\; O\!\left(\frac{n\cdot\mathrm{poly}(m,H,V)}{\sqrt{T}}\right)$

If all bidders have XOS valuations with demand and XOS oracles, the factor in front of $\mathrm{Opt}$ improves to $\frac{\lambda}{\beta + \max(1,\mu)}$.

We provide below three example applications of the latter theorems:

Application: Simultaneous Second Price Auctions.

Revisiting simultaneous second price auctions, it is easy to see that the mechanism is $1$-value covered and $(0,1)$-threshold covered when players’ actions are restricted to no-overbidding actions and valuations are XOS. The value covering follows from the fact that for any set $S_i$ we can use as action $a_i^*$ the bid vector that corresponds to the additive valuation returned by the XOS oracle for set $S_i$ (see proof of Lemma 6). As is shown in the proof of Lemma 6, this action satisfies, for any opponents’ action vector:

$u_i(a_i^*, a_{-i}) \;\ge\; v_i(S_i) - \sum_{j \in S_i} \theta_j(a_{-i}) \;=\; v_i(S_i) - \mathcal{T}_i(S_i, a_{-i})$   (28)

which is exactly the $1$-value covering inequality. The $(0,1)$-threshold covering inequality follows from the fact that for any feasible allocation $S$, since players do not overbid:

$\sum_i \mathcal{T}_i(S_i, a_{-i}) \;=\; \sum_i \sum_{j \in S_i} \max_{i' \neq i} b_{i'j} \;\le\; \sum_{j} \max_{i'} b_{i'j} \;\le\; W(a)$   (29)

Thus we can apply the general theorems of this section with $\lambda = 1$, $\mu = 0$ and $\beta = 1$ to recover the main theorems that we derived for SiSPAs in the previous sections.

Application: Simultaneous First Price Auctions.

In a simultaneous first price auction, on each item the bidder pays his own bid conditional on winning, rather than the second highest bid. Based on the proof of [ST13] that the simultaneous first price auction is $\left(1-\frac{1}{e}, 1\right)$-smooth, it is easy to see that the mechanism is actually ex-post $\left(1-\frac{1}{e}\right)$-value covered and $(1,0)$-threshold covered. Thus we can apply the latter theorems with $\lambda = 1-\frac{1}{e}$, $\mu = 1$ and $\beta = 0$.

Application: Simultaneous All-Pay Auctions.

In a simultaneous all-pay auction, on each item the bidder pays his bid whether or not he wins. Based on the proof of [ST13] that the simultaneous all-pay auction is $\left(\frac{1}{2}, 1\right)$-smooth, it is easy to see that the mechanism is actually ex-post $\frac{1}{2}$-value covered and $(1,0)$-threshold covered. Thus we can apply the latter theorems with $\lambda = \frac{1}{2}$, $\mu = 1$ and $\beta = 0$.

Beyond additive threshold functions and combinatorial auctions.

Finding efficient algorithms for the online buyer’s problem with general threshold functions is an interesting open problem that we defer to future work. Such an extension would enable the application of the approach presented in this paper to other value and threshold covered mechanisms, such as the greedy combinatorial auctions of [LB10]. Moreover, extending the algorithms for the online buyer’s problem to valuations defined on mechanism design settings more general than combinatorial auctions, such as the lattice valuations defined in [ST13], would also enable the generalization of our approach to compositions of more general mechanisms, such as position auctions. Both of these generalizations seem fruitful future directions. Our approach here shows that the problem of designing efficient learning algorithms that retain welfare guarantees reduces to finding efficient learning algorithms for the online buyer’s problem.

8 Further Related Work

Closer to our intractability results is the work of Cai and Papadimitriou [CP14], who show intractability of computing Bayesian-Nash equilibria, as well as of certain notions of Bayesian no-regret learning, in SiSPAs. In the Bayesian model each player’s valuation is not fixed, but is drawn independently from some distribution. They show that both computing a best response and computing a Bayes-Nash equilibrium in such a setting are PP-hard. They also show that computing Bayesian coarse correlated equilibria is NP-hard, and hence a certain type of Bayesian no-regret learning (namely, when bidders re-sample their type in every round) is intractable. There are two important differences between their hardness results and ours:

  • First, the hardness of best response in their setting is driven by the fact that the opponent bids implicitly define a distribution of exponential support. In contrast, our inapproximability of best response is shown for an explicitly given opponent bid distribution.

  • Theirs is a setting where Bayesian coarse correlated equilibria are already hard to compute, implying in particular that no-regret learning (with resampling of types in every round) is intractable. In contrast, in our setting [CKS08] provides a centralized polynomial-time algorithm for computing a pure Nash equilibrium in complete information SiSPAs with submodular bidders. Moreover, for some special cases of combinatorial auctions with submodular bidders, [DFK15] show that computing an equilibrium with good welfare is as easy as the algorithmic problem, ignoring incentives. The centralized nature of the algorithms in these papers and the complete information assumption make these results inherently different from the setting that we want to analyze, which is the agnostic setting where players don’t know anything about the game and behave in a decentralized manner. In particular, in our setting the intractability comes from the distributed nature of the computation and the incomplete, non-Bayesian information that the bidders have.

There is a large body of work on the price of anarchy in auctions, in the incomplete-information Bayesian and non-Bayesian settings and under no-regret learning behavior. We cannot do justice to the vast literature, but here are some example papers: [CKS08, BR11, HKMN11, FFGL13, MT12, dKMST13, LB10]. The price of anarchy of no-regret learning outcomes was first analyzed by [BHLR08] in the context of routing games and was generalized to many games in [Rou09] and to many mechanisms in [ST13], via the notion of smoothness. There is a strong connection between the smoothness framework and no-envy dynamics. In particular, the no-envy guarantee directly implies the lower bounds on each bidder’s utility that are needed for the smoothness proof to go through. This is the main reason why no-envy implies price of anarchy guarantees.

Another major stream of work in algorithmic mechanism design addresses the design of computationally efficient dominant strategy truthful mechanisms [Dob11, DV12, DL13, DRY11, DV15, Dob16]. For instance, [Dob11] shows that with only value queries, no distribution over deterministic truthful mechanisms can achieve better than polynomial approximation factors for submodular bidders. With demand queries, [DL13] shows limits on the approximation achievable by truthful-in-expectation mechanisms. For coverage valuations, [DRY11] gives a $\left(1-\frac{1}{e}\right)$-approximate, truthful-in-expectation randomized mechanism. For submodular bidders with demand queries, the best truthful mechanism was recently given by [Dob16], achieving an $O(\sqrt{\log m})$-approximation. In contrast, our result shows that for no-envy XOS bidders with demand oracles, simultaneous item auctions achieve constant factor approximations.

Moreover, several papers address only the algorithmic problem of welfare maximization in combinatorial auctions with complement-free valuations. For instance, [Fei06] provides a 2-approximation for combinatorial auctions with subadditive bidders and an $\frac{e}{e-1}$-approximation for XOS bidders, with access to demand oracles, improving upon prior work of [DS06], which also required XOS oracles. Our work can also be viewed as providing a simple and distributed algorithm for welfare maximization with XOS bidders, with an $\frac{e}{e-1}$-approximation guarantee: simply run our no-envy algorithms in a simultaneous first price auction game and then pick the best solution after a sufficient number of iterations.
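In code, the distributed procedure just described is a simple loop (a sketch; the bidder interface choose_action/observe_opponents and the helper run_auction_round are illustrative assumptions, not fixed interfaces of this paper):

    def distributed_welfare(bidders, run_auction_round, T):
        # Each bidder runs its own no-envy learner; simulate T rounds of a
        # simultaneous first price auction and keep the best allocation seen.
        # run_auction_round(bids) -> (allocation, welfare) is an assumed helper.
        best_alloc, best_welfare = None, float("-inf")
        for _ in range(T):
            bids = [b.choose_action() for b in bidders]
            alloc, welfare = run_auction_round(bids)
            if welfare > best_welfare:
                best_alloc, best_welfare = alloc, welfare
            for i, b in enumerate(bidders):
                b.observe_opponents(bids[:i] + bids[i + 1:])
        return best_alloc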

There is a large body of work on online learning and online convex optimization to which we cannot possibly do justice. We refer the reader to two recent surveys [BCB12, SS12]. There is also a large body of work on online linear optimization where the number of experts is exponentially large, but the utility is linear in some low-dimensional space. This setting was initiated by [KV05] and spurred a long line of work. We refer the reader to the relevant section of [BCB12]. Our results on perturbed leader algorithms generalize these results beyond the linear setting, and we have provided some example applications beyond SiSPAs in Appendix F.

Our work is also related to the recent work of [HK15] on the power of best-response oracles in online learning. That paper gives query complexity lower bounds for the general online learning problem. In contrast, our approach identifies sufficient conditions (stability) under which best-response oracles are sufficient for efficient learning, and hence optimization is equivalent to online learning. Therefore, we provide a positive counterpart to these negative results.

References

  • [ACBFS95] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science, FOCS ’95, pages 322–331, Washington, DC, USA, 1995. IEEE Computer Society.
  • [AM06] Lawrence M Ausubel and Paul Milgrom. The lovely but lonely vickrey auction. Combinatorial auctions, 17:22–26, 2006.
  • [BBHP15] Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Commitment without regrets: Online learning in stackelberg security games. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC ’15, pages 61–78, New York, NY, USA, 2015. ACM.
  • [BCB12] Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.
  • [BHLR08] Avrim Blum, MohammadTaghi Hajiaghayi, Katrina Ligett, and Aaron Roth. Regret minimization and the price of total anarchy. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC ’08, pages 373–382, New York, NY, USA, 2008. ACM.
  • [Bik99] Sushil Bikhchandani. Auctions of heterogeneous objects. Games and Economic Behavior, 26(2):193 – 220, 1999.
  • [BMW16] Mark Braverman, Jieming Mao, and S. Matthew Weinberg. Interpolating between truthful and non-truthful mechanisms for combinatorial auctions. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’16, pages 1444–1457. SIAM, 2016.
  • [BR11] Kshipra Bhawalkar and Tim Roughgarden. Welfare guarantees for combinatorial auctions with item bidding. In Proceedings of the Twenty-second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’11, pages 700–709. SIAM, 2011.
  • [CKS08] George Christodoulou, Annamária Kovács, and Michael Schapira. Bayesian combinatorial auctions. In Proceedings of the 35th international colloquium on Automata, Languages and Programming, Part I, ICALP ’08, pages 820–832, Berlin, Heidelberg, 2008. Springer-Verlag.
  • [Cla71] Edward H. Clarke. Multipart pricing of public goods. Public Choice, 11(1):17–33, 1971.
  • [CP14] Yang Cai and Christos Papadimitriou. Simultaneous bayesian auctions and computational complexity. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC ’14, pages 895–910, New York, NY, USA, 2014. ACM.
  • [DFK15] Shahar Dobzinski, Hu Fu, and Robert Kleinberg. On the complexity of computing an equilibrium in combinatorial auctions. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’15, pages 110–122. SIAM, 2015.
  • [dKMST13] Bart de Keijzer, Evangelos Markakis, Guido Schäfer, and Orestis Telelis. Inefficiency of standard multi-unit auctions. In Hans L. Bodlaender and Giuseppe F. Italiano, editors, Algorithms – ESA 2013, volume 8125 of Lecture Notes in Computer Science, pages 385–396. Springer Berlin Heidelberg, 2013.
  • [DL13] Shahar Dobzinski and Renato Paes Leme. Efficiency guarantees in auctions with budgets. CoRR, abs/1304.7048, 2013.
  • [DMSW15] Nikhil Devanur, Jamie Morgenstern, Vasilis Syrgkanis, and S. Matthew Weinberg. Simple auctions with simple strategies. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC ’15, pages 305–322, New York, NY, USA, 2015. ACM.
  • [DNS10] Shahar Dobzinski, Noam Nisan, and Michael Schapira. Approximation algorithms for combinatorial auctions with complement-free bidders. Math. Oper. Res., 35(1):1–13, February 2010.
  • [Dob11] Shahar Dobzinski. An impossibility result for truthful combinatorial auctions with submodular valuations. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, STOC ’11, pages 139–148, New York, NY, USA, 2011. ACM.
  • [Dob16] Shahar Dobzinski. Breaking the logarithmic barrier for truthful combinatorial auctions with submodular bidders. STOC’16, 2016.
  • [DRY11] Shaddin Dughmi, Tim Roughgarden, and Qiqi Yan. From convex optimization to randomized mechanisms: Toward optimal combinatorial auctions. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, STOC ’11, pages 149–158, New York, NY, USA, 2011. ACM.
  • [DS06] Shahar Dobzinski and Michael Schapira. An improved approximation algorithm for combinatorial auctions with submodular bidders. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, SODA ’06, pages 1064–1073, Philadelphia, PA, USA, 2006. Society for Industrial and Applied Mathematics.
  • [DV12] Shahar Dobzinski and Jan Vondrak. The computational complexity of truthfulness in combinatorial auctions. In Proceedings of the 13th ACM Conference on Electronic Commerce, EC ’12, pages 405–422, New York, NY, USA, 2012. ACM.
  • [DV15] Shaddin Dughmi and Jan Vondrák. Limitations of randomized mechanisms for combinatorial auctions. Games and Economic Behavior, 92:370 – 400, 2015.
  • [Fei98] Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, July 1998.
  • [Fei06] Uriel Feige. On maximizing welfare when utility functions are subadditive. In Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, STOC ’06, pages 41–50, New York, NY, USA, 2006. ACM.
  • [FFGL13] Michal Feldman, Hu Fu, Nick Gravin, and Brendan Lucier. Simultaneous auctions are (almost) efficient. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC ’13, pages 201–210, New York, NY, USA, 2013. ACM.
  • [FGL15] Michal Feldman, Nick Gravin, and Brendan Lucier. Combinatorial auctions via posted prices. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’15, pages 123–135. SIAM, 2015.
  • [FKL12] Hu Fu, Robert Kleinberg, and Ron Lavi. Conditional equilibrium outcomes via ascending price processes with applications to combinatorial auctions with item bidding. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 586–586. ACM, 2012.
  • [Gro73] Theodore Groves. Incentives in teams. Econometrica, 41(4):617–631, 1973.
  • [Haz06] Elad Hazan. Efficient Algorithms for Online Convex Optimization and Their Applications. PhD thesis, Princeton, NJ, USA, 2006. AAI3223851.
  • [HHT14] Jason Hartline, Darrell Hoy, and Sam Taggart. Price of anarchy for auction revenue. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC ’14, pages 693–710, New York, NY, USA, 2014. ACM.
  • [HK15] Elad Hazan and Tomer Koren. The computational power of optimization in online learning. CoRR, abs/1504.02089, 2015.
  • [HKMN11] A. Hassidim, Haim Kaplan, Yishay Mansour, and Noam Nisan. Non-price equilibria in markets of discrete goods. In Proceedings of the 12th ACM conference on Electronic commerce, EC ’11, pages 295–296, New York, NY, USA, 2011. ACM.
  • [HP05] Marcus Hutter and Jan Poland. Adaptive online prediction by following the perturbed leader. J. Mach. Learn. Res., 6:639–660, December 2005.
  • [KV05] Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
  • [LB10] B. Lucier and A. Borodin. Price of anarchy for greedy auctions. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’10, pages 537–553, Philadelphia, PA, USA, 2010. Society for Industrial and Applied Mathematics.
  • [LCK16] Yuqian Li, Vincent Conitzer, and Dmytro Korzhyk. Catcher-evader games. CoRR, abs/1602.01896, 2016.
  • [LST16] Thodoris Lykouris, Vasilis Syrgkanis, and Éva Tardos. Learning and efficiency in games with dynamic population. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’16, pages 120–129. SIAM, 2016.
  • [MT12] Evangelos Markakis and Orestis Telelis. Uniform price auctions: Equilibria and efficiency. In Maria Serna, editor, Algorithmic Game Theory, Lecture Notes in Computer Science, pages 227–238. Springer Berlin Heidelberg, 2012.
  • [Rou09] T. Roughgarden. Intrinsic robustness of the price of anarchy. In Proceedings of the 41st annual ACM symposium on Theory of computing, STOC ’09, pages 513–522, New York, NY, USA, 2009. ACM.
  • [SS12] Shai Shalev-Shwartz. Online learning and online convex optimization. Found. Trends Mach. Learn., 4(2):107–194, February 2012.
  • [ST13] Vasilis Syrgkanis and Éva Tardos. Composable and efficient mechanisms. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC ’13, pages 211–220, New York, NY, USA, 2013. ACM.
  • [Tre01] Luca Trevisan. Non-approximability results for optimization problems on bounded degree instances. In Proceedings of the Thirty-third Annual ACM Symposium on Theory of Computing, STOC ’01, pages 453–461, New York, NY, USA, 2001. ACM.
  • [Vic61] William Vickrey. Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1):8–37, 1961.
  • [Zin03] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), August 21-24, 2003, Washington, DC, USA, pages 928–936, 2003.

Appendix A Omitted Proofs from Section 3

A.1 Proof of Theorem 5


Theorem 5 The optimal bidding problem is NP-hard to approximate to within an additive $\frac{1}{\mathrm{poly}(m)}$, even when the threshold vectors in the support of $\mathcal{D}$ (the explicitly described distribution) take values in $\{0,2\}^m$ and the bidder is unit-demand with item values in $\{0,1\}$.

Proof.

We break the proof into two lemmas. In the first we show NP-hardness of the exact problem; then we show hardness of the additive approximation problem.

Lemma 16 (Hardness of Optimal Bidding).

The optimal bidding problem is NP-hard even if the threshold vectors in the support of $\mathcal{D}$ take values in $\{0,2\}^m$ and the bidder is unit-demand with item values in $\{0,1\}$.

Proof.

Before we move to the reduction we introduce some notation that is useful for the special case when thresholds take values in $\{0,2\}$. First, each threshold vector can be uniquely represented by a set $T \subseteq [m]$, which corresponds to the items on which the threshold is $2$. Hence, the threshold distribution $\mathcal{D}$ can therefore be described by a collection of sets $T_1, \ldots, T_K$, such that each set $T_k$ arises with probability $\pi_k$.

Moreover, observe that in the optimization problem we might as well only consider strategies where the player places a bid vector in $\{0,1\}^m$. Bidding any bid in $(0,2)$ on an item is equivalent to bidding $1$. Bidding anything in $[2,\infty)$ is equivalent to bidding $2$. Moreover, bidding $2$ on an item is dominated by bidding $1$. The reason is that bidding $2$ increases your probability of winning only in the cases when the threshold is $2$. But in those cases your utility is negative, since every value is at most $1 < 2$. Thus it is always optimal to remove those winning cases.

Thus any bidding strategy is also uniquely characterized by a set $B \subseteq [m]$, which is the set of items on which the player bids $1$. If a bidder chooses a set $B$, then he loses all items in $B$ only if a set $T_k$ arises such that $B \subseteq T_k$, since then all items in $B$ have a threshold of $2$. Thus the probability that he wins some item is equal to:

$\Pr[\text{the player wins at least one item}] \;=\; 1 - \sum_{k \,:\, B \subseteq T_k} \pi_k$   (30)
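As a sanity check, Eq. (30) is a one-liner over the explicit support (a sketch with illustrative names):

    def prob_win_some_item(B, support):
        # support: list of (T_k, pi_k) pairs describing the explicit threshold
        # distribution; the bidder loses every item in B exactly when the
        # realized high-threshold set T_k contains B (Eq. (30)).
        return 1.0 - sum(pi for T, pi in support if B <= T)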

Moreover, he pays only for the items for which he bids