The Sample Complexity of Up-to-$\varepsilon$ Multi-Dimensional Revenue Maximization
Abstract
We consider the sample complexity of revenue maximization for multiple bidders in unrestricted multi-dimensional settings. Specifically, we study the standard model of additive bidders whose values for heterogeneous items are drawn independently. For any such instance and any $\varepsilon > 0$, we show that it is possible to learn an $\varepsilon$-Bayesian Incentive Compatible auction whose expected revenue is within $\varepsilon$ of the optimal $\varepsilon$-BIC auction from only polynomially many samples.
Our approach is based on ideas that hold quite generally, and completely sidestep the difficulty of characterizing optimal (or near-optimal) auctions for these settings. Therefore, our results easily extend to general multi-dimensional settings, including valuations that aren’t necessarily even subadditive, and arbitrary allocation constraints. For the cases of a single bidder and many goods, or a single parameter (good) and many bidders, our analysis yields exact incentive compatibility (and for the latter also computational efficiency). Although the single-parameter case is already well-understood, our corollary for this case slightly extends the state-of-the-art.
1 Introduction
A fundamental question at the heart of the literature on mechanism design is that of revenue maximization by a single seller who is offering for sale any number of goods to any number of (potential) bidders. In the classic economic literature, this problem is studied in a Bayesian setting: the seller has prior knowledge of (often, independent) distributions from which the valuation of each bidder for each good is drawn, and wishes to devise a truthful mechanism that maximizes her revenue in expectation over these prior distributions. Over the past few years, numerous works at the interface of economics and computation are now studying a more demanding model: that of mechanism design from samples. In this model, rather than possessing complete knowledge of the distributions from which the bidders’ values for the various items are drawn, the seller more realistically only has access to samples from these distributions (e.g., past data). The goal in this setting is to learn with high probability an auction with good revenue guarantees given polynomially many (in the parameters of the problem) samples.
Revenue maximization from samples is somewhat ubiquitously seen as a “next step” beyond Bayesian revenue maximization. That is, existing works so far in this context take settings for which simple auctions in the related Bayesian problem are already well-understood and prove that these simple auctions can be learned efficiently via samples (up to an $\varepsilon$ loss, which will always be lost when optimizing from samples). For example: in single-parameter settings, seminal work of Myerson (1981) completely characterizes a simple and optimal auction in the Bayesian setting, and works such as Cole and Roughgarden (2014); Morgenstern and Roughgarden (2015); Devanur et al. (2016); Hartline and Taggart (2016); Roughgarden and Schrijvers (2016); Gonczarowski and Nisan (2017) prove that these simple mechanisms or variants thereof can be learned with polynomially many samples. Similarly, in multi-parameter settings with independent items, works of Chawla et al. (2007, 2010, 2015); Hart and Nisan (2012); Babaioff et al. (2014); Rubinstein and Weinberg (2015); Yao (2015); Cai et al. (2016); Chawla and Miller (2016); Cai and Zhao (2017) prove that simple mechanisms achieve constant-factor approximations in rich multi-dimensional settings, and works of Morgenstern and Roughgarden (2016); Balcan et al. (2016, 2018); Cai and Daskalakis (2017); Syrgkanis (2017) prove that simple mechanisms with these guarantees can be learned with polynomially many samples. These analyses rely on a delicate understanding of the structure and/or inherent dimensionality of auctions that give such revenue guarantees to show how to learn such an auction without overfitting the samples.
It is therefore unsurprising that the problem of learning an up-to-$\varepsilon$ revenue-maximizing multi-item auction from samples has not been previously studied, since the structure/dimensionality of optimal (precisely or up-to-$\varepsilon$) multi-item auctions is not understood even when there is only one bidder, and even with independent items. Such auctions are known to be extremely complex, suffering from properties such as randomization (Thanassoulis, 2004), uncountable menu complexity (Daskalakis et al., 2013), and non-monotonicity (Hart and Reny, 2015). Such domains provably lack the natural starting point of all previous works: a structured/low-dimensional mechanism in the Bayesian setting to learn via samples.
In this paper we show that despite these challenges, up-to-$\varepsilon$ optimal multi-item auctions can be learned from polynomially many samples from the underlying bidder-item distributions. More formally, in a setting with $n$ bidders and $m$ items where the value of each bidder for each item is drawn independently from a distribution supported on $[0,H]$ for some $H$ that is known to the seller, we show that polynomially many samples suffice for learning, with probability at least $1-\delta$, an $n$-bidder $m$-item almost-truthful auction that maximizes the expected revenue among all possible $n$-bidder $m$-item almost-truthful auctions up to an additive $\varepsilon$. Below, BIC refers to Bayesian Incentive Compatible: an auction for which it is in every bidder’s interest to bid truthfully, given that all other bidders do so as well.
Theorem 1 (Main Result — informal version of Theorem 4).
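For $n$ bidders with independent values for $m$ items supported on $[0,H]$, for every $\varepsilon > 0$ and for every $0 < \delta < 1$, the sample complexity of learning, w.p. $1-\delta$, an $\varepsilon$-BIC auction that maximizes revenue (among all $\varepsilon$-BIC auctions) up to an additive $\varepsilon$ is $\mathrm{poly}(n, m, H, 1/\varepsilon, \log 1/\delta)$.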
The above theorem is informal mostly because we have not specified exactly how bidders value bundles of items. Essentially, the bidders may have arbitrary (i.e., not necessarily additive, not even necessarily subadditive) valuations subject to some Lipschitz condition (i.e., changing the value of bidder $i$ for item $j$ by only $\varepsilon$ changes the bidder’s value for any outcome by at most $L\varepsilon$ for some absolute constant $L$).
The main challenge in proving our result for multiple items is noted above: the structure of (up-to-$\varepsilon$) optimal mechanisms for such settings is not understood, even for additive valuations. In particular, there is no known low-dimensional class of mechanisms that is guaranteed to contain an (up-to-$\varepsilon$) optimal mechanism for any product distribution, thus barring the use of many learning-theoretic arguments. Our result relies on a succinct structured argument, allowing us to reduce revenue maximization from samples to related problems of revenue maximization from given discrete distributions.
As the corresponding Bayesian question remains open (i.e., whether one can find, given the distributions explicitly, an up-to-$\varepsilon$ optimal mechanism in poly-time), our result is of course information-theoretic: it shows that polynomially many samples suffice for a computationally unbounded seller, but provides no computationally efficient learning algorithm. Concretely, the algorithm that we give uses as a black box an oracle that can perform (optimal or almost-optimal) multi-item Bayesian revenue maximization given (the full description of) finite prior distributions.
1.1 Brief Overview of Techniques
Most prior works (for single- as well as multi-dimensional settings) take the following approach: first, define a class $\mathcal{C}_\varepsilon$ of auctions as a function of $\varepsilon$. Second, prove that, for all possible distributions $D$, the class $\mathcal{C}_\varepsilon$ contains an up-to-$\varepsilon$ optimal mechanism for $D$. Finally, prove that the best-in-class (up to $\varepsilon$) for $\mathcal{C}_\varepsilon$ can be learned with polynomially many samples. In prior works, ingenuity is required for each of these steps: $\mathcal{C}_\varepsilon$ is explicitly defined, proved to contain up-to-$\varepsilon$ optimal auctions, and proved to have some low-dimensional structure allowing efficient learnability.
Our approach indeed follows this rough outline, with two notable simplifying exceptions. First is our approach to defining $\mathcal{C}_\varepsilon$. Here, we first define $\mathcal{C}_{\varepsilon,S}$ to be the space of all auctions that are optimal for an empirical distribution over $S$ many rounded samples (that is, optimal for any discrete product distribution where each marginal is: a) only supported on multiples of $\varepsilon$ and b) uniform over a multiset of size $S$). While, unlike popular existing approaches, the set $\mathcal{C}_{\varepsilon,S}$ grows with the number of samples $S$, we show that the rate of its growth is moderate enough so that there exists a “sweet-spot” number of samples $S$ such that on the one hand $\mathcal{C}_{\varepsilon,S}$ contains an auction that is up-to-$\varepsilon$ optimal for the “true distribution” $D$ and on the other hand, the best-in-class from $\mathcal{C}_{\varepsilon,S}$ can be learned from $S$ samples. So in the language of prior work, one could say that we set $\mathcal{C}_\varepsilon = \mathcal{C}_{\varepsilon,S}$ for this $S$.
To show that $\mathcal{C}_{\varepsilon,S}$ does in fact contain, for all distributions $D$, an auction that is up-to-$\varepsilon$ optimal for $D$, we simply take enough samples to guarantee uniform convergence (of the revenue) over $\mathcal{C}_{\varepsilon,S}$ and additionally the optimal auction for $D$. It’s far from obvious why this should suffice, as the optimal auction for $D$ is not an element of $\mathcal{C}_{\varepsilon,S}$, nor even of the same format.
Second, our argument that the best-in-class can in fact be learned (up to $\varepsilon$) with $S$ samples is simply a counting argument, and does not require any notion of a learning dimension. This is indeed in the spirit of some recent single-dimensional results; however, in those results the counting argument is highly dependent on the structure of the auctions in the class. As discussed above, such dependence is damning for multi-dimensional settings where such structure provably doesn’t exist. Again, the proof does require some hammers (notably, arguments originally developed for reduced forms via samples in Cai et al. (2012), and a concentration inequality of Babichenko et al. (2017); Devanur et al. (2016)), but they are applied in a fairly transparent manner.
The above approach should help explain how we are able to extend far beyond prior works, which relied on a detailed analysis of specific structured mechanisms: the key tools we use are applicable quite generally, whereas the specific mechanisms analyzed in prior work are known to maintain guarantees only in restricted settings. For example, Theorem 1 would already constitute the first up-to-$\varepsilon$ optimal-mechanism learning result for any multi-parameter setting even if it held only for additive valuations (and one bidder). But the approach is so general that extending it to arbitrary Lipschitz valuations with independent items is simply a matter of updating notation.
1.2 Applications and Extensions
Specialized to a single-bidder setting, our construction in fact yields exact truthfulness (more on that in Section 6), showing that an optimal mechanism can be found for a single bidder with independent item values (with Lipschitz valuations) using only polynomially many samples. This should be contrasted with a result of Dughmi et al. (2014), which shows that achieving this is not possible for correlated distributions, even for a buyer with additive valuations.
Corollary 1 (Single Bidder — informal version of Theorem 5).
For one bidder with independent values for items supported on , for every , the sample complexity of learning, w.p. , an IC auction that maximizes revenue (among all IC auctions) up to an additive is .
Specialized to singledimensional settings, our analysis once again yields a strengthened result, both in giving exact Dominant Strategy Incentive Compatibility (DSIC)
Corollary 2 (SingleParameter — informal version of Theorem 6).
For singleparameter bidders with independent values in , for every , the sample complexity of efficiently learning, w.p. , a DSIC auction that maximizes revenue (among all DSIC auctions) up to an additive is .
Corollary 2 nicely complements the existing literature on singleparameter sample complexity in the following ways. First, our algorithm/analysis immediately follows as a special case of Theorem 1 (without referencing structural results about optimal singleparameter auctions), so it is in some sense more principled. Second, our analysis holds even for arbitrary constraints on the allocations (putting it in the same class as the stateoftheart
Finally, portions of our approach are specific to Bayesian Incentive Compatible auctions (versus Dominant Strategy Incentive Compatible auctions), but portions are not. We’re therefore able to use the same techniques to conclude similar, albeit qualitatively weaker, results for DSIC auctions in Theorem 11. See Appendix A for further details.
1.3 Related Work and Brief Discussion
Two active lines of work are directly related to the present paper. First are papers that study rich multi-dimensional settings, and aim to show that mechanisms with good approximation guarantees can be learned with few samples, such as Morgenstern and Roughgarden (2016); Balcan et al. (2016, 2018); Cai and Daskalakis (2017); Syrgkanis (2017). The main approach in each of these works is to show that specific classes of structured mechanisms (e.g., classes that are known to allow for constant-factor revenue maximization) are inherently low-dimensional with respect to some notion of dimensionality. Our results are stronger than these in some regards and weaker in others. More specifically, our results are stronger in the sense that with comparably many samples, our mechanisms guarantee an up-to-$\varepsilon$ approximation to the optimal mechanism instead of a constant-factor approximation. Our results are weaker in the sense that our learning algorithms are information-theoretic (do not run in poly-time), and our mechanisms are not “as simple.” As discussed earlier, both weaknesses are necessary in order to possibly surpass the constant-factor barrier (at least, barring the resolution of major open questions, such as a computationally efficient up-to-$\varepsilon$ approximation even when all distributions are explicitly known. Again, note that should this question be resolved affirmatively, our results would immediately become computationally efficient as well).
Most related to our work, at least in terms of techniques, is the rich line of works on single-dimensional settings (Dhangwatnotai et al., 2015; Cole and Roughgarden, 2014; Huang et al., 2015; Morgenstern and Roughgarden, 2015; Devanur et al., 2016; Hartline and Taggart, 2016; Roughgarden and Schrijvers, 2016; Gonczarowski and Nisan, 2017). These works show that up-to-$\varepsilon$ optimal mechanisms can be learned in richer and richer settings. In comparison to these works, our single-dimensional results slightly extend the state-of-the-art (Hartline and Taggart, 2016; Gonczarowski and Nisan, 2017) as a corollary of a more general theorem that applies to multi-dimensional settings. Even restricted to single-dimensional settings, our proof is perhaps more transparent.
We conclude with a brief discussion and an open problem. Corollaries 1 and 2 are both deduced from Theorem 1 by use of an argument as to why the resulting $\varepsilon$-BIC auction is in fact BIC, or by using an $\varepsilon$-BIC-to-BIC reduction that loses negligible revenue. Given that we have explicitly referenced the existence of a quite general $\varepsilon$-BIC-to-BIC reduction, the reader may be wondering why this reduction does not in fact allow our general results to be exactly BIC as well.
The main barrier is the following: in order to actually run the $\varepsilon$-BIC-to-BIC reduction as part of our auction for multiple bidders, one must take samples exponential in the number of items from each bidder’s value distribution. This means that even though we can learn an $\varepsilon$-BIC mechanism with few samples, plugging it through the reduction to remove the $\varepsilon$ would cost us exponentially many additional samples. Note that our current use of these theorems is non-constructive: we only use them to claim that the revenues achievable by the optimal BIC and $\varepsilon$-BIC mechanisms are not far off. This conclusion does not actually require running the reduction, but rather simply observing that it could be run (more details in Section 5).
When bidder valuations are drawn from a product distribution, it seems conceivable (especially given our results) that sample complexity polynomial in the number of items should suffice. Indeed, if each bidder’s values are drawn i.i.d., this is known due to exploitations of symmetry (Daskalakis and Weinberg, 2012). But subexponential sample complexity is not known to suffice for any other restricted class of distributions, despite remarkable recent progress in developing connections to combinatorial Bernoulli factories (Dughmi et al., 2017). We state below what we consider to be the main open problem left by our work in the context of this paper, but readers familiar with black-box reductions in Bayesian mechanism design will immediately recognize a corresponding open problem for the original welfare-maximization setting studied in Hartline et al. (2011); Bei and Huang (2011) that is equally enticing.
Open Problem 1.
Given an BIC auction for some product distribution, even in an additive multiitem setting, is it possible to transform it into a (precisely) BIC auction with negligible () revenue loss using polynomially many samples from this product distribution?
The remainder of this paper is structured as follows. In Section 2, we formally present the model and setting. In Section 3, we formally state our results, which are informally stated above as Theorem 1 and Corollaries 1 and 2. In Section 4, we overview the main ideas behind the proof of Theorem 1, a proof that we give in full detail in Section 5. In Section 6, we derive Corollaries 1 and 2. We present some extensions in Section 7. In Appendix A, we state and prove a result analogous to Theorem 1 for DSIC auctions, using similar proof techniques. Parts of certain proofs are relegated to Appendix B.
2 Model and Preliminaries
The Decision Maker (Seller), Bidders, and Outcomes.
A single decision maker has the power to choose a social outcome, such as who gets which good that is for sale, or such as which pastime activities are offered in which of the weekends of the upcoming year. There are bidders who have stakes in this outcome. (The decision maker will be able to charge the bidders and will wish to maximize her revenue.) The possible set of allowed outcomes is denoted by and can be completely arbitrary. A central example is that of an parameter auction: the decision maker is a seller who has items for sale, and the set of outcomes/allocations is , where an allocation specifies for each bidder and good the amount of good that bidder wins. The traditional multiitem setting is the special case with , while outcomes with fractional coordinates occur for example in the canonical model of position auctions, where smaller coordinates denote smaller clickthrough rates.
Values.
Bidder has a valuation function over the set of possible outcomes . This function is parametrized by values (we will not explicitly write , but refer to the parameters implicitly for ease of notation. Moreover, as is completely determined by , we will sometimes simply refer to as bidder ’s value, and to as bidder ’s value for parameter ) and drawn from a given distribution such that:

(Independent items) The s are independent random variables, drawn from distributions which are all supported in .

(Lipschitz) There exists an absolute constant , such that if is obtained from by modifying one of the s by at most an additive , then for all .
For example, in the multiitem setting described above, (and ).
We note that both properties above (independent items and Lipschitz) together imply that the valuation of each bidder for each outcome is bounded in .
Payments, Priced Outcomes, and Mechanisms.
A payment specification specifies for each bidder to be charged . A priced outcome is a pair of allocation and payment specification. The utility of bidder with value from priced outcome is . An auction/mechanism is a function that maps each valuation profile to a distribution over priced outcomes. The seller’s expected revenue from a mechanism is , where is the payment specification chosen by the mechanism for the valuation profile .
Truthfulness.
An auction is individually rational (IR) if the expected utility of a truthful bidder is nonnegative at any valuation profile, where the expectation is over the randomness of the auction. For $\varepsilon \ge 0$, an auction is $\varepsilon$-dominant-strategy incentive compatible ($\varepsilon$-DSIC) if truthful bidding maximizes a bidder’s expected utility at any valuation profile up to an additive $\varepsilon$, i.e., for every bidder, every valuation profile, and every possible misreport by that bidder, the bidder’s expected utility from bidding truthfully is at least her expected utility from the misreport minus $\varepsilon$, where the expectation is once again over the randomness of the auction. An auction is DSIC if it is $0$-DSIC. An auction is $\varepsilon$-Bayesian incentive compatible ($\varepsilon$-BIC) if truthful bidding maximizes a bidder’s utility in expectation over all valuations of the other bidders, up to an additive $\varepsilon$, i.e., the analogous inequality holds for every bidder, every valuation of that bidder, and every possible misreport, where the expectation is both over the valuations of the other bidders and over the randomness of the auction. An auction is BIC if it is $0$-BIC.
Additional Notation.
We will use the following additional notation in our analysis, where :

For , we denote by the value of , rounded down to the nearest integer multiple of .

We use to denote the set of integer multiples of in .

For every , we denote by the distribution of for .
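As a minimal illustration of the rounding notation above, the following Python sketch rounds a value down to the nearest integer multiple of the granularity and enumerates the grid of such multiples. The function names `round_down`, `grid`, and `rounded_empirical`, as well as the parameter names `eps` (the rounding granularity) and `H` (an assumed upper bound on the support, taken here to be the interval from 0 to `H`), are choices of this sketch rather than notation from the paper. Exact rational arithmetic is used only to avoid floating-point artifacts.

```python
from fractions import Fraction


def round_down(value, eps):
    """Round `value` down to the nearest integer multiple of `eps`."""
    value, eps = Fraction(value), Fraction(eps)
    # Floor division of two Fractions yields an integer number of steps.
    return (value // eps) * eps


def grid(H, eps):
    """All integer multiples of `eps` in the assumed support [0, H]."""
    H, eps = Fraction(H), Fraction(eps)
    return [k * eps for k in range(H // eps + 1)]


def rounded_empirical(samples, eps):
    """Sorted multiset of rounded samples; the uniform distribution over it
    plays the role of a rounded empirical distribution."""
    return sorted(round_down(v, eps) for v in samples)
```

For instance, with `eps` equal to 1/4, the value 74/100 is rounded down to 1/2, and the grid on the interval from 0 to 1 has five points.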
Existing Tools.
In our analysis, we will make use of the following two theorems, which we state below in a way that is adapted to the notation of our paper. The first shows that the optimal revenue over all BIC auctions and the optimal revenue over all $\varepsilon$-BIC auctions are close (while this is stated in Rubinstein and Weinberg (2015) with respect to multi-parameter settings with a specific allocation space, the same proof holds verbatim for arbitrary outcome sets):
Theorem 2 (Rubinstein and Weinberg, 2015; see also Daskalakis and Weinberg, 2012).
Let be any joint distribution over arbitrary valuations, where the valuations of different bidders are independent. The maximum revenue attainable by any IR and BIC auction for a given product distribution is at most greater than the maximum revenue attainable by any IR and BIC auction for that distribution.
The second is a Chernoff-style concentration inequality for product distributions:
3 Main Results
In this Section, we formally state our main results, which were informally presented as Theorem 1 and Corollaries 1 and 2 in the introduction. We start with our main result.
Theorem 4 (Main Result).
For every and for every , the sample complexity of learning an upto optimal IR and BIC auction is . That is, there exists a deterministic algorithm
The following corollary of our main result should be contrasted with a result of Dughmi et al. (2014), which shows that finding an optimal mechanism for a single additive bidder with correlated item distributions requires exponentially many samples.
Theorem 5 (Single Bidder).
When there is bidder, for every , the sample complexity of learning an upto optimal IR and IC
The following corollary of our main result unifies and even somewhat extends the stateoftheart results for singleparameter () revenue maximization. To state it we restrict ourselves to the setting where revenue maximization has been solved by Myerson (1981): assume that and that .
Theorem 6 (Single-Parameter).
In an parameter setting with and , for every , the sample complexity of efficiently learning an upto IR and DSIC auction is . That is, there exists a deterministic algorithm with running time that given samples from each , with probability outputs an IR and DSIC auction that attains from expected revenue at most an additive smaller than any IR and BIC auction.
4 Main Result Proof Overview
In this Section we roughly sketch our learning algorithm and present the main ideas behind its analysis, in a proof overview structured to present each of these ideas separately. The proof overview is given in this Section only for an additive multi-item setting, and some elements of the proof are omitted or glossed over for readability. The full proof, which contains all omitted details and applies to a general arbitrary Lipschitz setting, and in which the main ideas that are surveyed separately in this Section are quite intermingled, is given in Section 5.
Our learning algorithm is similar in nature to the one presented in Devanur et al. (2016) for certain single-parameter environments; however, the analysis that we will use to show that it does not overfit the samples is completely different (even for single-parameter environments, where our analysis holds for arbitrary allocation constraints). Recall that our result is (necessarily) information-theoretic and not computationally efficient. Therefore, some steps in the algorithm perform operations that are not known to be performable in poly-time (but can certainly be performed without access to the true distributions). In particular, our algorithm will solve an instance of a Bayesian revenue maximization problem for a precisely given input of finite support (step 2).
Algorithm.
We start with (to be determined later) independent samples from each . Our algorithm roughly proceeds as follows:

1. For each bidder $i$ and good $j$, round all samples from the distribution of bidder $i$’s value for good $j$ down to the nearest multiple of $\varepsilon$, and consider the uniform distribution over these rounded samples (the rounded empirical distribution).

2. Find an IR and IC (see below) multi-item auction that maximizes the revenue from the product of the rounded empirical distributions.

3. Return the auction that, on any input bid profile, rounds down all actual bids to the nearest multiple of $\varepsilon$, and allocates and charges payments according to the output of the auction found in step 2 when run on these rounded bids.
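The three steps above can be sketched in Python as follows. This is only a structural sketch under stated assumptions: `optimization_oracle` stands for the black-box Bayesian revenue-maximization oracle (no efficient implementation of it is known), a mechanism is modeled as a plain function from a bid profile to a priced outcome, and `echo_oracle` is a hypothetical placeholder used solely to make the sketch executable. The names `learn_auction`, `samples`, and `eps` are likewise choices of this sketch.

```python
from fractions import Fraction


def round_down(value, eps):
    """Round `value` down to the nearest integer multiple of `eps`."""
    value, eps = Fraction(value), Fraction(eps)
    return (value // eps) * eps


def learn_auction(samples, eps, optimization_oracle):
    """samples[(i, j)] is a list of sampled values of bidder i for good j.

    optimization_oracle is a hypothetical black box: given the rounded
    empirical marginals, it returns a mechanism (a function from a bid
    profile to a priced outcome) that is revenue-optimal for their product.
    """
    # Step 1: round every sample down to a multiple of eps.
    rounded = {
        key: sorted(round_down(v, eps) for v in values)
        for key, values in samples.items()
    }
    # Step 2: delegate the (possibly intractable) optimization to the oracle.
    base_mechanism = optimization_oracle(rounded)

    # Step 3: the returned auction rounds bids before running the mechanism.
    def auction(bids):
        rounded_bids = {key: round_down(b, eps) for key, b in bids.items()}
        return base_mechanism(rounded_bids)

    return auction


def echo_oracle(rounded_marginals):
    """Placeholder oracle for demonstration only: it performs no optimization
    and returns a 'mechanism' that simply reports the bids it receives."""
    return lambda bids: bids
```

With the placeholder oracle, calling the learned auction on a bid of 7/10 and `eps` equal to 1/4 shows that the underlying mechanism only ever sees the rounded bid 1/2, which is the property the analysis relies on.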
We start by showing that if in step 2 of our algorithm we take an IR and DSIC auction that maximizes the revenue from the product of the rounded empirical distributions, then there exists such that the auction output by our algorithm is DSIC and its revenue from is, with probability at least , uptoclose to the maximum revenue attainable from by any DSIC auction. (The formal statement and full proof are given in Appendix A.) We note that the auction output by the algorithm is indeed DSIC, since the output in step 2 is DSIC, and the rounding of the actual bids as defined in step 3 only loses another .
Uniform Convergence of the Revenue of all Possible Output Mechanisms.
Note that for every , each rounded sample from step 1 of the algorithm is independently distributed according to . The main challenge is in showing that the resulting auction gives upto optimal revenue not only on the rounded empirical distributions , but also on the rounded true distributions . That is, the main challenge is in showing that no overfitting occurs, in the absence of any structural properties that we can exploit for the mechanisms that are optimal (or upto optimal) for .
This is the point where our approach makes a sharp departure from prior works. Prior work deems this task to be hopeless, and proceeds by proving structural results on optimal mechanisms for restricted domains. We circumvent this by instead simply counting the number of possible inputs we will ever query in step 2, and observing that the number of mechanisms over which we have to obtain uniform convergence is at most this number. A crucial observation is that while we do have to consider more and more mechanisms as the number of samples grows, the number of mechanisms that we have to consider grows moderately enough so as to not eclipse our gains from increasing the number of samples that we take. For this argument to hold, it is essential that our distributions are product distributions.
Let be the set of all product distributions where each is the uniform distribution over some multiset of values from . Let be the set of all mechanisms returned by step 2 of the algorithm for some distribution . At the heart of our analysis, and of this part of our analysis in particular, is the observation that . Crucially, this expression has only in the base and not in the exponent. Indeed, for every , for every , and for every integer multiple of in (there are many such values), the probability of this value in can be any of the values . Therefore, .
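The counting observation can be checked numerically with a short Python sketch. Here `S` denotes the number of samples per marginal, `K` the number of grid points (integer multiples of the granularity in the support), and `n`, `m` the numbers of bidders and items; these names are assumptions of the sketch. A marginal that is uniform over a multiset of `S` grid values assigns each grid point a probability among the `S + 1` values 0, 1/S, ..., 1, which gives the bound below, and the exact number of such marginals (a standard stars-and-bars count) is smaller still.

```python
from math import comb


def exact_marginal_count(S, K):
    """Number of multisets of size S over K grid points (stars and bars)."""
    return comb(S + K - 1, K - 1)


def marginal_bound(S, K):
    """Each grid point has probability in {0, 1/S, ..., 1}: S + 1 options."""
    return (S + 1) ** K


def product_bound(S, K, n, m):
    """Bound on the number of products of n * m such marginals."""
    return marginal_bound(S, K) ** (n * m)
```

The number of samples `S` appears only in the base of the bound, so its logarithm grows like `n * m * K * log(S + 1)`, that is, only logarithmically in `S`.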
We will choose so that with probability at least , it simultaneously holds for all mechanisms that
(1) 
To this end, we will use a Chernoffstyle concentration bound (Theorem 3) for product distributions, which when applied to our setting shows that for each auction separately Equation 1 is violated with probability exponentially small in . So, to have Equation 1 hold with probability at most for all auctions in simultaneously, we choose so that the violation probability for each auction separately is at most , and use a union bound. Since , we have that it is enough to take such that is of order of magnitude at least , which is clearly possible by taking a suitable that is polynomial in , , , , and . So, taking a number of sample of this magnitude gives that with probability at least , Equation 1 simultaneously holds for all mechanisms in and so the mechanism output by step 2 of the algorithm gets upto the same revenue on the product of the rounded empirical distributions as it does on the product of the rounded true distributions . So, the revenue that the mechanism output by (step 3 of) the algorithm attains from is identical to the revenue that the mechanism output by step 2 of the algorithm attains from , which is upto the optimal revenue attainable from .
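The interplay between the union bound and the number of samples can be illustrated with the following Python sketch. It assumes, purely for illustration, that the per-auction violation probability has the generic form `a * exp(-c * S * eps**2 / H**2)`; the constants `a` and `c` are placeholders and are not the constants of Theorem 3. It also assumes the count of candidate mechanisms is at most `(S + 1) ** (n * m * K)` with `K` grid points per marginal. All function and parameter names are choices of this sketch.

```python
import math


def log_union_bound(S, n, m, H, eps, a=2.0, c=1.0):
    """Log of (number of candidate mechanisms) * (assumed per-mechanism tail).

    Assumptions of this sketch: at most (S + 1) ** (n * m * K) candidates,
    with K grid points per marginal, and a tail a * exp(-c * S * eps^2 / H^2)
    whose constants a and c are placeholders.
    """
    K = math.floor(H / eps) + 1
    log_count = n * m * K * math.log(S + 1)
    log_tail = math.log(a) - c * S * eps ** 2 / H ** 2
    return log_count + log_tail


def smallest_sufficient_S(n, m, H, eps, delta, a=2.0, c=1.0):
    """Smallest S whose union bound is at most delta (doubling + bisection).

    Bisection is valid when a >= 1 and delta < 1, because the values of S
    that fail the test then form an initial segment of the integers.
    """
    target = math.log(delta)
    hi = 1
    while log_union_bound(hi, n, m, H, eps, a, c) > target:
        hi *= 2
    lo = hi // 2  # lo is 0 or a value already known to fail
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if log_union_bound(mid, n, m, H, eps, a, c) <= target:
            hi = mid
        else:
            lo = mid
    return hi
```

The sketch is meant qualitatively: the logarithm of the count grows like `log(S + 1)` while the exponent of the assumed tail decreases linearly in `S`, so a sufficient finite `S` always exists under these assumptions.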
Revenue Close to Optimal.
Our next task is to show that with high probability the optimal revenue attainable from by any DSIC auction is upto the same as the optimal revenue attainable from by any DSIC auction, which would imply that the revenue that attains from is close to optimal, as required. Let be the DSIC auction that maximizes the revenue (among such auctions) in expectation over . At the heart of this part of our analysis is the fact that while our algorithm cannot hope to find , we can nonetheless carefully reason about it in our analysis, as it is nonetheless fixed and welldefined (in particular, it does not depend on the drawn samples). Let be the mechanism defined over as follows: for each bidder and item , let be the input bid of bidder for item (a multiple of ), and replace it by a bid independently drawn from the distribution conditioned upon being in the interval ; the auction allocates and charges payments according to the output of when run on these drawn replacement bids. Obviously, the auction is an DSIC auction whose revenue from is identical to that of the auction from , i.e., to the optimal revenue from , so it is enough to show that the revenue of the auction from and from is the same upto with high probability, that is, that Equation 1 also holds for the mechanism with high probability. To do so, we modify the definition of the set to also include the (welldefined even prior to sampling, despite being unknown to our algorithm) mechanism — since the order of magnitude of does not change, the order of magnitude of the number of samples required to guarantee that Equation 1 holds for all auctions in (including ) does not change.
Bayesian Incentive Compatibility.
We conclude our proof overview
by adapting the proof to the more delicate BIC notion of incentive compatibility, thus showing that if in step 2 of our algorithm we take an BIC (rather than DSIC) and IR auction that maximizes the revenue from the product of the rounded empirical distributions, then there exists such that the auction output by our algorithm is, with probability at least , an BIC auction whose revenue from is uptoclose to the maximum revenue attainable from by any BIC auction (and therefore, by Theorem 2, uptoclose to the revenue attainable from this distribution by any BIC auction)
(2) 
We note that for every mechanism , we require that Equation 2 hold for distinct combinations of of and . Crucially, this number does not depend on . So, the number of instances of Equation 2 that we would like to hold simultaneously with high probability is ,
and so we have instances of either Equation 1 or Equation 2 that we would like to hold simultaneously with high probability.
5 Proof of Main Result
In this Section, we give the full details of the proof of our main result, Theorem 4. The proofs of supporting Lemmas are relegated to Appendix B.
Proof of Theorem 4.
We assume that for every and , we have (to be determined later) independent samples from . Algorithm 1 presents our learning algorithm, which is similar in nature to the one presented in Devanur et al. (2016) for certain single-parameter environments; however, the analysis that we use to show that it does not overfit the samples is completely different (even for single-parameter environments, where our analysis holds for arbitrary allocation constraints).
We now analyze Algorithm 1. Note that for every , we have that independently.
Let be the set of all product distributions where each is the uniform distribution over some multiset of values from . Let be the set of all mechanisms of the form OptimizationOracle() for all . At the heart of our analysis is the observation that . (Crucially, this expression has only in the base and not in the exponent!) Indeed, for every , for every , and for every integer multiple of in (there are many such values), the probability of this value in can be any of the values . (The inequality on is strict since, for example, not all probabilities can be simultaneously.) Therefore, .
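As a rough illustration of the objects counted above, the following Python sketch builds the rounded empirical distribution for a single bidder-item pair. It assumes that rounding means rounding down to an integer multiple of a grid width (called `eps` here); the function name and the choice to key the result by grid index are illustrative and not taken from the paper.

```python
import math
from collections import Counter
from fractions import Fraction

def rounded_empirical(samples, eps):
    """Uniform distribution over the samples after rounding each one down
    to a multiple of eps (assumed rounding rule).

    Returns a dict mapping grid index k (the value k * eps) to its
    empirical probability, kept exact as a Fraction.
    """
    t = len(samples)
    counts = Counter(math.floor(s / eps) for s in samples)
    return {k: Fraction(c, t) for k, c in sorted(counts.items())}

# Hypothetical samples for one bidder-item pair.
samples = [0.3, 0.6, 0.8, 0.3]
dist = rounded_empirical(samples, eps=0.25)
```

Since the result is uniform over a multiset of `len(samples)` values, every probability it assigns is a multiple of `1 / len(samples)`, so for a fixed grid only finitely many such distributions exist.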
Let be the IR and BIC auction for that maximizes the expected revenue (among such auctions) from (our learning Algorithm cannot hope to find , but in our analysis we may carefully reason about it, as it is nonetheless well defined; in particular, it does not depend on ). Let be the (randomized) mechanism defined over as follows: let be an input valuation; for each , independently draw ; let ; the allocation of is , and the payment of each bidder is .
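The resampling construction can be sketched in Python for one bidder and one item. This is only an illustration under stated assumptions: the cell for grid index `k` is taken to be the half-open interval from `k * eps` to `(k + 1) * eps`, the true distribution is replaced by a small finite stand-in, and `posted_price`, `wrap`, and `resample_in_cell` are hypothetical names.

```python
import math
import random

def resample_in_cell(dist, k, eps, rng):
    """Draw from `dist` conditioned on the value lying in [k*eps, (k+1)*eps).

    `dist` is a finite list of (value, probability) pairs, a stand-in
    (assumption) for the true distribution, which the learning algorithm
    does not know.
    """
    cell = [(v, p) for v, p in dist if math.floor(v / eps) == k]
    if not cell:
        raise ValueError("the distribution puts no mass on this cell")
    values, weights = zip(*cell)
    return rng.choices(values, weights=weights, k=1)[0]

def wrap(base_mechanism, dist, eps, rng):
    """Mechanism on rounded bids: draw a replacement bid, then run the base."""
    return lambda k: base_mechanism(resample_in_cell(dist, k, eps, rng))

def posted_price(bid, price=0.35):
    """Toy base mechanism (hypothetical): sell at a fixed price."""
    sold = bid >= price
    return (sold, price if sold else 0.0)

rng = random.Random(0)
dist = [(0.3, 0.5), (0.4, 0.25), (0.8, 0.25)]
eps = 0.25
rounded_mechanism = wrap(posted_price, dist, eps, rng)
draws = [resample_in_cell(dist, 1, eps, rng) for _ in range(50)]
outcome_high = rounded_mechanism(3)
```

When the rounded bid is itself distributed as the rounded true distribution, the replacement bid is distributed as the true distribution, which is why the wrapped mechanism inherits the base mechanism's revenue.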
Lemma 5.1.
is an IR and BIC mechanism for , whose expected revenue from is smaller than the expected revenue of from .
We will choose so that with probability at least , both of the following simultaneously hold for all mechanisms :

(3)

For every agent and types :
(4)
We note that for every mechanism , we require that Equation 4 hold for distinct combinations of of and . Crucially, this number does not depend on .
By Theorem 3 (with , and note that any mechanism’s revenue is bounded by ), we have that for each mechanism separately Equation 3 holds with probability at least , and for each combination of separately Equation 4 holds with probability at least .
Choosing so that each of these probabilities is at least , we obtain that both Equation 3 holds simultaneously for all mechanisms and Equation 4 holds simultaneously for all combinations with probability at least . We now estimate . Since , we have that it is enough to take such that
Therefore,
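The union-bound step can be illustrated numerically. The sketch below assumes that Theorem 3 is a Hoeffding-type concentration bound for averages of independent quantities lying in an interval of width `range_width`; that reading, the function name, and all parameter names are assumptions rather than the paper's statement. It also treats the number of events as fixed, whereas in the proof that number itself depends polynomially on the number of samples, so the actual inequality is implicit.

```python
import math

def samples_for_union_bound(range_width, eps, delta, num_events):
    """Smallest t with num_events * 2 * exp(-2 * t * eps**2 / range_width**2) <= delta.

    Two-sided Hoeffding bound per event (assumed form of Theorem 3),
    followed by a union bound over num_events events.
    """
    per_event = delta / num_events
    return math.ceil(range_width ** 2 * math.log(2 / per_event) / (2 * eps ** 2))

def failure_bound(t, range_width, eps, num_events):
    """Union bound on the probability that some event deviates by eps or more."""
    return num_events * 2 * math.exp(-2 * t * eps ** 2 / range_width ** 2)

t_single = samples_for_union_bound(1.0, 0.1, 0.05, 1)
t_many = samples_for_union_bound(1.0, 0.1, 0.05, 100)
```

The number of events enters only through a logarithm, which is why a count that is polynomial in the number of samples still leads to a polynomial sample bound.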
Let be the output of the call to OptimizationOracle in Algorithm 1, and let be the final output of the Algorithm (the output of EmpiricalOptimize).
Lemma 5.2.
If Equation 4 holds for each combination of , then:

is a BIC mechanism for .

The expected revenue of from is at most smaller than the expected revenue of from .

is a BIC mechanism for .
Lemma 5.3.
is an IR mechanism whose expected revenue from is identical to the expected revenue of from . Furthermore, if is BIC for , then is BIC for .
6 From Approximate to Exact Incentive Compatibility
In this Section, we derive sample complexity results for exact incentive compatibility for the special cases of a single bidder (Theorem 5) or a single good / single-parameter setting (Theorem 6). As mentioned in the introduction, whether this can also be done for more general settings remains an open question.
6.1 One Bidder
In this Section, we will prove Theorem 5.
For a single bidder, we rely on the following Theorem, which to the best of our knowledge first implicitly appeared in Balcan et al. (2005), where it is attributed to Nisan.
Theorem 7 (Nisan, circa 2005).
Let be an IR and IC
For completeness, we provide a proof of this Theorem. The idea is that discounting more expensive priced outcomes more heavily makes sure that incentives do not drive the buyer toward a much cheaper priced outcome. More concretely, due to the auction being only IC, the utility of a buyer from choosing a cheaper priced outcome can be higher by at most . Since for any priced outcome whose price is cheaper by more than a compared to the buyer’s original priced outcome, the given discount is smaller by more than , this smaller discount more than eliminates any potential utility gain due to choosing the cheaper priced outcome, so such a cheaper priced outcome would not become the most-preferred one.
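The discounting idea can be checked on a toy menu. The sketch below is a minimal illustration and not the paper's construction: it assumes a multiplicative discount in which every price is scaled by `1 - eta`, it fixes one buyer type and lists each priced outcome as a (value to this buyer, price) pair, and the numbers and names are hypothetical. Under these assumptions, if the original assignment is within `eps` of the buyer's best utility, then the outcome chosen after discounting has an original price of at least the assigned price minus `eps / eta`.

```python
def choose_after_discount(menu, eta):
    """Index of the entry the buyer prefers once each price is scaled by (1 - eta).

    `menu` is a list of (value, price) pairs for one fixed buyer type.
    """
    return max(range(len(menu)), key=lambda i: menu[i][0] - (1 - eta) * menu[i][1])

def price_floor(assigned_price, eps, eta):
    """Lower bound on the original price of the entry chosen after discounting."""
    return assigned_price - eps / eta

eps = 0.04   # slack in incentive compatibility (hypothetical value)
eta = 0.2    # discount rate; the square root of eps, chosen for illustration

# Entry 0 is what the original mechanism assigns (utility 0.10).
# Entries 1 and 2 give utilities 0.13 and 0.13, each within eps of 0.10.
menu = [(1.00, 0.90), (0.33, 0.20), (0.91, 0.78)]
assigned = 0

chosen = choose_after_discount(menu, eta)
chosen_price = menu[chosen][1]
payment = (1 - eta) * chosen_price
```

Here the buyer moves to entry 2, whose price is only slightly lower, while the much cheaper entry 1 loses too much of the discount to be attractive.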
Proof.
Fix a type for the bidder. Let be the priced outcome (a distribution over priced outcomes, i.e., a random variable, if is randomized) according to when the bidder has type . It is enough to show that the bidder pays at least in expectation according to when he has type . (We denote the expected price of, e.g., by .) Let be a possible priced outcome of , and let be the priced outcome of that corresponds to it. We will show that if , then the bidder strictly prefers the priced outcome of that corresponds to over (and so does not choose in , completing the proof). Indeed, since in this case , we have that