Shaddin Dughmi was supported in part by NSF CAREER Award CCF-1350900. Jason Hartline was supported in part by NSF grant CCF-0830773. Robert Kleinberg was partially supported by NSF grant CCF-1512964. Portions of this work were completed while Dr. Kleinberg was a researcher at Microsoft Research New England. Rad Niazadeh was supported by NSF grant CCF-1512964 and by a Google Ph.D. Fellowship. Portions of this work were completed while he was an intern at Microsoft Research Redmond.

# Bernoulli Factories and Black-Box Reductions in Mechanism Design

## Abstract

We provide a polynomial time reduction from Bayesian incentive compatible mechanism design to Bayesian algorithm design for welfare maximization problems. Unlike prior results, our reduction achieves exact incentive compatibility for problems with multi-dimensional and continuous type spaces.

The key technical barrier preventing exact incentive compatibility in prior black-box reductions is that repairing violations of incentive constraints requires understanding the distribution of the mechanism’s output, which is typically #P-hard to compute. Reductions that instead estimate the output distribution by sampling inevitably suffer from sampling error, which typically precludes exact incentive compatibility.

We overcome this barrier by employing and generalizing the computational model in the literature on Bernoulli Factories. In a Bernoulli factory problem, one is given a function mapping the bias of an “input coin” to that of an “output coin”, and the challenge is to efficiently simulate the output coin given only sample access to the input coin. Consider a generalization which we call the expectations from samples computational model, in which a problem instance is specified by a function mapping the expected values of a set of input distributions to a distribution over outcomes. The challenge is to give a polynomial time algorithm that exactly samples from the distribution over outcomes given only sample access to the input distributions.

In this model, we give a polynomial time algorithm for the function given by exponential weights: expected values of the input distributions correspond to the weights of alternatives and we wish to select an alternative with probability proportional to an exponential function of its weight. This algorithm is the key ingredient in designing an incentive compatible mechanism for bipartite matching, which can be used to make the approximately incentive compatible reduction of Hartline et al. (2015) exactly incentive compatible.

## 1 Introduction

We resolve a five-year-old open question from Hartline et al. (2011, 2015): There is a polynomial time reduction from Bayesian incentive compatible mechanism design to Bayesian algorithm design for welfare maximization problems. The key distinction between our result and those of Hartline et al. (2011, 2015) is that both (a) the agents’ preferences can be multi-dimensional and from a continuous space (rather than single-dimensional or from a discrete space), and (b) the resulting mechanism is exactly Bayesian incentive compatible (rather than approximately Bayesian incentive compatible).

A mechanism solicits preferences from agents, i.e., how much each agent prefers each outcome, and then chooses an outcome. Incentive compatibility of a mechanism requires that, though agents could misreport their preferences, it is not in any agent’s best interest to do so. A quintessential research problem at the intersection of mechanism design and approximation algorithms is to identify black-box reductions from approximation mechanism design to approximation algorithm design. The key algorithmic property that makes a mechanism incentive compatible is that, from any individual agent’s perspective, it must be maximal-in-range: the outcome selected maximizes the agent’s utility less some cost that is a function of the outcome (e.g., this cost function can depend on other agents’ reported preferences).

The black-box reductions from Bayesian mechanism design to Bayesian algorithm design in the literature are based on obtaining an understanding of the distribution of outcomes produced by the algorithm through simulating the algorithm on samples from agents’ preferences. Notice that, even for structurally simple problems, calculating the exact probability that a given outcome is selected by an algorithm can be #P-hard. For example, Hartline et al. (2015) show such a result for calculating the probability that a matching in a bipartite graph is optimal, for a simple explicitly given distribution of edge weights. On the other hand, a black-box reduction for mechanism design must produce exactly maximal-in-range outcomes merely from samples. This challenge motivates new questions for algorithm design from samples.

#### The Expectations from Samples Model.

In traditional algorithm design, the inputs are specified to the algorithm exactly. In this paper, we formulate the expectations from samples model. This model calls for drawing an outcome from a distribution that is a precise function of the expectations of some random sources that are given only by sample access. Formally, a problem for this model is described by a function $f : [0,1]^m \to \Delta(\mathcal{O})$, where $\mathcal{O}$ is an abstract set of feasible outcomes and $\Delta(\mathcal{O})$ is the family of probability distributions over $\mathcal{O}$. For any $m$ input distributions on support $[0,1]$ with unknown expectations $x = (x_1, \ldots, x_m)$, an algorithm for such a problem, with only sample access to each of the input distributions, must produce a sample outcome from $\mathcal{O}$ that is distributed exactly according to $f(x)$.

Producing an outcome that is approximately drawn according to the desired distribution can typically be done from estimates of the expectations formed from sample averages (a.k.a., Monte Carlo sampling). On the other hand, exact implementation of many natural functions is either impossible for information theoretic reasons or requires sophisticated techniques. Impossibility generally follows, for example, when $f$ is discontinuous. The literature on Bernoulli Factories (e.g., Keane and O’Brien, 1994), which inspires our generalization to the expectations from samples model and provides some of the basic building blocks for our results, considers the special case where the input distribution and output distribution are both Bernoullis (i.e., supported on $\{0,1\}$).

We propose and solve two fundamental problems for the expectations from samples model. The first problem considers the biases $x_1, \ldots, x_m$ of $m$ Bernoulli random variables as the marginal probabilities of a distribution on $\{1, \ldots, m\}$ (i.e., $x$ satisfies $\sum_i x_i = 1$) and asks to sample from this distribution. We develop an algorithm that we call the Bernoulli Race to solve this problem.

The second problem corresponds to the “soft maximum” problem given by a regularizer that is a multiple of the Shannon entropy function $H(x) = -\sum_i x_i \ln x_i$. The marginal probabilities on outcomes that maximize the expected value of the distribution over outcomes plus the entropy regularizer are given by exponential weights, i.e., the function outputs $i$ with probability proportional to $e^{\lambda x_i}$. A straightforward exponentiation and then reduction to the Bernoulli Race above does not have polynomial sample complexity. We develop an algorithm that we call the Fast Exponential Bernoulli Race to solve this problem.

#### Black-box Reductions in Mechanism Design.

A special case of the problem that we must solve to apply the standard approach to black-box reductions is the single-agent multiple-urns problem. In this setting, a single agent faces a set of urns, and each urn contains a random object whose distribution is unknown, but can be sampled. The agent’s type determines his utility for each object; fixing this type, urn $j$ is associated with a random real-valued reward with unknown expectation $v_j$. Our goal is to allocate the agent his favorite urn, or close to it.

As described above, incentive compatibility requires an algorithm for selecting a high-value urn that is maximal-in-range. If we could exactly calculate the expected values from the agent’s type, this problem is trivial both algorithmically and from a mechanism design perspective: simply solicit the agent’s type, then allocate him the urn with the maximum $v_j$. As described above, with only sample access to the expected values of each urn, we cannot implement the exact maximum. Our solution is to apply the Fast Exponential Bernoulli Race as a solution to the regularized maximization problem in the expectations from samples model. This algorithm – with only sample access to the agent’s values for each urn – will assign the agent to a random urn with a high expected value and is maximal-in-range.

The multi-agent reduction from Bayesian mechanism design to Bayesian algorithm design of Hartline et al. (2011, 2015) is based on solving a matching problem between multiple agents and outcomes, where an agent’s value for an outcome is the expectation of a random variable which can be accessed only through sampling. Specifically, this problem generalizes the above-described single-agent multiple-urns problem to the problem of matching agents to urns with the goal of approximately maximizing the total weight of the matching (the social welfare). Again, for incentive compatibility we require this expectations from samples algorithm to be maximal-in-range from each agent’s perspective. Using methods from Agrawal and Devanur’s (2015) work on stochastic online convex optimization, we reduce this matching problem to the single-agent multiple-urns problem.

As stated in the opening paragraph, our main result – obtained through the approach outlined above – is a polynomial time reduction from Bayesian incentive compatible mechanism design to Bayesian algorithm design. The analysis assumes that agents’ values are normalized to the interval $[0,1]$ and gives an additive $\epsilon$ loss in the welfare. The reduction is an approximation scheme and its runtime depends polynomially on $1/\epsilon$. The reduction depends polynomially on a suitable notion of the size of the space of agent preferences. For example, applied to environments where agents have preferences that lie in high-dimensional spaces, the runtime of the reduction depends polynomially on the number of points necessary to approximately cover each agent’s space of preferences. More generally, the bounds we obtain are polynomial in the bounds of Hartline et al. (2011, 2015), but the resulting mechanism, unlike in the preceding work, is exactly Bayesian incentive compatible.

#### Organization.

The organization of the paper separates the development of the expectations from samples model and its application to black-box reductions in Bayesian mechanism design. Section 2 introduces Bernoulli factories and reviews basic results from the literature. Section 3 defines two central problems in the expectations from samples model, sampling from outcomes with linear weights and sampling from outcomes with exponential weights, and gives algorithms for solving them. We return to mechanism design problems in Section 4 and solve the single-agent multiple urns problem. In Section 5 we give our main result, the reduction from Bayesian mechanism design to Bayesian algorithm design.

## 2 Basics of Bernoulli Factories

We use the terms Bernoulli and coin interchangeably to refer to distributions over $\{0,1\}$. The Bernoulli factory problem is about generating new coins from old ones.

###### Definition 2.1 (Keane and O’Brien, 1994).

Given a function $f : (0,1) \to (0,1)$, the Bernoulli factory problem is to output a sample of a Bernoulli variable with bias $f(p)$ (i.e., an $f(p)$-coin), given black-box access to independent samples of a Bernoulli distribution with bias $p$ (i.e., a $p$-coin).

To illustrate the Bernoulli factory model, consider the examples $f(p) = p^2$ and $f(p) = e^{p-1}$. For the former, it is enough to flip the $p$-coin twice and output $1$ if both flips are $1$, and $0$ otherwise. For the latter, the Bernoulli factory is still simple but more interesting: draw $k$ from the Poisson distribution with parameter $1$, flip the $p$-coin $k$ times, and output $1$ if all coin flips were $1$, and $0$ otherwise; this is correct because $\mathbf{E}_k[p^k] = e^{p-1}$ when $k$ is Poisson with parameter $1$.
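These two constructions are simple enough to state in code. The following is a minimal Python sketch (the helper names are ours, not from the paper); the Poisson draw is implemented by counting unit-rate exponential arrivals before time $1$.

```python
import math
import random

def coin(p):
    """One flip of a p-coin: 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

def factory_square(p_coin):
    """Bernoulli factory for f(p) = p^2: heads iff two independent flips are both heads."""
    return p_coin() & p_coin()

def factory_exp(p_coin):
    """Bernoulli factory for f(p) = e^(p-1): draw k ~ Poisson(1) and output 1
    iff all k flips of the p-coin come up 1, since E[p^k] = e^(p-1)."""
    k, t = 0, random.expovariate(1.0)
    while t < 1.0:  # count unit-rate arrivals before time 1, so k ~ Poisson(1)
        k += 1
        t += random.expovariate(1.0)
    return 1 if all(p_coin() for _ in range(k)) else 0
```

Note that both factories consume only black-box flips of the input coin, never the numeric value of $p$ itself, which is the defining constraint of the model.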

The question of characterizing functions $f$ for which there is an algorithm for sampling $f(p)$-coins from $p$-coins has been the main subject of interest in this literature (Keane and O’Brien, 1994; Nacu and Peres, 2005). In particular, Keane and O’Brien (1994) provide necessary and sufficient conditions on $f$ under which a Bernoulli factory exists. Moreover, Nacu and Peres (2005) suggest an algorithm for simulating an $f(p)$-coin based on polynomial envelopes of $f$. The canonical challenging problem of Bernoulli factories – and a primitive in the construction of more general Bernoulli factories – is the Bernoulli Doubling problem: $f(p) = 2p$ for $p \in (0, \frac{1}{2})$. See Łatuszyński (2010) for a survey on this topic.

Questions in Bernoulli factories can be generalized to multiple input coins. Given $f : (0,1)^m \to (0,1)$, the goal is to sample from a Bernoulli with bias $f(p_1, \ldots, p_m)$ given sample access to $m$ independent Bernoulli variables with unknown biases $p_1, \ldots, p_m$. Linear functions $f$ were studied and solved by Huber (2015). For example, the special case $m = 2$ and $f(p_1, p_2) = p_1 + p_2$, a.k.a. Bernoulli Addition, can be solved by reduction to the Bernoulli Doubling problem (formalized below).

Questions in Bernoulli factories can be generalized to allow input distributions over real numbers on the unit interval (rather than Bernoullis over $\{0,1\}$). In this generalization the question is to produce a Bernoulli with bias $f(x)$ with sample access to draws from a distribution supported on $[0,1]$ with expectation $x$. These problems can be easily solved by reduction to the Bernoulli factory problem:

1. Continuous to Bernoulli: Can implement a Bernoulli with bias $x$ with one sample from a distribution $F$ supported on $[0,1]$ with expectation $x$. Algorithm:

• Draw $v \sim F$ and $u \sim U[0,1]$.

• Output $\mathbf{1}[u \le v]$.

Below are enumerated the important building blocks for Bernoulli factories.

1. Bernoulli Down Scaling: Can implement $f(p) = cp$ for constant $c \in [0,1]$ with one sample from the $p$-coin. Algorithm:

• Draw $A \sim p$-coin and $B \sim c$-coin.

• Output $A \wedge B$ (i.e., $1$ if both coins are $1$, otherwise $0$).

2. Bernoulli Doubling: Can implement $f(p) = 2p$ for $p \in [0, \frac{1-\epsilon}{2}]$ with $O(1/\epsilon)$ samples from the $p$-coin in expectation. The algorithm is complicated, see Nacu and Peres (2005).

3. Bernoulli Probability Generating Function: Can implement $f(p) = \mathbf{E}_{k \sim D}[p^k]$ for a distribution $D$ over non-negative integers with $\mathbf{E}_{k \sim D}[k]$ samples from the $p$-coin in expectation. Algorithm:

• Draw $k \sim D$ and $A_1, \ldots, A_k \sim p$-coin (i.e., $k$ samples).

• Output $A_1 \wedge \cdots \wedge A_k$ (i.e., $1$ if all coins are $1$, otherwise $0$).

4. Bernoulli Exponentiation: Can implement $f(p) = e^{\lambda(p-1)}$ for the $p$-coin and non-negative constant $\lambda$ with $\lambda$ samples from the $p$-coin in expectation. Algorithm: Apply the Bernoulli Probability Generating Function algorithm for the Poisson distribution with parameter $\lambda$.

5. Bernoulli Averaging: Can implement $f(p_1, p_2) = \frac{p_1 + p_2}{2}$ with one sample from the $p_1$-coin or the $p_2$-coin. Algorithm:

• Draw $I \sim U\{1, 2\}$ and $A \sim p_I$-coin.

• Output $A$.

6. Bernoulli Addition: Can implement $f(p_1, p_2) = p_1 + p_2$ for $p_1 + p_2 \le 1 - \epsilon$ with $O(1/\epsilon)$ samples from the $p_1$- and $p_2$-coins in expectation. Algorithm: Apply Bernoulli Doubling to Bernoulli Averaging.

It may seem counterintuitive that Bernoulli Doubling is much more challenging than Bernoulli Down Scaling. Notice, however, that for a coin with bias $p = \frac{1}{2}$, Bernoulli Doubling with a finite number of coin flips is impossible. The doubled coin must be deterministically heads, while any finite sequence of flips of the $p$-coin has non-zero probability of occurring. On the other hand, a coin with bias $\frac{1}{2} - \delta$ for some small $\delta > 0$ has a similar probability of each sequence, but Bernoulli Doubling must sometimes output tails. Thus, Bernoulli Doubling must require a number of coin flips that goes to infinity as $\delta$ goes to zero.
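The simpler building blocks above translate directly into code. A Python sketch (our naming) of Continuous to Bernoulli, Down Scaling, and Averaging:

```python
import random

def coin(p):
    """One flip of a p-coin: 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

def continuous_to_bernoulli(draw):
    """Continuous to Bernoulli: one draw v in [0,1] from a distribution with
    expectation x yields a coin with bias exactly x (output 1 iff u <= v)."""
    v = draw()
    return 1 if random.random() <= v else 0

def down_scale(p_coin, c):
    """Bernoulli Down Scaling: bias c*p from one flip of the p-coin,
    AND-ed with an independent c-coin."""
    return p_coin() & coin(c)

def average(p1_coin, p2_coin):
    """Bernoulli Averaging: bias (p1+p2)/2 from a single flip of one
    input coin, chosen by a fair coin."""
    return p1_coin() if random.random() < 0.5 else p2_coin()
```

Bernoulli Doubling (and hence Addition) admits no such one-line implementation; see Nacu and Peres (2005).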

## 3 The Expectations from Samples Model

The expectations from samples model is a combinatorial generalization of the Bernoulli factory problem. The goal is to select an outcome from a distribution that is a function of the expectations of a set of input distributions. These input distributions can be accessed only by sampling.

###### Definition 3.1.

Given a function $f : [0,1]^m \to \Delta(\mathcal{O})$ for outcome space $\mathcal{O}$, the expectations from samples problem is to output a sample from $f(x)$ given black-box access to independent samples from $m$ distributions supported on $[0,1]$ with expectations $x = (x_1, \ldots, x_m)$.

Without loss of generality, by the Continuous to Bernoulli construction of Section 2, the input random variables can be assumed to be Bernoullis and, thus, the expectations from samples model can be viewed as a generalization of the Bernoulli factory question to output spaces beyond $\{0,1\}$. In this section we propose and solve two fundamental problems for the expectations from samples model. In these problems the outcome space is a finite set $\{1, \ldots, m\}$ and the input distributions are Bernoulli distributions with biases $v_1, \ldots, v_m$.

In the first problem, the biases correspond to the marginal probabilities with which each of the $m$ outcomes should be selected. The goal is to produce a random outcome $i$ from $\{1, \ldots, m\}$ so that the probability of $i$ is exactly its marginal probability $v_i$. More generally, if the biases do not sum to one, this is equivalently the problem of random selection with linear weights.

The second problem we solve corresponds to a regularized maximization problem, or specifically random selection with exponential weights. For this problem the biases of the Bernoulli input distributions correspond to the weights of the outcomes. The goal is to produce a random $i$ from $\{1, \ldots, m\}$ according to the distribution given by exponential weights, i.e., the probability of selecting $i$ is $\frac{e^{\lambda v_i}}{\sum_j e^{\lambda v_j}}$.

### 3.1 Random Selection with Linear Weights

###### Definition 3.2 (Random Selection with Linear Weights).

The random selection with linear weights problem is to sample from the probability distribution defined by $\Pr[i] = \frac{v_i}{\sum_j v_j}$ for each $i$ in $\{1, \ldots, m\}$ with only sample access to distributions with expectations $v_1, \ldots, v_m$.

We solve the random selection with linear weights problem by an algorithm that we call the Bernoulli race (Algorithm 1). The algorithm repeatedly picks a coin uniformly at random and flips it. The winning coin is the first one to come up heads in this process.

###### Theorem 3.1.

The Bernoulli Race (Algorithm 1) samples with linear weights (Definition 3.2) with an expected $\frac{m}{\sum_j v_j}$ samples from input distributions with biases $v_1, \ldots, v_m$.

###### Proof.

At each iteration, the algorithm terminates if the flipped coin outputs $1$ and iterates otherwise. Since the coin is chosen uniformly at random, the probability of termination at each iteration is $\frac{1}{m}\sum_j v_j$. The total number of iterations (and number of samples) is therefore a geometric random variable with expectation $\frac{m}{\sum_j v_j}$.

The selected outcome also follows the desired distribution, as shown below.

$$\Pr[i \text{ is selected}] = \sum_{k=1}^{\infty} \Pr[i \text{ is selected at time } k] \cdot \Pr[\text{algorithm reaches time } k] = \frac{v_i}{m} \sum_{k=1}^{\infty} \Bigl(1 - \frac{1}{m}\sum_j v_j\Bigr)^{k-1} = \frac{v_i/m}{\frac{1}{m}\sum_j v_j} = \frac{v_i}{\sum_j v_j}. \qquad ∎$$
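Since Algorithm 1 is not reproduced in this excerpt, here is a Python sketch of the Bernoulli Race as described above (interface names are ours): coins are zero-argument samplers, and the first coin to come up heads wins.

```python
import random

def bernoulli_race(coins):
    """Bernoulli Race: repeatedly pick an index uniformly at random and flip
    that coin; return the index of the first flip to come up heads.
    Outputs i with probability v_i / sum_j v_j."""
    m = len(coins)
    while True:
        i = random.randrange(m)
        if coins[i]():
            return i
```

Each iteration consumes exactly one sample, so the expected sample complexity matches the geometric bound in the proof.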

### 3.2 Random Selection with Exponential Weights

###### Definition 3.3 (Random Selection with Exponential Weights).

For parameter $\lambda \ge 0$, the random selection with exponential weights problem is to sample from the probability distribution defined by $\Pr[i] = \frac{e^{\lambda v_i}}{\sum_j e^{\lambda v_j}}$ for each $i$ in $\{1, \ldots, m\}$ with only sample access to distributions with expectations $v_1, \ldots, v_m$.

The Basic Exponential Bernoulli Race, below, samples from the exponential weights distribution. The algorithm follows the paradigm of picking one of the input distributions, exponentiating it, sampling from the exponentiated distribution, and repeating until one comes up heads. While this algorithm does not generally run in polynomial time, it is a building block for one that does.

###### Theorem 3.2.

The Basic Exponential Bernoulli Race (Algorithm 2) samples with exponential weights (Definition 3.3) with an expected $\frac{\lambda m}{\sum_j e^{\lambda(v_j - 1)}} \le \lambda m \, e^{\lambda(1 - \max_j v_j)}$ samples from input distributions with biases $v_1, \ldots, v_m$ and parameter $\lambda$.

###### Proof.

The correctness and runtime follows from the correctness and runtimes of Bernoulli Exponentiation and the Bernoulli Race. ∎
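Concretely, this composition can be sketched by chaining Bernoulli Exponentiation (the Poisson trick) with the Bernoulli Race. The following Python version is our own rendering of that paradigm, not Algorithm 2 verbatim:

```python
import random

def poisson(lam):
    """Sample Poisson(lam) by counting unit-rate exponential arrivals before time lam."""
    k, t = 0, random.expovariate(1.0)
    while t < lam:
        k += 1
        t += random.expovariate(1.0)
    return k

def exp_coin(p_coin, lam):
    """Bernoulli Exponentiation: a coin with bias e^(lam*(p-1)):
    heads iff all of k ~ Poisson(lam) flips of the p-coin are heads."""
    return 1 if all(p_coin() for _ in range(poisson(lam))) else 0

def basic_exponential_race(coins, lam):
    """Pick a coin uniformly, flip its exponentiated version, and repeat
    until heads; outputs i with probability proportional to e^(lam*v_i)."""
    m = len(coins)
    while True:
        i = random.randrange(m)
        if exp_coin(coins[i], lam):
            return i
```

When every bias is bounded away from one, each exponentiated coin comes up heads with probability $e^{-\Omega(\lambda)}$; this is exactly the source of the exponential runtime that Section 3.3 removes.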

### 3.3 The Fast Exponential Bernoulli Race

Sampling from exponential weights is typically used as a “soft maximum” where the parameter $\lambda$ controls how close the selected outcome is to the true maximum. For such an application, exponential dependence on $\lambda$ in the runtime would be prohibitive. Unfortunately, when $\max_j v_j$ is bounded away from one, the runtime of the Basic Exponential Bernoulli Race (Algorithm 2; Theorem 3.2) is exponential in $\lambda$. A simple observation allows the resolution of this issue: the exponential weights distribution is invariant to any uniform additive shift of all weights. This section applies this idea to develop the Fast Exponential Bernoulli Race.

Observe that for any given parameter $\epsilon > 0$, we can easily implement a Bernoulli random variable whose bias is within an additive $\epsilon$ of $\max_j v_j$. Note that, unlike the other algorithms in this section, a precise relationship between this bias and $\max_j v_j$ is not required.

###### Lemma 3.3.

For parameter $\epsilon > 0$, there is an algorithm for sampling from a Bernoulli random variable with bias $\tilde{v}$, where $|\tilde{v} - \max_j v_j| \le \epsilon$, with $O\!\left(\frac{m}{\epsilon^2}\log\frac{m}{\epsilon}\right)$ samples from input distributions with biases $v_1, \ldots, v_m$.

###### Proof.

The algorithm is as follows: Sample $O\!\left(\frac{1}{\epsilon^2}\log\frac{m}{\epsilon}\right)$ times from each of the $m$ coins, let $\hat{v}_j$ be the empirical estimate of coin $j$’s bias obtained by averaging, then apply the Continuous to Bernoulli algorithm (Section 2) to map $\max_j \hat{v}_j$ to a Bernoulli random variable.

Standard tail bounds imply that $|\max_j \hat{v}_j - \max_j v_j| \le \epsilon/2$ with probability at least $1 - \epsilon/2$, and therefore the bias $\tilde{v} = \mathbf{E}[\max_j \hat{v}_j]$ satisfies $|\tilde{v} - \max_j v_j| \le \epsilon$. ∎
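In code, this estimator is just an empirical maximum fed through Continuous to Bernoulli; a Python sketch (our naming):

```python
import random

def max_estimate_coin(coins, n):
    """Flip each coin n times, take the largest empirical average v_hat, and
    emit one Bernoulli sample with that bias via Continuous to Bernoulli.
    The emitted coin's bias is E[max over coins of v_hat], which
    concentrates near the true maximum bias as n grows."""
    v_hat = max(sum(c() for _ in range(n)) / n for c in coins)
    return 1 if random.random() <= v_hat else 0
```

Note the emitted bias need not equal the true maximum exactly; the lemma only requires it to be within an additive $\epsilon$.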

Since we are interested in a fast exponential Bernoulli race as $\lambda$ grows large, we restrict attention to $\lambda \ge 1$. We set $\epsilon = 1/\lambda$ in the estimation of $\max_j v_j$ (by Lemma 3.3). This estimate will be used to boost the bias of each distribution in the input so that the maximum bias is at least $1 - O(1/\lambda)$. The boosting of the bias is implemented with Bernoulli Addition which, to be fast, requires the cumulative bias to be bounded away from one. Thus, the probabilities are scaled down by a constant factor; this scaling is subsequently counterbalanced by adjusting the parameter $\lambda$. The formal details are given below.

###### Theorem 3.4.

The Fast Exponential Bernoulli Race (Algorithm 3) samples with exponential weights (Definition 3.3) with an expected number of samples from the input distributions that is polynomial in $m$ and $\lambda$.

###### Proof.

The correctness and runtime follow from the correctness and runtimes of the Basic Exponential Bernoulli Race, Bernoulli Doubling, and Lemma 3.3 (for the estimate of $\max_j v_j$), together with the facts that the boosted maximum bias is close to one and that the distribution given by exponential weights is invariant to additive shifts of all weights.

A detailed analysis of the runtime follows. Since the algorithm builds a number of sampling subroutines in a hierarchy, we analyze the runtime of the algorithm and the various subroutines in a bottom-up fashion. Steps 3 and 4 implement a coin whose bias is within an additive $1/\lambda$ of $\max_j v_j$, with runtime polynomial in $m$ and $\lambda$ per sample, as per the bound of Lemma 3.3. The coin implemented in Step 5 is sampled in constant time. Observe that the biases being doubled are bounded away from one, so the runtime bound for Bernoulli Doubling implies that a constant number of samples from the coins of Steps 4 and 5 suffice in expectation; we conclude that each boosted coin can be sampled in time polynomial in $m$ and $\lambda$. Finally, note that after boosting, the maximum bias is within $O(1/\lambda)$ of one; Theorem 3.2 then implies that the Basic Exponential Bernoulli Race samples at most $O(\lambda m)$ times from the boosted coins in expectation; we conclude the claimed runtime. ∎

## 4 The Single-Agent Multiple-Urns Problem

We investigate incentive compatible mechanism design for the single-agent multiple-urns problem. Informally, a mechanism is needed to assign an agent to one of many urns. Each urn contains objects, and the agent’s value for being assigned to an urn is taken in expectation over the objects in the urn. The problem asks for an incentive compatible mechanism with good welfare (i.e., the value of the agent for the assigned urn).

### 4.1 Problem Definition and Notations

A single agent with type $t$ from type space $T$ desires an object from outcome space $\mathcal{O}$. The agent’s value for an outcome $o$ is a function of her type and is denoted by $v(t, o)$. The agent is a risk-neutral quasi-linear utility maximizer, whose utility is her expected value for the randomized outcome less her expected payment. There are $m$ urns. Each urn $j$ is given by a distribution $D_j$ over outcomes in $\mathcal{O}$. If the agent is assigned to urn $j$ she obtains an object drawn from the urn’s distribution $D_j$.

A mechanism can solicit the type of the agent (who may misreport if she desires). We further assume that (1) the mechanism has black-box access to evaluate $v(t, o)$ for any type $t$ and outcome $o$, and (2) the mechanism has sample access to the distribution $D_j$ of each urn $j$. The mechanism may draw objects from urns and evaluate the agent’s reported value for these objects, but then must ultimately assign the agent to a single urn and charge the agent a payment. The urn and payment that the agent is assigned are random variables in the mechanism’s internal randomization and the randomness from the mechanism’s potential samples from the urns’ distributions.

The distribution over urns that the mechanism assigns to the agent, as a function of her type $t$, is denoted by $x(t) = (x_1(t), \ldots, x_m(t))$, where $x_j(t)$ is the marginal probability that the agent is assigned to urn $j$. Denote the expected value of the agent for urn $j$ by $v_j(t) = \mathbf{E}_{o \sim D_j}[v(t, o)]$. The expected welfare of the mechanism is $\sum_j v_j(t)\, x_j(t)$. The expected payment of this agent is denoted by $p(t)$. The agent’s utility for the outcome and payment of the mechanism is given by $\sum_j v_j(t)\, x_j(t) - p(t)$. Incentive compatibility is defined by the agent with type $t$ preferring her outcome and payment to those assigned to another type $t'$.

###### Definition 4.1.

A single-agent mechanism $(x, p)$ is incentive compatible if, for all $t, t' \in T$:

$$\sum_j v_j(t)\, x_j(t) - p(t) \;\ge\; \sum_j v_j(t)\, x_j(t') - p(t') \qquad (1)$$

A multi-agent mechanism is Bayesian incentive compatible (BIC) if equation (1) holds for the outcome of the mechanism in expectation over the truthful reports of the other agents.

### 4.2 Incentive Compatible Approximate Scheme

If the agent’s expected value for each urn is known, or equivalently the mechanism designer knows the distribution $D_j$ of every urn rather than having only sample access, this problem is easy and admits a trivial optimal mechanism: simply select the urn maximizing the agent’s expected value $v_j(t)$ according to her reported type $t$, and charge her a payment of zero. What makes this problem interesting is that the designer is restricted to only sample the agent’s value for an urn. In this case, the following Monte Carlo adaptation of the trivial mechanism is tempting: sample from each urn sufficiently many times to obtain a close estimate $\hat{v}_j$ of $v_j(t)$ with high probability (up to any desired precision), then choose the urn maximizing $\hat{v}_j$ and charge a payment of zero. This mechanism is not incentive compatible, as illustrated by a simple example.

• Consider two urns and three outcomes $a$, $b$, and $c$. Urn $1$ contains only outcome $a$, whereas urn $2$ contains a mixture of outcomes $b$ and $c$, with $b$ slightly more likely than $c$. Now consider an agent who has (true) values $\frac{1}{2}$, $1$, and $0$ for outcomes $a$, $b$, and $c$ respectively, so that she slightly prefers urn $2$. If this agent reports her true type, the trivial Monte Carlo mechanism — instantiated with any desired finite degree of precision — assigns her urn $2$ most of the time, but assigns her urn $1$ with some nonzero probability. The agent gains by misreporting her value for outcome $c$ as $1$, since this guarantees her preferred urn $2$.

The above example might seem counter-intuitive, since the trivial Monte-carlo mechanism appears to be doing its best to maximize the agent’s utility, up to the limits of (unavoidable) sampling error. One intuitive rationalization is the following: an agent can slightly gain by procuring (by whatever means) more precise information about the distributions than that available to the mechanism, and using this information to guide her strategic misreporting of her type. This raises the following question:

#### Question:

Is there an incentive-compatible mechanism for the single-agent multiple-urns problem which achieves welfare within an additive $\epsilon$ of the optimal, and samples only $\mathrm{poly}(m, 1/\epsilon)$ times (in expectation) from the urns?

We resolve the above question in the affirmative. We present an approximation scheme for this problem that is based on our solution to the problem of random selection with exponential weights (Section 3.2). The solution to the single-agent multiple-urns problem is a main ingredient in the Bayesian mechanism that we propose in Section 5 as our black-box reduction mechanism.

To explain the approximation scheme, we start by recalling the following standard theorem in mechanism design.

###### Theorem 4.1.

For outcome rule $x$, there exists a payment rule $p$ so that the single-agent mechanism $(x, p)$ is incentive compatible if and only if $x$ is maximal in range, i.e., $x(t) \in \arg\max_{x^*} \bigl[\sum_j v_j(t)\, x^*_j - c(x^*)\bigr]$ for some cost function $c$.

The payments that satisfy Theorem 4.1 can be easily calculated with black-box access to the outcome rule $x$. For a single-agent problem, this payment can be calculated in two calls to the function $x$, one on the agent’s reported type $t$ and the other on a type randomly drawn from the path between the origin and $t$. Further discussion and details are given in Appendix A. It suffices, therefore, to identify a mechanism that samples from urns and assigns the agent to an urn, that induces an outcome rule $x$ that is good for welfare, i.e., $\sum_j v_j(t)\, x_j(t)$ close to $\max_j v_j(t)$, and that is maximal in range. The following theorem solves the problem.

###### Theorem 4.2.

There is an incentive-compatible mechanism for the single-agent multiple-urns problem which achieves an additive $\epsilon$-approximation to the optimal welfare in expectation, and runs in time polynomial in $m$ and $1/\epsilon$ in expectation.

###### Proof.

Consider the problem of selecting a distribution $x$ over urns to optimize welfare plus (a scaling of) the Shannon entropy function, i.e., $\max_x \sum_j v_j(t)\, x_j + \frac{1}{\lambda} H(x)$ where $H(x) = -\sum_j x_j \ln x_j$. It is well known that the optimizer is given by exponential weights, i.e., the marginal probability of assigning the $j$th urn is proportional to $e^{\lambda v_j(t)}$. In Section 3.3 we gave a polynomial time algorithm for sampling from exponential weights, specifically, the Fast Exponential Bernoulli Race (Algorithm 3). Proper choice of the parameter $\lambda$ trades off faster runtimes with increased welfare loss due to the entropy term. The entropy is maximized at the uniform distribution with entropy $\ln m$. Thus, choosing $\lambda = \frac{\ln m}{\epsilon}$ guarantees that the welfare is within an additive $\epsilon$ of the optimal welfare $\max_j v_j(t)$. The bound of the theorem then follows from the analysis of the Fast Exponential Bernoulli Race (Theorem 3.4) with this choice of $\lambda$. ∎
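The claim that the entropy-regularized welfare maximizer is the exponential-weights distribution is a standard Lagrangian calculation; a sketch in the notation used here, with $H(x) = -\sum_j x_j \ln x_j$ and the constraint $\sum_j x_j = 1$:

```latex
L(x, \mu) \;=\; \sum_j v_j x_j \;-\; \tfrac{1}{\lambda} \sum_j x_j \ln x_j \;+\; \mu \Big( 1 - \sum_j x_j \Big),
\qquad
\frac{\partial L}{\partial x_j} \;=\; v_j - \tfrac{1}{\lambda}\,(\ln x_j + 1) - \mu \;=\; 0
\;\Longrightarrow\;
x_j \;=\; e^{\lambda (v_j - \mu) - 1} \;\propto\; e^{\lambda v_j}.
```

Since $0 \le H(x) \le \ln m$, replacing the true maximum by the regularized optimum loses at most $\frac{\ln m}{\lambda}$ welfare, which motivates the choice of $\lambda$ in the proof above.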

## 5 A Bayesian Incentive Compatible Black-box Reduction

A central question at the interface between algorithms and economics is on the existence of black-box reductions for mechanism design. Given black-box access to any algorithm that maps inputs to outcomes, can a mechanism be constructed that (a) induces agents to truthfully report the inputs and (b) produces an outcome that is as good as the one produced by the algorithm? The mechanism must be computationally tractable, specifically, making no more than a polynomial number of elementary operations and black-box calls to the algorithm.

A line of research initiated by Hartline and Lucier (2010, 2015) demonstrated that, for the welfare objective, Bayesian black-box reductions exist. In the Bayesian setting, agents’ types are drawn from a distribution, and the algorithm is assumed to obtain good welfare for types from this distribution. The constructed mechanism is an approximation scheme; for any $\epsilon > 0$ it gives a mechanism that is Bayesian incentive compatible (Definition 4.1) and obtains a welfare that is at most an additive $\epsilon$ from the algorithm’s welfare. Before formalizing this problem, for further details on Bayesian mechanism design and the notation of this paper, which is based on that of Hartline et al. (2015), we refer the reader to Appendix B.

###### Definition 5.1 (BIC black-box reduction problem).

Given black-box oracle access to an allocation algorithm $\mathcal{A}$, construct an allocation algorithm $\mathcal{A}'$ that is Bayesian incentive compatible; approximately preserves welfare, i.e., any agent’s expected welfare under $\mathcal{A}'$ is at least that under $\mathcal{A}$ less $\epsilon$; and runs in time polynomial in the number of agents and $1/\epsilon$.

In this literature, Hartline and Lucier (2010, 2015) solve the case of single-dimensional agents and Hartline et al. (2011, 2015) solve the case of multi-dimensional agents with discrete type spaces. For the relaxation of the problem where only approximate incentive compatibility is required, Bei and Huang (2011) solve the case of multi-dimensional agents with discrete type space, and Hartline et al. (2011, 2015) solve the general case. These reductions are approximation schemes that are polynomial in the number of agents, the desired approximation factor, and a measure of the size of the agents’ type spaces (e.g., its dimension).

### 5.1 Surrogate Selection and the Replica-Surrogate Matching

A main conclusion of the literature on Bayesian reductions for mechanism design is that the multi-agent problem of reducing Bayesian mechanism design to algorithm design, itself, reduces to a single-agent problem of surrogate selection. Consider any agent in the original problem and the induced algorithm with the inputs from other agents hardcoded as random draws from their respective type distributions. The induced algorithm maps the type of this agent to a distribution over outcomes. If this distribution over outcomes is maximal-in-range then there exist payments for which the induced algorithm is incentive compatible (Theorem 4.1). If not, the problem of surrogate selection is to map the type of the agent to an input to the algorithm to satisfy three properties:

• (a) The composition of surrogate selection and the induced algorithm is maximal-in-range,

• (b) the composition approximately preserves welfare,

• (c) the surrogate selection preserves the type distribution.

Condition (c), a.k.a. stationarity, implies that repairing the non-maximality-in-range of the algorithm for a particular agent does not affect the outcome for any other agent. With such an approach, each agent’s incentive problem can be resolved independently of those of the other agents.

###### Theorem 5.1 (Hartline et al., 2015).

The composition of an algorithm with a profile of surrogate selection rules satisfying conditions (a)–(c) is Bayesian incentive compatible and approximately preserves the algorithm’s welfare (the loss in welfare is the sum of the losses in welfare of each surrogate selection rule).

The surrogate selection rule of Hartline et al. (2015) is based on setting up a matching problem between random types from the distribution (replicas) and the outcomes of the algorithm on random types from the distribution (surrogates). The true type of the agent is one of the replicas, and the surrogate selection rule outputs the surrogate to which this replica is matched. This approach addresses the three properties of surrogate selection rules as follows: (a) if the matching selected is maximal-in-range, then the composition of the surrogate selection rule with the induced algorithm is maximal-in-range; (b) the welfare of the matching is the welfare of the reduction, and the optimal matching approximates the welfare of the original algorithm; and (c) any maximal matching gives a stationary surrogate selection rule. For a detailed discussion of why a maximal-in-range matching results in a BIC mechanism after composing the corresponding surrogate selection rule with the allocation algorithm, we refer the interested reader to Lemma C.1 and Lemma C.2 in Appendix C.

###### Definition 5.2.

The replica-surrogate matching surrogate selection rule, for a $k$-to-$1$ matching algorithm, an integer market size $m$, and load $k$, maps a type to a surrogate type as follows:

1. Pick the real-agent index $r$ uniformly at random from $[km]$.

2. Define the replica type profile, a $km$-tuple of types, by setting the replica type at index $r$ to the agent’s type and sampling the remaining replica types i.i.d. from the type distribution.

3. Sample the surrogate type profile, an $m$-tuple of i.i.d. samples from the type distribution.

4. Run the matching algorithm on the complete bipartite graph between the $km$ replicas and the $m$ surrogates.

5. Output the surrogate that is matched to the real agent’s replica.
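The steps above can be sketched as follows. This is an illustrative rendering, not the paper’s implementation: `sample_type` (a draw from the type distribution) and `matching_alg` (the black-box $k$-to-$1$ matching) are hypothetical stand-ins.

```python
import random

def replica_surrogate_selection(true_type, sample_type, matching_alg, m, k):
    """Sketch of the replica-surrogate matching rule (Definition 5.2).

    `sample_type()` draws one i.i.d. type from the type distribution;
    `matching_alg(replicas, surrogates)` returns, for each replica index,
    the index of the surrogate it is matched to. Both are assumed black boxes.
    """
    # Step 1: pick the real-agent index uniformly from the km replica slots.
    r = random.randrange(k * m)
    # Step 2: replica profile -- the true type in slot r, the rest i.i.d.
    replicas = [sample_type() for _ in range(k * m)]
    replicas[r] = true_type
    # Step 3: surrogate profile -- m i.i.d. samples.
    surrogates = [sample_type() for _ in range(m)]
    # Step 4: run the black-box matching on the complete bipartite graph.
    assignment = matching_alg(replicas, surrogates)
    # Step 5: output the surrogate matched to the real agent's replica.
    return surrogates[assignment[r]]
```

For example, plugging in a trivial round-robin "matching" (replica $i$ to surrogate $i \bmod m$) exercises the plumbing without any claim about incentives.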

The value that a replica obtains for the outcome that the induced algorithm produces for a surrogate, henceforth the surrogate outcome, is a random variable. The analysis of Hartline et al. (2015) is based on the study of an ideal computational model in which the value of any replica for any surrogate outcome is known exactly. In this computationally unrealistic model, with these values as weights, the maximum weight matching algorithm can be employed in the replica-surrogate matching surrogate selection rule above, and it results in a Bayesian incentive compatible mechanism. Hartline et al. (2015) analyze the welfare of the resulting mechanism in the case where the load is $k = 1$, prove that conditions (a)–(c) are satisfied, and give polynomial bounds on the market size $m$ necessary for the expected welfare of the mechanism to be within an additive $\epsilon$ of that of the algorithm.


If the matching algorithm is maximum weight matching, conditions (a)–(c) clearly continue to hold for our generalization to load $k$. Moreover, the welfare of the reduction is monotone non-decreasing in $k$.

###### Lemma 5.2.

In the ideal computational model (where the value of a replica for being matched to a surrogate is given exactly), the per-replica welfare of the replica-surrogate maximum matching is monotone non-decreasing in the load $k$.

###### Proof.

Consider a non-optimal matching that groups the $km$ replicas into $k$ groups of size $m$ and finds the optimal $1$-to-$1$ matching between the replicas in each group and the $m$ surrogates. As these are random matchings, the expected welfare of each such matching is equal to the expected welfare of the load-$1$ matching. These $k$ matchings combine to give a feasible $k$-to-$1$ matching between the replicas and surrogates. The total expected welfare of the optimal $k$-to-$1$ matching between replicas and surrogates is therefore no less than $k$ times the expected welfare of the load-$1$ matching. Thus, the per-replica welfare, i.e., normalized by $km$, is monotone in $k$. ∎

Our main result is an approximation scheme for the ideal reduction of Hartline et al. (2015). We identify a load $k$ and a polynomial time (in $m$ and $1/\epsilon$) $k$-to-$1$ matching algorithm for the black-box model, and prove that the per-replica expected welfare of this matching algorithm is within an additive $\epsilon$ of the per-replica expected welfare of the optimal matching in the ideal model with load $k$ (as analyzed by Hartline et al., 2015). The welfare of the ideal model is monotone non-decreasing in the load (Lemma 5.2); therefore it is sufficient to identify a polynomial load $k$ for which there is a polynomial time algorithm in the black-box model with $\epsilon$ loss relative to the ideal model with that same load $k$.

In the remainder of this section we replace this ideal matching algorithm with an approximation scheme for the black-box model, where replica values for surrogate outcomes can only be estimated by sampling. For any $\epsilon > 0$, our algorithm loses at most an additive $\epsilon$ of the welfare of the ideal algorithm with only a polynomial increase in runtime. Moreover, the algorithm produces a perfect (and so maximal) matching, so the surrogate selection rule is stationary; and the algorithm is maximal-in-range for every replica (including the true type of the agent), so the resulting mechanism is Bayesian incentive compatible.

### 5.2 Entropy Regularized Matching

In this section we define an entropy regularized bipartite matching problem and discuss its solution. We refer to the left-hand-side vertices as replicas and the right-hand-side vertices as surrogates. The weight on the edge between replica $i$ and surrogate $j$ is denoted by $v_{i,j}$. In our application to the replica-surrogate matching defined in the previous section, $v_{i,j}$ is set to the expected value of replica $i$ for the surrogate outcome of surrogate $j$.

###### Definition 5.3.

For weights $v = \{v_{i,j}\}$, the entropy regularized matching program for parameter $\delta > 0$ is:

$$\max_{\{x_{i,j}\}_{(i,j)\in[km]\times[m]}} \;\sum_{i,j} x_{i,j} v_{i,j} - \delta \sum_{i,j} x_{i,j}\log x_{i,j}, \qquad \text{s.t.}\;\; \sum_i x_{i,j} \le k \;\;\forall j\in[m], \qquad \sum_j x_{i,j}\le 1 \;\;\forall i\in[km].$$

The optimal value of this program is denoted $\mathrm{OPT}(v)$.
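For intuition, this program can be solved numerically by descending on the Lagrangian dual prices of the surrogate capacity constraints, with each replica's allocation taking the exponential weights form of Observation 1 below. The following pure-Python sketch (a plain projected-subgradient loop on explicit made-up weights, not the paper's method) illustrates this:

```python
import math

def softmax_rows(v, alpha, delta):
    """x_{i,j} proportional to exp((v_{i,j} - alpha_j) / delta), rows summing to 1."""
    x = []
    for vi in v:
        w = [math.exp((vij - aj) / delta) for vij, aj in zip(vi, alpha)]
        s = sum(w)
        x.append([wj / s for wj in w])
    return x

def entropy_matching(v, k, delta, steps=20000, lr=0.02):
    """Projected subgradient descent on the dual prices alpha >= 0.
    The gradient of the dual in alpha_j is k - sum_i x_{i,j}(alpha),
    so a surrogate's price rises while it is over-demanded."""
    m = len(v[0])
    alpha = [0.0] * m
    for _ in range(steps):
        x = softmax_rows(v, alpha, delta)
        loads = [sum(x[i][j] for i in range(len(v))) for j in range(m)]
        alpha = [max(0.0, aj - lr * (k - lj)) for aj, lj in zip(alpha, loads)]
    return softmax_rows(v, alpha, delta), alpha
```

On a small instance with $km = 4$ replicas, $m = 2$ surrogates, and load $k = 2$, the recovered fractional matching respects the surrogate capacities up to the solver's tolerance.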

The dual variables for the right-hand-side constraints of the matching polytope can be interpreted as prices for the surrogate outcomes. Given prices, the utility of a replica for a surrogate outcome is the difference between the replica’s value and the price. The following lemma shows that for the right choice of dual variables, the maximizer of the entropy regularized matching program is given by exponential weights, with weights equal to the utilities.

###### Observation 1.

For the optimal Lagrangian dual variables for surrogate feasibility in the entropy regularized matching program (Definition 5.3), namely,

$$\alpha^* = \operatorname{argmin}_{\alpha\ge 0}\, \max_x \Big\{ \mathcal{L}(x,\alpha) : \sum_j x_{i,j}\le 1 \;\; \forall i \Big\}, \quad\text{where}\quad \mathcal{L}(x,\alpha) \triangleq \sum_{i,j} x_{i,j}v_{i,j} - \delta\sum_{i,j}x_{i,j}\log x_{i,j} + \sum_j \alpha_j\Big(k - \sum_i x_{i,j}\Big)$$

is the Lagrangian objective function, the optimal solution $x^*$ to the primal is given by exponential weights:

$$x^*_{i,j} = \frac{\exp\!\big(\tfrac{v_{i,j}-\alpha^*_j}{\delta}\big)}{\sum_{j'} \exp\!\big(\tfrac{v_{i,j'}-\alpha^*_{j'}}{\delta}\big)}, \qquad \forall i,j.$$

Observation 1 recasts the entropy regularized matching as, for each replica, sampling from an exponential weights distribution. For any replica and fixed dual variables, our Fast Exponential Bernoulli Race (Algorithm 3) gives a polynomial time algorithm for sampling from the exponential weights distribution in the expectations from samples computational model.
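In the ideal model, where exact values are available, sampling from the exponential weights distribution is straightforward; the point of the Fast Exponential Bernoulli Race is to achieve the same output distribution with only sample access to the values. A minimal ideal-model sketch (illustrative, not the paper's Algorithm 3):

```python
import math, random

def sample_exponential_weights(values, alpha, delta, rng=random):
    """Sample surrogate j with probability proportional to
    exp((values[j] - alpha[j]) / delta). Assumes exact values; the Fast
    Exponential Bernoulli Race achieves the same distribution with only
    sample access to the value distributions."""
    u = [(vj - aj) / delta for vj, aj in zip(values, alpha)]
    mx = max(u)                          # stabilize the exponentials
    w = [math.exp(uj - mx) for uj in u]
    r = rng.random() * sum(w)
    for j, wj in enumerate(w):
        r -= wj
        if r <= 0:
            return j
    return len(w) - 1
```

Empirical frequencies over many draws match the closed-form exponential weights distribution.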

###### Lemma 5.3.

For replica $i$ and any prices (dual variables) $\alpha \ge 0$, allocating a surrogate drawn from the exponential weights distribution

$$x_{i,j} = \frac{\exp\!\big(\tfrac{v_{i,j}-\alpha_j}{\delta}\big)}{\sum_{j'}\exp\!\big(\tfrac{v_{i,j'}-\alpha_{j'}}{\delta}\big)}, \qquad \forall j \in [m], \tag{2}$$

is maximal-in-range, as defined in Definition 4.1, and this random surrogate can be sampled with polynomially many samples from the replica-surrogate-outcome value distributions.

###### Proof.

To see that the distribution is maximal-in-range when assigning a surrogate outcome to replica $i$, consider the regularized welfare maximization

$$\operatorname{argmax}_{x'\in\Delta_m}\; \sum_j v_{i,j}x'_j - \delta\sum_j x'_j\log x'_j - \sum_j \alpha_j x'_j$$

for replica $i$. Similar to Observation 1, first-order conditions imply that the exponential weights distribution in (2) is the unique maximizer of this concave program.

To apply the Fast Exponential Bernoulli Race to the utilities, which are of the form $v_{i,j} - \alpha_j$, we must first normalize them to lie in the interval $[0,1]$. This normalization is accomplished by adding a constant to the utilities (which has no effect on the exponential weights distribution, and therefore preserves maximality-in-range) and then scaling. The scaling is corrected by scaling the parameter $\delta$ in the Fast Exponential Bernoulli Race (Algorithm 3) by the same factor. The expected number of samples from the value distributions required by the algorithm, per Theorem 3.4, is polynomial.
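The normalization step can be checked directly: adding a constant to the utilities, and rescaling both the utilities and $\delta$ by the same factor, leaves the exponential weights distribution unchanged. A small self-check with made-up numbers:

```python
import math

def softmax(u, delta):
    """Exponential weights distribution with parameter delta."""
    w = [math.exp(x / delta) for x in u]
    s = sum(w)
    return [x / s for x in w]

# utilities of the form v_j - alpha_j, possibly negative
u = [0.7 - 0.4, 0.2 - 0.0, 0.9 - 0.6]
delta = 0.3
shift = 0.6   # adding a constant: exponential weights are shift-invariant
scale = 1.6   # rescaling to [0, 1]: compensate by scaling delta too
u_norm = [(x + shift) / scale for x in u]

p = softmax(u, delta)
p_norm = softmax(u_norm, delta / scale)
assert all(abs(a - b) < 1e-12 for a, b in zip(p, p_norm))
```

The shift cancels in the normalization of the distribution, and the scaling cancels against the rescaled $\delta$, so the two distributions coincide exactly.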

If we knew the optimal Lagrangian variables $\alpha^*$ from Observation 1, it would suffice to define the surrogate selection rule by simply sampling from the exponential weights distribution (which takes polynomial time per Lemma 5.3) that corresponds to the agent’s true type (the replica with the real-agent index). Notice that wrong values of $\alpha$ correspond to violating the primal constraints (for the surrogates), and thus the outcome of sampling from exponential weights for such $\alpha$ would not correspond to a maximal-in-range matching. In the next section we give a polynomial time approximation scheme that is maximal-in-range for each replica and approximates sampling with the correct $\alpha^*$.

### 5.3 Online Entropy Regularized Matching

In this section, we reduce the entropy regularized matching problem to the problem of sampling from exponential weights (as described in Lemma 5.3) via an online algorithm. Consider replicas drawn adversarially but arriving in random order over times $i = 1, \ldots, km$. The basic observation is that approximate dual variables suffice for an online assignment of each replica to a surrogate, via Lemma 5.3, that approximates the optimal (offline) regularized matching. Recall that the replicas are independently and identically distributed in the original problem.

Our construction borrows techniques used in designing online algorithms for stochastic online convex programming problems (Agrawal and Devanur, 2015; Chen and Wang, 2013) and stochastic online packing problems (Agrawal et al., 2009; Devanur et al., 2011; Badanidiyuru et al., 2013; Kesselheim et al., 2014). Our online algorithm (Algorithm 4, below) considers the replicas in order, updates the dual variables using multiplicative weight updates based on the current allocation, and allocates to each replica by sampling from the exponential weights distribution as given by Lemma 5.3. The algorithm is parameterized by $\delta$, the scale of the regularizer; by $\eta$, the rate at which the algorithm learns the dual variables; and by the scale parameter $\gamma$, which we set later.
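A simplified rendering of this online loop is below. It is a sketch under stated assumptions, not Algorithm 4 itself: values are taken as exact numbers rather than sampled, exhausted surrogates are masked out to keep the matching feasible, and the particular multiplicative update on the duals is one standard choice.

```python
import math, random

def online_entropy_matching(v, k, delta, eta, gamma, rng=random):
    """For each arriving replica: sample a surrogate from exponential weights
    under the current prices, then update the duals multiplicatively."""
    m = len(v[0])
    weights = [1.0] * m           # multiplicative-weights state over surrogates
    remaining = [k] * m           # per-surrogate budgets (load k)
    assignment = []
    for vi in v:
        total = sum(weights)
        alpha = [w / total for w in weights]          # ||alpha||_1 = 1
        u = [(vij - gamma * aj) / delta for vij, aj in zip(vi, alpha)]
        mx = max(u)
        p = [math.exp(x - mx) if remaining[j] > 0 else 0.0
             for j, x in enumerate(u)]
        s = sum(p)
        if s == 0:                # all surrogate budgets exhausted
            break
        r = rng.random() * s
        j = 0
        while j < m - 1 and r > p[j]:
            r -= p[j]
            j += 1
        while remaining[j] == 0:  # guard against landing on a masked entry
            j = (j + 1) % m
        assignment.append(j)
        remaining[j] -= 1
        # raise the price of the surrogate just allocated, relative to 1/m
        for jj in range(m):
            g = (1.0 if jj == j else 0.0) - 1.0 / m
            weights[jj] *= math.exp(eta * g)
    return assignment
```

On any instance with $km$ replicas the sketch produces a perfect $k$-to-$1$ matching: budgets cap each surrogate at $k$, and total capacity $mk$ equals the number of replicas.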

The algorithm needs to satisfy four properties to be useful in a polynomial time reduction. First, it needs to produce a perfect matching so that the replica-surrogate matching surrogate selection rule is stationary, specifically via condition (c). Second, it needs to be maximal-in-range for the real agent’s replica; in fact, all replicas are treated symmetrically and allocated by sampling from an exponential weights distribution that is maximal-in-range via Lemma 5.3. Third, it needs to have good welfare compared to the ideal matching. Fourth, its runtime needs to be polynomial. The first two properties are immediate and imply the theorem below. The last two properties are analyzed below.

###### Theorem 5.4.

The mechanism that maps types to surrogates via the replica-surrogate matching surrogate selection rule with the online entropy regularized matching algorithm (with payments from Theorem 4.1) is Bayesian incentive compatible.

### 5.4 Social Welfare Loss

We analyze the welfare loss of the online entropy regularized matching algorithm (Algorithm 4) with regularizer parameter $\delta$, learning rate $\eta$, and scale parameter $\gamma$ set as a $1/k$-fraction of an estimate of the value of the offline program (Definition 5.3).

###### Theorem 5.5.

There are parameter settings for the online entropy regularized matching algorithm (Algorithm 4) for which (1) its per-replica expected welfare is within an additive $\epsilon$ of the welfare of the optimal replica-surrogate matching, and (2) given oracle access to the allocation algorithm, the running time of this algorithm is polynomial in $m$ and $1/\epsilon$.

To prove this theorem, we first argue how to set $\gamma$ to be, with high probability and with efficient sampling, a constant approximation to the $1/k$-fraction of the optimal value of the convex program. Second, we argue that the online and offline optimal entropy regularized matching algorithms have nearly the same welfare. Finally, we argue that the offline optimal entropy regularized matching has nearly the welfare of the offline optimal matching. The proof of the theorem is then given by combining these results with the right parameters.

#### Parameter γ and approximating the offline optimal.

Pre-setting $\gamma$ to be an estimate of the optimal objective of the convex program in Definition 5.3 is necessary for the competitive ratio guarantee of Algorithm 4. Moreover, $\gamma$ should be set in a symmetric and incentive compatible way across replicas, to preserve the stationarity property. To this end, we consider an instance generated by an independent random draw of replicas (while fixing the surrogates). In such an instance, we estimate the expected values by sampling and taking the empirical mean for each edge in the replica-surrogate bipartite graph. We then solve the convex program exactly (which can be done in polynomial time using an efficient separation oracle). This scheme is clearly incentive compatible, as we do not even use the reported type of the true agent in our calculation of $\gamma$, and it is symmetric across replicas. In Appendix D we show how this approach leads to a constant approximation to the optimal value of the offline program in Definition 5.3 with high probability.
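The estimation step can be sketched as follows; `sample_value` (sample access to a replica's value for a surrogate outcome) and `solve_offline` (the exact convex-program solver) are hypothetical stand-ins, not the paper's subroutines.

```python
def estimate_gamma(sample_value, km, m, k, solve_offline, n_samples=400):
    """Estimate each edge weight v_{i,j} by an empirical mean of n_samples
    draws, solve the offline program exactly on the estimates, and set
    gamma to a 1/k fraction of the resulting objective value."""
    v_hat = [[sum(sample_value(i, j) for _ in range(n_samples)) / n_samples
              for j in range(m)]
             for i in range(km)]
    return solve_offline(v_hat) / k
```

Note that the agent's reported type never enters this computation, which is what makes the pre-setting of $\gamma$ incentive compatible and symmetric across replicas.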

###### Lemma 5.6.

If the market size $m$ is large enough, then there exists a polynomial time approximation scheme to calculate $\gamma$ (i.e., it needs only polynomially many, in $m$, $k$, and $1/\epsilon$, samples of the black-box allocation algorithm) such that

$$\mathrm{OPT}(v)/k \;\le\; \gamma \;\le\; O(1)\cdot\mathrm{OPT}(v)/k$$

with high probability.

#### Competitive ratio of the online entropy regularized matching algorithm.

Assuming $\gamma$ is set to be a constant approximation to the $1/k$-fraction of the optimal value of the offline entropy regularized matching program, we prove the following lemma.

###### Lemma 5.7.

For a fixed regularizer parameter $\delta$, learning rate $\eta$, regularized welfare estimate $\gamma$, and market size $m$ that satisfy

$$\frac{m\log m}{\eta^2} \le k \qquad\text{and}\qquad \mathrm{OPT}(v)/k \;\le\; \gamma \;\le\; O(1)\cdot\mathrm{OPT}(v)/k,$$

the online entropy regularized matching algorithm (Algorithm 4) obtains at least a $(1-O(\eta))$ fraction of the welfare of the optimal entropy regularized matching (Definition 5.3).

###### Proof.

Recall that $\mathrm{OPT}(v)$ denotes the optimal objective value of the entropy regularized matching program. We analyze the algorithm up to the iteration $\tau$ at which the first surrogate becomes unavailable (because all $k$ of its copies are matched to previous replicas).

Define the contribution of replica $i$ to the Lagrangian objective of Observation 1, for allocation $x_i$ and dual variables $\alpha$, as

$$\mathcal{L}^{(i)}(x_i,\alpha) \;\triangleq\; \sum_j v_{i,j}x_{i,j} - \delta\sum_j x_{i,j}\log x_{i,j} + \sum_j \gamma\alpha_j\Big(\frac{1}{m} - x_{i,j}\Big). \tag{3}$$

The difference between the outcome for replica $i$ in the online algorithm and in the solution to the offline optimization is that the online algorithm selects the outcome with respect to the current dual variables $\alpha^{(i)}$, while the offline algorithm selects the outcome with respect to the optimal dual variables $\alpha^*$ (Observation 1). Denote the outcome of the online algorithm by

$$x_i = (x_{i,1},\ldots,x_{i,m}) = \operatorname{argmax}_{x'_i\in\Delta_m} \mathcal{L}^{(i)}(x'_i, \alpha^{(i)}),$$

and its contribution to the objective by

$$\mathrm{ALG}_i \;\triangleq\; \sum_j v_{i,j}x_{i,j} - \delta\sum_j x_{i,j}\log x_{i,j}.$$

Likewise, denote the outcome of the offline optimization by $x^*_i$ and its contribution by $\mathrm{OPT}_i$. Denote by $\hat{x}_i$ the indicator vector of the surrogate that the online algorithm samples from $x_i$.

Optimality of $x_i$ for dual variables $\alpha^{(i)}$ in equation (3) implies

$$\mathrm{ALG}_i + \sum_j \gamma\alpha^{(i)}_j\Big(\frac{1}{m}-x_{i,j}\Big) \;\ge\; \mathrm{OPT}_i + \sum_j \gamma\alpha^{(i)}_j\Big(\frac{1}{m}-x^*_{i,j}\Big),$$

so, by rearranging the terms and taking expectations conditioned on the observed history $H_{i-1}$, we have

$$\begin{aligned}
\mathbf{E}[\mathrm{ALG}_i \mid H_{i-1}] &\ge \gamma\,\mathbf{E}[\alpha^{(i)}\cdot x_i \mid H_{i-1}] + \mathbf{E}[\mathrm{OPT}_i \mid H_{i-1}] - \gamma\,\mathbf{E}[\alpha^{(i)}\cdot x^*_i \mid H_{i-1}] \\
&= \mathbf{E}[\mathrm{OPT}_i] - \gamma\,\alpha^{(i)}\cdot\mathbf{E}[x^*_i] + \gamma\,\alpha^{(i)}\cdot\hat{x}_i - \big(\mathbf{E}[\mathrm{OPT}_i] - \mathbf{E}[\mathrm{OPT}_i \mid H_{i-1}]\big) \\
&\qquad + \gamma\,\alpha^{(i)}\cdot\big(\mathbf{E}[x^*_i] - \mathbf{E}[x^*_i \mid H_{i-1}]\big) + \gamma\,\alpha^{(i)}\cdot\big(\mathbf{E}[x_i \mid H_{i-1}] - \hat{x}_i\big) \\
&\ge \frac{1}{mk}\mathrm{OPT}(v) + \gamma\,\alpha^{(i)}\cdot\Big(\hat{x}_i - \frac{1}{m}\mathbf{1}\Big) - L_i - L'_i,
\end{aligned}$$

where

$$L_i \;\triangleq\; \gamma\,\alpha^{(i)}\cdot\big(\hat{x}_i - \mathbf{E}[x_i \mid H_{i-1}]\big), \qquad L'_i \;\triangleq\; \big|\mathbf{E}[\mathrm{OPT}_i] - \mathbf{E}[\mathrm{OPT}_i \mid H_{i-1}]\big| + \gamma\,\big\|\mathbf{E}[x^*_i] - \mathbf{E}[x^*_i \mid H_{i-1}]\big\|.$$

By summing the above inequalities for $i = 1, \ldots, \tau-1$, we have:

$$\sum_{i=1}^{\tau-1}\mathbf{E}[\mathrm{ALG}_i \mid H_{i-1}] \;\ge\; \frac{\tau-1}{mk}\mathrm{OPT}(v) + \gamma\sum_{i=1}^{\tau-1}\alpha^{(i)}\cdot\Big(\hat{x}_i - \frac{1}{m}\mathbf{1}\Big) - \sum_{i=1}^{\tau-1}(L_i + L'_i). \tag{4}$$

In order to bound the middle term, let $g_i(\alpha) \triangleq \alpha\cdot\big(\hat{x}_i - \frac{1}{m}\mathbf{1}\big)$. Then, by applying the regret bound of the exponential gradient (essentially multiplicative weight update) online learning algorithm for any realization of the random variables $\hat{x}_i$ (under which the $\alpha^{(i)}$ are exponential weights distributions), we have

$$\sum_{i=1}^{\tau-1} g_i(\alpha^{(i)}) \;\ge\; (1-\eta)\max_{\|\alpha\|_1\le 1,\,\alpha\ge 0}\sum_{i=1}^{\tau-1} g_i(\alpha) - \frac{\log m}{\eta} \;\ge\; (1-\eta)\Big(k - \frac{\tau-1}{m}\Big) - \frac{\log m}{\eta}, \tag{5}$$

where the last inequality holds because at time $\tau$, either there exists a surrogate $j$ such that $\sum_{i=1}^{\tau-1}\hat{x}_{i,j} = k$, or $\tau - 1 = km$ and all surrogate outcome budgets are exhausted. In the former case, we have

$$\max_{\|\alpha\|_1\le 1,\,\alpha\ge 0}\sum_{i=1}^{\tau-1} g_i(\alpha) \;\ge\; \sum_{i=1}^{\tau-1} g_i(e_j) \;\ge\; k - \frac{\tau-1}{m},$$

and in the latter case we have

$$\max_{\|\alpha\|_1\le 1,\,\alpha\ge 0}\sum_{i=1}^{\tau-1} g_i(\alpha) \;\ge\; 0 \;\ge\; k - \frac{\tau-1}{m}.$$

Combining (4) and (5), letting $Q_i \triangleq L_i + L'_i$, and using $\mathrm{OPT}(v)/k \le \gamma \le O(1)\cdot\mathrm{OPT}(v)/k$, we have:

$$\begin{aligned}
\sum_{i=1}^{mk}\mathbf{E}[\mathrm{ALG}_i \mid H_{i-1}] &\ge \sum_{i=1}^{\tau-1}\mathbf{E}[\mathrm{ALG}_i \mid H_{i-1}] \;\ge\; \frac{\tau-1}{mk}\mathrm{OPT}(v) + \gamma(1-\eta)\Big(k-\frac{\tau-1}{m}\Big) - \frac{\gamma\log m}{\eta} - \sum_{i=1}^{\tau-1}Q_i \\
&\ge \frac{\tau-1}{mk}\mathrm{OPT}(v) + \frac{\mathrm{OPT}(v)}{k}(1-\eta)\Big(k-\frac{\tau-1}{m}\Big) - O(1)\cdot\frac{\mathrm{OPT}(v)\log m}{k\eta} - \sum_{i=1}^{mk}Q_i \\
&\ge (1-\eta)\,\mathrm{OPT}(v) - O(\eta)\cdot\mathrm{OPT}(v) - \sum_{i=1}^{mk}Q_i,
\end{aligned} \tag{6}$$

where the last inequality holds simply because $\frac{m\log m}{\eta^2} \le k$. By taking expectations on both sides, we have

$$\mathbf{E}[\mathrm{ALG}] \;\ge\; (1-O(\eta))\cdot\mathrm{OPT}(v) - \sum_{i=1}^{mk}\big(\mathbf{E}[L_i] + \mathbf{E}[L'_i]\big)$$