Bernoulli Factories and Black-Box Reductions
in Mechanism Design
Abstract
We provide a polynomial time reduction from Bayesian incentive compatible mechanism design to Bayesian algorithm design for welfare maximization problems. Unlike prior results, our reduction achieves exact incentive compatibility for problems with multidimensional and continuous type spaces.
The key technical barrier preventing exact incentive compatibility in prior black-box reductions is that repairing violations of incentive constraints requires understanding the distribution of the mechanism's output, which is typically #P-hard to compute. Reductions that instead estimate the output distribution by sampling inevitably suffer from sampling error, which typically precludes exact incentive compatibility.
We overcome this barrier by employing and generalizing the computational model in the literature on Bernoulli Factories. In a Bernoulli factory problem, one is given a function mapping the bias of an “input coin” to that of an “output coin”, and the challenge is to efficiently simulate the output coin given only sample access to the input coin. Consider a generalization which we call the expectations from samples computational model, in which a problem instance is specified by a function mapping the expected values of a set of input distributions to a distribution over outcomes. The challenge is to give a polynomial time algorithm that exactly samples from the distribution over outcomes given only sample access to the input distributions.
In this model, we give a polynomial time algorithm for the function given by exponential weights: expected values of the input distributions correspond to the weights of alternatives and we wish to select an alternative with probability proportional to an exponential function of its weight. This algorithm is the key ingredient in designing an incentive compatible mechanism for bipartite matching, which can be used to make the approximately incentive compatible reduction of Hartline et al. (2015) exactly incentive compatible.
1 Introduction
We resolve a five-year-old open question from Hartline et al. (2011, 2015):
There is a polynomial time reduction from Bayesian incentive
compatible mechanism design to Bayesian algorithm design for welfare
maximization problems.
A mechanism solicits preferences from agents, i.e., how much each agent prefers each outcome, and then chooses an outcome. Incentive compatibility of a mechanism requires that, though agents could misreport their preferences, it is not in any agent's best interest to do so. A quintessential research problem at the intersection of mechanism design and approximation algorithms is to identify black-box reductions from approximation mechanism design to approximation algorithm design. The key algorithmic property that makes a mechanism incentive compatible is that, from any individual agent's perspective, it must be maximal-in-range; specifically, the outcome selected maximizes the agent's utility less some cost that is a function of the outcome (e.g., this cost function can depend on other agents' reported preferences).
The black-box reductions from Bayesian mechanism design to Bayesian algorithm design in the literature are based on obtaining an understanding of the distribution of outcomes produced by the algorithm through simulating the algorithm on samples from agents' preferences. Notice that, even for structurally simple problems, calculating the exact probability that a given outcome is selected by an algorithm can be #P-hard. For example, Hartline et al. (2015) show such a result for calculating the probability that a matching in a bipartite graph is optimal, for a simple explicitly given distribution of edge weights. On the other hand, a black-box reduction for mechanism design must produce exactly maximal-in-range outcomes merely from samples. This challenge motivates new questions for algorithm design from samples.
The Expectations from Samples Model.
In traditional algorithm design, the inputs are specified to the algorithm exactly. In this paper, we formulate the expectations from samples model. This model calls for drawing an outcome from a distribution that is a precise function of the expectations of some random sources that are given only by sample access. Formally, a problem for this model is described by a function $f : [0,1]^n \to \Delta(E)$, where $E$ is an abstract set of feasible outcomes and $\Delta(E)$ is the family of probability distributions over $E$. For any input distributions on support $[0,1]$ with unknown expectations $x = (x_1, \dots, x_n)$, an algorithm for such a problem, with only sample access to each of the input distributions, must produce a sample outcome from $E$ that is distributed exactly according to $f(x)$.
Producing an outcome that is approximately drawn according to the desired distribution can typically be done from estimates of the expectations formed from sample averages (a.k.a., Monte Carlo sampling). On the other hand, exact implementation of many natural functions is either impossible for information theoretic reasons or requires sophisticated techniques. Impossibility generally follows, for example, when $f$ is discontinuous. The literature on Bernoulli Factories (e.g., Keane and O'Brien, 1994), which inspires our generalization to the expectations from samples model and provides some of the basic building blocks for our results, considers the special case where the input distribution and output distribution are both Bernoullis (i.e., supported on $\{0,1\}$).
We propose and solve two fundamental problems for the expectations from samples model. The first problem considers the biases $x_1, \dots, x_n$ of $n$ Bernoulli random variables as the marginal probabilities of a distribution on $\{1, \dots, n\}$ (i.e., $x$ satisfies $\sum_i x_i = 1$) and asks to sample from this distribution. We develop an algorithm that we call the Bernoulli Race to solve this problem.
The second problem corresponds to the "soft maximum" problem given by a regularizer that is a multiple $\frac{1}{\lambda}$ of the Shannon entropy function $H(y) = -\sum_i y_i \ln y_i$. The marginal probabilities on outcomes that maximize the expected value of the distribution over outcomes plus the entropy regularizer are given by exponential weights,
\[
y_i = \frac{e^{\lambda x_i}}{\sum_j e^{\lambda x_j}}.
\]
We develop the Fast Exponential Bernoulli Race to solve this problem.
Black-Box Reductions in Mechanism Design.
A special case of the problem that we must solve to apply the standard approach to black-box reductions is the single-agent multiple-urns problem. In this setting, a single agent faces a set of $m$ urns, and each urn contains a random object whose distribution is unknown, but can be sampled. The agent's type determines his utility for each object; fixing this type, urn $i$ is associated with a random real-valued reward with unknown expectation $x_i$. Our goal is to allocate the agent his favorite urn, or close to it.
As described above, incentive compatibility requires an algorithm for selecting a high-value urn that is maximal-in-range. If we could exactly calculate the expected values $x_i$ from the agent's type, this problem would be trivial both algorithmically and from a mechanism design perspective: simply solicit the agent's type, then allocate him the urn with the maximum $x_i$. As described above, with only sample access to the expected values of each urn, we cannot implement the exact maximum. Our solution is to apply the Fast Exponential Bernoulli Race as a solution to the regularized maximization problem in the expectations from samples model. This algorithm – with only sample access to the agent's values for each urn – will assign the agent to a random urn with a high expected value and is maximal-in-range.
The multiagent reduction from Bayesian mechanism design to Bayesian
algorithm design of Hartline et al. (2011, 2015) is based on solving a
matching problem between multiple
agents and outcomes, where an agent’s value for an outcome is the
expectation of a random variable which can be accessed only through
sampling.
As stated in the opening paragraph, our main result – obtained through the approach outlined above – is a polynomial time reduction from Bayesian incentive compatible mechanism design to Bayesian algorithm design. The analysis assumes that agents' values are normalized to the interval $[0,1]$ and gives an additive $\epsilon$ loss in the welfare. The reduction is an approximation scheme and the dependence of the runtime on the additive loss $\epsilon$ is inverse polynomial. The reduction depends polynomially on a suitable notion of the size of the space of agent preferences. For example, applied to environments where agents have preferences that lie in high-dimensional spaces, the runtime of the reduction depends polynomially on the number of points necessary to approximately cover each agent's space of preferences. More generally, the bounds we obtain are polynomial in the bounds of Hartline et al. (2011, 2015), but the resulting mechanism, unlike in the preceding work, is exactly Bayesian incentive compatible.
Organization.
The organization of the paper separates the development of the expectations from samples model and its application to black-box reductions in Bayesian mechanism design. Section 2 introduces Bernoulli factories and reviews basic results from the literature. Section 3 defines two central problems in the expectations from samples model, sampling from outcomes with linear weights and sampling from outcomes with exponential weights, and gives algorithms for solving them. We return to mechanism design problems in Section 4 and solve the single-agent multiple-urns problem. In Section 5 we give our main result, the reduction from Bayesian mechanism design to Bayesian algorithm design.
2 Basics of Bernoulli Factories
We use the terms Bernoulli and coin interchangeably to refer to distributions over $\{0,1\}$. The Bernoulli factory problem is about generating new coins from old ones.
Definition 2.1 (Keane and O’Brien, 1994).
Given a function $f : [0,1] \to [0,1]$, the Bernoulli factory problem is to output a sample of a Bernoulli random variable with bias $f(p)$ (i.e., an $f(p)$-coin), given black-box access to independent samples of a Bernoulli distribution with bias $p$ (i.e., a $p$-coin).
To illustrate the Bernoulli factory model, consider the examples of $f(p) = p^2$ and $f(p) = e^{p-1}$. For the former, it is enough to flip the $p$-coin twice and output $1$ if both flips are $1$, and $0$ otherwise. For the latter, the Bernoulli factory is still simple but more interesting: draw $N$ from the Poisson distribution with parameter $1$, flip the $p$-coin $N$ times, and output $1$ if all $N$ coin flips are $1$, and $0$ otherwise (see below).
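Both examples can be sketched in a few lines. The following simulation is illustrative (not from the paper), with a Knuth-style product-of-uniforms sampler standing in for a library Poisson routine:

```python
import math
import random

def coin(p):
    """Flip a p-coin: 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

def poisson(lam):
    """Sample N ~ Poisson(lam) by Knuth's product-of-uniforms method."""
    threshold = math.exp(-lam)
    k, prod = 0, random.random()
    while prod > threshold:
        k += 1
        prod *= random.random()
    return k

def square_factory(p):
    """f(p) = p^2: heads iff two independent flips are both heads."""
    return coin(p) & coin(p)

def exp_factory(p):
    """f(p) = e^(p-1): draw N ~ Poisson(1); heads iff all N flips are heads."""
    n = poisson(1.0)
    return 1 if all(coin(p) for _ in range(n)) else 0
```

For `exp_factory`, the acceptance probability is $\mathbf{E}[p^N] = e^{p-1}$ for $N \sim \mathrm{Poisson}(1)$, which is exactly the probability generating function trick formalized below.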
The question of characterizing functions $f$ for which there is an algorithm for sampling $f(p)$-coins from $p$-coins has been the main subject of interest in this literature (Keane and O'Brien, 1994; Nacu and Peres, 2005). In particular, Keane and O'Brien (1994) provide necessary and sufficient conditions on $f$ under which a Bernoulli factory exists. Moreover, Nacu and Peres (2005) suggest an algorithm for simulating an $f(p)$-coin based on polynomial envelopes of $f$. The canonical challenging problem of Bernoulli factories – and a primitive in the construction of more general Bernoulli factories – is the Bernoulli Doubling problem: $f(p) = 2p$ for $2p \le 1 - \epsilon$. See Łatuszyński (2010) for a survey on this topic.
Questions in Bernoulli factories can be generalized to multiple input coins. Given $f : [0,1]^n \to [0,1]$, the goal is to sample from a Bernoulli with bias $f(p_1, \dots, p_n)$ given sample access to $n$ independent Bernoulli variables with unknown biases $p_1, \dots, p_n$. Linear functions $f$ were studied and solved by Huber (2015). For example, the special case $n = 2$ and $f(p_1, p_2) = p_1 + p_2$, a.k.a. Bernoulli Addition, can be solved by reduction to the Bernoulli Doubling problem (formalized below).
Questions in Bernoulli factories can be generalized to allow input distributions over real numbers on the unit interval (rather than Bernoullis over $\{0,1\}$). In this generalization the question is to produce a Bernoulli with bias $f(\mu)$ with sample access to draws from a distribution supported on $[0,1]$ with expectation $\mu$. These problems can be easily solved by reduction to the Bernoulli factory problem:

Continuous to Bernoulli: Can implement a Bernoulli with bias $\mu$ with one sample from a distribution $\mathcal{D}$ supported on $[0,1]$ with expectation $\mu$. Algorithm:

Draw $v \sim \mathcal{D}$ and $U \sim U[0,1]$.

Output $\mathbb{1}\{U \le v\}$.
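The two steps above admit a one-line sketch; here `draw` is a stand-in for sample access to the distribution:

```python
import random

def continuous_to_bernoulli(draw):
    """Turn one sample v in [0,1] from a distribution with mean mu into a
    Bernoulli(mu) flip: draw U ~ Uniform[0,1] and output 1 iff U <= v."""
    v = draw()
    return 1 if random.random() <= v else 0
```

Correctness is immediate: conditioned on $v$, the output is $1$ with probability $v$, so unconditionally it is $1$ with probability $\mathbf{E}[v] = \mu$.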

Below are enumerated the important building blocks for Bernoulli factories.

Bernoulli Down Scaling: Can implement $f(p) = cp$ for a known constant $c \in [0,1]$ with one sample from the $p$-coin. Algorithm:

Draw $x$ from the $p$-coin and $y$ from a $c$-coin.

Output $x \wedge y$ (i.e., $1$ if both coins are $1$, otherwise $0$).


Bernoulli Doubling: Can implement $f(p) = 2p$ for $2p \le 1 - \epsilon$ with $O(1/\epsilon)$ samples from the $p$-coin in expectation. The algorithm is complicated; see Nacu and Peres (2005).

Bernoulli Probability Generating Function: Can implement $f(p) = \mathbf{E}_{N \sim \mathcal{D}}[p^N]$ for a distribution $\mathcal{D}$ over nonnegative integers with $\mathbf{E}[N]$ samples from the $p$-coin in expectation. Algorithm:

Draw $N \sim \mathcal{D}$ and $x_1, \dots, x_N$ from the $p$-coin (i.e., $N$ samples).

Output $\prod_{i=1}^{N} x_i$ (i.e., $1$ if all coins are $1$, otherwise $0$).


Bernoulli Exponentiation: Can implement $f(p) = e^{\lambda(p-1)}$ for nonnegative constant $\lambda$ with $\lambda$ samples from the $p$-coin in expectation. Algorithm: Apply the Bernoulli Probability Generating Function algorithm for the Poisson distribution with parameter $\lambda$.

Bernoulli Averaging: Can implement $f(p_1, p_2) = \frac{p_1 + p_2}{2}$ with one sample from the $p_1$-coin or the $p_2$-coin. Algorithm:

Draw $i \in \{1, 2\}$ uniformly at random, and $x$ from the $p_i$-coin.

Output $x$.


Bernoulli Addition: Can implement $f(p_1, p_2) = p_1 + p_2$ for $p_1 + p_2 \le 1 - \epsilon$ with $O(1/\epsilon)$ samples from the $p_1$- and $p_2$-coins in expectation. Algorithm: Apply Bernoulli Doubling to Bernoulli Averaging.
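The simpler building blocks above (all except Doubling and Addition, which need the Nacu–Peres machinery) can be sketched as coin combinators. The representation of a coin as a zero-argument callable is a convention of this sketch, not the paper's:

```python
import math
import random

def const_coin(c):
    """A coin with known bias c (the algorithm can flip these itself)."""
    return lambda: 1 if random.random() < c else 0

def down_scale(coin, c):
    """Bernoulli Down Scaling: a (c*p)-coin from a p-coin, for known c."""
    c_coin = const_coin(c)
    return lambda: coin() & c_coin()

def average(coin1, coin2):
    """Bernoulli Averaging: a ((p1+p2)/2)-coin; one sample from p1 or p2."""
    return lambda: coin1() if random.random() < 0.5 else coin2()

def pgf(coin, draw_n):
    """Bernoulli PGF: an E[p^N]-coin for N ~ draw_n(); heads iff all N flips heads."""
    return lambda: 1 if all(coin() for _ in range(draw_n())) else 0

def exponentiate(coin, lam):
    """Bernoulli Exponentiation: an e^{lam*(p-1)}-coin via the Poisson PGF."""
    def poisson():
        threshold, k, prod = math.exp(-lam), 0, random.random()
        while prod > threshold:
            k += 1
            prod *= random.random()
        return k
    return pgf(coin, poisson)
```

Each combinator consumes samples of its input coin only lazily, which is what makes them composable inside the race algorithms of Section 3.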
It may seem counterintuitive that Bernoulli Doubling is much more challenging than Bernoulli Down Scaling. Notice, however, that for a coin with bias $p = 1/2$, Bernoulli Doubling with a finite number of coin flips is impossible. The doubled coin must be deterministically heads, while any finite sequence of flips of the $p$-coin has nonzero probability of occurring. On the other hand, a coin with probability $1/2 - \delta$ for some small $\delta > 0$ has a similar probability of each sequence, but Bernoulli Doubling must sometimes output tails. Thus, Bernoulli Doubling must require a number of coin flips that goes to infinity as $\delta$ goes to zero.
3 The Expectations from Samples Model
The expectations from samples model is a combinatorial generalization of the Bernoulli factory problem. The goal is to select an outcome from a distribution that is a function of the expectations of a set of input distributions. These input distributions can be accessed only by sampling.
Definition 3.1.
Given a function $f : [0,1]^n \to \Delta(E)$ for an outcome domain $E$, the expectations from samples problem is to output a sample from $f(x_1, \dots, x_n)$ given black-box access to independent samples from $n$ distributions supported on $[0,1]$ with expectations $x_1, \dots, x_n$.
Without loss of generality, by the Continuous to Bernoulli construction of Section 2, the input random variables can be assumed to be Bernoullis and, thus, the expectations from samples model can be viewed as a generalization of the Bernoulli factory question to output spaces beyond $\{0,1\}$. In this section we propose and solve two fundamental problems for the expectations from samples model. In these problems the outcome space is a finite set $E = \{1, \dots, n\}$ and the input distributions are Bernoulli distributions with biases $x_1, \dots, x_n$.
In the first problem, the biases $x_i$ correspond to the marginal probabilities with which each of the outcomes should be selected. The goal is to produce a random $i$ from $E$ so that the probability of $i$ is exactly its marginal probability $x_i$. More generally, if the biases do not sum to one, this problem is equivalently the problem of random selection with linear weights.
The second problem we solve corresponds to a regularized maximization problem, or specifically random selection from exponential weights. For this problem the biases of the Bernoulli input distributions correspond to the weights of the outcomes. The goal is to produce a random $i$ from $E$ according to the distribution given by exponential weights, i.e., the probability of selecting $i$ from $E$ is $e^{\lambda x_i} / \sum_j e^{\lambda x_j}$.
3.1 Random Selection with Linear Weights
Definition 3.2 (Random Selection with Linear Weights).
The random selection with linear weights problem is to sample from the probability distribution defined by $\Pr[i] = x_i / \sum_j x_j$ for each $i$ in $E = \{1, \dots, n\}$ with only sample access to $n$ distributions with expectations $x_1, \dots, x_n$.
We solve the random selection with linear weights problem by an algorithm that we call the Bernoulli race (Algorithm 1). The algorithm repeatedly picks a coin uniformly at random and flips it. The winning coin is the first one to come up heads in this process.
Theorem 3.1.
The Bernoulli Race (Algorithm 1) solves the random selection with linear weights problem with $n / \sum_j x_j$ samples from the input distributions in expectation.
Proof.
At each iteration, the algorithm terminates if the flipped coin outputs $1$ and iterates otherwise. Since the coin is chosen uniformly at random, the probability of termination at each iteration is $\frac{1}{n} \sum_j x_j$. The total number of iterations (and number of samples) is therefore a geometric random variable with expectation $n / \sum_j x_j$.
The selected outcome also follows the desired distribution: conditioned on terminating at a given iteration, outcome $i$ was both picked and came up $1$ with probability $x_i / n$, so outcome $i$ is selected with probability $\frac{x_i / n}{\frac{1}{n} \sum_j x_j} = \frac{x_i}{\sum_j x_j}$. ∎
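The Bernoulli Race loop is short enough to state directly; modeling coins as zero-argument callables is an assumption of this sketch:

```python
import random

def bernoulli_race(coins):
    """Bernoulli Race: given n coins (callables returning 0/1) with unknown
    biases x_1..x_n, return index i with probability x_i / sum_j x_j.
    Repeatedly pick a coin uniformly at random and flip it; the first coin
    to come up heads wins."""
    n = len(coins)
    while True:
        i = random.randrange(n)
        if coins[i]():
            return i
```

Note that the algorithm never needs to know the biases; the winning probabilities are exactly proportional to them.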
3.2 Random Selection with Exponential Weights
Definition 3.3 (Random Selection with Exponential Weights).
For parameter $\lambda \ge 0$, the random selection with exponential weights problem is to sample from the probability distribution defined by $\Pr[i] = e^{\lambda x_i} / \sum_j e^{\lambda x_j}$ for each $i$ in $E = \{1, \dots, n\}$ with only sample access to $n$ distributions with expectations $x_1, \dots, x_n$.
The Basic Exponential Bernoulli Race, below, samples from the exponential weights distribution. The algorithm follows the paradigm of picking one of the input distributions, exponentiating it, sampling from the exponentiated distribution, and repeating until one comes up heads. While this algorithm does not generally run in polynomial time, it is a building block for one that does.
Theorem 3.2.
The Basic Exponential Bernoulli Race (Algorithm 2) solves the random selection with exponential weights problem with $\lambda n / \sum_j e^{\lambda (x_j - 1)}$ samples from the input distributions in expectation.
Proof.
The correctness and runtime follow from the correctness and runtimes of Bernoulli Exponentiation and the Bernoulli Race. ∎
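A minimal sketch of this composition: pick a coin uniformly at random and accept it with probability $e^{\lambda(x_i - 1)}$ via the Poisson/PGF trick; conditioned on acceptance, $i$ is distributed proportionally to $e^{\lambda x_i}$. The helper names are ours:

```python
import math
import random

def poisson(lam):
    """Sample N ~ Poisson(lam) by Knuth's product-of-uniforms method."""
    threshold, k, prod = math.exp(-lam), 0, random.random()
    while prod > threshold:
        k += 1
        prod *= random.random()
    return k

def basic_exponential_race(coins, lam):
    """Select i with probability e^{lam*x_i} / sum_j e^{lam*x_j}: pick i
    uniformly, then flip an e^{lam*(x_i - 1)}-coin for it (all of N ~
    Poisson(lam) flips must be heads); output i on acceptance."""
    n = len(coins)
    while True:
        i = random.randrange(n)
        if all(coins[i]() for _ in range(poisson(lam))):
            return i
```

When all biases $x_i$ are bounded away from one, every acceptance probability $e^{\lambda(x_i - 1)}$ is exponentially small in $\lambda$, which is exactly the inefficiency the Fast Exponential Bernoulli Race of Section 3.3 removes.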
3.3 The Fast Exponential Bernoulli Race
Sampling from exponential weights is typically used as a "soft maximum" where the parameter $\lambda$ controls how close the selected outcome is to the true maximum. For such an application, exponential dependence on $\lambda$ in the runtime would be prohibitive. Unfortunately, when $\max_i x_i$ is bounded away from one, the runtime of the Basic Exponential Bernoulli Race (Algorithm 2; Theorem 3.2) is exponential in $\lambda$. A simple observation allows the resolution of this issue: the exponential weights distribution is invariant to any uniform additive shift of all weights. This section applies this idea to develop the Fast Exponential Bernoulli Race.
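The shift-invariance observation is easy to verify numerically; this snippet (ours, not the paper's) checks that a uniform additive shift of all weights leaves the exponential weights distribution unchanged:

```python
import math

def exp_weights(xs, lam):
    """The exponential weights distribution over outcomes with weights xs."""
    ws = [math.exp(lam * x) for x in xs]
    total = sum(ws)
    return [w / total for w in ws]

# Invariance to a uniform additive shift of all weights: the shift multiplies
# numerator and denominator by the same factor e^{lam*shift}.
xs, lam, shift = [0.2, 0.5, 0.9], 4.0, 0.35
p = exp_weights(xs, lam)
q = exp_weights([x + shift for x in xs], lam)
assert all(abs(a - b) < 1e-12 for a, b in zip(p, q))
```

This is why the algorithm may shift all biases up toward one (where the Basic Exponential Bernoulli Race is fast) without changing the target distribution.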
Observe that for any given parameter $\epsilon > 0$, we can easily implement a Bernoulli random variable whose bias is within an additive $\epsilon$ of $\max_i x_i$. Note that, unlike the other algorithms in this section, a precise relationship between the implemented bias and $\max_i x_i$ is not required.
Lemma 3.3.
For parameter $\epsilon > 0$, there is an algorithm for sampling from a Bernoulli random variable with bias $\hat{x}$, where $|\hat{x} - \max_i x_i| \le \epsilon$ with probability at least $1 - \epsilon$, with $O(n \epsilon^{-2} \log(n/\epsilon))$ samples from the input distributions with biases $x_1, \dots, x_n$.
Proof.
The algorithm is as follows: sample $m = O(\epsilon^{-2} \log(n/\epsilon))$ times from each of the $n$ coins, let $\hat{q}_i$ be the empirical estimate of coin $i$'s bias obtained by averaging, then apply the Continuous to Bernoulli algorithm (Section 2) to map $\max_i \hat{q}_i$ to a Bernoulli random variable.
Standard tail bounds imply that $|\hat{q}_i - x_i| \le \epsilon$ for all $i$ with probability at least $1 - \epsilon$, and therefore $|\max_i \hat{q}_i - \max_i x_i| \le \epsilon$ with probability at least $1 - \epsilon$. ∎
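The estimation procedure of the proof can be sketched as follows; the function name and the coin-as-callable convention are ours:

```python
import random

def estimate_max_coin(coins, m):
    """Sample each coin m times, take the maximum empirical average q, and
    return a coin of bias q via the continuous-to-Bernoulli step. For m of
    order log(n/eps)/eps^2, q is within eps of max_i x_i w.h.p."""
    q = max(sum(c() for _ in range(m)) / m for c in coins)
    return (lambda: 1 if random.random() <= q else 0), q
```

Note that the returned coin's bias need not relate precisely to $\max_i x_i$; the Fast Exponential Bernoulli Race only needs it to be close.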
Since we are interested in a fast exponential Bernoulli race as $\lambda$ grows large, we restrict attention to $\lambda \ge 1$. We invoke Lemma 3.3 with a constant $\epsilon$ in the estimation of $\max_i x_i$. This estimate will be used to boost the bias of each distribution in the input so that the maximum bias is at least a constant. The boosting of the bias is implemented with Bernoulli Addition which, to be fast, requires the cumulative bias to be bounded away from one. Thus, the biases are scaled down by a constant factor; this scaling is subsequently counterbalanced by adjusting the parameter $\lambda$. The formal details are given below.
Theorem 3.4.
The Fast Exponential Bernoulli Race (Algorithm 3) solves the random selection with exponential weights problem with an expected number of samples that is polynomial in $n$ and $\lambda$.
Proof.
The correctness and runtime follow from the correctness and runtimes of the Basic Exponential Bernoulli Race, Bernoulli Doubling, and Lemma 3.3 (for the estimate of $\max_i x_i$), together with the facts that the boosted maximum bias is bounded away from zero and that the distribution given by exponential weights is invariant to additive shifts of all weights.
A detailed analysis of the runtime follows. Since the algorithm builds a number of sampling subroutines in a hierarchy, we analyze the runtime of the algorithm and the various subroutines in a bottom up fashion. Steps 3 and 4 implement a coin with bias with runtime per sample, as per the bound of Lemma 3.3. The coin implemented in Step 5 is sampled in constant time. Observe that , and the runtime of Bernoulli Doubling implies that samples from the coins of Steps 4 and 5 suffice for sampling ; we conclude that a coin can be sampled in time . Finally, note that for , we have ; Theorem 3.2 then implies that the Basic Exponential Bernoulli Race samples at most times from the coins; we conclude the claimed runtime. ∎
4 The Single-Agent Multiple-Urns Problem
We investigate incentive compatible mechanism design for the single-agent multiple-urns problem. Informally, a mechanism is needed to assign an agent to one of many urns. Each urn contains objects, and the agent's value for being assigned to an urn is taken in expectation over the objects from the urn. The problem asks for an incentive compatible mechanism with good welfare (i.e., a high expected value of the agent for the assigned urn).
4.1 Problem Definition and Notations
A single agent with type $t$ from type space $T$ desires an object from outcome space $O$. The agent's value for an outcome $o$ is a function of her type and denoted by $v(t, o)$. The agent is a risk-neutral quasi-linear utility maximizer, with utility $\mathbf{E}[v(t, o)] - p$ for randomized outcome $o$ and expected payment $p$. There are $m$ urns. Each urn $i$ is given by a distribution $\mathcal{D}_i$ over outcomes in $O$. If the agent is assigned to urn $i$, she obtains an object drawn from the urn's distribution $\mathcal{D}_i$.
A mechanism can solicit the type of the agent (who may misreport if she desires). We further assume (1) the mechanism has black-box access to evaluate $v(t, o)$ for any type $t$ and outcome $o$, and (2) the mechanism has sample access to the distribution $\mathcal{D}_i$ of each urn $i$. The mechanism may draw objects from urns and evaluate the agent's reported value for these objects, but then must ultimately assign the agent to a single urn and charge the agent a payment. The urn and payment that the agent is assigned are random variables in the mechanism's internal randomization and the randomness from the mechanism's potential samples from the urns' distributions.
The distribution over urns that the mechanism assigns to an agent, as a function of her type $t$, is denoted by $y(t) = (y_1(t), \dots, y_m(t))$, where $y_i(t)$ is the marginal probability that the agent is assigned to urn $i$. Denote the expected value of the agent with type $t$ for urn $i$ by $x_i(t) = \mathbf{E}_{o \sim \mathcal{D}_i}[v(t, o)]$. The expected welfare of the mechanism is $\sum_i y_i(t)\, x_i(t)$. The expected payment of this agent is denoted by $p(t)$. The agent's utility for the outcome and payment of the mechanism is given by $u(t) = \sum_i y_i(t)\, x_i(t) - p(t)$. Incentive compatibility is defined by the agent with type $t$ preferring her outcome and payment to those assigned to another type $t'$.
Definition 4.1.
A single-agent mechanism $(y, p)$ is incentive compatible if, for all $t, t' \in T$:
\[
\sum_i y_i(t)\, x_i(t) - p(t) \;\ge\; \sum_i y_i(t')\, x_i(t) - p(t'). \qquad (1)
\]
A multi-agent mechanism is Bayesian Incentive Compatible (BIC) if equation (1) holds for the outcome of the mechanism in expectation over the truthful reports of the other agents.
4.2 Incentive Compatible Approximation Scheme
If the agent’s expected value for each urn is known, or equivalently mechanism designer knows the distributions for all urns rather than only sample access, this problem would be easy and admits a trivial optimal mechanism: simply select the urn maximizing the agent’s expected value according to her reported type , and charge her a payment of zero. What makes this problem interesting is that the designer is restricted to only sample the agent’s value for an urn. In this case, the following Montecarlo adaptation of the trivial mechanism is tempting: sample from each urn sufficiently many times to obtain a close estimate of with high probability (up to any desired precision ), then choose the urn maximizing and charge a payment of zero. This mechanism is not incentive compatible, as illustrated by a simple example.

Consider two urns. Urn $1$ contains only outcome $a$, whereas urn $2$ contains a mixture of outcomes $b$ and $c$, with $b$ slightly more likely than $c$. Now consider an agent who has (true) values $1$, $2$, and $0$ for outcomes $a$, $b$, and $c$, respectively, so that she prefers urn $2$. If this agent reports her true type, the trivial Monte Carlo mechanism — instantiated with any desired finite degree of precision — assigns her urn $2$ most of the time, but assigns her urn $1$ with some nonzero probability. The agent gains by misreporting her value of outcome $c$ as $2$, since this guarantees her preferred urn $2$.
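A small simulation makes the failure concrete. The urn contents and agent values below are illustrative choices consistent with the example (urn $2$ yields $b$ with probability $q = 0.55$, the agent values $a$, $b$, $c$ at $1$, $2$, $0$); with a crude $5$-sample estimate, truthful reporting sometimes loses the preferred urn to sampling error, while the misreport never does:

```python
import random

def monte_carlo_pick(values, q, m):
    """Estimate each urn's expected (reported) value from m samples and pick
    the argmax. Urn 1 yields outcome 'a' always; urn 2 yields 'b' w.p. q
    and 'c' otherwise."""
    est1 = sum(values['a'] for _ in range(m)) / m
    est2 = sum(values['b' if random.random() < q else 'c'] for _ in range(m)) / m
    return 1 if est1 >= est2 else 2

random.seed(5)
true_vals = {'a': 1.0, 'b': 2.0, 'c': 0.0}   # truthful report
lie_vals  = {'a': 1.0, 'b': 2.0, 'c': 2.0}   # misreport: value of c raised to 2
q, m, trials = 0.55, 5, 2000                  # urn 2 is truly better: 2q > 1
truthful = sum(monte_carlo_pick(true_vals, q, m) == 2 for _ in range(trials))
lying    = sum(monte_carlo_pick(lie_vals,  q, m) == 2 for _ in range(trials))
# Truthful reporting sometimes loses urn 2 to sampling error; lying never does.
assert lying == trials and 0 < truthful < trials
```

Under the misreport, urn $2$'s estimated value is $2$ regardless of which objects are sampled, so the comparison is deterministic; this is precisely the strategic advantage the example describes.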
The above example might seem counterintuitive, since the trivial Monte Carlo mechanism appears to be doing its best to maximize the agent's utility, up to the limits of (unavoidable) sampling error. One intuitive rationalization is the following: an agent can slightly gain by procuring (by whatever means) more precise information about the distributions than that available to the mechanism, and using this information to guide her strategic misreporting of her type. This raises the following question:
Question:
Is there an incentive-compatible mechanism for the single-agent multiple-urns problem which achieves welfare within an additive $\epsilon$ of the optimal, and samples only $\mathrm{poly}(m, 1/\epsilon)$ times (in expectation) from the urns?
We resolve the above question in the affirmative. We present an approximation scheme for this problem that is based on our solution to the problem of random selection with exponential weights (Section 3.2). The solution to the single-agent multiple-urns problem is a main ingredient in the Bayesian mechanism that we propose in Section 5 as our black-box reduction mechanism.
To explain the approximation scheme, we start by recalling the following standard theorem in mechanism design.
Theorem 4.1.
For outcome rule $y$, there exists a payment rule $p$ so that the single-agent mechanism $(y, p)$ is incentive compatible if and only if $y$ is maximal in range, i.e., $y(t) \in \operatorname{argmax}_{y'} \big[ \sum_i y'_i\, x_i(t) - c(y') \big]$ over the range of $y$, for some cost function $c$.
The payments that satisfy Theorem 4.1 can be easily calculated with black-box access to the outcome rule $y$. For a single-agent problem, this payment can be calculated in two calls to the function $y$, one on the agent's reported type $t$ and the other on a type randomly drawn from the path between the origin and $t$. Further discussion and details are given in Appendix A. It suffices, therefore, to identify a mechanism that samples from urns and assigns the agent to an urn, that induces an outcome rule that is good for welfare, i.e., nearly maximizes $\sum_i y_i(t)\, x_i(t)$, and that is maximal in range. The following theorem solves the problem.
Theorem 4.2.
There is an incentive-compatible mechanism for the single-agent multiple-urns problem which achieves an additive $\epsilon$ approximation to the optimal welfare in expectation, and runs in time polynomial in $m$ and $1/\epsilon$ in expectation.
Proof.
Consider the problem of selecting a distribution $y$ over the $m$ urns to optimize welfare plus (a scaling of) the Shannon entropy function, i.e., $\max_y \big[ \sum_i y_i\, x_i(t) + \frac{1}{\lambda} H(y) \big]$ where $H(y) = -\sum_i y_i \ln y_i$.
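For completeness, the exponential-weights form of the optimizer follows from a standard Lagrangian calculation, sketched here in the section's notation:
\[
\frac{\partial}{\partial y_i}\Big[\sum_j y_j\, x_j(t) + \tfrac{1}{\lambda} H(y) - \mu \big(\textstyle\sum_j y_j - 1\big)\Big]
= x_i(t) - \tfrac{1}{\lambda}\big(\ln y_i + 1\big) - \mu = 0
\;\;\Longrightarrow\;\;
y_i \propto e^{\lambda x_i(t)},
\]
so normalizing gives $y_i = e^{\lambda x_i(t)} / \sum_j e^{\lambda x_j(t)}$, and the regularizer's contribution to the objective is at most $\frac{1}{\lambda} \max_y H(y) = \frac{1}{\lambda} \ln m$.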
It is well known that the optimizer is given by exponential weights, i.e., the marginal probability of assigning the $i$th urn is $y_i = e^{\lambda x_i(t)} / \sum_j e^{\lambda x_j(t)}$. In Section 3.3 we gave a polynomial time algorithm for sampling from exponential weights, specifically, the Fast Exponential Bernoulli Race (Algorithm 3). Proper choice of the parameter $\lambda$ trades off faster runtimes with increased welfare loss due to the entropy term. The entropy is maximized at the uniform distribution with entropy $\ln m$. Thus, choosing $\lambda = \ln(m)/\epsilon$ guarantees that the welfare is within an additive $\epsilon$ of the optimal welfare $\max_i x_i(t)$. The bound of the theorem then follows from the analysis of the Fast Exponential Bernoulli Race (Theorem 3.4) with this choice of $\lambda$. ∎

5 A Bayesian Incentive Compatible Black-Box Reduction
A central question at the interface between algorithms and economics is on the existence of black-box reductions for mechanism design. Given black-box access to any algorithm that maps inputs to outcomes, can a mechanism be constructed that (a) induces agents to truthfully report the inputs and (b) produces an outcome that is as good as the one produced by the algorithm? The mechanism must be computationally tractable, specifically, making no more than a polynomial number of elementary operations and black-box calls to the algorithm.
A line of research initiated by Hartline and Lucier (2010, 2015) demonstrated that, for the welfare objective, Bayesian black-box reductions exist. In the Bayesian setting, agents' types are drawn from a distribution. The algorithm is assumed to obtain good welfare for types from this distribution. The constructed mechanism is an approximation scheme: for any $\epsilon > 0$ it gives a mechanism that is Bayesian incentive compatible (Definition 4.1) and obtains welfare that is at least the algorithm's welfare less an additive $\epsilon$. For further details on Bayesian mechanism design and the notation of this paper, which follows Hartline et al. (2015), we refer the reader to Appendix B.
Definition 5.1 (BIC blackbox reduction problem).
Given black-box oracle access to an allocation algorithm $\mathcal{A}$, construct an allocation algorithm $\tilde{\mathcal{A}}$ that is Bayesian incentive compatible; approximately preserves welfare, i.e., any agent's expected welfare under $\tilde{\mathcal{A}}$ is at least that under $\mathcal{A}$ less $\epsilon$; and runs in time polynomial in the number of agents and $1/\epsilon$.
In this literature, Hartline and Lucier (2010, 2015) solve the case of single-dimensional agents and Hartline et al. (2011, 2015) solve the case of multidimensional agents with discrete type spaces. For the relaxation of the problem where only approximate incentive compatibility is required, Bei and Huang (2011) solve the case of multidimensional agents with discrete type spaces, and Hartline et al. (2011, 2015) solve the general case. These reductions are approximation schemes that are polynomial in the number of agents, the desired approximation factor, and a measure of the size of the agents' type spaces (e.g., its dimension).
5.1 Surrogate Selection and the Replica-Surrogate Matching
A main conclusion of the literature on Bayesian reductions for mechanism design is that the multi-agent problem of reducing Bayesian mechanism design to algorithm design, itself, reduces to a single-agent problem of surrogate selection. Consider any agent in the original problem and the induced algorithm with the inputs from other agents hard-coded as random draws from their respective type distributions. The induced algorithm maps the type of this agent to a distribution over outcomes. If this distribution over outcomes is maximal-in-range, then there exist payments for which the induced algorithm is incentive compatible (Theorem 4.1). If not, the problem of surrogate selection is to map the type of the agent to an input to the algorithm so as to satisfy three properties:

The composition of surrogate selection and the induced algorithm is maximal-in-range,

The composition approximately preserves welfare,

The surrogate selection preserves the type distribution.
Condition (c), a.k.a. stationarity, implies that fixing the non-maximality-in-range of the algorithm for a particular agent does not affect the outcome for any other agents. With such an approach, each agent's incentive problem can be resolved independently from that of other agents.
Theorem 5.1 (Hartline et al., 2015).
The composition of an algorithm with a profile of surrogate selection rules satisfying conditions (a)–(c) is Bayesian incentive compatible and approximately preserves the algorithm’s welfare (the loss in welfare is the sum of the losses in welfare of each surrogate selection rule).
The surrogate selection rule of Hartline et al. (2015) is based on setting up a matching problem between random types from the distribution (replicas) and the outcomes of the algorithm on random types from the distribution (surrogates). The true type of the agent is one of the replicas, and the surrogate selection rule outputs the surrogate to which this replica is matched. This approach addresses the three properties of surrogate selection rules as follows: (a) if the matching selected is maximal-in-range, then the composition of the surrogate selection rule with the induced algorithm is maximal-in-range; (b) the welfare of the matching is the welfare of the reduction, and the optimal matching approximates the welfare of the original algorithm; and (c) any maximal matching gives a stationary surrogate selection rule. For a detailed discussion on why a maximal-in-range matching results in a BIC mechanism after composing the corresponding surrogate selection rule with the allocation algorithm, we refer the interested reader to Lemma C.1 and Lemma C.2 in Appendix C.
Definition 5.2.
The replica-surrogate matching surrogate selection rule; for a matching algorithm $\mathcal{M}$, an integer market size $n$, and load $\gamma$; maps a type $t$ to a surrogate type as follows:

Pick the real-agent index $\imath$ uniformly at random from $\{1, \ldots, m\ell\}$.

Define the replica type profile $\mathbf{r}$, an $(m\ell)$-tuple of types, by setting $r_\imath = t_i$ and sampling the remaining replica types i.i.d. from the type distribution $F_i$.

Sample the surrogate type profile $\mathbf{s}$, an $m$-tuple of i.i.d. samples from the type distribution $F_i$.

Run the matching algorithm $\mathcal{M}$ on the complete bipartite graph between replicas and surrogates.

Output the surrogate that is matched to replica $\imath$.
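The five steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_type` (drawing one i.i.d. type from the agent's type distribution) and `match` (returning, for each replica index, the surrogate index it is matched to) are hypothetical stand-ins for the primitives the rule assumes.

```python
import random

def replica_surrogate_selection(true_type, sample_type, match, m, load, rng=None):
    """Sketch of the replica-surrogate matching surrogate selection rule
    (Definition 5.2). `sample_type` and `match` are hypothetical names."""
    rng = rng or random
    n = m * load                                    # number of replicas
    real = rng.randrange(n)                         # step 1: real-agent index
    replicas = [sample_type() for _ in range(n)]    # step 2: i.i.d. replicas...
    replicas[real] = true_type                      # ...with the true type planted
    surrogates = [sample_type() for _ in range(m)]  # step 3: i.i.d. surrogates
    assignment = match(replicas, surrogates, load)  # step 4: load-to-one matching
    return surrogates[assignment[real]]             # step 5: matched surrogate
```

Only the surrogate matched to the real agent's replica is output; the remaining replicas and surrogates exist solely to define the matching.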
The value that a replica obtains for the outcome that the induced algorithm produces for a surrogate (henceforth, the surrogate outcome) is a random variable. The analysis of Hartline et al. (2015) is based on the study of an ideal computational model where the value of any replica for any surrogate outcome is known exactly. In this computationally unrealistic model, and with these values as weights, the maximum weight matching algorithm can be employed in the replica-surrogate matching surrogate selection rule above, and it results in a Bayesian incentive compatible mechanism. Hartline et al. (2015) analyze the welfare of the resulting mechanism in the case where the load is $\ell = 1$, prove that conditions (a)–(c) are satisfied, and give (polynomial) bounds on the market size $m$ that is necessary for the expected welfare of the mechanism to be within an additive $\epsilon$ of that of the algorithm.

If $\mathcal{M}$ is the maximum weight matching algorithm, conditions (a)–(c) clearly continue to hold for our generalization to load $\ell$. Moreover, the welfare of the reduction is monotone non-decreasing in $\ell$.
Lemma 5.2.
In the ideal computational model (where the value of a replica for being matched to a surrogate is given exactly), the per-replica welfare of the replica-surrogate maximum matching is monotone non-decreasing in the load $\ell$.
Proof.
Consider a non-optimal matching that groups the $m\ell$ replicas into $\ell$ groups of size $m$ and finds the optimal one-to-one matching between the replicas in each group and the $m$ surrogates. As the replicas and surrogates are i.i.d. draws from the same distribution, the expected welfare of each such matching is equal to the expected welfare of the optimal matching at load one. These $\ell$ matchings combine to give a feasible $\ell$-to-one matching between the replicas and surrogates. The total expected welfare of the optimal $\ell$-to-one matching between the $m\ell$ replicas and $m$ surrogates is therefore no less than $\ell$ times the expected welfare of the load-one matching. Thus, the per-replica welfare, i.e., the welfare normalized by the number $m\ell$ of replicas, is monotone in $\ell$. ∎
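The grouping step in this proof can be checked on a small instance. The sketch below (brute-force matchings over a random weight matrix with market size $m = 3$ and load $\ell = 2$; all helper names are illustrative) verifies that stitching together the optimal one-to-one matchings of the $\ell$ groups is a feasible $\ell$-to-one matching, so the optimal $\ell$-to-one matching has at least their combined welfare.

```python
import itertools
import random

def opt_matching(weight, rows, slot_to_surrogate):
    """Max-weight matching of `rows` replicas to slots by brute force;
    each slot is one copy of a surrogate (small instances only)."""
    slots = range(len(slot_to_surrogate))
    return max(sum(weight[r][slot_to_surrogate[s]] for r, s in zip(rows, perm))
               for perm in itertools.permutations(slots, len(rows)))

random.seed(0)
m, load = 3, 2
n = m * load
weight = [[random.random() for _ in range(m)] for _ in range(n)]

# Optimal load-to-one matching: each surrogate appears as `load` slots.
opt_load = opt_matching(weight, range(n), [j for j in range(m) for _ in range(load)])

# Optimal one-to-one matchings of the two groups of m replicas each.
group_total = (opt_matching(weight, range(0, m), list(range(m))) +
               opt_matching(weight, range(m, n), list(range(m))))

# The combined group matchings form a feasible load-to-one matching.
assert opt_load >= group_total - 1e-9
```

The inequality holds for every weight matrix, not just random ones, since feasibility of the combined matching is a deterministic fact.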
Our main result is an approximation scheme for the ideal reduction of Hartline et al. (2015). We identify a load $\ell$ and a polynomial time (in $m$, $\ell$, and $1/\epsilon$) $\ell$-to-one matching algorithm for the black-box model, and prove that the expected per-replica welfare of this matching algorithm is within an additive $\epsilon$ of the expected per-replica welfare of the optimal matching in the ideal model with load $\ell$ (as analyzed by Hartline et al., 2015). The welfare of the ideal model is monotone non-decreasing in the load (Lemma 5.2); therefore, it is sufficient to identify a polynomial load $\ell$ for which there is a polynomial time algorithm in the black-box model that has $\epsilon$ loss relative to the ideal model at that same load $\ell$.
In the remainder of this section we replace this ideal matching algorithm with an approximation scheme for the black-box model, where replica values for surrogate outcomes can only be estimated by sampling. For any $\epsilon > 0$, our algorithm gives an additive $\epsilon$ loss relative to the welfare of the ideal algorithm with only a polynomial increase in the runtime. Moreover, the algorithm produces a perfect (and so maximal) matching, and therefore the surrogate selection rule is stationary; and the algorithm is maximal-in-range for every replica (including the true type of the agent), and therefore the resulting mechanism is Bayesian incentive compatible.
5.2 Entropy Regularized Matching
In this section we define an entropy regularized bipartite matching problem and discuss its solution. We will refer to the left-hand-side vertices as replicas and the right-hand-side vertices as surrogates. The weight on the edge between replica $i$ and surrogate $j$ will be denoted by $w_{ij}$. In our application to the replica-surrogate matching defined in the previous section, $w_{ij}$ will be set to the expected value of replica $i$ for the surrogate outcome of surrogate $j$.
Definition 5.3.
For weights $w$, the entropy regularized matching program for parameter $\epsilon > 0$ is:
$$\max_x \;\; \textstyle\sum_{i,j} \big( w_{ij}\, x_{ij} - \epsilon\, x_{ij} \ln x_{ij} \big)$$
$$\text{s.t.} \;\; \textstyle\sum_j x_{ij} = 1 \;\; \forall i, \qquad \sum_i x_{ij} \le \ell \;\; \forall j, \qquad x_{ij} \ge 0 \;\; \forall i, j.$$
The optimal value of this program is denoted by $\mathrm{OPT}$.
The dual variables for the right-hand-side (surrogate) constraints of the matching polytope can be interpreted as prices for the surrogate outcomes. Given prices, the utility of a replica for a surrogate outcome is the difference between the replica's value and the price. The following lemma shows that, for the right choice of dual variables, the maximizer of the entropy regularized matching program is given by exponential weights, with weights equal to the utilities.
Observation 1.
For the optimal Lagrangian dual variables $\lambda^*$ for the surrogate feasibility constraints in the entropy regularized matching program (Definition 5.3), namely
$$\lambda^* \in \arg\min_{\lambda \ge 0} \max_x \mathcal{L}(x, \lambda),$$
where $\mathcal{L}$ is the Lagrangian objective function, the optimal solution to the primal is given by exponential weights:
$$x^*_{ij} = \frac{\exp\big((w_{ij} - \lambda^*_j)/\epsilon\big)}{\sum_{j'} \exp\big((w_{ij'} - \lambda^*_{j'})/\epsilon\big)}.$$
Observation 1 recasts the entropy regularized matching as, for each replica, sampling from an exponential weights distribution. For any replica and fixed dual variables, our Fast Exponential Bernoulli Race (Algorithm 3) gives a polynomial time algorithm for sampling from the exponential weights distribution in the expectations from samples computational model.
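To illustrate Observation 1, the sketch below computes approximate dual prices by an illustrative projected subgradient method (not the paper's algorithm, and assuming the weights $w_{ij}$ are known exactly rather than accessed through samples). At the optimal duals, each replica's row of the primal solution is a softmax of the utilities $w_{ij} - \lambda_j$ at rate $1/\epsilon$.

```python
import math

def exp_weights(utilities, eps):
    """Distribution with x_j proportional to exp(u_j / eps)."""
    mx = max(utilities)
    e = [math.exp((u - mx) / eps) for u in utilities]
    z = sum(e)
    return [v / z for v in e]

def approx_duals(w, load, eps, eta=0.1, iters=2000):
    """Projected subgradient ascent on the surrogate prices lambda_j:
    raise a price when its surrogate is over-demanded, keep it >= 0."""
    m = len(w[0])
    lam = [0.0] * m
    for _ in range(iters):
        x = [exp_weights([w[i][j] - lam[j] for j in range(m)], eps)
             for i in range(len(w))]
        for j in range(m):
            demand = sum(row[j] for row in x)
            lam[j] = max(0.0, lam[j] + eta * (demand - load))
    return lam
```

On a small instance the induced allocation approximately respects the surrogate capacity constraints while every replica's row remains an exponential weights distribution.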
Lemma 5.3.
For replica $i$ and any prices (dual variables) $\lambda$, allocating a surrogate drawn from the exponential weights distribution
$$x_{ij} = \frac{\exp\big((w_{ij} - \lambda_j)/\epsilon\big)}{\sum_{j'} \exp\big((w_{ij'} - \lambda_{j'})/\epsilon\big)} \qquad (2)$$
is maximal-in-range, as defined in Definition 4.1, and this random surrogate can be sampled with polynomially many samples from the replica-surrogate-outcome value distributions.
Proof.
To see that the distribution is maximal-in-range when assigning a surrogate outcome to replica $i$, consider the regularized welfare maximization
$$\max_{x_i \ge 0,\; \sum_j x_{ij} = 1} \;\; \textstyle\sum_j x_{ij} (w_{ij} - \lambda_j) - \epsilon \sum_j x_{ij} \ln x_{ij}$$
for replica $i$. Similar to Observation 1, first-order conditions imply that the exponential weights distribution in (2) is the unique maximizer of this concave program.
To apply the Fast Exponential Bernoulli Race to the utilities, which are of the form $w_{ij} - \lambda_j \in [-1, 1]$, we must first normalize them to lie in the interval $[0, 1]$. This normalization is accomplished by adding $1$ to the utilities (which has no effect on the exponential weights distribution, and therefore preserves maximality-in-range) and then scaling by $1/2$. The scaling needs to be corrected by setting the rate parameter in the Fast Exponential Bernoulli Race (Algorithm 3) to $2/\epsilon$. The expected number of samples from the value distributions that are required by the algorithm, per Theorem 3.4, is polynomial in $m$ and $1/\epsilon$.
∎
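The shift-and-scale normalization in this proof leaves the exponential weights distribution unchanged, as the following standalone numeric check illustrates (assuming utilities in $[-1, 1]$; the actual Algorithm 3 only has sample access to the values, whereas here they are plain floats):

```python
import math

def softmax(vals):
    """Distribution proportional to exp(vals), computed stably."""
    mx = max(vals)
    e = [math.exp(v - mx) for v in vals]
    z = sum(e)
    return [v / z for v in e]

eps = 0.3
u = [0.7, -0.2, 0.4]                    # utilities w_ij - lambda_j in [-1, 1]
direct = softmax([v / eps for v in u])  # target: x_j proportional to exp(u_j / eps)
shifted = [(v + 1) / 2 for v in u]      # add 1, scale by 1/2: now in [0, 1]
corrected = softmax([v * (2 / eps) for v in shifted])  # corrected rate 2/eps
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, corrected))
```

The check works because $(u + 1)/2 \cdot (2/\epsilon) = u/\epsilon + 1/\epsilon$, and a constant shift of all exponents cancels in the normalization.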
If we knew the optimal Lagrangian dual variables $\lambda^*$ from Observation 1, it would be sufficient to define the surrogate selection rule by simply sampling from the exponential weights distribution (which takes polynomial time, per Lemma 5.3) that corresponds to the agent's true type (the replica indexed $\imath$). Notice that wrong values of $\lambda$ correspond to violating the primal feasibility constraints (for the surrogates), and thus the outcome of sampling from exponential weights for such $\lambda$ would not correspond to a maximal-in-range matching. In the next section we give a polynomial time approximation scheme that is maximal-in-range for each replica and approximates sampling with the correct $\lambda^*$.
5.3 Online Entropy Regularized Matching
In this section, we reduce the entropy regularized matching problem to the problem of sampling from exponential weights (as described in Lemma 5.3) via an online algorithm. Consider the replicas being drawn adversarially, but presented in a uniformly random order, over times $t = 1, \ldots, m\ell$. The basic observation is that approximate dual variables are sufficient for an online assignment of each replica to a surrogate, via Lemma 5.3, that approximates the optimal (offline) regularized matching. Recall that the replicas are independently and identically distributed in the original problem.
Our construction borrows techniques used in designing online algorithms for stochastic online convex programming problems (Agrawal and Devanur, 2015; Chen and Wang, 2013) and stochastic online packing problems (Agrawal et al., 2009; Devanur et al., 2011; Badanidiyuru et al., 2013; Kesselheim et al., 2014). Our online algorithm (Algorithm 4, below) considers the replicas in order, updates the dual variables using multiplicative weight updates based on the current allocation, and allocates to each agent by sampling from the exponential weights distribution as given by Lemma 5.3. The algorithm is parameterized by $\epsilon$, the scale of the regularizer; by $\eta$, the rate at which the algorithm learns the dual variables $\lambda$; and by a scale parameter $\alpha$, which we set later.
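The loop structure just described can be sketched as follows. This is a simplified stand-in for Algorithm 4: the weights are assumed to be known floats (rather than accessed through the Bernoulli race), and the dual update is a bare multiplicative weight rule without the paper's scale parameter or normalization of utilities.

```python
import math
import random

def sample_exp_weights(utilities, eps, rng):
    """Sample index j with probability proportional to exp(u_j / eps)."""
    mx = max(utilities)
    w = [math.exp((u - mx) / eps) for u in utilities]
    r = rng.random() * sum(w)
    for j, v in enumerate(w):
        r -= v
        if r <= 0:
            return j
    return len(w) - 1

def online_entropy_matching(weight, load, eps, eta, rng):
    """Process replicas in order; allocate each by exponential weights on
    the utilities w_ij - lambda_j, then multiplicatively raise the price
    of the chosen surrogate and re-normalize the duals."""
    m = len(weight[0])
    lam = [1.0 / m] * m          # dual variables (prices)
    remaining = [load] * m       # surrogate budgets
    assignment = []
    for row in weight:
        avail = [j for j in range(m) if remaining[j] > 0]
        pick = avail[sample_exp_weights([row[j] - lam[j] for j in avail], eps, rng)]
        assignment.append(pick)
        remaining[pick] -= 1
        lam = [lam[j] * (math.exp(eta) if j == pick else 1.0) for j in range(m)]
        total = sum(lam)
        lam = [v / total for v in lam]
    return assignment
```

By construction every replica is matched and no surrogate exceeds its budget, mirroring the perfect-matching and stationarity requirements discussed below.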
The algorithm needs to satisfy four properties to be useful in a polynomial time reduction. First, it needs to produce a perfect matching, so that the replica-surrogate matching surrogate selection rule is stationary, specifically via condition (c). Second, it needs to be maximal-in-range for the real agent (replica $\imath$). In fact, all replicas are treated symmetrically, and each is allocated by sampling from an exponential weights distribution that is maximal-in-range via Lemma 5.3. Third, it needs to have good welfare compared to the ideal matching. Fourth, its runtime needs to be polynomial. The first two properties are immediate and imply the theorem below. The last two properties are analyzed below.
Theorem 5.4.
The mechanism that maps types to surrogates via the replica-surrogate matching surrogate selection rule with the online entropy regularized matching algorithm (with payments from Theorem 4.1) is Bayesian incentive compatible.
5.4 Social Welfare Loss
We analyze the welfare loss of the online entropy regularized matching algorithm (Algorithm 4) with regularizer parameter $\epsilon$, learning rate $\eta$, and scale parameter $\alpha$ set as a fraction of an estimate of the optimal value of the offline program (Definition 5.3).
Theorem 5.5.
There are parameter settings for the online entropy regularized matching algorithm (Algorithm 4) for which (1) its per-replica expected welfare is within an additive $\epsilon$ of the per-replica welfare of the optimal replica-surrogate matching, and (2) given oracle (sample) access to the type distribution and the black-box algorithm, the running time of this algorithm is polynomial in the market size and $1/\epsilon$.
To prove this theorem, we first argue how to set $\alpha$ to be, with high probability and with efficient sampling, a constant approximation to a fraction of the optimal value of the convex program. Second, we argue that the online and the offline optimal entropy regularized matching algorithms have nearly the same welfare. Finally, we argue that the offline optimal entropy regularized matching has nearly the welfare of the offline optimal matching. The theorem is then proved by combining these results with the right parameters.
Parameter and approximating the offline optimal.
Presetting $\alpha$ to be an estimate of the optimal objective of the convex program in Definition 5.3 is necessary for the competitive ratio guarantee of Algorithm 4. Also, $\alpha$ should be set in a symmetric and incentive compatible way across replicas, to preserve the stationarity property. To this end, we look at an instance generated by an independent random draw of replicas (while fixing the surrogates). In such an instance, we estimate the expected values $w_{ij}$ by sampling and taking the empirical mean for each edge in the replica-surrogate bipartite graph. We then solve the convex program exactly on the estimated weights (which can be done in polynomial time using an efficient separation oracle). This scheme is clearly incentive compatible, as we do not use the reported type of the true agent in our calculation of $\alpha$, and it is symmetric across replicas. In Appendix D we show how this approach leads to a constant approximation to the optimal value of the offline program in Definition 5.3 with high probability.
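The empirical-mean estimation step above can be sketched as follows, where `sample_value(r, s)` is a hypothetical stand-in for running the black-box algorithm on surrogate `s` once and evaluating replica `r`'s value for the random outcome:

```python
def estimate_weights(replicas, surrogates, sample_value, k):
    """Empirical-mean estimate of each edge weight w_ij from k i.i.d.
    value samples per edge; `sample_value` is a hypothetical oracle."""
    return [[sum(sample_value(r, s) for _ in range(k)) / k
             for s in surrogates]
            for r in replicas]
```

The estimated weight matrix is then plugged into the convex program of Definition 5.3, which is solved exactly on the estimates.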
Lemma 5.6.
There exists a polynomial time approximation scheme for calculating $\alpha$ (i.e., it needs only polynomially many samples, in $m$, $\ell$, and $1/\epsilon$, from the black-box allocation algorithm) such that $\alpha$ is a constant factor approximation to the optimal value of the offline program in Definition 5.3, with high probability.
Competitive ratio of the online entropy regularized matching algorithm.
Assuming $\alpha$ is set to be a constant approximation to a fraction of the optimal value of the offline entropy regularized matching program, we prove the following lemma.
Lemma 5.7.
For a fixed regularizer parameter $\epsilon$, learning rate $\eta$, regularized welfare estimate $\alpha$, and market size $m$ that satisfy the conditions above, the expected objective value of the online entropy regularized matching algorithm (Algorithm 4) approximates the optimal value $\mathrm{OPT}$ of the offline program.
Proof.
Recall that $\mathrm{OPT}$ denotes the optimal objective value of the entropy regularized matching program. We will analyze the algorithm up to the iteration $T$ at which the first surrogate becomes unavailable (because all of its $\ell$ copies are matched to previous replicas).
Define the contribution of replica $t$ to the Lagrangian objective of Observation 1, for allocation $x$ and dual variables $\lambda$, as
$$\mathcal{L}_t(x, \lambda) = \textstyle\sum_j x_j (w_{tj} - \lambda_j) - \epsilon \sum_j x_j \ln x_j + \tfrac{1}{m} \sum_j \lambda_j. \qquad (3)$$
The difference between the outcome for replica $t$ in the online algorithm and the solution to the offline optimization is that the online algorithm selects the outcome with respect to the current dual variables $\lambda^{(t)}$, while the offline algorithm selects the outcome with respect to the optimal dual variables $\lambda^*$ (Observation 1). Denote the outcome of the online algorithm by
$$x^{(t)} \in \arg\max_x \mathcal{L}_t(x, \lambda^{(t)})$$
and its contribution to the objective by $\mathcal{L}_t(x^{(t)}, \lambda^{(t)})$. Likewise, denote the outcome of the offline optimization by $x^{*(t)}$ and its contribution by $\mathcal{L}_t(x^{*(t)}, \lambda^*)$. Denote by $\hat{x}^{(t)}$ the indicator vector for the surrogate that the online algorithm samples from $x^{(t)}$.
Optimality of $x^{(t)}$ for the dual variables $\lambda^{(t)}$ in equation (3) implies
$$\mathcal{L}_t(x^{(t)}, \lambda^{(t)}) \ge \mathcal{L}_t(x^{*(t)}, \lambda^{(t)}),$$
so, by rearranging the terms and taking expectations conditioned on the observed history, we have
$$\mathbb{E}\big[\mathcal{L}_t(x^{(t)}, \lambda^{(t)})\big] \ge \mathbb{E}\big[\mathcal{L}_t(x^{*(t)}, \lambda^*)\big] - \delta_t,$$
where
$$\delta_t = \mathbb{E}\Big[\textstyle\sum_j \big(\lambda^{(t)}_j - \lambda^*_j\big)\big(x^{*(t)}_j - \tfrac{1}{m}\big)\Big].$$
By summing the above inequalities for $t = 1, \ldots, T$ we have:
$$\textstyle\sum_{t=1}^{T} \mathbb{E}\big[\mathcal{L}_t(x^{(t)}, \lambda^{(t)})\big] \ge \sum_{t=1}^{T} \mathbb{E}\big[\mathcal{L}_t(x^{*(t)}, \lambda^*)\big] - \sum_{t=1}^{T} \delta_t. \qquad (4)$$
In order to bound the term $\sum_t \delta_t$, apply the regret bound of the exponential gradient (essentially, multiplicative weight update) online learning algorithm for any realization of the random variables (under which the dual variables $\lambda^{(t)}$ are exponential weights distributions); this yields inequality (5), where the last inequality holds because at time $T$, either there exists a surrogate whose budget of $\ell$ copies is exhausted, or $T = m\ell$ and all surrogate outcome budgets are exhausted. In the former case, some surrogate $j$ satisfies $\sum_{t \le T} \hat{x}^{(t)}_j = \ell$, and in the latter case $\sum_{t \le T} \sum_j \hat{x}^{(t)}_j = m\ell$.
Combining (4) and (5), and choosing the parameters as specified, we have inequality (6), where the last inequality is immediate from $T \le m\ell$. By taking expectations on both sides, we have
