Optimal Mechanisms for Selling Information
Abstract
The buying and selling of information is taking place at a scale unprecedented in the history of commerce, thanks to the formation of online marketplaces for user data. Data providing agencies sell user information to advertisers to allow them to match ads to viewers more effectively. In this paper we study the design of optimal mechanisms for a monopolistic data provider to sell information to a buyer, in a model where both parties have (possibly correlated) private signals about a state of the world, and the buyer uses information learned from the seller, along with his own signal, to choose an action (e.g., displaying an ad) whose payoff depends on the state of the world.
We provide sufficient conditions under which there is a simple one-round protocol (i.e., a protocol where the buyer and seller each send a single message, and there is a single money transfer) achieving optimal revenue. In these cases we present a polynomial-time algorithm that computes the optimal mechanism. Intriguingly, we show that multiple rounds of partial information disclosure (interleaved by payment to the seller) are sometimes necessary to achieve optimal revenue if the buyer is allowed to abort his interaction with the seller prematurely. We also prove some negative results about the inability of simple mechanisms for selling information to approximate more complicated ones in the worst case.
1 Introduction
A growing trend in online advertising is the use of behavioral targeting and user information (like demographics) to better match advertisements to the viewer. This is possible due to the existence of data-providing agencies, like Bluekai, Bluecava, eXelate Media, Clearsprings and RapLeaf, whose business consists of collecting, curating and selling information about user intent to advertisers. An article in the New York Times [8] analyzes this phenomenon and points out that data agencies are not exclusively an Internet phenomenon. For example, for many years companies like Acxiom and Experian (founded in 1969 and 1980, respectively) have been collecting information about consumer habits and selling this information to marketers, who can then use it to send catalogs by mail.
A concrete situation is as follows: an advertiser has multiple different ads that he can present to the viewer, and the effectiveness of each of them depends on both the ad and the characteristics of the viewer. For example, a car maker would rather show sports car advertisements to affluent young bachelors while showing ads for family cars to older viewers with kids. A data-providing agency (the seller) might have some information about the viewer generating the impression, like gender, age and past interactions on that site. Such information could be valuable to the advertiser, as he would be able to use it for better targeting, and the monopolist seller would like to extract as much of this value as possible as revenue. The advertiser (buyer) might have some information about the viewer as well, and this information might be correlated with the seller's information. The seller is uncertain about the buyer's information and utilities, yet holds some belief about them.
While selling information about viewers raises obvious privacy questions, it also raises fascinating questions of a purely economic nature. How does one quantify the value of this information? What is the optimal (i.e., revenue-maximizing) selling strategy for information? What are the qualitative differences between selling information and selling physical goods and services? How do these differences influence the design of markets for information, and the algorithmic problems underlying such markets?
To highlight the issues inherent in such questions, it is helpful to point out some differences between a seller offering distinct goods for sale and a seller offering bits of information.

A seller of goods can group them into bundles, offering a subset of the goods at a specified price. A seller of bits can do many other things: for example, she^{1} can set a specified price for revealing the Boolean XOR of the first two bits, or some more complex function of the bits.
A consumer of goods generally knows their value even before they are allocated. The value of a piece of information is typically not known until the information is revealed.

By the taxation principle, a buyer of goods can be assumed, without loss of generality, to face a posted price for each bundle (one that is independent of his type). A seller of information may, in some cases, be able to extract strictly more revenue using an interactive protocol^{2} rather than posted pricing. (See Section 5.)
To be sure, there are some cases in which trading goods possesses some of the characteristics of trading information noted above. For example, a customer in a restaurant does not necessarily know the quality of the food he is about to consume; in turn, this can lead to sellers using interactive protocols, for example, allowing the restaurant customer to try a limited sample of food for a reduced price (or even for free) before deciding whether to order more. We interpret such situations as markets in which information and goods are coupled together, i.e. revealing the quality of the food occurs in conjunction with selling the food itself.
This paper addresses some of the questions raised above by situating them in a model that eliminates extraneous features, such as the coupling of goods and information or competition between multiple buyers and sellers of information, while attempting to remain quite general in its assumptions about information and its utility. Our only such assumption is that the utility of information lies in guiding future actions of the party receiving the information. Thus, in our model there is a single seller and a single buyer. A state of the world (denoted by $\theta_s$ henceforth) is known to the seller but not to the buyer, who instead holds a private type $\theta_b$ that may be correlated with $\theta_s$.
Crucially, we assume that the seller designs the protocol and can be trusted to faithfully follow the protocol she designs. The buyer, on the other hand, need not be honest: he may send signals that are inconsistent with his true payoff type if it is rational to do so. We do, however, distinguish between committed buyers, who can be bound to complete the specified protocol even if they are sending dishonest signals, and uncommitted buyers, who may abort the protocol if it is rational to do so, for example when they have received information and not yet paid for it.
Our results. The set of all interactive protocols is a large and ill-structured space. Searching for the revenue-maximizing one is unfathomably complex unless there is a way to limit the search space. Our first set of results provides the tools necessary for that. In mechanism design, the search space is often limited by invoking some form of the revelation principle, due to Gibbard [13] and Myerson [16]. In their setting, buyers have private types and the seller (mechanism designer) needs to choose among a set of outcomes and can collect payments from the buyers. The revelation principle states that if a certain outcome and payments can be implemented in an equilibrium of a possibly complicated and interactive mechanism, then they can be implemented in a simple direct revelation mechanism, where buyers report their types and the mechanism chooses an outcome and payments. Moreover, this mechanism has a simple equilibrium in which each buyer reports his type truthfully. If the outcome is the allocation of traditional goods, the revelation principle implies that the mechanism can be implemented as a protocol consisting of three steps: (i) buyers report their types, (ii) payments take place, (iii) the outcome is determined. We say that such a mechanism has the one-round revelation property, which we define precisely later (in Definition 2.7); intuitively, it means that the buyers move only once (by declaring their types), payments happen only once, and the seller moves only once (by choosing the outcome).
Now consider the case where, instead of an allocation of traditional goods, the outcome is the disclosure of information. Myerson's revelation principle still holds in the sense that any outcome can be implemented by a mechanism where buyers truthfully report their types in the first step. It is not clear, however, whether the stronger one-round revelation property still holds: after the buyer reports his type, a sequence of payments and partial information disclosures might be required in order to implement a certain outcome.
Our first set of results (Theorems 3.1 and 4.1) provides conditions under which the one-round revelation property holds. The precise theorem statements, given in Sections 3 and 4, are somewhat stronger: they supplement the revelation principle with additional information about the relative timing of signals and payments.
Theorem 1.1
When buyers are committed, or when buyers are uncommitted but $\theta_b$ and $\theta_s$ are independent random variables, any mechanism can be transformed into a mechanism that extracts the same revenue and has the following form: the buyer and seller each send a single message, payment takes place only once, the buyer's message is simply an announcement of his type, and truthful reporting maximizes the buyer's utility.
Interestingly, the strengthened revelation principle of Theorem 1.1 fails in the remaining case, when buyers are uncommitted and the signals are correlated. The usual logic justifying the revelation principle (that agents can always report their types to the mechanism and let it simulate their optimal strategy given their type) does not apply, for a subtle reason having to do with the timing of payments, the correlation of the signals, and the fact that the buyer is uncommitted. A direct mechanism that attempts to simulate an interactive protocol cannot determine an unbiased estimate of the buyer's expected payment before observing $\theta_s$ because, unlike in the independent case, the conditional distribution of $\theta_s$ depends on the true value of $\theta_b$ (the buyer's true type) rather than on the type that is reported. On the other hand, if the mechanism simulates the protocol using the realization of $\theta_s$ and posts a price that depends on the simulation outcome, this fails because the buyer is uncommitted: the price reveals information about $\theta_s$, and the buyer may take this information for free while refusing to pay.
Our next results concern algorithms for computing the optimal mechanism. Even when the one-round revelation property holds, it is far from obvious how to compute the optimal mechanism efficiently. In a one-round revelation mechanism, the seller allocates information to the buyer by revealing a (possibly random) signal sampled from a distribution that depends on the reported type $\theta_b$ and on $\theta_s$. The main difficulty is that the seller is free to choose the support size of this distribution (i.e., the number of potential signals), and in principle this leads to an optimization problem of unbounded dimensionality. Nevertheless, we show that the optimal mechanism can be computed in polynomial time by solving a convex program of bounded dimensionality; a by-product of the proof is an explicit upper bound on the number of potential signals.
Theorem 1.2
Suppose that $\theta_b$ can take only $m$ possible values and $\theta_s$ can take only $n$ possible values. When buyers are committed, or when buyers are uncommitted but $\theta_b$ and $\theta_s$ are independent random variables, there is an algorithm that computes the optimal mechanism in polynomial time. Furthermore, there is an optimal mechanism in which every signal transmitted from the seller to a buyer is sampled from a set of explicitly bounded size.
In Theorem 4.2, we prove an analogue of the result of Cremer and McLean [10] on optimal auctions with correlated bids. We show that when the correlation of $\theta_b$ and $\theta_s$ is complex enough that a certain matrix has full rank, the optimal mechanism extracts the full surplus. However, as in [10], when this matrix is ill-conditioned the optimal mechanism can be quite exotic, using a mixture of unboundedly large positive and negative payments. This raises the following question: to what extent can its revenue be approximated by simpler and more natural mechanisms? We explore this question in Section 4.2 by investigating the relative power of four progressively more general types of mechanisms:

a “sealed envelope” mechanism that treats $\theta_s$ as an indivisible good by writing its value inside a sealed envelope and posts a price for the envelope;

mechanisms that reveal a signal about $\theta_s$ but charge the buyer for this signal before revealing it;

mechanisms that reveal a signal about $\theta_s$ and then charge the buyer a nonnegative amount that depends on the signal;

arbitrary mechanisms.
It is not hard to show that if one compares the optimal mechanisms from two of these four classes, their revenues never differ by a factor of more than $m$, the number of potential buyer types. Section 4.2 shows that this multiplicative gap is tight up to a constant factor: for any two of the aforementioned classes of mechanisms, one can find examples where mechanisms in the more general class obtain $\Omega(m)$ times as much revenue as the optimal mechanism in the more specific class.
Our work leaves many interesting open problems; we discuss them in Section 6 and point out potential connections to other areas in computer science (such as cryptography) and in economics (such as cheap talk and dynamic mechanism design).
Related work. The concept of information occupies a notable position in auction theory. In the classic work of Milgrom and Weber [15], the authors consider different auction formats for a single item and discuss how revenue changes as the seller reveals information (without directly charging for it) about the quality of the good. Persico [18] remarks that the information structure is almost always assumed to be exogenous and outside the control of the mechanism designer. He initiates a line of inquiry that endogenizes the information acquisition process in the auction. In his model, information is acquired by a competing firm that pays a fixed exogenous cost (for example, by performing R&D).
It is interesting to consider the qualitative changes that occur when information moves from an auxiliary device to the position of the central object being sold. Many authors have noted that classic results in economics that were designed for traditional goods fail when selling information. Varian [21] presents such a discussion in a much broader context. Although he expresses some similar concerns about the relation to traditional goods, his definition of information is very different from ours.
Closer to our model are the works of Esö and Szentes [12] and Hörner and Skrzypacz [14]. In Esö and Szentes [12], the authors develop a model of consulting, in which the consultant sells information about a certain random variable to a client. Their assumptions on the nature of the random variables are orthogonal to ours: theirs are continuous random variables, while ours are discrete. Also, while our utility function is generic, theirs is linear. The mechanism they obtain resembles our Pricing Outcomes mechanism, where the payment of the client depends on the action he takes (in their paper, whether the client decides to undertake a project or not). Given that their utility function is simpler, the structure of the disclosed information is also much simpler.
The paper of Hörner and Skrzypacz [14] is close to our model of uncommitted buyers. Their main goal is to design self-enforcing contracts for an agent to sell a binary High-Low signal about the state of the world to a firm. The authors remark: “Lack of commitment creates a holdup problem: since the Agent is selling information, once the Firm learns it, it has no reason to pay for it”. Their model is again orthogonal to ours, since the type of the Firm (the information buyer) is known, so there is no uncertainty from the seller's perspective. On the other hand, neither side has the ability to commit. The seller's ability to commit in our model leads us to a mechanism design approach to the problem, while those authors analyze the set of equilibria of a game very much in the spirit of the cheap talk literature [5].
Our model is also related to the treatment of information in Athey and Levin [3]. The authors model a decision maker who chooses an action and gets a reward depending on both the action chosen and an unknown state of the world, about which the decision maker gets an imperfect signal. They assume a total ordering on the space of states of the world and on the space of signals, and restrict their attention to monotone decision processes. We, on the other hand, do not make any structural assumptions about the set of possible states of the world or the set of possible signals (except that they are finite), nor do we assume anything about the decision problem. Our approach is also different: while the authors in [3] derive comparative statics for the demand for information, we take a mechanism design approach.
Also related to our paper is the work of Admati and Pfleiderer [1, 2]. Motivated by financial advice in the market for securities, the authors analyze the following setting: a monopolist seller has information about the payoff of a share of a risky asset, which is normally distributed. They consider a two-step process: in the first round, the seller is able to sell this information to a continuum of traders using some mechanism; in the second round, the buyers (traders) trade in a speculative market. The authors consider the problem of how to design mechanisms to maximize revenue. Although similar in spirit, the model of Admati and Pfleiderer is orthogonal to ours. Their model considers multiple buyers that trade in the same market in the second round, instead of simply making a decision. On the other hand, many features of our model are absent in theirs: they do not consider the seller's uncertainty about the buyer's utility function or the possibility that the buyers already have some private information correlated with the seller's information. Moreover, they also limit the seller's power to transform the signal: in their model, the only transformation the seller can perform is to add normally distributed noise to it.
Our notion of uncommitted buyers is in the spirit of participation constraints used in the literature on dynamic mechanism design [7, 4]. This literature considers multi-round interaction between a mechanism and the agents, and requires that individual participation constraints hold in every period. Analogously, when considering uncommitted buyers we require that participation be voluntary at every point at which the buyer plays, that is, that the buyer does not defect at any point during the protocol. Note that, unlike in that literature, our buyer gets only one exogenous signal, so the different “periods” are not points at which he receives new exogenous information, but rather points at which he takes an action in a multi-round protocol with the seller.
2 Setting: Context and Protocols
We consider a setting with a buyer and a monopolist seller. The buyer is a decision maker, and his decision can be represented as picking an action $a \in A$. His reward from this action depends on the state of the world, which is unobservable. However, both buyer and seller get private signals about the state of the world. Let $\theta_b \in \Theta_b$ be the private signal of the buyer and $\theta_s \in \Theta_s$ be the private signal of the seller. In this paper we consider the case where both $\Theta_b$ and $\Theta_s$ are finite. The buyer's expected reward for taking action $a$ when the two signals are realized to $(\theta_b, \theta_s)$ is given by $u(a, \theta_b, \theta_s)$. The pair of private signals $(\theta_b, \theta_s)$ comes from a joint distribution $D$ over $\Theta_b \times \Theta_s$. We refer to the tuple $C = (A, \Theta_b, \Theta_s, u, D)$ as the context.
To illustrate the model, suppose the buyer is an Internet advertiser who has acquired one display-ad slot and is deciding which ad to show to the user. The set $A$ then represents the possible ads he can place in this slot. The effectiveness of each ad depends on exactly who the user is. This is unknown to the buyer, but he has a private signal $\theta_b$, which is the user's browsing history on the website. Now consider the seller as a data provider who has information about the age, gender, geographic location and income range of the user. Let this information be encapsulated in a signal $\theta_s$.
Another interpretation is to consider $\theta_b$ as the type of the buyer. Since the buyer's reward is a function of $\theta_b$, one can use the same model to express the seller's uncertainty about the buyer's reward function.
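The information that the seller holds is valued by the buyer. If the buyer observes his signal $\theta_b$ and nothing more, his expected reward is $\max_{a \in A} \mathbb{E}[u(a, \theta_b, \theta_s) \mid \theta_b]$, where the expectation is taken over $\theta_s$ sampled from $D$ conditioned on $\theta_b$. If he also learns the value of $\theta_s$ exactly, his expected reward increases to $\mathbb{E}[\max_{a \in A} u(a, \theta_b, \theta_s) \mid \theta_b]$. His surplus from knowing $\theta_s$ is thus:
$$ s(\theta_b) = \mathbb{E}\Big[\max_{a \in A} u(a, \theta_b, \theta_s) \,\Big|\, \theta_b\Big] - \max_{a \in A} \mathbb{E}[u(a, \theta_b, \theta_s) \mid \theta_b]. $$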
Ideally, the seller would like to extract this extra surplus as revenue, but since she faces uncertainty about the buyer (she does not know $\theta_b$) and the buyer acts strategically, in general the seller cannot extract all of that surplus. The central question of this paper is: “What mechanisms can the seller use in order to extract the largest possible fraction of this surplus?”
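As a concrete illustration, the following Python sketch computes the surplus $s(\theta_b)$ of a buyer of type $\theta_b$ from learning $\theta_s$, for a finite context, by brute force over actions. The toy context (two buyer types, two seller signals, a buyer who wants to match $\theta_s$ with stakes 1 or 3) and the helper names `posterior_given_buyer` and `surplus` are hypothetical choices made for illustration only.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

# Joint distribution D over (theta_b, theta_s) pairs, and reward u(a, theta_b, theta_s).
Joint = Dict[Tuple[Hashable, Hashable], float]
Reward = Callable[[Hashable, Hashable, Hashable], float]


def posterior_given_buyer(D: Joint, theta_b: Hashable) -> Dict[Hashable, float]:
    """Pr[theta_s | theta_b], obtained by conditioning the joint distribution D."""
    mass = sum(p for (b, _), p in D.items() if b == theta_b)
    return {s: p / mass for (b, s), p in D.items() if b == theta_b}


def surplus(D: Joint, u: Reward, actions: Sequence[Hashable], theta_b: Hashable) -> float:
    """s(theta_b): value to a buyer of type theta_b of learning theta_s exactly."""
    post = posterior_given_buyer(D, theta_b)
    # Informed: pick the best action separately for every realization of theta_s.
    informed = sum(q * max(u(a, theta_b, s) for a in actions) for s, q in post.items())
    # Uninformed: commit to a single action against the posterior.
    uninformed = max(sum(q * u(a, theta_b, s) for s, q in post.items()) for a in actions)
    return informed - uninformed


# Hypothetical toy context: the buyer wants to match theta_s; type 1 has three times the stakes.
values = {0: 1.0, 1: 3.0}


def u(a, theta_b, theta_s):
    return values[theta_b] * float(a == theta_s)


D = {(b, s): 0.25 for b in (0, 1) for s in (0, 1)}  # independent, uniform signals
print({b: surplus(D, u, (0, 1), b) for b in (0, 1)})  # {0: 0.5, 1: 1.5}
```

Without information each type can only guess, matching $\theta_s$ half the time; with it he always matches, so the surplus is half of his stake.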
2.1 Sealed Envelope Mechanism
Before we start exploring the space of all possible mechanisms, we present a very simple (and usually suboptimal) mechanism: the Sealed Envelope Mechanism. We do so in order to highlight some basic difficulties in designing mechanisms for selling information. In this mechanism the seller treats the information as if it were a regular good. She writes $\theta_s$ on a piece of paper, puts it inside an envelope, and then offers the envelope to the buyer for a fixed price $p$. If the buyer's type is $\theta_b$, then his value for the envelope equals $s(\theta_b)$, his surplus from knowing $\theta_s$, so the revenue is $p \cdot \Pr[s(\theta_b) \ge p]$. Note that the seller is not using her knowledge of $\theta_s$ in determining $p$. It is easy to optimize over $p$ to obtain the best Sealed Envelope Mechanism.
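Optimizing the envelope price is a one-dimensional search. The sketch below assumes that an indifferent buyer buys, and uses the observation that $p \cdot \Pr[s(\theta_b) \ge p]$ only increases between consecutive surplus values, so the best price can be taken to be one of the surplus values. The function name and the toy numbers (types 0 and 1 equally likely, with surplus 0.5 and 1.5) are illustrative assumptions.

```python
from typing import Dict, Hashable, Tuple


def best_envelope_price(prob: Dict[Hashable, float],
                        surplus: Dict[Hashable, float]) -> Tuple[float, float]:
    """Optimal fixed envelope price p and its revenue p * Pr[s(theta_b) >= p].

    prob[t] = Pr[theta_b = t] and surplus[t] = s(t). Assumes an indifferent buyer buys.
    Revenue only increases between consecutive surplus values, so it suffices to try
    each surplus value as a candidate price.
    """
    best_price, best_revenue = 0.0, 0.0
    for p in sorted(set(surplus.values())):
        revenue = p * sum(prob.get(t, 0.0) for t, s in surplus.items() if s >= p)
        if revenue > best_revenue:
            best_price, best_revenue = p, revenue
    return best_price, best_revenue


print(best_envelope_price({0: 0.5, 1: 0.5}, {0: 0.5, 1: 1.5}))  # (1.5, 0.75)
```

Here selling only to the high type (price 1.5, revenue 0.75) beats selling to both at price 0.5 (revenue 0.5).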
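Notice that, after seeing $\theta_s$, the seller can update her belief about $\theta_b$. The seller might try to change the mechanism in the following way: enclose $\theta_s$ in an envelope and sell it for the price $p(\theta_s)$ that maximizes her expected revenue under the updated belief, $p \cdot \Pr[s(\theta_b) \ge p \mid \theta_s]$. By doing so, the seller leaks information through the prices: upon observing the price $p(\theta_s)$, the buyer gains information about $\theta_s$ even without buying the envelope.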
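The leakage can be seen in a small hypothetical example. In the sketch below, the correlated joint distribution, the surplus values (simply taken as given rather than derived from a reward function), and all function names are assumptions chosen for illustration. The signal-dependent optimal prices differ across the two values of $\theta_s$, so a buyer who merely sees the price learns $\theta_s$ exactly.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Joint = Dict[Tuple[Hashable, Hashable], float]  # (theta_b, theta_s) -> probability


def posterior_given_seller(D: Joint, theta_s: Hashable) -> Dict[Hashable, float]:
    """The seller's updated belief Pr[theta_b | theta_s]."""
    mass = sum(p for (_, s), p in D.items() if s == theta_s)
    return {b: p / mass for (b, s), p in D.items() if s == theta_s}


def best_price(prob: Dict[Hashable, float], surplus: Dict[Hashable, float]) -> float:
    """Revenue-maximizing take-it-or-leave-it price against the belief prob over types."""
    candidates = sorted(set(surplus.values()))
    return max(candidates,
               key=lambda p: p * sum(prob.get(t, 0.0) for t, s in surplus.items() if s >= p))


def signal_dependent_prices(D: Joint, surplus: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Price p(theta_s) the seller would post after updating her belief on theta_s."""
    signals = sorted({s for (_, s) in D})
    return {s: best_price(posterior_given_seller(D, s), surplus) for s in signals}


def leaked_information(prices: Dict[Hashable, float]) -> Dict[float, List[Hashable]]:
    """For each posted price, the seller signals consistent with it (learned for free)."""
    consistent = defaultdict(list)
    for s, p in prices.items():
        consistent[p].append(s)
    return dict(consistent)


# Hypothetical correlated context; surplus values are assumed, not derived.
D = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.4}
surplus = {0: 1.0, 1: 3.0}
prices = signal_dependent_prices(D, surplus)
print(prices)                      # {0: 1.0, 1: 3.0}
print(leaked_information(prices))  # {1.0: [0], 3.0: [1]}
```

Each price is consistent with a single value of $\theta_s$, so after seeing it the buyer no longer needs the envelope.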
2.2 Generic Interactive Protocol
Our first step is to define a generic interactive protocol between the buyer and the seller. We assume that the protocol is designed by the seller knowing only the context $C$. The protocol prescribes the behavior of the seller for each $\theta_s$, and we assume that the seller always follows this prescription. All of this is known to the buyer.
After this, we state Myerson's revelation principle for this setting and formally define the stronger notion of one-round revelation. We then discuss to what extent it is possible to obtain one-round revelation mechanisms in this setting.
Definition 2.1 (Generic interactive protocol)
A generic interactive protocol for a context $C$ is a finite decision tree defined on a set of nodes $V$. For each non-leaf node $v$, let $c(v) \subseteq V$ be the set of children of $v$. Each non-leaf node is labeled as a seller node, a buyer node, or a transfer node. Furthermore:

each seller node $v$ has a prescription of the seller's behavior, which associates with each $\theta_s \in \Theta_s$ a probability distribution over $c(v)$. Formally, the prescription at node $v$ is a collection of distributions $\pi_v(\cdot \mid \theta_s)$ over $c(v)$, one for each $\theta_s \in \Theta_s$;

each transfer node $v$ has only one child and has associated with it a fixed (possibly negative) amount $t_v$.
In practice we can think of each edge as labeled with a different message. Starting from the root, if the seller or buyer moves, she or he sends a message, which is represented by moving down the tree (picking a child). As the seller pledges to follow the protocol, her behavior (the distribution of messages she sends conditional on $\theta_s$) is encoded in the seller nodes in advance. The buyer, on the other hand, strategically decides on the messages he sends at buyer nodes, given the protocol tree defined by the seller. Moving down the tree from a transfer node $v$ to its child represents a money transfer; the value $t_v$ designates how much is transferred from the buyer to the seller, so a negative value represents a payment to the buyer. No protocol action is taken at leaf nodes. At each leaf, the buyer updates his belief about $\theta_s$ based on the protocol history; this belief is called the posterior probability distribution, and the buyer then chooses his action $a \in A$ based on it.
We consider two types of strategies for the buyer: committed and uncommitted. A committed strategy is one where we can trust the buyer to follow the entire protocol, i.e., when reaching a buyer node he sends one of the messages specified by the protocol, and when reaching a transfer node he sends or receives the specified amount of money. An uncommitted strategy is one where the buyer additionally has the option of defecting from the protocol at each node by simply leaving the mechanism (which is formally captured by allowing him to play a special action $\bot$). More formally:
Definition 2.2 (Buyer strategies)
A committed strategy $\sigma$ is a collection of distributions $\sigma_v(\cdot \mid \theta_b)$ over the children of $v$, one for each buyer node or transfer node $v$ and each type $\theta_b \in \Theta_b$.
An uncommitted strategy $\sigma$ is a collection of distributions $\sigma_v(\cdot \mid \theta_b)$ over the children of $v$ together with $\bot$, one for each buyer node or transfer node $v$ and each type $\theta_b \in \Theta_b$.
At this point, it is instructive to represent the Sealed Envelope Mechanism in the form of a generic protocol. It is a tree consisting of three interior nodes. At the root there is a buyer node with two children, corresponding to the messages “Do not accept the offer” and “Accept the offer”. The child corresponding to “Do not accept the offer” is a leaf. The other child is a transfer node whose amount is the envelope price $p$. Its only child is a seller node, which has one child for each $\theta_s \in \Theta_s$; the seller's prescription at this node deterministically selects the child corresponding to the realized $\theta_s$ (i.e., it reveals $\theta_s$). See Figure 1 for an illustration.
Two other natural selling strategies are important in this work. Pricing Mappings refers to any posted-price mechanism in which the seller presents a menu of offers, each of the following form: for a fixed amount of money, the buyer obtains the right to observe a random signal sampled by the seller from a distribution that depends on the value of $\theta_s$ in a pre-specified way. Pricing Outcomes refers to a similar type of posted-price mechanism with one crucial difference: rather than charging the buyer a fixed amount of money after he selects an offer from the menu, the amount that the buyer pays (or receives) is allowed to depend on the signal that is revealed by the seller. This gives the seller the potential to price-discriminate among buyers whose different types lead them to have different beliefs about the conditional distribution of $\theta_s$, and therefore different assessments of the expected cost of accepting a given offer. Mechanisms that price mappings or price outcomes can easily be represented in the form of generic interactive protocols, as illustrated in Figure 1.
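To see how Pricing Outcomes can discriminate, the sketch below computes the payment a buyer of type $\theta_b$ expects to make when the charge depends on the revealed signal: $\sum_{\theta_s} \Pr[\theta_s \mid \theta_b] \sum_x \pi(x \mid \theta_s)\, t(x)$. The correlated joint distribution, the offer (reveal $\theta_s$, charge 2 only when $\theta_s = 1$), and the function name are illustrative assumptions.

```python
from typing import Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable], float]  # (theta_b, theta_s) -> probability


def expected_outcome_payment(D: Joint, pi: Dict[Hashable, Dict[Hashable, float]],
                             payment: Dict[Hashable, float], theta_b: Hashable) -> float:
    """Payment a buyer of type theta_b expects to make when the charge depends on the signal.

    pi[s][x] = Pr[signal x | theta_s = s]; payment[x] is charged after x is revealed.
    """
    mass = sum(p for (b, _), p in D.items() if b == theta_b)
    belief = {s: p / mass for (b, s), p in D.items() if b == theta_b}  # Pr[theta_s | theta_b]
    return sum(q * pi[s][x] * payment[x] for s, q in belief.items() for x in pi[s])


# Hypothetical correlated context; the offer reveals theta_s and charges 2 only if theta_s = 1.
D = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.4}
pi = {0: {"x0": 1.0}, 1: {"x1": 1.0}}
payment = {"x0": 0.0, "x1": 2.0}
print([round(expected_outcome_payment(D, pi, payment, b), 6) for b in (0, 1)])  # [0.4, 1.6]
```

Under a Pricing Mappings offer both types would face the same fixed price; here type 1, who believes $\theta_s = 1$ is more likely, expects to pay four times as much for the same disclosure.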
It is important to notice that the seller designs the protocol solely based on the context, and after she designs the protocol its description becomes common knowledge. This happens before the pair $(\theta_b, \theta_s)$ is drawn. For example, the price at which the envelope is offered in the Sealed Envelope Mechanism is hard-coded in the protocol, so there is no need for the seller to send a message announcing it.
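Now we define the utility associated with a committed strategy for a given protocol. Each committed strategy $\sigma$ induces a distribution over the leaves of the tree: sample $(\theta_b, \theta_s) \sim D$, then start at the root and use $\sigma$ (at buyer and transfer nodes) and the seller's prescriptions (at seller nodes) to move down the tree until a leaf is reached. Let $\ell$ be the leaf reached. With each leaf $\ell$ associate the sum $t(\ell)$ of the amounts of the transfer nodes on the path between $\ell$ and the root. We define the utility of a buyer of type $\theta_b$ for $\sigma$ as:
$$ U(\theta_b, \sigma) = \mathbb{E}\Big[\max_{a \in A} \mathbb{E}[u(a, \theta_b, \theta_s) \mid \theta_b, \ell] - t(\ell) \,\Big|\, \theta_b\Big]. $$
We say that a protocol is voluntary if there is a committed strategy $\sigma$ such that
$$ U(\theta_b, \sigma) \ge \max_{a \in A} \mathbb{E}[u(a, \theta_b, \theta_s) \mid \theta_b] \quad \text{for all } \theta_b \in \Theta_b. $$
This means that the expected utility the buyer gets from participating in the protocol is at least as large as the utility he would get by not participating in it. A committed strategy $\sigma$ is called optimal if $U(\theta_b, \sigma) \ge U(\theta_b, \sigma')$ for all $\theta_b \in \Theta_b$ and all alternative committed strategies $\sigma'$. The revenue extracted by this protocol is $\mathbb{E}[t(\ell)]$, where the expectation is over $(\theta_b, \theta_s) \sim D$ and the randomness of the protocol and of $\sigma$.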
We can define similar concepts for uncommitted strategies. For a given node $v$, let $t(v)$ be the sum of the amounts of the transfer nodes on the path between the root and $v$ (not including $v$). An uncommitted strategy defines a distribution over the nodes of the tree at which play ends (not necessarily leaves), and we can therefore define the buyer's utility, optimal strategies, and revenue in exactly the same way. Notice that every protocol is trivially voluntary for uncommitted buyers, since there is always a strategy guaranteeing the buyer at least $\max_{a \in A} \mathbb{E}[u(a, \theta_b, \theta_s) \mid \theta_b]$, namely the strategy that defects at every transfer node.
Definition 2.3
We say that it is possible to extract revenue $R$ from a committed (uncommitted) buyer in a context $C$ if there is a voluntary protocol for this context and an optimal committed (uncommitted) strategy for this protocol with revenue at least $R$.
Notice that in the case of committed buyers, 'voluntary' is an important restriction, because otherwise one could simply have a mechanism consisting solely of a transfer node with an arbitrarily large amount and a leaf. For uncommitted buyers, however, the only optimal strategy in such a mechanism would be to defect at the root.
Definition 2.4
We define the optimal revenue that can be extracted from a committed (uncommitted) buyer in a context $C$ to be the largest $R^*$ such that, for every $\epsilon > 0$, it is possible to extract revenue $R^* - \epsilon$ from a committed (uncommitted) buyer in the context $C$.
2.3 Revelation Principle and One-Round Revelation Mechanisms
First we define the concept of a revelation mechanism, in the sense of Gibbard [13] and Myerson [16] and recast their celebrated revelation principle in our setting (its proof is included, for completeness, in Appendix A). For our purposes it would be sufficient to formulate it in terms of revenue (traditional formulations are somewhat more general).
Definition 2.5 (Revelation Mechanism)
A revelation mechanism is a protocol represented by a tree whose root is a buyer node, at which the buyer is asked to report his type. Moreover, there are no other buyer nodes in the tree. A strategy in such a protocol is truthful if the buyer reports his type truthfully at the root.
Theorem 2.6 (Revelation Principle)
Consider any context $C$. If it is possible to extract revenue $R$ from a committed (uncommitted) buyer in the context $C$, then there is a revelation mechanism and a committed (uncommitted) truthful strategy that is optimal for it and produces revenue $R$.
Theorem 2.6 says that we can restrict our attention to revelation mechanisms. However, this still allows trees of arbitrary depth and complicated arrangements of seller and transfer nodes. Next we define the stronger notion of one-round revelation mechanisms, where only a very simple interaction between the seller and the buyer is allowed:
Definition 2.7 (One-Round Revelation Mechanism)
A one-round revelation mechanism is a revelation mechanism represented by a tree of depth three in which each path from the root has at most one vertex of each type (i.e., at most one seller node and at most one transfer node).
Now we present formal definitions of two special types of one-round revelation mechanisms, which were briefly discussed earlier in this section:
Definition 2.8 (Pricing Mappings and Pricing Outcomes)
A Pricing Mappings Mechanism is a truthful direct revelation mechanism in which all the children of the root are transfer nodes and all their children are seller nodes. A Pricing Outcomes Mechanism is a truthful direct revelation mechanism in which all the children of the root are seller nodes and all their children are transfer nodes.
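Structurally, the two classes differ only in the order of the seller and transfer layers below the root. The sketch below checks that shape for a tree; the `Node` class and the `follows` and `classify` functions are hypothetical, and the check deliberately ignores the prescriptions, the amounts, and truthfulness, which would also have to be verified.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str  # "buyer", "seller", "transfer" or "leaf"
    children: List["Node"] = field(default_factory=list)


def follows(node: Node, pattern: List[str]) -> bool:
    """True if every path below `node` visits exactly the node kinds in `pattern`, in order."""
    if not pattern:
        return not node.children
    return bool(node.children) and all(
        child.kind == pattern[0] and follows(child, pattern[1:]) for child in node.children
    )


def classify(root: Node) -> Optional[str]:
    """Shape-only test; prescriptions, amounts and truthfulness are not checked."""
    if root.kind != "buyer":
        return None
    if follows(root, ["transfer", "seller", "leaf"]):
        return "pricing mappings"
    if follows(root, ["seller", "transfer", "leaf"]):
        return "pricing outcomes"
    return None


# Two hypothetical trees for a buyer with two possible type reports.
mappings = Node("buyer", [Node("transfer", [Node("seller", [Node("leaf"), Node("leaf")])])
                          for _ in range(2)])
outcomes = Node("buyer", [Node("seller", [Node("transfer", [Node("leaf")]) for _ in range(2)])
                          for _ in range(2)])
print(classify(mappings), classify(outcomes))  # pricing mappings pricing outcomes
```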
3 Independent signals
In this section we analyze the case where $\theta_b$ and $\theta_s$ are independent, which is simpler than the general case. In this case, the seller's belief about $\theta_b$ and the buyer's belief about $\theta_s$ are common knowledge. First we prove that in this setting we can focus on one-round revelation mechanisms when searching for the optimal mechanism. Then we show how to compute the optimal mechanism efficiently using a convex program. Finally, we show that there always exists an optimal protocol with a fairly small tree.
Theorem 3.1 (Existence of a OneRound Optimal Mechanism)
Consider any context such that and are independent. If it is possible to extract revenue from a committed buyer in the context , then it is possible to extract revenue from an uncommitted buyer in the same context using a Pricing Mappings Mechanism.
The proof is presented in Appendix B.
A consequence of Theorem 3.1 is that with independent signals, the fact that the buyer is committed does not help the seller extract more revenue. In this setting it is possible to extract revenue from uncommitted buyers if and only if it is possible to extract revenue from committed buyers.
Pricing Mappings Mechanism. Theorem 3.1 allows us to focus on Pricing Mappings Mechanisms. An alternative way of describing a Pricing Mappings Mechanism is as a fixed menu of contracts . The contract is intended for a buyer of type (in the sense that it would be optimal for such a buyer to choose that contract out of the menu). A buyer choosing the contract would pay and observe one realization of the random variable that is correlated with , taking values in a finite set . We call the elements of signals, since they reveal to the buyer some information about . By buying the contract a buyer of type gets utility
To be more precise, is a random variable that is produced by the seller using and possibly some random bits that are independent of . Without loss of generality we represent by a family for . In order to sample , the seller observes and then samples the value of according to .
Without loss of generality, we can call the favorite contract of a buyer of type . In order for such a set of contracts to be valid we need to make sure that: (1) the protocol is voluntary, i.e., the utility of a buyer of type from taking contract is at least as high as his utility from not participating in the mechanism and acting on his belief given only , and (2) contract is indeed his favorite one, i.e., he would not strictly prefer to misreport his type and buy a contract for some . Property (1) ensures individual rationality (IR) and property (2) ensures incentive compatibility (IC). Formally:
Definition 3.2
A menu of contracts is valid if and only if:
Given a valid menu of contracts, its associated revenue is given by . This definition implicitly assumes that whenever the buyer of type is indifferent between contract and not buying anything, i.e. the IR constraint is tight, then he buys contract . It also assumes that whenever he is indifferent between and , he buys . This assumption is without loss of generality, since given any menu of contracts with revenue , for every it is possible to produce a menu with revenue such that all IR and IC inequalities hold strictly. We defer the formal proof of this fact to Lemma C.3.
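The IR and IC conditions of Definition 3.2 can be checked mechanically once each type's gross value for each contract is known. The sketch below assumes these values are precomputed: `gross[t][k]` is the expected value that type `t` obtains from observing contract `k`'s signal and acting optimally (before paying), `outside[t]` is his value from acting on his own signal alone, and `price[k]` is the price of contract `k`. All of these names are hypothetical. Ties are resolved in favor of the intended contract, matching the convention described above.

```python
from fractions import Fraction as F


def is_valid(gross, outside, price):
    """Check IR and IC for a menu in which contract t is intended for type t."""
    n = len(price)
    for t in range(n):
        own = gross[t][t] - price[t]
        if own < outside[t]:                                   # IR: weakly beat not participating
            return False
        if any(gross[t][k] - price[k] > own for k in range(n)):  # IC: no strictly better contract
            return False
    return True


def menu_revenue(prior, gross, outside, price):
    """Expected revenue sum_t prior[t] * price[t] of a valid menu."""
    if not is_valid(gross, outside, price):
        raise ValueError("menu violates IR or IC")
    return sum(q * p for q, p in zip(prior, price))


# Hypothetical two-type instance.
gross = [[5, 3], [4, 6]]
outside = [2, 1]
prior = [F(1, 2), F(1, 2)]
print(menu_revenue(prior, gross, outside, [3, 5]))   # both IR constraints are tight here
print(is_valid(gross, outside, [3, 6]))              # type 1's IR fails at price 6
```

With prices `[3, 5]` both IR constraints hold with equality and type 1 is indifferent between the two contracts, so this menu sits exactly on the boundary that Lemma C.3 perturbs away from.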
Our goal is, for any given context, to design the valid menu of contracts with largest possible associated revenue. We call it the optimal menu.
Before optimizing the menu, we need a couple of definitions: if is a variable taking values in a space , then for each , the posterior associated with is the distribution such that . We define the value of a buyer of type for posterior as:
which is a piecewise-linear convex function . Usually is defined for , but sometimes it is used for vectors such that . The reader should note, however, that it is a homogeneous function: .
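A minimal sketch of the value function and of why homogeneity is convenient, assuming utilities are given as a list `u_theta` with one row per action, where `u_theta[a][w]` is the payoff of action `a` when the seller's signal takes its `w`-th value (this indexing is our assumption). Because the value is a maximum of linear functions, it is homogeneous for nonnegative scalars, so the expected value of a contract can be computed directly from unnormalized posteriors, i.e. joint probabilities `joint[s][w]`, without dividing by the probability of each signal.

```python
from fractions import Fraction as F


def value(u_theta, pi):
    """Max over actions of the expected payoff under the (possibly unnormalized) vector pi."""
    return max(sum(p * row[w] for w, p in enumerate(pi)) for row in u_theta)


def contract_value(u_theta, joint):
    """Expected value of observing a signal, given joint[s][w] = Pr[S = s, signal = w].

    By homogeneity, sum_s Pr[S = s] * value(posterior_s) == sum_s value(joint[s]).
    """
    return sum(value(u_theta, row) for row in joint)


def contract_value_normalized(u_theta, joint):
    """The same quantity computed the long way, through normalized posteriors."""
    total = F(0)
    for row in joint:
        q = sum(row)
        if q > 0:
            total += q * value(u_theta, [x / q for x in row])
    return total


# Hypothetical instance: two actions ("guess 0", "guess 1"), uniform prior, full disclosure.
u = [[1, 0], [0, 1]]
full_info = [[F(1, 2), F(0)], [F(0), F(1, 2)]]
print(value(u, [F(1, 2), F(1, 2)]))   # 1/2
print(contract_value(u, full_info))    # 1
```

This is the same reason the restricted revenue maximization problem can be written with linear constraints over unnormalized posteriors.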
We next show that we can represent a signal by a distribution over a finite set of posteriors (Observation 3.4) without repetition (Observation 3.3). The proofs of the observations are immediate, but we include them in Appendix B for completeness.
Observation 3.3
Given a menu , if there are such that the posteriors associated with them are equal, consider the menu obtained by substituting by , where: for and otherwise. The obtained menu is also valid and the revenue associated with it is the same as the one of the original menu.
Given that the posteriors associated with the signals are all distinct, we can represent by a distribution over a finite set of posteriors, i.e., a set of posteriors, each of the form , together with a probability for each posterior . The condition under which a distribution over posteriors represents a random variable correlated with is given in the following observation, whose proof is immediate.
Observation 3.4
Let be a random variable correlated with , i.e., given , , is sampled by first sampling and then sampling from . Consider a set and . The pair represents a distribution over posteriors of a random variable correlated with if and only if:
From this point on, we represent each as a function with finite support, satisfying Equation for each .
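Observations 3.3 and 3.4 can be illustrated with a short Python sketch. It assumes a signaling scheme is given as `sigma[w][s]`, the probability of emitting signal `s` when the seller's signal takes its `w`-th value, and that `prior[w]` is the buyer's belief (common knowledge in the independent case); these names are ours. The sketch merges signals that induce the same posterior, checks the standard Bayes-plausibility condition that the posteriors average back to the prior, and rebuilds a scheme from a plausible distribution over posteriors, assuming the prior has full support.

```python
from fractions import Fraction as F


def posteriors(prior, sigma):
    """Map each induced posterior (as a tuple) to its probability, merging duplicates."""
    dist = {}
    for s in range(len(sigma[0])):
        q = sum(prior[w] * sigma[w][s] for w in range(len(prior)))   # Pr[S = s]
        if q == 0:
            continue                                                # signal never sent
        post = tuple(prior[w] * sigma[w][s] / q for w in range(len(prior)))
        dist[post] = dist.get(post, 0) + q                          # equal posteriors merge
    return dist


def is_bayes_plausible(prior, dist):
    """Posteriors, weighted by their probabilities, must average back to the prior."""
    return all(sum(q * post[w] for post, q in dist.items()) == prior[w]
               for w in range(len(prior)))


def scheme_from_posteriors(prior, dist):
    """Converse direction: sigma[w][j] = q_j * post_j[w] / prior[w] (prior has full support)."""
    items = list(dist.items())
    return [[q * post[w] / prior[w] for post, q in items] for w in range(len(prior))]


# Hypothetical scheme: signals 0 and 1 both reveal that the first value occurred.
prior = [F(1, 2), F(1, 2)]
sigma = [[F(1, 2), F(1, 2), F(0)],
         [F(0), F(0), F(1)]]
dist = posteriors(prior, sigma)
print(dist)
print(is_bayes_plausible(prior, dist))
```

Each row of the rebuilt scheme sums to one exactly when the distribution over posteriors is Bayes-plausible, which is the content of the "if" direction.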
For any finite set of posteriors , we can formulate a restricted revenue maximization problem — for mechanisms that offer a menu of contracts with posteriors restricted to belong to — as below with variables for each , and for each . Recall that is the prior on , i.e., .
The constraints in correspond to the characterization of valid contracts in Definition 3.2 and the feasibility of the representation of contracts as distributions over posteriors in Observation 3.4.
To work with a linear program of finite size, we have restricted to belong to a finite set . It is conceptually easier to think of as ranging over the entire set , in which case would represent the revenue maximization problem in full generality. The following lemma, whose proof is in the appendix, shows that the restriction to a finite set of posteriors is without loss of generality. There is a finite set that can be precomputed from knowledge of the function alone, such that solving with is guaranteed to produce an optimal menu of contracts.
Lemma 3.5 (Interesting Posteriors)
Given , there is a finite set such that for all , the maximum revenue that can be extracted by any protocol can also be extracted by a protocol that is limited to use posteriors in . Moreover, all the elements can be represented with polynomially many bits.
By passing to the dual of , we get an LP with variables. Below, we give a separation oracle for the dual showing that it can be solved in polynomial time. The variables of the dual are and :
Separation oracle: The second family of constraints has size , so separating it is trivial. To separate the first family, we rewrite its constraints. Notice that the value function is the maximum of linear functions. So, we can replace each constraint of the first family by the following constraints:
for all , and . Now, for fixed we want to check if this constraint is satisfied by all or find one for which the constraint is violated.
Relaxing the requirement to , this is equivalent to solving the following convex programming problem:
Lemma 3.6
The convex programming problem above can be solved exactly in polynomial time. Given an optimal solution , there must exist another optimal solution that belongs to , and we can find such a in polynomial time.
The proof (given in the appendix) assumes that the set is finite and polynomially bounded. But even if has exponential size, if one is able to solve the problem for any posterior , we can still solve the problem.
Comparison with Sealed Envelope. The sealed envelope mechanism presented in Section 2.1 can extract at least revenue of the optimal mechanism, by the following simple observation: if there is a voluntary protocol that extracts revenue from a certain context , then there is at least one for which , where is the maximum surplus that can be extracted from a buyer of type . By setting the price of the envelope to for some tiny , the mechanism guarantees revenue at least . In Example B.1 we show this bound is tight by presenting a context where the sealed envelope mechanism cannot extract more than of the optimal mechanism.
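The averaging argument above can be checked numerically. The sketch below assumes the surplus of each type (its value for full information minus its value without it) is already known, and tries each surplus value as the envelope price, letting indifferent types buy; this is the limit of the "price minus a tiny amount" construction. It then checks that the best envelope revenue is at least the expected full surplus divided by the number of types, which in turn is at least the revenue of any voluntary protocol divided by the number of types. The function name and the numbers are hypothetical.

```python
from fractions import Fraction as F


def best_sealed_envelope(prior, surplus):
    """Best single price for full information; types with surplus >= price buy."""
    best_price, best_rev = None, F(0)
    for p in sorted(set(surplus)):
        rev = p * sum(q for q, s in zip(prior, surplus) if s >= p)
        if rev > best_rev:
            best_price, best_rev = p, rev
    return best_price, best_rev


prior = [F(1, 3)] * 3
surplus = [3, 2, 1]
price, rev = best_sealed_envelope(prior, surplus)
full = sum(q * s for q, s in zip(prior, surplus))   # upper bound on any protocol's revenue
print(price, rev, full)
assert rev >= full / len(prior)
```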
Protocols with Small Trees. We just showed how to compute the revenue-optimal protocol in polynomial time when and are independent. We know that this protocol has polynomial size, where its size is measured by the number of nodes in the tree representing the protocol. Here, we make this more explicit and show that there is a protocol of size . We do so by bounding the sum of the support sizes of the random variables in the menu of contracts. This bounds the number of leaves of the One-round Revelation Mechanism (Theorem 3.1). As the number of nodes in the tree representing this protocol is at most 3 times the number of leaves (twice the number of leaves plus one node for the buyer and seller nodes), the bound on the size of the tree follows from the bound on the number of leaves. The proof of the next theorem is in Appendix B.
Theorem 3.7
Let and . Denote the support of a vector by . The program has a solution where for all , and . Moreover, there are settings in which support of size quadratic in is necessary, even when .
The fact that the quadratic lower bound holds even for (Example B.2) is somewhat counterintuitive: even when the information being sold is a single bit, there are contexts in which revenue maximization requires using signals that consist of bits.
4 Correlated signals with committed buyers
In Section 3 we considered independent signals and observed that in that case the seller does not care whether the buyer is committed. In this section we consider correlated signals and committed buyers; the case of correlated signals and uncommitted buyers is discussed in Section 5. Throughout this section we assume that the buyer is committed.
4.1 Pricing Outcomes Mechanism
In the previous section we showed that if are independent, then the optimal protocol had the form of offering a menu of contracts with a fixed price for each contract. In this section we show that this is not sufficient to optimize the revenue whenever are correlated. In order to optimize revenue, we need to add a twist: we still offer a menu of options, each option having a random variable correlated with and taking values in . Instead of a fixed price, however, we charge a specific price for each outcome of the signaling scheme. We continue to refer to the options on the menu as contracts, although this means that the word contracts has a slightly different meaning in this section than in the preceding one (as it is no longer the case that the buyer pays before observing any signal).
Why does this construction help in designing mechanisms to optimize revenue? Suppose a seller designs a variable taking values in . Consider for each such that the seller produces by sampling according to . If and were independent, the seller would always be choosing the same distribution from the buyer’s perspective, which would be . However, since and are correlated, different buyer types perceive different distributions over : a buyer of type perceives . So, if we condition the prices on the outcomes , two different buyer types see different expected prices for the same contract. This increases our power of price discrimination. For the case of committed buyers we are able to show the existence of an optimal One-Round Revelation Mechanism.
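To see this price-discrimination effect numerically, the sketch below computes the distribution over signals that each buyer type perceives and the resulting expected outcome-contingent payment. It assumes the joint distribution is given as `mu[t][w]` (rows indexed by buyer types, columns by seller signals), a scheme `sigma[w][s]`, and a payment `pay[s]` per outcome. The names and numbers are hypothetical, not taken from the paper.

```python
from fractions import Fraction as F


def perceived_signal_dist(mu, sigma):
    """Row t: Pr[S = s | type t] = sum_w Pr[w | t] * sigma[w][s]."""
    n_signals = len(sigma[0])
    dists = []
    for row in mu:
        p_type = sum(row)
        belief = [m / p_type for m in row]          # the type's conditional belief
        dists.append([sum(b * sigma[w][s] for w, b in enumerate(belief))
                      for s in range(n_signals)])
    return dists


def expected_payment(mu, sigma, pay):
    """Expected outcome-contingent payment, as perceived by each buyer type."""
    return [sum(p * pay[s] for s, p in enumerate(d)) for d in perceived_signal_dist(mu, sigma)]


mu = [[F(3, 8), F(1, 8)],     # type 0 leans towards the first value
      [F(1, 8), F(3, 8)]]     # type 1 leans towards the second value
full_disclosure = [[1, 0], [0, 1]]
print(expected_payment(mu, full_disclosure, [2, -1]))   # same contract, different expected cost
```

With an uninformative scheme (a single signal) every type perceives the same distribution and hence the same expected payment, which is the independent-case intuition.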
Theorem 4.1 (Existence of a One-Round Optimal Mechanism)
For any context , if it is possible to extract revenue from a committed buyer in this context, then there is a Pricing Outcomes Mechanism that does so.
The proof, which is given in Appendix C, is essentially the same as the proof of Theorem 3.1, except for the last step. Previously this step relied on the independence of and ; here, we instead rely on the fact that seller nodes are situated above transfer nodes in our protocol, which eliminates the need to estimate expected transfers over a random execution of the original protocol, and instead lets us match transfers pointwise.
It should be noted that the assumption that buyers are committed to following the mechanism until the end is crucial. In fact, in any protocol containing a transfer node in which the buyer needs to pay the seller and whose child is a leaf, the optimal uncommitted strategy would be to defect at that transfer node. In other words, uncommitted buyers could acquire the information and leave without paying. We mention one way to solve this problem: before the mechanism starts, ask the buyer for a large sum of money, run the mechanism, and then give the large sum of money back to the buyer. This guarantees that the buyer follows the mechanism until the end, so as not to lose his initial deposit. It adds an extra level to the mechanism: one transfer node at the beginning to charge this large sum of money. The rebate at the end can simply be merged with the last transfer.
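A minimal sketch of why the deposit works, assuming a single outcome-contingent payment `t_s` (positive means the buyer pays) that is requested after the buyer has already seen the signal, and a deposit collected up front and netted against that final transfer. The function name and parameters are hypothetical. An uncommitted buyer defects whenever the final transfer would cost him money; with a deposit at least as large as every requested payment, the final transfer is a (weak) refund, so he completes the protocol and his net payment equals `t_s`.

```python
def net_payment_uncommitted(t_s, deposit=0):
    """Net amount an uncommitted buyer ends up paying after seeing a signal priced at t_s.

    The deposit has already been paid; the final transfer (buyer -> seller) is t_s - deposit.
    Quitting before that transfer keeps the information but forfeits any refund.
    """
    final_transfer = t_s - deposit
    if final_transfer > 0:
        return deposit        # defects: pays nothing more, keeps what he already paid
    return t_s                # completes: deposit + final_transfer == t_s


print(net_payment_uncommitted(5))              # no deposit: the positive payment is skipped
print(net_payment_uncommitted(5, deposit=10))  # with a deposit the payment is collected
```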
Pricing Outcomes Mechanism. We will describe a Pricing Outcomes Mechanism using the following notation. The seller designs a menu of contracts solely based on the context. The menu is a collection where is a random variable correlated with and taking values in a finite set , just like in the independent case. The payment function, however, is a function and it allows for both positive and negative payments. The seller outputs , which is a signal and a payment request of . If , the seller transfers to the buyer.
In the following, we discuss how to find the menu maximizing revenue using a convex program. The derivation of this convex program closely parallels the derivation of the corresponding convex program in the independent case. We give the full details in Appendix C, and here we limit ourselves to discussing the two most salient differences between the derivation of the convex program in the independent and correlated cases.

The posterior vector after receiving a given signal depends upon the buyer’s type. To compare posteriors across different types, we adopt a common “frame of reference” — that of an outside observer who observes the signal sent by the seller but does not observe — and we translate from the type reference frame to the outside observer’s reference frame using a matrix that expresses Bayes’ rule.

Since payments are now associated with signals, the obvious way of expressing the expected revenue is as a sum of products, where each term is the probability of sending a particular signal, , multiplied by the amount charged in the event of sending it, . Thus, our program has a quadratic objective function if we treat both the probability and the transfers as primal variables. (This issue does not arise in Pricing Mappings, since a buyer of type pays the same amount regardless of what signal is sent, hence the variables do not appear in the objective function.) To make the objective function linear when pricing outputs, we define new variables . Fortunately, this change of variables makes the constraints linear as well.
Full Surplus Extraction. Correlation can be very valuable to the seller. In fact, if the distribution exhibits sufficiently complex correlation, the seller might be able to extract full surplus from the buyers using a Pricing Outcomes Mechanism. Given a context we define the full surplus as the expected gain the buyer would get by learning the value of . In other words, the full surplus is , where . Clearly no mechanism can extract more than the full surplus, and extracting a fraction of it is trivial, even using a sealed envelope mechanism, as was observed in Section 3. We now show that if is sufficiently correlated, then we can extract the full surplus. Our result leverages the ideas developed by Cremer and McLean [9, 10] in their work on auctions with correlated bidders, although the setting in which we apply these ideas is different from theirs.
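The full surplus can be computed directly from its definition. The sketch below assumes the joint distribution is given as `mu[t][w]` and the utilities as `u[t][a][w]` (type, action, seller signal); these names and the numbers in the usage lines are hypothetical. For each type it compares the value of choosing the best action separately for every realization of the seller's signal against the value of committing to one action under the type's conditional belief.

```python
from fractions import Fraction as F


def surplus(u_theta, belief):
    """Value of learning the seller's signal exactly, for one type with the given belief."""
    # Informed: choose the best action separately for each realization of the seller's signal.
    informed = sum(b * max(row[w] for row in u_theta) for w, b in enumerate(belief))
    # Uninformed: choose one action against the whole belief.
    uninformed = max(sum(b * row[w] for w, b in enumerate(belief)) for row in u_theta)
    return informed - uninformed


def full_surplus(mu, u):
    """Expected surplus over types; mu[t][w] is the joint probability of (type t, signal w)."""
    total = F(0)
    for t, row in enumerate(mu):
        p_type = sum(row)
        total += p_type * surplus(u[t], [m / p_type for m in row])
    return total


guess = [[1, 0], [0, 1]]                       # payoff 1 for guessing the seller's signal
mu = [[F(3, 8), F(1, 8)], [F(1, 8), F(3, 8)]]
print(full_surplus(mu, [guess, guess]))        # 1/4
```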
For a joint distribution over we define as the rank of the matrix defined by . For example, if and are independent, then .
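Because the condition is stated in terms of an exact rank, a floating-point rank routine with a tolerance can misjudge nearly independent distributions. The sketch below instead runs Gaussian elimination over Python's `fractions.Fraction`; the function name and the example matrices are our own.

```python
from fractions import Fraction as F


def exact_rank(matrix):
    """Rank of a rational matrix via Gaussian elimination with exact arithmetic."""
    rows = [[F(x) for x in row] for row in matrix]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for c in range(n_cols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                                   # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):           # eliminate entries below the pivot
            factor = rows[i][c] / rows[rank][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


independent = [[F(1, 6), F(1, 3)], [F(1, 6), F(1, 3)]]   # product of two marginals
correlated = [[F(3, 8), F(1, 8)], [F(1, 8), F(3, 8)]]
print(exact_rank(independent), exact_rank(correlated))    # 1 2
```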
Theorem 4.2
If then the optimal Pricing Outcomes Mechanism extracts the full surplus. Moreover, this can be done with a single contract.
Proof.
We define one single contract in the following way. Calculate such that:
Since , this system is guaranteed to have a feasible solution. Now, offer the following contract to all the buyers: the seller reveals the value of and requests a payment of . By the definition of , each buyer is indifferent between buying this contract or not buying anything, so the mechanism is voluntary.
Notice that one can, in the manner of Lemma C.3, offer the above contract with price for each outcome, getting revenue arbitrarily close to the full surplus and making the players strictly prefer to buy the contract.
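The proof is constructive: find payments `t[w]` such that, for every type, the expected payment under that type's conditional belief equals its surplus, then reveal the seller's signal and charge `t[w]`. The sketch below solves this linear system exactly with Gaussian elimination over `fractions.Fraction`, setting free variables to zero when there are more seller signals than buyer types. It assumes the system is consistent (full row rank of the belief matrix guarantees this) and raises otherwise. The numbers are hypothetical: two types who both want to guess the seller's signal, the second valuing a correct guess twice as much.

```python
from fractions import Fraction as F


def solve_exact(A, b):
    """One solution of A x = b (free variables set to 0); raises ValueError if inconsistent."""
    m, n = len(A), len(A[0])
    M = [[F(v) for v in row] + [F(bi)] for row, bi in zip(A, b)]   # augmented matrix
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [v / M[r][c] for v in M[r]]                 # normalize the pivot row
        for i in range(m):                                 # clear the column elsewhere
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * bb for a, bb in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    if any(all(v == 0 for v in row[:n]) and row[n] != 0 for row in M):
        raise ValueError("inconsistent system")
    x = [F(0)] * n
    for i, c in enumerate(pivots):
        x[c] = M[i][n]
    return x


beliefs = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]   # Pr[w | type], one row per type
surpluses = [F(1, 4), F(1, 2)]
t = solve_exact(beliefs, surpluses)                   # payment charged when w is revealed
print(t)
```

Each type's expected payment under its own belief then equals its surplus, so every type is exactly indifferent, as in the proof.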
At this point it is instructive to consider a concrete example for full surplus extraction.
Example 4.3
Imagine a box with a lock on it and two keys, labeled and , exactly one of which opens the box. The buyer can choose one key and try it. If he opens the box, he gets the object inside. Let the type of the buyer encode his value for the object and some signal that gives him a hint about which key might be the right one. The seller knows exactly which key is correct; let this be . How should the seller sell this information to the buyer?
Consider , and the reward function as , with . The joint distribution is .
Before participating in the mechanism a buyer of type has interim belief on , so his best action is to pick key , getting expected reward . If he is able to pick key whenever and key whenever , then he always gets a reward of , so his value for the information is . Similarly . In order to design a contract that extracts full surplus, find such that:
Solving this system, we get: . This means that if the seller reveals signal , the buyer needs to pay , and if the seller reveals , the buyer receives from the seller. Both buyer types see the full information contract being offered, but because of the correlation of and , they perceive its expected cost differently. A buyer of type perceives the expected cost to be . This feature of correlation gives the seller great power to price-discriminate.
4.2 Continuity and Approximation
One might be tempted to conclude from Theorem 4.2 that for any distribution such that , one is able to extract full surplus, since all matrices can be approximated arbitrarily closely by matrices of full rank. The flaw in this argument is obvious. If one sees as a matrix, and as a vector with the surplus in component , the payment vector in Theorem 4.2 can be found by solving the linear system . If is a perturbation of, say, a rank-one matrix (corresponding to and being independent), then the linear system is very ill-conditioned, and therefore the solution has a very high norm. This causes to diverge as becomes closer to being independent. To illustrate this point, let us revisit Example 4.3.
Example 4.4
Consider the same reward function as in Example 4.3 but with a different probability distribution: , , , with , and distribution .
The surplus of each buyer is given by . Proceeding as described in Section 3, one finds that the optimal mechanism is to offer one single contract that outputs exactly and costs . Clearly both types buy this contract and the expected revenue is . Notice however that the expected surplus is and Theorem 4.2 guarantees that for a slightly perturbed joint distribution one can extract it entirely. For example, consider Applying the proof of Theorem 4.2 we get the following mechanism extracting revenue : offer the contract that outputs the full information and if the outcome is , a player pays where .
This example highlights two problems with the optimal mechanism. The more obvious one is that it somehow abuses risk neutrality: the optimal mechanism produces very large payments, which are balanced by large rebates. This situation is clearly not desirable in practice. The second problem is that the revenue that can be extracted from a context might change abruptly when the context changes slightly.
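The blow-up of the payments can be reproduced with a two-type, two-signal sketch. We assume, hypothetically, that the buyers' conditional beliefs are (1/2 + eps, 1/2 - eps) and (1/2 - eps, 1/2 + eps), so that eps = 0 is the independent case, and we hold the two surpluses fixed at 1/4 and 1/2 to isolate the effect of conditioning (in an actual context the surpluses would also move with eps). Solving the 2x2 system by Cramer's rule gives payments of magnitude roughly 1/(16 eps), of opposite signs.

```python
from fractions import Fraction as F


def payments_2x2(eps, s0, s1):
    """Solve [[a, b], [b, a]] @ [t0, t1] = [s0, s1] with a = 1/2 + eps, b = 1/2 - eps."""
    a, b = F(1, 2) + eps, F(1, 2) - eps
    det = a * a - b * b          # equals 2 * eps, so it vanishes in the independent case
    if det == 0:
        raise ZeroDivisionError("singular system: beliefs are independent")
    return (s0 * a - s1 * b) / det, (s1 * a - s0 * b) / det


for eps in (F(1, 10), F(1, 100), F(1, 1000)):
    t0, t1 = payments_2x2(eps, F(1, 4), F(1, 2))
    print(eps, float(t0), float(t1))   # a large charge balanced by a large rebate
```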
It turns out that these discontinuities in the optimal-revenue function are only unidirectional: as one varies the context, the revenue can abruptly decrease but it cannot abruptly increase. Furthermore, for certain restricted classes of mechanisms, the optimal revenue depends continuously on the context. In particular, this holds for the first three members of the following sequence of progressively more general types of mechanisms.
To formalize these continuity assertions, let us fix and , and regard a context as a point in the topological space equipped with its standard topology. The revenue of the optimal mechanism for committed buyers (or, equivalently, the optimal Pricing Outcomes Mechanism) will be denoted by . Similarly, we use respectively to denote the revenue of the optimal Sealed Envelope Mechanism, Pricing Mappings Mechanism, or Pricing Outcomes Mechanism with No Positive Transfers. The following theorem formalizes the continuity assertions claimed above. Recall that a function is lower semicontinuous if for all converging sequences , .
Theorem 4.5
The function is lower semicontinuous. The functions are continuous. Let be the set of contexts such that for all (henceforth called non-degenerate contexts), the function is continuous.
Appendix D.1 breaks the theorem down into pieces and proves each piece. The fact that is lower semicontinuous is proven by appealing to Lemma C.3, which shows that for any context , the optimal revenue can be approximated arbitrarily closely by mechanisms in which all of the IR and IC constraints are slack. Such a mechanism remains valid in a neighborhood of , and its revenue varies continuously in this neighborhood; hence the optimal revenue throughout a neighborhood of remains nearly as great as the optimal revenue at . As a side effect of the method of proof, we obtain a robustness result: if the seller believes that the context is and designs a protocol extracting , but the real context may actually be slightly different, the seller can design a slightly different protocol extracting revenue at least from any context that is sufficiently close to . We note that a proof in the same spirit for the Cremer and McLean setting can be found in Robert [19]. The heart of our proof is the application of Lemma C.3, and in this specific ingredient our proof differs from that of Robert.
The same argument establishes lower semicontinuity of the functions . Their upper semicontinuity follows from a compactness argument. For each of these classes of mechanisms, the optimal revenue can be found by solving a variant of that reflects the additional constraints on the mechanism. Each such variant LP has an unbounded feasible region, and the first step of the proof is to identify a compact (i.e. closed and bounded) subset of the feasible region that is guaranteed to contain the optimal LP solution, at least for all contexts in a neighborhood of a given . Taking any sequence of contexts converging to and passing to a subsequence if necessary, we may assume that their optimal mechanisms converge to a limit belonging to the feasible region of the LP. This mechanism then supplies a lower bound on the optimal revenue in context that suffices to prove upper semicontinuity.
Approximating Revenue. We have seen that the Sealed Envelope Mechanism achieves at least a approximation to the optimal revenue, where , and that this bound cannot be improved by more than a constant factor in the worst case. The other two classes of mechanisms listed above (Pricing Mappings, and Pricing Outcomes without positive transfers) are substantially simpler and more natural than Pricing Outcomes in full generality, so it would be desirable to approximate the optimal revenue using one of these simpler classes of mechanisms. Unfortunately, in the worst case, the approximation achieved is rather poor: we prove in Appendix D.2 that there exist contexts such that . The proof is an application of Theorem 4.5. We carefully construct a context with independent for which the full surplus exceeds by a factor of . For such a context Pricing Mappings is optimal, i.e. . Now if we slightly perturb to make it into a full-rank matrix, Theorem 4.2 tells us that jumps up by a factor of to match the full surplus, while Theorem 4.5 ensures that and can only change by a tiny amount.
Finally there remains the question of whether achieves a good approximation to in the worst case. As it happens, once again the worst-case ratio between these two quantities is , as evidenced by Example D.7 in the Appendix. In this example the seller gets information about the buyer’s surplus from a side channel. In a Pricing Outcomes Mechanism (even without positive transfers), the seller is capable of leveraging this information, while in a Pricing Mappings Mechanism, she is not.
5 Correlated Signals with Uncommitted Buyers
In this section we examine mechanisms for uncommitted buyers when signals are correlated, and we formulate an intriguing open question related to a surprising failure of the one-round revelation property. First, we review what is known about uncommitted buyers from previous sections. Theorem 3.1 says that if and are independent, the revenue-optimal mechanism is a Pricing Mappings Mechanism. The optimal strategy for committed and uncommitted buyers is the same in such a mechanism. However, when and are correlated, the revenue-optimal mechanism for committed buyers is a Pricing Outcomes Mechanism (Theorem 4.1). In such a mechanism, the buyer declares his type, the seller sends a signal, and the buyer pays an amount of money that depends on the signal sent by the seller. Such a mechanism clearly does not work for uncommitted buyers, who can defect after getting the signal but before the payment.
In Section 4.1, we mentioned one way to get around this problem: before executing the Pricing Outcomes Mechanism the seller can require the buyer to deposit a large amount of money; the mechanism then executes, and after it completes the seller refunds the deposit. This mechanism has clear practical drawbacks: a deposit-refund scheme increases the cost of participation, as it requires the buyer to always be able to make large payments, even in cases in which the net payment at the end of the protocol is small, or no payments are made in the execution. The latter case is particularly problematic, as it is usually costly to establish a payment relationship (e.g. the buyer needs to spend time giving his credit card information), so always imposing payments might deter some buyers.
A protocol has no positive transfers if at any transfer node money always goes from the buyer to the seller, i.e. for every transfer node . Once we exclude positive transfers the problem of designing optimal mechanisms for uncommitted buyers becomes quite challenging. We formulate it as the following open problem:
Open Problem 5.1
Characterize the protocols that extract maximum revenue from uncommitted buyers subject to no positive transfers. In particular, is it possible to design an algorithm that decides, given any context and target revenue , whether there exists a protocol with no positive transfers that extracts revenue from uncommitted buyers?
A natural first attempt to address this problem would be to prove, in the manner of Theorem 3.1 and Theorem 4.1, that in this setting one can extract optimal revenue using a One-Round Revelation Mechanism. We show that this approach fails.
Theorem 5.2 (Failure of the One-round Revelation Property)
There exists a context with correlated and for which some generic protocol with no positive transfers extracts strictly more revenue from uncommitted buyers than any One-round Revelation Mechanism with no positive transfers.
In order to prove this theorem, we observe that if it is possible to extract revenue from an uncommitted buyer using a One-round Revelation Mechanism with no positive transfers, then it is possible to extract the same revenue using a Pricing Mappings Mechanism. This follows easily from the fact that if a transfer node is the last node before the leaf in a path of the protocol tree, an uncommitted buyer will always defect before this node if .
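The defection argument is backward induction on the protocol tree. The sketch below evaluates a tree from one buyer type's perspective under simplifying assumptions of our own: seller nodes carry the buyer-perceived probabilities of their children, transfer nodes carry an amount (positive means the buyer pays), and each node carries a precomputed `stop_value`, the buyer's payoff if he quits there and acts on what he has learned so far. The `GameNode` class and all numbers are hypothetical. At a transfer node whose child is a leaf, quitting loses no information, so any positive amount is skipped.

```python
from dataclasses import dataclass, field
from fractions import Fraction as F
from typing import List


@dataclass
class GameNode:
    """Hypothetical node: kind is 'seller', 'transfer' or 'leaf' (one buyer type's view)."""
    kind: str
    children: List["GameNode"] = field(default_factory=list)
    probs: List[F] = field(default_factory=list)   # seller nodes: perceived child probabilities
    amount: F = F(0)                              # transfer nodes: buyer -> seller payment
    stop_value: F = F(0)                          # payoff from quitting here (final at a leaf)


def uncommitted(node):
    """Backward induction for an uncommitted buyer: returns (buyer utility, expected revenue)."""
    if node.kind == "leaf":
        return node.stop_value, F(0)
    if node.kind == "seller":
        results = [uncommitted(c) for c in node.children]
        util = sum(p * u for p, (u, _) in zip(node.probs, results))
        rev = sum(p * r for p, (_, r) in zip(node.probs, results))
        return util, rev
    # Transfer node: continue (and pay) only if that is weakly better than quitting now.
    util, rev = uncommitted(node.children[0])
    if util - node.amount >= node.stop_value:
        return util - node.amount, rev + node.amount
    return node.stop_value, F(0)


def leaf(v):
    return GameNode("leaf", stop_value=F(v))


# Signal first, then an outcome-contingent transfer: the positive payment is never made.
po = GameNode("seller", probs=[F(1, 2), F(1, 2)], children=[
    GameNode("transfer", amount=F(3), stop_value=F(10), children=[leaf(10)]),
    GameNode("transfer", amount=F(-1), stop_value=F(10), children=[leaf(10)]),
])
# Pay first, then receive the signal (a Pricing Mappings contract).
pm = GameNode("transfer", amount=F(3), stop_value=F(6), children=[
    GameNode("seller", probs=[F(1, 2), F(1, 2)], children=[leaf(10), leaf(10)]),
])
print(uncommitted(po))
print(uncommitted(pm))
```

In the first tree the seller ends up paying the rebate but never collects the charge, while in the second the buyer pays because the information has not yet been revealed when the transfer is requested.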
Then in Example 5.3 we present a context for which an interactive protocol with no positive transfers extracts strictly more revenue than any Pricing Mappings Mechanism.
Example 5.3
We present a context for which a mechanism where the seller interacts with the buyer twice (producing a protocol tree of height ) extracts strictly more revenue than any direct revelation mechanism.
Let , and the distribution . Let and define the utility such that (for ) and (for ). As usual, since we represent the posterior probability by one real number , the probability of the event . The first part of Figure 2 depicts the utility as a function of the posterior , the utility is represented by functions . In this particular case .
It is simple to see that the optimal Pricing Mappings Mechanism offers a single contract, pricing the full information (value of ) at , and getting revenue of .
We next present a protocol with no positive transfers that extracts strictly more revenue than the optimal Pricing Mappings Mechanism from uncommitted buyers. The protocol is represented by a tree of height depicted in the second part of Figure 2. It consists of two transfer nodes with amounts and . For the seller nodes, the transition probabilities are as follows:

node : the seller outputs whenever and outputs whenever .

node : the seller outputs whenever and outputs whenever .

node : the seller moves either to node or to node according to the following probabilities: , and clearly .
Now, we claim that the optimal strategy for an uncommitted buyer is to play left (to node ) whenever and play right (to node ) whenever , and then follow the protocol (make the transfers when asked) without defecting. We prove this in Claim E.1 in Appendix E. Given these strategies we calculate the expected revenue of the protocol, which is:
We have shown above that generic protocols can extract strictly more revenue than One-round Revelation Mechanisms when buyers are uncommitted and no positive transfers are allowed. One might wonder whether such interactive mechanisms can extract as much revenue from uncommitted buyers as can be extracted from committed buyers. We show that the answer is no: for the setting of Example 5.3 there is a gap between the two.
Theorem 5.4
There exists a context with correlated and for which the optimal revenue that can be extracted from uncommitted buyers using a protocol with no positive transfers is strictly less than the optimal revenue that can be extracted from committed buyers.
We prove the theorem in Appendix E. For context , the optimal revenue that can be extracted from committed buyers using protocols that have no positive transfers is exactly the revenue that can be extracted by a Pricing Outcomes with No Positive Transfers Mechanism. To prove the claim we show in Appendix E that for the setting of Example 5.3, for some it is impossible to extract revenue of from uncommitted buyers using protocols that have no positive transfers.
The intuition behind the proof of Theorem 5.4 is the following: in this context, it is possible to extract the full surplus from committed buyers. In order to extract close to this much revenue from uncommitted buyers, the mechanism must offer an option that results in some posterior very close to full information. We can then show that since a buyer of is paying at most his entire surplus, there is a deviation for a buyer of type that guarantees him almost full information for a price considerably below his surplus.
6 Open Problems
We believe the design of mechanisms for selling information is an area full of exciting possibilities. In particular, there might be potential connections with information theory and cryptography, that could be discovered and formalized. In this section we describe some interesting open directions, listing them from the more precise to the more vague problems:

Mechanisms for Uncommitted Buyers: The main open problem left by our paper is the design of revenue-optimal protocols for uncommitted buyers when no positive transfers are allowed (Open Problem 5.1). Theorem 5.2 reveals that multiple rounds of partial information disclosure (interleaved by payment to the seller) are sometimes necessary to achieve optimal revenue if the buyer is allowed to abort his interaction with the seller prematurely. Solving this open problem would probably require the development of new tools for directly bounding the revenue of interactive mechanisms, in sharp contrast to most optimal mechanisms in the literature, which satisfy some version of the One-round Revelation Property.
Tools from dynamic mechanism design [4, 17] and from cheap talk [5] may be useful for this question, since both involve repeated interactions between parties before an outcome is implemented. Nevertheless, our setting differs from both in fundamental ways. The reason interaction is necessary here is different from that in dynamic mechanism design: in our problem, all information is revealed by nature at the beginning of time, and no exogenous signals arrive during the interaction between seller and buyer. This brings our model closer to cheap talk, yet there is a substantial difference: in our setting talking is not “cheap”, since payments are required for the conversation to continue.

Continuous Type Spaces and Structured Contexts: Our results assume a finite discrete type space and use tools from linear and convex programming to design optimal mechanisms. Is it possible to obtain a general theory of “Selling Information” where the signals from nature come from arbitrary spaces? Eső and Szentes [11] are able to deal with real-valued signals, but they severely restrict the utility functions of the agents.
Is there a natural yet less restrictive condition on the context that makes the problem tractable for more general spaces? Can one find a general condition that is still sufficient for the optimal mechanism to have an explicit and natural representation (similar to Myerson’s mechanism), rather than being the solution to some mathematical program?

Multiple buyers: In our work, we consider only two agents: a seller and a buyer. In online advertising, there are usually many sellers (many information-providing agencies) and many buyers (advertisers). These buyers do not simply solve a decision problem once they acquire the information; they play a game among each other.
A natural next step is to consider a variant of our model with a single seller but multiple buyers, who play a game after acquiring information. One must be careful when defining such a problem: we need the game to have a unique equilibrium or, for each possible outcome of the information-selling phase, the Bayesian game played in the second phase to have a focal equilibrium on which we can concentrate when reasoning about the first phase. Along these lines are the work of Eső and Szentes [12] and the recent paper by Kempe, Tardos and Syrgkanis [20].

Coupling Goods and Information: As mentioned earlier, there are many situations where goods and information are coupled together (for example, the restaurant example in the introduction). What is the correct model in which to analyze such situations?

Dynamic selling of information: We analyzed a single interaction between the buyer and the seller. Alternatively, one could consider an ongoing (possibly interactive) relationship between the two parties, e.g., when the buyer is interested in information regarding multiple impressions. Can the seller benefit from this and extract strictly more revenue from the whole process than she would by running the optimal auction for each query individually? It is known that for traditional goods, selling bundles can generate strictly more revenue than selling goods individually. What form do the analogous results take for information?
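The bundling claim for traditional goods can be checked on a standard toy instance. The sketch below assumes (this instance is not from the paper) two goods whose values are independent and uniform on {1, 2}, and compares the best posted prices for separate and bundled sales; the names `best_separate_revenue` and `best_bundle_revenue` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy instance (not from the paper): two goods, each valued
# independently and uniformly in {1, 2}; a buyer buys iff value >= price.
VALUES = [1, 2]
PROB = Fraction(1, 2)  # probability of each value for one good


def best_separate_revenue():
    """Best total revenue from posting a separate price for each good.

    For a discrete value distribution an optimal posted price can be taken
    from the support, so only the values themselves are tried."""
    per_good = max(p * sum(PROB for v in VALUES if v >= p) for p in VALUES)
    return 2 * per_good


def best_bundle_revenue():
    """Best revenue from posting a single price for the two-good bundle."""
    bundle_values = [v1 + v2 for v1, v2 in product(VALUES, repeat=2)]
    prob = PROB * PROB  # each value profile is equally likely
    return max(p * sum(prob for v in bundle_values if v >= p)
               for p in sorted(set(bundle_values)))
```

With these numbers, separate sales earn at most 2 in total, while a bundle priced at 3 sells with probability 3/4 and earns 9/4.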

Computationally bounded agents: We assumed that the seller sends the buyer messages drawn from a certain distribution and that the buyer uses Bayes’ rule to learn about the state of the world from those messages. We impose no computational constraints on the agents. If we assumed computationally constrained buyers, the seller could potentially take advantage of cryptographic primitives, for example sending an encrypted piece of information to the buyer and later selling the decryption key. How would such capabilities affect the results? Would the seller be able to exploit them profitably?
Appendix A Proofs omitted from Section 2
Proof of Theorem 2.6: The idea of the proof is simple: we add at the root a buyer node where the buyer is asked to report his type. Each branch contains a copy of the original protocol in which the seller simulates the behavior of the buyer.
Formally, we fix a context and consider a protocol represented by a tree, with distributions for each seller node and transfers at each transfer node. Now, consider a set of moves for the buyer, for each buyer and transfer node. We can assume that at any buyer node the buyer's strategy defects with zero probability, since any other strategy is weakly dominated by a strategy in which the buyer does not defect at the buyer node and instead defects at the first transfer node encountered. Let represent this protocol.
Next, for each type, define the protocol given by the same tree in which every buyer node is replaced by a seller node where the seller moves according to that type's strategy. The resulting tree consists only of transfer and seller nodes.
Finally, construct a new protocol with a buyer node at the root and one outgoing edge per type. Attach to each type's branch a copy of the corresponding protocol defined above. It is easy to see that an optimal committed (uncommitted) strategy is to report the true type and then follow the protocol (defecting at the nodes of the copy corresponding to the nodes of the original protocol where he defected). If any deviation were profitable in the new protocol, the corresponding deviation would be profitable in the original protocol. Also, the revenue of the truthful strategy in the new protocol is clearly the same as the revenue of the original strategy in the original protocol.
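The construction in this proof can be sketched on a toy tree encoding. The node classes `Leaf`, `Transfer`, `SellerNode` and `BuyerNode`, and the functions `simulate` and `direct_revelation`, are hypothetical illustrations rather than the paper's notation. For simplicity the sketch assumes a committed buyer (no defection), lets a strategy map each buyer node's name to a distribution over its children, and suppresses the dependence of seller moves on the seller's signal.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Transfer:
    amount: float
    child: "Node"


@dataclass
class SellerNode:
    probs: List[float]        # probability of moving to each child
    children: List["Node"]


@dataclass
class BuyerNode:
    name: str                 # lets a strategy refer to this node
    children: List["Node"]


Node = Union[Leaf, Transfer, SellerNode, BuyerNode]
Strategy = Dict[str, List[float]]  # buyer-node name -> distribution over children


def simulate(node: Node, strategy: Strategy) -> Node:
    """Copy the tree, replacing each buyer node by a seller node playing `strategy`."""
    if isinstance(node, Leaf):
        return node
    if isinstance(node, Transfer):
        return Transfer(node.amount, simulate(node.child, strategy))
    children = [simulate(c, strategy) for c in node.children]
    if isinstance(node, SellerNode):
        return SellerNode(list(node.probs), children)
    # Buyer node: the seller now moves on the buyer's behalf.
    return SellerNode(list(strategy[node.name]), children)


def direct_revelation(tree: Node, strategies: Dict[str, Strategy]):
    """Root buyer node asks for a type; branch t is the simulated copy for type t."""
    types = sorted(strategies)
    root = BuyerNode("report-type", [simulate(tree, strategies[t]) for t in types])
    return root, types


def count_buyer_nodes(node: Node) -> int:
    """Number of buyer nodes in a tree (used to check the construction)."""
    if isinstance(node, Leaf):
        return 0
    if isinstance(node, Transfer):
        return count_buyer_nodes(node.child)
    own = 1 if isinstance(node, BuyerNode) else 0
    return own + sum(count_buyer_nodes(c) for c in node.children)


# Hypothetical example: the buyer either pays 1 to reach "reveal" or stops.
EXAMPLE_TREE = BuyerNode("b0", [Transfer(1.0, Leaf("reveal")), Leaf("stop")])
EXAMPLE_STRATEGIES = {"high": {"b0": [1.0, 0.0]}, "low": {"b0": [0.0, 1.0]}}
```

After `direct_revelation(EXAMPLE_TREE, EXAMPLE_STRATEGIES)`, the only buyer node left is the type report at the root, mirroring the statement that each copy consists only of transfer and seller nodes.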
Appendix B Proofs omitted from Section 3
Proof of Theorem 3.1: Fix the context, and consider a voluntary protocol and an optimal strategy for a committed buyer achieving revenue . We show that it can be reduced to a protocol of the specified form achieving the same revenue for uncommitted buyers. Let be the distribution over leaves obtained when we start from the root and use to move down the tree. Also, let be the set of leaves of the tree.
At the root of the reduced protocol, we place a buyer node with one outgoing edge per type. This node corresponds to the buyer being asked to report his type. The children of the root are transfer nodes, each corresponding to a distinct type. The amount at such a transfer node is the expected payment of that type in the original mechanism.
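Now, consider the node that is the child of the transfer node in each type's branch. Add a copy of the set of leaves as its children and, for each seller signal, have the seller choose a leaf according to the distribution over leaves that the type's strategy induces in the original protocol given that signal.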
We now need to check that the new protocol is indeed equivalent to the original one. Consider the uncommitted strategy in which the buyer reports his true type at the first node and then pays the amount at the transfer node. This strategy generates the same utility and payments as in the original protocol, so one only needs to check that it is indeed an optimal strategy for the buyer.
It suffices to show that a buyer of a given type would not prefer to declare another type or to defect in the middle of the reduced protocol. Defecting in the middle is clearly not beneficial, since the protocol is voluntary and the buyer learns nothing until the last move. Also, if declaring another type were profitable for the buyer, he would prefer to play that type's strategy in the original protocol. This holds for two reasons. First, doing so generates, given his true type, the same distribution over leaves as declaring the other type in the new protocol would, so the value of the two deviations is the same. Second, it generates the same expected payment as declaring the other type in the new protocol would. This depends crucially on independence: a buyer playing a given strategy experiences the same expected payment regardless of his type, since the seller's signal is drawn from the same distribution (independent of the buyer's type), and thus the same nodes are visited with the same probabilities in all cases.
Notice that the last step of the proof does not hold if the buyer's and seller's signals are correlated: a buyer could change his declared type, but he cannot change the probability with which each seller signal is sampled, which is determined by his true type.
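The reduction can be sketched as a flattening step: for each type's copy of the protocol (with the buyer's strategy already folded in, as in the proof of Theorem 2.6) and each seller signal, compute the distribution over leaves and the expected payment, then charge the payment averaged over seller signals up front. The tuple encoding and the names `flatten` and `one_round_menu` are hypothetical, and the code assumes, as the theorem does, that the seller's signal is independent of the buyer's type; with correlated signals the averaging weights would depend on the buyer's true type, which is exactly where the proof breaks.

```python
from collections import defaultdict

# Hypothetical encoding of one type's copy of the protocol, after the buyer's
# strategy has been folded in, so only seller and transfer nodes remain:
#   ("leaf", name)
#   ("pay", amount, child)
#   ("seller", {signal: [prob_child0, prob_child1, ...]}, [child0, child1, ...])


def flatten(node, signal):
    """Return (distribution over leaves, expected payment) given a seller signal."""
    kind = node[0]
    if kind == "leaf":
        return {node[1]: 1.0}, 0.0
    if kind == "pay":
        dist, pay = flatten(node[2], signal)
        return dist, node[1] + pay
    probs, children = node[1][signal], node[2]
    dist, pay = defaultdict(float), 0.0
    for p, child in zip(probs, children):
        child_dist, child_pay = flatten(child, signal)
        for leaf, q in child_dist.items():
            dist[leaf] += p * q
        pay += p * child_pay
    return dict(dist), pay


def one_round_menu(copies, signal_prob):
    """For each type t: (up-front price, {signal: leaf distribution}).

    Assumes the seller's signal has distribution `signal_prob` regardless of
    the buyer's type (the independence used in Theorem 3.1)."""
    menu = {}
    for t, tree in copies.items():
        flat = {s: flatten(tree, s) for s in signal_prob}
        price = sum(signal_prob[s] * flat[s][1] for s in signal_prob)
        menu[t] = (price, {s: flat[s][0] for s in signal_prob})
    return menu


# Hypothetical copy for type "a": on signal "x" the seller leads to a paid
# leaf, on signal "y" to a free one.
EXAMPLE_COPIES = {
    "a": ("seller", {"x": [1.0, 0.0], "y": [0.0, 1.0]},
          [("pay", 2.0, ("leaf", "L1")), ("leaf", "L2")]),
}
EXAMPLE_SIGNALS = {"x": 0.5, "y": 0.5}
```

Here type "a" pays 2 with probability 1/2 in the interactive protocol, so the one-round version charges 1 up front and then samples leaf L1 on signal "x" and leaf L2 on signal "y".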
Proof of Observation 3.3: Simply notice that the utilities of each player for each contract in the original and new menus are the same.
Proof of Observation 3.4: Let be the posterior associated with , i.e. and let . Then:
Conversely, given satisfying we can define a variable taking values in and set its joint distribution with to be .
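Assuming Observation 3.4 is the usual characterization (posteriors weighted by message probabilities average to the prior, and any such collection arises from some joint distribution of message and state), both directions can be checked with exact rational arithmetic. The names `posteriors`, `is_bayes_plausible` and `joint_from_posteriors`, and the two-state example, are hypothetical.

```python
from fractions import Fraction

# Hypothetical encoding: joint[m][w] = Pr[message m and state w].


def posteriors(joint):
    """Map each message m with Pr[m] > 0 to (Pr[m], posterior over states)."""
    result = {}
    for m, row in joint.items():
        pm = sum(row.values())
        if pm > 0:
            result[m] = (pm, {w: p / pm for w, p in row.items()})
    return result


def is_bayes_plausible(split, prior):
    """Check that the posteriors, weighted by Pr[m], average to the prior."""
    return all(sum(pm * post[w] for pm, post in split.values()) == prior[w]
               for w in prior)


def joint_from_posteriors(split):
    """Converse direction: set Pr[m, w] = Pr[m] * posterior_m(w)."""
    return {m: {w: pm * q for w, q in post.items()}
            for m, (pm, post) in split.items()}


# Hypothetical two-state example with a uniform prior.
EXAMPLE_PRIOR = {"H": Fraction(1, 2), "L": Fraction(1, 2)}
EXAMPLE_JOINT = {"h": {"H": Fraction(3, 8), "L": Fraction(1, 8)},
                 "l": {"H": Fraction(1, 8), "L": Fraction(3, 8)}}
```

In the example, message "h" has probability 1/2 and posterior (3/4, 1/4); averaging both posteriors recovers the prior, and rebuilding the joint distribution from them gives back the original table.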
Proof of Lemma 3.5: The function defines piecewise-linear functions . Each of them induces a partition of into polytopes on each of which is linear. One can combine these partitions by taking the coarsest partition of that is simultaneously a subpartition of the one induced by for all . This way we obtain a finite partition such that all the functions are linear on each of its regions.
Now, let be the set of vertices of the regions in this partition. Given any posterior , if it is in region of the partition, one can write where are vertices of region and . Now, given a primal solution to , if and , simply increase by and decrease to zero. By the linearity of in , the resulting solution is still feasible and has the same objective value. Repeating this process as many times as needed, one ends up with a solution whose support is in .
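In the simplest case of two states, a posterior is a single number p in [0, 1] and each value function is the upper envelope of lines, so the partition of this proof becomes a finite set of breakpoints. The sketch below, on a hypothetical two-type instance, computes such breakpoints (possibly a finer set than necessary), splits a posterior between its two neighboring vertices, and lets one confirm that the value functions are linear along the split, which is the step that allows mass to be moved onto vertices without changing the objective. The names `value`, `breakpoints` and `split_to_vertices` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Two states: a posterior is p = Pr[state 1] in [0, 1]. Each (hypothetical)
# buyer type has lines (u0, u1), one per action, and values a posterior at
# the max over actions of u0 * (1 - p) + u1 * p.


def value(lines, p):
    return max(u0 * (1 - p) + u1 * p for u0, u1 in lines)


def breakpoints(all_lines):
    """0, 1 and every crossing in (0, 1) of two lines of the same type.

    The maximizing line can only change where two lines cross, so every value
    function is linear between consecutive points (the set may be finer than
    the true kinks)."""
    points = {Fraction(0), Fraction(1)}
    for lines in all_lines:
        for (a0, a1), (b0, b1) in combinations(lines, 2):
            slope_gap = (a1 - a0) - (b1 - b0)
            if slope_gap != 0:
                p = Fraction(b0 - a0) / slope_gap
                if 0 < p < 1:
                    points.add(p)
    return sorted(points)


def split_to_vertices(p, vertices):
    """Write p = lam * lo + (1 - lam) * hi for consecutive vertices lo <= p <= hi."""
    for lo, hi in zip(vertices, vertices[1:]):
        if lo <= p <= hi:
            return lo, hi, (hi - p) / (hi - lo)
    raise ValueError("posterior outside [0, 1]")


# Hypothetical instance with two buyer types.
LINES_A = [(2, 0), (0, 2)]   # kink at p = 1/2
LINES_B = [(1, 1), (3, 0)]   # kink at p = 2/3
```

For p = 3/5 the split uses lo = 1/2, hi = 2/3 and lam = 2/5, and both value functions equal the same mixture of their vertex values, so moving the mass from p to the two vertices leaves the objective unchanged.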
Proof of Lemma 3.6: Given a convex programming problem, to show that it can be solved exactly in polynomial time we need to show three things (see Section 5.3 of [6]):

(1) that the function can be computed efficiently at each point of the domain;

(2) that for each point of the domain it is possible to calculate a subgradient. A subgradient of $f$ at $x$ is a vector $g$ such that $f(y) \ge f(x) + g^\top (y - x)$ for all $y$ in the domain;

(3) that there is an optimal solution that can be expressed with polynomially many bits.
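The first point above is trivial. For the second point we use the additivity of subgradients: if $g_1$ is a subgradient of $f_1$ at $x$ and $g_2$ is a subgradient of $f_2$ at $x$, then $g_1 + g_2$ is a subgradient of $f_1 + f_2$ at $x$. Since the subgradient of a linear function is trivial, it remains to show how to compute a subgradient of the remaining term. Notice that this term is the maximum of linear functions. We use the fact that if $f(x) = \max_i (a_i^\top x + b_i)$ and the index $j$ attains the maximum at $x$, then $a_j$ is a subgradient of $f$ at $x$, since for every $y$:
$$f(y) \ge a_j^\top y + b_j = a_j^\top x + b_j + a_j^\top (y - x) = f(x) + a_j^\top (y - x).$$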
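The two subgradient facts used for the second point can be checked numerically. The sketch below assumes a function given as a pointwise maximum of affine pieces with hypothetical coefficients `A` and `b`; it returns the coefficient vector of a piece attaining the maximum and adds subgradients across a sum of such functions. The helper names are hypothetical.

```python
def max_affine(A, b, x):
    """Return f(x) = max_i (A[i] . x + b[i]) and an index attaining it."""
    scores = [sum(a_k * x_k for a_k, x_k in zip(row, x)) + b_i
              for row, b_i in zip(A, b)]
    j = max(range(len(scores)), key=lambda i: scores[i])
    return scores[j], j


def subgradient(A, b, x):
    """Gradient of an active piece: f(y) >= f(x) + A[j] . (y - x) for all y."""
    _, j = max_affine(A, b, x)
    return list(A[j])


def sum_subgradient(parts, x):
    """Subgradients add: for f = f_1 + ... + f_k, sum the parts' subgradients."""
    grads = [subgradient(A, b, x) for A, b in parts]
    return [sum(coords) for coords in zip(*grads)]


# Hypothetical two-dimensional pieces for two max-affine functions.
A1, b1 = [[1.0, -2.0], [0.5, 0.5], [-1.0, 3.0]], [0.0, 1.0, -0.5]
A2, b2 = [[2.0, 0.0], [-1.0, -1.0]], [0.0, 0.5]
```

At x = (0.3, 0.7) the active pieces are the second row of `A1` and the first row of `A2`, so the summed subgradient is (2.5, 0.5), and the subgradient inequality holds at every test point y.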
For the third point, Lemma 3.5 says that the program can be written with finitely many variables. Therefore, the dual LP can be written with finitely many constraints. An optimizer of the dual can then be found by choosing as many linearly independent constraints as there are variables, making them tight, and solving the resulting linear system. By Cramer's rule, this solution has polynomially many bits.
Given an optimal solution , for each let be a solution to . There is a polytope that consists of all posteriors such that these actions remain optimal when the posterior is . The objective function of the convex program is linear when restricted to , so the set of optimal solutions includes at least one extreme point of , and such an extreme point can be found in polynomial time. Recalling the construction of the set in the proof of Lemma 3.5, we see that all of the extreme points of are elements of , so we have established that our convex program has an optimal solution that belongs to , and that we can find such a in polynomial time, as claimed in the lemma.
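The vertex argument for the third point can be illustrated by brute force: for a bounded linear program max c·x subject to Ax ≤ b whose feasible region has a vertex, some optimum is obtained by making n linearly independent constraints tight and solving an n × n system. The sketch below enumerates these systems with exact rationals; it is exponential and only illustrates why an optimal solution has rational entries built from the input data, not a polynomial-time method. The names `solve_exact` and `best_vertex`, and the example LP, are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_exact(M, r):
    """Solve M x = r over the rationals by Gauss-Jordan; None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rv)] for row, rv in zip(M, r)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [a - factor * p for a, p in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def best_vertex(c, A, b):
    """Maximize c.x subject to A x <= b by trying every set of n tight rows."""
    n = len(c)
    best = None
    for rows in combinations(range(len(A)), n):
        x = solve_exact([A[i] for i in rows], [b[i] for i in rows])
        if x is None:
            continue  # the chosen rows are linearly dependent
        feasible = all(sum(Fraction(a) * xi for a, xi in zip(row, x)) <= bi
                       for row, bi in zip(A, b))
        if feasible:
            val = sum(Fraction(ci) * xi for ci, xi in zip(c, x))
            if best is None or val > best[0]:
                best = (val, x)
    return best


# Hypothetical LP: max x + y s.t. x + 2y <= 4, 3x + y <= 6, x >= 0, y >= 0.
EXAMPLE_C = [1, 1]
EXAMPLE_A = [[1, 2], [3, 1], [-1, 0], [0, -1]]
EXAMPLE_B = [4, 6, 0, 0]
```

For the example, the optimum 14/5 is attained at the vertex (8/5, 6/5), where the first two constraints are tight.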
The following example, discussed in Section 3.1, reveals that the Sealed Envelope Mechanism does not extract more than of the revenue of the optimal mechanism.
Example B.1
Consider , and . Now, for each we can represent as a piecewise-linear function on where
Let be the function interpolating the following three points: , where is some fixed large number. Now, the value of a buyer of type for the envelope is and occurs with probability , so any price will only be able to extract revenue.
Now, we exhibit a Pricing Mappings Mechanism that can extract revenue. Consider the menu where the price of contract is and outputs the posterior with probability and the posterior with probability