
# Sequential Peer Prediction: Learning to Elicit Effort using Posted Prices

Yang Liu and Yiling Chen
Harvard University, Cambridge MA, USA
{yangl,yiling}@seas.harvard.edu
###### Abstract

Peer prediction mechanisms are often adopted to elicit truthful contributions from crowd workers when no ground-truth verification is available. Recently, mechanisms of this type have been developed to incentivize effort exertion, in addition to truthful elicitation. In this paper, we study a sequential peer prediction problem where a data requester wants to dynamically determine the reward level to optimize the trade-off between the quality of information elicited from workers and the total expected payment. In this problem, workers have homogeneous expertise and heterogeneous costs for exerting effort, both unknown to the requester. We propose a sequential posted-price mechanism to dynamically learn the optimal reward level from workers’ contributions and to incentivize effort exertion and truthful reporting. We show that (1) in our mechanism, workers exerting effort according to a non-degenerate threshold policy and then reporting truthfully is an equilibrium that returns the highest utility for every worker, and (2) the regret of our learning mechanism w.r.t. offering the optimal reward (price) is sub-linear in the learning horizon $T$. We further show the power of our learning approach when the reports of workers do not necessarily follow the game-theoretic equilibrium.


## 1 Introduction

Crowdsourcing has arisen as a promising option to facilitate machine learning via eliciting useful information from human workers; for example, it has been widely used for labeling training samples, e.g., on Amazon Mechanical Turk. Despite its simplicity and popularity, one salient challenge in crowdsourcing is the lack of evaluation for the collected answers, because ground-truth labels often are either unavailable or too costly to obtain. This problem is called information elicitation without verification (IEWV) (Waggoner and Chen 2014). A class of mechanisms, collectively called peer prediction, has been developed for the IEWV problem (Miller, Resnick, and Zeckhauser 2005; Prelec 2004; Jurca and Faltings 2007; 2009; Witkowski and Parkes 2012a; 2012b; Radanovic and Faltings 2013). In peer prediction, an agent is rewarded according to how his answer compares with those of his peers, and the reward rules are designed so that everyone truthfully reporting their information is a game-theoretic equilibrium.

More recent work (Dasgupta and Ghosh 2013; Witkowski et al. 2013; Liu and Chen 2016a) on peer prediction concerns effort elicitation, where the goal is not only to induce truthful reports, but also to induce high-quality answers by incentivizing agents to exert effort. In such work, the mechanism designer is assumed to know workers’ expertise levels and their costs for effort exertion, and designs reward rules that induce optimal effort levels and truthful reporting at an equilibrium.

This paper also focuses on effort elicitation in peer prediction. But different from prior work, our mechanism designer knows neither workers’ expertise levels nor their costs for effort exertion. We introduce a sequential peer prediction problem, where the mechanism proceeds in rounds and the mechanism designer wants to learn to set the optimal reward level (that balances the amount of effort elicited and the total payment) while observing the elicited answers from previous rounds. There are several challenges to this problem. First, effort exertion is not observable and no ground-truth answers are available for evaluating contributions, so it is not immediately clear what information the mechanism designer can learn from the observed answers in a sequential mechanism. Second, forward-looking workers may have incentives to mislead the learning process, hoping for better future returns.

The main contributions of this paper are the following: (1) We propose a sequential peer prediction mechanism by combining ideas from peer prediction with multi-armed bandit learning (Lai and Robbins 1985; Auer, Cesa-Bianchi, and Fischer 2002). (2) In this mechanism, workers exerting effort according to a non-degenerate threshold policy and then reporting truthfully in each round is an equilibrium that returns the highest utility for every worker. (3) We show that the regret of this mechanism w.r.t. offering the optimal reward is sub-linear in the learning horizon $T$. We also show that under a “mean field” assumption, the sequential learning mechanism can be extended to a setting where workers may not be fully rational. (4) Our sequential peer prediction mechanism is minimal in that reported labels are the only information we need from workers.

In the rest of the paper, we first survey the most related work in Section 1.1. Section 2 introduces our problem formulation. We then present a game-theoretic analysis of worker behavior in a one-stage static setting in Section 3. Based on the equilibrium analysis of the one-stage setting, we propose and analyze a learning mechanism that learns the optimal bonus level using posted prices in Section 4. We also discuss an extension of our learning mechanism to a setting where workers may not be fully rational. Section 5 concludes this paper. All omitted details can be found in the full version of the paper (Liu and Chen 2016b).

### 1.1 Related work

Eliciting high-quality data from effort-sensitive workers has not been addressed in the peer prediction literature until recently. Witkowski et al. [2013] and Dasgupta and Ghosh [2013] formally introduced costly effort into models of IEWV. The costs for effort exertion were assumed to be homogeneous and known, and static, one-shot mechanisms were developed for effort elicitation and truthful reporting. Our setting allows participants to have heterogeneous costs of effort exertion drawn from a common unknown distribution, and hence we consider a sequential setting that enables learning over time. Liu and Chen [2016a] is the closest to this work. It considered the same general setting and partially resolved the problem of learning the optimal reward level sequentially. There are, however, two major differences. First, the method developed in Liu and Chen [2016a] required workers to report their private costs in addition to their answers, which is arguably undesirable for practical applications. Our learning mechanism, in contrast, is “minimal” (Segal 2007; Witkowski and Parkes 2013) and only asks workers for answers to tasks. Second, the mechanism of Liu and Chen [2016a] was built upon the output agreement mechanism as the single-round mechanism. Output agreement, and hence the mechanism of Liu and Chen [2016a], suffers from a simple, potential collusion among workers: colluding by reporting an uninformative signal leads to a better equilibrium (higher utility) for workers. By building upon the mechanism of Dasgupta and Ghosh [2013], which as a one-shot mechanism is resistant to such simple collusion, we develop a collusion-resistant sequential learning mechanism.

Generally speaking, when there is a lack of knowledge about agents, the design problem needs to incorporate learning from the outputs of prior runs of the mechanism – see Chawla, Hartline, and Nekipelov [2014] for specific examples on learning with auction data. This topic has also been studied within the domain of crowdsourcing: for example, Roth and Schoenebeck [2012] and Abernethy et al. [2015] consider strategic data acquisition for estimating a mean and for online learning, respectively. Our problem differs from the above in that both agents’ actions (effort exertion) and ground-truth outcomes are unavailable.

## 2 Problem Formulation

### 2.1 Formulation and settings

Suppose in our system we have one data requester (or a mechanism designer), and there are $N$ candidate workers, denoted by $\mathcal{C}$, with $|\mathcal{C}| = N$. In all we have $N+1$ interactive agents. The data requester has binary-answer tasks, with answer space $\{-1, +1\}$, that she’d like to get labels for. The requester assigns tasks to workers.

The label generated by worker $i$ for a task comes from a distribution that depends both on the ground-truth label and an effort variable $e_i$. Suppose there are two effort levels, High and Low, that a worker can potentially choose from: $e_i \in \{H, L\}$. We model the cost $c_i(k)$ for exerting High effort for each (worker, task) pair as drawn from a distribution with c.d.f. $F(\cdot)$ on a bounded support; exerting Low effort incurs no cost. We assume such costs are drawn in an i.i.d. fashion. Denote worker $i$’s probability of observing the ground-truth label under effort level $e_i$ as $p_{i, e_i}$. Note that with the above we have assumed the labeling accuracy is symmetric, i.e., independent of the realized ground-truth label. Further, for simplicity of analysis, we assume all workers have the same pair of accuracies, denoted $p_H$ and $p_L$. With higher effort the expertise level is higher, $p_H > p_L$, and we also assume the labeling accuracy is no less than 0.5: $p_L \ge 0.5$. The above are common knowledge among workers, while the mechanism designer knows neither the form of $F(\cdot)$ nor $p_H, p_L$. But we assume the mechanism designer knows the structural information, such as that costs are i.i.d., workers are effort sensitive, and there are two effort levels.

The goal of the learner is to design a sequential peer prediction mechanism for effort elicitation via observing contributed data from workers, such that the mechanism helps the learner converge to taking the optimal action (to be defined later).

### 2.2 Reward mechanism

Our bonus rule builds on the mechanism of Dasgupta and Ghosh [2013], referred to as (DG13) below. For each task $k$ assigned to worker $i$, a reference worker $j$ who also labels task $k$ is selected, and the base bonus is

$$B'_i(k) = \mathbf{1}\big(L_i(k) = L_j(k)\big) - L^d_i \cdot L^d_j - \bar{L}^d_i \cdot \bar{L}^d_j, \qquad (1)$$

where $L_i(k)$ denotes the report from worker $i$ on task $k$, $L^d_i$ is the empirical frequency of label $+1$ among worker $i$’s distinct (non-shared) tasks, and $\bar{L}^d_i = 1 - L^d_i$. Our bonus rule follows exactly the same idea, except that we multiply by a constant $B > 0$ (which we can choose): $B_i(k) = B \cdot B'_i(k)$.
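As a concrete illustration, the base bonus of Eqn. (1) can be computed from raw reports as follows (a minimal sketch; the split into shared and distinct tasks is produced by the assignment procedure described in the appendix):

```python
def dg13_bonus(labels_i, labels_j, shared_task, distinct_i, distinct_j):
    """Base bonus B'_i(k) for worker i against reference worker j (Eqn. (1)).

    labels_*: dict mapping task id -> report in {-1, +1}.
    distinct_*: task ids labeled by only one worker of the pair.
    """
    agree = 1.0 if labels_i[shared_task] == labels_j[shared_task] else 0.0
    # Empirical frequencies of +1 reports on each worker's distinct tasks.
    f_i = sum(labels_i[k] == 1 for k in distinct_i) / len(distinct_i)
    f_j = sum(labels_j[k] == 1 for k in distinct_j) / len(distinct_j)
    # Agreement on the shared task, discounted by "blind" agreement.
    return agree - f_i * f_j - (1 - f_i) * (1 - f_j)
```

Note that colluders who always report $+1$ get $f_i = f_j = 1$ and hence a bonus of $1 - 1 - 0 = 0$, which is why this rule resists the simple uninformative collusion discussed in Section 3.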

### 2.3 Worker model

After receiving each task $k$, worker $i$ first realizes the cost $c_i(k)$ for exerting High effort. Then worker $i$ decides his effort level and observes a signal $L_i(k)$ (a label for the task). Worker $i$ can decide either to truthfully report his observation or to revert the answer; denote the reporting decision by $r_i \in \{0, 1\}$, where $r_i = 1$ stands for truthful reporting and $r_i = 0$ for reverting the answer.

Workers are utility maximizers. Denote the utility function at each time (or step) for each worker $i$ as $u_i$, which is assumed to have the following form (payment minus cost):

$$u_i = M\,b + \sum_{k \in T_i} B_i(k) - \sum_{k \in T_i} c_i(k), \quad \forall i,$$

where $T_i$ is the set of tasks assigned to worker $i$, $M = |T_i|$, and $b$ is a fixed base payment per task.

### 2.4 Data requester model

After collecting labels for each task, the data requester aggregates them via majority voting. Denote the workers who labeled task $k$ as $w_k(1), \ldots, w_k(K)$. Then the aggregate label for task $k$ is given by

$$L^A(k) = \mathbf{1}\Big(\sum_{n=1}^{K} L_{w_k(n)}(k)/K > 0\Big) \cdot 2 - 1.$$
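In code, this aggregation rule is just the sign of the average report (a sketch; with an even number of workers, a tie falls to $-1$ here, matching the strict inequality inside the indicator):

```python
def aggregate(labels):
    """Majority vote over reports in {-1, +1}: L^A = 1(mean > 0) * 2 - 1."""
    return (1 if sum(labels) / len(labels) > 0 else 0) * 2 - 1
```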

The data requester’s objective is to find a bonus level $B$ (as in Section 2.2) that balances the accuracy of labels collected from workers and the total payment. Denote the requester’s objective function at each step, when assigning $N$ tasks, as

$$U(B) := \sum_{k=1}^{N}\Big[\Pr[L^A(k) = L(k)] - \eta\sum_{n=1}^{K}\mathbb{E}[B_{w_k(n)}(k)]\Big],$$

where $L(k)$ denotes the true label of task $k$, and $\eta > 0$ is a weighting constant balancing the two terms in the objective.

Since we have assumed that all tasks are assigned to the same number of workers, and workers are homogeneous in their labeling accuracy and cost (i.i.d.), all tasks enjoy the same a-priori probability of having a correct label. We denote this probability by $P(B)$. Further, as workers do not receive payment for tasks not assigned to them, $U(B)$ can be simplified, and normalized by the number of tasks (which does not affect optimizing the utility function), to the following form:

$$U(B) = P(B) - \frac{\eta}{N}\sum_{i \in \mathcal{C}}\sum_{k=1}^{N}\mathbb{E}[B_i(k)]. \qquad (2)$$

Suppose there exists a maximizer $B^* \in \operatorname{argmax}_B U(B)$.

### 2.5 Sequential learning setting

Suppose our sequential learning algorithm runs for $T$ stages. At each stage $t$, the learner assigns a certain number of tasks to a set of selected workers $S(t)$ (for details please refer to our algorithm). The learner offers a bonus bundle $\{B_{i,t}\}_{i \in S(t)}$ to the workers ($B_{i,t}$ being the bonus constant in the reward mechanism). The regret of offering $\{B_{i,t}\}$ w.r.t. $B^*$ is defined as follows:

$$R(T) = T \cdot U(B^*) - \sum_{t=1}^{T}\frac{\sum_{i \in S(t)} M_i(t) \cdot \mathbb{E}[U(B_{i,t})]}{\sum_{j \in S(t)} M_j(t)}, \qquad (3)$$

where $M_i(t)$ is the number of tasks assigned to worker $i$ at stage $t$. Note we normalize using the number of assignments – intuitively, the more tasks the requester assigns at a wrong price, the more regret is incurred. The goal of the data requester is to design an algorithm such that $R(T)/T \to 0$. We can also define $R(T)$ un-normalized, which adds a constant factor (the bounded number of assignments at each step) in front of our later results.

## 3 One stage game-theoretic analysis

From the data requester’s perspective, we first need to understand workers’ actions towards effort exertion and reporting under different bonus levels, in order to figure out the optimal $B^*$. We start with the case where the data requester knows the cost distribution, and we characterize the equilibria for effort exertion and reporting, i.e. $(e_i, r_i)$, on the workers’ side. Note $e_i, r_i$ are both vectors defined over all tasks – this is a simplification of notation, as workers do not receive all tasks. We are safe because if task $k$ is not assigned to worker $i$, worker $i$ makes no decision on $(e_i(k), r_i(k))$. We define Bayesian Nash Equilibrium (BNE) in our context as follows:

###### Definition 1.

We say $(e^*, r^*)$ is a BNE if, for every worker $i$ and every unilateral deviation $(e_i, r_i)$: $\mathbb{E}\big[u_i(e^*_i, r^*_i;\, e^*_{-i}, r^*_{-i})\big] \ge \mathbb{E}\big[u_i(e_i, r_i;\, e^*_{-i}, r^*_{-i})\big]$.

In this paper, we restrict our attention to symmetric BNE. For each assigned task, we have a Bayesian game among the workers assigned to it: a worker’s decision on effort exertion is a function of his realized cost, $e_i(k) = e(c_i(k))$, which specifies the effort level for worker $i$ when his realized cost is $c_i(k)$, and $r_i(k)$ gives the reporting strategy for the chosen effort level. We focus on threshold policies: that is, there is a threshold $c^*$ such that $e_i(k) = H$ for all $c_i(k) \le c^*$ and $e_i(k) = L$ otherwise. In fact, players must play a threshold strategy for effort exertion at any symmetric BNE: workers’ outputs do not depend on their cost realizations, so worker $i$’s chance of getting a bonus does not change as his cost decreases; hence a worker will choose to exert effort whenever it is a better move for an even higher cost. We will use $\tilde e(c^*)$ to denote this threshold strategy for workers. Denote by $\tilde r$ the reporting strategy with $r_i(k) = 1$ for all $k$, i.e., reporting truthfully regardless of the choice of effort.

###### Theorem 2.

When $B > 0$ and $F(\cdot)$ is concave, there exists a unique threshold $c^*(B) > 0$ such that $(\tilde e(c^*(B)), \tilde r)$ is a symmetric BNE for all workers on all tasks.

##### Other equilibria:

The above threshold policy is unique only among equilibria with non-degenerate effort exertion ($c^*(B) > 0$). There exist other equilibria; we summarize them here:

• Un-informative equilibrium: Colluding by always reporting the same answer to all tasks is an equilibrium. Similarly to what is noted in (Dasgupta and Ghosh 2013), when colluding (with pure or mixed strategies) the bonus index defined in Eqn. (1) reduces to 0 for each worker, which leads to a worse equilibrium.

• Low effort: Exerting no effort on any task (followed by either truthful or untruthful reporting) is also an equilibrium: when no one else is exerting effort, each worker’s answer is compared against a random guess, so there is no incentive for effort exertion.

• Permutation: Exerting the same amount of effort and then reverting the reports ($r_i(k) = 0$) is also an equilibrium.

But we would like to note that though there may exist multiple equilibria, all others lead to strictly less utility for each worker at equilibrium than the threshold equilibrium $\tilde e(c^*(B))$ followed by truthful reporting, except for the permutation equilibrium, which gives the same expected utility to workers.

Solve for optimal $B$: After characterizing the equilibrium effort exertion as a function of $B$, we can compute $P(B)$ and the expected payments for each reward level $B$. Then solving for the optimal reward level becomes a programming problem in $B$, which can be solved efficiently when certain properties, e.g. convexity, can be established for $U(B)$.
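To make this pipeline concrete, the sketch below instantiates it under illustrative assumptions that are ours, not the paper’s: a concave cost c.d.f. $F(c) = \sqrt{c}$ on $[0,1]$, a threshold fixed point of the form $c^* = B(p_H - p_L)(2\bar p(c^*) - 1)$ (a worker exerts effort when the marginal bonus gain exceeds his cost), a Chernoff-style lower bound standing in for $P(B)$, and $\mathbb{E}[B_i(k)] \approx B(p_m(B) - 1/2)$ as the payment term:

```python
import math

P_H, P_L = 0.9, 0.5   # assumed accuracies under High/Low effort

def solve_threshold(B, iters=200):
    """Fixed-point iteration for c*(B) under the illustrative condition
    c* = B * (P_H - P_L) * (2 * pbar(c*) - 1), with concave F(c) = sqrt(c)."""
    F = lambda c: math.sqrt(min(max(c, 0.0), 1.0))
    pbar = lambda c: F(c) * P_H + (1 - F(c)) * P_L
    c = 0.5
    for _ in range(iters):
        c = min(1.0, B * (P_H - P_L) * (2 * pbar(c) - 1))
    return c

def utility(B, eta=3.0, M=10):
    """U(B) = P(B) - eta * E[payment], with a Chernoff-style proxy for P(B)."""
    c = solve_threshold(B)
    pb = math.sqrt(c) * P_H + (1 - math.sqrt(c)) * P_L   # pbar(B), as in Eqn. (4)
    P = 1 - math.exp(-2 * (pb - 0.5) ** 2 * M)           # accuracy term
    p_match = pb ** 2 + (1 - pb) ** 2                    # matching prob., Eqn. (5)
    return P - eta * B * (p_match - 0.5)                 # payment proxy

# Grid search: the "programming problem in B" on [0, 2].
best_B = max((b / 100 for b in range(1, 201)), key=utility)
```

With $F(c) = \sqrt{c}$ the fixed point has the closed form $c^*(B) = (0.32\,B)^2$ here, so the iteration can be sanity-checked analytically; the grid search is exactly the one-dimensional optimization referred to above.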

## 4 Sequential Peer Prediction

In this section we propose an adaptive learning mechanism that learns to converge to the optimal or a nearly optimal reward level. As mentioned earlier, recent work (Liu and Chen 2016a) attempted to resolve this challenge, but besides the output labels, workers are also required to report their private costs, in which sense the proposed learning mechanism is not “minimal”. We remove this requirement by learning only through the label information reported by the workers. In this section, we assume the conditions of Theorem 2 hold, and that workers follow an equilibrium that returns the highest utility.

### 4.1 Challenges

In designing the mechanism, we face two main challenges. The first is on the learning side. In order to select the best $B$, we need to compute $U(B)$, which can be written as a function of $B$ and $\bar p(B)$, the probability of labeling accurately when $B$ is offered and the threshold policy is adopted by workers:

$$\bar p(B) := F(c^*(B))\, p_H + [1 - F(c^*(B))]\, p_L. \qquad (4)$$

The dependency of the payment term on $\bar p(B)$ is straightforward. For $P(B)$, e.g. when using a Chernoff bound for approximation:

$$P(B) = \Pr\Big[\frac{\sum_i \mathbf{1}(\text{worker } i \text{ is correct})}{M} \ge 0.5\Big] \ge 1 - \exp\big(-2(\bar p(B) - 0.5)^2 M\big),$$

it is clear $P(B)$ is a function of $\bar p(B)$. In fact both terms of $U(B)$ are functions of $\bar p(B)$, and so is $U(B)$; for details, please see the Appendix of (Liu and Chen 2016b). The question then comes down to learning $\bar p(B)$. Since we do not have the ground-truth labels, we have no way to evaluate $\bar p(B)$ directly by checking workers’ answers. Also, since we do not elicit reports on private costs, we are unable to estimate the amount of induced effort for each reward level.

The second challenge is that when workers are returning and participating in a sequential, adaptive learning mechanism, they have incentives to mislead the learning process by deviating from the one-shot BNE strategy for a task, so as to create untruthful samples (which are then collected by the learner) that lead the learner into believing that inducing a certain amount of effort requires a much higher reward level. The cost-reporting mechanism described in (Liu and Chen 2016a) deters such deviations by eliminating workers who over-reported their costs from receiving potentially higher bonuses. We will describe a two-fold cross-validation approach that decouples these incentives while removing the requirement of reporting additional information.

#### Learning w/o ground-truth

The following observation inspires our method for learning without ground-truth. For each bonus level $B$, we can estimate $\bar p(B)$ (at equilibrium) as follows: the probability of observing a pair of matching answers from any pair of workers on a shared task (denoted by $p_m(B)$ for each bonus level $B$) at equilibrium can be written as

$$p_m(B) = \underbrace{\bar p^{\,2}(B)}_{\text{match on correct label}} + \underbrace{(1 - \bar p(B))^2}_{\text{match on wrong label}}. \qquad (5)$$

The above matching formula forms a quadratic equation in $\bar p(B)$. From Eqn. (4) we know $\bar p(B) \ge 0.5$. Then the only solution of the matching Eqn. (5) that is at least $0.5$ is

$$\bar p(B) = \frac{1}{2} + \frac{\sqrt{2 p_m(B) - 1}}{2}.$$

The above solution is well defined: from Eqn. (5) we can also deduce that $p_m(B) = 2\bar p^{\,2}(B) - 2\bar p(B) + 1 \ge 1/2$, so the square root is real. Therefore, though we cannot evaluate each worker’s labeling accuracy directly, we can infer it from the matching probability, which is indeed observable.
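The inversion $\bar p = 1/2 + \sqrt{2 p_m - 1}/2$ is easy to check in simulation (a sketch with an assumed true accuracy of 0.8; pairs of independent reports on shared tasks play the role of the observed matches):

```python
import math
import random

def pbar_from_match(p_match):
    """Recover labeling accuracy from the matching frequency via Eqn. (5),
    taking the unique root that is at least 1/2."""
    return 0.5 + math.sqrt(max(2.0 * p_match - 1.0, 0.0)) / 2.0

random.seed(0)
acc, n = 0.8, 200_000
report = lambda truth: truth if random.random() < acc else -truth
# Empirical matching frequency of two independent workers on shared tasks
# (the ground-truth value itself is irrelevant to the matching probability).
p_m = sum(report(1) == report(1) for _ in range(n)) / n
estimate = pbar_from_match(p_m)   # should be close to the true accuracy 0.8
```

The exact value here is $p_m = 0.8^2 + 0.2^2 = 0.68$, which inverts back to $\bar p = 0.8$.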

#### Decoupling incentives via cross validation

To resolve the incentive issue, we propose the following cross-validation approach (illustrated in Fig. 1). First, the entire crowd is separated uniformly at random into two groups, with equal sizes (when the number of workers is even) or sizes differing by at most 1 (when it is odd); suppose each group contains at least two workers. Denote worker $i$’s group ID by $g(i) \in \{1, 2\}$ and the other group by $-g(i)$. For our learning algorithm, only the data/samples collected from group $-g(i)$ will be used to set the reward for workers in group $g(i)$. Second, when selecting a reference worker for comparing answers in mechanism (DG13), we select from the same group $g(i)$.

### 4.2 Mechanism

We would like to treat each bonus level as an “arm” (as in the standard MAB context) to explore. Since we have a continuous space of bonus levels, we separate the support of the bonus level ($[0, \bar B]$) into finitely many intervals and treat each bonus interval as an arm. Our goal is to select the best one and to bound the performance of this selection.

We set up the arms as follows: choose an $N_a$, and separate $[0, \bar B]$ into $N_a$ uniform intervals:

$$[0, \bar B/N_a],\; \ldots,\; [(k-1)\bar B/N_a,\, k\bar B/N_a],\; \ldots,\; [(N_a - 1)\bar B/N_a,\, \bar B].$$

For each interval we take its right end point as the bonus level to offer, $B_k = k\bar B/N_a$. Denote by $\tilde p_{m,t}(B_k)$ the estimated matching probability for agents in a group under bonus level $B_k$ at stage $t$, by $\tilde p_t(B_k)$ the corresponding estimate of $\bar p(B_k)$, and by $\tilde U_{\tilde p}$ the estimated utility function when using the noisy estimates instead of the true quantities. We present Mechanism 1 (SPP_PostPrice). We assume we know $p_L$, or a non-trivial lower bound on it.

Note that since we assign the same number of tasks to each labeler at all stages, the regret defined in Eqn. (3) becomes equivalent to

$$R(T) = T \cdot U(B^*) - \sum_{t=1}^{T} \mathbb{E}[U(B_t)],$$

where $B_t$ is the exploration bonus level when $t$ is in an exploration stage, and the empirically best bonus level otherwise.
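The overall shape of the mechanism can be sketched as below. Everything here is an illustrative skeleton under our own assumptions — the exploration schedule, the round-robin arm order, and the plug-in utility estimate are placeholders for the exact specification of Mechanism 1 in the full version; `match_freq(g, B)` stands in for one stage of collecting reports from group `g` at bonus level `B` and measuring their matching frequency:

```python
import math

def pbar_hat(p_match):
    # Invert the matching equation (Eqn. (5)); take the root >= 1/2.
    return 0.5 + math.sqrt(max(2.0 * p_match - 1.0, 0.0)) / 2.0

def spp_post_price(T, match_freq, n_arms=8, B_max=2.0, theta=0.5, eta=1.0, M=10):
    """Illustrative explore/exploit skeleton for posted prices."""
    arms = [B_max * (k + 1) / n_arms for k in range(n_arms)]
    counts = {g: {B: 0 for B in arms} for g in (0, 1)}
    pm_sum = {g: {B: 0.0 for B in arms} for g in (0, 1)}
    n_explored, log = 0, []
    for t in range(1, T + 1):
        if n_explored < n_arms * math.ceil(t ** theta):  # deterministic schedule
            B = arms[n_explored % n_arms]                # round-robin exploration
            for g in (0, 1):
                counts[g][B] += 1
                pm_sum[g][B] += match_freq(g, B)
            n_explored += 1
            log.append(("explore", B))
        else:
            for g in (0, 1):
                other = 1 - g   # cross validation: use the OTHER group's data
                def u_hat(B):
                    pm = pm_sum[other][B] / counts[other][B]
                    pb = pbar_hat(pm)
                    P = 1 - math.exp(-2 * (pb - 0.5) ** 2 * M)  # Chernoff proxy
                    return P - eta * B * max(pm - 0.5, 0.0)     # payment proxy
                B = max(arms, key=u_hat)
            log.append(("exploit", B))
    return log
```

Because the exploration schedule depends only on $t$, $\theta$, and $N_a$, no single worker’s reports can change when exploration happens — this is the “deterministic exploration” property used in the proof of Lemma 4.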

### 4.3 Equilibrium analysis: workers’ side

Denote a worker’s action profile at step $t$ as $\tilde a_i(t) = (e_i(t), r_i(t))$. We adopt BNE as our solution concept:

###### Definition 3.

A set of strategies $\{\tilde a_i\}$ is a BNE if for any unilateral deviation $\tilde a'_i$, we have

$$\sum_t \mathbb{E}\Big[\max_{e_i(t), r_i(t)} u_i(\tilde a_i, \tilde a_{-i}) \,\Big|\, \tilde a_i(1{:}t{-}1), \tilde a_{-i}(1{:}t{-}1)\Big] \ge \sum_t \mathbb{E}\Big[\max_{e_i(t), r_i(t)} u_i(\tilde a'_i, \tilde a_{-i}) \,\Big|\, \tilde a'_i(1{:}t{-}1), \tilde a_{-i}(1{:}t{-}1)\Big].$$

We first characterize the equilibrium strategy for workers’ effort exertion and reporting with (SPP_PostPrice).

###### Lemma 4.

At a symmetric BNE, the strategy for workers can be decoupled into a composition of myopic equilibrium strategies, that is, $\tilde e(c^*(B_t))$ combined with $\tilde r$ at each stage $t$.

###### Proof.

W.l.o.g., consider a worker $i$ in group 1. We reason that deviating from the one-step BNE strategy for effort exertion is non-profitable for worker $i$ when the other players follow the equilibrium. Since the one-stage equilibrium strategy maximizes the utility at the current stage, and deviating does not affect the past utilities that have already been collected, any potential gain from deviating must come from future gains in utilities, via (1) the offered bonus level or (2) the matching probability against other peers. The bonus level offered to worker $i$ is computed only from data observed from workers in the other group during exploration phases. Note that in our online learning process the exploration phases depend only on pre-defined parameters, not on worker $i$’s data (deterministic exploration). Similarly, for all other workers (reference workers), their future utility gain is not affected by worker $i$’s data, so a unilateral deviation by worker $i$ does not affect the matching probability he faces from his peers. Hence no deviation is profitable. ∎

Again consider colluding workers. When offered a certain bonus level, workers could potentially collude by not exerting effort regardless of their cost realizations, so as to mislead the learner into believing that incentivizing a certain amount of effort requires a much higher bonus level (this is similar to the colluding strategy of contributing uninformative signals studied in Section 3). There potentially exist infinitely many colluding strategies for workers to game a sequential learning algorithm. We focus on the following easy-to-coordinate (for workers), yet powerful strategy (Collude_Learn):

###### Definition 5 (Collude_Learn).

Workers collude by agreeing to exert effort for a task at time $t$ only when the offered bonus level $B_t$ satisfies $B_t \ge B(c_t) + \Delta$, where $B(c_t)$ is the minimum bonus level that makes effort exertion profitable for the realized cost $c_t$, and $\Delta > 0$ is a collusion constant.

In doing so, workers mislead the learner into believing that a higher bonus (larger by $\Delta$) is needed to induce a certain amount of effort. The next lemma establishes collusion-proofness:

###### Lemma 6.

Colluding in (SPP_PostPrice) via (Collude_Learn) is not an equilibrium.

In fact the reasoning for removing all symmetric colluding equilibria is similar – regardless of how others collude on effort exertion, when a worker’s realized cost is small enough, he will deviate.

### 4.4 Performance of (SPP_PostPrice)

We impose the following assumption (please refer to our full version for its justification):

###### Assumption 7.

$U$ is Lipschitz in both $\bar p$ and $B$:

$$|\tilde U_{\tilde p}(\tilde B) - U(B)| \le L_1\, |\tilde p(\tilde B) - \bar p(\tilde B)| + L_2\, |\tilde B - B|,$$

where $L_1, L_2$ are the Lipschitz constants.

We have the following theorem:

###### Theorem 8.

The regret of (SPP_PostPrice) satisfies:

$$R(T) \le O\big(\lceil T^{\theta + z} \log T \rceil\big) + \frac{L_1\, C(\theta)}{\sqrt{2 p_L - 1}}\, T^{1 - \theta/2} + L_2\, C(z)\, \bar B\, T^{1 - z} + \text{const},$$

where $\theta, z > 0$ are tunable parameters and $C(\theta), C(z)$ are constants. Balancing the three terms ($\theta + z = 1 - \theta/2 = 1 - z$) yields $\theta = 1/2$ and $z = 1/4$, for an optimal regret of $O(T^{3/4} \log T)$.

###### Proof.

(Sketch) First notice that by the triangle inequality, the total regret is upper bounded by the weighted sum of the two groups’ regrets. Since the learning processes for the two groups parallel each other, we omit the group super-/sub-scripts. We analyze the regret incurred at the exploration and exploitation stages separately. For the exploitation regret, we first characterize the error in estimating each $\bar p(B_k)$. By the mean value theorem we can show:

$$|\bar p(B_k) - \tilde p_t(B_k)| \le \frac{1}{2\sqrt{2 p_L - 1}}\, |p_m(B_k) - \tilde p_{m,t}(B_k)|.$$

At a time $t$ in an exploitation phase, a number of samples growing with $t$ is guaranteed for each arm, so the estimation error of each $\tilde p_{m,t}(B_k)$ is of order $t^{-\theta/2}$ w.h.p. Further, by the Lipschitz condition, the error this induces in the estimated utility is at most $L_1$ times the estimation error; and for any $B$ that falls in the same interval as $B_k$, the discretization error satisfies $|B - B_k| \le \bar B/N_a$. Denote by $\tilde B^*_t$ the estimated optimal bonus level at time $t$ – at any time $t$, by searching through all arms, we can find the one maximizing the empirically estimated utility function the learner currently has. Combining the above arguments, the exploitation regret can be bounded as

$$\sum_{t=1}^{T}\Big(\frac{L_1\, t^{-\theta/2}}{2\sqrt{2 p_L - 1}} + L_2\, \bar B\, T^{-z}\Big) + O\Big(\sum_{t=1}^{T} \frac{2}{t^2}\Big) \le \frac{L_1\, C(\theta)}{2\sqrt{2 p_L - 1}}\, T^{1 - \theta/2} + L_2\, C(z)\, \bar B\, T^{1 - z} + \text{const},$$

where we have used the fact that for any $0 < \alpha < 1$, there exists a constant $C(\alpha)$ such that $\sum_{t=1}^{T} t^{-\alpha} \le C(\alpha)\, T^{1-\alpha}$. The total number of explorations is of order $T^{\theta + z} \log T$ (summing, over the arms, the number of explorations needed for each arm), and each exploration stage contributes at most constant regret. Summing over the two phases finishes the proof.

### 4.5 Beyond game theoretical model

So far we have modeled workers as fully rational, with reports coming from game-theoretic responses. Now consider the case where workers’ responses do not necessarily follow a game (arguably, no one is fully rational). Instead, we assume each worker $i$ has a labeling accuracy $p_i(B)$ for each bonus level $B$, where $p_i(B)$ can come from different models – game-theory driven, behavioral-model driven, or decision-theory driven – and can differ across workers.

Challenges and a mean field approach: With this model, we can again write $U(B)$ as a function of $B$ and the accuracies $p_i(B)$. In order to select the best $B$, we again need to learn these accuracies. We re-adopt the bandit model described earlier and estimate the $p_i(B)$’s by observing the matching probability between worker $i$ and a randomly selected reference worker $j$. For each $i$ we define

$$p^i_m(B) = p_i(B)\, \bar p_{-i}(B) + (1 - p_i(B))(1 - \bar p_{-i}(B)), \qquad (6)$$

where $p^i_m(B)$ is the probability of observing a match for worker $i$ when a reference worker is drawn uniformly at random from his group, and $\bar p_{-i}(B)$ is the average accuracy of the other workers in that group. The above forms a system of quadratic equations in the $p_i(B)$’s when the $p^i_m(B)$’s are known. We would then need to solve a perturbed system of quadratic equations for the $p_i(B)$’s, with the $p^i_m(B)$’s estimated from observations (Step 3 of (Explore_Crowd)). The following challenges arise for the analysis: (1) it is hard to tell whether the solution to the above quadratic system is unique; (2) solving a set of perturbed quadratic equations for each $B$ incurs heavy computation. (We do not claim this is impossible; rather, analyzing the output of such a system of perturbed quadratic equations merits further study.)

Instead, observing the availability of a relatively large and diverse population of crowd workers, we make the following mean field assumption:

###### Assumption 9.

For any worker $i$, $\bar p_{-i}(B) = \bar p_{g(i)}(B)$, the average accuracy of worker $i$’s group.

That is, one particular worker’s expertise level does not affect the crowd’s mean. This is not an entirely unreasonable assumption, as the candidate pool of crowd workers is generally large. With the above, Eqn. (6) becomes

$$p^i_m(B) = p_i(B)\, \bar p_{g(i)}(B) + (1 - p_i(B))(1 - \bar p_{g(i)}(B)).$$

Averaging over the workers $i$ in group $g$, we have

$$\bar p^{\,g}_m(B) = \bar p^{\,2}_{g}(B) + (1 - \bar p_{g}(B))^2,$$

which is very similar to the matching equation we derived earlier. Again we can solve for $\bar p_{g}(B)$ as a function of the observed $\bar p^{\,g}_m(B)$. Plugging back into Eqn. (6), we obtain an estimate of $p_i(B)$ as follows:

$$p_i(B) = \frac{p^i_m(B) + \bar p_{g(i)}(B) - 1}{2\, \bar p_{g(i)}(B) - 1}.$$

A similar regret bound can be obtained – the difference lies only in estimating the $p_i(B)$’s. Details can be found in (Liu and Chen 2016b).
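The per-worker recovery step in the last equation is a one-liner (a sketch; the inputs are the estimated matching frequency for worker $i$ and the estimated group mean accuracy):

```python
def worker_accuracy(p_im, group_mean):
    """Recover p_i(B) from worker i's matching frequency p^i_m(B) against
    his group and the group's mean accuracy, per the equation above."""
    return (p_im + group_mean - 1.0) / (2.0 * group_mean - 1.0)
```

For instance, a worker with true accuracy 0.7 facing a group of mean accuracy 0.8 matches with frequency $0.7 \cdot 0.8 + 0.3 \cdot 0.2 = 0.62$, and the formula maps 0.62 back to 0.7. The division degrades as the group mean approaches $1/2$, which is why a non-trivial lower bound on the Low-effort accuracy helps.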

## 5 Conclusion

We studied a sequential peer prediction mechanism for eliciting effort using posted prices. We improve over the status quo towards making peer prediction for effort elicitation more practical: (1) we propose a posted-price, “minimal” sequential peer prediction mechanism with bounded regret; the mechanism requires no information from workers other than their answers to the assigned tasks, and our learning results generalize, under a mean field assumption, to the case where workers are not fully rational. (2) Workers exerting effort according to an informative threshold strategy and reporting truthfully is an equilibrium that returns the highest utility.

##### Acknowledgement:

We acknowledge the support of NSF grant CCF-1301976.

## References

• [Abernethy et al. 2015] Abernethy, J.; Chen, Y.; Ho, C.-J.; and Waggoner, B. 2015. Actively Purchasing Data for Learning. In ACM EC 2015.
• [Auer, Cesa-Bianchi, and Fischer 2002] Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning 47:235–256.
• [Chawla, Hartline, and Nekipelov 2014] Chawla, S.; Hartline, J.; and Nekipelov, D. 2014. Mechanism Design for Data Science. In ACM EC 2014, 711–712.
• [Dasgupta and Ghosh 2013] Dasgupta, A., and Ghosh, A. 2013. Crowdsourced Judgement Elicitation with Endogenous Proficiency. In WWW 2013, 319–330.
• [Jurca and Faltings 2007] Jurca, R., and Faltings, B. 2007. Collusion-resistant, Incentive-compatible Feedback Payments. In ACM EC 2007, 200–209.
• [Jurca and Faltings 2009] Jurca, R., and Faltings, B. 2009. Mechanisms for Making Crowds Truthful. JAIR 34(1):209.
• [Lai and Robbins 1985] Lai, T. L., and Robbins, H. 1985. Asymptotically Efficient Adaptive Allocation Rules. Advances in Applied Mathematics 6:4–22.
• [Liu and Chen 2016a] Liu, Y., and Chen, Y. 2016a. Learning to Incentivize: Eliciting Effort via Output Agreement. In IJCAI 2016.
• [Liu and Chen 2016b] Liu, Y., and Chen, Y. 2016b. Sequential Peer Prediction: Learning to Elicit Effort using Posted Prices. AAAI 2017, full version, http://arxiv.org/abs/1611.09219.
• [Miller, Resnick, and Zeckhauser 2005] Miller, N.; Resnick, P.; and Zeckhauser, R. 2005. Eliciting informative feedback: The peer-prediction method. Management Science 51(9):1359 –1373.
• [Prelec 2004] Prelec, D. 2004. A Bayesian Truth Serum for Subjective Data. Science 306(5695):462–466.
• [Radanovic and Faltings 2013] Radanovic, G., and Faltings, B. 2013. A Robust Bayesian Truth Serum for Non-Binary Signals. In AAAI 2013.
• [Roth and Schoenebeck 2012] Roth, A., and Schoenebeck, G. 2012. Conducting Truthful Surveys, Cheaply. In ACM EC 2012, 826–843.
• [Segal 2007] Segal, I. 2007. The Communication Requirements of Social Choice Rules and Supporting Budget Sets. Journal of Economic Theory 136(1):341–378.
• [Shnayder et al. 2016] Shnayder, V.; Agarwal, A.; Frongillo, R.; and Parkes, D. C. 2016. Informed Truthfulness in Multi-Task Peer Prediction. ACM EC 2016.
• [Waggoner and Chen 2014] Waggoner, B., and Chen, Y. 2014. Output Agreement Mechanisms and Common Knowledge. In HCOMP 2014.
• [Witkowski and Parkes 2012a] Witkowski, J., and Parkes, D. 2012a. A Robust Bayesian Truth Serum for Small Populations. In AAAI 2012.
• [Witkowski and Parkes 2012b] Witkowski, J., and Parkes, D. 2012b. Peer Prediction without a Common Prior. In ACM EC 2012, 964–981.
• [Witkowski and Parkes 2013] Witkowski, J., and Parkes, D. C. 2013. Learning the Prior in Minimal Peer Prediction. In the 3rd Workshop on SCUGC.
• [Witkowski et al. 2013] Witkowski, J.; Bachrach, Y.; Key, P.; and Parkes, D. C. 2013. Dwelling on the Negative: Incentivizing Effort in Peer Prediction. In HCOMP 2013.

## Appendix A Randomized task assignment

We explain why we need a well-structured random task assignment. First, we make sure each task is assigned to at least two workers, so that each assignment can serve as a peer evaluation for another. Second, any pair of workers who share a task also need to have distinct tasks, which is motivated by the mechanism (DG13).

A reader may notice that simply assigning every task to all workers satisfies both of these conditions. (For example, assign 4 tasks to all 4 workers; when distinct tasks are needed for workers 1 and 2, we can compute using only tasks 3 and 4 for each worker, respectively.) But we instead make our assignment process random, which helps exclude more complicated collusion strategies (e.g., colluding on a subset of tasks but not on the rest; see the example below), especially when we also randomly shuffle the labels (or IDs) of both workers and tasks. Therefore, in this paper we consider only one type of collusion: if workers decide to contribute uninformative signals, they report the same labels for all tasks.

##### Example on more sophisticated collusion:

Suppose workers are assigned the same set of tasks with the same IDs, and for simplicity assume there is an even number of tasks. The workers agree on the IDs of the tasks, and collude to report -1 on odd-numbered tasks and +1 on even-numbered ones. Then

 B_i(k) = \mathbb{1}(L_i(k) = L_j(k)) - L_i^d \cdot L_j^d - \bar{L}_i^d \cdot \bar{L}_j^d = 1 - \tfrac{1}{2} \cdot \tfrac{1}{2} - \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{2},

which is the maximum score that can be achieved under such collusion, as

 \mathbb{1}(L_i(k) = L_j(k)) \le 1, \quad L_i^d \cdot L_j^d + \bar{L}_i^d \cdot \bar{L}_j^d \ge 1/2.

To see this, denote x := L_i^d and y := L_j^d; since colluding workers report according to the same rule, we know x = y. The following holds:

 xy + (1-x)(1-y) = x^2 + (1-x)^2 = 2x^2 - 2x + 1 = 2\left(x - \tfrac{1}{2}\right)^2 + \tfrac{1}{2} \ge 1/2.
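As a quick numeric sanity check of the above (a sketch of our own, not part of the paper), one can verify both the colluders' score of 1/2 and the penalty lower bound for any common reporting rule with x = y:

```python
# Sanity check (illustrative): under the parity collusion both workers
# always agree, and the fraction of +1 reports on each distinct task set
# is x = y = 1/2, so the DG13-style score is exactly 1/2.
agreement = 1.0                      # 1(L_i(k) = L_j(k)): colluders always match
x = y = 0.5                          # empirical +1-frequency on distinct tasks
penalty = x * y + (1 - x) * (1 - y)  # L_i^d * L_j^d + (1 - L_i^d)(1 - L_j^d)
score = agreement - penalty
assert abs(score - 0.5) < 1e-12

# Penalty lower bound when both workers use the same rule (x = y):
# x^2 + (1-x)^2 = 2(x - 1/2)^2 + 1/2 >= 1/2 for all x in [0, 1].
for k in range(101):
    x = k / 100
    assert x * x + (1 - x) * (1 - x) >= 0.5 - 1e-12
```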
##### A feasible assignment:

Now we demonstrate that an assignment meeting all requirements can be achieved. For example, suppose we have M = 4 workers; set N = 4 and assign the 4 tasks each time as follows:

 Worker 1: {1, 2},  Worker 2: {1, 3},  Worker 3: {2, 4},  Worker 4: {3, 4}

It is not hard to verify that the above assignment satisfies all the constraints we enforced. More generally, when we have M workers, we can prepare N = M tasks to assign each time. Denote the two tasks received by worker i as t_i(1), t_i(2). Then the assignment can be decided adaptively as follows:

 t_1(1) = 1, \; t_1(2) = 2, \; t_2(1) = 1, \; t_2(2) = 3, \quad t_i(1) = t_{i-2}(2), \; t_i(2) = \min\{t_{i-1}(2) + 1, N\}, \; \forall i > 2.

It is easy to verify that the above assignment rule satisfies our requirements: each task is assigned at least twice; workers receiving the same task also receive different tasks; all tasks are assigned the same number of times; and not all tasks are assigned to all workers.
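The recursion above is straightforward to implement; the following sketch (our own illustration, with 1-indexed task IDs and N = M) generates the assignment and reproduces the 4-worker example:

```python
from collections import Counter

def assign_tasks(m):
    """Adaptive two-task assignment for m workers over n = m tasks:
    t_1 = (1, 2), t_2 = (1, 3),
    t_i = (t_{i-2}(2), min(t_{i-1}(2) + 1, n)) for i > 2."""
    n = m
    tasks = [(1, 2), (1, 3)]          # workers 1 and 2
    for i in range(2, m):             # workers 3..m (0-indexed list)
        first = tasks[i - 2][1]                # t_i(1) = t_{i-2}(2)
        second = min(tasks[i - 1][1] + 1, n)   # t_i(2) = min(t_{i-1}(2)+1, n)
        tasks.append((first, second))
    return tasks

# Reproduces the 4-worker example: {1,2}, {1,3}, {2,4}, {3,4}.
assert assign_tasks(4) == [(1, 2), (1, 3), (2, 4), (3, 4)]

# Each task is assigned the same number of times (twice), e.g. for m = 6:
counts = Counter(t for pair in assign_tasks(6) for t in pair)
assert all(c == 2 for c in counts.values())
```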

## Appendix B Proof of Theorem 2

###### Proof.

Denote by P_+ and P_- := 1 - P_+ the priors for labels +1 and -1, and denote the probability of worker i observing label +1 under effort level e_i as

 p_{+,e_i} := \Pr[L_i = +1 \mid e_i] = P_+ p_{e_i} + P_-(1 - p_{e_i}),

and p_{-,e_i} := 1 - p_{+,e_i}. W.l.o.g. consider task k of worker i (when task k is indeed assigned to worker i). Exerting effort or not on task k affects worker i's utility in two ways:

First, on the matching term \mathbb{1}(L_i(k) = L_j(k)): notice the decision on task k does not affect the penalty term of B_i(k) itself, which is computed on distinct tasks. For the matching term, consider the fact that every other worker j follows the threshold policy, exerting effort e_j = H if c_j \le c^*, and reports truthfully. Then

 E[p_{e_j(k)}] = F(c^*) p_H + (1 - F(c^*)) p_L, \quad E[p_{+,e_j(k)}] = F(c^*) p_{+,H} + (1 - F(c^*)) p_{+,L}.

From this, the difference in the expected matching term between exerting and not exerting effort becomes:

 E[B_i(k) \mid e_i = H] - E[B_i(k) \mid e_i = L] = E[p_H p_{e_j(k)} + (1 - p_H)(1 - p_{e_j(k)})] - E[p_L p_{e_j(k)} + (1 - p_L)(1 - p_{e_j(k)})] = (p_H - p_L)(2 E[p_{e_j(k)}] - 1),

where

 E[p_{e_j(k)}] = F(c^*) p_H + (1 - F(c^*)) p_L.
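To make this expression concrete, here is a small numeric illustration with hypothetical values p_H = 0.9, p_L = 0.5, F(c^*) = 0.6 (chosen only for this sketch, not from the paper):

```python
# Illustrative computation of the expected gain in the matching term
# from exerting effort, (p_H - p_L) * (2 * E[p_ej(k)] - 1), when peers
# follow the threshold policy with F(c*) = 0.6.
p_H, p_L, F_cstar = 0.9, 0.5, 0.6
E_p_ej = F_cstar * p_H + (1 - F_cstar) * p_L   # expected peer accuracy = 0.74
gain = (p_H - p_L) * (2 * E_p_ej - 1)          # 0.4 * 0.48 = 0.192
assert abs(gain - 0.192) < 1e-9
```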

Now consider the effect of e_i(k) on the penalty terms. Suppose that after the randomized assignment, task k appears in the distinct task sets of D other tasks; denote this set as \mathcal{D}. For j \in \mathcal{D}, e_i(k) affects the "penalty term": first, we know by independence that

 E[L_i^d \cdot L_j^d] = E[L_i^d] \cdot E[L_j^d].

Then

 E[L_i^d \cdot L_j^d \mid e_i(k) = H] - E[L_i^d \cdot L_j^d \mid e_i(k) = L] = E[L_j^d]\left(E[L_i^d \mid e_i(k) = H] - E[L_i^d \mid e_i(k) = L]\right) = E[p_{+,e_j}] \cdot \frac{p_{+,H} - p_{+,L}}{d}.

And similarly

 E[\bar{L}_i^d \cdot \bar{L}_j^d \mid e_i(k) = H] - E[\bar{L}_i^d \cdot \bar{L}_j^d \mid e_i(k) = L] = (1 - E[p_{+,e_j}]) \cdot \frac{p_{+,L} - p_{+,H}}{d}.

Summing up the above differences, we obtain:

 E[u_i \mid e_i(k) = H] - E[u_i \mid e_i(k) = L] = V_1 \cdot F(c^*) + V_2,

where, collecting the F(c^*)-dependent and constant parts of the differences above,

 V_1 := 2(p_H - p_L)^2 \left[1 - \frac{D}{d}(P_+ - P_-)^2\right], \quad V_2 := (p_H - p_L)(2p_L - 1)\left[1 - \frac{D}{d}(P_+ - P_-)^2\right].

The equilibrium condition is established when the above score difference, scaled by the reward level B, equals the cost threshold c^*: (after re-arranging)

 B[V_1 \cdot F(c^*) + V_2] = c^*. \qquad (7)

When B is chosen such that B V_2 > 0, the LHS of Eqn. (7) is strictly positive at c^* = 0 (as F(0) = 0). We claim that when F is concave, there exists a unique solution: first, the LHS exceeds the RHS at c^* = 0, and when

 B[V_1 \cdot F(\bar{c}) + V_2] \le \bar{c},

where \bar{c} denotes the upper end of the cost support, the concave LHS of Eqn. (7) and the linear RHS intersect exactly once. So this unique intersection point is the unique solution to Eqn. (7); otherwise, we have c^* = \bar{c}, that is, B is large enough so that exerting effort is always the best action to take.
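For intuition, the fixed point of Eqn. (7) can be found numerically. The sketch below is our own illustration (not from the paper): it assumes costs uniform on [0, c_max], so F(c) = c / c_max is (weakly) concave, and iterates the map c ↦ B[V_1 F(c) + V_2] with hypothetical values of B, V_1, V_2:

```python
def solve_threshold(B, V1, V2, c_max, iters=200):
    """Fixed-point iteration for B * (V1 * F(c) + V2) = c, capped at c_max,
    with F the CDF of a uniform cost distribution on [0, c_max]
    (an illustrative choice of F)."""
    F = lambda c: min(max(c / c_max, 0.0), 1.0)
    c = 0.0
    for _ in range(iters):
        c = min(B * (V1 * F(c) + V2), c_max)  # cap: c* = c_max when B is large
    return c

# With B = 1, V1 = 0.5, V2 = 0.2, c_max = 1, F(c) = c, the equation
# 0.5 c + 0.2 = c has the unique solution c* = 0.4.
assert abs(solve_threshold(1.0, 0.5, 0.2, 1.0) - 0.4) < 1e-6
```

When B·V_1 < 1 the map is a contraction, so the iteration converges to the unique intersection point; for large B the cap returns c_max, matching the "effort is always best" regime.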

Also, if a worker reports by reverting the answer, i.e., reporting -L_i, the probability of matching the true label becomes 1 - p_{e_i} < p_{e_i}, which leads to even less utility. So deviating from truthful reporting is not profitable. ∎

## Appendix C Proof of Lemma 6

###### Proof.

This lemma is due to the fact that if everyone else colludes to mislead the learner into believing a wrong price, a particular worker has no incentive to do the same: first, his reported data will not affect his own price in the future; and as a rational worker, he does not care about the prices received by others. Due to the index rule we adopted, a worker can do better than colluding: deviating from the collusion of not exerting effort possibly increases his current-stage payment, when p_L > 1/2 and when his cost is small enough:

 E[B_i(k) \mid e_i = H] - E[B_i(k) \mid e_i = L] = (p_H - p_L)(2 E[p_{e_j}(k)] - 1) = (p_H - p_L)(2p_L - 1) > 0.