# Learning to Incentivize: Eliciting Effort via Output Agreement

Yang Liu and Yiling Chen
Harvard University, Cambridge MA, USA
{yangl,yiling}@seas.harvard.edu
###### Abstract

In crowdsourcing, when there is a lack of verification for contributed answers, output agreement mechanisms are often used to incentivize participants to provide truthful answers when the correct answer is held by the majority. In this paper, we focus on using output agreement mechanisms to elicit effort, in addition to eliciting truthful answers, from a population of workers. We consider a setting where workers have heterogeneous cost of effort exertion and examine the data requester’s problem of deciding the reward level in output agreement for optimal elicitation. In particular, when the requester knows the cost distribution, we derive the optimal reward level for output agreement mechanisms. This is achieved by first characterizing Bayesian Nash equilibria of output agreement mechanisms for a given reward level. When the requester does not know the cost distribution, we develop sequential mechanisms that combine learning the cost distribution with incentivizing effort exertion to approximately determine the optimal reward level.

## 1 Introduction

Our ability to reach an unprecedentedly large number of people via the Internet has enabled crowdsourcing as a practical way for knowledge or information elicitation. For instance, crowdsourcing has been widely used for getting labels for training samples in machine learning. One salient characteristic of crowdsourcing is that a requester often cannot verify or evaluate the collected answers, because either the ground truth doesn’t exist or is unavailable or it is too costly to be practical to verify the answers. This problem is called information elicitation without verification (IEWV) Waggoner and Chen [2014].

In the past decade, researchers have developed a class of economic mechanisms, collectively called the peer prediction mechanisms Prelec [2004]; Miller et al. [2005]; Jurca and Faltings [2006]; Jurca and Faltings [2009]; Witkowski and Parkes [2012a, b]; Radanovic and Faltings [2013]; Frongillo et al. [2015], for IEWV. The goal of most of these mechanisms is to design payment rules such that participants truthfully report their information at a game-theoretic equilibrium. Each of these mechanisms imposes some restriction on the information structure of the participants. Under the restriction, truthful elicitation is then achieved by rewarding a participant according to how his answer compares with those of his peers. Within this class, output agreement mechanisms are the simplest and they are often adopted in practice von Ahn and Dabbish [2004]. In a basic output agreement mechanism, a participant receives a positive payment if his answer is the same as that of a random peer and zero payment otherwise. When the majority of the crowd holds the correct answer, output agreement mechanisms can truthfully elicit answers from the crowd at an equilibrium.

Most of these works on peer prediction mechanisms, with the exception of Dasgupta and Ghosh [2013] and Witkowski et al. [2013], assume that answers of participants are exogenously generated, that is, participants are equipped with their private information. However, in many settings, participants can exert more effort to improve their information and hence the quality of their answers is endogenously determined. Recent experiments Yin and Chen [2015]; Ho et al. [2015] have also shown that the quality of answers can be influenced by the magnitude of contingent payment in settings where answers can be verified.

In this paper, we study eliciting efforts as well as truthful answers in output agreement mechanisms. Taking the perspective of a requester, we ask the question of how to optimally set the payment level in output agreement mechanisms when the requester cares about both the accuracy of elicited answers and the total payment.

Specifically, we focus on binary-answer questions and binary effort levels. We allow workers to have heterogeneous cost of exerting effort. Such a cost is randomly drawn from a distribution that is common knowledge to all participants. We consider two scenarios. In the first scenario, a static setting, the requester is assumed to know the cost distribution of the participants. Her objective is to set the payment level in output agreement mechanisms such that when a game-theoretic equilibrium is reached, her expected utility is maximized. In the second scenario, a dynamic setting, the data requester doesn’t know the cost distribution of the participants but only knows an upper bound of the cost. Here, the requester incorporates eliciting and learning the cost distribution into incentivizing efforts in output agreement mechanisms when she repeatedly interacts with the set of participants over multiple tasks. The ultimate goal of the requester is to learn to set the optimal payment level in this sequential variant of output agreement mechanism for each interaction so that when participants reach a game-theoretic equilibrium of this dynamic game, the data requester minimizes her regret on expected utility over the sequence of tasks.

We summarize our main contributions as follows:

• Since the quality of answers is endogenously determined, a requester’s utility depends on the behavior of participants. Optimizing the payment level requires an understanding of the participants’ behavior. We characterize Bayesian Nash equilibria (BNE) for two output agreement mechanisms with any given level of payment and show that at equilibrium there is a unique threshold effort exertion strategy that gives each worker the highest expected utility, when there is no collusion among workers.

• For the static setting where the requester knows the cost distribution, when the cost distribution satisfies certain conditions, we show that the optimal payment level in the two output agreement mechanisms is a solution to a convex program and hence can be efficiently solved.

• For the dynamic setting where the requester doesn’t know the cost distribution, we design a sequential mechanism that combines eliciting and learning the cost distribution with incentivizing effort exertion in a variant of output agreement mechanism. Our mechanism ensures that participants truthfully report their cost of effort exertion when asked, in addition to following the same strategy on effort exertion and answer reporting as that in the static setting for each task. We further prove a performance guarantee for this mechanism in terms of the requester’s regret on expected utility.

### 1.1 Related work

The literature on peer prediction mechanisms hasn’t addressed costly effort until recently. Dasgupta and Ghosh [2013] and Witkowski et al. [2013] are the two papers that formally introduce costly effort into models of information elicitation without verification. Dasgupta and Ghosh [2013] design a mechanism that incentivizes maximum effort followed by truthful reports of answers in an equilibrium that achieves maximum payoffs for participants. Witkowski et al. [2013] focus on simple output agreement mechanisms, as this paper does. They study the design of payment rules such that only participants whose quality is above a threshold participate and exert effort. Both Dasgupta and Ghosh [2013] and Witkowski et al. [2013] assume that the cost of effort exertion is fixed for all participants and is known to the mechanism designer. This paper studies effort elicitation in output agreement mechanisms but allows participants to have heterogeneous cost of effort exertion drawn from a common distribution. Moreover, we consider a setting where the mechanism designer doesn’t know this cost distribution, which leads to an interesting question of learning to optimally incentivize effort exertion followed by truthful reports of answers in repeated interactions with a group of participants.

Roth and Schoenebeck [2012] and Abernethy et al. [2015] consider strategic data acquisition for estimating the mean and for statistical learning in general, respectively. Neither work considers costly effort, but participants may have stochastic and heterogeneous costs for revealing their data and need to be appropriately compensated. Moreover, both works assume that workers won’t misreport their obtained answers.

Caveats: With output agreement mechanisms, workers can achieve an uninformative equilibrium by colluding, which returns a higher utility for each worker. Our model and current results do not remove this caveat. For the static scenario, it is promising to adopt the method introduced in Dasgupta and Ghosh [2013] to rule out such a case; nevertheless, we conjecture that ruling out collusion in a dynamic setting with returning workers is much more challenging. This merits future study.

## 2 Problem formulation

### 2.1 Our mechanisms

A data requester has a set of tasks for which she wants to obtain answers from a crowd of candidate workers. In this paper, we consider binary-answer tasks, for example, identifying whether a picture of cells contains cancer cells. The requester assigns each task to $N$ randomly selected workers, with $N$ being potentially much less than the number of candidate workers (we assume $N$ is fixed, though how to optimally choose $N$ could be an interesting future direction). Such a redundant assignment strategy, when combined with some aggregation method (e.g. majority voting), has been found effective in obtaining accurate answers Sheng et al. [2008]; Liu and Liu [2015].

The requester cannot verify the correctness of contributed answers for a task, either because ground truth is not available or verification is too costly and defies the purpose of crowdsourcing. Thus, in addition to a base payment, each worker is rewarded with a contingent bonus that is determined by how his answer compares with those of other workers for completing a task. Specifically:

1. The requester assigns a task to a randomly selected subset $U$ of workers, where $|U| = N$. She announces a base payment $b$ and a bonus $B$, as well as the criterion for receiving the bonus. The criterion is specified by an output agreement mechanism, which we will introduce shortly.

2. Each worker independently submits his answer to the requester.

3. After collecting the answers, the requester pays the base payment $b$ to every worker who has submitted an answer and the bonus $B$ to those who met the specified criterion.

The criterion for receiving the bonus is specified by an output agreement mechanism. Output agreement is a term introduced by von Ahn and Dabbish [2008] to capture the idea of “rewarding agreement” in their image labeling game, the ESP game von Ahn and Dabbish [2004]. We define two variants of output agreement mechanisms:

#### Peer output agreement (PA):

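For each worker $i$, the data requester randomly selects a reference worker $j \in U$ with $j \neq i$. If $L_i = L_j$, where $L_i$ denotes worker $i$’s reported answer, worker $i$ receives bonus $B$. Note that worker $j$’s reference worker could be different from $i$.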

#### Group output agreement (GA):

For each worker , the data requester compares with the majority answer of the rest of the workers, , where if , if and if . If , worker receives bonus .
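The two bonus criteria can be sketched in a few lines of Python. This is only an illustration under assumptions that are not stated in the text above: answers are encoded as 0/1, the function names `pa_bonus` and `ga_bonus` are hypothetical, and a tie in the majority vote is broken by a fair coin flip.

```python
import random


def pa_bonus(answers, i, bonus, rng):
    """Peer output agreement: worker i is compared with one random peer j != i."""
    peers = [j for j in range(len(answers)) if j != i]
    j = rng.choice(peers)  # reference worker, drawn uniformly from the others
    return bonus if answers[i] == answers[j] else 0.0


def ga_bonus(answers, i, bonus, rng):
    """Group output agreement: worker i is compared with the majority of the others."""
    others = [a for j, a in enumerate(answers) if j != i]
    ones = sum(others)
    if 2 * ones > len(others):
        majority = 1
    elif 2 * ones < len(others):
        majority = 0
    else:
        # Assumed tie-breaking rule (illustrative only): fair coin flip.
        majority = rng.choice([0, 1])
    return bonus if answers[i] == majority else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    answers = [1, 1, 1, 0]
    print([ga_bonus(answers, i, 2.0, rng) for i in range(len(answers))])
```

Note that under PA the reference worker is re-drawn for every worker, so two workers need not be each other's reference.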

### 2.2 Agent models

A worker can decide how much effort to exert to complete a task, and the quality of his answer stochastically depends on his chosen effort level. Specifically, a worker can choose to either exert or not exert effort. If a worker exerts effort, then with probability $P_H$ his answer is correct. If a worker does not exert effort, he will provide the correct answer with probability $P_L$, where $P_L < P_H$. We further assume $P_L \geq 0.5$, that is, when no effort is exerted the worker can at least do as well as a random guess. This assumption is also used by Dasgupta and Ghosh [2013] and Karger et al. [2011, 2013]. For now, we assume $P_H$ and $P_L$ are the same for all workers.

Since workers can choose their effort level, the quality of an answer is endogenously determined. Let $e_i \in \{0, 1\}$ represent the chosen effort level of worker $i$, with $e_i = 0$ corresponding to not exerting effort and $e_i = 1$ corresponding to exerting effort. The accuracy of worker $i$ can be represented as

$$p_i(e_i) = P_H\, e_i + P_L\, (1 - e_i).$$

Workers have heterogeneous abilities, which are reflected by their cost of exerting effort. When worker $i$ doesn’t exert effort, he incurs zero cost. A cost of $c_i$ is incurred if worker $i$ chooses to exert effort on a task. $c_i$ is randomly generated according to a distribution with pdf $f(\cdot)$ and cdf $F(\cdot)$ for each (worker, task) pair. We further assume this distribution stays the same across all workers and all tasks (although the realization for each (worker, task) pair can be very different), and that it has a bounded support $[0, c_{\max}]$. Moreover, we enforce the following assumption on $F(\cdot)$:

###### Assumption 2.1.

$F(\cdot)$ is strictly concave on $[0, c_{\max}]$.

This assumption states that larger costs are less likely, i.e. the probability density is decreasing in the cost. Several common distributions, e.g. exponential and standard normal (positive side), satisfy this assumption. Throughout this paper, we assume $F(\cdot)$ is common knowledge among all workers (in practice, each worker can estimate such a distribution based on his past experience). Nevertheless, each realized cost is private information, that is, each worker observes his own realized cost $c_i$, but not those of others. In Section 3, we assume the requester also has full knowledge of $F(\cdot)$, but we relax this assumption in Section 4.

Given that the cost of not exerting effort is zero, the positive base payment $b$ ensures that every worker will provide an answer for a task assigned to him. In this paper, we focus on understanding how to determine the bonus $B$ in output agreement mechanisms to better incentivize effort. The base payment doesn’t enter our analysis directly, but it allows us to not worry about workers’ decisions on participation. When reporting their answers to the data requester, workers can choose to report truthfully or to mis-report. Denote this decision variable for each worker $i$ as $r_i \in \{0, 1\}$, where $r_i = 1$ represents worker $i$ truthfully reporting his answer, and $r_i = 0$ represents worker $i$ mis-reporting (flipping the answer in our case). Then the accuracy of each worker $i$’s report is a function of $(e_i, r_i)$:

$$p_i(e_i, r_i) = p_i(e_i)\, r_i + \bigl(1 - p_i(e_i)\bigr)(1 - r_i).$$

When each worker $j$ takes actions $(e_j, r_j)$, we denote the probability that worker $i$ receives the bonus as $P_{i,B}(\{(e_j, r_j)\}_j)$. In the PA mechanism, this quantity is

$$P_{i,B}(\{(e_j, r_j)\}_j) = \frac{\sum_{j \neq i} P(L_i = L_j)}{N - 1}.$$

In the GA mechanism, it is the probability that worker $i$’s report matches the majority answer of the other workers. Then, the utility for worker $i$ is:

$$u_i(\{(e_j, r_j)\}_j) = b - e_i c_i + B \cdot P_{i,B}(\{(e_j, r_j)\}_j).$$
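To make the quantities above concrete, the following sketch evaluates the report accuracy, the PA bonus probability, and a worker's utility for given effort and reporting decisions. It assumes that reports are independent across workers given the true answer, so that two workers with report accuracies $a_i$ and $a_j$ agree with probability $a_i a_j + (1 - a_i)(1 - a_j)$; the function names are hypothetical.

```python
def report_accuracy(p_effort, r):
    """Accuracy of a report: p if reporting truthfully (r=1), 1-p if flipping (r=0)."""
    return p_effort * r + (1.0 - p_effort) * (1.0 - r)


def pa_bonus_probability(acc, i):
    """Average over j != i of P(L_i = L_j), assuming independent reports."""
    n = len(acc)
    total = sum(
        acc[i] * acc[j] + (1.0 - acc[i]) * (1.0 - acc[j])
        for j in range(n)
        if j != i
    )
    return total / (n - 1)


def worker_utility(b, B, e_i, c_i, p_bonus):
    """u_i = b - e_i * c_i + B * P_{i,B}."""
    return b - e_i * c_i + B * p_bonus


if __name__ == "__main__":
    P_H, P_L = 0.9, 0.6
    # Worker 0 exerts effort, workers 1 and 2 do not; everyone reports truthfully.
    acc = [report_accuracy(P_H, 1), report_accuracy(P_L, 1), report_accuracy(P_L, 1)]
    p0 = pa_bonus_probability(acc, 0)
    print(p0, worker_utility(b=0.1, B=1.0, e_i=1, c_i=0.2, p_bonus=p0))
```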

### 2.3 Requester model

The data requester has utility function $U_D(B)$, which in theory can be of various forms balancing accuracy of elicited answers and total budget spent. In this paper, we assume that the requester uses majority voting to aggregate elicited answers and has utility function

$$U_D(B) = P_c(N, B) - bN - B\, N_e(B),$$

where $P_c(N, B)$ is the probability that the majority answer is correct, and $N_e(B)$ is the number of workers who receive the bonus. The data requester’s goal is to find a $B^*$ s.t.

$$B^* \in \arg\max_{B \in \mathbb{R}_+} \; P_c(N, B) - bN - B\, \mathbb{E}[N_e(B)]. \tag{1}$$

Notice both $P_c(N, B)$ and $N_e(B)$ depend on workers’ strategies for effort exertion and answer reporting. The equilibrium analysis in the next section will help us define these quantities rigorously. The data requester then hopes to choose a reward level that maximizes the expected utility at an equilibrium.

## 3 Optimal bonus strategy with known cost distribution

In this section we set out to find the optimal bonus strategy when the data requester knows workers’ cost distribution. Because the requester’s utility depends on the behavior of workers, we first characterize symmetric Bayesian Nash Equilibria (BNE) for the two output agreement mechanisms for an arbitrary bonus level $B$. Then, based on workers’ equilibrium strategies, we show the optimal $B$ can be calculated efficiently for certain cost distributions. Note that due to the independence of tasks, this is a static setting and we only need to perform the analysis for a single task.

### 3.1 Equilibrium characterization

For any given task, we have a Bayesian game among workers in . A worker’s strategy in this game is a tuple where specifies the effort level for worker when his realized cost is and gives the reporting strategy for the chosen effort level, with representing reporting truthfully and representing misreporting.

We first argue that at any Bayesian Nash equilibrium (BNE) of the game, the effort exertion strategy must be a threshold function. That is, there is a threshold $c^*$ such that a worker exerts effort whenever his realized cost is at most $c^*$ and does not exert effort whenever it is above $c^*$. The reason is as follows: suppose at a BNE worker $i$ exerts effort with cost $c_i$. Since the other workers’ outputs do not depend on $c_i$ (due to the independence of reporting across workers), worker $i$’s chance of getting a bonus will not change when he has a cost $c'_i < c_i$, and he only obtains a higher expected utility by exerting effort. This allows us to focus on threshold strategies for effort exertion. We restrict our analysis to symmetric BNE where every worker has the same threshold for effort exertion. In the rest of the paper, we identify a worker’s effort exertion strategy with its threshold $c^*$, and we pay particular attention to the reporting strategy of always reporting truthfully at either effort level.

#### PA:

We have the following results for the PA mechanism.

###### Lemma 3.1.

The strategy profile is a symmetric BNE for the PA game if

$$2(P_H - P_L)\, F(c^*) + 2P_L - 1 = \frac{c^*}{(P_H - P_L)\, B}. \tag{2}$$
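One way to see where Eqn. (2) comes from is the following indifference argument. It is a sketch under two assumptions that are ours rather than part of the lemma statement: all other workers report truthfully and use the threshold $c^*$, and reports are independent across workers given the true answer. The symbol $\bar{p}$ is introduced here only for illustration. A randomly chosen peer is then correct with probability

$$\bar{p}(c^*) = P_L + (P_H - P_L)\, F(c^*),$$

and a worker whose own accuracy is $p$ matches that peer with probability $p\,\bar{p} + (1 - p)(1 - \bar{p})$. Exerting effort raises the worker's accuracy from $P_L$ to $P_H$, so the expected gain in bonus is

$$B\,\Bigl[P_H \bar{p} + (1 - P_H)(1 - \bar{p})\Bigr] - B\,\Bigl[P_L \bar{p} + (1 - P_L)(1 - \bar{p})\Bigr] = B\,(P_H - P_L)\,(2\bar{p} - 1).$$

A worker whose cost equals the threshold is indifferent between exerting and not exerting effort, i.e. $B\,(P_H - P_L)\,\bigl(2\bar{p}(c^*) - 1\bigr) = c^*$. Substituting $\bar{p}(c^*)$ and dividing by $(P_H - P_L)B$ gives

$$2(P_H - P_L)\,F(c^*) + 2P_L - 1 = \frac{c^*}{(P_H - P_L)\,B}.$$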

Denote , the minimum bonus level needed to induce full effort exertion. With above lemma, we have the following equilibrium characterization.

###### Theorem 3.2.

When , there always exists a unique threshold such that is a symmetric BNE for the PA game:

• When , .

• Otherwise, $c^*$ is the unique solution to Eqn. (2).
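The threshold in Eqn. (2) can be found numerically. The sketch below uses bisection and is only an illustration: the truncated exponential cost distribution, the parameter values, and the function names are assumptions, and the code presumes $P_L > 0.5$ so that the left-hand side of Eqn. (2) is strictly positive at $c^* = 0$.

```python
import math


def truncated_exp_cdf(c, rate, c_max):
    """CDF of an exponential distribution truncated to [0, c_max] (strictly concave)."""
    c = min(max(c, 0.0), c_max)
    return (1.0 - math.exp(-rate * c)) / (1.0 - math.exp(-rate * c_max))


def pa_threshold(B, p_h, p_l, cdf, c_max, tol=1e-10):
    """Solve Eqn. (2) for c* by bisection; return c_max if full effort is induced."""
    gap = p_h - p_l

    def g(c):
        # Expected bonus gain from effort minus the cost c; a root of g solves Eqn. (2).
        return B * gap * (2.0 * gap * cdf(c) + 2.0 * p_l - 1.0) - c

    if g(c_max) >= 0.0:
        return c_max  # even the most expensive worker prefers to exert effort
    lo, hi = 0.0, c_max  # g(lo) > 0 > g(hi) when p_l > 0.5 and B > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    cdf = lambda c: truncated_exp_cdf(c, rate=2.0, c_max=1.0)
    for B in (0.5, 2.0, 5.0):
        print(B, pa_threshold(B, p_h=0.9, p_l=0.6, cdf=cdf, c_max=1.0))
```

Because $F$ is concave, the function `g` is concave as well, which is why a single sign change on $[0, c_{\max}]$ is enough for bisection to locate the unique interior root.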

This theorem implies that among all symmetric BNE, the effort exertion strategy is unique for a given bonus level $B$ when $P_L > 0.5$. When $P_L = 0.5$, we can prove similar results for the existence of a threshold $c^*$ that, combined with truthful reporting, is a symmetric BNE. But this is not a unique effort exertion strategy. In fact, for every bonus level we have a set of trivial symmetric BNE, in which no one exerts effort, combined with any reporting strategy. Nonetheless, this trivial equilibrium returns strictly less expected utility for each worker.

We would like to note that always mis-reporting, combined with the same threshold for effort exertion as in Theorem 3.2, is also a symmetric BNE under the condition of Theorem 3.2. This equilibrium gives workers the same utility as the equilibrium in Theorem 3.2. This phenomenon has also been observed by Dasgupta and Ghosh [2013] and Witkowski et al. [2013]. Dasgupta and Ghosh [2013] argue that always mis-reporting is risky, and workers may prefer breaking the tie towards always truthful reporting.

#### GA:

For GA, directly calculating the probability term for matching a majority vote is not easy; but if we adopt a Chernoff-type approximation for it, and suppose such an approximation is common knowledge (this is not entirely unreasonable, as in practice such Chernoff-type bounds are often used to estimate this majority-voting probability term), we can prove similar results.

Similar to Lemma 3.1, we can show that the strategy profile is a symmetric BNE for the GA game if

$$1 - 2\bigl[(\alpha - 1)\, F(c^*) + 1\bigr]^{N-1} = \frac{c^*}{B\,(P_H - P_L)}, \tag{3}$$

where .

Denote we have:

###### Theorem 3.3.

When , there always exists a unique threshold such that is a symmetric BNE for the GA game:

• When , .

• Otherwise, $c^*$ is the unique solution to Eqn. (3).

Moreover, we can show that the reward level and the total expected payment are lower in GA than in PA for eliciting the same level of effort.

###### Lemma 3.4.

Denote the smallest bonus level corresponding to an arbitrary equilibrium threshold for PA and GA as and respectively. Then when is sufficiently large (e.g., ). Furthermore, the total payment in GA is lower than that in PA.

This result also implies that adopting GA will lead to a higher requester utility.

#### Heterogeneity of $P_L$ and $P_H$.

So far we have assumed that and are the same for all workers. If workers have heterogeneous accuracy that are generated from some distribution with mean , we can show that the above results hold in a similar way, with more involved arguments.

### 3.2 Optimal solution for data requester

Now consider the optimization problem stated in Eqn. (1) for the requester’s perspective. For each , denote as the corresponding strategy profile at equilibrium. can then be calculated based on (controlling how much effort can be induced), and . Same can be done for . Denote the optimization problem in (1) with above calculation as . Directly investigating the two objective functions may be hard. We seek to relax the objectives. First of for PA, we will be omitting the term as when is only slightly larger than 0.5, this quantity is close to 0. Also for both PA and GA, we again use the Chernoff type of approximation for calculating . We further introduce three conditions: (i) is twice differentiable and . (ii) is convex on . (iii) satisfies that exists and being non-negative.

###### Lemma 3.5.

If (i) and (ii) hold, the objective function of is concave if we adopt PA. When (ii) and (iii) hold, the objective function of is concave if we adopt GA.

For example, exponential distribution (exp()) for satisfies (i)&(ii) for PA; and exp() for satisfies (ii)&(iii) for GA. It is worth to note above results hold for a wide range of other s: for instance the ones with a linear combination of and .
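As a rough numerical companion to this section, the sketch below evaluates the requester's expected utility under PA on a grid of bonus levels and picks the best grid point. It is not the convex program discussed above: it computes the majority-vote accuracy exactly from the binomial distribution instead of using the Chernoff-type relaxation, it assumes truthful reporting at the threshold equilibrium of Eqn. (2), and the truncated exponential cost distribution, the fair-coin tie-break, and all function names and parameter values are illustrative assumptions.

```python
import math


def truncated_exp_cdf(c, rate=2.0, c_max=1.0):
    """Assumed cost CDF: exponential truncated to [0, c_max]."""
    c = min(max(c, 0.0), c_max)
    return (1.0 - math.exp(-rate * c)) / (1.0 - math.exp(-rate * c_max))


def pa_threshold(B, p_h, p_l, cdf, c_max, tol=1e-10):
    """Equilibrium threshold from Eqn. (2), found by bisection."""
    gap = p_h - p_l

    def g(c):
        return B * gap * (2.0 * gap * cdf(c) + 2.0 * p_l - 1.0) - c

    if g(c_max) >= 0.0:
        return c_max
    lo, hi = 0.0, c_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def majority_correct_prob(n, p):
    """P(majority of n independent votes is correct); ties count as a fair coin."""
    total = 0.0
    for k in range(n + 1):
        pk = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        if 2 * k > n:
            total += pk
        elif 2 * k == n:
            total += 0.5 * pk
    return total


def requester_utility_pa(B, b, n, p_h, p_l, cdf, c_max):
    """Expected version of U_D(B) = P_c(N, B) - b N - B N_e(B) under PA."""
    c_star = pa_threshold(B, p_h, p_l, cdf, c_max)
    p_bar = p_l + (p_h - p_l) * cdf(c_star)  # accuracy of a random worker
    match = p_bar**2 + (1.0 - p_bar) ** 2  # P(two random workers agree)
    return majority_correct_prob(n, p_bar) - b * n - B * n * match


def best_bonus_on_grid(grid, b, n, p_h, p_l, cdf, c_max):
    """Grid point with the highest expected requester utility."""
    return max(grid, key=lambda B: requester_utility_pa(B, b, n, p_h, p_l, cdf, c_max))


if __name__ == "__main__":
    grid = [0.01 * k for k in range(501)]
    B_hat = best_bonus_on_grid(grid, 0.0, 5, 0.9, 0.6, truncated_exp_cdf, 1.0)
    print(B_hat, requester_utility_pa(B_hat, 0.0, 5, 0.9, 0.6, truncated_exp_cdf, 1.0))
```

A grid search is used here only because it makes no assumption on the shape of the objective; when the conditions of Lemma 3.5 hold, a concave maximization routine would be the natural replacement.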

## 4 Learning the optimal bonus strategy

In this section we propose a sequential mechanism for learning the optimal bonus strategy, when the requester has no prior knowledge of the cost distribution but only knows $c_{\max}$. This assumption can be further relaxed by assuming knowledge of an upper bound of $c_{\max}$ instead of knowing $c_{\max}$ precisely. Also, as in the last section, $P_L$ and $P_H$ are known. In reality these two quantities can be estimated through a learning procedure by repeated sampling and output matching as shown in Liu and Liu [2015], via setting bonus level and (calculating only requires the knowledge of , not ) respectively (to induce effort level corresponding to ). In this work we focus on learning the cost functions, which is a more challenging task when the workers are strategic. We are in a dynamic setting where the requester sequentially asks workers to complete a set of tasks. In our mechanisms, the requester can ask workers to report their costs of effort exertion for a task and, based on their reports, decide on the bonus level for the current task and for future tasks in an output-agreement-style mechanism.

#### (P1):

We start our discussions with a simpler case. When asked to report their cost, workers maximize their collected utility from a set of data elicitation tasks and are not aware of the potential influence of their reports on calculating optimal bonus levels for any future tasks. The data requester’s goal is to elicit cost data to estimate cost distribution and then the optimal bonus level , such that when is applied to a newly arrived task we can bound where is the optimal bonus level if the cost distribution is known.

#### (P2):

We then consider the case when workers are forward looking and are aware that their reported cost on a task will be utilized to calculate the optimal bonus strategy for future tasks. We form a sequential learning setting, where we separate the stages for task assignment into two types: one for data elicitation, which we also refer to as exploration, and the other for utility maximization, which we refer to as exploitation. The data requester’s objective in this case is to minimize the regret defined as follows:

$$R(T) = \sum_{t=1}^{T} \mathbb{E}\,\Bigl| U_D\bigl(\{B_i(t)\}_{i \in U}\bigr) - U_D\bigl(B^*_{GA}\bigr) \Bigr|, \tag{4}$$

where $B^*_{GA}$ is the optimal bonus level for GA when the cost distribution is known (since GA is more cost efficient, we define the regret w.r.t. the optimal utility that can be obtained via GA), and $\{B_i(t)\}_{i \in U}$ is the bonus bundle offered at time $t$. Note that $U_D$ is mechanism dependent: both $P_c$ and $N_e$ depend on not only the bonus level, but also the equilibrium behavior in a particular mechanism.
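The regret in Eqn. (4) is a sum of per-round utility gaps. A minimal sketch of the bookkeeping, assuming the per-round expected utilities have already been computed or estimated elsewhere (the function name and the numbers below are purely illustrative):

```python
def cumulative_regret(round_utilities, optimal_utility):
    """Sum over rounds of |U_D(offered bonuses at t) - U_D(B*_GA)|, as in Eqn. (4)."""
    return sum(abs(u - optimal_utility) for u in round_utilities)


if __name__ == "__main__":
    # Hypothetical expected utilities over T = 4 rounds and a hypothetical optimum.
    utilities = [0.50, 0.70, 0.78, 0.80]
    print(cumulative_regret(utilities, optimal_utility=0.80))
```

A mechanism learns successfully in this sense when the cumulative regret grows sublinearly in $T$, so that the average per-round gap vanishes.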

For simplicity of presentation, throughout this section we consider : this is to remove the ambiguity introduced in by the trivial equilibrium . Also we assume with the same expected utility, workers will favor truthful reporting .

### 4.1 (M_Crowd) for (P1)

Suppose the data requester allocates tasks to elicit the cost data sequentially, and exactly one of them is assigned to the workers at each time step . For simplicity of analysis we fix the set of workers we will be assigning tasks to. Denote worker ’s realized cost for the -th task as . We propose mechanism (M_Crowd):

#### Remarks:

1. When a worker, say worker , reports higher than the selected threshold, his probability of receiving a bonus will be calculated using the following experiment, which is independent of his output: suppose out of workers, there are of them reported lower than . Then we will “simulate” workers’ reports with the following coin-toss procedure: toss -coin and -coin. Assign a -coin toss to worker , select a reference answer from the rest of the tosses, and compare their results. If there is a match, worker will receive a bonus. Simply put, the probability of receiving a bonus can be calculated as the matching probability in the above experiment.

2. Since we have characterized the equilibrium equation for PA with a clean and simple form, this set of equilibria is good for eliciting workers’ data.

3. After estimating the bonus level for each worker , the data requester will add a positive perturbation term to each of them. This is mainly to remove the bias introduced by (i) imperfect estimation due to the finite number of collected samples, and (ii) the (possible) mis-reports from workers. This term will become clear later in the stated results.

4. The fact that we can use collected cost data to estimate depends crucially on the assumption that the cost distribution is the same for all tasks.
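The coin-toss experiment in the first remark can be evaluated in closed form. The sketch below rests on our reading of that remark, which is an assumption because the symbols are not recoverable from the text: among the $N$ workers, $k$ reported below the threshold and are represented by $P_H$-coins, the remaining $N - k$ are represented by $P_L$-coins, the worker who reported above the threshold is assigned one of the $P_L$-coin tosses, and a reference toss is drawn uniformly from the other $N - 1$ tosses. Heads is interpreted as a correct answer, so two tosses match when both are heads or both are tails. The function names are hypothetical.

```python
def match_prob(a, b):
    """Probability that two independent coins with head probabilities a and b agree."""
    return a * b + (1.0 - a) * (1.0 - b)


def simulated_bonus_probability(n, k, p_h, p_l):
    """Matching probability for a worker who reported above the threshold.

    n: total number of workers on the task.
    k: number of workers who reported a cost below the threshold (k <= n - 1,
       since the worker under consideration reported above it).
    """
    if not 0 <= k <= n - 1:
        raise ValueError("k must satisfy 0 <= k <= n - 1")
    high_refs = k  # reference tosses made with a P_H-coin
    low_refs = n - 1 - k  # reference tosses made with a P_L-coin
    total = high_refs * match_prob(p_l, p_h) + low_refs * match_prob(p_l, p_l)
    return total / (n - 1)


if __name__ == "__main__":
    print(simulated_bonus_probability(n=5, k=3, p_h=0.9, p_l=0.6))
```

The point of the experiment is that this probability does not depend on the worker's actual output, so a worker who reports a high cost has no incentive to exert effort on that task.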

### 4.2 Equilibrium analysis for (M_Crowd)

We present the main results for characterizing workers’ cost data reporting strategies at an equilibrium. Because of the independence of tasks, the effort exertion at each stage is essentially a static game. For cost reporting, even though we are in a dynamic game setting, we again adopt BNE as our general solution concept. It may sound more intuitive to use Perfect Bayesian equilibrium (PBE) to define our solution in a dynamic setting, but we argue that BNE and PBE do not make a conceptual difference in our case. Note that workers’ decisions on effort exertion are not directly observable by others, and the only signals a worker can use to update his belief about others’ effort exertion actions are the offered bonus levels. However, due to the stochastic nature of the calculated bonus levels, any realization is on the equilibrium path with positive probability (though it could be arbitrarily small). Simply put, in our case there is no off-equilibrium path information set.

Let $u_i^t$ denote worker $i$’s utility at time $t$. We will adopt approximate BNE as our solution concept, which is defined as follows:

###### Definition 4.1.

A set of reporting strategies $\{\tilde{c}_i\}_i$ is $\epsilon$-BNE if for any worker $i$ and any alternative reporting strategy $\tilde{c}'_i$, we have

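$$\sum_{t=1}^{T} \mathbb{E}\Bigl[\max_{e_i, r_i} u_i^t(\tilde{c}_i, \tilde{c}_{-i})\Bigr] \Big/ T \;\geq\; \sum_{t=1}^{T} \mathbb{E}\Bigl[\max_{e_i, r_i} u_i^t(\tilde{c}'_i, \tilde{c}_{-i})\Bigr] \Big/ T - \epsilon.$$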

Here we explicitly denote the expected utility for each worker as a function of . Note this is rather a short-hand notation, as s also depend on the effort exertion and reporting strategies. The term allows worker to optimize his effort exertion and reporting procedure based on their cost reporting.

###### Theorem 4.2.

With (M_Crowd), set , let being arbitrarily small, there exists a -BNE for each worker with reporting at time such that

$$\max\{c_i(t) - \epsilon_1(t),\, 0\} \;\leq\; \tilde{c}_i(t) \;\leq\; \min\{c_i(t) + \epsilon_2(t),\, c_{\max}\},$$

where .

The effort exertion game at each step resembles the static game introduced in Section 3, with the following difference: instead of the workers whose realized cost is below the threshold exerting effort, now it is the workers whose reported cost is below the threshold who exert effort. This is mainly due to the addition of the perturbation term to the estimated bonus level. Meanwhile, the mechanism excludes workers who reported higher than the threshold from exerting effort by offering the bonus with a probability that is independent of the worker’s output. Nevertheless, we can bound the fraction of workers whose actions are different in the above two games. We provide intuition for the proof.

#### Over-reporting

Over-reporting by worker $i$ will mislead the data requester into believing that finishing the tasks costs more than it actually does, so this can lead to a higher estimate of the bonus level. This will induce more effort from other workers, which will in turn increase the utility for worker $i$. We bound the extra effort exerted by other workers, where we utilize the second step of (M_Crowd), in which a worker who reported higher than the threshold is excluded from exerting effort (since his probability of winning a bonus is independent of his report), and the decoupling step (Step 3), in which each worker’s bonus level is calculated only over data collected from others. On the other hand, over-reporting will decrease the chance of receiving the bonus (by being excluded from effort exertion).

#### Under-reporting

When a worker under-reports, he will gain by having a higher bonus in expectation – this is due to (i) the fact that the threshold is randomly determined, and (ii) we added positive perturbation to estimated bonus level. The loss is due to the fact that with under-reporting, with positive probability, exerting effort costs more than the threshold cost. The regulation for under-reporting mainly comes from the thresholding step of (M_Crowd).

### 4.3 Performance of (M_Crowd)

With this set of collected data, we bound the performance loss in offering optimal bonus level for an incoming task (or task ). Suppose we adopt GA, where the optimal bonus level with known cost distribution is given by , and the estimated optimal solution is given by . We will have the following lemma: (similar results hold for PA)

###### Lemma 4.3.

With probability being at least ,

$$\bigl|U_D(\tilde{B}^*_{GA}) - U_D(B^*_{GA})\bigr| = o\!\left(\sqrt{\frac{\log(2/\eta)}{2NT}} + \sqrt{\frac{\log T}{T}}\right).$$

When we chose , the above regret term is roughly on the order of  .

### 4.4 (RM_Crowd) for (P2)

We propose a (RM_Crowd) for (P2):

#### Remarks:

1. The dependence on is to simplify the presentation and our algorithm design. This can be easily extended to a -independent one.

2. At exploitation phases we assume there exists a solver that can find the optimal solution with a noisy estimation of . In practice search heuristics can help achieve the goal.

3. We adopted different bonus mechanisms for different phases. When we calculate the bonus level according to a particular mechanism (PA or GA), we will also adopt it for evaluating workers’ answers.

4. When using GA, the independent probability for giving out bonus when a report is higher than the threshold will be adjusted to a probability of matching a majority voting of the experiment we presented for (M_Crowd).

###### Theorem 4.4.

With (RM_Crowd), set , let and being arbitrarily small, there exists a -BNE for each worker with reporting at time such that

$$\max\{c_i(t) - \epsilon_1(t),\, 0\} \;\leq\; \tilde{c}_i(t) \;\leq\; \min\{c_i(t) + \epsilon_2(t),\, c_{\max}\},$$

where .

We have similar observations for the effort exertion game in (RM_Crowd) as we made for (M_Crowd). Further we prove the following regret results:

###### Lemma 4.5.

Order-wise, the best is when , which leads to a bound on the order of .

## 5 Conclusion

In this paper we focus on using output agreement mechanisms to elicit effort, in addition to eliciting truthful answers, from crowd-workers when there is no verification of their outputs. Workers’ costs of exerting effort are stochastic and heterogeneous. We characterize the symmetric BNE for workers’ effort exertion and reporting strategies for a given bonus level, and show that the data requester’s optimal bonus strategy at equilibrium is a solution to a convex program for certain cost distributions. Then a learning procedure is introduced to help the requester learn the optimal bonus level via eliciting cost data from strategic workers. We bound the mechanism’s performance loss w.r.t. offering the best bonus bundle, compared to the case when workers’ cost distribution is known a priori.

## References

• Abernethy et al. [2015] Jacob Abernethy, Yiling Chen, Chien-Ju Ho, and Bo Waggoner. Actively purchasing data for learning. 2015.
• Dasgupta and Ghosh [2013] Anirban Dasgupta and Arpita Ghosh. Crowdsourced judgement elicitation with endogenous proficiency. In Proceedings of the 22nd international conference on World Wide Web, pages 319–330. International World Wide Web Conferences Steering Committee, 2013.
• Frongillo et al. [2015] Rafael M. Frongillo, Yiling Chen, and Ian A. Kash. Elicitation for Aggregation. In Proceedings of the 29th Conference on Artificial Intelligence (AAAI’15), 2015.
• Ho et al. [2015] Chien-Ju Ho, Aleksandrs Slivkins, Siddharth Suri, and Jennifer Wortman Vaughan. Incentivizing high quality crowdwork. In Proceedings of the 24th International Conference on World Wide Web, pages 419–429. International World Wide Web Conferences Steering Committee, 2015.
• Jurca and Faltings [2006] R. Jurca and B. Faltings. Minimum payments that reward honest reputation feedback. In Proceedings of the 7th ACM conference on Electronic commerce, EC ’06, pages 190–199. ACM, 2006.
• Jurca and Faltings [2009] R. Jurca and B. Faltings. Mechanisms for making crowds truthful. Journal of Artificial Intelligence Research, 34(1):209, 2009.
• Karger et al. [2011] David R Karger, Sewoong Oh, and Devavrat Shah. Iterative learning for reliable crowdsourcing systems. In Advances in neural information processing systems, pages 1953–1961, 2011.
• Karger et al. [2013] David R Karger, Sewoong Oh, and Devavrat Shah. Efficient crowdsourcing for multi-class labeling. In ACM SIGMETRICS Performance Evaluation Review, volume 41, pages 81–92. ACM, 2013.
• Liu and Liu [2015] Yang Liu and Mingyan Liu. An online learning approach to improving the quality of crowd-sourcing. In Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’15, pages 217–230, New York, NY, USA, 2015. ACM.
• Miller et al. [2005] Nolan Miller, Paul Resnick, and Richard Zeckhauser. Eliciting informative feedback: The peer-prediction method. Management Science, 51(9):1359–1373, 2005.
• Prelec [2004] Dražen Prelec. A bayesian truth serum for subjective data. Science, 306(5695):462–466, 2004.
• Radanovic and Faltings [2013] G. Radanovic and B. Faltings. A robust bayesian truth serum for non-binary signals. In Proceedings of the 27th AAAI Conference on Artificial Intelligence, AAAI ’13, 2013.
• Roth and Schoenebeck [2012] Aaron Roth and Grant Schoenebeck. Conducting truthful surveys, cheaply. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 826–843. ACM, 2012.
• Sheng et al. [2008] Victor S Sheng, Foster Provost, and Panagiotis G Ipeirotis. Get another label? improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 614–622. ACM, 2008.
• Shnayder et al. [2016] V. Shnayder, A. Agarwal, R. Frongillo, and D. C. Parkes. Informed Truthfulness in Multi-Task Peer Prediction. ArXiv e-prints, March 2016.
• von Ahn and Dabbish [2004] Luis von Ahn and Laura Dabbish. Labeling images with a computer game. In Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’04, pages 319–326. ACM, 2004.
• von Ahn and Dabbish [2008] Luis von Ahn and Laura Dabbish. Designing games with a purpose. Communications of the ACM, 51(8):58–67, 2008.
• Waggoner and Chen [2014] Bo Waggoner and Yiling Chen. Output agreement mechanisms and common knowledge. In Proceedings of the 2nd AAAI Conference on Human Computation and Crowdsourcing, HCOMP’14, 2014.
• Witkowski and Parkes [2012a] J. Witkowski and D. Parkes. A robust bayesian truth serum for small populations. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, AAAI ’12, 2012.
• Witkowski and Parkes [2012b] J. Witkowski and D.C. Parkes. Peer prediction without a common prior. In Proceedings of the 13th ACM Conference on Electronic Commerce, EC ’12, pages 964–981. ACM, 2012.
• Witkowski et al. [2013] Jens Witkowski, Yoram Bachrach, Peter Key, and David C. Parkes. Dwelling on the Negative: Incentivizing Effort in Peer Prediction. In Proceedings of the 1st AAAI Conference on Human Computation and Crowdsourcing (HCOMP’13), 2013.
• Yin and Chen [2015] Ming Yin and Yiling Chen. Bonus or Not? Learn to Reward in Crowdsourcing. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI’15), 2015.

## Appendix A Proof for Lemma 3.1

###### Proof.

We denote worker $i$’s expected utility for each task as when effort exertion and reporting strategy has been adopted by the crowd. And we shorthand it as . Also we denote by $u_i(e_i, r_i; c)$ worker $i$’s utility when a threshold policy $c$ for effort exertion is adopted. (Footnote 7: Throughout the proof we use the terms utility and bonus interchangeably for workers’ payoff.)

First we show at equilibrium workers will not deviate from truthfully reporting . Suppose all other workers are following . We consider the following two cases. When , the expected bonus worker can have by truthfully reporting is . By deviating we have the above term become . Since under equilibrium strategy, we know by deviating worker’s utility will decrease. Similarly we can show when , by mis-reporting, worker’s utility decreases from to . For more discussions please refer to Section C.

Let workers adopt a threshold policy $c$ and truthfully report their labeling outcomes ($r_i = 1$). Consider the difference in worker $i$’s expected bonus between exerting effort and not:

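$$
\begin{aligned}
u_i(e_i=1,r_i=1;c)-u_i(e_i=0,r_i=1;c)
&= -c_i+\frac{B}{N-1}\sum_{j\neq i}E\big[P_H p_j+(1-P_H)(1-p_j)\big] \\
&\quad -\frac{B}{N-1}\sum_{j\neq i}E\big[P_L p_j+(1-P_L)(1-p_j)\big] \\
&= -c_i+\frac{B}{N-1}\sum_{j\neq i}(P_H-P_L)\,E(2p_j-1) .
\end{aligned}
\tag{5}
$$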

Consider the summation in Eqn. (5).

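$$
\begin{aligned}
\sum_{j\neq i}E(2p_j-1)
&= 2\sum_{k=0}^{N-1}\binom{N-1}{k}F^k(c)\,(1-F(c))^{N-1-k}\big[kP_H+(N-1-k)P_L\big]-(N-1) \\
&= 2\sum_{k=0}^{N-1}\binom{N-1}{k}F^k(c)\,(1-F(c))^{N-1-k}\,(P_H-P_L)\,k+2(N-1)P_L-(N-1) .
\end{aligned}
$$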

Consider the sum of combinatorial terms in the above equation:

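$$
\begin{aligned}
\sum_{k=0}^{N-1}\binom{N-1}{k}F^k(c)\,(1-F(c))^{N-1-k}\cdot k
&= \sum_{k=1}^{N-1}\frac{(N-1)!}{(N-1-k)!\,(k-1)!}\,F^k(c)\,(1-F(c))^{N-1-k} \\
&= (N-1)F(c)\sum_{k=1}^{N-1}\frac{(N-2)!}{\big((N-2)-(k-1)\big)!\,(k-1)!}\,F^{k-1}(c)\,(1-F(c))^{(N-2)-(k-1)} \\
&= (N-1)F(c)\sum_{k=0}^{N-2}\frac{(N-2)!}{\big((N-2)-k\big)!\,k!}\,F^{k}(c)\,(1-F(c))^{(N-2)-k} \\
&= (N-1)F(c) .
\end{aligned}
$$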

Then set $c_i = c$ and $u_i(e_i=1,r_i=1;c) = u_i(e_i=0,r_i=1;c)$, i.e., when there is no difference between exerting effort and not, we have

$$
2(P_H-P_L)F(c)+2P_L-1=\frac{c}{B(P_H-P_L)} .
$$

With this it is easy to see that when $c_i > c$ the difference in Eqn. (5) is negative, so not exerting effort is the better move. On the other hand, when $c_i < c$ the difference is positive, and the worker should exert effort to maximize utility.
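As a numerical sanity check on this derivation, the following minimal Python sketch evaluates the binomial sum for $\sum_{j\neq i}E(2p_j-1)$ term by term and compares it with the closed form $(N-1)[2(P_H-P_L)F(c)+2P_L-1]$. The function names and all parameter values are illustrative assumptions and do not come from the paper.

```python
from math import comb

def agreement_sum(F_c, N, PH, PL):
    """Explicit binomial sum for sum_{j != i} E(2 p_j - 1).

    F_c stands for F(c): the probability that a peer's cost is below the threshold c.
    """
    total = 0.0
    for k in range(N):  # k peers (out of N - 1) exert effort
        prob = comb(N - 1, k) * F_c**k * (1 - F_c) ** (N - 1 - k)
        total += prob * (k * PH + (N - 1 - k) * PL)
    return 2 * total - (N - 1)

def agreement_sum_closed_form(F_c, N, PH, PL):
    """Closed form obtained after the combinatorial sum collapses to (N - 1) F(c)."""
    return (N - 1) * (2 * (PH - PL) * F_c + 2 * PL - 1)

def effort_gain(c_i, F_c, B, N, PH, PL):
    """Eqn. (5): expected utility of exerting effort minus that of not exerting it."""
    return -c_i + B / (N - 1) * (PH - PL) * agreement_sum(F_c, N, PH, PL)

# Hypothetical parameter values, chosen only for illustration.
N, PH, PL, B, F_c = 6, 0.9, 0.6, 2.0, 0.3
indifferent_cost = B * (PH - PL) * (2 * (PH - PL) * F_c + 2 * PL - 1)
print(agreement_sum(F_c, N, PH, PL), agreement_sum_closed_form(F_c, N, PH, PL))
print(effort_gain(indifferent_cost, F_c, B, N, PH, PL))  # close to zero
```

A worker whose cost equals `indifferent_cost` is indifferent between the two actions; any higher cost makes `effort_gain` negative.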

## Appendix B Proof for Theorem 3.2, with extension to $P_L = 0.5$

###### Proof.

First consider the case $P_L > 0.5$. When $c^* = 0$, we have

$$
2(P_H-P_L)F(c^*)+2P_L-1 = 2P_L-1 > 0 = \frac{c^*}{B(P_H-P_L)} .
$$

Since the LHS is increasing and strictly concave in $c^*$, and the RHS is linear in $c^*$, the LHS and RHS can intersect only once on $[0, c_{\max}]$. We now discuss two cases.

• First when

$$
2(P_H-P_L)F(c_{\max})+2P_L-1 > \frac{c_{\max}}{B(P_H-P_L)} ,
$$

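we have (by concavity, writing $c$ as a convex combination of $c_{\max}$ and $0$: $c = \frac{c}{c_{\max}}\, c_{\max} + \big(1-\frac{c}{c_{\max}}\big)\cdot 0$)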

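$$
\begin{aligned}
2(P_H-P_L)F(c)+2P_L-1
&\ge \frac{c}{c_{\max}}\big(2(P_H-P_L)F(c_{\max})+2P_L-1\big)+\Big(1-\frac{c}{c_{\max}}\Big)(2P_L-1) \\
&> \frac{c}{c_{\max}}\cdot\frac{c_{\max}}{B(P_H-P_L)}+0 \\
&= \frac{c}{B(P_H-P_L)} .
\end{aligned}
$$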

This implies that for all $c \in [0, c_{\max}]$ it is better to exert effort than not, in which case the only equilibrium is $c^* = c_{\max}$, which corresponds to full effort exertion.

• For the second case, when

$$
2(P_H-P_L)F(c_{\max})+2P_L-1 \le \frac{c_{\max}}{B(P_H-P_L)} ,
$$

there is an odd number of crossings between the LHS and RHS of Eqn. (2) on $[0, c_{\max}]$. Since there are at most two crossings, there is exactly one, and it corresponds to the solution of the equilibrium equation.
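The two cases above suggest a simple numerical procedure for the equilibrium threshold when $P_L > 0.5$: check the condition at $c_{\max}$ first, and otherwise bisect on the single crossing. The sketch below is a minimal illustration under assumptions; `solve_threshold`, the example CDF `F_example`, and all parameter values are hypothetical and are not taken from the paper.

```python
def solve_threshold(F, c_max, B, PH, PL, tol=1e-10):
    """Threshold c* solving 2(PH-PL)F(c) + 2PL - 1 = c / (B(PH-PL)), assuming PL > 0.5."""
    def gap(c):  # LHS minus RHS of the equilibrium equation
        return 2 * (PH - PL) * F(c) + 2 * PL - 1 - c / (B * (PH - PL))

    if gap(c_max) > 0:
        return c_max  # first case: full effort exertion
    lo, hi = 0.0, c_max  # gap(lo) > 0 >= gap(hi): the single crossing lies in (lo, hi]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def F_example(c, c_max=1.0):
    """Hypothetical increasing, strictly concave cost CDF on [0, c_max]."""
    return 1.0 - (1.0 - c / c_max) ** 2

# Illustrative values only: a large bonus gives full effort, a small one an interior threshold.
c_full = solve_threshold(F_example, 1.0, B=10.0, PH=0.9, PL=0.6)
c_partial = solve_threshold(F_example, 1.0, B=1.0, PH=0.9, PL=0.6)
print(c_full, c_partial)
```

Bisection is sufficient here because the difference between the two sides is concave and positive at zero, so it changes sign at most once on the interval.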

Now consider $P_L = 0.5$. We first rigorously state our results.

###### Lemma B.1.

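When $P_L = 0.5$, $c^* = 0$ is a symmetric BNE. Besides, there exists at most one more threshold policy $c^* > 0$ such that it is an equilibrium:

• When $B < \frac{1}{2f(0)(P_H-P_L)^2}$, there is no such $c^*$.

• When $B \ge \frac{c_{\max}}{2(P_H-P_L)^2}$, $c^* = c_{\max}$.

• Otherwise, $c^*$ is the solution to $B = \frac{c^*}{2F(c^*)(P_H-P_L)^2}$.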

###### Proof.

When $c^* = 0$, $F(c^*) = 0$, so no worker will exert effort, and the LHS of Eqn. (2) reduces to 0, which matches the RHS. Moreover, as a strictly concave function intersects a linear function at most twice, there exists at most one more intersection point (equilibrium point). We again discuss the cases:

• When the following holds,

$$
\frac{1}{B(P_H-P_L)^2} > 2f(0), \quad\text{or}\quad B < \frac{1}{2f(0)(P_H-P_L)^2} ,
$$

we will have, for $c > 0$,

$$
\frac{c}{B(P_H-P_L)} > 2c\,(P_H-P_L)\,f(0) \ge 2(P_H-P_L)F(c) ,
$$

where the last inequality is due to the concavity of $F$. So for any $c > 0$, the RHS of Eqn. (2) is strictly larger than the LHS, i.e., no effort exertion is the only equilibrium, or equivalently $c^* = 0$.

• When $B = \frac{c_{\max}}{2(P_H-P_L)^2}$, we will have

$$
2(P_H-P_L)F(c_{\max}) = \frac{c_{\max}}{(P_H-P_L)B} .
$$

By strict concavity of $F$, we know that for all $0 < c < c_{\max}$ (as reasoned similarly for the case with $P_L > 0.5$),

$$
2(P_H-P_L)F(c) > \frac{c}{B(P_H-P_L)} .
$$

Therefore, by setting $B$ larger, we will have for all $c \in (0, c_{\max}]$

$$
2(P_H-P_L)F(c) > \frac{c}{B(P_H-P_L)} .
$$

Hence exerting effort is always the better move, regardless of the realized cost $c_i$.

• For the case in the middle, a unique intersection point is guaranteed to exist, and it can be obtained by setting

$$
B(c^*) = \frac{c^*}{2F(c^*)(P_H-P_L)^2} ,
$$

and solving for $c^*$.
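The three regimes in this proof can be sketched as a small Python routine that returns the non-trivial threshold when $P_L = 0.5$, or `None` when only the trivial equilibrium exists. This is a hedged illustration: the function name `nontrivial_threshold`, the example CDF, and the parameter values are assumptions, and the routine further assumes $F(c_{\max}) = 1$ and a strictly concave $F$.

```python
def nontrivial_threshold(F, f0, c_max, B, PH, PL=0.5, tol=1e-10):
    """Non-trivial equilibrium threshold when PL = 0.5, or None if only c* = 0 exists.

    f0 is the density f(0) of the cost distribution at zero.
    """
    d2 = (PH - PL) ** 2
    if B < 1.0 / (2.0 * f0 * d2):
        return None  # bonus too small: no effort is the only equilibrium
    if B >= c_max / (2.0 * d2):
        return c_max  # bonus large enough for full effort exertion

    def h(c):  # positive to the left of the crossing, non-positive from the crossing on
        return 2.0 * d2 * B * F(c) - c

    hi, lo = c_max, c_max / 2.0
    while h(lo) <= 0:  # move towards zero until lo is left of the crossing
        hi, lo = lo, lo / 2.0
        if lo < tol:
            return None  # boundary case: no crossing away from zero
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def F_example(c):
    """Hypothetical strictly concave cost CDF on [0, 1], with density f(0) = 2."""
    return 1.0 - (1.0 - c) ** 2

# Illustrative bonus levels, one per regime.
for bonus in (1.0, 2.0, 4.0):
    print(bonus, nontrivial_threshold(F_example, f0=2.0, c_max=1.0, B=bonus, PH=0.9))
```

For this example CDF the middle regime can be solved by hand: $0.64\,(2c - c^2) = c$ gives $c^* = 0.4375$ at $B = 2$, which the bisection reproduces.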

Now we prove the claim made in the paper that the trivial equilibrium $c^* = 0$ returns less expected utility than a non-trivial one with $c^* > 0$. We demonstrate this under (PA). Similar results can be established for (GA).

###### Lemma B.2.

Under (PA), when there exists a non-trivial equilibrium $c^* > 0$, it returns a higher expected bonus than the trivial one $c^* = 0$.

###### Proof.

We prove this by discussing two cases. When $c_i < c^*$, exerting effort returns a higher expected bonus than not exerting effort, given that every other worker exerts effort according to the threshold policy $c^*$:

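$$
\begin{aligned}
u_i(e_i=1,r_i=1;c^*) > u_i(e_i=0,r_i=1;c^*)
&= \Big(P_L\big(F(c^*)P_H+(1-F(c^*))\cdot 0.5\big) \\
&\quad +(1-P_L)\big[1-\big(F(c^*)P_H+(1-F(c^*))\cdot 0.5\big)\big]\Big)\cdot B \\
&= \frac{B}{2} ,
\end{aligned}
$$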

which is exactly the expected bonus a worker can have under the trivial equilibrium. Similarly, when $c_i > c^*$, we have

$$
u_i(e_i=0,r_i=1;c^*) = \frac{B}{2} ,
$$

with which we finish our proof. ∎

## Appendix C Proofs for discussions on equilibrium strategies

We provide proofs for some claims we made in the discussions of other equilibrium strategies. As a reminder, we summarize the results here:

• Mis-report also induces an equilibrium, which returns each worker the same utility (with the same effort exertion threshold).

• There does not exist an effort-dependent equilibrium.

Truthful report

###### Lemma C.1.

Always mis-reporting ($r_i = 0$) combined with the same threshold for effort exertion as in Theorem 3.2 is also a symmetric BNE when .

###### Proof.

Throughout this section we will shorthand the following notations:

$$
p_i := p_i(e_i), \qquad \tilde{p}_i := p_i(e_i, r_i=0) .
$$

We first show mis-reporting (reverting the answer in our binary labeling case) with the same effort level is also an equilibrium. When all other workers are mis-reporting, we will have

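$$
\begin{aligned}
u_i(e_i,r_i=0;c)
&= b-e_i c_i+\frac{B}{N-1}\sum_{j\neq i}\big[\tilde{p}_i\tilde{p}_j+(1-\tilde{p}_i)(1-\tilde{p}_j)\big] \\
&= b-e_i c_i+\frac{B}{N-1}\,\tilde{p}_i\sum_{j\neq i}(1-2p_j)+\frac{B}{N-1}\sum_{j\neq i}p_j ,
\end{aligned}
$$

where the second line uses $\tilde{p}_j = 1-p_j$ for the mis-reporting peers.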

Since $\sum_{j\neq i}(1-2p_j) < 0$, minimizing $\tilde{p}_i$, i.e., by setting $r_i = 0$, will maximize worker $i$’s utility. Also

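$$
\begin{aligned}
u_i(e_i,r_i=0;c)
&= b-e_i c_i+\frac{B}{N-1}\sum_{j\neq i}\big[\tilde{p}_i\tilde{p}_j+(1-\tilde{p}_i)(1-\tilde{p}_j)\big] \\
&= b-e_i c_i+\frac{B}{N-1}\sum_{j\neq i}\big[p_i p_j+(1-p_i)(1-p_j)\big] \\
&= u_i(e_i,r_i=1;c) ,
\end{aligned}
$$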

i.e., the two equilibria return the same expected bonus. ∎
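The symmetry used in this proof is easy to check numerically. The sketch below is a minimal illustration with hypothetical accuracies and bonus level; `agreement_utility` is an assumed helper name, and the base payment and effort cost are set to zero because they do not depend on the reporting strategy.

```python
def agreement_utility(p_i, peer_accuracies, B, b=0.0, cost=0.0):
    """Expected output-agreement utility of a worker with accuracy p_i against given peers."""
    n = len(peer_accuracies)
    match = sum(p_i * p_j + (1 - p_i) * (1 - p_j) for p_j in peer_accuracies)
    return b - cost + B / n * match

# Hypothetical accuracies: worker i exerted effort, peers are a mix of high and low accuracy.
p_i, peers, B = 0.9, [0.6, 0.9, 0.6], 2.0
flipped_peers = [1 - p for p in peers]  # accuracies when every peer mis-reports

truthful_eq = agreement_utility(p_i, peers, B)              # everyone truthful
misreport_eq = agreement_utility(1 - p_i, flipped_peers, B)  # everyone mis-reports
deviation = agreement_utility(p_i, flipped_peers, B)         # truthful against mis-reporters
print(truthful_eq, misreport_eq, deviation)
```

The first two values coincide, and a lone truthful report against mis-reporting peers earns less, which matches the claim that always mis-reporting is also an equilibrium with the same utility.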

In the binary signal case, the above mis-reporting strategy is in fact equivalent to the permutation strategy studied in Shnayder et al. [2016]. It is known that when signals are categorical Shnayder et al. [2016] (which is our case), a permutation of an equilibrium strategy remains an equilibrium and returns the same utility.

Also, no other mixed strategy between truthful reporting and mis-reporting with mixing probability $\delta$ will be an equilibrium. Denote the labeling accuracy under such a mixed strategy for each worker $i$ as $\tilde{p}_i(\delta)$. Then observe that worker $i$’s utility can be written as follows:

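$$
b-e_i c_i+\frac{B}{N-1}\,E[\tilde{p}_i(\delta)]\;E\Big[\sum_{j\neq i}\big(2\tilde{p}_j(\delta)-1\big)\Big]+\frac{B}{N-1}\,E\Big[\sum_{j\neq i}\big(1-\tilde{p}_j(\delta)\big)\Big] .
$$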

Depending on $\delta$, $E\big[\sum_{j\neq i}(2\tilde{p}_j(\delta)-1)\big]$ is either positive or negative. Correspondingly, worker $i$ will have an incentive to deviate to $r_i = 1$ or $r_i = 0$.

It is also worth noting that this issue can be resolved by assuming a known prior on the label. We can then generate a random outcome based on the prior (by tossing a coin for example) and compare each worker’s outcome with this random bit with certain probability. This will reduce the utility for untruthful reporting. To see this, suppose there is a certain prior on the labels or outcomes for each task. Denote the prior probability for getting a as , and suppose we have this extra piece of information . Then with probability , instead of comparing worker ’s answer with a randomly selected peer, we randomly toss a coin with probability , and compare the worker’s answer to the outcome of this toss. This fraction of utility is then given by

$$
\epsilon\Big(p^0_H\,p_i(e_i,r_i)+(1-p^0_H)\big(1-p_i(e_i,r_i)\big)\Big)=\epsilon\Big(p_i(e_i,r_i)\,(2p^0_H-1)+1-p^0_H\Big) ,
$$

from which we observe truthful reporting will return a higher utility, as (i) we will have a higher and (ii) .
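The effect of the prior-based comparison can be illustrated with a few lines of Python that evaluate the $\epsilon$-fraction of utility given above for a truthful and a flipped report. This is only a sketch: `prior_check_utility` is a hypothetical helper, and the values $p^0_H = 0.7$, accuracy $0.8$, and $\epsilon = 0.1$ are assumptions chosen so that both the prior and the accuracy exceed $0.5$.

```python
def prior_check_utility(p_i, p0_H, eps):
    """Fraction of utility earned by comparing the report with a coin toss drawn from the prior."""
    return eps * (p0_H * p_i + (1 - p0_H) * (1 - p_i))

# Assumed values: prior p0_H > 0.5 and accuracy above 0.5 after effort exertion.
p0_H, accuracy, eps = 0.7, 0.8, 0.1
truthful = prior_check_utility(accuracy, p0_H, eps)
flipped = prior_check_utility(1 - accuracy, p0_H, eps)  # mis-reporting flips the accuracy
print(truthful, flipped)
```

Under these assumed values the truthful report earns strictly more from the coin-toss comparison, which breaks the tie between the truthful and the mis-reporting equilibrium.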

Effort-dependent reporting

###### Lemma C.2.

There does not exist an effort-dependent reporting strategy at equilibrium.

###### Proof.

As similarly argued in the proof for Lemma C.1, suppose there is an effort-dependent reporting equilibrium . Then again depending on the sign of being either positive or negative, when , worker will choose to either truthfully report on both cases or mis-report to match other’s outputs.

When , worker has no incentive to exert effort – so there cannot be an effort-dependent equilibrium. ∎

## Appendix D Heterogeneous $P_L, P_H$

###### Lemma D.1.

The claims in Theorem 3.2 and 3.3 hold when are generated according to certain distribution with mean .

###### Proof.

It is quite clear that with (PA) the previous results hold. The reason is under peer agreement, the utility function is linear in each worker’s expertise level (consider truthful reporting):

 E[ui(ei=1,ri=1;