# Partial Truthfulness in Minimal Peer Prediction Mechanisms with Limited Knowledge

## Abstract

We study minimal single-task peer prediction mechanisms that have limited knowledge about agents’ beliefs. Without knowing what agents’ beliefs are or eliciting additional information, it is not possible to design a truthful mechanism in a Bayesian-Nash sense. We go beyond truthfulness and explore equilibrium strategy profiles that are only partially truthful. Using results from the multi-armed bandit literature, we characterize how inefficient these equilibria are compared to truthful reporting. We measure the inefficiency of such strategies by counting the number of dishonest reports that any minimal knowledge-bounded mechanism must have. We show that this number is of logarithmic order in the number of agents, and we provide a peer prediction mechanism that achieves this bound in expectation.

## 1 Introduction

One of the crucial prerequisites for a good decision making procedure is the availability of accurate information, which is often distributed among many individuals. Hence, elicitation of distributed information represents a key component in many systems that are based on informative decision making. Typically, such an information elicitation scenario is modeled by representing individuals as rational agents who are willing to report their private information in return for (monetary) rewards.

We study a setting in which reports cannot be directly verified, as is the case when eliciting opinions regarding the outcome of a hypothetical question [11]. Other examples include: product reviewing, where the reported information can describe one’s taste, which is inherently subjective; peer grading, where a rater needs to grade an essay; or eliciting information that is highly distributed, as in community sensing.

Since a data collector cannot directly verify the reported information, it can score submitted reports only by examining consistency among them. Such an approach is adopted in peer prediction mechanisms, of which the best-known examples are the peer prediction method [19] and the Bayesian truth serum [20]. While there are different ways of classifying peer prediction mechanisms, the most relevant one for this work distinguishes two categories of mechanisms by: 1) the amount of additional information they elicit from agents; 2) the knowledge they have about agents’ beliefs.

The first category includes minimal mechanisms that elicit only desired private information, but have some additional information that enables truthful elicitation. For instance, the classical peer prediction [19] assumes knowledge about how an agent forms her beliefs regarding the private information of other agents, while other mechanisms (e.g., see [14]) relax the amount of knowledge they need by imposing different restrictions on the agents’ belief structures. Mechanisms from the second category elicit additional information to compensate for the lack of knowledge about agents’ beliefs. For example, the Bayesian truth serum [20], and its extensions [28], elicit agents’ posterior beliefs.

The mentioned mechanisms are designed for a single-task elicitation scenario in which an agent’s private information can be modeled as a sample from an unknown distribution. In a basic elicitation setting, agents share a common belief system regarding the parameters of the setting [19]. While the mechanisms typically allow some deviations from this assumption [9], these deviations can be quite constrained, especially when private information has a complex structure.

We also mention mechanisms that operate in more specialized settings that allow agents to have more heterogeneous beliefs. Peer prediction without a common prior [27] is designed for a setting in which a mechanism can clearly separate a period prior to agents acquiring their private information from the period after the acquisition. It elicits additional information from agents which corresponds to their prior beliefs. More recently, many mechanisms have been developed for a multi-task elicitation scenario [5], primarily designed for crowdsourcing settings. The particularities of the multi-task setting enable the mechanisms to implicitly extract relevant information important for scoring agents (e.g., agents’ prior beliefs). For more details on peer prediction mechanisms, we refer the reader to [6].

Clearly, there is a trade-off between the assumed knowledge and the amount of elicited information. The inevitability of such a trade-off is expressed by a result of [21], which states that in the single-task elicitation setting, no minimal mechanism can achieve truthfulness for a general common belief system.

Contributions.

In this paper, we investigate single-task minimal peer prediction mechanisms that have only limited information about the agents’ beliefs, which precludes them from incentivizing all agents to report honestly. To characterize the inefficiency of such an approach, we introduce the concept of a dishonesty limit, which measures the minimal number of dishonest agents that any minimal mechanism with limited knowledge must allow. To the best of our knowledge, no such characterization has previously been proposed for the peer prediction setting. Furthermore, we provide a mechanism that reaches the lower bound on the number of dishonest agents. Because the bound is logarithmic in the number of reports, aggregated reports converge to the true aggregate. Unlike the mechanism of [13], which also aims to elicit an accurate aggregate, our mechanism does not require agents to learn from each other’s reports.

Full proofs of our claims can be found in the appendix.

## 2 Formal Setting

We study a standard peer prediction setting where agents are assumed to have a common belief regarding their private information [19]. In the considered setting, a mechanism has almost no knowledge about the agents’ belief structure, which makes the elicitability of truthful information more challenging (e.g., [21]). We define our setting as follows.

There are agents whose arrival to the system is stochastic. We group agents by their arrival so that each group has a fixed number of agents, and we consider participation period of a group as a time .

To describe how agents form beliefs about their private information, we introduce a state , which is a random variable that takes values in set , which is assumed to be a (real) interval. We denote the associated distribution by , and assume that for all .

An agent’s private information, here called a signal, is modeled with a generic random variable that takes values in a finite discrete set whose generic values are denoted by , , , etc. For each agent , her signal is generated independently according to a distribution that depends on state variable . This distribution is common to all agents, i.e., for two agents and , and it is fully mixed, i.e., for all it holds that . Furthermore, we assume that private signals are stochastically relevant [19], meaning that posterior distributions and (obtained from , and ) differ for at least one value of whenever . Agents share a common belief about the parameters of the model ( and ), so we denote these beliefs in the same way.

Agents report their private information (signals) to a mechanism, for which they are compensated with rewards. Agents might not be honest, so to distinguish the true signal from the reported one, we denote reported values by . Since our main result depends on agents’ coordination, we also introduce a noise parameter that models potential imperfections in reporting strategies. In particular, we assume that with probability an agent is rational and reports a value that maximizes her expected payoff, while otherwise she heuristically reports a random value from . Notice that we do not consider adversarial agents. Furthermore, while in the development of our formal results we assume that is not dependent on , we also show how to apply our main mechanism when such a bias exists (Section Additional Considerations).

A mechanism need not know the true value of ; it only needs to have an estimate that is in expectation equal to , and we show how to obtain from the reports. Furthermore, the belief of a rational agent incorporates the fact that a peer report is noisy, which means that , where is the value that a rational peer would report.

Beliefs about an agent’s signal or her report are defined on the probability simplex in -dimensional space, that we denote by . To simplify the notation for beliefs, we often omit and symbols. In particular, instead of using , we simply write , or instead of using , we write .

The payments of a mechanism are denoted by and they are applied on each group of agents separately. We are interested in -peer payment mechanisms that reward an agent by using her one peer , i.e., the reward function is of the form . As shown in [21], this restriction does not limit the space of strictly incentive compatible mechanisms when agents’ beliefs are not narrowed by particular belief updating conditions. Furthermore, we distinguish the notion of a mechanism, here denoted by , from a peer prediction payment function because different payment functions could be used on different groups of agents, i.e., at different time periods .

Solution concept.

From the perspective of rational agents, our setting has a form of a Bayesian game, hence we explore strategy profiles that are Bayesian-Nash equilibria. We are particularly interested in strict equilibria, in which rational agents have strict incentives not to deviate. Any mechanism that adopts honest reporting as a strict Bayesian-Nash equilibrium is called strictly Bayesian-Nash incentive compatible (BNIC).

## 3 Our Approach

Let us begin by describing our approach to dealing with the impossibility of truthful minimal knowledge-bounded elicitation. The mechanism that we build upon is described by the payment rule:

and is called the peer truth serum (PTS) [7]. is a fully mixed distribution that satisfies:

Provided that other rational agents are honest, the expected payoff of a rational agent with signal for reporting is:

Selecting proper values for and is an orthogonal problem to the one addressed in the paper, and is typically achieved using a separate mechanism [23], a pre-screening process [5], or by learning [18]. However, we do set so that the expected payoff is proportional to (because ). In other words, we remove an undesirable skew in agents’ expected payoffs that might occur due to the presence of non-strategic reports.
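As a rough illustration of the payment rule and the expected payoff discussed above, the following sketch assumes the commonly cited form of PTS, in which an agent is rewarded only when her report matches her peer's report, with a reward inversely proportional to the probability that the mechanism's distribution assigns to the reported value. The names `pts_payment`, `expected_payment`, `scale`, and `offset` are illustrative and not taken from the paper; the exact constants in the rule above may differ.

```python
def pts_payment(report, peer_report, dist, scale=1.0, offset=0.0):
    """PTS-style payment (assumed form): offset + scale / dist[report] on a match."""
    if dist[report] <= 0:
        # The mechanism's distribution must be fully mixed to avoid division by zero.
        raise ValueError("dist must assign positive probability to every value")
    bonus = scale / dist[report] if report == peer_report else 0.0
    return offset + bonus


def expected_payment(report, peer_belief, dist, scale=1.0, offset=0.0):
    """Expected payment under a belief about the peer's report.

    peer_belief maps each possible peer report to its believed probability.
    """
    return sum(
        prob * pts_payment(report, peer_value, dist, scale, offset)
        for peer_value, prob in peer_belief.items()
    )


# Hypothetical numbers: the mechanism uses a uniform distribution and the
# agent believes her peer reports "a" with probability 0.7.
dist = {"a": 0.5, "b": 0.5}
belief = {"a": 0.7, "b": 0.3}
payoff_a = expected_payment("a", belief, dist)
payoff_b = expected_payment("b", belief, dist)
```

Under this assumed form, the expected payment for a report is the offset plus the scale times the ratio of the agent's belief to the mechanism's probability for that value, so the agent compares these ratios across candidate reports.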

Without additional restrictions on agents’ beliefs, it is possible to show that the PTS mechanism is uniquely truthful [9]. Condition is called the self-predicting condition, and it is crucial for ensuring the truthfulness of PTS. We say that a distribution is informative if it satisfies the self-predicting condition. Instead of assuming that a specific a priori known satisfies condition , we show that there always exists a certain set of distribution functions for which the condition holds, and although this set is initially not known, we show that one can learn it by examining the statistics of reported values for different reporting strategies.

### 3.1 Phase Transition Diagram

We illustrate the reasoning behind our approach and a novel mechanism on a binary answer space . In this case, it has been shown that if we set to an agent’s prior belief , the self-predicting condition is satisfied, and, consequently, the PTS mechanism is BNIC [29]. However, in our setting, a mechanism has no knowledge about .

Consider what happens when is much smaller than . For signal value , this means that is much larger than . If an agent observes , her expected payoff when everyone is truthful is proportional to:

where the last inequality is due to the self-predicting condition. Therefore, agents who observe are incentivized to report it. However, agents who observe might not be incentivized to report truthfully, because if , we have:

In this case, one would naturally expect that both observations and lead to report , and it is easy to verify that this is an equilibrium of the PTS mechanism. Namely, the expected payoffs for reporting only increase if more agents report . Similarly, when is much larger than , one would expect that agents would report .

With this reasoning, we can construct a phase transition diagram that shows how the expected frequency of reports equal to changes with the increase of , for fixed posterior beliefs and , . The diagram is shown in Figure 1, and it has three phases:

• Phase 1, in which agents are not truthful and report .

• Phase 2, in which agents are truthful.

• Phase 3, in which agents are not truthful and report .

Notice that not all reports are equal to in phase 1 nor equal to in phase 3. This is due to the presence of noisy reports. However, noisy reports are unbiased, so the frequency of the truthful reporting phase is by Euclidean distance closer to than are the frequencies of the other two phases:

where , which gives us:

As the expression also holds for signal , it follows that the disagreement among reports, i.e., probability that the two reports do not match, is (strictly) maximized in the truthful reporting phase. Therefore, we can use the disagreement as an indicator of whether agents are truthful or not.
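The three phases can be checked numerically for the binary case. The sketch below is an illustration under stated assumptions rather than the paper's derivation: it conditions on a fixed state, so that signals are independent across agents, takes `signal_prob_x` to be the probability of observing the first signal value in that state, and assumes that non-rational agents pick one of the two values uniformly. The phase labels and function names are hypothetical.

```python
def report_frequency(phase, signal_prob_x, rational_prob):
    """Expected frequency of reports equal to x in a binary answer space."""
    # Non-rational agents report x half of the time (unbiased noise).
    noise_share = (1.0 - rational_prob) / 2.0
    # Share of rational agents who report x in each phase.
    rational_share = {
        "all_y": 0.0,               # every rational agent reports y
        "truthful": signal_prob_x,  # rational agents report their signal
        "all_x": 1.0,               # every rational agent reports x
    }[phase]
    return rational_prob * rational_share + noise_share


def disagreement(freq_x):
    """Probability that two independent reports with frequency freq_x differ."""
    return 2.0 * freq_x * (1.0 - freq_x)


# Hypothetical values: signal x has probability 0.3, 80% of agents are rational.
freqs = {
    phase: report_frequency(phase, 0.3, 0.8)
    for phase in ("all_y", "truthful", "all_x")
}
disagreements = {phase: disagreement(f) for phase, f in freqs.items()}
```

Since the disagreement probability is largest when the frequency is closest to one half, and a fully mixed signal distribution keeps the truthful frequency strictly between the two extreme phases, the truthful phase has the largest disagreement in this example.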

Furthermore, if needs to be obtained from the reports, it is enough to acquire responses of agents using PTS that has and such that agents are clearly incentivized to report a specific value. For example, one can set to a small value, and define , where is the number of agents who reported . Notice that is in expectation equal to , and this generalizes to the non-binary case (but also that depends on (biased noise)).
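One possible estimator along these lines is sketched below. It is an assumption-laden illustration, not the estimator defined in the text: it assumes that all rational agents report a single known value `target` (because the mechanism's distribution makes that value clearly the most rewarding) and that non-rational agents choose uniformly among `num_values` values. The function name is hypothetical.

```python
def estimate_rational_prob(reports, target, num_values):
    """Unbiased estimate of the probability that an agent is rational.

    Assumes every rational agent reports `target` and every non-rational
    agent reports one of `num_values` values uniformly at random, so the
    expected fraction of `target` reports is rho + (1 - rho) / num_values.
    """
    if num_values < 2 or not reports:
        raise ValueError("need at least two values and one report")
    frac_target = sum(r == target for r in reports) / len(reports)
    # Invert the linear relation between frac_target and rho.
    return (num_values * frac_target - 1.0) / (num_values - 1.0)


# Hypothetical sample: nine reports of "x" and one of "y" in a binary space.
estimate = estimate_rational_prob(["x"] * 9 + ["y"], target="x", num_values=2)
```

Because the estimate is a linear function of the observed fraction, its expectation equals the true probability under the stated assumptions; with biased noise this no longer holds.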

### 3.2 Mechanism: Adaptive PTS (AdaPTS)

Based on the previous observations, we now construct a novel elicitation mechanism: AdaPTS. The first step of the AdaPTS mechanism is to divide probability simplex into regions and accordingly sample from each region one fully mixed representative . In Figure 1, these representatives are shown as black points on the horizontal axis and the division of the simplex is done uniformly. The granularity of the division should be fine enough so that at least one representative falls into the truthfulness phase. To achieve this, the PTS mechanism must have some knowledge about the agents’ belief structure, but this knowledge can be very limited. For example, to properly divide , it is enough to know the lower bound on the size of the region that contains distributions for which PTS is BNIC.

Furthermore, in such a discretization, one can always choose representative distributions that are in the interior of the probability simplex, thus avoiding potential divisions by in equation . Notice that it is also possible to bound payments to a desired interval by choosing an appropriate value of (see Footnote 2).
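A uniform discretization with interior representatives can be sketched as follows. This is one simple choice, assumed for illustration: representatives are the grid points whose coordinates are positive integer multiples of `1 / granularity`. The function name and parameters are hypothetical.

```python
from itertools import product


def simplex_representatives(num_values, granularity):
    """Interior grid points of the probability simplex.

    Each point has num_values coordinates that are positive multiples of
    1 / granularity and sum to one, so no coordinate is ever zero.
    """
    if granularity < num_values:
        raise ValueError("granularity must be at least num_values")
    points = []
    # Enumerate the first num_values - 1 coordinates; the last one is implied.
    for combo in product(range(1, granularity), repeat=num_values - 1):
        last = granularity - sum(combo)
        if last >= 1:
            points.append(tuple(c / granularity for c in combo + (last,)))
    return points


binary_grid = simplex_representatives(num_values=2, granularity=4)
ternary_grid = simplex_representatives(num_values=3, granularity=4)
```

A finer `granularity` yields more representatives, which makes it more likely that one of them lies in the truthful region but also increases the number of arms the mechanism has to explore.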

Now, the AdaPTS mechanism should define payment function before each time step , i.e., before a considered group of agents starts submitting their reports. We want to maximize the number of honest agents, without knowing for which representative distributions agents are honest. This can be translated to a (stochastic) multi-armed bandit (MAB) setting (e.g., see [1]) with arms defined as representative distributions and the optimization goal defined as maximizing the number of honest agents.

As argued in the previous paragraph, the latter is the same as maximizing the disagreement among reports. More precisely, we define our objective function (feedback of MAB) as an indicator function that counts the disagreements among the reports of agents:

Notice that the indicator function depends on a chosen representative distribution , while its associated distribution is dependent on agents’ strategies and the underlying distribution from which agents’ private signals are sampled. Therefore, at time step , AdaPTS considers a group of agents, selects a representative distribution according to a MAB algorithm, and scores agents using the PTS mechanism with the chosen representative . After receiving the reports of agents, the mechanism updates the parameters of the MAB algorithm.
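A disagreement-based feedback signal of this kind could be computed as in the sketch below. The pairing of each agent with the next agent in the group (cyclically) and the normalization by the group size are assumptions made for illustration; the name `disagreement_feedback` is hypothetical.

```python
def disagreement_feedback(reports):
    """Fraction of agents whose report differs from their peer's report.

    The peer of agent i is assumed to be agent i + 1, wrapping around at
    the end of the group. The result lies in [0, 1].
    """
    group_size = len(reports)
    if group_size < 2:
        raise ValueError("a group needs at least two agents")
    mismatches = sum(
        reports[i] != reports[(i + 1) % group_size] for i in range(group_size)
    )
    return mismatches / group_size
```

For example, `disagreement_feedback(["a", "a", "b"])` compares the pairs (a, a), (a, b), and (b, a), of which two disagree. Keeping the feedback in the unit interval matches the reward range usually assumed by bandit algorithms such as UCB1.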

Although we could use any MAB algorithm with desirable regret features, in the following text we restrict our attention to UCB1 [4]. Algorithm ? depicts the pseudocode of AdaPTS based on the UCB1 algorithm [4]. Function returns the set of representative distributions for a given granularity , e.g., by uniformly discretizing the probability simplex as shown in Figure 1. Function collects the reports of agents whose rewards are then calculated using PTS mechanism with parameter , where the peer of agent is determined by index . Function estimates the value , e.g., by acquiring responses of agents for extremal values of , as described in the previous subsection.
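The overall loop can be sketched as below. This is a simplified illustration, not the paper's pseudocode: it uses the standard UCB1 index, a toy binary environment in which rational agents are truthful only when the mechanism's probability for value "x" lies in an assumed range, and a cyclic-peer disagreement feedback. All function names, the toy environment, and its parameters are hypothetical.

```python
import math
import random


def disagreement(reports):
    """Fraction of agents whose report differs from the next agent's report."""
    k = len(reports)
    return sum(reports[i] != reports[(i + 1) % k] for i in range(k)) / k


def ucb1_select(counts, means, t):
    """Standard UCB1: try every arm once, then maximize mean + exploration bonus."""
    for arm, count in enumerate(counts):
        if count == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_adapts_sketch(num_rounds, arms, collect_reports, feedback):
    """Each round: pick a representative, collect one group's reports, update."""
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for t in range(1, num_rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = feedback(collect_reports(arms[arm]))
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


def make_toy_group(rng, group_size=10, rational_prob=0.8,
                   signal_prob_x=0.5, truthful_range=(0.4, 0.6)):
    """Toy binary environment (assumed): truthful only inside truthful_range."""
    low, high = truthful_range

    def collect_reports(p_x):
        reports = []
        for _ in range(group_size):
            if rng.random() >= rational_prob:
                reports.append(rng.choice(["x", "y"]))  # non-strategic noise
            elif low < p_x < high:
                reports.append("x" if rng.random() < signal_prob_x else "y")
            elif p_x <= low:
                reports.append("x")  # small p_x makes reporting x most rewarding
            else:
                reports.append("y")
        return reports

    return collect_reports


arms = [0.1, 0.3, 0.5, 0.7, 0.9]  # candidate probabilities for value "x"
counts, means = run_adapts_sketch(
    3000, arms, make_toy_group(random.Random(0)), disagreement
)
```

In this toy setting only the middle arm induces truthful reporting, so its average disagreement is the highest and UCB1 concentrates its pulls on it while still sampling the other arms occasionally.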

## 4 Analysis

We start by examining particular properties of our setting, which imply the difficulty of our problem and also lead us towards our main results. The major technical difficulty is to show that there exists a distribution for which mechanism is truthful. This is not a trivial statement, since the original PTS mechanism requires an additional condition to hold, which is not necessarily satisfied in our setting. Furthermore, we also need to show that is an appropriate indicator function. Given these two results, we can apply the results from the multi-armed bandit literature [17] to derive the logarithmic bounds on the dishonesty limit.

### 4.1 Correlation Among Signal Values

The first property we show is that there exists a limit on how different signal values can be correlated in terms of agents’ posterior beliefs. In particular, if an agent endorses information , there is an upper bound on the value of her belief about a peer agent endorsing information .

### 4.2 Mechanisms with Limited Knowledge

The second property is that the truthful elicitation of all private signals is not possible if a mechanism has no knowledge about agents’ belief structure. This follows from Theorem 1 presented in [21], which states that it is not possible to design a minimal payment scheme that truthfully elicits private signals of all agents. While the result was obtained for a slightly different information elicitation scenario, where no particular belief model is assumed, it is easy to verify that it holds in our setting as well (the proof does not use anything contradictory to our setting). We explicitly state the impossibility of truthful information elicitation due to its importance for the further analysis.

Even if a mechanism does have some information about agents, the result of Theorem ? is likely to hold if this knowledge is limited. We, therefore, define knowledge-bounded mechanisms as mechanisms whose information about agents is not enough to construct a BNIC payment function for all admissible belief models.

The AdaPTS mechanism assumes that a given granularity structure of probability simplex contains a representative distribution for which the PTS payment rule is BNIC. We show in the following subsections that one can always partition probability simplex to obtain a desirable granularity structure. Moreover, the following lemma shows that the granularity structure of AdaPTS is not in general sufficient to construct a BNIC payment rule.

### 4.3 Dishonesty Limit

While one cannot achieve incentive compatibility using peer prediction with limited knowledge, dishonest responses could be potentially useful for a mechanism to learn something about agents. This was noted in [13], where the mechanism outputs a publicly available statistic that converges to the desired outcome, the true distribution of private signals. The drawback of the mechanism is that it relies on agents being capable of learning from each other’s responses, meaning that they update their beliefs by analyzing the changes in the public statistic.

Our approach is different. By inspecting agents’ responses, we aim to learn which incentives are suitable to make agents respond truthfully. To quantify what can be done with such an approach, we define the dishonesty limit.

The following lemma establishes the order of the lower bound of DL. Its proof is based on the result of [17], while the tightness of the result is shown in Section Main Results.

### 4.4 Existence of an Informative Distribution

The PTS mechanism is BNIC if the associated distribution satisfies the self-predicting condition, i.e., if is informative. We now turn to the crucial property of our setting:

### 4.5 Indicator Function

Now, let us define a set of lying strategies that have the same structure and in which agents report only a strict subset of all possible reported values. For example, if possible values are and agents are not incentivized to report honestly value , then a possible lying strategy could be to report honestly values , , , and instead of honestly reporting , agents could report .

Non-surjective strategies also include those that most naturally follow from a simple best response reasoning: agents who are not incentivized to report honestly, deviate by misreporting, which necessarily reduces the set of values that rational agents report. This type of agents’ reasoning basically corresponds to the player inference process explained in [26]. Without specifying how agents form their reporting strategy, we show that in PTS there always exists an equilibrium non-surjective strategy profile. In Subsection Allowing Smoother Transitions Between Phases, we discuss how to use our approach when agents are not perfectly synchronized in adopting non-surjective strategies.

Notice that even for non-surjective strategy profiles, the set of reported values that a mechanism receives does not reduce, i.e., it is equal to . This follows from the fact that not all agents are rational in the sense that they comply with a prescribed strategy profile (i.e., some report random values instead). Nevertheless, the statistical nature of reported values changes: reports received by the mechanism have a smaller number of disagreements.

### 4.6 Main Results

From Proposition ?, Proposition ?, and Lemma ?, we obtain the main property of the AdaPTS mechanism: its ability to substantially bound the number of dishonest reports in the system.

Consider a reporting strategy in which agents are honest whenever is such that truthful reporting is a strict Bayesian-Nash equilibrium of PTS (by Proposition ?, such always exists), and otherwise they use an equilibrium non-surjective strategy profile (which by Proposition ? always exists). We use the result that the UCB1 algorithm is expected to pull a suboptimal arm times, where is the total number of pulls [4]. By Lemma ?, the representative of a truthful reporting region is an optimal arm, while the representative of a non-truthful region is a suboptimal arm. Furthermore, the number of pulls in our case corresponds to , where is the total number of agents and is the number of agents at time period . Since is a fixed parameter, the expected number of lying agents is of the order of .

Notice that we have not specified an exact equilibrium strategy that satisfies the bound of the theorem. For the theorem to hold, it suffices that agents adopt truthful reporting when in AdaPTS is such that PTS is BNIC, while they adopt any non-surjective strategy profile when is such that PTS is not BNIC. As explained in the previous section, a simple best response reasoning can lead to such an outcome.

Since AdaPTS is allowed to have a bounded knowledge information structure (Lemma ?), from Theorem ? it follows that the dishonesty limit is upper bounded by . From Lemma ?, we know that the dishonesty limit is lower bounded by . Therefore:

Importance of the results.

With the dishonesty limit concept, we are able to quantify what is possible in the context of minimal elicitation with limited knowledge. An example of an objective that is possible to reach with partially truthful mechanisms is elicitation of accurate aggregates. In particular, suppose that the goal is to elicit a distribution of signal values. From Theorem ?, we know that this is achievable with AdaPTS. Namely, if we denote the normalized histogram of reports by and the normalized histogram of signal values by , then it follows from the theorem that their expected difference is bounded by:

which approaches as increases. Therefore, although the truthfulness of all agents is not guaranteed, the aggregate obtained by the partially truthful elicitation converges to the one that would be obtained if all agents were honest.
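The intuition behind this convergence can be illustrated with a small computation. The sketch below is an assumption-level illustration rather than the bound stated in the text: it measures the difference between the two normalized histograms in L1 distance and ignores non-strategic noise. Each report that differs from the underlying signal moves a mass of one over the number of reports from one bin to another, so the L1 distance is at most twice the number of misreports divided by the number of reports. The function names are hypothetical.

```python
from collections import Counter


def normalized_histogram(values, support):
    """Relative frequency of each value in support among the given values."""
    counts = Counter(values)
    total = len(values)
    return {v: counts[v] / total for v in support}


def l1_distance(hist_a, hist_b):
    """Sum of absolute differences between two histograms on the same support."""
    return sum(abs(hist_a[v] - hist_b[v]) for v in hist_a)


# Hypothetical data: ten signals, two of which are misreported.
signals = ["a"] * 6 + ["b"] * 4
reports = ["a"] * 6 + ["a", "a", "b", "b"]
misreports = sum(s != r for s, r in zip(signals, reports))
distance = l1_distance(
    normalized_histogram(signals, ["a", "b"]),
    normalized_histogram(reports, ["a", "b"]),
)
bound = 2.0 * misreports / len(signals)
```

If the expected number of misreports grows only logarithmically while the number of reports grows linearly, this kind of bound shrinks towards zero.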

## 5 Additional Considerations

In this section, we discuss how to make our mechanism AdaPTS applicable to the situations when there exist biases in reporting errors (i.e., parameter is biased towards a particular value) or the phase transitions are smoother (e.g., because reporting strategies are not perfectly synchronized).

Allowing Bias in the Non-strategic Reports.

Allowing biases in reporting errors is important for cases when some agents do not strategize w.r.t. parameter , e.g., these agents are truthful regardless of the payment function, or they report heuristically without observing their private signal (for example, a fraction of agents reports , whereas the other agents are strategic). An example of the phase transition diagram that incorporates a bias is shown in Figure 2. Since the disagreement is highest for phase 1 of the diagram, function is not a good indicator of agents’ truthfulness. However, frequency of the truthful phase is the closest one to the average of and . By the same reasoning as in Section Our Approach, the expression is maximized for . This leads us to the following indicator function:

where is the frequency of reports equal to among . The problem, however, is that the AdaPTS mechanism does not know and . Nevertheless, it can estimate them in an online manner from agents’ reports.

To adjust AdaPTS, we first define time intervals and to each time interval associate a different UCB1 algorithm. For periods , we run a (separate) UCB1 algorithm with the indicator function that uses estimators and . The estimators are initially set to and for all values , and they change after each time interval . More precisely, they are updated by finding respectively the minimum and the maximum frequency of reports equal to among all possible arms (representative distributions ). Since UCB1 sufficiently explores suboptimal arms, the estimates and become reasonably accurate at some point, which implies a sublinear number of dishonest reports for a longer elicitation period.

Allowing Smoother Transitions Between Phases.

Agents may not be perfectly synchronized in changing between phases. Nonetheless, we can expect that the resulting behaviour would produce a similar phase transition diagram, as illustrated in Figure 3. However, the simple indicator function defined by expression is no longer a suitable choice for detecting the truthful reporting phase.

In order to see this, we have added additional representative distributions . Notice that we obtain the same disagreement for as for . However, is a better choice, because belongs to a transition phase where a high level of disagreement is due to asynchronous behaviour of agents. Notice that the phase diagram experiences rapid changes in transitions between two phases. This means that we can avoid selection of undesirable distributions by introducing a proper regularization term. That is, we can separate agents that arrive at time into two groups, and , and reward each group with a slightly different from the one selected by UCB1. For example, if is selected, we could reward one group using PTS with and the other group using PTS with . If group has agents and group has agents , then a possible indicator function could be:

where is the regularization factor. The regularization term is in expectation equal to the square of the difference between the expected disagreement of agents in group and the expected disagreement of agents in group . Some a priori insight on how to adjust might be needed, but this requires only limited knowledge. With this modification of the indicator function, we can apply AdaPTS.
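One way to build such a regularized feedback is sketched below. It is a hypothetical construction, not the indicator function defined above: each group is split into two halves, the disagreement is measured on disjoint pairs within each half, and the penalty is the product of the two half-wise differences between the groups. When reports are independent across agents, the expectation of that product equals the squared difference of the expected disagreements. The names `pair_disagreement`, `regularized_feedback`, and `reg` are illustrative.

```python
def pair_disagreement(reports):
    """Fraction of disjoint consecutive pairs whose two reports differ."""
    num_pairs = len(reports) // 2
    if num_pairs == 0:
        raise ValueError("need at least two reports")
    return sum(
        reports[2 * i] != reports[2 * i + 1] for i in range(num_pairs)
    ) / num_pairs


def regularized_feedback(group_a, group_b, reg):
    """Average disagreement minus reg times an estimate of the squared gap.

    group_a and group_b hold the reports of the two groups that were paid
    with slightly different distributions; each needs at least four reports.
    """
    if len(group_a) < 4 or len(group_b) < 4:
        raise ValueError("each group needs at least four reports")
    half_a, half_b = len(group_a) // 2, len(group_b) // 2
    d_a1 = pair_disagreement(group_a[:half_a])
    d_a2 = pair_disagreement(group_a[half_a:])
    d_b1 = pair_disagreement(group_b[:half_b])
    d_b2 = pair_disagreement(group_b[half_b:])
    base = (d_a1 + d_a2 + d_b1 + d_b2) / 4.0
    # Product of two independent difference estimates (can be negative).
    penalty = (d_a1 - d_b1) * (d_a2 - d_b2)
    return base - reg * penalty


stable = regularized_feedback(["x", "y", "x", "y"], ["x", "y", "x", "y"], 0.5)
unstable = regularized_feedback(["x", "y", "x", "y"], ["x", "x", "x", "x"], 0.5)
```

A representative near a phase transition tends to produce very different disagreement levels in the two groups, so its feedback is reduced. Since the result can fall outside the unit interval, it may need rescaling before being passed to a bandit algorithm that assumes bounded rewards.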

## 6 Conclusion

We investigated the asymptotic behavior of partially truthful minimal peer prediction mechanisms with limited knowledge. As shown by Theorem ?, any such mechanism results in redundant (non-truthful) reports. In contrast, one of the best-known knowledge-bounded elicitation mechanisms, the Bayesian Truth Serum [20], elicits from each agent her signal value and her prediction about other agents, having in total additional reports. Thus, our results quantify the necessary overhead when minimality in reported information and in the knowledge of a mechanism is preferred to full truthfulness. One of the most important future steps would be to make the mechanism robust in terms of collusion resistance (e.g., measured using replicator dynamics [25]), which, in general, can be challenging even for more robust settings [10].

This work was supported in part by the Swiss National Science Foundation (Early Postdoc Mobility fellowship).

## ATTACHMENT: Partial Truthfulness in Peer Prediction Mechanisms with Limited Knowledge

### 7.1 Proof of Lemma

Using the properties of our model (conditional independence of signal values given ) and Bayes’ rule we obtain:

Jensen’s inequality tells us that , with strict inequality when (notice that is not constant due to stochastic relevance). As is fully mixed, we have:

implying the first statement.

From Bayes’ rule it follows that is equal to:

Similarly, is equal to:

Notice that expressions and have equal denominators, so we only need to compare numerators. Let and be two random variables such that and . We have:

is expectation over the distribution . Using the Cauchy-Schwarz inequality (, with strict inequality if and ), and the fact that and are positive random variables that differ due to stochastic relevance, we obtain that for .

The third claim follows analogously. Namely, the numerator of is in more general form equal to , while the numerator of is in more general form equal to , where . By applying the Cauchy-Schwarz inequality, we obtain the claim.

### 7.2 Proof of Lemma

Consider two arbitrary probability distribution functions and , such that: , , and for and . PTS with is truthful for belief model defined by , , , where . Similarly, we define posteriors based on . Notice that , where :

Now, suppose that , and set up such that: and . By using the same procedure of proving as in Theorem 1 of [21], we obtain that if a mechanism is incentive compatible for both and , then:

where and . The two inequalities, however, contradict each other by the choice of .

Therefore, even though a mechanism might know the size of a region in the probability simplex that contains distributions for which PTS is BNIC, it might not be able to construct a BNIC payment rule.

### 7.3Proof of Lemma

By Definition ?, there exists no single mechanism that incentivizes agents to report honestly if their belief model is arbitrary. Suppose now that we have two mechanisms and that are BNIC under two different (complementary) belief models, so that a particular group of agents is truthful for only one of the two mechanisms. We can consider this situation from the perspective of a meta-mechanism that has to choose between and . At a time-step , obtains reports as feedback, which, in general, is insufficient for determining whether agents lied or not, because agents’ observations are stochastic and their reports contain noise. Therefore, the problem of choosing between and is an instance of a multi-armed bandit problem (see Section Mechanism: Adaptive PTS (AdaPTS)). Since, in general, any MAB algorithm pulls suboptimal arms $\Omega(\log N)$ times in expectation, where $N$ is the total number of pulls (e.g., see [17]), we know that the meta-mechanism will in expectation choose the wrong (untruthful) payments $\Omega(\log N)$ times. This produces at least a logarithmic number of untruthful reports in expectation, because non-truthful payments fail to be truthful for at least one signal value, and each signal value has a strictly positive probability of being endorsed by an agent.
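For reference, the lower bound from [17] invoked above can be stated as follows. The notation is an assumption introduced here and is not taken from the proof: $T_j(N)$ is the number of times a suboptimal arm $j$ is pulled in $N$ rounds, $p_j$ is its reward distribution, and $p^*$ is the reward distribution of the optimal arm. The bound applies to consistent policies, that is, policies whose expected number of suboptimal pulls grows slower than any positive power of $N$ on every bandit instance in the class considered.

$$
\liminf_{N \to \infty} \frac{\mathbb{E}[T_j(N)]}{\ln N} \;\ge\; \frac{1}{D_{\mathrm{KL}}(p_j \,\|\, p^*)}
$$

Whenever the divergence on the right-hand side is finite and positive, the expected number of suboptimal pulls grows at least logarithmically in $N$.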

### 7.4Proof of Proposition

The proposition is a direct consequence of the following two lemmas.

Since is informative:

for all . The strictness of the inequality implies that there exists such that for all we have:

In other words, we have that any for which , , satisfies:

By putting , we obtain the claim.

We only need to show the existence of . Consider a specific signal value and any other signal value . Let us define as:

where is a normalization factor so that . Notice that by Lemma ?:

holds for any . To prove that is informative, it is sufficient to show that for any signal values we have:

Provided that these inequalities hold, the second one can be made strict (while keeping the other two inequalities strict as well) by reducing all , , by a small enough value, and then re-normalizing .

The first inequality follows from and inequality :

We obtain the second inequality by putting :

For the third inequality, we use the second inequality and inequality , which follows from Lemma ?. We have:

### 7.5Proof of Proposition

One equilibrium non-surjective strategy profile is when agents report value such that . Let us denote by an agent ’s belief regarding the report of her peer agent , i.e., , for the considered strategy profile. Notice that for (due to the reporting noise). However, the strategy profile of reporting is an equilibrium because the expected value of is equal to , so in expectation for , while . That is, an agent’s expected payment is strictly maximized when she reports .

### 7.6Proof of Lemma

Since a non-surjective reporting strategy is a non-surjective function of observation , we know that takes values in a strict subset of all possible signal values. The disagreement function is linear in , so it is sufficient to show that is in expectation greater for truthful reporting than for a non-surjective reporting strategy profile. In expectation, the expression equals the probability that the reports of two rational agents disagree, which is equal to , where is the probability of agreement. The probability of agreement in a non-surjective strategy profile is equal to:

where the last term is the probability of agreement for truthful reporting. Notice that the inequality is strict because in a non-surjective strategy profile there exist and for which , and thus, . The last inequality follows from agents’ beliefs being fully mixed. Since is strictly smaller than for any non-surjective strategy profile, we conclude that the disagreement is strictly greater for truthful reporting than for any non-surjective strategy profile.
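The comparison of agreement probabilities can be checked numerically on a small example. The sketch below is illustrative only and rests on assumptions that are not part of the proof: there is no reporting noise, both agents use the same deterministic strategy, and the joint distribution `JOINT` over the two agents' signals is a hypothetical fully mixed distribution chosen for readability. The names `agreement_probability`, `JOINT`, `TRUTHFUL`, and `MERGED` are introduced here.

```python
def agreement_probability(joint, strategy):
    """Probability that two reports coincide.

    joint    -- dict mapping (signal_1, signal_2) to its probability
    strategy -- dict mapping an observed signal to the reported value
    """
    return sum(
        prob
        for (x, y), prob in joint.items()
        if strategy[x] == strategy[y]
    )


# Hypothetical fully mixed joint distribution over three signal values.
JOINT = {
    (0, 0): 0.30, (0, 1): 0.05, (0, 2): 0.05,
    (1, 0): 0.05, (1, 1): 0.20, (1, 2): 0.05,
    (2, 0): 0.05, (2, 1): 0.05, (2, 2): 0.20,
}

# Truthful reporting is the identity map on signal values.
TRUTHFUL = {0: 0, 1: 1, 2: 2}
# A non-surjective strategy: signal 2 is reported as 1, so 2 is never reported.
MERGED = {0: 0, 1: 1, 2: 1}

if __name__ == "__main__":
    p_truthful = agreement_probability(JOINT, TRUTHFUL)
    p_merged = agreement_probability(JOINT, MERGED)
    # Disagreement is one minus the agreement probability.
    print("disagreement, truthful:", 1.0 - p_truthful)
    print("disagreement, non-surjective:", 1.0 - p_merged)
```

In this example the agreement probability rises from about 0.70 to about 0.80 once two signal values are merged, because the off-diagonal mass on the merged pair now counts as agreement. Full mixing guarantees that this extra mass is strictly positive.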

### 7.7Proof of Theorem

Consider a reporting strategy in which agents are honest whenever is such that truthful reporting is a strict Bayesian-Nash equilibrium of PTS (by Proposition ?, such always exists), and otherwise they use an equilibrium non-surjective strategy profile (which by Proposition ? always exists). We use the result that the UCB1 algorithm is expected to pull a suboptimal arm $O(\log N)$ times, where $N$ is the total number of pulls [4]. By Lemma ?, the representative of a truthful reporting region is an optimal arm, while the representative of a non-truthful region is a suboptimal arm. Furthermore, the number of pulls in our case equals the total number of agents $n$ divided by the number of agents per time period. Since the latter is a fixed parameter, the expected number of lying agents is of the order $O(\log n)$.
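The logarithmic growth of suboptimal pulls under UCB1 [4] can be observed in a small simulation. This is a minimal sketch under stated assumptions, not the AdaPTS mechanism itself: the two arms are plain Bernoulli reward sources with hypothetical means 0.8 and 0.4, standing in for a truthful and a non-truthful region representative, and the function name `ucb1` and its parameters are introduced here for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms and return the number of pulls per arm."""
    rng = random.Random(seed)
    num_arms = len(arm_means)
    pulls = [0] * num_arms
    totals = [0.0] * num_arms
    for t in range(1, horizon + 1):
        if t <= num_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence index:
            # empirical mean plus sqrt(2 ln t / number of pulls).
            arm = max(
                range(num_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


if __name__ == "__main__":
    # Arm 0 is optimal; arm 1 is suboptimal (hypothetical means).
    for horizon in (100, 1000, 10000):
        pulls = ucb1([0.8, 0.4], horizon)
        print(horizon, pulls[1], pulls[1] / math.log(horizon))
```

If the count of suboptimal pulls grows logarithmically, the ratio in the last printed column should stay bounded as the horizon increases, while the horizon itself grows by a factor of ten per row.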

### Footnotes

1. As, for example, in rating a restaurant or a hotel on TripAdvisor (www.tripadvisor.com).
2. For example, [22] show that one cannot easily relax the common prior condition when agents’ private information is real-valued.
3. Furthermore, notice that by setting proportional to , we can bound PTS payments so that they take values in .
4. The condition is typically defined for equal to the prior [14], but we generalize it here.
5. The most basic -armed bandit problem is defined by random variables , where represents the arm of a bandit (gambling machine) and represents the reward (feedback) obtained by pulling the arm at time step . The goal is to maximize the total reward by sampling one arm at each time step.

### References

1. Agrawal, R. 1995. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem.
2. Audibert, J.-Y., and Bubeck, S. 2010. Regret bounds and minimax policies under partial monitoring.
3. Audibert, J.-Y.; Munos, R.; and Szepesvári, C. 2009. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits.
4. Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem.
5. Dasgupta, A., and Ghosh, A. 2013. Crowdsourced judgement elicitation with endogenous proficiency.
6. Faltings, B., and Radanovic, G. 2017. Game Theory for Data Science: Eliciting Truthful Information.
7. Faltings, B.; Pu, P.; Tran, B. D.; and Jurca, R. 2014. Incentives to counter bias in human computation.
8. Faltings, B.; Jurca, R.; and Radanovic, G. 2017. Peer truth serum: Incentives for crowdsourcing measurements and opinions.
9. Frongillo, R., and Witkowski, J. 2016. A geometric method to construct minimal peer prediction mechanisms.
10. Gao, A.; Wright, J. R.; and Leyton-Brown, K. 2016. Incentivizing evaluation via limited access to ground truth: Peer-prediction makes things worse.
11. Garcin, F., and Faltings, B. 2014. Swissnoise: Online polls with game-theoretic incentives.
12. Garivier, A., and Cappé, O. 2011. The KL-UCB algorithm for bounded stochastic bandits and beyond.
13. Jurca, R., and Faltings, B. 2008. Incentives for expressing opinions in online polls.
14. Jurca, R., and Faltings, B. 2011. Incentives for answering hypothetical questions.
15. Kamble, V.; Shah, N.; Marn, D.; Parekh, A.; and Ramachandran, K. 2015. Truth serums for massively crowdsourced evaluation tasks.
16. Kong, Y., and Schoenebeck, G. 2016. Equilibrium selection in information elicitation without verification via information monotonicity.
17. Lai, T. L., and Robbins, H. 1985. Asymptotically efficient adaptive allocation rules.
18. Liu, Y., and Chen, Y. 2016. Learning to incentivize: Eliciting effort via output agreement.
19. Miller, N.; Resnick, P.; and Zeckhauser, R. 2005. Eliciting informative feedback: The peer-prediction method.
20. Prelec, D. 2004. A Bayesian truth serum for subjective data.
21. Radanovic, G., and Faltings, B. 2013. A robust Bayesian truth serum for non-binary signals.
22. Radanovic, G., and Faltings, B. 2014. Incentives for truthful information elicitation of continuous signals.
23. Radanovic, G.; Faltings, B.; and Jurca, R. 2016. Incentives for effort in crowdsourcing using the peer truth serum.
24. Shnayder, V.; Agarwal, A.; Frongillo, R.; and Parkes, D. C. 2016a. Informed truthfulness in multi-task peer prediction.
25. Shnayder, V.; Agarwal, A.; Frongillo, R.; and Parkes, D. C. 2016b. Measuring performance of peer prediction mechanisms using replicator dynamics.
26. Waggoner, B., and Chen, Y. 2014. Output agreement mechanisms and common knowledge.
27. Witkowski, J., and Parkes, D. C. 2012a. Peer prediction without a common prior.
28. Witkowski, J., and Parkes, D. C. 2012b. A robust Bayesian truth serum for small populations.
29. Witkowski, J. 2014. Robust Peer Prediction Mechanisms.