# Sybil-proof Mechanisms in Query Incentive Networks

WEI CHEN, Microsoft Research Asia, weic@microsoft.com; YAJUN WANG, Microsoft Research Asia, yajunw@microsoft.com; DONGXIAO YU, The University of Hong Kong, dxyu@cs.hku.hk; LI ZHANG, Microsoft Research Silicon Valley Lab, lzha@microsoft.com

###### Abstract

In this paper, we study incentive mechanisms for retrieving information from networked agents. Following the model in [Kleinberg and Raghavan (2005)], the agents are represented as nodes in an infinite tree, which is generated by a random branching process. A query is issued by the root, and each node possesses an answer independently with probability $1/n$. Further, each node in the tree acts strategically to maximize its own payoff. In order to encourage the agents to participate in the information acquisition process, an incentive mechanism is needed to reward agents who provide the information as well as agents who help to facilitate such acquisition.

We focus on designing efficient sybil-proof incentive mechanisms, i.e., mechanisms that are robust to fake-identity attacks. We propose a family of mechanisms, called the direct referral (DR) mechanisms, which allocate most of the reward to the information holder and its direct parent (or direct referral). We show that, when designed properly, the direct referral mechanism is sybil-proof and efficient. In particular, we show that we can achieve an expected cost of $O(h^2)$ for propagating the query down $h$ levels for any branching factor $b > 1$. This result is an exponential improvement over previous work when the answer must be found with high probability. When the underlying network is a deterministic chain, our mechanism is optimal under some mild assumptions. In addition, due to its simple reward structure, the DR mechanism may have a good chance of being adopted in practice.

Keywords: query incentive networks; query incentive mechanisms; sybil-proof mechanisms; branching processes

Categories and Subject Descriptors: G.2 [Mathematics of Computing]: Discrete Mathematics; G.3 [Mathematics of Computing]: Probability and Statistics; F.2.0 [Analysis of Algorithms and Problem Complexity]: General; J.4 [Social and Behavioral Sciences]: Economics

General Terms: Economics, Theory

Publication date: June 2013.

## 1 Introduction

Many information systems, e.g., peer-to-peer networks or social networks, are designed such that queries are answered by networked agents instead of a centralized authority. In such a system, the query propagates in the network with the hope that it will eventually reach some agents that hold (and return) an answer. For such query models, it is important to design an incentive mechanism that encourages both the propagation of the query and the return of the answer. In addition, we would like the mechanism to be efficient, with low expected cost for the root, and sybil-proof, discouraging agents from disrupting or delaying the query process by producing fake identities. In this paper, we present a family of mechanisms that achieve these two goals simultaneously.

We mainly follow the query incentive network model introduced by Kleinberg and Raghavan (2005). Under the model, each agent is represented as a node in a fixed infinite $d$-ary tree. A query is issued by the root, and each node in the tree may have an answer with a fixed probability $1/n$. Information, whether query or answer, propagates along the edges of the tree that are “turned on” according to a random branching process. Each node, with a local view of the tree, decides whether to propagate the query to its children and whether to forward an answer back to its parent. The nodes are self-interested and risk-neutral, so they choose the actions that maximize their expected payoff. In the model, there is a fixed unit cost for forwarding a selected answer along each edge. On the other hand, propagating the query is free.

In [Kleinberg and Raghavan (2005)], the authors considered incentive mechanisms in the form of fixed-payment contracts, where each node offers its children a fixed reward, which is in turn a part of the reward promised by its parent, on the condition that the child propagates back an answer that is accepted by the root. The authors obtain the lowest possible cost (or reward) needed for such mechanisms to retrieve an answer with constant probability. The cost depends on the rarity of the answer, $n$, and the branching factor $b$, the expected number of edges to children that are “turned on” by the random process. The paper showed a phase transition phenomenon at $b = 2$: when $b > 2$, the mechanism achieves a low cost of $O(\log n)$; when $1 < b < 2$, the cost explodes to $n^{\Theta(1)}$, which is exponential in the number of levels that need to be explored.

The work of Kleinberg and Raghavan (2005) motivated many subsequent works. In particular, Arcaute et al. (2007) extended the results to more general random processes and observed a similar phase transition phenomenon. Cebrian et al. (2012) analyzed a different type of mechanism, called the split contract, which models a successful scheme used by the winner [Pickard et al. (2011)] of the DARPA Network Challenge (also known as the Red Balloon Challenge) [DARPA (2009)]. Roughly speaking, in the split contract mechanism, the answer holder receives the specified reward, while each “referral” on the path to the root receives a fraction of its child’s reward. Cebrian et al. (2012) showed that the split contract can achieve low cost even when $b < 2$. However, unlike the fixed-payment contract scheme, the split contract is not sybil-proof, as an agent can produce fake identities in the tree to obtain a higher expected payoff.

In our paper, we propose a new family of mechanisms which distribute most of the reward to the agent who provides the answer and to its direct referral, i.e., its parent. We call such query incentive mechanisms Direct Referral (DR) mechanisms. We show that the direct referral mechanism, when designed properly, discourages sybils, i.e., it is not in the agents’ interest to create fake identities, and also achieves a low cost to retrieve an answer for any $b > 1$. In particular, the scheme still has a low cost when the success probability is close to , e.g., with probability , where is the extinction probability of the branching process. Both fixed-payment contracts and split contracts incur high cost when the success probability approaches .

The following two theorems summarize the main results.

Theorem. If the underlying branching process is a deterministic chain, there exists a sybil-proof direct referral incentive mechanism with expected cost , where $h$ is the level down to which the root wants to propagate the query and $n$ is the answer rarity. Furthermore, the direct referral mechanism is optimal on the chain among all the sybil-proof query incentive mechanisms which satisfy some mild assumptions (Section 4.3).

Theorem. For any branching process with branching factor $b > 1$, there is a constant such that for any answer rarity $n$ with , there exists a sybil-proof direct referral query incentive mechanism with expected cost , where $h$ is the level down to which the root wants to propagate the query.

Notice that the above bound holds for any $b > 1$, so the direct referral mechanism, compared to the fixed-payment contract mechanism, can achieve low cost for a larger family of branching processes. In addition, the cost (when ) has a polynomial dependence on the level $h$ instead of the rarity $n$. Therefore, in the case of retrieving an answer with high probability, i.e., a probability which is close to the extinction probability of the branching process, the expected cost is still polynomial in $h$, rather than polynomial in $n$ (in this case, $n$ is exponential in $h$). In contrast, query incentive networks with either fixed-payment contracts [Arcaute et al. (2007)] or split contracts [Cebrian et al. (2012)] have cost polynomial in $n$ in the high probability case.

The direct referral mechanism has the natural structure of rewarding the answer holder and the direct referral, while the other agents receive only minimum compensation for routing the answer. Such simplicity might be highly desirable for practical adoption. Despite this simplicity, we show that the mechanism is robust, i.e., sybil-proof, and efficient. In fact, in the case of the infinite chain, we show that such a scheme is optimal among all the sybil-proof mechanisms that satisfy some mild assumptions.

Intuitively, by rewarding the direct referral, the DR mechanism encourages any agent who does not have an answer to propagate the query as there is a chance that one of its children may have an answer (and get selected by the root) so the agent can win the direct referral reward. In addition, for any agent, no matter how many sybils it creates, at most one of them receives the “direct referral” reward. Therefore, if we design the direct referral rewards such that they decrease rapidly enough as the depth increases, the potential gain of the sybils may be offset by the gap between the direct referral rewards in the two different levels. Of course, the gap cannot be too large for it would increase the expected cost for the root. Indeed, we will show that with mildly decreasing direct referral rewards, we can achieve both sybil-proofness and low cost at the same time.

### 1.1 Related Work

The model of query incentive networks was introduced by Kleinberg and Raghavan (2005), who considered a simple branching process in an underlying $d$-ary tree, i.e., each edge exists with an independent probability , with branching factor $b$. Kleinberg and Raghavan observed an interesting phase-transition phenomenon with fixed-payment contracts. Specifically, when $b > 2$, in order to retrieve the answer with constant probability, the reward that the root needs to offer is $O(\log n)$, which is asymptotically optimal. However, when $1 < b < 2$, the cost grows to $n^{\Theta(1)}$, i.e., the root needs to pay a reward that is exponentially larger than the expected distance for finding an answer in this case.

Arcaute et al. (2007) generalized the simple branching process in Kleinberg and Raghavan (2005) to an arbitrary GW branching process in which the number of children of a node is determined by a fixed offspring distribution. They observed that the phase transition phenomenon at branching factor $b = 2$ still exists in this general case. Furthermore, they also observed that when the answer must be found with high probability, e.g., with probability , where is the extinction probability of the branching process, the phase transition phenomenon vanishes. In particular, for any branching process with branching factor , to retrieve an answer with high probability, the required reward is . They also showed that in a deterministic chain ($b = 1$), the cost of the root is for finding an answer with constant probability.

Kota and Narahari (2010) analyzed the reward for such fixed-payment contracts when the degree distribution follows a power law. Dikshit and Yadati (2009) considered the quality of the answers in query incentive networks. Both models exhibit phase transition phenomena at branching factor $b = 2$ similar to those found in Arcaute et al. (2007); Kleinberg and Raghavan (2005).

Cebrian et al. (2012) presented a split contract based mechanism motivated by the success of the winning team of the DARPA challenge. In this mechanism, the root provides a reward to the answer holder. A “shortest path” based answer selection scheme is adopted in case of multiple reachable answers (see Section 2.3.1). Each node on the path to the selected answer receives a fraction of the reward received by its child. With the split contract mechanism, it is shown that the phase transition phenomenon at $b = 2$ vanishes. In particular, for any GW branching process with branching factor $b > 1$, the cost to retrieve an answer with constant probability is $O(\log n)$ in a Nash equilibrium, which is asymptotically optimal. Therefore, split contract based query incentive networks are more efficient. On the other hand, if we want to retrieve an answer with high probability, split contracts also need reward of .

In the previous studies on the query incentive networks, generating fake identities is not part of the agents’ strategy. In other words, sybil-proofness is not explicitly explored. In fact, we show that the fixed contract based mechanisms are “sybil-proof” while split contract mechanisms are clearly not.

How to prevent sybil attacks (Douceur 2002) has been studied in many aspects of computer networks. Babaioff et al. (2012) presented a sybil-proof scheme for the Bitcoin system. There are several major differences between the Bitcoin system and a query incentive network. Most notably, instead of arising from a branching process, the network in the Bitcoin system can be intentionally constructed, which gives the mechanism designer additional freedom to address sybil-proofness. Therefore, the results in Babaioff et al. (2012) cannot be directly applied to a query incentive network. They adopted the iterated removal of dominated strategies, which is a stronger solution concept than the Nash equilibrium used in this paper.

Sybil-proof mechanisms for multi-level marketing are studied in Emek et al. (2011); Drucker and Fleischer (2012). In multi-level marketing, there is a fixed cost (price) for a sybil to purchase the product. Therefore, to enforce sybil-proofness, the mechanisms try to cap the referral fees. Douceur and Moscibroda (2007) gave a sybil-proof lottery tree mechanism for motivating people to install and run a distributed service in a peer-to-peer system. The issue of sybil attacks has also appeared in many other contexts such as reputation mechanisms (Cheng and Friedman 2005), combinatorial auctions (Todo et al. 2009), social choice (Wagman and Conitzer 2008; Conitzer and Yokoo 2010), and cost-sharing games (Penna et al. 2009). One major difference between our problem (as well as that of Babaioff et al. (2012)) and these problems is that they deal with static configurations, which make sybil-proofness hard to achieve. In our results, we punish sybils by reducing their probability of winning in a probabilistic environment. In particular, our mechanism is not sybil-proof if the agents know the outcome of the environment. Such a trade-off for sybils, i.e., more reward conditioned on winning but a smaller winning probability, is not available with static inputs. Nevertheless, we believe the results in this paper may be of interest in other settings.

## 2 Query incentive networks: a normal form

In this section, we describe the problem and the model formally. In particular, we provide formal definition for the random branching process for information propagation and the various components that constitute an incentive mechanism.

### 2.1 The branching process

Following previous works, the underlying network is generated by a Galton-Watson (GW) branching process on an infinite $d$-ary tree. In the branching process, each node samples its number of children independently according to a given distribution . Afterwards, selects children, out of its $d$ children uniformly at random, and connects to them. The final tree in the branching process is the connected component containing the root. We call an agent active if and non-active otherwise.

Formally, given an offspring distribution $\{c_i\}_{i=0}^{d}$, where $\sum_{i=0}^{d} c_i = 1$ and $c_i$ is the probability of having $i$ children, define the probability generating function of the branching process as

$$\Psi(x) = \sum_{i=0}^{d} c_i x^i. \tag{1}$$

A basic parameter of the branching process is the branching factor $b$, defined as the expectation of the offspring distribution, $b = \sum_{i=0}^{d} i \, c_i$. The extinction probability of the branching process is the probability that the branching process dies out, i.e., the final tree is finite. A well-known fact is that the extinction probability equals $1$ if and only if $b < 1$, or $b = 1$ with $c_1 < 1$, and is strictly less than $1$ otherwise.

For the query issued from the root, each node in the underlying tree has an answer with an independent probability $1/n$, where $n$ represents the rarity of the answer. So in expectation, we need to reach about $n$ nodes to obtain an answer with constant probability.
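To make the branching-process parameters concrete, the following minimal sketch computes the branching factor and approximates the extinction probability for a given offspring distribution. It relies on the standard fact, not stated in the text, that the extinction probability is the smallest fixed point of $\Psi$ in $[0,1]$ and can be approached by iterating $x \leftarrow \Psi(x)$ from $x = 0$. The function names and the example distribution are hypothetical.

```python
def pgf(c, x):
    """Probability generating function Psi(x) = sum_i c[i] * x**i."""
    return sum(ci * x**i for i, ci in enumerate(c))


def branching_factor(c):
    """Expected number of children b = sum_i i * c[i]."""
    return sum(i * ci for i, ci in enumerate(c))


def extinction_probability(c, iterations=10_000):
    """Approximate the smallest fixed point of Psi in [0, 1].

    Starting from x = 0, the iterates x <- Psi(x) increase monotonically
    towards that fixed point (a standard Galton-Watson fact).
    """
    x = 0.0
    for _ in range(iterations):
        x = pgf(c, x)
    return x


# Hypothetical offspring distribution: 0, 1 or 2 children.
c = [0.25, 0.25, 0.5]
print(branching_factor(c))        # 1.25
print(extinction_probability(c))  # approaches 0.5, the smaller root of Psi(x) = x
```

For this example, $\Psi(x) = x$ has roots $0.5$ and $1$, so the process survives with probability about $0.5$.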

### 2.2 Query model

Given the above process for generating the network, the query process works as follows:

1. The root announces the incentive mechanism, which stipulates rules for selecting the answer and for rewarding the involved agents.

2. The query is propagated, starting from the root, down the tree as generated by the above branching process.

3. Each node (agent) in the tree, when receiving the query, may decide whether to continue to propagate the query, and whether, in case it has an answer, to report the answer back to its parent.

4. When an agent has an answer and/or receives reports of answers from its subtree, it chooses to report a (possibly empty) subset of the answers to its parent. After the root receives all reports, a winning answer is selected.

5. The holder of the winning answer forwards the answer to the root through the path in the tree. If any node on the path decides not to forward, no payment will be made.

6. Once the root receives the answer, the rewards are paid to the nodes according to the rule announced in Step (1).

One significant difference between our model and the previous work is that in our model the incentive mechanism is determined by the root but the previous work allows a mixture of global and local contracts. For example, in the fixed-payment contract or the split contract schemes, the type of contracts as well as the process for selecting an answer are fixed globally, but each node can decide locally how to enter into a contract with its parent or children.

In our model, the incentive mechanism announced by the root contains a global reward allocation scheme, which maps any final configuration to a set of rewards to the agents. While such a global reward scheme may seem limiting as it takes away the freedom enjoyed by the nodes in the fixed-payment contract and split contract mechanisms, it is still non-trivial to address the new challenge of sybil attacks. On the other hand, we show later in this section that it is possible to describe the equilibrium of a local contract-based scheme by a global reward allocation scheme.

### 2.3 Query incentive mechanism

We call an answer (or the agent who has the answer) reachable from the root if all the agents on the path from the root to the answer are active in the branching process and decide to propagate the query. The query incentive mechanism determines how an answer is selected when there are multiple reachable answers, and how the rewards are allocated. To prevent arbitrarily complex mechanisms, we focus on mechanisms that satisfy the following properties.

1. Complete: If there exist reachable answers, the mechanism will select one.

2. Unique: Only one answer is selected if multiple answers are presented.

3. Anonymous: Agents are not treated preferentially a priori.

Such properties have been implicit in the previous work. In addition, requiring these properties makes the analysis easier. For example, anonymity is helpful in addressing sybil-proofness since the identities of the agents can be manipulated. We now describe families of mechanisms that achieve the above properties. We divide the query incentive mechanism into two steps: the answer selection step and the reward allocation step.

#### 2.3.1 Answer selection

The answer selection step chooses one answer when multiple answers are reachable from the root. We consider two answer selection schemes, both of which have appeared in the literature and satisfy the above properties.

• Random Walk (RW): In the RW scheme, starting from the root, at each step we select one child uniformly at random from those children who have reported the existence of answers in their subtrees. We continue the random walk until we reach an answer, which is selected.

• Shortest Path (SP): Among all the reachable answers, we exclude those that are not the closest to the root. We then perform the above RW process for the remaining answers.
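The two selection schemes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tree is given as a hypothetical dictionary mapping each node to its list of children, all nodes in it are assumed reachable, and the root itself holds no answer.

```python
import random
from collections import deque


def subtree_has_answer(tree, answers, node):
    """True if `node` or any of its descendants holds an answer."""
    return node in answers or any(
        subtree_has_answer(tree, answers, child) for child in tree.get(node, [])
    )


def random_walk_select(tree, answers, root, rng):
    """RW: walk down from the root, choosing uniformly among the children whose
    subtree contains an answer, and stop at the first answer holder reached."""
    node = root
    while True:
        candidates = [
            child for child in tree.get(node, [])
            if subtree_has_answer(tree, answers, child)
        ]
        if not candidates:
            return None  # no reachable answer
        node = rng.choice(candidates)
        if node in answers:
            return node


def shortest_path_select(tree, answers, root, rng):
    """SP: keep only the answers closest to the root, then run RW on them."""
    depth = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search to get every node's depth
        node = queue.popleft()
        for child in tree.get(node, []):
            depth[child] = depth[node] + 1
            queue.append(child)
    reachable = [a for a in answers if a in depth]
    if not reachable:
        return None
    best = min(depth[a] for a in reachable)
    closest = {a for a in reachable if depth[a] == best}
    return random_walk_select(tree, closest, root, rng)


# Hypothetical example: one answer at depth 2 and one at depth 3.
tree = {"root": ["a", "b"], "a": ["a1"], "b": ["b1"], "b1": ["b2"]}
answers = {"a1", "b2"}
rng = random.Random(0)
print(shortest_path_select(tree, answers, "root", rng))  # a1, the unique closest answer
print(random_walk_select(tree, answers, "root", rng))    # a1 or b2, each with probability 1/2
```

In this example SP always returns the depth-2 answer, while RW picks either branch at the root with equal probability.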

Both schemes have a very natural interpretation. In the RW scheme, after the formation of the network, each node that has an answer reports its answer to its parent. If a node that does not have an answer itself receives reports of multiple answers after hearing from all of its children, it randomly selects one and reports it to its parent. In particular, if a node has an answer, it will not report answers from its subtree. The process continues until the root selects one of the answers reported to it uniformly at random.

The SP scheme, on the other hand, can be viewed as the RW scheme for impatient agents. A node that has an answer reports back to its parent as soon as it receives the query. A node that does not have an answer reports back immediately when it receives reports of answers. If multiple answers are reported simultaneously, it selects one uniformly at random. However, unlike in the RW scheme, the node does not wait for the responses from all of its children. We will use the SP answer selection scheme in our mechanisms. (Footnote 1: The reporting strategy described above is only for interpretation. In a Nash equilibrium, we assume an agent will report all answers it is aware of. The tie-breaking is handled by the root after receiving all reports.)

#### 2.3.2 Reward allocation

Once an answer is selected, the reward allocation step determines how the rewards are assigned to the nodes on the answer path $P$, i.e., the path from the root to the selected answer. To achieve anonymity, we require the reward allocation scheme to be oblivious.

Definition (Oblivious Reward Allocation Scheme). A reward allocation scheme $f$ maps any particular answer path $P$ in a resulting branching process $T_r$ to a set of payments to the agents in $P$:

$$f : T_r,\ P \in T_r \;\to\; [1,\infty]^{|P|}.$$

The reward scheme is oblivious if, for all $(T_r, P)$ and $(T'_r, P')$ such that $|P| = |P'|$, we have

$$f(T_r, P) = f(T'_r, P').$$

For a node $u \in P$, we denote its reward as $f_u(P)$. In particular, an oblivious reward allocation scheme only cares about the length of the answer path. It does not consider the identities of the agents in the path or the structure of the resulting branching process. Although this is a restriction on the space of reward allocations, oblivious reward allocation schemes are convenient in the case of sybil attacks, since both the identities and the structure of the branching process can be manipulated unexpectedly. In fact, we will show that the equilibria studied in the fixed-payment contract and split contract based query incentive networks imply oblivious reward allocation schemes.

Remark: In our model, we assume there is no cost for the agents. To avoid trivial reward allocations, we require that all rewards be normalized to be at least one. This is slightly different from the previous literature on query incentive networks, where a fixed unit cost is assumed for forwarding the final selected answer. We choose the reward normalization instead of the forwarding cost to avoid defining the answer forwarding cost for sybils. Nevertheless, it is straightforward to generalize our results to the unit forwarding cost case as long as the cost for sybils is properly defined.

For a query incentive mechanism with an oblivious reward allocation scheme, the expected cost of the mechanism is defined as

$$\mathbb{E}_P\Big[\sum_{u \in P} f_u(P)\Big],$$

where $P$ is the answer path selected and $f$ is the reward allocation scheme. If there is no reachable answer, $P$ is empty. The expectation is taken over the randomness of the branching process, the answer distribution and the answer selection scheme.
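As a worked instance of this definition, the sketch below evaluates the expected cost exactly in the simplest setting. It assumes a deterministic chain, agents that propagate the query down to a cutoff level `h`, an answer probability of `1/n` per node, and an answer path consisting of the non-root agents at levels 1 up to the first answer holder. The reward scheme `unit_rewards` is a hypothetical placeholder that pays every agent on the path the minimum normalized reward of one; it is not the DR mechanism.

```python
def expected_cost_on_chain(reward_scheme, n, h):
    """Exact expected cost of an oblivious reward scheme on a deterministic chain.

    reward_scheme(length) returns the list of rewards paid to the `length`
    agents on an answer path of that length (oblivious: it depends on the
    length only). The first answer sits at level i with probability
    (1 - 1/n)**(i - 1) * (1/n); if no answer exists within h levels,
    nothing is paid.
    """
    total = 0.0
    no_answer_so_far = 1.0
    for length in range(1, h + 1):
        p_first_here = no_answer_so_far * (1.0 / n)
        rewards = reward_scheme(length)
        # Rewards are normalized to be at least one, one entry per path node.
        assert len(rewards) == length and all(r >= 1 for r in rewards)
        total += p_first_here * sum(rewards)
        no_answer_so_far *= 1.0 - 1.0 / n
    return total


def unit_rewards(length):
    """Hypothetical placeholder scheme: every agent on the path receives 1."""
    return [1.0] * length


print(expected_cost_on_chain(unit_rewards, n=2, h=2))  # 0.5 * 1 + 0.25 * 2 = 1.0
```

Any other oblivious scheme can be plugged in as `reward_scheme`, since by definition it is a function of the path length alone.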

### 2.4 Sybil attack

Once the incentive mechanism is announced, an agent which is reachable from the root can choose the action to maximize its (expected) payoff. The agent can choose to propagate the query or not. When it has answers either from itself or reported from its children, it can choose to report a (possibly empty) subset of them to its parent. One important action we consider is the creation of fake identities (or sybil attack). We allow the following type of sybil attacks.

• Tree augmentation with sybils. One agent is allowed to generate a possibly infinite tree of sybils attached to its parent. Its original children can be attached to one particular sybil in the tree. (Footnote 2: It would be interesting to consider the case that different children can be attached to different sybils in the tree. Our analysis unfortunately cannot handle this case.) The reward of the agent is the total reward received by all its sybils.

• Answer placement. If the agent has an answer, it is allowed to place its answer at any subset of the sybils.

• Decision timeframe. We always assume an agent knows whether it has an answer or not before its strategic decision. We can also assume the action taken by the agent is conditioned on the event that it is active in the branching process.

With the above definition of sybil attacks, we call a query incentive mechanism sybil-proof if the strategy profile in which no agent generates any sybil is a Nash equilibrium.

Definition (Sybil-proof query incentive mechanisms). A query incentive mechanism, consisting of an answer selection scheme and a reward allocation scheme, is sybil-proof with level $h$ if the following strategy profile is a Nash equilibrium:

• All agents in the underlying $d$-ary tree in levels $1$ to $h-1$: If the agent has an answer, it directly reports back to its parent and does not propagate the query. Otherwise, the agent chooses to propagate the query to its children. After it receives reports of answers from the children, it reports all answers to its parent. It chooses to forward the answer to its parent if one answer is selected from its subtree.

• The agents at level $h$: If the agent has an answer, it directly reports back to its parent. If the answer is selected, it chooses to forward the answer to its parent. The agent will not propagate the query further.

Although we assume the decision of the agent is conditioned on the event that it is active in the branching process, such a condition is in fact not necessary, i.e., the agent should take the same strategy regardless of whether it is active. This is easy to see: if the agent is not active, all strategies have utility zero. In what follows, we will analyze our mechanisms without conditioning on the activeness of the agents.

### 2.5 Formulating contract-based mechanisms by global reward allocation schemes

To illustrate the connection between the global reward allocation scheme studied in the paper and the previous work, we describe how one would formulate an equilibrium of a contract-based mechanism as a global reward allocation scheme. We will mainly discuss the case for the fixed-payment contracts. The case for the split contracts is similar.

In a fixed-payment contract query incentive network, the root provides a total reward for an answer of the query. In particular, the root enters a contract with its child , i.e., if one answer from is selected, the root will pay to . Then the query is propagated down the tree, during which each node determines the reward to its children. Finally, if there are multiple answers reported, the root will select an answer using the RW answer selection scheme. The agents on the path from the root to the selected answer holder are paid the rewards determined by the fixed-payment contracts. In particular, the utility of an agent in the path is , where is the reward offered by its parent and is the reward offered to its children. If is holding the selected answer, . Notice that all agents in the path except the root will pay a unit cost to forward the answer.

The strategy of each participating node is a function , i.e., if the offer from its parent is , it will offer a reward of to its children. Then an equilibrium in the query incentive network is defined by the set of functions for all participating nodes. (For simplicity, we omit the discussion on the strategy that the nodes decide to participate in this section.)

For any given equilibrium in the fixed-payment contract query incentive network, we can construct a reward allocation scheme as follows. Let be the sequence of nodes in the path from the root (excluded) to the selected answer holder. We will reward node by an amount of , where is the total reward offered by the root. (If , the first node receives a reward of .) Clearly, this is a valid reward allocation scheme. Furthermore, if the equilibrium is symmetric, i.e., all nodes in the -th level play the same strategy , the corresponding reward allocation scheme is in fact oblivious.

For an equilibrium in the split contract query incentive network, we can construct such a reward allocation scheme as well. However, it is clear that such an equilibrium does not lead to a sybil-proof query incentive mechanism. Consider the case of a single chain with branching factor $b = 1$. The first agent who has the answer can create a sybil and sign a split contract with ratio with it. In this way, the agent gets all of the initial reward, which is much more than its fair share in the equilibrium.

For query incentive networks with fixed-payment contracts, it is shown that there exists a best-interest Nash equilibrium (Cebrian et al. 2012) with a unique strategy function for all participating agents (Kleinberg and Raghavan 2005). Notice that if an agent has an answer, it will report truthfully and pocket all the reward offered, because creating sybils will not increase the total reward in this case while it could reduce the probability that its answer is selected in either the RW or the SP scheme. In fact, we have the following result.

Theorem. The best-interest Nash equilibrium in the fixed-payment contract query incentive network with the RW answer selection scheme defines a sybil-proof query incentive mechanism.

Proof. Let the best-interest Nash equilibrium in the fixed-payment contracts be with initial reward . We have shown that if an agent has the answer, it has no incentive to attack with sybils. It is therefore sufficient to consider agents that do not have answers. Now assume the reward allocation scheme defined by is not sybil-proof, i.e., one agent at level $i$ has an incentive to attack with $k$ additional sybils between itself and its children in the normal form.

Correspondingly, consider the original game with fixed-payment contracts. We show that the same agent at level $i$ would benefit by not following in the equilibrium. In particular, on receiving offer $r_{i-1}$, the agent will offer its children $r_{i+k}$ instead of $r_i$, where is the function to iteratively apply for times.

In both cases, i.e., the normal form with sybils and the fixed-payment contract game with offer , all the (real) descendants will react as if they are lower in the tree. Notice that the two trees in the two cases are slightly different. However, the RW answer selection scheme will not be impacted by the sybils. Therefore, for the attacking agent, the probabilities that it is on the answer path are the same in both cases.

Let $p_1$ and $p_2$ be the probabilities that the agent is on the answer path if it does not attack and if it does attack with sybils, respectively. With the RW answer selection scheme, we have $p_2 \le p_1$. By the assumption of non-sybil-proofness in the normal form without forwarding cost, we have

$$p_2 \cdot (r_{i-1} - r_{i+k}) > p_1 \cdot (r_{i-1} - r_i). \tag{2}$$

Notice that in the fixed-payment contract game, the expected utility if this agent plays truthfully is $p_1 \cdot (r_{i-1} - r_i - 1)$. The expected utility if this agent deviates as described above is $p_2 \cdot (r_{i-1} - r_{i+k} - 1)$. Since $p_2 \le p_1$, by Eqn. (2), we have

$$p_2 \cdot (r_{i-1} - r_{i+k} - 1) > p_1 \cdot (r_{i-1} - r_i) - p_2 \ge p_1 \cdot (r_{i-1} - r_i - 1).$$

In other words, the assumed strategy profile is not an equilibrium of the game with fixed-payment contracts, which is a contradiction.

Remark: We showed the sybil-proofness with the RW answer selection scheme. Our proof does not directly apply to the SP answer selection scheme, which we leave as an interesting problem.

Although fixed-payment contracts already imply a sybil-proof query incentive mechanism, it is not cost-effective in the case of $1 < b < 2$ (Kleinberg and Raghavan 2005) or if one wants to find an answer with high probability (Arcaute et al. 2007). In the rest of the paper, we will propose a new mechanism that is more cost-effective in these two cases.

## 3 Technical preparations

One main complexity of our analysis comes from the branching process. In this section, we summarize and develop some technical results regarding the branching process which will be needed in the analysis.

Define as the probability that there is no answer in the nodes of the first levels of the branching process. Then the probability that the first answer in our branching process is at level is , with .

Some crucial properties of the sequence of are summarized in Lemma 3. One essential property is that is single-peaked: it first increases approximately geometrically with a constant ratio. Then it stays at nearly maximum value for a constant number of levels until it starts to decrease geometrically. This property might be of independent interest.

{lemma}

Consider a branching process with branching factor . There exist levels such that

1. is a single-peaked sequence, peaking at level , i.e., , and , .

2. There exists constant , such that , .

3. and consequently .

4. There exists constant , such that , .

Following the analysis in the literature, we fix the branching factor as a constant but we allow to grow.

Define function

$$t(x)=\sum_{j=0}^{d} c_j\, x^j \left(1-\frac{1}{n}\right)^{j}.$$

Notice that . This is because, if the root has children, then the event that there is no answer at the first levels (with probability ) is the same as the event that none of its children has the answer (with probability ) and none of the level subtrees rooted at its children has the answer (with probability ). The following result studies the growth rate of .
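The recurrence described above can be explored numerically. The following is a minimal sketch, not part of the original analysis: it assumes $\phi_0 = 1$ (no levels searched yet), reads $\lambda_i$ as $\phi_{i-1}-\phi_i$, and takes a hypothetical offspring distribution `c`, where `c[j]` is the probability that a node has `j` children. All function names are illustrative.

```python
def no_answer_probs(c, n, h):
    """Return [phi_0, ..., phi_h] using phi_i = t(phi_{i-1}).

    c[j] is the (assumed) probability that a node has j children,
    1/n is the probability that a node holds an answer,
    and phi_0 = 1 is an assumed starting value.
    """
    q = 1.0 - 1.0 / n

    def t(x):
        # t(x) = sum_j c_j * x^j * (1 - 1/n)^j
        return sum(cj * (x * q) ** j for j, cj in enumerate(c))

    phi = [1.0]
    for _ in range(h):
        phi.append(t(phi[-1]))
    return phi


def first_answer_probs(phi):
    """Return [lambda_1, ..., lambda_h], assuming lambda_i = phi_{i-1} - phi_i."""
    return [phi[i - 1] - phi[i] for i in range(1, len(phi))]


def is_single_peaked(seq, tol=1e-15):
    """Check that seq is non-decreasing up to its maximum and non-increasing after."""
    peak = max(range(len(seq)), key=seq.__getitem__)
    rising = all(seq[k] <= seq[k + 1] + tol for k in range(peak))
    falling = all(seq[k] + tol >= seq[k + 1] for k in range(peak, len(seq) - 1))
    return rising and falling


if __name__ == "__main__":
    # Hypothetical example: every node has exactly two children, n = 1000.
    phi = no_answer_probs([0.0, 0.0, 1.0], n=1000, h=25)
    lam = first_answer_probs(phi)
    print("single-peaked:", is_single_peaked(lam))
```

On such inputs the sequence of first-answer probabilities can be inspected level by level, which gives a concrete feel for the growth and decay phases discussed in the lemma that follows.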

{proposition}

For all ,

$$\frac{\lambda_{i+1}}{\lambda_i}\in\left[t'(\phi_i),\,t'(\phi_{i-1})\right]. \tag{3}$$
{proof}

Notice that . Therefore, As is continuous, by mean value theorem, there exists such that . Finally, the result comes by the fact that is a monotonically increasing function for .

We will also utilize the following two results from previous works.

{lemma}

[Arcaute et al. (2007)] Given constant , , we have .

{lemma}

[Cebrian et al. (2012)] Consider any GW branching process with branching factor . Then, for every such that , it holds that

$$\frac{1-\phi_i}{\lambda_{i+1}}\le\max\left\{\frac{1}{b-1},\ \frac{1}{\phi_i-\zeta}\cdot\frac{1}{1-\Psi'(\zeta)}\right\}, \tag{4}$$

where $\Psi$ is the generating function of the branching process and $\zeta$ is the extinction probability.

Now we are ready to prove the main result in this section.

{proof}

[of Lemma 3] When , notice that is a strictly decreasing sequence while is increasing for . Proposition 3 indicates that the ratio of is decreasing, which implies that is a single-peaked sequence. This proves (1) of Lemma 3.

Now we proceed to the second property. Let be some constant we will specify later. Define .

By Proposition 3, we have . Let

$$\rho=\left(1-\frac{1}{n}\right)\cdot b\cdot(1-5\epsilon d)>1, \tag{5}$$

which holds for sufficiently small . By Lemma 3 and the definition of , we have for any . Therefore, property (2) holds by setting .

For the third property, we show that for some carefully chosen . Since , . Let be the extinction probability of the branching process. Assume

 0<ϵ<1−ζ. (6)

By definition of , we have and . Define . Notice that is non-decreasing for . By Lemma 3, we have

$$\frac{1-\phi_{\ell(\epsilon)}}{\lambda_{\ell(\epsilon)+1}}\le c\left(1-\phi_{\ell(\epsilon)}\right)\le c(\epsilon). \tag{7}$$

Since , we have

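$$\lambda_{\ell(\epsilon)+1}\ge\frac{1-\phi_{\ell(\epsilon)}}{c(\epsilon)}=\frac{1-\phi_{\ell(\epsilon)+1}-\lambda_{\ell(\epsilon)+1}}{c(\epsilon)}>\frac{\epsilon-\lambda_{\ell(\epsilon)+1}}{c(\epsilon)}.$$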

Therefore, , as both and are constants independent of . In other words, for any constant satisfies Eqn. (5) and Eqn. (6), we can set and both property (2) and (3) hold. Since is growing from to , we must have . (The total probability is at most .)

Finally, we consider property (4) regarding the sequence of after . By the definition of and Proposition 3,

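$$t'(\phi_{\ell^*+1})\le\frac{\lambda_{\ell^*+2}}{\lambda_{\ell^*+1}}\le t'(\phi_{\ell^*})\le\frac{\lambda_{\ell^*+1}}{\lambda_{\ell^*}}<1. \tag{8}$$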

If we can assume is a constant less than , we are done. However, it is not clear whether is bounded away from or not. (For example, if , we cannot directly conclude on property (4). ) We consider two cases:

(1) . This case is trivial, since for , by the monotonicity of and Proposition 3. Therefore, property (4) holds with .

(2) . This case implies . Therefore, .

Notice that for any constant , . ( implies .) Hence, in case (2), we have .

Because is continuous, by the mean value theorem, there exists , such that

$$t'(\phi_{\ell^*})-t'(\phi_{\ell^*+1})=t''(y)\cdot(\phi_{\ell^*}-\phi_{\ell^*+1})\ge t''(\phi_{\ell^*+1})\cdot\lambda_{\ell^*+1}=\Omega(1).$$

Now since by Eqn. (8), we can conclude is bounded away from , i.e., is a constant smaller than . Then for any , we have . By setting , we prove property (4).

## 4 Optimal sybil-proof DR mechanism on chains

In this section, we discuss the case that the underlying tree is simply an infinite chain. Each node in the chain has an answer with an independent probability . We design a direct referral mechanism which is sybil-proof in this case. We also show that, in fact, the DR mechanism is optimal, up to some mild assumptions.

### 4.1 The direct referral reward scheme

Let be the level of the chain that we want to propagate the query to. Notice that any oblivious reward scheme can be written as a function for , and , where is the reward to the -th agent on an answer path with length , i.e. the first answer appears at level . Since we discuss DR mechanisms, we have for any and .

Let be the probability that one agent has an answer. Define , i.e., is the expected reward of the -th agent conditioned on the event that the first agents do not have any answer. Let be the probability that there is an answer in consecutive nodes. We consider the following reward allocation scheme in the direct referral mechanism.

{definition}

DR reward allocation scheme on chains. Define as:

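$$r(i,s)=\begin{cases} n\cdot R_{i+1}+P_{h-i-1}, & \text{if } i\le h-1 \wedge s=1,\\ \sum_{t=i}^{h-1} r(t,1)+1, & \text{if } 1\le i\le h \wedge s=0,\\ 1, & \text{if } i+s\le h \wedge s>1,\\ 0, & \text{otherwise.} \end{cases} \tag{9}$$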

Notice that . Therefore, by definition of , we have

$$R_i=R_{i+1}+P_{h-i-1}. \tag{10}$$

In what follows, we show several properties of the direct referral mechanism: (a) it is sybil-proof; (b) the expected cost is ; and (c) it is the optimal sybil-proof mechanism, i.e., it has the optimal cost.

To show the sybil-proofness, we only need to prove that the following strategy profile is a Nash equilibrium: for each node with distance less than from the root, if does not have an answer, the strategy of is to propagate the query and does not create sybils; if it has an answer, the strategy of is to report the answer without creating sybils. For the node at distance , the node will return the answer if it has one; otherwise, it will not propagate further and will not create sybils.

{lemma}

The DR mechanism with above defined reward allocation scheme is sybil-proof on chains. {proof} For the node with distance smaller than from the root, if it does not hold an answer, the expected reward for propagating the query is always larger than 0. Thus, it will choose to propagate. If has an answer, does not need to propagate the query, since it can fake any possible path from itself to a node. Furthermore, since the DR scheme only rewards nodes of the first levels, the node at depth from the root will not propagate the query.

Based on the above discussion, to establish sybil-proofness we only need to rule out strategies in which a node creates sybils. For the node at distance from the root, it is not beneficial to generate fake nodes as the scheme only rewards the first nodes in the chain. So it is sufficient to consider the nodes with distance less than to the root.

Consider an agent with distance to the root. We first assume that does not have an answer. Denote as the expected reward of if duplicates additional sybils conditioned on the event that no node from the root to has an answer. Then we have

$$\begin{aligned} D(i,k) &= R_{i+k}+k\cdot P_{h-i-k} = R_{i+k-1}-P_{h-i-k}+k\cdot P_{h-i-k}\\ &\le R_{i+k-1}+(k-1)\cdot P_{h-i-k+1} = D(i,k-1). \end{aligned}$$

The second equality is by Eqn. (10). One can then inductively show . In other words, node will not benefit by generating sybils if it does not have an answer.

Next we consider the case that has an answer and all previous nodes do not, i.e., is the first answer holder. Assume that generates sybils, where . Since the DR scheme only rewards the first answer, will place the answer at the last sybil. (Otherwise, it may just generate fewer sybils.) Then the reward gets in this case is

$$\begin{aligned} \sum_{s=0}^{k-1} r(i+s,k-s)+r(i+k,0) &\le \sum_{s=0}^{k-1} r(i+s,1)+r(i+k,0)\\ &= \sum_{s=0}^{k-1} r(i+s,1)+\sum_{s=i+k}^{h-1} r(s,1)+1 = \sum_{t=i}^{h-1} r(t,1)+1 = r(i,0). \end{aligned}$$

The inequality comes from the fact that . By the above inequality, gets the highest reward without duplication. Combining all above together, we have shown that the DR mechanism is sybil-proof.
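The monotonicity of $D(i,k)$ in $k$ used in the proof can be checked numerically. The sketch below is illustrative only: it builds $R_i$ backwards from Eqn. (10), takes $P_k = 1-(1-1/n)^k$ as the probability that $k$ consecutive nodes contain an answer, and treats the boundary value $R_h$ as an assumed parameter `r_h` (the inequality does not depend on it). The function names are hypothetical.

```python
def answer_prob(n, k):
    """P_k: probability that at least one of k consecutive nodes holds an answer."""
    return 1.0 - (1.0 - 1.0 / n) ** k


def expected_rewards(n, h, r_h=0.0):
    """Return [R_0, ..., R_h] built backwards from Eqn. (10).

    R_h = r_h is an assumed boundary value, not taken from the paper.
    """
    R = [0.0] * (h + 1)
    R[h] = r_h
    for i in range(h - 1, -1, -1):
        R[i] = R[i + 1] + answer_prob(n, h - i - 1)
    return R


def sybil_payoff(R, n, h, i, k):
    """D(i, k) = R_{i+k} + k * P_{h-i-k}, defined for i + k <= h."""
    return R[i + k] + k * answer_prob(n, h - i - k)


def sybils_never_help(n, h, r_h=0.0, tol=1e-12):
    """Check D(i, k) <= D(i, k-1) for every level i < h and every feasible k >= 1."""
    R = expected_rewards(n, h, r_h)
    return all(
        sybil_payoff(R, n, h, i, k) <= sybil_payoff(R, n, h, i, k - 1) + tol
        for i in range(1, h)
        for k in range(1, h - i + 1)
    )


if __name__ == "__main__":
    print(sybils_never_help(n=50, h=20))
```

Since $D(i,0)=R_i$, a `True` result means that, for the chosen parameters, no number of sybils raises the expected reward of an agent without an answer.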

### 4.2 Efficiency of the DR mechanism

Next we upper bound the expected total reward of the DR mechanism on chains. We first compute the values of in the following lemma. {proposition} For , . {proof} By Eqn. (10), , which implies . By definition, . The result on directly follows.

{lemma}

The expected cost of the DR scheme is . {proof} We first consider the upper bound. Clearly, the cost is dominated by the s, which is at most .

Now consider the lower bound. By Eqn. (10), for , .

By definition of , for , . Then for all , . Therefore, the expected cost is at least

$$P_{h/8}\cdot\min_{i\le h/8}\{r(i,0)\}\ge\frac{n h^2}{32}\,P_{h/8}\cdot P_{h/2}.$$

The lower bound is attained since and .

We have the main result of this section.

Theorem 1 (restated) If the underlying branching process is a deterministic chain, there exists a sybil-proof direct referral incentive mechanism with expected cost , where is the desired level of agents the root wants to propagate the query and is the answer rarity.

### 4.3 Optimality of the DR mechanism on chains

In fact, our DR mechanism is optimal with respect to a general class of query incentive mechanisms on chains. To proceed with our discussion, we focus on the query incentive mechanisms with following properties:

1. The answer selection is deterministic on the chain. (Notice that both RW and SP are deterministic and identical on a chain.)

2. The mechanism is sybil-proof.

3. The mechanism must retrieve an answer if there is one in the first agents.

4. The reward allocation scheme is normalized, i.e., the reward to each node from the root to the first answer must be at least .

We call a query incentive mechanism regular if it satisfies all four constraints. Notice that our DR mechanism is regular. In fact, our DR mechanism has the smallest cost in all resulting configurations among all regular query incentive mechanisms. We provide a proof in Appendix A.

{theorem}

The DR mechanism on chains is an optimal regular query incentive mechanism.

## 5 Sybil-proof DR mechanisms on arbitrary branching processes

In this section, we present a sybil-proof query incentive mechanism for an arbitrary branching process with branching factor . In particular, we present a Direct Referral (DR) mechanism for any height of interest . We will show that the DR mechanism, which uses the SP answer selection scheme, is sybil-proof. After that, we show that the expected cost for the DR scheme is . It is worth pointing out that, compared with previous work, our analysis does not depend on a particular height , and it does not require .

### 5.1 Direct referral query incentive mechanisms with height h

The DR mechanism will be using the SP answer selection scheme. Similar to the chain case, the reward allocation scheme in the DR mechanism is oblivious. In particular, the reward allocation scheme can be described by a function , where is the reward of the -th agent in the selected answer path and the selected answer holder is the -th agent for and .

To simplify the notation, we decompose the referral reward (with ) into two parts: . is for the direct referral and ensures that the reward is at least 1. Specifically, the referral rewards in our DR mechanism for height with and are defined as:

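$$x(i,s)=\begin{cases} \max\limits_{i+1\le j\le h}\left\{x(j,1)+\dfrac{j-i}{\lambda_{i+1}}\sum\limits_{\ell=i+1}^{h}\lambda_\ell\right\}, & \text{if } i\le h-1 \text{ and } s=1,\\ 0, & \text{otherwise.} \end{cases} \tag{11}$$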

and

$$y(i,s)=\begin{cases} 1, & \text{if } i+s\le h \text{ and } s>1,\\ 0, & \text{otherwise.} \end{cases} \tag{12}$$

Finally, the reward of the selected answer holder at level is defined for as follows. and for ,

$$a_i=x(i,1)+a_{i+1}+1\ \Rightarrow\ a_i=(h-i+1)+\sum_{i\le j\le h}x(j,1). \tag{13}$$

Informally, in the above reward scheme, if the selected answer is at level , the answer holder receives , its direct referral at level receives and all other agents from the root to the selected answer holder receive . For simplicity, we abbreviate the sequence as .
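The reward schedule in Eqns. (11) and (13) can be computed by a backward pass over the levels. The following is a minimal sketch under stated assumptions: `lam[l]` holds $\lambda_l$ for $l = 1,\dots,h$ (index 0 is unused), every $\lambda_l$ is positive, $x(h,1)=0$ by the "otherwise" case of Eqn. (11), and $a_{h+1}=0$ is taken as the boundary of the recursion in Eqn. (13). The function names and the example values are hypothetical.

```python
def referral_rewards(lam, h):
    """Return x with x[i] = x(i, 1) for i = 1..h, following Eqn. (11).

    lam[l] = lambda_l for l = 1..h (lam[0] is unused); all lambda_l are assumed > 0.
    x[h] = 0 by the 'otherwise' case; x[0] is left at 0 and not used.
    """
    x = [0.0] * (h + 1)
    tail = 0.0  # running value of sum_{l=i+1}^{h} lambda_l
    for i in range(h - 1, 0, -1):
        tail += lam[i + 1]
        x[i] = max(x[j] + (j - i) * tail / lam[i + 1] for j in range(i + 1, h + 1))
    return x


def answer_rewards(x, h):
    """Return a with a[i] = a_i for i = 1..h via a_i = x(i,1) + a_{i+1} + 1.

    a[h + 1] = 0 is an assumed boundary value.
    """
    a = [0.0] * (h + 2)
    for i in range(h, 0, -1):
        a[i] = x[i] + a[i + 1] + 1.0
    return a


if __name__ == "__main__":
    # Hypothetical first-answer probabilities for h = 5 levels.
    lam = [0.0, 0.1, 0.2, 0.3, 0.2, 0.1]
    x = referral_rewards(lam, 5)
    a = answer_rewards(x, 5)
    for i in range(1, 6):
        print(i, x[i], a[i])
```

The backward pass is quadratic in $h$ because each $x(i,1)$ takes a maximum over all deeper levels; this is adequate for illustration.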

Remark: Our mechanism requires a specified height . Furthermore, to compute the rewards, we need the knowledge of the branching process. Notice that, such knowledge is also required in previous literature to compute the initial reward as well as the contracts between the agents.

### 5.2 Sybil-proofness of the DR query incentive mechanism

We show that the DR mechanism defined above is sybil-proof. Eqn. (13) implies that if one agent at level has an answer, it does not have incentive to generate sybils, i.e., is its largest possible reward and generating sybils will not increase its chance to be selected. So to show the sybil-proofness of the DR mechanism, we only need to argue for those agents who do not have answers. The following lemma characterizes the probabilities that one agent is rewarded when it does not have an answer.

{proposition}

For an agent at level in the underlying -ary tree of the branching process, let be the event that is on the path from the root to the selected answer holder and be the event that is a child of . We have

$$\Pr[Rev(v)]=\frac{\sum_{j=i+1}^{h}\lambda_j}{d^i}\quad\text{and}\quad\Pr[DR(v)]=\frac{\lambda_{i+1}}{d^i}.$$

Let be the event that does not have an answer.

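$$\Pr[Rev(v)\mid NA(v)]=\frac{n}{n-1}\cdot\frac{\sum_{j=i+1}^{h}\lambda_j}{d^i}\quad\text{and}\quad\Pr[DR(v)\mid NA(v)]=\frac{n}{n-1}\cdot\frac{\lambda_{i+1}}{d^i}.$$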
{proof}

happens if the selected answer is at level or lower, which is with probability as the mechanism only retrieves an answer in the first levels. Notice that there are exactly agents at level . (The root is treated as level .) By symmetry, the probability that is in the subtree of is exactly . Therefore, . By the same argument, we have .

Now consider the two probabilities conditioned on the event with . Notice that (resp. ) implies . In other words, (resp. ). The results then come from Bayes’ theorem.

{lemma}

The above DR mechanism is sybil-proof. {proof} As discussed earlier, it is sufficient to consider agents that do not have answers. Consider such an agent at level . Suppose will get more reward by generating sybils and attaching its original subtree to the last sybil.

Consider the case that does not create sybils. Conditioning on the event , i.e. does not have an answer, by Proposition 5.2, the probability that receives the direct referral fee is and the probability that the answer selected is in the subtree rooted at is .

Notice that if generates sybils, both probabilities will decrease. This can be shown by coupling the randomness in both cases. For any fixed realization of the branching process and the answer placement, the probability that a particular answer in ’s subtree is selected is always higher (or equal) in the case that does not generate sybils by the SP answer selection scheme.

Now we consider the rewards in both cases. If does not generate sybils, the expected reward is

$$R_i^0=\frac{n}{n-1}\cdot\left(\frac{\lambda_{i+1}}{d^i}\cdot x_i+\frac{\sum_{j=i+1}^{h}\lambda_j}{d^i}\right). \tag{14}$$

The first part is the reward from direct referral and the second part comes from . For the case that generates sybils, the expected reward of and its sybils is at most

$$R_i^j\le\frac{n}{n-1}\cdot\left(\frac{\lambda_{i+1}}{d^i}\cdot x_{i+j}+(j+1)\cdot\frac{\sum_{\ell=i+1}^{h}\lambda_\ell}{d^i}\right). \tag{15}$$

By Eqn. (11), , which implies . This is a contradiction. Therefore, agent will not benefit by generating sybils if it does not have the answer. Hence the DR mechanism is sybil-proof.

### 5.3 The expected cost of the DR mechanism

Now we start analyzing the expected cost of the DR mechanism, which can be described as follows

$$\sum_{i=1}^{h}\lambda_i\cdot a_i+\sum_{i=2}^{h}\lambda_i\cdot x_{i-1}+\sum_{i=1}^{h}\lambda_i\cdot(i-1) \tag{16}$$

The first term is the reward to the answer holder. The second term is the reward for the direct referral and the third term is for all other agents that forwarded the answer. Before we analyze the cost in Eqn. (16), we characterize the sequence .
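The three terms of Eqn. (16) can be evaluated directly once the sequences are known. The sketch below is illustrative only: it assumes 1-indexed lists `lam`, `x` and `a` (index 0 unused) holding $\lambda_i$, $x_i$ and $a_i$, and the function name is hypothetical.

```python
def expected_cost(lam, x, a, h):
    """Evaluate the three sums in Eqn. (16) for 1-indexed lists lam, x, a."""
    # Reward paid to the selected answer holder at level i.
    holder = sum(lam[i] * a[i] for i in range(1, h + 1))
    # Direct referral reward paid to the holder's parent at level i - 1.
    referral = sum(lam[i] * x[i - 1] for i in range(2, h + 1))
    # The (i - 1) term of Eqn. (16): unit rewards along the answer path.
    others = sum(lam[i] * (i - 1) for i in range(1, h + 1))
    return holder + referral + others


if __name__ == "__main__":
    # Hypothetical two-level example.
    lam = [0.0, 0.5, 0.25]
    x = [0.0, 2.0, 0.0]
    a = [0.0, 4.0, 1.0]
    print(expected_cost(lam, x, a, 2))
```

Keeping the three sums separate mirrors the way the cost is bounded term by term in the analysis that follows.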

###### Observation 5.1

The sequence of is decreasing.

{proposition}

For , , where is the constant in Lemma 3. {proof} We prove the statement by induction. The statement holds for as . Suppose it holds for all . Now consider . By construction,

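$$x_{j-1}=\max_{\ell\ge j}\left\{x_\ell+\frac{\ell-j+1}{\lambda_j}\sum_{s=j}^{h}\lambda_s\right\}\le\max_{\ell\ge j}\left\{\gamma\cdot(h-\ell)+\gamma\cdot(\ell-j+1)\right\}=\gamma\cdot(h-j+1).$$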

The inequality is by Lemma 3 (4).

In other words, we will not pay a lot of referral fee for agents beyond level .

{proposition}

For all , we have . {proof} The statement is proved by induction. We first consider the case . By definition in Eqn. (11),

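$$\begin{aligned} x_{\ell^*} &= \max_{j\ge\ell^*+1}\left\{x_j+\frac{j-\ell^*}{\lambda_{\ell^*+1}}\sum_{\ell=\ell^*+1}^{h}\lambda_\ell\right\}\le\max_{j\ge\ell^*+1}\left\{x_j+(j-\ell^*)\cdot\left(1+\gamma\cdot\frac{\lambda_{\ell^*+2}}{\lambda_{\ell^*+1}}\right)\right\}\\ &\le\max_{j\ge\ell^*+1}\left\{\gamma\cdot(h-j)+(j-\ell^*)\cdot(1+\gamma)\right\}\le(\gamma+1)\cdot(h-\ell^*). \end{aligned} \tag{17}$$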

The first inequality is by Lemma 3 (4) and the second inequality is by Lemma 3 (1) and Proposition 5.3.

Now suppose it holds for all . Consider , we have

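$$\lambda_j\cdot x_{j-1}=\max_{\ell\ge j}\left\{\lambda_j\cdot x_\ell+(\ell-j+1)\sum_{k=j}^{h}\lambda_k\right\}\le\max_{\ell\ge j}\left\{\lambda_j\cdot x_\ell+(\ell-j+1)\right\}. \tag{18}$$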

Assume the term is optimized at . Consider two cases:

Case 1: . In this case, by Proposition 5.3 and Eqn. (17).

Case 2: . We have

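$$\begin{aligned} \lambda_j\cdot x_{j^*}+(j^*-j+1) &\le \lambda_{j^*+1}\,x_{j^*}+(j^*-j+1)\\ &\le(\gamma+1)\cdot(h-j^*)+(j^*-j+1)\\ &\le(\gamma+1)\cdot(h-j+1). \end{aligned}$$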

The first inequality comes from by Lemma 3 (1). The second inequality is by induction. Therefore, in both cases, we have .

We are ready to show the expected cost of the above mechanism.

{lemma}

The expected reward of the DR query incentive mechanism is .

{proof}

By Proposition 5.3 and Proposition 5.3, the total referral fee is

$$\sum_{i=2}^{h}\lambda_i\cdot x_{i-1}\le\sum_{i=2}^{\ell^*}(\gamma+1)(h-i)+\gamma\cdot(h-\ell^*-1)\cdot\sum_{j\ge\ell^*+1}\lambda_j=O(h^2). \tag{19}$$

Now we analyze the total expected reward for answer holders.

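$$\begin{aligned} \sum_{i=1}^{h}\lambda_i\cdot a_i &= \sum_{i=1}^{h}\lambda_i\cdot\left(h-i+1+\sum_{j=i}^{h}x_j\right)\\ &\le h+\sum_{j=1}^{h}x_j\sum_{i=1}^{j}\lambda_i\\ &\le h+\sum_{j=1}^{\ell_1-1}x_j\sum_{i=1}^{j}\lambda_i+\sum_{j=\ell_1}^{\ell^*}x_j+\sum_{j=\ell^*+1}^{h}x_j, \end{aligned} \tag{20}$$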

where is defined in Lemma 3. We inspect the terms individually. Consider the second term in Eqn. (20). By Lemma 3 (2), grows at a rate of until for . Therefore, for any , . Then

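$$\sum_{j=1}^{\ell_1-1}x_j\sum_{i=1}^{j}\lambda_i\le\sum_{j=1}^{\ell_1-1}x_j\cdot\lambda_j\cdot\frac{\rho}{\rho-1}\le\sum_{j=1}^{\ell_1-1}x_j\cdot\lambda_{j+1}\cdot\frac{\rho}{\rho-1}=O(h^2).$$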

The second inequality is by the fact for in Lemma 3 (1) and the last equality comes from Eqn. (19).

For the third term, by Proposition 5.3 for any ,

$$x_j\le(\gamma+1)(h-j)\cdot\frac{1}{\lambda_{j+1}}\le(\gamma+1)(h-j)\cdot\frac{1}{\lambda_{\ell_1}}=O(h).$$

The last equality comes from by Lemma 3 (3). Also by Lemma 3 (3), and the third term in Eqn. (20) is .

Finally, by Proposition 5.3, the last term in Eqn. (20) is . We conclude that

$$\sum_{i=1}^{h}\lambda_i\cdot a_i=O(h^2). \tag{21}$$

Combining Eqn. (19) and Eqn. (21), the expected cost of Eqn. (16) is .

We obtain the main result of this paper.

Theorem 1 (restated) For any branching process with branching factor , there is a constant such that for any answer rarity