Rating Protocol Design for Extortion and Cooperation in the Crowdsourcing Contest Dilemma
Abstract—Crowdsourcing has emerged as a paradigm for leveraging human intelligence and activity to solve a wide range of tasks. However, strategic workers will find enticement in their self-interest to free-ride and attack in a crowdsourcing contest dilemma game. Hence, incentive mechanisms are of great importance to overcome the inefficiency of the socially undesirable equilibrium. Existing incentive mechanisms are not effective in providing incentives for cooperation in crowdsourcing competitions due to the following features: heterogeneous workers compete against each other in a crowdsourcing platform with imperfect monitoring. In this paper, we take these features into consideration, and develop a novel game-theoretic design of rating protocols, which integrates binary rating labels with differential pricing to maximize the requester’s utility, by extorting selfish workers and enforcing cooperation among them. By quantifying necessary and sufficient conditions for the sustainable social norm, we formulate the problem of maximizing the revenue of the requester among all sustainable rating protocols, provide design guidelines for optimal rating protocols, and design a low-complexity algorithm to select optimal design parameters which are related to differential punishments and pricing schemes. Simulation results demonstrate how intrinsic parameters impact on design parameters, as well as the performance gain of the proposed rating protocol.
I Introduction
Crowdsourcing has emerged as a new data-collection and problem-solving model, offering a distributed and cost-effective way to obtain data or services by soliciting contributions from a large group of people in the online community. To crowdsource a task, the requester submits it to a crowdsourcing platform with an associated reward. People who can accomplish the task, called workers, can choose to work on it and submit solutions to the requester in exchange for payment via the crowdsourcing platform. Over the past decade, techniques for securing crowdsourcing operations have expanded steadily, as has the number of crowdsourcing applications. However, the openness of crowdsourcing gives workers the opportunity to exhibit antisocial behaviors, and crowdsourcing loses much of its appeal when collective efforts are derailed or severely hindered by elaborate sabotage.
Motivated in part by the DARPA Network Challenge, a crowdsourcing contest dilemma game was recently proposed. It describes a non-cooperative situation in which two workers compete for a given task in a two-stage game: the worker with the better solution wins the prize, and the loser gets nothing. In the first stage, each of the two workers decides whether to devote a high or a low level of effort. In the second stage, workers have the option of attacking or not attacking their opponents (e.g., disrupting the opponent’s solution, creating multiple identities to carry out a Sybil attack, etc.), depending on whether the attack allows them to get ahead. The equilibrium analysis shows that self-interested workers are tempted to free-ride by taking the payment and choosing the in-house strategy (i.e., low-level effort) in the first stage, while in the second stage the expected number of attacks is one regardless of the choice of intrinsic parameters. This greatly reduces social utility and constitutes a social dilemma.
The main reason why workers in the above two-stage game have the incentive to free-ride and attack is the absence of punishments for such behaviors. On the one hand, self-interested strategic workers are inclined to adjust their strategies over time to maximize their own utilities. On the other hand, they cannot receive a direct and immediate benefit by following the recommended strategy (choosing crowdsourcing and not attacking in the first and second stage, respectively). This conflict means that many workers are apt to free-ride, taking the reward while refusing to exert effort, and to attack their opponents in order to take the lead. Therefore, the main challenge in crowdsourcing competitions is how the requester can incentivize workers to comply with the social norm, i.e., workers who choose the crowdsourcing strategy in the first stage and do not attack their opponent in the second stage should be rewarded immediately; otherwise, they should be punished.
Although a variety of incentive mechanisms based on pricing, reputation and reciprocity schemes have been explored to induce cooperation in crowdsourcing [6, 7, 8, 9, 10, 11], existing mechanisms are not sufficiently effective due to the following features: heterogeneous workers compete against each other in a crowdsourcing platform with imperfect monitoring, and they can freely and frequently change their opponents. Hence, in order to compel selfish workers to follow the social norm and overcome the inefficiency of the socially undesirable equilibrium, it is of great importance to design optimal incentive mechanisms by taking these features into consideration.
In this paper, we aim to develop a novel game-theoretic design of rating protocols to address the crowdsourcing contest dilemma. The main goal of this paper is to maximize requesters’ utilities by extorting selfish workers and enforcing cooperation among them, i.e., paying workers as little as possible while providing sufficient incentives for individual workers to follow the social norm, in order to sustain a high-performance crowdsourcing platform and thus avoid the socially undesirable equilibrium. To the best of our knowledge, maximizing the requester’s utility over all time periods from the requester’s point of view has rarely been studied. We believe that studies on this topic are urgent: cost-efficiency is one of the main attractions of crowdsourcing, and requesters may not have enough incentive to post tasks via a crowdsourcing platform if they cannot earn enough benefit. We therefore analyze how cooperation can be enforced and how the utility of a requester can be maximized under the designed rating protocol. Our work is based on game theory because it has been found to be a powerful tool for studying strategic interactions among selfish and rational individuals and for designing incentive mechanisms that stimulate cooperation among them [12, 13, 14].
I-A Main Contributions
The following is a list of our main contributions.
Taking the protocol designer’s point of view, we explore the strategy of requesters who aim to maximize their utilities on all tasks while providing workers with sufficient incentives to contribute good behaviors, in order to sustain high-performance crowdsourcing. To the best of our knowledge, this is the first work achieving extortion and cooperation simultaneously in crowdsourcing competitions.
Workers’ heterogeneity is taken into consideration when designing utility functions of a worker in the two-stage game, and thus we model the crowdsourcing contest dilemma as an asymmetric game. This makes our rating protocol applicable in heterogeneous crowdsourcing platforms.
A novel game-theoretic design of rating protocols that integrate binary rating labels with differential pricing is developed to incentivize workers to contribute good behaviors. Our rating protocol can achieve the social optimum, and it is easy to design and flexible to implement.
Differential punishments are used to transfer payoffs from low-rating workers to high-rating workers, which can reduce performance loss in the presence of imperfect monitoring while providing sufficient incentives for individual workers to follow the social norm.
The problem of designing an optimal rating protocol that maximizes the revenue of the requester among all sustainable rating protocols is formulated. To characterize the optimal design, we rigorously analyze how heterogeneous workers’ behaviors are influenced by intrinsic parameters and design parameters, as well as by the workers’ evaluation of their individual long-term utilities.
A low-complexity algorithm is proposed to select optimal design parameters which are related to differential punishments and pricing schemes. Simulation results show the validity and effectiveness of our proposed algorithm for the crowdsourcing contest dilemma.
I-B Related Work
In recent decades, it has been recognized that there is an urgent need to stimulate cooperation among self-interested workers in crowdsourcing by introducing incentive mechanisms [7, 8, 15]. There exist many types of incentive mechanisms, such as pricing and reputation. Incentive mechanisms based on pricing incentivize individuals to provide good behaviors through monetary or matching rewards in the form of micropayments, which in principle can achieve the social optimum by internalizing the external effects of self-interested individuals [17, 18]. However, as has been pointed out in prior work, an inefficient pricing-based incentive mechanism suffers from two problems. If rewards are paid before a task starts, “free-riding” happens, since a worker always has the incentive to take the reward but refuse to devote effort. If rewards are paid after the task is complete, “false-reporting” arises, since the requester has the incentive to lower or refuse the reward to workers by lying about the outcome of the task. Incentive mechanisms based on reputation schemes, on the other hand, take individuals’ reputation into consideration, and reward and punish individuals according to their past behaviors [10, 21, 22, 23]. A central reputation entity can offer a robust method to sustain cooperation. However, such an approach needs traffic monitoring mechanisms, which put a great burden on the central entity and make it impractical in a large community. Alternatively, a distributed adaptive reputation scheme was proposed to provide dynamic updating of reputation. The distributed reputation scheme, which does not rely on a central bank to control the currency, involves a more complicated reputation update.
Traditional pricing and reputation schemes used separately may be inefficient in a crowdsourcing contest dilemma game in which workers are part of a community and interact repeatedly. This is because workers’ behaviors are influenced by incurred costs and designed payments, as well as by their long-term utilities, which cannot be solely determined by a pricing scheme. Besides, since workers choose to crowdsource and devise solutions in exchange for payment, increasing workers’ reputation without differential payment cannot reduce their malicious behaviors. Recently, considerable effort has been devoted to using game theory to analyze how to maximize social welfare while enforcing cooperation among individuals under a designed incentive mechanism [14, 27, 28, 29, 30]. Such incentive mechanisms are based on the principle of reciprocity and can be classified into direct reciprocity and indirect reciprocity. In a direct reciprocity mechanism, individuals can identify each other, and behaviors between them are based on their personal experience with each other. Direct reciprocity is highly effective in sustaining cooperation in a small system where individuals can identify each other and interact frequently with fixed partners. However, in most crowdsourcing platforms, workers have asymmetric service requirements and can freely and frequently change their partners. Hence, a personal history of past reciprocation with the same partner cannot be established. To encourage self-interested workers to provide good behaviors in crowdsourcing platforms, indirect reciprocity solutions have been proposed [28, 33], in which individuals decide their actions based on the available information, including indirect information. Hence, an individual can be rewarded or punished by other individuals in a crowdsourcing platform even if they have not had past interactions with him.
To implement indirect reciprocity in crowdsourcing, it is important to share as little information about past interactions in a platform as possible, while still sharing enough. Using rating labels as a summary record of an individual requires significantly less information to be maintained; hence, rating-based incentive mechanisms have the potential to form a basis for successful incentive mechanisms in a crowdsourcing platform. M. Kandori originally proposed a rating protocol for a large anonymous society, in which each individual is assigned a rating label, based on the individual’s past behaviors, that indicates his social status in the system. Under a rating protocol, the rating label of an individual who complies with (resp. deviates from) the social norm goes up (resp. down), and individuals with different rating labels are treated differently by the other individuals they interact with. Hence, an individual with a high/low rating label can be rewarded/punished by other individuals in the crowdsourcing platform who have not had past interactions with him. Recently, a variety of rating protocols have been explored to enforce cooperation in crowdsourcing platforms [11, 34, 36, 37, 38]. However, several factors hinder the direct application of these works to crowdsourcing competitions. These factors can be summarized as follows: (i) competitive relations exist not only between workers, but also between workers and requesters; (ii) in the presence of imperfect monitoring, an individual’s rating may be inaccurate; (iii) workers can freely and frequently change their opponents in the two-stage game. Because of these features of crowdsourcing competitions, it is important to overcome the inefficiency of the socially undesirable equilibrium by designing a novel rating protocol.
In our previous work, we designed a rating protocol based on game theory to address the crowdsourcing contest dilemma. That rating protocol captures the fundamental aspect of providing necessary and sufficient incentives for workers to contribute good behaviors. Yet, it only considered the expected one-period utility. As a result, incentives for requesters to crowdsource tasks will be greatly reduced or eliminated, because no requester will choose to crowdsource tasks if he pays more than he earns. Therefore, from the protocol designer’s point of view, it is crucial to maximize requesters’ utilities while providing sufficient incentives for workers to contribute good behaviors, in order to sustain high performance in crowdsourcing. Besides, our previous work assumes that workers are homogeneous, which is a strong assumption, as many crowdsourcing scenarios are heterogeneous in the sense that workers’ abilities are different. Therefore, in order to apply the designed rating protocol to heterogeneous crowdsourcing platforms, it is necessary to take workers’ heterogeneity into consideration when designing utility functions.
I-C Paper Organization
The remainder of this article is organized as follows. In Section II, we describe the crowdsourcing contest dilemma game with rating protocols. In Section III, we formulate the problem of designing an optimal rating protocol with constraints. Then we design the optimal rating protocols in Section IV. Section V presents simulation results to illustrate key features of the designed rating protocol. Finally, conclusions are drawn in Section VI.
II System Models
II-A System Setting
|cost of the ith-worker in the first stage.|
|cost of the ith-worker in the second stage.|
|damage inflicted by an attack.|
|probability that errors occur in the ith-stage game.|
|discount factor to denote workers’ patience.|
|strength of reward imposed on workers.|
|strength of punishment imposed on workers.|
|payment rewarded to -worker.|
|set of rating labels.|
|expected one-period utility of a worker.|
|expected long-term utility of a worker.|
|expected one-period utility of a requester.|
|social utility under the rating protocol|
In a crowdsourcing contest dilemma game [5, 6], two competing workers interact with each other in a two-stage game to obtain a better solution to a task which can be crowdsourced. In the first stage, each of the two workers can choose to achieve a given task via crowdsourcing (denoted as C) or to solve the problem in-house (denoted as S). As any C strategy is costly, and workers’ costs differ because of their heterogeneity, we assume that the ith-worker will consume a cost for selecting C, while the cost is approximated as 0 if he chooses S. In the second stage, workers decide whether or not to attack their opponents (attacking is denoted by A, while not attacking is denoted by N). Similarly, an attack is costly: we assume that the ith-worker will consume cost to attack his opponent, in order to inflict the damage (the attacking process is socially valuable only if ). Each one of , and can be referred to as a fraction of the total reward , which is normalized to be 1. The discount factor is the rate at which a worker discounts his future payoff and reflects his patience. Because monitoring is imperfect, the outcome of the task received by the requester may be inconsistent with the effort of the worker. Let and denote the probabilities that monitoring or reporting errors occur in the first stage and the second stage, respectively. In short, there exist eight intrinsic parameters in a crowdsourcing contest dilemma game, namely , , , , , , and . We summarize these parameters as well as other notations used in the paper in Table I.
In the above model, workers’ heterogeneity (i.e., workers’ costs and abilities are different) is taken into consideration, which is the only difference from our previous model. This makes our model closer to reality. The entire pay-off matrix for the revised game played in the first stage is depicted in Table II, where we set , ). The concrete computation process for Table II is given in Appendix A.
In the resulting equilibrium, for any choice of intrinsic parameters, only CC and SS can be pure strategy equilibria in the first stage, and malicious behavior is the norm rather than the exception, i.e., the expected number of attacks is one regardless of the choice of intrinsic parameters, which is contrary to the conventional wisdom in the area. In other words, self-interested workers are tempted to free-ride and attack, and this myopic equilibrium leads to an inefficient outcome.
II-B Rating Protocol
In order to overcome the inefficiency of the socially undesirable equilibrium by extorting selfish workers and enforcing cooperation among them, i.e., taking the requesters’ point of view and trying to maximize their average utilities on all tasks while providing sufficient incentives to compel rational and selfish individuals to contribute good behaviors in tasks, we devise a rating protocol that incentivizes self-interested workers to comply with the social norm and thus avoid the myopic equilibrium. In this paper, we integrate binary rating labels with differential pricing to incentivize workers to contribute good behaviors. In order to provide enough incentives by transferring payoffs from low-rating workers to high-rating workers, we use differential punishments that punish workers with different ratings differently. The proposed rating protocol is defined as follows.
Definition 1: A rating protocol is defined as the rules that a crowdsourcing platform uses to regulate the behavior of its workers, and is represented as a quadruple , i.e., a set of binary rating labels , a social strategy , a rating scheme , and a pricing scheme .
denotes the set of binary rating labels, where 1 is the good rating and 0 is the bad rating.
represents the adopted social strategy for a worker with rating label , where .
specifies how a worker’s rating should be updated based on his adopted strategies and current rating as follows:
defines the rules for rewarding/punishing workers by implementing differential prices for contributions according to the workers’ ratings:
where is the minimal price, is the maximal price.
Remark: A schematic representation of a rating scheme based on Definition 1 is provided in Figure 1. Under the rating update rule, if the social strategy adopted by a worker with rating 0 is observed to be CN (i.e., the worker chooses C in the first-stage game and N in the second-stage game), his rating will increase to 1 with probability , and remain at 0 with probability ; otherwise, if the social strategy adopted by a worker with rating 0 is not CN, his rating will remain at 0 with probability 1. The analysis for a worker with rating 1 follows in a similar manner. Hence, can be referred to as the strength of reward imposed on workers when they contribute good behaviors, while can be referred to as the strength of punishment imposed on workers when they do not follow the recommended strategies. More elaborate rating update rules may be considered, but we will show that this simple one is good enough to enforce cooperation among self-interested workers and to maximize the social utility of the requester. In order to enforce incentives on service provision, the protocol designer assigns workers with a high rating a higher payment as a reward; hence, workers are encouraged to contribute good behaviors to increase their ratings in return for a higher payment. To determine the range of feasible prices and , we assume that the maximum benefit which a requester obtains from receiving one unit of service is 1. It is obvious that the feasible prices should be restricted to .
III Problem Formulation
III-A Stationary Rating Distribution
Given a rating protocol , suppose that each worker always follows the given recommended strategy ; such a worker is called a “compliant worker”, whereas a worker who deviates from the social norm and plays is called a “non-compliant worker”. As time passes, the ratings of compliant workers are updated; hence, the distribution of ratings in a crowdsourcing platform evolves as a Markov chain, whose transition probability is determined by the recommended strategy employed by workers. Since we are interested in the long-term utilities of workers, we study the rating distribution in the long run when all workers follow the given recommended strategy . Let be the fraction of -workers (i.e., a worker with rating is denoted by -worker) in the total population at the beginning of an arbitrary period t. Then the transition from to is determined by the rating scheme , as shown in Eq. (3).
where denotes the transition probability that a -worker from the current period becomes a -worker in the next period under the rating protocol when the worker’s action is , which can be expressed as
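The update rule described in this remark can be sketched in a few lines of Python. This is only an illustrative sketch: the parameter names `alpha` (strength of reward) and `beta` (strength of punishment) are hypothetical labels chosen here, and the rule for a rating-1 worker is assumed to mirror the rule for a rating-0 worker, since the text only states that it "follows in a similar manner". The flag `observed_cn` is the action as observed by the platform, which under imperfect monitoring may differ from the action actually taken.

```python
import random


def update_rating(rating, observed_cn, alpha, beta, rng=random):
    """Return the next binary rating (0 = bad, 1 = good).

    alpha: hypothetical name for the strength of reward (probability in [0, 1])
    beta:  hypothetical name for the strength of punishment (probability in [0, 1])
    observed_cn: True if the worker's observed strategy is CN
    """
    if rating == 0:
        # A rating-0 worker is promoted with probability alpha only after
        # an observed CN; otherwise the rating stays at 0.
        if observed_cn and rng.random() < alpha:
            return 1
        return 0
    # Assumed mirror rule: a rating-1 worker is demoted with probability
    # beta after an observed deviation from CN; otherwise the rating stays at 1.
    if not observed_cn and rng.random() < beta:
        return 0
    return 1
```

With `alpha = 1` every observed CN restores a good rating immediately, while smaller values of `alpha` keep a punished worker at rating 0 for longer on average.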
As , with simple manipulations, there exists a unique stationary rating distribution , which is derived as follows:
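For intuition, the stationary distribution of any two-state Markov chain has a simple closed form. The sketch below uses the generic per-period transition probabilities `p01` (rating 0 to rating 1) and `p10` (rating 1 to rating 0); these names are assumptions made for illustration, and their dependence on the protocol's design parameters and error probabilities is not reconstructed here.

```python
def stationary_distribution(p01, p10):
    """Stationary distribution (eta0, eta1) of a two-state Markov chain.

    p01: probability of moving from rating 0 to rating 1 in one period
    p10: probability of moving from rating 1 to rating 0 in one period
    Both names are illustrative, not the paper's notation.
    """
    total = p01 + p10
    if total <= 0:
        raise ValueError("the chain must be able to move between ratings")
    # Balance condition: eta0 * p01 == eta1 * p10, with eta0 + eta1 == 1.
    return p10 / total, p01 / total


def step(eta, p01, p10):
    """Apply one period of the rating transition to a distribution (eta0, eta1)."""
    eta0, eta1 = eta
    return (eta0 * (1 - p01) + eta1 * p10, eta0 * p01 + eta1 * (1 - p10))
```

Applying `step` to the output of `stationary_distribution` returns the same distribution, which is the defining property of stationarity.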
III-B Sustainable Conditions
Given a fixed pricing scheme under the rating protocol , the expected one-period utility of a worker is only determined by his adopted strategy as well as his current rating label, which can be expressed as follows:
where the expected payoff associated with actions CN and CA under monitoring or reporting errors can be found according to Eq. (38) in Appendix B. A non-compliant worker may benefit by unilaterally deviating from the recommended strategy CN and myopically choosing strategy CA, SA, or SN; his expected payoff is maximized when he chooses strategy CA with probability 1. In other words, a worker’s optimal strategy choice is binary regardless of his current rating label, either CN or CA, and hence there is no need to consider the remaining two strategies SA and SN that a non-compliant worker may choose.
The expected long-term utility of a -worker is the infinite-horizon discounted sum of his expected one-period utility with his expected future payoff multiplied by a common discount factor , which can be computed by solving the following recursive equation
where is the rate at which a worker discounts his future payoff, and reflects his patience. It is obvious that a more patient worker has a larger discount factor.
Because workers are self-interested and strategic, they are willing to comply with the recommended strategy if and only if they find it advantageous to their self-interest, i.e., they cannot improve their expected long-term utility by deviating under the deployed rating protocol . Such a rating protocol is called a sustainable rating protocol, and its sustainability is defined as follows:
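With only two ratings, a recursion of this type is a 2x2 linear system and can be solved in closed form. The following sketch assumes the generic form v = u + delta * P v, where `u` holds the expected one-period utilities for ratings 0 and 1, `P` is the rating transition matrix under the strategy being evaluated, and `delta` is the discount factor; the argument names and any numerical values passed in are illustrative assumptions rather than the paper's notation.

```python
def long_term_utilities(u, P, delta):
    """Solve v = u + delta * P v for a two-rating chain.

    u:     (u0, u1), expected one-period utilities by rating
    P:     ((p00, p01), (p10, p11)), row-stochastic transition matrix
    delta: discount factor with 0 <= delta < 1
    Returns (v0, v1), the expected long-term utilities by rating.
    """
    # Coefficients of the linear system (I - delta * P) v = u.
    a = 1 - delta * P[0][0]
    b = -delta * P[0][1]
    c = -delta * P[1][0]
    d = 1 - delta * P[1][1]
    det = a * d - b * c  # strictly positive when delta < 1 and P is stochastic
    v0 = (d * u[0] - b * u[1]) / det
    v1 = (a * u[1] - c * u[0]) / det
    return v0, v1
```

As a sanity check, a constant one-period utility of 1 in both ratings gives a long-term utility of 1 / (1 - delta) in both ratings, independently of `P`.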
A rating protocol is sustainable if the optimal social strategy for a -worker is .
When a -worker follows the recommended strategy under the rating protocol , he receives expected long-term utility . On the contrary, he will receive expected long-term utility when he deviates from the recommended strategy to . By comparing these two payoffs, we can derive necessary and sufficient conditions for a rating protocol to be sustainable, as shown in Proposition 1.
Proposition 1: A rating protocol is sustainable if and only if
For the “if” part: Given a sustainable rating protocol , a -worker will comply with the optimal social strategy with , and his expected long-term utilities can be expressed as follows:
According to the one-shot deviation principle , the expected long-term utility of a worker with rating 0 who unilaterally deviates from to only in the current period and follows afterwards can be computed by solving
Similarly, the expected long-term utility of a deviating worker with rating 1 is given by
For the “only if” part: Suppose that inequality (8) is satisfied under the rating protocol , we can obtain and for any . With this in mind, we can derive the range for these design parameters (i.e., , , and ) in (8), and thus design a rating protocol that sustains the incentive for workers to comply with the recommended strategies. Hence, this proposition follows. ∎
III-C Rating Protocol Design with Constraints
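The one-shot deviation argument used in this proof can be illustrated numerically. The sketch below is a generic check, not the paper's inequality (8): it takes the compliant long-term utilities `v`, the one-period utilities `u_dev` obtained by deviating for a single period, and the rating transition rows `P_dev` that apply in the deviation period. All argument names and values are hypothetical.

```python
def is_sustainable(v, u_dev, P_dev, delta):
    """Check the one-shot deviation condition for both ratings.

    v:     (v0, v1), long-term utilities when always complying
    u_dev: (u0, u1), one-period utilities when deviating in the current period
    P_dev: ((p00, p01), (p10, p11)), rating transitions after a deviation
    delta: discount factor
    Returns True if no rating gains from deviating once and then complying.
    """
    for theta in (0, 1):
        # Value of deviating now and following the recommended strategy afterwards.
        continuation = sum(P_dev[theta][nxt] * v[nxt] for nxt in (0, 1))
        deviation_value = u_dev[theta] + delta * continuation
        if deviation_value > v[theta]:
            return False
    return True
```

In this sketch, a harsher punishment corresponds to `P_dev` rows that put more weight on rating 0, which lowers the continuation value of a deviation and makes the condition easier to satisfy.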
Given a sustainable rating protocol , each worker always chooses to devote a high level of effort in the first stage and does not attack his opponent in the second stage, hence, the requester’s utility will be only determined by the payment that he rewards the winner, which is to the winner having rating 1, and to the winner having rating 0. Let denote the expected one-period utility of the requester in the case that the winner has rating , taking into account monitoring or reporting errors and . The expression of can be derived as follows:
where is the utility of the requester when he reports that his task has been fulfilled, which happens with probability . When the task has not been accomplished, which happens with probability , the utility of the requester is , i.e., the requester incurs a payment and receives no benefit.
In a two-stage game of a crowdsourcing competition, the worker who has a higher productivity at the end of the second stage wins reward , whose value is determined by the winner’s rating label . Let denote a focal worker with rating competing against his opponent worker with rating , to win the reward. There are 4 possible combinations for namely (0,0), (0,1), (1,0) and (1,1). The computation process for the social utility of the requester in these four cases is as follows:
is the probability that the focal worker with a rating competes against the opponent worker whose rating is . When both the focal worker and his opponent comply with the recommended strategy, each one is equally likely to win the reward, so the probability for each one to be the winner is 1/2. Given a sustainable rating protocol , strategic workers find it in their self-interest to follow the recommended strategy regardless of their ratings, even under imperfect monitoring. Which worker wins the game depends only on the strategies adopted by both workers; hence, the probability of becoming the winner is independent of the workers’ ratings, and each worker has the same probability of becoming the winner.
Let denote the social utility under the sustainable protocol , which can be formulated as follows:
We assume that the protocol designer is profit-seeking and aims to design a sustainable rating protocol that maximizes the requester’s expected one-period utility, i.e., the requester’s benefit per period minus the payment rewarded to workers. In order to attract workers to stay in the platform for a long period of time, additional incentive constraints are needed to prevent the long-term utility of a worker from being negative, regardless of his current rating label. Given this, the problem of designing a rating protocol can be formally expressed as:
The rating protocol design problem is formulated as follows:
IV Optimal Design of Rating Protocols
In this section we investigate the design of an optimal rating protocol that solves the design problem of Eq. (22), i.e., selecting the optimal rating scheme and optimal pricing scheme , which are determined by four design parameters . In order to characterize an optimal design, which is denoted as , we investigate the impacts of four design parameters on the social welfare , and the incentive for satisfying constraints in Eq. (22).
IV-A Constraints in the Optimal Design Problem
In order to maximize the social utility, the requester wants to pay his workers as little as possible, provided that workers still have enough incentive to follow the recommended strategy CN and to stay in the system for a long period. The above observations are summarized in the following theorem.
is always an optimal solution to (22).
Given and , the social utility monotonically decreases with , and is therefore maximized when . Suppose that a worker deviates from the social norm; his rating will then be decreased to 0 with a large probability, and he will receive the reward as a punishment. Hence, the worker’s incentive is also maximized when . Therefore, this statement follows. ∎
In the remainder of our design, we set by default without further notice. Given a fixed , variables , , , , , and are defined in Eq. (23) for convenience.
Proposition 2: Given a fixed , constraints in the rating protocol design problem (22) for the ith-worker are satisfied if and only if .
For the “if” part: Given a fixed , we assume that all constraints in the rating protocol design problem (22) are satisfied. There are two constraints that need to be fulfilled simultaneously for the ith-worker.
First, the one-shot deviation principle must be satisfied, which gives the lower bound of by substituting (11) into (8), i.e., . If the ith-worker holds rating 0, then his expected one-period utility , because the ith-worker receives no reward but consumes by choosing C in the first stage. A worker with rating 0 who adopts strategy CA loses more in expected one-period utility than a worker with rating 0 who adopts CN, because the former consumes in the first stage and in the second stage, whereas the latter only consumes in the first stage, i.e., . Therefore, we have , and regardless of the choice of and , we thus have .
Second, to make sure that workers have sufficient incentive to contribute good behaviors, both and must hold. Since the worker with rating 1 obtains reward , which is higher than the reward of the worker with rating 0, we only have to make sure that the long-term utility of the worker with rating 0 is non-negative, which corresponds to the upper bound of obtained by substituting (11) into (9), i.e., . Hence, this statement follows.
For the “only if” part: assuming that , it is easy to verify that the constraints in the rating protocol design problem (22) are satisfied; the details are omitted here. ∎
Given a fixed , we have
Social utility monotonically increases with .
IV-B Optimal Value of the Design Problem
We now focus on the optimal value of the remaining three design parameters and . As shown in the proof of Proposition 2, there exist two constraints that need to be fulfilled simultaneously for the ith-worker. Similarly, there exist another two constraints for the jth-worker, and hence we have
In the following, we design a rating protocol that achieves the social optimum at the equilibrium under the condition that all four of these constraints hold. Assuming that is fixed, the four constraints constitute a convex set Q, i.e., if and only if , the two constraints in Proposition 2 are satisfied simultaneously. Since the feasible domain of is a convex set, the optimal solution must be at the boundary of the feasible domain, i.e., or . We therefore find the local optimum in each of the two cases and then select the global optimum from these two cases.
The output derived by Algorithm 1 is an optimal solution of (22).
Algorithm 1 takes , , , , , , and as input and returns . It consists of three parts: Part 1 (lines 1-4) determines the constrained parameters, i.e., , , , and ; Part 2 (lines 5-16) determines whether there exists a feasible solution with the remaining three design parameters satisfying the constraints in (22), and if so, yields a local optimal solution; Part 3 (lines 17-20) returns the global optimal solution from all the local optimal solutions we have found. The computational complexity of Algorithm 1 is , where m and n represent the reciprocal of the accumulation unit for determining and the number of iterations for computing the expected long-term utility in (7), respectively. The detailed explanation of this algorithm is as follows:
Part 1 (lines 1-4): According to (23), it is easy to determine that if , and we thus have . If we require that these two constraints hold simultaneously, we only need to make sure , as shown in lines 1 and 2. Similarly, if , we then only need to make sure , as shown in lines 3 and 4.
Part 2 (lines 5-16): It is obvious that the feasible set formed by constraints and is a convex set, and hence the optimal values of and must lie at an endpoint. According to (23), it is easy to determine that , , and that and are either both positive or both negative. According to Proposition 3, and the social utility monotonically increases with . Therefore, we obtain two possible solutions, either or , and divide them into four cases, as shown in Figure 2, using the graphical method for the linear programming problem.
Case and (lines 5-10): The feasible domain of is illustrated in Figure 2(a) and 2(b) when . We assume that for Case , and have according to (23). Therefore, it is easy to determine that holds. Moreover, we need to make sure that holds for . In Case , as shown in Figure 2(b), we assume that , and ensure that holds for . As the social utility increases as the slope increases, is directly derived. Let and denote the smallest and largest value of (i.e., the value of when is fixed), respectively, such that holds. Since the social utility decreases with , we have . The expression of the social utility under the condition that holds is as follows (where we set ):
Case and (lines 11-16): The feasible domain of is illustrated in Figure 2(c) and 2(d) when . We assume that for Case , which is similar to Case ; we only need to ensure that holds for . In Case , as shown in Figure 2(d), we assume that , and need to make sure that holds for . Since the social utility monotonically increases as the slope increases, we then have . Let and denote the smallest and largest value of when is fixed, respectively, such that holds. As the social utility increases with when