A Social Welfare Optimal Sequential Allocation Procedure
Abstract
We consider a simple sequential allocation procedure for sharing indivisible items between agents in which agents take turns to pick items. Supposing additive utilities and independence between the agents, we show that the expected utility of each agent is computable in polynomial time. Using this result, we prove that the expected utilitarian social welfare is maximized when agents take alternate turns. We also argue that this mechanism remains optimal when agents behave strategically.
Thomas Kalinowski, Universität Rostock, Rostock, Germany, thomas.kalinowski@uni-rostock.de
Nina Narodytska and Toby Walsh, NICTA and UNSW, Sydney, Australia, {nina.narodytska,toby.walsh}@nicta.com.au
1 Introduction
There exist a variety of mechanisms to share indivisible goods between agents without side payments [?; ?; ?; ?; ?]. One of the simplest is to let the agents take turns to pick an item. This mechanism is parameterized by a policy, the order in which agents take turns. In the alternating policy, agents take turns in a fixed order, whilst in the balanced alternating policy, the order of agents reverses each round. Bouveret and Lang (2011) study a simple model of this mechanism in which agents have strict preferences over items, and utilities are additive. They conjecture that computing the expected social welfare for a given policy is NP-hard supposing all preference orderings are equally likely. Based on simulation for up to 12 items, they conjecture that the alternating policy maximizes the expected utilitarian social welfare for Borda utilities, and prove it does so asymptotically. We close both conjectures. Surprisingly, we prove that the expected utility of each agent can be computed in polynomial time for any policy and utility function. Using this result, we prove that the alternating policy maximizes the expected utilitarian social welfare for any number of items and any linear utility function including Borda. Our results provide some justification for a mechanism in use in school playgrounds around the world.
2 Notation
We have $p$ items and $n$ agents. Each agent has a total preference order over the items. A profile is an $n$-tuple of such orders. Agents share the same utility function. An item ranked in $k$th position has a utility $g(k)$. For Borda utilities, $g(k) = p - k + 1$. The utility of a set of items is the sum of their individual utilities. Preference orders are independent and drawn uniformly at random from the set of all possibilities (full independence). Agents take turns to pick items according to a policy, a sequence $\pi = \pi_1 \ldots \pi_p \in \{1,\ldots,n\}^p$. At the $k$th step, agent $\pi_k$ chooses one item from the remaining set. Without loss of generality, we suppose $\pi_1 = 1$. For profile $R$, $u_R(i,\pi)$ denotes the utility gained by agent $i$ supposing every agent always chooses the highest ranked item in their ranking from the available items. We write $\bar u(i,\pi)$ for the expectation of $u_R(i,\pi)$ over all possible profiles. We take a utilitarian standpoint, measuring social welfare by the sum of the utilities: $sw_R(\pi) = \sum_{i=1}^{n} u_R(i,\pi)$. By linearity of expectation, the expected utilitarian social welfare is $\overline{sw}(\pi) = \sum_{i=1}^{n} \bar u(i,\pi)$. To help compute the expected utilities, we need a sequence given by , for and for all . Asymptotically To simplify notation, we suppose empty sums are zero and empty products are one.
3 Computing the Expected Social Welfare
Bouveret and Lang (2011) conjectured that it is NP-hard to compute the expected social welfare of a given policy. This calculation takes into account a super-exponential number of possible profiles. Nevertheless, as we show here, the expected utility of each agent can be computed in just $O(np^2)$ time for an arbitrary utility function, and $O(np)$ time for Borda utilities. We begin with this last case, and then extend the results to the general case.
Let $\Pi_{n,p}$ denote the set of all policies of length $p$ for $n$ agents. For $p \ge 2$, we define an operator mapping $\Pi_{n,p} \to \Pi_{n,p-1}$, $\pi \mapsto \tilde\pi$, by deleting the first entry. More precisely, $\tilde\pi_k = \pi_{k+1}$ for $k = 1,\ldots,p-1$. For example, $\widetilde{1212} = 212$ and $\widetilde{1122} = 122$.
Lemma 1.
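For Borda scoring, $n$ agents, $p$ items and a policy $\pi$ of length $p$ with $\pi_1 = 1$, we have:
$$\bar u(1,\pi) = p + \bar u(1,\tilde\pi), \qquad \bar u(i,\pi) = \frac{p+1}{p}\,\bar u(i,\tilde\pi) \quad \text{for } i \neq 1,$$
where $\bar u(i,\pi)$ is the expected utility of agent $i$ under $\pi$ and $\tilde\pi$ is $\pi$ with its first entry deleted, and these values can be computed in $O(np)$ time.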
Proof.
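Agent 1 picks her first item, giving her a utility of $p$. After that, from her perspective, it's the standard game on $p-1$ items with policy $\tilde\pi$, so she expects to get a utility of $\bar u(1,\tilde\pi)$. This proves the first equation. For the other agents, it is more involved. Let $i \neq 1$ be a fixed agent. For $k \in \{1,\ldots,p\}$, let $q(i,k,\pi)$ denote the probability that under policy $\pi$ agent $i$ gets the item with utility $k$. Note that this probability does not depend on the utility function but only on the ranking: it is the probability that agent $i$ gets the item of rank $p-k+1$ in her preference order. By the definition of expectation,
$$\bar u(i,\pi) = \sum_{k=1}^{p} k\, q(i,k,\pi). \tag{1}$$
There are three possible outcomes of the first move of agent 1 with respect to the item that has utility $k$ for agent $i$. With probability $(k-1)/p$, agent 1 has picked an item with utility less than $k$ (for agent $i$), with probability $(p-k)/p$, agent 1 has picked an item with utility more than $k$, and with probability $1/p$ it was the item of utility equal to $k$. In the first case there are only $k-2$ items of utility less than $k$ left, hence the probability for agent $i$ to get the item of utility $k$ is $q(i,k-1,\tilde\pi)$. In the second case there are still $k-1$ items of value less than $k$, hence the probability to get the item of utility $k$ is $q(i,k,\tilde\pi)$. In the third case, the probability to get the item of utility $k$ is zero, and together we obtain
$$q(i,k,\pi) = \frac{k-1}{p}\, q(i,k-1,\tilde\pi) + \frac{p-k}{p}\, q(i,k,\tilde\pi). \tag{2}$$
Substituting this into (1) yields
$$\bar u(i,\pi) = \sum_{k=1}^{p} \frac{k(k-1)}{p}\, q(i,k-1,\tilde\pi) + \sum_{k=1}^{p} \frac{k(p-k)}{p}\, q(i,k,\tilde\pi).$$
In the first sum we substitute $k \to k+1$ and this yields
$$\bar u(i,\pi) = \sum_{k=0}^{p-1} \frac{(k+1)k}{p}\, q(i,k,\tilde\pi) + \sum_{k=1}^{p} \frac{k(p-k)}{p}\, q(i,k,\tilde\pi).$$
The first term in the first sum and the last term in the second sum are equal to zero, so they can be omitted and we obtain
$$\bar u(i,\pi) = \sum_{k=1}^{p-1} \frac{k(k+1) + k(p-k)}{p}\, q(i,k,\tilde\pi) = \frac{p+1}{p} \sum_{k=1}^{p-1} k\, q(i,k,\tilde\pi) = \frac{p+1}{p}\,\bar u(i,\tilde\pi).$$
The time complexity follows immediately from the recursions. ∎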
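The recursion of Lemma 1 can be sketched in a few lines. The sketch below reads the lemma as follows: in a sub-game with m items left, the agent who picks gains m on top of her expected utility in the remaining game, and every other agent's expected utility is multiplied by (m+1)/m. The function name and the representation of a policy as a list of agent labels are our own illustrative choices, not part of the paper.

```python
from fractions import Fraction


def expected_borda_utilities(policy, n_agents):
    """Expected Borda utility of each agent under sincere picking.

    policy: sequence of agent labels in 1..n_agents, in picking order.
    The policy is processed from its last pick to its first, so that
    after step m the values describe the sub-game on the last m picks.
    """
    utils = {agent: Fraction(0) for agent in range(1, n_agents + 1)}
    for m, picker in enumerate(reversed(policy), start=1):
        for agent in list(utils):
            if agent == picker:
                # The picker takes her top item among m items (utility m).
                utils[agent] = m + utils[agent]
            else:
                # Every other agent's expectation is scaled by (m+1)/m.
                utils[agent] = Fraction(m + 1, m) * utils[agent]
    return utils


if __name__ == "__main__":
    for policy in ([1, 2, 1, 2], [1, 2, 2, 2]):
        utils = expected_borda_utilities(policy, 2)
        print(policy, {a: float(u) for a, u in utils.items()},
              float(sum(utils.values())))
```

Exact rational arithmetic is used so that the values can be compared with the rounded entries of Table 1 without floating-point noise.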
Example 1.
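Consider two agents with Borda utilities and the policies $\pi_1 = 121212$ and $\pi_2 = 111222$. We compute expected utilities and expected social welfare for each of them using Lemma 1. For each number of items $p$, Table 1 lists the suffix of length $p$ of each policy and shows results up to two decimal places.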
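Table 1: Expected utilities of agents 1 and 2 (u1, u2) and expected utilitarian social welfare (sw) for two policies on p items.
p   policy   u1     u2      sw      policy   u1   u2     sw
1   2        0      1       1       2        0    1      1
2   12       2      1.5     3.5     22       0    3      3
3   212      2.67   4.5     7.17    222      0    6      6
4   1212     6.67   5.63    12.3    1222     4    7.5    11.5
5   21212    8      10.63   18.63   11222    9    9      18
6   121212   14     12.4    26.4    111222   15   10.5   25.5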
Note that the expected values computed in all examples in the paper coincide with the results obtained by the brute-force search algorithm from [?].
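For very small instances the expected utilities can also be checked by direct enumeration of all profiles. The following is a minimal sketch of such a check, not the brute-force algorithm of the cited work; the function name and data layout are hypothetical. It simulates sincere picking for every profile and averages the Borda utilities.

```python
from fractions import Fraction
from itertools import permutations, product


def brute_force_borda(policy, n_agents):
    """Average Borda utility per agent over all (p!)^n profiles."""
    p = len(policy)
    rankings = list(permutations(range(p)))  # each tuple lists items best-first
    totals = [0] * n_agents
    count = 0
    for profile in product(rankings, repeat=n_agents):
        remaining = set(range(p))
        for picker in policy:
            ranking = profile[picker - 1]
            # Sincere play: take the best-ranked item that is still available.
            position, item = next(
                (pos, it) for pos, it in enumerate(ranking) if it in remaining
            )
            remaining.remove(item)
            totals[picker - 1] += p - position  # Borda utility of that rank
        count += 1
    return [Fraction(total, count) for total in totals]


if __name__ == "__main__":
    print(brute_force_borda([1, 2, 1, 2], 2))
```

The enumeration grows as (p!)^n, so it is only practical for a handful of items, which is exactly why the polynomial-time recursion matters.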
Due to the linearity of Borda scoring the probabilities in the proof of Lemma 1 cancel, and this will allow us to solve recursions explicitly and to prove our main result about the optimal policy for Borda scoring in Section 5.
In the general case, we can still compute the expected utilities $\bar u(i,\pi)$, and thus $\overline{sw}(\pi)$, but we need the probabilities $q(i,k,\pi)$ from the proof of Lemma 1: $q(i,k,\pi)$ is the probability that under policy $\pi$, agent $i$ gets the item ranked at position $p-k+1$ in her preference order. Computing these probabilities using (2) adds a factor of $p$ to the runtime.
Lemma 2.
For $n$ agents, $p$ items, a policy $\pi$ and an arbitrary scoring function $g$, the expected utility for agent $i$ is
$$\bar u(i,\pi) = \sum_{k=1}^{p} q(i,k,\pi)\, g(p-k+1)$$
and can be computed in $O(np^2)$ time.
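A sketch of the general computation follows. It tracks, for every agent, the probability of receiving her item of each rank, indexed here by rank r (1 is best) rather than by Borda utility, which is an indexing choice of this sketch. For an agent who does not pick, the update is the three-case argument from the proof of Lemma 1. For the agent who picks, we assume the natural counterpart: she gets her top item with certainty and every other item moves up one rank in the remaining game. All names are hypothetical.

```python
from fractions import Fraction


def pick_probabilities(policy, n_agents):
    """q[a][r-1]: probability that agent a receives her r-th ranked item."""
    q = {agent: [] for agent in range(1, n_agents + 1)}
    for m, picker in enumerate(reversed(policy), start=1):
        updated = {}
        for agent, old in q.items():
            def old_at(r, old=old):
                # Probability in the (m-1)-item sub-game; zero outside 1..m-1.
                return old[r - 1] if 1 <= r <= len(old) else Fraction(0)

            if agent == picker:
                # Assumption: top item for sure, other ranks shift up by one.
                new = [Fraction(1)] + [old_at(r - 1) for r in range(2, m + 1)]
            else:
                # A uniformly random item (from this agent's view) is removed.
                new = [
                    Fraction(r - 1, m) * old_at(r - 1)
                    + Fraction(m - r, m) * old_at(r)
                    for r in range(1, m + 1)
                ]
            updated[agent] = new
        q = updated
    return q


def expected_utilities(policy, n_agents, g):
    """Expected utility per agent for a scoring function g(rank)."""
    q = pick_probabilities(policy, n_agents)
    return {a: sum(prob * g(r) for r, prob in enumerate(probs, start=1))
            for a, probs in q.items()}


if __name__ == "__main__":
    p = 4
    print(expected_utilities([1, 2, 1, 2], 2, lambda r: p - r + 1))
```

Each of the p steps updates p probabilities per agent, which is where the extra factor of p over the Borda recursion comes from.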
Lemma 2 allows us to resolve an open question from [?].
Corollary 1.
For $n$ agents and an arbitrary scoring function $g$, the expected utility of each agent, as well as the expected utilitarian social welfare, can be computed in polynomial time.
For some special policies, the recursions in Lemma 1 can be solved explicitly. A particularly interesting policy is the strictly alternating one (denoted AltPolicy).
Proposition 1.
Let be the strictly alternating policy of length starting with . The expected utilities and utilitarian social welfare for two agents and Borda scoring are
For agents the expectations are, for all ,
Proof.
A proof for the two agents case can be done by straightforward induction using the recursions from Lemma 1. For agents we outline the proof. The full proof is given in the Appendix A of the online version [?]. First, we solve the recursions for the expected utility of agent when the number of items is . Using these values, we approximate the expected utility for the remaining combinations of and . ∎
Example 2.
For a fixed number of agents, we write the number of items as with . We call a policy balanced if for all and . For any balanced policy, the expected utility of any agent lies between that of agents 1 and in the alternating policy. Thus every agent has expected utility .
4 Comparison with Other Mechanisms
We compare with two other allocation mechanisms that provide insight into the efficiency of the alternating policy. The best preference mechanism (BestPref) allocates each item to the agent who prefers it most, breaking ties at random. The random mechanism (Random) allocates items by flipping a coin for each individual item.
Proposition 2.
For two agents, and Borda utilities, the expected utilitarian social welfare of BestPref is For agents, it is
Proof.
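Due to space limitations we only present a proof for two agents. The proof for $n$ agents is again given in the online Appendix B. Let $S_p$ be the set of all permutations of $\{1,\ldots,p\}$. By symmetry we may label the items so that item $i$ has utility $i$ for agent 1 and utility $\sigma(i)$ for agent 2, where $\sigma \in S_p$ is uniformly random. Then the expected utilitarian social welfare is
$$\frac{1}{p!} \sum_{i=1}^{p} \sum_{\sigma \in S_p} \max\{i, \sigma(i)\}.$$
We split the inner sum into two sums: one for all permutations with $\sigma(i) \le i$, and one for the remaining permutations. Hence, we get
$$\frac{1}{p!} \sum_{i=1}^{p} \Bigl( \sum_{\sigma:\, \sigma(i) \le i} i \;+ \sum_{\sigma:\, \sigma(i) > i} \sigma(i) \Bigr).$$
We compute the value of each inner sum separately. In the first sum each term equals $i$, so we have to determine the number of terms. For $\sigma(i)$ there are $i$ possible values $1,\ldots,i$, and for a fixed value of $\sigma(i)$ there are $(p-1)!$ permutations of the remaining values. So the first sum has $i\,(p-1)!$ terms of value $i$, hence it equals $i^2 (p-1)!$. The second sum contains for each $j \in \{i+1,\ldots,p\}$ exactly $(p-1)!$ terms of value $j$, hence it equals $(p-1)! \sum_{j=i+1}^{p} j$. So the expected utilitarian social welfare is
$$\frac{1}{p} \sum_{i=1}^{p} \Bigl( i^2 + \frac{p(p+1) - i(i+1)}{2} \Bigr) = \frac{(p+1)(4p-1)}{6}. \qquad ∎$$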
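The counting argument for two agents can be checked numerically for small p. The sketch below fixes agent 1's ranking, enumerates agent 2's ranking as a permutation, and averages the sum of the larger of the two Borda utilities per item. The closed form (p+1)(4p-1)/6 used for comparison is our own evaluation of the two sums described in the proof and should be read as such; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def bestpref_welfare_two_agents(p):
    """Exact expected welfare of BestPref for two agents and p items."""
    total = 0
    count = 0
    for sigma in permutations(range(1, p + 1)):
        # Item i is worth i to agent 1 and sigma[i-1] to agent 2;
        # BestPref hands it to whoever values it more.
        total += sum(max(i, s) for i, s in enumerate(sigma, start=1))
        count += 1
    return Fraction(total, count)


def bestpref_closed_form(p):
    """Assumed closed form obtained by summing the two inner sums."""
    return Fraction((p + 1) * (4 * p - 1), 6)


if __name__ == "__main__":
    for p in range(1, 8):
        print(p, bestpref_welfare_two_agents(p), bestpref_closed_form(p))
```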
Proposition 3.
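For $n$ agents and Borda utilities, the expected utilitarian social welfare of Random is $p(p+1)/2$.
Proof.
As the probability of each agent obtaining her $k$th ranked item is $1/n$, the expected utilitarian social welfare is $\sum_{i=1}^{n} \frac{1}{n} \sum_{k=1}^{p} (p-k+1) = \frac{p(p+1)}{2}$. ∎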
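The argument for Random can likewise be confirmed by exhaustive enumeration on tiny instances. The sketch below averages the Borda welfare over every profile and every assignment of items to agents, assuming each item goes to a uniformly random agent independently. The value p(p+1)/2 it is compared against follows from the proof's observation that each agent receives any given item with probability 1/n; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def random_mechanism_welfare(p, n):
    """Exact expected Borda welfare when items are assigned uniformly at random."""
    rankings = list(permutations(range(p)))  # each tuple lists items best-first
    total = 0
    count = 0
    for profile in product(rankings, repeat=n):
        for owners in product(range(n), repeat=p):
            # owners[x] is the agent who receives item x.
            total += sum(p - profile[a].index(x) for x, a in enumerate(owners))
            count += 1
    return Fraction(total, count)


if __name__ == "__main__":
    for p, n in [(2, 2), (3, 2), (3, 3)]:
        print(p, n, random_mechanism_welfare(p, n), Fraction(p * (p + 1), 2))
```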
Table 2 summarizes the expected utilitarian social welfares for these mechanisms.
AltPolicy  BestPref  Random 
Clearly, BestPref is an upper bound on the expected utilitarian social welfare for any allocation mechanism. As in [?], we define asymptotic optimality of a sequence of policies where is a policy for items by
As can be seen from the table, AltPolicy is an asymptotically optimal policy. By the observation in the end of the previous section the same is true for any balanced policy, and this implies Proposition 5 in [?]. However, the proof in [?] is incorrect as it implies that the expected utility is for every agent which contradicts our upper bound for BestPref. See Appendix C for a detailed discussion of the gaps in the proof. Of course, for any given and preference orderings, AltPolicy may not give the maximal utilitarian social welfare possible.
Example 3.
Consider two agents and six items with the following preferences: and . The AltPolicy policy gives items and to agents 1 and 2 respectively. Hence, the total welfare is . Consider a policy which gives the following items to agents: and . The total welfare is now .
The Random mechanism gives the worst expected utilitarian social welfare among the three mechanisms. Moreover, as increases the expected utilitarian social welfare produced by Random declines compared with the other two mechanisms: .
With two agents, the expected loss using AltPolicy compared to BestPref (which requires full revelation of the preference orders) is less than . In particular, with high probability AltPolicy yields an utilitarian social welfare very close to the upper bound.
Proposition 4.
For two agents and any , with probability at least , AltPolicy is a approximation of the optimal expected utilitarian social welfare.
Proof.
Let the random variables and denote the utilitarian social welfare for AltPolicy and BestPref. Then and for the expectations we have and .
So , and by Markov’s inequality
Writing it multiplicatively, with probability at least ,
A similar result holds for more than two agents.
Proposition 5.
For agents, there exists a constant such that for every with probability at least AltPolicy is a approximation of the optimal expected utilitarian social welfare.
5 Optimality of the Alternating Policy
We now consider the problem of finding the policy that maximizes the expected utilitarian social welfare for Borda utilities. Bouveret and Lang (2011) stated that this is an open question, and conjectured that this problem is NP-hard. We close this problem by proving that AltPolicy is in fact the optimal policy for any given number of items with two agents.
Theorem 1.
The expected utilitarian social welfare is maximized by the alternating policy for two agents supposing Borda utilities and the full independence assumption.
Note that by linearity of expectation this implies optimality of the alternating policy for every linear scoring function with , . In particular, the result also holds for quasi-indifferent scoring where for large .
In the following let always be the alternating policy of length . We also recall that, due to symmetry, we need only consider policies that start with , e.g. policy is equivalent to . To prove Theorem 1 we need to prove that for any policy of length the expected utilitarian social welfare is smaller than or equal to the expected utilitarian social welfare of . That is, . We proceed in two steps. First, we describe recursively, by representing the policy in terms of its deviations from . Second, given the recursive description of , we prove by induction that this difference is never positive (Proposition 6). The proof is not trivial as the natural inductive approach to derive from does not go through. Hence, we will prove a stronger result in Theorem 2 that implies Proposition 6 and Theorem 1.
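For small numbers of items, Theorem 1 can be checked by exhaustive search over all two-agent policies. The sketch below evaluates each policy with the Borda recursion of Lemma 1 (the picker gains m in a sub-game of m items, the other agent's expectation is scaled by (m+1)/m) and reports the maximizers among policies starting with agent 1. It is a sanity check under these assumptions, not a substitute for the proof; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_welfare(policy):
    """Expected Borda welfare of a two-agent policy (tuple of 1s and 2s)."""
    utils = {1: Fraction(0), 2: Fraction(0)}
    for m, picker in enumerate(reversed(policy), start=1):
        other = 3 - picker
        utils[picker] += m
        utils[other] *= Fraction(m + 1, m)
    return utils[1] + utils[2]


def best_policies(p):
    """Maximum expected welfare and all maximizers that start with agent 1."""
    scored = {
        policy: expected_welfare(policy)
        for policy in product((1, 2), repeat=p)
        if policy[0] == 1
    }
    best = max(scored.values())
    return best, sorted(pol for pol, w in scored.items() if w == best)


def alternating(p):
    return tuple(1 if k % 2 == 0 else 2 for k in range(p))


if __name__ == "__main__":
    for p in range(1, 9):
        best, winners = best_policies(p)
        print(p, float(best), winners)
```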
Recursive definition.
To obtain a recursive definition of , we observe that any policy can be written in terms of its deviations from AltPolicy policy . We explain this idea using the following example. Consider a policy . There are two ways to extend with a prefix to obtain policies of length 5: and which is equivalent to . We say that follows AltPolicy in extending as its prefix is which coincides with the alternation step. We say that deviates from AltPolicy in extending as its prefix is which does not correspond to the alternation step.
Next we define the notion of a policy tree, a balanced binary tree that represents all possible policies in terms of deviations from . The main purpose of this notion is to explain the intuitions behind our derivations and proofs. We start with the policy (1), which is the root of the tree. We expand a policy to the left by a prefix of length one. We can follow the strictly alternating policy by expanding (1) with prefix 2. This gives policy 21, which is equivalent to 12 due to symmetry. Alternatively, we can deviate from AltPolicy by expanding (1) with prefix 1. This gives policy 11. This way we obtain all policies of length 2. We can continue expanding the tree from 12 and 11 following the same procedure and keeping in mind that we break symmetries by remembering only policies that start with 1. The following example shows all policies of length at most . By convention, given a policy in a node of the tree we say that we follow AltPolicy on the left branch and deviate from AltPolicy on the right branch.
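The construction of the policy tree can be sketched as follows for two agents. Policies are tuples of agent labels, the left child prepends the agent who does not currently pick first (following the alternation) and the right child prepends the agent who does (deviating), and both children are relabelled to start with agent 1. The function names are hypothetical.

```python
def normalize(policy):
    """Swap the two agent labels if needed so that the policy starts with 1."""
    if policy[0] == 1:
        return policy
    return tuple(3 - agent for agent in policy)


def children(policy):
    """Return (follow, deviate): the left and right child in the policy tree."""
    first = policy[0]
    follow = normalize((3 - first,) + policy)  # alternation step
    deviate = normalize((first,) + policy)     # same agent picks again
    return follow, deviate


def tree_levels(depth):
    """levels[k] lists the policies of length k+1, left to right."""
    levels = [[(1,)]]
    for _ in range(depth - 1):
        levels.append([child for pol in levels[-1] for child in children(pol)])
    return levels


if __name__ == "__main__":
    for level in tree_levels(4):
        print(["".join(map(str, pol)) for pol in level])
```

Every policy of a given length that starts with agent 1 appears exactly once on its level, and the leftmost node of each level is the alternating policy.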
Example 4.
Figure 1 shows a tree which represents all policies of length at most . A number below each policy shows the value of the expected utilitarian social welfare for this policy. As can be seen from the tree, AltPolicy is the optimum policy for all . Consider, for example, . We can obtain this by deviations from (shown as the dashed path): .
Next we give a formal recursive definition of . We recall that from Lemma 1 the recursions for AltPolicy
For any , we obtain a similar recursion that depends on whether follows or deviates from in extension of at each step. In the first case, the prefix of is and in the second case the prefix is . So we have
Then
We introduce notations to simplify the explanation. Using Proposition 1 we define . We define the sets
Note that for an element corresponding to a policy we have . Hence, has a higher expected utilitarian social welfare than if and only if .
The recursions above provide a description of the sets . We have because is the only policy of length 1, and for the set consists of the elements and where runs over . Theorem 1 is equivalent to the following statement.
Proposition 6.
Let and
for where , . Then for all .
Figure 2 shows the sets , in the policy tree.
Proving optimality.
We might try to prove Proposition 6 inductively by deriving for the point corresponding to policy from for corresponding to policy . Unfortunately, the induction hypothesis is too weak as the following example shows.
Example 5.
Assume corresponding to some policy . Let be obtained from by deviating from . With we obtain . Thus satisfies Proposition 6 while violates it.
To remedy this problem we would like to strengthen the proposition, for example by proving for all where is some function with for all . The difficulty of finding such a function is indicated by Figure 3 showing the set . Different markers distinguish the points arising from by following from those deviating from .
The key idea of our proof is to strengthen Proposition 6 in another direction. We describe this strengthening first and then outline the induction argument. The technical details of the proof are presented in the online Appendix D. Consider a policy that is represented by a node at level in the policy tree. Instead of requiring the inequality only for the point that corresponds to policy , we also require it for (i) all policies that lie on the path that follows only the right branches from and (ii) all policies that lie on the path that starts from by following the left branch once and then follows only the right branches. To formalize this idea, for we define functions by and . Note that for all , as encodes the case when we follow the left branch and – the right branch. Figure 2 illustrates this correspondence. We also consider iterated compositions of these functions. For every let denote the identity on , i.e. , and for let denote the function
Applying to the point corresponding to gives the point which corresponds to the policy that is obtained from by following the right branch times. For all and , we define the function . corresponds to starting in level , following the first left branch and then right branches. For , for and for . Proposition 6 is a consequence of the following theorem.
Theorem 2.
Let and
for where , . Then for every and every the following statements are true.

For all , if then .

For all , if then .
Proof sketch.
We provide the full proof in Appendix D.2 and give a high-level overview here. We start with a few technical lemmas to derive an explicit description of functions and . This gives us explicit expressions for the sums and in terms of and . Then we proceed through the induction proof. We summarize the induction step here. Suppose the statements of the theorem are already proved for all sets with . Let be an arbitrary element of . Suppose for some (the case is similar). Figure 4 shows and that is obtained from by following the left branch. By the induction hypothesis, whenever . The corresponding nodes are highlighted in gray in Figure 4. To complete the induction step we need to show for . The corresponding nodes are indicated by dashed circles. The result for (gray and dashed) follows immediately as . For we first express in terms of and . Then, by induction for . Inverting the representation of in terms of and we derive a bound , depending on , and this stronger bound is used to prove for . ∎
The extension of Theorem 2 to agents is not straightforward. Firstly, it requires deriving exact recursions for the expected utility for an arbitrary . This is not trivial, as Proposition 1 only provides asymptotics. An easier extension might be to other utility functions. The alternating policy is not optimal for all scoring functions. For example, it is not optimal for the approval scoring function which has for and 0 otherwise. However, we conjecture that AltPolicy is optimal for all convex scoring functions (which includes lexicographical scoring).
6 Strategic Behaviour
So far, we have supposed agents sincerely pick the most valuable item left. However, agents can sometimes improve their utility by picking less valuable items. To understand such strategic behaviour, we view this as a finite repeated game with perfect information. Kohler and Chandrasekaran (1971) prove that we can compute the subgame perfect Nash equilibrium for the alternating policy with two agents by simply reversing the policy and the preferences and playing the game backwards. More recently, Kalinowski et al. (2013b) prove this holds for any policy with two agents.
We will exploit such reversal symmetry. We say that a policy is reversal symmetric if and only if the reversal of , after interchanging the agents if necessary, equals . The policies and are reversal symmetric, but is not. The next result follows quickly by expanding and rearranging expressions for the expected utilitarian social welfare using the fact that we can compute strategic play by simply reversing the policy and profile and supposing truthful behaviour.
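The definition of reversal symmetry for two agents can be made concrete with a short check. In this sketch a policy is a tuple of agent labels 1 and 2, and "interchanging the agents if necessary" is read as: the reversed policy must equal the original either as it stands or after swapping the two labels. The function name and the example policies are illustrative only.

```python
def is_reversal_symmetric(policy):
    """True if the reversed policy equals the policy, up to swapping agents."""
    reversed_policy = tuple(reversed(policy))
    swapped = tuple(3 - agent for agent in reversed_policy)
    return policy == reversed_policy or policy == swapped


if __name__ == "__main__":
    for policy in [(1, 2, 1, 2), (1, 2, 2, 1), (1, 1, 2, 2), (1, 1, 2, 1)]:
        print("".join(map(str, policy)), is_reversal_symmetric(policy))
```

Under this reading the alternating policy is reversal symmetric for every length: for odd lengths it is a palindrome, and for even lengths its reversal differs from it only by a swap of the two agents.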
Theorem 3.
For two agents and any utility function, any reversal symmetric policy that maximizes the expected utilitarian social welfare for truthful behaviour also maximizes the expected utilitarian social welfare for strategic behaviour.
As the alternating policy is reversal symmetric, it follows that the alternating policy is also optimal for strategic behaviour. Unfortunately, the generalisation of these results to more than two agents is complex. Indeed, for an unbounded number of agents, computing the subgame perfect Nash equilibrium becomes PSPACE-hard [?].
7 Conclusions
Supposing additive utilities, and full independence between agents, we have shown that we can compute the expected utility of a sequential allocation procedure in polynomial time for any utility function. Using this result, we have proven that the expected utilitarian social welfare for Borda utilities is maximized by the alternating policy in which two agents pick items in a fixed order. We have argued that this mechanism remains optimal when agents behave strategically. Several important questions remain open. For example, is the alternating policy optimal for more than two agents? What happens with non-additive utilities?
References
 [Bouveret and Lang, 2011] S. Bouveret and J. Lang. A general elicitation-free protocol for allocating indivisible goods. In T. Walsh, editor, Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), pages 73–78, 2011.
 [Brams and Fishburn, 2000] S.J. Brams and P.C. Fishburn. Fair division of indivisible items between two people with identical preferences: Envy-freeness, Pareto-optimality, and equity. Social Choice and Welfare, 17(2):247–267, 2000.
 [Brams and Kaplan, 2004] S.J. Brams and T.R. Kaplan. Dividing the indivisible. Journal of Theoretical Politics, 16(2):143–173, 2004.
 [Brams et al., 2003] S.J. Brams, P.H. Edelman, and P.C. Fishburn. Fair division of indivisible items. Theory and Decision, 55(2):147–180, 2003.
 [Brams et al., 2012] S. Brams, D. Kilgour, and C. Klamler. The undercut procedure: an algorithm for the envy-free division of indivisible items. Social Choice and Welfare, 39(2-3):615–631, 2012.
 [Herreiner and Puppe, 2002] D.K. Herreiner and C. Puppe. A simple procedure for finding equitable allocations of indivisible goods. Social Choice and Welfare, 19(2):415–430, 2002.
 [Kalinowski et al., 2012] T. Kalinowski, N. Narodytska, T. Walsh, and L. Xia. Elicitation-free protocols for allocating indivisible goods. Fourth International Workshop on Computational Social Choice, 2012.
 [Kalinowski et al., 2013a] T. Kalinowski, N. Narodytska, and T. Walsh. A social welfare optimal sequential allocation procedure. Technical report, CoRR archive, available at http://arxiv.org/abs/1304.5892, 2013.
 [Kalinowski et al., 2013b] T. Kalinowski, N. Narodytska, T. Walsh, and L. Xia. Strategic behavior when allocating indivisible goods sequentially. In Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI-13), 2013.
 [Kohler and Chandrasekaran, 1971] D.A. Kohler and R. Chandrasekaran. A class of sequential games. Operations Research, 19(2):270–277, 1971.
Acknowledgments
The authors are supported by the Australian Government's Department of Broadband, Communications and the Digital Economy, the Australian Research Council and the Asian Office of Aerospace Research and Development through grant AOARD-124056.
Appendix
Appendix A Proof of Proposition 1 for agents
We determine the asymptotic behaviour of the expected utilities for the strictly alternating policy of length
As the policy is now determined by the number of items we simplify notation by letting be the expected utility of agent for the allocation of items. Then , and
for . Decoupling these recursions we get for the first agent
and this allows us to write down the expected utility exactly for one residue class modulo per agent.
Proposition 7.
For , if then the expected utility of agent equals
Proof.
We start with and prove by induction that for all . The induction starts at with . For with , we have by induction
Now we proceed by induction on . For our recursion gives
For the remaining residue classes mod we provide asymptotic statements.
Proposition 8.
For fixed the expected utility of agent equals
Proof.
For this follows from Proposition 7. Otherwise let be the unique element of such that . The following estimates prove the claim. From
it follows that
This gives
Corollary 2.
The expected utilitarian social welfare for the alternating policy is .
Appendix B Proof of Proposition 2 for agents
Proposition 9.
Let be the BestPref policy with agents using Borda utility functions. The expected utilitarian social welfare is
Proof.
The expected utilitarian social welfare for this procedure is the expected value of the random variable