The Inverse Shapley Value Problem
An extended abstract of this work appeared in the Proceedings of the 39th International Colloquium on Automata, Languages and Programming (ICALP 2012).
For $f$ a weighted voting scheme used by $n$ voters to choose between two candidates, the $n$ Shapley-Shubik indices (or Shapley values) of $f$ provide a measure of how much control each voter can exert over the overall outcome of the vote. Shapley-Shubik indices were introduced by Lloyd Shapley and Martin Shubik in 1954 [SS54] and are widely studied in social choice theory as a measure of the “influence” of voters. The Inverse Shapley Value Problem is the problem of designing a weighted voting scheme which (approximately) achieves a desired input vector of values for the Shapley-Shubik indices. Despite much interest in this problem, no provably correct and efficient algorithm was known prior to our work.
We give the first efficient algorithm with provable performance guarantees for the Inverse Shapley Value Problem. For any constant $\epsilon > 0$ our algorithm runs in fixed poly$(n)$ time (the degree of the polynomial is independent of $\epsilon$) and has the following performance guarantee: given as input a vector of desired Shapley values, if any “reasonable” weighted voting scheme (roughly, one in which the threshold is not too skewed) approximately matches the desired vector of values to within some small error, then our algorithm explicitly outputs a weighted voting scheme that achieves this vector of Shapley values to within error $\epsilon$. If there is a “reasonable” voting scheme in which all voting weights are integers at most poly$(n)$ that approximately achieves the desired Shapley values, then our algorithm runs in time poly$(n)$ and outputs a weighted voting scheme that achieves the target vector of Shapley values to within error $\epsilon = n^{-1/8}$.
In this paper we consider the common scenario in which each of $n$ voters must cast a binary vote for or against some proposal. What is the best way to design such a voting scheme? Throughout the paper we consider only weighted voting schemes, in which the proposal passes if a weighted sum of yes-votes exceeds a predetermined threshold. Weighted voting schemes are predominant in voting theory and have been extensively studied for many years; see [EGGW07, ZFBE08] and references therein. In computer science language, we are dealing with linear threshold functions (henceforth abbreviated as LTFs) over $n$ Boolean variables.
If it is desired that each of the voters should have the same “amount of power” over the outcome, then a simple majority vote is the obvious solution. However, in many scenarios it may be the case that we would like to assign different levels of voting power to the voters – perhaps they are shareholders who own different amounts of stock in a corporation, or representatives of differently sized populations. In such a setting it is much less obvious how to design the right voting scheme; indeed, it is far from obvious how to correctly quantify the notion of the “amount of power” that a voter has under a given fixed voting scheme. As a simple example, consider an election with three voters who have voting weights 49, 49 and 2, in which a total of 51 votes are required for the proposition to pass. While the disparity between voting weights may at first suggest that the two voters with 49 votes each have most of the “power,” any coalition of two voters is sufficient to pass the proposition and any single voter is insufficient, so the voting power of all three voters is in fact equal.
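The three-voter example above can be checked by brute force. The following is a minimal sketch, not the method of this paper: it enumerates every ordering of the voters and records who casts the pivotal vote. The function name `shapley_shubik` and the convention that the proposal passes once the running weight reaches the quota are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def shapley_shubik(weights, quota):
    """Exact Shapley-Shubik indices by enumerating all voter orderings.

    A voter is pivotal in an ordering if the running total of weights
    first reaches the quota at the moment that voter is added.
    """
    n = len(weights)
    pivots = [0] * n
    orderings = 0
    for order in permutations(range(n)):
        orderings += 1
        running = 0
        for voter in order:
            running += weights[voter]
            if running >= quota:
                pivots[voter] += 1
                break
    return [Fraction(p, orderings) for p in pivots]

# Weights 49, 49, 2 with 51 votes required to pass.
indices = shapley_shubik([49, 49, 2], 51)
print(indices)
```

All three indices come out equal to 1/3, matching the observation that any two voters form a winning coalition and no single voter does. The enumeration takes time proportional to $n!$, so it is only usable for very small examples.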
Many different power indices (methods of measuring the voting power of individuals under a given voting scheme) have been proposed over the course of decades. These include the Banzhaf index [Ban65], the Deegan-Packel index [DP78], the Holler index [Hol82], and others (see the extensive survey of de Keijzer [dK08]). Perhaps the best known, and certainly the oldest, of these indices is the Shapley-Shubik index [SS54], which is also known as the index of Shapley values (we shall henceforth refer to it as such). Informally, the Shapley value of a voter $i$ among the $n$ voters is the fraction of all orderings of the voters in which she “casts the pivotal vote” (see Definition 1 in Section 2 for a precise definition, and [Rot88] for much more on Shapley values). We shall work with the Shapley values throughout this paper.
Given a particular weighted voting scheme (i.e., an $n$-variable linear threshold function), standard sampling-based approaches can be used to efficiently obtain highly accurate estimates of the $n$ Shapley values (see also the works of [Lee03, BMR10]). However, the inverse problem is much more challenging: given a vector of $n$ desired values for the Shapley values, how can one design a weighted voting scheme that (approximately) achieves these Shapley values? This problem, which we refer to as the Inverse Shapley Value Problem, is quite natural and has received considerable attention; various heuristics and exponential-time algorithms have been proposed [APL07, FWJ08, dKKZ10, Kur11], but prior to our work no provably correct and efficient algorithms were known.
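To illustrate the forward direction, here is a minimal sketch of a sampling-based estimator, assuming the informal "pivotal voter" description of the Shapley values. It draws random orderings instead of enumerating all of them. The function name `estimate_shapley`, the sample count, and the example weights are hypothetical choices, and this is not claimed to be the procedure of [Lee03, BMR10].

```python
import random

def estimate_shapley(weights, quota, samples=20000, seed=0):
    """Monte Carlo estimate of the pivotal-vote frequencies.

    Each sample is a uniformly random ordering of the voters; the voter
    whose weight first brings the running total to the quota is pivotal.
    """
    rng = random.Random(seed)
    n = len(weights)
    pivots = [0] * n
    order = list(range(n))
    for _ in range(samples):
        rng.shuffle(order)
        running = 0
        for voter in order:
            running += weights[voter]
            if running >= quota:
                pivots[voter] += 1
                break
    return [p / samples for p in pivots]

# Hypothetical example: weights 4, 3, 2, 1 and quota 6.
est = estimate_shapley([4, 3, 2, 1], 6)
print(est)
```

For this small example the exact values are 10/24, 6/24, 6/24 and 2/24, so the estimates can be compared against them directly. Standard tail bounds control how many samples are needed for a given additive accuracy.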
Our Results. We give the first efficient algorithm with provable performance guarantees for the Inverse Shapley Value Problem. Our results apply to “reasonable” voting schemes; roughly, we say that a weighted voting scheme is “reasonable” if fixing a tiny fraction of the voting weight does not already determine the outcome, i.e., if the threshold of the linear threshold function is not too extreme. (See Definition 2 in Section 2 for a precise definition.) This seems to be a plausible property for natural voting schemes. Roughly speaking, we show that if there is any reasonable weighted voting scheme that approximately achieves the desired input vector of Shapley values, then our algorithm finds such a weighted voting scheme. Our algorithm runs in fixed polynomial time in $n$, the number of voters, for any constant error parameter $\epsilon > 0$. In a bit more detail, our first main theorem, stated informally, is as follows (see Section 6 for Theorem 26 which gives a precise theorem statement):
Main Theorem (arbitrary weights, informal statement). There is a poly$(n)$-time algorithm with the following properties: The algorithm is given any constant accuracy parameter $\epsilon > 0$ and any vector of $n$ real values. The algorithm has the following performance guarantee: if there is any monotone increasing reasonable LTF $f$ whose Shapley values are very close to the given values, then with very high probability the algorithm outputs a weight vector and a threshold such that the resulting linear threshold function has Shapley values $\epsilon$-close to those of $f$.
We emphasize that the exponent of the running time is a fixed constant that is independent of $\epsilon$.
Our second main theorem gives an even stronger guarantee if there is a weighted voting scheme with small weights (at most poly$(n)$) whose Shapley values are close to the desired values. For this problem we give an algorithm which achieves $1/\mathrm{poly}(n)$ accuracy in poly$(n)$ time. An informal statement of this result is (see Section 6 for Theorem 27 which gives a precise theorem statement):
Main Theorem (bounded weights, informal statement). There is a poly$(n, W)$-time algorithm with the following properties: The algorithm is given a weight bound $W$ and any vector of $n$ real values. The algorithm has the following performance guarantee: if there is any monotone increasing reasonable LTF $f(x) = \mathrm{sign}(w \cdot x - \theta)$ whose Shapley values are very close to the given values and where each $w_i$ is an integer of magnitude at most $W$, then with very high probability the algorithm outputs a weight vector and a threshold such that the resulting linear threshold function has Shapley values $n^{-1/8}$-close to those of $f$.
Discussion and Our Approach. At a high level, the Inverse Shapley Value Problem that we consider is similar to the “Chow Parameters Problem” that has been the subject of several recent papers [Gol06, OS08, DDFS12]. The Chow parameters are another name for the Banzhaf indices; the Chow Parameters Problem is to output a linear threshold function which approximately matches a given input vector of Chow parameters. (To align with the terminology of the current paper, the “Chow Parameters Problem” might perhaps better be described as the “Inverse Banzhaf Problem.”)
Let us briefly describe the approaches in [OS08] and [DDFS12] at a high level for the purpose of establishing a clear comparison with this paper. Each of the papers [OS08, DDFS12] combines structural results on linear threshold functions with an algorithmic component. The structural results in [OS08] deal with anti-concentration of affine forms $w \cdot x - \theta$ where $x \in \{-1,1\}^n$ is uniformly distributed over the Boolean hypercube, while the algorithmic ingredient of [OS08] is a rather straightforward brute-force search. In contrast, the key structural results of [DDFS12] are geometric statements about how $n$-dimensional hyperplanes interact with the Boolean hypercube, which are combined with linear-algebraic (rather than anti-concentration) arguments. The algorithmic ingredient of [DDFS12] is more sophisticated, employing a boosting-based approach inspired by the work of [TTV08, Imp95].
Our approach combines aspects of both the [OS08] and [DDFS12] approaches. Very roughly speaking, we establish new structural results which show that linear threshold functions have good anti-concentration (similar to [OS08]), and use a boosting-based approach derived from [TTV08] as the algorithmic component (similar to [DDFS12]). However, this high-level description glosses over many “Shapley-specific” issues and complications that do not arise in these earlier works; below we describe two of the main challenges that arise, and sketch how we meet them in this paper.
First challenge: establishing anti-concentration with respect to non-standard distributions. The Chow parameters (i.e., Banzhaf indices) have a natural definition in terms of the uniform distribution over the Boolean hypercube $\{-1,1\}^n$. Being able to use the uniform distribution with its many nice properties (such as complete independence among all coordinates) is very useful in proving the required anti-concentration results that are at the heart of [OS08]. In contrast, it is not a priori clear what is (or even whether there exists) the “right” distribution over $\{-1,1\}^n$ corresponding to the Shapley values. In this paper we derive such a distribution $\mu$ over $\{-1,1\}^n$, but it is much less well-behaved than the uniform distribution (it is supported on a proper subset of $\{-1,1\}^n$, and it is not even pairwise independent). Nevertheless, we are able to establish anti-concentration results for affine forms corresponding to linear threshold functions under the distribution $\mu$, as required for our results. This is done by showing that any reasonable linear threshold function can be expressed with “nice” weights (see Theorem 3 of Section 2), and establishing anti-concentration for any “nice” weight vector by carefully combining anti-concentration bounds for $p$-biased distributions across a continuous family of different choices of $p$ (see Section 4 for details).
Second challenge: using anti-concentration to solve the Inverse Shapley problem. The main algorithmic ingredient that we use is a procedure from [TTV08]. Given a vector of values $(\mathbf{E}[f(x) x_i])_{i=1,\dots,n}$ (correlations between the unknown linear threshold function $f$ and the individual input variables), it efficiently constructs a bounded function $g: \{-1,1\}^n \to [-1,1]$ which closely matches these correlations, i.e., $\mathbf{E}[f(x) x_i] \approx \mathbf{E}[g(x) x_i]$ for all $i \in [n]$. Such a procedure is very useful for the Chow parameters problem, because the Chow parameters correspond precisely to the values $\mathbf{E}[f(x) x_i]$ (i.e., the degree-1 Fourier coefficients of $f$) with respect to the uniform distribution. (This correspondence is at the heart of Chow’s original proof [Cho61] showing that the exact values of the Chow parameters suffice to information-theoretically specify any linear threshold function; anti-concentration is used in [OS08] to extend Chow’s original arguments about degree-1 Fourier coefficients to the setting of approximate reconstruction.)
For the inverse Shapley problem, there is no obvious correspondence between the correlations of individual input variables and the Shapley values. Moreover, without a notion of “degree-1 Fourier coefficients” for the Shapley setting, it is not clear why anti-concentration statements with respect to $\mu$ should be useful for approximate reconstruction. We deal with both these issues by developing a notion of the degree-1 Fourier coefficients of $f$ with respect to distribution $\mu$ and relating these coefficients to the Shapley values. (Footnote: We note that Owen [Owe72] has given a characterization of the Shapley values as a weighted average of $p$-biased influences (see also [KS06]). However, this is not as useful for us as our characterization in terms of “$\mu$-distribution” Fourier coefficients, because we need to ultimately relate the Shapley values to anti-concentration with respect to $\mu$.) We actually require two related notions: one is the “coordinate correlation coefficient” $\mathbf{E}_{x \sim \mu}[f(x) x_i]$, which is necessary for the algorithmic [TTV08] ingredient, and one is the “Fourier coefficient” $\hat{f}(i)$, which is necessary for Lemma 15 (see below). We define both notions and establish the necessary relations between them in Section 3.
Armed with the notion of the degree-1 Fourier coefficients under distribution $\mu$, we prove a key result (Lemma 15) saying that if the LTF $f$ is anti-concentrated under distribution $\mu$, then any bounded function $g$ which closely matches the degree-1 Fourier coefficients of $f$ must be close to $f$ in $\ell_1$ distance with respect to $\mu$. (This is why anti-concentration with respect to $\mu$ is useful for us.) From this point, exploiting properties of the [TTV08] algorithm, we can pass from $g$ to an LTF whose Shapley values closely match those of $f$.
Organization. Useful preliminaries are given in Section 2, including the crucial fact (Theorem 3) that all “reasonable” linear threshold functions have weight representations with “nice” weights. In Section 3 we define the distribution $\mu$ and the notions of Fourier coefficients and “coordinate correlation coefficients,” and the relations between them, that we will need. At the end of that section we prove a crucial lemma, Lemma 15, which says that anti-concentration of affine forms and closeness in Fourier coefficients together suffice to establish closeness in $\ell_1$ distance. Section 4 proves that “nice” affine forms have the required anti-concentration, and Section 5 describes the algorithmic tool from [TTV08] that lets us establish closeness of coordinate correlation coefficients. Section 6 puts the pieces together to prove our main theorems. Finally, in Section 7 we conclude the paper and present a few open problems.
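Notation and terminology. For $n \in \mathbb{Z}_+$, we denote by $[n]$ the set $\{1, 2, \dots, n\}$. For $i, j \in \mathbb{Z}_+$, $i \le j$, we denote $[i, j] = \{i, i+1, \dots, j\}$.
Given a vector $w = (w_1, \dots, w_n) \in \mathbb{R}^n$ we write $\|w\|_1$ to denote $\sum_{i=1}^n |w_i|$. A linear threshold function, or LTF, is a function $f: \{-1,1\}^n \to \{-1,1\}$ which is such that $f(x) = \mathrm{sign}(w \cdot x - \theta)$ for some $w \in \mathbb{R}^n$, $\theta \in \mathbb{R}$.
Our arguments will also use a variant of linear threshold functions which we call linear bounded functions (LBFs). The projection function $P_1: \mathbb{R} \to [-1,1]$ is defined by $P_1(t) = t$ for $|t| \le 1$ and $P_1(t) = \mathrm{sign}(t)$ otherwise. An LBF is a function $g: \{-1,1\}^n \to [-1,1]$ of the form $g(x) = P_1(w \cdot x - \theta)$.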
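Shapley values. Here and throughout the paper we write $\mathbb{S}_n$ to denote the symmetric group of all $n!$ permutations over $[n]$. Given a permutation $\pi \in \mathbb{S}_n$ and an index $i \in [n]$, we write $x(\pi, i)$ to denote the string in $\{-1,1\}^n$ that has a 1 in coordinate $j$ if and only if $\pi(j) < \pi(i)$, and we write $x^+(\pi, i)$ to denote the string obtained from $x(\pi, i)$ by flipping coordinate $i$ from $-1$ to $1$. With this notation in place we can define the generalized Shapley indices of a Boolean function as follows:
Definition 1 (Generalized Shapley values). Given $f: \{-1,1\}^n \to [-1,1]$, the $i$-th generalized Shapley value of $f$ is the value
$$\tilde{f}(i) = \mathbf{E}_{\pi \in_R \mathbb{S}_n}\left[ f(x^+(\pi, i)) - f(x(\pi, i)) \right]$$
(where “$\pi \in_R \mathbb{S}_n$” means that $\pi$ is selected uniformly at random from $\mathbb{S}_n$).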
A function $f: \{-1,1\}^n \to [-1,1]$ is said to be monotone increasing if for all $i \in [n]$, whenever two input strings $x, y \in \{-1,1\}^n$ differ precisely in coordinate $i$ and have $x_i = -1$, $y_i = 1$, it is the case that $f(x) \le f(y)$. It is easy to check that for monotone functions our definition of generalized Shapley values agrees with the usual notion of Shapley values (which are typically defined only for monotone functions) up to a multiplicative factor of 2; in the rest of the paper we omit “generalized” and refer to these values simply as the Shapley values of $f$.
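The factor of 2 can be seen on a small example. The sketch below assumes the reading of the definition in which $x(\pi, i)$ has a 1 exactly on the voters that precede voter $i$ in the ordering $\pi$, and $x^+(\pi, i)$ additionally sets coordinate $i$ to 1. The helper names `generalized_shapley` and `make_ltf` are illustrative and not taken from the paper.

```python
from fractions import Fraction
from itertools import permutations

def generalized_shapley(f, n):
    """Average of f(x^+(pi, i)) - f(x(pi, i)) over all permutations pi.

    Here pi[j] is the position of voter j in the ordering, so x(pi, i)
    has a 1 exactly on the voters that come before voter i.
    """
    totals = [Fraction(0)] * n
    count = 0
    for pi in permutations(range(n)):
        count += 1
        for i in range(n):
            x = [1 if pi[j] < pi[i] else -1 for j in range(n)]
            x_plus = list(x)
            x_plus[i] = 1
            totals[i] += f(x_plus) - f(x)
    return [t / count for t in totals]

def make_ltf(weights, theta):
    """Return x -> sign(w.x - theta) with values in {-1, 1}."""
    def f(x):
        return 1 if sum(w * xi for w, xi in zip(weights, x)) - theta > 0 else -1
    return f

# Weights 49, 49, 2: yes-weight >= 51 is the same as w.x >= 2, and since
# w.x is always even here, theta = 1 gives the same function with no ties.
values = generalized_shapley(make_ltf([49, 49, 2], 1), 3)
print(values)
```

Each generalized value is 2/3, which is twice the classical Shapley-Shubik index of 1/3 for this scheme, since a pivotal voter changes the $\pm 1$-valued function by 2 rather than 1.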
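We will use the following notion of the “distance” between the vectors of Shapley values for two functions $f, g: \{-1,1\}^n \to [-1,1]$:
$$d_{\mathrm{Shapley}}(f, g) = \sqrt{\sum_{i=1}^n \left(\tilde{f}(i) - \tilde{g}(i)\right)^2},$$
i.e., the Shapley distance is simply the Euclidean distance between the two $n$-dimensional vectors of Shapley values. Given a vector $a = (a(1), \dots, a(n)) \in \mathbb{R}^n$ we will also use $d_{\mathrm{Shapley}}(a, f)$ to denote $\sqrt{\sum_{i=1}^n (\tilde{f}(i) - a(i))^2}$.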
The linear threshold functions that we consider. Our algorithmic results hold for linear threshold functions which are not too “extreme” (in the sense of having a very skewed threshold). We will use the following definition:
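Definition 2 ($\eta$-reasonable LTF). Let $f: \{-1,1\}^n \to \{-1,1\}$, $f(x) = \mathrm{sign}(w \cdot x - \theta)$ be an LTF. For $0 < \eta < 1$ we say that $f$ is $\eta$-reasonable if $\theta \in [-(1-\eta)\|w\|_1, (1-\eta)\|w\|_1]$.
All our results will deal with $\eta$-reasonable LTFs; throughout the paper $\eta$ should be thought of as a small fixed absolute constant (such as $1/1000$). LTFs that are not $\eta$-reasonable do not seem to correspond to very interesting voting schemes since typically they will be very close to constant functions. (For example, even at $\eta = 0.99$, if the LTF $f(x) = \mathrm{sign}(x_1 + \cdots + x_n - \theta)$ has a threshold $\theta > 0$ which makes it not an $\eta$-reasonable LTF, then $f$ agrees with the constant function $-1$ on all but a $2^{-\Omega(n)}$ fraction of inputs in $\{-1,1\}^n$.)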
Turning from the threshold to the weights, some of the proofs in our paper will require us to work with LTFs that have “nice” weights in a certain technical sense. Prior work [Ser07, OS11] has shown that for any LTF, there is a weight vector realizing that LTF that has essentially the properties we need; however, since the exact technical condition that we require is not guaranteed by any of the previous works, we give a full proof that any LTF has a representation of the desired form. The following theorem is proved in Appendix A:
Let be an -reasonable LTF and . There exists a representation of as such that (after reordering coordinates so that condition (i) below holds) we have: (i) , ; (ii) ; and (iii) for all we have , where .
Tools from probability. We will use the following standard tail bound:
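Fact 4 (Chernoff Bounds). Let $X$ be a random variable taking values in $[-a, a]$ and let $X_1, \dots, X_t$ be i.i.d. samples drawn from $X$. Let $\bar{X} = (X_1 + \cdots + X_t)/t$. Then for any $\gamma > 0$, we have
$$\Pr\left[\,|\bar{X} - \mathbf{E}[X]| \ge \gamma\,\right] \le 2\exp\left(-\gamma^2 t / (2a^2)\right).$$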
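We will also use the Littlewood-Offord inequality for $p$-biased distributions over $\{-1,1\}^n$. One way to prove this is by using the LYM inequality (which can be found e.g. as Theorem 8.6 of [Juk01]); for an explicit reference and proof of the following statement see e.g. [AGKW09].
Theorem 5. Fix $\delta \in (0,1)$ and let $\mathcal{D}_\delta$ denote the $\delta$-biased distribution over $\{-1,1\}^n$ (under which each coordinate is set to $1$ independently with probability $\delta$). Fix $w \in \mathbb{R}^n$ and define $S = \{i : |w_i| \ge \epsilon\}$. If $|S| \ge K$, then for all $\theta \in \mathbb{R}$ we have
$$\Pr_{x \sim \mathcal{D}_\delta}\left[\,|w \cdot x - \theta| < \epsilon\,\right] \le \frac{1}{\sqrt{K \delta (1-\delta)}}.$$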
Basic Facts about function spaces. We will use the following basic facts:
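Fact 6. The $n+1$ functions $1, x_1, \dots, x_n$ are linearly independent and form a basis for the subspace of linear functions $\{f: \{-1,1\}^n \to \mathbb{R} \text{ of the form } f(x) = w \cdot x + \theta\}$.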
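Fact 7. Fix any $\Omega \subseteq \{-1,1\}^n$ and let $\mu$ be a probability distribution over $\Omega$ such that $\mu(x) > 0$ for all $x \in \Omega$. We define $\langle f, g \rangle_\mu = \mathbf{E}_{\omega \sim \mu}[f(\omega) g(\omega)]$ for $f, g: \Omega \to \mathbb{R}$, and $\|f\|_\mu^2 = \langle f, f \rangle_\mu$. Suppose that $f_1, \dots, f_m: \Omega \to \mathbb{R}$ is an orthonormal set of functions, i.e., $\langle f_i, f_j \rangle_\mu = \delta_{ij}$ for all $i, j \in [m]$. Then we have $\|f\|_\mu^2 \ge \sum_{i=1}^m \langle f, f_i \rangle_\mu^2$. As a corollary, if $f, h: \Omega \to [-1,1]$ then we have $\sqrt{\sum_{i=1}^m \langle f - h, f_i \rangle_\mu^2} \le 2\sqrt{\mathbf{E}_{x \sim \mu}[|f(x) - h(x)|]}$.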
3 Analytic Reformulation of Shapley values
The definition of Shapley values given in Definition 1 is somewhat cumbersome to work with. In this section we derive alternate characterizations of Shapley values in terms of “Fourier coefficients” and “coordinate correlation coefficients” and establish various technical results relating Shapley values and these coefficients; these technical results will be crucially used in the proof of our main theorems.
There is a particular distribution $\mu$ that plays a central role in our reformulations. We start by defining this distribution and introducing some relevant notation, and then give our results.
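The distribution $\mu$. Let us define $\Lambda(n) = \sum_{0 < k < n} \left(\frac{1}{k} + \frac{1}{n-k}\right)$; clearly we have $\Lambda(n) = \Theta(\log n)$, and more precisely we have $\Lambda(n) \le 2 \log_2 n$. We also define $Q(n,k)$ as $Q(n,k) = \frac{1}{\Lambda(n)}\left(\frac{1}{k} + \frac{1}{n-k}\right)$ for $0 < k < n$, so we have $Q(n,1) + \cdots + Q(n,n-1) = 1$.
For $x \in \{-1,1\}^n$ we write $\mathrm{wt}(x)$ to denote the number of $1$’s in $x$. We define the set $B_n$ to be $B_n = \{x \in \{-1,1\}^n : 0 < \mathrm{wt}(x) < n\}$, i.e., $B_n = \{-1,1\}^n \setminus \{\mathbf{1}, -\mathbf{1}\}$.
The distribution $\mu$ is supported on $B_n$ and is defined as follows: to make a draw from $\mu$, sample $k \in \{1, \dots, n-1\}$ with probability $Q(n,k)$. Choose $x \in \{-1,1\}^n$ uniformly at random from the $k$-th “weight level” of $\{-1,1\}^n$, i.e., from $\{-1,1\}^n_{=k} = \{x \in \{-1,1\}^n : \mathrm{wt}(x) = k\}$.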
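The two-stage sampling procedure can be sketched in a few lines. This sketch assumes the normalizer is $\Lambda(n) = \sum_{0<k<n}(1/k + 1/(n-k))$ and the level probabilities are $Q(n,k) = (1/k + 1/(n-k))/\Lambda(n)$; the function names `Lambda`, `Q` and `sample_mu` are illustrative choices rather than names used in the paper.

```python
from fractions import Fraction
import random

def Lambda(n):
    """Normalizing constant: sum over 0 < k < n of (1/k + 1/(n-k))."""
    return sum(Fraction(1, k) + Fraction(1, n - k) for k in range(1, n))

def Q(n, k):
    """Probability that a draw has exactly k coordinates equal to 1."""
    return (Fraction(1, k) + Fraction(1, n - k)) / Lambda(n)

def sample_mu(n, rng):
    """Draw x in {-1,1}^n: pick a weight level k, then a uniform string on it."""
    levels = list(range(1, n))
    level_probs = [float(Q(n, k)) for k in levels]
    k = rng.choices(levels, weights=level_probs)[0]
    ones = set(rng.sample(range(n), k))
    return [1 if i in ones else -1 for i in range(n)]

rng = random.Random(1)
x = sample_mu(6, rng)
print(Lambda(4), [Q(4, k) for k in range(1, 4)], x)
```

The level probabilities favor very light and very heavy strings: for $n = 4$ the levels $k = 1, 2, 3$ receive probabilities 4/11, 3/11 and 4/11. The all-ones and all-minus-ones strings are never produced, which is why the support is a proper subset of the hypercube.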
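Useful notation. For $i = 0, 1, \dots, n$ we define the “coordinate correlation coefficients” of a function $f: \{-1,1\}^n \to \mathbb{R}$ (with respect to $\mu$) as:
$$f^*(i) = \mathbf{E}_{x \sim \mu}[f(x) \cdot x_i]$$
(here and throughout the paper $x_0$ denotes the constant 1).
Later in this section we will define an orthonormal set of linear functions $L_0, L_1, \dots, L_n: \{-1,1\}^n \to \mathbb{R}$. We define the “Fourier coefficients” of $f$ (with respect to $\mu$) as:
$$\hat{f}(i) = \mathbf{E}_{x \sim \mu}[f(x) \cdot L_i(x)].$$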
An alternative expression for the Shapley values. We start by expressing the Shapley values in terms of the coordinate correlation coefficients:
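Lemma 8. Given $f: \{-1,1\}^n \to [-1,1]$, for each $i = 1, \dots, n$ we have
$$\tilde{f}(i) = \frac{f(\mathbf{1}) - f(-\mathbf{1})}{n} + \frac{\Lambda(n)}{2} \cdot \left( f^*(i) - \frac{1}{n} \sum_{j=1}^n f^*(j) \right).$$
Proof. Recall that $\tilde{f}(i)$ can be expressed as follows:
$$\tilde{f}(i) = \mathbf{E}_{\pi \in_R \mathbb{S}_n}\left[ f(x^+(\pi, i)) - f(x(\pi, i)) \right]. \qquad (4)$$
Since the $i$-th coordinate of $x^+(\pi, i)$ is $1$ and the $i$-th coordinate of $x(\pi, i)$ is $-1$, we see that $\tilde{f}(i)$ is a weighted sum of $\{f(x)\}_{x \in \{-1,1\}^n}$. We now compute the weights associated with any such $f(x)$.
Let $x$ be a string that has $\mathrm{wt}(x)$ coordinates that are 1 and has $x_i = 1$. Then the total number of permutations $\pi \in \mathbb{S}_n$ such that $x^+(\pi, i) = x$ is $(\mathrm{wt}(x) - 1)! \, (n - \mathrm{wt}(x))!$. Consequently the weight associated with $f(x)$ for such an $x$ is $(\mathrm{wt}(x) - 1)! \, (n - \mathrm{wt}(x))! / n!$.
Now let $x$ be a string that has $\mathrm{wt}(x)$ coordinates that are 1 and has $x_i = -1$. Then the total number of permutations $\pi \in \mathbb{S}_n$ such that $x(\pi, i) = x$ is $\mathrm{wt}(x)! \, (n - \mathrm{wt}(x) - 1)!$. Consequently the weight associated with $f(x)$ for such an $x$ is $-\mathrm{wt}(x)! \, (n - \mathrm{wt}(x) - 1)! / n!$.
Thus we may rewrite Equation (4) as
$$\tilde{f}(i) = \sum_{x : x_i = 1} \frac{(\mathrm{wt}(x) - 1)! \, (n - \mathrm{wt}(x))!}{n!} f(x) \; - \sum_{x : x_i = -1} \frac{\mathrm{wt}(x)! \, (n - \mathrm{wt}(x) - 1)!}{n!} f(x).$$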
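Let us now define $\nu(f) = \frac{1}{n}\left(f(\mathbf{1}) - f(-\mathbf{1})\right)$; this is the contribution of the two strings $\mathbf{1}$ and $-\mathbf{1}$, which lie outside $B_n$ and carry weights $1/n$ and $-1/n$ respectively. Using the fact that $(1 + x_i)/2$ and $(1 - x_i)/2$ are the indicators of $x_i = 1$ and $x_i = -1$, it is easy to see that one gets
$$\tilde{f}(i) = \nu(f) + \sum_{x \in B_n} f(x) \left( \frac{1 + x_i}{2} \cdot \frac{(\mathrm{wt}(x) - 1)! \, (n - \mathrm{wt}(x))!}{n!} - \frac{1 - x_i}{2} \cdot \frac{\mathrm{wt}(x)! \, (n - \mathrm{wt}(x) - 1)!}{n!} \right). \qquad (5)$$
We next observe that $\sum_{j=1}^n x_j = 2\,\mathrm{wt}(x) - n$. Next, let us define $P(n,k)$ (for $k \in [1, n-1]$) as follows:
$$P(n,k) = \frac{Q(n,k)}{\binom{n}{k}} = \frac{(k-1)! \, (n-k-1)!}{\Lambda(n) \cdot (n-1)!},$$
so that $\mu(x) = P(n, \mathrm{wt}(x))$ for every $x \in B_n$.
So we may rewrite Equation (5) in terms of $P(n, \mathrm{wt}(x))$ as
$$\tilde{f}(i) = \nu(f) + \frac{\Lambda(n)}{2} \sum_{x \in B_n} f(x) \, P(n, \mathrm{wt}(x)) \left( x_i - \frac{1}{n} \sum_{j=1}^n x_j \right),$$
and consequently we get
$$\tilde{f}(i) = \nu(f) + \frac{\Lambda(n)}{2} \left( \mathbf{E}_{x \sim \mu}[f(x) x_i] - \frac{1}{n} \sum_{j=1}^n \mathbf{E}_{x \sim \mu}[f(x) x_j] \right) = \frac{f(\mathbf{1}) - f(-\mathbf{1})}{n} + \frac{\Lambda(n)}{2} \left( f^*(i) - \frac{1}{n} \sum_{j=1}^n f^*(j) \right),$$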
finishing the proof. ∎
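The identity just proved can be checked exactly on small instances. The sketch below assumes the formulas $\Lambda(n) = \sum_{0<k<n}(1/k + 1/(n-k))$, $\mu(x) = Q(n, \mathrm{wt}(x))/\binom{n}{\mathrm{wt}(x)}$ on $B_n$, and the statement $\tilde{f}(i) = (f(\mathbf{1}) - f(-\mathbf{1}))/n + (\Lambda(n)/2)(f^*(i) - \frac{1}{n}\sum_j f^*(j))$. It compares this closed form against direct enumeration of permutations using rational arithmetic. All function names and the two test functions are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb

def Lambda(n):
    return sum(Fraction(1, k) + Fraction(1, n - k) for k in range(1, n))

def mu(x):
    """Probability of x under mu; zero on the two constant strings."""
    n, k = len(x), x.count(1)
    if k == 0 or k == n:
        return Fraction(0)
    return (Fraction(1, k) + Fraction(1, n - k)) / (Lambda(n) * comb(n, k))

def shapley_by_permutations(f, n):
    """Average of f(x^+(pi, i)) - f(x(pi, i)) over all permutations."""
    totals = [Fraction(0)] * n
    perms = list(permutations(range(n)))
    for pi in perms:
        for i in range(n):
            x = tuple(1 if pi[j] < pi[i] else -1 for j in range(n))
            x_plus = x[:i] + (1,) + x[i + 1:]
            totals[i] += f(x_plus) - f(x)
    return [t / len(perms) for t in totals]

def shapley_by_correlations(f, n):
    """Closed form in terms of the coordinate correlation coefficients."""
    cube = list(product((-1, 1), repeat=n))
    corr = [sum(mu(x) * f(x) * x[i] for x in cube) for i in range(n)]
    nu = Fraction(f((1,) * n) - f((-1,) * n)) / n
    mean_corr = sum(corr) / n
    return [nu + Lambda(n) / 2 * (c - mean_corr) for c in corr]

def ltf(x):
    # sign(3*x1 + 2*x2 + x3 + x4); the sum is always odd, so never zero.
    return 1 if 3 * x[0] + 2 * x[1] + x[2] + x[3] > 0 else -1

def product_of_two(x):
    # A bounded, non-monotone function, to exercise the general case.
    return x[0] * x[1]

for g in (ltf, product_of_two):
    print(shapley_by_permutations(g, 4), shapley_by_correlations(g, 4))
```

Both computations agree exactly for the monotone LTF and for the non-monotone product, consistent with the statement holding for every bounded function rather than only for LTFs.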
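Construction of a Fourier basis for distribution $\mu$. For all $x \in B_n$ we have that $\mu(x) > 0$, and consequently by Fact 6 we know that the functions $1, x_1, \dots, x_n$ form a basis for the subspace of linear functions from $B_n \to \mathbb{R}$. By Gram-Schmidt orthogonalization, we can obtain an orthonormal basis $L_0, \dots, L_n$ for this subspace, i.e., a set of linear functions such that $\langle L_i, L_i \rangle_\mu = 1$ and $\langle L_i, L_j \rangle_\mu = 0$ for all $i \ne j$.
We now give explicit expressions for these basis functions. We start by defining $L_0: B_n \to \mathbb{R}$ as $L_0(x) = 1$. Next, by symmetry, we can express each $L_i$, $i \in [n]$, as
$$L_i(x) = \alpha (x_1 + \cdots + x_n) + \beta x_i.$$
Using the orthonormality properties it is straightforward to solve for $\alpha$ and $\beta$. The following lemma gives the values of $\alpha$ and $\beta$: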
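Lemma 9. For the choices
$$\alpha = \frac{1}{n} \cdot \left( \sqrt{\frac{\Lambda(n)}{n\Lambda(n) - 4(n-1)}} - \frac{\sqrt{\Lambda(n)}}{2} \right), \qquad \beta = \frac{\sqrt{\Lambda(n)}}{2},$$
the set $\{L_i\}_{i=0}^n$ is an orthonormal set of linear functions under the distribution $\mu$.
We note for later reference that $\alpha = -\Theta\left(\sqrt{\log n}/n\right)$ and $\beta = \Theta\left(\sqrt{\log n}\right)$.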
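Orthonormality of the basis can be checked numerically. The sketch below assumes the closed forms $\beta = \sqrt{\Lambda(n)}/2$ and $\alpha = \frac{1}{n}\left(\sqrt{\Lambda(n)/(n\Lambda(n) - 4(n-1))} - \sqrt{\Lambda(n)}/2\right)$, which are one solution of the orthonormality equations for functions of the form $L_i(x) = \alpha(x_1 + \cdots + x_n) + \beta x_i$, together with the pairwise correlation $\mathbf{E}_{x \sim \mu}[x_i x_j] = 1 - 4/\Lambda(n)$ for $i \ne j$. The helper names are illustrative, and floating-point arithmetic is used, so equality is only up to rounding.

```python
from itertools import product
from math import comb, sqrt

def Lambda(n):
    return sum(1.0 / k + 1.0 / (n - k) for k in range(1, n))

def mu(x):
    """Probability of x under mu; zero on the two constant strings."""
    n, k = len(x), x.count(1)
    if k == 0 or k == n:
        return 0.0
    return (1.0 / k + 1.0 / (n - k)) / (Lambda(n) * comb(n, k))

def basis(n):
    """Return [L_0, L_1, ..., L_n] for n >= 3 (the formula degenerates at n = 2)."""
    lam = Lambda(n)
    beta = sqrt(lam) / 2
    alpha = (sqrt(lam / (n * lam - 4 * (n - 1))) - sqrt(lam) / 2) / n
    fns = [lambda x: 1.0]
    for i in range(n):
        fns.append(lambda x, i=i: alpha * sum(x) + beta * x[i])
    return fns

def inner(g, h, n):
    """Inner product <g, h> under mu, by summing over the whole cube."""
    return sum(mu(x) * g(x) * h(x) for x in product((-1, 1), repeat=n))

n = 5
L = basis(n)
gram = [[inner(L[i], L[j], n) for j in range(n + 1)] for i in range(n + 1)]
pair_corr = inner(lambda x: x[0], lambda x: x[1], n)
print(round(pair_corr, 6), round(1 - 4 / Lambda(n), 6))
```

The Gram matrix should be the identity up to rounding error. The printed pair of numbers compares the directly computed value of $\mathbf{E}_{x \sim \mu}[x_1 x_2]$ with $1 - 4/\Lambda(n)$, which also shows that the coordinates are far from pairwise independent under $\mu$.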
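We start with the following proposition which gives an explicit expression for $\mathbf{E}_{x \sim \mu}[x_i x_j]$ when $i \ne j$; we will use it in the proof of Lemma 9.
Proposition 10. For all $1 \le i < j \le n$ we have $\mathbf{E}_{x \sim \mu}[x_i x_j] = 1 - \frac{4}{\Lambda(n)}$.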
For brevity let us write , i.e., , the -th “slice” of the hypercube. Since is supported on , we have
If or , it is clear that
and when , we have
Recall that and for This means that we have
Thus we may write as
For the latter sum, we have
For the former, we can write
Thus, we get that overall equals
as was to be shown. ∎
Proof of Lemma 9.
We begin by observing that
since . Next, we solve for and using the orthonormality conditions on the set . As and , we get that . This gives
where the penultimate equation above uses Proposition 10. Thus, we have shown that . To solve for , we note that
However, since the set is orthonormal with respect to the distribution , we get that
Now, using Proposition 10, we get
Thus, we get that
as was to be shown. ∎
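Relating the Shapley values to the Fourier coefficients. The next lemma gives a useful expression for $\hat{f}(i)$ in terms of $\tilde{f}(i)$:
Lemma 11. Let $f: \{-1,1\}^n \to [-1,1]$ be any bounded function. Then for each $i = 1, \dots, n$ we have
$$\hat{f}(i) = \frac{2\beta}{\Lambda(n)} \cdot \left( \tilde{f}(i) - \frac{f(\mathbf{1}) - f(-\mathbf{1})}{n} \right) + \frac{1}{n} \cdot \sum_{j=1}^n \hat{f}(j).$$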