New Approximations for Coalitional Manipulation in General Scoring Rules

Orgad Keller
orgad.keller@gmail.com
Department of Computer Science, Bar-Ilan University, Israel
Avinatan Hassidim
avinatan@cs.biu.ac.il
Department of Computer Science, Bar-Ilan University, Israel
Noam Hazon
noamh@ariel.ac.il
Department of Computer Science, Ariel University, Israel
Abstract

We study the problem of coalitional manipulation—where manipulators try to manipulate an election on candidates—under general scoring rules, with a focus on the Borda protocol. We do so both in the weighted and unweighted settings.

For these problems, recent approaches to approximation tried to minimize the number of manipulators needed to make the preferred candidate win (thus assuming that the number of manipulators is not limited in advance). We focus instead on minimizing the maximum score obtainable by a non-preferred candidate.

In the strongest, most general setting, we provide an algorithm for any scoring rule as described by a vector : for some , it obtains an additive approximation equal to , where is the sum of voter weights. In words, this factor is the maximum difference between two scores in that are entries away, multiplied by . The unweighted equivalent is provided as well.

For Borda, both the weighted and unweighted variants are known to be NP-hard. For the unweighted case, our simpler algorithm provides a randomized, additive approximation; in other words, if there exists a strategy enabling the preferred candidate to win by an margin, our method, with high probability, will find a strategy enabling her to win (albeit with a possibly smaller margin). It thus provides a somewhat stronger guarantee compared to the previous methods, which implicitly implied (with respect to the original ) a strategy that provides an -additive approximation to the maximum score of a non-preferred candidate: when is , our strategy thus provides a stronger approximation.

For the weighted case, our generalized algorithm provides an -additive approximation, where is the sum of voter weights. This is a clear advantage over previous methods: some of them do not generalize to the weighted case, while others—which approximate the number of manipulators—pose restrictions on the weights of extra manipulators added.

We note that our algorithms for Borda can also be viewed as a -multiplicative approximation since the values we approximate have natural (unweighted) and (weighted) lower bounds.

Our methods are novel and adapt techniques from multiprocessor scheduling by carefully rounding an exponentially-large configuration linear program that is solved by using the ellipsoid method with an efficient separation oracle. We believe that such methods could be beneficial in social choice settings as well.

1 Introduction

Elections are one of the pillars of democratic societies, and are an important part of social choice theory. In addition they have played a major role in multiagent systems, where a group of intelligent agents would like to reach a joint decision [9]. In its essence, an election consists of agents (also called voters) who need to decide on a winning candidate among candidates. In order to do so, each voter reveals a ranking of the candidates according to his preference and the winner is then decided according to some protocol.

Ideally in voting, we would like the voters to be truthful, that is, that their reported ranking of the candidates will be their true one. However, almost all voting rules are prone to manipulation: Gibbard and Satterthwaite [11, 22] show that for any reasonable preference-based voting system with at least 3 candidates, voters might benefit from reporting a ranking different from their true one in order to make sure that the candidate they prefer the most wins. Furthermore, several voters might decide to collude, forming a coalition and coordinating their votes in such a way that a specific candidate (hereafter the preferred candidate) will prevail. Such a setting is reasonable especially when the voters are agents that are operated by one party of interest.

For some time, the hope for making voting protocols immune to manipulation, at least in practice, relied on computational assumptions: for several common voting protocols, it was shown that computing a successful voting strategy for the manipulators is NP-hard [3, 6, 25, 10]. However, as is often the case, approximation algorithms and heuristics were devised in order to overcome the NP-hardness, albeit with some compromises on the quality of the resulting strategy. This paper fits within this scheme.

In this paper we focus on general scoring rules and in particular on the Borda voting rule. We first study the problem of (constructive) unweighted coalitional manipulation (UCM; the problem was called CCUM, for “constructive coalitional unweighted manipulation”, in [26]): assume that additional voters (hereafter the manipulators), all of them preferring a specific candidate , can be added to the voting system, thus forming a coalition. Also assume that all original voters (hereafter the non-manipulators) voted first (or equivalently, that the non-manipulators are truthful and that their preferences are known). Find a strategy for the manipulators telling each one of them how to vote so that wins, if such a strategy exists. We call such a strategy a -winning strategy. In the weighted variant (WCM), the manipulators are weighted; essentially this means that points awarded by a voter to a candidate are multiplied by the voter’s weight.

-WCM, for all positional scoring rules , except plurality-like rules, was shown to be NP-hard when  [6, 13, 21]. Therefore, Borda-WCM is NP-hard. Borda-UCM eluded researchers for some time; it was first conjectured and finally proven to be NP-hard [7, 4].

As a way of overcoming the hardness, recent research [26] focused on an approximation to the minimum number of manipulators needed to be added to the system in order to guarantee that the preferred candidate would win. For Borda-UCM, they showed that if there exists a -winning strategy for manipulators, then they will find a -winning strategy with at most one additional manipulator – besides the given by the problem definition. For Borda-WCM, they showed that if there exists a -winning strategy using the given weighted manipulators, they will find a -winning strategy using additional manipulators, if the sum of weights of the additional manipulators equals the maximum over the weights of the original manipulators.

This kind of approximation might seem a bit problematic. First, the ability to add a manipulator is a strong operation, perhaps too strong; for instance, for Borda-UCM, adding a manipulator adds to the difference between and its highest-scoring competitor. Second, while in some cases it might be reasonable that the party behind the manipulators can add another manipulator to the system, in many cases we do not expect this to be true. Furthermore, in the weighted case assumptions need to be made on the weight of the additional manipulators, which is also a problematic aspect. Instead, it is interesting to ask what we can assert, assuming that the number of manipulators cannot be changed, about the ability to promote a specific candidate given the non-manipulator scores of all candidates and the value (or equivalently, the length- vector of manipulator weights).

We provide a positive result of the following type:

Main Result: If there exists a manipulation strategy enabling to win by a large-enough margin, we efficiently find a successful manipulation strategy making win.

Take the unweighted case as an example: assume that we can provide, for some function , an -additive approximation to the maximum difference obtainable between ’s final score and the final score of the highest-scoring non-preferred candidate. Then, if there exists a -winning strategy such that this difference is at least , we can rest assured that the algorithm will find a -winning strategy.

This, in turn, boils down to approximating an upper bound on the score of the highest-ranked candidate who is not . Earlier research of this flavor focused only on cases where the number of candidates is bounded: for -WCM, Brelsford et al. [5] provide an FPTAS for that upper bound. (To be exact, they provide an FPTAS for the same exact value we defined, and then use it to provide another FPTAS for another value, which is their value of interest: the difference between the score of the preferred candidate and the highest-scoring non-preferred candidate when including the manipulator votes, minus the same difference when not including the manipulator votes. Notice that the upper bound we defined is the only non-trivial value in this computation.) For -UCM, if the number of candidates is bounded, the entire problem becomes easy and polynomial-time solvable [6, Proposition 1]. Compared to this line of work, we do not limit ourselves to a bounded number of candidates.

1.1 Our Results and Contributions

Consider a general positional scoring rule , as is usually described by a vector (see Section 2.1 for full definitions). Now let be the minimum possible score (ranging over all possible manipulation strategies ) of the highest scoring candidate who is not , where is the final score of a candidate (w.r.t. voting rule , election , and strategy ) and is the candidate set.

Our main technical contribution is a constructive proof of the following two theorems. Let for some constant , and let . In words, is the biggest difference between a score in and another score entries away from it.

Theorem 1.

There exists a randomized Monte Carlo algorithm for -UCM which provides a -additive approximation to with an exponentially-small failure probability.

Theorem 2.

There exists a randomized Monte Carlo algorithm for -WCM which provides a -additive approximation to with an exponentially-small failure probability, where is the sum of voter weights.

These theorems immediately pave the way to the following corollaries:

Corollary 3.

There exists a randomized Monte Carlo algorithm for Borda-UCM which provides an -additive approximation to with an exponentially-small failure probability.

Corollary 4.

There exists a randomized Monte Carlo algorithm for Borda-WCM which provides an -additive approximation to with an exponentially-small failure probability.

Taking Borda-UCM as an example, if there exists a -winning strategy enabling to win by a margin of compared to the score of the highest-scoring non-preferred candidate, our method will find a -winning strategy (albeit with a possibly smaller margin). Similar guarantees apply in the more general settings.

Notice that for Borda, such approximations can also be seen as a -multiplicative approximation on the score of the highest-scoring non-preferred candidate, and are thus superior to an FPTAS; to see that, notice that for Borda-WCM the overall ‘voting mass’ given by the manipulators is , and so the highest-scoring candidate has a score of at least . Therefore is a lower-order term (the Õ notation suppresses poly-logarithmic factors). This is an advantage over the previous methods:

  • As opposed to the heuristics in [8], we provide provable guarantees.

  • Also as opposed to the heuristics in [8], our algorithm generalizes to the weighted case.

  • Compared to the reverse algorithm of [26] for Borda-UCM, while adding only a single extra manipulator sounds like a minor operation, it is not; as mentioned, an extra manipulator implies the addition of points to the difference between and its highest-scoring competitor.

  • Consider the reverse algorithm of [26], and assume that adding extra manipulators is not allowed. We will show that their method implies no better than an -additive approximation to the score of the highest-scoring non-preferred candidate. Our approximation is thus superior when is .

  • Compared to previous methods, our results are linear-programming-based, and not greedy. Thus, they make a decision based on the entire input, as opposed to repeatedly making a decision based on a greedy estimate w.r.t. some subset of the problem.

The following claim analyzes reverse according to our metric. It is proven in Section 3.

Claim 5.

For any , when the addition of more than extra manipulators is not allowed, there are families of cases in which the optimal strategy enables to win by a margin of at least , but reverse fails to find a -winning strategy.

Our techniques are novel: they use configuration linear programs (C-LP), a method that is also used in the scheduling literature, namely for two well-studied problems: the problem of scheduling on unrelated machines [24] and the so-called Santa Claus problem [2]. See Section 1.2 for a discussion of these problems. It is important to note that the solutions to the two above problems and to ours all differ from one another with respect to how the algorithm proceeds once the C-LP result is computed.

C-LPs are used for the generation of an initial, invalid strategy, which is later modified to become valid. They are unique in the sense that they are linear programs that have an exponential number of variables, an issue which we solve by referring to the LP dual and using the ellipsoid method with a polynomially-computable separation oracle [17, 12]. We have also implemented our algorithm: as a result of not finding a library which enables solving an LP this way, we simulated this by an iterative use of a general LP-solving library, each time adding a violated constraint based on running the separation oracle externally.

1.2 Related Work

Borda.

The Borda voting mechanism was introduced by Jean-Charles de Borda in 1770. It is used, sometimes with some modifications, by parliaments in countries such as Slovenia, and competitions such as the Eurovision song contest, selecting the MVP in major league baseball, Robocup robot soccer competitions and others. The Borda voting mechanism is described as follows: every agent ranks the candidates from to , and awards the candidate ranked -th a score of . Notice that this makes the scores given by each single voter a permutation of . Finally, the winning candidate is the one with the highest aggregate score.

Easiness Results.

The computational complexity of coalitional manipulation problems was studied extensively. For general scoring rules , most earlier work considered the case where the number of candidates is bounded: Conitzer et al. [6] show that when is bounded, -UCM is solvable in polynomial time.

Even when is unbounded, Plurality-UCM and Veto-UCM are still easy using the greedy algorithm of Zuckerman et al. [26]. This also holds for -approval-UCM, which generalizes both [19].

NP-Hardness Results.

In the weighted case, the situation is different: for all positional scoring rules , except plurality-like rules, -WCM is NP-hard when  [6, 13, 21]. In particular, this holds for Borda-WCM.

However, the computational hardness of Borda-UCM still remained open for quite some time, until it was finally shown to be NP-hard as well [7, 4] in 2011, even for the case of and adding manipulators.

Approximating the number of manipulators.

Zuckerman et al. [26] present a greedy algorithm later referred to as reverse (this name was given in [7]). reverse works as follows: after the non-manipulators have finished voting, we go over the manipulators one by one, and each manipulator ranks the candidates (besides ) in the reversed order of their aggregated score so far (the candidate with the highest score so far gets the lowest ranking). As mentioned, for Borda-UCM, reverse can be seen as an additive approximation for the objective of finding the minimum number of manipulators needed.
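The greedy procedure just described can be sketched as follows for Borda-UCM. This is an illustrative sketch, not the implementation of [26]: the function name `reverse_borda`, the 0-based candidate indices, and the convention that each manipulator distributes the scores 0, 1, ..., m-1 among the m non-preferred candidates (the preferred candidate implicitly receiving the top score) are assumptions made here. Ties are broken by candidate index, which is also an assumption.

```python
def reverse_borda(sigma, k):
    """Greedy 'reverse' sketch for Borda-UCM.

    sigma: initial scores of the m non-preferred candidates (assumed input form).
    k:     number of manipulators.
    Returns (final_scores, matrix), where matrix[r][c] is the score that
    manipulator r awards to non-preferred candidate c.
    """
    scores = list(sigma)
    m = len(scores)
    matrix = []
    for _ in range(k):
        # Candidates ordered from highest to lowest current score
        # (stable sort, so ties are broken by index: an assumption).
        order = sorted(range(m), key=lambda c: -scores[c])
        row = [0] * m
        for points, c in enumerate(order):
            # The current leader receives 0 points, the next one 1, and so on.
            row[c] = points
            scores[c] += points
        matrix.append(row)
    return scores, matrix


if __name__ == "__main__":
    final_scores, matrix = reverse_borda([0, 0, 0, 0], 2)
    print(final_scores, matrix)
```

Each row of the returned matrix is a permutation of the assumed score range, so the output is a valid manipulation strategy for the non-preferred candidates under these conventions.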

For Borda-WCM, their approximation has the following flavor. Let be the set of the given weighted manipulators. If there exists a -winning strategy using , they will find a -winning strategy using additional manipulators, if the sum of weights of the additional manipulators equals , where is the weight of manipulator .

Approximating the Maximum Score of a Non-Preferred Candidate.

Returning to earlier results, when is bounded, Brelsford et al. [5, Lemma 3] provide an FPTAS with respect to the maximum score of a non-preferred candidate. As mentioned, this paves the way for an FPTAS on their value-of-interest.

Heuristics for Borda.

Davies et al. [8] present two additional heuristics: iteratively, assign the largest un-allocated score to the candidate with the largest gap (Largest Fit), or to the candidate with the largest ratio of gap divided by the number of scores yet-to-be-allocated to this candidate (Average Fit). To the best of our knowledge, these algorithms do not have a counterpart for the weighted case.

Configuration Linear Programs.

As discussed, configuration linear programs were also used in scheduling literature, for example for the following two problems which were extensively studied before:

  • In the so-called Santa Claus problem [2], Santa Claus has presents that he wishes to distribute between kids, and is the value that kid has to present . The goal is to maximize the happiness of the least happy kid: , where is the set of presents allocated to kid .

  • In the problem of scheduling on unrelated machines [24], we need to assign jobs between machines, and is the time required for machine to execute job . The goal is to minimize the makespan , where is the set of jobs assigned to machine .

Both papers researched a natural and well-researched ‘restricted assignment’ variant of the two problems where . In [2], they obtained an -multiplicative approximation to the first problem and in [24], they obtained a -multiplicative approximation to the second.

2 Preliminaries

2.1 Problem Definition

Candidate Set.

With a slight change of notation, let be a candidate set consisting of the preferred candidate and the other candidates . Note that we changed the notation so that the overall number of candidates is ; this will help streamline the writing.

Election.

An election is defined by a candidate set and a set of voters where each voter submits a ranking of the candidates according to its preference. Formally, we define , where each is a total order of the candidates. For example, is one such possible order if . Then, some decision rule is applied in order to decide on the winner(s); formally is the set of winners of the election. In the specific case of a positional scoring rule , the rule is described by a vector for which , and is polynomial in , used as follows: each voter awards to the candidate ranked -th. (Usually is defined in a descending manner, such that , and is awarded to the candidate ranked -th; our choice helps streamline the presentation.) Finally, the winning candidate is the one with the highest aggregated score. In the specific case of the Borda scoring rule, we have that .

WCM and UCM.

In the -(constructive) weighted coalitional manipulation (-WCM) problem, we are given as input:

  • A score profile vector representing the aggregated scores given so far to each candidate in by the original voters in an election under the rule . Notice that eliminates the need for obtaining as input: as we have no control on the truthful voters , is thus a sufficient representation for the outcome of non-manipulator votes.

  • A vector of positive integers representing the weights of manipulators who will be added to the election. The weights have the following meaning: each manipulator is replaced by identical but unweighted copies of himself.

It should then be determined whether, once the manipulators are added, either (a) no strategy under exists in which wins, or (b) there exists a voting strategy under such that can win; in the latter case, the algorithm should find it. -(constructive) unweighted coalitional manipulation (-UCM) is the specific case where is the all-ones vector and therefore can be replaced in the input by the integer .

Manipulation Matrices.

Note that in case (b), the output is a voting strategy which can be represented as a matrix in which the entry describes the score given by manipulator to candidate , and where each row of is a permutation of . Such a representation is also called a manipulation matrix. We can relax the requirement that each row of is a permutation, and replace it by the requirement that each score-type , is repeated exactly times in . Such a matrix is called a relaxed manipulation matrix. We can perform this relaxation as Davies et al. [8, Theorem 7] show that each relaxed manipulation matrix can be rearranged to become a valid manipulation matrix while preserving each candidate’s final score.
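The difference between the two matrix notions can be made concrete with a small checker. This is a minimal sketch under assumptions: the function names, the list-of-rows representation (one row per manipulator, one column per candidate), and the explicit `scores` argument listing the score types are all choices made here for illustration.

```python
from collections import Counter


def is_manipulation_matrix(matrix, scores):
    """True if every row is a permutation of the given score types."""
    target = sorted(scores)
    return all(sorted(row) == target for row in matrix)


def is_relaxed_manipulation_matrix(matrix, scores):
    """True if each score type appears exactly k times overall,
    where k is the number of rows (manipulators)."""
    k = len(matrix)
    if any(len(row) != len(scores) for row in matrix):
        return False
    expected = Counter()
    for s in scores:
        expected[s] += k  # a repeated score value counts once per occurrence
    observed = Counter(entry for row in matrix for entry in row)
    return observed == expected


if __name__ == "__main__":
    scores = [0, 1, 2]
    relaxed = [[0, 0, 1], [2, 2, 1]]  # rows are not permutations
    valid = [[0, 2, 1], [2, 0, 1]]    # same column sums, rows are permutations
    print(is_relaxed_manipulation_matrix(relaxed, scores),
          is_manipulation_matrix(relaxed, scores),
          is_manipulation_matrix(valid, scores))
```

In the usage example, `relaxed` and `valid` have identical column sums, which is the kind of rearrangement that the cited result of Davies et al. guarantees; the sketch only checks the two definitions and does not perform the rearrangement.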

High Probability.

Throughout the paper, when we use the term ‘with high probability’, we mean an arbitrarily-chosen polynomially-small failure probability, i.e., success probability of the form where is a constant that can be chosen without affecting the asymptotic running time. ‘Failure’ refers to the event that the algorithm does not provide the desired approximation guarantee.

In this paper we will use various forms of the Hoeffding inequalities, which are variants of the Chernoff inequalities:

Generalized Hoeffding inequality [14, Theorem 2].

Let be independent random variables where each is bounded by the interval respectively. Let and . Then

By redefining the above inequality in terms of the sum (instead of the mean ) and defining we derive the following equivalent formulation:

Specifically, when , we obtain the following ‘classic’ Hoeffding inequality, which is equivalent to [14, Theorem 1]:
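For reference, the standard one-sided (upper-tail) statements of these bounds are sketched below. The symbol names ($X_i$, $a_i$, $b_i$, $n$, $t$, $\bar{X}$, $S$) and the reading of the special case as "every $X_i$ lies in $[0,1]$" are assumptions made here and may differ from the notation used elsewhere in the paper; the same bounds hold symmetrically for the lower tail.

```latex
% Setting (assumed notation): X_1, ..., X_n independent, X_i \in [a_i, b_i],
% \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i and S = \sum_{i=1}^{n} X_i.

% Mean form:
\Pr\left[\bar{X} - \mathbb{E}[\bar{X}] \ge t\right]
  \le \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right)

% Sum form (substitute t \mapsto t/n in the mean form):
\Pr\left[S - \mathbb{E}[S] \ge t\right]
  \le \exp\left(-\frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right)

% Special case a_i = 0, b_i = 1 for all i:
\Pr\left[\bar{X} - \mathbb{E}[\bar{X}] \ge t\right] \le \exp\left(-2 n t^2\right)
```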

2.2 Reduction to a Pure Min-Max Problem

Since we know what will be the final score of (non-manipulator votes are known and each manipulator will give the maximum score possible), we can effectively discard and treat the problem as a minimization problem on the final scores of only. In other words, we focus on finding , where is ’s final score. Thus the output is actually a relaxed manipulation matrix.

Another thing to note is that we cannot assume anything about the values in the initial score profile ; this follows from [8, Lemma 1], where it is shown that in the context of Borda, for any given non-negative integer vector , we can define a set of non-manipulators (along with their preferences) and an additional candidate that will induce an initial score profile for some values and . Since such an additive translation and the addition of a candidate that will be awarded less than any other candidate have no influence on the winner (and on the difference between each two candidates’ final scores), and since our results are concerned with an additive approximation, we should assume no prior limitation on the nature of values in .

3 Lower Bound for REVERSE

We start by showing that there are cases in which the reverse algorithm for Borda gives only an -additive approximation to the minimum final score of the highest-scoring non-preferred candidate.

Proof of Claim 5.

We provide an infinite family of cases where the claim holds.

Let and let for some integer . Consider the case where after the non-manipulators voted, all candidates (but ) have the same score for all . Effectively this can be normalized to .

By the reverse algorithm, the first manipulator can award with respectively, after which the second manipulator will be obliged to award with respectively. Repeat this process with the rest of the manipulators, until the final one. It can be verified that will end up with the maximal score of .

Conversely, as an upper bound for an optimal solution, consider the following strategy: place all scores to be given in a descending sequence, that is the sequence . Give the first scores in the sequence to respectively, the next to respectively, and the last to respectively. Since every score-type has copies, we have just described a relaxed manipulation matrix and therefore by Davies et al. [8, Theorem 7] it can be rearranged to become a valid manipulation matrix without changing the final score of each candidate. Now notice that the score given to any candidate is of the form for some . As this is at most (when ), the difference is thus . ∎

4 Linear Programming for UCM

We will begin by providing a “natural” way to formulate the min-max version of the problem as an Integer Program (IP). As solving IPs is NP-hard, we will relax it to the corresponding Linear Program (LP). However, such a natural LP will not be useful in our setting, and we will thus introduce a totally different LP formulation, called Configuration Linear Programming (C-LP). The number of variables in the C-LP is exponential in the size of the input. Nevertheless, we show that our C-LP can be solved in polynomial time.

Let and . We define the variables for , and the variable , with the intent that will equal the number of times candidate received a score of , and will serve as the upper-bound on each candidate final aggregate score. The IP can then be stated as follows:

subject to:

(1)
(2)
(3)
(4)

where (1) guarantees that every score was awarded times, (2) guarantees that every candidate was given scores, and (3) guarantees that every candidate gets at most points.

It should be noted that when treating the problem as a min-max problem, we need to take as a variable that we wish to minimize (this is done by the objective function). However, if we consider the original definition in which our aim is to make the preferred candidate win, can be set to (the final score of the preferred candidate), and the IP will not have an objective function.
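One way to write down an integer program matching the verbal description of constraints (1) to (4) is sketched below. The symbol names are assumptions chosen here: $x_{i,j}$ counts how many times candidate $c_i$ receives score type $j$, $\alpha_j$ is the value of score type $j$, $\sigma_i$ is the initial score of $c_i$, $k$ is the number of manipulators, $m$ is the number of non-preferred candidates (and of score types), and $T$ is the bound on final scores.

```latex
\begin{align*}
\min\quad & T \\
\text{subject to:}\quad
& \sum_{i=1}^{m} x_{i,j} = k
    && \forall j \in \{1,\dots,m\} \tag{1}\\
& \sum_{j=1}^{m} x_{i,j} = k
    && \forall i \in \{1,\dots,m\} \tag{2}\\
& \sigma_i + \sum_{j=1}^{m} \alpha_j \, x_{i,j} \le T
    && \forall i \in \{1,\dots,m\} \tag{3}\\
& x_{i,j} \in \{0,1,\dots,k\}
    && \forall i, j \tag{4}
\end{align*}
```

Under this reading, the LP relaxation replaces the discrete set in (4) by the interval $[0,k]$.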

4.1 Integrality Gap of the Natural LP

While we can relax this IP into an LP by replacing the set in the last constraint to be the continuous interval , it will not be as helpful. We will shortly show why; however note that as this sub-section relates to the deficiencies of the original LP formulation, it can be safely skipped and is not needed for the full understanding of our algorithms.

Consider a “pure” LP rounding algorithm, applied w.l.o.g. to a minimization problem. Such an algorithm works as follows: given an instance of a problem, it solves its associated relaxed LP (the problem’s natural IP where integrality constraints are replaced with their continuous counterparts) and then rounds the resulting solution in some way such that a valid (not necessarily optimal) solution to the original IP is obtained. The approximation analysis of such algorithms is based on reasoning about how much worse the objective value of the rounded solution is compared to the fractional one; in other words, what is the increase (the “damage done”) to the optimum objective incurred by the rounding process. Since the fractional optimum of a relaxed LP is a lower bound on the integral optimum, i.e., the optimum of the original problem, the same factor also upper-bounds the difference between the objective value of the rounded solution and that of the integral optimum. Thus this process derives an approximation guarantee. We show that in our case, the increase can be , by showing that there are cases in which the difference between the integral and fractional optimum objective values is , and thus an algorithm solely based on the rounding procedure cannot hope for an additive approximation. This kind of reasoning is known as an integrality gap, and is demonstrated by the following:

Lemma 6.

For Borda-UCM, an algorithm based solely on rounding the relaxed natural LP cannot obtain additive approximation.

Proof.

We show a lower bound on the approximation ratio in the form of an additive integrality gap. In other words, we show an infinite family of instances where the integral solution to the LP (and thus, to the original problem) gives worse objective value when compared to the fractional solution.

Consider the simple case of candidates, all having equal initial score (w.l.o.g.  for all ) and a single manipulator. When solving the problem, one candidate will be awarded and thus will have final score of . However, in the fractional solution, the optimum is obtained by splitting each score equally, that is, setting for every and . Now every candidate obtained a final score of . Therefore notice that the gap between the objective of the integral and fractional solutions is . ∎
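The arithmetic of this instance can be checked with a few lines of code. The sketch below assumes that the single manipulator distributes the Borda scores 0, 1, ..., m-1 among m candidates with equal initial scores; the function name and this score range are assumptions made here for illustration.

```python
from fractions import Fraction


def integrality_gap_single_manipulator(m):
    """Integral vs. fractional optimum for one manipulator and m tied candidates.

    Assumes the scores to distribute are 0, 1, ..., m-1 (an assumption).
    Returns (integral_optimum, fractional_optimum, gap).
    """
    scores = range(m)
    # Integrally, some candidate must receive the largest score.
    integral_opt = max(scores)
    # Fractionally, every score can be split equally among all candidates,
    # so each candidate ends up with the average score.
    fractional_opt = Fraction(sum(scores), m)
    return integral_opt, fractional_opt, integral_opt - fractional_opt


if __name__ == "__main__":
    for m in (4, 5, 100):
        print(m, integrality_gap_single_manipulator(m))
```

Under these assumptions the gap equals (m-1)/2, i.e., it grows linearly with the number of candidates.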

4.2 Introducing Configuration LPs

In order to work around this we will have to resort to a totally different approach, in which variables no longer represent score types, and instead represent the set of scores (configuration) that can be awarded to a candidate.

Formally, a configuration for some candidate is a vector of dimension in which represents a number of scores of type that has received, and for which , that is, the overall number of scores awarded is . For a candidate and a bound , let be the set of configurations that do not cause the candidate overall score to surpass , i.e., the set of configurations for which .
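To make the notion of a configuration concrete, the following sketch enumerates, by brute force, all configurations that keep a candidate within a given bound. The names `configurations`, `alpha` (the score values, one per score type), `sigma_i` (the candidate's initial score) and `T` (the bound) are assumptions chosen here. The enumeration is exponential in general and is meant only as an illustration of the definition, not as part of the algorithm.

```python
from itertools import combinations_with_replacement


def configurations(alpha, k, sigma_i, T):
    """All count vectors c with sum(c) == k and sigma_i + sum(c[j]*alpha[j]) <= T."""
    m = len(alpha)
    result = []
    # Each multiset of k score types corresponds to exactly one count vector.
    for multiset in combinations_with_replacement(range(m), k):
        if sigma_i + sum(alpha[j] for j in multiset) <= T:
            config = [0] * m
            for j in multiset:
                config[j] += 1
            result.append(tuple(config))
    return result


if __name__ == "__main__":
    # Two manipulators, score values 0, 1, 2, initial score 0, bound 2.
    print(configurations([0, 1, 2], 2, 0, 2))
```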

We formulate the configuration LP as follows:

(5)
(6)
(7)

where we wish that the ’s would serve as indicator variables indicating whether or not was awarded with configuration , (5) guarantees that every candidate was given at most configuration and (6) guarantees that every score was awarded at least times. The choice of inequalities over equalities will be explained soon.
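A configuration LP matching the verbal description of (5) to (7) can be sketched as follows. The notation is an assumption made here: $x_{i,c}$ is the variable for candidate $c_i$ and configuration $c$, $C_i(T)$ is the set of configurations that keep $c_i$ within the bound $T$, $c_j$ is the number of scores of type $j$ in configuration $c$, $k$ is the number of manipulators, and $m$ is the number of non-preferred candidates and of score types.

```latex
\begin{align*}
& \sum_{c \in C_i(T)} x_{i,c} \le 1
    && \forall i \in \{1,\dots,m\} \tag{5}\\
& \sum_{i=1}^{m} \sum_{c \in C_i(T)} c_j \, x_{i,c} \ge k
    && \forall j \in \{1,\dots,m\} \tag{6}\\
& x_{i,c} \ge 0
    && \forall i \in \{1,\dots,m\},\ c \in C_i(T) \tag{7}
\end{align*}
```

In this form the program has no objective function; it only asks whether a feasible point exists for the given $T$.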

Example 1.

Consider the case where , , (i.e., Borda) and . We are omitting the non-manipulator votes that provided , however recall that there is a possible non-manipulator voting yielding any up to an additive factor and an addition of a candidate. Now assume (this is indeed the optimum). Let us focus on the last candidate . should therefore contain all configurations which award at most points. Those configurations are ( points), ( point), , ( points), and ( points).

When solving the , only two of her configurations will get non-zero value: and . We omit the variables corresponding to the rest of the candidates.

After solving the LP, we will execute a rounding procedure that will transform the fractional LP solution into a valid solution for the original problem. This procedure can increase the score of some of the candidates, and hence we wish to start with the smallest possible (so that even after the increase the final score will hopefully be bounded by ).

To find the smallest possible , we perform a one-sided binary search on the value of . For this purpose, for each possible value of that we come across during the binary search, we redefine the LP and then solve the new LP from scratch, and see if it has a valid solution. The reason we do not add as a variable in an objective function (instead of the binary search) is that the number of summands in Equations (5,6) depends on .
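The search over the bound can be sketched as below. This is a sketch under assumptions: `feasible` is a hypothetical callback standing in for "build the C-LP for this bound and report whether it has a solution", feasibility is assumed to be monotone in the bound, some feasible bound is assumed to exist, and "one-sided binary search" is read here as a doubling phase followed by an ordinary binary search.

```python
def smallest_feasible(feasible, lo):
    """Smallest integer T >= lo with feasible(T) True.

    Assumes feasible is monotone (False ... False True ... True) and that
    some feasible value exists; otherwise the doubling loop never ends.
    """
    if feasible(lo):
        return lo
    # Doubling phase: grow the step until a feasible bound is found.
    step = 1
    while not feasible(lo + step):
        step *= 2
    low = lo + step // 2   # known to be infeasible
    high = lo + step       # known to be feasible
    # Ordinary binary search between an infeasible and a feasible bound.
    while high - low > 1:
        mid = (low + high) // 2
        if feasible(mid):
            high = mid
        else:
            low = mid
    return high


if __name__ == "__main__":
    # Toy oracle: bounds of at least 37 are feasible.
    print(smallest_feasible(lambda T: T >= 37, 0))
```

The number of oracle calls is logarithmic in the distance between the starting point and the answer, which matters here because each call rebuilds and solves an LP.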

This formulation has the obvious drawback that the number of variables is exponential in . However, following the approach of [2], if we find a polynomially-computable separation oracle we can solve the LP by referring to the LP dual and using the ellipsoid method. Such an oracle will require a solution to the following seemingly unrelated problem as a subroutine: a variant of the classic Knapsack problem.

4.3 -Multiset Knapsack

Let be a set of distinct items, where each item has an associated value and a weight . We also obtain a weight upper-bound and a value lower-bound . As opposed to ordinary knapsack, we also obtain an integer . We are required to find a multiset of exactly items (i.e., we can repeat items from the item-set), such that ’s overall weight is at most and ’s overall value is greater than (or to return that no such multiset exists).

Lemma 7.

The -multiset knapsack can be solved in time polynomial in , , and (which is pseudo-polynomial due to the dependence on ).

Proof.

We fill out a table , for and , in which is the highest value obtainable with a size- multiset of items of aggregate-weight at most . Notice that can be filled using the following recursion:

(8)

where if it is defined, i.e., and , and otherwise is .

Therefore can be filled out using dynamic programming. Finally, the entry contains the highest value obtainable with overall weight at most . Therefore, if , we have found a required multiset; otherwise such does not exist. The resulting multiset itself can be recovered using backtracking on the table . Notice that the amount of work done is . ∎
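The dynamic program from the proof can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name, the parameter names and the auxiliary `choice` table used for backtracking are hypothetical, and weights are assumed to be non-negative integers (values may be arbitrary numbers).

```python
from typing import List, Optional, Sequence

NEG_INF = float("-inf")


def multiset_knapsack(values: Sequence[float], weights: Sequence[int], k: int,
                      weight_cap: int, value_floor: float) -> Optional[List[int]]:
    """Return item indices of a size-k multiset with total weight <= weight_cap
    and total value > value_floor, or None if no such multiset exists."""
    n = len(values)
    # table[i][w]: best value of a size-i multiset of total weight at most w.
    table = [[NEG_INF] * (weight_cap + 1) for _ in range(k + 1)]
    # choice[i][w]: the item added last in an optimal solution for table[i][w].
    choice = [[-1] * (weight_cap + 1) for _ in range(k + 1)]
    for w in range(weight_cap + 1):
        table[0][w] = 0             # the empty multiset has value 0

    for i in range(1, k + 1):
        for w in range(weight_cap + 1):
            for j in range(n):
                if weights[j] > w or table[i - 1][w - weights[j]] == NEG_INF:
                    continue        # undefined entry, treated as minus infinity
                candidate = table[i - 1][w - weights[j]] + values[j]
                if candidate > table[i][w]:
                    table[i][w] = candidate
                    choice[i][w] = j

    if table[k][weight_cap] <= value_floor:
        return None                 # the value must be strictly greater

    # Backtrack through the choice table to recover the multiset.
    result, w = [], weight_cap
    for i in range(k, 0, -1):
        j = choice[i][w]
        result.append(j)
        w -= weights[j]
    return result
```

The three nested loops perform a number of table updates proportional to the product of the multiset size, the weight bound and the number of items, which is where the pseudo-polynomial dependence on the weight bound comes from.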

4.4 Solving the UCM C-LP

We return to our problem. The choice of inequalities over equalities is motivated by our use of the LP dual in Theorem 9. However, they have the same effect as equalities, as shown by the following lemma:

Lemma 8.

In a solution to the above C-LP, Equations (5,6) will actually be equalities.

Proof.

Notice that by Equation (6):

(9)
(10)
(11)
(12)

where (12) holds by plugging (5) into (11). We therefore obtain that

which forces both above non-trivial LP inequalities to be equalities. ∎

Theorem 9.

Given a fixed value , the UCM C-LP can be solved in polynomial time.

Proof.

We need to refer to the LP dual for our C-LP in order to solve it. We briefly repeat some LP duality concepts here; refer to [23] for complete definitions and discussion.

The dual of a maximization problem is a minimization problem. In order to define it we can treat our primal program as a maximization problem having all coefficients in its objective function. In the dual there is a variable for every constraint of the primal, and a constraint for every variable of the primal. Therefore, we define a variable for each candidate and a variable for each score-type (since the primal has a constraint for each candidate and for each score-type ). However, since our primal has an exponential number of variables, the dual will have an exponential number of constraints. We will show how to address this.

In short, the non-trivial constraints are then obtained by transposing the constraint-coefficient matrix of the primal, using the primal objective function coefficients as the right-hand side of the dual constraints, and using the right-hand side of the primal constraints as the coefficients of the dual objective function.

The process yields the following dual:

subject to:

As mentioned, the dual has an exponential number of constraints. However, it is solvable: the ellipsoid method [17] is a method for solving an LP which iteratively tries to find a point inside the feasible region described by the constraints. We do not need to provide all the constraints in advance. Instead, the algorithm can be provided with a subroutine, called a separation oracle, which it calls with a proposed point; the subroutine then either confirms that the point is inside the feasible region or returns a violated constraint [12]. The ellipsoid method performs a polynomial number of iterations; therefore, if the separation oracle runs in polynomial time as well, the LP is solved in overall polynomial time. Notice that the polynomial number of iterations performed by the ellipsoid method implies that the number of constraints that played a part in finding the optimum (known as active constraints) was polynomial as well. In other words, we could effectively discard all the constraints but a polynomial number of them.

As discussed, a separation oracle for the dual, given a proposed solution , needs to find in polynomial time a violated constraint, if one exists. It remains to show that such a separation oracle is polynomial-time computable.

Observe that a violated constraint to this program is a pair for which (and therefore ) and at the same time . Fortunately, for a specified , finding a configuration that induces a violated constraint can be seen as finding a -multiset (since ) given by a solution to our knapsack variant: is the item set (over which ranges), the value for the item is , while its weight is . The given value lower bound is , and is the given upper bound on the weight. Effectively, we use the possibly-tighter weight bound instead, as bounds the overall weight obtainable with a size- multiset. As now the weight bound is polynomial in and , the solution to our knapsack variant becomes polynomial.

We repeat this knapsack-solving step for each until we find a violated constraint, or conclude that no constraint is violated. Once we have solved the dual using the ellipsoid method with the separation oracle, we can discard all variables in the primal that do not correspond to violated constraints of the dual, since the inclusion of those constraints (resp. their corresponding variables) did not have any effect on the dual optimum (resp. the primal optimum). (Footnote 6: In other words, the dual of the dual without the discarded constraints is the primal without their corresponding variables. Another way to explain this is that this is exactly the complementary slackness condition of the Karush-Kuhn-Tucker conditions [16, 18], a necessary condition for obtaining the optimum.) The primal now contains only a polynomial number of variables and can be solved directly using the ellipsoid method or any other known polynomial-time LP solver, such as [15]. ∎

5 Algorithm for UCM

Solve the above mentioned configuration-LP formulation as described in Section 4. As mentioned, while both constraints are inequalities, in any solution they will actually be equalities. For each candidate , observe the variables , . Since , treat the ’s as a distribution over the configurations for and randomly choose one according to that distribution. For the time being, give this configuration.

While every candidate now has a valid configuration (and her score does not exceed ), it is possible that the number of scores of a certain type is above or below . Formally, if candidate received a configuration , let the array such that be the histogram of the scores. It is then possible that . If we translated the configuration given to each candidate into the list of the scores awarded within it, and wrote this list as the column of a matrix, this matrix might not be a relaxed manipulation matrix. To solve this, we need to replace some of the scores in this matrix with others such that the number of scores of each type will be . On the other hand, we need to make sure this process does not add much to the score of each candidate.

Let be a tuple representing the event that candidate received a score of in its configuration. Place all such tuples in a single multiset (if is awarded to more than once, repeat as needed). Now sort this multiset according to the value in a non-decreasing manner (breaking ties between candidates arbitrarily), thus creating the event-sequence , i.e., the tuples are now indexed by their rank in this sequence. We now start the actual fixing of the scores given: for each tuple having rank in the list, we change the score awarded to (as described by the tuple) from to . To perform this change in the algorithm, it is enough to set followed by setting . This is correct as , for any , represents the number of scores awarded to .

Notice that now every score is repeated times (there are only possible values mapping to the same value). Finally, the corrected configurations represent the final strategy. This can be easily represented as a relaxed configuration matrix by referring to the matrix , where is a column constructed by taking the configuration , represented as an ordered-multiset of scores (each repeats times), in some arbitrary order.

The entire process is summarized as Algorithm 1.

Solve the C-LP as described in Section 4
foreach  do
      define distribution s.t.  for all and randomly choose .
   /* is the empty list */
foreach ,  do
      Append copies of to   /* represents the score type */
Sort in an ascending order by   /* if */
Re-index such that
for  do
      Observe tuple
      /* Assign the score to , instead of the previous : */
return the relaxed manipulation matrix corresponding to
Algorithm 1 Approximation algorithm.
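The rounding and correction steps can be sketched in Python as follows. Several details are assumptions, since they are not fixed by the text above: a configuration is stored as a histogram over score types indexed from 0, `k` denotes the number of scores in each configuration (and the number of times each score type should finally appear), and the tuple of 0-based rank r in the sorted sequence is reassigned to score type r // k. All function and parameter names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# One candidate's fractional LP solution: (probability, histogram) pairs,
# where histogram[j] counts how often score type j appears.
Distribution = Sequence[Tuple[float, Sequence[int]]]


def round_and_fix(distributions: Sequence[Distribution], k: int,
                  rng: Optional[random.Random] = None) -> List[List[int]]:
    """Sample one configuration per candidate, then correct the histograms so
    that every score type is used exactly k times overall (sketch only)."""
    rng = rng or random.Random()

    # Step 1: randomized rounding, one configuration per candidate.
    chosen = []
    for dist in distributions:
        probs = [p for p, _ in dist]
        configs = [hist for _, hist in dist]
        chosen.append(list(rng.choices(configs, weights=probs, k=1)[0]))

    # Step 2: one event (score_type, candidate) per awarded score.
    events = [(j, c)
              for c, hist in enumerate(chosen)
              for j, count in enumerate(hist)
              for _ in range(count)]
    events.sort(key=lambda e: e[0])     # stable sort; ties stay arbitrary

    # Step 3: the event of rank r now receives score type r // k.
    for rank, (j, c) in enumerate(events):
        new_j = rank // k
        chosen[c][j] -= 1
        chosen[c][new_j] += 1
    return chosen
```

With as many score types as candidates and k scores per configuration, there are k events per score type after the correction, and each candidate still holds exactly k scores.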

Let for some constant . Let . In words, is the biggest difference between a score in and another score entries away from it.

Lemma 10.

Let be a configuration obtained for some candidate by the rounding process, and let be its corrected version given by the process described above. Then with arbitrarily chosen polynomially small failure probability,

.

Proof.

Let be the histogram of the original configurations , and let the array be the array of histogram partial sums, i.e., . In a similar manner, define to be the partial sums array w.r.t. each candidate . We will show that with high probability, .

Fix a specific . Notice that

according to the LP constraints, and that , that is, is a random variable which is the sum of the independent random variables for . In addition, for every candidate , it holds that , as a configuration contains at most scores. Therefore, using the generalized Hoeffding inequality [14, Theorem 2]:

Setting , for some arbitrary constant , we get that , that is, the probability that we deviate from by more than can be made arbitrarily polynomially small. Using the union bound, the same can be made to hold for all simultaneously.

Now observe a tuple before being possibly corrected by the algorithm. Since its rank in the sorted sequence is at most the number of scores whose type is at most , which is by definition , we get that , where the second inequality holds with high probability. Therefore by the algorithm changing the score to , the score increases by at most .

Now observe some candidate with a given configuration corrected to become a configuration by the algorithm. Since in the worst case all of ’s scores were affected as such, her overall score has increased by at most . ∎

Corollary 11.

The above algorithm provides an additive approximation with high probability. By repeating the randomized rounding procedure a linear number of times, the failure probability becomes exponentially-small. The overall running time is polynomial.

Proof.

Let be the optimal value for the original problem, and let be the best bound obtainable via the above C-LP combined with the binary search on . Notice that , as the optimal solution is also a valid solution for the C-LP. Now observe the highest scoring candidate in the C-LP. When the algorithm terminates, we get that with high probability her score is . If we repeat the randomized rounding procedure a linear number of times and pick the iteration yielding the minimum addition to , the probability of not getting a -approximation becomes exponentially small.

As the additional score given by the algorithm to any other candidate is also , the bound holds for all candidates. We conclude that this is indeed an additive approximation.

As discussed, solving the C-LP is done in polynomial time (by the polynomial number of iterations of the ellipsoid method and the polynomial runtime of the -multiset knapsack separation oracle). The rounding is dominated by going over a polynomial number of non-zero variables of the C-LP and is therefore polynomial as well. It is repeated a linear number of times in order to provide an exponentially small failure probability. ∎

From here we can directly obtain Corollary 3:

Proof of Corollary 3.

By noticing that for Borda, . ∎

6 Linear Programming for WCM

When turning to the WCM problem, the ‘natural’ LP still suffers from the deficiencies described in Section 4. We again resort to using configurations. However, configurations will now be defined in a different manner, since now, when each voter has an associated weight, voters are not identical anymore and therefore our configurations need to capture the identity of the voters.

A configuration for some candidate is now defined as a length- sequence in which if the voter awarded to . For a candidate and a bound , is again the set of configurations that do not cause the candidate's overall score to surpass , which this time is formally .
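To make the weighted notion of a configuration concrete, here is a small Python sketch of the membership test for the set of allowed configurations. It rests on assumptions that are not spelled out in the surviving text: a configuration is a list whose entry for each voter is the index of the score that voter awards, a voter's contribution is her weight times the awarded score, and the candidate carries a hypothetical `initial_score` from the non-manipulators. All names are illustrative.

```python
from typing import Sequence


def weighted_score(config: Sequence[int], voter_weights: Sequence[int],
                   alpha: Sequence[int]) -> int:
    """Total score a configuration adds: each voter contributes her weight
    times the score she awards (config[v] indexes into alpha)."""
    return sum(w * alpha[j] for w, j in zip(voter_weights, config))


def within_bound(config: Sequence[int], voter_weights: Sequence[int],
                 alpha: Sequence[int], initial_score: int, bound: int) -> bool:
    """True if the candidate's overall score does not surpass the bound.
    The initial_score term is an assumption of this sketch."""
    return initial_score + weighted_score(config, voter_weights, alpha) <= bound
```

For instance, with voter weights 3 and 1 and Borda scores (0, 1, 2), the configuration in which the first voter awards the top score and the second awards the bottom score adds 3·2 + 1·0 = 6 to the candidate's score.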

The configuration LP is now formulated as follows:

(13)
(14)
(15)

Again, we wish that the ’s would serve as indicator variables indicating whether or not was awarded configuration . Equation (13) guarantees that every candidate was given at most configuration, and (14) guarantees that every score was awarded by every voter at least once. The choice of inequalities over equalities will be explained soon.

We present another, much more complex, Knapsack variant, which will be used later by the separation oracle needed for solving the C-LP.

6.1 -Sequence Knapsack

Let be a set of distinct items. In the -sequence knapsack problem we are required to construct a length- sequence of items; we can repeat items from the item-set, however, we are subject to some additional constraints as will be specified immediately. The input to the problem is the following:

  • A value , for every where is the value obtained by placing item at location in the sequence.

  • A cost for each item , and a penalty for each location . Placing an item at location in the sequence has a penalized-cost , i.e., it depends on both the item’s cost and the penalty for location .

  • A value lower-bound .

  • A penalized-cost upper-bound .

The resulting sequence should abide by the following constraints:

  • ’s overall value is greater than .

  • ’s overall penalized-cost is at most .

If such a sequence exists, we should return it; otherwise we return that no such sequence exists.

Lemma 12.

The -sequence knapsack can be solved in time polynomial in ,, and (which is pseudo-polynomial due to the dependence on ).

Proof.

Similar to the proof of Lemma 7, we fill out a table , for and , in which is the highest value obtainable with a length- sequence of items of penalized-cost at most . This time is filled using a different recursion:

(16)

where if it is defined, i.e., and , and otherwise is .

Therefore can be filled out using dynamic programming. Finally, the entry contains the highest value obtainable with overall cost at most . Therefore, if , we have found a required sequence; otherwise such does not exist. The resulting sequence itself can be recovered using backtracking on the table . Notice that the amount of work done is . ∎
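The sequence variant of the dynamic program can be sketched in Python in the same spirit. The sketch assumes that the penalized-cost of placing an item at a location is the product of the item's cost and the location's penalty (the exact expression is not legible above), and that costs and penalties are non-negative integers. The function name, parameter names and the `choice` table are hypothetical.

```python
from typing import List, Optional, Sequence

NEG_INF = float("-inf")


def sequence_knapsack(values: Sequence[Sequence[float]], costs: Sequence[int],
                      penalties: Sequence[int], cost_cap: int,
                      value_floor: float) -> Optional[List[int]]:
    """values[j][i] is the value of placing item j at location i.
    Return one item index per location such that the total penalized-cost is
    at most cost_cap and the total value exceeds value_floor, else None."""
    n, k = len(costs), len(penalties)
    # table[i][b]: best value for the first i locations with cost at most b.
    table = [[NEG_INF] * (cost_cap + 1) for _ in range(k + 1)]
    choice = [[-1] * (cost_cap + 1) for _ in range(k + 1)]
    for b in range(cost_cap + 1):
        table[0][b] = 0             # the empty sequence has value 0

    for i in range(1, k + 1):
        for b in range(cost_cap + 1):
            for j in range(n):
                c = costs[j] * penalties[i - 1]     # assumed penalized-cost
                if c > b or table[i - 1][b - c] == NEG_INF:
                    continue
                candidate = table[i - 1][b - c] + values[j][i - 1]
                if candidate > table[i][b]:
                    table[i][b] = candidate
                    choice[i][b] = j

    if table[k][cost_cap] <= value_floor:
        return None

    # Backtrack from the last location to the first, then reverse.
    result, b = [], cost_cap
    for i in range(k, 0, -1):
        j = choice[i][b]
        result.append(j)
        b -= costs[j] * penalties[i - 1]
    result.reverse()
    return result
```

Unlike the multiset variant, the order of the returned items matters here, because both the value and the penalized-cost of an item depend on the location it occupies.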

6.2 Solving the WCM C-LP

We return to our problem. Again, the choice of inequalities over equalities is motivated by our use of the LP dual in Theorem 14. However, they have the same effect as equalities, as shown by the following lemma:

Lemma 13.

In a solution to the above C-LP, Equations (13,14) will actually be equalities.

Proof.

Notice that by Equation (14):

(17)
(18)
(19)
(20)
(21)

where (21) holds by plugging (13) into (20). We therefore obtain that

which forces both above non-trivial LP inequalities to be equalities. ∎

Theorem 14.

Given a fixed value , the WCM C-LP can be solved in polynomial time.

Proof.

We again refer to the LP-dual, which this time is:

subject to:

Furthermore, the above single non-trivial constraint can be more conveniently re-written as

We again turn to the ellipsoid method with a separation oracle; this time, a violated constraint to this program is a pair for which (and therefore ) and at the same time . For a specified , finding a configuration that induces a violated constraint can be seen as finding a -sequence given by a solution to our knapsack variant: is the item set (over which ranges), the value for placing item at location is , item ’s cost is , and the penalty for location is . The given value lower bound is , and is the given upper bound on the penalized cost. Effectively, we use the possibly-tighter weight bound instead, as bounds the overall cost obtainable with a length- sequence. As now the weight bound is polynomial in and , the solution to our knapsack variant becomes polynomial.

We repeat this knapsack-solving step for each until we find a violated constraint, or conclude that no constraint is violated. Once we have solved the dual using the ellipsoid method with the separation oracle, we continue in a similar fashion to the proof of Theorem 9. ∎

7 Algorithm for WCM

Solve the above mentioned configuration-LP formulation as described in Section 6. As both constraints will be equalities, for each candidate , we treat the ’s as a distribution over the configurations for since , and randomly choose one according to that distribution. For the time being, give this configuration.

As for UCM, every candidate now has a valid configuration but constraints may still be violated; it is possible that the number of scores of a certain type given by a specific voter is not exactly . Formally, fix a specific voter ; we let the array such that be the histogram of the scores with respect to . It is then possible that . Our goal, as before, is to fix this without introducing too much of an addition to the candidates’ overall scores. However, there is now some added complexity due to the necessity to preserve the identity of the voter when fixing a specific score given.

Let be a tuple representing the event that candidate received a score of from voter in its configuration. Fix a manipulator , and place all tuples having as their respective voter in a set , that is . Sort according to each tuple ’s score-index , and let be the resulting list. Notice that any tuple in represents the event that currently . Now change