# Group Fairness in Multiwinner Voting

L. Elisa Celis, Lingxiao Huang, and Nisheeth K. Vishnoi
EPFL, Switzerland
first.last@epfl.ch
###### Abstract

We study multiwinner voting problems when there is an additional requirement that the selected committee should be fair with respect to attributes such as gender, ethnicity, or political party. Every setting of an attribute gives rise to a group, and the goal is to ensure that each group is neither over- nor under-represented in the selected committee. Prior work has largely focused on designing specialized score functions that lead to a precise level of representation with respect to disjoint attributes (e.g., only political affiliation). Here we propose a general algorithmic framework that allows the use of any score function and can guarantee flexible notions of fairness with respect to multiple, non-disjoint attributes (e.g., political affiliation and gender). Technically, we study the complexity of this constrained multiwinner voting problem subject to group-fairness constraints for monotone submodular score functions. We present approximation algorithms and hardness of approximation results for various attribute set structures and score functions.

## 1 Introduction

The problem of selecting a committee from a set of candidates given the preferences of the voters, called a multiwinner voting problem, arises in various social, political and e-commerce settings: from electing a parliament to govern a country, to selecting a committee to decide prizes and awards, to e-commerce applications in which a small representative subset must be selected from a large set of data. Formally, there is a set of “candidates” (i.e., people, products, articles, or other items) which can be selected, and a set of voters, each of whom has preferences over the candidates. The goal is to select a size-$k$ subset of the candidates based on these preferences. Given the preference lists, it remains to specify how the selection will be made. One popular approach is to define a “total” score function according to the voters’ preferences. This reduces the selection to an optimization problem: pick a size-$k$ committee that maximizes the score. Different views on the desired properties of the selection process have led to a number of different scoring rules and, consequently, to a variety of algorithmic problems that have been a topic of much recent interest; see [17]. Prevalent examples include general multiwinner voting rules such as the committee scoring rules [15, 3], the approval-based rules [2], the OWA-based rules [41], variants of the Monroe rule [5, 28, 39] and the goalbase rules [44].

An important concern, which has gained significance due to the growing deployment of algorithms to select committees or subsets in various contexts, is that voting rules can allow or even exacerbate human biases. For instance, it has been shown that voting rules, in the most general sense, can affect the proportion of women in the US legislature [36], and result in a disproportionate electorate (such as an under-represented minority) [16, 22]. An increasing awareness of such algorithmic bias has reached the public and government eye and there are generic [34] and specific recommendations. For example, the Royal Commission on the Electoral System requires that the chosen committee should have sufficient representation of minority groups.

In response, “proportional representation” rules [28, 7], which ensure that the preferences of the electorate are reflected proportionately in the elected body, are being deployed. In this setting, each group should have representation proportional to the number of voters of that group. Formally, let $P_1,\ldots,P_p$ be disjoint groups of candidates, where each index $i\in[p]$ represents a given group, and consider the fraction of voters who prefer each group $i$. A voting rule achieving full proportionality would ensure that the number of members of the selected committee $S$ that belong to each group $P_i$ matches this fraction of the $k$ seats. This property is also called “quota” [7, Definition 5] and can often be achieved by modifying or designing appropriate score functions. It has been shown that the Monroe rule achieves fully proportional representation (quota), and the proportional approval voting (PAV) rule [7] achieves a weaker version called “lower quota”, in which each group is only guaranteed a lower bound on its number of seats. However, if a group is small relative to the number of voters, there may be no candidate representing it in the optimal committee. Aziz et al. [2] propose an axiom called “justified representation” which addresses this problem, and introduce approval-based rules that satisfy justified representation. Chamberlin and Courant [11] propose the CC-rule, which aims to obtain a committee in which each group is represented. Still, such proportional representation schemes can, at best, maintain societal biases and may not be able to alleviate them. Koriyama et al. [23] argue that a fair outcome must necessarily give smaller groups of voters disproportionately many representatives and invoke the concept of “degressive proportionality”. (Degressive proportionality has two conditions: the number of representatives of a group increases as the size of the group increases, and the ratio of representatives to group size decreases as the size of the group increases.) Subsequently, Brill et al. [7] study which approval-based election rules can achieve different kinds of proportionality, including degressive proportionality (especially, Penrose’s square root law [32]), and full proportionality.

The above works consider a single attribute (such as political affiliation); this results in a group structure that is a partition. Often, in applications there may be multiple attributes (e.g., gender, ethnicity and political affiliation) leading to non-disjoint groups, and we may wish to guarantee proportionality across all of them. Lang and Skowron [24] take a step towards this and consider the problem of committee selection in the multi-attribute setting with the goal of being close to a given target composition. However, this and previous works do not allow for completely arbitrary group structures or flexible fairness constraints and, importantly, do not take into account general multiwinner score functions that associate a utility with the selected committee. Hence, a new multiwinner voting framework, along with corresponding algorithmic solutions, is required in which the score is optimized subject to ensuring that no group is over- or under-represented.

### 1.1 Our Contributions

We introduce and study multiwinner voting problems in the presence of “group-fairness” constraints, which have also been considered for other fundamental problems, including classification [14, 45], data summarization [9] and ranking [10]. Given (potentially) non-disjoint groups $P_1,\ldots,P_p$ of candidates, we consider constraints on the committee $S$ of size $k$ of the form $\ell_i\le|S\cap P_i|\le u_i$ for all $i\in[p]$, where the bounds $\ell_i$ and $u_i$ are given as input depending on the desired requirement of proportionality. Consider the family of all size-$k$ committees that satisfy the fairness constraints. Given a score function (as is standard, the score function is given as an oracle that can be evaluated in polynomial time for any committee), we then study the complexity of finding a committee in this family that maximizes the score.

Our framework both encompasses and generalizes prior work, and is defined by (1) a multiwinner voting rule, plus (2) fairness constraints with respect to the groups. The groups and corresponding fairness parameters are taken as input and can be set according to the underlying context and desired outcome. Furthermore, the fair multiwinner voting rule that results after adding fairness constraints will continue to satisfy many nice properties (including, e.g., consistency, monotonicity, and fair variants of weak unanimity or committee monotonicity) of the (unconstrained) voting rule; see Section 3.4 for details. Our framework includes existing models that achieve various kinds of proportional representation by simply setting the fairness parameters appropriately, and furthermore can handle more realistic and complex cases in which the proportions are flexible and/or candidates belong to more than one group; see Section 3.3 for examples.

Towards designing algorithms to compute such group-fair committees, we consider the general class of monotone submodular multiwinner voting rules, whose total score function is “monotone” and “submodular”. Recall that a set function $f$ is a monotone submodular (MS) function if $f(S)\le f(T)$ for all $S\subseteq T$ and $f(S\cup\{e\})-f(S)\ge f(T\cup\{e\})-f(T)$ for all $S\subseteq T$ and $e\notin T$. This includes the Chamberlin-Courant (CC) rule, the Monroe rule, the OWA-based rules and the goalbase rules. As this generalizes prior work on the unconstrained case, the algorithmic problems that arise largely remain NP-hard and we focus on developing approximation algorithms for them. (It is common practice in the voting literature to develop approximation algorithms when solving the exact problem is NP-hard, including for other committee selection problems; see, e.g., the references in Section 3.) An important parameter that plays a role in the complexity of the constrained multiwinner voting problem is the maximum number of groups to which any candidate can belong; we denote it by $\Delta$. Our results (classified by the value of $\Delta$) are summarized in Tables 1 and 2.
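To make the monotone submodular property concrete, the following sketch checks it by brute force on a tiny ground set, using the standard definition: $f(S)\le f(T)$ whenever $S\subseteq T$, and the marginal gain of adding an element never increases as the set grows. The function names and the coverage example are illustrative assumptions, not part of the paper, and the check takes exponential time.

```python
from itertools import combinations


def all_subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone_submodular(f, ground):
    """Brute-force check of monotonicity and diminishing marginal gains."""
    ground = frozenset(ground)
    subsets = list(all_subsets(ground))
    for s in subsets:
        for t in subsets:
            if not s <= t:
                continue
            # Monotonicity: adding elements never lowers the score.
            if f(s) > f(t):
                return False
            # Submodularity: the gain of e on the smaller set is at least
            # its gain on the larger set.
            for e in ground - t:
                if f(s | {e}) - f(s) < f(t | {e}) - f(t):
                    return False
    return True


# Hypothetical coverage function: the score of a committee is the number
# of voters that approve at least one of its members.
approved_by = {0: {1, 2}, 1: {2, 3}, 2: {3}}


def coverage(committee):
    return len(set().union(*(approved_by[c] for c in committee)))
```

Coverage functions of this kind pass the check, whereas a function such as the squared committee size is monotone but fails the diminishing-gains test.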

When each candidate belongs to at most one group, we present an approximation algorithm (Theorem 17) whose guarantee matches the known hardness of approximation [31]. When a candidate may belong to three or more groups, unlike the unconstrained case, even checking whether there is a feasible solution becomes NP-hard (Theorem 8). Further, the problem of finding a solution that violates the cardinality or fairness constraints by up to any multiplicative constant factor remains NP-hard (Theorems 9 and 10). Moreover, even if feasibility is guaranteed, the problem remains hard to approximate (Theorem 11). To bypass the issue of ensuring feasibility, we assume the problem instance always has a feasible solution and present certain sufficient conditions on the input instances that guarantee feasibility. For instance, if the fraction of each group in the selected committee is allowed some slack compared to its proportion in the set of candidates, then a random size-$k$ committee is likely to be feasible. We discuss this and other natural assumptions that can ensure feasibility in Section 3.2. Subsequently, we give a near-optimal bi-criterion approximation algorithm that violates each fairness constraint by a small multiplicative factor for the general class of MS voting rules (Theorem 14). We also study special cases in which the fairness constraints involve only lower bounds (Theorem 16), only upper bounds (Theorem 15), or in which there is a constant number of fairness constraints (Theorem 19).

Finally, we also study some special voting rules with MS score functions such as SNTV, $\alpha$-CC and $\beta$-CC (see Section 2.2 for definitions), for which the unconstrained problem has recently received considerable attention. See Table 2 for a summary of our results for these rules. Unlike the unconstrained case, where a PTAS is known, we show that constrained $\beta$-CC multiwinner voting is hard to approximate within some constant factor even in restricted settings (Theorem 18). The case in which each candidate belongs to at most two groups is intriguing and not entirely settled. We prove that the constrained SNTV multiwinner voting problem has a polynomial-time exact algorithm under a restriction on the group structure (Theorem 20).

Our algorithms for constrained MS multiwinner voting in fact apply to the more general problem of constrained monotone submodular maximization. Our algorithmic results combine two existing tools that have been extensively used in the monotone submodular maximization literature. The first is the “multilinear extension” (Definition 12), which extends the discrete MS score function to a continuous function over a relaxed domain. By applying a continuous greedy process to the multilinear extension, a fractional solution of high score can be computed efficiently. The second is to round the fractional solution to a size-$k$ committee by “dependent rounding”. In the general case (Theorem 14), we use a swap randomized rounding procedure introduced by [12]. In the case where each candidate belongs to at most one group (Theorem 17), we design a two-layered dependent rounding procedure that runs in linear time. Some of our algorithmic results are achieved by reduction to well-studied problems, such as monotone submodular maximization over an extendible system (Theorem 15) and constrained set multi-cover (Theorem 16). Our hardness results build on [10] and follow from reductions from well-known NP-hard problems, including hypergraph matching, 3-regular vertex cover, constrained set multi-cover and independent set.
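As a rough illustration of the first tool, the sketch below estimates the multilinear extension by sampling. It assumes the standard definition from the submodular maximization literature, in which the extension at a fractional point $x$ is the expected score of a random set that contains each candidate $i$ independently with probability $x_i$. The function name, the sample count and the seed are hypothetical choices, and this is not the continuous greedy algorithm itself.

```python
import random


def multilinear_estimate(f, x, samples=1000, seed=0):
    """Monte Carlo estimate of F(x) = E[f(R)], where the random set R
    contains each element i independently with probability x[i].

    `f` maps a frozenset to a number; `x` maps elements to values in [0, 1].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # Draw one random set according to the fractional point x.
        random_set = frozenset(i for i, p in x.items() if rng.random() < p)
        total += f(random_set)
    return total / samples
```

At an integral point the estimate coincides with the discrete score, which is the sense in which the continuous function extends the original one.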

## 2 Preliminaries

In this section, we present the formal definition of our model and the definitions of certain special monotone submodular voting rules which are considered in this paper.

### 2.1 Our Model

We first give the definition of the constrained multiwinner voting problem.

###### Definition 1

(Constrained multiwinner voting) Given a set of $m$ candidates, a set of voters, each of whom has preferences over the candidates, a score function that assigns a value to every committee, a number $k$, arbitrary groups $P_1,\ldots,P_p$ of candidates, and numbers $\ell_1,\ldots,\ell_p$ and $u_1,\ldots,u_p$, we consider fairness constraints on the committee $S$ of size $k$ of the form:

$$\ell_i\le|S\cap P_i|\le u_i,\quad\forall i\in[p].$$

Consider the family of all size-$k$ committees that satisfy the fairness constraints. The goal of the constrained multiwinner voting problem is to select a committee from this family that maximizes the score. If the score function is monotone submodular, we call the problem the constrained MS multiwinner voting problem. We only require oracle access to the score function: given a committee, the oracle outputs its score.
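The definition can be illustrated with a small sketch that checks the constraints $\ell_i\le|S\cap P_i|\le u_i$ and then finds a best fair committee by exhaustive search. The names `is_feasible` and `best_fair_committee` are hypothetical, the score oracle is passed in as an ordinary function, and the search is exponential in general, so it only serves to make the problem statement concrete on tiny instances.

```python
from itertools import combinations


def is_feasible(committee, groups, lower, upper):
    """Return True if lower[i] <= |S ∩ P_i| <= upper[i] for every group i."""
    s = set(committee)
    return all(
        lo <= len(s & set(group)) <= up
        for group, lo, up in zip(groups, lower, upper)
    )


def best_fair_committee(candidates, k, groups, lower, upper, score):
    """Exhaustively search all size-k committees that satisfy the constraints.

    `score` plays the role of the oracle: it maps a frozenset to a number.
    Returns (None, -inf) when no size-k committee is feasible.
    """
    best, best_score = None, float("-inf")
    for combo in combinations(candidates, k):
        if not is_feasible(combo, groups, lower, upper):
            continue
        value = score(frozenset(combo))
        if value > best_score:
            best, best_score = frozenset(combo), value
    return best, best_score
```

For example, with four candidates split into two groups of two and the requirement of exactly one member per group, the search can only return committees that mix the groups, even if the two highest-scoring candidates share a group.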

### 2.2 Special Cases of Monotone Submodular Voting Rules

A classical monotone submodular multiwinner voting rule is the Chamberlin-Courant rule [39]. In this section, we recall three special cases of the Chamberlin-Courant rule: SNTV, $\alpha$-CC, and $\beta$-CC, which will be used in the following sections. We first define some notions. Each voter has a preference order that ranks all the candidates. The position of a candidate in a preference order is its rank in that order (the candidate ranked first has position 1 and the candidate ranked last has position $m$). For a size-$k$ committee and a preference order, we consider the sequence of positions of the committee members in that order, sorted in increasing order; it is a size-$k$ increasing sequence of elements from $[m]$.

###### Definition 2

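(CC) In the Chamberlin-Courant rule, there is a positional score function $\gamma_m$ that is non-increasing in the position, i.e., $\gamma_m(i)\ge\gamma_m(j)$ if $i<j$. Each voter assigns to a committee the positional score of her highest-ranked committee member, and the total score of a committee is the sum of these values over all voters.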

In the following, we introduce three special cases of the CC rule: single non-transferable vote (SNTV), Approval-based Chamberlin-Courant ($\alpha$-CC) and Borda-based Chamberlin-Courant ($\beta$-CC).

###### Definition 3

The single non-transferable vote (SNTV) rule uses the following positional score function:

$$\begin{align}
\gamma_m(i) &= 1, && \text{if } i=1, \tag{1a}\\
\gamma_m(i) &= 0, && \text{otherwise.} \tag{1b}
\end{align}$$

###### Definition 4

The Approval-based Chamberlin-Courant ($\alpha$-CC) rule uses the following positional score function:

$$\begin{align}
\gamma_m(i) &= 1, && \text{if } i\le k, \tag{2a}\\
\gamma_m(i) &= 0, && \text{otherwise.} \tag{2b}
\end{align}$$

###### Definition 5

The Borda-based Chamberlin-Courant ($\beta$-CC) rule uses the following positional score function:

$$\gamma_m(i)=m-i. \tag{3}$$

If using the SNTV/$\alpha$-CC/$\beta$-CC rule, we call our framework the constrained SNTV/$\alpha$-CC/$\beta$-CC multiwinner voting problem, respectively.
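The three positional score functions and the resulting Chamberlin-Courant total score can be sketched as follows. The sketch assumes the usual reading of the CC rule, in which each voter is represented by her highest-ranked committee member; the helper names and the approval threshold parameter `t` are illustrative assumptions.

```python
def sntv_score(m):
    """SNTV: only the top position earns a point."""
    return lambda i: 1 if i == 1 else 0


def approval_score(m, t):
    """Approval-based: positions 1..t earn a point (t is the threshold)."""
    return lambda i: 1 if i <= t else 0


def borda_score(m):
    """Borda-based: position i earns m - i points."""
    return lambda i: m - i


def cc_total_score(committee, preferences, gamma):
    """Sum over voters of gamma at the position of the voter's best member.

    `preferences` is a list of rankings (lists of candidates, best first).
    Positions are 1-based. The committee is assumed to be non-empty.
    """
    total = 0
    for order in preferences:
        best_position = min(order.index(c) + 1 for c in committee)
        total += gamma(best_position)
    return total
```

With three voters whose rankings are cyclic shifts of one another and a committee of two candidates, two voters see their favourite elected, while the third is represented by her second choice and so contributes only under the approval-based and Borda-based rules.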

###### Remark 6

All three special CC rules have practical applications. Suppose an airline wants to select some movies to display. One of the best strategies is to present a set of options that is as diverse as possible, i.e., each passenger should see some movie appealing to him or her. If each passenger is only satisfied by his or her favorite movie, then the SNTV rule is a natural choice. On the other hand, if each passenger has a set of good movies and is satisfied by any one of them, then it is natural to use the $\alpha$-CC rule. Finally, if each passenger has a ranking of the movies and the individual score of movies decreases linearly according to the ranking, then $\beta$-CC is our rule of choice.

## 3 Discussion

### 3.1 Generality of Fairness Constraints

Our framework encompasses many existing notions of fairness that have arisen in the machine learning literature. We summarize them in the following.

#### Fairness in the ML Literature.

1. Statistical parity [14]: Consider the case where the voters can also be partitioned into the same groups (e.g., if the groups encode ethnicity). Let denote the number of voters that have type . A committee of size has statistical parity if for all . Given some for each , we say committee satisfies -statistical parity if We can set fairness constraints that guarantee -statistical parity by setting and such that for all .

2. Diversity: Diversity rules [13, 37] typically look at the population of applicants (i.e., candidates) and assert that the number of minorities in the committee should be proportional to the number of minorities in the applicant pool, e.g., the 80% rule [13, 37]. Thus, the goal would be to ensure that, for a committee of size satisfies We say a committee satisfies -diversity if . We can ensure this by setting and so that for all .

#### Fairness in the Voting Literature.

Additionally, the constraints also generalize notions of proportionality that have arisen in the voting literature such as fully proportional representation [28], fixed degressive proportionality [23] and flexible proportionality [7]; see the following. For any , let denote the number of voters who prefer to type .

1. Unconstrained multiwinner voting: Let and for all .

2. Fully proportional representation [28]: Let and for all .

3. Fixed degressive proportionality [23]: For example, to ensure the committee satisfies the Penrose square root law, let [32] and for all . More generally, any fixed degressive proportionality can be attained by setting fairness parameters appropriately.

4. Flexible proportionality [7]:

1. In [7, Definition 5], the authors consider a type of flexible proportionality called lower quota, where each group gets at least seats. Then we can set and for all .

2. Another example of flexible proportionality is as follows. If we would be satisfied with a committee that has proportions which lie anywhere between the fully proportional and Penrose square root law, then we can set and for all . Such flexibility is valuable because it can allow for a higher score; see Example 3.2 in Section 3.3.

#### Fairness across non-disjoint groups.

Importantly, our fairness constraints also allow us to ensure fairness across multiple sensitive attributes, whereas the above examples from the literature consider only groups that partition the set of candidates (e.g., ethnicity). Our framework significantly generalizes these notions by allowing arbitrary groups over which we may wish to impose constraints.

1. Non-disjoint group constraints: One can ensure both fully proportional representation by political party and by demographic group by placing constraints (a) above on groups that correspond to the political party and constraints (b) on groups that correspond to demographics. Sometimes, groups can be arbitrary subsets instead of multiple partitions. For instance, let groups represent the major of applicants, and there are applicants with a double major or even more. Our framework can also handle this; see Example 3.4 in Section 3.3.

### 3.2 Feasibility Conditions and Properties

Our algorithmic results can bypass the barriers posed by our hardness results in Section 5 by assuming the instances are feasible. Here we argue that under many natural conditions in the multiwinner voting setting, we can deduce that feasible solutions exist and can construct them. We present some examples below.

1. The proportional representative condition ensures feasibility: Assume that

$$k\ge 100\ln p. \tag{4}$$

This is natural as we expect to be much larger than ; each candidate only has a small set of types over which we would like to impose fairness. Furthermore, assume that there is some slack between the true and allowable fraction of each type in the selected committee, i.e.,

$$\ell_i\le k\cdot\left(\frac{|P_i|}{m}-c\right),\qquad u_i\ge k\cdot\left(\frac{|P_i|}{m}+c\right) \tag{5}$$

for all $i\in[p]$ and some small constant $c>0$. Given these assumptions, a committee chosen uniformly at random from all size-$k$ committees is feasible with high probability, by the sampling variant of the Chernoff bound (see [27] for details) and the union bound.

2. The single type condition ensures feasibility: Assume that for each type , there are sufficiently many candidates with only this type . Moreover, we assume . This guarantees that we can always select candidates with single type satisfying all fairness constraints.

3. The bounded-parameter condition ensures feasibility: Assume we are in a setting where the upper bounds $u_i$ are unbounded; this is natural, e.g., when we simply want to ensure minority representation. Further, assume that $\sum_{i\in[p]}\ell_i\le k$. Thus, we can ensure that the committee includes at least $\ell_i$ candidates belonging to each group $P_i$.
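The first condition above can be explored empirically. The sketch below draws uniformly random size-$k$ committees and reports the fraction that satisfy the constraints when the bounds are set with slack $c$ around each group's proportion among the candidates, i.e., $\ell_i=k(|P_i|/m-c)$ and $u_i=k(|P_i|/m+c)$, the tightest bounds permitted by (5). The function name, the trial count and the seed are assumptions made for illustration; the code is a simulation, not a proof of the high-probability claim.

```python
import random


def estimate_feasibility(m, k, groups, c, trials=2000, seed=0):
    """Estimate the probability that a uniformly random size-k committee
    over candidates 0..m-1 satisfies, for every group P_i (a set),
    k * (|P_i|/m - c) <= |S ∩ P_i| <= k * (|P_i|/m + c).
    """
    rng = random.Random(seed)
    feasible = 0
    for _ in range(trials):
        committee = set(rng.sample(range(m), k))
        ok = all(
            k * (len(g) / m - c) <= len(committee & g) <= k * (len(g) / m + c)
            for g in groups
        )
        feasible += ok
    return feasible / trials
```

Increasing the slack `c` or the committee size `k` should push the estimate towards 1, in line with the concentration argument.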

### 3.3 Price of Fairness

Enforcing fairness must naturally come at a cost – the feasible space of committees becomes smaller and hence the optimal score may decrease. This leads to a natural question: To what extent does the score decrease by introducing the fairness constraints? In some cases, the constraints can result in an arbitrarily bad score; see the following examples. The first example shows that even a small change of fairness parameters may lead to a significant difference in the optimal score.

###### Example 3.1

Consider a multiwinner voting instance with , , and . Let , , , , and . Voters have a preference order , and the other 100 voters have a preference order . Suppose we use the -CC rule. The optimal committee of the unconstrained case is of total score 49*200=9800.

1. If the fairness parameters are , , and . Then there is only one feasible committee of total score 1*200=200. We lose a lot of total score in this setting.

2. However, if we slightly change the fairness parameters by resetting (other parameters keep the same), then is a feasible committee of score 9800, which is equal to the unconstrained case.

The second example shows that even a slight amount of flexibility in the constraints (i.e., $\ell_i<u_i$) can significantly improve the score as compared to existing “fixed” notions of fairness such as fully proportional representation and fixed degressive proportionality (in which, effectively, $\ell_i=u_i$).

###### Example 3.2

Consider a multiwinner voting instance with , , and . Let , and . Voters are also partitioned into two groups: 100 voters prefer type 1, and 900 voters prefer type 2. The goalbase score function is defined as follows: A committee has score 100 if it has exactly three members in , and has score 0 otherwise. The optimal score of the unconstrained case is 100, which can be achieved by .

1. Suppose we require full proportionality, i.e., the size-20 committee consists of candidates in and candidates in . In this case, we set and . Any committee satisfying this condition has score 0.

2. Suppose we require the Penrose square root law, i.e., the size-20 committee consists of candidates in and candidates in . In this case, we set and . Any committee satisfying this condition has score 0.

3. Suppose we only require a flexible proportionality: the selected committee contains at least two representations in and at least 15 representations in . In this case, we set , and . An optimal committee is of score 100.

On the other hand, in natural settings, e.g., when each voter prefers candidates of a given type over all other candidates, the optimal constrained score remains close to the optimal unconstrained score no matter how we select the fairness parameters; see the following example.

###### Example 3.3

Consider a multiwinner voting instance satisfying the following conditions.

1. For each , we have . Moreover, .

2. Each voter has a preferred type , and each candidate in is one of her favourite candidates.

Consider any feasible fairness constraints satisfying that for all . We have the following observations.

1. Suppose we use the -CC rule. Since each group has at least one representation, the positional score of each voter is 1. Therefore, the optimal score must be , which is equal to the unconstrained case.

2. Suppose we use the -CC rule. The optimal score of the unconstrained case is at most . By the condition of fairness constraints, each group has at least one representation. Hence the positional score of each voter is at least . Therefore, the optimal score is at least , no matter how we choose the fairness parameters.

Finally, we give an example illustrating that our framework can ensure fairness across multiple attributes while losing little score compared to the unconstrained version.

###### Example 3.4

Consider a multiwinner voting instance with , , and . There are two attributes: gender and ethnicity, and four groups: men group , women group , Caucasian group , and African-American group . The preference orders are given as follows.

1. Voters have a preference order .

2. Voters have a preference order .

3. Voters have a preference order .

4. The last fifty voters have a preference order .

Suppose we use the $\beta$-CC rule. The optimal committee of the unconstrained case is of total score 7*200=1400. However, this committee consists of four men and lacks fairness with respect to gender.

Our framework can overcome this problem by setting $\ell_i=u_i=2$ for every group, i.e., we require the chosen committee to consist of two men and two women, and also of two Caucasians and two African-Americans. Then an optimal committee is of total score 7*100+6*100=1300. Thus, by introducing the fairness constraints, the optimal committee is fair with respect to both gender and ethnicity and loses only a small amount of score.

### 3.4 Properties of Voting Rules with Fairness Constraints

There are various properties that generally one would like a multiwinner voting rule to satisfy. For example, Elkind et al. [15] show that the single non-transferable vote (SNTV) rule satisfies many properties including committee monotonicity, solid coalitions, consensus committee, weak unanimity, monotonicity, homogeneity, and consistency. A variety of nice properties continue to be satisfied by the scoring rules in the presence of fairness constraints. We present some examples below.

Consistency means that if a committee is optimal for one set of voters and also for a second, disjoint set of voters, then it must also be an optimal committee for the union of the two sets of voters. As the fairness constraints restrict the set of feasible committees in the same way for any set of voters, the argument for consistency remains the same (see [15, Theorem 7]). In fact, consistency holds for any committee scoring rule with fairness constraints. Similarly, by the same arguments as in [15], if a committee scoring rule satisfies monotonicity (respectively, homogeneity) then the same scoring rule in the presence of fairness constraints still satisfies monotonicity (respectively, homogeneity).

On the other hand, the remaining properties may not be preserved. The problem arises because the fairness constraints can make certain committees infeasible and thus force the desired property to be violated. Feasibility, however, appears to be the only bottleneck and motivates the definition of corresponding fair properties. For example, weak unanimity states that if a set $S$ of $k$ candidates dominates (see [17, Section 2.2.1]) all other size-$k$ subsets of candidates with respect to any voter’s preference list, then $S$ must be an optimal committee. We can instead define a fair version of weak unanimity; see the following definition.

###### Definition 7

(Fair weak unanimity.) Consider the collection of all size-$k$ committees that satisfy all fairness constraints. If a committee $S$ in this collection dominates (see [17, Section 2.2.1] for the definition) every other committee in the collection with respect to any voter’s preference list, then $S$ must be an optimal committee.

Any committee scoring rule that satisfies such weak unanimity (or, similarly, committee monotonicity) will satisfy the fair version of these properties in the presence of fairness constraints.

### 3.5 Incorporating Fairness in Multiwinner Voting Rules

Some multiwinner voting rules are not defined by a score function; e.g., the single transferable vote (STV) rule [42] and the Greedy Monroe rule [39], which select candidates in rounds instead of selecting the whole committee simultaneously. It is unclear how fairness constraints can be added to such voting rules and we leave this problem for future work.

## 4 Other Related Work

The study of total score functions and their resulting optimization problems has received much attention in recent years. Often the optimization problem turns out to be NP-hard: both $\alpha$-CC and $\beta$-CC are NP-hard [35], $1-1/e$ is the best possible approximation ratio for $\alpha$-CC [39], the Monroe rule is computationally hard even if the voting parameters are small [5], and the OWA-based rules are hard in general [41], as are the goalbase rules in various settings [44]. Hence, one must largely resort to developing approximation algorithms for these problems. Towards this, there is a greedy algorithm which achieves an approximation ratio of $1-1/e$ for both the $\alpha$-CC and $\beta$-CC rules [26] and a PTAS for $\beta$-CC [39]. For the Monroe rule, there is a 0.715-approximation algorithm with Borda scoring and a $(1-1/e)$-approximation algorithm with an arbitrary positional score function [39]. There is also a $(1-1/e)$-approximation algorithm for OWA-based rules with monotone weight vectors [41].

The majority of score functions for multiwinner voting rules that have been studied are monotone submodular, including the CC rules [39], Monroe rule [39], OWA-based rules [41] and goalbase rules [44]. Algorithm design for such score functions, including several of the $(1-1/e)$-approximation algorithms mentioned above, has benefited from theoretical developments in the area of monotone submodular function maximization [31, 8].

Goalbase score functions are defined by specifying an arbitrary set of logic constraints, and letting the score of a committee be a weighted sum of the constraints that it satisfies [43, 44]. However, there are no known efficient algorithms for optimizing goalbase functions; incorporating such logic constraints in the score leads to complicated objective functions that elude good optimization methods. In contrast, we can provide efficient approximation algorithms for our setting. There is some literature that studies voting in the multi-attribute setting; see the survey of Lang and Xia [25]. However, the aim of that line of work is to output a single winner, while we instead consider multiwinner voting. Another related model is constrained approval voting (CAP), proposed by Brams [6] and Potthoff [33]. However, there is no efficient algorithm, since the input in CAP is exponentially large in the number of attributes. Lang and Skowron [24] consider the case of selecting a committee that is close to satisfying a single target composition with respect to a specified set of multiple attributes; however, their setting does not consider a score function.

Investigating the properties of multiwinner voting rules has been an important consideration. Skowron et al. [40] show that committee scoring rules can be characterized by four standard properties. Elkind et al. [15] consider ten different committee selection rules including CC and Monroe and study their properties.

## 5 Hardness Results

In this section we present our hardness results for the constrained multiwinner voting problem. Our first theorem addresses the complexity of the feasibility problem. Recall that $\Delta$ is the maximum number of groups to which a candidate can belong. The following theorem shows that when a candidate may be part of three or more groups, the feasibility problem alone can become NP-hard, even under mild feasibility conditions.

###### Theorem 8

(NP-hardness of feasibility: $\Delta\ge 3$) The constrained multiwinner voting feasibility problem is NP-hard for any $\Delta\ge 3$. This is true even if we assume that all $\ell_i=0$ or all $u_i=|P_i|$.

Proof: For the case that all $\ell_i=0$ and $\Delta\ge 3$, we reduce from $\Delta$-hypergraph matching, which is known to be NP-hard [21]. The reduction is inspired by the NP-hardness argument for the constrained ranking feasibility problem [10, Theorem 3.1]. Let $H=(V,E)$ be a $\Delta$-hypergraph. We construct a constrained multiwinner voting instance as follows. For each hyperedge $e\in E$, we construct a corresponding candidate. For each vertex $v\in V$, we construct a corresponding group consisting of the candidates whose hyperedges contain $v$. Thus, there are $|V|$ groups and $|E|$ candidates. For each group, let the lower bound be 0 and the upper bound be 1, meaning that each group can be hit at most once. This corresponds to the requirement that each vertex can be covered by at most one hyperedge. Hence $H$ admits a size-$k$ matching if and only if there is a feasible size-$k$ committee in the constrained multiwinner voting instance, which completes the proof of this case.

For the case that all and , we present a reduction from the 3-regular vertex cover problem, which is known to be NP-hard [1]. Let be a 3-regular graph with and . The problem is to test whether there is a vertex cover of size . We construct a constrained multiwinner voting instance whose feasibility is equivalent to having a vertex cover of size as follows. We construct a corresponding candidate for each vertex , and a corresponding group for each edge . Thus, there are candidates, groups and the degree . Let for all , i.e., each group has at least one representative. This corresponds to the requirement that each edge be covered by some chosen vertex. Therefore, each vertex cover of size corresponds to a size- feasible committee, which finishes the proof.
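As a sanity check of the vertex cover reduction, the sketch below builds the instance for the complete graph on four vertices (which is 3-regular) and compares both sides by brute force. The helper names are hypothetical, and the lower bound of 1 per group is our reading of "each group has at least one representative".

```python
from itertools import combinations

def instance_from_graph(edges):
    """One candidate per vertex, one group per edge (hypothetical helper)."""
    groups = {e: set(e) for e in edges}   # a group contains the edge's two endpoints
    lower = {e: 1 for e in edges}         # assumed: every group needs >= 1 member
    return groups, lower

def feasible_committee_exists(vertices, k, groups, lower):
    """Exhaustively search for a size-k committee meeting every lower bound."""
    for committee in combinations(vertices, k):
        chosen = set(committee)
        if all(len(chosen & members) >= lower[g] for g, members in groups.items()):
            return True
    return False

def vertex_cover_exists(vertices, edges, k):
    """Exhaustively search for k vertices touching every edge."""
    return any(all(u in cover or v in cover for u, v in edges)
               for cover in map(set, combinations(vertices, k)))

vertices = [0, 1, 2, 3]
edges = list(combinations(vertices, 2))   # K4: every vertex has degree 3
groups, lower = instance_from_graph(edges)
for k in range(1, 5):
    assert vertex_cover_exists(vertices, edges, k) == feasible_committee_exists(
        vertices, k, groups, lower)
```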

The next two theorems show that the feasibility problem remains hard even if one allows the size of the committee or the fairness constraints to be violated.

###### Theorem 9

(Hardness of feasibility with committee violations) Let for all . For any , the following gap violation variant of the constrained multiwinner voting feasibility problem is NP-hard.

1. Output YES if the input instance is feasible.

2. Output NO if there is no feasible solution of size at least .

Proof: We reduce from the following constrained set multi-cover problem, which is NP-hard to approximate within a factor of by [4]. Given a ground set together with a weight function , and a collection of sets , the goal is to choose some sets of minimum cardinality covering each element by a factor of at least .

Then we construct a constrained multiwinner voting instance. Construct a corresponding candidate for each set , and a corresponding group for each element . Hence there are candidates and groups. Define to be for all . Observe that a feasible committee of size- exists if and only if the minimum cardinality of the constrained set multi-cover problem is at most . Therefore, a polynomial-time algorithm for the constrained multiwinner voting feasibility problem implies a -approximation algorithm for the constrained set multi-cover problem. This fact completes the proof.
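The key observation, that a feasible committee of a given size exists exactly when the minimum multi-cover is no larger than that size, can be illustrated on a small instance. This is a hedged sketch: the names `covers`, `min_multicover` and `feasible_committee_exists` are hypothetical, the per-element demand plays the role of the lower bound of the corresponding group, and upper bounds are assumed to be vacuous.

```python
from itertools import combinations

def covers(chosen, sets, demand):
    """True if every element e lies in at least demand[e] of the chosen sets."""
    return all(sum(1 for j in chosen if e in sets[j]) >= need
               for e, need in demand.items())

def min_multicover(sets, demand):
    """Smallest number of distinct sets meeting all demands (None if impossible)."""
    for size in range(len(sets) + 1):
        if any(covers(c, sets, demand)
               for c in combinations(range(len(sets)), size)):
            return size
    return None

def feasible_committee_exists(sets, demand, k):
    """Candidates are sets, groups are elements, lower bounds are the demands."""
    return any(covers(c, sets, demand)
               for c in combinations(range(len(sets)), k))

sets = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"a", "b", "c"}]
demand = {"a": 2, "b": 2, "c": 1}
opt = min_multicover(sets, demand)
for k in range(1, len(sets) + 1):
    # with only lower bounds, adding candidates never breaks feasibility
    assert feasible_committee_exists(sets, demand, k) == (opt <= k)
```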

###### Theorem 10

(Hardness of feasibility with fairness violations) Assume for each . For every , the following violation variant of the constrained multiwinner voting feasibility problem is NP-hard.

1. Output YES if the input instance is feasible.

2. Output NO if there is no solution which violates every fairness constraint by a factor of at most , i.e., there is no size- solution such that .

Proof: Similar to [10, Theorem 3.4], we use the inapproximability of independent set [20, 46] to prove the theorem. It is NP-hard to approximate the cardinality of the maximum independent set problem in undirected graphs within a factor of for any constant .

Given a graph with and and a number , the goal is to check whether there exists a size- independent set in . For each vertex , we construct a corresponding candidate . For every cardinality- clique, we construct a property and set its fairness upper bound to be 1. Observe that there are at most fairness constraints.

We have the following claim:

1. If there is a size- committee that violates every fairness constraint by a factor of at most , then there is an independent set of cardinality in .

2. If there is no size- feasible committee, then there is no cardinality- independent set.

If the above claim holds, then a polynomial-time algorithm for the constrained multiwinner voting feasibility problem implies a -approximation algorithm for the maximum independent set problem. Hence it remains to prove the claim.

For the first claim, if there exists a size- committee that violates every fairness constraint by a factor of at most , then we have a subset of size- that does not contain any -clique. By a standard upper-bound on the Ramsey number , there exists an independent set of cardinality in . The second claim is not hard, since every independent set of size implies a feasible size- committee for the constrained multiwinner voting instance.

Further, we show that if we assume that we know a feasible instance of the constrained multiwinner voting problem, the hardness does not go away.

###### Theorem 11

(Inapproximability for feasible instances) A feasible constrained multiwinner voting problem that satisfies for each is NP-hard to approximate within a factor of .

Proof: It suffices to prove the SNTV case. The reduction is from maximum -hypergraph matching, which is hard to approximate within a factor of by [21].

Let be a -hypergraph with and . We use the same construction of the constrained multiwinner voting instance as in the proof of Theorem 8. In addition, we construct another candidates which do not belong to any group. Note that this instance is satisfiable since is a feasible committee. Then we define the total score function. We first construct voters and let voter most prefer to (). According to the SNTV rule, a feasible committee consisting of candidates from and candidates from has score . On the other hand, such a committee corresponds to a cardinality- matching of the maximum -hypergraph matching problem. Therefore, to compute the optimal score is equivalent to finding the maximum -hypergraph matching in . Since the reduction is approximation preserving, we finish the proof.
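To make the score argument concrete, the sketch below evaluates the SNTV score on the padded instance by brute force and compares the optimum with the maximum matching size. It is illustrative only: the number of padding candidates is not recoverable from the text here, so the code assumes $k$ of them, and all function names are hypothetical.

```python
from itertools import combinations

def pairwise_disjoint(edges):
    return all(set(a).isdisjoint(b) for a, b in combinations(edges, 2))

def sntv_score(committee, top_choice):
    """SNTV: each voter contributes 1 if their top-ranked candidate is selected."""
    return sum(1 for c in top_choice if c in committee)

def best_feasible_score(hyperedges, k):
    m = len(hyperedges)
    edge_candidates = [("edge", j) for j in range(m)]
    padding = [("pad", j) for j in range(k)]   # assumption: k group-free candidates
    top_choice = list(edge_candidates)         # voter j ranks edge-candidate j first
    best = None
    for committee in combinations(edge_candidates + padding, k):
        chosen_edges = [hyperedges[j] for kind, j in committee if kind == "edge"]
        # feasibility: every vertex-group is hit at most once
        if pairwise_disjoint(chosen_edges):
            score = sntv_score(set(committee), top_choice)
            best = score if best is None else max(best, score)
    return best

def max_matching_size(hyperedges):
    return max(size for size in range(len(hyperedges) + 1)
               if any(pairwise_disjoint(sub)
                      for sub in combinations(hyperedges, size)))

hyperedges = [(1, 2, 3), (3, 4, 5), (4, 5, 6), (6, 7, 8)]
assert best_feasible_score(hyperedges, 3) == max_matching_size(hyperedges)
```

The padding candidates guarantee that the all-padding committee is feasible, so the instance is always satisfiable while its optimal score still encodes the matching size.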

## 6 Algorithmic Results

In this section we present our main algorithmic results. We first introduce a useful notion called “multilinear extension” for monotone submodular optimization.

###### Definition 12

(Multilinear extension) Given a monotone submodular function $f$ defined on subsets of $[m]$, the multilinear extension $F$ is defined as follows: For $y \in [0,1]^m$, let $\hat{y}$ denote a random vector in $\{0,1\}^m$ where the $i$th coordinate is independently rounded to 1 with probability $y_i$ and to 0 otherwise. Then we let

$$F(y) = \mathbb{E}\left[f(\hat{y})\right] = \sum_{R \subseteq [m]} f(R) \prod_{i \in R} y_i \prod_{j \notin R} (1 - y_j).$$

Let .

###### Lemma 13

[8] For any and , is convex.
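As a small illustration of Definition 12, the sketch below evaluates the multilinear extension both exactly (by enumerating all subsets, which is exponential in $m$) and approximately (by sampling the random rounding). The coverage function used as $f$ and all function names are assumptions made for the example.

```python
import random
from itertools import combinations

def multilinear_extension(f, y):
    """Exact F(y): sum over all R of f(R) times the probability of rounding to R."""
    m = len(y)
    total = 0.0
    for size in range(m + 1):
        for R in combinations(range(m), size):
            chosen = set(R)
            prob = 1.0
            for i in range(m):
                prob *= y[i] if i in chosen else 1.0 - y[i]
            total += f(frozenset(R)) * prob
    return total

def sampled_extension(f, y, samples=20000, seed=0):
    """Monte Carlo estimate: round each coordinate independently, average f."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        R = frozenset(i for i, yi in enumerate(y) if rng.random() < yi)
        acc += f(R)
    return acc / samples

# Example f: a coverage function, which is monotone submodular.
SETS = [{1, 2}, {2, 3}, {3}]

def coverage(R):
    return len(set().union(*(SETS[i] for i in R)))

y = [0.5, 0.5, 0.0]
exact = multilinear_extension(coverage, y)
estimate = sampled_extension(coverage, y)
```

At integral points the extension agrees with $f$ itself, which is a quick way to test an implementation.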

Define as the family of all size- committees that satisfy the fairness constraints. Denote a polytope to be the set of all vectors that satisfy the cardinality constraint and all fairness constraints. Let denote the multilinear extension of the total score function . Let be the optimal score of the constrained MS multiwinner voting problem.

### 6.1 The Case of Δ≥2

Theorem 11 implies that it may be hard to find a committee only violating the fairness constraints by a small amount when . On the other hand, the following theorem shows that a constant approximation solution can be achieved that violates all fairness constraints by at most a multiplicative factor for feasible instances.

###### Theorem 14

(Bi-criterion algorithm when ) Consider a feasible constrained MS multiwinner voting instance with . Let and . There exists a polynomial-time algorithm that outputs a size- committee with score with constant probability, and satisfies the following for all :

$$\left(1 - \frac{2\sqrt{\ln p}}{\sqrt{L}}\right)\ell_i \;\le\; |S \cap P_i| \;\le\; \left(1 + \frac{2\sqrt{\ln p}}{\sqrt{U}}\right) u_i. \tag{6}$$

The approximation ratio is for the SNTV rule.

Before proving this theorem, we discuss the assumption and the consequences of Theorem 14. Firstly, the assumption that is reasonable for many voting rules, such as the CC rule, the OWA-based rule, and the Monroe rule. The reason is that if enough (say at least ) voters have at least one representative in the optimal committee, then the total score of these rules is usually at least .

Under reasonable assumptions, the violation in fairness constraints in the above theorem can be seen to be small. First, assume that no group is too small: Groups corresponding to gender, ethnicity and political opinions are often large. Combining this with the proportional representative condition (see Equations (4) and (5) in Section 3.2):

$$k \gg 100 \ln p,$$

and

$$\ell_i \approx k \cdot \left(\frac{|P_i|}{m} - 0.05\right), \qquad u_i \ge k \cdot \left(\frac{|P_i|}{m} + 0.05\right)$$

for all , we observe that is a small number. Thus, the violation of the group-fairness condition by Theorem 14 is small. We expect such algorithmic solutions to be deployed for the development of automated systems, such as movie selection on the airplane and news recommendation for websites, for which the above assumptions are natural.
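To get a feel for the size of the violation in Equation (6), the multiplicative slack can be computed directly. The sketch below assumes that $L$ and $U$ are the smallest lower and upper bounds over all groups (consistent with how they are used in the proof); the function name and the numbers in the example are hypothetical.

```python
import math

def violation_factors(lower, upper):
    """Return (2*sqrt(ln p)/sqrt(L), 2*sqrt(ln p)/sqrt(U)) for p groups.

    Assumption: L = min(lower) and U = min(upper).
    """
    p = len(lower)
    L, U = min(lower), min(upper)
    root_log = math.sqrt(math.log(p))
    return 2 * root_log / math.sqrt(L), 2 * root_log / math.sqrt(U)

# Hypothetical example: p = 4 groups, each a 25% share of the candidates,
# committee size k = 1000, bounds chosen as share -/+ 0.05.
k, shares = 1000, [0.25, 0.25, 0.25, 0.25]
lower = [k * (s - 0.05) for s in shares]
upper = [k * (s + 0.05) for s in shares]
slack_low, slack_up = violation_factors(lower, upper)
```

In this example both slacks are below 0.17, so each bound is violated by well under 20 percent; the slack shrinks further as $k$ grows relative to $\ln p$.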

Proof: [of Theorem 14] Let be any given constant. We first obtain a fractional solution by using the continuous greedy algorithm in [8, Section 3.1] and using the fact that there is an efficient algorithm to optimize a linear function over . It follows that by [8, Appendix A]. Next, we run the randomized swap rounding algorithm in [12] and obtain a size- committee . By [12, Theorem 2.1], we have and for any and any ,

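$$\Pr\left[|S \cap P_i| \le (1-\delta_1)\ell_i\right] \le e^{-\ell_i \delta_1^2/2}, \qquad \Pr\left[|S \cap P_i| \ge (1+\delta_2) u_i\right] \le \left(\frac{e^{\delta_2}}{(1+\delta_2)^{1+\delta_2}}\right)^{u_i}. \tag{7}$$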

Then we have the following corollary.

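$$\begin{aligned}
\Pr\left[|S \cap P_i| \ge (1+\delta_2) u_i\right]
&\le \left(\frac{e^{\delta_2}}{(1+\delta_2)^{1+\delta_2}}\right)^{u_i} && (\text{Inequality (7)}) \\
&\le \left(\frac{e^{\delta_2}}{e^{\delta_2(1+\delta_2)}}\right)^{u_i} && \left(e \le (1+\delta_2)^{1/\delta_2}\right) \\
&= e^{-u_i \delta_2^2}.
\end{aligned} \tag{8}$$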

Let $\delta_1 = 2\sqrt{\ln p}/\sqrt{L}$ and $\delta_2 = 2\sqrt{\ln p}/\sqrt{U}$. By the union bound, we have the following inequality

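$$\begin{aligned}
&\Pr\left[\forall i \in [p],\ \left(1 - \tfrac{2\sqrt{\ln p}}{\sqrt{L}}\right)\ell_i \le |S \cap P_i| \le \left(1 + \tfrac{2\sqrt{\ln p}}{\sqrt{U}}\right) u_i\right] \\
&\quad \ge 1 - \sum_{i \in [p]} \Pr\left[|S \cap P_i| \le \left(1 - \tfrac{2\sqrt{\ln p}}{\sqrt{L}}\right)\ell_i\right] - \sum_{i \in [p]} \Pr\left[|S \cap P_i| \ge \left(1 + \tfrac{2\sqrt{\ln p}}{\sqrt{U}}\right) u_i\right] && (\text{union bound}) \\
&\quad \ge 1 - \sum_{i \in [p]} e^{-\ell_i \left(2\sqrt{\ln p}/\sqrt{L}\right)^2/2} - \sum_{i \in [p]} e^{-u_i \left(2\sqrt{\ln p}/\sqrt{U}\right)^2} && (\text{Inequalities (7) and (8)}) \\
&\quad \ge 1 - \sum_{i \in [p]} e^{-2\ell_i \ln p / L} - \sum_{i \in [p]} e^{-4 u_i \ln p / U} \\
&\quad \ge 1 - \sum_{i \in [p]} e^{-2\ln p} - \sum_{i \in [p]} e^{-4\ln p} && (\text{definitions of } L, U) \\
&\quad = 1 - p \cdot \frac{1}{p^2} - p \cdot \frac{1}{p^4} \ \ge\ 1 - \frac{2}{p},
\end{aligned}$$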

which shows that $S$ satisfies equation (6) with probability at least $1 - 2/p$. On the other hand, we have for any , by [12, Theorem 2.2]. Let . Since , we have with probability . Combining all of the above, along with a union bound, we conclude the proof of the theorem.

Due to the fact that the total score function in the SNTV rule is linear, we obtain a fractional solution with in the first stage. Then by the same argument, the resulting committee satisfies that .

We now present our algorithmic results when there are only one-sided (upper or lower) fairness constraints. We first consider the case that all , i.e., each group is only required not to be over-represented. In this case, we call an algorithm -approximation () if the algorithm outputs a -approximate solution which violates all constraints by a factor of at most , i.e., holds for all .

###### Theorem 15

(Bi-criterion algorithm when ) Consider the constrained MS multiwinner voting problem satisfying for all . Suppose we have a feasible solution in advance. For any constant , the following claims hold

1. There exists a -approximation algorithm that runs in time.

2. For any constant , suppose that and . There exists a polynomial-time -approximation algorithm.

Proof Sketch. Define to be the collection of committees that have at most candidates and satisfy all fairness constraints. For the first claim, observe that is a -extendible system. (A pair , where is a finite set and is a collection of subsets of , is called a -extendible system if 1) (down-closed) and imply ; 2) (-extendible) for any and any element , if and , then there must exist a subset with such that .) Since is monotone submodular, we reduce the problem of finding to the monotone submodular maximization problem with -extendible system, which has a -approximation algorithm in time by [19, Theorem 1]. Therefore, we can compute a committee with and in time using Theorem 1 from [19]. Since , we have that and satisfies all fairness constraints. By the condition of the theorem, there exists a feasible solution in advance, i.e., and satisfies all fairness constraints. Then from , we arbitrarily select a set of candidates. Since both and satisfy all fairness constraints, we have for all ,

$$|(S_1 \cup S_2) \cap P_i| \;\le\; |S_1 \cap P_i| + |S_2 \cap P_i| \;\le\; u_i + u_i = 2u_i.$$

By monotonicity, we have . Therefore, we conclude that the committee is a -approximation solution.
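The two-stage structure of the first claim (optimize over a restricted system, then pad with members of the known feasible committee) can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of [19]: plain greedy is swapped in for the first stage, the size cap is passed as a parameter `cap` because its exact value is not reproduced here, and all names are hypothetical.

```python
def greedy_then_pad(candidates, groups, upper, score, feasible, k, cap):
    """Stage 1: greedy under the upper bounds and a size cap.
    Stage 2: pad to size k using the known feasible committee."""
    def allowed(S):
        return len(S) <= cap and all(
            len(S & groups[g]) <= upper[g] for g in groups)

    chosen = set()
    while True:
        best, best_gain = None, 0.0
        for c in candidates:
            if c in chosen or not allowed(chosen | {c}):
                continue
            gain = score(chosen | {c}) - score(chosen)
            if best is None or gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        chosen.add(best)

    # |feasible| = k, so enough unused members remain to reach size k
    padding = [c for c in feasible if c not in chosen][: k - len(chosen)]
    return chosen | set(padding)

# Hypothetical toy instance with a modular (hence submodular) score.
weights = [5, 4, 3, 2, 1, 1]
groups = {"A": {0, 1, 2}, "B": {3, 4, 5}}
upper = {"A": 2, "B": 2}
committee = greedy_then_pad(
    candidates=range(6), groups=groups, upper=upper,
    score=lambda S: sum(weights[c] for c in S),
    feasible=[0, 1, 3, 4], k=4, cap=2)
# each group is over-represented by at most a factor of 2
assert all(len(committee & groups[g]) <= 2 * upper[g] for g in groups)
```

Because both stages individually respect the upper bounds, their union exceeds any bound by at most a factor of two, mirroring the displayed inequality above.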

For the second claim, we compute a fractional solution constrained to defined as follows: . Then we round it to a committee using the swap randomized rounding procedure in [12]. By [12, Theorem 5.2], can be guaranteed to be in , and . Then by the same argument as in the first claim, we select more candidates from and obtain a