Abstract
A number of machine learning (ML) methods have been proposed recently to maximize model predictive accuracy while enforcing notions of group parity or fairness across sub-populations. We propose a desirable property for these procedures, slack-consistency: for any individual, the predictions of the model should be monotonic with respect to allowed slack (i.e., maximum allowed group-parity violation). Such monotonicity can be useful for individuals to understand the impact of enforcing fairness on their predictions. Surprisingly, we find that standard ML methods for enforcing fairness violate this basic property. Moreover, this undesirable behavior arises in situations agnostic to the complexity of the underlying model or approximate optimizations, suggesting that the simple act of incorporating a constraint can lead to drastically unintended behavior in ML. We present a simple theoretical method for enforcing slack-consistency, while encouraging further discussion of the unintended behaviors potentially induced when enforcing group-based parity.
Nachum and Jiang
Group-based Fair Learning Leads to Counter-intuitive Predictions
Ofir Nachum & Heinrich Jiang
Google Research & Google Research
1 Introduction
Algorithmic fairness in machine learning (ML) has recently become an important concern. Without appropriate intervention during the preprocessing, training, or inference stages of an ML procedure, the resulting model can be biased against certain groups [2, 14, 15]. Accordingly, enforcing group-based fairness notions such as demographic parity [8] and equal opportunity [14] is a difficult and active research problem.
One common approach to enforcing fairness is post-processing. In post-processing, one first learns a score function without concern for fairness, and then decision thresholds are chosen for each group to ensure that various notions of group parity are satisfied [7, 11, 14]. Beyond post-processing, constrained optimization provides another common approach, in which enforcing group parity is framed as a constraint on a minimum-loss objective. The resulting constrained optimization problem is then solved through the use of Lagrange multipliers [22, 12, 9, 5, 6, 1, 16].
In practice, a model is usually not required to perfectly satisfy the fairness constraint; rather, some amount of slack is allowed to trade off a permissible degree of bias for better accuracy [22]. In this paper, we explore the behavior of standard machine learning fairness procedures as this amount of allowable slack changes. We propose a desirable property, slack-consistency, which requires that fair learning procedures give predictions that are monotonic in the amount of slack allowed. Such slack-consistency is intuitive: one would expect that, for any given individual, there would be a prediction under no slack (i.e., perfectly satisfying the fairness constraint) and a prediction under infinite slack (i.e., unconstrained), and that for any slack in between, the predictions would change monotonically between these two extremes. Moreover, given this behavior, we can move towards explainable ML fairness, where individuals can understand how their predictions are affected by the amount of enforced fairness, thus making the process more transparent.
We show that, surprisingly, popular group-based fairness methods fail to satisfy slack-consistency on both real-world and simple synthetic datasets, and this failure arises in situations agnostic to the complexity of the underlying model. Even for the post-processing method, which may be implemented agnostic to optimization errors (using an exhaustive search over thresholds), slack-inconsistency can arise in a variety of settings. Our findings thus show that the consequences of imposing group-based fairness notions on ML models are poorly understood, and these consequences are often at odds with intuitive beliefs of what fairness should encourage an ML model to do. We propose a simple ML fairness method which provably possesses slack-consistency, while at the same time encouraging further discussion on the utility of imposing group-based fairness notions in general.
2 Background
We consider a fair machine learning setting, in which individuals correspond to pairs $(x, a)$, where $x \in \mathcal{X}$ is a vector of features associated with the individual and $a \in \mathcal{A}$ is an additional feature corresponding to group membership. At times we will write $a(x)$ as a function which returns the group membership of $x$. For simplicity, we assume two groups; i.e., $\mathcal{A} = \{0, 1\}$. We are given some dataset $\mathcal{D} = \{(x_i, a_i, y_i)\}_{i=1}^{n}$, where $y_i \in \mathcal{Y}$ is an observed label. For simplicity, we consider the binary classification setting; i.e., $\mathcal{Y} = \{0, 1\}$.
In standard machine learning, one is tasked with finding some (potentially stochastic) classifier $f$ within some family $\mathcal{F}$ which minimizes a loss function $L(f; \mathcal{D})$ on the dataset (e.g., misclassification rate). When group-based fairness notions are imposed, the task is modified to finding an optimal-loss classifier within the subset of unbiased functions of $\mathcal{F}$; i.e., $\min_{f \in \mathcal{F}} L(f; \mathcal{D})$ subject to $B(f; \mathcal{D}) = 0$, where $B$ measures the bias of $f$ on $\mathcal{D}$.
Many works in the literature express the bias function in terms of some disparity between the predictions of $f$ on the two groups. For example, demographic parity [8] measures the unfairness of $f$ as the difference in positive prediction rates:
$B_{\mathrm{DP}}(f; \mathcal{D}) = \Pr(f(x, a) = 1 \mid a = 0) - \Pr(f(x, a) = 1 \mid a = 1).$
Demographic parity, although simple, has been criticized for unnecessarily encouraging poor classifiers in pursuit of fairness [14, 8, 17]. Accordingly, some works propose to define bias in terms of equal opportunity [14], which measures the disparity in true positive prediction rates:
$B_{\mathrm{EO}}(f; \mathcal{D}) = \Pr(f(x, a) = 1 \mid a = 0, y = 1) - \Pr(f(x, a) = 1 \mid a = 1, y = 1).$
Unlike demographic parity, the perfect classifier (one which always predicts the true label) always satisfies equal opportunity.
Additional variants of these constraints exist in the literature. For example, disparate impact [10] enforces demographic parity while restricting the classifier to not use the protected attribute $a$ in its prediction. The notion of equalized odds [14] augments equal opportunity to enforce both equal true positive rates and equal false positive rates. For simplicity, we restrict our focus in this paper to the notions of demographic parity and equal opportunity, although our discussions and conclusions may easily extend to these more complicated notions of fairness.
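As a concrete illustration (our own sketch, not part of the original formulation), both parity notions above can be computed directly from a model's hard predictions; the function names and the 0/1 group encoding are ours:

```python
import numpy as np

def demographic_parity_gap(pred, group):
    """Difference in positive prediction rates between groups 0 and 1."""
    pred, group = np.asarray(pred), np.asarray(group)
    return pred[group == 0].mean() - pred[group == 1].mean()

def equal_opportunity_gap(pred, group, label):
    """Difference in true positive rates between groups 0 and 1."""
    pred, group, label = (np.asarray(v) for v in (pred, group, label))
    tprs = [pred[(group == g) & (label == 1)].mean() for g in (0, 1)]
    return tprs[0] - tprs[1]
```

A slack constraint would then bound the absolute value of either gap by some $\epsilon$.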
3 Slack-Consistency
In many instances of fair machine learning, a model is not enforced to be perfectly fair. Rather, it is enforced to be fair within some slack; i.e., it is allowed to have some bias, but that bias must be bounded, $B(f; \mathcal{D}) \le \epsilon$, or bounded absolutely, $|B(f; \mathcal{D})| \le \epsilon$, for some $\epsilon \ge 0$. The reasons for this are twofold: first, a perfectly fair, zero-bias classifier may not be feasible (or desirable) in practice. Second, the initial motivation for learning a fair classifier is often expressed in terms of some allowed amount of bias. For example, legal definitions of fairness often invoke the $p\%$-rule (commonly, the $80\%$ rule): the ratio between the rates of positively predicted individuals in one group versus another should not fall below $p/100$ [22].
Since any ML fairness optimization is affected by the allowed slack, and this slack is specified by the problem formulation, it may be important to understand the precise ways in which a choice of $\epsilon$ affects the final classifier. We propose the following property:
Definition 1 (Slack-consistency).
A procedure for learning fair classifiers with respect to some amount of allowed slack $\epsilon$ and a fixed dataset is slack-consistent if, for any individual, the predictions of the learned classifier for this individual are monotonic with respect to $\epsilon$.
More precisely, let us denote the procedure for learning fair classifiers as $M(\epsilon) = f_\epsilon$; i.e., the result of running $M$ with slack $\epsilon$ is a model $f_\epsilon$ that takes in the features and group attribute of some individual and returns a probability of classifying this individual positively. Then, $M$ is slack-consistent if the predictions for any individual are monotonic with respect to $\epsilon$. That is, for any $\epsilon_1 \le \epsilon_2 \le \epsilon_3$, we must have either $f_{\epsilon_1}(x, a) \le f_{\epsilon_2}(x, a) \le f_{\epsilon_3}(x, a)$ or $f_{\epsilon_1}(x, a) \ge f_{\epsilon_2}(x, a) \ge f_{\epsilon_3}(x, a)$.
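The monotonicity requirement in the definition is easy to check mechanically. Below is a minimal sketch (ours, not from the original work) that tests whether a sequence of predictions for one individual, ordered by increasing slack, is monotone in either direction:

```python
def is_slack_consistent(preds_by_slack):
    """True if the prediction sequence (ordered by increasing slack epsilon)
    is monotone non-decreasing or monotone non-increasing."""
    p = list(preds_by_slack)
    non_dec = all(a <= b for a, b in zip(p, p[1:]))
    non_inc = all(a >= b for a, b in zip(p, p[1:]))
    return non_dec or non_inc
```

For example, the sequence 0.1, 0.6, 0.3 of soft predictions across three increasing slack values would witness a slack-inconsistency.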
We argue that slack-consistency is a reasonable, intuitive, and desirable property for machine learning methods. For example, consider an individual from a disadvantaged group who would be positively labeled with no fairness enforcement (unbounded slack $\epsilon = \infty$). Slack-consistency ensures that when fairness is enforced ($\epsilon < \infty$), the same individual should not be assigned a negative prediction (assuming a positive prediction at maximal favoring of the group, $\epsilon = 0$). Otherwise, we would be unfairly disadvantaging the individual in the process of attempting to undo a disadvantage for the individual's group. The slack-inconsistency of a model in this case may be interpreted as an implementation of a self-fulfilling prophecy (see [8], "Catalog of Evils"); i.e., a vendor maliciously chooses the 'wrong' members of a protected group to predict positively, ensuring that a future analysis will find that membership in the protected group is associated with a lower likelihood of positive outcomes. In the converse setting, an individual from the advantaged group who would be negatively labeled with no fairness enforcement should not be positively labeled when fairness is enforced.
The rest of the paper is organized as follows. In Section 4 we investigate the behavior of popular methods for learning fair ML models in terms of slack-consistency, starting with constrained optimization and then focusing on post-processing. Surprisingly, we find that these methods fail to satisfy slack-consistency in almost all settings. Then, in Section 5 we present a simple theoretical method that satisfies slack-consistency given a Bayes-optimal score function and randomized classifiers, although we concede that practical scenarios often do not permit such classifiers.
4 Popular Methods Fail to Satisfy Slack-Consistency
For simplicity, we consider the absolutely bounded bias setting, in which the bias of $f$ is restricted to $|B(f; \mathcal{D})| \le \epsilon$ for $\epsilon \ge 0$, although our findings can be generalized. We will show that the two common methods for learning fair classifiers, constrained optimization via the method of Lagrange multipliers and post-processing via exhaustive threshold search, often fail to satisfy slack-consistency.
For our experiments on non-synthetic data, we employ the following datasets:


ProPublica’s COMPAS recidivism data. The task is to predict recidivism based on criminal history, jail and prison time, demographics, and risk scores. We preprocess this dataset in a similar way as the Adult dataset and use two gender-based protected groups.
4.1 Constrained Optimization
In the constrained optimization approach, we have a loss function $L(\theta)$ and a fairness constraint $c(\theta) \le 0$ over parameter space $\Theta$. The Lagrangian is $\mathcal{L}(\theta, \lambda) = L(\theta) + \lambda\, c(\theta)$, where $\lambda \ge 0$, and the goal is to find a solution to $\min_{\theta} \max_{\lambda \ge 0} \mathcal{L}(\theta, \lambda)$. In fairness problems, the loss function is taken to be the usual loss function for a model (e.g., hinge loss) and the fairness constraints (possibly with slack) are typically relaxed so that they are differentiable (e.g., a hinge relaxation), allowing us to alternately apply SGD to minimize $\mathcal{L}$ in $\theta$ and maximize it in $\lambda$ until convergence. This is the approach a number of works adopt [22, 9, 12, 5].
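The alternating descent/ascent scheme described above can be sketched as follows. This is our own illustrative implementation, assuming a linear model with hinge loss and a linear relaxation of the demographic-parity constraint; it is not the exact procedure of the cited works, and all function and variable names are ours:

```python
import numpy as np

def fair_lagrangian_sgd(X, y, g, eps, lr=0.1, lr_lam=0.1, epochs=200, seed=0):
    """Joint gradient descent/ascent on L(theta) + lam * (DP violation - eps):
    descend in the model parameters theta, ascend in the multiplier lam >= 0.
    Linear model, hinge loss, linear relaxation of demographic parity."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=X.shape[1])
    lam = 0.0
    ysign = 2 * y - 1  # labels in {-1, +1}
    mu0, mu1 = X[g == 0].mean(axis=0), X[g == 1].mean(axis=0)
    for _ in range(epochs):
        scores = X @ theta
        # subgradient of the hinge loss mean(max(0, 1 - ysign * score))
        active = (ysign * scores < 1).astype(float)
        grad_loss = -(active * ysign) @ X / len(X)
        # relaxed DP violation: |E[score | g=0] - E[score | g=1]| - eps
        d = mu0 @ theta - mu1 @ theta
        grad_con = np.sign(d) * (mu0 - mu1)
        theta -= lr * (grad_loss + lam * grad_con)       # descent step
        lam = max(0.0, lam + lr_lam * (abs(d) - eps))    # projected ascent step
    return theta, lam
```

Note that the multiplier is projected back onto $\lambda \ge 0$ after each ascent step, mirroring the projected-gradient treatment of the constraint.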
In general settings where $L$ is the loss of a non-convex model such as a neural network, it may not be surprising that properties such as slack-consistency could fail to hold. In the non-convex setting, as pointed out in recent works such as [1, 5], a saddle point of the Lagrangian may not even exist, and thus models may have nothing to converge to. Moreover, even with convergence, there can be multiple solutions with different accuracies and fairness violations which nonetheless attain the same Lagrangian value. Similar behavior can occur with multiple fairness constraints which are in conflict, as is known for equalized odds due to feasibility issues [3, 21, 17].
In this section, we consider the simplest of cases, where we use a linear model and a single fairness constraint (demographic parity or equal opportunity) on just two protected groups. In this case, an optimal saddle point of the Lagrangian exists and joint SGD is guaranteed to converge to it [5]. Despite these restrictions, we find that constrained optimization can still fail to satisfy slack-consistency. We illustrate this in Figure 1 on a number of benchmark fairness datasets: Adult, COMPAS, and Communities and Crime. We train a linear model subject to hinge relaxations of the fairness constraints and jointly train the Lagrangian using the ADAM optimizer under default settings with a fixed minibatch size and number of epochs. We then sort the solutions by the actual fairness violations in training (rather than the violations on the hinge relaxation) to account for the variability between the original and relaxed constraints.
Figure 1 gives us an unsettling realization. It shows that the predictions for each protected group do not satisfy slack-consistency on average, which means that not only are there individuals whose predictions fail slack-consistency, but also that this property is not maintained even at the group level. It is also worth noting that the constrained optimization approach presents another counterintuitive side effect (also seen in Figure 1): the average soft prediction can increase (resp. decrease) while the average thresholded hard prediction decreases (resp. increases).
Overall, we find that the counterintuitive side effects associated with slack-inconsistency can easily arise for constrained optimization, even in the simple case of a linear model.
4.2 Post-Processing
The post-processing method [14] is perhaps one of the simplest approaches to fair classification. It starts with a score function which maps individuals to a continuous value and then selects thresholds for each protected group so that the resulting binary classifier has minimum cost while satisfying the fairness constraints. We provide pseudocode for a slack-enabled version of post-processing in Algorithm 1. Note that we utilize an exhaustive search to find the optimal thresholds. Thus, unlike in the constrained optimization setting, our results are agnostic to any approximations in the optimization.
In general, the post-processing method may require randomized thresholds (equivalent to the quantiles used by [20]). For our discussions, we will express this through the use of normalized thresholds. A normalized threshold $\tau \in [0, 1]$ corresponds to a distribution over at most two (adjacent) thresholds which achieves a positive prediction rate of $\tau$. Although normalized thresholds are required in general, we note that their use is not ideal. In practical scenarios, a stochastic classifier can be seen as either capricious (if only one random classification is allowed per individual) or exploitable (if multiple random classifications are allowed). We will write the loss and bias as functions of these normalized thresholds. We note that slack-consistency of post-processing is equivalent to monotonicity of the chosen normalized thresholds with respect to $\epsilon$.
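A slack-enabled exhaustive search in the spirit of Algorithm 1 can be sketched as below. This is our own simplified rendition, not the paper's algorithm verbatim: it searches a grid of per-group positive-prediction-rate targets (approximating normalized thresholds via score quantiles, so realized positive rates are only approximate on discrete data) and measures the demographic-parity bias as the difference of those targets:

```python
import numpy as np

def postprocess_thresholds(scores, y, g, eps, grid=101):
    """Exhaustive search over per-group positive-rate targets (tau0, tau1),
    minimizing misclassification error subject to |tau0 - tau1| <= eps
    (demographic parity with slack eps)."""
    taus = np.linspace(0, 1, grid)
    best, best_err = None, np.inf
    for t0 in taus:
        for t1 in taus:
            if abs(t0 - t1) > eps:
                continue  # infeasible under the slack constraint
            err = 0.0
            for t, grp in ((t0, 0), (t1, 1)):
                s, lab = scores[g == grp], y[g == grp]
                # predict positive on roughly the top t-fraction of the group
                if t > 0:
                    pred = (s >= np.quantile(s, 1 - t)).astype(int)
                else:
                    pred = np.zeros_like(lab)
                err += (pred != lab).sum()
            err /= len(y)
            if err < best_err:
                best_err, best = err, (t0, t1)
    return best, best_err
```

Because the search is exhaustive over the grid, any slack-inconsistency observed in its output cannot be blamed on optimization error.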
We present several counterexamples, which show that post-processing does not yield slack-consistency. We begin by considering the application of post-processing to a score function which is not Bayes-optimal. This scenario is typical in practice, where the score function is usually some learned function (e.g., a neural network or a decision tree ensemble). Theorem 1 provides an example of a dataset and such a score function for which post-processing yields slack-inconsistent solutions.
Theorem 1 (Slack-inconsistency of post-processing on a non-Bayes-optimal score function).
There exists a distribution and score function such that the post-processing method fails to satisfy slack-consistency.
Proof.
Consider a distribution partitioned into two protected groups, each occurring in equal proportion; let our score function be the (one-dimensional) feature value itself, and suppose that we are in the binary classification setting with the goal of demographic parity. Let the distribution for the first group be as follows:


of the points have uniformly distributed in and label with probability and label otherwise,

of the points have uniformly distributed in and label ,

of the points have uniformly distributed in and label .
Let the distribution of the second group be as follows:


of the points have uniformly distributed in and label ,

of the points have uniformly distributed in and label ,

of the points have uniformly distributed in and label ,

of the points have uniformly distributed in and label .
We plot the misclassification rate with respect to the chosen threshold for each group in Figure 2 (left). Note that at any given threshold, the classifier achieves the same positive prediction rate on either group. For strict fairness constraints ($\epsilon = 0$), the ideal thresholds are thus equal across the two groups. If we increase the allowed slack by some small amount, the ideal threshold for the first group will decrease, since the misclassification error has a positive derivative for the first group at the shared threshold. However, for a large enough slack, the ideal threshold for the first group will be at its global minimum. Therefore, post-processing applied to this example yields slack-inconsistent solutions. ∎
In the previous theorem’s counterexample, the score function was not Bayes-optimal. We next consider a scenario in which we have a Bayes-optimal classifier but are not allowed to employ stochastic classifiers. Indeed, stochastic classifiers are often undesirable in practice, since they can be seen as capricious (why should a random number determine my loan eligibility?) or exploitable (if I get denied a loan on my first try, I will apply again until I get accepted). We show that post-processing in this scenario fails to satisfy slack-consistency.
Theorem 2 (Slack-inconsistency of post-processing on a Bayes-optimal score function with deterministic thresholds).
There exists a distribution for which the score function is Bayes-optimal for each protected group and yet the post-processing method, restricted to deterministic thresholds, fails to satisfy slack-consistency.
Proof.
See Figure 2, right three images. ∎
We conclude our counterexamples for post-processing with Figure 3, which shows that the post-processing method also fails to be slack-consistent on real datasets.
5 Post-Processing with GABOS Functions
While the previous section showed that post-processing yields slack-inconsistent solutions when one lacks either a Bayes-optimal classifier or access to stochastic thresholds, in this section we show that post-processing is guaranteed to be slack-consistent given these two assumptions.
We begin by defining a Bayes-optimal score function. Notably, the function must be Bayes-optimal with respect to both the features and the group membership attribute (rather than with respect to only the features of the individual, which is how ML models are typically trained).
Definition 2 (Group-aware Bayes-optimal score (GABOS) functions).
We say that a score function $h$ is group-aware Bayes-optimal with respect to a dataset $\mathcal{D}$ if its output is the empirical probability of an individual being labelled positively; i.e., $h(x, a) = \Pr_{\mathcal{D}}(y = 1 \mid x, a)$.
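On finite data with hashable feature vectors, a GABOS function amounts to a lookup table of empirical positive rates per (features, group) cell. A minimal sketch (ours; the function name and data layout are assumptions):

```python
from collections import defaultdict

def fit_gabos(examples):
    """Empirical group-aware Bayes-optimal score: for each (features, group)
    cell, the fraction of positively labelled examples.
    `examples` is an iterable of (x, a, y) triples with hashable x."""
    pos, tot = defaultdict(int), defaultdict(int)
    for x, a, y in examples:
        tot[(x, a)] += 1
        pos[(x, a)] += y
    table = {k: pos[k] / tot[k] for k in tot}
    return lambda x, a: table[(x, a)]
```

Note that the score is indexed by the group attribute as well as the features, which is precisely what distinguishes a GABOS function from a typical learned score.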
Theorem 3 (Consistency of post-processing on GABOS functions).
Suppose $h$ is a GABOS function with respect to $\mathcal{D}$, and that the loss measures the misclassification error of a thresholding of $h$ with respect to $\mathcal{D}$. Furthermore, suppose the bias measures demographic parity or equal opportunity. Then the post-processing method (Algorithm 1) applied to $h$ yields slack-consistent solutions.
Proof.
The setting of the theorem allows us to split the loss and bias into functions of the two separate groups:
(1) $L(\tau_0, \tau_1) = L_0(\tau_0) + L_1(\tau_1)$,
(2) $B(\tau_0, \tau_1) = b_0(\tau_0) - b_1(\tau_1)$,
where $\tau_g$ denotes the normalized threshold used for group $g$. We note that in the considered setting, the functions $L_g$ are convex with respect to $\tau_g$ and the functions $b_g$ are monotonically decreasing. We will find it useful to consider these functions in terms of the biases induced by $\tau_g$. That is, we express the per-group losses in terms of the per-group biases as,
(3) $\ell_0(\beta_0) = L_0(b_0^{-1}(\beta_0))$,
(4) $\ell_1(\beta_1) = L_1(b_1^{-1}(\beta_1))$,
where we compute $\beta_g$ as $\beta_g = b_g(\tau_g)$. We note that in the considered setting, the range of $b_g$ is $[0, 1]$. Furthermore, the functions $\ell_g$ maintain their convexity with respect to $\beta_g$ in the case of demographic parity and equal opportunity with a Bayes-optimal score function (see the appendix for a short proof). We denote the left and right subdifferentials of $\ell_g$ by,
(5) $\partial^{-}\ell_g(\beta) = \lim_{\delta \to 0^{+}} \frac{\ell_g(\beta) - \ell_g(\beta - \delta)}{\delta}$,
(6) $\partial^{+}\ell_g(\beta) = \lim_{\delta \to 0^{+}} \frac{\ell_g(\beta + \delta) - \ell_g(\beta)}{\delta}$.
Consider a solution returned by post-processing with slack $\epsilon$. Let $\beta_0, \beta_1$ denote the induced per-group biases. Without loss of generality, we may assume $\beta_0 \ge \beta_1$; i.e., $B = \beta_0 - \beta_1 \ge 0$. By the optimality of the solution we have,
(7) $\partial^{-}\ell_0(\beta_0) \le 0$.
Otherwise, we would be able to achieve lower bias and lower loss simultaneously. By the same logic we have,
(8) $\partial^{+}\ell_1(\beta_1) \ge 0$.
Now consider an analogous solution for some $\epsilon' \le \epsilon$, with corresponding bias values $\beta_0', \beta_1'$. We will show that these biases change monotonically through a sequence of three claims:
Claim 1: . Proof: Suppose otherwise; i.e., . Then we must have at least one of or (otherwise we contradict ). Suppose, first, that . By the convexity of , we have,
(9) 
This means that we may increase to simultaneously lower the loss and bias; contradiction. The same logic for the case of leads to an analogous contradiction.
Claim 2: . Proof: Suppose otherwise; i.e., . Let (see Figure 4 for a pictorial presentation). Since is convex we have,
(10) 
Note that implies that . Combining this with the fact that we have,
(11) 
Furthermore we have,
(12) 
and,
(13) 
implying that . Thus by the convexity of , we have . Recalling that , we have,
(14) 
Combining equations 10, 11, 14 we have,
(15) 
In conjunction with Equation 8, this means that is a feasible solution for with lower loss; contradiction.
Claim 3: . Proof: Analogous to Claim 2.
These claims show that the biases of the optimal solution are monotonic in $\epsilon$. Accordingly, the thresholds are monotonic as well, which implies that the solutions are slack-consistent, as desired. ∎
Given the previous theoretical result, we propose to learn a fair ML classifier by first learning a GABOS function and then applying post-processing. This procedure is summarized in Algorithm 2. Learning a suitable GABOS function may be performed in an unsupervised manner (e.g., using clustering) or in a supervised manner (e.g., using decision trees to minimize loss). We have the following result, which ensures that Algorithm 2 yields slack-consistent solutions:
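The pipeline above can be sketched end-to-end as follows. This is our own toy rendition, not Algorithm 2 verbatim: the partitioning step uses naive equal-width binning of the first feature (a stand-in for the clustering or decision-tree partitioning suggested above), and the thresholding step reuses an exhaustive grid search over per-group positive-rate targets:

```python
import numpy as np

def gabos_postprocess(X, y, g, eps, n_bins=8):
    """Sketch of GABOS learning + post-processing: partition the feature
    space, score each (partition, group) cell by its empirical positive
    rate, then choose per-group positive-rate targets subject to the
    demographic-parity slack eps."""
    edges = np.linspace(X[:, 0].min(), X[:, 0].max(), n_bins)
    bins = np.digitize(X[:, 0], edges)
    score = np.zeros(len(y))
    for b in np.unique(bins):
        for grp in (0, 1):
            cell = (bins == b) & (g == grp)
            if cell.any():
                score[cell] = y[cell].mean()  # GABOS value for this cell
    # exhaustive search over per-group positive-rate targets
    taus = np.linspace(0, 1, 51)
    best, best_err = None, np.inf
    for t0 in taus:
        for t1 in taus:
            if abs(t0 - t1) > eps:
                continue
            pred = np.zeros(len(y), dtype=int)
            for t, grp in ((t0, 0), (t1, 1)):
                s = score[g == grp]
                if t > 0:
                    pred[g == grp] = (s >= np.quantile(s, 1 - t)).astype(int)
            err = (pred != y).mean()
            if err < best_err:
                best_err, best = err, (t0, t1)
    return best, best_err
```

The choice of partitioner only affects the quality of the GABOS approximation; the slack-consistency argument concerns the thresholding stage.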
Theorem 4 (Consistency of Algorithm 2).
Performing GABOS learning with post-processing yields slack-consistent solutions.
Proof.
Algorithm 2 essentially performs post-processing with respect to a GABOS function on a simplified, partitioned dataset. This yields slack-consistent solutions with respect to the reduced (partition-level) features. Now consider some arbitrary individual (not necessarily in the original training set). This individual will be mapped to a single partition regardless of slack. Therefore, its predictions will be monotonic with slack. ∎
Although we possess a solution to slack-consistency, we stress that this solution is not ideal for many practical scenarios, for two reasons. First, the use of stochastic thresholds is undesirable since, as mentioned before, they may be seen as either capricious or exploitable. Second, access to the group membership of an individual is often not available during inference (e.g., in web applications).
6 Discussion
Our work suggests that there is much to explore to make ML fairness methods more transparent. Conventional wisdom in machine learning suggests that lack of transparency and explainability arises from the use of complicated models. However, our work shows that in the context of ML fairness, even with the simplest underlying models and the most straightforward training procedures, introducing a single fairness constraint can have significant consequences for the understandability of the model. We may compare and contrast this with the phenomenon of adversarial examples in neural networks [13]. In the adversarial-example setting, the complexity of the model leads to drastically unintended behavior. In our setting, it is the introduction of simple group-parity constraints which leads to counterintuitive behavior. We encourage researchers and practitioners to be wary of such complexity that may be introduced through seemingly simple augmentations to their models or training procedures.
Our findings also uncover a stark disconnect between group-based fairness metrics and intuitive notions of fairness. Previous works have noted the disconnect between group parity and individual notions of fairness [8], as well as between group-fair classifiers and the future impact of those classifiers' decisions [19]. Our work provides further evidence of this disconnect through the notion of slack-consistency. Notably, our counterexamples show that standard methods for ML fairness violate slack-consistency both for individuals and for groups as a whole on average, even when these individuals and groups come from the same data on which the model was trained. Thus, we encourage researchers to reassess the utility of using group parity as a proxy for fairness.
To conclude, we reaffirm that slack-consistency is a generally desirable behavior and can protect an ML model from a wide range of unreasonable behaviors. As argued previously, it is natural to expect that, for any individual, there would be a prediction under no slack (i.e., perfectly satisfying the fairness constraint) and a prediction under infinite slack (i.e., unconstrained), and that for any slack in between, the predictions would change monotonically between these two extremes. This way, an individual would under no circumstances be unfairly treated for the benefit of the group. Slack-consistency can also encourage predictions to be more robust: in practice, models may have to be retrained frequently, and slack-consistency can ensure that small (or even no) changes in the fairness requirements do not lead to unreasonable changes in individual predictions. For these reasons, researchers and practitioners may find it beneficial to enforce slack-consistency itself in order to better guarantee a classifier's behavior.
References
[1] (2018) A reductions approach to fair classification. arXiv preprint arXiv:1803.02453. Cited by: §1, §4.1.
[2] (2016-05) Machine bias. Note: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing (Accessed on 07/18/2018). Cited by: §1.
[3] (2017) Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5 (2), pp. 153–163. Cited by: §4.1.
[4] (2018) Training well-generalizing classifiers for fairness metrics and other data-dependent constraints. arXiv preprint arXiv:1807.00028. Cited by: 2nd item.
[5] (2019) Two-player games for efficient non-convex constrained optimization. International Conference on Algorithmic Learning Theory (ALT). Cited by: §1, §4.1, §4.1, §4.1.
[6] (2018) Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals. arXiv preprint arXiv:1809.04198. Cited by: §1.
[7] (2012) Information effect of entry into credit ratings market: the case of insurers' ratings. Journal of Financial Economics 106 (2), pp. 308–330. Cited by: §1.
[8] (2012) Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226. Cited by: §1, §2, §3, §6.
[9] (2017) Scalable learning of non-decomposable objectives. In Artificial Intelligence and Statistics, pp. 832–840. Cited by: §1, §4.1.
[10] (2015) Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 259–268. Cited by: §2.
[11] (2015) Computational fairness: preventing machine-learned discrimination. Cited by: §1.
[12] (2016) Satisfying real-world goals with dataset constraints. In Advances in Neural Information Processing Systems, pp. 2415–2423. Cited by: §1, 1st item, §4.1.
[13] (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §6.
[14] (2016) Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pp. 3315–3323. Cited by: §1, §1, §2, §2, §4.2.
[15] (2019) Identifying and correcting label bias in machine learning. arXiv preprint arXiv:1901.04966. Cited by: §1.
[16] (2017) Preventing fairness gerrymandering: auditing and learning for subgroup fairness. arXiv preprint arXiv:1711.05144. Cited by: §1.
[17] (2016) Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807. Cited by: §2, §4.1.
[18] (2013) UCI machine learning repository. Irvine, CA. Cited by: 1st item, 2nd item.
[19] (2018) Delayed impact of fair machine learning. arXiv preprint arXiv:1803.04383. Cited by: §6.
[20] (2018) Proper likelihood ratio based ROC curves for general binary classification problems. arXiv preprint arXiv:1809.00694. Cited by: Appendix A, §4.2.
[21] (2017) Learning non-discriminatory predictors. In Conference on Learning Theory, pp. 1920–1953. Cited by: §4.1.
[22] (2015) Fairness constraints: mechanisms for fair classification. arXiv preprint arXiv:1507.05259. Cited by: §1, §1, §3, 1st item, §4.1.
Appendix A Supporting Theoretical Results
Theorem 5 (Convexity of misclassification error).
Considering all possible normalized thresholds of a Bayes-optimal score function, the misclassification error is a convex function of the true-positive rate (equal opportunity). The misclassification error is also convex with respect to the positive prediction rate (demographic parity).
Proof.
We first note that the ROC of a Bayes-optimal score function is a concave one-to-one function [20]. That is, the true positive rate $\mathrm{TPR}$ is a concave one-to-one function with respect to the false positive rate $\mathrm{FPR}$, and the false positive rate is a convex one-to-one function with respect to the true positive rate.
The misclassification error of a thresholding may be expressed as $\mathrm{err} = p \cdot (1 - \mathrm{TPR}) + (1 - p) \cdot \mathrm{FPR}$, where $p$ and $1 - p$ are the proportions of positive and negative labelled points in the dataset, respectively. Since the first term of this expression is linear and the second term convex with respect to $\mathrm{TPR}$, we conclude that the misclassification error is a convex function of $\mathrm{TPR}$, as desired.
To characterize the misclassification error with respect to the positive prediction rate $r$, we note that $r = p \cdot \mathrm{TPR} + (1 - p) \cdot \mathrm{FPR}$. Since the first term of this expression for $r$ is concave one-to-one and the second term linear increasing with respect to $\mathrm{FPR}$, we deduce that $r$ is a concave one-to-one function with respect to $\mathrm{FPR}$; equivalently, $\mathrm{FPR}$ is a convex one-to-one function with respect to $r$. The misclassification error may then be expressed as $\mathrm{err} = p - r + 2(1 - p) \cdot \mathrm{FPR}(r)$, which is the sum of a constant, a linear, and a convex function with respect to $r$. Thus, we conclude that the misclassification error is a convex function of $r$, as desired. ∎