A number of machine learning (ML) methods have been proposed recently to maximize model predictive accuracy while enforcing notions of group parity or fairness across sub-populations. We propose a desirable property for these procedures, slack-consistency: for any individual, the predictions of the model should be monotonic with respect to the allowed slack (i.e., the maximum allowed group-parity violation). Such monotonicity can help individuals understand the impact of enforcing fairness on their predictions. Surprisingly, we find that standard ML methods for enforcing fairness violate this basic property. Moreover, this undesirable behavior arises regardless of the complexity of the underlying model or any approximations in optimization, suggesting that the simple act of incorporating a constraint can lead to drastically unintended behavior in ML. We present a simple theoretical method for enforcing slack-consistency, while encouraging further discussion of the unintended behaviors potentially induced when enforcing group-based parity.
Group-based Fair Learning Leads to Counter-intuitive Predictions
Ofir Nachum and Heinrich Jiang
Google Research
Algorithmic fairness in machine learning (ML) has recently become an important concern. Without appropriate intervention during the pre-processing, training, or inference stages of an ML procedure, the resulting model can be biased against certain groups [2, 14, 15]. Accordingly, enforcing group-based fairness notions such as demographic parity and equal opportunity is a difficult and active research problem.
One common approach to enforcing fairness is post-processing. In post-processing, one first learns a score function without concerns for fairness, and then decision thresholds are chosen for each group to ensure that various notions of group parity are satisfied [7, 11, 14]. Other than post-processing, constrained optimization provides another common approach, in which enforcing group parity is framed as a constraint on a minimum loss objective. The resulting constrained optimization problem is then solved through the use of Lagrange multipliers [22, 12, 9, 5, 6, 1, 16].
In practice, a model is usually not required to perfectly satisfy the fairness constraint; rather, some amount of slack is allowed, trading off an allowable degree of bias for better accuracy. In this paper, we explore the behavior of standard machine learning fairness procedures as this amount of allowable slack changes. We propose a desirable property, slack-consistency, which expects that fair learning procedures should give predictions that are monotonic in the amount of slack allowed. Such slack-consistency is intuitive: one would expect that for any given individual, there would be a prediction under no slack (i.e., the fairness constraint is perfectly satisfied) and a prediction under infinite slack (i.e., unconstrained), and that for any slack in between, the predictions would change monotonically between these two extremes. Moreover, given this behavior, we can move towards explainable ML fairness, where individuals can understand the impact of the amount of enforced fairness on their predictions, thus making the process more transparent.
We show that, surprisingly, popular group-based fairness methods fail to satisfy slack-consistency on both real-world and simple synthetic datasets, and this failure arises regardless of the complexity of the underlying model. Even for the post-processing method, which may be implemented free of any optimization error (using an exhaustive search over thresholds), slack-inconsistency can arise in a variety of settings. Our findings thus show that the consequences of imposing group-based fairness notions on ML models are poorly understood, and these consequences are often at odds with intuitive beliefs about what fairness should encourage an ML model to do. We propose a simple ML fairness method which provably possesses slack-consistency, but at the same time we encourage further discussion of the utility of imposing group-based fairness notions in general.
We consider a fair machine learning setting, in which individuals correspond to pairs $(x, a)$, where $x \in \mathcal{X}$ is a vector of features associated with the individual and $a$ is an additional feature corresponding to group membership. At times we will write $a(x)$ as a function which returns the group membership of $x$. For simplicity, we assume two groups; i.e., $a \in \{0, 1\}$. We are given some dataset $\mathcal{D} = \{(x_i, a_i, y_i)\}_{i=1}^n$, where $y_i$ is an observed label. For simplicity, we consider the binary classification setting; i.e., $y \in \{0, 1\}$.
In standard machine learning, one is tasked with finding some (potentially stochastic) classifier $f$ within some family $\mathcal{F}$ which minimizes a loss function $L(f; \mathcal{D})$ on the dataset (e.g., mis-classification rate). When group-based fairness notions are imposed, the task is modified to finding an optimal-loss classifier within the subset of unbiased functions of $\mathcal{F}$; i.e., $\{f \in \mathcal{F} : B(f; \mathcal{D}) = 0\}$, where $B$ measures the bias of $f$ on $\mathcal{D}$.
Many works in the literature express the bias function in terms of some disparity of the predictions of $f$ between the two groups. For example, demographic parity measures the unfairness of $f$ as the difference in positive prediction rates:
$$B_{\mathrm{DP}}(f; \mathcal{D}) = \mathbb{E}_{\mathcal{D}}[f(x, a) \mid a = 0] - \mathbb{E}_{\mathcal{D}}[f(x, a) \mid a = 1].$$
Demographic parity, although simple, has been criticized for unnecessarily encouraging poor classifiers in pursuit of fairness [14, 8, 17]. Accordingly, some works propose to define bias in terms of equal opportunity, which measures disparity in true positive prediction rates:
$$B_{\mathrm{EO}}(f; \mathcal{D}) = \mathbb{E}_{\mathcal{D}}[f(x, a) \mid a = 0, y = 1] - \mathbb{E}_{\mathcal{D}}[f(x, a) \mid a = 1, y = 1].$$
Unlike demographic parity, equal opportunity is always satisfied by the perfect classifier, i.e., the classifier which predicts the true label for every individual.
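These two bias measures are straightforward to compute from a model's hard or soft predictions. A minimal sketch (the function names and toy arrays are ours, not from the paper):

```python
import numpy as np

def demographic_parity_gap(preds, groups):
    """Difference in positive prediction rates between groups 0 and 1."""
    return preds[groups == 0].mean() - preds[groups == 1].mean()

def equal_opportunity_gap(preds, groups, labels):
    """Difference in true positive rates between groups 0 and 1."""
    tpr0 = preds[(groups == 0) & (labels == 1)].mean()
    tpr1 = preds[(groups == 1) & (labels == 1)].mean()
    return tpr0 - tpr1

labels = np.array([1, 0, 1, 1, 0, 1])
groups = np.array([0, 0, 0, 1, 1, 1])
# the perfect classifier (preds == labels) always satisfies equal opportunity:
assert equal_opportunity_gap(labels, groups, labels) == 0.0
```

With hard predictions, both gaps reduce to differences of empirical rates, matching the definitions above.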
Additional variants of these constraints exist in the literature. For example, disparate impact enforces demographic parity while restricting the classifier from using the protected attribute in its predictions. The notion of equal odds augments equal opportunity to enforce both equal true positive rates and equal false positive rates. For simplicity, we will restrict our focus in this paper to the notions of demographic parity and equal opportunity, although our discussions and conclusions may easily extend to these more complicated notions of fairness.
In many instances of fair machine learning, a model is not enforced to be perfectly fair. Rather, it is enforced to be fair within some slack; i.e., it is allowed to have some bias, but that bias must be bounded ($B(f; \mathcal{D}) \le \epsilon$) or bounded absolutely ($|B(f; \mathcal{D})| \le \epsilon$) for some $\epsilon \ge 0$. The reasons for this are two-fold: First, a perfectly fair, zero-bias classifier may not be feasible (or desirable) in practice. Second, the initial motivation for learning a fair classifier is often expressed in terms of some allowed amount of bias. For example, legal definitions of fairness often invoke the $p$%-rule (commonly, the 80%-rule): the ratio of positive prediction rates between one group and another should not fall below $p$%.
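As an illustration of slack expressed as a rate ratio, the $p$%-rule can be checked directly from hard predictions. A minimal sketch (the function name and data are ours):

```python
import numpy as np

def positive_rate_ratio(preds, groups):
    """Ratio of the two groups' positive prediction rates (smaller over
    larger); the classifier passes the p%-rule iff this is >= p / 100."""
    r0, r1 = preds[groups == 0].mean(), preds[groups == 1].mean()
    lo, hi = min(r0, r1), max(r0, r1)
    return lo / hi if hi > 0 else 1.0

preds  = np.array([1, 1, 0, 1, 1, 0, 0, 0, 1, 0])
groups = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
ratio = positive_rate_ratio(preds, groups)  # rates 0.8 vs 0.2 -> ratio 0.25
assert ratio < 0.8  # fails the 80%-rule
```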
Since any ML fairness optimization is affected by the allowed slack and this slack is specified by the problem formulation, it may be important to understand the precise ways in which a choice of $\epsilon$ affects the final classifier. We propose the following property:
Definition 1 (Slack-consistency).
A procedure for learning fair classifiers with respect to some amount of allowed slack $\epsilon$ and a fixed dataset is slack-consistent if, for any individual, the predictions of the learned classifier for this individual are monotonic with respect to $\epsilon$.
More precisely, let us denote the procedure for learning fair classifiers as $M$; i.e., the result $f_\epsilon = M(\epsilon)$ of running $M$ is a model that takes in the features and group attribute of some individual and returns a probability of classifying this individual positively. Then, $M$ is slack-consistent if the predictions of any individual are monotonic with respect to $\epsilon$. That is, for $\epsilon_1 \le \epsilon_2 \le \epsilon_3$, we must have either $f_{\epsilon_1}(x, a) \le f_{\epsilon_2}(x, a) \le f_{\epsilon_3}(x, a)$ or $f_{\epsilon_1}(x, a) \ge f_{\epsilon_2}(x, a) \ge f_{\epsilon_3}(x, a)$.
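Definition 1 amounts to a simple monotonicity check on each individual's sequence of predictions as the slack grows. A minimal sketch (ours):

```python
def is_slack_consistent(preds_by_slack):
    """preds_by_slack: one individual's predictions at increasing slack
    values eps_1 <= eps_2 <= ...  Slack-consistency requires the whole
    sequence to be non-decreasing or non-increasing."""
    diffs = [b - a for a, b in zip(preds_by_slack, preds_by_slack[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

assert is_slack_consistent([0.2, 0.4, 0.4, 0.9])  # monotone
assert not is_slack_consistent([0.2, 0.7, 0.3])   # violates slack-consistency
```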
We argue that slack-consistency is a reasonable, intuitive, and desirable property for machine learning methods. For example, consider an individual from a disadvantaged group who would be positively labeled with no fairness enforcement (unbounded slack $\epsilon = \infty$). Slack-consistency ensures that when fairness is enforced ($\epsilon < \infty$), the same individual should not be assigned a negative prediction (assuming a positive prediction at maximal favoring of the group). Otherwise, we would be unfairly disadvantaging the individual in the process of attempting to undo a disadvantage for the individual’s group. The slack-inconsistency of a model in this case may be interpreted as an implementation of a self-fulfilling prophecy (see the “catalog of evils” of [8]); i.e., a vendor maliciously chooses the ‘wrong’ members of a protected group to predict positively, ensuring that a future analysis will find that membership in the protected group is associated with less likelihood of positive outcomes. In the converse setting, an individual from the advantaged group who would be negatively labeled with no fairness enforcement should not be positively labeled when fairness is enforced.
The rest of the paper is organized as follows. In Section 4 we will investigate the behavior of popular methods for learning fair ML models in terms of slack-consistency, starting with constrained optimization and then focusing on post-processing. Surprisingly, we will find that these methods fail to satisfy slack-consistency in almost all settings. Then, in Section 5 we present a simple theoretical method that satisfies slack-consistency given a Bayes-optimal score function and randomized classifiers, although we concede that practical scenarios often do not permit such classifiers.
4 Popular Methods Fail to Satisfy Slack-Consistency
For simplicity, we consider the absolutely bounded bias setting, in which the bias of $f$ is restricted to $|B(f; \mathcal{D})| \le \epsilon$ for some $\epsilon \ge 0$, although our findings can be generalized. We will show that the two common methods for learning fair classifiers, constrained optimization via the method of Lagrange multipliers and post-processing via exhaustive threshold search, often fail to satisfy slack-consistency.
For our experiments on non-synthetic data, we employ the following datasets:
ProPublica’s COMPAS recidivism data. The task is to predict recidivism based on criminal history, jail and prison time, demographics, and risk scores. We preprocess this dataset in a similar way to the Adult dataset and use two gender-based protected groups.
4.1 Constrained Optimization
In the constrained optimization approach, we have a loss function $g_0(\theta)$ and fairness constraints $g_i(\theta) \le 0$ over parameter space $\Theta$. The Lagrangian is $\mathcal{L}(\theta, \lambda) = g_0(\theta) + \sum_i \lambda_i g_i(\theta)$, where $\lambda \ge 0$, and the goal is to find a solution to $\min_{\theta} \max_{\lambda \ge 0} \mathcal{L}(\theta, \lambda)$. In fairness problems, the loss function is taken to be the usual loss function for a model (e.g., hinge loss) and the fairness constraints (possibly with slack) are typically relaxed so that they are differentiable (e.g., a hinge relaxation), so that we can alternately apply SGD to minimize $\mathcal{L}$ in $\theta$ and maximize it in $\lambda$ until convergence. This is the approach a number of works adopt [22, 9, 12, 5].
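To make the procedure concrete, the following is a minimal, self-contained sketch of this Lagrangian approach on made-up data. For simplicity it uses a sigmoid (rather than hinge) relaxation of the demographic-parity constraint and full-batch gradient updates, so it is an illustration of the technique rather than a reproduction of the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: group 1's feature distribution is shifted, so the unconstrained
# model's positive prediction rates differ across groups.
n = 400
a = rng.integers(0, 2, size=n)              # group membership
x = (a + rng.normal(size=n)).reshape(n, 1)  # single feature
y = (x[:, 0] + 0.5 * rng.normal(size=n) > 0.5).astype(float)

def train(eps, steps=2000, lr=0.05, lr_lam=0.05):
    """Alternating descent/ascent on the Lagrangian
    L(theta, lam) = logloss(theta) + lam * (|DP gap| - eps), lam >= 0."""
    w, b, lam = np.zeros(1), 0.0, 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)                     # soft predictions
        gap = p[a == 0].mean() - p[a == 1].mean()  # relaxed DP violation
        g = p - y                                  # d logloss / d score
        gw, gb = x.T @ g / n, g.mean()
        # gradient of lam * (|gap| - eps) through the sigmoid relaxation
        s, dp = np.sign(gap), p * (1 - p)
        c0, c1 = a == 0, a == 1
        gw += lam * s * (x[c0].T @ dp[c0] / c0.sum() - x[c1].T @ dp[c1] / c1.sum())
        gb += lam * s * (dp[c0].mean() - dp[c1].mean())
        w, b = w - lr * gw, b - lr * gb                  # descent in theta
        lam = max(0.0, lam + lr_lam * (abs(gap) - eps))  # projected ascent in lam
    return w, b

w0, b0 = train(eps=0.0)   # (near-)exact demographic parity
w1, b1 = train(eps=1.0)   # slack so large the constraint never binds
```

The multiplier grows while the constraint is violated by more than the slack and relaxes otherwise, so the tightly constrained model ends with a much smaller demographic-parity gap than the unconstrained one.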
In general settings where $g_0$ is the loss of a non-convex model such as a neural network, it may not be surprising that properties such as slack-consistency fail to hold. In the non-convex setting, as pointed out in recent works such as [1, 5], a saddle point of the Lagrangian may not even exist and thus the models may have nothing to converge to. Moreover, even with convergence, there can be multiple solutions with different accuracies and fairness violations which nonetheless attain the same Lagrangian value. Similar behavior can occur with multiple fairness constraints which are in conflict, as is known for equalized odds due to feasibility issues [3, 21, 17].
In this section, we consider the simplest of cases, where we use a linear model and a single fairness constraint (demographic parity or equal opportunity) on just two protected groups. In this case, an optimal saddle point of the Lagrangian exists and joint SGD is guaranteed to converge to it. Despite these restrictions, we find that constrained optimization can still fail to satisfy slack-consistency. We illustrate this in Figure 1 on a number of benchmark fairness datasets: Adult, COMPAS, and Communities and Crime. We train a linear model subject to hinge relaxations of the fairness constraints and jointly train the Lagrangian using the ADAM optimizer under default settings with a fixed minibatch size and number of epochs. We then sort the solutions by the actual fairness violations in training (rather than the violations of the hinge relaxation) to account for the variability between the original and relaxed constraints.
Figure 1 gives us an unsettling realization. It shows that the predictions for each protected group do not satisfy slack-consistency on average, which means that not only are there individuals whose predictions do not satisfy slack-consistency, but also that this property is not even maintained at the group level. It is also worth noting that the constrained optimization approach presents another counter-intuitive side-effect (also seen in Figure 1), in that the average soft prediction can increase (resp. decrease) while the average thresholded hard prediction decreases (resp. increases).
Overall, we find that the counter-intuitive side effects associated with slack-inconsistency can easily arise for constrained optimization, even in the simple case of a linear model.
4.2 Post-Processing

The post-processing method is perhaps one of the simplest approaches to fair classification. It starts with a score function which maps individuals to a continuous value and then selects thresholds for each protected group so that the resulting binary classifier from these thresholds has minimum cost while satisfying the fairness constraints. We provide pseudocode of a slack-enabled version of post-processing in Algorithm 1. Note that we utilize an exhaustive search to find the optimal thresholds. Thus, unlike in the constrained optimization setting, our results are agnostic to any approximations in the optimization.
In general, the post-processing method may require randomized thresholds (equivalent to using quantiles). For our discussions, we will express this through the use of normalized thresholds: a normalized threshold $\tau$ corresponds to a distribution over at most two (adjacent) deterministic thresholds which achieves a positive prediction rate of $\tau$. Although normalized thresholds are required in general, we note that their use is not ideal: in practical scenarios, a stochastic classifier can be seen as either capricious (if only one random classification is allowed per individual) or exploitable (if multiple random classifications are allowed). We will write the loss and bias as functions of these normalized thresholds. We note that slack-consistency of post-processing is equivalent to monotonicity of the chosen normalized thresholds with respect to $\epsilon$.
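For reference, a deterministic-threshold version of the exhaustive search in Algorithm 1 can be sketched as follows (the toy data and the restriction to deterministic, rather than normalized, thresholds are ours):

```python
import numpy as np
from itertools import product

def post_process(scores, groups, labels, eps):
    """Slack-enabled post-processing by exhaustive search: pick per-group
    deterministic thresholds minimizing 0/1 error s.t. |DP gap| <= eps."""
    cands = np.unique(np.concatenate([scores, [scores.max() + 1.0]]))
    best = None
    for t0, t1 in product(cands, cands):
        preds = np.where(groups == 0, scores >= t0, scores >= t1).astype(float)
        bias = abs(preds[groups == 0].mean() - preds[groups == 1].mean())
        if bias <= eps + 1e-12:
            err = (preds != labels).mean()
            if best is None or err < best[0]:
                best = (err, t0, t1)
    return best  # (error, group-0 threshold, group-1 threshold)

scores = np.array([0.1, 0.4, 0.6, 0.9, 0.2, 0.3, 0.7, 0.8])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
labels = np.array([0, 0, 1, 1, 0, 1, 1, 1])
err_strict = post_process(scores, groups, labels, eps=0.0)[0]  # 0.125
err_loose = post_process(scores, groups, labels, eps=1.0)[0]   # 0.0
assert err_loose <= err_strict
```

Loosening the slack can only enlarge the feasible set, so the achieved error is non-increasing in $\epsilon$; as the counterexamples below show, the chosen thresholds themselves need not behave so gracefully.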
We present several counterexamples, which show that post-processing does not yield slack-consistency. We begin by considering the application of post-processing to a score function which is not Bayes-optimal. This scenario is typical in practice, where the score function is usually some learned function (e.g., a neural network or a decision tree ensemble). Theorem 1 provides an example of a dataset and such a score function for which post-processing yields slack-inconsistent solutions.
Theorem 1 (Slack-inconsistency of post-processing on a non-Bayes-optimal score function).
There exists a distribution and score function such that the post-processing method fails to satisfy slack-consistency.
Consider a distribution partitioned into two protected groups, each occurring in equal proportion, and suppose that we are in the binary classification setting with the goal of demographic parity. Let the distribution of scores for the first group be as follows:
of the points have uniformly distributed in and label with probability and label otherwise,
of the points have uniformly distributed in and label ,
of the points have uniformly distributed in and label .
Let the distribution of scores for the second group be as follows:
of the points have uniformly distributed in and label ,
of the points have uniformly distributed in and label ,
of the points have uniformly distributed in and label ,
of the points have uniformly distributed in and label .
We plot the misclassification rate with respect to the chosen threshold for each group in Figure 2 (left). Note that at any common threshold, the classifier achieves the same positive prediction rate on either group. For strict fairness constraints ($\epsilon = 0$), the ideal thresholds for the two groups are thus equal. If we choose to increase the allowed slack by some small amount $\epsilon > 0$, the ideal threshold for the first group will decrease, since the misclassification error has a positive derivative for the first group at the shared threshold. However, for a large enough slack, the ideal threshold for the first group will instead lie at the global minimum of its misclassification error, above the shared threshold. Therefore, post-processing applied to this example yields slack-inconsistent solutions. ∎
In the previous theorem’s counter-example, the score-function was not Bayes-optimal. We next consider a scenario for which we have a Bayes-optimal classifier but are not allowed to employ stochastic classifiers. Indeed, stochastic classifiers are often undesirable in practice, since they can be seen as capricious (why should a random number determine my loan eligibility?) or exploitable (if I get denied a loan on my first try, I will apply again until I get accepted). We show that post-processing in this scenario fails to satisfy slack-consistency.
Theorem 2 (Slack-inconsistency of post-processing on a Bayes-optimal score function with deterministic thresholds).
There exists a distribution where the score function is Bayes-optimal for each protected group and the post-processing method fails to satisfy slack-consistency.
See Figure 2, right three images. ∎
We conclude our counterexamples for post-processing with Figure 3, which shows that on real datasets, the post-processing method fails to be slack-consistent.
5 Post-Processing with GABOS Functions
While the previous section showed that post-processing yields slack-inconsistent solutions when one lacks either a Bayes-optimal classifier or access to stochastic thresholds, in this section we show that post-processing is guaranteed to be slack-consistent given these two assumptions.
We begin by defining a Bayes-optimal score function. Notably, the function must be Bayes-optimal with respect to both features and group membership attribute (rather than with respect to only the features of the individual, which is how ML models are typically trained).
Definition 2 (Group-aware Bayes-optimal score (GABOS) functions).
We say that a score function $h$ is group-aware Bayes-optimal with respect to $\mathcal{D}$ if its output is the empirical probability of an individual being labelled positively; i.e., $h(x, a) = \mathbb{P}_{\mathcal{D}}[y = 1 \mid x, a]$.
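On discrete (or pre-partitioned) features, a GABOS function is just a table of empirical conditional probabilities. A minimal sketch (the function name and toy data are ours):

```python
from collections import defaultdict

def fit_gabos(xs, groups, labels):
    """Empirical GABOS score h(x, a) = P_hat[y = 1 | x, a], assuming the
    features x are discrete (or have already been partitioned into cells)."""
    counts = defaultdict(lambda: [0, 0])  # (x, a) -> [positives, total]
    for x, a, y in zip(xs, groups, labels):
        counts[(x, a)][0] += y
        counts[(x, a)][1] += 1
    # only defined on cells observed in the data
    return lambda x, a: counts[(x, a)][0] / counts[(x, a)][1]

xs, groups = [0, 0, 0, 1, 1, 0, 1], [0, 0, 0, 0, 0, 1, 1]
labels     = [1, 0, 1, 1, 1, 0, 1]
h = fit_gabos(xs, groups, labels)
assert abs(h(0, 0) - 2 / 3) < 1e-12  # two of the three (x=0, a=0) points are positive
```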
Theorem 3 (Consistency of post-processing on GABOS functions).
Suppose $h$ is a GABOS function with respect to $\mathcal{D}$, and that the loss $L$ measures the mis-classification error of a thresholding of $h$ with respect to $\mathcal{D}$. Furthermore, suppose the bias $B$ measures demographic parity or equal opportunity. Then, the post-processing method (Algorithm 1) applied to $h$ yields slack-consistent solutions.
The setting of the theorem allows us to split $L$ and $B$ into functions of the two separate groups:
$$L(\tau_0, \tau_1) = L_0(\tau_0) + L_1(\tau_1), \qquad B(\tau_0, \tau_1) = B_0(\tau_0) - B_1(\tau_1),$$
where $\tau_0, \tau_1$ are the normalized thresholds chosen for the two groups. We note that in the considered setting, the functions $L_k$ are convex with respect to $\tau_k$ and the functions $B_k$ are monotonically decreasing. We will find it useful to consider these functions in terms of the biases induced by $\tau_0, \tau_1$. That is, we express the loss and bias functions as,
$$L(b_0, b_1) = L_0(b_0) + L_1(b_1), \qquad B(b_0, b_1) = b_0 - b_1,$$
where we compute $b_k$ as $b_k = B_k(\tau_k)$ (overloading $L_k$ to be a function of $b_k$). We note that in the considered setting, the range of each $b_k$ is $[0, 1]$. Furthermore, the functions $L_k$ maintain their convexity with respect to $b_k$ in the case of demographic parity and equal opportunity with a Bayes-optimal score function (see the appendix for a short proof). We denote the left and right subdifferentials of $L_k$ by,
$$\partial^- L_k(b) = \lim_{b' \to b^-} \frac{L_k(b) - L_k(b')}{b - b'}, \qquad \partial^+ L_k(b) = \lim_{b' \to b^+} \frac{L_k(b') - L_k(b)}{b' - b}.$$
Consider a solution $(b_0, b_1)$ returned by post-processing with slack $\epsilon$. Let $B = b_0 - b_1$. Without loss of generality, we may assume $b_0 \ge b_1$; i.e., $B \in [0, \epsilon]$. By the optimality of $(b_0, b_1)$ we have,
$$\partial^- L_0(b_0) \le 0.$$
Otherwise, we would be able to achieve lower bias and lower loss simultaneously. By the same logic we have,
$$\partial^+ L_1(b_1) \ge 0.$$
Now consider an analogous solution $(b_0', b_1')$ for $\epsilon' \ge \epsilon$ such that $b_0' \ge b_1'$, with bias value $B' = b_0' - b_1' \in [0, \epsilon']$. We will show that $b_0' \ge b_0$ and $b_1' \le b_1$ through a sequence of three claims:
Claim 1: $B' \ge B$. Proof: Suppose otherwise; i.e., $B' < B$. Then we must have at least one of $b_0' < b_0$ or $b_1' > b_1$ (otherwise we contradict $B' < B$). Suppose, first, that $b_0' < b_0$. By the convexity of $L_0$, we have,
$$\partial^+ L_0(b_0') \le \partial^- L_0(b_0) \le 0.$$
This means that we may increase $b_0'$ to lower the loss while remaining within the allowed bias (since $B' < B \le \epsilon \le \epsilon'$); contradiction. The same logic for the case of $b_1' > b_1$ leads to an analogous contradiction.
Claim 2: $b_0' \ge b_0$. Proof: Suppose otherwise; i.e., $b_0' < b_0$. Let $\delta = b_0 - b_0'$ (see Figure 4 for a pictorial presentation). Note that Claim 1 implies that $b_1' \le b_1 - \delta$. Since $L_1$ is convex, its increment over $[b_1', b_1' + \delta]$ is no larger than its increment over the equal-length interval $[b_1 - \delta, b_1]$ lying to its right; i.e.,
$$L_1(b_1' + \delta) - L_1(b_1') \le L_1(b_1) - L_1(b_1 - \delta).$$
Furthermore we have,
$$b_0' - (b_1 - \delta) = b_0 - b_1 = B,$$
implying that $(b_0', b_1 - \delta)$ is feasible for slack $\epsilon$. Thus by the optimality of $(b_0, b_1)$, we have $L_0(b_0) + L_1(b_1) \le L_0(b_0') + L_1(b_1 - \delta)$. Recalling that $b_0 - (b_1' + \delta) = b_0' - b_1' = B' \le \epsilon'$, so that $(b_0, b_1' + \delta)$ is feasible for slack $\epsilon'$, we have, combining the two inequalities above,
$$L_0(b_0) + L_1(b_1' + \delta) \le L_0(b_0') + L_1(b_1').$$
In conjunction with the optimality of $(b_0', b_1')$, this means that $(b_0, b_1' + \delta)$ is a feasible solution for $\epsilon'$ with loss no larger than that of $(b_0', b_1')$, contradicting the supposition that the returned solution has $b_0' < b_0$.
Claim 3: $b_1' \le b_1$. Proof: Analogous to Claim 2.
These claims show that the bias values $(b_0, b_1)$ of the optimal solution are monotonic in $\epsilon$. Accordingly, since each $B_k$ is monotonic, the chosen thresholds are monotonic in $\epsilon$ as well, which implies that the solutions are slack-consistent, as desired. ∎
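The monotonicity established by the three claims can also be observed numerically. The following sketch uses stand-in convex quadratics for the per-group losses $L_0, L_1$ (our choice, not derived from any dataset) and solves the slack-constrained problem on a grid:

```python
import numpy as np

# Convex per-group losses as functions of the positive rates b0, b1
# (for a GABOS function, mis-classification error is convex in these
# rates; the quadratics below are stand-ins with minima at 0.8 and 0.2).
L0 = lambda b: (b - 0.8) ** 2
L1 = lambda b: (b - 0.2) ** 2

grid = np.linspace(0.0, 1.0, 101)

def solve(eps):
    """min L0(b0) + L1(b1) s.t. |b0 - b1| <= eps, by grid search."""
    return min(((L0(b0) + L1(b1), b0, b1)
                for b0 in grid for b1 in grid
                if abs(b0 - b1) <= eps + 1e-9))[1:]

sols = [solve(eps) for eps in np.linspace(0.0, 1.0, 21)]
b0s, b1s = zip(*sols)
# slack-consistency: b0 grows toward its unconstrained optimum, b1 shrinks
assert all(u <= v + 1e-9 for u, v in zip(b0s, b0s[1:]))
assert all(u >= v - 1e-9 for u, v in zip(b1s, b1s[1:]))
```

As the slack grows, the two bias values separate monotonically until each reaches its unconstrained optimum, exactly the behavior the claims guarantee.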
Given the previous theoretical result, we propose to learn a fair ML classifier by first learning a GABOS function and then applying post-processing. This procedure is summarized in Algorithm 2. Learning a suitable GABOS function may be performed in an unsupervised manner (e.g., using clustering) or in a supervised manner (e.g., using decision trees to minimize loss). We have the following result, which ensures that Algorithm 2 yields slack-consistent solutions:
Theorem 4 (Consistency of Algorithm 2).
Performing GABOS learning with post-processing yields slack-consistent solutions.
Algorithm 2 essentially performs post-processing with respect to a GABOS function on a simplified dataset, in which each individual is represented by the partition cell to which it is assigned. This yields slack-consistent solutions with respect to the reduced features. Now consider some arbitrary individual $(x, a)$ (not necessarily in the original training set). This individual will be mapped to a single partition cell regardless of slack. Therefore, its predictions will be monotonic with slack. ∎
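An end-to-end sketch of this two-step procedure on a toy discrete dataset (the data, and the use of deterministic rather than normalized thresholds, are ours):

```python
import numpy as np

# Toy discrete dataset: features x in {0, 1, 2}, groups a in {0, 1}.
x = np.array([0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2])
a = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1])

# Step 1 (GABOS learning): score each (x, a) cell by its empirical P[y = 1].
h = {c: y[(x == c[0]) & (a == c[1])].mean()
     for c in {(xi, ai) for xi, ai in zip(x, a)}}
scores = np.array([h[(xi, ai)] for xi, ai in zip(x, a)])

# Step 2 (post-processing): per-group thresholds on the GABOS scores,
# deterministic exhaustive search with demographic-parity slack eps.
def best_error(eps):
    cands = np.unique(np.concatenate([scores, [1.1]]))
    best = None
    for t0 in cands:
        for t1 in cands:
            p = np.where(a == 0, scores >= t0, scores >= t1)
            if abs(p[a == 0].mean() - p[a == 1].mean()) <= eps + 1e-9:
                err = (p != y).mean()
                best = err if best is None else min(best, err)
    return best

assert best_error(1.0) <= best_error(0.0)  # more slack never hurts accuracy
```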
Although we possess a solution to slack-consistency, we stress that this solution is not ideal for many practical scenarios for two reasons. First, the use of stochastic thresholds is undesirable, since, as mentioned before, they may be seen as either capricious or exploitable. Second, access to the group membership of an individual is often not available during inference (e.g., in web applications).
Our work suggests that there is much to explore to make ML fairness methods more transparent. Conventional wisdom in machine learning suggests that lack of transparency and explainability arises from the use of complicated models. However, our work shows that in the context of ML fairness, even with the simplest underlying models and the most straightforward training procedures, introducing a single fairness constraint can have significant consequences for the understandability of the model. We may compare and contrast this to the phenomenon of adversarial examples in neural networks [13]. In the adversarial example setting, the complexity of the model leads to drastically unintended behavior. In our setting, it is the introduction of simple group-parity constraints which leads to counter-intuitive behavior. We encourage researchers and practitioners to be wary of such complexity that may be introduced through seemingly simple augmentations to their models or training procedures.
Our findings also uncover a stark disconnect between group-based fairness metrics and intuitive notions of fairness. Previous works have noted the disconnects between group-parity and individual notions of fairness [8] as well as between group-fair classifiers and the future impact of those classifiers’ decisions [19]. Our work provides further evidence of this disconnect through the notion of slack-consistency. Notably, our counterexamples show that standard methods for ML fairness violate slack-consistency for both individuals and groups as a whole on average, even when these individuals and groups come from the same data that the model was trained on. Thus, we encourage researchers to re-assess the utility of using group-parity as a proxy for fairness.
To conclude, we re-affirm that slack-consistency is a generally desirable behavior and can protect an ML model from a wide range of unreasonable behaviors. As argued previously, it is natural to expect that, for any individual, there would be a prediction under no slack (i.e., the fairness constraint is perfectly satisfied) and a prediction under infinite slack (i.e., unconstrained), and that for any slack in between, the predictions would change monotonically between these two extremes. This way, an individual would under no circumstances be unfairly treated for the benefit of the group. Slack-consistency can also encourage predictions to be more robust: in practice, models may need to be retrained frequently, and slack-consistency ensures that small (or even no) changes in the fairness requirements do not lead to unreasonable changes in individual predictions. For these reasons, researchers and practitioners may find it beneficial to enforce slack-consistency itself in order to better guarantee a classifier’s behavior.
[1] A. Agarwal et al. (2018). A reductions approach to fair classification. arXiv preprint arXiv:1803.02453.
[2] J. Angwin et al. (2016). Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing (accessed 07/18/2018).
[3] A. Chouldechova (2017). Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5(2), pp. 153–163.
[4] A. Cotter et al. (2018). Training well-generalizing classifiers for fairness metrics and other data-dependent constraints. arXiv preprint arXiv:1807.00028.
[5] A. Cotter et al. (2019). Two-player games for efficient non-convex constrained optimization. In International Conference on Algorithmic Learning Theory (ALT).
[6] A. Cotter et al. (2018). Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals. arXiv preprint arXiv:1809.04198.
[7] (2012). Information effect of entry into credit ratings market: the case of insurers’ ratings. Journal of Financial Economics 106(2), pp. 308–330.
[8] C. Dwork et al. (2012). Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226.
[9] E. Eban et al. (2017). Scalable learning of non-decomposable objectives. In Artificial Intelligence and Statistics, pp. 832–840.
[10] M. Feldman et al. (2015). Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 259–268.
[11] (2015). Computational fairness: preventing machine-learned discrimination.
[12] G. Goh et al. (2016). Satisfying real-world goals with dataset constraints. In Advances in Neural Information Processing Systems, pp. 2415–2423.
[13] I. Goodfellow, J. Shlens, and C. Szegedy (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
[14] M. Hardt, E. Price, and N. Srebro (2016). Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pp. 3315–3323.
[15] H. Jiang and O. Nachum (2019). Identifying and correcting label bias in machine learning. arXiv preprint arXiv:1901.04966.
[16] M. Kearns et al. (2017). Preventing fairness gerrymandering: auditing and learning for subgroup fairness. arXiv preprint arXiv:1711.05144.
[17] J. Kleinberg, S. Mullainathan, and M. Raghavan (2016). Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807.
[18] M. Lichman (2013). UCI machine learning repository. Irvine, CA.
[19] L. Liu et al. (2018). Delayed impact of fair machine learning. arXiv preprint arXiv:1803.04383.
[20] (2018). Proper likelihood ratio based ROC curves for general binary classification problems. arXiv preprint arXiv:1809.00694.
[21] B. Woodworth et al. (2017). Learning non-discriminatory predictors. In Conference on Learning Theory, pp. 1920–1953.
[22] M. B. Zafar et al. (2015). Fairness constraints: mechanisms for fair classification. arXiv preprint arXiv:1507.05259.
Appendix A Supporting Theoretical Results
Theorem 5 (Convexity of mis-classification error).
Considering all possible normalized thresholds of a Bayes-optimal score function, the mis-classification error is a convex function of the true-positive rate (equal opportunity). The mis-classification error is also convex with respect to positive prediction rate (demographic parity).
We first note that the ROC curve of a Bayes-optimal score function is concave and one-to-one. That is, the true positive rate $r_{\mathrm{tp}}$ is a concave one-to-one function of the false positive rate $r_{\mathrm{fp}}$, and, conversely, $r_{\mathrm{fp}}$ is a convex one-to-one function of $r_{\mathrm{tp}}$.
The mis-classification error of a thresholding may be expressed as $p\,(1 - r_{\mathrm{tp}}) + q\, r_{\mathrm{fp}}$, where $p$ and $q$ are the proportions of positively and negatively labelled points in the dataset, respectively. Since the first term of this expression is linear and the second term convex with respect to $r_{\mathrm{tp}}$, we conclude that the mis-classification error is a convex function of $r_{\mathrm{tp}}$, as desired.
To characterize the mis-classification error with respect to the positive prediction rate $r_{\mathrm{pp}}$, we note that $r_{\mathrm{pp}} = p\, r_{\mathrm{tp}} + q\, r_{\mathrm{fp}}$. Since the first term of this expression for $r_{\mathrm{pp}}$ is concave one-to-one and the second term linear increasing with respect to $r_{\mathrm{fp}}$, we deduce that $r_{\mathrm{pp}}$ is a concave one-to-one function of $r_{\mathrm{fp}}$; equivalently, $r_{\mathrm{fp}}$ is a convex one-to-one function of $r_{\mathrm{pp}}$. The mis-classification error may then be expressed as $p\,(1 - r_{\mathrm{tp}}) + q\, r_{\mathrm{fp}} = p - r_{\mathrm{pp}} + 2 q\, r_{\mathrm{fp}}$, which is the sum of a constant, a linear function, and a convex function of $r_{\mathrm{pp}}$. Thus, we conclude that the mis-classification error is a convex function of $r_{\mathrm{pp}}$, as desired. ∎
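This convexity can be checked numerically for a discrete Bayes-optimal score, where each cell's score equals its positive-label probability (the cell values below are illustrative):

```python
import numpy as np

# Cells of a group-aware Bayes-optimal score: the score of each cell equals
# its empirical positive-label probability.
v = np.array([0.9, 0.7, 0.6, 0.3, 0.1])    # cell scores, sorted descending
m = np.array([0.1, 0.3, 0.2, 0.25, 0.15])  # cell masses (sum to 1)

# Sweep the (normalized) threshold past each cell, tracking the positive
# prediction rate and the mis-classification error.
rates = np.concatenate([[0.0], np.cumsum(m)])
errs = [np.sum(v * m)]                      # all-negative: error = sum v_k m_k
for vk, mk in zip(v, m):
    # flipping a cell to positive changes its error from vk*mk to (1-vk)*mk
    errs.append(errs[-1] + (1.0 - 2.0 * vk) * mk)

# Convexity of error w.r.t. positive rate: slopes 1 - 2*v_k are non-decreasing
slopes = np.diff(np.array(errs)) / np.diff(rates)
assert np.all(np.diff(slopes) >= -1e-12)
```

The slope of the error between consecutive breakpoints is $1 - 2 v_k$, which is non-decreasing exactly because the Bayes-optimal scores are swept in descending order.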