Supervised learning algorithms resilient to discriminatory data perturbations


The actions of individuals can be discriminatory with respect to certain protected attributes, such as race or gender. Recently, discrimination has become a focal concern in supervised learning algorithms augmenting human decision-making. These systems are trained using historical data, which may have been tainted by discrimination, and may learn biases against the protected groups. An important question is how to train models without propagating discrimination. Such discrimination can be either direct, when one or more protected attributes are used directly in the decision-making, or indirect, when other attributes correlated with the protected attributes are used in an unjustified manner. In this work, we i) model discrimination as a perturbation of the data-generating process; ii) introduce a measure of resilience of a supervised learning algorithm to potentially discriminatory data perturbations; and iii) propose a novel supervised learning method that is more resilient to such discriminatory perturbations than state-of-the-art learning algorithms addressing discrimination. The proposed method can be used with general supervised learning algorithms, prevents direct discrimination and avoids inducement of indirect discrimination, while maximizing model accuracy.

Discrimination consists in treating somebody unfavorably because of their membership in a particular group, characterized by a protected attribute, such as race or gender. Freedom from discrimination is outlined as a basic human right by the Universal Declaration of Human Rights. Legal systems often prohibit discrimination in a number of contexts titlevii (); fairhousing (); eu1 (); eu2 (), for example the Civil Rights Acts of 1866 of the United States outlaw discrimination based on race in employment. Nowadays there is a growing appetite for introducing algorithmic decision-making systems, and these systems introduce new concerns regarding discrimination. In principle, algorithmic systems can remove the biases associated with human judgment, increasing accuracy and fairness as well as transparency. In practice, however, there is a concern that these systems can perpetuate existing biases or introduce new ones, in a far from transparent manner Larson2016How (); Dastin2018Amazon (); ONeil2016Weapons (). Given the nature of machine learning methods currently in use, a re-examination and thorough formalization of discrimination notions is necessary, and a large amount of research on this topic has emerged in computer science Pedreshi2008Discrimination (); Feldman2014Certifying (); Zafar2015Fairness (); Zafar2017Fairnessa (); Zafar2017Fairness (); Hardt2016Equality (); Zafar2017Parity (); Woodworth2017Learning (); Pleiss2017Fairness (); Donini2018Empirical (); Datta2016algorithmic (); Adler2016Auditing (); Kilbertus2017Avoiding (); Kusner2017Counterfactual (); Salimi2019Capuchin ().

In the legal titlevii (); fairhousing () and social science Ture1968Black (); Altman2016Discrimination (); Lippert-Rasmussen2012Badness () contexts, a key consideration serving as the basis for identifying discrimination is whether there is a disparate treatment or unjustified disparate impact on the members of some protected group. To prevent disparate treatment, the law often forbids the use of certain protected attributes, such as race or gender, , in the decision-making, e.g., decisions about hiring, . Thus, these decisions shall be based on a set of relevant attributes, , and should not depend on the protected attribute, for any , ensuring that there is no disparate treatment. We refer to this kind of discrimination as direct discrimination, because of the direct use of the protected attribute .

Historically, the prohibition of disparate treatment was circumvented by the use of variables correlated with the protected attribute as proxies. For instance, some banks systematically denied loans and services, intentionally or unintentionally, to certain racial groups based on the areas they lived in Zenou2000Racial (); Hernandez2009Redlining (), a practice known as “redlining”. In order to prevent this indirect discrimination, legal systems sometimes establish that the impact of a decision-making process should be the same across groups differing in protected attributes Lippert-Rasmussen2012Badness (); Altman2016Discrimination (), that is , unless there is a “justified reason” or “business necessity clause” for this disparate impact titlevii (); fairhousing (). If there exists a valid business necessity, then disparate impact is deemed legal; such a precedent was set in the case of Ricci v. DeStefano ricci (). Indirect discrimination is a particularly acute problem for data-rich machine learning systems, since they can often find surprisingly accurate surrogates for protected attributes when a large enough set of legitimate-looking variables is available, resulting in discrimination via association Wachter2019Affinity (). The main challenge in introducing non-discriminatory learning algorithms lies in preventing the inducement of indirect discrimination, while simultaneously avoiding direct discrimination Zafar2015Fairness ().

In this paper, we consider a prevalent scenario of supervised learning, where a model supporting human decisions is trained on available data, i.e., a set of samples . In principle, this model could represent any decision-making process, for instance: i) assigning a credit score for a customer, given her financial record and her race , or ii) deciding whether a given individual shall be hired by the police, given her skills and her gender . The goal of a supervised learning algorithm is to obtain a function that optimizes a given objective, e.g., the empirical risk function, , where the expectation is over the samples in and is a loss function, e.g., quadratic loss, .

If the dataset is not tainted by discrimination, in which case we refer to it as , such that , then standard supervised learning algorithms can be applied to learn a non-discriminatory . If the dataset is tainted by discrimination, then a data science practitioner may desire, and, in principle, be obliged by law, to apply an algorithm that does not perpetuate this discrimination. This practitioner, however, may have no information whether the training dataset was tainted by discrimination () or not (), so supervised algorithms that aim to prevent discrimination operate in a blind setting. A number of such algorithms have been developed by adding a constraint or a regularization to the objective function Pedreshi2008Discrimination (); Feldman2014Certifying (); Zafar2015Fairness (); Zafar2017Fairness (); Hardt2016Equality (); Zafar2017Parity (); Woodworth2017Learning (); Pleiss2017Fairness (); Donini2018Empirical (). Most of these algorithms prevent direct discrimination, but they do not prevent induction of indirect discrimination. For instance, the algorithms that put constraints on the aforementioned disparities in treatment and impact Pedreshi2008Discrimination (); Feldman2014Certifying (); Zafar2015Fairness () induce a discriminatory bias in model parameters, when they are provided a non-discriminatory dataset for training Lipton2018Does (). Even if the designer knew that the training dataset is discriminatory, e.g., that is affected by , there still remains the question of how to drop  from the model without inducing indirect discrimination, that is without increasing the impact of relevant attributes correlated with in an unjustified and discriminatory way.

To address these challenges, we model discrimination as a perturbation of the data-generating process. This perturbation transforms into  and can be represented as a dataset shift Moreno-Torres2012unifying (), such that training and testing datasets come from different distributions due to the added impact of the protected attribute. Then, we propose a measure of resilience to such perturbations, and develop a supervised learning algorithm that is resilient to discrimination.

Resilience to potentially discriminatory perturbations.

Let us first consider a model of the unperturbed non-discriminatory output variable , expressed in terms of relevant variables . Samples of the output variable are drawn from a probability distribution, i.e., , or a corresponding probability density function if the output variables are continuous. If this model has a causal interpretation Pearl2009Causality (); Hernan2012Causal (), then the decisions are not causally influenced by the protected attribute , which is why we call non-discriminatory. By contrast, we refer to the perturbed decisions, , as potentially discriminatory. Perturbations of this kind were proposed before as random swaps of the class labels in a binary classification, i.e.,  Fish2016confidence (), which could depend on the protected attribute in addition to , although that study assumed no access to it. Here, to capture direct and indirect discrimination, we consider discriminatory perturbations that depend, potentially causally, on the protected attribute . We distinguish between direct and indirect discrimination:

1. Directly discriminatory perturbations via , , resulting in .

2. Indirectly discriminatory perturbations via , , resulting in and .

Note that direct discrimination, defined as , is equivalent to disparate treatment, i.e., for any , what amounts to a direct impact of the protected attribute on the output variable. Interestingly, indirect discrimination requires that , which means that the perturbation modifies the dependence of the output variable on the protected attribute, because the impact of a mediating variable on the output variable is modified. This formulation resembles the aforementioned notion of disparate impact, introduced as . The key insight enabling this formulation of indirect discrimination is its relational nature requiring comparisons of data-generating processes before and after the perturbation. In contrast, the definition of direct discrimination does not necessitate such relational comparisons.

Indirect discrimination can be mediated via an attribute that has either non-zero or zero impact on the unperturbed . The former case is well-established in legal systems and social science Altman2016Discrimination (). For example, the Supreme Court of the United States ruled that the usage of broad aptitude tests in hiring practices that disparately impacted ethnic minorities was irrelevant to job performance and hence illegal griggs (). The latter case is less clear-cut, but the aforementioned redlining Zenou2000Racial (); Hernandez2009Redlining () can be seen as its example. If a bank decides whether to give a loan to a customer, the zip code can contain some scarce information about the wealth of this customer and her ability to repay the loan, so to a small extent it could be used as a relevant attribute. However, the zip code also contains information about the ethnicity of the customer, if neighbourhoods are racially segregated, so denying loan based on the zip code alone causes unjustified disparate impact on ethnic minorities. Both of these kinds of indirect discrimination happen in supervised learning, when the protected attribute is dropped before training and its proxy variables in , whether they have impact on or not, replace the predictive power of on .

If we model discrimination as a perturbation of non-discriminatory data, then we can define the resilience of a supervised learning algorithm to such perturbations by measuring how close are the predictors generated by this algorithm to the unperturbed decisions, even if no information is available about the type of perturbations present in the training data. We refer to the algorithm ’s solution (a predictor) as , which is obtained by training on the dataset . Then, we define the resilience of a supervised learning algorithm to perturbation of data as


which is confined between and , i.e., means that the algorithm is perfectly resilient to the data perturbation, whereas means it is not resilient at all. The is a predictor of the non-discriminatory ground truth, trained on the unperturbed dataset , so the numerator takes into account that may be intrinsically random and unpredictable. The property that is ensured, if the same baseline learning algorithms yielding and optimize the same baseline objective function, e.g., both optimize empirical risk via gradient descent, but the algorithm adds to it a fairness criterion, regularization, or procedure.
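As a rough illustration of the measure described above, the following sketch assumes that resilience is a ratio of two risks, both evaluated against the unperturbed decisions: the risk of a reference predictor trained on unperturbed data (numerator) over the risk of the evaluated algorithm's predictor trained on perturbed data (denominator). The function name `resilience`, its argument names, the squared loss, and the toy numbers are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def resilience(y_clean, pred_reference, pred_algorithm):
    """Ratio of squared-error risks, both measured against the unperturbed decisions.

    y_clean        : unperturbed (non-discriminatory) decisions
    pred_reference : predictions of a baseline trained on unperturbed data
    pred_algorithm : predictions of the evaluated algorithm trained on perturbed data
    """
    y_clean = np.asarray(y_clean, dtype=float)
    risk_reference = np.mean((np.asarray(pred_reference, dtype=float) - y_clean) ** 2)
    risk_algorithm = np.mean((np.asarray(pred_algorithm, dtype=float) - y_clean) ** 2)
    return risk_reference / risk_algorithm  # assumes risk_algorithm > 0

# Hypothetical toy numbers, for illustration only.
y = np.array([1.0, 2.0, 3.0, 4.0])
ref = y + np.array([0.1, -0.1, 0.1, -0.1])      # irreducible error only
alg = ref + 0.3 * np.array([1, 1, -1, -1])      # additional error induced by a perturbation
print(resilience(y, ref, alg))
```

Under this assumed form, a predictor that matches the reference predictor scores 1, and the score shrinks toward 0 as the perturbation pulls the predictor away from the unperturbed decisions.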

The proposed measure of resilience describes how an algorithm trained on potentially discriminatory  performs when it is evaluated on a non-discriminatory . In other words, we measure how well the algorithm deals with a discriminatory dataset shift introducing bias depending on the protected attribute. In general, dataset shifts happen when the training dataset is sampled from a different distribution than the test dataset used for evaluation Moreno-Torres2012unifying (). The introduced discriminatory perturbations are sub-types of “concept shift” that depend on the protected attribute Moreno-Torres2012unifying ().

In supervised learning, typically we constrain the set of models that could explain to a certain family, e.g., generalized linear models. In this manuscript, we focus on the case where is generated by the same family as , and discuss the case where the two belong to different families as a future work. Next, we develop a supervised algorithm that prevents direct discrimination and induction of indirect discrimination. Later, we propose an evaluation framework for supervised algorithms preventing discrimination that is based on the introduced measure of resilience to perturbations. The framework makes the same assumption, although it is straightforward to extend it to the cases where this assumption is not met.

Proposed method for discrimination prevention.

We develop a novel supervised learning procedure that yields predictors resilient to direct discrimination and the inducement of indirect discrimination against the groups defined by the protected attribute. Note that direct discrimination could be prevented by simply removing the protected attribute from the training data. By doing so, however, we could unwillingly induce indirect discrimination, because the relevant attributes that are correlated with the protected attribute would be used in place of the protected attribute, if we applied standard supervised learning algorithms. We conclude that the dependence of the predictor on the relevant attributes shall not come from the relation of these attributes with the protected attribute, i.e., the inducement of indirect discrimination is not allowed.

Overall, the proposed resilient learning algorithm has two steps. In the first step, we train the model using all features, both protected and relevant , without any consideration of fairness. Most importantly, the protected attribute is available during the training, so the model does not use third variables as surrogates of the protected attribute, thus avoiding inducing indirect discrimination via . In this way, we estimate the true values of the parameters unaffected by perturbations that regulate the impact of the relevant variables on . Our estimates of these parameters are unbiased under the assumption that and belong to the same parametric family and there is no model misspecification. In the second step of our method, we eliminate the influence of the protected attribute. This is done by using the model trained with all features but imputing the value of the protected attribute from a weighting distribution that does not depend on any of the relevant features nor the dependent variable.

More specifically, in the case of frequentist decision theory, our method for discrimination prevention is as follows. In the first step, we obtain the full predictor by minimizing the corresponding expected value of our loss function, e.g., the empirical risk . In the second step, we eliminate the dependence on the protected variable , by replacing it with a counterfactual random variable with a mixing distribution independent from other variables, yielding which we refer to as an average predictor imputing the protected attribute. Methods preventing discrimination trade accuracy to fulfill fairness objectives Zafar2017Fairness (). Here, we search for the optimal mixing distribution, , that minimizes the empirical risk, , while all parameters of the full predictor are fixed, i.e.,


This optimization problem is convex for the quadratic loss function. Thus, the optimal weighting distribution can be found by applying disciplined convex programming with constraints ensuring that is a distribution, i.e., and for all  Diamond2016CVXPY (). Once the optimal mixing distribution is known, the optimal imputing predictor can be computed,


which is the solution of the proposed learning algorithm. Next, we measure its resilience to discriminatory perturbations of data.
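The two steps can be sketched for the simplest case, assuming a linear model, quadratic loss, and a binary protected attribute coded as 0/1. In that case the mixing distribution reduces to a single weight in [0, 1], the objective is a one-dimensional convex quadratic, and the constrained minimizer can be obtained by clipping, so this sketch uses plain NumPy instead of a convex solver. All function and variable names (`fit_full_predictor`, `optimal_mixing_weight`, `imputing_predictor`, `w1`) and the synthetic data are hypothetical.

```python
import numpy as np

def fit_full_predictor(X, z, y):
    """Step 1: ordinary least squares on an intercept, relevant features X, and protected z."""
    design = np.column_stack([np.ones(len(y)), X, z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, beta_1, ..., beta_k, gamma]

def optimal_mixing_weight(coef, X, y):
    """Step 2: weight w1 = w(z'=1) in [0, 1] minimizing squared loss with coef held fixed."""
    intercept, beta, gamma = coef[0], coef[1:-1], coef[-1]
    base = intercept + X @ beta              # full prediction with z' = 0
    if np.isclose(gamma, 0.0):
        return 0.5                           # every weight is optimal; pick one arbitrarily
    # mean((y - base - gamma * w1)^2) is a convex quadratic in w1, so the constrained
    # minimizer is the unconstrained minimizer clipped to [0, 1].
    return float(np.clip(np.mean(y - base) / gamma, 0.0, 1.0))

def imputing_predictor(coef, w1, X):
    """Average of the full predictor over z' ~ (1 - w1, w1); the actual z is never used."""
    intercept, beta, gamma = coef[0], coef[1:-1], coef[-1]
    return intercept + X @ beta + gamma * w1

# Hypothetical data: relevant features correlated with z, decisions directly perturbed by z.
rng = np.random.default_rng(0)
n = 2000
z = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, 2)) + 0.8 * z[:, None]
y = 1.0 + X @ np.array([2.0, -1.0]) + 1.5 * z + rng.normal(size=n)

coef = fit_full_predictor(X, z, y)
w1 = optimal_mixing_weight(coef, X, y)
y_hat = imputing_predictor(coef, w1, X)
```

Because the protected attribute is present in the first step, the coefficients of the relevant features are not inflated to absorb its effect. For protected attributes with more than two values, or for other loss functions, the second step would need a general convex solver such as the one cited in the text.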

Figure 1: A diagram of the proposed framework for evaluating methods preventing discrimination.

An evaluation framework estimating resilience to perturbations.

In real datasets, we typically have access only to the potentially perturbed decisions and we do not know . The definitions of discriminatory perturbations allow us to generate synthetic datasets perturbed with direct or indirect discrimination for which we know non-discriminatory ground truths. In this setting, we reason that learning algorithms that prevent discrimination should be resilient to such synthetic perturbations and should retrieve predictors that are close to the non-discriminatory ground truth. This is a challenging task, because the training method does not have access to that ground truth, but only to its perturbed version, . In our evaluation framework (Figure 1), we generate random datasets and from the same family of functions and measure the resilience to discriminatory perturbations of various learning algorithms preventing discrimination, including our proposed algorithm.

More specifically, we implement this evaluation framework for the case of generalized linear models as data generating processes. These models govern the expectation of the output variable to be , where is inverse of the link function. For instance, in the case of binary dependent variables, as in logistic regression, the function is a sigmoid function. Next, we define discrimination as a perturbation of that in general can be represented as . Under the assumption that and belong to the same family of models, we can represent the perturbations as

Using this framework, we measure the resilience to discriminatory perturbations of several state-of-the-art learning algorithms for discrimination prevention, which we briefly introduce next.

Figure 2: Average resilience of learning algorithms to non-discriminatory perturbations (the leftmost column) and discriminatory perturbation (the remaining three columns), for logistic regression (upper part) and linear regression models (lower part). On average, the proposed optimal imputing predictor (red bars) is more accurate w.r.t non-discriminatory ground truth than state-of-the-art methods addressing discrimination (orange bars). The error bars correspond to confidence intervals of the expectation, obtained via bootstrapping.
Figure 3: The cumulative distribution function of per-dataset resilience values divided by the resilience of the optimal imputing predictor computed for the same dataset. The vertical red lines correspond to the optimal imputing predictors.

State-of-the-art learning algorithms addressing discrimination.

Several methods have been proposed to train machine learning models that prevent a combination of disparate treatment and impact Pedreshi2008Discrimination (); Feldman2014Certifying (); Zafar2015Fairness (). These methods, however, induce indirect reverse discrimination, by negatively affecting the members of advantaged group Lipton2018Does (). Other studies propose novel mathematical notions of fairness, such as equalized opportunity, , and equalized odds, Donini2018Empirical (); Woodworth2017Learning (); Hardt2016Equality (); Pleiss2017Fairness (), or parity mistreatment, i.e.,  Zafar2017Fairness (). These methods at first look promising, but they too induce indirect discrimination (see Appendix A). Overall, many fairness objectives and their implementations have been proposed fairtutorial (), but recent works expose the impossibility of simultaneously satisfying multiple non-discriminatory objectives, such as equalized opportunity and parity mistreatment Chouldechova2017Fair (); Kleinberg2017Inherent (); Friedler2016impossibility (). In other words, there exist multiple supervised learning methods for preventing discrimination, but they are often mutually exclusive. There is a need to find ways to compare these methods with objective measures.

We evaluate several of these learning algorithms in the following section. For this evaluation, we select a diverse set of methods that aim to prevent discrimination through different objectives: disparate impact Zaremba2015Learning (), disparate mistreatment Zafar2015Fairness (); Zafar2017Fairness (), preferential fairness Zafar2017Parity (), equalized odds Hardt2016Equality (), a convex surrogate of equalized odds Donini2018Empirical (), and a causal database repair Salimi2019Capuchin (). In all cases but one, we use implementations of these algorithms as provided by the authors. All of these methods were implemented for the case of discrete decisions . We re-implemented one of these methods so that it works for the case of continuous  Zafar2015Fairness (); Zafar2017Fairness (). Details of the implementations of these methods are listed in Appendix B. The implementations of the models we used for the experiment Zafar2015Fairness (); Zafar2017Fairness (); Zafar2017Parity (); Hardt2016Equality (); Donini2018Empirical () are readily available online.

Figure 4: Resilience of learning algorithms to discrimination in a relevant attribute (left column) and missing relevant features that are affected (middle column) or not (right column) by discrimination.

Results from the evaluation framework.

We use the proposed evaluation framework to test whether different supervised learning algorithms are resilient to various dataset shifts.

First, we generate a synthetic set of samples from a standard multivariate normal distribution with a random correlation matrix Ghosh2003Behavior (). The variable is converted to a binary value with the sign function. Second, we generate the non-discriminatory ground truth decisions, either as draws from normal distribution with unit variance, , or 0-1 coin tosses, . Here, and is either an identity function or logistic function, respectively. The parameters for . The resulting set of samples constitute the dataset . Third, we sample the perturbed decisions, , which is the same family of distributions as . These perturbed decisions constitute the dataset that will be used by learning algorithms as a training dataset. These perturbations may or may not be discriminatory, depending on how they affect the expected perturbed outcomes:


1. no discrimination: ,
2. direct discr. via : ,
3. indirect discr. via : ,
4. indirect discr. via : ,

where coefficients , , and are drawn from . In the case of each perturbation, we receive a set of potentially discriminatory samples, .
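A minimal sketch of how such perturbed datasets could be generated in the linear case. The exact functional forms of the perturbations are not recoverable from the text here, so everything specific in this sketch is an assumption: the additive form of each perturbation, the fixed correlation matrix (standing in for a random one), the names `x1` (a relevant attribute that affects the unperturbed decisions) and `x2` (an attribute that does not), the fixed strengths `a_z`, `a_x1`, `a_x2`, and the reuse of one noise draw across datasets.

```python
import numpy as np

def make_datasets(n=1000, seed=0):
    """Return attributes, unperturbed decisions, and three assumed perturbed variants."""
    rng = np.random.default_rng(seed)
    # Correlated Gaussian draws; this fixed matrix is a stand-in for a random correlation matrix.
    corr = np.array([[1.0, 0.5, 0.3],
                     [0.5, 1.0, 0.2],
                     [0.3, 0.2, 1.0]])
    x1, x2, z_raw = rng.multivariate_normal(np.zeros(3), corr, size=n).T
    z = np.sign(z_raw)                        # binary protected attribute in {-1, +1}

    beta1 = 1.0                               # x1 is relevant; x2 has no impact on clean y
    noise = rng.normal(size=n)                # unit-variance noise, reused for simplicity
    mean_clean = beta1 * x1
    y_clean = mean_clean + noise

    a_z, a_x1, a_x2 = 0.8, 0.8, 0.8           # hypothetical perturbation strengths
    perturbed = {
        "direct_via_z": mean_clean + a_z * z + noise,
        "indirect_via_x1": mean_clean + a_x1 * x1 + noise,
        "indirect_via_x2": mean_clean + a_x2 * x2 + noise,
    }
    return x1, x2, z, y_clean, perturbed

x1, x2, z, y_clean, perturbed = make_datasets(n=200, seed=1)
```

A learning algorithm would be trained on one of the `perturbed` targets and then scored against `y_clean`, which it never sees during training.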

These perturbed datasets are then used to train a model, using various state-of-the-art supervised methods for discrimination prevention Hardt2016Equality (); Donini2018Empirical (); Zafar2015Fairness (); Zafar2017Fairness (); Zafar2017Parity () and the proposed learning algorithm (Equation 3). To compare the effectiveness of different methods preventing discrimination we measure the resilience, , computed for the squared loss function. For each learning algorithm, the procedure of data generation and training is repeated times, each time with a different correlation matrix and model parameters , , , . Then, we report the resilience averaged over these trials, , measured separately for each type of data perturbation (Figure 2).

When the learning algorithms preventing discrimination are applied to non-discriminatory data, they should fall back to a traditional learning algorithm to avoid biases in inference. However, most of the algorithms tested here do not achieve this result (the leftmost column in Figure 2), except for two algorithms: the game-theoretic method based on envy-freeness (“Zafar EF” in Figure 2) Zafar2017Parity () and our algorithm. The methods equalizing overall misclassification rate, false negative rate, or related measures (e.g., “Zafar OMR” in Figure 2) Zafar2017Fairnessa () introduce indirect discrimination (see Appendix A), as do the methods that leverage parity treatment and impact Pedreshi2008Discrimination (); Feldman2014Certifying (); Zafar2015Fairness (); Lipton2018Does ().

As expected, the resilience of all methods decreases when they are trained on the datasets with discriminatory perturbations (the three right columns in Figure 2). However, the proposed learning algorithm (the red bars in Figure 2) is more resilient to direct and indirect discriminatory perturbations than other supervised methods aiming to prevent discrimination Hardt2016Equality (); Donini2018Empirical (); Zafar2015Fairness (); Zafar2017Fairness (); Zafar2017Parity (). The second best method is consistently the game-theoretic method based on envy-freeness; however, this algorithm allows direct discrimination via .

Interestingly, our learning algorithm also has significantly larger resilience than the traditional learning algorithm (with or without the protected attribute; see blue bars in Figure 2), for every type of discriminatory perturbation, except for the indirect discrimination via . This result holds true both for the logistic regression model (upper part of Figure 2) and linear regression (lower part of Figure 2). For instance, for the linear regression model, the proposed method achieves maximal resilience to directly discriminatory perturbations. In the case of indirect discrimination via , the proposed algorithm has the same resilience as the traditional algorithm. It is impossible for a learning algorithm to address indirect discrimination via , if impacts the unavailable non-discriminatory . It is easier to address indirect discrimination via , i.e., the attribute that has no impact on , since it can be partially tackled by not including the variable in the training dataset, which results in increased resilience of the optimal over the traditional method (rightmost panels in Figure 2). In practice, it may be difficult to distinguish from , so instead a data science practitioner may choose to include in training every feature that improves the accuracy of the model w.r.t. potentially discriminatory . In this scenario, the optimal imputing predictor performs as well as traditional learning and the envy-free approach.

Beyond mean resilience, we also analyse per-dataset resilience values of each learning algorithm. Our results indicate that the optimal imputing predictor is the most resilient for nearly every generated dataset with direct discrimination and for over of datasets with indirect discrimination (Figure 3).

Perturbed and missing relevant attributes.

Apart from the perturbations of the output variable, , the perturbed dataset, , could also include the perturbations of some of the relevant attributes , in which case we refer to these relevant attributes as . For instance, Jim Crow laws required literacy to decide whether an individual has a voting right, while ethnic minorities had systematically limited access to education klarman2006jim (). If some is suspected to be affected by discriminatory perturbations, then we shall construct a respective model for these variables, in which they are treated as output variables. Then, one can obtain an estimator of based on by applying the proposed algorithm. Then, the computed optimal imputing predictor of can be used to also obtain an estimator of based on . We apply this procedure within our evaluation framework by modeling a perturbation of (see Appendix B). We measure the resilience of the learning algorithms to this perturbation, finding that the proposed learning algorithm prevents direct discrimination in and as a consequence in (left side of Figure 4), under a linear model of and either a logistic or linear model of . Irrespective of these results, usage of the optimal imputing predictor to correct may be debated, since is a historical attribute whose correction may fall outside the responsibility of the entity training a model to make decisions .

In real-world settings, relevant attributes are often unknown or their measurements are unavailable. We model this scenario by removing from the training dataset , while keeping it unchanged in . Then, we measure the resilience of learning algorithms to missing relevant attributes. We distinguish between the case where the missing relevant attribute is discriminatory, , and the case where the missing attribute is not affected by discrimination, ; in both cases there exists an association between that attribute and the protected variable. The proposed learning algorithm is more resilient to a missing discriminatory attribute than the other methods (middle column in Figure 4). When the missing attribute is non-discriminatory, the proposed algorithm performs slightly worse than the traditional algorithm (right column in Figure 4), which uses the protected attribute to obtain a more accurate predictor.

Discussion and limitations of the evaluation framework.

The proposed evaluation framework could have other specifications than the ones studied in this manuscript. First, the functional forms of the non-discriminatory ground truth model and its perturbations may influence the results of the proposed evaluation framework. In future work, these perturbations could be measured via experiments or observational studies to generate more realistic perturbations. Second, these results also depend on the distributions of all variables and the parameters of the used models, although our explorations show that the presented results are qualitatively robust. Overall, future research shall develop this evaluation framework to make it more comprehensive and realistic, potentially enabling it as a benchmark for novel training methods that are resilient to discriminatory perturbations of data.

Evaluation on real-world datasets.

Figure 5: Performance comparison of different fair models over two real-world datasets.

While we have shown the resilience of our method in the evaluation framework, it remains to be shown whether the performance on synthetic datasets translates to empirical performance in settings where we do not know the true data-generating process. To evaluate this, we conduct an empirical analysis of our method on real-world datasets and compare its performance with other algorithms addressing discrimination Hardt2016Equality (); Zafar2017Parity (); Zafar2017Fairness (); Zafar2015Fairness (); Donini2018Empirical (). We focus on the binary classification task on two real datasets commonly used for evaluation in the fairness literature: the COMPAS recidivism dataset Larson2016How () and the German Credit dataset Dua:2019 () (see Appendix C). For COMPAS, we use the binary labels for race as the protected attribute. Similarly, for the German Credit dataset, we use the gender of individuals as the protected attribute.

Since in real-world scenarios we typically do not have access to non-discriminatory ground truth, , as we did in the synthetic evaluation framework, we measure traditional accuracy and demographic disparity as a proxy of discrimination. Demographic disparity is defined as  Zafar2015Fairness (); Salimi2019Capuchin (). Note that even a perfectly non-discriminatory model can produce non-zero demographic disparity if underlying data is unfair, as we argued in the previous sections. While other measures have been proposed and used in the context of real-world applications Larson2016How (), such as disparity in false positive rate or positive predictive value (see Appendix C), these measures and other measures derived from the confusion matrix are determined for any given dataset by accuracy and demographic disparity (or any other such two measures for that matter) Narayanan2018Tutorial (); Chouldechova2017Fair (); Kleinberg2017Inherent (); Friedler2016impossibility (). In this experiment, we report the mean and standard deviation of these measures computed via 5-fold cross-validation Salimi2019Capuchin ().
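The formula for demographic disparity is not recoverable from the text here, so the sketch below assumes the common definition for a binary classifier and a binary protected attribute: the absolute difference between the two groups' rates of positive predictions. The function name `demographic_disparity` and the toy arrays are hypothetical.

```python
import numpy as np

def demographic_disparity(y_pred, z):
    """Absolute gap in positive-prediction rates between the two protected groups."""
    y_pred = np.asarray(y_pred)
    z = np.asarray(z)
    groups = np.unique(z)
    if len(groups) != 2:
        raise ValueError("expected a binary protected attribute")
    rate_a = np.mean(y_pred[z == groups[0]] == 1)
    rate_b = np.mean(y_pred[z == groups[1]] == 1)
    return abs(rate_a - rate_b)

# Hypothetical predictions for eight individuals, four per group.
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_disparity(y_pred, z))  # positive rates 0.75 vs 0.25, gap 0.5
```

Under this definition the measure depends only on the predictions and the protected attribute, which is why it can be computed without access to a non-discriminatory ground truth.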

As in the earlier experiment with synthetic data, we compare our method with existing supervised learning methods that account for fairness Hardt2016Equality (); Zafar2017Parity (); Zafar2017Fairness (); Zafar2015Fairness (); Donini2018Empirical (). We report the results in Figure 5. Our method achieves the lowest demographic disparity and the highest accuracy on the German Credit data. On the COMPAS data it also achieves the highest accuracy, while yielding medium demographic disparity. Methods that achieve lower disparity also have lower accuracy, e.g., “Zafar (2015)” for COMPAS. Results for other measures of disparity can be found in Figure 7 in Appendix C.


The presented results shed new light on the problem of discrimination prevention in supervised learning. First, we propose a formal definition of direct and indirect discrimination, inspired by research in the humanities Altman2016Discrimination (). This allows us to design a new evaluation framework for discrimination prevention in supervised learning by seeking methods that are resilient to various discriminatory perturbations. Second, we show that state-of-the-art methods addressing discrimination often return biased predictors when they are trained on datasets that are not affected by discrimination. Third, we propose a novel learning algorithm whose solution is an average predictor that imputes the protected attributes. This predictor is resilient to direct and indirect discriminatory perturbations and thus performs better than the state-of-the-art methods in the proposed evaluation framework.
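
To make the idea of "an average predictor imputing the protected attributes" concrete, the sketch below shows one possible reading: a model that takes the protected attribute as input is evaluated for every value of that attribute, and the predictions are averaged with weights taken from the attribute's marginal distribution in the training data. The weighting scheme, the function names, and the toy model coefficients are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def fit_marginal(z_train):
    """Empirical marginal distribution P(z) of the protected attribute."""
    counts = Counter(z_train)
    total = len(z_train)
    return {value: count / total for value, count in counts.items()}


def imputing_predictor(model, z_marginal):
    """Wrap model(x, z) so that predictions average over imputed values of z.

    The returned predictor never receives an individual's own z, so two
    individuals with the same x always get the same prediction.
    """
    def predict(x):
        return sum(p * model(x, z) for z, p in z_marginal.items())
    return predict


def full_model(x, z):
    """Hypothetical fitted model that uses the protected attribute directly."""
    return 2.0 * x + 1.5 * z


z_marginal = fit_marginal([0, 0, 0, 1])  # P(z=0) = 0.75, P(z=1) = 0.25
predict = imputing_predictor(full_model, z_marginal)
print(predict(1.0))
```

The design choice illustrated here is that the protected attribute is still available during training, which lets the model separate its effect from that of correlated attributes, while the final prediction does not depend on the individual's own value.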

It is important to understand how model misspecification influences these results: the two models used to generate the ground truth and to train on observations could differ, going beyond the assumption that they belong to the same parametric family. To this end, it would make sense to measure the resilience to discriminatory perturbations of optimal imputing predictors applied to universal approximators, such as deep neural networks.

The proposed learning algorithm performs better in the evaluation framework than the traditional learning algorithm when there is direct discrimination via the protected attribute, which justifies its use in circumstances where discrimination could have affected the training dataset. In scenarios where discrimination does not affect the training data, the proposed learning algorithm returns unbiased predictors, unless relevant attributes are missing. In real-world scenarios, it is often unclear whether all relevant attributes are taken into account; the proposed learning algorithm performs better in these scenarios than the traditional learning method if discrimination is present. By contrast, in the scenario where there is no discrimination and attributes are missing, the proposed method returns more biased models than the traditional learning algorithm. Overall, these results suggest that learning methods inhibiting discrimination are beneficial when evidence of discrimination is found in a society, or when data availability is high and discrimination is suspected. Once discrimination is no longer present and the availability of relevant attributes is limited, traditional learning methods return less biased estimators.

Appendix A

Additional synthetic experiments

Figure 6: Following the synthetic data proposed by Lipton2018Does (), we show how machine learning models under different fairness constraints Zafar2017Fairness () can return biased predictors even when the training data is non-discriminatory. We observe the following: i) none of the ML models (solid lines) recovers the true data-generating process (dashed line), and ii) each triangular region between a decision boundary (solid line) and the true model (dashed line) is where indirect discrimination happens. In particular, a group of male candidates is adversely affected by the model under the FPR objective (the center figure). Those candidates are rejected due to their short hair, i.e., a male-like characteristic.

Here, we present the results from a synthetic scenario proposed by Lipton2018Does (), modified slightly as follows. Using this example, we show how state-of-the-art learning algorithms addressing discrimination induce it even when the training data is non-discriminatory.

To this end, we sample 1000 observations from the data-generating process below:

This synthetic data represents a historical hiring process where the protected attribute is a candidate’s gender, . The data has the following properties: i) the hiring decision has been made based on work experience only, so the data is non-discriminatory; ii) since women on average have less work experience than men, men have historically been hired at a higher rate than women; and iii) women tend to have longer hair than men. Therefore, a model that uses hair length in its decision-making can induce indirect discrimination. Additionally, we introduce a modification to this synthetic data with respect to the original scenario Lipton2018Does (): the work experience of male candidates now follows a bimodal distribution (i.e., a mixture of two normal distributions) with one peak at 10 and another at 15. We trained a method for discrimination prevention Zafar2017Fairness () under three different fairness constraints: equalized misclassification rate, false positive rate (FPR), and false negative rate (FNR) (see footnote 7).
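
The properties listed above can be reproduced with a small generator. The sketch below is only an illustration under stated assumptions: the sample size of 1000 and the male peaks at 10 and 15 come from the text, while every other number (group proportions, variances, female experience mean, hair lengths, and the deterministic hiring threshold) is hypothetical and does not reproduce the original data-generating process.

```python
import random

HIRING_THRESHOLD = 11.0  # assumed cut-off on work experience


def sample_candidates(n=1000, seed=0):
    """Draw n synthetic candidates; hiring depends on work experience only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_woman = rng.random() < 0.5  # assumed balanced groups
        if is_woman:
            # Assumed: lower average experience than men.
            experience = rng.gauss(8.0, 2.0)
        else:
            # Bimodal mixture with peaks at 10 and 15 (from the text);
            # equal mixture weights and unit variance are assumptions.
            experience = rng.gauss(rng.choice([10.0, 15.0]), 1.0)
        # Assumed hair-length distributions: longer on average for women.
        hair_length = rng.gauss(30.0 if is_woman else 10.0, 5.0)
        # Non-discriminatory label: a function of experience alone.
        hired = int(experience > HIRING_THRESHOLD)
        rows.append({
            "woman": int(is_woman),
            "experience": experience,
            "hair_length": hair_length,
            "hired": hired,
        })
    return rows


def hiring_rate(rows, woman):
    """Share of hired candidates within one gender group."""
    group = [r for r in rows if r["woman"] == woman]
    return sum(r["hired"] for r in group) / len(group)


data = sample_candidates()
print("hiring rate, men:  ", round(hiring_rate(data, 0), 3))
print("hiring rate, women:", round(hiring_rate(data, 1), 3))
```

Even though the label ignores gender and hair length, the hiring rates differ by group, which is the situation in which a fairness-constrained model may start relying on hair length.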

Figure 6 demonstrates the indirect discrimination induced by models under various fairness objectives. We observe the following. First, none of the models recovers the true data-generating process (dashed line), even though the training data is non-discriminatory. Second, each triangular region between a decision boundary and the true model contains the candidates affected by indirect discrimination: the model under the FPR objective (the center figure) rejects male candidates due to their shorter hair (a male characteristic). Finally, we present the relative utility of various models Hardt2016Equality (); Donini2018Empirical (); Zafar2015Fairness (); Zafar2017Fairness (); Zafar2017Parity () on this synthetic data in Table 1.

Method Relative utility
Mixture (ours) 1.000
Zafar (2018) Zafar2017Parity () 0.997
Zafar (2017) Zafar2017Fairness () with FNR 0.838
Zafar (2017) Zafar2017Fairness () with Misclass. 0.777
Donini (2018) Donini2018Empirical () 0.634
Zafar (2017) Zafar2017Fairness () with FPR 0.570
Hardt (2016) Hardt2016Equality () 0.328
Zafar (2016) Zafar2015Fairness () 0.179
Table 1: Relative utility of various fairness models Hardt2016Equality (); Donini2018Empirical (); Zafar2015Fairness (); Zafar2017Fairness (); Zafar2017Parity () trained on the synthetic data.

Appendix B

Experiment Setup.

We report the performance of the model by Donini et al. Donini2018Empirical () using an SVM with a linear kernel. The regularization parameter was tuned via grid search with . We report the statistics of Zafar2017Fairness () when the model is optimized to equalize misclassification rates between the two groups. The implementations of the models we used for the experiment Zafar2015Fairness (); Zafar2017Fairness (); Zafar2017Parity (); Hardt2016Equality (); Donini2018Empirical () are readily available online (see footnotes 8, 9, and 10).

Modeling discrimination in the relevant attributes.

To account for the discrimination in a component of , we generate the dataset in a slightly different way than in the main evaluation framework. Namely, after drawing the correlation matrix , we modify it to ensure that , where is another coefficient and does not influence . From this non-discriminatory , we create its perturbed version, . Finally, the perturbed output variable is formed by using in place of , that is , whereas the non-discriminatory output variable is formed as usual, .

Appendix C

Experiment Setup.

As in the synthetic experiment, we report the performance of the model by Donini et al. Donini2018Empirical () using an SVM with a linear kernel. The regularization parameter was tuned via grid search with . We report the statistics of Zafar2017Fairness () when the model is optimized to equalize misclassification rates between the two groups.

COMPAS Dataset.

The ProPublica COMPAS dataset Larson2016How () contains the records of 7214 offenders in Broward County, Florida, in 2013 and 2014. COMPAS also provides a binary label for each record indicating whether the individual shows strong signs of recidivism. We use race (African American, Caucasian) as the sensitive feature. The dataset also includes information about the severity of the charge, the number of prior crimes, and the age of the individuals.

German Credit Dataset.

The German Credit dataset Dua:2019 () provides information about 1000 individuals and the corresponding binary labels describing them as creditworthy (= 1) or not (= 0). Each record includes 20 attributes with both continuous and categorical data. We use the gender of individuals as the sensitive feature. The dataset also includes information about the age, job type, and housing type of applicants, the total amounts in savings and checking accounts, the total credit amount, the duration in months, and the purpose of the loan application.

Metric Description and Definition
DD Demographic Disparity:
PPD Positive Predictive Disparity:
FPD False Positive Disparity:
Table 2: Summary of discrimination metrics used in our experiments
Figure 7: Additional experiment with real-world datasets using Positive Predictive Disparity (PPD) and False Positive Disparity (FPD). The lower these values, the more likely the models are fair.
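
For readers who want to compute the additional metrics, the sketch below gives one possible implementation. It assumes that FPD is the absolute difference in false positive rates between the two groups and that PPD is the absolute difference in positive predictive values; these are common conventions and are stated here as assumptions, and all function names are hypothetical.

```python
def _safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def group_rates(y_true, y_pred, z, group):
    """False positive rate and positive predictive value within one group."""
    tp = fp = tn = 0
    for t, p, g in zip(y_true, y_pred, z):
        if g != group:
            continue
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
    fpr = _safe_ratio(fp, fp + tn)  # share of true negatives predicted positive
    ppv = _safe_ratio(tp, tp + fp)  # share of positive predictions that are correct
    return fpr, ppv


def false_positive_disparity(y_true, y_pred, z):
    """Absolute gap in false positive rates between groups 0 and 1 (assumed)."""
    fpr0, _ = group_rates(y_true, y_pred, z, 0)
    fpr1, _ = group_rates(y_true, y_pred, z, 1)
    return abs(fpr0 - fpr1)


def positive_predictive_disparity(y_true, y_pred, z):
    """Absolute gap in positive predictive values between groups 0 and 1 (assumed)."""
    _, ppv0 = group_rates(y_true, y_pred, z, 0)
    _, ppv1 = group_rates(y_true, y_pred, z, 1)
    return abs(ppv0 - ppv1)


# Tiny hand-checkable example with four individuals per group.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
z      = [0, 0, 0, 0, 1, 1, 1, 1]

fpd = false_positive_disparity(y_true, y_pred, z)
ppd = positive_predictive_disparity(y_true, y_pred, z)
print("FPD:", fpd, "PPD:", round(ppd, 3))
```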


  1. Throughout the manuscript we use a shorthand notation for probability: , where are random variables and are their instances.
  2. Here, the expectations are over the corresponding true distributions.
  3. In the rare cases where is not intrinsically random and unpredictable, can be zero. In such cases, a small value could be added to the numerator and denominator of resilience, to prevent it from always being zero. However, these cases are typically not encountered in practice.
  7. We also trained a model while simultaneously optimizing both FPR and FNR; however, the learned model returned trivial predictions where all candidates are rejected.


  1. Title VII of the Civil Rights Act, 1964. 7, 42 U.S.C., 2000e et seq.
  2. The Fair Housing Act, 1968. 42 U.S.C.A., 3601-3631.
  3. European Union, 2000. Council Directive 2000/78/EC of 27 November 2000 establishing a general framework for equal treatment in employment and occupation. Official Journal L 303 , 02/12/2000 P. 0016 - 0022.
  4. European Union, 2000. Council Directive 2000/43/EC of 29 June 2000 implementing the principle of equal treatment between persons irrespective of racial or ethnic origin. Official Journal L 180 , 19/07/2000 P. 0022 - 0026.
  5. J. Larson, S. Mattu, L. Kirchner, and J. Angwin, “How We Analyzed the COMPAS Recidivism Algorithm,” Pro Publica, 2016.
  6. J. Dastin, “Amazon scraps secret AI recruiting tool that showed bias against women,” San Francisco, CA: Reuters. Retrieved Oct. 9, 2018.
  7. C. O’Neil, Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books, 2016.
  8. D. Pedreshi, S. Ruggieri, and F. Turini, “Discrimination-aware data mining,” in Proc. 14th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. - KDD ’08, (New York, New York, USA), p. 560, ACM Press, 2008.
  9. M. Feldman, S. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian, “Certifying and removing disparate impact,” pp. 259–268, 2014, arXiv:1412.3756.
  10. M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P. Gummadi, “Fairness Constraints: Mechanisms for Fair Classification,” Fairness, Accountability, Transpar. Mach. Learn., jul 2015, arXiv:1507.05259.
  11. M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P. Gummadi, “Fairness Constraints: Mechanisms for Fair Classification,” Artif. Intell. Stat., vol. 54, 2017, arXiv:1507.05259.
  12. M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi, “Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment,” in Proc. 26th Int. Conf. World Wide Web - WWW ’17, (New York, New York, USA), pp. 1171–1180, ACM Press, 2017, arXiv:1610.08452.
  13. M. Hardt, E. Price, and N. Srebro, “Equality of Opportunity in Supervised Learning,” in Adv. Neural Inf. Process. Syst. (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, eds.), pp. 3315–3323, Curran Associates, Inc., oct 2016, arXiv:1610.02413.
  14. M. B. Zafar, I. Valera, M. G. Rodriguez, K. P. Gummadi, and A. Weller, “From Parity to Preference-based Notions of Fairness in Classification,” in Adv. Neural Inf. Process. Syst. 30 (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), pp. 229–239, Curran Associates, Inc., 2017, arXiv:1707.00010.
  15. B. Woodworth, S. Gunasekar, M. I. Ohannessian, and N. Srebro, “Learning Non-Discriminatory Predictors,” no. 1, 2017, arXiv:1702.06081.
  16. G. Pleiss, M. Raghavan, F. Wu, J. Kleinberg, and K. Q. Weinberger, “On Fairness and Calibration,” in Adv. Neural Inf. Process. Syst. 30 (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), pp. 5680–5689, Curran Associates, Inc., 2017, arXiv:1709.02012.
  17. M. Donini, L. Oneto, S. Ben-David, J. Shawe-Taylor, and M. Pontil, “Empirical risk minimization under fairness constraints,” Adv. Neural Inf. Process. Syst., vol. 2018-Decem, no. NeurIPS, pp. 2791–2801, 2018.
  18. A. Datta, S. Sen, and Y. Zick, “Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems,” in 2016 IEEE Symp. Secur. Priv., pp. 598–617, IEEE, may 2016.
  19. P. Adler, C. Falk, S. A. Friedler, G. Rybeck, C. Scheidegger, B. Smith, and S. Venkatasubramanian, “Auditing Black-Box Models for Indirect Influence,” in 2016 IEEE 16th Int. Conf. Data Min., pp. 1–10, IEEE, dec 2016.
  20. N. Kilbertus, M. Rojas Carulla, G. Parascandolo, M. Hardt, D. Janzing, and B. Schölkopf, “Avoiding Discrimination through Causal Reasoning,” in Adv. Neural Inf. Process. Syst. 30, pp. 656–666, Curran Associates, Inc., jun 2017, arXiv:1706.02744.
  21. M. J. Kusner, J. R. Loftus, C. Russell, and R. Silva, “Counterfactual Fairness,” in Adv. Neural Inf. Process. Syst. 30 (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), pp. 4066–4076, Curran Associates, Inc., 2017, arXiv:1703.06856.
  22. B. Salimi, L. Rodriguez, B. Howe, and D. Suciu, “Capuchin: Causal Database Repair for Algorithmic Fairness,” feb 2019, arXiv:1902.08283.
  23. K. Ture, C. V. Hamilton, and S. Carmichael, Black power: The politics of liberation in America: With new afterwords by the authors. Vintage Books, 1968.
  24. A. Altman, “Discrimination,” in Stanford Encycl. Philos. (E. N. Zalta, ed.), Metaphysics Research Lab, Stanford University, 2016 ed., 2016.
  25. K. Lippert-Rasmussen, “The Badness of Discrimination,” vol. 9, no. 2, pp. 167–185, 2012.
  26. Y. Zenou and N. Boccard, “Racial discrimination and redlining in cities,” J. Urban Econ., vol. 48, no. 2, pp. 260–285, 2000.
  27. J. Hernandez, “Redlining revisited: mortgage lending patterns in Sacramento 1930–2004,” Int. J. Urban Reg. Res., vol. 33, no. 2, pp. 291–313, 2009.
  28. Ricci v. DeStefano 557 U.S. 557, Docket No. 07-1428, 2009. Supreme Court of the United States.
  29. S. Wachter, “Affinity Profiling and Discrimination by Association in Online Behavioural Advertising,” SSRN Electron. J., pp. 1–74, 2019.
  30. Z. C. Lipton, A. Chouldechova, and J. McAuley, “Does mitigating ML’s impact disparity require treatment disparity?,” Adv. Neural Inf. Process. Syst., vol. 2018-Decem, no. ML, pp. 8125–8135, 2018.
  31. J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, and F. Herrera, “A unifying view on dataset shift in classification,” Pattern Recognit., vol. 45, no. 1, pp. 521–530, 2012.
  32. J. Pearl, Causality: Models, Reasoning and Inference. Cambridge University Press, 2nd ed., 2009.
  33. M. Hernán and J. Robins, Causal Inference: What If. Boca Raton: Chapman & Hall/CRC, 2012.
  34. B. Fish, J. Kun, and Á. D. Lelkes, “A confidence-based approach for balancing fairness and accuracy,” 16th SIAM Int. Conf. Data Min. 2016, SDM 2016, pp. 144–152, 2016, arXiv:1601.05764.
  35. Griggs v. Duke Power Co. 401 U.S. 424, 91 S. Ct. 849; 28 L. Ed. 2d 158; 1971 U.S. LEXIS 134, 1971. Supreme Court of the United States.
  36. S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” J. Mach. Learn. Res., vol. 17, pp. 1–5, 2016.
  37. A. Narayanan, “Tutorial: 21 fairness definitions and their politics,” in Proc. the Conference on Fairness, Accountability, and Transparency, 2018.
  38. A. Chouldechova, “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments,” Big Data, vol. 5, pp. 153–163, jun 2017, arXiv:1703.00056.
  39. J. Kleinberg, S. Mullainathan, and M. Raghavan, “Inherent Trade-Offs in the Fair Determination of Risk Scores,” in Proc. Innov. Theor. Comput. Sci., 2017, arXiv:1609.05807.
  40. S. A. Friedler, C. Scheidegger, and S. Venkatasubramanian, “On the (im)possibility of fairness,” 2016, arXiv:1609.07236.
  41. W. Zaremba, T. Mikolov, A. Joulin, and R. Fergus, “Learning Simple Algorithms from Examples,” arXiv, vol. 48, pp. 1–12, 2015, arXiv:1511.07275.
  42. S. Ghosh and S. G. Henderson, “Behavior of the NORTA method for correlated random vector generation as the dimension increases,” ACM Trans. Model. Comput. Simul., vol. 13, pp. 276–294, jul 2003.
  43. M. J. Klarman, From Jim Crow to civil rights: The Supreme Court and the struggle for racial equality. Oxford University Press, 2006.
  44. D. Dua and C. Graff, “UCI machine learning repository,” 2017.
  45. A. Narayanan, “Tutorial: 21 fairness definitions and their politics,” 2018.