# Interpretable Fairness via Target Labels in Gaussian Process Models

###### Abstract

Addressing fairness in machine learning models has recently attracted a lot of attention, as it will ensure continued confidence of the general public in the deployment of machine learning systems. Here, we focus on mitigating harm of a biased system that offers much better quality outputs for certain groups than for others. We show that bias in the output can naturally be handled in Gaussian process classification (GPC) models by introducing a latent target output that will modulate the likelihood function. This simple formulation has several advantages: first, it is a unified framework for several notions of fairness (demographic parity, equalized odds, and equal opportunity); second, it allows encoding our knowledge of what the bias in outputs should be; and third, it can be solved by using off-the-shelf GPC packages.


## 1 Introduction

Algorithmic assessment methods are used for predicting human outcomes such as recruitment, bail decisions, mortgage approvals, and insurance premiums. This contributes, in theory, to a world with decreasing human biases. To achieve this, however, we need advanced machine learning models that are free of algorithmic biases (fair models), despite being written by humans and trained on historical, biased data. The harmful effects of a biased system can be in an allocative sense and/or in a representational sense (Barocas et al., 2017). A machine learning system causes allocative harm when it allocates an opportunity or a resource, e.g. being released on bail or getting a mortgage, more favorably to certain groups than to others. Even when a machine learning system is not used as an allocator, it can still cause representational harm whenever it reinforces the subordination of people sharing certain sensitive attributes such as race and gender. In the simplest setup, a machine learning system models the relationship between input data and output decision. Allocative harm concerns the output, while representational harm concerns the input features.

There is an active push within the machine learning community to define what fairness means, and to develop models that respect these definitions. One definition is labeled statistical or demographic parity, in which the allocator (the classifier) and the sensitive attributes must be statistically independent. When we have a binary sensitive attribute (married/not married) and a binary decision (yes/no on getting a mortgage), this independence criterion requires “yes” decisions for married individuals to occur at the same rate as “yes” decisions for unmarried individuals. Many models are available to enforce this statistical parity criterion (e.g. Calders et al., 2009; Kamishima et al., 2011; Zafar et al., 2017a); however, none of them gives humans control over the rate of “yes” decisions. This control would allow decision makers to trade off fairness and utility (e.g. classification accuracy), and to be accountable for that trade-off. Importantly, it would make the system transparent to others (e.g. customers, or data protection officers as per the EU’s General Data Protection Regulation). Another fairness criterion, equalized odds (Hardt et al., 2016), requires the classifier and the sensitive attributes to be independent, conditional on the actual label (yes/no on ability to pay back the mortgage). In our illustrative example, this reads as equal true positive rates (TPR) and false positive rates (FPR) across married and unmarried groups. We face the same question: should these TPRs and FPRs be set automatically by the algorithm, or manually by the decision makers? We advocate the latter, and propose the first method that enables this.

We propose a simple and interpretable method for incorporating fairness based on the framework of Gaussian process classifiers (GPCs). We assume the existence of an unbiased output decision (cf. allocative harm), which will modulate the likelihood term of the GPC. It is a simple approach because we can reuse advancements in automated variational inference (Krauth et al., 2016; Bonilla et al., 2016; Adler et al., 2018) for learning the fair GPC model and for handling large amounts of data. This reuse would not be possible if we followed the typical approach of encoding fairness in machine learning models via constrained optimization (e.g. Zafar et al., 2017b, a; Quadrianto and Sharmanska, 2017; Hardt et al., 2016). The most interesting aspect of the proposed model is that, by this procedure, the influence of the bias in output decisions becomes very interpretable: its role is to encode our knowledge of, or choice about, the bias in the output decisions. Fixing the “yes” rate in the unbiased output decision space to be equal across the two groups delivers statistical parity. Furthermore, these “yes” rates are free parameters in our model which need to be set by us, humans. Our model also has free parameters that can be set to deliver the equalized odds criterion; interestingly, once this criterion is chosen, the free parameter corresponding to the statistical parity criterion is no longer free (Kleinberg et al., 2016). Our experiments on multiple datasets show that this fairness procedure leads not just to more interpretable models, but also to a better trade-off between fairness and utility.

##### Related work

There are several ways to enforce fairness in machine learning models: as a pre-processing step (e.g. Kusner et al., 2017; Louizos et al., 2015; Lum and Johndrow, 2016; Zemel et al., 2013), as a post-processing step (e.g. Feldman et al., 2015; Hardt et al., 2016), or as a constraint during the learning phase (e.g. Calders et al., 2009; Zafar et al., 2017b, a; Quadrianto and Sharmanska, 2017; Woodworth et al., 2017). Our method enforces fairness at learning time but, unlike other approaches, does not cast fair learning as a constrained optimization problem. Constrained optimization requires a customized optimization procedure: in Goh et al. (2016) and Zafar et al. (2017b, a), suitable majorization-minimization/convex-concave procedures (Lanckriet and Sriperumbudur, 2009) were derived. Our proposed method can instead be solved by off-the-shelf GPC packages, which only need conditional likelihood evaluation as a black-box function (e.g. Dezfouli and Bonilla, 2015; Matthews et al., 2017). Many recently proposed methods (Quadrianto and Sharmanska, 2017; Madras et al., 2018) attempt to provide a unified framework that can be instantiated for either the statistical parity or the equalized odds criterion. Our method also provides a unified framework. Furthermore, the setting of free parameters in our model transparently highlights the mutual exclusivity (Kleinberg et al., 2016; Chouldechova, 2017) of statistical parity and equalized odds.

## 2 Target labels

When considering fairness, we are faced with a situation where we do not want our algorithm to simply learn the training labels $y$, but rather want it to take other factors into account when deciding what to learn. This can be conceptualized by introducing a new virtual label which we call the target label $\bar{y}$. The algorithm then tries to predict this target label $\bar{y}$ instead of the training label $y$. The problem with most datasets is that, even if we know that the training label $y$ is misleading somewhere, we do not know how to construct the desired target labels $\bar{y}$. In the following, we discuss a very specific scenario where we have some general statistical information about $\bar{y}$.

First, we make the assumption that $\bar{y}$ only depends on $y$ and a sensitive attribute $s$, which takes on discrete values. Second, we assume that the conditional probabilities $P(\bar{y} \mid y, s)$ are known for all values of $y$ and $s$. We will show how to compute these conditional probabilities for any dataset when the goal is to enforce Demographic Parity or Equality of Opportunity in Section 3.

We note here that the assumptions specify that all the wrongness (bias) of $y$ is captured by $s$. In particular, it does not depend on the input $x$; that is, $P(\bar{y} \mid y, s, x) = P(\bar{y} \mid y, s)$. This assumption will not be satisfied when certain inputs $x$ are more likely than others with the same sensitive attribute to have differing $y$ and $\bar{y}$.

We made no assumptions here about any bias in the input values $x$. We will come back to this topic when we consider whether or not to treat $s$ as part of the input.

### 2.1 Changing the target labels in Gaussian Processes

A Gaussian process is a distribution over an infinite collection of random variables, such that the marginal distribution of any finite subset of variables is a multivariate Gaussian (Rasmussen and Williams, 2006). We consider a function $f$ that maps an input space $\mathcal{X}$ to the real numbers as an infinite collection of random variables, where the values of $f$ evaluated at an arbitrary set of points $x_1, \ldots, x_n$ with $x_i \in \mathcal{X}$ have an $n$-variate Gaussian distribution. Because of this, a Gaussian process defines a distribution over functions $f$; that is, a sample from a Gaussian process is a function. To fully characterize a Gaussian process, we need to define its mean function $m(x)$ and its covariance function $k(x, x')$.
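As a minimal sketch of this definition (our own illustrative NumPy code, not part of the paper), a zero-mean GP with a squared-exponential covariance can be sampled at any finite set of points by drawing from the corresponding $n$-variate Gaussian:

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x') = v * exp(-(x - x')^2 / (2 l^2))."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)          # a finite set of 50 input points
K = se_kernel(x, x)                     # 50 x 50 covariance matrix
m = np.zeros_like(x)                    # zero mean function
# One draw from the GP: an n-variate Gaussian sample, i.e. one function
# evaluated at the chosen points (jitter added for numerical stability).
f_sample = rng.multivariate_normal(m, K + 1e-8 * np.eye(len(x)))
```

A second draw from the same distribution would give a different function, which is what "a distribution over functions" means in practice.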

Gaussian Process (GP) models are a versatile and useful tool for solving a variety of machine learning problems including classification. For a review of the Gaussian process classification (GPC) model, see for example Rasmussen and Williams (2006). The important thing to note is that posterior inference in the GPC model is very challenging due to nonlinear likelihood functions.

In Gaussian processes based on variational inference (Jordan et al., 1999), we use a variational distribution $q(f)$ to approximate the exact posterior of the Gaussian process. The posterior is given by

$$p(f \mid X, s, y) = \frac{P(y \mid f, s)\, p(f \mid X, s)}{P(y \mid X, s)} \qquad (1)$$

where $f$ is the latent function, $y$ the training labels, $X$ the inputs, and $s$ the sensitive attribute. This is equivalent to treating $s$ as part of the input.

In variational inference, the only quantities needed for the computation are the likelihood $P(y \mid f, s)$ and the prior $p(f \mid X, s)$. We consider the likelihood first; the prior is discussed in more detail in Section 2.2. To keep the calculations short, we will assume that the sensitive attribute is binary, that is, $s \in \{0, 1\}$. As before, $y$ and $\bar{y}$ are also binary. This is also the structure in our experiments.

We expand the likelihood with $\bar{y}$ as this is what we want to predict:

$$P(y_i \mid f_i, s_i) = \sum_{\bar{y} \in \{0,1\}} P(y_i \mid \bar{y}, s_i, f_i)\, P(\bar{y} \mid f_i, s_i) = \sum_{\bar{y} \in \{0,1\}} P(y_i \mid \bar{y}, s_i)\, P(\bar{y} \mid f_i) \qquad (2)$$

In the last step we have, in addition to the previously stated assumptions, made use of the fact that in a GP model the likelihood does not depend on the input (the label is conditionally independent of the input given the latent function). We can see now that the initial assumption ensured that the collective likelihood $P(y_i \mid f_i, s_i)$ does not depend on the input either. The term $P(\bar{y} \mid f_i)$ has to be understood as the ordinary likelihood of the GPC model. This is usually the logistic function or the probit function for binary classification, and the softmax function for multi-class problems.
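To make the expansion in Eq (2) concrete, here is a minimal sketch (our own illustrative code, not the paper's implementation) of the modulated likelihood with a logistic link, where the four debiasing parameters of Eqs (3)-(6) are stored in a `debias[s][ybar]` table holding $P(y = 1 \mid \bar{y}, s)$ (the container layout is our own choice):

```python
import numpy as np

def sigmoid(f):
    """P(ybar = 1 | f): the ordinary GPC likelihood (logistic link)."""
    return 1.0 / (1.0 + np.exp(-f))

def modulated_likelihood(y, f, s, debias):
    """P(y | f, s) = sum over ybar of P(y | ybar, s) * P(ybar | f), as in Eq (2).

    `debias[s][ybar]` holds P(y = 1 | ybar, s), the four debiasing
    parameters of Eqs (3)-(6).
    """
    p_ybar1 = sigmoid(f)
    lik = 0.0
    for ybar, p_ybar in ((1, p_ybar1), (0, 1.0 - p_ybar1)):
        p_y1 = debias[s][ybar]                       # P(y = 1 | ybar, s)
        p_y_given_ybar = p_y1 if y == 1 else 1.0 - p_y1
        lik += p_y_given_ybar * p_ybar
    return lik

# With identity debiasing (y = ybar almost surely), Eq (2) reduces to the
# ordinary logistic likelihood of a standard GPC model.
debias_identity = {0: {1: 1.0, 0: 0.0}, 1: {1: 1.0, 0: 0.0}}
```

The identity case makes the role of the debiasing parameters visible: any deviation from it redistributes probability mass between the training label $y$ and the target label $\bar{y}$.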

We also have the following four free parameters, which describe the relationship between the training labels and the target labels:

$$P(y = 1 \mid \bar{y} = 1, s = 0) \qquad (3)$$

$$P(y = 1 \mid \bar{y} = 0, s = 0) \qquad (4)$$

$$P(y = 1 \mid \bar{y} = 1, s = 1) \qquad (5)$$

$$P(y = 1 \mid \bar{y} = 0, s = 1) \qquad (6)$$

We call them debiasing parameters. In Section 3, we will compute values for them.

When training with the given likelihood, the latent function $f$ will be shaped to predict the desired target label $\bar{y}$, which is what our goal was. The predictive distribution is consequently given in terms of $\bar{y}$, not $y$:

$$P(\bar{y}_* \mid x_*, s_*, \mathcal{D}) = \int P(\bar{y}_* \mid f_*)\, p(f_* \mid x_*, s_*, \mathcal{D})\, df_* \qquad (7)$$

where $\mathcal{D} = \{(x_i, y_i, s_i)\}_{i=1}^{n}$ is the training set over $n$ data points with the labels $y_i$, the inputs $x_i$ and the sensitive attributes $s_i$. We note that the term $P(\bar{y}_* \mid f_*)$ is the ordinary likelihood (e.g. logistic function, softmax). Eq (2) ensures that $f$ is predicting $\bar{y}$ and not $y$, because during training the output of $f$ is related to the observed labels $y$ only through $P(y \mid \bar{y}, s)$.

### 2.2 Using the sensitive variable for training only

The formulation in Section 2.1 assumed that the sensitive attribute $s$ is known at test time, which is often not the case. However, as Eq (7) indicates, $s$ only appears in the conditional prior for $f_*$ in the predictive distribution. Thus, the predictive distribution will itself become independent of $s$ if the GP prior is independent of $s$.

With a prior that is independent of $s$ (i.e. $p(f \mid X, s) = p(f \mid X)$), the posterior from Eq (1) for training becomes

$$p(f \mid X, s, y) = \frac{P(y \mid f, s)\, p(f \mid X)}{P(y \mid X, s)} \qquad (8)$$

where we used the assumption from Fig. 1 that the latent function $f$ contains all the information about the input that is needed to make predictions. This conditional independence assumption is generally made in GP models; see also the graphical model representation of a GP in Fig. 2.3 of Rasmussen and Williams (2006). The likelihood is expanded in the same way as before in order to target $\bar{y}$ instead of $y$.

Making the prior independent of $s$ restricts the expressiveness of the model, because similar inputs will be treated similarly regardless of the corresponding sensitive attribute. As the experiments show (Section 4), this hurts the model's ability to enforce fairness. However, not using $s$ at prediction time can be desirable in order to avoid disparate treatment (Zafar et al., 2017a).

## 3 Realization of concrete fairness constraints

### 3.1 Targeting an acceptance rate

Before we consider concrete values, we give a quick overview of the intuition behind the debiasing parameters. Let $s = 0$ refer to the disadvantaged group. For this group, we want to make more positive predictions than the dataset labels suggest. $\bar{y}$ is supposed to be our fair label. Thus, in order to make more positive predictions, some of the labels $y = 0$ should be associated with $\bar{y} = 1$. However, we do not know which. So, if our model predicts $\bar{y} = 1$ (high $f$) while the dataset label is $y = 0$, then we should allow for the possibility that this is actually correct. That is, $P(y = 0 \mid \bar{y} = 1, s = 0)$ should not be $0$. If we choose, for example, $P(y = 0 \mid \bar{y} = 1, s = 0) = 0.3$, then that means that 30% of positive virtual labels $\bar{y} = 1$ may correspond to negative dataset labels $y = 0$. This way we can overall have more $\bar{y} = 1$ than $y = 1$. On the other hand, predicting $\bar{y} = 0$ when $y = 1$ holds will always be treated as being incorrect: $P(y = 1 \mid \bar{y} = 0, s = 0) = 0$. This is because we do not want any additional negative labels.

For the advantaged group $s = 1$, we have the exact opposite situation. If anything, we have too many positive labels. (Or the number of positive labels is exactly as it should be, in which case we can just set $\bar{y} = y$ for all data points with $s = 1$.) So, if our model predicts $\bar{y} = 0$ (low $f$) while the dataset label is $y = 1$, then we should again allow for the possibility that this is actually correct. That is, $P(y = 1 \mid \bar{y} = 0, s = 1)$ should not be $0$. On the other hand, $P(y = 0 \mid \bar{y} = 1, s = 1)$ should be $0$ because we do not want additional positive labels for $s = 1$.

We now give concrete values for the debiasing parameters in Eqs (3)-(6). In the following, the derivation applies to either group, $s = 0$ or $s = 1$. We first apply Bayes' rule to the debiasing parameters:

$$P(y = 1 \mid \bar{y} = 1, s) = \frac{P(\bar{y} = 1 \mid y = 1, s)\, P(y = 1 \mid s)}{P(\bar{y} = 1 \mid s)} \qquad (9)$$

$$P(y = 1 \mid \bar{y} = 0, s) = \frac{\bigl(1 - P(\bar{y} = 1 \mid y = 1, s)\bigr)\, P(y = 1 \mid s)}{1 - P(\bar{y} = 1 \mid s)} \qquad (10)$$

Here, $P(y = 1 \mid s)$ is the acceptance rate in group $s$ under the biased training labels. We call it the biased acceptance rate. This quantity can be estimated from the training data:

$$P(y = 1 \mid s) \approx \frac{|\{i \mid y_i = 1 \wedge s_i = s\}|}{|\{i \mid s_i = s\}|} \qquad (11)$$

The term $P(\bar{y} = 1 \mid s)$ is the target acceptance rate. It cannot be estimated from the data; it has to be known about the unbiased data. We will later discuss strategies for choosing it. The target acceptance rate is related to the other parameters in the following way:

$$P(\bar{y} = 1 \mid s) = P(\bar{y} = 1 \mid y = 1, s)\, P(y = 1 \mid s) + P(\bar{y} = 1 \mid y = 0, s)\, P(y = 0 \mid s) \qquad (12)$$

With a given target acceptance rate and a given biased acceptance rate, this constraint still leaves two degrees of freedom, which can be represented as $P(\bar{y} = 1 \mid y = 1, s)$ for $s \in \{0, 1\}$. If we consider $y$ the label and $\bar{y}$ the prediction, then this quantity would be the true positive rate. As such, we call these two degrees of freedom the biased TPRs.

The biased TPRs strongly affect the accuracy with respect to the biased labels. For example, $P(\bar{y} = 1 \mid y = 1, s) = P(\bar{y} = 1 \mid y = 0, s)$ makes $\bar{y}$ independent of $y$ and thus leads to predictions that look random with respect to the biased labels. In order to minimize the drop in accuracy w.r.t. the biased labels, we need to maximize the biased TPRs.

For $s = 0$ (where the target acceptance rate exceeds the biased one), the biased TPR $P(\bar{y} = 1 \mid y = 1, s = 0)$ can be set to 1. In the case of $s = 1$, it follows from Eq (12) that $P(\bar{y} = 1 \mid y = 1, s = 1) < 1$. Here we set $P(\bar{y} = 0 \mid y = 0, s = 1)$ (the biased TNR) to 1 and compute the biased TPR via the constraint in Eq (12), which leads to the maximum possible value for the biased TPR.
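The procedure just described can be sketched as follows (a sketch under our reading of Eqs (9)-(12); the helper name `debias_params` is our own). Given a group's biased acceptance rate and the chosen target acceptance rate, it sets the biased TPR or the biased TNR to 1 as appropriate, solves the constraint of Eq (12), and then applies Bayes' rule per Eqs (9) and (10):

```python
def debias_params(biased_rate, target_rate):
    """Debiasing parameters P(y=1 | ybar, s) for one group (illustrative helper).

    biased_rate: P(y=1 | s), estimated from the data via Eq (11).
    target_rate: P(ybar=1 | s), chosen by the user.
    Returns (P(y=1 | ybar=1, s), P(y=1 | ybar=0, s)).
    """
    if target_rate >= biased_rate:
        # More positives wanted: keep all positive labels (biased TPR = 1)
        # and solve Eq (12) for P(ybar=1 | y=0, s).
        tpr_biased = 1.0
        fpr_biased = (target_rate - biased_rate) / (1.0 - biased_rate)
    else:
        # Fewer positives wanted: keep all negative labels (biased TNR = 1,
        # i.e. P(ybar=1 | y=0, s) = 0) and solve Eq (12) for the biased TPR.
        fpr_biased = 0.0
        tpr_biased = target_rate / biased_rate
    # Consistency with Eq (12):
    # target_rate == tpr_biased * biased_rate + fpr_biased * (1 - biased_rate)
    # Bayes' rule, Eqs (9) and (10).
    p_y1_ybar1 = tpr_biased * biased_rate / target_rate
    p_y1_ybar0 = (1.0 - tpr_biased) * biased_rate / (1.0 - target_rate)
    return p_y1_ybar1, p_y1_ybar0
```

For a group whose target rate lies below its biased rate, this yields $P(y = 1 \mid \bar{y} = 1, s) = 1$, matching the intuition above that no additional positive labels are introduced for the advantaged group.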

#### 3.1.1 Demographic Parity

A simple strategy for the target rate is Demographic Parity, where we want to enforce

$$P(\bar{y} = 1 \mid s = 0) = P(\bar{y} = 1 \mid s = 1) \qquad (13)$$

This means we choose only one target rate for both groups. We call this rate $\mathrm{PR}_t$.

When choosing the target rate, we again take into account the effect on the accuracy with respect to the biased labels. A target rate $P(\bar{y} = 1 \mid s)$ that differs from the biased acceptance rate $P(y = 1 \mid s)$ necessarily implies that, for some inputs $x$, $\bar{y} \neq y$. To keep the drop in accuracy to a minimum, $\mathrm{PR}_t$ has to lie between the biased acceptance rates of the two groups.

Natural options are

$$\mathrm{PR}_t^{\min} = \min(\mathrm{PR}_{s=0}, \mathrm{PR}_{s=1}) \qquad (14)$$

$$\mathrm{PR}_t^{\max} = \max(\mathrm{PR}_{s=0}, \mathrm{PR}_{s=1}) \qquad (15)$$

$$\mathrm{PR}_t^{\mathrm{mean}} = \tfrac{1}{2}\,(\mathrm{PR}_{s=0} + \mathrm{PR}_{s=1}) \qquad (16)$$

where $\mathrm{PR}_s$ is the estimated biased acceptance rate in group $s$ (Eq (11)). We find that using the mean for the target ($\mathrm{PR}_t^{\mathrm{mean}}$) is a safer choice than $\mathrm{PR}_t^{\min}$ and $\mathrm{PR}_t^{\max}$: it is easier for the targeting mechanism to move both $\mathrm{PR}_{s=0}$ and $\mathrm{PR}_{s=1}$ to the mean than to move one of them all the way to the other. We confirmed this in the experiments.

### 3.2 Targeting a true positive rate

Whereas for Demographic Parity we enforce a constraint on the acceptance rate $P(\bar{y} = 1 \mid s)$, for Equality of Opportunity the constraint is on the TPR (true positive rate) $P(\bar{y} = 1 \mid y = 1, s)$. By Eqs (9) and (10), the debiasing parameters are then fully determined by the target TPR, the target TNR $P(\bar{y} = 0 \mid y = 0, s)$, and the biased acceptance rate $P(y = 1 \mid s)$, the last of which can be estimated from the training data. The acceptance rate of the target labels, $P(\bar{y} = 1 \mid s)$, is then fixed by Eq (12) and can no longer be chosen freely, which highlights the mutual exclusivity of statistical parity and equality of opportunity (Kleinberg et al., 2016; Chouldechova, 2017).

Equality of Opportunity demands that

$$P(\hat{y} = 1 \mid y = 1, s = 0) = P(\hat{y} = 1 \mid y = 1, s = 1) \qquad (17)$$

where $\hat{y}$ is the prediction of the classifier. Assuming that the GPC model perfectly learns the target labels $\bar{y}$, we can fulfill this demand by enforcing (17) on $\bar{y}$ instead of $\hat{y}$.

At first glance it seems desirable to set both the target TNR (true negative rate) and the target TPR to 1, because any value lower than 1 will directly reduce the accuracy. However, when target TNRs and target TPRs are all set to 1, then the debiasing parameters reduce to the identity mapping, which is equivalent to $\bar{y} = y$ for all data points. This is just the regular GPC model.

This problem can be understood in the following way. A perfect predictor would predict all labels correctly, that is, $\hat{y} = y$ everywhere. This automatically fulfills Equality of Opportunity. Generally, however, our predictors are not perfect, so they make some classification error. What Equality of Opportunity demands is that this classification error is the same for all specified groups. By setting the target TPR to a lower value that is the same for all groups, we purposefully sacrifice some accuracy to make the errors the same. This sacrifice should be as small as possible.

Choosing values for the TPR target rate (and the TNR target rate) is significantly harder than in the case of targeting an acceptance rate, because TPR and TNR are inextricably linked to the classifier that is used. We additionally found that the achieved TPR does not just depend on the target TPR but also on the target TNR. More specifically, targeting a lower TNR makes it easier to achieve a higher TPR. This is not surprising, because lowering the TNR will result in more positive predictions ($\bar{y} = 1$), which means that the general threshold for a positive prediction is lowered. This lowered threshold makes it more likely that a given false negative prediction is flipped, i.e., becomes a true positive prediction. A decrease in false negatives coupled with an increase in true positives will increase the TPR. We investigate this trade-off between TPR and TNR in the experiments (e.g. Fig. 8).

We use the following method to find a good value for the target TNR. We train a regular (unfair) GPC model on the training set and evaluate it on the test set. From this evaluation we compute the achievable TNR and TPR separately for each group. These are the TNRs and TPRs that the GPC model can achieve (the higher the better). Of the two achievable TNRs (one for $s = 0$, one for $s = 1$), we take the minimum as the target TNR for all groups.
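This selection procedure can be sketched as follows (illustrative code with our own naming, operating on the baseline model's test-set predictions):

```python
import numpy as np

def achievable_rates(y_true, y_pred, s):
    """Per-group TPR and TNR of a baseline (unfair) classifier on the test set."""
    rates = {}
    for group in (0, 1):
        pos = (s == group) & (y_true == 1)
        neg = (s == group) & (y_true == 0)
        rates[group] = {"TPR": float(np.mean(y_pred[pos] == 1)),
                        "TNR": float(np.mean(y_pred[neg] == 0))}
    return rates

def target_tnr(rates):
    """The shared target TNR: the minimum of the two achievable TNRs."""
    return min(rates[0]["TNR"], rates[1]["TNR"])
```

Taking the minimum means neither group is asked to exceed a TNR the baseline model could not reach for it.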

We set the TNR to be the same for both groups so that the effect on the TPR is the same as well. This makes it easier to achieve equal TPRs. Technically, this enforces the fairness criterion Equalized Odds, in which both TPR and TNR must be the same across groups:

$$P(\hat{y} = 1 \mid y = 1, s = 0) = P(\hat{y} = 1 \mid y = 1, s = 1) \quad \text{and} \quad P(\hat{y} = 1 \mid y = 0, s = 0) = P(\hat{y} = 1 \mid y = 0, s = 1) \qquad (18)$$

However, Equalized Odds implies Equality of Opportunity.

For the target TNR, we take here the minimum of the two observed TNRs. This is not strictly necessary; it is a choice we made in order to be able to reach higher TPRs. For choosing the target TPR, an approach equivalent to the choice of the TNR is possible: we could choose the maximum of the two observed TPRs, after choosing the minimum for the TNR. In the experiments, we do not restrict ourselves to this choice, but rather explore different values for the target TPR.

### 3.3 Implementation of the GP model

Recently, there have been several attempts to develop a black-box inference technique for Gaussian process models such as the work of Dezfouli and Bonilla (2015); Hensman et al. (2015); Hernández-Lobato et al. (2016). We use the variational method of Nguyen and Bonilla (2014) due to its statistical efficiency, which means that, in order to approximate a posterior distribution with a mixture of Gaussians as the family of approximating distributions, it only requires expectations over univariate Gaussian distributions regardless of the likelihood of latent variables given observed data. Furthermore, Nguyen and Bonilla (2014) showed that their automated variational inference method can provide posterior distributions that are practically indistinguishable from those obtained by Elliptical Slice Sampling (ESS) Murray et al. (2010), while running orders of magnitude faster. Recently, Dezfouli and Bonilla (2015) showed that the statistical efficiency of Nguyen and Bonilla (2014) is retained while incorporating sparse approximations to GPs via the inducing point approach to scale up the inference technique further.

In the variational inference framework (Jordan et al., 1999; Bonilla et al., 2016), all the parameters, including the hyperparameters of the covariance function, the variational parameters and the likelihood parameters, are learned by maximizing the evidence lower bound (ELBO), a lower bound on the marginal likelihood:

$$\mathcal{L}_{\mathrm{elbo}} = \underbrace{\mathbb{E}_{q(f)}[-\log q(f)]}_{\text{entropy}} + \underbrace{\mathbb{E}_{q(f)}[\log p(f \mid X)]}_{\text{negative cross-entropy}} + \underbrace{\mathbb{E}_{q(f)}[\log P(y \mid f, s)]}_{\text{expected log likelihood}} \qquad (19)$$

Here, the three terms are the entropy of the variational distribution, the (negative) cross-entropy between the variational distribution and the prior, and the expected log likelihood, respectively. Since the first two terms together correspond to the (negative) KL-divergence between the approximate posterior and the prior, they do not rely on the observed labels or on the sensitive attributes. Therefore, in the black-box inference framework, we only need to provide the evaluation of the expected log likelihood, which, for our model, is described in Eq (2).
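As an illustration of why only univariate Gaussian expectations are needed, the expected log likelihood for a single data point can be approximated by Gauss-Hermite quadrature against the marginal $q(f_i) = \mathcal{N}(\mu_i, \sigma_i^2)$, treating the (log-)likelihood as a black box. This is a sketch of the general idea, not the implementation of Nguyen and Bonilla (2014):

```python
import numpy as np

def expected_log_lik(mean, var, log_lik, n_points=20):
    """E_{q(f)}[log_lik(f)] for q(f) = N(mean, var), via Gauss-Hermite quadrature.

    `log_lik` is the black-box (log-)likelihood of one data point, e.g. the
    modulated likelihood of Eq (2) with (y, s) held fixed.
    """
    # Probabilists' Hermite rule: integral of g(x) exp(-x^2/2) dx ~ sum w_i g(x_i)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_points)
    f = mean + np.sqrt(var) * nodes   # change of variables f = mean + sqrt(var) * x
    return np.sum(weights * log_lik(f)) / np.sqrt(2.0 * np.pi)
```

Because the quadrature only ever evaluates `log_lik` pointwise, swapping the ordinary likelihood for the modulated one of Eq (2) requires no change to the inference code.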

## 4 Experiments

### 4.1 Data

We compare the performance of our fair GP model with other existing models based on two real-world datasets. These datasets have been previously considered in the fairness-aware machine learning literature.

The first dataset is the Adult Income dataset (Dheeru and Karra Taniskidou, 2017). It contains 33,561 data points with census information from US citizens. The labels indicate whether the individual earns more ($y = 1$) or less ($y = 0$) than \$50,000 per year. We use the dataset with race and gender as the sensitive attributes. The input dimension without sensitive attributes is 12. For the experiments, we removed 2,399 instances with missing data and used only the training data, which we split randomly for each trial run.

The second dataset is the ProPublica recidivism dataset. It contains data from 6,167 individuals who were arrested. The data was collected for the COMPAS risk assessment tool (Angwin et al., 2016). The task is to predict whether a person was rearrested within two years ($y = 1$ if they were rearrested, $y = 0$ otherwise). We again use the dataset with race and gender as the sensitive attributes.

### 4.2 Method

We evaluate two versions of our fair GP: FairGPparity, which enforces Demographic Parity, and FairGPopp, which enforces Equality of Opportunity. Both are used in two ways: using $s$ only for training and not for predictions (the default case), and using $s$ during training and for predictions (marked by a * after the name). We also train a baseline GP model that does not take fairness into account and does not use $s$ as input.

The fair GP models and the baseline GP model are all based on variational inference and use the same settings. The batch size is 500 and the number of inducing inputs is 500 as well. We are using a squared exponential (SE) kernel with automatic relevance determination (ARD). We optimize the hyperparameters and the variational parameters with the Adam method Kingma and Ba (2014) with the default parameters. We use the full covariance matrix for the Gaussian variational distribution.

In addition to the GP baseline, we compare our proposed model with the following methods: Support Vector Machine (SVM), and several methods given by Zafar et al. (2017a) and Zafar et al. (2017b), which include maximizing accuracy under Demographic Parity fairness constraints (zafarFairness), maximizing Demographic Parity fairness under accuracy constraints (zafarAccuracy), and removing disparate mistreatment by constraining the false negative rate (zafarEqOpp). We make use of the comparison framework by Friedler et al. (2018) where every method is evaluated over 10 repeats that each have different splits of the training and test set.

### 4.3 Results

#### 4.3.1 Demographic Parity on Adult dataset

Following Zafar et al. (2017a), we evaluate Demographic Parity on the Adult dataset. Figure 2 shows the accuracy and the disparate impact measure (DIbinary), defined as the ratio of the acceptance rates of the two groups, $P(\hat{y} = 1 \mid s = 0)\,/\,P(\hat{y} = 1 \mid s = 1)$ (see Feldman et al., 2015; Zafar et al., 2017b for more details). Thus, a completely fair model will have a DIbinary of 1. The FairGP variants are both clearly fairer than the baseline GP. The variant that uses $s$ for training and prediction (FairGPparity*) performs significantly better here and gives equal or better performance than zafarAccuracy. Figure 3 shows runs of FairGP where we explicitly set a target acceptance rate (marked as “target” in the plot) instead of taking the mean of the rates in the training set. A perfect targeting mechanism would produce a diagonal. The data points are not exactly on the diagonal, but they show that setting the target rate has the expected effect on the observed acceptance rate. This is the unique aspect of the approach.

Figure 4 shows the same data as Figure 3 but with different axes. What can be seen from this figure is that the target acceptance rate is not a trade-off parameter between accuracy and fairness: changing the target rate barely affects fairness, and it only affects the accuracy because target acceptance rates that differ from the base acceptance rate necessarily lead to “misclassifications”. More details are given in the Appendix (Table 1 and Table 2).

In Figure 5, we investigate which choice of target ($\mathrm{PR}_t^{\min}$, $\mathrm{PR}_t^{\max}$ or $\mathrm{PR}_t^{\mathrm{mean}}$) gives the best result. The figure shows results from the Adult dataset with race as the sensitive attribute. As mentioned before, $\mathrm{PR}_t^{\mathrm{mean}}$ performs best in our experiments.

#### 4.3.2 Equality of Opportunity on Propublica dataset

For Equality of Opportunity, we again follow Zafar et al. (2017b) and evaluate the algorithm on the ProPublica dataset. A measure of Equality of Opportunity should be based on the difference between the TPRs for the two values of the sensitive attribute ($s = 0$ and $s = 1$). Because the same absolute difference matters more when the TPRs themselves are small, we use the difference divided by the average of the TPRs of the two groups (“nTPRDiff”):

$$\mathrm{nTPRDiff} = \frac{|\mathrm{TPR}_{s=0} - \mathrm{TPR}_{s=1}|}{\frac{1}{2}\,(\mathrm{TPR}_{s=0} + \mathrm{TPR}_{s=1})} \qquad (20)$$

A perfectly fair algorithm would achieve 0 on this measure.
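Eq (20) can be computed directly from a model's predictions (a minimal sketch with our own naming):

```python
import numpy as np

def ntpr_diff(y_true, y_pred, s):
    """nTPRDiff of Eq (20): TPR gap normalized by the mean TPR of the two groups."""
    tprs = []
    for group in (0, 1):
        pos = (s == group) & (y_true == 1)
        tprs.append(float(np.mean(y_pred[pos] == 1)))
    return abs(tprs[0] - tprs[1]) / (0.5 * (tprs[0] + tprs[1]))
```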

In order to get the target TNR, we first run the baseline GP and GP* 10 times each, and obtain the means of 0-TNR and 1-TNR for both GP and GP*, shown in the Appendix (Table 3) (0-TPR refers to the TPR in the group $s = 0$; 1-TPR, 0-TNR and 1-TNR are defined analogously). As described in Section 3.2, we take the minimum of the TNRs as the target TNR. We tried other target TNRs as well, with similar results.

In order to demonstrate the interpretability of our proposed fair framework, we set a range of different target TPRs. The TPR target value of 0.6 corresponds approximately to the maximum of the two achievable TPRs (values are shown in the Appendix, Table 3); this value would therefore be the default according to the procedure described in Section 3.2. The results of 10 runs are shown in Figure 6. Here, for the race attribute, we see that FairGPopp* with TPR = 0.6, which uses $s$ for predictions, performs best in terms of nTPRDiff. For the gender attribute, most of our FairGPopp and FairGPopp* variants (except TPR = 1) achieve better fairness than zafarEqOpp and the other baselines at small or no accuracy cost. Furthermore, although FairGPopp and FairGPopp* cannot always outperform zafarEqOpp and the other baselines, they achieve significantly higher TPRs for each group (0-TPR, 1-TPR) and for the whole test dataset (TPR), as shown in Figures 7 and 8.

It can be seen that higher target TPRs lead to higher 0-TPR, 1-TPR, and overall TPR. Figure 7 shows the actual TPRs for the two groups ($s = 0$ and $s = 1$) for several TPR targets. By setting higher target TPRs, we can easily improve the actual TPRs for each group. Figure 8 shows the same data with different axes (TPR and TNR). A clear trend is that setting higher target TPRs leads to higher actual TPRs and a lower actual TNR. Thus, in the case of Equality of Opportunity, the target TPR does act as a trade-off parameter between accuracy and fairness. Figure 8 illustrates the trade-off between TPR and TNR of our algorithm, which was discussed in Section 3.2. More detailed results are shown in the Appendix (Table 3).

## 5 Discussion and conclusion

We presented the first treatment of fairness learning under the Gaussian process classification (GPC) framework, and called it FairGP. In the FairGP model, we introduce a latent target label, and we learn to predict this latent label instead of the training label. We assume that the target label only depends on the training label and a sensitive variable. The interplay of target label, training label, and sensitive variable gives rise to a variety of fairness notions, including Demographic Parity, Equalized Odds, and Equality of Opportunity. This unified fairness framework is also supplemented by an important capability for setting a target rate for each definition of fairness. For example, we can set the target true positive rate for equality of opportunity. This capability is unique to our approach and is an important step in re-introducing human accountability into algorithmic decision making. Our framework is general and will be applicable for sensitive variables with binary and multi-level values. The current work focuses on a single binary sensitive variable. For future work, we plan to extend the FairGP framework to take into account the bias in the input data.

#### Acknowledgments

This project is supported by the UK EPSRC project EP/P03442X/1 ‘EthicalML: Injecting Ethical and Legal Constraints into Machine Learning Models’ and the Russian Academic Excellence Project ‘5-100’. We gratefully acknowledge NVIDIA for GPU donation and Amazon for AWS Cloud Credits.

## References

- Barocas et al. [2017] Solon Barocas, Kate Crawford, Aaron Shapiro, and Hanna Wallach. The problem with bias: from allocative to representational harms in machine learning. In Special Interest Group for Computing, Information and Society (SIGCIS), 2017.
- Calders et al. [2009] Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints. In Data mining workshops, 2009. ICDMW’09. IEEE international conference on, pages 13–18. IEEE, 2009.
- Kamishima et al. [2011] Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. In Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, pages 643–650. IEEE, 2011.
- Zafar et al. [2017a] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rogriguez, and Krishna P Gummadi. Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics, pages 962–970, 2017a.
- Hardt et al. [2016] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.
- Krauth et al. [2016] Karl Krauth, Edwin V Bonilla, Kurt Cutajar, and Maurizio Filippone. AutoGP: Exploring the capabilities and limitations of Gaussian process models. arXiv preprint arXiv:1610.05392, 2016.
- Bonilla et al. [2016] Edwin V Bonilla, Karl Krauth, and Amir Dezfouli. Generic inference in latent Gaussian process models. arXiv preprint arXiv:1609.00577, 2016.
- Adler et al. [2018] Philip Adler, Casey Falk, Sorelle A Friedler, Tionney Nix, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, and Suresh Venkatasubramanian. Auditing black-box models for indirect influence. Knowledge and Information Systems, 54(1):95–122, 2018.
- Zafar et al. [2017b] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180. International World Wide Web Conferences Steering Committee, 2017b.
- Quadrianto and Sharmanska [2017] Novi Quadrianto and Viktoriia Sharmanska. Recycling privileged learning and distribution matching for fairness. In Advances in Neural Information Processing Systems, pages 677–688, 2017.
- Kleinberg et al. [2016] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.
- Kusner et al. [2017] Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4069–4079, 2017.
- Louizos et al. [2015] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. arXiv preprint arXiv:1511.00830, 2015.
- Lum and Johndrow [2016] Kristian Lum and James Johndrow. A statistical framework for fair predictive algorithms. arXiv preprint arXiv:1610.08077, 2016.
- Zemel et al. [2013] Rich Zemel, Yu Wu, Kevin Swersky, Toniann Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.
- Feldman et al. [2015] Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259–268. ACM, 2015.
- Woodworth et al. [2017] Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, and Nathan Srebro. Learning non-discriminatory predictors. In Satyen Kale and Ohad Shamir, editors, Proceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 1920–1953, Amsterdam, Netherlands, 07–10 Jul 2017. PMLR. URL http://proceedings.mlr.press/v65/woodworth17a.html.
- Goh et al. [2016] Gabriel Goh, Andrew Cotter, Maya Gupta, and Michael P Friedlander. Satisfying real-world goals with dataset constraints. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems (NIPS), pages 2415–2423, 2016.
- Lanckriet and Sriperumbudur [2009] Gert R. Lanckriet and Bharath K. Sriperumbudur. On the convergence of the concave-convex procedure. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems (NIPS), pages 1759–1767, 2009.
- Dezfouli and Bonilla [2015] Amir Dezfouli and Edwin V. Bonilla. Scalable inference for Gaussian process models with black-box likelihoods. In Advances in Neural Information Processing Systems (NIPS), pages 1414–1422, 2015.
- Matthews et al. [2017] Alexander G. de G. Matthews, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. GPflow: A Gaussian process library using TensorFlow. Journal of Machine Learning Research, 18(40):1–6, April 2017. URL http://jmlr.org/papers/v18/16-537.html.
- Madras et al. [2018] David Madras, Elliot Creager, Toniann Pitassi, and Richard S. Zemel. Learning adversarially fair and transferable representations. CoRR, abs/1802.06309, 2018.
- Chouldechova [2017] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163, 2017.
- Rasmussen and Williams [2006] Carl Edward Rasmussen and Christopher KI Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
- Jordan et al. [1999] Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.
- Hensman et al. [2015] James Hensman, Alexander G. de G. Matthews, Maurizio Filippone, and Zoubin Ghahramani. MCMC for variationally sparse Gaussian processes. In Advances in Neural Information Processing Systems (NIPS), pages 1648–1656, 2015.
- Hernández-Lobato et al. [2016] José Miguel Hernández-Lobato, Yingzhen Li, Mark Rowland, Thang D. Bui, Daniel Hernández-Lobato, and Richard E. Turner. Black-box alpha divergence minimization. In International Conference on Machine Learning (ICML), pages 1511–1520, 2016.
- Nguyen and Bonilla [2014] Trung V. Nguyen and Edwin V. Bonilla. Automated variational inference for Gaussian process models. In Advances in Neural Information Processing Systems (NIPS), pages 1404–1412, 2014.
- Murray et al. [2010] Iain Murray, Ryan Prescott Adams, and David J. C. MacKay. Elliptical slice sampling. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 541–548, 2010.
- Dheeru and Karra Taniskidou [2017] Dua Dheeru and Efi Karra Taniskidou. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.
- Angwin et al. [2016] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. ProPublica, May 23, 2016.
- Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Friedler et al. [2018] Sorelle A Friedler, Carlos Scheidegger, Suresh Venkatasubramanian, Sonam Choudhary, Evan P Hamilton, and Derek Roth. A comparative study of fairness-enhancing interventions in machine learning. arXiv preprint arXiv:1802.04422, 2018.

## Appendix

**Adult-race** (mean ± std)

| Algorithm | DIbinary | Accuracy |
|---|---|---|
| FairGPparity | 0.67 ± 0.022 | 0.82 ± 0.004 |
| FairGPparity* | 1.01 ± 0.060 | 0.81 ± 0.002 |
| GP | 0.50 ± 0.023 | 0.83 ± 0.003 |
| SVM | 0.61 ± 0.020 | 0.86 ± 0.002 |
| ZafarAccuracy | 1.31 ± 0.355 | 0.80 ± 0.012 |
| ZafarFairness | 0.58 ± 0.136 | 0.85 ± 0.003 |

**Adult-gender** (mean ± std)

| Algorithm | DIbinary | Accuracy |
|---|---|---|
| FairGPparity | 0.68 ± 0.044 | 0.81 ± 0.003 |
| FairGPparity* | 0.83 ± 0.042 | 0.80 ± 0.002 |
| GP | 0.24 ± 0.019 | 0.83 ± 0.003 |
| SVM | 0.26 ± 0.016 | 0.86 ± 0.003 |
| ZafarAccuracy | 1.22 ± 0.395 | 0.79 ± 0.009 |
| ZafarFairness | 0.35 ± 0.099 | 0.85 ± 0.003 |

**Adult-race** (mean ± std)

| Algorithm | DIbinary | Accuracy | 0-PR | 1-PR |
|---|---|---|---|---|
| FairGPparity, target=0.1 | 0.62 ± 0.038 | 0.81 ± 0.003 | 0.10 ± 0.008 | 0.16 ± 0.006 |
| FairGPparity, target=0.2 | 0.66 ± 0.024 | 0.82 ± 0.003 | 0.13 ± 0.007 | 0.20 ± 0.007 |
| FairGPparity, target=0.3 | 0.68 ± 0.036 | 0.81 ± 0.004 | 0.19 ± 0.011 | 0.27 ± 0.008 |
| FairGPparity, target=0.4 | 0.67 ± 0.021 | 0.79 ± 0.004 | 0.25 ± 0.007 | 0.37 ± 0.006 |
| FairGPparity, target=0.5 | 0.69 ± 0.016 | 0.76 ± 0.004 | 0.30 ± 0.006 | 0.43 ± 0.004 |
| FairGPparity*, target=0.1 | 0.71 ± 0.046 | 0.81 ± 0.003 | 0.12 ± 0.007 | 0.17 ± 0.005 |
| FairGPparity*, target=0.2 | 0.94 ± 0.061 | 0.81 ± 0.004 | 0.19 ± 0.011 | 0.20 ± 0.006 |
| FairGPparity*, target=0.3 | 1.03 ± 0.046 | 0.80 ± 0.005 | 0.28 ± 0.014 | 0.27 ± 0.008 |
| FairGPparity*, target=0.4 | 0.92 ± 0.040 | 0.78 ± 0.006 | 0.33 ± 0.014 | 0.36 ± 0.006 |
| FairGPparity*, target=0.5 | 0.87 ± 0.015 | 0.75 ± 0.006 | 0.37 ± 0.005 | 0.42 ± 0.007 |

**Adult-gender** (mean ± std)

| Algorithm | DIbinary | Accuracy | 0-PR | 1-PR |
|---|---|---|---|---|
| FairGPparity, target=0.1 | 0.50 ± 0.042 | 0.81 ± 0.002 | 0.09 ± 0.005 | 0.17 ± 0.006 |
| FairGPparity, target=0.2 | 0.69 ± 0.039 | 0.81 ± 0.002 | 0.15 ± 0.008 | 0.21 ± 0.007 |
| FairGPparity, target=0.3 | 0.68 ± 0.038 | 0.81 ± 0.004 | 0.18 ± 0.008 | 0.26 ± 0.008 |
| FairGPparity, target=0.4 | 0.59 ± 0.018 | 0.79 ± 0.003 | 0.22 ± 0.005 | 0.37 ± 0.006 |
| FairGPparity, target=0.5 | 0.55 ± 0.019 | 0.76 ± 0.004 | 0.25 ± 0.008 | 0.46 ± 0.008 |
| FairGPparity*, target=0.1 | 0.52 ± 0.036 | 0.81 ± 0.002 | 0.09 ± 0.005 | 0.18 ± 0.004 |
| FairGPparity*, target=0.2 | 0.85 ± 0.054 | 0.80 ± 0.003 | 0.18 ± 0.009 | 0.21 ± 0.005 |
| FairGPparity*, target=0.3 | 0.96 ± 0.069 | 0.79 ± 0.004 | 0.25 ± 0.012 | 0.26 ± 0.007 |
| FairGPparity*, target=0.4 | 0.76 ± 0.045 | 0.77 ± 0.005 | 0.27 ± 0.013 | 0.36 ± 0.006 |
| FairGPparity*, target=0.5 | 0.67 ± 0.033 | 0.75 ± 0.005 | 0.30 ± 0.011 | 0.45 ± 0.009 |

**propublica-race** (mean ± std)

| Algorithm | 0-TPR | 1-TPR | 0-TNR | 1-TNR | TPR | TNR | nTPRDiff | Accuracy |
|---|---|---|---|---|---|---|---|---|
| FairGPopp, TPR=1.0 | 0.85 ± 0.017 | 0.69 ± 0.029 | 0.44 ± 0.017 | 0.60 ± 0.025 | 0.80 ± 0.018 | 0.50 ± 0.017 | -0.21 ± 0.035 | 0.64 ± 0.006 |
| FairGPopp*, TPR=1.0 | 0.84 ± 0.018 | 0.77 ± 0.030 | 0.46 ± 0.019 | 0.50 ± 0.019 | 0.82 ± 0.017 | 0.48 ± 0.016 | -0.09 ± 0.038 | 0.63 ± 0.007 |
| FairGPopp, TPR=0.9 | 0.79 ± 0.017 | 0.60 ± 0.034 | 0.54 ± 0.015 | 0.68 ± 0.028 | 0.74 ± 0.018 | 0.59 ± 0.015 | -0.28 ± 0.051 | 0.66 ± 0.007 |
| FairGPopp*, TPR=0.9 | 0.78 ± 0.017 | 0.69 ± 0.030 | 0.56 ± 0.014 | 0.59 ± 0.025 | 0.75 ± 0.018 | 0.57 ± 0.013 | -0.12 ± 0.036 | 0.65 ± 0.006 |
| FairGPopp, TPR=0.8 | 0.73 ± 0.021 | 0.54 ± 0.033 | 0.61 ± 0.022 | 0.74 ± 0.029 | 0.68 ± 0.023 | 0.66 ± 0.020 | -0.31 ± 0.046 | 0.67 ± 0.009 |
| FairGPopp*, TPR=0.8 | 0.71 ± 0.021 | 0.62 ± 0.024 | 0.62 ± 0.024 | 0.65 ± 0.028 | 0.69 ± 0.019 | 0.64 ± 0.015 | -0.14 ± 0.037 | 0.66 ± 0.009 |
| FairGPopp, TPR=0.7 | 0.68 ± 0.021 | 0.47 ± 0.028 | 0.66 ± 0.022 | 0.78 ± 0.020 | 0.62 ± 0.020 | 0.70 ± 0.016 | -0.37 ± 0.052 | 0.66 ± 0.009 |
| FairGPopp*, TPR=0.7 | 0.66 ± 0.021 | 0.57 ± 0.028 | 0.67 ± 0.026 | 0.70 ± 0.031 | 0.64 ± 0.018 | 0.68 ± 0.014 | -0.14 ± 0.053 | 0.66 ± 0.009 |
| FairGPopp, TPR=0.6 | 0.63 ± 0.019 | 0.43 ± 0.028 | 0.69 ± 0.016 | 0.81 ± 0.016 | 0.57 ± 0.017 | 0.74 ± 0.012 | -0.39 ± 0.063 | 0.66 ± 0.009 |
| FairGPopp*, TPR=0.6 | 0.60 ± 0.020 | 0.54 ± 0.025 | 0.70 ± 0.019 | 0.71 ± 0.028 | 0.58 ± 0.017 | 0.70 ± 0.009 | -0.10 ± 0.054 | 0.65 ± 0.009 |
| GP | 0.63 ± 0.017 | 0.41 ± 0.022 | 0.72 ± 0.018 | 0.83 ± 0.017 | 0.57 ± 0.014 | 0.76 ± 0.012 | -0.42 ± 0.053 | 0.67 ± 0.008 |
| GP* | 0.65 ± 0.020 | 0.37 ± 0.020 | 0.70 ± 0.018 | 0.87 ± 0.016 | 0.57 ± 0.018 | 0.76 ± 0.014 | -0.55 ± 0.043 | 0.67 ± 0.011 |
| SVM | 0.62 ± 0.028 | 0.38 ± 0.021 | 0.74 ± 0.028 | 0.86 ± 0.020 | 0.55 ± 0.022 | 0.79 ± 0.024 | -0.47 ± 0.060 | 0.68 ± 0.010 |
| ZafarEqOpp | 0.57 ± 0.019 | 0.51 ± 0.012 | 0.76 ± 0.012 | 0.74 ± 0.024 | 0.55 ± 0.013 | 0.75 ± 0.008 | -0.13 ± 0.043 | 0.66 ± 0.005 |

**propublica-gender** (mean ± std)

| Algorithm | 0-TPR | 1-TPR | 0-TNR | 1-TNR | TPR | TNR | nTPRDiff | Accuracy |
|---|---|---|---|---|---|---|---|---|
| FairGPopp, TPR=1.0 | 0.71 ± 0.020 | 0.80 ± 0.013 | 0.59 ± 0.042 | 0.51 ± 0.016 | 0.79 ± 0.010 | 0.53 ± 0.012 | 0.12 ± 0.037 | 0.65 ± 0.007 |
| FairGPopp*, TPR=1.0 | 0.71 ± 0.028 | 0.82 ± 0.016 | 0.61 ± 0.050 | 0.48 ± 0.019 | 0.81 ± 0.013 | 0.51 ± 0.018 | 0.15 ± 0.049 | 0.64 ± 0.008 |
| FairGPopp, TPR=0.9 | 0.63 ± 0.027 | 0.73 ± 0.014 | 0.66 ± 0.031 | 0.60 ± 0.016 | 0.72 ± 0.011 | 0.62 ± 0.011 | 0.16 ± 0.051 | 0.66 ± 0.005 |
| FairGPopp*, TPR=0.9 | 0.64 ± 0.036 | 0.75 ± 0.019 | 0.68 ± 0.036 | 0.58 ± 0.014 | 0.73 ± 0.014 | 0.61 ± 0.012 | 0.16 ± 0.073 | 0.66 ± 0.005 |
| FairGPopp, TPR=0.8 | 0.54 ± 0.021 | 0.67 ± 0.017 | 0.73 ± 0.033 | 0.67 ± 0.013 | 0.65 ± 0.014 | 0.68 ± 0.012 | 0.22 ± 0.054 | 0.67 ± 0.007 |
| FairGPopp*, TPR=0.8 | 0.57 ± 0.030 | 0.69 ± 0.017 | 0.73 ± 0.026 | 0.65 ± 0.013 | 0.67 ± 0.014 | 0.67 ± 0.009 | 0.19 ± 0.065 | 0.67 ± 0.007 |
| FairGPopp, TPR=0.7 | 0.48 ± 0.023 | 0.61 ± 0.015 | 0.78 ± 0.018 | 0.72 ± 0.014 | 0.59 ± 0.012 | 0.73 ± 0.012 | 0.25 ± 0.055 | 0.67 ± 0.008 |
| FairGPopp*, TPR=0.7 | 0.52 ± 0.025 | 0.63 ± 0.014 | 0.77 ± 0.023 | 0.69 ± 0.014 | 0.62 ± 0.012 | 0.71 ± 0.009 | 0.19 ± 0.056 | 0.67 ± 0.007 |
| FairGPopp, TPR=0.6 | 0.42 ± 0.028 | 0.56 ± 0.017 | 0.81 ± 0.019 | 0.75 ± 0.017 | 0.54 ± 0.013 | 0.76 ± 0.013 | 0.28 ± 0.084 | 0.66 ± 0.007 |
| FairGPopp*, TPR=0.6 | 0.48 ± 0.023 | 0.59 ± 0.017 | 0.78 ± 0.022 | 0.72 ± 0.018 | 0.57 ± 0.014 | 0.73 ± 0.012 | 0.20 ± 0.064 | 0.66 ± 0.007 |
| GP | 0.43 ± 0.033 | 0.58 ± 0.018 | 0.82 ± 0.028 | 0.75 ± 0.017 | 0.56 ± 0.013 | 0.77 ± 0.017 | 0.31 ± 0.094 | 0.67 ± 0.007 |
| GP* | 0.28 ± 0.040 | 0.63 ± 0.018 | 0.93 ± 0.023 | 0.72 ± 0.017 | 0.57 ± 0.014 | 0.77 ± 0.016 | 0.77 ± 0.137 | 0.68 ± 0.008 |
| SVM | 0.44 ± 0.031 | 0.57 ± 0.019 | 0.85 ± 0.029 | 0.78 ± 0.017 | 0.55 ± 0.016 | 0.80 ± 0.013 | 0.28 ± 0.081 | 0.69 ± 0.007 |
| ZafarEqOpp | 0.42 ± 0.036 | 0.56 ± 0.016 | 0.82 ± 0.038 | 0.76 ± 0.015 | 0.54 ± 0.012 | 0.77 ± 0.016 | 0.29 ± 0.101 | 0.67 ± 0.008 |