Agnostic data debiasing through a local sanitizer learnt from an adversarial network approach

Ulrich Aïvodji, Université du Québec à Montréal, 405 Sainte-Catherine Street East, Montréal, Québec, Canada H2L 2C4
François Bidet, Ecole Polytechnique, Route de Saclay, Palaiseau Cedex, France 91128
Sébastien Gambs, Université du Québec à Montréal, 405 Sainte-Catherine Street East, Montréal, Québec, Canada H2L 2C4
Rosin Claude Ngueveu (ngueveu.rosin˙), Université du Québec à Montréal, 405 Sainte-Catherine Street East, Montréal, Québec, Canada H2L 2C4
Alain Tapp, Université de Montréal, succursale Centre-ville, Montréal, Québec, Canada H3C 3J7

The widespread use of automated decision processes in many areas of our society raises serious ethical issues concerning the fairness of the process and the possible resulting discriminations. In this work, we propose a novel approach called GANSan whose objective is to prevent the possibility of any discrimination (i.e., direct and indirect) based on a sensitive attribute by removing the attribute itself as well as the existing correlations with the remaining attributes. Our sanitization algorithm GANSan is partially inspired by the powerful framework of generative adversarial networks (in particular the Cycle-GANs), which offers a flexible way to learn a distribution empirically or to translate between two different distributions.

In contrast to prior work, one of the strengths of our approach is that the sanitization is performed in the same space as the original data, modifying the other attributes as little as possible and thus preserving the interpretability of the sanitized data. As a consequence, once the sanitizer is trained, it can be applied to new data, for instance locally by an individual on his profile before releasing it. Finally, experiments on a real dataset demonstrate the effectiveness of the proposed approach as well as the achievable trade-off between fairness and utility.

Sanitization, Fairness, Generative Adversarial Network.
Copyright: rights retained. DOI: 10.475/123_4. ISBN: 123-4567-24-567/08/06. Conference: Machine Learning Research; 2020; Barcelona, Spain. Journal year: 2020.

1. Introduction

In recent years, the availability and the diversity of large-scale datasets of personal information, the algorithmic advancements in machine learning and the increase in computational power have led to the development of personalized services and prediction systems to such an extent that their use is now ubiquitous in our society. For instance, machine learning-based systems are now used in banking for assessing the risk associated with loan applications (CredScoPatent), in hiring systems (FalRecr2012) and in predictive justice to quantify the recidivism risk of an inmate (Center2016).

Despite their usefulness, the predictions performed by these algorithms are not exempt from biases, and numerous cases of discriminatory decisions have been reported in recent years. For instance, returning to the case of predictive justice, a study conducted by ProPublica showed that the recidivism prediction tool COMPAS, which is currently used in Broward County (Florida), is strongly biased against black defendants, displaying a false positive rate twice as high for black persons as for white persons (JuliaAng2016). If the dataset exhibits strong detectable biases towards a particular sensitive group (e.g., an ethnic or minority group), the straightforward solution of removing the attribute that identifies the sensitive group would only prevent direct discrimination. Indeed, indirect discrimination can still occur due to correlations between the sensitive attribute and other attributes.

In this article, we propose a novel approach called GANSan (for Generative Adversarial Network Sanitizer) to address the problem of discrimination due to the biased underlying data.

In a nutshell, our approach aims to learn a sanitizer (in our case a neural network) transforming the input data in a way that maximizes the following two metrics: (1) fidelity, in the sense that the transformation should modify the data as little as possible, and (2) non-discrimination, which means that the sensitive attribute should be difficult to predict from the sanitized data.

A possible use case could be the recruitment process of referees for an amateur sport organization. In this situation, the selection should be primarily based on the merit of applicants, but at the same time, the institution might be aware that the data used to train a model to automate this recruitment process might be highly biased according to race.

In practice, approaches such as the Rooney Rule have been proposed and implemented to foster diversity for the recruitment of the coaches in the National Football League (NFL-USA) as well as in other industries.

To address this issue, the institution could use our approach to sanitize the data before applying a merit-based algorithm to select the referee on the sanitized data.

Another typical use case might be one in which a company, during its recruitment process, offers job applicants a tool to remove racial correlations from their personal information before submitting their sanitized profile on the job application platform. If built appropriately, this tool would make the recruitment process of the company free from racial discrimination, as the company never has access to the original profile.

Overall, our contributions can be summarized as follows.

  • We propose a novel approach in which a sanitizer is learned from the original data. The sanitizer can then be applied on a profile in such a way that the sensitive attribute is removed, as well as existing correlations with other attributes, while ensuring that the sanitized profile is modified as little as possible, preventing both direct and indirect discrimination. Our sanitizer is strongly inspired by Generative Adversarial Networks (GANs) (NIPS2014_5423), which have been highly successful in terms of applications.

  • Rather than building a fair classifier, our objective is more generic in the sense that we aim at debiasing the data with respect to the sensitive attribute. Thus, one of the main benefits of our approach is that the sanitization can be performed without having any knowledge regarding the tasks that are going to be conducted in the future on the sanitized data. In addition, as the sensitive attribute can refer to any characteristic of the user, we believe GANSan  applies to the broader context of data anonymization.

  • Another strength of our approach is that once the sanitizer has been learned, it can be used directly by an individual to generate a modified version of his profile that still lives in the same representation space, but from which it is very difficult to infer the sensitive attribute. In this sense, our method can be considered to fall under the category of randomized response techniques (RanResp1965) as it can be used locally by a user to sanitize his data and thus does not require his true profile to be sent to a trusted third party. Of all of the approaches that currently exist in the literature to reach algorithmic fairness (ACompFairEnInter2018), we are not aware of any other work that has considered the local sanitization with the exception of (romanelli2019generating), which focuses on the protection of privacy but could also be applied to enhance fairness.

  • To demonstrate its usefulness, we have evaluated our approach on a real dataset by analyzing the achievable trade-off between fairness and utility, the latter being measured both in terms of the perturbations introduced by the sanitization framework and with respect to the accuracy of a classifier learned on the sanitized data.

The outline of the paper is as follows. First, in Section 2, we introduce the system model before reviewing the background notions on fairness and GANs. Afterwards, in Section 3, we review the related work on methods for enhancing fairness that, like ours, belong to the preprocessing approach, before describing GANSan in Section 4. Finally, we experimentally evaluate our approach in Section 5 before concluding in Section 6.

2. Preliminaries

In this section, we first present the system model used in this paper. Then, we review background notions on fairness metrics and generative adversarial networks.

2.1. System model

In this paper, we consider the generic setting of a dataset composed of records. Each record typically corresponds to the profile of an individual and is made of attributes, which can be categorical, discrete or continuous. Amongst those, two attributes are considered as being special. First, the sensitive attribute S (e.g., gender, ethnic origin, religious belief, …) should remain hidden to prevent discrimination. Second, the decision attribute Y is typically used for a classification task (e.g., accept or reject an individual for a loan). The other attributes of the profile, which are neither S nor Y, will be referred to hereafter as A.

For simplicity, in this work we restrict ourselves to situations in which these two attributes are binary (i.e., $S \in \{0, 1\}$ and $Y \in \{0, 1\}$). However, our approach could also be generalized easily to multivalued attributes, although quantifying fairness for multivalued attributes is much more challenging than for binary ones (kearns2017preventing). Our main objective is to prevent the possibility of inferring the sensitive attribute from the sanitized data.

2.2. Fairness metrics

First, we would like to point out that there are many different definitions of fairness in the literature (narayanan21def2018; ACompFairEnInter2018; verma2018fairness; corbett2017algorithmic; dwork2012fairness; joseph2016fairness) and that the choice of the appropriate definition is highly dependent on the context considered.

For instance, one natural approach for defining fairness is the concept of individual fairness (dwork2012fairness), which states that individuals that are similar except for the sensitive attribute should be treated similarly, and thus should receive similar decisions. This notion relates to the legal concept of disparate treatment (barocas2016big), which occurs if the decision process is based on sensitive attributes. This definition is relevant when discrimination is due to prejudices caused by the decision process. Therefore, it cannot be used in the situation in which the objective is to directly redress biases in the data.

In contrast to individual fairness, group fairness relies on statistics of outcomes of the subgroups indexed by S and can be quantified in several ways, such as demographic parity (berk2018fairness) and equalized odds (hardt2016equality). More precisely, the demographic parity corresponds to the absolute difference of rates of positive outcomes in the sensitive and default groups (for which respectively $S = 0$ and $S = 1$):

$$DemoParity = \left| P(\hat{Y} = 1 \mid S = 0) - P(\hat{Y} = 1 \mid S = 1) \right|, \quad (1)$$

while equalized odds is the absolute difference of odds in each subgroup:

$$EqOddGap_y = \left| P(\hat{Y} = 1 \mid S = 0, Y = y) - P(\hat{Y} = 1 \mid S = 1, Y = y) \right|, \quad (2)$$

in which $\hat{Y}$ denotes the predicted decision. Equalized odds (hardt2016equality) requires the equality of true and false positive rates: $P(\hat{Y} = 1 \mid S = 0, Y = y) = P(\hat{Y} = 1 \mid S = 1, Y = y)$ for $y \in \{0, 1\}$. Compared to demographic parity, equalized odds is more suitable when the base rates in both groups differ ($P(Y = 1 \mid S = 0) \neq P(Y = 1 \mid S = 1)$). Note that these definitions are agnostic to the cause of the discrimination and are based solely on the assumption that statistics of outcomes should be similar between subgroups.
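As an illustration, the two group fairness gaps described above can be computed from binary predictions as in the following minimal sketch. It assumes that the sensitive group is encoded as S = 0 and the default group as S = 1; the function names and the toy values are hypothetical and are not taken from our experiments.

```python
def positive_rate(preds, mask):
    """Fraction of positive predictions among the records selected by mask."""
    selected = [p for p, keep in zip(preds, mask) if keep]
    if not selected:
        raise ValueError("empty subgroup")
    return sum(selected) / len(selected)


def demographic_parity_gap(y_pred, s):
    """|P(Yhat=1 | S=0) - P(Yhat=1 | S=1)| for binary predictions and groups."""
    rate_s0 = positive_rate(y_pred, [v == 0 for v in s])
    rate_s1 = positive_rate(y_pred, [v == 1 for v in s])
    return abs(rate_s0 - rate_s1)


def equalized_odds_gap(y_pred, y_true, s, y):
    """|P(Yhat=1 | S=0, Y=y) - P(Yhat=1 | S=1, Y=y)| for a fixed true label y."""
    rate_s0 = positive_rate(y_pred, [sv == 0 and yv == y for sv, yv in zip(s, y_true)])
    rate_s1 = positive_rate(y_pred, [sv == 1 and yv == y for sv, yv in zip(s, y_true)])
    return abs(rate_s0 - rate_s1)


# Toy example with made-up values (assumption: S = 0 is the sensitive group).
s      = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

dp_gap = demographic_parity_gap(y_pred, s)            # gap in positive rates
tpr_gap = equalized_odds_gap(y_pred, y_true, s, y=1)  # gap in true positive rates
fpr_gap = equalized_odds_gap(y_pred, y_true, s, y=0)  # gap in false positive rates
```

In this toy example the three gaps all equal 0.5; a gap of 0 would correspond to identical outcome statistics in both groups.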

In our work, we follow a different line of research by defining fairness in terms of the inability to infer S from other attributes (Feldman2014; xu2018fairgan). This approach stems from the observation that it is impossible to discriminate based on the sensitive attribute if the latter is unknown and cannot be predicted. Thus, our approach aims at altering the data in such a way that no classifier should be able to infer the sensitive attribute from the sanitized data.

The inability to infer the attribute S is measured by the accuracy of a predictor Adv trained to recover the hidden S (sAcc), as well as the balanced error rate (BER) introduced in (Feldman2014):

$$BER = \frac{1}{2} \left( P(Adv = 1 \mid S = 0) + P(Adv = 0 \mid S = 1) \right),$$

in which $Adv$ stands for the value predicted by the predictor. The BER captures the predictability of both classes and a value of $\frac{1}{2}$ can be considered optimal for protecting against inference in the sense that it means that the inferences made by the predictor are not better than a random guess. In addition, the BER is more relevant than the accuracy of a classifier at predicting the sensitive attribute for datasets with imbalanced proportions of sensitive and default groups. To summarize, the objective of a successful sanitization is to cause a significant drop of the sensitive accuracy while raising the BER close to its optimal value of $\frac{1}{2}$.
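The difference between the two measures can be made concrete with a short sketch. The functions below follow the usual definitions of accuracy and balanced error rate for a binary sensitive attribute; the function names and the toy data are hypothetical and serve only to show why the BER is more informative on imbalanced groups.

```python
def sensitive_accuracy(s_true, s_pred):
    """Fraction of records whose sensitive attribute is predicted correctly."""
    return sum(t == p for t, p in zip(s_true, s_pred)) / len(s_true)


def balanced_error_rate(s_true, s_pred):
    """Average of the per-group error rates for a binary sensitive attribute."""
    group_errors = []
    for group in (0, 1):
        preds = [p for t, p in zip(s_true, s_pred) if t == group]
        if not preds:
            raise ValueError(f"no record with S = {group}")
        group_errors.append(sum(p != group for p in preds) / len(preds))
    return sum(group_errors) / 2


# Imbalanced toy data: 2 records with S = 0 and 8 records with S = 1.
s_true = [0] * 2 + [1] * 8
# A trivial predictor that always answers the majority group.
s_pred = [1] * 10

acc = sensitive_accuracy(s_true, s_pred)
ber = balanced_error_rate(s_true, s_pred)
```

Here the trivial predictor reaches an accuracy of 0.8 although it has learned nothing about S, whereas its BER is 0.5, which correctly signals that it is no better than a random guess.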

2.3. Generative adversarial network

Generative Adversarial Network (GAN) is a relatively novel approach in machine learning introduced to solve the difficult problem of modelling and learning high dimensional distributions (e.g., pictures). Typically, in a GAN, two neural networks compete against each other in a zero-sum game framework (GanIan2014). The first neural network, hereafter referred to as the generator, aims to learn to generate, from noise, data close enough to a given target distribution, the production of such data being the final objective of the GAN. The second neural network, hereafter referred to as the discriminator, learns to discriminate whether a given sample originates from the generator or from the training data.

Despite their intuitive aspect and the fact that GANs are powerful tools for modelling distributions, training a GAN can be difficult and often requires significant engineering effort to ensure its success (zhu2017unpaired). For instance, during the training phase, if the discriminator (respectively the generator) outperforms its counterpart by a large margin, the latter will not be able to catch up and improve its performance.
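For reference, the zero-sum game described above is usually written as the minimax objective of the original GAN formulation. The symbols $G$ (generator), $D$ (discriminator), $p_{data}$ (distribution of the training data) and $p_z$ (noise distribution) are notation introduced here for illustration only:

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes $V$ by assigning high scores to training samples and low scores to generated ones, while the generator minimizes it. When the discriminator becomes nearly perfect, the second term saturates and provides little gradient to the generator, which is one way to understand the imbalance issue mentioned above.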

Zhu and co-authors (zhu2017unpaired) propose an extension of GAN, called CycleGAN, in which the goal is to learn to translate between two similar distributions. Our approach, GANSan, is inspired by the framework of GANs and CycleGANs in the sense that our objective is to learn to remove the dependency between the sensitive attribute and the other attributes without having an explicit description of these dependencies and by solely relying on the ability of an adversary to distinguish (or not) between the sensitive and default groups.

3. Related work

In recent years, many approaches have been developed to enhance the fairness of machine learning algorithms. Most of these techniques can be classified into three families of approaches, namely (1) the preprocessing approach (CRWAA-ALFR2016; Feldman2014; VFAE2015; Zemel2013) in which fairness is achieved by changing the characteristics of the input data (e.g. by suppressing undesired correlations with the sensitive attribute), (2) the algorithmic modification approach (also sometimes called constrained optimization) in which the learning algorithm is adapted to ensure that it is fair by design (FairnessConsZafar2015; kamishima2012fairness) and (3) the postprocessing approach that modifies the output of the learning algorithm to increase the level of fairness (Kamiran2010; hardt2016equality). We refer the interested reader to (ACompFairEnInter2018) for a recent survey comparing the different fairness enhancing methods. Due to the limited space and as our approach falls within the preprocessing approach, we will review afterwards only the main methods of this category.

Among the seminal works in fairness enhancement, the authors of (Feldman2014) have developed a framework that consists in translating the conditional distributions of each of the dataset's attributes by shifting them towards a median distribution. While this approach is straightforward, it takes into account neither unordered categorical attributes nor correlations that might arise due to a combination of attributes, both of which we address in this work.

Zemel and co-authors (Zemel2013) have proposed to learn a fair representation of data based on a set of prototypes, which preserves the outcome prediction accuracy and allows an accurate reconstruction of original profiles. Each prototype can equally identify groups based on sensitive attribute values. This technique has been one of the pioneering works in mitigating unfairness by changing the representation space of the data.

However, for this approach to work, the definition of the set of prototypes is highly critical. In the same direction, the authors of (calmon2017optimized) have learned an optimal randomized mapping for removing group-based discrimination while limiting the distortion introduced at the profile and distribution levels to preserve utility. Similarly, Louizos and co-authors (VFAE2015) used a variational auto-encoder (kingma2013auto) to improve fairness by choosing a prior distribution independently of the group membership and removing differences across groups with the maximum mean discrepancy (gretton2007kernel).

In addition, several approaches have been explored to enhance fairness based on adversarial learning. For instance, Edwards and Storkey (CRWAA-ALFR2016) have trained an encoder to output a representation from which an adversary is unable to predict the group membership accurately, but from which a decoder can reconstruct the data and on which a decision predictor still performs well. Madras, Creager, Pitassi and Zemel (madras2018learning) extended this framework to satisfy the equality of opportunity (hardt2016equality) constraint and explored the theoretical guarantees for fairness provided by the learned representation as well as the ability of the representation to be used for different classification tasks. Beutel, Chen, Zhao and Chi (beutel2017data) have studied how the choice of data affects the fairness in the context of adversarial learning. One of the interesting results of their study is the relationship between statistical parity and the removal of the sensitive attribute, which demonstrates that learning a representation independent of the sensitive attribute with a balanced dataset (in terms of the sensitive and default groups) ensures statistical parity. Zhang, Lemoine and Mitchell (zhang2018mitigating) have designed a decision predictor satisfying group fairness by ensuring that an adversary is unable to infer the sensitive attribute from the predicted outcome. Afterwards, Wadsworth, Vera and Piech (wadsworth2018achieving) have applied the latter framework in the context of recidivism prediction, demonstrating that it is possible to significantly reduce the discrimination while maintaining nearly the same accuracy as on the original data. Sattigeri and co-authors (sattigeri2018fairness) have developed a method to cancel out bias in high dimensional data, such as multimedia data, using adversarial learning. Finally, McNamara, Ong and Williamson (mcnamara2019costs) have investigated the benefits and drawbacks of fair representation learning. In particular, they demonstrated that techniques building fair representations restrict the space of possible decisions, hence providing fairness but also limiting the possible usages of the resulting data.

While these approaches are effective at addressing fairness, one of their common drawbacks is that they do not preserve the interpretability of the data. One notable exception is the method proposed by Xu, Yuan, Zhang and Wu (xu2018fairgan) called FairGAN, which is the closest to ours, even though their objective is to learn a fair classifier on a dataset that has been generated such that it is discrimination-free and whose distribution on attributes is close to the original one. Our approach further diverges from this work by the fact that their approach is a direct application of the original GAN framework coupled with a second adversary (whose task is to reconstruct the sensitive attribute from samples that successfully fooled the first discriminator), while ours can be rightfully compared to an auto-encoder coupled with the same adversary.

Following a similar line of work, there is a growing body of research investigating the use of adversarial training to protect the privacy of individuals during the collection or disclosure of data. For instance, Feutry, Piantanida, Bengio and Duhamel (feutry2018learning) have proposed an anonymization procedure based on the learning of three sub-networks: an encoder, an adversary and a label predictor. The authors have ensured the convergence of these three networks during training by proposing an efficient optimization procedure with bounds on the probability of misclassification. Pittaluga, Koppal and Chakrabarti (pittaluga2019learning) have designed a procedure based on adversarial training to hide a private attribute of a dataset. While the aforementioned approaches do not consider the interpretability of the representation produced, Romanelli, Palamidessi and Chatzikokolakis (romanelli2019generating) have designed a mechanism to create a dataset that preserves the original representation. They have developed a method for learning an optimal privacy protection mechanism also inspired by GAN (tripathy2017privacy), which they have applied to location privacy. Here, the objective is to minimize the amount of information (measured by the mutual information) preserved between the sensitive attribute and the prediction made on the decision attribute by a classifier while respecting a bound on the utility of the dataset.

Drawing from existing works in the privacy field, Ruggieri (Ruggieri2014) showed that the $t$-closeness anonymization technique (t_closeness) can be used as a preprocessing approach to control discrimination as there is a close relationship between $t$-closeness and group fairness. In addition, local sanitization approaches (also called randomized response techniques) have been investigated for the protection of privacy. More precisely, one of the benefits of local sanitization is that there is no need to centralize the data before sanitizing it, thus limiting the trust assumptions that an individual has to make on external entities when sharing his data. For instance, Wang, Hu and Wu (wang2016using) have applied randomized response techniques achieving differential privacy during the data collection phase to avoid the need to have an untrusted party collecting sensitive information. Similarly to our approach, the protection of information takes place at the individual level as the user can randomize his data before publishing it. The main objective is to produce a sanitized dataset in which global statistical properties are preserved, but from which it is not possible to infer the sensitive information of a specific user. In the same line of work, Du and Zhan (du2003using) have proposed a method for learning a decision tree classifier on this sanitized data. However, none of these previous works have taken into account the fairness aspect. Thus, while our method also falls within the local sanitization approaches, in the sense that the sanitizer can be applied locally by a user, our initial objective is quite different as we aim at preventing the risk of discrimination. Nonetheless, at the same time, our method also protects against attribute inference with respect to the sensitive attribute.

4. Local sanitization for data debiasing

As explained previously, removing the sensitive attribute from the data is rarely sufficient to guarantee fairness as correlations are likely to exist between other attributes and the sensitive one. Those correlations can be straightforward like attributes including direct information on the sensitive one but can also very well be more complex such as a non-linear combination of several attributes. In general, detecting complex correlations between attributes as well as suppressing them is a difficult task.

To address this issue, our approach GANSan  relies on the modelling power of GANs to build a sanitizer that can cancel out correlations with the sensitive attribute without requiring an explicit model of those correlations. In particular, it exploits the capacity of the discriminator to distinguish the subgroups indexed by the sensitive attribute. Once the sanitizer has been trained, any individual can locally apply it on his profile before disclosing it to ensure that the sensitive information is hidden. The sanitized data can then be safely used for any subsequent task.

4.1. Generative adversarial network sanitization

High level overview.

Formally, given a dataset $D$, the objective of GANSan is to learn a function $San$, called the sanitizer, that perturbs individual profiles of the dataset $D$ such that a distance measure called the fidelity $fid$ (in our case we will use the $L_2$ norm) between the original and the sanitized datasets ($\bar{D} = San(D)$) is minimal, while ensuring that S cannot be recovered from $\bar{D}$. Our approach differs from the classical conditional GAN (mirza2014conditional) by the fact that the objective of our discriminator is to reconstruct the hidden sensitive attribute from the generator output, whereas the discriminator in a classical conditional GAN has to discriminate between the generator output and samples from the true distribution.

The high-level overview of the training of GANSan is as follows:

  • The first step corresponds to the training of the sanitizer (Algorithm 1, Lines 8-22). The sanitizer can be seen as the generator, similarly to a standard GAN but with a different purpose. In a nutshell, it learns the empirical distribution of the sensitive attribute and generates a new distribution that concurrently respects two objectives: (1) finding a perturbation that will fool the discriminator in predicting S while (2) minimizing the damage introduced by the sanitization. More precisely, the sanitizer takes as input the original dataset (including S and Y) plus some noise. The noise is used to prevent the over-specialization of the sanitizer on the training set while making the reverse mapping of sanitized profiles to their original versions more difficult.

  • The second step consists in training the discriminator for predicting the sensitive attribute from the data produced by the sanitizer (Algorithm 1, Lines 23-29). The rationale of our approach is that the better the discriminator is at predicting the sensitive attribute S, the worse the sanitizer is at hiding it and thus the higher the potential risk of discrimination.

These two steps are applied iteratively until convergence of the training. Figure 1 presents the high-level overview of the training procedure, while Algorithm 1 describes it in detail.

[Figure 1: diagram with the following elements: original S, original data (starting point), sanitizer (generator), sanitized data, predicted S.]

Figure 1. Sanitization framework. The objective of the discriminator is to predict S from the output of the sanitizer. The two objective functions that the framework aims at minimizing are respectively the discriminator and sanitizer losses, namely $J_{Disc}$ and $J_{San}$.
 1: Inputs: D, MaxEpochs, d_iter, batchSize, α
 2: Output: San, Disc
 3: ▷ Initialization
 4: San, Disc, Iterations
 6: for e in {1, ..., MaxEpochs} do
 7:     for i in {1, ..., Iterations} do
 8:         Sample batch B of size batchSize from D
 9:         S_B: extract S column from B
12:         ▷ Compute the reconstruction loss vector
14:         ▷ Compute the sensitive loss
16:         ▷ Concatenate the previously computed losses into the vector J_San
18:         for each component e_j of J_San do
19:             ▷ Back-propagation using e_j
20:             Backpropagate e_j
21:             Update San weights
22:         end for
23:         for l in {1, ..., d_iter} do
24:             Sample batch B of size batchSize from D
25:             S_B: extract S column from B
26:             Loss = MSE(S_B, Disc(San(B)))
27:             Backpropagate Loss
28:             Update Disc weights
29:         end for
30:     end for
31:     Save San and Disc states
32: end for
Algorithm 1 GANSan Training Procedure
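The alternating schedule of Algorithm 1 can be sketched as follows. This is only a skeleton of the control flow under stated assumptions: the two update steps are passed in as callables (`sanitizer_step` and `discriminator_step` are hypothetical names), batches are drawn uniformly without replacement, and the number of iterations per epoch is taken to be the dataset size divided by the batch size. The inner loop over the loss components of the sanitizer is assumed to live inside `sanitizer_step`.

```python
import random


def train_alternating(dataset, sanitizer_step, discriminator_step,
                      max_epochs, d_iter, batch_size, seed=0):
    """Alternate one sanitizer update with d_iter discriminator updates."""
    if batch_size > len(dataset):
        raise ValueError("batch_size cannot exceed the dataset size")
    rng = random.Random(seed)
    iterations = max(1, len(dataset) // batch_size)  # assumption, see lead-in
    saved_epochs = []
    for epoch in range(max_epochs):
        for _ in range(iterations):
            # Step 1: update the sanitizer on a fresh batch.
            sanitizer_step(rng.sample(dataset, batch_size))
            # Step 2: update the discriminator d_iter times on new batches.
            for _ in range(d_iter):
                discriminator_step(rng.sample(dataset, batch_size))
        # Stand-in for saving the sanitizer and discriminator states.
        saved_epochs.append(epoch)
    return saved_epochs


# Usage with stub steps that only count how often they are called.
calls = {"san": 0, "disc": 0}


def fake_sanitizer_step(batch):
    calls["san"] += 1


def fake_discriminator_step(batch):
    calls["disc"] += 1


saved = train_alternating(list(range(20)), fake_sanitizer_step,
                          fake_discriminator_step,
                          max_epochs=3, d_iter=2, batch_size=5)
```

With 20 records, a batch size of 5, 3 epochs and 2 discriminator iterations, the sanitizer is updated 12 times and the discriminator 24 times. Saving one state per epoch is what later allows a sanitizer to be selected among all epochs.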

Training GANSan .

Let $\hat{S}$ be the prediction of S by the discriminator ($\hat{S} = Disc(San(D))$). Its objective is to accurately predict S, thus it aims at minimizing the loss $J_{Disc}(S, \hat{S})$. In practice, in our work, we instantiate $J_{Disc}$ as the Mean Squared Error (MSE).

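Given a hyperparameter $\alpha$ representing the desired trade-off between the fairness and the fidelity, the sanitizer minimizes a loss combining two objectives:

$$J_{San} = \alpha \cdot d_s + (1 - \alpha) \cdot d_r,$$

in which $d_r$ is the reconstruction loss and $d_s$ measures the gap $\frac{1}{2} - BER$ of the discriminator on the sensitive attribute. The term $\frac{1}{2}$ is due to the objective of maximizing the error of the discriminator (i.e., recall that the optimal value of the BER is $\frac{1}{2}$).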

Concerning the reconstruction loss $d_r$, we have first tried the classical Mean Absolute Error (MAE) and MSE losses. However, our initial experiments have shown that these losses produce datasets that are highly problematic in the sense that the sanitizer always outputs the same profile whatever the input profile, thus making it unusable. Therefore, we had to design a slightly more complex loss function. More precisely, we chose not to merge the respective losses of the attributes ($e_{A_1}, \dots, e_{A_d}, e_Y$), yielding a vector of attribute losses whose components are iteratively used in the gradient descent. Hence, each node of the output layer of the generator is optimized to reconstruct a single attribute from the representation obtained from the intermediate layers. The vector formulation of the loss is as follows: $\vec{J}_{San} = \left( (1 - \alpha)\, e_{A_1}, \dots, (1 - \alpha)\, e_{A_d}, (1 - \alpha)\, e_Y, \alpha\, e_S \right)$, in which $e_S$ is the sensitive loss and $\alpha$ the trade-off hyperparameter, and the objective is to minimize all its components. We are planning to conduct a more in-depth analysis of the vector formulation as well as its interactions with different optimization techniques in future works. The details of the parameters used for the training are given in Appendices D and E.
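To make the idea of an unmerged loss concrete, the sketch below builds one loss component per attribute instead of a single averaged reconstruction error. Several choices here are assumptions made for illustration and are not confirmed by the text: the per-attribute error is taken to be the MAE, the trade-off weight `alpha` scales the sensitive component while `1 - alpha` scales the reconstruction components, and the sensitive loss is supplied by the caller. The attribute names are hypothetical.

```python
def mean_abs_error(xs, ys):
    """Mean absolute error between two equally long lists of numbers."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def sanitizer_loss_vector(original, sanitized, sensitive_loss, alpha):
    """Return one (name, loss) pair per attribute plus the sensitive loss.

    original / sanitized: dict mapping attribute name -> list of values.
    sensitive_loss: scalar computed elsewhere from the discriminator output.
    alpha: trade-off in [0, 1] (assumed weighting, see the lead-in).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    components = [
        (name, (1 - alpha) * mean_abs_error(original[name], sanitized[name]))
        for name in original
    ]
    components.append(("S", alpha * sensitive_loss))
    return components


# Toy batch with two attributes (made-up values).
original = {"age": [0.2, 0.4], "hours": [0.5, 0.5]}
sanitized = {"age": [0.3, 0.4], "hours": [0.5, 0.7]}
loss_vector = sanitizer_loss_vector(original, sanitized,
                                    sensitive_loss=0.25, alpha=0.5)
```

Keeping the components separate means that a large error on one attribute cannot be hidden by small errors on the others, which is one plausible reading of why the vector formulation avoids the collapse onto a single output profile.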

4.2. Performance metrics

The performance of GANSan will be evaluated by taking into account the fairness enhancement and the fidelity to the original data. With respect to fairness, we will quantify it primarily with the inability of a predictor Adv, hereafter referred to as the adversary, to infer the sensitive attribute, using its Balanced Error Rate (BER) (Feldman2014) and its accuracy sAcc (cf. Section 2.2). We will also assess the fairness using group metrics (cf. Section 2.2) such as the demographic parity (Equation 1) and the equalized odds (Equation 2).

To measure the fidelity $fid$ between the original and the sanitized data, we have to rely on a notion of distance. More precisely, our approach does not require any specific assumption on the distance used, although it is conceivable that it may work better with some than others. For the rest of this work, we will instantiate $fid$ by the $L_2$-norm as it does not differentiate between attributes.

Higher fidelity is not a sufficient condition to imply a good reconstruction of the dataset. In fact, early experiments showed that the sanitizer might find a “median” profile to which it will map all input profiles. Thus, to quantify the ability of the sanitizer to preserve the diversity of the dataset, we introduce the diversity measure, which is defined in the following way:

$$diversity = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} \sqrt{\sum_{k=1}^{d} (\hat{r}_{i,k} - \hat{r}_{j,k})^2}}{N \times (N - 1) \times \sqrt{d}},$$

in which $N$ is the number of records, $d$ the number of attributes and $\hat{r}_{i,k}$ the value of the $k$-th attribute of the $i$-th record. While $fid$ quantifies how different the original and the sanitized datasets are, the diversity measures how diverse the profiles are in each dataset. We will also quantitatively discuss the amount of damage for a given fidelity and fairness to provide a better understanding of the qualitative meaning of the fidelity.
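A diversity measure of this kind can be sketched as the average pairwise Euclidean distance between profiles, normalized by the square root of the number of attributes. The exact normalization used below is an assumption made for illustration, as is the function name; attributes are assumed to be numeric and scaled to [0, 1].

```python
import math


def diversity(records):
    """Average pairwise Euclidean distance, normalized by sqrt(#attributes).

    records: list of equally long lists of numeric attribute values.
    """
    n = len(records)
    if n < 2:
        raise ValueError("at least two records are required")
    d = len(records[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += math.dist(records[i], records[j])
    return total / (n * (n - 1) * math.sqrt(d))


# A sanitizer that maps every profile to the same "median" profile.
collapsed = diversity([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
# Two profiles at opposite corners of the unit square.
spread = diversity([[0.0, 0.0], [1.0, 1.0]])
```

The collapsed dataset has a diversity of 0 even though each of its profiles may be close to the original ones on average, which is exactly the failure mode that the fidelity alone does not reveal. The two opposite corners give the maximal value of 1 under this normalization.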

Finally, we evaluate the loss of utility induced by the sanitization by relying on the accuracy of prediction on a classification task. More precisely, the difference in accuracy between a classifier trained on the original data and one trained on the sanitized data can be used as a measure of the loss of utility introduced by the sanitization with respect to the classification task.

5. Experimental evaluation

In this section, we describe the experimental setting used to evaluate GANSan  as well as the results obtained.

5.1. Experimental setting

Dataset description.

We have evaluated our approach on Adult Census Income, available at the UCI repository. Adult Census reports the financial situation of individuals, with 45222 records after the removal of rows with empty values. Each record is characterized by 15 attributes, among which we select the gender (i.e., male or female) as the sensitive one and the income level (i.e., over or below 50K$) as the decision.

Dataset | Adult Census
Group | Sensitive ($S = 0$, Female) | Default ($S = 1$, Male)
Table 1. Distribution of the different groups with respect to the sensitive attribute and the decision one on Adult Census Income.

Training process.

We will evaluate GANSan using metrics among which the fidelity , the as well as the demographic parity (cf. Section 4.2). For this, we have conducted a 10-fold cross-validation during which the dataset is divided into ten blocks. During each fold, 8 blocks are used for the training, while another one is retained as the validation set and the last one as the test set.
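The block-based split described above can be sketched as follows. The text does not say how the validation and test blocks are chosen for each fold, so the rotation used here (test block `k`, validation block `k + 1`) and the function name `ten_fold_splits` are assumptions made for illustration.

```python
import numpy as np

def ten_fold_splits(n_records, n_blocks=10, seed=0):
    """Yield (train, validation, test) index arrays: 8 blocks / 1 block / 1 block."""
    rng = np.random.default_rng(seed)
    # Shuffle the record indices once, then cut them into ten blocks.
    blocks = np.array_split(rng.permutation(n_records), n_blocks)
    for fold in range(n_blocks):
        test_id = fold
        val_id = (fold + 1) % n_blocks  # assumed rotation scheme
        train = np.concatenate(
            [b for i, b in enumerate(blocks) if i not in (test_id, val_id)]
        )
        yield train, blocks[val_id], blocks[test_id]

# 45222 is the number of Adult Census records reported in the text.
for train_idx, val_idx, test_idx in ten_fold_splits(45222):
    pass  # train the sanitizer on train_idx, monitor on val_idx, report on test_idx
```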

We computed the and using the discriminator of GANSan  and three external classifiers independent of the GANSan framework, namely Support Vector Machines (SVM) (Cortes1995), Multilayer Perceptron (MLP) (PopescuMLP) and Gradient Boosting (GB) (FriedmanGB). For all these external classifiers and all epochs, we report the space of achievable points with respect to the fidelity/fairness trade-off.

For each fold and each value of , we train the sanitizer during epochs. At the end of each epoch, we save the state of the sanitizer and generate a sanitized dataset on which we compute the , and . Afterwards, is used to select the sanitized dataset that is closest to the ideal point ().

More precisely, is defined as follows: with referring to the minimum value of obtained with the external classifiers. For each value of the hyper-parameter , selects among the sanitizers saved at the end of each epoch, the one achieving the highest fairness in terms of for the lowest damage.
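As an illustration of this selection step, the sketch below picks, among per-epoch checkpoints, the one closest to an ideal point in the (fairness, fidelity) plane. The exact definition of the heuristic and the coordinates of the ideal point are not legible in this version of the text, so the Euclidean distance, the ideal point (BER of 0.5, i.e., chance level for a binary sensitive attribute, and fidelity of 1), and the numbers in `history` are all assumptions.

```python
import math

def select_epoch(history, ideal_ber=0.5, ideal_fid=1.0):
    """Return the epoch whose (ber_min, fidelity) is closest to the ideal point.

    history: list of (epoch, ber_min, fidelity) tuples, where ber_min is the
    minimum BER obtained over the external classifiers at that epoch.
    """
    closest = min(
        history,
        key=lambda h: math.hypot(h[1] - ideal_ber, h[2] - ideal_fid),
    )
    return closest[0]

# Hypothetical checkpoints for a single value of the trade-off hyper-parameter.
history = [(1, 0.20, 0.99), (2, 0.45, 0.95), (3, 0.49, 0.70)]
best = select_epoch(history)
print(best)
```

Epoch 1 has high fidelity but weak protection, epoch 3 the reverse; a distance-based rule favours the balanced checkpoint.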

We will use the same families of external classifiers for computing the metrics , and .

We also used the same chosen test set to conduct a detailed analysis of its reconstruction’s quality ( and quantitative damage on attributes).

5.2. Evaluation scenarios

Recall that GANSan takes as input the whole original dataset (including the sensitive and the decision attributes) and outputs a sanitized dataset (without the sensitive attribute) in the same space as the original one, but from which it is impossible to infer the sensitive attribute. In this context, the overall performance of GANSan can be evaluated by analyzing the reachable space of points characterizing the trade-off between the fidelity to the original dataset and the fairness enhancement. More precisely, during our experimental evaluation, we will measure the fidelity between the original and the sanitized data, as well as the , both in relation with the and , computed on this dataset.

However, in practice, the sanitized dataset can be used in several situations. In the following, we detail four scenarios that we believe represent most of the possible use cases of GANSan . To ease the understanding, we will use the following notation: the subscript (respectively ) will denote the data in the training set (respectively test set). For instance, in which can either be , , or , represents respectively the attributes of the original training set (not including the sensitive and the decision attributes), the decision in the original training set, the attributes of the sanitized training set and the decision attribute in the sanitized training set. Table 6 (cf. Appendix B) summarizes the notation used, while in Table 2 we describe the composition of the training and the test sets for these four scenarios.

In detail, the scenarios that we considered for our evaluation are the following.

Scenario 1 : complete data debiasing.

This procedure corresponds to the typical use of the sanitized dataset, which is the prediction of a decision attribute through a classifier. The decision attribute is also sanitized as we assumed that the original decision holds information about the sensitive attribute. Here, we quantify the accuracy of prediction of as well as the discrimination represented by the demographic parity gap (Equation 1) and the equalized odds gap (Equation 2) defined in Section 2.
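For reference, the two group-fairness gaps named above can be sketched as follows. Equations 1 and 2 are not part of this section, so the code uses the standard definitions (difference in positive-prediction rates between groups, and differences in true and false positive rates between groups), which may differ in detail from the paper's own equations. The function names and the toy arrays are illustrative.

```python
import numpy as np

def demographic_parity_gap(y_pred, s):
    """|P(y_pred = 1 | s = 0) - P(y_pred = 1 | s = 1)| for binary s."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())

def equalized_odds_gaps(y_true, y_pred, s):
    """Between-group gaps in true positive rate and false positive rate."""
    y_true, y_pred, s = np.asarray(y_true), np.asarray(y_pred), np.asarray(s)
    gaps = {}
    for label, name in ((1, "tpr"), (0, "fpr")):
        # Rate of positive predictions among records with the given true label.
        rates = [y_pred[(s == g) & (y_true == label)].mean() for g in (0, 1)]
        gaps[name] = abs(rates[0] - rates[1])
    return gaps

# Toy example: eight records, four per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
s      = [0, 0, 0, 0, 1, 1, 1, 1]

dp = demographic_parity_gap(y_pred, s)
eo = equalized_odds_gaps(y_true, y_pred, s)
print(dp, eo)
```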

Scenario 2 : partial data debiasing.

In this scenario, just like the previous one, the training and the test sets are sanitized, with the exception that the sanitized decision in both these datasets is replaced with the original one . This scenario is generally the one considered in the majority of papers on fairness enhancement (Zemel2013; CRWAA-ALFR2016; madras2018learning). The accuracy loss in the prediction of the original decision between this classifier and another trained on the original dataset without modifications is a straightforward way to quantify the utility loss due to the sanitization.

Scenario 3 : building a fair classifier.

This scenario was considered in (xu2018fairgan) and is motivated by the fact that the sanitized dataset might introduce some undesired perturbations (e.g. changing the education level from Bachelor to PhD). Thus, a third party might build a fair classifier but still apply it directly on the unperturbed data to avoid the data sanitization process and the associated risks. More precisely in this scenario, a fair classifier is obtained by training it on the sanitized dataset to predict the sanitized decision . Afterwards, this classifier is tested on the original data () by measuring its fairness through the demographic parity (Equation 1, Section 2). We also compute the accuracy of the fair classifier with respect to the original decision of the test set .

Scenario 4 : local sanitization.

The local sanitization scenario corresponds to a private use of the sanitizer. For instance, the sanitizer could be used as part of a mobile phone application providing individuals with a means to remove some sensitive attributes from their profile before disclosing it to an external entity. In this scenario, we assume the existence of a biased classifier, trained to predict the original decision on the original dataset . The user has no control over this classifier, but he is allowed nonetheless to perform the sanitization locally on his profile before submitting it to the existing classifier. This classifier is applied on the sanitized test set and its accuracy is measured with respect to the original decision as well as the fairness enhancement quantified by the .

| Scenario | Train set: attributes | Train set: decision | Test set: attributes | Test set: decision |
|---|---|---|---|---|
| Baseline | Original | Original | Original | Original |
| Scenario 1 | Sanitized | Sanitized | Sanitized | Sanitized |
| Scenario 2 | Sanitized | Original | Sanitized | Original |
| Scenario 3 | Sanitized | Sanitized | Original | Original |
| Scenario 4 | Original | Original | Sanitized | Original |

Table 2. Scenarios envisioned for the evaluation of GANSan . Each set is composed of either the original attributes (not taking into account the sensitive or decision attributes) or their sanitized versions, coupled with either the original decision or its sanitized counterpart.
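The compositions in Table 2 can be encoded as a small lookup, which makes it harder to mix up which version of the attributes and of the decision feeds each classifier. This is only a sketch: the `SCENARIOS` mapping mirrors Table 2, while the `build_sets` helper and the layout of the `data` dictionary are hypothetical conventions introduced for illustration.

```python
# Each entry: (train attributes, train decision, test attributes, test decision).
SCENARIOS = {
    "baseline":   ("original",  "original",  "original",  "original"),
    "scenario_1": ("sanitized", "sanitized", "sanitized", "sanitized"),
    "scenario_2": ("sanitized", "original",  "sanitized", "original"),
    "scenario_3": ("sanitized", "sanitized", "original",  "original"),
    "scenario_4": ("original",  "original",  "sanitized", "original"),
}

def build_sets(scenario, data):
    """Assemble the train and test sets for one scenario.

    data is assumed to be a dict keyed by (version, part, split), e.g.
    ("sanitized", "attributes", "train"), holding arrays or data frames.
    """
    train_x, train_y, test_x, test_y = SCENARIOS[scenario]
    return {
        "train": (data[(train_x, "attributes", "train")],
                  data[(train_y, "decision", "train")]),
        "test": (data[(test_x, "attributes", "test")],
                 data[(test_y, "decision", "test")]),
    }

# Placeholder payloads: each value is just a label naming what it stands for.
data = {
    (version, part, split): f"{version}-{part}-{split}"
    for version in ("original", "sanitized")
    for part in ("attributes", "decision")
    for split in ("train", "test")
}
local = build_sets("scenario_4", data)
print(local)
```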

All scenarios require the use of a sanitized version of the dataset (either or or both) for either training the model or computing results (the decision accuracy , and ). We use to select the version of the dataset to use. In fact, for each value of , we generate a new version of the sanitized dataset at the end of each epoch.

5.3. Experimental results

General results.

Figure 2 describes the achievable trade-off between fairness and fidelity obtained using the sanitizer. First of all, we can observe that fairness improves with the increase in the value of the as expected. Even with (i.e., maximum utility regardless of the fairness), we cannot reach a perfect fidelity to the original data as we get at most (cf. Figure 2). Increasing the value of from to a low value such as provides a fidelity close to the highest possible (), but leads to a BER that is poor (i.e., not higher than ). Nonetheless, we still have a fairness enhancement compared to the original data (, ).

At the other extreme of the spectrum in which , the data is sanitized without any consideration on the fidelity. In this case, the is optimal as expected and the fidelity is lower than the maximum achievable (). However, slightly decreasing the value of , for instance setting , allows the sanitizer to completely remove the unwarranted correlations () with a cost of on fidelity ().

Figure 2. Fidelity/fairness trade-off on Adult Census Income. Each point represents the minimum possible of all the external classifiers. The fairness improves with the increase of , a small value providing a low fairness guarantee while a high one introduces greater damage to the sanitized data.
Figure 3. Boxplots of the quantitative analysis of sanitized datasets selected using . These metrics are computed on the whole sanitized dataset. Modified records correspond to the proportion of records with categorical attributes affected by the sanitization.

With respect to , the accuracy drops significantly when the value of increases (see Appendix A for details). Here, the optimal value is the proportion of the majority class, and GANSan brings the accuracy of predicting S from the sanitized set closer to that value. However, even at the extreme , it is nearly impossible to reach this optimal value. Similarly to BER, slightly decreasing from this extreme value by setting improves the sanitization of the dataset while preserving a fidelity closer to the maximum achievable.

To summarize, GANSan makes the inference of S nearly impossible for all of the external classifiers we have tested. However, this sanitization has a cost on the fidelity of the resulting dataset compared to the original one.

The quantitative analysis with respect to the impact on diversity is shown in Figure 3. More precisely, the smallest drop of diversity obtained is , which is achieved when we set . Among all values of , the biggest drop observed is . The application of GANSan therefore introduces an irreversible perturbation, as observed with the fidelity. This loss of diversity implies that the sanitization reinforces the similarity between sanitized profiles as increases, rendering them almost identical or mapping the input profiles to a small number of stereotypes.

When is in the range (i.e., complete sanitization), of categorical attributes have a proportion of modified records between and (cf. Figure 3). With the exception of the extreme sanitization (), at least of records in the dataset have a relative change lower than for most of the numerical attributes. For lower values of (), we observe that at least of records in the sanitized dataset have a relative change of .

Selecting leads to of records being modified with a relative change less than .

In Table 3, the same profile (which was the most damaged one in fold 1) was tracked across different folds and we show how the sanitization process affects it. We can see that the modifications applied to the profile across different folds are not deterministic.

| Attrs | Original | Fold 1 | Fold 3 | Fold 4 |
|---|---|---|---|---|
| age | 42 | 49.58 | 32.17 | 50.5 |
| workclass | State | Federal | Self-emp-not-inc | Self-emp-not-inc |
| fnlwgt | 218948 | 192102.77 | 250047 | 214678 |
| education | Doctorate | Bachelors | Doctorate | HS-grad |
| education-num | 16 | 9.393 | 10.89 | 10.3191 |
| marital-status | Divorced | Married-civ-spouse | Married-civ-spouse | Married-civ-spouse |
| occupation | Prof-specialty | Adm-Clerical | Transport-moving | Adm-clerical |
| relationship | Unmarried | Husband | Husband | Husband |
| race | Black | White | White | White |
| Capital Gain | 0 | 0 | 0 | 0 |
| Capital Loss | 0 | 0 | 0 | 0 |
| hours-per-week | 36 | 47.04 | 40.50 | 38.7 |
| native-country | Jamaica | Peru | United-States | United-States |
| Income | 0 | 0 | 0 | 0 |
| Damage value | | | | |

Table 3. Most damaged profile for the first fold and the same profile obtained at the end of the sanitization for fold 3 and fold 4.
Figure 4. Accuracy (blue), demographic parity gap (orange) and equalized odds gap (true positive rate in green and false positive rate in red) computed for scenarios 1, 2, 3 and 4 (top to bottom), with the classifiers GB, MLP and SVM (left to right).

To sum up, GANSan  is able to maintain an important part of the dataset structure during its sanitization, making it usable for many other analysis tasks. For the different scenarios investigated hereafter, we fixed the value of to as we observed that in our general analysis, such value provides nearly a perfect level of sensitive attribute protection, while leading to an acceptable damage.

Scenario 1 : complete data debiasing.

In this scenario, we observe that GANSan preserves the accuracy of the dataset. More precisely, it increases the accuracy of the decision prediction on the sanitized dataset for all classifiers (cf. Figure 4, Scenario S1), compared to the original one which is , and respectively for GB, MLP and SVM. This increase can be explained by the fact that GANSan modifies the profiles to make them more coherent with the associated decision, by removing correlations between the sensitive attribute and the decision one. As a consequence, this sets the same decision to similar profiles in both the protected and the default groups. As a matter of fact, nearly the same distributions of the decision attribute are observed before and after the sanitization, but some records' decisions are shifted ( of decisions shifted in the whole sanitized set, of decisions shifted in the sanitized sensitive group for ). Such a decision shift could be explained by the similarity between those profiles and others with the opposite decision in the original dataset.

We also believe that the increase of accuracy is correlated with the drop of diversity. More precisely, if profiles become similar to each other, the decision boundary might be easier to find. We present in Table 4 the shift of decision proportion across the different folds for . We observe that in some cases, the sanitizer transforms the binary decision column almost into a single-valued one. We leave the study of how GANSan  affects the decision boundary as future work.

Original Max Min Mean Std
Table 4. Proportion of the positive decision attribute across the different folds for .

The discrimination is reduced as observed through , and , which all exhibit a negative slope. When correlations with the sensitive attribute are significantly removed (), those metrics also significantly decrease. For instance, at , , , , for GB; whereas the original demographic parity gap and equalized odds gap are respectively , . See Tables 10 and 12 in Appendices for more detailed results. In this setup, FairGan (xu2018fairgan) achieves a BER of an accuracy of and a demographic parity of .

To summarize, GANSan  increases the consistency between the sanitized set and the sanitized decision, thus diminishing the discrimination. However, a careful analysis of these results should be carried out as some unwanted effect could arise due to the loss of diversity.

Scenario 2 : partial data debiasing.

Unexpectedly, we observe an increase in accuracy for most values of alpha. The demographic parity gap also decreases while the equalized odds remains nearly constant (, green line on Figure 4). Table 5 compares the results obtained to other existing work from the state-of-the-art. We include the classifier with the highest accuracy (MLP) and the one with the lowest one (SVM). From these results, we can observe that our method outperforms the others in terms of accuracy, but the demographic parity is best achieved with the work done in (zhang2018mitigating) (), which is not surprising as this method is precisely tailored to reduce this metric.

Even though our method is not specifically constrained to mitigate the demographic parity, we can observe that it significantly enhances it. Thus, while partial data debiasing is not the best application scenario for our approach, as the original decision still incorporates correlations with the sensitive attribute, our method still mitigates its effect to some extent.

Authors yAcc DemoParity
LFR  (Zemel2013)
MUBAL (zhang2018mitigating) 0.01
LATR (madras2018learning) 0.84
GANSan  (S2) - MLP, 0.91 0.01
GANSan  (S2) - SVM, 0.85 0.04
Table 5. Comparison with other works on the basis of accuracy and demographic parity on Adult Census.

Scenario 3 : building a fair classifier.

The sanitizer helps to reduce discrimination based on the sensitive attribute, even when using the original data on a classifier trained on the sanitized one. As presented in the third row of Figure 4, as we force the system to completely remove the unwarranted correlations, the discrimination observed when classifying the original unperturbed data is reduced. On the other hand, the accuracy exhibits here the highest negative slope with respect to all the scenarios investigated. We observe a drop of for the best classifier in terms of accuracy on the original set. This decrease of accuracy is explained by the difference of correlations between and and between and . As the fair classifiers are trained on the sanitized set ( and ), the decision boundary obtained is not relevant for and .

This scenario raises the question of the legitimacy of perturbations induced by the sanitization in critical situations.

Further investigations are required to assess this issue.

FairGan (xu2018fairgan), which also investigated this scenario achieved and whereas our best classifier in accuracy (GB) achieves and for .

Scenario 4 : local sanitization.

Just as in the other scenarios, the more the correlations with the sensitive attribute are removed, the higher the drop of discrimination as quantified by the , as well as , and the lower the accuracy on the original decision attribute. For instance, with GB, we obtain , at (the original values were and ). With MLP, which has the best DemoParity, we observe: , . This shows that GANSan can be used locally, for instance deployed on a smartphone, allowing users to contribute to large datasets by sanitizing and sharing their information without relying on any third party, with the guarantee that the sensitive attribute GANSan has been trained for is removed. The drop of accuracy due to the local sanitization is on GB ( with MLP). Thus, for applications requiring a time-consuming training phase, using GANSan to sanitize profiles without retraining the classifier seems to be a good compromise.

6. Conclusion

In this work, we have introduced GANSan , a novel preprocessing method inspired by GANs achieving fairness by removing the correlations between the sensitive attribute and the other attributes of the profile. Our experiments demonstrate that GANSan  can prevent the inference of the sensitive attribute while limiting the loss of utility as measured in terms of the accuracy of a classifier learned on the sanitized data as well as by the damage on the numerical and categorical attributes. In addition, one of the strengths of our approach is that it offers the possibility of local sanitization, by only modifying the attributes as little as possible while preserving the space of the original data (thus preserving interpretability). As a consequence, GANSan  is agnostic to subsequent use of data as the sanitized data is not tied to a particular data analysis task.

While we have relied on three different types of external classifiers for capturing the difficulty to infer the sensitive attribute from the sanitized data, it is still possible that a more powerful classifier exists that could infer the sensitive attribute with higher accuracy. Note that this is an inherent limitation of all the preprocessing techniques and not only our approach. Nonetheless, as future work we would like to investigate other families of learning algorithms to complete the range of external classifiers. Much work still needs to be done to assess the relationship between the different notions of fairness, namely the impossibility of inference and the individual and group fairness.

Finally, to assess the strength of the protection level provided by our approach, and other preprocessing techniques in general, we are planning to develop a reconstruction attack whose objective is to infer an original record from its sanitized version and possibly some auxiliary information such as the structure of the sanitizer. As we mentioned in the related work, there has been some previous work on the concept of local sanitization. However, none of those approaches has been investigated from the fairness point of view. As future work, we will apply such local privacy techniques and compare the performance of privacy techniques for fairness with this current work. Conversely, we will evaluate the performance of GANSan with respect to privacy metrics.



Appendix A Sensitive attribute accuracy

Figure 5. Fidelity-fairness trade-off on Adult Census Income. Each point represents the minimum possible of all the external classifiers. The decreases with the increase of , a small value providing a low fairness guarantee while a larger one usually introduces higher damage.
Figure 6. Cumulative distribution of the relative change (x-axis) for numerical attributes, versus the proportion of records affected in the dataset (y-axis). Further details about these results are available in Appendix G.

Appendix B Notations used for scenarios

original decision
original decision in the training set.
original decision in the test set.
original attributes (not including the sensitive and the decision attributes).
original attributes in the training set.
original attributes in the test set.
sanitized decision.
sanitized decision in the training set.
sanitized decision in the test set.
sanitized attributes (not including the sensitive and the decision attributes).
sanitized attributes in the training set.
sanitized attributes in the test set.
Table 6. Notations used to differentiate the evaluation scenarios.

Appendix C Qualitative observation of GANSan  output

In Tables 7 and 8, we present the records that have been maximally and minimally damaged due to the sanitization.

Attrs Original Fold 1
age 42 49.58
workclass State Federal
fnlwgt 218948 192102.77
education Doctorate Bachelors
education-num 16 9.393
marital-status Divorced Married-civ-spouse
occupation Prof-specialty Adm-Clerical
relationship Unmarried Husband
race Black White
hours-per-week 36 47.04
native-country Jamaica Peru
damage value
Attrs Original Fold 4
age 29 49.01
workclass Self-emp-not-inc Without-pay
fnlwgt 341672 357523.5
education HS-grad Doctorate
education-num 9 7.674
marital-status Married-spouse-absent Married-civ-spouse
occupation Transport-moving Protective-serv
relationship Other-relative Husband
race Asian-Pac-Islander Black
hours-per-week 50 40.37
native-country India Thailand
damage value
Attrs Original Fold 3
age 38 31.65
workclass Federal-gov Self-emp-not-inc
fnlwgt 37683 245776.230
education Prof-school Doctorate
education-num 15 13
marital-status Never-married Married-civ-spouse
occupation Prof-specialty Handlers-cleaners
relationship Not-in-family Husband
race Asian-Pac-Islander White
capital-gain 11.513 0
hours-per-week 57 43.5
native-country Canada Portugal
damage Value
Table 7. Most damaged profiles for on the first three folds. Only the perturbed attributes are shown.
Attrs Original
age 49 49.4
workclass Federal-gov Federal-gov
fnlwgt 157569 193388
education HS-grad HS-grad
education-num 9 9.102
marital-status Married-civ-spouse Married-civ-spouse
occupation Adm-Clerical Adm-Clerical
relationship Husband Husband
race White White
capital-gain 0 0
capital-loss 0 0
hours-per-week 46 44.67
native-country United-States United-States
income 0 0
Attrs Original
age 35 29.768
workclass Private Private
fnlwgt 241998 179164
education HS-grad HS-grad
education-num 9 8.2765
marital-status Never-married Never-married
occupation Sales Farming-fishing
relationship Not-in-Family Not-in-Family
race White White
capital-gain 8.474 0
capital-loss 0 0
hours-per-week 40 42.434
native-country United-States United-States
income 1 0
Attrs Original
age 42 49.58
workclass State Federal
fnlwgt 218948 192102.77
education Doctorate Bachelors
education-num 16 9.393
marital-status Divorced Married-civ-spouse
occupation Prof-specialty Adm-Clerical
relationship Unmarried Husband
race Black White
capital-gain 8.474 0
capital-loss 0 0
hours-per-week 36 47.04
native-country Jamaica Peru
income 0 0
Table 8. Minimally damaged profile, profile with damage at of the max and most damaged profile for for the first fold.

Appendix D Preprocessing of the dataset

Because our approach relies on neural networks, we need to apply standard preprocessing methods on the data to ensure that the training of GANSan will converge.

This preprocessing consists first in transforming categorical and numerical attributes with fewer than 5 values into binary ones, which is called one-hot encoding in machine learning. For instance, the categorical attribute becomes and with the corresponding binary value being respectively and . Each other attribute is also normalized between the possible minimum and maximum values. Afterwards, a scaling between and is applied. In addition, on the Adult dataset, we first need to apply a logarithm on two columns, namely the and . This step is required because those attributes exhibit a distribution close to a Dirac delta (Dirac1930-DIRTPO), with the maximal values being respectively and , and a median of for both (respectively and of records have a value of ). Since most values are equal to , the sanitizer would otherwise always nullify both attributes and the approach would not converge.

When applying GANSan , a postprocessing step also needs to be performed on the output of the sanitizer (i.e., neural network), which mostly consists in undoing the preprocessing steps and remapping the generated data to the original space. This remapping ensures that the values generated by the sanitizer will fit in the original range of the attribute.
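The preprocessing and postprocessing steps described in this appendix can be sketched as below. Several specifics are assumptions: the target scaling range is not legible in this version of the text, so it is exposed as the parameter `out_range` with an arbitrary default of (-1, 1); `log1p` is used as the logarithm so that zero values remain defined; and all function names are illustrative.

```python
import numpy as np

def preprocess_numeric(col, lo, hi, log=False, out_range=(-1.0, 1.0)):
    """Optionally log-transform, min-max normalize, then rescale to out_range."""
    x = np.asarray(col, dtype=float)
    if log:
        # Assumed variant of the logarithm that is defined at zero.
        x, lo, hi = np.log1p(x), np.log1p(lo), np.log1p(hi)
    unit = (x - lo) / (hi - lo)
    a, b = out_range
    return a + unit * (b - a)

def postprocess_numeric(scaled, lo, hi, log=False, out_range=(-1.0, 1.0)):
    """Undo preprocess_numeric and clip the result into the original range."""
    a, b = out_range
    unit = (np.asarray(scaled, dtype=float) - a) / (b - a)
    # Remapping step: network outputs outside the range are clipped back.
    unit = np.clip(unit, 0.0, 1.0)
    if log:
        x = unit * (np.log1p(hi) - np.log1p(lo)) + np.log1p(lo)
        return np.expm1(x)
    return unit * (hi - lo) + lo

def one_hot(values, categories):
    """One binary column per category, in the order given by categories."""
    values = np.asarray(values)
    return (values[:, None] == np.asarray(categories)[None, :]).astype(float)

hours = preprocess_numeric([36, 40, 57], lo=1, hi=99)
restored = postprocess_numeric(hours, lo=1, hi=99)
print(restored)
```

For categorical attributes, a natural postprocessing choice (again an assumption) is to take the arg-max over the one-hot columns produced by the sanitizer.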

Appendix E Hyper-parameters tuning

Table 9 details the parameters of the classifiers that have yielded the best results respectively on the Adult and German credit datasets. The training ratio represents the number of iterations on which each instance has been trained during the sanitization of a single batch. More precisely, for a given iteration , the discriminator is trained with records while the sanitizer is trained with records. The number of iterations is determined by the ratio of the dataset size with respect to the batch size. In simple terms, iterations is the number of batches needed to complete only one epoch (). Our experiments were run for a total of 40 epochs, where each epoch represents a complete presentation of the dataset to be learned (the entire dataset is passed forward and backward through the classifier only once). We varied the value using a geometric progression:

| | Sanitizer | Discriminator |
|---|---|---|
| Layers | 3 x Linear | 5 x Linear |
| Learning Rate (LR) | | |
| Hidden Activation | ReLU | ReLU |
| Output Activation | LeakyReLU | LeakyReLU |
| Losses | VectorLoss | MSE |
| Training ratio | 1 | 50 |
| Batch size | | |
| Optimizers | Adam | Adam |

Table 9. Hyper-parameter tuning for the Adult dataset.
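To illustrate how the training ratio, batch size and number of epochs interact, the sketch below counts the update steps implied by Table 9. The batch size and the parameters of the geometric progression are not legible in this version of the text, so the values used here (a batch size of 100, a first term and common ratio of 0.5, and a training set of 36000 records) are purely hypothetical, as is the choice to round the number of iterations up.

```python
import math

def updates_per_epoch(n_records, batch_size, sanitizer_ratio=1, discriminator_ratio=50):
    """Count update steps in one epoch given the training ratios of Table 9."""
    # Number of batches needed to present the whole dataset once
    # (rounding up is an assumption for a final partial batch).
    iterations = math.ceil(n_records / batch_size)
    return {
        "iterations": iterations,
        "sanitizer_updates": iterations * sanitizer_ratio,
        "discriminator_updates": iterations * discriminator_ratio,
    }

def geometric_values(first, ratio, count):
    """First `count` terms of a geometric progression."""
    return [first * ratio ** k for k in range(count)]

# Hypothetical sizes: 36000 training records, batches of 100 records.
schedule = updates_per_epoch(36000, 100)
total_sanitizer_updates = 40 * schedule["sanitizer_updates"]  # 40 epochs, as in the text
print(schedule, total_sanitizer_updates, geometric_values(0.5, 0.5, 3))
```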

Appendix F GANSan  numerical attributes relative change

Numerical attributes differ from the categorical ones in the sense that the damage is not total, thus we cannot compute the proportion of records whose values are changed by the sanitization. For those numerical attributes, we compute the relative change (RC) normalized by the mean of the original and sanitized values:


We normalize the RC using the mean (since all values are positive) as it allows us to handle situations in which the original values are equal to . If both the sanitized and the original values are equal to , we simply set the to . This would not have been possible using only the deviation (percentage of change).
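A minimal sketch of this computation follows, assuming that the RC is the absolute difference between the original and sanitized values divided by their mean, and that the RC is set to 0 when both values are 0. The displayed formula is missing from this version of the text, so this reading and the function name `relative_change` are assumptions.

```python
import numpy as np

def relative_change(original, sanitized):
    """|original - sanitized| divided by the mean of the two (0 where both are 0)."""
    o = np.asarray(original, dtype=float)
    s = np.asarray(sanitized, dtype=float)
    mean = (o + s) / 2.0
    diff = np.abs(o - s)
    # Values are assumed non-negative, so the mean is 0 only when both are 0;
    # in that case the output keeps its initial value of 0.
    return np.divide(diff, mean, out=np.zeros_like(diff), where=mean != 0)

# An original value of 0 is handled without dividing by zero, which a plain
# percentage of change (division by the original value) could not do.
rc = relative_change([0, 0, 10], [0, 5, 10])
print(rc)
```

With this normalization the RC is bounded by 2 for non-negative values, the bound being reached when exactly one of the two values is 0.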

Appendix G Evaluation of group-based discrimination

We present our results of group-based discrimination in Table 10. We computed both the demographic parity and the equalized odds metrics as presented in the system model and fairness definitions section. In Table 12, we present the protected attribute level (scenario S1) for all classifiers.

All these results are computed with

Dataset Classifier
Baseline S1 S2 S3 S4
Adult GB
Dataset Classifier
Baseline S1 S2 S3 S4
Adult GB
Dataset Classifier DemoParity
Baseline S1 S2 S3 S4
Adult GB
Table 10. Equalized odds and demographic parity.

Appendix H GANSan  utilities

Dataset Classifier yAcc
Baseline S1 S2 S3 S4
Adult Census GB
fid diversity
Dataset Baseline S1 Baseline S1
Adult Census
Table 11. Evaluation of GANSan ’s utility.
Dataset Classifier BER sAcc
Baseline Sanitized Baseline Sanitized
Adult GB
Table 12. Evaluation of GANSan ’s sensitive attribute protection.