Adversarial Removal of Gender from Deep Image Representations
In this work we analyze visual recognition tasks such as object and action recognition, and demonstrate the extent to which these tasks are correlated with features corresponding to a protected variable such as gender. We introduce the concept of "natural leakage" to measure the intrinsic reliance of a task on a protected variable. We further show that machine learning models trained for these tasks tend to exacerbate the reliance on gender features. To address this, we use adversarial training to remove unwanted features corresponding to protected variables from intermediate representations in a deep neural network. Experiments on two datasets, COCO (objects) and imSitu (actions), show reductions in the extent to which models rely on gender features while maintaining most of the accuracy of the original models. These results even surpass a strong baseline that blurs or removes people from images using ground-truth annotations. Moreover, we provide convincing interpretable visual evidence through an autoencoder-augmented model showing that this approach performs semantically meaningful removal of gender features, and thus can also be used to remove gender attributes directly from images.
Visual recognition systems have made great progress toward practical applications that directly affect people. However, these systems often make unwarranted implicit associations and risk amplifying societal stereotypes about people. Negative outcomes range from representation harm (e.g., male software engineers are over-represented in image search results), to issues of inclusiveness and awareness (e.g., facial recognition software fails for a subset of the population), to life-threatening situations (e.g., pedestrian detection in autonomous vehicles is not equally accurate for all groups of people). As computer vision techniques have become widespread in human-centric applications, it is crucial to understand the different biases encoded in the formulation of visual recognition models and to design appropriate approaches to make them agnostic to protected variables such as gender, race, or age.
In this paper, we find that some visual tasks, such as object recognition and action recognition from static images, exhibit intrinsic biases with respect to gender. We introduce the notion of natural leakage, measured by the degree to which a classifier with varying degrees of access to the true annotations for a task is a good predictor of a protected variable such as gender. Using this measure, we demonstrate that some tasks inherently correlate with protected variables due to differences in the label distribution conditioned on the protected variable in the training data. Moreover, we measure the model leakage of a particular visual recognition model by similarly measuring the extent to which its outputs are correlated with a protected variable. We find that models trained on a biased task tend to amplify those biases under leakage measures rather than merely replicate them. While our work focuses on gender due to the availability of annotations, the approach can be applied to any arbitrary variable.
As a solution, we propose to use convolutional neural networks (CNNs) to build feature representations that capture as much task-specific information from their inputs as possible (such as the appearance of objects or actions) while keeping out information about a protected variable (such as gender). Our approach builds on adversarial training, which has been used to hide protected variables in other machine learning tasks [24, 2, 28] and, more generally, to train generative adversarial networks (GANs). Our model succeeds in removing model leakage while maintaining the accuracy of a comparable model trained without any adversarial constraints. Specific to images, we further propose an autoencoder-based approach to visualize the removal of visual features associated with gender (see the example in Figure 1), and find that the model can successfully remove features such as faces, body parts, and even some contextual cues, such as pink objects that are strongly correlated with gender in the dataset.
In the image domain, adversarial removal of high-level information such as the gender of people depicted in images is challenging, as people are often interacting with other objects and their surroundings in complex ways. A successful approach must remove features associated with the person without destroying the information needed to classify the depicted action or the objects the person interacts with in the scene. This is in contrast to other domains where categorical features are explicitly provided to the model; in those cases, although not ideal, it is possible to apply the naive approach of directly removing gender from the input feature set. We demonstrate that adversarial removal works for removing features with respect to a protected variable, but works best when applied to specific intermediate representations output by the last convolutional layers.
Our contributions stand as follows:
We introduce the notion of natural leakage to analyze the extent to which complex prediction tasks in visual recognition are intrinsically correlated with a protected variable such as gender.
We perform extensive experiments showing that adversarial training for removing leakage is effective, and provide concrete recommendations for its use with visual recognition models such as ResNet.
We propose an autoencoder-augmented model that allows removal of gender features directly from the input representation in order to obtain interpretable visual results.
2 Related Work
In recent years, researchers have demonstrated that machine learning models tend to replicate the societal biases present in the training data that they learn from. Concerns have been raised for applications such as recommender systems, credit score prediction, online news, and many others, and in response various approaches have been proposed to mitigate bias [1, 13]. However, most previous work deals with issues of resource allocation [7, 9], where the focus is on improving the calibration of predictions. Furthermore, such works often assume that the protected variable is explicitly defined as a feature and that the goal of the calibration is clearly defined (e.g., equal odds or equal opportunity). In object recognition, by contrast, we must infer multiple variables jointly in order to make coherent decisions, and the representations for protected attributes are automatically inferred from raw data.
More recently, there has been work addressing different types of biases in image data [22, 29, 23, 4, 18, 5]. Xie et al. propose a setup for removing image brightness as the protected variable in an image classification task. Our work is, to our knowledge, the first attempt to remove a high-level attribute such as gender from images, and in the more general scenario of multi-label prediction. Zhao et al. address bias in the COCO and imSitu datasets, but their focus is on structured prediction models where predicting gender is itself a target task; moreover, their proposed method still calibrates the predictions of the structured model rather than learning debiased representations. Similar in spirit to our work, Burns et al. attempt to calibrate the gender predictions of a captioning system by modifying the input image. In contrast, our models do not aim to predict gender, which is the more common scenario in practice; since gender is not one of the outputs in our setup, calibration methods would not be effective for debiasing the predictions.
There has also been considerable work on learning fair representations in other domains. For example, word embedding models have been shown to carry gender stereotypes [6, 3] that affect downstream applications such as coreference resolution [30, 21]. Bolukbasi et al. debiased word embeddings by finding a gender direction in the embedding space and reprojecting to build gender-neutral representations. Zemel et al. similarly propose reprojecting the input representations onto a set of prototype feature embeddings to mitigate bias. Our method is more closely related to debiasing approaches that use adversarial training [28, 2, 24, 8, 31, 10] during the learning of the representations. We provide further details about this family of methods in the body of the paper.
In terms of evaluation, researchers have proposed different measurements for quantifying fairness in machine learning [12, 16, 7]. In contrast to these works, we address the removal of bias in the feature space, and therefore adopt and further develop the idea of leakage as an evaluation criterion, as proposed by Elazar and Goldberg for the debiasing of text representations. We explore the leakage formulation, test its capacity for measuring the extent of removal of protected attributes from feature representations, and propose dataset leakage, prediction leakage, natural leakage, and model leakage as measures of bias in learned representations.
Many problems in computer vision inadvertently reveal demographic information (e.g., gender) about people in images. For example, in COCO, images of plates are significantly more common with women than with men, so if a model predicts that a plate is in the image, we can infer that a woman is likely in the image. We refer to this notion as leakage. In this section, we present (1) formal definitions of leakage for datasets and models, and (2) an adversarial method for reducing it. We will show in Section 5 that both imSitu and COCO leak significant information, and show how to construct smaller versions of these datasets that do not exhibit dataset leakage.
In this section, we discuss four types of leakage: (1) dataset leakage, (2) prediction leakage, (3) natural leakage, and (4) model leakage.
We assume we are given an annotated dataset $\mathcal{D}$ containing instances $(x, y, g)$, where $x$ is an image annotated with a set of task-specific labels $y$ (e.g., objects) and a protected attribute $g$ (e.g., the image contains a person with perceived gender male or female).¹ We say that a particular annotation $y$ leaks information about $g$ if there exists a function $f$ such that $f(y) = g$. We refer to $f$ as an attacker because it tries to reverse engineer information about the protected attribute of the input image from its task-specific labels $y$ alone. To measure leakage across a dataset, we train such an attacker and evaluate it on held-out data. The performance of the attacker, i.e., the fraction of instances in $\mathcal{D}$ that leak information about $g$ through $f$, yields an estimate of leakage:

$$\lambda_D = \frac{1}{|\mathcal{D}|} \sum_{(y, g) \in \mathcal{D}} \mathbb{1}[f(y) = g],$$

where $\mathbb{1}[\cdot]$ is the indicator function.

¹In this paper, we assume gender as binary due to the available annotations, but the work could be extended to non-binary gender, as well as a broader set of protected attributes, such as race or age.
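To make the dataset-leakage estimate concrete, the following pure-Python sketch trains a naive-Bayes-style attacker on label/gender co-occurrence counts and measures held-out leakage. This is an illustration only (our actual attackers are MLPs); the synthetic data generator, smoothing, and all names are hypothetical:

```python
import math
import random

def train_attacker(data, smoothing=1.0):
    """Fit per-label gender log-odds from (label_set, gender) pairs,
    where gender is 'm' or 'w' and labels is a set of task labels."""
    counts = {"m": {}, "w": {}}
    totals = {"m": 0, "w": 0}
    for labels, g in data:
        totals[g] += 1
        for l in labels:
            counts[g][l] = counts[g].get(l, 0) + 1
    vocab = set(counts["m"]) | set(counts["w"])
    log_odds = {}
    for l in vocab:
        pm = (counts["m"].get(l, 0) + smoothing) / (totals["m"] + 2 * smoothing)
        pw = (counts["w"].get(l, 0) + smoothing) / (totals["w"] + 2 * smoothing)
        log_odds[l] = math.log(pm / pw)
    prior = math.log((totals["m"] + smoothing) / (totals["w"] + smoothing))
    return log_odds, prior

def attack(attacker, labels):
    # Predict gender from a label set by summing log-odds of present labels.
    log_odds, prior = attacker
    score = prior + sum(log_odds.get(l, 0.0) for l in labels)
    return "m" if score > 0 else "w"

def dataset_leakage(attacker, heldout):
    # lambda_D: fraction of held-out instances whose labels reveal gender.
    return sum(attack(attacker, y) == g for y, g in heldout) / len(heldout)

# Synthetic data: "plate" co-occurs more with women, "tie" with men.
random.seed(0)
def sample():
    g = random.choice("mw")
    labels = {"chair"}  # a gender-neutral label
    if random.random() < (0.7 if g == "w" else 0.2):
        labels.add("plate")
    if random.random() < (0.7 if g == "m" else 0.2):
        labels.add("tie")
    return labels, g

train = [sample() for _ in range(2000)]
test = [sample() for _ in range(1000)]
leakage = dataset_leakage(train_attacker(train), test)  # well above chance
```

On this skewed synthetic data the attacker recovers gender far above the 50% chance level, even though no label explicitly encodes it.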
Similar to dataset leakage, we would like to measure the degree to which the predictions of some model $M$ leak information about the protected variable $g$. We define prediction leakage as the percentage of examples in $\mathcal{D}$ for which the task labels $\hat{y} = M(x)$ produced by the predictor leak information about $g$. To measure such prediction leakage, we train a different attacker $f_M$ to extract patterns in $\hat{y}$ that reveal information about $g$:

$$\lambda_M = \frac{1}{|\mathcal{D}|} \sum_{(x, g) \in \mathcal{D}} \mathbb{1}[f_M(\hat{y}) = g],$$

where $\hat{y} = M(x)$, and $f_M$ is an attacker function trained to predict gender from the outputs of model $M$.
A model $M$ may not be perfectly accurate in predicting $y$, and the attacker learns to extract $g$ from the model predictions $\hat{y}$. If the accuracy of $M$ is lower, the performance of the attacker will naturally be lower as well (i.e., leakage is smaller). In fact, if $M$ were totally random, we would expect an attacker to be unable to extract any information about $g$ (and later in our experiments we use a baseline based on random perturbations). Therefore, directly comparing the leakage of two models with different accuracies is not fair. To appropriately calibrate our measures, we define natural leakage as the expected leakage if the ground-truth labels $y$ were randomly corrupted to achieve a similar accuracy as $M$. This corrupted measure reflects the amount of information a model would leak about a protected attribute if its mistakes were randomly generated. Formally, we define natural leakage at performance level $a$ as follows:

$$\lambda^*(a) = \frac{1}{|\mathcal{D}|} \sum_{(\tilde{y}, g) \in \mathcal{D}} \mathbb{1}[f_a(\tilde{y}) = g],$$

where $\tilde{y}$ is a randomly corrupted version of $y$ with accuracy $a$, and $f_a$ is, in this case, an attacker for the resulting corrupted random predictor.
Formally, we define model leakage with respect to a performance level $a$ as the difference between the prediction leakage and the natural leakage at performance level $a$:

$$\Delta_M = \lambda_M - \lambda^*(a).$$

A model for which $\Delta_M$ is greater than zero leaks more information about gender than we would expect even from simply accomplishing the task defined by the dataset. This represents a form of amplification of the reliance on protected attributes to accomplish the prediction task. We show later in Section 5 that all models we evaluated leak more information than we would expect, and even leak information when the dataset does not.
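The calibration step can be sketched concretely: corrupt ground-truth labels to a target accuracy, retrain the attacker on the corrupted labels, and compare. Below is an illustrative pure-Python sketch for a single-label (imSitu-style) task; the verb vocabulary, skew values, and majority-vote attacker are hypothetical stand-ins for our actual setup:

```python
import random

random.seed(0)
VERBS = ["cooking", "shopping", "driving", "repairing"]
P_WOMAN = {"cooking": 0.7, "shopping": 0.7, "driving": 0.3, "repairing": 0.3}

def sample():
    v = random.choice(VERBS)
    g = "w" if random.random() < P_WOMAN[v] else "m"
    return v, g

def corrupt(verb, accuracy):
    # Simulate a predictor at a given accuracy: keep the true verb with
    # probability `accuracy`, otherwise emit a different verb uniformly.
    if random.random() < accuracy:
        return verb
    return random.choice([v for v in VERBS if v != verb])

def majority_attacker(pairs):
    # Per-verb majority-gender attacker (a stand-in for the paper's MLP).
    votes = {}
    for v, g in pairs:
        votes.setdefault(v, []).append(g)
    return {v: max(set(gs), key=gs.count) for v, gs in votes.items()}

def leakage(attacker, pairs):
    return sum(attacker.get(v, "m") == g for v, g in pairs) / len(pairs)

data = [sample() for _ in range(8000)]
train, test = data[:6000], data[6000:]

def natural_leakage(a):
    # lambda*(a): corrupt labels to accuracy a, retrain and re-run the attacker.
    ctrain = [(corrupt(v, a), g) for v, g in train]
    ctest = [(corrupt(v, a), g) for v, g in test]
    return leakage(majority_attacker(ctrain), ctest)

lam_clean = natural_leakage(1.0)  # equals dataset leakage for this data
lam_star = natural_leakage(0.8)   # natural leakage at 80% accuracy
# A model with prediction leakage lam_M amplifies bias when
# delta = lam_M - lam_star > 0.
```

As expected, randomly corrupting labels lowers leakage, so a model at 80% accuracy whose prediction leakage exceeds `lam_star` is leaking more than its mistakes alone would explain.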
Creating an Attacker
Ideally, the attacker should be a Bayes optimal classifier, which makes the best possible prediction of $g$ from $\hat{y}$. In practice, however, we must train a model to make this prediction for every predictor we evaluate, and we use a deep neural network to do so. We are thus not guaranteed to obtain the best possible function mapping $\hat{y}$ to $g$, so reported leakage should be considered a lower bound on true leakage. In practice, we find that we can estimate leakage robustly (see Section 5).
3.2 Adversarial Reduction of Leakage
We propose a simple formulation for reducing the amount of leakage in a model, summarized in Figure 2. We hypothesize that models leak extra information about protected attributes because the underlying representation is overly sensitive to features related to those attributes. As such, we encourage models to build representations from which protected attributes cannot be predicted.
Our method relies on the construction of a critic $c$, which attempts to predict protected information from an intermediate representation $h(x)$, for a given image $x$, of a predictor $M$. The critic attempts to minimize a loss over the amount of information it can extract:

$$\mathcal{L}_c = \mathcal{L}(c(h(x)), g),$$

while the predictor tries to minimize its loss over the task-specific predictions while increasing the critic's loss:

$$\mathcal{L}_M = \mathcal{L}(M(x), y) - \lambda \, \mathcal{L}(c(h(x)), g).$$

In both cases, $\mathcal{L}$ is the cross-entropy loss; when optimizing $\mathcal{L}_M$ we do not update $c$, and $\lambda$ trades off task performance against sensitivity to protected attributes.
We also experiment with directly generating images from which an adversary cannot extract protected information, by using the output of an autoencoder as the input to our model. This is similar to the experiment proposed by Palacio et al., where the outputs of an autoencoder were fed to a convolutional neural network trained to recognize objects in order to interpret the patterns learned by the network. In this case, we add an additional reconstruction loss to the predictor:

$$\mathcal{L}_M = \| x - \tilde{x} \|^2 + \mathcal{L}(M(\tilde{x}), y) - \lambda \, \mathcal{L}(c(\tilde{x}), g),$$

where $\tilde{x}$ is the output of the autoencoder. The first term encourages $\tilde{x}$ to maintain the information in the original image, the second term drives correct task-specific predictions, and the third term adversarially obscures gender.
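The three-term objective can be composed directly. The sketch below is a hypothetical numpy rendering of the loss (single-example cross-entropy, mean-squared reconstruction), not our training code:

```python
import numpy as np

def softmax_ce(logits, target):
    # Cross-entropy of a single example; target is the true class index.
    z = logits - logits.max()  # stabilize the exponentials
    return float(-z[target] + np.log(np.exp(z).sum()))

def predictor_loss(x, x_tilde, task_logits, y, gender_logits, g, lam=1.0):
    """Three-term loss: reconstruction + task CE - lam * gender CE.
    The gender term is SUBTRACTED: the predictor benefits when the
    critic's gender loss is high (i.e., when gender is hard to recover)."""
    recon = float(np.mean((x - x_tilde) ** 2))
    return recon + softmax_ce(task_logits, y) - lam * softmax_ce(gender_logits, g)
```

A quick sanity check of the signs: holding everything else fixed, a worse reconstruction raises the loss, and a critic that predicts gender confidently and correctly (low gender cross-entropy) also raises the loss, which is exactly the adversarial pressure intended.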
4 Experimental Setup
We consider two datasets. First, the COCO dataset, consisting of images annotated with the object categories they contain, along with textual descriptions of each image; on this dataset we perform multi-label object classification. Second, we adopt the imSitu dataset, where each image is annotated with a structured tuple consisting of the name of an activity (a verb) and the participants in the activity (roles); on this dataset we perform activity recognition as a multi-class classification task.
This paper follows the setup of existing work for studying bias in COCO and imSitu. In COCO, we examine object classification. We use image captions from COCO as a proxy for annotations of the gender of the main entity in the image: an image is considered "male" if the word "man" is present in a caption, and "female" if the word "woman" is. In imSitu, we consider situations: in addition to objects, activities and semantic roles (a classification of how objects participate in activities) are annotated. We say an image is "male" if we can find the word "man" in the gloss of the "agent" of an activity, and "female" if we can find the word "woman" in the gloss. Finally, for the purpose of our analysis, we exclude the "person" category from the COCO categories. For imSitu, we filter out activity categories that are not human-oriented or that do not contain enough images associated with men or women.
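As a sketch, the caption-based heuristic can be written as follows. How captions mentioning both words (or neither) are handled is not specified above, so returning `None` for those cases and excluding such images is our assumption:

```python
import re

def gender_from_caption(caption):
    """Label an image 'male'/'female' from a COCO caption using the word
    heuristic described above; returns None when neither or both of the
    words "man"/"woman" occur (such images are excluded in this sketch).
    Note this matches exact word forms only, not plurals like "women"."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    has_man, has_woman = "man" in words, "woman" in words
    if has_man and not has_woman:
        return "male"
    if has_woman and not has_man:
        return "female"
    return None
```

For example, a caption mentioning only "man" maps to "male", while captions mentioning both words, or neither, are left unlabeled.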
For both COCO object classification and imSitu activity recognition, we use a standard ResNet-50 convolutional neural network pretrained on ImageNet (ILSVRC) as the underlying model, replacing the last linear layer as appropriate. When constructing attackers, we use a 4-layer multi-layer perceptron (MLP) with BatchNorm and LeakyReLU between layers, for both dataset and model leakage estimates. Prediction leakage was estimated from pre-activation logits, while dataset leakage was estimated from binary labels. Attackers were evaluated on equal quantities of male and female images, sampled from the original development sets.
We evaluate using mAP (the mean across categories of the area under the precision-recall curve) and F1, for both object and activity classification, using the discrete output predictions of the model.
For comparability, all models are developed and evaluated on the dev and test sets from the original data (even when we modify the composition of a training set). In adversarial training, we always first train the linear classification layers until performance plateaus, and then incorporate adversarial training while fine-tuning the model end to end at a reduced learning rate. Before adversarial training starts, we first train the gender classification branch so that its gradients provide useful guidance for feature removal. To compute leakage in COCO and imSitu, we randomly sample fixed-size, gender-balanced training, dev, and test sets; if a dataset is smaller than the target training size, we use all of it as training data. We use a fixed learning rate throughout the training of the attacker. As the training data is not always gender balanced, we sample the same number of male and female images in every batch to encourage the gender classifier to focus on gender features.
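The gender-balanced batch sampling described above can be sketched as follows; the function name and the strategy of truncating the larger group are our own illustrative choices:

```python
import random

def balanced_batches(male_idx, female_idx, batch_size, rng):
    """Yield batches containing equal numbers of male and female image
    indices, shuffling each group and truncating the larger one
    (an illustrative helper, not our training pipeline)."""
    assert batch_size % 2 == 0
    half = batch_size // 2
    males, females = male_idx[:], female_idx[:]
    rng.shuffle(males)
    rng.shuffle(females)
    for b in range(min(len(males), len(females)) // half):
        batch = males[b * half:(b + 1) * half] + females[b * half:(b + 1) * half]
        rng.shuffle(batch)  # avoid a fixed male/female ordering within a batch
        yield batch
```

With 100 male and 60 female indices and a batch size of 10, this yields 12 batches, each with exactly 5 images of each gender.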
5 Data and Model Leakage
[Table 2: ablation of the attacker used to measure prediction leakage on the original COCO data, varying the architecture (1, 2, or 4 layers; hidden dimension 100 or 300) and the fraction of attacker training data (25%, 50%, 75%, or all); the leakage values are not recoverable in this copy.]
In this section we summarize our findings showing that both imSitu and COCO leak information about gender. We also show that models trained on these datasets not only leak information but actually leak more information than would be expected. Finally, we show a method for constructing a leakage-free dataset by removing examples. Unfortunately, we find that models trained on such datasets still leak significant amounts of information about gender, while performing significantly worse on the underlying classification problems. Table 1 summarizes our results.
Dataset leakage measures the degree to which ground-truth labels can be used to estimate gender. The rows corresponding to "original" in Table 1 summarize dataset leakage in imSitu and COCO. Both datasets leak significant information about gender: the gender of the main entity in the image is extractable from ground-truth annotations 70% and 75% of the time for COCO and imSitu, respectively. The skew in the numbers of men and women in the datasets does not alone account for the dataset leakage, implying that not only are men more represented in these datasets, but the labels also exhibit unequal associations with gender.
Model leakage measures the degree to which model outputs can be used to estimate gender. Model leakage needs to be calibrated with respect to the underlying accuracy of the predictor. To do so, we compute natural leakage by randomly changing ground-truth labels to simulate models at different accuracies. Figure 3 shows natural leakage at different performance levels in COCO and imSitu; the relationship between F1 and leakage is roughly linear. For each model, we compare its leakage against the natural leakage at the corresponding F1 score, taking the difference between prediction leakage and natural leakage. For all models trained on the original datasets, prediction leakage is high. Surprisingly, imSitu is more gender balanced and has lower natural leakage than COCO, yet models trained on imSitu actually leak significantly more information than those trained on COCO, suggesting that the models are over-relying on gender cues to make predictions.
Alternative Data Splits
While the original imSitu and COCO datasets leak information about gender through their labels, it is possible to construct datasets that leak less through subsampling. We obtain splits more balanced in male and female co-occurrences with labels by imposing the constraint that neither gender occurs more frequently with any output label $l$ by a ratio greater than $\alpha$:

$$\frac{1}{\alpha} \le \frac{c_m(l)}{c_w(l)} \le \alpha \quad \text{for all labels } l,$$

where $c_m(l)$ and $c_w(l)$ are the numbers of occurrences of men and of women with label $l$, respectively. Enforcing this constraint in imSitu is trivial because each image is annotated with only one verb: we simply subsample the over-represented gender until the constraint holds. For the COCO dataset, since each image contains multiple object annotations, we must enforce the constraint heuristically. We make every object satisfy the constraint one at a time, removing the images that contain the fewest objects; because doing so may cause other labels to violate the constraint, we iterate over all objects until the process converges and every object satisfies it. We create splits for several values of $\alpha$ for both datasets.²

²Practically, exactly satisfying the strictest setting of $\alpha$ is infeasible, but our heuristic is able to find a split that comes close.
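The COCO-style balancing heuristic for multi-label data can be sketched in a few lines; the greedy removal order and the add-one smoothing (which avoids division by zero for rare labels) are our own choices:

```python
def balance_dataset(images, alpha):
    """Greedily subsample a multi-label dataset so that, for every label l,
    1/alpha <= count_male(l) / count_female(l) <= alpha.
    `images` is a list of (label_set, gender) pairs with gender in {'m', 'w'}.
    Illustrative only; not our exact implementation."""
    kept = list(images)
    while True:
        # Count label/gender co-occurrences over the currently kept images.
        counts = {}
        for labels, g in kept:
            for l in labels:
                counts.setdefault(l, {"m": 0, "w": 0})[g] += 1
        # Find a label that violates the ratio constraint.
        violated = None
        for l, c in counts.items():
            ratio = (c["m"] + 1) / (c["w"] + 1)  # smoothed to avoid /0
            if ratio > alpha:
                violated = (l, "m")
                break
            if ratio < 1.0 / alpha:
                violated = (l, "w")
                break
        if violated is None:
            return kept
        label, over = violated
        # Among images of the over-represented gender carrying this label,
        # drop one with the fewest labels, disturbing other labels least.
        candidates = [i for i, (labels, g) in enumerate(kept)
                      if g == over and label in labels]
        victim = min(candidates, key=lambda i: len(kept[i][0]))
        kept.pop(victim)
```

Each iteration removes exactly one image, so the loop terminates; the outer `while` re-checks all labels after every removal, matching the iterate-until-convergence behavior described above.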
As expected, Table 1 shows that tightening the allowed co-occurrence ratio yields smaller datasets with less dataset leakage. Unsurprisingly, models trained on these datasets yield worse predictors because there is less data. Yet model leakage does not decrease as quickly as dataset leakage; in fact, even in cases where dataset leakage is nearly zero, models still leak information. This is likely because it is impossible to balance unlabeled features that co-occur with gender (e.g., COCO only has annotations for objects), and the models still rely on those features to make predictions.
Learning Attackers is Robust
Measuring leakage relies on being able to consistently estimate an attacker. To verify that leakage estimates are robust to different architectures and data settings on the attacker side, we conduct an ablation study in Table 2. We train attackers to measure prediction leakage on the original COCO dataset, varying the attacker architecture and the amount of training data used. Aside from the 1-layer attacker, no configuration varies in its leakage estimate by more than 2 points.
6 Adversarial Reduction of Leakage
In this section we evaluate reducing leakage through adversarial training (as described in Section 3.2). We also present some qualitative examples of what our methods choose to remove from images.
We consider three different types of adversaries which try to remove leakage at different stages in a ResNet-50 classification network.
adv @ image Removing gender information directly at the image level. To do this, we train an additional U-Net which predicts a mask over the image. The original image is point-wise multiplied with this mask and then given to two branches: the first is a ResNet-18 which attempts to detect gender (the adversary), and the second is a ResNet-50 that performs the classification task. See Figure 2 for more details.
adv @ conv4 Removing gender information from an intermediate hidden representation of ResNet-50 (the 4th convolutional block). We use an adversary with 3 convolutional layers and 5 linear layers.
adv @ conv5 Removing gender information from the final layer before object classification in ResNet-50. We use an adversary that takes as input a vectorized form of the output feature map and applies a 5-layer MLP for classification.
We consider several alternatives to adversarial training to reduce leakage, including some that have access to ground truth detections or segmentations.
Original: the original model trained on the original data, without any attempt to reduce leakage.
Randomization: adding random noise to the pre-classification embedding layer of the original model. We consider adding Gaussian noise of increasing magnitude; we expect larger perturbations to remove more leakage but also to prevent the model from classifying images effectively.
Alternative Datasets: we also consider constructing new datasets for imSitu and COCO through the downsampling approach that reduces dataset leakage (as defined in Section 5). We refer to these as the balanced splits defined in that section.
Blur: blurring people masks in the image when ground truth segments are available (COCO only). Many features of the people will still be visible but harder to extract.
Blackout - Segm: blacking out people masks in the image when ground truth segments are available (COCO only). This obscures features such as skin, clothing while leaving pose. It may also obscure objects people are closely interacting with.
Blackout - Box: blacking out people bounding boxes in the images (COCO and imSitu). This removes large regions of the image around people, likely removing many objects and body pose cues.
Our results on COCO object classification and imSitu activity recognition are in Table 3 and Table 4. Adversarially trained methods offer significantly better trade-offs between leakage and performance than any other method: we reduce model leakage by over 75% and 60% on COCO and imSitu, respectively, while suffering degradations of only 1.2 and 3.04 mAP. Figure 4 and Figure 5 further highlight that our methods make extremely favorable trade-offs between leakage and performance. Adversarial training is the only method that consistently improves upon simply adding noise to the model representation before prediction (the red curves).³

³The red curves do not pass exactly through the green star (original) because the red line is a quadratic regression over samples.
6.3 Qualitative Results
In order to obtain interpretable results, we also propose using a U-Net autoencoder as the input stage to our model so that gender features can be removed in image space. Fig. 6 shows original images paired with their versions after gender removal. In some instances our method removes the entire person; in others only the face; in yet others, clothing and garments that might be strongly associated with gender. Our approach learns to selectively obscure just enough pixels to make gender prediction hard while leaving sufficient information to predict everything else, especially objects that need to be recognized, such as hot dogs, ties, umbrellas, and surfboards. This is in contrast to our strong baselines, which remove entire person instances using ground-truth segmentation masks; the adversarial removal of gender learns a more sensible compromise.
In this paper we introduced dataset leakage, prediction leakage, natural leakage, and model leakage as measures of the encoded bias with respect to a protected variable in either datasets or trained models. We also demonstrated a method for the adversarial removal of features associated with a protected variable from the intermediate representations learned by a convolutional neural network. Our approach is superior to applying various forms of random perturbations in the representations, and to applying image manipulations that have access to significant privileged information about the protected variable. We expect that the setup, methods, and results in this paper will be useful for further studies of representation bias in computer vision.
-  A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach. A reductions approach to fair classification. Conference on Fairness, Accountability and Transparency, 2017.
-  A. Beutel, J. Chen, Z. Zhao, and E. H. Chi. Data decisions and theoretical implications when adversarially learning fair representations. Conference on Fairness, Accountability and Transparency, 2017.
-  T. Bolukbasi, K. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 4349–4357, 2016.
-  J. Buolamwini and T. Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In S. A. Friedler and C. Wilson, editors, Proceedings of the 1st Conference on Fairness, Accountability and Transparency, volume 81 of Proceedings of Machine Learning Research, pages 77–91, New York, NY, USA, 23–24 Feb 2018. PMLR.
-  K. Burns, L. A. Hendricks, T. Darrell, and A. Rohrbach. Women also snowboard: Overcoming bias in captioning models. European Conference on Computer Vision (ECCV), 2018.
-  A. Caliskan, J. J. Bryson, and A. Narayanan. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186, 2017.
-  C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pages 214–226. ACM, 2012.
-  Y. Elazar and Y. Goldberg. Adversarial removal of demographic attributes from text data. Empirical Methods in Natural Language Processing (EMNLP), 2018.
-  M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259–268, 2015.
-  Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
-  M. Hardt, E. Price, N. Srebro, et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.
-  T. Hashimoto, M. Srivastava, H. Namkoong, and P. Liang. Fairness without demographics in repeated loss minimization. In Proceedings of the 35th International Conference on Machine Learning, pages 1929–1938, 2018.
-  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pages 770–778, 2016.
-  M. Kay, C. Matuszek, and S. A. Munson. Unequal representation and gender stereotypes in image search results for occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 3819–3828. ACM, 2015.
-  M. J. Kusner, J. Loftus, C. Russell, and R. Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems (NIPS), pages 4069–4079, 2017.
-  T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In European Conference on Computer Vision (ECCV), pages 740–755. Springer, 2014.
-  I. Misra, C. Lawrence Zitnick, M. Mitchell, and R. Girshick. Seeing through the human reporting bias: Visual classifiers from noisy human-centric labels. In CVPR, pages 2930–2939, 2016.
-  S. Palacio, J. Folz, J. Hees, F. Raue, D. Borth, and A. Dengel. What do deep networks like to see? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3108–3117, 2018.
-  K. Ross and C. Carter. Women and news: A long and winding road. Media, Culture & Society, 33(8):1148–1165, 2011.
-  R. Rudinger, J. Naradowsky, B. Leonard, and B. V. Durme. Gender bias in coreference resolution. In North American Chapter of the Association for Computational Linguistics (NAACL), June 2018.
-  H. J. Ryu, M. Mitchell, and H. Adam. Improving smiling detection with race and gender diversity. Proceedings of FAT/ML 2018, 2017.
-  P. Stock and M. Cisse. Convnets and imagenet beyond accuracy: Explanations, bias detection, adversarial examples and model criticism. arXiv preprint arXiv:1711.11443, 2017.
-  Q. Xie, Z. Dai, Y. Du, E. Hovy, and G. Neubig. Controllable invariance through adversarial feature learning. In Advances in Neural Information Processing Systems (NIPS), pages 585–596, 2017.
-  S. Yao and B. Huang. Beyond parity: Fairness objectives for collaborative filtering. In Advances in Neural Information Processing Systems (NIPS), pages 2925–2934, 2017.
-  M. Yatskar, L. Zettlemoyer, and A. Farhadi. Situation recognition: Visual semantic role labeling for image understanding. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-  R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.
-  B. H. Zhang, B. Lemoine, and M. Mitchell. Mitigating unwanted biases with adversarial learning. Proceedings of AIES, 2018.
-  J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K.-W. Chang. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2017.
-  J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K.-W. Chang. Gender bias in coreference resolution: Evaluation and debiasing methods. In North American Chapter of the Association for Computational Linguistics (NAACL), 2018.
-  J. Zhao, Y. Zhou, Z. Li, W. Wang, and K.-W. Chang. Learning gender-neutral word embeddings. In Empirical Methods in Natural Language Processing (EMNLP), 2018.