Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations


Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez
University of Virginia, University of California Los Angeles,
Allen Institute for Artificial Intelligence

In this work, we present a framework to measure and mitigate intrinsic biases with respect to protected variables, such as gender, in visual recognition tasks. We show that trained models significantly amplify the association of target labels with gender beyond what one would expect from biased datasets. Surprisingly, we show that even when datasets are balanced such that each label co-occurs equally with each gender, learned models amplify the association between labels and gender as much as if the data had not been balanced! To mitigate this, we adopt an adversarial approach to remove unwanted features corresponding to protected variables from intermediate representations in a deep neural network, and we provide a detailed analysis of its effectiveness. Experiments on two datasets, the COCO dataset (objects) and the imSitu dataset (actions), show reductions in gender bias amplification while maintaining most of the accuracy of the original models.

1 Introduction

While visual recognition systems have made great progress toward practical applications, they are also sensitive to spurious correlations and often depend on these erroneous associations. When such systems are used on images containing people, they risk amplifying societal stereotypes by over-associating protected attributes such as gender, race, or age with target predictions, such as object or action labels. Known negative outcomes range from representation harms (e.g., male software engineers are over-represented in image search results [11]) and harms of opportunity (e.g., facial recognition is not as effective for people with different skin tones [3]) to life-threatening situations (e.g., recognition rates of pedestrians in autonomous vehicles are not equally accurate for all groups of people [32]).

Figure 1: On the top, we illustrate our newly introduced concept of Dataset Leakage, which measures the extent to which gender (or, more generally, a protected variable) can be inferred from randomly perturbed ground-truth labels. On the bottom, we illustrate our concept of Model Leakage, which measures the extent to which gender can be inferred from the outputs of a model. A model amplifies bias if model leakage exceeds dataset leakage.

In this paper, we study gender bias amplification: the effect whereby trained models exaggerate gender stereotypes that are present in the training data. We focus on the tasks of recognizing objects in the COCO dataset [16] and actions in the imSitu dataset [36], where training resources exhibit gender skew and models trained on these datasets exhibit bias amplification [39]. (For example, women are represented as cooking twice as often as men in imSitu, but after models are trained and evaluated on similarly distributed data, they predict cooking for women three times as often as men.) To characterize bias amplification more broadly, we generalize existing measures of bias amplification. Instead of measuring the similarity between the distributions of training data and model predictions, we compare the predictability of gender from ground truth labels (dataset leakage, Figure 1, top) and from model predictions (model leakage, Figure 1, bottom). Each of these measures is computed using a classifier that is trained to predict gender from either ground truth labels or model predictions. We say a model exhibits bias amplification if it leaks more information about gender than a classifier of equivalent accuracy whose errors are due only to chance.

Our new leakage measures significantly expand the types of questions we can ask about bias amplification. While previously it was shown that models amplify bias when they are required to predict gender alongside target variables [39], our empirical findings indicate that when models are not trained to predict gender, they also amplify gender bias. Surprisingly, we find that if we additionally balance training data such that each gender co-occurs equally with each target variable, models amplify gender bias as much as in unbalanced data! This strongly argues that naive attempts to control for protected attributes when collecting datasets will be ineffective in preventing bias amplification.

Figure 2: In our bias mitigation approach, we learn a task-specific model with an adversarial loss that removes features corresponding to a protected variable from an intermediate representation in the model. Here we illustrate our pipeline to visualize the removal of features in image space through an auto-encoder network.

We posit that models amplify biases in the balanced-data setting because there are many gender-correlated but unlabeled features that cannot be balanced directly. For example, in a dataset with an equal number of images showing men and women cooking, if children are unlabeled but co-occur with the cooking action, a model could associate the presence of children with cooking. Since children co-occur with women more often than with men across all images, a model could label women as cooking more often than we would expect from a balanced distribution, thus amplifying gender bias.

To mitigate such unlabeled spurious correlations, we adopt an adversarial debiasing approach [34, 2, 38, 6]. Our goal is to preserve as much task-specific information as possible while eliminating gender cues, either directly in the image or in the intermediate convolutional representations used for classification. As seen in Figure 2, models are trained adversarially to trade off a task-specific loss against creating a representation from which it is not possible to predict gender. For example, in the bottom-right image of Figure 3, our method is able to hide regions that indicate the gender of the main entity while leaving enough information to determine that she is weight lifting.

Evaluation of our adversarially debiased models shows that they make significantly better trade-offs between task accuracy and bias amplification than other methods. We consider strong baselines that mask or blur out entities using ground truth mask annotations for people in the images. We also propose a baseline that simply adds noise to intermediate representations, thus reducing the ability to predict gender from features, but often at a significant cost in task accuracy. Of all methods considered, only adversarial debiasing provided a better trade-off than randomizing model predictions, and we were able to reduce bias amplification by 53-67% while sacrificing only 1.2-2.2 points in accuracy.

Figure 3: Images after adversarial removal of gender when applied to the image space. The objective was to preserve as much information about objects, e.g. scissors, banana (COCO) or vaulting, lifting (imSitu) while removing gender correlated features. (Left side: woman & man; right side: man & woman.)

2 Related Work

Recently, researchers have demonstrated that machine learning models tend to replicate societal biases present in training datasets. Concerns have been raised for applications such as recommender systems [35], credit score prediction [9], online news [24], and others [11], and in response various approaches have been proposed to mitigate bias [1, 10]. However, most previous work deals with issues of resource allocation [5, 7], where the focus is on calibrating predictions. Furthermore, works in this domain often assume that protected variables are explicitly specified as features, which makes the goal of calibration more clearly defined. In visual recognition, by contrast, representations for protected attributes are automatically inferred from raw data.

More recently, there has been work addressing different types of biases in images [25, 39, 27, 3, 20, 4]. Zhao et al. [39] address bias in the COCO and imSitu datasets, but their focus is on structured prediction models where gender is part of the target variables. Burns et al. [4] attempt to calibrate gender predictions of a captioning system by modifying the input image. In contrast, our work focuses on models that are not aimed at predicting gender, which is a more common scenario. Because gender is not one of the outputs in our setup, calibration methods would not be effective at debiasing the predictions.

Our work is motivated by previous efforts on adversarial debiasing in various other tasks and domains [38, 2, 34, 6, 40, 8]. We provide further details about this family of methods in the body of the paper, and adopt this framework for debiasing the intermediate results of deep neural networks. Our work advances the understanding of this area by exploring what parts of deep representations are the most effective to debias under this approach, and we are the first to propose a way to visualize such debiased representations.

Issues of dataset bias have been addressed in the past by the computer vision community [30, 12, 29]. Torralba and Efros [30] showed that it was possible to identify the source dataset from image samples for a wide range of standard datasets, and [12] addresses this issue by learning shared parameters across datasets. More recently, Tommasi et al. [29] provided a fresh perspective on this issue using deep learning models. These prior works are strongly connected to ours when the dataset source is taken as a protected variable. Our notion of bias is more closely related to the one used in the fairness in machine learning literature, where there is a protected variable (e.g., gender) for which we want to learn unbiased representations (e.g., [37]).

In terms of evaluation, researchers have proposed different measurements for quantifying fairness in machine learning [9, 15, 5]. In contrast to these works, we address the removal of bias in the feature space. We therefore adopt and further develop the idea of leakage as an evaluation criterion, as proposed by Elazar and Goldberg [6] for debiasing text representations. We significantly expand the leakage formulation and propose dataset leakage and model leakage as measures of bias in learned representations.

Building models under fairness objectives is also more generally related to feature disentangling methods [28, 22, 17, 18, 19]. However, most research in feature disentangling has focused on the more restricted domain of facial analysis, where features are generally better aligned. This general area of work is also related to efforts in building privacy-preserving methods [31, 26, 33, 13], where the objective is to obfuscate the input while still being able to perform a recognition task. In contrast, fairness methods do not require obfuscating the inputs, and the method proposed in this paper is in fact most effective when applied to intermediate feature representations.

3 Leakage and Amplification

Many problems in computer vision inadvertently reveal demographic information (e.g., gender) about people in images. For example, in COCO, images of plates are significantly more common with women than men, so if a model predicts that a plate is in the image, we can infer there is likely a woman in the image. We refer to this notion as leakage. In this section, we present formal definitions of leakage for a dataset and models, and a measure for quantifying bias amplification as summarized in Figure 1.

Dataset Leakage: We assume we are given an annotated dataset $\mathcal{D} = \{(X_i, Y_i, g_i)\}_{i=1}^{N}$, where $X_i$ is an image annotated with a set of task-specific labels $Y_i$ (e.g., objects) and a protected attribute $g_i$ (e.g., the image contains a person with perceived gender male or female). (In this paper, we treat gender as binary due to the available annotations, but the work could be extended to non-binary gender, as well as to a broader set of protected attributes, such as race or age.) We say that a particular annotation $Y_i$ leaks information about $g_i$ if there exists a function $f$ such that $g_i \approx f(Y_i)$. We refer to this $f$ as an attacker because it tries to reverse engineer information about protected attributes in the input image $X_i$ only from its task-specific labels $Y_i$. To measure leakage across a dataset, we train such an attacker and evaluate it on held-out data. The performance of the attacker, i.e., the fraction of instances in $\mathcal{D}$ that leak information about $g_i$ through $Y_i$, yields an estimate of leakage:

$$ \lambda_D = \frac{1}{|\mathcal{D}|} \sum_{(Y_i, g_i) \in \mathcal{D}} \mathbb{1}\left[ f(Y_i) = g_i \right] \tag{1} $$

where $\mathbb{1}[\cdot]$ is the indicator function. We extend this definition of leakage to assess how much gender is revealed at different levels of accuracy, where errors are due entirely to chance. We define dataset leakage at a performance level $a$ by perturbing the ground truth labels with some function $r(Y_i, a)$, such that the overall accuracy of the changed labels with respect to the ground truth equals $a$:

$$ \lambda_D(a) = \frac{1}{|\mathcal{D}|} \sum_{(Y_i, g_i) \in \mathcal{D}} \mathbb{1}\left[ f\big(r(Y_i, a)\big) = g_i \right] \tag{2} $$

This allows us to measure the leakage of a model whose performance is $a$ and whose mistakes cannot be attributed to systematic bias. Across all experiments, we use F1 as the performance measure, and $\lambda_D = \lambda_D(1.0)$ by definition.
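The two dataset-leakage definitions can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code. Label sets are Python sets. The perturbation $r(Y_i, a)$ is assumed to flip each (instance, label) bit independently with a fixed probability `flip_prob`, and F1 is assumed to be micro-averaged. The attacker is any callable; the hand-written `plate_attacker` in the usage lines is a hypothetical stand-in for a trained attacker, borrowing the plates example from Section 3.

```python
import random


def leakage(attacker, label_sets, genders):
    """Fraction of instances whose gender the attacker recovers from labels alone:
    (1/|D|) * sum_i 1[f(Y_i) == g_i]."""
    correct = sum(1 for y, g in zip(label_sets, genders) if attacker(y) == g)
    return correct / len(genders)


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 of perturbed label sets against the ground truth."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def perturb(label_sets, vocab, flip_prob, rng):
    """Assumed form of r(Y_i, a): flip every (instance, label) bit independently
    with probability flip_prob, so that all errors are purely random."""
    perturbed = []
    for y in label_sets:
        new = set()
        for label in sorted(vocab):  # sorted so the random draws are reproducible
            present = label in y
            if rng.random() < flip_prob:
                present = not present
            if present:
                new.add(label)
        perturbed.append(new)
    return perturbed


# Usage with a hypothetical hand-written attacker standing in for a trained one.
def plate_attacker(labels):
    return "woman" if "plate" in labels else "man"


labels = [{"plate"}, {"plate", "fork"}, {"skis"}, {"tie"}]
genders = ["woman", "woman", "man", "man"]
noisy = perturb(labels, {"plate", "fork", "skis", "tie"}, 0.2, random.Random(0))
print(micro_f1(labels, noisy), leakage(plate_attacker, noisy, genders))
```

Sweeping `flip_prob` traces out (F1, leakage) pairs. A real estimate would also train the attacker on held-out data rather than fix it by hand.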

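Model Leakage: As with dataset leakage, we would like to measure the degree to which a model $M$ produces predictions $\hat{Y}_i = M(X_i)$ that leak information about the protected variable $g_i$. We define model leakage as the percentage of examples in $\mathcal{D}$ that leak information about $g_i$ through $\hat{Y}_i$. To measure prediction leakage, we train a different attacker on $\hat{Y}_i$ to extract information about $g_i$:

$$ \lambda_M(a) = \frac{1}{|\mathcal{D}|} \sum_{(\hat{Y}_i, g_i) \in \mathcal{D}} \mathbb{1}\left[ f(\hat{Y}_i) = g_i \right] \tag{3} $$

where $f$ is an attacker function trained to predict gender from the outputs $\hat{Y}_i$ of model $M$, which has an accuracy score $a$.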

Bias Amplification: Formally, we define the bias amplification of a model $M$ as the difference between the model leakage and the dataset leakage at the same accuracy $a$:

$$ \Delta = \lambda_M(a) - \lambda_D(a) \tag{4} $$

Note that $\lambda_D(a)$ measures the leakage of an ideal model that achieves a performance level $a$ but only makes mistakes randomly, not due to systematic bias. A model with $\Delta$ larger than zero leaks more information about gender than we would expect from simply accomplishing the task defined by the dataset. This represents an amplification of the reliance on protected attributes to accomplish the prediction task. In Equation 4, $a$ could be any performance measure, but we use F1 score throughout our experiments. We show later in Section 4 that all models we evaluated leak more information than we would expect, and they leak information even when the dataset does not.

Creating an Attacker: Ideally, the attacker should be a Bayes optimal classifier, which makes the best possible prediction of $g$ using $Y$. In practice, however, we must train an attacker for every model being evaluated, and we use a deep neural network to do so. We are not guaranteed to have obtained the best possible function for mapping $Y$ to $g$. The reported leakage should therefore be considered a lower bound on true leakage. In practice, we find that we can robustly estimate $f$ (see Section 4: Attacker Learning is Robust).
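Because the estimated leakage is only a lower bound, any reasonably strong classifier can play the attacker's role. The sketch below swaps in a logistic-regression attacker trained with plain stochastic gradient descent. It is a deliberately simpler stand-in for the 4-layer MLP used in the experiments of Section 4. The function names and hyperparameters (`epochs`, `lr`, `seed`) are illustrative assumptions. Inputs are feature vectors, such as binary label indicators (for dataset leakage) or model logits (for model leakage), and genders are encoded as 0/1.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_attacker(features, genders, epochs=200, lr=0.1, seed=0):
    """Fit a logistic-regression attacker f(x) -> {0, 1} with per-example SGD
    on the binary cross-entropy loss. Genders are encoded as 0/1."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, g = features[i], genders[i]
            p = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            err = p - g  # gradient of the BCE loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err

    def attacker(x):
        return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

    return attacker


def estimate_leakage(attacker, features, genders):
    """Held-out attacker accuracy, i.e. the (lower-bound) leakage estimate."""
    return sum(attacker(x) == g for x, g in zip(features, genders)) / len(genders)
```

A weaker attacker can only under-report what is recoverable, so swapping in a stronger model is the natural way to tighten the bound.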

4 Bias Analysis

                             Statistics        Leakage                            Performance
dataset        split          #men   #women    λ_D   λ_M(F1)   λ_D(F1)   Δ      mAP    F1
COCO [16]      original CRF
               no gender
imSitu [36]    original CRF
               no gender
Table 1: For different splits of COCO and imSitu, we show (1) $\lambda_D$, dataset leakage, or the accuracy obtained by predicting gender from ground truth annotations, showing that our data balancing approach significantly reduces this type of leakage; (2) $\lambda_M(\mathrm{F1})$, model leakage, or the accuracy obtained by a model trained to predict gender from the outputs of a model trained on the target task, where the last two columns show the mAP and F1 score of that model; (3) $\lambda_D(\mathrm{F1})$, dataset leakage at a given performance level, or the leakage of a model with access to ground truth annotations but with added noise so that its accuracy matches that of a model trained on this data (i.e., the same F1 as shown in the last column); and (4) $\Delta$, bias amplification, the difference between model leakage and dataset leakage at the same performance level, indicating how much more the model leaks than chance would explain.

In this section we summarize our findings showing that both imSitu and COCO leak information about gender. We show that models trained on these datasets leak more information than would be expected (1) when models are required to predict gender through a structured predictor that jointly predicts labels and gender, (2) when models are required to predict labels but not gender, and (3) even when not predicting gender and datasets were balanced such that each gender co-occurs equally with target labels. Table 1 summarizes our results.

4.1 Experiment Setup

We consider two tasks: (1) multi-label object classification in the COCO dataset [16], including the prediction of gender, and (2) imSitu activity recognition, a multi-class classification task for people-related activities.

Datasets: This paper follows the setup of existing work for studying bias in COCO and imSitu [39], deriving gender labels from captions in COCO and “agent” roles in imSitu. For the purpose of our analysis, we exclude “person” from the categories in COCO and only use images that contain people. We have , , and , , images in the training, validation and testing set for COCO and imSitu respectively.

Models: For both COCO object classification and imSitu activity recognition, we use a standard ResNet-50 convolutional neural network pretrained on ImageNet (ILSVRC) as the underlying model, replacing its last linear layer. We also consider the Conditional Random Field (CRF) based models in [39] when predicting gender jointly with target variables. Attacker models are 4-layer multi-layer perceptrons (MLPs) with BatchNorm and LeakyReLU activations.

Metrics: We evaluate using mAP, or the mean across categories of the area under the precision-recall curve, and F1 score for both object and activity classification by using the discrete output predictions of the model.

Computing Leakage: Model leakage was predicted from pre-activation logits, while dataset leakage was predicted from binary labels. Attackers were trained and evaluated with an equal number of images of men and women. To train the attacker, we sample , , male and female images on COCO and imSitu, respectively. For validation and testing, we sample , male and female images from the validation and test sets of COCO and imSitu, respectively.
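Sampling equal numbers of men and women fixes the chance level of every attacker at 50%, so leakage above 0.5 can be read directly as recoverable gender information. The following is a minimal sketch of such a sampler. It assumes gender annotations are the strings `"man"` and `"woman"`, and the function name `balanced_sample` is hypothetical.

```python
import random


def balanced_sample(genders, per_gender, seed=0):
    """Indices of exactly per_gender 'man' and per_gender 'woman' instances,
    drawn without replacement and shuffled together."""
    rng = random.Random(seed)
    men = [i for i, g in enumerate(genders) if g == "man"]
    women = [i for i, g in enumerate(genders) if g == "woman"]
    if min(len(men), len(women)) < per_gender:
        raise ValueError("not enough images of one gender for a balanced sample")
    picked = rng.sample(men, per_gender) + rng.sample(women, per_gender)
    rng.shuffle(picked)
    return picked
```

Raising an error, rather than silently returning an unbalanced set, keeps the 50% chance level guaranteed for every split the sampler produces.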

Training Details: For a fair comparison, all models are developed and evaluated on the same dev and test sets from the original data. We optimize using Adam [14] with a learning rate of and a minibatch size of to train the linear layers for classification. We then fine-tune the model with a learning rate of . We train all attackers for epochs with learning rate of and a batch size of , keeping the snapshot that performs best on the validation set.

4.2 Results

Dataset Leakage: Dataset leakage measures the degree to which ground truth labels can be used to estimate gender. The rows corresponding to “original CRF” in Table 1 summarize dataset leakage in imSitu and COCO ($\lambda_D$). Both datasets leak information: the gender of the main entity in the image can be extracted from ground truth annotations 67.72% and 68.26% of the time for COCO and imSitu, respectively.

Bias Amplification: Bias amplification ($\Delta$) captures how much more information is leaked than we would expect from a similar model that makes mistakes entirely due to chance. To compute bias amplification, dataset leakage must be calibrated with respect to model performance. To do so, we randomly flip ground truth labels to reach various levels of accuracy. Figure 4 shows dataset leakage at different performance levels in COCO and imSitu; the relationship between F1 and leakage is roughly linear. In Table 1, we report adjusted leakage for models at the appropriate performance levels ($\lambda_D(\mathrm{F1})$). Finally, bias amplification ($\Delta$) is computed as the difference between model leakage ($\lambda_M(\mathrm{F1})$) and adjusted dataset leakage ($\lambda_D(\mathrm{F1})$).
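The flipping sweep measures $\lambda_D$ only at the F1 levels it happens to produce, so $\lambda_D(\mathrm{F1})$ at a model's exact F1 has to be read off the curve. The sketch below assumes piecewise-linear interpolation between measured points. This is consistent with the roughly linear trend in Figure 4, but it is an assumption; the exact procedure is not stated here. The sketch then applies $\Delta = \lambda_M(a) - \lambda_D(a)$. The numbers in the usage lines are made up for illustration.

```python
def interpolate_leakage(curve, target_f1):
    """Piecewise-linear estimate of lambda_D at target_f1 from measured
    (f1, leakage) points obtained by randomly flipping ground-truth labels."""
    pts = sorted(curve)
    if not pts[0][0] <= target_f1 <= pts[-1][0]:
        raise ValueError("target F1 lies outside the measured range")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= target_f1 <= x1:
            if x1 == x0:
                return y0
            t = (target_f1 - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return pts[0][1]  # single measured point, equal to target_f1


def bias_amplification(model_leakage, curve, model_f1):
    """Delta = lambda_M(a) - lambda_D(a), with a the model's F1 score."""
    return model_leakage - interpolate_leakage(curve, model_f1)


# Illustrative (made-up) curve and model numbers.
curve = [(0.5, 0.55), (0.75, 0.60), (1.0, 0.65)]
print(bias_amplification(0.70, curve, 0.70))
```

Refusing to extrapolate outside the measured range is a deliberate choice. The sweep should be extended to cover the model's F1 rather than guessed.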

Models trained on standard splits of both COCO and imSitu that jointly predict gender and target labels (the “original CRF” rows in Table 1) all leak significantly more information about gender than we would expect by chance. Surprisingly, although imSitu is more gender balanced than COCO, models trained on imSitu actually leak significantly more information than models trained on COCO. When models are no longer required to predict gender, they leak less information than before, but still more than we would expect (see the “no gender” rows in Table 1).

Figure 4: Dataset leakage in COCO and imSitu as a function of F1 score. Ground truth labels were randomly flipped to simulate a method that performs at different levels of F1 score. We refer to this accuracy-adjusted leakage as $\lambda_D(\mathrm{F1})$, or the amount we would expect a method to leak given its performance level.

Alternative Data Splits: It is possible to construct datasets that leak less through subsampling. We obtain splits that are more balanced in male and female co-occurrences with labels by imposing the constraint that neither gender occurs more frequently with any output label by a ratio greater than $\alpha$:

$$ \frac{1}{\alpha} \le \frac{\#(m, y)}{\#(w, y)} \le \alpha, \quad \forall y \tag{5} $$

where $\#(m, y)$ and $\#(w, y)$ are the number of occurrences of men with label $y$ and of women with label $y$, respectively. Enforcing this constraint in imSitu is trivial because each image is annotated with only one verb: we simply subsample the over-represented gender until the constraint is satisfied. For COCO, since each image contains multiple object annotations, we must enforce this constraint heuristically. We try to make every object satisfy this constraint one at a time, removing images from the dataset that have the smallest number of objects. We iterate through all objects until this process converges and all objects satisfy the constraint. We create splits for several values of $\alpha$. (Practically satisfying  is infeasible, but our heuristic is able to find a set where .)
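For the single-label imSitu case, the constraint can be checked and enforced directly. The sketch below assumes each image contributes exactly one `(gender, label)` pair and that $\alpha \ge 1$. For each label, it keeps every image of the under-represented gender and randomly downsamples the other gender to at most $\alpha$ times as many. Labels seen with only one gender are dropped, since no downsampling can make their ratio finite. The COCO heuristic, which must juggle several labels per image, is not shown. Both function names are hypothetical.

```python
import random
from collections import Counter


def satisfies_ratio(pairs, alpha):
    """True if 1/alpha <= #(m, y) / #(w, y) <= alpha for every label y.
    `pairs` holds one (gender, label) tuple per image."""
    counts = Counter(pairs)
    for y in {label for _, label in pairs}:
        m, w = counts[("man", y)], counts[("woman", y)]
        if m == 0 or w == 0 or not (1 / alpha <= m / w <= alpha):
            return False
    return True


def rebalance_single_label(pairs, alpha, seed=0):
    """Return sorted indices of a subsample that satisfies the ratio constraint,
    assuming one label per image (the imSitu case) and alpha >= 1."""
    rng = random.Random(seed)
    groups = {}
    for i, key in enumerate(pairs):
        groups.setdefault(key, []).append(i)
    keep = []
    for y in sorted({label for _, label in pairs}):
        men = groups.get(("man", y), [])
        women = groups.get(("woman", y), [])
        # Keep the minority gender intact; cap the majority at alpha times it.
        n_men = min(len(men), int(alpha * len(women)))
        n_women = min(len(women), int(alpha * len(men)))
        keep += rng.sample(men, n_men) + rng.sample(women, n_women)
    return sorted(keep)
```

Because the minority count is never reduced, this subsampling discards as few images as possible for a given $\alpha$. Smaller $\alpha$ still shrinks the dataset, which matches the trade-off described in the following paragraph.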

The $\alpha$ rows in Table 1 summarize the results of rebalancing data with respect to gender. As expected, decreasing values of $\alpha$ yield smaller datasets with less dataset leakage but worse predictors, because there is less data. Yet model leakage does not decrease as quickly as dataset leakage, resulting in nearly no change in bias amplification. In fact, even when there is nearly no dataset leakage, models still leak information. This is likely because it is impossible to balance unlabeled features that co-occur with gender (e.g., COCO only has annotations for objects), and the models still rely on these features to make predictions. In summary, balancing the co-occurrence of gender and target labels does not reduce bias amplification in a meaningful way.

attacker depth   hidden dim   training data
1 layer          -            all data
2 layers         100          all data
2 layers         300          all data
4 layers         300          all data
4 layers         300          75% of data
4 layers         300          50% of data
4 layers         300          25% of data
Table 2: Varying attacker architecture and training data when estimating model leakage on the original COCO. The leakage estimate is robust to significant changes, showing that estimating leakage with our attackers is easy and stable.

Attacker Learning is Robust: Measuring leakage relies on being able to estimate an attacker consistently. To verify that leakage estimates are robust to different architectures and data settings on the attacker side, we conduct an ablation study in Table 2. We train attackers to measure model leakage ($\lambda_M$) on the original COCO dataset, varying the attacker architecture and the amount of training data used. Apart from the 1-layer attacker, no configuration varies in its estimate of leakage by more than 2 points.

5 Adversarial Debiasing

In this section we show the effectiveness of a method for reducing leakage through training with an auxiliary adversarial loss. This auxiliary loss will effectively remove gender information from intermediate representations. We additionally propose a way to visualize the effects of this approach on the input space, to inspect the type of information being removed.

5.1 Method Overview

We propose a simple formulation for reducing the amount of leakage in a model, summarized in Figure 2. We hypothesize that models leak extra information about protected attributes because the underlying representation is overly sensitive to features related to those attributes. We therefore encourage models to build representations from which protected attributes cannot be predicted.

Our method relies on the construction of a critic, $c$, which attempts to predict protected information from the intermediate representation $h_i$ that a predictor $p$ computes for a given image $X_i$. The critic attempts to minimize a loss over the amount of information it can extract:

$$ L_c = \sum_i \mathcal{L}\big(c(h_i), g_i\big) \tag{6} $$

while the predictor tries to minimize its loss over the task-specific predictions while increasing the critic's loss:

$$ L_p = \sum_i \Big[ \mathcal{L}\big(p(h_i), Y_i\big) - \lambda\, \mathcal{L}\big(c(h_i), g_i\big) \Big] \tag{7} $$

In both cases, $\mathcal{L}$ is the cross-entropy loss. When optimizing $L_p$, we do not update $c$, and $\lambda$ trades off task performance against sensitivity to protected attributes.
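The two objectives can be written out numerically for a batch of precomputed probabilities. This sketch assumes a single-label task, as in imSitu. COCO's multi-label task would use per-label binary cross-entropy instead. The optimizer is left out: in an autograd framework, one would alternate a step on $L_c$ that updates only the critic with a step on $L_p$ that leaves the critic frozen. The function names are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Cross-entropy -log probs[target] for one example."""
    return -math.log(probs[target])


def critic_loss(critic_probs, genders):
    """L_c = sum_i CE(c(h_i), g_i); minimized with respect to the critic only."""
    return sum(cross_entropy(p, g) for p, g in zip(critic_probs, genders))


def predictor_loss(task_probs, labels, critic_probs, genders, lam):
    """L_p = sum_i [CE(p(h_i), Y_i) - lam * CE(c(h_i), g_i)].
    The critic is held fixed here; the subtracted term rewards representations
    on which the critic's loss is high."""
    task = sum(cross_entropy(p, y) for p, y in zip(task_probs, labels))
    return task - lam * critic_loss(critic_probs, genders)
```

Setting `lam=0` recovers plain task training, which makes the effect of the adversarial term easy to isolate.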

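We also experiment with optimizing the adversarial loss on the input feature space by leveraging an encoder-decoder model that auto-encodes the input image $X_i$. To accomplish this goal, we add an additional loss with a weight parameter $\beta$ to the predictor as follows:

$$ L_p = \sum_i \Big[ \beta \big\lVert X_i - \tilde{X}_i \big\rVert_1 + \mathcal{L}\big(p(\tilde{X}_i), Y_i\big) - \lambda\, \mathcal{L}\big(c(\tilde{X}_i), g_i\big) \Big] \tag{8} $$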

Figure 5: Bias amplification as a function of F1 score on COCO object classification and imSitu action recognition. Models in the top left have low leakage and high F1 score. The blue dashed line indicates bias and performance of adding progressively more noise to the original model representation. Our adversarial methods (circled) are the ones which make a better trade-off between performance and bias amplification than randomization and other baselines.

where $\tilde{X}_i = m_i \odot X_i$ is the original image element-wise multiplied with a mask $m_i$ generated by an encoder-decoder bottleneck network with input $X_i$. The first term encourages the mask to maintain the information in the original image. The second term seeks correct task-specific predictions from the masked input. The third term adversarially tries to obscure gender by modifying the mask. This is similar to the experiment proposed by Palacio et al. [21], where the outputs of an autoencoder are instead fed directly to a convolutional neural network trained to recognize objects, in order to interpret the patterns learned by the network. In contrast, our objective is to visualize what the adversary learned to obfuscate while trying to preserve accurate results.
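The three terms of the image-space objective can be sketched for a single image. The sketch makes three assumptions. Images and masks are flattened to lists of floats. The task and critic cross-entropies are computed elsewhere on the masked image, by the recognition network and the adversary, and passed in as numbers. The reconstruction term is an unnormalized L1 distance summed over pixels. The helper names are hypothetical.

```python
def apply_mask(image, mask):
    """X_tilde = m * X element-wise; images are flattened to lists of floats."""
    return [m * x for m, x in zip(mask, image)]


def l1_distance(a, b):
    """Sum of absolute differences between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b))


def masked_predictor_loss(image, mask, task_ce, critic_ce, beta, lam):
    """beta * ||X - X_tilde||_1 + task CE - lam * critic CE for one image.
    task_ce and critic_ce are assumed to be computed on X_tilde by the
    recognition network and the adversary, respectively."""
    masked = apply_mask(image, mask)
    return beta * l1_distance(image, masked) + task_ce - lam * critic_ce
```

An all-ones mask makes the reconstruction term zero. Every pixel the mask suppresses costs `beta` times its intensity, so the mask is pushed to hide only what the adversary relies on.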

5.2 Implementation Details

We first train the classification layers (linear classifiers) with as learning rate and a batch size of until the performance plateaus. We then incorporate the adversarial loss and fine-tune the model end-to-end using a learning rate . Before activating the adversarial loss, we first train the gender classification branch so that its gradients provide useful guidance for feature removal during adversarial training. In every batch, we sample the same number of male and female images for training this adversary.

5.3 Models

Adversarial Methods

We consider three different types of adversaries which try to remove leakage at different stages in a ResNet-50 classification network.

  • adv @ image, or removing gender information directly at the image. We use U-Net [23] as our encoder-decoder network to predict a mask . The original image is point-wise multiplied with this mask and then fed to two branches. The first branch is a ResNet-18 which attempts to detect gender (the adversary) and the second branch is a ResNet-50 for classifying the target categories.

  • adv @ conv4, removes gender information from an intermediate hidden representation of ResNet-50 (on the 4th convolutional block). We use an adversary with 3 convolutional layers and 4 linear layers.

  • adv @ conv5, removes gender information from the final convolutional layer of ResNet-50. We use a linear adversary which takes as input a vectorized form of the output feature map and uses a 4-layer MLP for classification.

Leakage Performance
mAP F1
original CRF
CRF + adv
Table 3: Model Leakage and performance trade-offs for RBA (Reducing Bias Amplification, proposed in [39]) and our adversarial training methods. We adopt the CRF based model [39] to predict COCO objects as well as the gender. Our method reduces more than bias amplification while RBA fails to do so.
Leakage Performance
mAP F1
adv @ image
adv @ conv4
adv @ conv5
adv @ conv5
Table 4: Model leakage and performance trade-offs for different baselines (rows 1-5) and our adversarial training methods (rows 6-8) on COCO object classification. Our methods make significantly better trade-offs than the baselines, even improving on methods that use ground truth detection and segmentation. Applying adversarial training on the balanced dataset reaches the lowest model leakage () and bias amplification ().
Leakage Performance
mAP F1
adv @ image
adv @ conv4
adv @ conv5
adv @ conv5
Table 5: Model leakage and performance trade-offs for different baselines (rows 1-3) and our adversarial training methods (rows 4-6) on imSitu activity recognition. Our methods make significantly better trade-offs than the baselines. Applying adversarial training on the balanced dataset reaches the lowest model leakage () and bias amplification ().

Baselines: We consider several alternatives to adversarial training to reduce leakage, including some that have access to face detectors and ground truth segment annotations.

  • Original: The basic model for object or action recognition, trained on the original data, without any attempt to reduce leakage.

  • Randomization: Adding random noise to the pre-classification embedding layer of the original model. We consider adding Gaussian noise at increasing magnitudes. We expect larger perturbations to remove more leakage while preventing the model from effectively classifying images.

  • Alternative Datasets: We also consider constructing alternative data splits for imSitu and COCO through downsampling approaches that reduce dataset leakage. We refer to these alternative data splits by their value of $\alpha$, as defined in Section 4.2.

  • Blur: Consists of blurring people in images when ground truth segments are available (COCO only).

  • Blackout - Face: Consists of blacking out the faces in the images using a face detector.

  • Blackout - Segm: Consists of blacking out people in images when ground truth segments are available (COCO only). This aggressively removes features such as skin and clothing. It may also obscure objects with which people are closely interacting.

  • Blackout - Box: Consists of blacking out people using ground truth bounding boxes (COCO and imSitu). This removes large regions of the image around people, likely removing many objects and body pose cues.

Figure 6: Images after adversarial removal of gender in image space, using a U-Net based autoencoder whose outputs are the inputs to the recognition model. While people are clearly being obscured from the image, the model selectively chooses to obscure only parts that would reveal gender, such as faces, but tries to keep information that is useful to recognize objects or verbs. 1st row: WWWM MMWW; 2nd row: MWWW WMWW; 3rd row: MMMW MMWM; 4th row: MMMW WWMM. W: woman; M: man.
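The Randomization baseline is simple enough to sketch directly: add i.i.d. Gaussian noise with standard deviation `sigma` to each coordinate of the pre-classification embedding. Representing the embedding as a list of floats, the function name, and the particular `sigma` values are all illustrative assumptions. Sweeping `sigma` traces a curve like the blue dashed line in Figure 5.

```python
import random


def randomize_embedding(embedding, sigma, rng):
    """Randomization baseline: add i.i.d. Gaussian noise N(0, sigma^2) to every
    coordinate of the pre-classification embedding."""
    return [x + rng.gauss(0.0, sigma) for x in embedding]


# Sweeping sigma trades accuracy for leakage; these values are illustrative.
rng = random.Random(0)
embedding = [0.2, -1.3, 0.7]
for sigma in (0.0, 0.5, 1.0, 2.0):
    print(sigma, randomize_embedding(embedding, sigma, rng))
```

Because the noise ignores which features carry gender, it degrades task and gender signal alike. That is why it serves as the reference trade-off that a targeted method has to beat.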

5.4 Quantitative Results

Our results on COCO and imSitu are in Table 4 and Table 5. Adversarially trained methods offer significantly better trade-offs between leakage and performance than any other method. We are able to reduce model leakage by over 53% and 67% on COCO and imSitu, respectively, while suffering only 1.21 and 2.26 points of F1 score degradation. We also compare our method with RBA [39], a debiasing algorithm proposed to maintain the similarity between the training data and model predictions. As shown in Table 3, where the original CRF model predicts both gender and objects, RBA fails to reduce bias amplification. Figure 5 further highlights that our methods make extremely favorable trade-offs between leakage and performance, even when compared to methods that blur, black out, or completely remove people from the images using ground truth segment annotations. Adversarial training is the only method that consistently improves upon simply adding noise to the model representation before prediction (the blue curves).

5.5 Qualitative Results

Although adversarial removal works best when applied to representations in intermediate convolutional layers, to obtain interpretable results we apply gender removal in the image space and show the results in Fig. 6. In some instances our method removes the entire person, in others only the face, and in others clothing and garments that might be strongly associated with gender. Our approach learns to obscure just enough pixels to make gender prediction hard while leaving sufficient information to recognize objects such as frisbees, benches, and skis, as well as actions such as cooking and biking. This contrasts with our strong baselines, which remove entire person instances using ground truth segmentation masks. The adversarial removal of gender learns a more sensible compromise without requiring segment-level supervision.
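The adversarial objective behind this kind of removal can be illustrated on a toy model. The sketch below is a deliberate simplification under stated assumptions, not the U-Net or convolutional setup used in our experiments: a linear encoder `W` maps inputs `x` to a representation `h`, a logistic task head `v` predicts the target label `y` from `h`, and a logistic adversary `u` tries to predict gender `g` from `h`. The task head and the adversary take ordinary gradient-descent steps, while the encoder descends the task loss and ascends the adversary loss scaled by `lam`, the sign flip used by gradient reversal [8]. The function names, the simultaneous update schedule, and the hyperparameters are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, t):
    """Mean binary cross-entropy, with probabilities clipped for numerical safety."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))


def adversarial_step(W, v, u, x, y, g, lr=0.1, lam=1.0):
    """One simultaneous update of encoder W, task head v, and adversary u.

    x: (n, d) inputs; y: (n,) 0/1 task labels; g: (n,) 0/1 protected labels.
    W: (d, k) linear encoder; v, u: (k,) logistic heads on h = x @ W.
    """
    n = x.shape[0]
    h = x @ W                      # shared representation
    p_y = sigmoid(h @ v)           # task probabilities
    p_g = sigmoid(h @ u)           # adversary's gender probabilities
    d_y = (p_y - y) / n            # d(task loss) / d(task logits)
    d_g = (p_g - g) / n            # d(adversary loss) / d(adversary logits)
    grad_v = h.T @ d_y
    grad_u = h.T @ d_g
    grad_W_task = x.T @ np.outer(d_y, v)
    grad_W_adv = x.T @ np.outer(d_g, u)
    # Encoder: minimize the task loss, maximize the adversary loss (reversed gradient).
    W_new = W - lr * (grad_W_task - lam * grad_W_adv)
    v_new = v - lr * grad_v        # task head: plain gradient descent
    u_new = u - lr * grad_u        # adversary: plain gradient descent on its own loss
    return W_new, v_new, u_new
```

With `lam=0` the encoder ignores the adversary and the update reduces to ordinary joint training of `W` and `v`; larger values of `lam` push the encoder to discard gender information from `h`, which is intended to trade some task accuracy for lower leakage.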

6 Conclusion

We introduced dataset leakage and model leakage as measures of the bias encoded with respect to a protected variable in either datasets or trained models. We demonstrated that models amplify the biases in existing datasets for tasks that are not related to gender recognition. Moreover, we showed that balanced datasets do not lead to unbiased predictions and that more fundamental changes in visual recognition models are needed. We also demonstrated an adversarial approach for removing features associated with a protected variable from the intermediate representations learned by a convolutional neural network. Our approach is superior to applying various forms of random perturbations to the representations, and to applying image manipulations that have access to significant privileged information such as person segments. We expect that the setup, methods, and results in this paper will be useful for further studies of representation bias in computer vision.


  • [1] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach. A reductions approach to fair classification. Conference on Fairness, Accountability and Transparency, 2017.
  • [2] A. Beutel, J. Chen, Z. Zhao, and E. H. Chi. Data decisions and theoretical implications when adversarially learning fair representations. Conference on Fairness, Accountability and Transparency, 2017.
  • [3] J. Buolamwini and T. Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In S. A. Friedler and C. Wilson, editors, Proceedings of the 1st Conference on Fairness, Accountability and Transparency, volume 81 of Proceedings of Machine Learning Research, pages 77–91, New York, NY, USA, 23–24 Feb 2018. PMLR.
  • [4] K. Burns, L. A. Hendricks, T. Darrell, and A. Rohrbach. Women also snowboard: Overcoming bias in captioning models. European Conference on Computer Vision (ECCV), 2018.
  • [5] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.
  • [6] Y. Elazar and Y. Goldberg. Adversarial removal of demographic attributes from text data. Empirical Methods in Natural Language Processing (EMNLP), 2018.
  • [7] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian. Certifying and removing disparate impact. pages 259–268, 2015.
  • [8] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
  • [9] M. Hardt, E. Price, N. Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.
  • [10] T. Hashimoto, M. Srivastava, H. Namkoong, and P. Liang. Fairness without demographics in repeated loss minimization. In Proceedings of the 35th International Conference on Machine Learning, pages 1929–1938, 2018.
  • [11] M. Kay, C. Matuszek, and S. A. Munson. Unequal representation and gender stereotypes in image search results for occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 3819–3828. ACM, 2015.
  • [12] A. Khosla, T. Zhou, T. Malisiewicz, A. A. Efros, and A. Torralba. Undoing the damage of dataset bias. In European Conference on Computer Vision, pages 158–171. Springer, 2012.
  • [13] T.-h. Kim, D. Kang, K. Pulli, and J. Choi. Training with the invisibles: Obfuscating images to share safely for learning visual recognition models. arXiv preprint arXiv:1901.00098, 2019.
  • [14] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
  • [15] M. J. Kusner, J. Loftus, C. Russell, and R. Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems (NIPS), pages 4069–4079, 2017.
  • [16] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV), pages 740–755. Springer, 2014.
  • [17] P. Liu, J. T. Zhou, I. W.-H. Tsang, Z. Meng, S. Han, and Y. Tong. Feature disentangling machine-a novel approach of feature selection and disentangling in facial expression analysis. In European Conference on Computer Vision, pages 151–166. Springer, 2014.
  • [18] X. Liu, B. Vijaya Kumar, J. You, and P. Jia. Adaptive deep metric learning for identity-aware facial expression recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 20–29, 2017.
  • [19] Y. Liu, F. Wei, J. Shao, L. Sheng, J. Yan, and X. Wang. Exploring disentangled feature representation beyond face identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2080–2089, 2018.
  • [20] I. Misra, C. Lawrence Zitnick, M. Mitchell, and R. Girshick. Seeing through the human reporting bias: Visual classifiers from noisy human-centric labels. In CVPR, pages 2930–2939, 2016.
  • [21] S. Palacio, J. Folz, J. Hees, F. Raue, D. Borth, and A. Dengel. What do deep networks like to see? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3108–3117, 2018.
  • [22] S. Rifai, Y. Bengio, A. Courville, P. Vincent, and M. Mirza. Disentangling factors of variation for facial expression recognition. In European Conference on Computer Vision, pages 808–822. Springer, 2012.
  • [23] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
  • [24] K. Ross and C. Carter. Women and news: A long and winding road. Media, Culture & Society, 33(8):1148–1165, 2011.
  • [25] H. J. Ryu, M. Mitchell, and H. Adam. Improving smiling detection with race and gender diversity. Proceedings of FAT/ML 2018, 2017.
  • [26] J. Sokolic, Q. Qiu, M. R. Rodrigues, and G. Sapiro. Learning to succeed while teaching to fail: Privacy in closed machine learning systems. arXiv preprint arXiv:1705.08197, 2017.
  • [27] P. Stock and M. Cisse. Convnets and imagenet beyond accuracy: Explanations, bias detection, adversarial examples and model criticism. arXiv preprint arXiv:1711.11443, 2017.
  • [28] J. B. Tenenbaum and W. T. Freeman. Separating style and content with bilinear models. Neural Computation, 12(6):1247–1283, 2000.
  • [29] T. Tommasi, N. Patricia, B. Caputo, and T. Tuytelaars. A deeper look at dataset bias. In Domain Adaptation in Computer Vision Applications, pages 37–55. Springer, 2017.
  • [30] A. Torralba and A. Efros. Unbiased look at dataset bias. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, pages 1521–1528. IEEE Computer Society, 2011.
  • [31] M. Upmanyu, A. M. Namboodiri, K. Srinathan, and C. Jawahar. Efficient privacy preserving video surveillance. In 2009 IEEE 12th International Conference on Computer Vision, pages 1639–1646. IEEE, 2009.
  • [32] B. Wilson, J. Hoffman, and J. Morgenstern. Predictive inequity in object detection. arXiv preprint arXiv:1902.11097, 2019.
  • [33] Z. Wu, Z. Wang, Z. Wang, and H. Jin. Towards privacy-preserving visual recognition via adversarial training: A pilot study. In The European Conference on Computer Vision (ECCV), September 2018.
  • [34] Q. Xie, Z. Dai, Y. Du, E. Hovy, and G. Neubig. Controllable invariance through adversarial feature learning. In Advances in Neural Information Processing Systems (NIPS), pages 585–596, 2017.
  • [35] S. Yao and B. Huang. Beyond parity: Fairness objectives for collaborative filtering. In Advances in Neural Information Processing Systems (NIPS), pages 2925–2934, 2017.
  • [36] M. Yatskar, L. Zettlemoyer, and A. Farhadi. Situation recognition: Visual semantic role labeling for image understanding. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [37] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.
  • [38] B. H. Zhang, B. Lemoine, and M. Mitchell. Mitigating unwanted biases with adversarial learning. Proceedings of AIES, 2018.
  • [39] J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K.-W. Chang. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2017.
  • [40] J. Zhao, Y. Zhou, Z. Li, W. Wang, and K.-W. Chang. Learning gender-neutral word embeddings. In Empirical Methods in Natural Language Processing (EMNLP), 2018.