Image Representations Learned With Unsupervised Pre-Training Contain Human-like Biases


Recent advances in machine learning leverage massive datasets of unlabeled images from the web to learn general-purpose image representations for tasks from image classification to face recognition. But do unsupervised computer vision models automatically learn implicit patterns and embed social biases that could have harmful downstream effects? For the first time, we develop a novel method for quantifying biased associations between representations of social concepts and attributes in images. We find that state-of-the-art unsupervised models trained on ImageNet, a popular benchmark image dataset curated from internet images, automatically learn racial, gender, and intersectional biases. We replicate 8 of 15 documented human biases from social psychology, from the innocuous, as with insects and flowers, to the potentially harmful, as with race and gender. For the first time in the image domain, we replicate human-like biases about skin-tone and weight. Our results also closely match three hypotheses about intersectional bias from social psychology. When compared with statistical patterns in online image datasets, our findings suggest that machine learning models can automatically learn bias from the way people are stereotypically portrayed on the web.



Keywords: implicit bias, unsupervised representation learning, computer vision

1 Introduction

Can machines learn social biases from the way people are portrayed in image datasets? Companies and researchers regularly use machine learning models trained on massive datasets of images scraped from the web for tasks from face recognition [41] to image classification [66]. To reduce costs, many practitioners use state-of-the-art models “pre-trained” on large datasets to help solve other machine learning tasks, a powerful approach called transfer learning [68]. For example, HireVue uses similar state-of-the-art computer vision and natural language models to evaluate job candidates’ video interviews, potentially discriminating against candidates based on race, gender, or other social factors [36]. In this paper, we show how models trained on unlabeled images scraped from the Internet embed human-like biases, including racism and sexism.

Where most bias studies focus on supervised machine learning models, we seek to quantify learned patterns of implicit social bias in unsupervised image representations. Studies in supervised computer vision have highlighted social biases related to race, gender, ethnicity, sexuality, and other identities in tasks including face recognition, object detection, image search, and visual question answering [11, 43, 20, 76, 47, 52]. These algorithms are used in important real-world settings, from applicant video screening [36, 61] to autonomous vehicles [29, 52], despite the fact that their harmful downstream effects have been documented in applications such as online ad delivery [67] and image captioning [39].

Figure 1: Unilever using AI powered job candidate assessment tool HireVue [36].

Our work examines the growing set of computer vision methods in which no labels are used during model training. Recently, pre-training approaches adapted from language models have dramatically increased the quality of unsupervised representations for image recognition [22, 4, 37, 16, 14, 15, 50, 13]. With fine-tuning, practitioners can pair these general-purpose representations with labels from their domain to accomplish specific supervised tasks like face recognition or image captioning. We hypothesize that 1) like their counterparts in language, these unsupervised image representations also contain human-like social biases, and 2) these biases correspond to stereotypical portrayals of social group members in training images.

Results from natural language support this hypothesis. Several studies show that word embeddings, or representations, learned automatically from the way words co-occur in large text corpora exhibit human-like biases [8, 12, 27]. Word embeddings acquire these biases via statistical regularities in language that are based on the co-occurrence of stereotypical words with social group signals. Recently, new deep learning methods [21, 58, 59] for learning multiple, context-specific representations sharply advanced the state-of-the-art in natural language processing (NLP). Embeddings from these pre-trained models can be fine-tuned to boost performance in downstream tasks such as translation [25, 24]. As with static embeddings, researchers have shown that embeddings extracted from contextualized language models also exhibit downstream racial and gender biases [79, 5, 69, 35].

Recent advances in NLP architectures have inspired similar unsupervised computer vision models. We focus on two state-of-the-art, pre-trained models for image representation, iGPT [15] and SimCLRv2 [14]. We chose these models because they hold the highest unsupervised classification scores, were pre-trained on the same large dataset of Internet images, and are publicly available. iGPT, or Image GPT, borrows its architecture from GPT-2 [59], a state-of-the-art unsupervised language model. iGPT learns representations for pixels (rather than for words) by pre-training on many unlabeled images [15]. SimCLRv2 uses deep learning to construct image representations from ImageNet by comparing transformed versions of the training images [16, 14].

Do these unsupervised computer vision models embed human biases like their counterparts in natural language? If so, what are the origins of this bias? In NLP, embedding biases have been traced to word co-occurrences and other statistical patterns in text corpora used for training [12, 10, 7]. Both our models are pre-trained on ImageNet 2012, the most widely-used dataset of curated images scraped from the web [64]. In image datasets and image search results, researchers have documented clear correlations between the presence of individuals of a certain gender and the presence of stereotypical objects; for instance, the category “male” co-occurs with career and office related content such as ties and suits whereas “female” more often co-occurs with flowers in casual settings [43, 73]. As in NLP, we expect that these biased contextual portrayals in the pre-training dataset will result in corresponding, implicitly-embedded patterns of bias, even without the use of any labels during training. This paper presents the Image Embedding Association Test (iEAT), a systematic and principled approach to studying social bias learned automatically from image datasets.

  • We develop the first systematic method for detecting and quantifying social bias, including intersectional bias, in unsupervised image models. With a sensitivity test, we show that it reports appropriate significance values.

  • We find statistically significant racial, gender, weight, and intersectional biases embedded in two state-of-the-art unsupervised image models pre-trained on ImageNet [64], iGPT [15] and SimCLRv2 [14].

  • We replicate 15 previously documented human and machine bias tests with two different model architectures pre-trained on ImageNet, including the first machine replication of Implicit Association Tests (IATs) with picture stimuli, which have been studied for decades and validated in social psychology [32]. In 8 of these tests (including 4 of the 5 bias tests also replicated in natural language [12]), our machine results match documented human biases. All 7 tests which did not show significant human-like biases are from IATs with only small samples of picture stimuli.

  • With 16 novel tests, we show how embeddings from our model confirm several hypotheses about intersectional bias from social psychology [30].

  • We compare our results to statistical findings about the stereotypical portrayal of race and gender in image datasets. Our results suggest that unsupervised models learn bias from the way people are portrayed in images on the web.

  • We present a qualitative case study of how image generation, a downstream task utilizing unsupervised representations, exhibits a bias towards the sexualization of women.

2 Related Work

Various tests have been constructed to quantify bias in natural language models, including contextualized word representations [12, 79, 5, 48], but to our knowledge there are no principled tests for measuring bias embedded in unsupervised computer vision models. In NLP, there are several systematic approaches to measuring bias in contextualized word embeddings [48, 69, 35, 9], some model-specific [45]. Most of these tests take inspiration from the well-known IAT [32, 34]. Participants in the IAT are asked to rapidly associate stimuli, or exemplars, representing two target concepts (e.g. “flowers” and “insects”) with stimuli representing evaluative attributes (e.g. “pleasant” and “unpleasant”) [32]. Assuming that the cognitive association task will be facilitated by the strength of association between the target concept and attributes, the IAT quantifies bias as the latency of participant response [32] or the rate of classification error [55]. Stimuli may take the form of words, pictures, or even sounds [53], and there are several IATs with picture-only stimuli [53].1

Notably, Caliskan et al. [12] adapt the heavily-validated IAT [32] from social psychology to machines by testing for mathematical association of word embeddings rather than response latency. They present a systematic method for measuring language biases associated with social groups, the Word Embedding Association Test (WEAT). Like the IAT, the WEAT measures the effect size of bias in static word embeddings by quantifying the relative associations of two sets of target stimuli (e.g., {“woman,” “female”} and {“man,” “male”}) that represent social groups with two sets of evaluative attributes (e.g., {“science,” “mathematics”} and {“arts,” “literature”}). For validation, two WEATs quantify associations towards flowers vs. insects and towards musical instruments vs. weapons, both accepted baselines [32]. Greenwald et al. [32] refer to these baseline biases as “universally” accepted stereotypes since they are widely shared across human subjects and are not potentially harmful to society. Other WEATs measure social group biases such as sexist and racist associations or negative attitudes towards the elderly or people with disabilities. In any modality, implicit biases can potentially be prejudiced and harmful to society. If downstream applications use these representations to make consequential decisions about human beings, such as automated video job interview evaluations, artificial intelligence (AI) may perpetuate existing biases and exacerbate historical injustices [60, 19].

The original WEAT [12] uses static word embedding models such as word2vec [49] and GloVe [57], each trained on Internet-scale corpora composed of billions of tokens. Recent work extends the WEAT to contextualized embeddings: dynamic representations based on the context in which a token appears. May et al. [48] insert targets and attributes into sentences like “This is a[n] <word>” and apply the WEAT to the vector representation for the whole sentence, assuming that the sentence template is “semantically bleached” such that the only semantically meaningful content in the sentence is the token corresponding to the target concept or attribute. Tan and Celis [69] extract the contextual word representation for the token of interest before pooling to avoid confounding effects at the sentence level; in contrast, Bommasani et al. [9] find that pooling tends to improve representational quality for bias evaluation. Guo and Caliskan [35] dispense with sentence templates entirely, pooling across word-level contextual embeddings for the same token extracted from random sentences. Our approach is closest to these latter two methods, though we pool over images rather than words.

3 Approach

In this paper, we adapt bias tests designed for contextualized word embeddings to the image domain. While language transformers produce contextualized word representations to solve the next token prediction task, an image transformer model like iGPT generates image representations to solve the next pixel prediction task [15]. Unlike words, pixels do not explicitly correspond to semantic concepts (objects or categories). In language, a single token (e.g. “love”) corresponds to the target concept or attribute (e.g. “pleasant”). But in images, no single pixel corresponds to a semantically meaningful concept.

To address the abstraction of semantic representation in the image domain, we propose the Image Embedding Association Test (iEAT), which modifies contextualized word embedding tests to compare pooled image-level embeddings. The goal of the iEAT is to measure the biases embedded during unsupervised pre-training by comparing the relative association of image embeddings in a systematic process. Chen et al. [15] and Chen et al. [16] show through image classification that unsupervised image features are good representations of object appearance and categories; we expect they will also embed information gleaned from the common co-occurrence of certain objects and people and therefore contain related social biases.

Our approach is summarized in Figure 2. The iEAT uses the same formulas for the test statistic, effect size $d$, and $p$-value as the WEAT [12]. Section 3.1 summarizes our approach to replicating several different IATs; Section 3.2 describes several novel intersectional iEATs; and Section 3.3 describes our test statistic, drawn from embedding association tests like the WEAT.

Figure 2: Example iEAT replication of the Insect-Flower IAT [32], which measures the differential association between flowers vs. insects and pleasantness vs. unpleasantness.

3.1 Replication of Bias Tests

In this paper, we validate the iEAT by replicating as closely as possible several common IATs. These tests fall into two broad categories: valence tests, in which two target concepts are tested for association with “pleasant” and “unpleasant” images; and stereotype tests, in which two target concepts are tested for association with a pair of stereotypical attributes (e.g. “male” vs. “female” with “career” vs. “family”). So as to closely match the ground-truth human IAT data and validate our method, our replications use exactly the same concepts as in the original IATs (listed in Table 1). Because some IATs rely on verbal stimuli, we adapt them to images, using image stimuli from the IATs when available. When no previous studies use image stimuli, we map the verbal stimuli to images using the data collection method described in Section 5.

Many of these bias tests have been replicated for machines in the language domain; for the first time, we also replicate tests with image-only stimuli, including the Asian and Native American IATs. Most of these tests were originally administered in controlled laboratory settings [32, 34], and all except for the Insect-Flower IAT have also been tested on the Project Implicit website [54, 34, 33]. Project Implicit has been available world-wide for over 20 years; in 2007, the site had collected more than 2.5 million IATs. The average effect sizes (which are based on samples so large the power is nearly 100%) for these tests are reproduced in Table 1. To establish a principled methodology, all the IAT verbal and original image stimuli for our bias tests were replicated exactly from this online IAT platform [56]. We will treat these results, along with the laboratory results from the original experiments [32], as ground-truth for human biases that serve as validation benchmarks for our methods (Section 6).

3.2 Intersectional iEATs

We also introduce several new tests for intersectional valence bias and for bias at the intersection of gender stereotypes and race. Intersectional stereotypes are often even more severe than their constituent minority stereotypes [18]. Following Tan and Celis [69], we anchored comparison on White males, the group with the most privilege, and compared against White females, Black males, and Black females, respectively (Table 2). Drawing on social psychology [30], we pose three hypotheses about intersectional bias:

  • Intersectionality hypothesis: tests at the intersection of gender and race will reveal emergent biases not explained by the sum of biases towards race and gender alone

  • Race hypothesis: biases between racial groups will be more similar to differential biases between the men than between the women (because samples from men as the majority group dominate the representation)

  • Gender hypothesis: biases between men and women will be most similar to biases between White men and White women (because samples from White people as the majority group dominate the representation)

3.3 Embedding Association Tests

Though our stimuli are images rather than words, we can use the same statistical method for measuring biased associations between image representations [12] to quantify a standardized effect size of bias. We follow Caliskan et al. [12] in describing the WEAT here.

Let $X$ and $Y$ be two sets of target concept embeddings of size $N_t$, and let $A$ and $B$ be two sets of attribute embeddings of size $N_a$. For example, the Gender-Career IAT tests for the differential association between the concepts “male” ($X$) and “female” ($Y$) and the attributes “career” ($A$) and “family” ($B$). Generally, experts in social psychology and cognitive science select stimuli which are typically representative of various concepts. In this case, $X$ contains embeddings for verbal stimuli such as “boy,” “father,” and “man,” while $A$ contains embeddings for verbal stimuli like “office” and “business.” These linguistic, visual, and sometimes auditory stimuli are proxies for the aggregate representation of a concept in cognition. Embedding association tests use these unambiguous stimuli as semantic representations to study biased associations between the concepts being represented. Since the stimuli are chosen by experts to most accurately represent concepts, they are not polysemous or ambiguous tokens. We use these expert-selected stimuli as the basis for our tests in the image domain.

The test statistic measures the differential association of the target concepts $X$ and $Y$ with the attributes $A$ and $B$:

$$s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B),$$

where $s(w, A, B)$ is the differential association of $w$ with the attributes, quantified by the cosine similarity of vectors:

$$s(w, A, B) = \operatorname{mean}_{a \in A} \cos(w, a) - \operatorname{mean}_{b \in B} \cos(w, b).$$

We test the significance of this association with a permutation test2 over all possible equal-size partitions $\{(X_i, Y_i)\}_i$ of $X \cup Y$ to generate a null hypothesis as if no biased associations existed. The one-sided $p$-value measures the unlikelihood of the null hypothesis:

$$p = \Pr\left[ s(X_i, Y_i, A, B) > s(X, Y, A, B) \right].$$

The effect size (Cohen’s $d$), a standardized measure of the separation between the relative association of $X$ and $Y$ with $A$ and $B$, is

$$d = \frac{\operatorname{mean}_{x \in X} s(x, A, B) - \operatorname{mean}_{y \in Y} s(y, A, B)}{\operatorname{std\_dev}_{w \in X \cup Y} s(w, A, B)}.$$

A larger effect size indicates a larger differential association; for instance, the large effect size in Table 1 for the gender-career bias example above indicates that in human respondents, “male” is strongly associated with “career” attributes compared to “female,” which is strongly associated with “family” attributes. Note that these effect sizes cannot be directly compared to effect sizes in human IATs, but the significance levels are uniformly high. Human IATs measure individual people’s associations; embedding association tests measure the aggregate association learned in the representation space from the entire population of data points contributing to the training set of the model.
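The test statistic, permutation test, and effect size described in this section can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used for the paper's results: the function names (`association`, `weat_statistic`, `effect_size`, `p_value`) are hypothetical, the sample standard deviation (`ddof=1`) is an assumption, and the exact enumeration of partitions is only practical for small stimulus sets.

```python
import itertools
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean cosine similarity of w to A minus mean similarity to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_statistic(X, Y, A, B):
    """s(X, Y, A, B): summed association of X minus summed association of Y."""
    return sum(association(x, A, B) for x in X) - sum(association(y, A, B) for y in Y)

def effect_size(X, Y, A, B):
    """Cohen's d; the sample standard deviation (ddof=1) is an assumption."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

def p_value(X, Y, A, B):
    """Exact one-sided permutation test over equal-size partitions of X and Y pooled."""
    # Per-stimulus associations do not depend on the partition, so compute them once.
    scores = np.array([association(w, A, B) for w in list(X) + list(Y)])
    n = len(X)

    def statistic(indices):
        chosen = scores[list(indices)].sum()
        return chosen - (scores.sum() - chosen)

    observed = statistic(range(n))  # the first n pooled entries are X itself
    partitions = list(itertools.combinations(range(len(scores)), n))
    exceed = sum(statistic(c) > observed for c in partitions)
    return exceed / len(partitions)
```

Because the number of equal-size partitions grows combinatorially with the number of stimuli, a practical variant would sample random partitions instead of enumerating all of them; that choice is not shown here.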

One important assumption of the iEAT is that categories can be meaningfully represented by groups of images, such that the association bias measured refers to the categories of interest and not some other, similar-looking categories. Thus, a positive test result indicates only that there is association bias between the corresponding samples’ sets of target images and attribute images. To generalize to association between abstract social concepts requires that the samples meaningfully represent the categories of interest. Section 5 details our procedure for selecting multiple, representative stimuli, following validated approaches from prior work [32].

We use an adapted version of May et al. [48]’s Python WEAT implementation. All code, pre-trained models, and data used to produce the figures and results in this paper can be accessed at

4 Computer Vision Models

To explore what kinds of biases may get embedded in image representations generated in unsupervised settings, where class labels are not available for images, we focus on two computer vision models published in summer 2020, iGPT and SimCLRv2. We extract representations of image stimuli with these two pre-trained, unsupervised image representation models. We choose these particular models because iGPT and SimCLRv2 represent state-of-the-art performance in linear evaluation (a measure of the accuracy of a linear image classifier trained on embeddings from each model). iGPT is the first model to learn from pixel co-occurrences to generate image samples and perform image completion tasks.

Pre-training Data

Both models are pre-trained on ImageNet 2012, a large benchmark dataset for computer vision tasks [64].3 ImageNet 2012 contains 1.2 million annotated images of 200 object classes, including a person class; even if the annotated object is not a person, a person may appear in the image. For this reason, we expect the models to be capable of generalizing to stimuli containing people [63, 64]. While there are no publicly available pre-trained models with larger training sets and the “people” category of ImageNet is no longer available, this dataset is a widely used benchmark containing a comprehensive sample of images scraped from the web, primarily Flickr [64]. We assume that the portrayals of people in ImageNet are reflective of the portrayal of people across the web at large, but more contemporary study is left to future work. CIFAR-100, a smaller classification database, was also used for linear evaluation and stimuli collection [44].

Image Embeddings

Both models are unsupervised: neither uses any labels during training. Unsupervised models learn to produce embeddings based on the implicit patterns in the entire training set of image features. Both models utilize neural networks with multiple hidden layers, each learning a different level of abstraction, and a projection layer for some downstream task. For linear classification tasks, features can be drawn directly from layers in the base neural network. As a result, there are various ways to extract image representations, each encoding a different set of information. We follow Chen et al. [15] and Chen et al. [16] in choosing the features for which linear evaluation scores are highest such that the features extracted contain high quality, general-purpose information about the objects in the image. The remainder of this section describes the architecture and feature extraction method for each model.
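Linear evaluation, the criterion used here to choose which features to extract, can be illustrated with a small sketch: a linear classifier is fit on frozen embeddings and scored on held-out data. The sketch below substitutes a least-squares fit on one-hot labels for whatever classifier the model authors trained, and the function name `linear_evaluation` is hypothetical.

```python
import numpy as np

def linear_evaluation(train_x, train_y, test_x, test_y, num_classes):
    """Fit a least-squares linear classifier on frozen embeddings; return test accuracy."""
    def add_bias(x):
        # Append a constant column so the classifier can learn an intercept.
        return np.hstack([x, np.ones((x.shape[0], 1))])

    targets = np.eye(num_classes)[train_y]  # one-hot encode integer labels
    weights, *_ = np.linalg.lstsq(add_bias(train_x), targets, rcond=None)
    predictions = np.argmax(add_bias(test_x) @ weights, axis=1)
    return float(np.mean(predictions == test_y))
```

The embeddings themselves are never updated in this procedure, which is why the score is read as a measure of how much object-level information the frozen features already contain.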

4.1 iGPT

The image generative pre-training (iGPT) model is a novel, NLP-inspired approach to unsupervised image representation. We chose iGPT for its high linear evaluation scores, minimalist architecture, and strong similarity to GPT-2, which has already been tested for bias in the language domain. iGPT belongs to a broader class of transformer models, including the original GPT-2 model [59], that have found great success in the language domain. Transformers learn patterns in the way individual tokens in an input sequence appear with other tokens in the sequence [71]. Chen et al. [15] apply a structurally simple, highly parameterized version of the GPT-2 generative language pre-training architecture [59] to the image domain for the first time. GPT-2 uses the “contextualized embeddings” learned by a transformer to predict the next token in a sequence and generate realistic text [59]. Rather than autoregressively predict the next entry in a sequence of tokens as GPT-2 does, iGPT predicts the next entry in a flattened sequence of pixels. iGPT is trained to autoregressively complete cropped images, and feature embeddings extracted from the model can be used to train a state-of-the-art linear classifier [15].

We use the largest open-source version of this model, iGPT-L 32x32, with $L = 48$ layers and embedding size $d = 1536$. All inputs are restricted to 32x32 pixels; the largest model, which takes 64x64 input, is not available to the public. Original code and checkpoints for this model were obtained from its authors. Just like the GPT-2 transformer architecture, iGPT is composed of $L$ blocks:

$$\begin{aligned} n^l &= \operatorname{layer\_norm}(h^l) \\ a^l &= h^l + \operatorname{multihead\_attention}(n^l) \\ h^{l+1} &= a^l + \operatorname{mlp}(\operatorname{layer\_norm}(a^l)) \end{aligned}$$

where $h^l$ is the input tensor to the $l$th block. In the final layer, called the projection head, Chen et al. [15] learn a projection from $n^L$ to a set of logits parameterizing the conditional distributions across the sequence dimension. Because this final layer is designed for autoregressive pixel prediction, the final layer may not contain the optimal representations for object recognition tasks (and thereby semantic content). In fact, Chen et al. [15] obtain the best linear classification results using embeddings extracted from a middle layer - specifically, somewhere near the 20th layer. A linear classifier trained on these features is much more accurate than one trained on the next-pixel embeddings [15]. Such “high-quality” features from the middle of the network are obtained by average-pooling the layer norm across the sequence dimension:

$$f^l = \langle n^l_i \rangle_i$$

Chen et al. [15] then learn a set of class logits from $f^l$ for their fine-tuned, supervised linear classifier, but we will just use the embeddings $f^l$. In general, we prefer these embeddings over embeddings from other layers for two reasons: 1) they can be more closely compared to the SimCLRv2 embeddings, which are also optimal for fine-tuning a linear classifier; 2) we hypothesize that embeddings with higher linear evaluation scores will also be more likely to embed biases, since stereotypical portrayals typically incorporate certain objects and scenes (e.g. placing men with sports equipment). In Section 7.3.2, we try another embedding extraction strategy and show that this hypothesis is correct.
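The pooling step described above, average-pooling the layer-normalized activations of one block across the sequence dimension, can be sketched as follows. This is an illustration under stated assumptions, not the iGPT code: the layer norm omits the learned gain and bias, and the names `layer_norm`, `pooled_features`, and `h_l` are hypothetical.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Normalize each position's feature vector to zero mean and unit variance.

    Assumption: the learned gain and bias of a real layer norm are omitted.
    """
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

def pooled_features(h_l):
    """Average the layer-normalized activations over the sequence dimension.

    h_l: array of shape (sequence_length, d), the input to one transformer block.
    Returns one image-level embedding of shape (d,).
    """
    return layer_norm(h_l).mean(axis=0)
```

For a 32x32 input flattened into a pixel sequence, `h_l` would have 1024 rows, and the result is a single vector per image regardless of sequence length.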

4.2 SimCLR

The Simple Framework for Contrastive Learning of Visual Representations (SimCLR) [16, 17] is another state-of-the-art unsupervised image classifier. We chose SimCLRv2 because it has a state-of-the-art open source release and for variety in architecture: unlike iGPT, SimCLRv2 utilizes a traditional neural network for image encoding, ResNet [38]. SimCLRv2 extracts representations in three stages: 1) SimCLRv2 applies several different data augmentations (random cropping, random color distortions, and Gaussian blur) to training images; 2) SimCLRv2 extracts representations with an encoder network, ResNet [38]; 3) SimCLRv2 maps the representations to a latent space for contrastive learning, which maximizes agreement between the different augmented views [16]. These representations can be used to train state-of-the-art linear image classifiers [16, 17].

We use the largest pre-trained open-source version (the model with the highest linear evaluation scores) of SimCLRv2 [17], obtained from its authors. This pre-trained model uses a 50-layer ResNet with width and selective kernels (which have been shown to increase linear evaluation accuracy), and it was also pre-trained on ImageNet [64].

As with iGPT, we extract the embeddings identified by Chen et al. [16] as “high quality” features for linear evaluation. Following [16], let $\tilde{x}_i$ and $\tilde{x}_j$ be two data augmentations (random cropping, random color distortion, and random Gaussian blur) of the same image. The base encoder network $f(\cdot)$ is a network of layers

$$h_i = f(\tilde{x}_i) = \operatorname{ResNet}(\tilde{x}_i)$$

where $h_i \in \mathbb{R}^d$ is the output after the average pooling layer. During pre-training, SimCLRv2 utilizes an additional layer: a projection head $g(\cdot)$ that maps $h_i$ to a latent space for contrastive loss. The contrastive loss function can be found in [16].

After pre-training, Chen et al. [16] throw away the projection head $g(\cdot)$, using the average pool output $f(\cdot)$ for linear evaluation. Note that the projection head is still necessary for pre-training high quality representations (it improves linear evaluation accuracy by over 10%); but [16] find that training on $h_i$ rather than $z_i = g(h_i)$ also improves linear evaluation accuracy by more than 10%.4 We follow suit, and use $h_i$ (the average pool output of ResNet) to represent our image stimuli, which has dimensionality . The high dimensionality does not hinder our approach; association tests have been used with embeddings as large as dimensions [48].
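The distinction between the average pool output kept for the association tests and the projection head output discarded after pre-training can be made concrete with a toy sketch. Everything here is illustrative: the two-layer head with a ReLU, the weight shapes, and the function names `global_average_pool` and `projection_head` are assumptions and do not reproduce the released SimCLRv2 architecture.

```python
import numpy as np

def global_average_pool(feature_map):
    """Collapse a (height, width, channels) feature map into a (channels,) vector.

    This plays the role of the encoder's average pool output, the embedding
    that is kept to represent each image stimulus.
    """
    return feature_map.mean(axis=(0, 1))

def projection_head(h, w1, w2):
    """Toy two-layer projection head with a ReLU between the layers.

    Assumption: weights are passed in explicitly; in a trained model they are
    learned during contrastive pre-training and discarded afterwards.
    """
    return w2 @ np.maximum(w1 @ h, 0.0)
```

In this picture the association tests operate only on the output of `global_average_pool`; the projection is computed solely to evaluate the contrastive loss during pre-training.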

5 Stimuli

To replicate the IATs, we systematically compiled a representative set of image stimuli for each of the categories listed in Table 1. For each category (e.g. “male” or “science”) in each IAT (e.g. Gender-Science), we drew representative images from either 1) the original IAT stimuli, if the IAT used picture stimuli [56], 2) the CIFAR-100 dataset [44], or 3) a Google Image Search.

This section describes how we obtained a set of images that meaningfully represent some target concept (e.g. “male”) or attribute (e.g. “science”) as it is normally, or predominantly, portrayed in society and on the web. We follow the stimuli selection criteria outlined in foundational prior work to collect the most typical and accurate exemplars [32, 34]. For picture-IATs with readily available image stimuli, we accept those stimuli as representative and exactly replicate the IAT conditions, with two exceptions: 1) the weapon-tool IAT picture stimuli include outdated objects (e.g. cutlass, Walkman), so we chose to collect an additional, modernized set of images; 2) the disability IAT utilizes abstract symbols, so we collected a replacement set of real images of disabled people for consistency. For IATs with verbal stimuli, we use Google Image Search as a proxy for the predominant portrayal of words (expressed as search terms) on the web (described in Section 5.1). Human IATs employ the same philosophy: for example, the Gender-Science IAT uses common European American names to represent male and female, because the majority of names in the U.S. are European American [54]. We follow the same approach in replicating the human IATs for machines in the vision domain.

One consequence of the stimuli collection approach is that our test set will be biased towards certain demographic groups, just as the human IATs are biased towards European American names. For example, Kay et al. [43] showed that in 2015, search results for powerful occupations like CEO systematically under-represented women. In a case like this, we would expect to underestimate bias towards minority groups. For example, since we expect Gender-Science biases to be higher with respect to non-White women, a test set containing more White women than non-White women would exhibit lower overall bias than a test set containing an equal number of stimuli from White and non-White women. Consequently, tests on Google Image Search stimuli would be expected to result in under-estimated stereotype-congruent bias scores. While under-representation in the test set does not pose a major issue for measuring normative concepts, we cannot use the same datasets to test for intersectional bias. For those iEATs, we collected separate, equal-sized sets of images with search terms based on the categories White male, White female, Black male, and Black female, since none of the IATs specifically target these intersectional groups.

5.1 Verbal to Image Stimuli

One key challenge of our approach is representing social constructs and abstract concepts such as “male” or “pleasantness” in images. A Google Image Search for “pleasantness” returns mostly cartoons and pictures of the word itself. We address this difficulty by adhering as closely as possible to the verbal IAT stimuli, to ensure the validity of our replication. In verbal IATs, this is accomplished with “buckets” of verbal exemplars that include a variety of common-place and easy-to-process realizations of the concept in question. For example, in the Gender-Science IAT, the concept “male” is defined by the verbal stimuli “man,” “son,” “father,” “boy,” “uncle,” “grandpa,” “husband,” and “male” [77]. To closely match the representations tested by these IATs, we use exactly these sets of words to search for substitute image stimuli.5 For the vast majority of exemplars, we were able to find direct visualizations of the stimuli as an isolated person, object, or scene.

We collected images for each verbal stimulus using the following procedure:6

  1. If there is a CIFAR-100 category corresponding to the stimulus, we selected a random sample of images from that category in CIFAR [44].7

  2. Otherwise, we searched for the verbal stimulus verbatim on Google Image Search in a private Chrome window with SafeSearch off on September 5th, September 18th, and October 1st, 2020. We accepted the first results of the search meeting the following criteria:

    • Includes only the object, person, or scene specified by the stimulus.8

    • Has no watermark or other text. Watermarks and text could confound the verbal stimulus being represented.

    • Shows a real object, person, or scene - is not a cartoon or sketch. ImageNet does not include a great quantity of cartoons or sketches, so we do not expect our models to generalize well to these kinds of objects/scenes [62].

  3. If no images in the first 50 results from the verbatim search met these criteria, we added a clarifying search term (e.g. “biology lab” instead of “biology”). A full list of stimuli and corresponding search terms can be found in Appendix B.

  4. Crop each image squarely (iGPT accepts only square images as input), centering the object or person of interest to ensure the entire object, person, or scene is included in the image.
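The square-cropping step in the procedure above can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function name `square_crop_box` and the idea of passing the subject's center coordinates by hand are assumptions made for the example. It computes the largest square (or a smaller one, if requested) centered as close to the subject as the image bounds allow.

```python
def square_crop_box(width, height, center_x, center_y, side=None):
    """Return (left, top, right, bottom) for a square crop.

    The square is centered on (center_x, center_y) when possible and is
    shifted just enough to stay inside a width x height image.
    All names here are hypothetical; the paper does not specify them.
    """
    max_side = min(width, height)
    side = max_side if side is None else min(side, max_side)

    # Ideal top-left corner if the crop were centered exactly on the subject.
    left = int(round(center_x - side / 2))
    top = int(round(center_y - side / 2))

    # Clamp so the square never leaves the image bounds.
    left = max(0, min(left, width - side))
    top = max(0, min(top, height - side))
    return (left, top, left + side, top + side)


# A 640x480 image with the subject at the image center.
box = square_crop_box(640, 480, 320, 240)
print(box)
```

The returned 4-tuple follows the (left, upper, right, lower) convention, so with an imaging library such as Pillow it could be passed to `Image.crop` before resizing to the model's input resolution.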

This procedure controls for image characteristics that might confound the category we are attempting to define (e.g. lighting, background, dominant colors, placement) in several ways: 1) we collected more than one image for each verbal stimulus, in case of idiosyncrasies in the images collected; 2) for stimuli referring to an object or person, we chose images that isolated the object or person of interest against a plain white or gray background, unless the object filled the whole image;9,10 3) when an attribute stimulus refers to a group of people, we chose only images where the target concepts were evenly represented in the attribute images;11 4) for the picture-IATs, we accepted the original image stimuli to exactly reconstruct the original test conditions. We did not control for any confounding effects in the original stimuli, relying instead on experts’ judgment.

5.2 Choosing Valence Stimuli

Valence, the intrinsic pleasantness or goodness of things, is one of the principal dimensions of affect and cognitive heuristics that shape attitudes and biases [32]. More than half of the IATs compare two social groups to the valence attributes “pleasant” vs. “unpleasant.” Here, positive valence will denote “pleasantness” and negative valence will denote “unpleasantness.” The verbal exemplars for valence vary slightly from test to test. Rather than create a new set of image stimuli for each individual valence IAT, we collected one large, consolidated set from an experimentally validated database [6] of low and high valence words (e.g. “rainbow,” “morgue”) commonly used in the valence IATs. To quantify norms, Bellezza et al. [6] asked human participants to rate these non-social words for “pleasantness” and “imagery” in a controlled laboratory setting.

Because some of the words for valence do not correspond to physical objects, we collected images for verbal stimuli with high valence and imagery scores. We used exactly the same procedure as for all the other verbal stimuli (described above in Section 5.1). The full list of verbal valence stimuli can be found in Appendix A, and a specific algorithm for systematically selecting words with high imagery and extreme valence is included in our code.
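One way such a selection could work is sketched below. This is an illustrative guess at the idea, not the algorithm shipped with the paper's code: the function name, the imagery cutoff, the number of words per pole, and all rating values in the example are invented for demonstration (only the words themselves appear in Appendix A or the text).

```python
def select_valence_words(norms, min_imagery, k):
    """Pick the k most pleasant and k least pleasant high-imagery words.

    norms: list of (word, pleasantness, imagery) tuples.
    min_imagery and k are hypothetical parameters, not values from the paper.
    """
    # Keep only words concrete enough to be depicted in an image.
    imageable = [row for row in norms if row[2] >= min_imagery]
    if 2 * k > len(imageable):
        raise ValueError("not enough high-imagery words for two disjoint sets")

    # Sort by pleasantness, then take both extremes.
    ranked = sorted(imageable, key=lambda row: row[1])
    negative = [word for word, _, _ in ranked[:k]]
    positive = [word for word, _, _ in ranked[-k:][::-1]]
    return positive, negative


# Invented ratings for demonstration only.
example_norms = [
    ("rainbow", 4.8, 4.9),
    ("morgue", 1.2, 4.5),
    ("justice", 4.5, 2.0),   # pleasant but hard to picture, so filtered out
    ("sunset", 4.7, 4.8),
    ("coffin", 1.4, 4.6),
    ("table", 3.0, 4.7),     # easy to picture but neutral, so never selected
]
positive, negative = select_valence_words(example_norms, min_imagery=4.0, k=2)
print(positive, negative)
```

Filtering on imagery first and ranking on pleasantness second mirrors the order of the reasoning in the text: a word must be depictable before its valence matters.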

6 Evaluation

We evaluate the validity of iEAT by comparing the results to ground truth bias scores obtained in prior work. More details on the analysis of these results are presented in Section 7. We obtain positive stereotype-congruent results for baseline biases that are widely accepted in society and NLP. Complementary to the null hypothesis testing, we measure how iEAT responds to true negatives to evaluate the specificity of the method.

Predictive Validity. We posit that iEAT results have predictive validity if they correspond to ground-truth IAT results for humans or WEAT results in word embeddings. In this paper, we validate the iEAT by replicating several human IATs as closely as possible (as described in Section 5) and comparing the results. We find that embeddings extracted from at least one of the two models we test display significant bias for 8 of the 15 ground-truth human IATs we replicate (Section 7). We also find evidence supporting each of the intersectional hypotheses listed in Section 3.2, which have also been empirically validated in a study with human participants [30].

Baselines. As a baseline, we replicate a “universal” bias test presented in the first paper introducing the IAT [32]: the association between flowers vs. insects and pleasant vs. unpleasant. If human-like biases are encoded in unsupervised image models, we would expect a strong and statistically significant flower-insect valence bias, for two reasons: 1) as Greenwald et al. [32] conjecture, this test measures a close-to-universal baseline human bias; 2) our models (described in Section 4) achieve state-of-the-art performance when classifying simple objects including flowers and bees. Thus, the accuracy of the representations for these stimuli is verified.12

Specificity. To validate the specificity of our significance estimation, we created 1,000 random partitions of the stimuli from the flower-insect test to evaluate true positive detection. 10.3% of these random tests resulted in a false positive at p < 0.1; 1.2% were statistically significant false positives at p < 0.01, suggesting that our false positive rate is about 1% at the stricter threshold, which signals an accurate methodology. Prior work on embedding associations does not evaluate the false positive rate. As a result, an approach validating that false positives are detected as false positives, complementary to the null hypothesis testing, is a contribution to understanding the specificity of embedding association methods.
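The specificity check can be sketched as follows. This is a toy reconstruction under stated assumptions, not the authors' code: it assumes the usual embedding association statistic (difference of mean cosine similarities to two attribute sets), uses small random vectors in place of real image embeddings, and the names `p_value` and `false_positive_rate` are hypothetical. Each trial randomly splits a pooled set of embeddings into two target sets and two attribute sets, so any "significant" result is a false positive by construction.

```python
import math
import random
from itertools import combinations
from statistics import mean


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def p_value(X, Y, A, B):
    """Exact one-sided permutation p-value over equal-size splits of X + Y."""
    # Differential association of each target embedding with A versus B.
    scores = [mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)
              for w in X + Y]
    total = sum(scores)
    observed = 2 * sum(scores[:len(X)]) - total
    stats = [2 * sum(scores[i] for i in idx) - total
             for idx in combinations(range(len(scores)), len(X))]
    # Fraction of splits whose statistic strictly exceeds the observed one.
    return sum(s > observed for s in stats) / len(stats)


def false_positive_rate(pool, n_t, n_a, alpha, trials, rng):
    """Share of random partitions of `pool` that test significant at alpha."""
    hits = 0
    for _ in range(trials):
        shuffled = rng.sample(pool, len(pool))
        X, Y = shuffled[:n_t], shuffled[n_t:2 * n_t]
        A = shuffled[2 * n_t:2 * n_t + n_a]
        B = shuffled[2 * n_t + n_a:2 * n_t + 2 * n_a]
        hits += p_value(X, Y, A, B) < alpha
    return hits / trials


rng = random.Random(0)
# Stand-in "embeddings": 14 random 5-dimensional vectors (illustrative only).
pool = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(14)]
print(false_positive_rate(pool, n_t=3, n_a=4, alpha=0.1, trials=200, rng=rng))
```

Because random partitions make the two target sets exchangeable, a well-calibrated test should flag roughly a fraction alpha of them, which is the pattern the reported rates follow.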

7 Experiments and Results

In correspondence with the human IAT, we find several significant racial biases and gender stereotypes, including intersectional biases, shared by both iGPT and SimCLRv2 when pre-trained on ImageNet.

7.1 iEATs

Effect sizes corresponding to the magnitude of bias and p-values of statistical significance from the permutation test for each bias type measurement are reported in Table 1. iGPT also contains weight and skin-tone biases, which we replicate as a machine bias test for the first time. We interpret these results below.
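For readers who want to see how an effect size and permutation p-value of this kind are computed, here is a minimal sketch in the style of the word embedding association test. It is an assumption that the iEAT statistic takes exactly this form: the definitions appear in an earlier section, and choices such as the sample standard deviation and the strict inequality in the p-value are made here for illustration. The tiny 2-D vectors stand in for real image embeddings.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """Mean similarity of w to attribute set A minus mean similarity to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def effect_size(X, Y, A, B):
    """Standardized difference in association between target sets X and Y."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


def permutation_p_value(X, Y, A, B):
    """Exact one-sided p-value over all equal-size partitions of X + Y."""
    scores = [association(w, A, B) for w in X + Y]
    total = sum(scores)
    observed = 2 * sum(scores[:len(X)]) - total
    greater = count = 0
    for idx in combinations(range(len(scores)), len(X)):
        stat = 2 * sum(scores[i] for i in idx) - total
        greater += stat > observed
        count += 1
    return greater / count


# Toy stand-ins for embeddings: X leans toward A, Y leans toward B.
X = [(1, 0.1), (1, 0.2), (0.9, 0.1)]
Y = [(0.1, 1), (0.2, 1), (0.1, 0.9)]
A = [(1, 0)]
B = [(0, 1)]
print(effect_size(X, Y, A, B), permutation_p_value(X, Y, A, B))
```

Enumerating every partition is only practical for small target sets; with dozens of stimuli per set, a random sample of partitions would be the usual substitute.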

Widely Accepted Biases

First, we apply the iEAT to the widely accepted baseline Insect-Flower IAT, which measures the association of flowers and insects with pleasantness and unpleasantness, respectively. As hypothesized, we find that embeddings from both models contain significant positive biases in the same direction as the human participants, associating flowers with pleasantness and insects with unpleasantness, with p < 0.1 (Table 1).13 Notably, the magnitude of bias is greater for SimCLRv2 (effect size d = 1.69) than for iGPT (effect size d = 0.34, p = 0.07). In general, SimCLRv2 embeddings contain stronger biases than iGPT embeddings, but do not contain as many kinds of bias. We conjecture that because SimCLRv2 transforms images before training (including color distortion and blurring) and is more architecturally complex than iGPT [16], its embeddings become more suitable for concrete object classification as opposed to implicit social patterns.

Racial Biases

Both models display statistically significant racial biases, including both valence and stereotype biases. The racial attitude test, which measures the differential association of images of European Americans vs. African Americans with pleasantness and unpleasantness, shows no significant biases. But embeddings extracted from both models exhibit significant bias for the Arab-Muslim valence test, which measures the association of images of Arab-Americans vs. others with pleasant vs. unpleasant images. Also, embeddings extracted with iGPT exhibit a strong bias with a large effect size (d = 1.26) for the Skin Tone test, which compares valence associations with faces of lighter and darker skin tones. These findings relate to anecdotal examples of software that claims to make faces more attractive by lightening their skin color. Both iGPT and SimCLRv2 embeddings also associate White people with tools and Black people with weapons in both classical and modernized versions of the Weapon IAT.14

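| Test | Target X | Target Y | Attribute A | Attribute B | n_t | n_a | Model | Effect size d | p-value | Human IAT d |
|---|---|---|---|---|---|---|---|---|---|---|
| Age | Young | Old | Pleasant | Unpleasant | 6 | 55 | iGPT | 0.42 | 0.24 | 1.23 |
| Age | Young | Old | Pleasant | Unpleasant | 6 | 55 | SimCLR | 0.59 | 0.16 | 1.23 |
| Arab-Muslim | Other | Arab-Muslim | Pleasant | Unpleasant | 10 | 55 | iGPT | 0.86 | 0.02 | 0.33 |
| Arab-Muslim | Other | Arab-Muslim | Pleasant | Unpleasant | 10 | 55 | SimCLR | 1.06 | | 0.33 |
| Asian | European American | Asian American | American | Foreign | 6 | 6 | iGPT | 0.25 | 0.34 | 0.62 |
| Asian | European American | Asian American | American | Foreign | 6 | 6 | SimCLR | 0.47 | 0.21 | 0.62 |
| Disability | Disabled | Abled | Pleasant | Unpleasant | 4 | 55 | iGPT | -0.02 | 0.53 | 1.05 |
| Disability | Disabled | Abled | Pleasant | Unpleasant | 4 | 55 | SimCLR | 0.38 | 0.34 | 1.05 |
| Gender-Career | Male | Female | Career | Family | 40 | 21 | iGPT | 0.62 | | 1.1 |
| Gender-Career | Male | Female | Career | Family | 40 | 21 | SimCLR | 0.74 | | 1.1 |
| Gender-Science | Male | Female | Science | Liberal Arts | 40 | 21 | iGPT | 0.44 | 0.02 | 0.93 |
| Gender-Science | Male | Female | Science | Liberal Arts | 40 | 21 | SimCLR | -0.10 | 0.67 | 0.93 |
| Insect-Flower | Flower | Insect | Pleasant | Unpleasant | 35 | 55 | iGPT | 0.34 | 0.07 | 1.35 |
| Insect-Flower | Flower | Insect | Pleasant | Unpleasant | 35 | 55 | SimCLR | 1.69 | | 1.35 |
| Native | European American | Native American | U.S. | World | 8 | 5 | iGPT | -0.33 | 0.73 | 0.46 |
| Native | European American | Native American | U.S. | World | 8 | 5 | SimCLR | -0.19 | 0.65 | 0.46 |
| Race | European American | African American | Pleasant | Unpleasant | 6 | 55 | iGPT | -0.62 | 0.85 | 0.86 |
| Race | European American | African American | Pleasant | Unpleasant | 6 | 55 | SimCLR | -0.57 | 0.83 | 0.86 |
| Religion | Christianity | Judaism | Pleasant | Unpleasant | 7 | 55 | iGPT | 0.37 | 0.25 | -0.34 |
| Religion | Christianity | Judaism | Pleasant | Unpleasant | 7 | 55 | SimCLR | 0.36 | 0.26 | -0.34 |
| Sexuality | Gay | Straight | Pleasant | Unpleasant | 9 | 55 | iGPT | -0.03 | 0.52 | 0.74 |
| Sexuality | Gay | Straight | Pleasant | Unpleasant | 9 | 55 | SimCLR | 0.04 | 0.47 | 0.74 |
| Skin-Tone | Light | Dark | Pleasant | Unpleasant | 7 | 55 | iGPT | 1.26 | | 0.73 |
| Skin-Tone | Light | Dark | Pleasant | Unpleasant | 7 | 55 | SimCLR | -0.19 | 0.71 | 0.73 |
| Weapon | White | Black | Tool | Weapon | 6 | 7 | iGPT | 0.86 | 0.07 | 1.0 |
| Weapon | White | Black | Tool | Weapon | 6 | 7 | SimCLR | 1.38 | | 1.0 |
| Weapon (Modern) | White | Black | Tool | Weapon | 6 | 9 | iGPT | 0.88 | 0.06 | N/A |
| Weapon (Modern) | White | Black | Tool | Weapon | 6 | 9 | SimCLR | 1.28 | 0.01 | N/A |
| Weight | Thin | Fat | Pleasant | Unpleasant | 10 | 55 | iGPT | 1.67 | | 1.83 |
| Weight | Thin | Fat | Pleasant | Unpleasant | 10 | 55 | SimCLR | -0.30 | 0.74 | 1.83 |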
Originally a picture-IAT (image-only stimuli). Originally a mixed-mode IAT (image and verbal stimuli).
Table 1: iEAT tests for the association between target concepts X vs. Y (represented by n_t images each) and attributes A vs. B (represented by n_a images each) in embeddings generated by an unsupervised model. Association effect sizes d, which represent the magnitude of bias and are colored by conventional small (0.2), medium (0.5), and large (0.8) size, are reported alongside permutation p-values. The original human IAT effect sizes, reproduced from Nosek et al. [56], are all statistically significant; they can be compared to our effect sizes in sign but not in magnitude.
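As a reading aid for the table, the conventional size thresholds named in the caption can be applied mechanically. The helper names below and the "negligible" label for values under 0.2 are assumptions added for illustration; the caption itself only names small, medium, and large. The second helper reflects the caption's point that model and human effect sizes are comparable in sign only.

```python
def magnitude(d):
    """Label an effect size by the conventional 0.2 / 0.5 / 0.8 thresholds."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label chosen here; not used in the paper


def same_direction(d_model, d_human):
    """True when the model's bias points the same way as the human IAT's."""
    return d_model * d_human > 0


# Values taken from the Insect-Flower and Race rows of Table 1.
print(magnitude(1.69), magnitude(0.34), same_direction(-0.62, 0.86))
```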

Gender Biases

There are statistically significant gender biases in both models, though not for both stereotypes we tested. In the Gender-Career test, which measures the relative association of the category “male” with career attributes like “business” and “office” and the category “female” with family-related attributes like “children” and “home,” embeddings extracted from both models exhibit significant bias (iGPT effect size d = 0.62; SimCLRv2 effect size d = 0.74). This finding parallels the observation by Kay et al. [43] that image search results for powerful occupations like CEO systematically under-represented women. In the Gender-Science test, which measures the association of “male” with “science” attributes like math and engineering and “female” with “liberal arts” attributes like art and writing, only iGPT displays significant bias (effect size d = 0.44, p = 0.02).

Other Biases

We attempt to replicate several other tests measuring weight stereotypes and attitudes towards the elderly or people with disabilities. iGPT displays an additional bias (effect size d = 1.67) towards the association of thin people with pleasantness and overweight people with unpleasantness. We found no significant bias for the Native American or Asian American stereotype tests, the Disability valence test, or the Age valence test. For reference, the Age IAT has been successfully replicated with static word embeddings; the others have not been tested because they use solely image stimuli [12].

Likely, the target sample sizes for these tests are too low; all three of these tests use picture stimuli from the original IAT, which are all limited to fewer than 10 images. Replication with an augmented test set is left to future work. Note that lack of significance in a test, even if the sample size is sufficiently large, does not indicate the embeddings from either model are definitively bias-free. While these tests did not confirm known human biases regarding foreigners, people with disabilities, and the elderly, they also did not contradict any known human-like biases.

7.2 Intersectional Biases

Intersectional Valence

Intersectional valence tests with the iGPT embeddings are the most consistent with social psychology, exhibiting results predicted by the intersectionality, race, and gender hypotheses listed in Section 3 [30]. Overall, iGPT embeddings contain a positive valence bias towards White people and a negative valence bias towards Black people (effect size d = 1.16), as in the human Race IAT [56]. As predicted by the race hypothesis, the same bias is significant but less severe for both White males vs. Black males (iGPT effect size d = 0.88) and White males vs. Black females (iGPT effect size d = 0.83), and the White female vs. Black female bias is insignificant; race biases in general are more similar to the race biases between men. We hypothesize that, as in text corpora, computer vision datasets are dominated by the majority social groups of men and White people.

As predicted by the gender hypothesis, our results also conform with the theory that females are associated with positive valence when compared to males [23], but only when those groups are White (iGPT effect size d = 0.79); there is no significant valence bias for Black females vs. Black males. This insignificant result might be due to the under-representation of Black people in the visual embedding space. The largest differential valence bias of all our tests emerges between White females and Black males; White females are associated with pleasant valence and Black males with negative valence (iGPT effect size d = 1.46).

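| Test | Target X | Target Y | Attribute A | Attribute B | n_t | n_a | Effect size d | p-value |
|---|---|---|---|---|---|---|---|---|
| Gender-Career (MF) | Male | Female | Career | Family | 40 | 21 | 0.81 | |
| Gender-Career (WMBF) | White Male | Black Female | Career | Family | 20 | 21 | 0.20 | 0.27 |
| Gender-Career (WMBM) | Black Male | White Male | Career | Family | 20 | 21 | 0.89 | |
| Gender-Career (WMWF) | White Male | White Female | Career | Family | 20 | 21 | 0.97 | |
| Gender-Science (MF) | Male | Female | Science | Liberal Arts | 40 | 21 | -0.00 | 0.50 |
| Gender-Science (WMBF) | White Male | Black Female | Science | Liberal Arts | 20 | 21 | 0.80 | |
| Gender-Science (WMBM) | White Male | Black Male | Science | Liberal Arts | 20 | 21 | 0.49 | 0.06 |
| Gender-Science (WMWF) | White Male | White Female | Science | Liberal Arts | 20 | 21 | -0.37 | 0.88 |
| Valence (BFBM) | Black Female | Black Male | Pleasant | Unpleasant | 20 | 55 | 0.17 | 0.29 |
| Valence (BW) | White | Black | Pleasant | Unpleasant | 40 | 55 | 1.16 | |
| Valence (FM) | Female | Male | Pleasant | Unpleasant | 40 | 55 | 0.39 | 0.04 |
| Valence (WFBF) | White Female | Black Female | Pleasant | Unpleasant | 20 | 55 | 1.51 | |
| Valence (WFBM) | White Female | Black Male | Pleasant | Unpleasant | 20 | 55 | 1.46 | |
| Valence (WMBF) | White Male | Black Female | Pleasant | Unpleasant | 20 | 55 | 0.83 | |
| Valence (WMBM) | White Male | Black Male | Pleasant | Unpleasant | 20 | 55 | 0.88 | |
| Valence (WMWF) | White Female | White Male | Pleasant | Unpleasant | 20 | 55 | 0.79 | |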
Table 2: iEAT tests for the association between intersectional groups X vs. Y (represented by n_t images each) and attributes A vs. B (represented by n_a images each) in embeddings produced by an unsupervised model. Association effect sizes d, colored by conventional small (0.2), medium (0.5), and large (0.8) size, are reported alongside permutation p-values.

Gender Stereotypes

We find significant but contradictory intersectional differences in gender stereotypes (Table 2). For Gender-Career stereotypes, the iGPT-encoded bias for White males vs. Black females is insignificant, though there is a bias (effect size d = 0.81) for male vs. female in general. There is significant Gender-Career stereotype bias between embeddings of White males vs. White females (iGPT effect size d = 0.97), even higher than the general case; this result conforms to the race hypothesis, which predicts gender stereotypes are more similar to the stereotypes between Whites than between Blacks. The career-family bias between White males and Black males is reversed; embeddings for images of Black males are more associated with career and images of White males with family (iGPT effect size d = 0.89). One explanation for this result is under-representation; there are likely fewer photos depicting Black men with non-stereotypical male attributes.

Unexpectedly, the intersectional test of male vs. female (with equal representation for White and Black people) reports no significant Gender-Science bias, though the normative test (with unequal representation) does (Table 1). Nevertheless, race-science stereotypes do emerge when White males are compared to Black males (iGPT effect size d = 0.49, p = 0.06) and, to an even greater extent, when White males are compared to Black females (iGPT effect size d = 0.80), confirming the intersectional hypothesis [30]. But visual Gender-Science biases do not conform to the race hypothesis; the gender stereotype between White males and White females is insignificant, though the overall male vs. female bias is not.

7.3 Origins of Bias

Bias in Web Images

Do these results correspond with our hypothesis that biases are learned from the co-occurrence of social group members with certain stereotypical or high-valence contexts? Both our models were pre-trained on ImageNet, which is composed of images collected from Flickr and other Internet sites [64]. Yang et al. [78] show that the ImageNet categories unequally represent race and gender; for instance, the “groom” category may contain mostly White people. Under-representation in the training set could explain why, for instance, White people are more associated with pleasantness and Black people with unpleasantness. There is a similar theory in social psychology: most bias takes the form of in-group favoritism, rather than out-group derogation [40]. In image datasets, favoritism could take the form of unequal representation and have similar effects. For example, one of the exemplars for “pleasantness” is “wedding,” a positive-valence, high-imagery word [6]; if White people appear with wedding paraphernalia more often than Black people, they could be automatically associated with a concept like “pleasantness,” even though no explicit labels for “groom” and “White” are available during training.

Likewise, the portrayal of different social groups in context may be automatically learned by unsupervised image models. Wang et al. [73] find that in OpenImages (also scraped from Flickr) [46], a similar benchmark classification dataset, a higher proportion of “female” images than “male” images are set in the scene “home or hotel,” while “male” images are more often depicted in “industrial and construction” scenes. This difference in portrayal could account for the Gender-Career biases embedded in unsupervised image embeddings. In general, if the portrayal of people in Internet images reflects human social biases that are documented in cognition and language, we conclude that unsupervised image models could automatically learn human-like biases from large collections of online images.

Disparate Bias Across Model Layers

Model design choices might also have an effect on how social bias is learned in visual embeddings. We find that embedded social biases vary not only between models pre-trained on the same data but also within layers of the same model. In addition to the high quality embeddings extracted from the middle of the model, we tested embeddings extracted at the next-pixel logistic prediction layer of iGPT (Appendix C). We found that unlike the high quality embeddings reported in Table 1, next-pixel prediction embeddings do not exhibit the baseline Insect-Flower valence bias and only encode significant bias at the p < 0.1 level for the Gender-Science and Sexuality IATs. To explain this difference in behavior, recall that the neural network used in iGPT learns different levels of abstraction at each layer; as an example, imagine that the first layer encodes lighting particularly well, while the second layer begins to encode curves. The contradiction between biases in the middle layers and biases in the projection head is consistent with two previous findings: 1) bias is encoded disparately across the layers of unsupervised pre-trained models, as Bommasani et al. [9] show in the language domain; 2) in transformer models, the highest quality features for image classification, and possibly also social bias prediction, are found in the middle of the base network [15]. Evidently, bias depends not only on the training data but also on the choice of model.

Bias in Autoregression

Though the next-pixel prediction features contained very little significant bias, they may still propagate stereotypes in practice. As a qualitative case study, we collected indoor and outdoor portraits of male and female people using the search terms {“male”, “female”} combined with {“outdoor portrait,” “indoor portrait”}. We cropped the portraits below the neck and used iGPT to generate 8 different completions (with the temperature hyperparameter set as in Chen et al. [15]). We found that iGPT completions of regular, business-like indoor and outdoor portraits of clothed women and men often feature large breasts and bathing suits: in 6 of the 10 portraits we tested, at least one of the 8 completions featured a bikini or low-cut top. This behavior could be a result of an increase in the sexualized portrayal of people, especially women, in images [31] and serves as a reminder of computer vision’s controversial history with Playboy centerfolds and objectifying images [42]. Figure 3 shows how the incautious and unethical application of a generative model like iGPT could produce fake, sexualized depictions of women (in this case, a politician).

(a) Cropped image of Alexandria Ocasio-Cortez (U.S. Rep., NY) in business attire sitting indoors. Photograph by David Pexton [1].
(b) 8 random autoregressive completions of the cropped images. The majority of images depict her in a bikini or low-cut top.
Figure 3: Example of sexualized image completion with iGPT, pre-trained on ImageNet.

8 Discussion

By testing for bias in unsupervised models pre-trained on a widely used large computer vision dataset, we show how biases may be learned automatically from images and embedded in general-purpose representations. Not only do we observe human-like biases in the majority of our tests, we also detect 4 of the 5 human biases previously replicated in natural language [12]. Caliskan et al. [12] show that artifacts of the societal status quo, such as occupational gender statistics, are imprinted in online text and mimicked by machines. We suggest that a similar phenomenon is occurring for online images. One possible culprit is confirmation bias [65], the tendency of individuals to consume and produce content close to the norm. In fact, self-supervised models exhibit the same tendency [3].

In addition to confirming human and natural language machine biases in the image domain, the iEAT measures visual biases that may implicitly affect humans and machines but cannot be captured in text corpora. Foroni and Bel-Bahar [26] conjecture that in humans, picture-IATs and word-IATs measure different mental processes. More research is needed to explore biases embedded in images and investigate their origins, as Brunet et al. [10] suggest for language models. Tenney et al. [70] show that contextual representations learn syntactic and semantic features from the context. Voita et al. [72] explain the change of vector representations among layers based on the compression/prediction trade-off perspective. Advances in this direction would contribute to our understanding of the causal factors behind visual perception and biases related to cognition and language acquisition.

Our methods come with some limitations. The biases we measure are in large part due to patterns learned from the pre-training data, but ImageNet 2012 does not necessarily represent the entire population of images currently produced and circulated on the Internet. Additionally, ImageNet 2012 is intended for object detection, not fine-grained distinction between people, and both our models were validated for non-person object classification. Recently, Yang et al. [78] proposed updates to improve fairness and representation in the ImageNet “person” category. Given the financial and carbon costs of the computation required to train highly parameterized models like iGPT, we did not train these neural models on the same large-scale corpora to compare how a model’s architecture learns biases. Complementary iEAT bias testing with unsupervised models pre-trained on an updated version of ImageNet could help quantify the effectiveness of dataset de-biasing strategies.

A model like iGPT, pre-trained on a more comprehensive private dataset from a platform like Instagram or Facebook,15 could encode much more information about contemporary social biases. Clearview AI reportedly scraped over 3 billion images from Facebook, YouTube, and millions of other sites for their face recognition model [41]. One recent preprint submitted to ICLR 2021 [2] trained a very similar transformer model on Google’s JFT-300M, a 300 million image dataset scraped from the web [66]. Further research is needed to determine how architecture choices affect embedded biases and how dataset filtering and balancing techniques might help [75, 74]. Previous metric-based and adversarial approaches generally require labeled datasets [74, 73, 75]. Our method overcomes the limitations of requiring laborious manual labeling.

Though models like these may be useful for quantifying contemporary social biases as they are portrayed in vast quantities of images on the Internet, our results suggest the use of unsupervised pre-training on images at scale is likely to propagate harmful biases. Given the high computational and carbon cost of model training at scale, transfer learning with pre-trained models is an attractive option for practitioners. But our results indicate that patterns of stereotypical portrayal of social groups do affect unsupervised models, so careful research and analysis is needed before these models make consequential decisions about individuals and society. Our method can be used to assess task-agnostic biases contained in a dataset to enhance transparency [28, 51], but bias mitigation for unsupervised transfer learning is a challenging open problem.

9 Conclusions

We develop a principled method for measuring bias in unsupervised image models, adapting embedding association tests used in the language domain. With image embeddings extracted by state-of-the-art unsupervised image models pre-trained on ImageNet, we successfully replicate validated bias tests in the image domain and document several social biases. Our results suggest that unsupervised image models learn human biases from the way people are portrayed in images on the web. These findings serve as a caution for computer vision practitioners using transfer learning: pre-trained models may embed all types of harmful human biases from the way people are portrayed in training data, and model design choices determine whether and how those biases are propagated into harms downstream.

Appendix A Attribute Words

We selected the following words for high/low valence and high imagery from the scores collected by Bellezza et al. [6] in a laboratory experiment.

Positive words: baby, ocean, beach, butterfly, gold, rainbow, sunset, money, diamond, flower, sunrise
Negative words: devil, morgue, slum, corpse, coffin, jail, roach, funeral, prison, vomit, crash

Appendix B Stimuli sources

For every verbal stimulus used to collect image stimuli for the verbal and mixed-mode IATs, we recorded the verbal stimulus (word or phrase), search terms used to collect images, and the number of images collected in a CSV file along with our code at

Appendix C Tests with next-pixel features

In addition to the high-quality image classification features extracted from iGPT, we tested embeddings from the final layer of the model used to solve the next-pixel prediction task, as depicted in Figure 3. This logit layer, when taken as a set of probabilities with softmax or a similar function, is used to solve the next-pixel prediction task for unconditional image generation and conditional image completion [15]. Table 3 reports the iEAT test results for these embeddings, which did not display the same correspondence with human bias as the embeddings for image classification.
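To make the role of the logit layer concrete, the sketch below turns a vector of logits into probabilities with a temperature-scaled softmax and samples the next discrete pixel value from them. It is a generic illustration, not iGPT's implementation: the function names are hypothetical, the three-entry logit vector is a toy stand-in for the model's output over its pixel vocabulary, and the temperature values are arbitrary.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_pixel(logits, temperature, rng):
    """Draw the index of the next pixel value from the softmax distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
toy_logits = [2.0, 1.0, 0.0]  # toy values, not real model output
print(softmax(toy_logits, temperature=1.0))
print(sample_next_pixel(toy_logits, temperature=1.0, rng=rng))
```

Repeating the sampling step pixel by pixel, each time conditioning on what has been generated so far, is what produces an image completion; different random draws yield different completions of the same cropped input.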

| Test | Target X | Target Y | Attribute A | Attribute B | n_t | n_a | Effect size d | p-value |
|---|---|---|---|---|---|---|---|---|
| Age | Young | Old | Pleasant | Unpleasant | 6 | 55 | 0.38 | 0.38 |
| Arab-Muslim | Other | Arab-Muslim | Pleasant | Unpleasant | 10 | 55 | 0.06 | 0.42 |
| Asian | European American | Asian American | American | Foreign | 6 | 6 | 0.25 | 0.36 |
| Disability | Disabled | Abled | Pleasant | Unpleasant | 4 | 55 | -0.65 | 0.76 |
| Gender-Career | Male | Female | Career | Family | 40 | 21 | 0.04 | 0.44 |
| Gender-Science | Male | Female | Science | Liberal Arts | 40 | 21 | 0.37 | 0.06 |
| Insect-Flower | Flower | Insect | Pleasant | Unpleasant | 35 | 55 | -0.32 | 0.91 |
| Native | European American | Native American | U.S. | World | 8 | 5 | 0.32 | 0.26 |
| Race | European American | African American | Pleasant | Unpleasant | 6 | 55 | -0.17 | 0.62 |
| Religion | Christianity | Judaism | Pleasant | Unpleasant | 7 | 55 | 0.29 | 0.30 |
| Sexuality | Gay | Straight | Pleasant | Unpleasant | 9 | 55 | 0.69 | 0.08 |
| Skin-Tone | Light | Dark | Pleasant | Unpleasant | 7 | 55 | 0.42 | 0.36 |
| Weapon | White | Black | Tool | Weapon | 6 | 7 | -1.64 | 1.00 |
| Weapon (Modern) | White | Black | Tool | Weapon | 6 | 9 | -1.19 | 0.98 |
| Weight | Thin | Fat | Pleasant | Unpleasant | 10 | 55 | -0.84 | 0.97 |
Originally a picture-IAT (image-only stimuli). Originally a mixed-mode IAT (image and verbal stimuli).
Table 3: iEAT tests for the association between target concepts X vs. Y (represented by n_t images each) and attributes A vs. B (represented by n_a images each) in embeddings for iGPT next-pixel prediction. Association effect sizes d, colored by conventional small (0.2), medium (0.5), and large (0.8) size, are reported alongside permutation p-values.


  1. In fact, Nosek and Banaji [55] point out that images (e.g. faces) may be better exemplars because they do not suffer from certain confounding effects in words (e.g. infrequent names).
  2. We use an exact, non-parametric permutation test over all possible partitions. There are no normality assumptions about the distribution of the null hypothesis.
  3. Both models were tested on the Tensorflow version of ILSVRC 2012, available at
  4. [16] conjecture that is likely to strip useful information, such as object color and orientation, because it is invariant to data transformation.
  5. A few words were too abstract to be translated into image stimuli. These words are listed in Appendix B with a sample size of 0. Also, the Gender-Career IAT used specific male- and female-sounding names, rather than general exemplars like “man” or “father” as in the Gender-Science IAT. We used the general exemplars for both tests.
  6. In the original IATs, the target and attribute set sizes n_t and n_a (ranging from 5-15 stimuli each) are low enough to obtain statistically significant results. To account for varying portrayals for each verbal stimulus, we attempted to collect representative images for each individual stimulus, resulting in total category sizes n_t and n_a of 30-50 each. The precise number of image stimuli collected for each corresponding verbal stimulus is recorded in Appendix B.
  7. iGPT was validated with linear evaluation on CIFAR-100, indicating that the high-quality features are externally predictive of images from this dataset. Only 3 of over 105 IAT verbal stimuli appear in CIFAR-100; the rest were collected with Google Image Search.
  8. Some verbal stimuli (e.g. “salary”) are difficult to express visually without the use of symbols (e.g. a picture of cash). In these cases, we collected only the first image (n = 1) that meets the criteria, preferring image stimuli corresponding to other, more visual cues and representations.
  9. This property is common to all the picture-IAT stimuli [56].
  10. If no images with white backgrounds appeared in the first 50 results, we searched for “[stimulus] + {white, plain} background.”
  11. For example, for the Gender-Career test, which includes the “family” attribute, we chose only images of families with equal numbers of men and women.
  12. When trained on ImageNet, features extracted from iGPT can be used to train a linear image classifier with 88.5% accuracy on CIFAR-100; SimCLRv2 embeddings reach 89% accuracy [15].
  13. For all tests, significance could be increased by including more stimuli, at the risk of diluting the test set with less-representative images from farther down in the search results.
  14. Note that due to the original IAT design [56], some racial bias tests examine “European Americans” vs. “African Americans” while others test for “White” vs. “Black.” These are technically different categories, but both test for racial bias.
  15. The largest version of iGPT (not publicly available) was pre-trained on 100 million additional unlabeled web images [15].
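
The exact permutation test mentioned in footnote 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a WEAT-style differential association statistic built from cosine similarities (in the spirit of [12]), and the function names `cosine`, `association`, and `exact_permutation_p_value` are hypothetical. The sketch enumerates every equal-size split of the pooled target embeddings and reports the fraction whose statistic strictly exceeds the observed one.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def association(w, A, B):
    """Mean similarity of w to attribute set A minus mean similarity to B."""
    return (sum(cosine(w, a) for a in A) / len(A)
            - sum(cosine(w, b) for b in B) / len(B))


def exact_permutation_p_value(X, Y, A, B):
    """One-sided p-value over every split of X + Y into sets of size len(X).

    Assumed statistic (not taken from this section): the sum of
    associations over the first set minus the sum over the second set.
    """
    pooled = X + Y
    scores = [association(w, A, B) for w in pooled]
    total = sum(scores)

    # Observed statistic: the first len(X) pooled items are the original X.
    s_x = sum(scores[i] for i in range(len(X)))
    observed = s_x - (total - s_x)

    exceed = 0
    n_partitions = 0
    for idx in combinations(range(len(pooled)), len(X)):
        s_i = sum(scores[i] for i in idx)
        n_partitions += 1
        if s_i - (total - s_i) > observed:
            exceed += 1
    return exceed / n_partitions


# Toy 2-D embeddings (made up for illustration only).
X = [[1.0, 0.0], [0.9, 0.1]]
Y = [[0.0, 1.0], [0.1, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
p_aligned = exact_permutation_p_value(X, Y, A, B)
p_reversed = exact_permutation_p_value(Y, X, A, B)
```

Because the number of equal-size partitions grows combinatorially with the set sizes, full enumeration as written here is only practical for small toy sets.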


  1. (2019-07) Alexandria Ocasio-Cortez on the 2020 Presidential Race and Trump’s Crisis at the Border. The New Yorker. External Links: Link Cited by: 2(a).
  2. Anonymous (2021) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Submitted to International Conference on Learning Representations, External Links: Link Cited by: §8.
  3. E. Arazo, D. Ortego, P. Albert, N. E. O’Connor and K. McGuinness (2020) Pseudo-labeling and confirmation bias in deep semi-supervised learning. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Cited by: §8.
  4. P. Bachman, R. D. Hjelm and W. Buchwalter (2019) Learning Representations by Maximizing Mutual Information Across Views. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox and R. Garnett (Eds.), pp. 15535–15545. External Links: Link Cited by: §1.
  5. C. Basta, M. R. Costa-Jussà and N. Casas (2019) Evaluating the underlying gender bias in contextualized word embeddings. arXiv preprint arXiv:1904.08783. Cited by: §1, §2.
  6. F. S. Bellezza, A. G. Greenwald and M. R. Banaji (1986-05) Words high and low in pleasantness as rated by male and female college students. Behavior Research Methods, Instruments, & Computers 18 (3), pp. 299–303. External Links: Link, Document, ISSN 07433808 Cited by: Appendix A, §5.2, §7.3.1.
  7. S. L. Blodgett, S. Barocas, H. Daumé III and H. Wallach (2020) Language (Technology) is Power: A Critical Survey of “Bias” in NLP. arXiv preprint arXiv:2005.14050. Cited by: §1.
  8. T. Bolukbasi, K. Chang, J. Y. Zou, V. Saligrama and A. T. Kalai (2016) Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon and R. Garnett (Eds.), pp. 4349–4357. External Links: Link Cited by: §1.
  9. R. Bommasani, K. Davis and C. Cardie (2020-07) Interpreting Pretrained Contextualized Representations via Reductions to Static Embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4758–4781. External Links: Link, Document Cited by: §2, §2, §7.3.2.
  10. M. Brunet, C. Alkalay-Houlihan, A. Anderson and R. Zemel (2019-05) Understanding the Origins of Bias in Word Embeddings. In Proceedings of the 36th International Conference on Machine Learning, pp. 803–811. External Links: Link, ISSN 2640-3498 Cited by: §1, §8.
  11. J. Buolamwini and T. Gebru (2018) Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Proceedings of Machine Learning Research, S. A. Friedler and C. Wilson (Eds.), Vol. 81, pp. 1–15. External Links: Link Cited by: §1.
  12. A. Caliskan, J. J. Bryson and A. Narayanan (2017) Semantics Derived Automatically from Language Corpora Contain Human-like Biases. Science 356 (6334), pp. 183–186. External Links: Link, Document Cited by: 3rd item, §1, §1, §2, §2, §2, §3.3, §3, §7.1.4, §8.
  13. N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov and S. Zagoruyko (2020) End-to-End Object Detection with Transformers. arXiv preprint arXiv:2005.12872. Cited by: §1.
  14. G. H. Chen (2020-07) Deep Kernel Survival Analysis and Subject-Specific Survival Time Prediction Intervals. External Links: Link Cited by: 2nd item, §1, §1.
  15. M. Chen, A. Radford, R. Child, J. Wu, H. Jun, P. Dhariwal, D. Luan and I. Sutskever (2020) Generative Pretraining from Pixels. In Proceedings of the 37th International Conference on Machine Learning, Cited by: Appendix C, 2nd item, §1, §1, §3, §3, §4.1, §4.1, §4.1, §4.0.2, §7.3.2, §7.3.3, footnote 12, footnote 15.
  16. T. Chen, S. Kornblith, M. Norouzi and G. Hinton (2020) A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria. External Links: Link Cited by: §1, §1, §3, §4.2, §4.2, §4.2, §4.0.2, §7.1.1, footnote 4.
  17. T. Chen, S. Kornblith, K. Swersky, M. Norouzi and G. Hinton (2020) Big self-supervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029. Cited by: §4.2, §4.2.
  18. K. Crenshaw (1990) Mapping the margins: Intersectionality, identity politics, and violence against women of color. Stan. L. Rev. 43, pp. 1241. Cited by: §3.2.
  19. M. De-Arteaga, A. Romanov, H. Wallach, J. Chayes, C. Borgs, A. Chouldechova, S. Geyik, K. Kenthapadi and A. T. Kalai (2019) Bias in bios: A case study of semantic representation bias in a high-stakes setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 120–128. Cited by: §2.
  20. I. D. Raji, T. Gebru, M. Mitchell, J. Buolamwini, J. Lee and E. Denton (2020) Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing. 7. External Links: Link, ISBN 9781450371100, Document Cited by: §1.
  21. J. Devlin, M. Chang, K. Lee and K. Toutanova (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1.
  22. J. Donahue and K. Simonyan (2019) Large scale adversarial representation learning. In Advances in Neural Information Processing Systems, pp. 10542–10552. Cited by: §1.
  23. A. H. Eagly, A. Mladinic and S. Otto (1991) Are women evaluated more favorably than men?: An analysis of attitudes, beliefs, and emotions. Psychology of Women Quarterly 15 (2), pp. 203–216. Cited by: §7.2.1.
  24. D. Erhan, A. Courville, Y. Bengio and P. Vincent (2010) Why does unsupervised pre-training help deep learning?. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 201–208. Cited by: §1.
  25. D. Erhan, P. Manzagol, Y. Bengio, S. Bengio and P. Vincent (2009) The difficulty of training deep architectures and the effect of unsupervised pre-training. In Artificial Intelligence and Statistics, pp. 153–160. Cited by: §1.
  26. F. Foroni and T. Bel-Bahar (2010-03) Picture-IAT versus word-IAT: Level of stimulus representation influences on the IAT. European Journal of Social Psychology 40 (2), pp. 321–337. External Links: Document, ISSN 00462772 Cited by: §8.
  27. N. Garg, L. Schiebinger, D. Jurafsky and J. Zou (2018-04) Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences of the United States of America 115 (16), pp. E3635–E3644. External Links: Link, Document, ISSN 10916490 Cited by: §1.
  28. T. Gebru, J. Morgenstern, B. Vecchione, J. W. Vaughan, H. Wallach, H. Daumé III and K. Crawford (2018) Datasheets for datasets. arXiv preprint arXiv:1803.09010. Cited by: §8.
  29. A. Geiger, P. Lenz and R. Urtasun (2012) Are we ready for autonomous driving? the KITTI vision benchmark suite. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3354–3361. External Links: ISBN 9781467312264, Document, ISSN 10636919 Cited by: §1.
  30. N. Ghavami and L. A. Peplau (2013) An intersectional analysis of gender and ethnic stereotypes: Testing three hypotheses. Psychology of Women Quarterly 37 (1), pp. 113–127. Cited by: 4th item, §3.2, §6, §7.2.1, §7.2.2.
  31. K. A. Graff, S. K. Murnen and A. K. Krause (2013) Low-cut shirts and high-heeled shoes: Increased sexualization across time in magazine depictions of girls. Sex roles 69 (11-12), pp. 571–582. Cited by: §7.3.3.
  32. A. G. Greenwald, D. E. McGhee and J. L. Schwartz (1998-06) Measuring Individual Differences in Implicit Cognition: The Implicit Association Test. Journal of Personality and Social Psychology 74 (6), pp. 1464–80. External Links: Link Cited by: 3rd item, §2, §2, Figure 2, §3.1, §3.3, §5.2, §5, §6.
  33. A. G. Greenwald, T. A. Poehlman, E. L. Uhlmann and M. R. Banaji (2009) Understanding and using the Implicit Association Test: III. Meta-analysis of predictive validity.. Journal of personality and social psychology 97 (1), pp. 17. Cited by: §3.1.
  34. A. G. Greenwald, B. A. Nosek and M. R. Banaji (2003-08) Understanding and Using the Implicit Association Test: I. An Improved Scoring Algorithm. Journal of Personality and Social Psychology 85 (2), pp. 197–216. External Links: Link, Document, ISSN 00223514 Cited by: §2, §3.1, §5.
  35. W. Guo and A. Caliskan (2020) Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases. arXiv preprint arXiv:2006.03955. Cited by: §1, §2, §2.
  36. D. Harwell (2019-11) A face-scanning algorithm increasingly decides whether you deserve the job. Washington. External Links: Link Cited by: Figure 1, §1, §1.
  37. K. He, H. Fan, Y. Wu, S. Xie and R. Girshick (2020) Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738. Cited by: §1.
  38. K. He, X. Zhang, S. Ren and J. Sun (2016) Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. External Links: Link Cited by: §4.2.
  39. L. A. Hendricks, K. Burns, K. Saenko, T. Darrell and A. Rohrbach (2018) Women Also Snowboard: Overcoming Bias in Captioning Models. CoRR. External Links: ISBN 1803.09797v3, Document Cited by: §1.
  40. M. Hewstone, M. Rubin and H. Willis (2002) Intergroup bias. Annual review of psychology 53 (1), pp. 575–604. Cited by: §7.3.1.
  41. K. Hill (2020-01) The Secretive Company That Might End Privacy as We Know It. External Links: Link Cited by: §1, §8.
  42. C. Iozzio (2016-02) The Playboy Centerfold That Revolutionized Image-Processing Research. The Atlantic. External Links: Link Cited by: §7.3.3.
  43. M. Kay, C. Matuszek and S. A. Munson (2015) Unequal Representation and Gender Stereotypes in Image Search Results for Occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems - CHI ’15, New York, New York, USA, pp. 3819–3828. External Links: Link, ISBN 9781450331456, Document Cited by: §1, §1, §5, §7.1.3.
  44. A. Krizhevsky, G. Hinton and others (2009) Learning multiple layers of features from tiny images. Cited by: §4.0.1, item 1, §5.
  45. K. Kurita, N. Vyas, A. Pareek, A. W. Black and Y. Tsvetkov (2019-09) Measuring Bias in Contextualized Word Representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, Florence, Italy, pp. 166–172. External Links: Link, Document Cited by: §2.
  46. A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci, T. Duerig and others (2018) The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982. Cited by: §7.3.1.
  47. V. Manjunatha, N. Saini and L. Davis (2019-02) Explicit Bias Discovery in Visual Question Answering Models. pp. 9554–9563. External Links: Document Cited by: §1.
  48. C. May, A. Wang, S. Bordia, S. R. Bowman and R. Rudinger (2019) On Measuring Social Biases in Sentence Encoders. In Proceedings of the 2019 Conference of the North, Stroudsburg, PA, USA, pp. 622–628. External Links: Link, Document Cited by: §2, §2, §3.3, §4.2.
  49. T. Mikolov, K. Chen, G. Corrado and J. Dean (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. Cited by: §2.
  50. I. Misra and L. Van Der Maaten (2020) Self-Supervised Learning of Pretext-Invariant Representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6707–6717. Cited by: §1.
  51. M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji and T. Gebru (2019) Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency, pp. 220–229. Cited by: §8.
  52. F. Nex and F. Remondino (2014) UAV for 3D mapping applications: A review. Vol. 6, Springer Verlag. External Links: Document, ISSN 1866928X Cited by: §1.
  53. B. A. Nosek, A. G. Greenwald and M. R. Banaji (2007) The Implicit Association Test at Age 7: A Methodological and Conceptual Review. In Automatic processes in social thinking and behavior, J. A. Bargh (Ed.), pp. 265–292. Cited by: §2.
  54. B. A. Nosek, M. R. Banaji and A. G. Greenwald (2002) Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dynamics 6 (1), pp. 101–115. External Links: Link, Document, ISSN 10892699 Cited by: §3.1, §5.
  55. B. A. Nosek and M. R. Banaji (2001-12) The GO/NO-GO Association Task. Social Cognition 19 (6), pp. 625–664. External Links: Link, Document, ISSN 0278016X Cited by: §2, footnote 1.
  56. B. A. Nosek, F. L. Smyth, J. J. Hansen, T. Devos, N. M. Lindner, K. A. Ranganath, C. T. Smith, K. R. Olson, D. Chugh, A. G. Greenwald and M. R. Banaji (2007-11) Pervasiveness and correlates of implicit attitudes and stereotypes. European Review of Social Psychology 18 (1), pp. 36–88. External Links: Link, Document, ISSN 1046-3283 Cited by: §3.1, §5, §7.2.1, Table 1, footnote 14, footnote 9.
  57. J. Pennington, R. Socher and C. D. Manning (2014) GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543. Cited by: §2.
  58. M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee and L. Zettlemoyer (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365. Cited by: §1.
  59. A. Radford, J. Wu, R. Child, D. Luan, D. Amodei and I. Sutskever (2019) Language models are unsupervised multitask learners. OpenAI Blog 1 (8), pp. 9. Cited by: §1, §1, §4.1.
  60. M. Raghavan and S. Barocas (2019) Challenges for mitigating bias in algorithmic hiring. URL https://www.brookings.edu/research/challenges-for-mitigating-bias-in-algorithmic-hiring. Cited by: §2.
  61. M. Raghavan, S. Barocas, J. Kleinberg and K. Levy (2020-01) Mitigating bias in algorithmic hiring: Evaluating claims and practices. In FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 469–481. External Links: Link, ISBN 9781450369367, Document Cited by: §1.
  62. B. Recht, R. Roelofs, L. Schmidt and V. Shankar (2019) Do ImageNet classifiers generalize to ImageNet?. arXiv preprint arXiv:1902.10811. Cited by: 3rd item.
  63. O. Russakovsky, J. Deng, Z. Huang, A. C. Berg and L. Fei-Fei (2013) Detecting avocados to zucchinis: what have we done, and where are we going?. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2064–2071. Cited by: §4.0.1.
  64. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg and L. Fei-Fei (2015-12) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision 115 (3), pp. 211–252. External Links: Link, Document, ISSN 0920-5691 Cited by: 2nd item, §1, §4.2, §4.0.1, §7.3.1.
  65. S. Schweiger, A. Oeberst and U. Cress (2014) Confirmation bias in web-based search: A randomized online study on the effects of expert information and social tags on information search and evaluation. Journal of Medical Internet Research 16 (3). External Links: Link, Document, ISSN 14388871 Cited by: §8.
  66. C. Sun, A. Shrivastava, S. Singh and A. Gupta (2017) Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE international conference on computer vision, pp. 843–852. Cited by: §1, §8.
  67. L. Sweeney (1997) Weaving technology and policy together to maintain confidentiality (vol 25, pg 2, 1997). Journal Of Law Medicine & Ethics 25 (4), pp. 327 (English). External Links: ISSN 1073-1105 Cited by: §1.
  68. C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang and C. Liu (2018) A survey on deep transfer learning. In International conference on artificial neural networks, pp. 270–279. Cited by: §1.
  69. Y. C. Tan and L. E. Celis (2019) Assessing Social and Intersectional Biases in Contextualized Word Representations. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox and R. Garnett (Eds.), pp. 13230–13241. External Links: Link Cited by: §1, §2, §2, §3.2.
  70. I. Tenney, P. Xia, B. Chen, A. Wang, A. Poliak, R. T. McCoy, N. Kim, B. Van Durme, S. R. Bowman, D. Das and others (2019) What do you learn from context? probing for sentence structure in contextualized word representations. arXiv preprint arXiv:1905.06316. Cited by: §8.
  71. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin (2017) Attention Is All You Need. In 31st Conference on Neural Information Processing Systems (NIPS), Cited by: §4.1.
  72. E. Voita, R. Sennrich and I. Titov (2019) The bottom-up evolution of representations in the transformer: A study with machine translation and language modeling objectives. arXiv preprint arXiv:1909.01380. Cited by: §8.
  73. A. Wang, A. Narayanan and O. Russakovsky (2020) REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets. In European Conference on Computer Vision, Cited by: §1, §7.3.1, §8.
  74. T. Wang, J. Zhao, M. Yatskar, K. Chang and V. Ordonez (2019) Balanced datasets are not enough: Estimating and mitigating gender bias in deep image representations. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5310–5319. Cited by: §8.
  75. Z. Wang, K. Qinami, I. C. Karakozis, K. Genova, P. Nair, K. Hata and O. Russakovsky (2020) Towards fairness in visual recognition: Effective strategies for bias mitigation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8919–8928. Cited by: §8.
  76. B. Wilson, J. Hoffman and J. Morgenstern (2019-02) Predictive Inequity in Object Detection. External Links: Link Cited by: §1.
  77. K. Xu, B. Nosek and A. Greenwald (2014-03) Data from the Race Implicit Association Test on the Project Implicit Demo Website. Journal of Open Psychology Data 2 (1), pp. e3. External Links: Link, Document Cited by: §5.1.
  78. K. Yang, K. Qinami, L. Fei-Fei, J. Deng and O. Russakovsky (2020) Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, New York, NY, USA, pp. 547–558. External Links: Link, ISBN 9781450369367, Document Cited by: §7.3.1, §8.
  79. J. Zhao, T. Wang, M. Yatskar, V. Ordonez and K. W. Chang (2017) Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings, pp. 2979–2989. External Links: ISBN 9781945626838, Document Cited by: §1, §2.