Discovering and Interpreting Conceptual Biases in Online Communities


Language carries implicit human biases, functioning both as a reflection and a perpetuation of stereotypes that people carry with them. Recently, ML-based NLP methods such as word embeddings have been shown to learn such language biases with striking accuracy. This capability of word embeddings has been successfully exploited as a tool to quantify and study human biases. However, previous studies only consider a predefined set of conceptual biases to attest (e.g., whether gender is more or less associated with particular jobs), or just discover biased words without helping to understand their meaning at the conceptual level. As such, these approaches are either unable to find conceptual biases that have not been defined in advance, or the biases they find are difficult to interpret and study. This makes existing approaches unsuitable to discover and interpret biases in online communities, as such communities may carry different biases than those in mainstream culture. This paper proposes a general, data-driven approach to automatically discover and help interpret conceptual biases encoded in word embeddings. We apply this approach to study the conceptual biases present in the language used in online communities and experimentally show the validity and stability of our method.

I.2 Artificial Intelligence, I.2.7 Natural Language Processing, O.8.15 Social science methods or tools, O.9 Ethical/Societal Implications

1 Introduction

Linguistic biases have been the focus of human language analysis for quite some time [24, 6, 25]. Recently, it has been shown that machine learning approaches applied to corpora of human language, such as word embeddings, learn human-like semantic biases [9, 12]. Beyond the value this has towards creating fairer Artificial Intelligence [5, 52, 48, 16], it has also enabled compelling methods for social scientists to study human language and biases [11, 54, 47, 18, 28]. That is, by looking at the biases learned by a machine learning model, it is possible to study human biases in more detail. For instance, by training word embeddings models and analyzing them, Garg et al. [18] were able to study the dynamics and the evolution of predefined biases related to gender and religion in the United States across 100 years.

Existing approaches to study biases in word embeddings (see Section 2 for details), however, have one of the following limitations. That is, either: i) they are only able to attest whether arbitrarily predefined biases exist or not, so they are unable to discover what the actual most salient biases might be in a model; or ii) they only discover biased words, which is of limited value when trying to make sense of the biases discovered and what they mean at a conceptual level. These limitations make current approaches unsuitable to study biases in the language of online communities. This is because such communities often exhibit biases distinct from those in mainstream culture [32], so anticipating potential biases in advance is difficult; and because understanding the meaning of those biases is needed to properly interpret and study them [1, 46, 13]. Studying the biases of online communities is very important to tackle the social problems they have been associated with, such as radicalization [32] and discrimination against some types of users, particularly those users with protected attributes such as gender, religion, ethnicity, social class, etc. [46, 3].

In this paper, we present a general, data-driven approach to discover conceptual linguistic biases in word embeddings. Concepts of interest, e.g. those related to protected attributes, such as gender, religion and the like, are used to discover the most frequent and biased words towards them (e.g. frequent words more biased towards ‘women’ than ‘men’). Importantly, and as the main contribution of our work, these words are then categorized through semantic clustering and semantic analysis, and the resulting clusters are labeled and ranked. All of this allows our method to help make sense of the most important biases at a conceptual level, both in general and in a more detailed manner, using the vocabulary of the online community studied.

The paper is structured as follows: Section 2 discusses related work and details the gap this paper addresses. Section 3 introduces the notion of bias strength in word embeddings, and the preliminaries on which our method is based. In Section 4, we present our approach to discovering and interpreting conceptual biases, together with the different bias rankings. In Section 5, we apply our method to discover conceptual biases in models trained using the corpora of two online communities (/r/TheRedPill, /r/Atheism) on the English-speaking discussion platform Reddit. We provide an extensive evaluation of our method in Section 6, showing the stability of the biases discovered and the effect of the different parameters of our model and those used to train the embeddings model, and demonstrating the validity of our method by applying it to the general-purpose Google News pre-trained model and comparing our method with arbitrarily predefined biases in that model attested by previous work. Finally, in Section 7, we finish with some concluding remarks and pointers to the available datasets, code and demo of the tool developed based on our method.

2 Related Work

In this section, we discuss the two main streams of work most closely related to this paper. One stream comprises works attesting predefined conceptual biases, and the other comprises works discovering biased words in word embeddings.

Works attesting conceptual biases in word embeddings measure the association between predefined concepts usually related to known stereotypes – for instance, whether men are more often associated with a professional career while women are more often associated with family [12, 18, 44, 55, 37, 31, 29, 28, 10]. One measure used to this end is the Word Embeddings Association Test (WEAT) [12], inspired by the Implicit Association Test (IAT) [22], which is widely used in psychology and the social sciences to study stereotypes [27]. In particular, all the concepts involved (e.g. men/women and career/family) are represented via sets of words, and WEAT compares distances in the word embeddings model between those sets of words using cosine similarity. While being able to attest biases between arbitrary, predefined concepts is of high value, it nonetheless requires all the involved concepts to be defined in advance, which may not be possible in online communities, as they may exhibit biases and stereotypes that are not based on those in mainstream culture [32]. In contrast, in our work, we only consider as input the attribute concepts of interest, e.g. men/women, and our method then discovers all other concepts that are most associated with these attribute concepts.

Works discovering biased words overcome some of the limitations of attesting by enumerating all the words that are biased towards others in a word embeddings model [9, 53, 45, 21, 8]. However, the resulting lists of biased words do not explain what these biases mean or to what extent they are important in the context of a community, both of which are needed to properly interpret and study the biases discovered in a community [1, 46, 13]. Our previous work in this direction is presented in [17], in which we define a system to identify a set of words biased towards certain concepts, and organize them in categories in order to compare their biases. Although the approach gives an idea of what are the most biased words towards certain concepts in a community, and to which semantic categories they belong, it has several limitations. First, the bias measure used does not consider the frequency when selecting the set of biased words, which means that some of the biases may be rare and unrepresentative of the community. Second, the threshold used to select biased words is manually determined by analyzing the distribution of bias in the community, which makes it difficult to automate. Third, the analysis of biases performed does not allow for a link between the conceptual biases and the actual language used by the community. A further limitation is that the work neither considered the potential impact of stability, nor did it recognize the effect of different parameters in the model.

In this paper, we combine both frequency and bias to automatically identify the most salient words in a community, that is, words that are both frequently used and strongly biased towards the concepts of interest. Afterwards, these words are aggregated based on their semantic similarity, by clustering them using the word embeddings model. The resulting semantically-similar clusters are analyzed in two ways. First, we categorize the major discourse field each of the clusters belongs to and provide a relative frequency of the fields. Second, clusters are ranked based on the strength of their bias, their frequency, and their sentiment to offer a more detailed view of the biases discovered and how they are expressed in the language of the community in a more or less biased, frequent, or positive/negative way. The semantic categorization and the different rankings provide a context that offers both a general and a more detailed view of the biases at the conceptual level over the different dimensions of bias and in an automatic manner.

3 Preliminaries

The first step to be able to discover conceptual biases towards the attribute concepts of interest (e.g. men/women) is to discover the specific words that are biased towards the attribute concepts. To do this, we need to measure the bias between words in a word embeddings model. Given a word embeddings model built from a text corpus, and two sets of words representing the attribute concepts (e.g. men/women) one wants to discover biases towards, a common approach in the literature is to leverage the cosine similarity between embeddings to identify words close to one attribute concept (e.g. men) and far from the other attribute concept (e.g. women), as we detail below.

Bias Strength. Let $S_1 = \{w_1, \dots, w_n\}$ and $S_2 = \{w'_1, \dots, w'_m\}$ be two sets of words that represent two different attribute concepts we want to discover biases towards (e.g. {she, daughter, her, …, mother} and {he, son, him, …, father}, describing the concepts women and men), and let $\vec{c}_1$ and $\vec{c}_2$ be the centroids of $S_1$ and $S_2$ respectively, estimated by averaging the embedding vectors of all words in each attribute concept. We say that a word $w$ is biased towards $S_1$ with respect to $S_2$ when the cosine similarity of its embedding $\vec{w}$ is higher with $\vec{c}_1$ than with $\vec{c}_2$:

$$\mathrm{Bias}(w, S_1, S_2) = \cos(\vec{w}, \vec{c}_1) - \cos(\vec{w}, \vec{c}_2) \qquad (1)$$

Larger absolute values of $\mathrm{Bias}$ correspond to stronger biases; positive values indicate that word $w$ is biased towards $S_1$, and negative values indicate that word $w$ is biased towards $S_2$. By applying the bias strength equation to the entire community’s vocabulary, one is able to discover all the words that are biased towards attribute concept $S_1$ with respect to $S_2$ and vice versa.
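As an illustration, the bias strength measure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the two-dimensional toy embeddings and the helper names (`cosine`, `centroid`, `bias`) are hypothetical and are not taken from our released code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def bias(word_vec, c1, c2):
    """Positive: closer to the first attribute concept; negative: closer to the second."""
    return cosine(word_vec, c1) - cosine(word_vec, c2)

# Hypothetical toy embeddings, for illustration only.
emb = {
    "she": [1.0, 0.0], "her": [0.9, 0.1],
    "he": [0.0, 1.0], "him": [0.1, 0.9],
    "nurse": [0.8, 0.2], "boss": [0.2, 0.8],
}
c1 = centroid([emb["she"], emb["her"]])  # centroid of the first attribute set
c2 = centroid([emb["he"], emb["him"]])   # centroid of the second attribute set
scores = {w: bias(emb[w], c1, c2) for w in ("nurse", "boss")}
```

With these toy vectors, the score of 'nurse' is positive and that of 'boss' is negative, mirroring the sign convention described above.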

4 Discovering and Interpreting Conceptual Biases

Beyond discovering individually biased words, as stated in the previous section, our approach aims to discover and help understand the concepts behind the biased words, which prior work does not do. To this aim, we follow two main steps.

In the first step, described in detail in Section 4.1, we consider not only the strength of bias but also its frequency, in order to elicit the most common and frequently biased words in a community. After this, we aggregate all these resulting words into conceptual biases, by clustering the most common biased words based on their embedding similarity.

In the second step, described in detail in Section 4.2, we focus on facilitating the interpretation of the conceptual biases. We do so in two main ways: i) by classifying every conceptual bias into a major discourse field, which further abstracts the meaning of the biases and facilitates a general understanding of the types of biases present in the community; and ii) by creating rankings of the conceptual biases based on the strength, frequency and sentiment of the biases discovered to offer a more detailed view and nuanced analysis.

4.1 Discovering Conceptual Biases

We are not only interested in the strength of the bias of the biased words, but also in whether biased words are commonly used, and hence frequent in the vocabulary. This has two main purposes: i) to focus on the representative biases of a community [1]; and ii) to avoid instability due to low-frequency words [2].

Bias Salience

Let $V$ be the set of words forming the vocabulary of a word embeddings model. We determine the salience of a word $w \in V$ towards an attribute set $S_1$ w.r.t. an attribute set $S_2$ by combining the normalized bias from Equation 1 with the normalized frequency rank of the word in the corpus, where the rank, denoted by $\mathrm{rank}(w)$, assigns the most frequent word rank 1 and the least frequent word rank $|V|$. Words with higher values of salience will be both frequent and biased towards the attribute set $S_1$ with respect to the attribute set $S_2$. Therefore, salience is computed as follows:


Even when knowing the strength and frequency of biased words, considering each of them as a separate unit is not enough to analyse and understand biases at a more conceptual level. There is a need to semantically combine related terms under broader rubrics in order to facilitate the comprehension of the biases. We start by selecting the most salient words, which are then aggregated through k-means clustering on the word embeddings.
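To make the notion of salience concrete, the following sketch combines a normalized bias with a normalized frequency rank. It is an illustration under loud assumptions rather than the exact formula: here the bias is min-max normalized over the vocabulary, the rank is mapped to a score in (0, 1] that is highest for the most frequent word, and the two are multiplied. The function name `salience_scores` and the toy values are hypothetical.

```python
def salience_scores(bias_by_word, freq_by_word):
    """Illustrative salience: normalized bias times normalized rank score.

    ASSUMPTION: min-max normalization and a product are used here only
    for illustration; other ways of combining the two terms are possible.
    """
    words = list(bias_by_word)
    n = len(words)
    # Rank 1 is the most frequent word, rank n the least frequent one.
    by_freq = sorted(words, key=lambda w: freq_by_word[w], reverse=True)
    rank = {w: i + 1 for i, w in enumerate(by_freq)}
    lo = min(bias_by_word.values())
    hi = max(bias_by_word.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all biases are equal
    scores = {}
    for w in words:
        norm_bias = (bias_by_word[w] - lo) / span
        norm_rank = 1.0 - (rank[w] - 1) / n  # 1.0 for rank 1, 1/n for rank n
        scores[w] = norm_bias * norm_rank
    return scores

# Hypothetical toy values: 'a' and 'b' are equally biased, 'a' is more frequent.
toy_bias = {"a": 0.5, "b": 0.5, "c": -0.2}
toy_freq = {"a": 100, "b": 10, "c": 1000}
sal = salience_scores(toy_bias, toy_freq)
```

In this toy example, 'a' is more salient than 'b' because it is more frequent, and 'c' receives the lowest score despite being the most frequent word, because it is the least biased towards the first attribute set.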

Most Salient Words

We order all words $w \in V$ by salience, using Equation 2, towards an attribute set $S_1$ with respect to an attribute set $S_2$ (e.g. the set of words representing women with respect to the set of words representing men), and select the most salient words to focus on the most prominent biases.

In order to do this, we consider the distribution of salience values for all the words in the vocabulary, with mean $\mu$ and standard deviation $\sigma$, and we take $\lambda$ standard deviations above the mean ($\mu + \lambda\sigma$) as the threshold to select the most salient words. We show later, in the extensive experiments described in Section 6.2, how our method behaves with different values of $\lambda$. That is, we select the words biased towards each attribute set with salience greater than or equal to $\mu + \lambda\sigma$. All the words in the vocabulary whose salience towards $S_1$ with respect to $S_2$ reaches this threshold form the set of salient words $W_1$. In a similar manner, we create the set of salient words $W_2$ by considering salience towards $S_2$ with respect to $S_1$. These words are then used for the semantic clustering, as detailed below.
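The threshold-based selection can be sketched as follows. This is a minimal illustration: the function name `most_salient` is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
import statistics

def most_salient(salience, n_std):
    """Return the words whose salience is at least mean + n_std * std.

    ASSUMPTION: the population standard deviation is used.
    """
    values = list(salience.values())
    threshold = statistics.mean(values) + n_std * statistics.pstdev(values)
    return {w for w, s in salience.items() if s >= threshold}

# Hypothetical toy salience values: one clear outlier.
toy_salience = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 0.0, "e": 1.0}
selected = most_salient(toy_salience, n_std=1)
nothing = most_salient(toy_salience, n_std=3)
```

Raising the number of standard deviations makes the selection stricter: in the toy example, one standard deviation keeps only the outlier 'e', while three standard deviations keep no word at all.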

Semantic Word Clustering

For each set of most salient words, $W_1$ and $W_2$, and their associated embeddings (denoted by $\vec{W}_1$ and $\vec{W}_2$), we use k-means to aggregate the most semantically similar words into clusters, based on the distance between embeddings. We apply k-means setting the number of clusters $k$ to every value within the interval considered, and we select the partition that maximizes the silhouette score [41]. We repeat the clustering $r$ times with different random k-means initialisations in order to obtain the best partition; we show later experimentally how our method behaves with different values of $r$. Notice that the salient words in $W_1$ and $W_2$ are clustered separately, hence resulting in two different partitions, $K_1$ and $K_2$ respectively.
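The selection of the best partition can be illustrated with the following self-contained sketch, which pairs a plain Lloyd-style k-means with the mean silhouette score. It is a didactic simplification using only the standard library, not the implementation used in our experiments; the function names, the Euclidean distance, the fixed iteration budget and the toy points are all assumptions.

```python
import math
import random

def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kmeans(points, k, rng, iters=50):
    """Plain k-means with random initial centres; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the previous centre if a cluster becomes empty
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette score; singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def best_partition(points, k_values, repetitions, seed=0):
    """Try every k, several random starts each; keep the highest silhouette."""
    rng = random.Random(seed)
    best_score, best_labels = -2.0, None
    for k in k_values:
        for _ in range(repetitions):
            labels = kmeans(points, k, rng)
            score = silhouette(points, labels)
            if score > best_score:
                best_score, best_labels = score, labels
    return best_score, best_labels

# Hypothetical toy "embeddings": two well-separated groups of three points.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
score, labels = best_partition(toy_points, k_values=range(2, 5), repetitions=5)
```

On the toy points, the partition with two clusters obtains the highest silhouette score, since splitting either group further lowers the score of the points involved.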

After clustering, we apply WEAT [12] between all clusters from both partitions $K_1$ and $K_2$ and the attribute concepts $S_1$ and $S_2$, and only keep those clusters of each partition that return significant p-values towards their respective attribute concept when compared with all clusters from the other partition. In this way, we make sure that the conceptual biases represented in every cluster are relevant and strongly associated with each attribute concept.

The resulting sets of clusters for partitions $K_1$ and $K_2$ represent the most salient and representative conceptual biases towards the attribute concepts $S_1$ and $S_2$ in the community.
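For reference, the WEAT permutation test between two groups of target words and two attribute sets can be sketched as below. This follows the usual formulation of the test (difference of mean cosine similarities, one-sided p-value over re-partitions of the target words) and is only an illustration: how clusters are paired across partitions is not shown, exhaustive enumeration is feasible only for small sets, and all names and toy vectors are hypothetical.

```python
import itertools
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def association(w, attr_a, attr_b):
    """Mean similarity of w to attribute set A minus mean similarity to set B."""
    sim_a = sum(cosine(w, a) for a in attr_a) / len(attr_a)
    sim_b = sum(cosine(w, b) for b in attr_b) / len(attr_b)
    return sim_a - sim_b

def weat_p_value(targets_x, targets_y, attr_a, attr_b):
    """One-sided p-value: share of re-partitions with a larger test statistic."""
    s = [association(w, attr_a, attr_b) for w in targets_x + targets_y]
    n_x = len(targets_x)
    total = sum(s)
    observed_x = sum(s[i] for i in range(n_x))
    observed = observed_x - (total - observed_x)
    larger = count = 0
    for idx in itertools.combinations(range(len(s)), n_x):
        sum_x = sum(s[i] for i in idx)
        statistic = sum_x - (total - sum_x)
        count += 1
        if statistic > observed + 1e-12:  # strict inequality with a float tolerance
            larger += 1
    return larger / count

# Hypothetical toy vectors: X leans towards A, Y leans towards B.
attr_a, attr_b = [[1.0, 0.0]], [[0.0, 1.0]]
cluster_x = [[0.9, 0.1], [0.8, 0.2]]
cluster_y = [[0.1, 0.9], [0.2, 0.8]]
p_aligned = weat_p_value(cluster_x, cluster_y, attr_a, attr_b)
p_swapped = weat_p_value(cluster_y, cluster_x, attr_a, attr_b)
```

In the toy example, no re-partition yields a larger statistic than the aligned grouping, whereas swapping the two clusters makes every other grouping larger, so the swapped cluster would not be kept.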

4.2 Interpreting Conceptual Biases

In this section, we focus on obtaining a general understanding of the clusters found in a given partition to facilitate the interpretation of the discovered conceptual biases. Since the most salient words are grouped into semantically meaningful clusters, here we aim to organize and understand the meaning of the different clusters and facilitate the comparison between the biases towards each attribute concept. We do so in two main ways: by first categorizing the clusters via semantic tagging and analyzing the frequencies of the semantic tags, and second, by ranking the clusters of a partition based on different metrics. By combining these two ways, our method is able to offer both a general and a detailed view of the conceptual biases of a community.

Categorising Conceptual Biases

To give an overview of the conceptual biases in a community, we conduct a semantic analysis. In particular, we tag every cluster in a partition with the most frequent semantic field (domain) among its words. That is, for each cluster $C$ in a partition $K$, we first obtain the semantic domain associated with each word $w \in C$ (denoted by $d(w) \in D$, where $D$ is the set of all semantic domains). We also obtain the multiset of semantic domains associated with the cluster as the sum of the semantic domains associated with its words, i.e. $D_C = \biguplus_{w \in C} \{d(w)\}$, and then we select the semantic domain that is most frequent among all the words in the cluster as the cluster tag, i.e. $\mathrm{tag}(C) = \arg\max_{d \in D} m_{D_C}(d)$, where $m_{D_C}(d)$ denotes the multiplicity of semantic domain $d$ in the multiset $D_C$ of semantic domains associated with cluster $C$. Regarding the specific semantic domains considered, we rely on the UCREL Semantic Analysis System (USAS) [39], which has a multi-tier structure with 21 major discourse fields subdivided into more fine-grained semantic domains such as People, Relationships, Power and Ethics. USAS has been extensively and successfully used for many tasks, such as the automatic content analysis of discourses [51] and as a translator assistant [42].

After the semantic tagging, we analyse the frequency of the different semantic tags in a given partition $K$. In particular, we compute the multiset formed by the tag of each cluster in the partition, denoted by $T_K$, and calculate the relative frequency of each semantic domain $d$ as $f_K(d) = m_{T_K}(d) / |T_K|$, where $|T_K|$ is the total number of semantic domain tags in partition $K$, and $m_{T_K}(d)$ is the multiplicity of the semantic domain $d$ in the multiset $T_K$. This allows us to get a general idea of the conceptual biases in a partition. For instance, this could show that for a given partition (e.g. the clusters of the most salient biases towards men), the relative frequency of Power ($f_K(\text{Power})$) is higher than the relative frequency of Relationships ($f_K(\text{Relationships})$), as we in fact see in one of the datasets (/r/TheRedPill) explored in the application of our method to Reddit communities, described in Section 5.
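The tagging and relative-frequency computation reduce to counting, as in the sketch below. The mapping `domain_of` stands in for the output of a semantic tagger such as USAS and is hypothetical, as are the function names and toy clusters; ties between equally frequent domains are broken arbitrarily here.

```python
from collections import Counter

def cluster_tag(cluster, domain_of):
    """Most frequent semantic domain among the words of a cluster."""
    counts = Counter(domain_of[w] for w in cluster)
    return counts.most_common(1)[0][0]

def tag_frequencies(partition, domain_of):
    """Relative frequency of each cluster tag within a partition."""
    tags = Counter(cluster_tag(cluster, domain_of) for cluster in partition)
    total = sum(tags.values())
    return {domain: n / total for domain, n in tags.items()}

# Hypothetical word-to-domain mapping and toy partition.
domain_of = {
    "leader": "Power", "boss": "Power", "king": "Power",
    "friend": "Relationships", "wife": "Relationships",
    "army": "Warfare",
}
toy_partition = [
    ["leader", "boss", "friend"],
    ["king"],
    ["wife", "friend"],
    ["army", "king", "boss"],
]
frequencies = tag_frequencies(toy_partition, domain_of)
```

Here three of the four toy clusters are tagged Power and one is tagged Relationships, so Power has a relative frequency of 0.75 in this partition.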

Ranking Conceptual Biases

While showing the relative frequencies of the categories of the conceptual biases found is informative as an abstract, bird’s-eye view of the biases in a community, the distinctions marked may be coarse-grained [40]. Following the example above, even though Power may be a frequent category, we do not actually know how this is expressed in the language of the community, nor how strongly or frequently biased these clusters are towards men. It also does not indicate whether the biases are typically expressed in a positive or negative way.

In this section, we present a more detailed analysis of the conceptual biases found. Instead of looking at an aggregate view of categories of clusters as in the previous section, we focus on establishing different methods to rank each of the clusters discovered. In particular, we define three different metrics to prioritise the clusters of a partition considering: i) the frequency of the bias; ii) the strength of the bias; and iii) the sentiment polarity of the bias. The rankings offer three different but complementary views to help understand the biases found in the community.

We define the rankings based on the clusters of each partition. In particular, given a cluster $C$ in a partition $K$, we define the following metrics:

  1. $R_{\mathrm{freq}}(C)$, which measures the aggregated frequency in the text corpus of the words within cluster $C$, therefore establishing a ranking of the most common biases:

    $$R_{\mathrm{freq}}(C) = \sum_{w \in C} \mathrm{freq}(w)$$

    where $\mathrm{freq}(w)$ is the frequency of word $w$ in the text corpus.

  2. $R_{\mathrm{str}}(C)$, which orders the clusters based on the average bias strength of the words in the cluster with respect to the attribute concepts $S_1$ and $S_2$ (see Equation 1). This method assigns higher scores to the clusters that are more biased, and it is useful to identify relevant biases towards an attribute concept when compared to another:

    $$R_{\mathrm{str}}(C) = \frac{1}{|C|} \sum_{w \in C} \mathrm{Bias}(w, S_1, S_2)$$

    where $|C|$ is the size of the cluster.

  3. $R_{\mathrm{sent}}(C)$, which orders the clusters based on the average sentiment of the words in the cluster. We particularly consider rankings of both the most positive and most negative biases. Although strong negative polarities might be indicative of perilous biases towards a specific population, the fact that a cluster/word is not tagged with a negative sentiment does not exclude it from being discriminatory in certain contexts. We particularly define $R_{\mathrm{sent}}(C)$ as follows:

    $$R_{\mathrm{sent}}(C) = \frac{1}{|C|} \sum_{w \in C} \mathrm{SA}(w)$$

    where $\mathrm{SA}(w)$ returns a value corresponding to the sentiment polarity of a word $w$, with -1 being strongly negative and +1 strongly positive. Note that our model is agnostic to the sentiment analysis model used and different tools may be used [15].
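The three rankings can be computed together, as in the following sketch. The dictionaries `freq`, `bias` and `sentiment` are hypothetical per-word inputs (corpus frequency, bias strength from Equation 1, and sentiment polarity in [-1, 1] from any sentiment tool), and the function name `rank_clusters` is likewise an assumption.

```python
def rank_clusters(partition, freq, bias, sentiment):
    """Score every cluster and return it ordered under each ranking."""
    rows = []
    for cluster in partition:
        size = len(cluster)
        rows.append({
            "cluster": cluster,
            "freq": sum(freq[w] for w in cluster),                 # aggregated frequency
            "strength": sum(bias[w] for w in cluster) / size,       # average bias strength
            "sentiment": sum(sentiment[w] for w in cluster) / size,  # average polarity
        })

    def ordered(key, descending=True):
        return sorted(rows, key=lambda row: row[key], reverse=descending)

    return {
        "frequency": ordered("freq"),
        "strength": ordered("strength"),
        "positive": ordered("sentiment"),
        "negative": ordered("sentiment", descending=False),
    }

# Hypothetical toy inputs.
toy_freq = {"a": 10, "b": 5, "c": 100, "d": 1, "e": 1}
toy_bias = {"a": 0.1, "b": 0.3, "c": 0.05, "d": 0.5, "e": 0.4}
toy_sent = {"a": -0.5, "b": -0.1, "c": 0.0, "d": 0.6, "e": 0.2}
toy_clusters = [["a", "b"], ["c"], ["d", "e"]]
rankings = rank_clusters(toy_clusters, toy_freq, toy_bias, toy_sent)
```

In the toy example, the three rankings disagree: the cluster ['c'] is the most frequent, ['d', 'e'] is both the most strongly biased and the most positive, and ['a', 'b'] is the most negative, which illustrates why the rankings are complementary.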

5 Implementation on Reddit

In this section, we use our method to discover biases in textual corpora collected from two Reddit communities, /r/TheRedPill and /r/Atheism. Although both communities are suspected to have gender and religion biases respectively [50], the actual biases and the form they take – e.g., what the concepts are that are more biased towards women than towards men – are unknown. Table I summarizes the datasets, including the protected attribute (P.Attr) we discover biases towards, the quantity of unique Authors, Comments and Words, words per comment (Wpc), and vocabulary Density (ratio of unique words to total number of words). The sets of words representing the attribute concepts gender and religion are taken from previous work [18, 36] and listed in Appendix B.

Dataset        P.Attr    Years    Authors  Comments   Words   Wpc    Density
/r/TheRedPill  gender    2012-18  106,161  2,844,130  59,712  52.58
/r/Atheism     religion  2008-09  699,994  8,668,991  81,114  38.27
TABLE I: Datasets of Reddit communities used in this paper.

The datasets were collected using the Pushshift data platform [7]. The two Reddit models were trained using an Intel i5-9600K @3.70GHz with 32GB RAM, and an NVIDIA TitanXP GPU. To create an embedding model for each corpus, we first preprocess each comment by removing special characters, splitting text into sentences, and transforming all words to lowercase. Then, for each subreddit, we train a skip-gram word2vec word embeddings model, using embeddings of 200 dimensions, words with at least 10 occurrences, a 4-word window and 100 epochs as training parameters, and a salience threshold of four standard deviations together with repeated k-means initialisations as parameters for our method. In Section 6.2, we offer an extensive analysis varying all these parameters, which shows similar results at the conceptual level regardless of the parameter choice. For the sentiment polarity, we used the nltk sentiment analysis system [26]. All our code is publicly available (see Section 7).

5.1 /r/TheRedPill

Fig. 1: Relative frequency of semantic categories for conceptual biases towards women (left) and men (right) in /r/TheRedPill.
Fig. 2: Top-5 clusters, labelled with their most frequent word, biased towards women and men in /r/TheRedPill, ranked by most frequent, strongest, most sentimentally positive, and most sentimentally negative bias.

The /r/TheRedPill community defines itself as a forum for the ’discussion of sexual strategy in a culture increasingly lacking a positive identity for men’ [50]. It has been connected to the online Manosphere [20, 34], a term used to describe a collection of predominantly web-based misogynist ideologies associated with the far-right and alt-right. The name of the subreddit is a reference to the 1999 film The Matrix: ’swallowing the red pill,’ in the community’s parlance, signals the acceptance of an alternative social framework in which men, not women, have been structurally disenfranchised in the west. In response, men must protect themselves against a ‘misandrist’ culture and the feminising of society [32, 30]. We applied our method to /r/TheRedPill in order to be able to discover the exact conceptual biases related to gender present in the community as well as the shape these biases take, i.e., how these biases are expressed in the community’s language.

After applying our method to /r/TheRedPill to discover biases towards women and men considering nouns, adjectives and verbs, we obtain the most salient words towards both attribute concepts, with sizes 216 and 194 respectively. These were then clustered into 93 and 102 clusters, with a maximum of 10 and 8 words, a minimum of 1, and a mean of 2.32 and 1.90 words with a standard deviation of 1.78 and 1.33 for women and men, respectively. The WEATs performed all returned significant p-values.

Figure 1 shows the distribution of the categories among clusters for women and men. The most frequent biases towards women refer to relationships (and particularly sexual relationships, used to tag 26.3% of the total number of conceptual biases discovered), appearance (Clothes and Judgement of Appearance, adding up to 12.3%) and objects (Objects generally, 7%). On the other hand, the most frequent biases towards men refer to Personal Names (adding up to 33% of the conceptual clusters biased towards men, discussed below in the detailed analysis), Power (15.8%), Warfare (10.5%) and Violence/Crime (adding up to 14%, including the categories Calm/violent/angry and Crime). The semantic categorization of discovered biases shows a very different picture of the biases found towards the two genders: the most frequent semantic labels used to tag conceptual biases towards women are predominantly related to objectification, i.e., appearance and sex, while men are predominantly described in relation to positions of power and strength, i.e., becoming successful agential subjects in the realm of dating and sex.

A more detailed view is shown in Figure 2, which compares the top-5 clusters for women and men in /r/TheRedPill, labeled using the most frequent word in the cluster, and ranked by the frequency, strength, and sentiment of the bias. In each figure, clusters are shown ranked from left to right starting with the highest ranked cluster, ignoring clusters with the same stem in order to show different conceptual biases. The size of the cluster shows the aggregated frequency of its words (the cluster ‘sex’ being the most frequent, with words within it appearing more than 176K times in /r/TheRedPill). The y-axis shows the average salience of the words in the cluster; color represents the average cluster sentiment (green-positive, red-negative, yellow-neutral).

For women, beyond clusters labeled ‘sex’ or ‘slut’, which signal obvious biased language towards women through objectification, we can see several clusters with particular jargon such as ‘ons’ (one night stand), ‘chick’, ‘plate’ (a man or woman who is used for the purpose of sex), or ‘flirty’. On the other hand, the appearance of ‘commitment’ and ‘exclusive’ refers to other kinds of female-associated behavior discussed by members of /r/TheRedPill, which could be a valuable conceptual addition for social science research. For men, the picture is quite different. The most salient clusters biased towards men stand out due to their focus on strength, power and violence, concurring with the general categories identified previously. Many of the men-biased top clusters, such as ‘inspiration’, ‘genius’, ‘leader’, and even ‘trump’ (found in the same cluster with other personal names associated with leadership, such as ‘obama’), represent role models related to strength and power, while others clearly allude to violence, like ‘killing’, ‘beat’, etc. Finally, the most frequent cluster for men, ‘redpill’, also contains the word ‘pua’ (referring to pick up artists, or to the controversial PUA group Real Social Dynamics), who are also considered masculinist role models in Manosphere discourse [20].

Our findings in the analysis – i.e., that women are objectified whilst men engage in articulations of aggrieved manhood – are in line with and confirm qualitative studies from the social sciences on /r/TheRedPill [20, 34], while also offering additional suggestions regarding the community’s discursive particularities. This suggests that our method finds relevant conceptual biases, and that it enables an innovative exploration of the relations between words and concepts, as constructed by the community itself. We further evaluate the validity of our method in a quantitative way in Section 6.3.

5.2 /r/Atheism

The /r/Atheism subreddit is a large community that calls itself ‘the web’s largest atheist forum’, on which ‘[a]ll topics related to atheism, agnosticism and secular living are welcome’, with suspected biases towards religions. After applying our method, considering nouns, adjectives and verbs, to /r/Atheism to discover biases towards Islam and Christianity (the two largest religions), we obtained the most salient words towards both attribute concepts, with sizes 516 and 381 respectively. These were then clustered into 188 and 178 clusters, with a maximum of 19 and 17 words, a minimum of 1 word, and a mean of 2.74 and 2.14 words per cluster with a standard deviation of 3.03 and 2.10 for Islam and Christianity, respectively. The WEATs performed all returned significant p-values.

Figure 3 shows the distribution of the semantic categories among conceptual biases for Islam and Christianity. The figure shows a clear difference between the two attribute concepts Islam and Christianity. The conceptual biases for Islam are categorised in normative terms such as Warfare (20.8%), Calm/Violent/Angry (14.3%), and Crime (10.4%), together with names (aggregating Geographical names, Personal names and Other proper names, and adding up to 39% of all conceptual biases for Islam; more on this later in the detailed analysis). Clusters about Christianity, on the other hand, fall mainly under the more descriptive and generic category of Religion (with almost half of the conceptual biases for Christianity being tagged with this semantic label), followed by Personal names (13.1%), Speech acts (9.3%) and General Ethics (6.5%). These broad groupings suggest a difference in evaluative orientation towards the two religions.

A more detailed view is shown in Figure 4, which compares the top-5 clusters for Islam and Christianity, ranked with the different rankings defined in Section 4.2.2. In each figure, Islam-biased clusters are shown on top and Christian-biased clusters on the bottom, the y-axis corresponds to the average salience of the cluster, size corresponds to frequency (cluster ‘evolution’ is the most frequent in /r/Atheism, with more than 259K hits), and color to average sentiment. Observing the top most frequent biased clusters for both religions in the frequency ranking, the most frequent clusters biased towards Christianity contain general doctrinal concepts (e.g. ‘heaven’, ‘sin’, ‘teachings’). Half of the top 5 most frequent clusters biased towards Islam, however, are related to violence (‘violence’, ‘offensive’, ‘attack’). Moreover, the ‘violence’ cluster contains other strongly stereotyped terms such as ‘terrorism’, ‘jihad’, and ‘extremism’, suggesting anti-Muslim sentiments and Islamophobia. This would need to be further substantiated through closer inspection of the context of these terms, but our approach does produce striking differences.

When we look at the most strongly biased clusters, we see that personal names label the majority of the clusters biased towards Islam, such as ‘ali’, ‘abdul’, and ‘omar’. These refer to public figures associated with socio-political issues of Islam. The cluster ‘ali’ also includes the terms ‘hirsi’ and ‘ayaan’, thus referring to activist Ayaan Hirsi Ali, who is known for her critical stance on Islam and on practices such as forced marriage and honor violence. The cluster ‘omar’ also includes the terms ‘sheikh’ and ‘ahmed’, which likely refer to Ahmed Omar Saeed Sheikh, a British militant who was found guilty of the 1994 kidnappings of Western tourists in India. These names, then, signal a concern for issues of hostility and violence similar to the one we saw in the previous ranking. For Christianity, most of the clusters again refer to more generic concepts about religion, such as ‘divinity’ or ‘nazareth’. This could indicate that the discourse on /r/Atheism, when dealing with Islam, revolves more around particular people and the news events they appear in, whereas the discourse on Christianity is less topical and political, and revolves around theological concepts and concerns.

Finally, the distribution of sentiment across all clusters biased towards Islam is clearly more negative than that of Christian-biased clusters, and gives an idea of some of the most common biased concepts associated with both religions in /r/Atheism. The most negative clusters biased towards Islam are relatively frequent, strongly biased and (again) closely related to violence, such as ‘violent’, ‘attacks’ or ‘insult’. On top of that, we only find one positive cluster (‘honour’) for Islam (shown in ranking), in contrast to the various positive clusters biased towards Christianity. Clustered words biased towards Christianity are slightly more positive, and refer to religious terms of reward or punishment, such as ‘divinity’, ‘heaven’, ‘gift’ and ‘sin’. In general, the results suggest broad socio-cultural perceptions and stereotypes that characterize the discourse in the /r/Atheism community and that frequently associate Islam-biased clusters with negative connotations, in contrast to Christian-biased concepts.

Fig. 3: Frequency of semantic categories for conceptual biases towards Islam (left) and Christianity (right) in /r/Atheism.
Fig. 4: Top-5 clusters, labelled with their most frequent word, biased towards Islam and Christianity in /r/Atheism, ranked by most frequent (), strong (), sentimentally positive (), and sentimentally negative (-) bias.

6 Evaluation

In this section, we provide an extensive evaluation of our method considering different dimensions: the stability of the biases discovered, the influence the model and learning parameters may have on them, and the validity of the biases discovered. We particularly analyze the stability of our models in Section 6.1, and the importance of the different parameters used for our method and when training the embedding models in Section 6.2. We finally evaluate our approach on discovering conceptual biases in online communities in Section 6.3, by comparing the conceptual biases discovered by our method with the arbitrary, predefined biases attested by previous work in the Google News pre-trained model.

6.1 Stability

Word embeddings may be unstable, particularly when trained with smaller corpora, so that small changes during training (e.g. new data) may result in different vector descriptions for the same word [2, 43]. This could imply a problem for our method to analyze biases, since different vector descriptions could result in different sets of conceptual biases discovered and therefore influence the final analysis of biases of the online community. Although our method provides some stability mechanisms, such as considering the frequency of words and aggregating them into semantically-similar clusters, it is based on the same vector descriptions.

We tested the stability of the biases found in Sections 5.1 & 5.2, following [2], by training four new embedding models for each of the two datasets explored, randomly selecting 50% of the comments of the original models to reduce the training corpora (increasing the chance of instability), and using the same preprocessing and model parameters stated before. This resulted in eight new models, each trained with a random half of the comments: four for /r/TheRedPill and four for /r/Atheism. For each of the new models, we applied our method as before.
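The subsampling step can be sketched as follows. This is only an illustration under stated assumptions: the function name, the seed handling and the toy comments are hypothetical, and the embedding training itself (which would reuse the original preprocessing and model parameters) is omitted.

```python
import random


def half_samples(comments, n_models=4, fraction=0.5, seed=0):
    """Draw n_models random subsets, each holding `fraction` of the comments.

    Each subset is sampled without replacement and independently of the others,
    so different subsets may share comments.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    k = int(len(comments) * fraction)
    return [rng.sample(comments, k) for _ in range(n_models)]


# Toy corpus standing in for the comments of one community.
toy_comments = [f"comment {i}" for i in range(10)]
toy_subsets = half_samples(toy_comments)
```

Each element of `toy_subsets` would then be used to train one new embedding model on which the full bias-discovery method is rerun.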

Fig. 5: Overlap coefficient between the 4 bootstrapped models for /r/TheRedPill ( to ), and the original model , for women () and men ().

Figure 5 shows the overlap coefficient [49], also known as the Szymkiewicz–Simpson coefficient, between the sets of semantic categories of the 4 new models for /r/TheRedPill ( to ) and the original model (), per attribute concept (women and men, shown as superscripts). To improve readability, only categories used by more than 1% of the clusters are considered. The higher the overlap, the closer the coefficient is to one; the lower the overlap, the closer it is to zero. The figure shows that the overlap between women- and men-biased semantic tags is small, meaning that there is a clear difference between the semantic categories used to label the concepts biased towards each attribute concept in all models. In addition, when considering the women and men attribute concepts separately, the sets of most frequent tags have an average overlap of 0.83 with the original model for both women and men. This indicates that a very similar set of semantic tags was used to label women-biased concepts (and, separately, men-biased concepts) across all models. Finally, the sets of biases towards the same attribute concept are very similar in all new models (), with an average label intersection of 0.8 for women (almost 41 out of 51 of the most frequent labels biased towards women are shared between all models on average), and 0.67 for men.
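For reference, the Szymkiewicz–Simpson overlap coefficient of two sets is the size of their intersection divided by the size of the smaller set. A minimal sketch follows; the function name and the toy label sets are illustrative, and returning 0.0 for an empty set is a convention chosen here rather than something stated in the paper.

```python
def overlap_coefficient(a, b):
    """Szymkiewicz–Simpson overlap: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # assumed convention for empty sets
    return len(a & b) / min(len(a), len(b))


# Toy label sets, invented for illustration only.
labels_original = {"Power, organizing", "Warfare", "Religion", "Sports"}
labels_resampled = {"Power, organizing", "Warfare", "Sports"}
toy_overlap = overlap_coefficient(labels_original, labels_resampled)
```

Because the denominator is the smaller set, the coefficient reaches 1 whenever one label set is contained in the other, which is worth keeping in mind when comparing models with label sets of different sizes.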

Fig. 6: Overlap coefficient between the 4 bootstrapped models for /r/Atheism ( to ), and the original model , for Islam () and Christianity ().

Figure 6 shows the overlap coefficient (also known as the Szymkiewicz–Simpson coefficient) between the sets of category tags of the 4 new models for /r/Atheism ( to ) and the original model (), per attribute concept Islam () and Christianity () (shown as superscripts). To improve readability, only tags used by more than 1% of the clusters are considered. The higher the overlap, the closer the coefficient is to one (yellow color); the lower the overlap, the closer it is to zero (blue color). The figure shows that the overlap between Islam- and Christianity-biased labels is small, and that the differences between them are kept across models. On the other hand, the overlap coefficient between biases towards the same attribute concept is large, indicating that our approach is able to consistently identify similar conceptual biases towards Islam and Christianity across models in /r/Atheism. In fact, the average overlap of conceptual biases is 0.85 for Islam and 0.75 for Christianity, which means that, on average, the models share 19 (out of 22) and 22 (out of 29) conceptual clusters towards Islam and Christianity, respectively. Given these results, we can conclude that our method is able to pick up consistent biases, mitigating many of the stability issues associated with word embeddings.

6.2 Parameter Influence

We now provide a more detailed analysis of the effects that different training parameters have on the models created and on the final set of discovered conceptual biases, complementing the experiments on stability presented above. Due to a lack of space, we focus on the /r/TheRedPill community. Recall that /r/TheRedPill has the fewest unique words and the highest vocabulary density of the two communities explored, so it is the most prone to stability issues that can cause differences in the final sets of conceptual biases discovered when the dataset and/or the parameters change.

Fig. 7: Overlap coefficient between the set of conceptual biases of the different models for /r/TheRedPill.

In order to study the effect of all the parameters used, we trained five new models and ran nine new executions, varying: i) the training parameters, namely the size of the training window, the number of epochs, and the embedding dimensions; and ii) the methodology parameters, namely the values of used for clustering and the values of used to determine the salience thresholds that select the most relevant words to discover biases from (see Section 4.1 for details about both parameters and how our method uses them). When training the new models, only one parameter was changed at a time to study its influence ceteris paribus. That is, all of the other parameters were set to the default values used to train the original /r/TheRedPill model (using a window size of 4, embedding dimension of 200, 100 epochs, , and using a salience threshold of standard deviations – see Section 5 for all the details). The specific ranges of the parameters used in the different executions are presented below:

  • Window sizes (w): We trained two new models, and , utilizing window sizes of 3 and 5 (instead of the window size of 4 used in the original models), respectively, to evaluate the effect of window sizes in the resulting set of discovered conceptual biases.

  • Embedding dimension (d): Models , and , were trained with 100 and 300 dimensions (instead of 200), respectively, to study the effect of embedding dimensionality when discovering conceptual biases.

  • Epochs (e): Model was trained using 200 epochs (instead of the 100 epochs of the original model), to study the effect of a longer training period.

  • Cluster repetitions (): Models and were trained using the same training parameters as the original model, but using a of 100 and 300 cluster iterations, respectively, to select the partition with the highest silhouette score, in order to study its effect on the final set of conceptual biases (instead of the used originally in Section 5).

  • Salience thresholds (s): Models and were trained with the same parameters as the original model, but using and standard deviations (instead of the used originally in Section 5) to select the set of most salient words towards each attribute concept. The salience threshold determines the quantity of salient words considered for the analysis, and plays an important role when determining the less frequent conceptual biases of the community: a smaller salience threshold would consider more words to analyze and would thus provide broader coverage of a community’s biases, whereas a bigger salience threshold would only return the most salient part of the vocabulary. As such, different implementation scenarios call for different salience thresholds. For instance, a general overview of a community’s bias would require a small one, whereas tracing extreme, less frequent cases of hate speech may require bigger values.
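The one-parameter-at-a-time design for the training parameters listed above can be sketched as follows. The dictionary keys and the function name are hypothetical; only the window, dimension and epoch values come from the text, and the clustering and salience parameters are left out because their values are not recoverable here.

```python
# Default training parameters of the original /r/TheRedPill model (from the text).
DEFAULTS = {"window": 4, "dimensions": 200, "epochs": 100}

# Alternative values explored, one parameter at a time (from the list above).
ALTERNATIVES = {"window": [3, 5], "dimensions": [100, 300], "epochs": [200]}


def one_at_a_time(defaults, alternatives):
    """Build configurations that differ from the defaults in exactly one value."""
    configs = []
    for name, values in alternatives.items():
        for value in values:
            config = dict(defaults)  # copy so the defaults stay untouched
            config[name] = value
            configs.append(config)
    return configs


training_configs = one_at_a_time(DEFAULTS, ALTERNATIVES)
```

This yields five configurations, matching the five newly trained models; the remaining four executions reuse the original embedding model and only change the clustering or salience parameters.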

Figure 7 shows the overlap coefficient between the sets of frequent conceptual biases for women (represented with the superscript ) and men (represented with the superscript ) between the different models for /r/TheRedPill (see footnote 6) and the original model presented in Section 5.1, represented as .

The results show: i) a substantial overlap among the sets of biases towards the same gender, for both women and men, indicating that biases towards women (and men) are similar across the different models; and ii) a strong difference between women and men biases in all models, indicating that the sets of discovered conceptual biases towards the two genders are different in /r/TheRedPill. The most frequent conceptual biases towards women have an average overlap coefficient of 0.67, meaning that all models share on average between 9 and 10 labels (out of 14), while men-biased clusters obtain an average overlap coefficient of 0.82, meaning that they share, on average, between 11 and 12 out of the 14 most frequent biases for men. In fact, on closer inspection, the most frequent conceptual biases repeat regardless of the model, including ‘Relationship: Intimate/sexual’ and ‘Judgement of appearance (pretty etc.)’ for women, and ‘Power, organizing’ and ‘Warfare, defence and the army; Weapons’ for men.

Focusing now on the influence of specific parameters, the results suggest that, among the different training parameters tested (training epochs, window sizes, and embedding dimensions), low embedding dimensionality () has the biggest impact on the set of conceptual biases discovered in a community, in line with previous studies suggesting that there is a sweet spot for the dimensionality of word vectors [33]. Even in this case, the differences between women and men are still very marked. Importantly, and as expected, changing the salience threshold also affects the resulting sets of conceptual biases, as a consequence of clustering a different selection of salient words. Again, the proportions between women and men are kept, but, as expected, the more salient words are considered (), the larger the differences with the original. As explained before, however, with lower thresholds we are including potentially weaker and less frequent biases, so the threshold acts as a zoom in/out mechanism here, with higher values allowing a focus on the most salient (strong and frequent) biases of a community. Next, the number of cluster repetitions has little impact on the final set of biases when compared with the other models, especially the original, obtaining on average overlap coefficients higher than 0.83 for both women and men, and up to 1 when compared with the original model. The different window sizes () and epochs () also have little impact, with overlaps ranging from 0.71 to 0.91 when compared to the original model.
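The zoom in/out behaviour of the salience threshold can be illustrated with a small sketch. It assumes, since the exact rule is defined in Section 4.1 and not repeated here, that a word is kept when its salience exceeds the mean salience by more than s standard deviations; the function name and the toy scores are hypothetical.

```python
from statistics import mean, pstdev


def select_salient(salience, s):
    """Keep words whose salience exceeds mean + s * std (assumed selection rule)."""
    values = list(salience.values())
    cutoff = mean(values) + s * pstdev(values)
    return {word for word, value in salience.items() if value > cutoff}


# Toy salience scores, invented for illustration only.
toy_salience = {"a": 0.1, "b": 0.2, "c": 0.5, "d": 0.9}
broad_view = select_salient(toy_salience, 0.0)   # lower threshold, more words
narrow_view = select_salient(toy_salience, 1.0)  # higher threshold, fewer words
```

Under this assumed rule, raising s can only shrink the selection, which matches the description of higher values focusing on the most salient part of the vocabulary.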

6.3 Validation

Finally, to validate our method, we apply it to the large, widely-studied Google News pre-trained model, and compare the results with predefined biases that had been attested by previous work in this model [18, 12]. The aim was to see whether our method would discover, among others, those predefined biases attested by prior work to validate that it finds relevant biases. The Google News model contains 300-dimensional vectors for 3 million words and phrases, trained on part of the US Google News dataset7. Previous research on this model attested the predefined gender biases and stereotypes that associate women with family and arts, and men with career, science and maths [18, 12].

We applied our method to the Google News pre-trained model, following the steps described in Section 4. We first estimated the salience of the words in the vocabulary of the Google News model and selected the most salient words towards women and men, creating and with sizes 1545 and 1271, respectively. The partition with the highest silhouette score, using = 200, grouped the women-salient words into 508 clusters and the men-salient words into 552. Using WEAT to compare all women clusters with men clusters returns significant p-values for all comparison combinations, with p-values ranging between and .
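For readers unfamiliar with WEAT, the sketch below follows the standard formulation of the test [12]: the association of a word with two attribute sets, an effect size, and a one-sided permutation p-value. How the clusters are fed into WEAT in our pipeline is not restated here, so the pairing of target sets, the use of the sample standard deviation, and the toy 2-D vectors are all assumptions made for illustration.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    # Sample standard deviation over all target words (an assumed choice).
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


def weat_p_value(X, Y, A, B):
    """Share of re-partitions of X ∪ Y with a larger test statistic.

    Enumerates every partition, so it is only practical for small sets.
    """
    scores = [association(w, A, B) for w in list(X) + list(Y)]
    total, n = sum(scores), len(X)
    observed = 2 * sum(scores[:n]) - total  # sum over X minus sum over Y
    stats = [2 * sum(scores[i] for i in subset) - total
             for subset in combinations(range(len(scores)), n)]
    return sum(s > observed for s in stats) / len(stats)


# Toy 2-D vectors, invented for illustration only.
A, B = [(1.0, 0.0)], [(0.0, 1.0)]             # attribute sets
X = [(1.0, 0.1), (0.9, 0.2)]                  # targets leaning towards A
Y = [(0.1, 1.0), (0.2, 0.9)]                  # targets leaning towards B
toy_effect = weat_effect_size(X, Y, A, B)
toy_p = weat_p_value(X, Y, A, B)
```

For sets of realistic size, the exhaustive enumeration would normally be replaced by a random sample of partitions.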

We provide details of the biases discovered with our method in the Google News model in Appendix A. Here, we focus on the validation aspects, reporting whether our method also found, among all discovered biases, the arbitrary biases that had been attested by previous works. For this, we manually map the word sets used by previous work related to career, family, arts, science and maths to USAS semantic categories (see Appendix B for the complete mapping), and analyze the frequency of the discovered clusters tagged with them.

The semantic tags related to career are strongly biased towards men, as identified in previous works, with almost twice as many clusters for men (48) as for women (25), containing words such as ‘boss’ (for men) and ‘chairwoman’ (for women). Family-related clusters are strongly biased towards women, with roughly three times as many clusters for women (29) as for men (9). Words include references to ‘mothering’ and ‘brides’ (women), and also ‘dad’ (men). Arts is also strongly biased towards women, with 5 clusters for women compared to 1 cluster for men, including words such as ‘knitting’ and ‘crochet’ for women, and ‘blacksmith’ for men. Tags related to science and maths are not frequent among the set of most salient words in this model, but they are still more frequent among men-biased clusters than among women-biased ones, with 3 and 1 clusters, respectively. This analysis shows that: i) our method discovers relevant biases, as it recovers biases similar to those attested by previous research; and ii) as detailed in Appendix A, our method discovers additional gender-related biases in Google News not reported by previous work.
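The counting behind these figures can be sketched as follows. The concept-to-category mapping is the one listed in Appendix B; matching USAS tags by code prefix, the function name, and the example tags are assumptions made for illustration, not a description of the actual implementation.

```python
# Concept -> USAS category codes, as listed in Appendix B.
USAS_PREFIXES = {
    "career": ("I", "S7"),
    "family": ("S4", "S2"),
    "arts": ("C1",),
    "science": ("Y",),
    "maths": ("N2",),
}


def count_by_concept(cluster_tags, prefixes=USAS_PREFIXES):
    """Count clusters whose USAS tag starts with one of a concept's codes."""
    counts = {concept: 0 for concept in prefixes}
    for tag in cluster_tags:
        for concept, codes in prefixes.items():
            if tag.startswith(codes):  # assumed prefix-matching rule
                counts[concept] += 1
    return counts


# Hypothetical tags for a handful of clusters biased towards one attribute concept.
toy_tags = ["S7.1", "I1.1", "S4", "C1", "Z1"]
toy_counts = count_by_concept(toy_tags)
```

Running the same count on the women-biased and men-biased clusters separately gives the per-concept cluster numbers that are compared above.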

7 Conclusions

In this paper, we introduce and validate a data-driven methodology for discovering conceptual bias in linguistic corpora. Through the use of word embeddings and similarity metrics, which leverage the vocabulary used within specific communities, we are able to discover the biased concepts, in the form of clusters of words semantically related, towards different social groups when compared against each other. The resulting clusters abstract the inherent biases into more general (and stable) structures that allow a better understanding of the dispositions of these communities towards protected features. We discovered and analyzed gender and religion biases in two Reddit communities, and showed their stability. We also validated our method on the Google News dataset, confirming that it also discovers the arbitrary biases that had been attested by previous works.

Quantifying language biases through our approach has several societal advantages. As a general diagnostic, our method can help in understanding and measuring social problems and stereotypes towards certain populations in communities with more precision and clarity [1]. It can also promote discussions on the language used by particular communities. This is especially relevant considering the radicalisation of interest-based communities outside of mainstream culture [32]. Our approach allows us to discover a broad categorical overview of biases, as well as a detailed analysis of these biases, as expressed in a community’s own language. As such, we can trace language in cases where researchers do not know the specific linguistic forms employed in the community. This is all the more relevant given that online discourse communities are characterized by diachronic variability, with new users, topics and forms of dialect being introduced over time. As a bottom-up approach, our method can be used to monitor and account for these transformations. Finally, our method could underpin tools to help administrative bodies of web platforms to discover and trace biases in online communities to decide which do not conform to content policies.

Finally, it is important to note that, even though our method is automated, it is designed to be used with a human in the loop. This includes the need to consider in conjunction the more general and the more specific views provided by our method, as well as the wider context surrounding the conceptual biases found, which is important to interpret them adequately. For instance, in /r/TheRedPill, the cluster ‘casual’ contains words that carry a positive sentiment score, but these words are part of a discourse about having casual relationships with women in order to conquer as many of them as possible – objectifying them in the process. This is something that becomes clear when looking at the relative frequency of the semantic categories discovered, with the most frequent ones including Relationship Intimate/Sexual, Appearance and Objects, among others. These findings do not offer conclusive evidence of stereotyping, but they can be valuable assets for the work of social scientists and industry experts who are dealing with language biases in their communities.

Code, Datasets and Tool

To facilitate follow-up work, all the code and datasets will be available here.8 The code repository contains detailed instructions to replicate the experiments in the paper. Also, an interactive demo of our tool is here.9


The research reported in this article was funded by the EPSRC under grant EP/R033188/1, as part of Discovering and Attesting Digital Discrimination (DADD) –

Appendix A Google News Analysis

Figures 8 and 9 show the general distribution of conceptual biases among all clusters biased towards women and men, respectively, showing only labels with a frequency above 1% of the total label frequency, and ignoring clusters without a conceptual label for both women (273) and men (260). The general analysis of the model’s conceptual biases shows that some of the most common biases towards women in Google News are related to physical appearance (such as Clothes, Anatomy and Physiology, Personal Care, and Judgment of Appearance) and relationships. On the other hand, the most frequent men-biased labels are related to Power, Violence, Religion and Sports.

Fig. 8: Conceptual biases in GoogleNews for Women.
Fig. 9: Conceptual biases in GoogleNews for Men.

Figures 10 and 11 show the top 5 clusters for women and men in Google News by rank. Women-biased clusters are shown on top and men-biased clusters on the bottom; the y-axis corresponds to the average salience of the cluster, size to frequency, and colour to average sentiment. Although we do not have space for details, a few examples illustrate the pattern: women-biased clusters are often related to physical appearance (for instance, the clusters ‘modelling’, ‘lipsticks’, ‘beautiful’, or ‘cutest’), while men-biased clusters are often related to leadership and strength (e.g. ‘hero’, ‘duty’, or ‘wonderboy’), and the most negative clusters towards men, ‘hooliganism’ and ‘penalty’, are related to sports.

Fig. 10: Top-5 and clusters in GoogleNews.
Fig. 11: Top-5 and clusters in GoogleNews.

Appendix B Attribute and Other Sets

/r/TheRedPill attribute concepts From [36]. Female (women): sister, female, woman, girl, daughter, she, hers, her. Male (men): brother, male, man, boy, son, he, his, him.

/r/Atheism attribute concepts From [18]. Islam words: allah, ramadan, turban, emir, salaam, sunni, koran, imam, sultan, prophet, veil, ayatollah, shiite, mosque, islam, sheik, muslim, muhammad. Christianity words: baptism, messiah, catholicism, resurrection, christianity, salvation, protestant, gospel, trinity, jesus, christ, christian, cross, catholic, church.

Google News attribute and target sets From [18]. Women: sister, female, woman, girl, daughter, she, hers, her. Men: brother, male, man, boy, son, he, his, him. Career words: executive, management, professional, corporation, salary, office, business, career. Family: home, parents, children, family, cousins, marriage, wedding, relatives. Math: math, algebra, geometry, calculus, equations, computation, numbers, addition. Arts: poetry, art, sculpture, dance, literature, novel, symphony, drama. Science: science, technology, physics, chemistry, Einstein, NASA, experiment, astronomy.

Google News set of USAS labels related with WEAT experiments: Career: I Money & commerce in industry, S7 Power, organizing. Family: S4 Kin, S2 People. Arts: C1 Arts and crafts. Science: Y Science and technology in general. Mathematics: N2 Mathematics.


  1. Note that there are many alternative bias definitions in the literature [53, 4], such as the direct bias measure [9], the relative norm bias metric [18], and others, all with similar results as shown by previous research [18, 4].
  2. The reason to use the frequency rank instead of raw frequencies is that, since word frequencies are known to follow Zipf’s law [35], the difference in frequency between the most frequent words is large, which could skew salience towards frequency rather than keeping a balance between the strength and the frequency of the biases. Therefore, we use the word frequency rank instead of raw frequency to adequately smooth its importance in the equation. Note that this also ensures that when we select the most salient words later, we discard those that may have very similar frequencies but different ranks at the tail of the frequency distribution.
  3. We use silhouette for its simplicity and visual clues, but our method is agnostic to the clustering algorithm and other alternatives, such as [23, 19], could also be used in conjunction with our method.
  4. Using word2vec allows us to validate our method against the widely-used word2vec Google News pre-trained model (Section 6). However, our method can easily be extended to other models like ELMo [38] or BERT [14].
  5. Note that for the sake of clarity and readability, only categories tagging at least 2% of the clusters are shown in the distribution.
  6. Note that for the sake of clarity, we only considered labels used by more than 1% of the clusters in each partition, selecting 14 labels for each model and attribute concept on average, in order to compare relatively meaningful and frequent biases in the community.
  8. Placeholder for github repository


  1. R. Abebe, S. Barocas, J. Kleinberg, K. Levy, M. Raghavan and D. G. Robinson (2020) Roles for computing in social change. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT), pp. 252–260. Cited by: §1, §2, §4.1, §7.
  2. M. Antoniak and D. Mimno (2018) Evaluating the Stability of Embedding-based Word Similarities. Transactions of the Association for Computational Linguistics 6, pp. 107–119. Cited by: §4.1, §6.1, §6.1.
  3. X. F. Aran, J. M. Such and N. Criado (2019) Attesting biases and discrimination using language semantics. In Responsible Artificial Intelligence Agents workshop of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), External Links: 1909.04386 Cited by: §1.
  4. P. Badilla, F. Bravo-Marquez and J. Pérez (2020) WEFE: the word embeddings fairness evaluation framework. In International Joint Conference on Artificial Intelligence (IJCAI), Cited by: footnote 1.
  5. S. Barocas, K. Crawford, A. Shapiro and H. Wallach (2017) The problem with bias: from allocative to representational harms in machine learning. special interest group for computing. Information and Society (SIGCIS). Cited by: §1.
  6. S. A. Basow and S. A. Basow (1992) Gender : stereotypes and roles. Brooks/Cole Pub. Co., Pacific Grove, Calif. (English). External Links: Link Cited by: §1.
  7. J. Baumgartner, S. Zannettou, B. Keegan, M. Squire and J. Blackburn (2020) The pushshift reddit dataset. arXiv preprint arXiv:2001.08435. Cited by: §5.
  8. M. H. Bodell, M. Arvidsson and M. Magnusson (2019) Interpretable word embeddings via informative priors. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 6324–6330. Cited by: §2.
  9. T. Bolukbasi, K. Chang, J. Y. Zou, V. Saligrama and A. T. Kalai (2016) Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in Neural Information Processing Systems, pp. 4349–4357. Cited by: §1, §2, footnote 1.
  10. M. Brunet, C. Alkalay-Houlihan, A. Anderson and R. Zemel (2019) Understanding the origins of bias in word embeddings. In International Conference on Machine Learning, pp. 803–811. Cited by: §2.
  11. V. Bryson (2016) Feminist political theory. Macmillan International Higher Education. External Links: ISBN 1137439068 Cited by: §1.
  12. A. Caliskan, J. J. Bryson and A. Narayanan (2017) Semantics derived automatically from language corpora contain human-like biases. Science 356 (6334), pp. 183–186. External Links: Document, 1608.07187, ISSN 10959203 Cited by: §1, §2, §4.1.3, §6.3.
  13. A. Creese and A. Blackledge (2019) Stereotypes and chronotopes: the peasant and the cosmopolitan in narratives about migration. Journal of Sociolinguistics. Cited by: §1, §2.
  14. J. Devlin, M. Chang, K. Lee and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: footnote 4.
  15. R. Feldman (2013) Techniques and applications for sentiment analysis. Communications of the ACM 56 (4), pp. 82–89. Cited by: item 3.
  16. X. Ferrer, T. van Nuenen, J. M. Such, M. Coté and N. Criado (2020) Bias and discrimination in ai: a cross-disciplinary perspective. arXiv preprint arXiv:2008.07309. Cited by: §1.
  17. X. Ferrer, T. van Nuenen, J. M. Such and N. Criado (2020) Discovering and categorising language biases in reddit. In International AAAI Conference on Web and Social Media (ICWSM 2021) (forthcoming), External Links: 2008.02754 Cited by: §2.
  18. N. Garg, L. Schiebinger, D. Jurafsky and J. Zou (2018) Word embeddings quantify 100 years of gender and ethnic stereotypes. PNAS 2018 115 (16), pp. E3635–E3644. Cited by: Appendix B, Appendix B, §1, §2, §5, §6.3, footnote 1.
  19. V. Garg and A. T. Kalai (2018) Supervising unsupervised learning. In Advances in Neural Information Processing Systems, pp. 4991–5001. Cited by: footnote 3.
  20. D. Ging (2019) Alphas, Betas, and Incels: Theorizing the Masculinities of the Manosphere. Men and Masculinities 22 (4), pp. 638–657. External Links: Document, ISSN 15526828 Cited by: §5.1, §5.1, §5.1.
  21. H. Gonen and Y. Goldberg (2019) Lipstick on a pig: debiasing methods cover up systematic gender biases in word embeddings but do not remove them. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 609–614. Cited by: §2.
  22. A. G. Greenwald, D. E. McGhee and J. L. Schwartz (1998) Measuring individual differences in implicit cognition: the implicit association test.. Journal of personality and social psychology 74 (6), pp. 1464. Cited by: §2.
  23. G. Hamerly and C. Elkan (2004) Learning the k in k-means. In Advances in neural information processing systems, pp. 281–288. Cited by: footnote 3.
  24. D. L. Hamilton and T. K. Trolier (1986) Stereotypes and stereotyping: An overview of the cognitive approach.. In Prejudice, discrimination, and racism., pp. 127–163. External Links: ISBN 0-12-221425-0 (Hardcover) Cited by: §1.
  25. J. Holmes and M. Meyerhoff (2008) The handbook of language and gender. Vol. 25, John Wiley & Sons, Hoboken. External Links: ISBN 0470756705 Cited by: §1.
  26. C. J. Hutto and E. Gilbert (2014) Vader: a parsimonious rule-based model for sentiment analysis of social media text. In Eighth international AAAI conference on weblogs and social media, Cited by: §5.
  27. A. K. Kiefer and D. Sekaquaptewa (2007) Implicit stereotypes and women’s math performance: how implicit gender-math stereotypes influence women’s susceptibility to stereotype threat. Journal of experimental social psychology 43 (5), pp. 825–832. Cited by: §2.
  28. A. C. Kozlowski, M. Taddy and J. A. Evans (2019) The geometry of culture: analyzing the meanings of class through word embeddings. American Sociological Review 84 (5), pp. 905–949. Cited by: §1, §2.
  29. K. Kurita, N. Vyas, A. Pareek, A. W. Black and Y. Tsvetkov (2019) Measuring bias in contextualized word representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pp. 166–172. Cited by: §2.
  30. J. LaViolette and B. Hogan (2019) Using platform signals for distinguishing discourses: The case of men’s rights and men’s liberation on Reddit. ICWSM 2019, pp. 323–334. Cited by: §5.1.
  31. T. Manzini, L. Y. Chong, A. W. Black and Y. Tsvetkov (2019) Black is to criminal as caucasian is to police: detecting and removing multiclass bias in word embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 615–621. Cited by: §2.
  32. A. Marwick and R. Lewis (2017) Media Manipulation and Disinformation Online. Data & Society Research Institute, pp. 1–104. Cited by: §1, §2, §5.1, §7.
  33. O. Melamud, D. McClosky, S. Patwardhan and M. Bansal (2016) The role of context types and dimensionality in learning word embeddings. arXiv preprint arXiv:1601.00893. Cited by: §6.2.
  34. J. Mountford (2018) Topic Modeling The Red Pill. Social Sciences 7 (3), pp. 42. Cited by: §5.1.
  35. M. E. Newman (2005) Power laws, Pareto distributions and Zipf’s law. Contemporary physics 46 (5), pp. 323–351. Cited by: footnote 2.
  36. B. A. Nosek, M. R. Banaji and A. G. Greenwald (2002) Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dynamics: Theory, Research, and Practice 6 (1), pp. 101. Cited by: Appendix B, §5.
  37. O. Papakyriakopoulos, S. Hegelich, J. C. M. Serrano and F. Marco (2020) Bias in word embeddings. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 446–457. Cited by: §2.
  38. M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee and L. Zettlemoyer (2018) Deep contextualized word representations. In Proceedings of NAACL-HLT, pp. 2227–2237. Cited by: footnote 4.
  39. P. Rayson, D. Archer, S. Piao and A. M. McEnery (2004) The UCREL semantic analysis system. In Proceedings of the workshop on Beyond Named Entity Recognition Semantic labelling for NLP tasks in association with 4th International Conference on Language Resources and Evaluation (LREC), pp. 7–12. Cited by: §4.2.1.
  40. P. Rayson (2008) From key words to key semantic domains. International journal of corpus linguistics 13 (4), pp. 519–549. Cited by: §4.2.2.
  41. P. J. Rousseeuw (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics 20, pp. 53–65. Cited by: §4.1.3.
  42. S. Sharoff, B. Babych, P. Rayson, O. Mudraya and S. Piao (2006) ASSIST: automated semantic assistance for translators. In Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations, pp. 139–142. Cited by: §4.2.1.
  43. D. Shiebler, L. Belli, J. Baxter, H. Xiong and A. Tayal (2018) Fighting redundancy and model decay with embeddings. arXiv preprint arXiv:1809.07703. Cited by: §6.1.
  44. A. Sutton, T. Lansdall-Welfare and N. Cristianini (2018) Biased embeddings from wild data: measuring, understanding and removing. In International Symposium on Intelligent Data Analysis, pp. 328–339. Cited by: §2.
  45. N. Swinger, M. De-Arteaga, N. T. Heffernan IV, M. D. Leiserson and A. T. Kalai (2019) What are the biases in my word embedding?. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pp. 305–311. Cited by: §2.
  46. L. M. Tanczer (2016) Hacktivism and the male-only stereotype. New Media & Society 18 (8), pp. 1599–1615. Cited by: §1, §2.
  47. C. van Miltenburg (2016) Stereotyping and bias in the Flickr30k dataset. In 11th workshop on multimodal corpora: computer vision and language processing, Cited by: §1.
  48. T. van Nuenen, X. Ferrer, J. M. Such and M. Coté (2020) Transparency for Whom? Assessing Discriminatory AI. IEEE Computer, in press. Cited by: §1.
  49. M. Vijaymeena and K. Kavitha (2016) A survey on similarity measures in text mining. Machine Learning and Applications: An International Journal 3 (2), pp. 19–28. Cited by: §6.1.
  50. Z. Watson (2016) Red Pill Men and Women, Reddit, And The Cult of Gender — Inverse. Cited by: §5.1, §5.
  51. A. Wilson and P. Rayson (1993) Automatic content analysis of spoken discourse: a report on work in progress. Corpus based computational linguistics, pp. 215–226. Cited by: §4.2.1.
  52. R. Zemel, Y. Wu, K. Swersky, T. Pitassi and C. Dwork (2013) Learning fair representations. In International Conference on Machine Learning, pp. 325–333. Cited by: §1.
  53. B. H. Zhang, B. Lemoine and M. Mitchell (2018) Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp. 335–340. Cited by: §2, footnote 1.
  54. B. Zhao, J. Ondrich and J. Yinger (2006) Why do real estate brokers continue to discriminate? Evidence from the 2000 Housing Discrimination Study. Journal of Urban Economics 59 (3), pp. 394–419. External Links: ISSN 00941190 Cited by: §1.
  55. J. Zhao, T. Wang, M. Yatskar, R. Cotterell, V. Ordonez and K. Chang (2019) Gender bias in contextualized word embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 629–634. Cited by: §2.