Reducing Gender Bias in Word-Level Language Models with a Gender-Equalizing Loss Function

Yusu Qian*, Tandon School of Engineering, New York University, 6 MetroTech Center, Brooklyn, NY, 11201
Urwa Muaz*, Tandon School of Engineering, New York University, 6 MetroTech Center, Brooklyn, NY, 11201
Ben Zhang, Center for Data Science, New York University, 60 Fifth Avenue, New York, NY, 10012
Jae Won Hyun, Department of Computer Science, New York University, 251 Mercer St, New York, NY, 10012

Gender bias exists in natural language datasets, and neural language models tend to learn it, resulting in biased text generation. In this research, we propose a debiasing approach based on modifying the loss function. We introduce a new term to the loss function that attempts to equalize the probabilities of male and female words in the output. Using an array of bias evaluation metrics, we provide empirical evidence that our approach successfully mitigates gender bias in language models without substantially increasing perplexity. In comparison to existing debiasing strategies, namely data augmentation and word embedding debiasing, our method performs better in several aspects, especially in reducing gender bias in occupation words. Finally, we introduce a combination of data augmentation and our approach, and show that it outperforms existing strategies in all bias evaluation metrics.

1 Introduction

Natural Language Processing (NLP) models have been shown to capture unwanted biases and stereotypes found in the training data, which raises concerns about socioeconomic, ethnic and gender discrimination when these models are deployed for public use (Lu et al., 2018; Zhao et al., 2018).

There are numerous studies that identify algorithmic bias in NLP applications. Lapowsky (2018) showed ethnic bias in Google autocomplete suggestions whereas Lambrecht and Tucker (2018) found gender bias in advertisement delivery systems. Additionally, Zhao et al. (2018) demonstrated that coreference resolution systems exhibit gender bias.

Language modelling is a pivotal task in NLP with important downstream applications such as text generation (Sutskever et al., 2011). Recent studies by Lu et al. (2018) and Bordia and Bowman (2019) have shown that this task is vulnerable to gender bias in the training corpus. Two prior works focused on reducing bias in language modelling by data preprocessing (Lu et al., 2018) and word embedding debiasing (Bordia and Bowman, 2019). In this study, we investigate the efficacy of bias reduction during training by introducing a new loss function which encourages the language model to equalize the probabilities of predicting gendered word pairs like he and she. Although we recognize that gender is non-binary, for the purpose of this study, we focus on female and male words.

Our main contributions are summarized as follows: i) to the best of our knowledge, this study is the first to investigate bias alleviation in text generation by direct modification of the loss function; ii) our new loss function effectively reduces gender bias in the language models during training by equalizing the probabilities of male and female words in the output; iii) we show that end-to-end debiasing of the language model can achieve word embedding debiasing; iv) we provide an interpretation of our results and draw a comparison to other existing debiasing methods. We show that our method, combined with an existing method, counterfactual data augmentation, achieves the best result and outperforms all existing methods.

2 Related Work

Recently, the study of bias in NLP applications has received increasing attention from researchers. Most relevant work in this domain can be broadly divided into two categories: word embedding debiasing and data debiasing by preprocessing.

Word Embedding Debiasing

Bolukbasi et al. (2016) introduced the idea of a gender subspace as a low-dimensional space in an embedding that captures gender information. Bolukbasi et al. (2016) and Zhao et al. (2017) defined gender bias as a projection of gender-neutral words on a gender subspace and removed bias by minimizing this projection. Gonen and Goldberg (2019) demonstrated that bias removal techniques based on minimizing projection onto the gender space are insufficient. They showed that male and female stereotyped words cluster together even after such debiasing treatments. Thus, gender bias still remains in the embeddings and is easily recoverable.

Bordia and Bowman (2019) introduced a co-occurrence based metric to measure gender bias in texts and showed that the standard datasets used for language model training exhibit strong gender bias. They also showed that the models trained on these datasets amplify bias measured on the model-generated texts. Using the same definition of embedding gender bias as Bolukbasi et al. (2016), Bordia and Bowman (2019) introduced a regularization term that aims to minimize the projection of neutral words onto the gender subspace. Throughout this paper, we refer to this approach as REG. They found that REG reduces bias in the generated texts for some regularization coefficient values. We argue that this method has two shortcomings. First, the bias definition was shown to be incomplete by Gonen and Goldberg (2019); second, an embedding is not the sole source of gender bias in downstream applications. Instead of explicit geometric debiasing of the word embedding, we implement a loss function that minimizes bias in the output and thus adjusts the whole network accordingly. For each model, we analyze the generated word embedding to understand how it is affected by output debiasing.

Data Debiasing

Lu et al. (2018) showed that gender bias in coreference resolution and language modelling can be mitigated through a data augmentation technique that expands the corpus by swapping gender pairs such as he and she, or father and mother. They called this Counterfactual Data Augmentation (CDA) and concluded that it outperforms the word embedding debiasing strategy proposed by Bolukbasi et al. (2016). CDA doubles the size of the training data and increases the time needed to train language models. In this study, we intend to reduce bias during training without requiring an additional data preprocessing step.

3 Methodology

3.1 Dataset

For the training data, we use Daily Mail news articles released by Hermann et al. (2015). This dataset is composed of 219,506 articles covering a diverse range of topics including business, sports, travel, etc., and is claimed to be biased and sensational (Bordia and Bowman, 2019). For manageability, we randomly subsample 5% of the text. The subsample has around 8.25 million tokens in total.

3.2 Language Model

We use a pre-trained 300-dimensional word embedding, GloVe, by Pennington et al. (2014). We apply random search to the hyperparameter tuning of the LSTM language model. The best hyperparameters are as follows: 2 hidden layers each with 300 units, a sequence length of 35, a learning rate of 20 with an annealing schedule of decay starting from 0.25 to 0.95, a dropout rate of 0.25 and a gradient clip of 0.25. We train our models for 150 epochs, use a batch size of 48, and set early stopping with a patience of 5.

3.3 Loss Function

Language models are usually trained using cross-entropy loss. Cross-entropy loss at time step $t$ is

$$L^{CE}(t) = -\sum_{w \in V} y_{w,t} \log \hat{y}_{w,t}$$

where $V$ is the vocabulary, $y$ is the one hot vector of ground truth and $\hat{y}$ indicates the output softmax probability of the model.

We introduce a loss term $L^B$, which aims to equalize the predicted probabilities of gender pairs such as woman and man.

$$L^B(t) = \frac{1}{G} \sum_{i}^{G} \left| \log \frac{\hat{y}_{f_i,t}}{\hat{y}_{m_i,t}} \right|$$

$f$ and $m$ are a set of corresponding gender pairs, $G$ is the size of the gender pairs set, and $\hat{y}$ indicates the output softmax probability. We use gender pairs provided by Zhao et al. (2017). By considering only gender pairs we ensure that only gender information is neutralized and distribution over semantic concepts is not altered. For example, it will try to equalize the probabilities of congressman with congresswoman and actor with actress but distribution of congressman, congresswoman versus actor, actress will not be affected. Overall loss can be written as

$$L = \frac{1}{T} \sum_{t=1}^{T} \left( L^{CE}(t) + \lambda L^B(t) \right)$$

where $\lambda$ is a hyperparameter and $T$ is the corpus size. We observe that among the similar minima of the loss function, $L^B$ encourages the model to converge towards a minimum that exhibits the lowest gender bias.
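The combined objective can be illustrated with a small, framework-free sketch. It assumes the gender-equalizing term is the mean absolute log-ratio of the softmax probabilities of each female/male pair, added to the cross-entropy with a weight `lam` and averaged over time steps. The function names, the toy vocabulary and the probabilities are hypothetical and are not taken from the authors' implementation.

```python
import math

def cross_entropy(probs, target):
    """Negative log-probability of the ground-truth word at one time step."""
    return -math.log(probs[target])

def gender_equalizing_term(probs, gender_pairs):
    """Mean absolute log-ratio of female to male probabilities over all pairs."""
    total = sum(abs(math.log(probs[f] / probs[m])) for f, m in gender_pairs)
    return total / len(gender_pairs)

def total_loss(step_probs, targets, gender_pairs, lam):
    """Average over time steps of cross-entropy plus lam times the bias term."""
    per_step = [
        cross_entropy(p, t) + lam * gender_equalizing_term(p, gender_pairs)
        for p, t in zip(step_probs, targets)
    ]
    return sum(per_step) / len(per_step)

# Toy softmax output over a five-word vocabulary (hypothetical numbers).
probs = {"he": 0.4, "she": 0.1, "actor": 0.2, "actress": 0.2, "is": 0.1}
pairs = [("she", "he"), ("actress", "actor")]

loss_plain = total_loss([probs], ["is"], pairs, lam=0.0)     # cross-entropy only
loss_debiased = total_loss([probs], ["is"], pairs, lam=1.0)  # adds the penalty
```

In this toy case only the he/she pair contributes to the penalty, because actor and actress already receive equal probability; the relative probability of the actor/actress pair versus other words is left untouched.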

3.4 Model Evaluation

Language models are evaluated using perplexity, which is a standard measure of performance for unseen data. For bias evaluation, we use an array of metrics to provide a holistic diagnosis of the model behavior under debiasing treatment. These metrics are discussed in detail below.
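For reference, perplexity is conventionally the exponential of the average negative log-probability that the model assigns to the held-out tokens. The sketch below assumes those per-token probabilities are already available as a list; `target_probs` is a hypothetical input name.

```python
import math

def perplexity(target_probs):
    """exp of the mean negative log-probability of the ground-truth tokens."""
    nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(nll)

# A model that spreads probability uniformly over four choices at every step.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
```

Lower values mean the model is less surprised by unseen text, which is why a debiasing method that raises perplexity only slightly is preferable.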










Table 1: Example templates of two types of occupation bias. (a) Occupation bias conditioned on gendered words, e.g. "He is a | doctor". (b) Occupation bias conditioned on occupations, e.g. "The doctor is a | man".

3.4.1 Co-occurrence Bias

Co-occurrence bias is computed from the model-generated texts by comparing the occurrences of all gender-neutral words with female and male words. A word is considered to be biased towards a certain gender if it occurs more frequently with words of that gender. This definition was first used by Zhao et al. (2017) and later adapted by Bordia and Bowman (2019). Using the definition of gender bias similar to the one used by Bordia and Bowman (2019), we define gender bias as

$$B^N = \frac{1}{|N|} \sum_{w \in N} \left| \log \frac{c(w, m)}{c(w, f)} \right|$$

where $N$ is a set of gender-neutral words, and $c(w, g)$ is the occurrences of a word $w$ with words of gender $g$ in the same window. This score is designed to capture unequal co-occurrences of neutral words with male and female words. Co-occurrences are computed using a sliding window of size 10 extending equally in both directions. Furthermore, we only consider words that occur more than 20 times with gendered words to exclude random effects.
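The counting procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the window of size 10 is read as 5 tokens on each side (`half_window=5`), the score is taken to be the mean absolute log-ratio of male to female co-occurrence counts, and words with a zero count for either gender are skipped because the ratio is undefined there. All function and parameter names are hypothetical.

```python
import math
from collections import Counter

def cooccurrence_counts(tokens, male, female, half_window=5):
    """Count how often each neutral word has a male or female word nearby."""
    counts = {"m": Counter(), "f": Counter()}
    for i, word in enumerate(tokens):
        if word in male or word in female:
            continue  # only gender-neutral words are scored
        lo = max(0, i - half_window)
        hi = min(len(tokens), i + half_window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            if tokens[j] in male:
                counts["m"][word] += 1
            elif tokens[j] in female:
                counts["f"][word] += 1
    return counts

def cooccurrence_bias(counts, min_count=20):
    """Mean absolute log-ratio of male to female co-occurrence counts."""
    scores = []
    for word in set(counts["m"]) | set(counts["f"]):
        c_m, c_f = counts["m"][word], counts["f"][word]
        # Keep only words seen more than min_count times with gendered words;
        # skipping zero counts is an assumption of this sketch.
        if c_m + c_f <= min_count or c_m == 0 or c_f == 0:
            continue
        scores.append(abs(math.log(c_m / c_f)))
    return sum(scores) / len(scores) if scores else 0.0

# Toy usage with a tiny window and no frequency threshold.
tokens = "he is a doctor and she is a nurse".split()
counts = cooccurrence_counts(tokens, male={"he"}, female={"she"}, half_window=2)
score = cooccurrence_bias(counts, min_count=0)
```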

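We also evaluate a normalized version of $B^N$ which we denote by conditional co-occurrence bias, $B^N_c$. This is defined as

$$B^N_c = \frac{1}{|N|} \sum_{w \in N} \left| \log \frac{P(w \mid m)}{P(w \mid f)} \right|$$

where

$$P(w \mid g) = \frac{c(w, g)}{\sum_i c(w_i, g)}.$$

$B^N_c$ is less affected by the disparity in the general distribution of male and female words in the text. Since the disparity between the occurrences of the two is also a form of bias, we report the ratio of occurrence of male and female words, $GR$, as

$$GR = \frac{c(m)}{c(f)}.$$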

3.4.2 Causal Bias

Another way of quantifying bias in NLP models is based on the idea of causal testing. The model is exposed to paired samples which differ only in one attribute (e.g. gender) and the disparity in the output is interpreted as bias related to that attribute. Zhao et al. (2018) and Lu et al. (2018) applied this method to measure bias in coreference resolution and Lu et al. (2018) also used it for evaluating gender bias in language modelling.

Following an approach similar to Lu et al. (2018), we limit this bias evaluation to a set of gender-neutral occupations. We create a list of sentences based on a set of templates. There are two sets of templates used for evaluating causal occupation bias (Table 1). The first set of templates is designed to measure how the probabilities of occupation words depend on the gender information in the seed. Below is an example of the first set of templates:

He is a | doctor

Here, the vertical bar separates the seed sequence that is fed into the language models from the target occupation, for which we observe the output softmax probability. We measure causal occupation bias conditioned on gender as

$$CB|g = \frac{1}{|O|} \frac{1}{G} \sum_{o \in O} \sum_{i}^{G} \left| \log \frac{p(o \mid f_i)}{p(o \mid m_i)} \right|$$

where $O$ is a set of gender-neutral occupations and $G$ is the size of the gender pairs set. For example, $p(\text{doctor} \mid \text{he})$ is the softmax probability of the word doctor where the seed sequence is He is a. The second set of templates, like the one below, aims to capture how the probabilities of gendered words depend on the occupation words in the seed.

The doctor is a | man

Causal occupation bias conditioned on occupation is represented as

$$CB|o = \frac{1}{|O|} \frac{1}{G} \sum_{o \in O} \sum_{i}^{G} \left| \log \frac{p(f_i \mid o)}{p(m_i \mid o)} \right|$$

where $O$ is a set of gender-neutral occupations and $G$ is the size of the gender pairs set. For example, $p(\text{man} \mid \text{doctor})$ is the softmax probability of man where the seed sequence is The doctor is a.

We believe that both $CB|g$ and $CB|o$ contribute to gender bias in the model-generated texts. We also note that $CB|o$ is more easily influenced by the general disparity in male and female word probabilities.
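Both causal metrics can be sketched with one helper. The sketch assumes each score is the absolute log-ratio between the female and male variants of a template, averaged over occupations and gender pairs. `prob(target, seed_word)` is a hypothetical callable standing in for a trained language model queried with the filled template, and the lookup table holds made-up probabilities.

```python
import math

def causal_occupation_bias(prob, occupations, gender_pairs, conditioned_on):
    """Mean absolute log-ratio between female and male template variants."""
    total = 0.0
    for occupation in occupations:
        for female, male in gender_pairs:
            if conditioned_on == "gender":
                # Seeds such as "She is a" / "He is a"; target is the occupation.
                p_f, p_m = prob(occupation, female), prob(occupation, male)
            else:
                # Seed such as "The doctor is a"; targets are the gendered words.
                p_f, p_m = prob(female, occupation), prob(male, occupation)
            total += abs(math.log(p_f / p_m))
    return total / (len(occupations) * len(gender_pairs))

# Hypothetical probabilities standing in for softmax outputs of a model.
table = {
    ("doctor", "he"): 0.02, ("doctor", "she"): 0.01,
    ("man", "doctor"): 0.03, ("woman", "doctor"): 0.03,
}

def prob(target, seed_word):
    return table[(target, seed_word)]

bias_given_gender = causal_occupation_bias(
    prob, ["doctor"], [("she", "he")], "gender")
bias_given_occupation = causal_occupation_bias(
    prob, ["doctor"], [("woman", "man")], "occupation")
```

In this toy setting the model is twice as likely to continue "He is a" with doctor as "She is a", so the gender-conditioned score is positive, while man and woman are equally likely after "The doctor is a", so the occupation-conditioned score is zero.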

3.4.3 Word Embedding Bias

Our debiasing approach does not explicitly address the bias in the embedding layer. Therefore, we use gender-neutral occupations to measure the embedding bias to observe if debiasing the output layer also decreases the bias in the embedding. We define the embedding bias, $EB_d$, as the difference between the Euclidean distance of an occupation word to male words and the distance of the occupation word to the female counterparts. This definition captures the embedding bias described by Bolukbasi et al. (2016) as

$$EB_d = \sum_{o \in O} \sum_{i}^{G} \left| \, \lVert E(o) - E(m_i) \rVert_2 - \lVert E(o) - E(f_i) \rVert_2 \, \right|$$

where $O$ is a set of gender-neutral occupations, $G$ is the size of the gender pairs set and $E$ is the word-to-vector dictionary.
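The distance-based definition can be illustrated with a few two-dimensional vectors. The sketch assumes the score is an unnormalized sum, over occupations and gender pairs, of the absolute difference between the distance to the male word and the distance to the female word; whether the original metric applies any further normalization is not recoverable from the text here. The embedding values are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def embedding_bias(E, occupations, gender_pairs):
    """Sum of |dist(occupation, male) - dist(occupation, female)|."""
    total = 0.0
    for o in occupations:
        for f, m in gender_pairs:
            total += abs(euclidean(E[o], E[m]) - euclidean(E[o], E[f]))
    return total

# Hypothetical 2-d embeddings; real models here use 300 dimensions.
E = {
    "he": (3.0, 4.0),
    "she": (0.0, 5.0),
    "doctor": (0.0, 0.0),  # equally far from "he" and "she"
    "nurse": (0.0, 4.0),   # closer to "she" than to "he"
}
balanced = embedding_bias(E, ["doctor"], [("she", "he")])
skewed = embedding_bias(E, ["nurse"], [("she", "he")])
```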

3.5 Existing Approaches

We apply CDA where we swap all the gendered words using a bidirectional dictionary of gender pairs described by Lu et al. (2018). This creates a dataset twice the size of the original data, with exactly the same contextual distributions for both genders and we use it to train the language models.
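The swapping step can be sketched as below. This is a simplified illustration that assumes lowercase, pre-tokenized sentences and a one-to-one pair list; it ignores ambiguous mappings such as her, which can correspond to either his or him, and any other special cases handled by Lu et al. (2018). The function name and the pair list are hypothetical.

```python
def counterfactual_augment(sentences, pairs):
    """Return the original sentences followed by gender-swapped copies."""
    swap = {}
    for a, b in pairs:
        swap[a], swap[b] = b, a  # bidirectional dictionary
    swapped = [[swap.get(tok, tok) for tok in sent] for sent in sentences]
    return sentences + swapped

corpus = [["he", "is", "a", "doctor"], ["the", "actress", "smiled"]]
augmented = counterfactual_augment(corpus, [("he", "she"), ("actor", "actress")])
```

The augmented corpus has twice as many sentences as the original, which is the source of the longer training time noted for CDA.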

We also implement the bias regularization method of Bordia and Bowman (2019) which debiases the word embedding during language model training by minimizing the projection of neutral words on the gender axis. We use hyperparameter tuning to find the best regularization coefficient and report results from the model trained with this coefficient. We later refer to this strategy as REG.

Model                    $B^N$    $B^N_c$   $GR$     Ppl.      $CB|o$   $CB|g$    $EB_d$
Baseline                 0.531    0.282     1.415    117.845   1.447    97.762    0.528
REG                      0.381    0.329     1.028    114.438   1.861    108.740   0.373
CDA                      0.208    0.149     1.037    117.976   0.703    56.82     0.268
$\lambda_{0.01}$         0.492    0.245     1.445    118.585   0.111    9.306     0.077
$\lambda_{0.1}$          0.459    0.208     1.463    118.713   0.013    2.326     0.018
$\lambda_{0.5}$          0.312    0.173     1.252    120.344   0.000    1.159     0.006
$\lambda_{0.8}$          0.226    0.151     1.096    119.792   0.001    1.448     0.002
$\lambda_{1}$            0.218    0.153     1.049    120.973   0.000    0.999     0.002
$\lambda_{2}$            0.221    0.157     1.020    123.248   0.000    0.471     0.000
$\lambda_{0.5}$ + CDA    0.205    0.145     1.012    117.971   0.000    0.153     0.000
Table 2: Evaluation results for models trained on Daily Mail and their generated texts

4 Experiments

After training the baseline model, we implement our loss function and tune the hyperparameter $\lambda$. We also test the existing debiasing approaches, CDA and REG. Because Bordia and Bowman (2019) reported that results fluctuate substantially with different REG regularization coefficients, we perform hyperparameter tuning and report the best results in Table 2. Additionally, we implement a combination of our loss function and CDA and tune $\lambda$ for it. Finally, bias evaluation is performed for all the trained models. Causal occupation bias is measured directly from the models using the template datasets discussed above, and co-occurrence bias is measured from the model-generated texts, which consist of 10,000 documents of 500 words each.

4.1 Dataset Bias

$B^N$ and $B^N_c$ of our chosen subset of Daily Mail stories are 0.340 and 0.213, respectively.

4.2 Results

Results for the experiments are listed in Table 2. From measurements using the described bias metrics, our method effectively mitigates bias in language modelling without a significant increase in perplexity. At a $\lambda$ value of 1, it reduces $B^N$ by 58.95%, $B^N_c$ by 45.74%, $CB|o$ by 100%, $CB|g$ by 98.52% and $EB_d$ by 98.98%. Compared to the results of CDA and REG, it achieves the best results in both occupation biases, $CB|g$ and $CB|o$, and $EB_d$. We notice that all methods result in $GR$ around 1, indicating that there are near equal amounts of female and male words in the generated texts. In our experiments we note that with increasing $\lambda$, the bias steadily decreases and perplexity tends to slightly increase. This indicates that there is a trade-off between bias and perplexity.

REG is not very effective in mitigating bias when compared to other methods, and fails to achieve the best result in any of the bias metrics that we used. However, REG results in the best perplexity and even does better than the baseline model in this respect. This indicates that REG has a slight regularization effect. Additionally, it is interesting to note that our loss function outperforms REG in $EB_d$ even though REG explicitly aims to reduce gender bias in the embeddings. Although our method does not explicitly attempt geometric debiasing of the word embedding, the results show that it produces the most debiased embedding as compared to other methods. Furthermore, Gonen and Goldberg (2019) emphasize that geometric gender bias in word embeddings is not completely understood and existing word embedding debiasing strategies are insufficient. Our approach provides an appealing end-to-end solution for model debiasing without relying on any measure of bias in the word embedding. We believe this concept is generalizable to other NLP applications.

CDA achieves slightly better results for co-occurrence biases, $B^N$ and $B^N_c$, and results in a better perplexity. With a marginal difference, our results are comparable to those of CDA for both co-occurrence biases and both models seem to have similar bias mitigation effects. However, our method does not require a data augmentation step and allows training of an unbiased model directly from biased datasets. For this reason, it also requires less time to train than CDA since its training data has a smaller size without data augmentation. Furthermore, CDA fails to effectively mitigate occupation bias when compared to our approach. Although the training data for CDA does not contain gender bias, the model still exhibits some gender bias when measured with our causal occupation bias metrics. This reinforces the concept that some model-level constraints are essential to debiasing a model and dataset debiasing alone cannot be trusted.

Finally, we note that the combination of CDA and our loss function outperforms all the methods in all measures of biases without compromising perplexity. Therefore, it can be argued that a cascade of these approaches can be used to optimally debias the language models.

5 Conclusion and Discussion

In this research, we propose a new approach for mitigating gender bias in neural language models and empirically show that our method outperforms existing methods. Our research also highlights the fact that debiasing the model with bias penalties in the loss function is an effective method. We emphasize that loss function based debiasing is powerful and generalizable to other downstream NLP applications. The research also reinforces the idea that geometric debiasing of the word embedding is not a complete solution for debiasing the downstream applications but encourages end-to-end approaches to debiasing.

Future work includes designing a context-aware version of our loss function which can distinguish between the unbiased and biased mentions of the gendered words and only penalize the biased version, as well as mitigating racial bias which brings more challenges.

6 Acknowledgment

We are grateful to Sam Bowman for helpful advice, Shikha Bordia, Cuiying Yang, Gang Qian, Xiyu Miao, Qianyi Fan, Tian Liu, and Stanislav Sobolevsky for discussions, and reviewers for detailed feedback.

