CRAB: Class Representation Attentive BERT for Hate Speech Identification in Social Media
In recent years, social media platforms have hosted an explosion of hate speech and objectionable content. The urgent need for effective automatic hate speech detection models has drawn remarkable investment from companies and researchers. Social media posts are generally short, and their semantics can be drastically altered by even a single token. Thus, it is crucial for this task to learn context-aware input representations and to consider relevancy scores between input embeddings and class representations as an additional signal. To accommodate these needs, this paper introduces CRAB (Class Representation Attentive BERT), a neural model for detecting hate speech in social media. The model benefits from two semantic representations: (i) trainable token-wise and sentence-wise class representations, and (ii) contextualized input embeddings from the state-of-the-art BERT encoder. To investigate the effectiveness of CRAB, we train our model on Twitter data and compare it against strong baselines. Our results show that CRAB achieves a 1.89% relative improvement in macro-averaged F1 over a state-of-the-art baseline. The results of this research open opportunities for future research on automated abusive behavior detection in social media.
1 Introduction and Related Work
Twitter is one of the most popular social media platforms, with people posting several hundred million tweets on a daily basis. Twitter, similar to other existing social networks, greatly suffers from a range of violence, hate speech, and human rights abuses imposed on specific groups or individuals Founta et al. (2018). Hence, it is imperative to protect users by taking proactive steps: developing algorithms that automatically identify hate messages and prevent them from spreading.
Essentially, there are two steps associated with the automatic hate speech detection task: (i) annotated data collection and (ii) model development. For the first step, crowd-sourcing is one of the most common approaches; for the second, researchers have leveraged a variety of Natural Language Processing (NLP) techniques. Pereira-Kohatsu et al. gathered annotated tweets through crowd-sourcing and introduced a social network analyzer which allows researchers to monitor hate speech in tweets. The authors formulated abusive tweet identification as a text classification problem and developed several NLP techniques to accomplish this goal. Similarly, in this paper, we tackle hate speech detection in the setting of a text classification task.
Text classification is one of the fundamental NLP tasks used in social media analysis. Traditional text classification mainly relies on vector space models built on hand-crafted features such as Term Frequency-Inverse Document Frequency (TF-IDF) and n-grams Zhang et al. (2011); Wang and Manning (2012). Gaydhani et al. applied TF-IDF feature extraction followed by traditional machine learning models to detect hate speech in tweets. Although these techniques have been effective in social media mining, they suffer from vocabulary mismatch and ambiguity Croft et al. (2010). Later, deep neural models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) Kim (2014); Zhang et al. (2015); Zahiri and Choi (2017); Ravuri and Stolcke (2015) mitigated the above-mentioned shortcomings by learning dense text representations with minimal hyper-parameter tuning. In the context of social media analysis, Gambäck and Sikdar utilized different variations of CNN-based models to assign each tweet one of the predefined labels. One crucial limitation of these classifiers is that they do not take class representations into consideration. Du et al. resolved this limitation by introducing an interaction mechanism which computes matching scores between the encoded input Qiao et al. (2018) and the classes; the calculated scores were then utilized to predict the class.
More recently, Vaswani et al. developed the transformer model using stacked self-attention and fully connected layers. The authors demonstrated the effectiveness of this model in capturing long-term dependencies from text sequences, similar to recurrent networks. However, owing to its feed-forward architecture, it can be trained more efficiently than typical RNNs. Inspired by this work, Devlin et al. proposed the BERT language model. BERT is a multi-layer bi-directional transformer trained on a very large-scale unlabeled corpus to learn text representations. The fine-tuned BERT encoder has improved text classification performance by a substantial margin on benchmark datasets Sun et al. (2019).
Although state-of-the-art transformer-based models have shown promising results in text classification, similar to CNN- and RNN-based models they do not incorporate information embedded in the class representations. Inspired by Du et al., we introduce CRAB, an interaction-based classifier which relies on the similarity scores between the encoded input and class representations. Our framework comprises three parts: an input representation layer, class representation layers, and an aggregation layer. The input representation layer projects tweets into token-level and sentence-level dense embedding spaces. The class representation layers map classes into latent representations and let the network interact with the encoded input to determine the similarity scores between them. In other words, these layers are trained to learn the matching scores between the classes and each part of the input in an end-to-end fashion. Finally, the aggregation layer combines the matching scores computed in the previous layers and infers the class label. In this paper, we use the annotated Twitter data gathered by Founta et al. (2018). In summary, the contributions of this paper are as follows:
- A new model which leverages matching scores between trainable class representations and the encoded input data to detect hate speech.
- Extensive experiments on Twitter data showing that our proposed model outperforms several strong baselines.
2 Model Overview
In this section we introduce our proposed model, CRAB. The overall architecture of CRAB is illustrated in Figure 1. The objective of this model is to take tweets and classify them into one of the predefined classes (multi-class classification). More specifically, given the training set $\{(x_n, y_n)\}_{n=1}^{N}$ (where $x_n$ is the n-th training example and $y_n$ is its corresponding label structured as a one-hot vector), the goal of the classifier is to learn $f$ such that the empirical risk over the N observations is minimized:

$$\min_{f} \; \frac{1}{N} \sum_{n=1}^{N} L\big(f(x_n), y_n\big)$$
The loss function L is a continuous function that penalizes training error; in this work, we employ the cross-entropy loss. CRAB is comprised of: (i) an input representation layer (Section 2.1), (ii) a token-wise class representation layer (Section 2.2), (iii) a sentence-wise class representation layer (Section 2.3), and (iv) an aggregation layer (Section 2.4).
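For a one-hot label $y_n$ over $c$ classes and a predicted class distribution $\hat{y}_n$, the cross-entropy loss takes its standard form:

```latex
L(y_n, \hat{y}_n) = -\sum_{k=1}^{c} y_{n,k} \,\log \hat{y}_{n,k}
```

Since $y_n$ is one-hot, this reduces to the negative log-probability the model assigns to the correct class.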
2.1 Input Representation Layer
The purpose of the input representation layer is to generate contextualized, fixed-size embedding vectors for the tweets. We utilize the BERT encoder to vectorize the input data. BERT provides a sophisticated text representation by learning from both the left and right context of each token in all layers. CRAB takes all of the BERT embeddings generated from the final block of the transformer. More precisely, this layer encodes an input tweet into a matrix $H \in \mathbb{R}^{N \times d}$, where $h_i$ is the embedding vector for the i-th token, N is the input text length, and d is the embedding dimension.
2.2 Token-wise Class Representation Layer
This layer is devised to allow the neural model to learn how the encoded classes should attend to every single token in the input. To this end, we introduce a multi-head token-wise class representation network A. Each head in this block learns the interactions between the encoded input tokens and the classes independently. This layer takes $H' \in \mathbb{R}^{(N-1) \times d}$ as input; $H'$ is the concatenation of all of the final layer's hidden states except the hidden state corresponding to the first token (the special token [CLS]). Similar to previous studies Du et al. (2019), this layer calculates the matching scores between the classes and the input data using a dot product operation:
where $A_j \in \mathbb{R}^{c \times d}$ is the j-th class representation head and $u_j = H' A_j^{\top} \in \mathbb{R}^{(N-1) \times c}$ for $j \in \{1, \dots, m\}$; the number of classes is denoted by c, and m is the number of class representation heads.
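The dot-product matching can be sketched in NumPy as follows; the array sizes and the names `H_prime`, `A`, and `u` are illustrative assumptions for this sketch, not the authors' released code:

```python
import numpy as np

# Sketch: token-wise matching scores as dot products between the encoded
# tokens H' ((N-1) x d, [CLS] excluded) and m trainable class-representation
# heads A_j (each c x d). Sizes below are assumed for illustration.
rng = np.random.default_rng(0)
n_tokens, d, c, m = 31, 768, 4, 4

H_prime = rng.normal(size=(n_tokens, d))   # contextual token embeddings
A = rng.normal(size=(m, c, d))             # m heads of class representations

# u[j, i, k]: matching score of token i with class k under head j
u = np.einsum("nd,jcd->jnc", H_prime, A)
print(u.shape)  # -> (4, 31, 4)
```

Each head thus produces an (N-1) x c score matrix, giving every token a relevancy score for every class.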
2.3 Sentence-wise Class Representation Layer
The trainable sentence-wise class representation matrix is depicted as S in Figure 1. Given the sentence embedding e as input to this layer, S is tuned to learn sentence-level class representations during the training process. Similar to Subsection 2.2, here we also apply a dot product to compute the matching scores (shown as w) between the sentence embedding e and the sentence-wise class representation S:
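A minimal NumPy sketch of this computation (the shapes and names here are illustrative assumptions, not the paper's code):

```python
import numpy as np

# Sketch: sentence-level matching scores w as the dot product between the
# sentence embedding e (d,) and a trainable class representation matrix
# S (c x d) -- one learned vector per class.
rng = np.random.default_rng(1)
d, c = 768, 4
e = rng.normal(size=(d,))      # e.g. a pooled sentence embedding from BERT
S = rng.normal(size=(c, d))    # sentence-wise class representations

w = S @ e                      # one matching score per class
print(w.shape)  # -> (4,)
```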
Table 2: Classification performance of CRAB variants and baselines (RI: relative improvement).

| Model | Accuracy | Macro-F1 | Macro-Recall | Macro-Precision |
|---|---|---|---|---|
| RNN Lai et al. (2015) | 77.10 | 64.40 | 63.90 | 64.84 |
| CNN Kim (2014) | 77.17 | 64.04 | 62.59 | 65.95 |
| EXAM Du et al. (2019) | 77.34 | 62.75 | 60.80 | 66.36 |
| CRAB-4 w/o SA (Ours) | 81.86 | 69.97 | 68.31 | 72.27 |
| CRAB-4 (Ours) | 82.03 (RI: +0.9%) | 70.20 (RI: +1.89%) | 68.56 (RI: +1.57%) | 72.54 (RI: +1.81%) |
2.4 Aggregation Layer
where $\sigma$ is the LeakyReLU activation function Xu et al. (2015) and $\oplus$ indicates the concatenation operator; the multi-dimensional array of token-level matching scores is shown as u in Figure 1. In the above equations, the projection matrices are trainable weights. Finally, as denoted in Equation 8, the token-level and sentence-level matching scores are combined and passed to a softmax layer to predict the class labels.
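A NumPy sketch of this aggregation under our own simplifying assumptions (max-pooling over tokens and a single trainable projection `W`; the paper's exact Equation 8 may differ):

```python
import numpy as np

# Sketch: pool each head's token-level scores over tokens, concatenate the
# heads, project to c class logits through a LeakyReLU transformation, add
# the sentence-level scores w, and apply softmax.
rng = np.random.default_rng(2)
m, n_tokens, c = 4, 31, 4
u = rng.normal(size=(m, n_tokens, c))   # token-wise matching scores
w = rng.normal(size=(c,))               # sentence-wise matching scores
W = rng.normal(size=(c, m * c))         # trainable aggregation weights (assumed)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

pooled = u.max(axis=1).reshape(-1)      # (m*c,): max over tokens, concat heads
logits = leaky_relu(W @ pooled) + w     # combine token- and sentence-level scores
probs = np.exp(logits - logits.max())
probs /= probs.sum()                    # softmax over the c classes
print(probs.shape)  # -> (4,)
```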
3 Experiments
In this section, we describe our dataset, pre-processing steps, metrics, baseline models, hyper-parameter settings, and experimental procedures utilized to evaluate CRAB.
3.1 Dataset
We trained our model with tweets collected by Founta et al.
3.2 Data Pre-processing
Tweets are full of emojis, emoticons, hashtags, and website links. To clean tweets while preserving as much useful information as possible, we develop a pipeline which maps emotionally similar emojis and emoticons to the same special tokens. Likewise, website links are replaced with a special token. Generally, tweets are short and people do not follow grammatical rules in them. Therefore, we do not apply stop-word removal, stemming, or lemmatization, as these techniques are often imperfect and could lead to information loss.
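The described pipeline can be sketched as below; the special-token names and the (tiny) emoji/emoticon table are our own illustrative assumptions, not the authors' actual mapping:

```python
import re

# Map emotionally similar emojis/emoticons to one shared special token,
# and replace website links with a <URL> token.
EMOTION_MAP = {
    "🙂": "<HAPPY>", "😀": "<HAPPY>", ":)": "<HAPPY>",
    "☹": "<SAD>", ":(": "<SAD>",
}
URL_RE = re.compile(r"https?://\S+|www\.\S+")

def clean_tweet(text: str) -> str:
    text = URL_RE.sub("<URL>", text)            # links -> special token
    for symbol, token in EMOTION_MAP.items():   # similar emojis -> one token
        text = text.replace(symbol, token)
    return " ".join(text.split())               # normalize whitespace

print(clean_tweet("so cool 😀 :) see https://t.co/abc"))
# -> "so cool <HAPPY> <HAPPY> see <URL>"
```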
3.3 Setup Details
BERT was initialized with pre-trained weights, which we fine-tuned for our downstream task during training. We chose a batch size of 32, and the numbers of neurons in the two transformations were 64 and 128, respectively. The model was implemented in PyTorch Paszke et al. (2017) and trained on a single NVIDIA P100 GPU.
3.4 Baselines and Metrics
As listed in Table 2, we compare our approach with several baselines to evaluate the effectiveness of our proposed model. The input to the Naive Bayes classifier is TF-IDF Zhang et al. (2011) feature vectors. As part of our neural baselines, we employ a CNN and an RNN as well as EXAM Du et al. (2019); the word embedding size of all aforementioned neural models is set to 200. We also include the classification performance of feeding BERT's special [CLS] token embedding (BERT-CLS) and the average-pooled BERT embedding vectors (BERT-Avg-P) to a linear classifier; in both cases, the outputs of the last hidden layer are sent to the classifier. Given that our class distribution is imbalanced, we consider macro-averaged F1, precision, and recall as well as accuracy to quantify the prediction performance of the models.
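Macro averaging computes each metric per class and then takes the unweighted mean, so rare classes count as much as frequent ones; this is why it suits imbalanced data. A small self-contained sketch with toy labels (the helper name and values are illustrative):

```python
# Macro-averaged precision, recall, and F1 from per-class counts.
def macro_prf1(y_true, y_pred, classes):
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec); recalls.append(rec); f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

p, r, f = macro_prf1([0, 0, 1, 2], [0, 1, 1, 2], classes=[0, 1, 2])
print(round(p, 3), round(r, 3), round(f, 3))  # -> 0.833 0.833 0.778
```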
3.5 Empirical Results and Discussion
Table 2 shows the performance of different variations of our model, CRAB-n, as well as the baselines. Bold numbers denote the best performance, and underlined numbers denote the best performance among the baselines only. The letter n in CRAB-n indicates the number of heads in the token-wise class representation layer; during our experiments, we tried various values of n. To evaluate the effectiveness of the sentence-wise class representation layer, we also report the performance of CRAB-4 with this layer removed (depicted as CRAB-4 w/o SA in Table 2). As shown in Table 2, CRAB-4 consistently outperformed all baselines and the other CRAB variations. Our model, CRAB-4, achieved a 1.89% relative improvement in macro-averaged F1 and a 0.9% relative improvement in accuracy compared to BERT-Avg-P and BERT-CLS, respectively. In terms of macro-averaged precision and recall, there are relative improvements of 1.81% and 1.57% over BERT-Avg-P, respectively. To further analyze the performance of CRAB-4 and understand how it handles imbalanced classes, we conducted an error analysis on each class. Compared to our baseline BERT models, we noticed that CRAB-4 obtained a 1% macro-F1 boost on the two major classes and a 2% gain on the two minor classes. We hypothesize that the minor classes benefited the most from this architecture. It is worth mentioning that our model can be extended to multi-label classification by simply replacing the softmax with a sigmoid layer. We emphasize that in our proposed architecture, the input representation layer is not limited to BERT; any other transformer-based encoder can be used instead.
4 Conclusion and Future Work
In this paper, we introduced CRAB, a neural model to identify hate speech in Twitter data. CRAB incorporates both word- and class-level information from tweets into the hate speech identification process. CRAB significantly outperformed the state-of-the-art BERT-based baseline by 1.89% in relative macro F1. Our future work includes evaluating the effectiveness of this model on extreme multi-class and multi-label problems and adapting CRAB to other online abusive behavior detection tasks.
References
- Croft et al. (2010). Search engines: information retrieval in practice. Vol. 520, Addison-Wesley Reading.
- Devlin et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Du et al. (2019). Explicit interaction model towards text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 6359–6366.
- Founta et al. (2018). Large scale crowdsourcing and characterization of Twitter abusive behavior. In Twelfth International AAAI Conference on Web and Social Media.
- Gambäck and Sikdar. Using convolutional neural networks to classify hate-speech. In Proceedings of the First Workshop on Abusive Language Online, pp. 85–90.
- Gaydhani et al. Detecting hate speech and offensive language on Twitter using machine learning: an n-gram and TF-IDF based approach. arXiv preprint arXiv:1809.08651.
- Kim (2014). Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.
- Lai et al. (2015). Recurrent convolutional neural networks for text classification. In Twenty-Ninth AAAI Conference on Artificial Intelligence.
- Paszke et al. (2017). Automatic differentiation in PyTorch.
- Pereira-Kohatsu et al. Detecting and monitoring hate speech in Twitter. Sensors 19 (21), pp. 4654.
- Qiao et al. (2018). A new method of region embedding for text classification. In ICLR.
- Ravuri and Stolcke (2015). Recurrent neural network and LSTM models for lexical utterance classification. In Sixteenth Annual Conference of the International Speech Communication Association.
- Sun et al. (2019). How to fine-tune BERT for text classification? In China National Conference on Chinese Computational Linguistics, pp. 194–206.
- Vaswani et al. Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
- Wang and Manning (2012). Baselines and bigrams: simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, Vol. 2, pp. 90–94.
- Xu et al. (2015). Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853.
- Zahiri and Choi (2017). Emotion detection on TV show transcripts with sequence-based convolutional neural networks. arXiv preprint arXiv:1708.04299.
- Zhang et al. (2011). A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Systems with Applications 38 (3), pp. 2758–2765.
- Zhang et al. (2015). Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, pp. 649–657.