Detecting Hate Speech and Offensive Language on Twitter using Machine Learning: An N-gram and TFIDF based Approach
Toxic online content has become a major issue in today's world due to an exponential increase in the use of the internet by people of different cultures and educational backgrounds. Differentiating hate speech from offensive language is a key challenge in the automatic detection of toxic text content. In this paper, we propose an approach to automatically classify tweets on Twitter into three classes: hateful, offensive and clean. Using a Twitter dataset, we perform experiments considering n-grams as features and passing their term frequency-inverse document frequency (TFIDF) values to multiple machine learning models. We perform a comparative analysis of the models, considering several values of n in the n-grams and several TFIDF normalization methods. After tuning the model giving the best results, we achieve 95.6% accuracy upon evaluating it on test data. We also create a module which serves as an intermediary between the user and Twitter.
In the past 10 years, we have seen exponential growth in the number of people using online forums and social networks. Every 60 seconds, around 510,000 comments are generated on Facebook  and around 350,000 tweets on Twitter . The people interacting on these forums and social networks come from different cultures and educational backgrounds. At times, differences in opinion lead to verbal assaults. Moreover, unchecked freedom of speech over the web and the mask of anonymity that the internet provides incite people to use racist slurs and derogatory terms. This can lower the self-esteem of the people targeted, leading to mental illness and a negative impact on society as a whole. Furthermore, toxic language can take various forms, such as cyberbullying, which has been one of the major reasons behind suicides . This issue has been shown to be increasingly important in the last decade, and detecting or removing such content manually from the web is a tedious task. There is therefore a need to devise an automated model that can detect such toxic content on the web.
In order to tackle this issue, we must first be able to define toxic language. We broadly divide toxic language into two categories: hate speech and offensive language. A similar approach was used in the studies  and . According to Wikipedia, hate speech is defined as “any speech that attacks a person or group on the basis of attributes such as race, religion, ethnic origin, national origin, gender, disability, sexual orientation, or gender identity.” We define offensive language as text which uses abusive slurs or derogatory terms.
In this paper, we propose an approach to devise a machine learning model which can differentiate between these two aspects of toxic language. We choose to detect hate speech and offensive text on the Twitter platform. Using publicly available Twitter datasets, we train our classifier model using n-grams and term frequency-inverse document frequency (TFIDF) as features and evaluate its metric scores. We perform a comparative analysis of the results obtained using Logistic Regression, Naive Bayes and Support Vector Machines as classifier models. Our results show that Logistic Regression performs best among the three models for n-gram and TFIDF features after tuning the hyperparameters. We also make use of the Twitter Application Programming Interface (API) to fetch public user tweets from Twitter for detecting tweets containing hate speech or offensive language. Additionally, we create a module which serves as an intermediary between the user and Twitter.
II. Related Work
Various machine learning approaches have been taken to tackle the problem of toxic language. The majority of the approaches deal with feature extraction from the text. Lexical features such as dictionaries  and bag-of-words  were used in some studies. It was observed that these features fail to capture the context of sentences. N-gram based approaches have also been used, and they show comparatively better results .
Although lexical features perform well in detecting offensive entities, without considering the syntactic structure of the whole sentence they fail to distinguish the offensiveness of sentences that contain the same words in different orders . In the same study, the natural language parser proposed by the Stanford Natural Language Processing Group was used to capture the grammatical dependencies within a sentence.
Linguistic features such as part-of-speech tags have also been used in the hate speech detection problem, as shown in ; these approaches consist of detecting the grammatical category of each word, for instance, personal pronouns (PRP), verbs in non-3rd-person singular present form (VBP), adjectives (JJ), determiners (DT) and verb base forms (VB).
There have been several studies on sentiment-based methods to detect abusive language published in the last few years. One example is the work , which applies sentiment analysis to detect bullying in tweets and uses Latent Dirichlet Allocation (LDA) topic models  to identify relevant topics in these texts. Studies have also been conducted on the detection of harassment on Web 2.0 .
More recently, distributed word representations, also referred to as word embeddings, have been proposed for similar purposes . Deep learning techniques have recently been used in text classification and sentiment analysis, for example via the paragraph2vec approach . Convolutional Neural Network (CNN) based classification is also being used, as seen in , where the authors experimented with a system for Twitter hate-speech text classification based on a deep-learning CNN model.
III. Proposed Approach
The review of related work in this field shows that models trained on n-gram features extracted from the text give better results . The TFIDF approach applied to bag-of-words features also shows promising results . Based on this review of features and of the prominent classifiers used for text classification in past work, we decided to extract n-grams from the text and weight them according to their TFIDF values. We feed these features to a machine learning algorithm to perform classification. Given a set of tweets, the aim of this work is to classify each of them into one of three categories: hateful, offensive and clean.
III-A. Dataset

The dataset that we have generated is a combination of three different datasets. The first dataset is publicly available on Crowdflower (https://data.world/crowdflower/hate-speech-identification) and was used in  and . This dataset contains tweets that have been manually classified into one of the following classes: “Hateful”, “Offensive” and “Clean”. The second dataset is also publicly available on Crowdflower (https://data.world/ml-research/automated-hate-speech-detection-data) and consists of tweets with the same classes as described previously. The third dataset is published on GitHub (https://github.com/ZeerakW/hatespeech) and was used in the works  and . It consists of two columns: tweet-ID and class. In this dataset, the tweets corresponding to the tweet-IDs are classified into one of the following three classes: “Sexism”, “Racism” and “Neither”.
III-B. Data Preprocessing
In the data preprocessing stage, we combine the three datasets used for this work. This task involves removing unnecessary columns from the datasets and enumerating the classes. For the third dataset, we retrieve the tweets corresponding to the tweet-IDs present in the dataset, using the Twitter API for this purpose. The classes “Sexism” and “Racism” in this dataset are both considered hate speech according to the definition above.
We convert the tweets to lowercase and remove unnecessary content from the tweets.
We use the Porter Stemmer algorithm to reduce the inflectional forms of the words.
After combining the datasets in a proper format, we randomly shuffle and split the data into two parts: a train set containing 70% of the samples and a test set containing 30% of the samples.
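The shuffle-and-split step above can be sketched with scikit-learn, which the paper uses for its implementation. The tweets and labels below are hypothetical stand-ins, not samples from the actual corpus:

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the combined corpus: tweet texts with class labels
# (the 0/1 encoding here is purely illustrative).
tweets = ["example tweet %d" % i for i in range(10)]
labels = [i % 2 for i in range(10)]

# Randomly shuffle, then hold out 30% of the samples as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.30, shuffle=True, random_state=42
)
print(len(X_train), len(X_test))  # 7 3
```

Fixing `random_state` makes the shuffle reproducible across runs, which matters when comparing classifiers on the same split.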
III-C. Feature Extraction
We extract the n-gram features from the tweets and weight them according to their TFIDF values. The goal of using TFIDF is to reduce the effect of less informative tokens that appear very frequently in the data corpus. Experiments are performed for values of $n$ ranging from one to three; thus, we consider unigram, bigram and trigram features. The formula used to compute the TFIDF of a term $t$ present in a document $d$ is:

$$\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t), \qquad \mathrm{idf}(t) = \log\frac{1 + N}{1 + \mathrm{df}(t)} + 1,$$

where $\mathrm{tf}(t, d)$ is the frequency of term $t$ in document $d$, $N$ is the total number of documents and $\mathrm{df}(t)$ is the number of documents containing $t$.

Both L1 and L2 (Euclidean) normalization of the TFIDF vectors are considered while performing the experiments. For a document vector $v = (v_1, \ldots, v_k)$ of TFIDF values, L1 normalization is defined as:

$$v_{\mathrm{norm}} = \frac{v}{|v_1| + |v_2| + \cdots + |v_k|}$$

Similarly, L2 normalization is defined as:

$$v_{\mathrm{norm}} = \frac{v}{\sqrt{v_1^2 + v_2^2 + \cdots + v_k^2}}$$
We feed these features to machine learning models.
We consider three prominent machine learning algorithms used for text classification: Logistic Regression, Naive Bayes and Support Vector Machines. We train each model on the training dataset, performing a grid search over all combinations of feature parameters with 10-fold cross-validation. The performance of each algorithm is analyzed based on its average cross-validation score for each combination of feature parameters, and the performance of the three algorithms is compared.
Further, the hyperparameters of the two algorithms giving the best results are tuned, using the feature parameters that gave those results. Again, 10-fold cross-validation is performed to measure the results for each combination of hyperparameters of each model. The model giving the highest cross-validation accuracy is evaluated against the test data. We use scikit-learn in Python for the implementation.
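The grid search over feature parameters can be sketched as a scikit-learn pipeline. The corpus below is a hypothetical toy example, and the fold count is reduced from the paper's 10 to 2 only because the toy corpus is tiny:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the combined tweet dataset.
texts = ["good day", "nice work", "great job", "lovely view",
         "awful stuff", "terrible thing", "bad idea", "horrible mess"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])

# Grid over the feature parameters compared in the paper; cross-validation
# picks the combination with the highest average accuracy.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2), (1, 3)],
    "tfidf__norm": ["l1", "l2"],
}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```

Putting the vectorizer inside the pipeline ensures the TFIDF vocabulary is refit on each training fold, so no information leaks from the validation fold into the features.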
TABLE I
Average cross-validation accuracy for each combination of feature parameters

N-gram Range + TFIDF Norm | Naive Bayes | Logistic Regression | SVM
(1,1) + L1                | 0.842       | 0.816               | 0.802
(1,2) + L1                | 0.878       | 0.801               | 0.823
(1,3) + L1                | 0.890       | 0.794               | 0.841
(1,1) + L2                | 0.862       | 0.878               | 0.862
(1,2) + L2                | 0.913       | 0.901               | 0.884
(1,3) + L2                | 0.926       | 0.918               | 0.901
Fig. 1 shows that all three algorithms perform significantly better with L2 normalization of TFIDF. However, SVM performs poorly compared to Naive Bayes and Logistic Regression for L2 normalization. TABLE I shows that the best result for Naive Bayes, 92.6%, is obtained using an n-gram range up to three and TFIDF normalization L2. Similarly, Logistic Regression performs well for the same set of feature parameters, achieving 91.8% accuracy. Since these two values are comparable, we tune both Naive Bayes and Logistic Regression for the n-gram range up to three and TFIDF normalization L2.
TABLE II shows the results after tuning the Naive Bayes algorithm. We consider the smoothing prior $\alpha$ for tuning. The smoothing prior accounts for features not present in the training set and in turn prevents zero probabilities; technically, $\alpha = 1$ is called Laplace smoothing and $\alpha < 1$ is called Lidstone smoothing. Naive Bayes performs best for the value $\alpha = 0.1$, giving 93.4% accuracy.
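The smoothing prior corresponds to the `alpha` parameter of scikit-learn's `MultinomialNB`. A minimal sketch, using hypothetical toy examples rather than the paper's data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy examples; the real model is trained on the tweet corpus.
texts = ["so good", "very good", "so bad", "very bad"]
labels = [1, 1, 0, 0]

X = TfidfVectorizer().fit_transform(texts)

# alpha is the smoothing prior: alpha = 1 gives Laplace smoothing and
# 0 < alpha < 1 gives Lidstone smoothing; the paper's best value is 0.1.
clf = MultinomialNB(alpha=0.1).fit(X, labels)
print(clf.predict(X).tolist())  # [1, 1, 0, 0]
```

A small positive `alpha` still prevents zero probabilities for unseen features while distorting the estimated class-conditional probabilities less than `alpha = 1`.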
TABLE III shows the performance after tuning the Logistic Regression algorithm. Here, we consider the regularization parameter $C$ and the optimization algorithms (solvers) liblinear, newton-cg and saga for performance tuning. The model with $C = 100$ and the liblinear solver gives the best accuracy, 95.1%.
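These two hyperparameters map directly onto scikit-learn's `LogisticRegression`. A sketch with the tuned settings, again on hypothetical toy examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy examples; the real model is trained on the tweet corpus.
texts = ["so good", "very good", "so bad", "very bad"]
labels = [1, 1, 0, 0]

X = TfidfVectorizer().fit_transform(texts)

# C is the inverse regularization strength, so larger C means weaker
# regularization; the paper's grid peaks at C = 100 with the liblinear solver.
clf = LogisticRegression(C=100, solver="liblinear").fit(X, labels)
print(clf.score(X, labels))
```

liblinear works well on sparse high-dimensional TFIDF matrices, which is one reason it is a common default for text classification.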
TABLE III
Cross-validation accuracy after tuning Logistic Regression

Regularization (C) + Solver | Accuracy
10 + liblinear              | 0.949
10 + newton-cg              | 0.948
10 + saga                   | 0.948
100 + liblinear             | 0.951
100 + newton-cg             | 0.950
100 + saga                  | 0.950
Comparing the best accuracies of Naive Bayes and Logistic Regression, we conclude that Logistic Regression performs better. Therefore, we evaluate Logistic Regression on the test data with the settings: n-gram range 1 to 3, TFIDF normalization L2, regularization parameter $C = 100$ and the liblinear solver. The classification scores are shown in TABLE IV.
It is observed that the recall for offensive text is relatively low, at 0.93. This means that 7% of the tweets that are actually offensive were misclassified by the model. Also, the precision for the hateful class is 0.94, which signifies that 6% of the tweets classified as hateful are actually clean or offensive. On the other hand, the recall for the clean class is 0.98, which is significantly better.
In addition to the classification scores, we also compute the confusion matrix for the test results, shown in TABLE V. The key point to notice is that 4.8% of the offensive tweets have been classified as hateful. Improvements can be made in this area to further increase the model's scores. The final test accuracy of the model is 95.6%.
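Such scores are read off the confusion matrix directly; a sketch with hypothetical label arrays (not the paper's actual test results), using the same three-class encoding as an assumption:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (0 = hateful, 1 = offensive, 2 = clean).
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 0, 2, 2, 2, 2, 1]

# Rows are true classes, columns predicted classes; off-diagonal cells count
# misclassifications such as offensive tweets labelled hateful.
print(confusion_matrix(y_true, y_pred))
print(accuracy_score(y_true, y_pred))  # 0.8
```

The overall accuracy is the sum of the diagonal divided by the total count, while per-class precision and recall come from the column and row sums respectively.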
V. Interfacing with Twitter
Our final model is configured to interface with Twitter through the Twitter API, in particular to collect tweets via the Twitter REST API. In Python, the Tweepy library adds this functionality with simplicity. Besides basic information such as the tweet text and the author of the tweet, the Twitter API returns a data structure containing additional information which can be used for further analysis. For each tweet of at most 140 characters, the API returns a JSON document containing several items of metadata presented as key-value pairs, of which id and text are the most important for the purposes of this study.
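As a minimal sketch of consuming such a payload, the fields of interest can be pulled out of the returned JSON with the standard library. The payload below is a pared-down hypothetical example, not a real API response; actual fetching would go through Tweepy with API credentials:

```python
import json

# A hypothetical, heavily trimmed example of the JSON returned for one tweet;
# real payloads carry many more metadata fields.
payload = '{"id": 1234567890, "text": "an example tweet", "lang": "en"}'

tweet = json.loads(payload)
# Only id and text are needed to classify the tweet in this study.
tweet_id, text = tweet["id"], tweet["text"]
print(tweet_id, text)
```

The remaining metadata (language, author, timestamps) can be kept alongside the prediction for later analysis without affecting classification itself.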
We also create an application which acts as a module between the user and Twitter. The architecture of the application is shown in Fig. 2. Through our module, we are able to filter out hateful and offensive tweets being posted by an individual, as well as classify the tweets posted on the user's home timeline, the only limitation being Twitter's read-request rate limit window of 15 minutes.
In this paper, we proposed a solution to the detection of hate speech and offensive language on Twitter through machine learning, using n-gram features weighted with their TFIDF values. We performed a comparative analysis of Logistic Regression, Naive Bayes and Support Vector Machines over various sets of feature values and model hyperparameters. The results showed that Logistic Regression performs best with an n-gram range of 1 to 3 and L2 normalization of TFIDF. Upon evaluating the model on test data, we achieved 95.6% accuracy. It was seen that 4.8% of the offensive tweets were misclassified as hateful; this problem could be mitigated by obtaining more examples of offensive language that do not contain hateful words. The results can be further improved by increasing the recall for the offensive class and the precision for the hateful class. It was also seen that the model does not account for negative words present in a sentence; improvements can be made in this area by incorporating linguistic features.
-  Zephoria.com, 2018. [Online]. Available: https://zephoria.com/top-15-valuable-facebook-statistics/. [Accessed: 22- Jun- 2018].
-  “Twitter Usage Statistics - Internet Live Stats”, Internetlivestats.com, 2018. [Online]. Available: http://www.internetlivestats.com/twitter-statistics/. [Accessed: 22- Jun- 2018].
-  S. Hinduja and J. Patchin, “Bullying, Cyberbullying, and Suicide”, Archives of Suicide Research, vol. 14, no. 3, pp. 206-221, 2010.
-  H. Watanabe, M. Bouazizi and T. Ohtsuki, “Hate Speech on Twitter: A Pragmatic Approach to Collect Hateful and Offensive Expressions and Perform Hate Speech Detection”, IEEE Access, vol. 6, pp. 13825-13835, 2018.
-  T. Davidson, D. Warmsley, M. Macy and I. Weber, “Automated Hate Speech Detection and the Problem of Offensive Language”, in International AAAI Conference on Web and Social Media, 2017.
-  S. Liu and T. Forss, “New classification models for detecting Hate and Violence web content,” 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), Lisbon, 2015, pp. 487-495.
-  P. Burnap and M. Williams, “Us and them: identifying cyber hate on Twitter across multiple protected characteristics”, EPJ Data Science, vol. 5, no. 1, 2016.
-  C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad and Y. Chang, “Abusive Language Detection in Online User Content”, Proceedings of the 25th International Conference on World Wide Web - WWW ’16, 2016.
-  E. Greevy and A. Smeaton, “Classifying racist texts using a support vector machine”, Proceedings of the 27th annual international conference on Research and development in information retrieval - SIGIR ’04, 2004.
-  Y. Chen, Y. Zhou, S. Zhu and H. Xu, “Detecting Offensive Language in Social Media to Protect Adolescent Online Safety”, 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, 2012.
-  D. Blei, A. Ng, M. Jordan and J. Lafferty, “Latent dirichlet allocation”, Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
-  D. Yin, Z. Xue, L. Hong and B. Davison, “Detection of harassment on Web 2.0,” in the Content Analysis in the Web 2.0 Workshop, 2009.
-  T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space”, CoRR, vol. abs/1301.3781, 2013.
-  S. Yuan, X. Wu, and Y. Xiang, “A two phase deep learning model for identifying discrimination from tweets”, in International Conference on Extending Database Technology, 2016, pp. 696-697.
-  B. Gambäck and U. Sikdar, “Using Convolutional Neural Networks to Classify Hate-Speech”, Proceedings of the First Workshop on Abusive Language Online, 2017.
-  Z. Waseem and D. Hovy, “Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter”, Proceedings of the NAACL Student Research Workshop, 2016.