KnowBias: Detecting Political Polarity in Long Text Content


Aditya Saligrama
MIT PRIMES/Weston High School
77 Vassar Street
Cambridge, MA 02139

We introduce a classification scheme for detecting political bias in long text content such as newspaper opinion articles. Obtaining long text data and annotations at sufficient scale for training is difficult, but it is relatively easy to extract political polarity from tweets through their authorship; as such, we train on tweets and perform inference on articles. Universal sentence encoders and other existing methods that aim to address this domain-adaptation scenario deliver inaccurate and inconsistent predictions on articles, which we show is due to a difference in opinion concentration between tweets and articles. We propose a two-step classification scheme that utilizes a neutral detector trained on tweets to remove neutral sentences from articles in order to align opinion concentration and therefore improve accuracy on that domain. Our implementation is available for public use at



Introduction

Rising bias in news media, along with the formation of filter bubbles on social media, where content with the same political slant is repeatedly shared, has contributed to severe partisanship in the American political environment in recent years [6, 3]. We aim to increase awareness of this heightened polarization by alerting users to the political bias in the content they consume.

In this work, we discuss an NLP-based approach that predicts political bias on long text such as news articles independent of metadata such as content origin or authorship. Annotating polarity on long documents at sufficient scale for training is infeasible since doing so requires that humans read each article and manually determine polarity. On the other hand, tweets can be easily gathered in high volume and can be annotated based on authorship.

We envision an approach where we transfer knowledge from tweets to long text at test time. While previous work has attempted to analyze tweets for political sentiment [2], there is no research on domain adaptation from short to long documents in this context. There has been research on filtering text for the purposes of deriving justifiable predictions [4], but not for domain adaptation for our target problem. Universal sentence encoders [1] provide good text representations regardless of target task and we would expect training a classifier on these to provide good performance on all text; however, this approach delivers inaccurate and inconsistent predictions.

We show that this poor performance is due to the existence of neutral, apolitical sentences in articles that dilute opinion concentration compared to tweets. Our proposed method alleviates this issue by using a neutral detector trained on tweets to remove neutral sentences before predicting bias, improving prediction accuracy and consistency. Our work summarizes Saligrama (2019, arXiv:1905.00724).

Predicting Polarity in Text Content

Figure 1: Proposed two-step classification scheme that tokenizes sentences in long documents and uses a neutral detector to filter out neutral sentences. Subsequently, it fuses remaining sentences to make a final prediction via a polarity classifier. Red sentences are polarized; black bold sentences are removed by the neutral detector.

Data collection We train on political tweets due to the aforementioned ease of collecting and annotating them at scale, and aim to transfer this knowledge to longer articles. Our polarity data consisted of roughly 150,000 tweets from 28 Twitter-verified politicians and media personalities across the political spectrum; 80% of these samples were used for training and 20% for testing. We also sampled roughly 80,000 neutral tweets from the Twitter general stream to train the neutral detector.
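The authorship-based annotation and 80/20 split described above can be sketched as follows. This is a minimal illustration: the author-to-polarity map and the tweet format are placeholders, not the actual account list used in the paper.

```python
import random

# Hypothetical author-to-polarity map; the real labels came from 28
# verified accounts across the political spectrum (names are placeholders).
AUTHOR_POLARITY = {"author_left": "liberal", "author_right": "conservative"}

def label_by_authorship(tweets):
    """Annotate each (author, text) tweet with its author's polarity."""
    return [(text, AUTHOR_POLARITY[author])
            for author, text in tweets
            if author in AUTHOR_POLARITY]

def train_test_split(samples, train_frac=0.8, seed=0):
    """Shuffle and split samples 80/20, as in the paper's setup."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

This kind of weak supervision trades labeling cost for label noise: every tweet inherits its author's polarity, even apolitical ones, which is part of what the neutral detector later addresses.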

Baseline approach We use a sentence embedding suite to convert tweets to machine-readable, high dimensional vectors that preserve semantic meaning in vector space. We specifically use the Google Universal Sentence Encoder [1] as it offers good semantic representation regardless of the target task. We trained a deep neural network with two hidden layers on these sentence embeddings. While an 83% test accuracy was achieved on the Twitter test set, we noticed poor performance on long-form articles, with inconsistent and inaccurate predictions.
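The paper's baseline classifier is a DNN with two hidden layers trained on Universal Sentence Encoder embeddings. As a rough illustration, here is a pure-Python forward pass of such a network; the layer sizes, ReLU activations, and sigmoid output head are assumptions, and training via backpropagation (and the embedding step itself) is omitted.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, W, b):
    # Matrix-vector product plus bias: one output per row of W.
    return [sum(wi * xi for wi, xi in zip(row, v)) + bi
            for row, bi in zip(W, b)]

def polarity_score(embedding, params):
    """Forward pass of a two-hidden-layer DNN over a sentence embedding.
    params holds (weights, biases) for each layer; these would be
    learned on the tweet training set."""
    h1 = relu(dense(embedding, *params["layer1"]))
    h2 = relu(dense(h1, *params["layer2"]))
    logit = dense(h2, *params["out"])[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> polarity probability
```

In practice the input would be a 512-dimensional USE embedding and the network would be trained with a standard deep learning framework; the sketch only shows the inference-time computation.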

Opinion concentration We note that a primary stylistic difference between tweets and long-form articles is the existence of neutral and apolitical sentences in the latter medium. These sentences help article flow and cohesion, but also dilute the concentration of opinion compared to tweets. We hypothesize that this difference in opinion concentration is responsible for poor performance on long-form articles. We test this hypothesis by obtaining a set of neutral, apolitical sentences from the Twitter general stream and then augmenting them into the political test data. As demonstrated in Figure 2, accuracy decreases noticeably with the addition of augmented neutral sentences.
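The dilution effect can be illustrated with a toy model in which a document's polarity score is simply the mean of its per-sentence scores. This is a simplification for intuition only; the actual classifier operates on sentence embeddings rather than hand-assigned scores.

```python
def doc_polarity(sentences, sentence_score):
    """Toy document score: average of per-sentence polarity scores,
    where sentence_score returns 1.0 (polarized) or 0.0 (neutral)."""
    scores = [sentence_score(s) for s in sentences]
    return sum(scores) / len(scores)

def augment_with_neutral(doc, neutral_pool, k):
    """Inject k neutral sentences into a document, mirroring the
    augmentation experiment behind Figure 2."""
    return doc + neutral_pool[:k]
```

Under this model a document of four polarized sentences scores 1.0, but after augmenting it with four neutral sentences its score halves to 0.5: the same dilution of opinion concentration observed when moving from tweets to articles.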

Task                           One-Step   Two-Step
Twitter Political - Acc.       82.27%     82.42%
Twitter Crowdsourced - Acc.    86.00%     86.00%
Twitter Crowdsourced - ρ       0.65       0.65
Articles Crowdsourced - Acc.   66.67%     75.00%
Articles Crowdsourced - ρ      0.52       0.69

Table 1: Results across tasks; the article rows (bold in the original) are the transfer experiments, where knowledge learned on tweets is applied at test time. The ρ rows give the Spearman rank correlation against crowdsourced labels. All classifiers were DNN models with two hidden layers.

Neutral detector After identifying the dilution of opinion concentration as the cause of accuracy degradation on long-form articles, we propose adding a classifier to detect and remove neutral sentences. We train a second deep neural network on the sentence embeddings of the 80,000 tweets sampled from the general Twitter stream together with the political samples, obtaining 95.63% accuracy.

Two-step classification scheme We propose a two-step classification scheme in order to improve prediction quality on long-form articles as demonstrated in Figure 1. On any data passed to the system for inference, we first tokenize it into individual sentences. On each of these sentences, we use the neutral detector to mark and remove all neutral sentences. We then fuse the remaining sentences back together, aligning opinion concentration to that of tweets, and then use the main baseline classifier to predict polarity.
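The two-step scheme described above can be sketched as follows, with the neutral detector and polarity classifier passed in as stand-in callables. The regex sentence tokenizer here is an assumption for illustration, not the tokenizer used in the paper.

```python
import re

def two_step_predict(document, is_neutral, polarity_classifier):
    """Two-step scheme: tokenize into sentences, drop neutral ones,
    fuse the remainder, then classify the fused text."""
    # Step 0: naive sentence tokenization on terminal punctuation.
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", document)
                 if s.strip()]
    # Step 1: remove sentences the neutral detector flags.
    opinionated = [s for s in sentences if not is_neutral(s)]
    if not opinionated:
        return "neutral"  # nothing polarized survived filtering
    # Step 2: fuse and classify, aligning opinion concentration to tweets.
    fused = " ".join(opinionated)
    return polarity_classifier(fused)
```

In the full system, both callables would be the embedding-based DNNs described earlier; fusing the surviving sentences into one string lets the polarity classifier see a document whose opinion concentration resembles the tweets it was trained on.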

Figure 2: Degradation of accuracy after neutral-sentence augmentation under the One-Step vs. Two-Step classification approaches. The Two-Step method degrades gracefully relative to the One-Step method because the neutral detector removes the augmented sentences.


Experiments

Datasets We tested our approach on several datasets. The first, Twitter Political, is the 20% held-out split of the political tweet data, consisting of 20,000 samples labeled by authorship. We also selected a separate set of 50 tweets and 24 articles for which we collected crowdsourced annotations from 79 respondents.

Accuracy On the long-form articles, the two-step method increased accuracy significantly, from 66.7% to 75%. We did not expect accuracy on the Twitter datasets to change substantially under the two-step method, since tweets contain few neutral sentences and their opinion concentration is therefore already high; indeed, we observed no significant change.

Spearman's ρ To verify prediction consistency, we computed the Spearman rank correlation [5] against crowd opinions. Table 1 shows that on articles the proposed two-step system (ρ = 0.69) assigns predictions far more consistently with crowdsourced annotations than the baseline one-step method (ρ = 0.52).
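Spearman's ρ is the Pearson correlation of the rank-transformed scores; a self-contained pure-Python version, using average ranks to handle ties, looks like this:

```python
def rank(values):
    """Assign ranks (1-based), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it compares only rank orderings, ρ rewards predictions that order articles by polarity the same way the crowd does, even if the absolute scores differ.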

Conclusions & Future Work

We introduced a two-step classification method to detect polarity in text content without the use of metadata or user details. By aligning opinion concentration across domains, using a neutral detector to remove apolitical sentences, our method performs well on both tweets and long-form articles. Future work may involve exploring the problem of time shift, where positions on new issues are not accurately represented by predictions if the training data is too stale; this reinforces the need for continuous model updates.


  • [1] D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco, R. St. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, Y. Sung, B. Strope, and R. Kurzweil (2018) Universal Sentence Encoder. arXiv:1803.11175.
  • [2] D. Demszky, N. Garg, R. Voigt, J. Zou, M. Gentzkow, J. Shapiro, and D. Jurafsky (2019) Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings. arXiv:1904.01596.
  • [3] J. Kelly and C. François (2018) This is what filter bubbles actually look like. MIT Technology Review.
  • [4] T. Lei, R. Barzilay, and T. Jaakkola (2016) Rationalizing neural predictions. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 107–117.
  • [5] J. H. McDonald (2015) Spearman rank correlation. Handbook of Biological Statistics.
  • [6] R. D. Renka (2010) Political bias in the news media. Southeast Missouri State University.

Acknowledgement The author thanks Prof. Kai-Wei Chang (UCLA) for comments and suggestions in writing this paper.
