A neural network to classify metaphorical violence on cable news


I present here an experimental system for identifying and annotating metaphor in corpora. It is designed to plug in to Metacorps, an experimental web app for annotating metaphor. As Metacorps users annotate metaphors, the system will use user annotations as training data. When the system is confident, it will suggest an identification and an annotation. Once approved by the user, this becomes more training data. This naturally allows for transfer learning, where the system can, with some known degree of reliability, classify one class of metaphor after only being trained on another class of metaphor. For example, in our metaphorical violence project, metaphors may be classified by the network they were observed on, the grammatical subject or object of the violence metaphor, or the violent word used (hit, attack, beat, etc.).


1 Introduction

Metaphor is often thought of as a decorative instrument of literature. It is more scientifically productive to understand metaphor as a basic human cognitive capacity used to represent relationships between concepts. Mathematics, for example, may be understood as an elaborate metaphorical scaffolding where “embodied” concepts are everywhere, such as the fact that multiplying by i rotates a point in the complex plane by π/2 radians. Rotation is something we do every time we drive a car, and in many other situations, so in that way i is physically intuitive because we have been rotating physical objects our entire lives [1, 2, 3]. In this paper we present a system to identify metaphorical violence (MV) on cable news surrounding presidential elections.¹ Just as our intuitions about physical transformations of physical objects help us reason about abstract mathematics, our intuitions about violence serve as heuristics for rather more complex political events (for a good discussion of the importance of heuristics to cognition, see [4]). Specifically, we present a neural network classifier for determining whether or not a phrase is MV. This is an important part of a larger effort to improve the throughput of an existing metaphor annotation software application we are developing called Metacorps.² Knowing which metaphors are used in a society, and when, is important because metaphors are representations of that society’s conceptual relationships. Large-scale observations of metaphor use are currently limited because annotation is a time-intensive task for humans. We demonstrate that even with a modest gold-standard dataset, a neural network system can distinguish MV from non-MV with around 85% accuracy. After introducing the motivation, we present the data and our methods, and analyze the performance of some candidate neural network classifiers. We close with a discussion of the promise and challenges of integrating this system with teams of human annotators.

Evidence is growing that the choice of metaphor, or whether to use metaphor at all, has real consequences for cognition and behavior in general [5] and for politics [6]. A recent behavioral experiment showed more specifically that when trait-aggressive people, who are more aggressive independent of context, are exposed to metaphorical violence in political speeches, their support for real violence against politicians increases. Metaphor use is revealing: spoken metaphors are either representative of the speaker’s conceptual system, or they are intended to cause the hearer to activate the metaphor’s conceptual links, or both. So, if you want people to take climate change more seriously, you would do better to frame it as a “war” instead of a “race” [7]. While this claim is supported by data, context changes the production and effects of metaphor use. If we can use machine learning to identify and annotate metaphor, we will understand much better than we do now just when and why one metaphor is chosen over another.

One shortcoming of behavioral studies of metaphor is that cultural context is not part of the experiment. In other words, the experiments lack an important element of ecological validity. Linguists are increasingly looking to cultural context as an important factor for explaining metaphor use [8]. Another criticism says we should not be asking “if” choice of metaphor influences reasoning, but “when” a particular metaphor has a causal effect on reasoning [9]. The contribution presented here helps address this contextual shortcoming: our data is time series data, so current events such as the presidential debates are happening in the cultural background. A further layer of context is ideological, with cable news channel as a proxy for ideology [10, 11]. More generally, the context is English-speaking American cable television news. Our analysis enables us to observe how metaphorical violence varies between cable news networks.

An example metaphorical phrase might be the headline “Bernie attacks Clinton Foundation in first debate.” In contrast, “Terrorist attack kills US Ambassador to Libya” is clearly not a metaphor. Metaphor serves a pragmatic purpose: it is a much shorter, if incomplete, version of the non-metaphorical explanation, which might be that “Bernie claimed that the Clinton Foundation had wrongly accepted funds.” In a concurrent project, we have developed and fit a simple statistical model to the frequency of metaphorical violence (MV) use, and found that MV usage increases in frequency for all networks, but with different timing in 2012 and 2016 [12].

While we are able to get novel and interesting results with human-generated annotations, machine learning could help us increase our signal power. For the study just mentioned, covering 2012 and 2016, we had to limit our corpus to the two most-watched shows on each of the three networks. We also limited ourselves to three violent words: hit, beat, and attack. These limits exist because manual annotation is a time-consuming process. Often, relevant parts of the episode must be watched to finish an annotation, which currently requires navigating the TV News Archive (TVNA) website. The first use of Metacorps was to increase the efficiency of human coders by streamlining the annotation process. But by building our own annotation web application and data model, we can incorporate a new service whose prototype is introduced in this report: a neural network that classifies potential metaphorical violence as metaphor or not. The next section details the machine learning task and describes the gold standard dataset and the training, validation, and test datasets.

2 Data

Metacorps provides a data model and web interface for annotating metaphor in a corpus. In its present state, all corpora are pre-processed subcorpora of the closed captions hosted and curated by the TV News Archive. The TV News Archive (http://archive.org/tv/details) provides video, audio, closed captioning, and rich metadata for millions of hours of television news from cable news channels, studied here, and local news. It offers an HTTP API that returns JSON search results and episode metadata. To be specific, an episode is a single showing of a particular show, like The O’Reilly Factor. Reruns are independent episodes and must be excluded. A show’s metadata includes links to its data, which may be in the form of video, audio, or closed captions. This study uses the closed captions. To programmatically acquire data and build transcripts from closed captions we used the Python software iatv, available on GitHub (http://github.com/mtpain/iatv).

Gold standard annotations were created using the Metacorps Python software package and web application. Trained annotators use the web application to indicate which phrases of potential metaphor actually are metaphor, then fill in more information, such as the subject and object of the metaphorical violence. Potential instances of metaphorical violence are found by searching for key violent words; in this study those words are hit, beat, and attack. The corpus we searched consisted of transcripts from the top two shows on each of the three cable networks MSNBC, CNN, and Fox News from September to November 2012.

To build the corpus, we used iatv to download the transcripts from the desired shows in the desired timeframe. Metacorps provides a data model built on MongoDB and its object-document mapper for Python, mongoengine. Transcripts and other metadata are persisted. Other elements in the data model provide fields to annotate these base transcripts, to which they are linked. Human annotators have suggested and agreed upon whether or not each phrase is metaphorical violence. See Figure 1 for a sketch of the Metacorps data flow and where our new classifier fits in. Metacorps also provides a data exporter, which created the gold standard dataset viomet-2012.csv used to train, validate, and test our model. This dataset has 2538 rows, 791 of which were classified as metaphor, a prevalence of about 31%. The dataset was first split into 80% pre-training and 20% test datasets. The test dataset was left alone, but the pre-training set was split once more: 80% of the pre-training rows became training rows and the rest validation rows. To create a balanced training set, metaphorical rows were resampled with replacement and added to the training set until there were an equal number of metaphor and non-metaphor rows.
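The split and resampling procedure described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `split_and_balance`, the `(text, is_metaphor)` row format, and the use of a fixed random seed are all assumptions made for the example.

```python
import random


def split_and_balance(rows, seed=0):
    """Split rows 80/20 into pre-training/test, then 80/20 into
    training/validation, and oversample metaphor rows in training.

    rows: list of (text, is_metaphor) pairs (a hypothetical format).
    Returns (balanced_train, validation, test).
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    # First split: 80% pre-training, 20% held-out test.
    n_pretrain = len(shuffled) * 4 // 5
    pretrain, test = shuffled[:n_pretrain], shuffled[n_pretrain:]

    # Second split: 80% of pre-training becomes training, rest validation.
    n_train = len(pretrain) * 4 // 5
    train, validation = pretrain[:n_train], pretrain[n_train:]

    # Resample metaphor rows with replacement until classes are balanced.
    positives = [row for row in train if row[1]]
    negatives = [row for row in train if not row[1]]
    n_extra = max(0, len(negatives) - len(positives))
    extra = [rng.choice(positives) for _ in range(n_extra)] if positives else []

    balanced_train = train + extra
    rng.shuffle(balanced_train)
    return balanced_train, validation, test
```

Only the training rows are resampled here; the validation and test sets keep the original class prevalence, so the metrics computed on them reflect the imbalance in the gold standard data.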

Figure 1: Schematic of Metacorps: the annotation tool-set that connects cable news transcripts to annotators, annotations to meaningful data tables, and soon annotations and analyses to our neural network classifier.

3 Methods

To accomplish this MV classification task, we experimented with deep feedforward neural networks, encouraged by success on a similar task reported in [13]. Each hidden layer had 500 nodes, and we tested 1-, 2-, 4-, and 6-hidden-layer models. To regularize we used dropout and early stopping. Stochastic gradient descent with momentum minimized the cross entropy loss. The inputs were vectors with 3300 elements: eleven words, each represented by its 300-dimensional word2vec embedding from the Google News pre-trained word2vec model. Each word embedding vector is concatenated with the next to create the network input vector. We used the free and open source Python package gensim [16] to load the binary-formatted word2vec word embeddings [14, 15]. Neural network construction, training, and testing were done primarily in TensorFlow [17, 18]. Performance analysis was done using Scikit-Learn [19], and data table reading, writing, and subsetting were done with Pandas [20]. More details can be found by examining the README and code for this project on GitHub.
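To make the input construction concrete, the sketch below concatenates eleven 300-dimensional embeddings into a single 3300-element vector. It is an illustration under stated assumptions: the function name `build_input_vector` is hypothetical, the handling of out-of-vocabulary words (a zero vector) is our assumption and is not described above, and `embeddings` can be any mapping that supports `in` and `[]` lookups, such as a plain dict or the object returned by gensim's `KeyedVectors.load_word2vec_format(path, binary=True)`.

```python
EMBEDDING_DIM = 300  # 3300 input elements / 11 words
WINDOW_SIZE = 11     # eleven words per network input


def build_input_vector(words, embeddings):
    """Concatenate the embeddings of exactly WINDOW_SIZE words.

    words: list of WINDOW_SIZE tokens.
    embeddings: mapping from token to a sequence of EMBEDDING_DIM floats.
    Out-of-vocabulary tokens become zero vectors (an assumption).
    """
    if len(words) != WINDOW_SIZE:
        raise ValueError(
            "expected %d words, got %d" % (WINDOW_SIZE, len(words))
        )

    vector = []
    for word in words:
        if word in embeddings:
            vector.extend(float(x) for x in embeddings[word])
        else:
            vector.extend([0.0] * EMBEDDING_DIM)
    return vector
```

The resulting flat list has `WINDOW_SIZE * EMBEDDING_DIM` entries and could be passed to the first layer of a feedforward network as a single example.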

To investigate the performance of the different hyperparameterizations over the number of layers and the learning rate, we executed twenty trials for every combination of the number of layers (1, 2, or 4) and learning rate (0.01 or 0.1). To do this we utilized the MERCED computing cluster hosted at UC Merced. We then calculated the average precision, sensitivity, specificity, and AUC for each of these; the results are described in the next section and summarized in Table 1.
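For reference, the four measures can be written out directly from their definitions. The analysis above used Scikit-Learn; the pure-Python sketch below is only meant to make the definitions explicit, and the function names are our own. AUC is computed here as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN if the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 means metaphor."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and precision from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "precision": _ratio(tp, tp + fp),    # positive predictive value
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _ratio(wins, len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of examples, which is acceptable for a test set of a few hundred rows but would be replaced by a rank-based computation for larger data.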

4 Results

Table 1 shows promising results for a first attempt at using a neural network for classification with our cable news corpus. The maximum AUC reached was 0.92, with a specificity of 0.926, for the four-layer neural network using a learning rate of 0.01. While specificity is rather good here, precision and sensitivity are lower. This suggests that it is easier for the system to identify a true negative than a true positive, which is what we might expect given that we have more negative examples than positive examples.

N layers  learning rate  sensitivity  specificity  precision  AUC
1         0.01           0.796        0.890        0.768      0.918
1         0.10           0.632        0.918        0.780      0.886
2         0.01           0.764        0.894        0.766      0.914
2         0.10           0.758        0.886        0.740      0.910
4         0.01           0.702        0.926        0.816      0.920  *
4         0.10           0.618        0.902        0.720      0.882
Table 1: The best performing model on three of the four measures we use here is four layers with a learning rate of 0.01, marked with an asterisk. Only its sensitivity is outperformed by the single-layer model trained with a 0.01 learning rate. A learning rate of 0.5 and six layers were also tried, but these often failed to converge, so they were excluded from this analysis. Average of five trials.

5 Discussion

At this point, we need to know much more than we do about what the system is doing. An immediate result we should obtain is model performance for subsets of the test set taken by violent word, and by network, e.g. answer the question “How well does the system predict metaphorical violence when the word attack is used compared to hit or beat?” What about for individual cable news networks? Can the system more accurately classify MV on MSNBC than Fox News? Clearly more work needs to be done to select optimal hyperparameters, as we have only shown two. We also need more than five trials to understand model performance.
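The per-subset evaluation proposed above might be sketched as follows. The record format, with `word`, `network`, `label`, and `prediction` fields, is hypothetical and chosen only for illustration; any of the performance measures could be substituted for accuracy.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Compute classification accuracy separately for each value of `key`.

    records: list of dicts with (hypothetical) fields 'word', 'network',
             'label', and 'prediction'.
    key: the field to group by, e.g. 'word' or 'network'.
    Returns a dict mapping each group value to its accuracy.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[key]
        total[group] += 1
        if record["label"] == record["prediction"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}
```

Calling `accuracy_by_group(test_records, "word")` and `accuracy_by_group(test_records, "network")` on the held-out test rows would give one accuracy per violent word and per network, which can then be compared across groups.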

It would be interesting to try different architectures. Long short-term memory models are often used for natural language processing applications, as in neural machine translation [21, 22]. However, the structure of the inputs is really a 2D structure, like an image, so convolutional neural networks may detect correspondences between co-occurring features of words that other network architectures would not. An architecture for detecting whether a pair of words is used metaphorically, called “supervised similarity networks,” was introduced in [23]. While pair detection like this is not directly applicable to our task, the results are promising, and this approach should be investigated further.

Also in [23], the authors explore using “cognitive” embeddings. There are many embeddings available, so an interesting and important empirical study would be to test how well different word embeddings perform for this task. Perhaps word embeddings built from the corpus itself would outperform word embeddings from the Google News word2vec model, if we could produce quality embeddings, which is not necessarily an easy task.

While there are still many details to fill in, deep neural networks using TensorFlow and the Google News word embeddings seem to provide the capability to solve a real-world problem: that of discerning metaphorical statements from non-metaphorical statements. This is a demonstration of the power of word embeddings and of the representational capabilities of neural networks. While some put great effort into building detection systems based on a priori theories of how metaphor works [24], neural networks offer us the possibility not just of automated detection, but perhaps even of guiding cognitive scientists towards a deeper understanding of how metaphorical meaning-making works in the brain. If nothing else, this system can already improve the throughput of metaphor annotation systems by suggesting a label that will be correct with high likelihood. At first human annotators will have to correct mistakes, but if many mouse clicks can be saved all the same, that will be a help.


  1. Code publicly available: https://github.com/mt-digital/metacorps-nn
  2. https://github.com/mt-digital/metacorps


  1. Rafael E. Núñez, Laurie D. Edwards, and João Filipe Matos. Embodied Cognition as Grounding for Situatedness and Context in Mathematics Education. In Educational Studies in Mathematics, volume 39, pages 45–65. Kluwer Academic Publishers, Netherlands, 1999.
  2. Rafael E. Núñez. Mathematical Idea Analysis: What Embodied Cognitive Science Can Say about the Human Nature of Mathematics. In Conference of the International Group for the Psychology of Mathematics Education, Hiroshima, Japan, 2000.
  3. Martha W. Alibali, Mitchell J. Nathan, Matthew S. Wolfgram, R. Breckinridge Church, Steven A. Jacobs, Chelsea Johnson Martinez, and Eric J. Knuth. How Teachers Link Ideas in Mathematics Instruction Using Speech and Gesture: A Corpus Analysis. Cognition and Instruction, 32(1):65–100, 2014.
  4. Gerd Gigerenzer and Henry Brighton. Homo Heuristicus: Why Biased Minds Make Better Inferences. Topics in Cognitive Science, 1(1):107–143, 2009.
  5. George Lakoff. Mapping the brain’s metaphor circuitry: metaphorical thought in everyday reason. Frontiers in human neuroscience, 8(December):958, 2014.
  6. Teenie Matlock. Framing Political Messages with Grammar and Metaphor. American Scientist, 100:478–483, 2012.
  7. Stephen J Flusberg, Teenie Matlock, and Paul H Thibodeau. Metaphors for the War (or Race) against Climate Change. Environmental Communication, 0(0):1–15, 2017.
  8. Zoltán Kövecses. Metaphor, language and culture. Delta, 26:739–757, 2010.
  9. Gerard J. Steen, W. Gudrun Reijnierse, and Christian Burgers. When do natural language metaphors influence reasoning? A follow-up study to Thibodeau and Boroditsky (2013). PLoS ONE, 9(12):1–25, 2014.
  10. Pew Research Center. Political Polarization and Media Habits. (October), 2014.
  11. Gary King, Benjamin Schneer, and Ariel White. How the news media activate public expression and influence national agendas. Science, 358(November):776–780, 2017.
  12. Matthew A Turner, Paul P Maglio, and Teenie Matlock. Metaphorical Violence in Political Discourse. SocArXiv, 2018. https://osf.io/preprints/socarxiv/t8yg9/
  13. Erik-Lân Do Dinh and Iryna Gurevych. Token-Level Metaphor Detection using Neural Networks. Proceedings of the Fourth Workshop on Metaphor in NLP, (June):28–33, 2016.
  14. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Neural Information Processing Systems, pages 1–9, 2013.
  15. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. Technical report.
  16. Radim Řehůřek and Petr Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, 2010. ELRA.
  17. GoogleResearch. TensorFlow: Large-scale machine learning on heterogeneous systems. 2015.
  18. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: A system for large-scale machine learning. 2016.
  19. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
  20. Wes McKinney. Python for Data Analysis. O’Reilly, Sebastopol, CA, 2013.
  21. Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level Convolutional Networks for Text Classification. In Advances in Neural Information Processing Systems 28 (NIPS 2015), 2015.
  22. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to Sequence Learning with Neural Networks. NIPS, page 9, 2014.
  23. Marek Rei, Luana Bulat, Douwe Kiela, and Ekaterina Shutova. Grasping the Finer Point: A Supervised Similarity Network for Metaphor Detection. In EMNLP 2017, 2017.
  24. Ellen Dodge, Jisup Hong, and Elise Stickles. MetaNet: Deep semantic automatic metaphor analysis. (1991):40–49, 2015.